Blocker 1 (LFS roundtrip fails on PR #1):
- Add UPGRADE_EXTRA_ENV to gitea recipe_meta.py: after the PR-head checkout
(compose.lfs.yml is now in ABRA_DIR), add compose.lfs.yml to COMPOSE_FILE
and set SECRET_LFS_JWT_SECRET_VERSION=v1 so the upgrade chaos redeploy
actually runs with LFS enabled. Without this, the base install checks out
the 3.5.x tag (where compose.lfs.yml is absent), so EXTRA_ENV does not
enable LFS and the upgrade chaos redeploy inherits the no-LFS .env. The LFS
test still runs (recipe_checkout_ref restores compose.lfs.yml), but LFS is off.
- Call abra.secret_generate(domain) in generic.perform_upgrade when
upgrade_env is non-empty, so lfs_jwt_secret exists before the chaos redeploy.
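A minimal sketch of the Blocker 1 fix, under stated assumptions: the .env is handled as a dict of strings, and COMPOSE_FILE uses Docker Compose's default ':' path separator on Linux. build_upgrade_env and perform_upgrade_sketch are hypothetical stand-ins for the UPGRADE_EXTRA_ENV hook and generic.perform_upgrade, and the secret_generate/redeploy callbacks stand in for abra.secret_generate and the chaos redeploy. The point is the ordering: the LFS env is only added when compose.lfs.yml is present after checkout, and secrets are generated before the redeploy whenever the upgrade env is non-empty.

```python
from pathlib import Path
from typing import Callable, Dict

LFS_COMPOSE = "compose.lfs.yml"


def build_upgrade_env(base_env: Dict[str, str], recipe_dir: Path) -> Dict[str, str]:
    """Extra env to apply after the PR-head checkout (hypothetical UPGRADE_EXTRA_ENV).

    Returns {} when compose.lfs.yml is absent from the checked-out recipe,
    e.g. on a tag that predates LFS support.
    """
    if not (recipe_dir / LFS_COMPOSE).exists():
        return {}
    # COMPOSE_FILE is ':'-separated (assumed: Docker Compose's default on Linux).
    files = [f for f in base_env.get("COMPOSE_FILE", "compose.yml").split(":") if f]
    if LFS_COMPOSE not in files:
        files.append(LFS_COMPOSE)
    return {"COMPOSE_FILE": ":".join(files), "SECRET_LFS_JWT_SECRET_VERSION": "v1"}


def perform_upgrade_sketch(
    domain: str,
    env: Dict[str, str],
    upgrade_env: Dict[str, str],
    secret_generate: Callable[[str], None],
    redeploy: Callable[[str, Dict[str, str]], None],
) -> Dict[str, str]:
    """Merge upgrade_env over env, generate new secrets if needed, then redeploy."""
    merged = {**env, **upgrade_env}
    if upgrade_env:
        # New secrets (e.g. lfs_jwt_secret) must exist before the chaos redeploy.
        secret_generate(domain)
    redeploy(domain, merged)
    return merged
```

With an empty upgrade env the sketch skips secret generation entirely, which mirrors the "when upgrade_env is non-empty" condition above.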
Blocker 2 (REF=main upgrade fails HC1):
- Always resolve head_ref with recipe_head_commit (git rev-parse HEAD)
instead of using ref directly. When ref="main" (a branch name), the HC1
commit check head_ref.startswith(chaos_commit) always fails because "main"
is not a SHA. recipe_head_commit returns the actual SHA after fetch/checkout.
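A sketch of why resolving the ref matters, assuming the recipe is a plain git checkout on disk. recipe_head_commit below is a plausible reimplementation built on `git -C <dir> rev-parse HEAD`, not the harness's actual code, and hc1_commit_ok is a hypothetical name for the HC1 prefix check quoted above.

```python
import subprocess


def recipe_head_commit(recipe_dir: str) -> str:
    """Resolve the checked-out HEAD of recipe_dir to a full commit SHA."""
    return subprocess.run(
        ["git", "-C", recipe_dir, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout.strip()


def hc1_commit_ok(head_ref: str, chaos_commit: str) -> bool:
    """HC1-style check: the deployed chaos commit must prefix head_ref.

    Only meaningful when head_ref is a SHA; a branch name such as "main"
    will not match an abbreviated commit hash.
    """
    # Guard against an empty chaos_commit: "".startswith-style checks always pass.
    return bool(chaos_commit) and head_ref.startswith(chaos_commit)
```

The empty-string guard is a design choice of the sketch: str.startswith("") is always True, so without it a missing chaos commit would silently pass HC1.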
Side-fix (stale creds, build #675):
- ops.py pre_install: delete the per-domain creds file before calling
_ensure_admin. A fresh install wipes gitea's DB, so any creds file left by a
prior run on the same domain is stale and makes every API call return 401.
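A minimal sketch of the pre_install ordering, assuming one creds file per domain under a single directory. The `<domain>.json` layout, creds_path, and the ensure_admin callback are hypothetical; only the "delete first, then _ensure_admin" order comes from the fix described above.

```python
from pathlib import Path
from typing import Callable


def creds_path(creds_dir: Path, domain: str) -> Path:
    # Hypothetical layout: one JSON creds file per deployed domain.
    return creds_dir / f"{domain}.json"


def pre_install(domain: str, creds_dir: Path, ensure_admin: Callable[[str], None]) -> None:
    """Drop stale creds from a prior run, then (re)create the admin user."""
    # A fresh install wipes gitea's DB, so any existing creds are invalid.
    # missing_ok=True (Python 3.8+) makes this a no-op on a first run.
    creds_path(creds_dir, domain).unlink(missing_ok=True)
    ensure_admin(domain)
```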
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two critical issues block M2: (1) lfs_jwt_secret is not generated via the
on-disk .env, so LFS is disabled in the container; (2) the upgrade tier fails
when REF=main. Details and fix hints are in BUILDER-INBOX.md.
Run 674 (main): upgrade FAIL ("not intended PR-head"). Run 676 (PR #1, LFS):
test_lfs_roundtrip fails at the git-push batch endpoint because LFS is not
enabled in the deployed container. Builder must fix both before M2.
level=5/5 verified; 53/53 unit tests PASS (Adversary cold run from adv-clone);
code review: all test hooks have teeth, dep path correct, LFS skip correct.
One non-blocking finding: stale screenshot (pre-existing harness bug, manual run_id reuse).
LIVENESS PROTOCOL: declared per 10-min rule. Adversary pre-checks done
at 950ab8b, ready to verify. Claim posted at bac3662 (~20:13Z).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Unit tests 51/51 PASS, claude smoke PASS, opencode smoke PASS (own port :4097), no
leftover aotest-* sessions/ports, cc-ci sessions intact. Cold-verified from
/tmp clone inside nix develop. HOW/EXPECTED/WHERE in STATUS-aotest.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
nixos-rebuild deployed the fix; the new Nix store path 8qjh8apxcbs85 carries the /api/version
probe; deploy-proxy active(exited) at 13:43:15 UTC; cold-boot simulation: proxy started
active(exited) with the dashboard stopped; all 9 services 1/1; alert dir empty; rollback gate unchanged.
Phase pxgate DoD fully met. Builder may write ## DONE.
builder-clone was on restructure/concurrency (caef217, 288 commits behind main).
Switched it to main at d23baf8. STATUS updated with a git checkout main safeguard.
Adversary idle probes all PASS @13:31Z.
The active Nix store path (km6173hm5a...) calls ls5d6s7q...-runner/warm_reconcile.py, which
still has health_domain=ci.commoninternet.net (the OLD probe). Fix 0e9fd38 is committed but not
yet deployed. Waiting for: cd /root/builder-clone && git pull && nixos-rebuild switch.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cold verification of commit 0e9fd38:
1. Code change correct: health_path="/api/version", health_domain absent (falls back to
traefik.ci.commoninternet.net). The probe hits traefik's own API, so it has no backend dependency.
2. Controlled repro (dashboard=0): new probe → 200; old probe → 404. Cycle broken.
3. Consumer ordering unchanged: all After=deploy-proxy services unaffected; deploy-proxy
itself has no After=dashboard. Fix does not change any service ordering.
4. Alert dir empty: stale alert cleared.
5. proxy.nix comment updated correctly.
6. Gate has teeth: on curl failure, health_code() returns 0 (not 999 as STATUS claimed; a
non-blocking doc discrepancy). Since 0 is not in health_ok=(200,), rollback triggers. Functional PASS.
7. DEFERRED entry closed, DECISIONS logged.
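The gate behaviour verified in items 1 and 6 can be sketched as follows. The names health_path, health_domain, health_code(), and health_ok come from the review; the function bodies, the https:// scheme, and the curl flags are assumptions and not the contents of warm_reconcile.py.

```python
import subprocess
from typing import Optional, Tuple

# Fallback host named in item 1 (assumed to be used when health_domain is unset).
DEFAULT_HEALTH_DOMAIN = "traefik.ci.commoninternet.net"


def health_url(health_path: str, health_domain: Optional[str] = None) -> str:
    # Assumption: the probe uses HTTPS; with health_domain absent it targets traefik's API host.
    return f"https://{health_domain or DEFAULT_HEALTH_DOMAIN}{health_path}"


def health_code(url: str, timeout: int = 10) -> int:
    """Return the HTTP status reported by curl, or 0 if curl itself fails."""
    try:
        out = subprocess.run(
            ["curl", "-s", "-o", "/dev/null", "-w", "%{http_code}",
             "--max-time", str(timeout), url],
            capture_output=True, text=True, check=False,
        ).stdout.strip()
        # On a connection failure curl prints "000", which parses to 0.
        return int(out)
    except (OSError, ValueError):
        # curl missing or unparsable output: treat as a failed probe.
        return 0


def should_rollback(code: int, health_ok: Tuple[int, ...] = (200,)) -> bool:
    # Any code outside health_ok, including 0 for a curl failure, triggers rollback.
    return code not in health_ok
```

Returning 0 rather than a sentinel like 999 is harmless either way, because the rollback decision only asks whether the code is in health_ok.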
No blocking findings. M2 pending orchestrator cold-boot.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Independent cold read confirms the circular dependency: the proxy health gate polls
ci.commoninternet.net, which is served by the dashboard, which is itself After=deploy-proxy.
The root cause is PROVEN LIVE by today's alert: 20260613T054428Z-traefik-unhealthy-on-latest.json.
Fix endpoint independently verified: /api/version on traefik.ci.commoninternet.net
returns 200 as soon as traefik is up, with no dashboard dependency.
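The circular dependency can be made concrete as a cycle in a "waits on" graph. This is an illustrative model, not systemd's own analysis: it merges the runtime wait introduced by the health gate with the After=deploy-proxy ordering, and uses only the two unit names mentioned above. The check is a standard three-colour depth-first search.

```python
from typing import Dict, List, Optional


def find_cycle(waits_on: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if acyclic.

    waits_on[a] lists the units that must be up before a can finish starting.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in waits_on}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY  # on the current DFS path
        stack.append(node)
        for dep in waits_on.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:  # back edge: dep is already on the path
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK  # fully explored, no cycle through it
        return None

    for n in list(waits_on):
        if color[n] == WHITE:
            found = visit(n)
            if found:
                return found
    return None


# Old probe: deploy-proxy's gate waits on the dashboard, which is After=deploy-proxy.
OLD = {"deploy-proxy": ["dashboard"], "dashboard": ["deploy-proxy"]}
# New probe: traefik's own /api/version, so deploy-proxy waits on nothing external.
NEW = {"deploy-proxy": [], "dashboard": ["deploy-proxy"]}
```

find_cycle(OLD) reports the deploy-proxy → dashboard → deploy-proxy loop, while find_cycle(NEW) returns None, matching the claim that the fix breaks the cycle without touching service ordering.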
REVIEW-pxgate.md: orientation, M1/M2 acceptance criteria.
BACKLOG-pxgate.md: break-it probes P1–P5 to run at M1 gate.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cold re-clone @a6f967f: cardinal (recipe,filename) set identical 64=64; 0 added/0
deleted test files, 5 non-R100 renames are docstring/comment only (no assertion/wait/
skip/sys.path change); orphan-test hunt found no droppable recipe-local test; alias
probe warns on both deprecated dirs; unit suite 18 passed; cfold sweep evidence audited
directly (all 20 recipes 5/5, custom counts match baseline, live_pr_apps=0). M1+M2 PASS.
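The "0 added/0 deleted, 5 non-R100 renames" audit can be reproduced from `git diff --name-status -M` output, where a rename line reads `R<score>\t<old>\t<new>` and R100 means the content is byte-identical. The parser below is a hypothetical helper for that audit, not part of the harness; the bucket names are the sketch's own.

```python
from typing import Dict, List, Tuple


def classify_name_status(output: str) -> Dict[str, List[Tuple[str, ...]]]:
    """Bucket `git diff --name-status -M` lines for a test-move audit.

    R100 renames are pure moves; lower scores mean the file was also edited
    and its diff needs a manual read. Other statuses (M, C, T, ...) go to "other".
    """
    result: Dict[str, List[Tuple[str, ...]]] = {
        "added": [], "deleted": [], "pure_renames": [], "edited_renames": [], "other": [],
    }
    for line in output.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        if status == "A":
            result["added"].append((paths[0],))
        elif status == "D":
            result["deleted"].append((paths[0],))
        elif status.startswith("R"):
            bucket = "pure_renames" if status == "R100" else "edited_renames"
            result[bucket].append((paths[0], paths[1]))
        else:
            result["other"].append(tuple(paths))
    return result
```

Every entry in "edited_renames" is a candidate for the kind of review described above (confirming the changes are docstring or comment only).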
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>