Blocker 4 fix: abra `secret generate --all` uses .env.sample for length hints; the
lfs-plain-gitea PR has SECRET_LFS_JWT_SECRET_VERSION=v1 COMMENTED OUT, so abra produces
a wrong-length secret. gitea requires exactly 43 chars (32 bytes, unpadded URL-safe
base64); with any other length gitea exits fatally when it tries to save the JWT secret
to the read-only Docker Config app.ini → health check fails → swarm rolls back.
Fix: new UPGRADE_SECRET_PREP hook (meta.py) called before `abra secret generate --all`
in the upgrade path. abra's `--all` is idempotent (skips existing secrets), so the
correctly pre-inserted secret survives. gitea's recipe_meta.py implements the hook using
`docker secret create` directly to guarantee correct format regardless of .env.sample.
Also consumes machine-docs/BUILDER-INBOX.md (Adversary Blocker 4 digest).
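The 43-char requirement follows from encoding 32 random bytes as URL-safe base64 and
dropping the single `=` pad. A minimal sketch of producing such a value; the helper name
`make_lfs_jwt_secret` is hypothetical and is not taken from recipe_meta.py:

```python
import base64
import secrets


def make_lfs_jwt_secret() -> str:
    """Return 32 random bytes as unpadded URL-safe base64 (hypothetical helper)."""
    raw = secrets.token_bytes(32)
    # 32 bytes encode to 44 base64 chars, the last being one '=' pad; strip it.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


secret = make_lfs_jwt_secret()
assert len(secret) == 43
```

A value built this way could then be handed to `docker secret create`; the secret name
and invocation are left out here because the commit does not spell them out.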
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Build #684 (RECIPE=gitea REF=main PR=0): PASS level=5 — all tiers pass, LFS correctly
SKIP on main, HC1 SHA match (e6a1cc79=e6a1cc79). M2 main-branch DoD MET.
Build #685 (RECIPE=gitea PR=1 REF=357926f26e69): FAIL level=1 — new critical blocker:
upgrade chaos redeploy to PR head with compose.lfs.yml fails with rollback_completed.
Root cause: lfs_jwt_secret generated by abra --all with wrong length/format because
.env.sample in PR #1 has `SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43` COMMENTED OUT.
Gitea starts but fails health check on bad JWT secret → Docker swarm rolls back.
Also filed: cc-ci self-test lint failures (9 ruff format violations in gitea files),
drone dep path not re-verified via live CI since a121d2c.
M2 still NOT claimable — Builder must fix lfs_jwt_secret generation and re-trigger #685.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Blocker 1 (LFS roundtrip fails on PR #1):
- Add UPGRADE_EXTRA_ENV to gitea recipe_meta.py — after PR-head checkout
(compose.lfs.yml now in ABRA_DIR), add compose.lfs.yml to COMPOSE_FILE
and set SECRET_LFS_JWT_SECRET_VERSION=v1 so the upgrade chaos redeploy
actually runs with LFS enabled. Without this, the base install checks out
the 3.5.x tag (compose.lfs.yml removed), EXTRA_ENV sees no LFS, and the
upgrade chaos redeploy inherits the no-LFS .env — so the LFS test runs
(compose.lfs.yml is restored by recipe_checkout_ref) but LFS is off.
- Add abra.secret_generate(domain) in generic.perform_upgrade when
upgrade_env is non-empty — generates lfs_jwt_secret before chaos redeploy.
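A sketch of what the UPGRADE_EXTRA_ENV hook for Blocker 1 could look like. The function
signature, the `abra_dir`/`current_env` parameters, and the colon-separated COMPOSE_FILE
format are assumptions for illustration, not the actual recipe_meta.py code:

```python
from pathlib import Path


def upgrade_extra_env(abra_dir: str, current_env: dict[str, str]) -> dict[str, str]:
    """Return env overrides enabling LFS if the PR head ships compose.lfs.yml.

    Hypothetical signature: the real hook interface in meta.py may differ.
    """
    lfs_compose = "compose.lfs.yml"
    if not (Path(abra_dir) / lfs_compose).is_file():
        # Checked-out ref has no LFS compose file (e.g. main): change nothing.
        return {}
    # Assumption: COMPOSE_FILE is a colon-separated list of compose files.
    files = current_env.get("COMPOSE_FILE", "compose.yml").split(":")
    if lfs_compose not in files:
        files.append(lfs_compose)
    return {
        "COMPOSE_FILE": ":".join(files),
        "SECRET_LFS_JWT_SECRET_VERSION": "v1",
    }
```

Returning an empty dict when the file is absent keeps the main-branch path untouched,
which matches the "upgrade_env is non-empty" condition on the secret generation step.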
Blocker 2 (REF=main upgrade fails HC1):
- Always use recipe_head_commit (git rev-parse HEAD) for head_ref instead
of using ref directly. When ref="main" (a branch name), the HC1 commit
check "head_ref.startswith(chaos_commit)" always fails since "main" ≠ SHA.
recipe_head_commit returns the actual SHA after the fetch/checkout.
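To illustrate why the HC1 check needs a resolved SHA, a minimal sketch. The name
recipe_head_commit comes from this commit, but its signature here and the `hc1_matches`
helper are assumptions:

```python
import subprocess


def recipe_head_commit(repo_dir: str) -> str:
    """Resolve the checked-out commit to a full SHA (assumed signature)."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def hc1_matches(head_ref: str, chaos_commit: str) -> bool:
    """Hypothetical form of the HC1 check: deployed commit must prefix head_ref."""
    return bool(chaos_commit) and head_ref.startswith(chaos_commit)


# A branch name never starts with a commit SHA, so the old behavior always failed.
assert not hc1_matches("main", "e6a1cc79")
```

With head_ref taken from `recipe_head_commit`, the same check compares SHA against SHA
for both branch names and explicit commit refs.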
Side-fix (stale creds — build #675):
- ops.py pre_install: delete the per-domain creds file before calling
_ensure_admin. A fresh install wipes gitea's DB; any creds file from a
prior run on the same domain is stale and causes 401s in all API calls.
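The stale-creds side-fix amounts to an unconditional delete before the admin bootstrap.
A sketch under assumed names: the creds directory layout, the `.json` suffix, and the
`drop_stale_creds` helper are all hypothetical:

```python
from pathlib import Path


def drop_stale_creds(creds_dir: str, domain: str) -> bool:
    """Delete the per-domain creds file if present; return whether one existed.

    Hypothetical helper: the real file location in ops.py is not shown here.
    """
    creds_file = Path(creds_dir) / f"{domain}.json"
    existed = creds_file.exists()
    # missing_ok avoids an error on a first-ever install for this domain.
    creds_file.unlink(missing_ok=True)
    return existed
```

Calling this at the top of pre_install, before _ensure_admin, would force fresh
credentials to be minted against the newly wiped database.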
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two critical issues prevent M2: (1) lfs_jwt_secret not generated via disk .env → LFS disabled in
container; (2) upgrade tier fails when REF=main. Details + fix hints in BUILDER-INBOX.md.
Run 674 (main): upgrade FAIL ("not intended PR-head"); run 676 (PR#1 LFS): test_lfs_roundtrip
fails at git-push batch endpoint (LFS not enabled in deployed container). Builder must fix before M2.
level=5/5 verified; 53/53 unit tests PASS (Adversary cold run from adv-clone);
code review: all test hooks have teeth; dep path correct; LFS skip correct.
One non-blocking finding: stale screenshot (pre-existing harness bug, manual run_id reuse).
LIVENESS PROTOCOL: declared per 10-min rule. Adversary pre-checks done
at 950ab8b, ready to verify. Claim posted at bac3662 (~20:13Z).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Unit 51/51 PASS, claude smoke PASS, opencode smoke PASS (own :4097), no
leftover aotest-* sessions/ports, cc-ci sessions intact. Cold-verified from
/tmp clone inside nix develop. HOW/EXPECTED/WHERE in STATUS-aotest.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
nixos-rebuild deployed fix; new nix store path 8qjh8apxcbs85 with /api/version probe;
deploy-proxy active(exited) at 13:43:15 UTC; cold-boot sim: proxy started active(exited)
with dashboard stopped; all 9 services 1/1; alert dir empty; rollback gate unchanged.
Phase pxgate DoD fully met. Builder may write ## DONE.
builder-clone was on restructure/concurrency (caef217, 288 behind main).
Switched to main at d23baf8. STATUS updated with git checkout main safeguard.
Adversary idle probes all PASS @13:31Z.
Active nix store (km6173hm5a...) calls ls5d6s7q...-runner/warm_reconcile.py which
still has health_domain=ci.commoninternet.net (OLD probe). Fix 0e9fd38 in git but not
deployed. Waiting for: cd /root/builder-clone && git pull && nixos-rebuild switch.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cold verification of commit 0e9fd38:
1. Code change correct: health_path="/api/version", health_domain absent (falls back to
traefik.ci.commoninternet.net). Probe is traefik's own API, no backend dependency.
2. Controlled repro (dashboard=0): new probe → 200; old probe → 404. Cycle broken.
3. Consumer ordering unchanged: all After=deploy-proxy services unaffected; deploy-proxy
itself has no After=dashboard. Fix does not change any service ordering.
4. Alert dir empty: stale alert cleared.
5. proxy.nix comment updated correctly.
6. Gate has teeth: on curl failure, health_code() returns 0 (not 999 as STATUS claimed —
non-blocking doc discrepancy); 0 not in health_ok=(200,) → rollback triggers. Functional PASS.
7. DEFERRED entry closed, DECISIONS logged.
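The gate logic from item 6 can be sketched as follows. This uses urllib in place of the
curl call mentioned above, and `should_rollback` is a hypothetical name; only the
"0 on failure, 0 not in health_ok" behavior is taken from the verification notes:

```python
import urllib.error
import urllib.request

HEALTH_OK = (200,)


def health_code(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of the probe, or 0 if the request itself fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered with an error status such as 404.
        return exc.code
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, or malformed URL.
        return 0


def should_rollback(code: int) -> bool:
    """Any code outside HEALTH_OK, including the failure value 0, trips the gate."""
    return code not in HEALTH_OK
```

Because 0 is never a member of HEALTH_OK, a probe that cannot connect at all is treated
the same as an unhealthy response.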
No blocking findings. M2 pending orchestrator cold-boot.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>