1c/W4: throwaway reproduces cc-ci byte-identical + recovery-key decrypt; abra race found+fixed (serialized reconcilers)
All checks were successful
continuous-integration/drone/push Build is passing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-27 17:59:39 +01:00
parent 7563d47228
commit 5cb0bccdfc

View File

@ -227,3 +227,29 @@ the published repo now builds to izsmiajw==running — this is the form the Adve
C4/W5 standard (Adversary dd710a6 == orchestrator guidance): keep DOMAIN=ci.commoninternet.net, verify
TLS locally on the VM via `curl --resolve …:443:127.0.0.1` (SNI ci.commoninternet.net), served leaf
fingerprint must == git cert leaf `57:8D:67:9E:…:B8:A6`; oneshots converge; only age key out-of-band.
## 2026-05-27 — W4 Step B: throwaway rebuilt; concurrent-abra race found + fixed
**Throwaway rebuild result (pre-fix config, clone @dd710a6):** `nixos-rebuild switch` BUILD succeeded
(2.8 G peak RAM < 4 GB, 11.5 min CPU) → toplevel **`izsmiajw…` == cc-ci's running system** (blank VM
reproduces cc-ci byte-for-byte from git + the bootstrap age key). **sops cert decrypted via the
RECOVERY key**: /var/lib/ci-certs/live/{fullchain,privkey}.pem → /run/secrets/*, sha256 `c1d96d61…`
(match). swarm-init + docker active (node Ready/Leader). BUT activation reported "error(s) while
switching": `deploy-proxy` + `deploy-drone` FAILED → system `degraded`.
**Root cause:** the abra reconcilers (proxy/drone/bridge/dashboard/backupbot) are all
`wantedBy multi-user.target`; drone/bridge/dashboard were `after deploy-proxy` but **concurrent with
each other**, and backupbot concurrent with proxy. On a FRESH `~/.abra` they race on catalogue/recipe
init → fast failures. Confirmed: `abra recipe fetch traefik` works fine alone (rc=0); re-running the
oneshots **sequentially** (`systemctl restart deploy-proxy; …drone; …bridge; …dashboard; …backupbot`)
→ ALL success, system `running`, **0 failed, all 6 stacks 1/1** (traefik app+socket-proxy, drone,
bridge, dashboard, backups) — identical to cc-ci.
**Fix (7563d47):** serialize the chain via ordering-only `after`:
proxy → drone → bridge → dashboard → backupbot (bridge after drone, dashboard after bridge, backupbot
after dashboard). So a single `nixos-rebuild switch` on a blank host converges with no concurrent abra.
New toplevel `ld19aj2…`. Deploying to cc-ci (reconcilers already deployed there ⇒ serial no-op
re-runs) + re-verify byte-identical, then **recreate the throwaway FRESH** to prove single-switch
convergence (authoritative C4; mirrors the Adversary's W5 cold test).
This is the LAST planned config change before W4 completes (config stable ld19aj2 thereafter).