diff --git a/BACKLOG.md b/BACKLOG.md index 1e45a31..a54e8b7 100644 --- a/BACKLOG.md +++ b/BACKLOG.md @@ -75,7 +75,40 @@ Two single-writer sections (§6.1): Builder edits only `## Build backlog`; Adver ## Adversary findings -- [ ] **[adversary] A1 — Test-app deploys can silently trigger ACME (no-ACME design hazard).** +- [x] **[adversary] A1 — Test-app deploys can silently trigger ACME (no-ACME design hazard).** + **CLOSED @2026-05-27T00:35Z** by Adversary re-test. `runner/harness/lifecycle.deploy_app` + calls `abra.env_set(domain, "LETS_ENCRYPT_ENV", "")` before every deploy. Verified on a live + harness app (`cust-c95a69`): env `LETS_ENCRYPT_ENV=` empty, no `certresolver` label, **0 ACME + log lines**, and the served cert is the **wildcard** `CN=*.ci.commoninternet.net` (verify ok) + — not a per-host ACME cert. No-ACME holds for harness deploys. (Structural belt-and-suspenders + — dropping the unused `certificatesResolvers` from traefik — remains a nice-to-have, tracked + under A3/M7, not required to close A1.) + +- [ ] **[adversary] A2 — Janitor never reaps current-scheme orphans (dead `-pr` filter).** + Found during M4 review. `harness.lifecycle.janitor()` only tears down apps where + `"-pr" in name`, but per DECISIONS the harness now names apps `-<6hex>` (e.g. + `cust-c95a69`) — **no `-pr` substring**. So the run-start crash-recovery sweep (§4.3: "nuke + any orphaned `*-pr*` apps") matches **nothing** and is effectively a no-op. The happy-path + finalizer in `conftest.deployed_app` does work (observed: `cust-e084bd` from a prior run was + torn down), but a run that crashes/reboots *before* the finalizer runs leaves an orphan that + no later run will reap. *Fix:* match the actual naming (e.g. regex `^[a-z]{1,4}-[0-9a-f]{6}\.` + or a dedicated CI label/prefix) and gate on age. *Re-test:* deploy a harness app, simulate a + crash (kill the run before teardown), then start a new run and confirm janitor reaps the + orphan. Adversary closes after re-test. + +- [ ] **[adversary] A3 — Teardown is unverified/best-effort; a failure silently orphans + run stays green.** + Found during M4 review (to confirm empirically with a kill-mid-run probe). `lifecycle.teardown_app` + runs every abra call with `check=False` and "never raises"; the conftest finalizer never + asserts teardown succeeded. Worse, `abra.app_config_remove` deletes the app `.env` + **unconditionally**, even if `abra.undeploy` failed first — leaving the swarm service+volume + running but with no `.env`, so the app can no longer be managed/undeployed via abra (and a + fixed janitor that shells `abra app undeploy` couldn't reap it either). Net: a partial teardown + leaves a silent orphan while pytest still reports the run **green**, so the M4/D2 guarantee + "no orphaned app/volume afterward" is not actually *verified* by the harness. *Fix:* assert + post-teardown that the stack/services/volumes/secrets are gone (fail the run otherwise); only + remove the `.env` after a confirmed undeploy, or undeploy-by-stack-name as a fallback that + doesn't need the `.env`. *Re-test:* run install, kill the process mid-deploy, verify the next + run (or janitor) leaves zero residual service/volume/secret. Adversary closes after re-test. Found during M1 verify (M1 still PASSes — proxy itself fires no ACME). cc-ci's traefik static config (`/etc/traefik/traefik.yml`) defines `staging` + `production` HTTP-01 `certificatesResolvers` (stock coop-cloud template). They're currently inert (no router references them; both diff --git a/REVIEW.md b/REVIEW.md index 3c80ed5..3f4cf2b 100644 --- a/REVIEW.md +++ b/REVIEW.md @@ -122,3 +122,17 @@ pivot). Will complete both when M3 is claimed. **Noted for M7 (not a finding yet):** the Drone-managed Gitea webhook (id 209) carries its webhook secret as a `?secret=` query param in the hook URL (Drone default; admin-only in Gitea, not in cc-ci git / CI logs / dashboard). Will adjudicate against D6 at M7. + +## M4 — Harness + install stage: VERIFICATION IN PROGRESS (no verdict yet) @2026-05-27T00:35Z + +M4 is CLAIMED. Code review done; runtime checks so far: +- **A1 CLOSED** (see BACKLOG): harness forces `LETS_ENCRYPT_ENV=""` every deploy; live app + `cust-c95a69` served the wildcard cert, 0 ACME lines, no certresolver. +- **Happy-path teardown works:** a prior run's app `cust-e084bd` was fully torn down (gone) — not + an orphan; earlier ambiguity was a run cycling apps. +- **Two teardown-robustness defects filed (A2, A3):** janitor's `-pr` filter is dead code under the + `cust-` naming (no crash-orphan reaping); teardown is best-effort/unverified and deletes the + `.env` even on failed undeploy (silent orphan, run still green). +- **Deferred to next idle tick (a Builder harness run is active now; sequential-only):** my own + cold install run (green install + Playwright + clean teardown verification) and the §6 kill-mid-run + probe to test A3 empirically. Verdict (PASS/FAIL) follows that.