review: close A1 (no-ACME enforced); file A2 (dead janitor) + A3 (unverified teardown); M4 verify in progress

2026-05-27 00:52:35 +01:00
parent 38a145fd9c
commit 0fe3d7cda7
2 changed files with 48 additions and 1 deletions


@@ -75,7 +75,40 @@ Two single-writer sections (§6.1): Builder edits only `## Build backlog`; Adver
## Adversary findings
<!-- Adversary-only section. Builder must not edit below this line. -->
- [ ] **[adversary] A1 — Test-app deploys can silently trigger ACME (no-ACME design hazard).**
- [x] **[adversary] A1 — Test-app deploys can silently trigger ACME (no-ACME design hazard).**
**CLOSED @2026-05-27T00:35Z** by Adversary re-test. `runner/harness/lifecycle.deploy_app`
calls `abra.env_set(domain, "LETS_ENCRYPT_ENV", "")` before every deploy. Verified on a live
harness app (`cust-c95a69`): env `LETS_ENCRYPT_ENV=` empty, no `certresolver` label, **0 ACME
log lines**, and the served cert is the **wildcard** `CN=*.ci.commoninternet.net` (verify ok)
— not a per-host ACME cert. No-ACME holds for harness deploys. (Structural belt-and-suspenders
— dropping the unused `certificatesResolvers` from traefik — remains a nice-to-have, tracked
under A3/M7, not required to close A1.)
- [ ] **[adversary] A2 — Janitor never reaps current-scheme orphans (dead `-pr` filter).**
Found during M4 review. `harness.lifecycle.janitor()` only tears down apps where
`"-pr" in name`, but per DECISIONS the harness now names apps `<recipe[:4]>-<6hex>` (e.g.
`cust-c95a69`) — **no `-pr` substring**. So the run-start crash-recovery sweep (§4.3: "nuke
any orphaned `*-pr*` apps") matches **nothing** and is effectively a no-op. The happy-path
finalizer in `conftest.deployed_app` does work (observed: `cust-e084bd` from a prior run was
torn down), but a run that crashes/reboots *before* the finalizer runs leaves an orphan that
no later run will reap. *Fix:* match the actual naming (e.g. regex `^[a-z]{1,4}-[0-9a-f]{6}\.`
or a dedicated CI label/prefix) and gate on age. *Re-test:* deploy a harness app, simulate a
crash (kill the run before teardown), then start a new run and confirm janitor reaps the
orphan. Adversary closes after re-test.
- [ ] **[adversary] A3 — Teardown is unverified/best-effort; a failure silently orphans + run stays green.**
Found during M4 review (to confirm empirically with a kill-mid-run probe). `lifecycle.teardown_app`
runs every abra call with `check=False` and "never raises"; the conftest finalizer never
asserts teardown succeeded. Worse, `abra.app_config_remove` deletes the app `.env`
**unconditionally**, even if `abra.undeploy` failed first — leaving the swarm service+volume
running but with no `.env`, so the app can no longer be managed/undeployed via abra (and a
fixed janitor that shells `abra app undeploy` couldn't reap it either). Net: a partial teardown
leaves a silent orphan while pytest still reports the run **green**, so the M4/D2 guarantee
"no orphaned app/volume afterward" is not actually *verified* by the harness. *Fix:* assert
post-teardown that the stack/services/volumes/secrets are gone (fail the run otherwise); only
remove the `.env` after a confirmed undeploy, or undeploy-by-stack-name as a fallback that
doesn't need the `.env`. *Re-test:* run install, kill the process mid-deploy, verify the next
run (or janitor) leaves zero residual service/volume/secret. Adversary closes after re-test.
Found during M1 verify (M1 still PASSes — proxy itself fires no ACME). cc-ci's traefik static
config (`/etc/traefik/traefik.yml`) defines `staging` + `production` HTTP-01 `certificatesResolvers`
(stock coop-cloud template). They're currently inert (no router references them; both
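
Sketch for the A2 fix (illustrative only, not the harness's code): select janitor candidates by the current naming scheme and gate on age. The regex is the one proposed in A2. The names `select_orphans` and `is_reapable`, the `(domain, created_at)` input shape, the one-hour threshold, and the full `*.ci.commoninternet.net` app domains are assumptions made for this example.

```python
import re
import time
from typing import Iterable, List, Optional, Tuple

# Regex proposed in A2: <recipe[:4]>-<6hex> followed by a dot (start of the domain suffix).
HARNESS_APP_RE = re.compile(r"^[a-z]{1,4}-[0-9a-f]{6}\.")


def is_reapable(domain: str, created_at: float, now: float, min_age_s: float) -> bool:
    """True if the domain matches the harness naming scheme and is old enough to reap."""
    return bool(HARNESS_APP_RE.match(domain)) and (now - created_at) >= min_age_s


def select_orphans(
    apps: Iterable[Tuple[str, float]],
    now: Optional[float] = None,
    min_age_s: float = 3600.0,  # hypothetical threshold; keeps apps of an active run safe
) -> List[str]:
    """Return the domains the janitor should tear down.

    `apps` is a hypothetical list of (domain, created_at_epoch_seconds) pairs;
    how the real harness would obtain creation times is not specified in the source.
    """
    now = time.time() if now is None else now
    return [d for d, created in apps if is_reapable(d, created, now, min_age_s)]


if __name__ == "__main__":
    sample = [
        ("cust-c95a69.ci.commoninternet.net", 0.0),     # old harness app: reaped
        ("cust-e084bd.ci.commoninternet.net", 9000.0),  # too young: skipped
        ("traefik.ci.commoninternet.net", 0.0),         # not a harness app: skipped
    ]
    print(select_orphans(sample, now=10000.0))
```

The old `"-pr" in name` test is false for every name in the sample, which is the dead-filter behaviour A2 describes. The age gate is there so a sweep at run start cannot reap an app that a concurrent run has just deployed.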


@@ -122,3 +122,17 @@ pivot). Will complete both when M3 is claimed.
**Noted for M7 (not a finding yet):** the Drone-managed Gitea webhook (id 209) carries its webhook
secret as a `?secret=` query param in the hook URL (Drone default; admin-only in Gitea, not in cc-ci
git / CI logs / dashboard). Will adjudicate against D6 at M7.
## M4 — Harness + install stage: VERIFICATION IN PROGRESS (no verdict yet) @2026-05-27T00:35Z
M4 is CLAIMED. Code review done; runtime checks so far:
- **A1 CLOSED** (see BACKLOG): harness forces `LETS_ENCRYPT_ENV=""` every deploy; live app
`cust-c95a69` served the wildcard cert, 0 ACME lines, no certresolver.
- **Happy-path teardown works:** a prior run's app `cust-e084bd` was fully torn down (gone) — not
an orphan; earlier ambiguity was a run cycling apps.
- **Two teardown-robustness defects filed (A2, A3):** janitor's `-pr` filter is dead code under the
`cust-<hex>` naming (no crash-orphan reaping); teardown is best-effort/unverified and deletes the
`.env` even on failed undeploy (silent orphan, run still green).
- **Deferred to next idle tick (a Builder harness run is active now; sequential-only):** my own
cold install run (green install + Playwright + clean teardown verification) and the §6 kill-mid-run
probe to test A3 empirically. Verdict (PASS/FAIL) follows that.
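
Sketch for the A3 fix (illustrative only, not the harness's code): undeploy, verify nothing is left, and only then remove the `.env`; any failure raises so the run cannot stay green. The three callables are hypothetical stand-ins injected by the caller. They are not real `abra` wrapper signatures, and `TeardownError` and `teardown_verified` are names invented for this example.

```python
from typing import Callable, List


class TeardownError(RuntimeError):
    """Raised when an app could not be confirmed gone after teardown."""


def teardown_verified(
    domain: str,
    undeploy: Callable[[str], bool],          # hypothetical: returns True on success
    residuals: Callable[[str], List[str]],    # hypothetical: leftover services/volumes/secrets
    remove_config: Callable[[str], None],     # hypothetical: deletes the app .env
) -> None:
    """Tear down `domain` and fail loudly unless the teardown is confirmed."""
    undeploy_ok = undeploy(domain)
    leftover = residuals(domain)
    if not undeploy_ok or leftover:
        # Keep the .env so the app stays manageable for a retry or the janitor.
        raise TeardownError(
            f"teardown of {domain} incomplete: undeploy_ok={undeploy_ok}, residual={leftover}"
        )
    # Reached only after a confirmed undeploy with no residual resources.
    remove_config(domain)


if __name__ == "__main__":
    removed: List[str] = []
    try:
        teardown_verified(
            "cust-c95a69",
            undeploy=lambda d: False,                  # simulate a failed undeploy
            residuals=lambda d: [f"{d}_app (service)"],
            remove_config=removed.append,
        )
    except TeardownError as exc:
        print("run fails:", exc)
    print("configs removed:", removed)
```

Called from the conftest finalizer, the raised error would turn a partial teardown into a failed run, and the surviving `.env` leaves a fixed janitor something to undeploy against.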