review: close A1 (no-ACME enforced); file A2 (dead janitor) + A3 (unverified teardown); M4 verify in progress
All checks were successful
continuous-integration/drone/push Build is passing
All checks were successful
continuous-integration/drone/push Build is passing
This commit is contained in:
35
BACKLOG.md
35
BACKLOG.md
@ -75,7 +75,40 @@ Two single-writer sections (§6.1): Builder edits only `## Build backlog`; Adver
|
|||||||
## Adversary findings
|
## Adversary findings
|
||||||
<!-- Adversary-only section. Builder must not edit below this line. -->
|
<!-- Adversary-only section. Builder must not edit below this line. -->
|
||||||
|
|
||||||
- [ ] **[adversary] A1 — Test-app deploys can silently trigger ACME (no-ACME design hazard).**
|
- [x] **[adversary] A1 — Test-app deploys can silently trigger ACME (no-ACME design hazard).**
|
||||||
|
**CLOSED @2026-05-27T00:35Z** by Adversary re-test. `runner/harness/lifecycle.deploy_app`
|
||||||
|
calls `abra.env_set(domain, "LETS_ENCRYPT_ENV", "")` before every deploy. Verified on a live
|
||||||
|
harness app (`cust-c95a69`): env `LETS_ENCRYPT_ENV=` empty, no `certresolver` label, **0 ACME
|
||||||
|
log lines**, and the served cert is the **wildcard** `CN=*.ci.commoninternet.net` (verify ok)
|
||||||
|
— not a per-host ACME cert. No-ACME holds for harness deploys. (Structural belt-and-suspenders
|
||||||
|
— dropping the unused `certificatesResolvers` from traefik — remains a nice-to-have, tracked
|
||||||
|
under A3/M7, not required to close A1.)
|
||||||
|
|
||||||
|
- [ ] **[adversary] A2 — Janitor never reaps current-scheme orphans (dead `-pr` filter).**
|
||||||
|
Found during M4 review. `harness.lifecycle.janitor()` only tears down apps where
|
||||||
|
`"-pr" in name`, but per DECISIONS the harness now names apps `<recipe[:4]>-<6hex>` (e.g.
|
||||||
|
`cust-c95a69`) — **no `-pr` substring**. So the run-start crash-recovery sweep (§4.3: "nuke
|
||||||
|
any orphaned `*-pr*` apps") matches **nothing** and is effectively a no-op. The happy-path
|
||||||
|
finalizer in `conftest.deployed_app` does work (observed: `cust-e084bd` from a prior run was
|
||||||
|
torn down), but a run that crashes/reboots *before* the finalizer runs leaves an orphan that
|
||||||
|
no later run will reap. *Fix:* match the actual naming (e.g. regex `^[a-z]{1,4}-[0-9a-f]{6}\.`
|
||||||
|
or a dedicated CI label/prefix) and gate on age. *Re-test:* deploy a harness app, simulate a
|
||||||
|
crash (kill the run before teardown), then start a new run and confirm janitor reaps the
|
||||||
|
orphan. Adversary closes after re-test.
|
||||||
|
|
||||||
|
- [ ] **[adversary] A3 — Teardown is unverified/best-effort; a failure silently orphans + run stays green.**
|
||||||
|
Found during M4 review (to confirm empirically with a kill-mid-run probe). `lifecycle.teardown_app`
|
||||||
|
runs every abra call with `check=False` and "never raises"; the conftest finalizer never
|
||||||
|
asserts teardown succeeded. Worse, `abra.app_config_remove` deletes the app `.env`
|
||||||
|
**unconditionally**, even if `abra.undeploy` failed first — leaving the swarm service+volume
|
||||||
|
running but with no `.env`, so the app can no longer be managed/undeployed via abra (and a
|
||||||
|
fixed janitor that shells `abra app undeploy` couldn't reap it either). Net: a partial teardown
|
||||||
|
leaves a silent orphan while pytest still reports the run **green**, so the M4/D2 guarantee
|
||||||
|
"no orphaned app/volume afterward" is not actually *verified* by the harness. *Fix:* assert
|
||||||
|
post-teardown that the stack/services/volumes/secrets are gone (fail the run otherwise); only
|
||||||
|
remove the `.env` after a confirmed undeploy, or undeploy-by-stack-name as a fallback that
|
||||||
|
doesn't need the `.env`. *Re-test:* run install, kill the process mid-deploy, verify the next
|
||||||
|
run (or janitor) leaves zero residual service/volume/secret. Adversary closes after re-test.
|
||||||
Found during M1 verify (M1 still PASSes — proxy itself fires no ACME). cc-ci's traefik static
|
Found during M1 verify (M1 still PASSes — proxy itself fires no ACME). cc-ci's traefik static
|
||||||
config (`/etc/traefik/traefik.yml`) defines `staging` + `production` HTTP-01 `certificatesResolvers`
|
config (`/etc/traefik/traefik.yml`) defines `staging` + `production` HTTP-01 `certificatesResolvers`
|
||||||
(stock coop-cloud template). They're currently inert (no router references them; both
|
(stock coop-cloud template). They're currently inert (no router references them; both
|
||||||
|
|||||||
14
REVIEW.md
14
REVIEW.md
@ -122,3 +122,17 @@ pivot). Will complete both when M3 is claimed.
|
|||||||
**Noted for M7 (not a finding yet):** the Drone-managed Gitea webhook (id 209) carries its webhook
|
**Noted for M7 (not a finding yet):** the Drone-managed Gitea webhook (id 209) carries its webhook
|
||||||
secret as a `?secret=` query param in the hook URL (Drone default; admin-only in Gitea, not in cc-ci
|
secret as a `?secret=` query param in the hook URL (Drone default; admin-only in Gitea, not in cc-ci
|
||||||
git / CI logs / dashboard). Will adjudicate against D6 at M7.
|
git / CI logs / dashboard). Will adjudicate against D6 at M7.
|
||||||
|
|
||||||
|
## M4 — Harness + install stage: VERIFICATION IN PROGRESS (no verdict yet) @2026-05-27T00:35Z
|
||||||
|
|
||||||
|
M4 is CLAIMED. Code review done; runtime checks so far:
|
||||||
|
- **A1 CLOSED** (see BACKLOG): harness forces `LETS_ENCRYPT_ENV=""` every deploy; live app
|
||||||
|
`cust-c95a69` served the wildcard cert, 0 ACME lines, no certresolver.
|
||||||
|
- **Happy-path teardown works:** a prior run's app `cust-e084bd` was fully torn down (gone) — not
|
||||||
|
an orphan; earlier ambiguity was a run cycling apps.
|
||||||
|
- **Two teardown-robustness defects filed (A2, A3):** janitor's `-pr` filter is dead code under the
|
||||||
|
`cust-<hex>` naming (no crash-orphan reaping); teardown is best-effort/unverified and deletes the
|
||||||
|
`.env` even on failed undeploy (silent orphan, run still green).
|
||||||
|
- **Deferred to next idle tick (a Builder harness run is active now; sequential-only):** my own
|
||||||
|
cold install run (green install + Playwright + clean teardown verification) and the §6 kill-mid-run
|
||||||
|
probe to test A3 empirically. Verdict (PASS/FAIL) follows that.
|
||||||
|
|||||||
Reference in New Issue
Block a user