review: close A1 (no-ACME enforced); file A2 (dead janitor) + A3 (unverified teardown); M4 verify in progress
All checks were successful
continuous-integration/drone/push Build is passing
All checks were successful
continuous-integration/drone/push Build is passing
This commit is contained in:
35
BACKLOG.md
35
BACKLOG.md
@ -75,7 +75,40 @@ Two single-writer sections (§6.1): Builder edits only `## Build backlog`; Adver
|
||||
## Adversary findings
|
||||
<!-- Adversary-only section. Builder must not edit below this line. -->
|
||||
|
||||
- [ ] **[adversary] A1 — Test-app deploys can silently trigger ACME (no-ACME design hazard).**
|
||||
- [x] **[adversary] A1 — Test-app deploys can silently trigger ACME (no-ACME design hazard).**
|
||||
**CLOSED @2026-05-27T00:35Z** by Adversary re-test. `runner/harness/lifecycle.deploy_app`
|
||||
calls `abra.env_set(domain, "LETS_ENCRYPT_ENV", "")` before every deploy. Verified on a live
|
||||
harness app (`cust-c95a69`): env `LETS_ENCRYPT_ENV=` empty, no `certresolver` label, **0 ACME
|
||||
log lines**, and the served cert is the **wildcard** `CN=*.ci.commoninternet.net` (verify ok)
|
||||
— not a per-host ACME cert. No-ACME holds for harness deploys. (Structural belt-and-suspenders
|
||||
— dropping the unused `certificatesResolvers` from traefik — remains a nice-to-have, tracked
|
||||
under A3/M7, not required to close A1.)
|
||||
|
||||
- [ ] **[adversary] A2 — Janitor never reaps current-scheme orphans (dead `-pr` filter).**
|
||||
Found during M4 review. `harness.lifecycle.janitor()` only tears down apps where
|
||||
`"-pr" in name`, but per DECISIONS the harness now names apps `<recipe[:4]>-<6hex>` (e.g.
|
||||
`cust-c95a69`) — **no `-pr` substring**. So the run-start crash-recovery sweep (§4.3: "nuke
|
||||
any orphaned `*-pr*` apps") matches **nothing** and is effectively a no-op. The happy-path
|
||||
finalizer in `conftest.deployed_app` does work (observed: `cust-e084bd` from a prior run was
|
||||
torn down), but a run that crashes/reboots *before* the finalizer runs leaves an orphan that
|
||||
no later run will reap. *Fix:* match the actual naming (e.g. regex `^[a-z]{1,4}-[0-9a-f]{6}\.`
|
||||
or a dedicated CI label/prefix) and gate on age. *Re-test:* deploy a harness app, simulate a
|
||||
crash (kill the run before teardown), then start a new run and confirm janitor reaps the
|
||||
orphan. Adversary closes after re-test.
|
||||
|
||||
- [ ] **[adversary] A3 — Teardown is unverified/best-effort; a failure silently orphans + run stays green.**
|
||||
Found during M4 review (to confirm empirically with a kill-mid-run probe). `lifecycle.teardown_app`
|
||||
runs every abra call with `check=False` and "never raises"; the conftest finalizer never
|
||||
asserts teardown succeeded. Worse, `abra.app_config_remove` deletes the app `.env`
|
||||
**unconditionally**, even if `abra.undeploy` failed first — leaving the swarm service+volume
|
||||
running but with no `.env`, so the app can no longer be managed/undeployed via abra (and a
|
||||
fixed janitor that shells `abra app undeploy` couldn't reap it either). Net: a partial teardown
|
||||
leaves a silent orphan while pytest still reports the run **green**, so the M4/D2 guarantee
|
||||
"no orphaned app/volume afterward" is not actually *verified* by the harness. *Fix:* assert
|
||||
post-teardown that the stack/services/volumes/secrets are gone (fail the run otherwise); only
|
||||
remove the `.env` after a confirmed undeploy, or undeploy-by-stack-name as a fallback that
|
||||
doesn't need the `.env`. *Re-test:* run install, kill the process mid-deploy, verify the next
|
||||
run (or janitor) leaves zero residual service/volume/secret. Adversary closes after re-test.
|
||||
Found during M1 verify (M1 still PASSes — proxy itself fires no ACME). cc-ci's traefik static
|
||||
config (`/etc/traefik/traefik.yml`) defines `staging` + `production` HTTP-01 `certificatesResolvers`
|
||||
(stock coop-cloud template). They're currently inert (no router references them; both
|
||||
|
||||
14
REVIEW.md
14
REVIEW.md
@ -122,3 +122,17 @@ pivot). Will complete both when M3 is claimed.
|
||||
**Noted for M7 (not a finding yet):** the Drone-managed Gitea webhook (id 209) carries its webhook
|
||||
secret as a `?secret=` query param in the hook URL (Drone default; admin-only in Gitea, not in cc-ci
|
||||
git / CI logs / dashboard). Will adjudicate against D6 at M7.
|
||||
|
||||
## M4 — Harness + install stage: VERIFICATION IN PROGRESS (no verdict yet) @2026-05-27T00:35Z
|
||||
|
||||
M4 is CLAIMED. Code review done; runtime checks so far:
|
||||
- **A1 CLOSED** (see BACKLOG): harness forces `LETS_ENCRYPT_ENV=""` every deploy; live app
|
||||
`cust-c95a69` served the wildcard cert, 0 ACME lines, no certresolver.
|
||||
- **Happy-path teardown works:** a prior run's app `cust-e084bd` was fully torn down (gone) — not
|
||||
an orphan; earlier ambiguity was a run cycling apps.
|
||||
- **Two teardown-robustness defects filed (A2, A3):** janitor's `-pr` filter is dead code under the
|
||||
`cust-<hex>` naming (no crash-orphan reaping); teardown is best-effort/unverified and deletes the
|
||||
`.env` even on failed undeploy (silent orphan, run still green).
|
||||
- **Deferred to next idle tick (a Builder harness run is active now; sequential-only):** my own
|
||||
cold install run (green install + Playwright + clean teardown verification) and the §6 kill-mid-run
|
||||
probe to test A3 empirically. Verdict (PASS/FAIL) follows that.
|
||||
|
||||
Reference in New Issue
Block a user