diff --git a/machine-docs/BACKLOG-2.md b/machine-docs/BACKLOG-2.md
index 345cff1..b56b196 100644
--- a/machine-docs/BACKLOG-2.md
+++ b/machine-docs/BACKLOG-2.md
@@ -97,7 +97,20 @@ Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`
 
 ## Adversary findings
 
-- [ ] **F2-5 [adversary] — Q2 dep teardown leak (gate-blocker)** —
+- [x] **F2-5 [adversary] — CLOSED @2026-05-28** by Builder commit `c6e94af`. `runner/harness/
+  deps.py::teardown_deps` now uses `lifecycle.teardown_app(verify=True)` so residuals raise
+  `TeardownError`; per-dep errors logged loudly (`!! dep @ teardown failed: ...`),
+  collected, and re-raised as a combined `TeardownError` after attempting all deps;
+  orchestrator's `finally` catches + reports in RUN SUMMARY + sets non-zero exit.
+  Adversary cold re-verify on `/root/adv-verify` @ HEAD `874bfbb`:
+  `RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py` →
+  install + custom PASS, deploy-count=2 (parent + dep), `DEPS teardown` succeeded clean,
+  `docker stack ls | grep -iE "keyc|lasuite"` post-run → **empty** (no leftover stack/volume/
+  secret). The fix correctly enforces §9 teardown sacred. Original FAIL detail retained
+  below for audit.
+
+  **Original FAIL context:** `runner/harness/deps.py::teardown_deps` wrapped
+  `lifecycle.teardown_app(domain, verify=False)`
   `runner/harness/deps.py::teardown_deps` wraps `lifecycle.teardown_app(domain, verify=False)`
   in `contextlib.suppress(Exception)`, silently swallowing all teardown failures. The
   `===== DEPS teardown =====` print fires even when the underlying undeploy raises. On cold
@@ -132,7 +145,15 @@ Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`
   teardown silently failed; the runtime state on cc-ci right now demonstrates this.
   - Filed by Adversary @2026-05-28.
 
-- [ ] **F2-6 [adversary] — keycloak install cold flake** — Adversary cold first-attempt from
+- [x] **F2-6 [adversary] — CLOSED @2026-05-28** collateral resolution from F2-5 fix. After
+  F2-5's silent-suppress was removed and the leaked `keyc-c12afe` stack cleared, cold
+  retest from `/root/adv-verify` @ HEAD `874bfbb`: `RECIPE=keycloak STAGES=install,custom
+  cc-ci-run runner/run_recipe_ci.py` → install + custom PASS on the first attempt;
+  deploy-count=1; teardown clean. Confirms the original 502 flake was aggravated by the
+  F2-5 leak holding node CPU (~82%) during readiness convergence. No standalone keycloak
+  flake remains. Original FAIL context retained below.
+
+  **Original FAIL context:** Adversary cold first-attempt from
   `/root/adv-verify` @ HEAD `ad6b259`: `RECIPE=keycloak cc-ci-run runner/run_recipe_ci.py` →
   install FAILED with `deploy/readiness failed: keyc-c1ffca.ci.commoninternet.net: not healthy
   over HTTPS /realms/master (last status 502)`. Parent recipe (keyc-c1ffca) was
diff --git a/machine-docs/REVIEW-2.md b/machine-docs/REVIEW-2.md
index d2c903c..6775532 100644
--- a/machine-docs/REVIEW-2.md
+++ b/machine-docs/REVIEW-2.md
@@ -27,7 +27,54 @@ Phase 1e closed (commit `0fe1218` "DONE(1e)") with all HC1–HC4 PASS, NO VETO.
 started — no `STATUS-2.md` / `BACKLOG-2.md` / `JOURNAL-2.md` from the Builder yet. No CLAIMED
 gate to verify. Entering self-paced idle (§7 case 3); will re-orient on Builder activity.
 
-## Q2 — FAIL @2026-05-28 (dep teardown leak + cold install flake)
+## Q2 — PASS @2026-05-28 (re-verify after F2-5 fix + F2-6 collateral resolution)
+
+**Verdict: PASS.** Builder commit `c6e94af` ("F2-5 — dep teardown verify=True, errors propagate
+to run-fail") closes F2-5; F2-6 collaterally resolved.
+
+**Cold environment:** `/root/adv-verify` on cc-ci, hard-reset to `origin/main` HEAD `874bfbb`.
+
+**Re-verify (Adversary, cold):**
+- **lasuite-docs (Q2.4 acceptance) + keycloak dep** —
+  `RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py`:
+  - install: generic `test_serving` PASS + cc-ci `test_serving_and_editor` PASS.
+  - custom: 3 PASS — `test_auth_required` + `test_lasuite_docs_returns_200` +
+    `test_oidc_password_grant_against_dep_keycloak`. The OIDC roundtrip exercises the full SSO
+    contract (realm/client/user setup → discovery → password grant → JWT iss/azp/typ/exp claims).
+  - deploy-count = **2** (expect 2: parent + 1 dep — DG4.1 honored for the new dep-aware count).
+  - `DEPS teardown` succeeded clean (no `!!` failure logs).
+  - **Post-run state:** `docker stack ls | grep -iE "keyc|lasuite"` → empty; volumes → empty;
+    secrets → empty. **No leak.** §9 teardown sacred enforced.
+- **keycloak standalone** — `RECIPE=keycloak STAGES=install,custom`: install + custom PASS on
+  the first attempt; deploy-count=1; teardown clean. Confirms F2-6 was aggravated by F2-5's
+  resource leak (the leaked stack was at ~82% CPU during my earlier attempt); with the leak
+  gone, keycloak install converges in time.
+- **Unit tests (28/28 PASS):** confirmed in earlier cold run; unchanged by this fix.
+
+**F2-5 fix is correct:** `lifecycle.teardown_app(verify=True)` raises `TeardownError` on
+residual containers/volumes/secrets; `teardown_deps` collects per-dep failures and re-raises a
+combined error; orchestrator catches in `finally`, reports in RUN SUMMARY, exits non-zero. The
+"DEPS teardown" line is now meaningful — if it prints without `!!` markers, the cleanup
+actually succeeded.
+
+**F2-7 (Q2.2 authentik / partial pluggability):** STANDS as open scope item — not a Q2 PASS
+blocker (Q2.4 acceptance is met by keycloak alone; the harness's OIDC-flow primitives ARE
+provider-agnostic). Authentik enrollment + a `setup_authentik_realm` backend remains required
+work; tracked for Q5 catch-up so the "pluggable" framing is actually proven by a second
+provider.
+
+**Substantive PASS evidence reaffirmed from prior FAIL writeup:** Q2.1 keycloak content (parity
++ JWT password-grant + admin-API client CRUD), Q2.3 dep resolver (sequential deploys, reverse
+teardown, per-run domain naming, deps_apps fixture), Q2.3 SSO harness (OIDC flow primitives
+provider-agnostic, idempotent realm/client/user setup, secrets handled correctly), Q2.4
+acceptance (dependent recipe + dep + full OIDC test in one run).
+
+**No standing VETO.** Builder may advance to Q3 (already in flight per commit `874bfbb`
+Q3.1 partial). F2-7 remains an open observation for Q2.2/Q5.
+
+---
+
+## Q2 — FAIL @2026-05-28 (dep teardown leak + cold install flake) — SUPERSEDED by PASS above
 
 **Verdict: FAIL.** Three findings filed:
 - **F2-5 (gate-blocker):** `runner/harness/deps.py::teardown_deps` silently suppresses ALL