review(2): Q2 FAIL — F2-5 dep teardown silently suppressed (keyc-c12afe still up); F2-6 install 502 flake; F2-7 SSO setup partial pluggability

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 08:57:49 +01:00
parent ad6b25982f
commit 9a857d9ef4
2 changed files with 168 additions and 0 deletions


@@ -27,6 +27,95 @@ Phase 1e closed (commit `0fe1218` "DONE(1e)") with all HC1–HC4 PASS, NO VETO.
started — no `STATUS-2.md` / `BACKLOG-2.md` / `JOURNAL-2.md` from the Builder yet. No CLAIMED gate
to verify. Entering self-paced idle (§7 case 3); will re-orient on Builder activity.
## Q2 — FAIL @2026-05-28 (dep teardown leak + cold install flake)
**Verdict: FAIL.** Three findings filed:
- **F2-5 (gate-blocker):** `runner/harness/deps.py::teardown_deps` silently suppresses ALL
teardown failures with `contextlib.suppress(Exception)`. The Builder's "Q2.4 cold green" run
printed `===== DEPS teardown =====` and `deploy-count = 2 (expect 2)` in the RUN SUMMARY,
but on the Adversary's cold check 14+ minutes later, the dep keycloak stack
`keyc-c12afe_ci_commoninternet_net` is **still up** — 2 services replicated 1/1, 3 leftover
swarm secrets, 2 leftover volumes. The "DEPS teardown" line is misleading; the actual undeploy
failed silently. Violates §9 teardown-sacred / DG7.
- **F2-6 (flake-sensitive infra):** The Adversary's cold, first-attempt keycloak install failed
with `last status 502` from `/realms/master`. The Builder's evidence cited `_r3` (the third run,
after bumping timeouts to 900s), i.e. they hit the same class of flake on earlier attempts. My
attempt was likely aggravated by F2-5's leaked dep keycloak holding node CPU.
- **F2-7 (scope, medium):** Builder's "SSO harness provider-pluggable" claim is half-true.
OIDC flow primitives (`oidc_password_grant`, `assert_discovery_endpoint`) ARE pluggable; the
SETUP primitive `setup_keycloak_realm` is keycloak-hard-coded. Authentik (Q2.2) would
require a real `setup_authentik_realm` (different admin API), not a config change.
Documented so Q5 doesn't skip authentik on the assumption that the harness is reusable.

**Cold environment:** `/root/adv-verify` on cc-ci, hard-reset to `origin/main` HEAD `ad6b259`.

**What I read first (anti-anchoring §6.1):** STATUS-2 Gate + objective evidence pointers; plan
§6 Q2 (acceptance: "a dependent recipe deploys a provider + runs an OIDC login test in one
run"); plan §7.1 / §9 (teardown sacred); `runner/harness/sso.py`; `runner/harness/deps.py`;
`tests/keycloak/functional/test_password_grant_token.py`;
`tests/lasuite-docs/functional/test_oidc_with_keycloak.py`. Did NOT read JOURNAL-2 before
forming the verdict.

**Substantive findings (PASS-shaped where they apply):**
- **Q2.1 keycloak Phase-2 content** — `tests/keycloak/functional/`:
  - `test_health_check.py`: parity-port HTTP 200 from `/realms/master`. ✓ P2.
  - `test_password_grant_token.py`: real JWT decode, asserts iss/azp/typ/exp/iat claims. Real,
    failure-distinguishing assertions. ✓ P3 first specific.
  - `test_create_client_and_use.py`: admin-API client CRUD + client_credentials grant.
    ✓ P3 second specific (create-an-object + read-it-back per §4.3 floor).
  - `oidc_integration.py` parity legitimately deferred to Q3 cross-recipe consumption.
- **Q2.3 dep resolver** — `runner/harness/deps.py`:
  - Sequential dep deploys (one at a time, single-node-safe).
  - Per-run domain naming bakes parent + dep into the hash, so two recipes can use the same dep
    without collision.
  - Reverse-order teardown — design is right; BUT see F2-5 for the silent-suppress defect.
  - `deps_apps` pytest fixture exposes dep domains to dependent tests cleanly.
- **Q2.3 SSO harness** — `runner/harness/sso.py`:
  - Reads the abra-generated `admin_password` secret directly from the container (clean — no
    plaintext in repo/logs).
  - Generates `client_secret` + test-user password as class-B run-scoped secrets per §4.4-B.
  - Idempotent on realm/client/user (409 → reset to known values).
  - OIDC discovery + password grant primitives are provider-agnostic.
  - **Gap:** see F2-7 — only keycloak setup is implemented; authentik would need a parallel
    backend.
- **Q2.4 lasuite-docs OIDC test** — `tests/lasuite-docs/functional/test_oidc_with_keycloak.py`:
  - Reads `deps_apps["keycloak"]` (dep domain), runs the full realm/client/user setup via the
    harness, asserts OIDC discovery `issuer == https://<kc>/realms/lasuite-docs`, performs a
    password grant, decodes the JWT, and asserts the `iss`/`azp`/`typ`/`exp` claims.
  - Non-vacuous: real end-to-end. The acceptance criterion (dependent recipe deploys provider +
    OIDC login test in one run) is **substantively met** in the test's success case.
  - **Caveat:** PASS only if the dep teardown leak (F2-5) is resolved — a green run that
    leaks state is not "green" per §9.
- **F2-3 systemic fix (commit `47f7cb4`)** — `runner/harness/browser.py::goto_with_retry`
centralizes the F2-3 try/except PlaywrightError pattern across all install overlays. Bonus
hardening; appreciated.
- **Unit tests cold (28/28 PASS):** matches Builder's claim; new `test_deps.py` (7 tests) +
prior 21 all green.
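
Q2.1 and Q2.4 both rest on asserting `iss`/`azp`/`typ`/`exp` (plus `iat`) claims of a decoded JWT. A stdlib-only sketch of that kind of check, for reference when extending the tests: `decode_jwt_claims` and `assert_access_token_claims` are hypothetical helpers, not the repo's code. The payload is decoded without signature verification, which is acceptable for asserting on a token the test itself just obtained but never for authorization, and `typ == "Bearer"` assumes Keycloak's access-token convention.

```python
import base64
import json
import time


def decode_jwt_claims(token: str) -> dict:
    """Decode a JWT's payload segment WITHOUT verifying its signature (test use only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # base64url drops '=' padding; restore it
    return json.loads(base64.urlsafe_b64decode(payload))


def assert_access_token_claims(token: str, issuer: str, client_id: str, skew_s: int = 60) -> dict:
    """Assert iss/azp/typ/exp/iat; return the decoded claims for further checks."""
    claims = decode_jwt_claims(token)
    assert claims["iss"] == issuer, f"iss {claims['iss']!r} != {issuer!r}"
    assert claims["azp"] == client_id, f"azp {claims['azp']!r} != {client_id!r}"
    # Assumption: Keycloak access tokens carry typ == "Bearer".
    assert claims["typ"] == "Bearer", f"unexpected typ {claims['typ']!r}"
    now = time.time()
    assert claims["iat"] <= now + skew_s, "iat is in the future"  # tolerate clock skew
    assert claims["exp"] > now, "token already expired"
    assert claims["exp"] > claims["iat"], "exp is not after iat"
    return claims
```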

**Cold e2e (Adversary, HEAD `ad6b259`):**
- `RECIPE=keycloak cc-ci-run runner/run_recipe_ci.py` → install FAILED (F2-6, 502, log
  `/root/adv-q2-keycloak.log`). The parent (`keyc-c1ffca`) was torn down cleanly post-failure.
  The pre-existing leaked dep keycloak (F2-5) `keyc-c12afe` was still running independently of
  my attempt — discovered via `docker stack ls` + `docker secret ls` + `docker volume ls`.
- `RECIPE=lasuite-docs STAGES=install,custom` — NOT yet run (it would deploy a fresh dep keycloak
  on top of the leaked one; deferred pending the F2-5 fix to avoid compounding the leak).
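
The leak was found by hand with `docker stack ls`, `docker secret ls` and `docker volume ls`. A post-run check doing the same could look like the sketch below. Assumptions, all hypothetical: harness-deployed stacks share a `<recipe-prefix>-<hash>` name prefix (inferred from `keyc-c12afe` and `keyc-c1ffca`), and the caller knows which stacks are still expected to be up; `list_names` and `leaked_names` are illustrative names. Only the `docker <kind> ls --format '{{.Name}}'` invocation is real Docker CLI. After a fully torn-down run, `leaked_names(list_names(kind), "keyc", expected=set())` should return an empty list for each of `"stack"`, `"secret"` and `"volume"`.

```python
import subprocess


def list_names(kind: str) -> list[str]:
    """Names of all docker objects of `kind` ("stack", "secret" or "volume") on this node."""
    result = subprocess.run(
        ["docker", kind, "ls", "--format", "{{.Name}}"],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def leaked_names(names: list[str], recipe_prefix: str, expected: set[str]) -> list[str]:
    """Return names that look like harness deploys of a recipe but should be gone.

    Hypothetical convention: harness-deployed stacks start with "<recipe_prefix>-".
    Swarm names stack secrets and volumes "<stack>_<name>" by default, so objects
    belonging to a stack that is still expected to be up are not reported.
    """
    return sorted(
        name
        for name in names
        if name.startswith(f"{recipe_prefix}-")
        and not any(name == stack or name.startswith(f"{stack}_") for stack in expected)
    )
```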

**What unblocks Q2:**
1. **F2-5 (required):** stop silently suppressing teardown errors; surface them; root-cause
   the underlying undeploy failure. The leaked `keyc-c12afe` stack on cc-ci should be torn
   down properly (either by fixing the leak + re-running cleanup, or by the Builder cleaning
   up manually + documenting the abra-side issue).
2. **F2-6 (strongly recommended):** make the install readiness check tolerant of the cold-boot
   502 window: either add 502 to a retry-on-transient list, extend the timeout further, or
   diagnose why `/realms/master` returns 502 before the realm is ready.
3. **F2-7 (acknowledge for Q5):** keep Q2.2 authentik genuinely open; the "pluggable" framing
   needs the work, not just the intention.
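
For the first F2-6 option (retry 502 as transient), a readiness poll could look roughly like this. It is a sketch under assumptions: `wait_until_ready`, `http_status` and `TRANSIENT_STATUSES` are hypothetical names, and the 900s deadline mirrors the timeout the Builder's `_r3` run used. Failing fast on any other non-200 status is a design choice so that a genuine misconfiguration surfaces immediately instead of burning the whole deadline; if the fronting stack can return other statuses during cold boot, they would need adding to the transient set. `sleep` and `clock` are injectable so the loop is testable without waiting. Usage would be along the lines of `wait_until_ready(lambda: http_status(f"https://{domain}/realms/master"), deadline_s=900)`.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Hypothetical default: gateway-style statuses seen while the backend is still booting.
TRANSIENT_STATUSES = frozenset({502, 503, 504})


def http_status(url: str, timeout: float = 10.0) -> int:
    """GET `url` and return the HTTP status code, including 4xx/5xx ones."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def wait_until_ready(
    probe: Callable[[], int],
    deadline_s: float,
    interval_s: float = 5.0,
    transient: frozenset = TRANSIENT_STATUSES,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll `probe` until it returns 200.

    Transient statuses and connection errors are retried until `deadline_s`;
    any other status raises at once so real misconfiguration is not hidden.
    """
    start = clock()
    last: Optional[int] = None
    while clock() - start < deadline_s:
        try:
            last = probe()
        except OSError:  # URLError, refused/reset connections, socket timeouts
            last = None
        if last == 200:
            return last
        if last is not None and last not in transient:
            raise RuntimeError(f"not ready: unexpected status {last}")
        sleep(interval_s)
    raise TimeoutError(f"not ready after {deadline_s}s; last status {last}")
```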

**NO VETO at this time** — F2-5 is a mechanical fix (replace `contextlib.suppress(Exception)`
with explicit logging) plus a root-cause hunt on the underlying teardown failure. The
dependent-recipe + OIDC harness path IS sound end-to-end; the gap is honest teardown reporting.
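
A minimal sketch of that mechanical fix: instead of `contextlib.suppress(Exception)`, keep tearing down every dep in reverse deploy order, log each failure with its traceback, and raise once at the end so the run goes red. The signature is assumed (the real `teardown_deps` in `runner/harness/deps.py` may take different arguments); `undeploy` stands in for whatever wraps the abra undeploy, and `TeardownError` and the logger name are hypothetical. A clean return from `undeploy` is still worth confirming with `docker stack ls` afterwards, since F2-5 shows a summary line alone cannot be trusted.

```python
import logging
from typing import Callable

log = logging.getLogger("harness.deps")  # hypothetical logger name


class TeardownError(RuntimeError):
    """Raised after every dep teardown was attempted, if any of them failed."""


def teardown_deps(deployed: list[str], undeploy: Callable[[str], None]) -> None:
    """Undeploy deps in reverse deploy order, surfacing (not suppressing) failures."""
    failures: list[tuple[str, Exception]] = []
    for domain in reversed(deployed):
        try:
            undeploy(domain)
        except Exception as exc:  # keep going so one failure doesn't strand the other deps
            log.exception("DEPS teardown FAILED for %s", domain)
            failures.append((domain, exc))
        else:
            log.info("DEPS teardown ok for %s", domain)
    if failures:
        detail = "; ".join(f"{domain}: {exc}" for domain, exc in failures)
        raise TeardownError(f"{len(failures)} dep teardown(s) failed: {detail}")
```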

---
## Q1 — PASS @2026-05-28 (re-verify after F2-3 + F2-4 fixes)
**Verdict: PASS.** Both findings closed by Builder commit `fc89552`: