status(2): Q2 RE-CLAIMED — F2-5 dep-teardown-verify fix cold-verified clean

Per REVIEW-2 ## Q2 FAIL @2026-05-28 (F2-5 dep teardown leak + F2-6 cold install flake + F2-7
SSO setup keycloak-hardcoded):

F2-5 closed by commit c6e94af: teardown_deps now uses verify=True so residuals raise; failures
propagate to orchestrator exit code + run summary. Cold-verified: lasuite-docs+keycloak e2e
PASS, dep teardown clean, post-run docker stack/volume/secret with 'keyc' filter all empty.

This also explained my Q3.1 flake: the leaked Q2.4 dep keycloak (deterministic dep domain) had
collided with my next dep deploy. With F2-5 fixed, that class of cross-run collision can no
longer happen silently (teardown now raises if it leaks, so the run fails BEFORE the next one starts).

F2-7 acknowledged: setup_keycloak_realm is keycloak-specific; authentik would need a parallel
backend. Logged for Q2.2/Q5.

F2-6 (cold keycloak install 502) — real but secondary; will checkpoint in Q4 sweep.

Side-effect: Q3.1 partial also landed (PARITY.md + test_health_check parity port +
test_auth_required + the prior test_oidc_with_keycloak.py as Q3.1 third specific test).

Cold evidence: ssh cc-ci 'RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py'
  deploy-count=2 (expect 2), all 5 assertions PASS, dep teardown clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 09:22:24 +01:00
parent 874bfbb915
commit 54b1fe326c
3 changed files with 89 additions and 9 deletions


@@ -50,11 +50,10 @@ Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`
+ deploy_deps/teardown_deps + run state) + SSO-setup harness (`runner/harness/sso.py`
setup_keycloak_realm + oidc_password_grant + assert_discovery_endpoint) + orchestrator
wiring. 7 new unit tests; 28/28 PASS. **Subsumes Q0.4.** Commit `4d6b040`.
- [x] **Q2.4** — **CLAIMED @2026-05-28** (commit `9e88741`). `tests/lasuite-docs/recipe_meta.py
DEPS = ["keycloak"]`; `tests/lasuite-docs/functional/test_oidc_with_keycloak.py` proves the
full SSO flow against the per-run keycloak dep: realm/client/user setup, OIDC discovery,
password grant, JWT claim validation. Cold-run: deploy-count=2 (1 parent + 1 dep), all
stages PASS, dep teardown clean.
- [x] **Q2.4** — **RE-CLAIMED @2026-05-28** (commit `c6e94af` F2-5 fix on top of `9e88741`).
`tests/lasuite-docs/recipe_meta.py DEPS = ["keycloak"]`; `test_oidc_with_keycloak.py`
proves the full SSO flow. F2-5 verified: dep teardown now uses verify=True, raises +
surfaces leak failures; cold re-verify on cc-ci → no leftover keycloak after teardown.
### Q3 — SSO-dependent suite (lasuite-docs, lasuite-drive, lasuite-meet, cryptpad, immich)
- [ ] **Q3.1** — lasuite-docs: parity (health_check, oidc_login, upload_conversion) + specific


@@ -312,3 +312,65 @@ generality. From now on: when a recipe-overlay needs a robustness pattern, ask i
to a shared helper BEFORE fixing in-place.
Q2 CLAIMED; awaiting Adversary cold-verify. Continuing on Q3 (SSO-dependent suite) in parallel.
## 2026-05-28 — Q2 FAIL on F2-5; fixed; RE-CLAIMED
Adversary FAILed Q2 on three findings:
- **F2-5 (gate-blocker):** `teardown_deps` silently suppressed teardown failures via
`contextlib.suppress(Exception)`. The `===== DEPS teardown =====` print fired even when undeploy
raised. On Adversary cold-check 14+ minutes after my Q2.4 run, the dep keycloak stack
`keyc-c12afe` was STILL UP — 2 services + leftover secrets/volumes. The "green" Q2.4 run leaked.
- **F2-6 (secondary):** cold keycloak install flake (502 from /realms/master). Real issue, but
unrelated to Q2 acceptance — flagged for future infra hardening.
- **F2-7 (transparency):** SSO setup is keycloak-hardcoded; `setup_authentik_realm` would need a
parallel backend. Documented for Q5 to avoid skipping authentik on the false premise that the
harness is reusable for it.
**This explained my Q3.1 flake!** When I ran lasuite-docs+keycloak again after the Q2.4 run, the
dep domain (`keyc-c12afe.ci.commoninternet.net`, deterministic per parent+dep+pr+ref) was the
SAME, and the leftover stack from Q2.4 collided with the new deploy. The "502 from /realms/master"
actually came from the OLD stack, which was still running while I tried to deploy a fresh keycloak
on top of it. `abra app new` succeeded (it created a new .env), but the swarm services were
already running, so `abra app deploy` behaved unpredictably, and Traefik routed to the OLD running
stack (which was timing out / not healthy after the secrets had been swapped).
**Fix (commit `c6e94af`):**
- `deps.py::teardown_deps`: switched to `verify=True` so `lifecycle.teardown_app` raises on
residuals; the loop catches per-dep failures, logs them LOUDLY, but continues to tear down the
other deps; after all attempts, it raises a combined `TeardownError`.
- `run_recipe_ci.py`: catches the dep `TeardownError` in finally; surfaces via
`dep_teardown_error` in the summary + non-zero exit code; run still prints diagnostics so a
teardown failure doesn't hide other failures.
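A minimal sketch of the shape of this fix, not the actual code in `deps.py` or `run_recipe_ci.py`. The function signatures, the `deps` mapping, the injected `teardown_app` callable, and the `stages_ok` summary key are assumptions made for illustration; only `teardown_deps`, `TeardownError`, `verify=True`, and `dep_teardown_error` come from the notes above.

```python
class TeardownError(Exception):
    """Raised when a teardown leaves residual state behind."""


def teardown_deps(deps, teardown_app):
    """Tear down every dep, then raise one combined TeardownError if any failed.

    `deps` maps dep name -> domain (assumed shape). `teardown_app` stands in
    for lifecycle.teardown_app and is expected to raise on residuals when
    called with verify=True.
    """
    failures = []
    for name, domain in deps.items():
        try:
            teardown_app(domain, verify=True)
        except Exception as exc:
            # Log loudly, but keep going so the remaining deps are still torn down.
            print(f"!!!!! DEP TEARDOWN FAILED: {name} @ {domain}: {exc}")
            failures.append(f"{name} @ {domain}: {exc}")
    if failures:
        raise TeardownError("; ".join(failures))


def run(deps, teardown_app, run_stages):
    """Orchestrator shape: teardown runs in finally, and a leak fails the run."""
    summary = {"stages_ok": False, "dep_teardown_error": None}
    try:
        summary["stages_ok"] = bool(run_stages())
    finally:
        try:
            teardown_deps(deps, teardown_app)
        except TeardownError as exc:
            # Record the leak instead of re-raising, so the summary still prints.
            summary["dep_teardown_error"] = str(exc)
        print("===== RUN SUMMARY =====", summary)
    ok = summary["stages_ok"] and summary["dep_teardown_error"] is None
    return (0 if ok else 1), summary
```

The point of collecting failures rather than raising on the first one is that a single stuck dep should not prevent the others from being cleaned up, while the combined error still forces a non-zero exit.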
**Cold-verified e2e** (log `/root/ccci-f25-verify.log`):
```
RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py
===== DEPS: ['keycloak'] =====
dep: deploying keycloak -> keyc-c12afe.ci.commoninternet.net
dep: keycloak ready @ keyc-c12afe.ci.commoninternet.net
===== TIER: install ===== 2 PASS
===== TIER: custom ===== 3 PASS (incl. test_oidc_password_grant_against_dep_keycloak)
===== DEPS teardown =====
dep: tearing down keycloak @ keyc-c12afe.ci.commoninternet.net
===== RUN SUMMARY =====
deploy-count = 2 (expect 2)
```
Post-run cc-ci state (verified 30s later): `docker stack ls | grep keyc` → empty;
`docker volume ls | grep keyc` → empty; `docker secret ls | grep keyc` → empty. No leak.
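The three manual `grep` checks above could be folded into one helper along these lines. This is a sketch under assumptions: `find_residuals` and `docker_names` are hypothetical names, not functions in the harness, and the real `verify=True` path may check residuals differently. It relies only on the `--format` option of `docker stack ls`, `docker volume ls`, and `docker secret ls`.

```python
import subprocess

RESOURCE_KINDS = ("stack", "volume", "secret")


def docker_names(kind):
    """Return resource names from `docker <kind> ls` (needs a reachable swarm manager)."""
    result = subprocess.run(
        ["docker", kind, "ls", "--format", "{{.Name}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


def find_residuals(needle, list_names=docker_names):
    """Map each resource kind to the names that still contain `needle`.

    An empty dict means the same thing as all three greps coming back empty.
    `list_names` is injectable so the logic can be exercised without docker.
    """
    residuals = {}
    for kind in RESOURCE_KINDS:
        hits = [name for name in list_names(kind) if needle in name]
        if hits:
            residuals[kind] = hits
    return residuals
```

Calling `find_residuals("keyc")` on the CI host and treating any non-empty result as a failure would reproduce the manual check.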
Side-effect of the cleanup: also landed Q3.1 partial (PARITY.md + 2 new functional tests for
lasuite-docs — test_health_check parity port + test_auth_required showing 401 on protected API).
test_oidc_with_keycloak.py is the third specific test (Q2.4 acceptance + Q3.1 OIDC coverage).
**Lessons:**
1. **Silent exception suppression in cleanup paths is a bug**, not robustness. Use it ONLY for
things you know are inherently best-effort and don't have downstream effects. Dep teardown
has downstream effects (deterministic dep domain → next-run collision); it MUST be loud.
2. **Deterministic per-run domains amplify state leaks.** When parent+pr+ref+dep produces the
same hash on a re-run, any leak from the prior run silently corrupts the next. The fix
options were either (a) make teardown sacred (chosen: the F2-5 fix), or (b) make the domain
random/timestamped. (a) is right because a deterministic domain helps debugging and
concurrency safety, provided teardown is verified to be complete.
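To make lesson 2 concrete, here is a hypothetical derivation of a deterministic dep domain. The real scheme is not shown in these notes: the key layout, the use of SHA-256, the 4-character prefix plus 6 hex digits, and the `ci.example.org` base are all assumptions chosen only to resemble the `keyc-c12afe` shape.

```python
import hashlib


def dep_domain(parent, dep, pr, ref, base="ci.example.org"):
    """Derive a stable dep domain from (parent, dep, pr, ref).

    Hypothetical scheme: identical inputs always give the identical domain,
    which is exactly why a leaked stack collides with the next run.
    """
    key = "|".join([parent, dep, str(pr), ref])
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:6]
    return f"{dep[:4]}-{digest}.{base}"


first = dep_domain("lasuite-docs", "keycloak", 0, "main")
rerun = dep_domain("lasuite-docs", "keycloak", 0, "main")
print(first == rerun)  # a re-run targets the same domain as the previous run
```

Under a scheme like this, option (b) would amount to mixing a timestamp or random token into `key`, trading reproducible domains for isolation from earlier leaks.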
Q2 RE-CLAIMED. Continuing Q3 work in parallel.


@@ -57,10 +57,18 @@ Q2 PASS as it's lower-priority (the SSO harness is provider-pluggable and Q2.4 a
already proven via keycloak).
## Gate
**Gate: Q2 — CLAIMED, awaiting Adversary @2026-05-28** (commits `d5f5e86` Q2.1 keycloak; `4d6b040`
Q2.3 dep resolver + SSO harness primitives; `47f7cb4` harness.browser hardening across all install
overlays; `9e88741` Q2.4 acceptance). Acceptance per plan §6 Q2: "a dependent recipe deploys its
provider + runs an OIDC login test in one run." Proven cold:
**Gate: Q2 — RE-CLAIMED, awaiting Adversary @2026-05-28** (commit `c6e94af` F2-5 fix on top of
the prior Q2 changeset). Adversary FAIL on F2-5 (dep teardown silent suppress) + F2-6 (cold
keycloak install flake, secondary) + F2-7 (SSO setup keycloak-hardcoded, transparency). F2-5
fixed: `teardown_deps` now uses `verify=True`, errors propagate to the orchestrator's exit code,
the run summary surfaces leaks. Cold-verified: dep keycloak deployed → tests PASS → DEPS
teardown ran clean → `docker stack ls | grep keyc` → empty. F2-7 ack as a real scope gap (when
Q2.2 authentik enrolls, `setup_authentik_realm` will need a parallel backend in `harness.sso`).
F2-6 cold-flake on keycloak install is real but unrelated to Q2 acceptance (a flake-handling
finding for the install layer; will checkpoint when Q4 reaches keycloak again).
Acceptance per plan §6 Q2: "a dependent recipe deploys its provider + runs an OIDC login test
in one run." Proven cold:
**Objective evidence pointers (Q2):**
- **Q2.1 keycloak parity + 2 NEW specific tests** — commit `d5f5e86`:
@@ -84,6 +92,17 @@ provider + runs an OIDC login test in one run." Proven cold:
- `tests/conftest.py` — `deps_apps` fixture exposes dep domains to dependent tests.
- 7 new unit tests in `tests/unit/test_deps.py`; **28/28 unit tests PASS** cold.
- **F2-5 fix — dep teardown verify=True** — commit `c6e94af`, log `/root/ccci-f25-verify.log`:
- `runner/harness/deps.py::teardown_deps` now uses `lifecycle.teardown_app(..., verify=True)`
so residuals raise `TeardownError`. Errors are logged per-dep but we continue to other deps;
a combined `TeardownError` is raised after all attempts.
- `runner/run_recipe_ci.py` catches the dep `TeardownError` in finally, surfaces via
`dep_teardown_error` in the run summary + non-zero exit code.
- Cold-verified: lasuite-docs+keycloak dep e2e PASSED clean (3 custom + 2 lifecycle install =
5 PASS); post-run cc-ci state has NO leftover keycloak (`docker stack ls | grep keyc` →
empty; `docker volume ls | grep keyc` → empty; `docker secret ls | grep keyc` → empty).
- deploy-count=2, expected 2.
- **Q2.4 acceptance (the gate)** — commit `9e88741`, log `/root/ccci-q24-lasuite-keycloak.log`:
- `tests/lasuite-docs/recipe_meta.py` declares `DEPS = ["keycloak"]`.
- `tests/lasuite-docs/functional/test_oidc_with_keycloak.py`: