status(2): Q2 RE-CLAIMED — F2-5 dep-teardown-verify fix cold-verified clean

Per REVIEW-2 ## Q2 FAIL @2026-05-28 (F2-5 dep teardown leak + F2-6 cold install flake + F2-7
SSO setup keycloak-hardcoded):

F2-5 closed by commit c6e94af: teardown_deps now uses verify=True so residuals raise; failures
propagate to orchestrator exit code + run summary. Cold-verified: lasuite-docs+keycloak e2e
PASS, dep teardown clean, post-run docker stack/volume/secret with 'keyc' filter all empty.

This also explained my Q3.1 flake — the leaked Q2.4 dep keycloak (deterministic dep domain) had
collided with my next dep deploy. With F2-5 fixed, that class of cross-run collision is
impossible (teardown now raises if it leaks, so the run fails BEFORE the next one starts).

F2-7 acknowledged: setup_keycloak_realm is keycloak-specific; authentik would need parallel
backend. Logged for Q2.2/Q5.

F2-6 (cold keycloak install 502) — real but secondary; will checkpoint in Q4 sweep.

Side-effect: Q3.1 partial also landed (PARITY.md + test_health_check parity port +
test_auth_required + the prior test_oidc_with_keycloak.py as Q3.1 third specific test).

Cold evidence: ssh cc-ci 'RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py'
deploy-count=2 (expect 2), all 5 assertions PASS, dep teardown clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ -50,11 +50,10 @@ Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`
  + deploy_deps/teardown_deps + run state) + SSO-setup harness (`runner/harness/sso.py` —
  setup_keycloak_realm + oidc_password_grant + assert_discovery_endpoint) + orchestrator
  wiring. 7 new unit tests; 28/28 PASS. **Subsumes Q0.4.** Commit `4d6b040`.
- [x] **Q2.4** — **CLAIMED @2026-05-28** (commit `9e88741`). `tests/lasuite-docs/recipe_meta.py
  DEPS = ["keycloak"]`; `tests/lasuite-docs/functional/test_oidc_with_keycloak.py` proves the
  full SSO flow against the per-run keycloak dep: realm/client/user setup, OIDC discovery,
  password grant, JWT claim validation. Cold-run: deploy-count=2 (1 parent + 1 dep), all
  stages PASS, dep teardown clean.
- [x] **Q2.4** — **RE-CLAIMED @2026-05-28** (commit `c6e94af` F2-5 fix on top of `9e88741`).
  `tests/lasuite-docs/recipe_meta.py DEPS = ["keycloak"]`; `test_oidc_with_keycloak.py`
  proves the full SSO flow. F2-5 verified: dep teardown now uses verify=True, raises +
  surfaces leak failures; cold re-verify on cc-ci → no leftover keycloak after teardown.

### Q3 — SSO-dependent suite (lasuite-docs, lasuite-drive, lasuite-meet, cryptpad, immich)
- [ ] **Q3.1** — lasuite-docs: parity (health_check, oidc_login, upload_conversion) + specific
@ -312,3 +312,65 @@ generality. From now on: when a recipe-overlay needs a robustness pattern, ask i
to a shared helper BEFORE fixing in-place.

Q2 CLAIMED; awaiting Adversary cold-verify. Continuing on Q3 (SSO-dependent suite) in parallel.

## 2026-05-28 — Q2 FAIL on F2-5; fixed; RE-CLAIMED

Adversary FAILed Q2 on three findings:
- **F2-5 (gate-blocker):** `teardown_deps` silently suppressed teardown failures via
  `contextlib.suppress(Exception)`. The `===== DEPS teardown =====` print fired even when undeploy
  raised. On Adversary cold-check 14+ minutes after my Q2.4 run, the dep keycloak stack
  `keyc-c12afe` was STILL UP — 2 services + leftover secrets/volumes. The "green" Q2.4 run leaked.
- **F2-6 (secondary):** cold keycloak install flake (502 from /realms/master). Real issue, but
  unrelated to Q2 acceptance — flagged for future infra hardening.
- **F2-7 (transparency):** SSO setup is keycloak-hardcoded; `setup_authentik_realm` would need a
  parallel backend. Documented for Q5 to avoid skipping authentik on the false premise that the
  harness is reusable for it.

**This explained my Q3.1 flake!** When I ran lasuite-docs+keycloak again after the Q2.4 run, the
dep domain (`keyc-c12afe.ci.commoninternet.net` — deterministic per parent+dep+pr+ref) was the
SAME, and the leftover stack from Q2.4 collided with the new deploy. The "502 from /realms/master"
was actually the OLD stack, still running while I tried to deploy a fresh keycloak on top of
it. The new `abra app new` succeeded (created a new .env), but the swarm services were
already running, so `abra app deploy` behaved unpredictably, and Traefik routed to the OLD running
stack (which was timing out / not healthy after the secrets had been swapped).

**Fix (commit `c6e94af`):**
- `deps.py::teardown_deps`: switched to `verify=True` so `lifecycle.teardown_app` raises on
  residuals; loop catches per-dep failures, logs LOUDLY, but continues to tear down other deps;
  after all attempts, raises a combined `TeardownError`.
- `run_recipe_ci.py`: catches the dep `TeardownError` in finally; surfaces via
  `dep_teardown_error` in the summary + non-zero exit code; run still prints diagnostics so a
  teardown failure doesn't hide other failures.
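The shape of that fix can be sketched roughly as follows. This is an illustrative reconstruction, not the code from `c6e94af`: the `teardown_app` callable is injected here as a stand-in for `lifecycle.teardown_app`, and the `run` wrapper, its return value, and the exit code of 1 are assumptions made for the example.

```python
from typing import Callable, Iterable


class TeardownError(RuntimeError):
    """Raised when a teardown leaves residual resources behind."""


def teardown_deps(deps: Iterable[str], teardown_app: Callable[..., None]) -> None:
    """Tear down every dep, then raise one combined error if any failed."""
    failures: list[str] = []
    for dep in deps:
        try:
            # verify=True asks the injected teardown_app to raise on residuals.
            teardown_app(dep, verify=True)
        except Exception as exc:
            # Log loudly, but keep going so one bad dep does not strand the rest.
            print(f"!!! dep teardown FAILED for {dep}: {exc}")
            failures.append(f"{dep}: {exc}")
    if failures:
        raise TeardownError("; ".join(failures))


def run(deps: list[str], teardown_app: Callable[..., None]) -> tuple[int, dict]:
    """Orchestrator shape: the finally block records the leak and forces a non-zero exit."""
    summary: dict = {"dep_teardown_error": None}
    exit_code = 0
    try:
        pass  # the install/custom stages would run here
    finally:
        try:
            teardown_deps(deps, teardown_app)
        except TeardownError as exc:
            # Surface the leak in the summary instead of swallowing it.
            summary["dep_teardown_error"] = str(exc)
            exit_code = 1
    return exit_code, summary
```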

**Cold-verified e2e** (log `/root/ccci-f25-verify.log`):
```
RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py
===== DEPS: ['keycloak'] =====
dep: deploying keycloak -> keyc-c12afe.ci.commoninternet.net
dep: keycloak ready @ keyc-c12afe.ci.commoninternet.net
===== TIER: install ===== 2 PASS
===== TIER: custom ===== 3 PASS (incl. test_oidc_password_grant_against_dep_keycloak)
===== DEPS teardown =====
dep: tearing down keycloak @ keyc-c12afe.ci.commoninternet.net
===== RUN SUMMARY =====
deploy-count = 2 (expect 2)
```

Post-run cc-ci state (verified 30s later): `docker stack ls | grep keyc` → empty;
`docker volume ls | grep keyc` → empty; `docker secret ls | grep keyc` → empty. No leak.
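The three manual checks above could be folded into one helper along these lines. This is a minimal sketch and not part of the harness: `find_residuals` and `LISTINGS` are hypothetical names, and the command runner is injectable so the logic can be exercised without a docker daemon. The docker subcommands and the `--format '{{.Name}}'` flag are standard CLI.

```python
import subprocess
from typing import Callable

# One listing command per object kind; {{.Name}} prints one name per line.
LISTINGS = {
    "stack": ["docker", "stack", "ls", "--format", "{{.Name}}"],
    "volume": ["docker", "volume", "ls", "--format", "{{.Name}}"],
    "secret": ["docker", "secret", "ls", "--format", "{{.Name}}"],
}


def _run(cmd: list[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def find_residuals(
    needle: str, run: Callable[[list[str]], str] = _run
) -> dict[str, list[str]]:
    """Return {kind: [names]} for every docker object whose name contains needle."""
    residuals: dict[str, list[str]] = {}
    for kind, cmd in LISTINGS.items():
        names = [line for line in run(cmd).splitlines() if needle in line]
        if names:
            residuals[kind] = names
    return residuals


if __name__ == "__main__":
    # An empty dict means the same thing as three empty greps.
    print(find_residuals("keyc"))
```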

Side-effect of the cleanup: also landed Q3.1 partial (PARITY.md + 2 new functional tests for
lasuite-docs — test_health_check parity port + test_auth_required showing 401 on protected API).
test_oidc_with_keycloak.py is the third specific test (Q2.4 acceptance + Q3.1 OIDC coverage).

**Lessons:**
1. **Silent exception suppression in cleanup paths is a bug**, not robustness. Use it ONLY for
   things you know are inherently best-effort and don't have downstream effects. Dep teardown
   has downstream effects (deterministic dep domain → next-run collision); it MUST be loud.
2. **Deterministic per-run domains amplify state leaks.** When parent+pr+ref+dep produces the
   same hash on a re-run, any leak from the prior run silently corrupts the next. The fix
   options were either (a) make teardown sacred (chosen — F2-5 fix), or (b) make the domain
   random/timestamped. (a) is right because a deterministic domain helps debugging and
   concurrency safety, provided teardown is verified to be complete.
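To make lesson 2 concrete, here is a small sketch of why a deterministic domain collides on a re-run. The hashing scheme is an assumption chosen for illustration (dep name prefix plus six hex characters of a SHA-256 over the inputs); the real derivation is not shown in this log, and the `pr` and `ref` values below are placeholders.

```python
import hashlib


def dep_domain(
    parent: str, dep: str, pr: str, ref: str, base: str = "ci.commoninternet.net"
) -> str:
    """Hypothetical scheme: <dep[:4]>-<6 hex chars of sha256(inputs)>.<base>."""
    digest = hashlib.sha256("|".join((parent, dep, pr, ref)).encode()).hexdigest()
    return f"{dep[:4]}-{digest[:6]}.{base}"


first = dep_domain("lasuite-docs", "keycloak", "0", "main")
rerun = dep_domain("lasuite-docs", "keycloak", "0", "main")
other = dep_domain("lasuite-docs", "keycloak", "0", "feature-x")

# Same inputs give the same domain, so a leaked stack sits exactly where the re-run deploys.
assert first == rerun
# A different ref lands on a different domain, so unrelated runs stay isolated.
assert first != other
```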

Q2 RE-CLAIMED. Continuing Q3 work in parallel.
@ -57,10 +57,18 @@ Q2 PASS as it's lower-priority (the SSO harness is provider-pluggable and Q2.4 a
already proven via keycloak).

## Gate
**Gate: Q2 — CLAIMED, awaiting Adversary @2026-05-28** (commits `d5f5e86` Q2.1 keycloak; `4d6b040`
Q2.3 dep resolver + SSO harness primitives; `47f7cb4` harness.browser hardening across all install
overlays; `9e88741` Q2.4 acceptance). Acceptance per plan §6 Q2: "a dependent recipe deploys its
provider + runs an OIDC login test in one run." Proven cold:
**Gate: Q2 — RE-CLAIMED, awaiting Adversary @2026-05-28** (commit `c6e94af` F2-5 fix on top of
the prior Q2 changeset). Adversary FAIL on F2-5 (dep teardown silent suppress) + F2-6 (cold
keycloak install flake, secondary) + F2-7 (SSO setup keycloak-hardcoded, transparency). F2-5
fixed: `teardown_deps` now uses `verify=True`, errors propagate to the orchestrator's exit code,
the run summary surfaces leaks. Cold-verified: dep keycloak deployed → tests PASS → DEPS
teardown ran clean → `docker stack ls | grep keyc` → empty. F2-7 ack as a real scope gap (when
Q2.2 authentik enrolls, `setup_authentik_realm` will need a parallel backend in `harness.sso`).
F2-6 cold-flake on keycloak install is real but unrelated to Q2 acceptance (a flake-handling
finding for the install layer; will checkpoint when Q4 reaches keycloak again).

Acceptance per plan §6 Q2: "a dependent recipe deploys its provider + runs an OIDC login test
in one run." Proven cold:

**Objective evidence pointers (Q2):**
- **Q2.1 keycloak parity + 2 NEW specific tests** — commit `d5f5e86`:
@ -84,6 +92,17 @@ provider + runs an OIDC login test in one run." Proven cold:
  - `tests/conftest.py` — `deps_apps` fixture exposes dep domains to dependent tests.
  - 7 new unit tests in `tests/unit/test_deps.py`; **28/28 unit tests PASS** cold.

- **F2-5 fix — dep teardown verify=True** — commit `c6e94af`, log `/root/ccci-f25-verify.log`:
  - `runner/harness/deps.py::teardown_deps` now uses `lifecycle.teardown_app(..., verify=True)`
    so residuals raise `TeardownError`. Errors are logged per-dep but we continue to other deps;
    a combined `TeardownError` is raised after all attempts.
  - `runner/run_recipe_ci.py` catches the dep `TeardownError` in finally, surfaces via
    `dep_teardown_error` in the run summary + non-zero exit code.
  - Cold-verified: lasuite-docs+keycloak dep e2e PASSED clean (3 custom + 2 lifecycle install =
    5 PASS); post-run cc-ci state has NO leftover keycloak (`docker stack ls | grep keyc` →
    empty; `docker volume ls | grep keyc` → empty; `docker secret ls | grep keyc` → empty).
  - deploy-count=2, expected 2.

- **Q2.4 acceptance (the gate)** — commit `9e88741`, log `/root/ccci-q24-lasuite-keycloak.log`:
  - `tests/lasuite-docs/recipe_meta.py` declares `DEPS = ["keycloak"]`.
  - `tests/lasuite-docs/functional/test_oidc_with_keycloak.py`: