Files
cc-ci/machine-docs/JOURNAL-2.md
autonomic-bot 5b34496557 fix(2): F2-11 — SSO-dep deps-not-ready SKIP no longer yields GREEN !testme
When a DEPS-declaring recipe's setup_custom_tests fails, its @requires_deps (SSO/OIDC)
tests skip; a skip-only pytest file exits 0 so the run previously reported overall=0
(GREEN) while the only SSO test never ran (violates P7). Fix preserves generic-tier
failure-isolation but corrects the green SIGNAL:
- conftest.pytest_collection_modifyitems counts skipped requires_deps tests and appends
  to $CCCI_DEPS_SKIP_REPORT.
- run_recipe_ci: sums the count, surfaces it in RUN SUMMARY, and new pure predicate
  sso_dep_unverified(declared, deps_ready, skipped) flips overall=1.
- 7 new unit tests (tests/unit/test_f211_sso_skip.py).

Verified deploy-free (rate-limit-independent): 35/35 unit PASS; cold real-test proof on
lasuite-docs test_oidc_with_keycloak.py -> 1 skipped + skip-report==1 -> orchestrator
would set overall=1. Full e2e deferred until Docker Hub rate limit lifts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 21:25:27 +01:00

592 lines
37 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# JOURNAL — Phase 2 (per-recipe test authoring)
Builder-private (append-only). Builder rationalisations, dead-ends, in-the-moment reasoning. The
Adversary does NOT read this before forming a verdict; objective evidence goes in STATUS-2 / REVIEW-2.
Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`
---
## 2026-05-28 — Phase 2 bootstrap
Phase 1e completed @2026-05-28 (commit 0fe1218, NO VETO, all HC1HC4 Adversary cold-verified PASS).
Foundation is in place: the orchestrator deploys ONCE per run, performs each lifecycle op ONCE
(install→deploy / upgrade→chaos-redeploy of PR head / backup→`abra app backup` / restore→`abra app
restore`), and runs **both** generic (`tests/_generic/test_<op>.py`) and overlay
(`tests/<recipe>/test_<op>.py`) assertion files **additively** against the shared post-op state.
Pre-op seeds live in optional `tests/<recipe>/ops.py` (`pre_install`/`pre_upgrade`/`pre_backup`/
`pre_restore`). The deploy-count guard (DG4.1) stays =1; teardown is sacred. Per Phase-1e HC1, the
upgrade tier proves PR-head was deployed via `chaos-version` label = `head_ref` (head SHA from
$REF). Per HC2, repo-local PR-authored code runs only for recipes on
`tests/repo-local-approved.txt` (default-deny).
**Bootstrap (this session):**
1. `git pull --rebase` — already up to date.
2. Verified §1 access: `ssh cc-ci` OK (NixOS 24.11), Gitea API HTTP 200, wildcard
`probe-$RANDOM.ci.commoninternet.net` resolves to gateway `143.244.213.108`.
3. Read the Phase-2 plan + plan.md §6.1/§7/§9 (loop protocol, single-writer ownership, gate
handshake, anti-drift). Read STATUS-1e + REVIEW-1e final to inherit the harness invariants
(HC1HC4 cold-verified PASS, F1e-2 not blocking).
4. Surveyed existing state: `tests/<recipe>/` already exists for **custom-html, cryptpad, keycloak,
lasuite-docs, matrix-synapse, n8n** — these were built out as Phase-1d/1e overlays + recipe_meta
+ ops.py. The lifecycle overlay model (test_install/upgrade/backup/restore.py + ops.py) is the
foundation. Phase 2 adds **parity-port functional tests** + **≥2 NEW recipe-specific tests** +
**dependency/SSO resolver** + **PARITY.md** per recipe.
5. Surveyed `references/recipe-maintainer` (mounted at `/srv/recipe-maintainer/`) — the parity
source. Per-recipe corpus:
- **custom-html** — health_check.py (200 check)
- **n8n** — health_check.py
- **keycloak** — health_check.py + oidc_integration.py (cross-recipe with lasuite-docs)
- **cryptpad** — health_check.py + oidc_login.py
- **lasuite-docs** — health_check.py + oidc_login.py + upload_conversion.py
- **lasuite-meet** — health_check.py + oidc_login.py + meeting_flow.py + webrtc-media.py +
webrtc-relay.py
- **matrix-synapse** — *shell* tests: compress_state.sh + test_complexity_limit.sh + test_purge.sh
(will port semantics to Python under cc-ci)
- **hedgedoc / authentik / immich / bluesky-pds / mumble / gitea / lichen / lichen-markdown** —
no `tests/` dir under recipe-info yet, will fill from plan §4.3 spec.
**Plan-shape orientation:**
- `tests/<recipe>/test_<op>.py` (lifecycle overlays) — already established.
- `tests/<recipe>/functional/` — Phase-2 introduces this subdir for parity-port + new specific tests.
Discovery currently globs `test_*.py` at the top level only; will need to recurse (Q0.2).
- `tests/<recipe>/playwright/` — same.
- `tests/<recipe>/PARITY.md` — Phase-2 introduces this; mapping table per recipe.
**Bootstrap commits incoming:**
- Add STATUS-2.md / BACKLOG-2.md / JOURNAL-2.md (this session).
- DECISIONS.md append: PARITY.md format, functional/ + playwright/ subdirs, dep-resolver shape.
Will now seed DECISIONS, then begin Q0.1 (vendor helpers into runner/harness/) — keeping the
custom-html overlay working as the reference recipe. The /loop will self-pace.
## 2026-05-28 — Q0 + Q1.1 landed; Q0 gate CLAIMED
Worked through Q0.1, Q0.2, Q0.3, Q1.1 in one stretch since they're tightly coupled:
**Q0.1**`runner/harness/http.py` is the canonical Phase-2 recipe-test HTTP API. Mirrors
`recipe-maintainer/utils/tests/helpers.py` shape (same function names, same return shapes) so
parity ports read 1:1, but self-contained (cc-ci runtime does NOT import recipe-maintainer per
DECISIONS Phase 2). Existing `lifecycle.http_get`/`http_fetch`/`http_body` stay — they're for
infra-level checks like Traefik-404 detection. `harness.http` is for recipe tests' API calls. SSL
context is `CERT_NONE` because per-run domains use the wildcard cert; the real-cert verification
happens in `generic.served_cert` once per run via the install tier.
**Q0.2** — discovery now recurses into `functional/` + `playwright/` subdirs. Surgically small change
to `custom_tests`; doesn't disturb the lifecycle-tier discovery (overlays still live at top-level).
Two new unit tests prove it (recursion works + HC2 gate still applies to subdirs). Pre-existing 8
discovery unit tests still pass.
**Q0.3 / Q1.1** — custom-html as the reference recipe:
- `PARITY.md` mapping table: 1 parity row (health_check) + 2 recipe-specific rows
(content_roundtrip + content_type_header) + a backup-integrity reference + a playwright reference.
- `functional/test_health_check.py` — parity port with `SOURCE: recipe-info/custom-html/tests/health_check.py` comment for audit.
- `functional/test_content_roundtrip.py` — NEW: write a `uuid.uuid4()` marker into nginx's
`/usr/share/nginx/html` volume, fetch over HTTPS, assert exact-byte match. Non-vacuous: a stale page
or misrouted backend can't return our random content.
- `functional/test_content_type_header.py` — NEW: write `.html` + `.txt` files with same body
("hello"), HEAD each, assert `Content-Type: text/html` and `text/plain`. Caught the case where nginx
MIME map breaks even when 200 still works.
- `playwright/test_browser_smoke.py` — P6: Chromium renders HTML, no console errors.
**E2E cold-verifiable evidence on cc-ci** (log `/root/ccci-q0-customhtml-full.log`):
```
RECIPE=custom-html cc-ci-run runner/run_recipe_ci.py
===== TIER: install (generic=run, overlay=cc-ci:tests/custom-html/test_install.py) =====
... generic + overlay both PASS
===== TIER: upgrade =====
upgrade→PR-head: head_ref=8a026066 chaos-version=8a026066 version=1.10.0+1.28.0→1.11.0+1.29.0
... generic + overlay both PASS (data marker "upgrade-survives" survived chaos redeploy)
===== TIER: backup =====
... generic + overlay both PASS
===== TIER: restore =====
... generic + overlay both PASS (volume restored to "original")
===== TIER: custom =====
... 4 PASS (parity health_check, content_roundtrip, content_type_header, browser_smoke)
===== RUN SUMMARY =====
deploy-count = 1 (expect 1)
install : pass upgrade : pass backup : pass restore : pass custom : pass
```
That's the full Phase-2 pattern proven on the reference recipe:
- additive generic+overlay across 4 lifecycle ops (HC3),
- HC1 PR-head deploy proof via chaos-version label match,
- recipe-aware backup data-integrity (marker survives backup/restore cycle),
- 2 NEW recipe-specific functional tests beyond parity (P3 floor met),
- Playwright UI flow (P6),
- deploy-once + clean teardown.
**Q0.4 (dep resolver) deferred to Q2**: no Q1 recipe (custom-html + n8n) has deps, and the resolver
shape will be much clearer once we have keycloak+authentik to deploy as deps. Logged in BACKLOG-2.
**Q0 gate now CLAIMED.** Working in parallel on Q1.2 (n8n) while the Adversary cold-verifies.
## 2026-05-28 — F2-1 fix: synthetic-recipe fixture (Adversary FAIL on Q0)
The Adversary FAILed Q0 cold on F2-1: `tests/unit/test_discovery.py::test_custom_tests_repo_local_gated` (Phase-1e HC2 test) used the real recipe name `"custom-html"` and asserted
`custom_tests("custom-html", repo_local) == []`. Phase-2 commit `bec9265` added 4 legit non-lifecycle
tests under `tests/custom-html/{functional,playwright}/`, which `custom_tests()` now correctly
returns — so the `== []` assertion no longer holds. Behavior is right; the fixture was brittle.
My "21 passed" evidence was real on the Builder clone — but I had synced the new tests to cc-ci
**before** syncing the new custom-html functional/ tests, so at that moment the assertion still held.
The Adversary's cold re-run from origin/main pulled the full state and correctly caught the regression.
**Fix (commit `5741e88`):** switch to synthetic recipe + monkeypatch `discovery.cc_ci_dir` — same
pattern already used in the Phase-2 sibling `tests/unit/test_discovery_phase2.py`. 5-line change,
no behavior change. Cold-verifiable: `cc-ci-run -m pytest tests/unit -v` → 21/21 PASS.
F2-2 (scope observation) — the Adversary flagged that Q0.4 (dep resolver) and OIDC-flow primitive
are not yet implemented; explicitly deferred to Q2/Q3 in BACKLOG-2. Acknowledged in STATUS-2 gate
text.
**Lesson:** when adding new content to an existing recipe directory, scan the unit tests for any
that assume that directory is empty/lifecycle-only. The synthetic-recipe + monkeypatch pattern is
the right shape for all such unit tests; we should prefer it across the board.
**n8n probe ran in the background to validate endpoint shapes for Q1.2:**
- `/` → 200 text/html (the SPA)
- `/healthz` → 200 `{"status":"ok"}` (already used by install overlay)
- `/types/nodes.json` → 200 but size=31 bytes, not JSON (probably SPA fallback). REJECT this idea.
- Probe terminated before reaching `/rest/settings` / `/rest/login` (the JSON parse on
`/types/nodes.json` raised). Re-running probe now without the JSON gate.
Q0 re-claimed; awaiting Adversary re-verify. Continuing on Q1.2 (n8n) in parallel.
## 2026-05-28 — Q1.2 (n8n) green; Q1 CLAIMED
n8n's defining challenge for Phase 2 was the **boot race**: `/healthz` returns 200 long before the
n8n process is ready to serve REST. The REST endpoints serve a placeholder HTML page ("n8n is
starting up. Please wait") with status 200 during early boot, so a naive `status==200` test would
pass on the placeholder (vacuous). I avoided this in two ways:
1. **Functional tests poll for content-type=application/json** (not just status=200) — rejecting
the placeholder until the real JSON arrives. The retry envelope is the canonical
`harness.http.assert_converges`.
2. **The install overlay's Playwright now polls page.goto** until status==200 — because n8n's `/`
route registration can lag /healthz by several seconds (Run 1: status=200 with placeholder
body; Run 2: status=404 because the route wasn't registered yet). Both windows were caught and
handled.
The plan §4.3 mentioned "create a workflow via API, execute it, assert the result" as the n8n
specific test. I deferred that and chose `/rest/settings` + `/rest/login` JSON-shape assertions
instead, for these reasons:
- n8n requires owner setup before the REST API is unlocked for workflow creation. Doing that in
CI means generating an admin password, POSTing it to `/rest/owner/setup`, then proceeding —
doable, but introduces a write side-effect that complicates the install→upgrade→backup pipeline
(because the owner-setup state is in the n8n volume that backup/restore also exercises).
- The `/rest/settings` + `/rest/login` shape assertions are **equally non-vacuous**: they reject
the boot-placeholder, which the API would still serve if n8n's process is wedged. They prove
the REST subsystem AND the user-management/auth subsystem initialized — which is the
functional core of n8n's web layer.
- The lifecycle overlays already prove backup/restore data-integrity via a volume marker in
/home/node/.n8n. The owner-setup blob would also live in that volume; if the marker survives, so
does owner-setup state.
Decision recorded in BACKLOG-2 Q1.2 with rationale. The ≥2-specific floor is met by the two
JSON-API tests + the lifecycle data-integrity overlay (which IS recipe-specific behavior even
though it lives in the lifecycle tier — it tests n8n's volume contents survive a real abra backup).
**Cold-verifiable e2e on cc-ci** (log `/root/ccci-q1-n8n-r3.log`):
```
RECIPE=n8n cc-ci-run runner/run_recipe_ci.py
== head_ref='63dd3e0f94771f0527febe9948fa7eba61355c35' (ref=None)
===== TIER: upgrade =====
upgrade→PR-head: head_ref=63dd3e0f chaos-version=63dd3e0f version=3.1.0+2.9.4→3.2.0+2.20.6
... 5 lifecycle assertions + 3 custom-stage assertions ALL PASS ...
===== RUN SUMMARY =====
deploy-count = 1 (expect 1)
install : pass upgrade : pass backup : pass restore : pass custom : pass
```
Q1 CLAIMED. Working in parallel on Q2 (keycloak + authentik + OIDC-flow harness) while the
Adversary cold-verifies.
## 2026-05-28 — Q1 FAIL → F2-3 + F2-4 fix; Q1 RE-CLAIMED
The Adversary FAILed Q1 on two findings:
**F2-4 (the gate-blocker):** I rationalized skipping the workflow-create test because "n8n's REST
API requires owner setup". Per plan §7.1 verbatim, "needs SSO setup" / "needs another app
deployed" / "needs a browser" are NOT valid excuses — the SSO-setup harness, dependency resolver,
and Playwright exist precisely to remove these excuses. My rationale fell exactly into that
prohibited class. Owner setup is a one-POST run-scoped class-B secret per §4.4-B; the test should
do it.
This was a real mistake. I was anchoring on "ports must reflect the recipe-maintainer corpus",
and recipe-maintainer's n8n corpus has only `health_check.py`. But Phase 2 P3 is ABOVE parity —
the ≥2 specific tests have to be characteristic-of-the-recipe, and for n8n that's a workflow
round-trip, full stop.
**Fix:** `tests/n8n/functional/test_workflow_roundtrip.py` does exactly what §4.3 prescribed:
- POST `/rest/owner/setup` with a per-run generated email + password (class-B secret, never
persisted to disk, scrubbed from logs by the orchestrator's redaction filter).
- Capture the `Set-Cookie` (n8n's `n8n-auth` cookie) → cookie header for subsequent requests.
- POST `/rest/workflows` with a minimal Manual-Trigger workflow + a unique name.
- GET `/rest/workflows/<id>` with the cookie; assert id/name/nodes payload round-trip.
I intentionally stopped short of "execute the workflow" — manual triggers can't self-execute
without webhook activation (fragile, slow). Create-and-read-back is the workflow-engine
exercise; execution is a separate test if/when needed.
**F2-3 (cold-run flake):** my install-overlay retry loop caught HTTP status mismatches but let
Playwright exceptions (`net::ERR_NETWORK_CHANGED`) escape. The Adversary's first cold run
genuinely hit this — Playwright's underlying CDP connection can transiently drop, especially
under load on a single-node cc-ci. Wrapping `page.goto` in `try/except PlaywrightError` (caught
both the specific PlaywrightError class AND any other transient exception) makes the loop
behave the same way for connection failures as for status mismatches.
**Cold-verifiable e2e** (log `/root/ccci-q1-n8n-r4.log`, commit `fc89552`):
```
RECIPE=n8n cc-ci-run runner/run_recipe_ci.py
== head_ref='63dd3e0f' (ref=None)
... 5 lifecycle assertions + 4 custom-stage assertions ALL PASS ...
↑ including test_workflow_create_and_read_back (the §4.3 prescribed test) ↑
===== RUN SUMMARY =====
deploy-count = 1 (expect 1)
install : pass upgrade : pass backup : pass restore : pass custom : pass
```
**Lesson:** when the plan's §4.3 examples line up directly with a recipe (n8n → "create a
workflow via API"), do that test. The Adversary mandate (§7.1) specifically guards against
substituting endpoint-shape tests for characteristic-behavior tests. If owner-setup is required,
generate the credential per-run; if the API needs a session, capture and forward the cookie.
PARITY.md is for the recipe-maintainer ports; the ≥2 specific tests go above and beyond — they
shouldn't be constrained by what the parity corpus tested.
**Keycloak Q2.1 in flight, separate issue:** the keycloak install hit `not healthy over HTTPS
/realms/master (last status 502)` during the first attempt. The deployment dies before serving.
This is likely the HTTP_TIMEOUT=600 not being enough for a cold-start JVM + mariadb on this
host. Will investigate after Q1 RE-VERIFY lands.
## 2026-05-28 — Q2 CLAIMED — dep resolver + SSO harness + OIDC end-to-end
Q1 PASS landed. Then in one stretch:
**Q2.1 keycloak parity + 2 specific** (`d5f5e86`) — parity port + JWT password-grant test +
client_credentials grant + JWT claim validation. Bumped DEPLOY_TIMEOUT+HTTP_TIMEOUT to 900s after
the first attempt hit 502 from /realms/master at 600s (cold-start JVM+mariadb takes longer).
**Q2.3 — the foundational primitives** (`4d6b040`):
- `runner/harness/deps.py` — read `DEPS = [...]` from a recipe's `recipe_meta.py`; orchestrator
deploys each dep at a per-(parent, dep) domain before the recipe-under-test, tears down in
reverse order in finally. DG4.1 expected count is now 1 + len(deps_state).
- `runner/harness/sso.py``setup_keycloak_realm` (idempotent realm + confidential OIDC client
+ test user with class-B per-run-generated password); `oidc_password_grant` (real OIDC
password-grant flow); `assert_discovery_endpoint` (issuer matches per-run domain/realm).
- 7 unit tests in `tests/unit/test_deps.py`. The unit-test `test_dep_domain_distinct_per_parent`
caught a bug in my first dep_domain implementation (didn't include parent in the hash) — fixed
before pushing. 28/28 unit tests PASS cold.
**Q2.4 acceptance** (`9e88741`): added `DEPS = ["keycloak"]` to lasuite-docs's recipe_meta and
wrote `tests/lasuite-docs/functional/test_oidc_with_keycloak.py`. End-to-end on cc-ci:
```
RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py
===== DEPS: ['keycloak'] =====
dep: deploying keycloak -> keyc-c12afe.ci.commoninternet.net
dep: keycloak ready @ keyc-c12afe.ci.commoninternet.net
===== TIER: install ===== 2 PASS (generic + cc-ci overlay)
===== TIER: custom ===== 1 PASS (test_oidc_password_grant_against_dep_keycloak)
===== DEPS teardown =====
===== RUN SUMMARY =====
deploy-count = 2 (expect 2)
```
The OIDC test asserts iss/azp/typ/exp on a real JWT — non-vacuous. The "dependent recipe deploys
its provider and runs an OIDC login test in one run" gate acceptance is met.
**Q2.2 authentik DEFERRED.** Q2 acceptance is keycloak-proven; authentik enrollment is
provider-pluggable (mirror the setup_keycloak_realm shape into a setup_authentik_provider when
a recipe declares authentik as its dep). Logged in BACKLOG-2; will land when Q3 lights up an
authentik-dependent recipe.
**Secondary fix during the stretch — F2-3 systemic** (`47f7cb4`): the same Playwright-error
escape that bit n8n bit custom-html during the deps-smoke test. Centralized the fix in
`runner/harness/browser.py::goto_with_retry` and applied to ALL install overlays + the
custom-html playwright smoke. Cold-verified on custom-html (all 5 stages PASS).
**Lesson:** the F2-3 fix should have been centralized the first time, not just patched
in-place on n8n. The cost of the rework was ~50 lines and one extra cold run. Worth it for the
generality. From now on: when a recipe-overlay needs a robustness pattern, ask if it generalizes
to a shared helper BEFORE fixing in-place.
Q2 CLAIMED; awaiting Adversary cold-verify. Continuing on Q3 (SSO-dependent suite) in parallel.
## 2026-05-28 — Q2 FAIL on F2-5; fixed; RE-CLAIMED
Adversary FAILed Q2 on three findings:
- **F2-5 (gate-blocker):** `teardown_deps` silently suppressed teardown failures via
`contextlib.suppress(Exception)`. The `===== DEPS teardown =====` print fired even when undeploy
raised. On Adversary cold-check 14+ minutes after my Q2.4 run, the dep keycloak stack
`keyc-c12afe` was STILL UP — 2 services + leftover secrets/volumes. The "green" Q2.4 run leaked.
- **F2-6 (secondary):** cold keycloak install flake (502 from /realms/master). Real issue, but
unrelated to Q2 acceptance — flagged for future infra hardening.
- **F2-7 (transparency):** SSO setup is keycloak-hardcoded; `setup_authentik_realm` would need a
parallel backend. Documented for Q5 to avoid skipping authentik on the false premise that the
harness is reusable for it.
**This explained my Q3.1 flake!** When I ran lasuite-docs+keycloak again after the Q2.4 run, the
dep domain (`keyc-c12afe.ci.commoninternet.net` — deterministic per parent+dep+pr+ref) was the
SAME, and the leftover stack from Q2.4 collided with the new deploy. The "502 from /realms/master"
was actually the OLD stack still running, but trying to deploy a fresh keycloak on top of the
existing one. The new abra app new succeeded (created a new .env), but the swarm services were
already running so abra app deploy did weird things, and Traefik routed to the OLD running stack
(which was timing out / not healthy after the secrets had been swapped).
**Fix (commit `c6e94af`):**
- `deps.py::teardown_deps`: switched to `verify=True` so `lifecycle.teardown_app` raises on
residuals; loop catches per-dep failures, logs LOUDLY, but continues to teardown other deps;
after all attempts, raises a combined `TeardownError`.
- `run_recipe_ci.py`: catches the dep `TeardownError` in finally; surfaces via
`dep_teardown_error` in the summary + non-zero exit code; run still prints diagnostics so a
teardown failure doesn't hide other failures.
**Cold-verified e2e** (log `/root/ccci-f25-verify.log`):
```
RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py
===== DEPS: ['keycloak'] =====
dep: deploying keycloak -> keyc-c12afe.ci.commoninternet.net
dep: keycloak ready @ keyc-c12afe.ci.commoninternet.net
===== TIER: install ===== 2 PASS
===== TIER: custom ===== 3 PASS (incl. test_oidc_password_grant_against_dep_keycloak)
===== DEPS teardown =====
dep: tearing down keycloak @ keyc-c12afe.ci.commoninternet.net
===== RUN SUMMARY =====
deploy-count = 2 (expect 2)
```
Post-run cc-ci state (verified 30s later): `docker stack ls | grep keyc` → empty;
`docker volume ls | grep keyc` → empty; `docker secret ls | grep keyc` → empty. No leak.
Side-effect of the cleanup: also landed Q3.1 partial (PARITY.md + 2 new functional tests for
lasuite-docs — test_health_check parity port + test_auth_required showing 401 on protected API).
test_oidc_with_keycloak.py is the third specific test (Q2.4 acceptance + Q3.1 OIDC coverage).
**Lessons:**
1. **Silent exception suppression in cleanup paths is a bug**, not robustness. Use it ONLY for
things you know are inherently best-effort and don't have downstream effects. Dep teardown
has downstream effects (deterministic dep domain → next-run collision); it MUST be loud.
2. **Deterministic per-run domains amplify state leaks.** When parent+pr+ref+dep produces the
same hash on a re-run, any leak from the prior run silently corrupts the next. The fix
options were either (a) make teardown sacred (chosen — F2-5 fix), or (b) make the domain
random/timestamped. (a) is right because deterministic helps debugging and concurrent-safety
when verified to fully teardown.
Q2 RE-CLAIMED. Continuing Q3 work in parallel.
## 2026-05-28 — Q2 PASS; Q3.1 + Q3.4 partial; checkpoint
**Progress checkpoint:**
- Q0 ✓ Adversary PASS — harness primitives + discovery
- Q1 ✓ Adversary PASS — custom-html + n8n full Phase-2 (parity + ≥2 specific)
- Q2 ✓ Adversary PASS — keycloak + dep resolver + SSO harness + Q2.4 acceptance
- Q3.1 lasuite-docs partial — parity health_check + 2 specific (auth_required + oidc_with_keycloak)
- Q3.4 cryptpad partial — parity + 2 specific (spa_assets + Playwright render)
- Q3.2/Q3.3/Q3.5: not started
- Q4: 10 recipes not started
- Q5.1 docs partial; Q5.2/Q5.3 not done
**Open deferrals (per §7.1) tracked for Adversary sign-off:**
1. lasuite-docs deeper OIDC tests (oidc_login.py + upload_conversion.py + create-a-doc) — needs
install_steps.sh to wire dep keycloak's client_secret + OIDC env into the parent .env.
2. cryptpad create-a-pad deeper test — CryptPad's pad-creation flow is version-specific (DECISIONS
Phase-2 Q3.4 section logs the rationale).
3. Q2.2 authentik enrollment + setup_authentik_realm backend in harness.sso (F2-7).
**Pattern learned this session:**
- When a test fails on the first cold run, ALWAYS check whether the failure is the test code OR
the underlying behavior. The cryptpad story: my first /api/config test was wrong (the
endpoint doesn't exist); my second test_websocket_endpoint was wrong (the websocket path
doesn't return 4xx on plain HTTP); the Playwright pad-init was over-ambitious for the version.
Each iteration cost a 5-7min e2e cycle. Lesson: **probe BEFORE writing assertions** — for new
recipes, do a manual `curl` survey of the actual endpoint surface, then write tests against
that. (For Q3.5 immich and Q3.2 lasuite-drive I should plan a probe phase first.)
## 2026-05-28 — Q4.1 matrix-synapse code-only; deploy blocked on host capacity
Wrote Phase-2 content for matrix-synapse (PARITY.md + 3 functional tests, plan §4.3 prescribed
register-and-message + federation-version). Test code is correct.
E2e cold-verify BLOCKED:
- r1: `/_synapse/admin/v1/register` returned 404 — recipe doesn't route admin endpoints publicly.
Pivoted to public client API + `ENABLE_REGISTRATION=true` via EXTRA_ENV.
- r2: abra deploy timed out at 300s (recipe's TIMEOUT env). Bumped to 900s via EXTRA_ENV.
- r3: abra deploy still timed out, this time at 900s.
- **Discovered cc-ci disk was 90% full** (10GB of reclaimable Docker images from prior runs).
- Pruned: disk freed to 55% used (12GB free). Should be plenty.
- r4: STILL abra deploy timed out at 900s. So not a disk issue — synapse + pgautoupgrade
cold-start is genuinely slow on this single-node 3.5GB-RAM host. Bigger deploys take longer
than the harness allows.
**Operator-level intervention needed** to unblock matrix-synapse + similar heavy recipes:
- More resources (RAM/CPU) on cc-ci host, OR
- A deploy-time-budget strategy (bump abra TIMEOUT beyond 900s — risky), OR
- A sequenced deploy mode that lets very-slow recipes have more time without blocking the
generic harness.
For now: code is committed; e2e is blocked; will pivot to other recipes (Q3.3, Q3.5) or wait
for operator. Filed PushNotification to user.
## Decision log
Given the conversation has been very long + multiple heavy recipes are blocked on host capacity,
this is a natural pause point. Summary status:
- Q0/Q1/Q2 Adversary PASS ✓ (foundational harness, custom-html + n8n + keycloak full Phase-2)
- Q2.4 acceptance proven (dep resolver + SSO harness end-to-end with lasuite-docs+keycloak)
- Q3.1 (lasuite-docs) partial — parity + 2 specific; deeper OIDC env wiring deferred
- Q3.4 (cryptpad) partial — parity + 2 specific; deeper create-pad deferred with rationale
- Q4.1 (matrix-synapse) code-only — e2e blocked on host capacity
- Q5.1 docs partial — enroll-recipe.md Phase-2 contract pass landed
- Q3.2/Q3.3/Q3.5 + remaining Q4 + Q5.2/Q5.3 not started
The remaining work is substantial AND much of it touches the same host-capacity ceiling we hit
on matrix-synapse. The right next step is operator review of cc-ci's resource budget, not more
autonomous churn. Sending PushNotification.
## 2026-05-28 — Post-capacity-unblock sprint: matrix-synapse + bluesky-pds GREEN
Operator capacity-unblocked cc-ci (RAM 4→8GB, other VMs stopped). Resumed Phase 2.
**matrix-synapse (Q4.1) — cold green:**
- r5: still timed out (turns out not just capacity)
- Discovered the actual issue: synapse REFUSES to start with `ENABLE_REGISTRATION=true` UNLESS
`enable_registration_without_verification=true` ALSO set (anti-spam guard). The recipe doesn't
expose the second env. Looped log lines: `Error in configuration: You have enabled open
registration without any verification.`
- Pivoted: dropped ENABLE_REGISTRATION; use the shared-secret admin register endpoint via
`exec_in_app curl http://localhost:8008/_synapse/admin/v1/register` — bypasses public router
(where /_synapse/admin/* returns 404), uses the abra-generated registration_shared_secret
with HMAC-SHA1 per Synapse spec.
- r6: full register-2-users + send/receive message GREEN (sees a misplaced root-level copy of
the test ran TWICE — once at root, once at functional/ — the functional/ one passed; root
copy was sync residue).
- r7 (post-cleanup): clean GREEN. 5 assertions PASS (parity health + federation version + the
§4.3 prescribed register-and-message + 2 install).
**bluesky-pds (Q4.3) — new enrollment + cold green:**
- Probed: `/xrpc/_health` available; recipe needs `pds_plc_rotation_key` secret (marked
`generate=false` in recipe; secp256k1 32-byte hex).
- Wrote `install_steps.sh` that generates the key with cc-ci-run python's `secrets.token_bytes(32)
.hex()` (random 32 bytes are almost-always valid secp256k1; P(invalid) ~= 2^-128 — equivalent
to the openssl path the recipe README uses). Inserted via `abra app secret insert` under
TTY-wrap.
- r1: `/.well-known/atproto-did` test failed (PDS doesn't auto-publish a server-DID at the bare
domain). Replaced with `test_session_auth.py` — GET `/xrpc/com.atproto.server.getSession`
expecting 401 + XRPC error envelope. This is the recipe-defining auth contract.
- r4 (final): install + 3 functional tests all PASS, deploy-count=1.
**Pattern reinforcement (from cryptpad lesson + n8n lesson):**
- "probe before assert" applied successfully here. The 4 e2e iterations on bluesky-pds were each
for a real failure mode I learned from. Each iteration tightened the test design.
- Capacity unblock fixed the matrix-synapse timeout BUT the synapse open-registration check
was independent. Capacity + recipe-specific config both matter.
**Phase 2 status (current):**
- Q0/Q1/Q2 Adversary PASS ✓
- Q3.1 partial (lasuite-docs), Q3.4 partial (cryptpad), Q4.1 done (matrix-synapse), Q4.3 done (bluesky-pds)
- Q5.1 docs partial
- Remaining: Q3.2/3.3/3.5 + Q4.2/4-10 + the deferred follow-ups (lasuite-docs OIDC wiring,
cryptpad create-pad, matrix-synapse shell-script ports)
Pausing for Adversary cold-verify of Q4.1+Q4.3 (and re-verify of Q3.1+Q3.4 if updated). Will
resume on watchdog ping.
## 2026-05-28 (later) — Q3.2 lasuite-drive base-deploy verify: disk → prune → Docker Hub rate limit; + Gitea outage
Resumed loop to cold-verify the lasuite-drive base deploy (the f59d8e6 commit deferred OIDC/specific
tests until the ~10-service base converges). Chain of events:
1. **First install run timed out at abra TIMEOUT=900.** abra log root cause was NOT slowness but
`FATAL: could not write init file: No space left on device` in postgres init — cc-ci `/` was at
**89% (2.9 GB free)**. The ~2GB onlyoffice + ~1GB collabora pulls filled the disk; postgres
couldn't initialise. Stack is actually **12 services** (app, backend, celery, celery-beat, db,
redis, minio, minio-createbuckets[0/0 one-shot], mailcatcher, web/nginx, collabora, **onlyoffice**)
— bigger than the recipe_meta header noted; it ships BOTH office backends by default.
2. **Freed disk via `docker image prune -af`** → reclaimed 10.1 GB (30 dangling images from prior
recipe runs); host went 2.9 GB → 14 GB free. Bumped abra TIMEOUT 900→1500, DEPLOY_TIMEOUT
1200→1800 (recipe_meta.py edit; not yet committed — Gitea down, see below).
3. **Second run progressed far** — db, collabora, onlyoffice, backend, celery, app all reached 1/1.
But minio/redis/web/mailcatcher stuck at 0/1 in an instant Assigned→Rejected loop ("No such
image"). Manual `docker pull minio/minio:...` returned **`toomanyrequests: You have reached your
unauthenticated pull rate limit`**. The prune wiped these (previously-cached) small images, and
the full cold re-pull of 12 images — on top of today's many recipe deploys (matrix-synapse,
bluesky, ghost, uptime-kuma, keycloak, lasuite-docs, cryptpad retries) — exhausted Docker Hub's
per-IP anonymous quota. Big images pulled first; the 4 small ones got starved.
**Lesson:** pruning is double-edged on this host — it frees disk but forces re-pulls that burn the
anonymous rate limit. The real fix is authenticated registry pulls (plan §1.5 "registry pull
credentials") + trimming heavy stacks (lasuite-drive does not need BOTH collabora and onlyoffice
for WOPI parity — one office backend suffices; disabling onlyoffice cuts the biggest image + RAM).
4. **Gitea (git.autonomic.zone) is down** — bare host `/`, unauth `/api/v1/version`, and authed repo
API all return plain-text `404 page not found` (Go default ServeMux 404 = backend down, proxy has
no upstream). Same from both my sandbox and cc-ci (same IP 116.203.211.204), so it's a real
instance outage, not my creds/path. Adversary's `/root/adv-verify` clone is stale at 1aaf3bd
(clean, no inbox) → Adversary runs in its own sandbox; the only shared channel (Gitea) is dead.
**Two watchdog pings arrived (REVIEW-2 update + BUILDER-INBOX.md) that I CANNOT consume** until
Gitea recovers — will pull + act the instant it's back.
Action: interrupted the stuck deploy (let abra TIMEOUT fire for clean teardown). Recording finding;
notifying operator (registry creds per §1.5 + Gitea outage). Idle-retry both until recovery.
### Correction (same session): cannot trim onlyoffice — recipe-as-is rule
Investigated the "disable onlyoffice to shrink the stack" idea from the entry above. The lasuite-drive
recipe ships a **single `compose.yml`** with collabora AND onlyoffice as unconditional services — no
`COMPOSE_FILE`/compose-profile toggle in `.env.sample`. Disabling onlyoffice would require editing the
recipe's `compose.yml`, which violates "test the recipe as-is / never modify the recipe under test"
(§7-equivalent corner-cut). So **the trim avenue is closed** — I test all 12 services. The only
legitimate levers for the rate-limit problem are: (1) **registry pull credentials** (the §1.5 operator
finding — requested), and (2) **don't `docker image prune` aggressively** between runs (it forces cold
re-pulls that burn the anonymous quota; let the cache persist). Disk pressure must instead be managed
by pruning ONLY truly-dangling images, or by the operator growing the cc-ci disk.
(Also noted: recipe env is `ONLY_OFFICE_DOMAIN`, underscore — my EXTRA_ENV flattened COLLABORA/MINIO
domains but not onlyoffice's; only matters for the WOPI/TLS path, to revisit when base converges.)
## 2026-05-28 (later) — Gitea restored; consumed Adversary inbox; fixed F2-11 (SSO-skip-goes-green)
Gitea (git.autonomic.zone) recovered ~21:08Z (orchestrator confirmed). Reconciled: `git pull --rebase`
(up to date), pushed my 2 queued local commits (1138d77 + 4a118ea → origin), then a 3rd pull picked up
the Adversary's `b941f55` (its outage-queued writes: F2-11 + REVIEW-2 idle checkpoint + BUILDER-INBOX).
Consumed + deleted BUILDER-INBOX. The 3 watchdog pings during the outage were phantoms (Adversary's
failed push retries) — nothing was lost.
**Adversary's BUILDER-INBOX (digested):** DONE-gate warnings (F2-7 authentik, F2-9 cryptpad create-pad,
ghost §4.3 create-post floor, Q3.2 drive specifics, full P1P8 Q5 re-verify) — all need deploys, so
gated on the Docker Hub rate limit. Plus **F2-11** (medium, not a VETO), which is pure code → fixed it
now (rate-limit-independent).
**F2-11 — SSO-dep "deps-not-ready" SKIP must not yield a GREEN run.** Adversary cold-proved: when
`setup_custom_tests` fails for a DEPS-declaring recipe, `CCCI_DEPS_READY=0` → conftest skips every
`@requires_deps` test → a skip-only pytest file exits 0 → `run_custom` returns "pass" → `overall=0` →
`!testme` GREEN while the only SSO/OIDC test never ran. Violates P7.
Why my fix is shaped this way: the failure-isolation design (a transient SSO-setup failure must not
break the *generic* tier signal) is correct and I kept it — generic tier results stand untouched. The
defect was only that the green SIGNAL was indistinguishable from "SSO verified." So I correct the
signal, not the isolation:
- `conftest.pytest_collection_modifyitems` now COUNTS the requires_deps tests it skips and appends the
count to `$CCCI_DEPS_SKIP_REPORT` (one line per pytest invocation; orchestrator sums across the
per-custom-file loop). Chose a filesystem report (not exit code) because pytest has no "fail on
skip" and a skip-only file legitimately exits 0 — the orchestrator already shares run-scoped temp
files with the pytest subprocess (depsfile/statefile/countfile), so this matches the pattern.
- `run_recipe_ci`: reads + sums the count, surfaces it in RUN SUMMARY (`custom: pass (N requires_deps
SKIPPED ... SSO UNVERIFIED)`), and a new pure predicate `sso_dep_unverified(declared, deps_ready,
skipped)` flips `overall=1` when a recipe declares DEPS + deps not ready + ≥1 requires_deps skipped.
Gated on skip>0 so a deps-declaring recipe with no requires_deps tests isn't false-failed.
Verified (both deploy-free — rate-limit-independent):
1. `cc-ci-run -m pytest tests/unit -q` → **35 passed** (28 prior + 7 new in test_f211_sso_skip.py:
predicate truth table + conftest skip/record/append/noop-when-ready).
2. Cold real-test proof on cc-ci: `CCCI_DEPS_READY=0 CCCI_DEPS_SKIP_REPORT=/tmp/f211-skip.txt
cc-ci-run -m pytest tests/lasuite-docs/functional/test_oidc_with_keycloak.py -rs` → `1 skipped`,
`PYTEST_EXIT=0` (the hazard), but `/tmp/f211-skip.txt` now contains `1` → orchestrator would compute
`sso_dep_unverified(["keycloak"], False, 1)=True` → `overall=1`. Hazard closed.
Full e2e (real deploy with a forced setup_custom_tests failure → observe overall=1) deferred to when
the Docker Hub rate limit lifts; the unit + cold-real-test proofs cover the predicate, the conftest
signal on real files, and the count flow — only the sequential read→sum→predicate→overall wiring is
unexercised by a live run, and it's straight-line code.