Files
cc-ci/machine-docs/BACKLOG-2.md

526 lines
40 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# BACKLOG — Phase 2 (per-recipe test authoring)
Phase-namespaced backlog. Builder edits `## Build backlog`; Adversary edits `## Adversary findings`.
Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`
## Build backlog
### Q0 — Harness additions
- [x] **Q0.1**`runner/harness/http.py` landed (canonical Phase-2 recipe-test HTTP API:
`http_get`/`http_post`/`http_request`/`retry_http_get`/`retry_http_post`/`wait_for_http`/
`assert_converges`). TTY abra wrapper already present (`runner/harness/abra.py::_run_pty`)
from Phase 1d. 11 unit tests landed.
- [x] **Q0.2**`discovery.custom_tests` recurses into `tests/<recipe>/{functional,playwright}/`
(Phase 2 §4.1 layout); 2 unit tests landed.
- [x] **Q0.3**`tests/custom-html/PARITY.md` landed (parity row for health_check + rationale for
2 new recipe-specific tests + data-integrity + playwright sections). Parity port:
`tests/custom-html/functional/test_health_check.py` (SOURCE comment present).
- [ ] **Q0.4** — Dependency resolver harness primitive (read `tests/<recipe>/recipe.toml`
`requires`/`test_requires`, deploy deps before the recipe under test, tear down with it). Mind
`MAX_TESTS`/node budget; sequence heavy ones. **Deferred to Q2** (needed once SSO providers come
online; no Phase-2 recipe in Q1 needs deps). Tracked in BACKLOG.
- [x] **Q0.5****RE-CLAIMED @2026-05-28** (commit `5741e88` adds F2-1 fix to original Q0).
Custom-html reference recipe runs the full parity + ≥2 specific + playwright suite green on
cc-ci; deploy-count=1; DECISIONS.md Phase-2 section in place. F2-1 closed by Builder; 21/21
unit tests PASS cold. Awaiting Adversary cold re-verify.
### Q1 — Pattern proof (custom-html + n8n)
- [x] **Q1.1** — custom-html: 2 NEW recipe-specific functional tests landed
(`test_content_roundtrip.py` + `test_content_type_header.py`); already cold-verified in Q0 PASS.
- [x] **Q1.2** — n8n enrolled under cc-ci. Parity port `tests/n8n/functional/test_health_check.py`
+ **3 recipe-specific functional tests**: `test_workflow_roundtrip.py` (the plan §4.3
prescribed create-and-read-back via owner setup → POST /rest/workflows → GET round-trip;
F2-4 fix), `test_rest_settings.py` (REST bootstrap surface), `test_login_state.py` (auth
subsystem). Install overlay's Playwright now wraps page.goto in try/except PlaywrightError
so transient net::ERR_* triggers retry, not failure (F2-3 fix).
- [x] **Q1.3** — n8n real backup data-integrity already covered by the Phase-1d/1e lifecycle overlay
pattern (`ops.pre_backup` seeds "original" in /home/node/.n8n; `pre_restore` mutates; restore
must return "original" — passed in the Q1.2 e2e run).
- [x] **Q1.4****RE-CLAIMED @2026-05-28** (commit `fc89552` F2-3+F2-4 on top of `2f3d5aa`). Both
recipes green via the run path; both PARITY.md complete; Adversary findings F2-3 + F2-4 closed
by Builder. Awaiting Adversary cold re-verify.
### Q2 — SSO providers (keycloak + authentik)
- [x] **Q2.1** — keycloak: parity-port `test_health_check.py` + 2 NEW recipe-specific functional
tests. Bumped timeouts to 900s. Full e2e green (commit `d5f5e86`).
- [ ] **Q2.2** — authentik: **deferred (lower priority).** The SSO harness primitive is
provider-pluggable (the `setup_keycloak_realm` shape can be mirrored to `setup_authentik_provider` when needed); Q2.4 acceptance is already proven via keycloak. Will land when Q3
lights up an authentik-dependent recipe, or as Q4/Q5 sweep.
- [x] **Q2.3** — Dep resolver (`runner/harness/deps.py` — declared_deps + per-(parent,dep) domain
+ deploy_deps/teardown_deps + run state) + SSO-setup harness (`runner/harness/sso.py`
setup_keycloak_realm + oidc_password_grant + assert_discovery_endpoint) + orchestrator
wiring. 7 new unit tests; 28/28 PASS. **Subsumes Q0.4.** Commit `4d6b040`.
- [x] **Q2.4****RE-CLAIMED @2026-05-28** (commit `c6e94af` F2-5 fix on top of `9e88741`).
`tests/lasuite-docs/recipe_meta.py DEPS = ["keycloak"]`; `test_oidc_with_keycloak.py`
proves the full SSO flow. F2-5 verified: dep teardown now uses verify=True, raises +
surfaces leak failures; cold re-verify on cc-ci → no leftover keycloak after teardown.
### Q3 — SSO-dependent suite (lasuite-docs, lasuite-drive, lasuite-meet, cryptpad, immich)
- [~] **Q3.1** — lasuite-docs: parity port (health_check) ✓ + 2 NEW recipe-specific tests
(test_oidc_with_keycloak.py — Q2.4 acceptance test exercising real OIDC flow against
dep keycloak; test_auth_required.py — protected backend API requires auth). Open
follow-up: oidc_login.py + upload_conversion.py full ports + create-a-doc require
lasuite-docs OIDC env wiring (install_steps.sh wires dep keycloak's client_secret +
OIDC env into lasuite-docs's .env at install time). Documented in tests/lasuite-docs/
PARITY.md.
- [~] **Q3.2** — lasuite-drive: enrolled (mirrored). Maximal testable subset GREEN @2026-05-29
(`/root/ccci-drive-subset.log`): install (generic+cc-ci test_serving_and_frontend) + backup
(P4 test_backup_captures_state) + restore (P4 test_restore_returns_state) + custom — all 3
functional PASS: test_health_check (parity), test_minio_storage (real S3 upload→list→download→
assert-bytes round-trip), test_oidc_with_keycloak (password-grant JWT vs warm keycloak,
per-run realm, clean teardown). deploy-count=1, deps=['keycloak'] (warm-reused). **Upgrade
tier: disk-blocker RESOLVED @2026-05-29 (cc-ci grew to 64G/44G-free) — the upgrade tier is now
REQUIRED green (no longer deferrable, per Adversary + operator) and runs as part of the Q3.2a
rework. It stays a veto-eligible OPEN obligation until run green (incl. real prev→PR-head office
crossover) + Adversary cold-verified.** Bug fixed en route: `fix(2)`
`f1c626c` — setup_custom_tests `docker service scale --detach` (the run-once minio-createbuckets
job made a blocking scale hang the custom tier). **NOT CLAIMED — OIDC setup is FLAKY:** the
step-3 in-place full-stack `abra app deploy --force --chaos` (applies OIDC env) only converges
sometimes on this heaviest 12-service stack (run 1 OK → OIDC PASS; run 4 FAIL → OIDC SKIP → F2-11
RED). Test assertions are all correct (run 1 proved health+MinIO+OIDC green); the flakiness is in
the redeploy infra. **Two open issues block a reliable Q3.2 green:** (a) [Q3.2a] flaky OIDC
redeploy — see below; (b) upgrade tier disk-blocker (DEFERRED/operator). See JOURNAL-2 2026-05-29.
- [ ] **Q3.2a** — Make lasuite-drive OIDC wiring reliable. **PLAN:**
`cc-ci-plan/plan-lasuite-drive-oidc-robustness.md` (orchestrator, 2026-05-29). The full
12-service `--chaos` redeploy to apply OIDC env exposes collabora's flaky reconverge (+ transient
backend gunicorn-perms / WOPI-404). Structured as: **Step 0** capture real failure logs first;
**Part A** (cc-ci harness) — create the per-run realm/client in the live-WARM keycloak + set OIDC
env in `.env` BEFORE a single `abra app deploy` (deploy ONCE, NO mid-run `--chaos` reconverge);
REAL abra commands only (no `docker service update/scale` patching); verify full suite green **3×
in a row**. **Part B** — lasuite-drive RECIPE PR (collabora WOPI healthcheck-gating + backend
retry; gunicorn-perms entrypoint fix; lazy/retrying OIDC discovery); "working" ONLY once cc-ci
runs the full suite (incl. upgrade tier, now disk-unblocked) on the PR repeatedly-green +
Adversary cold-verified → operator merges. Q3.2 claimed + this item closed only after A+B green.
- [ ] **Q3.3** — lasuite-meet: parity (health_check, oidc_login, meeting_flow, webrtc-media,
webrtc-relay) + specific (create-a-room, two-user LiveKit token issuance, ICE-candidate gathering).
- [~] **Q3.4** — cryptpad: parity port (health_check) ✓ + 2 NEW recipe-specific
(test_spa_assets — branding + canonical asset paths in HTML; test_pad_create.py —
Playwright SPA renders + JS bundle loads + no console errors). Open follow-up: the
§4.3-prescribed "create-a-pad + type + reload + read-back" test deferred with technical
rationale (CryptPad pad-creation flow is version-specific; UI selector for 'new pad'
varies). See DECISIONS.md Phase-2 Q3.4 section; Adversary sign-off pending per §7.1.
- [ ] **Q3.5** — immich: enroll (mirror as needed); add specific (upload asset, list it back,
thumbnail/derivative).
- [ ] **Q3.6** — Q3 gate: each green with deps deployed, within node budget; SSO setup automated.
### Q4 — Remaining recipes
- [x] **Q4.1** — matrix-synapse: PARITY.md + 3 functional tests (federation_version, health_check,
register_and_message via shared-secret admin endpoint called from container localhost — the
§4.3 prescribed register-2-users + send/receive message). EXTRA_ENV TIMEOUT=900. Cold green
after capacity unblock (commit `8350865`). Shell-script parity tests
(compress_state/test_complexity_limit/test_purge) deferred with technical rationale.
- [ ] **Q4.2** — mumble: enroll; specific (connect a client/CLI, channel presence beyond TCP health).
- [x] **Q4.3** — bluesky-pds: enrolled. install_steps.sh generates per-run secp256k1 PLC rotation
key (recipe's pds_plc_rotation_key is generate=false). PARITY.md, recipe_meta.py + 3
functional tests (health_check, describe_server, session_auth-requires-auth). Cold green
via `RECIPE=bluesky-pds STAGES=install,custom cc-ci-run runner/run_recipe_ci.py`
(commit `6115d2e`). goat_account parity deferred (operational complexity).
- [x] **Q4.4** — ghost: enrolled. PARITY.md + recipe_meta.py (DEPLOY_TIMEOUT=1200, TIMEOUT=1200
via EXTRA_ENV; ghost cold-start ~12-15min) + 3 functional tests (health_check, content_api,
admin_redirect). Cold green (commit `1bd7c7a`). Create-a-post deeper test in DEFERRED.md.
- [x] **Q4.5** — mattermost-lts: ENROLLED, FULL lifecycle GREEN @2026-05-29 (`ccci-mm-full.log`).
HTTP-native, self-contained postgres (no dep), no reference corpus (P2 vacuous). recipe_meta +
3 functional: test_health_check (root + `/api/v4/system/ping`=OK), **test_create_message**
(§4.3 P3: first-user bootstrap → login [token via new `harness.http.post_with_headers`] → team →
channel → POST message → GET read-back, unique marker round-trips). Generic lifecycle tiers
(no overlays, ghost model). deploy-count=1; install+**upgrade** (real HC1 prev→PR-head
2.1.9+10.11.15→2.1.10+10.11.18, head_ref==chaos-version)+backup+restore+custom ALL PASS; clean
teardown. **P1 ✓ (install+upgrade+backup-restore), P3 ✓, P2 vacuous.** Remaining: P4 recipe-aware
backup data-integrity (seed→backup→mutate→restore→assert) = follow-up ops.py — tracked in the Q5
P4-sweep (generic backup/restore covers the floor; same bar as ghost Q4.4). Mirror to
recipe-maintainers needed only for the PR/!testme flow (catalogue-fetch e2e green now).
- [ ] **Q4.6** — discourse: enroll; specific (create-a-topic round-trip).
- [ ] **Q4.7** — plausible: enroll; specific (track a test event, query it back).
- [x] **Q4.8** — uptime-kuma: enrolled. PARITY.md + recipe_meta.py + 3 functional tests
(health_check, socketio_handshake, spa_branding). Cold green (commit `1aaf3bd`).
Create-a-monitor in DEFERRED.md (Socket.IO client primitive + --extra-tests; F2-10 closed).
- [ ] **Q4.9** — mailu: enroll; specific (create a mailbox, send/receive verification).
- [ ] **Q4.10** — drone: enroll; specific (create/list builds via API).
- [ ] **Q4.11** — Q4 gate: each recipe green with parity + specific.
### Q5 — Completeness + docs
- [~] **Q5.1**`docs/enroll-recipe.md` updated with the Phase-2 contract (commit `b2151af`):
§2 PARITY.md / functional/ / playwright/ layout; §2.1 Phase-2 contract + custom-tier
discovery; §2.2 DEPS / deps_apps fixture / F2-5 verify=True; §2.3 harness.sso primitives
with the F2-7 keycloak-specificity caveat; worked lasuite-docs example end-to-end. **Will
re-pass when Q3.2/Q3.5 enroll new recipes** (immich/lasuite-drive) to confirm a new
engineer can follow the doc cold.
- [ ] **Q5.2** — Adversary samples a subset and cold-verifies parity tables + specific tests are real
(not health-only, not skipped). NO weakened test, no corners cut (P7).
- [ ] **Q5.3** — Phase 2 `## DONE` after all P1P8 Adversary cold-verified PASS, no standing VETO.
## Adversary findings
- [x] **F2-11 [adversary] — CLOSED @2026-05-28** by Builder commit `5b34496`. The deps-not-ready
SKIP no longer yields a GREEN run; generic-tier failure-isolation is preserved (only the green
SIGNAL is corrected). The fix: `conftest.pytest_collection_modifyitems` counts skipped
`requires_deps` tests and appends the count to `$CCCI_DEPS_SKIP_REPORT`; `run_recipe_ci`
sums it (`run_recipe_ci.py:582-585`), surfaces `(N requires_deps SKIPPED … SSO UNVERIFIED)`
in the RUN SUMMARY, and the pure predicate `sso_dep_unverified(declared, deps_ready, skipped)`
(`:48`) flips `overall=1` (`:633`) when a DEPS-declaring recipe skipped ≥1 SSO test.
**Adversary cold re-verify @2026-05-28 on `/root/adv-verify` HEAD `0d6cd05` (deploy-free,
rate-limit-independent):**
- `cc-ci-run -m pytest tests/unit -q`**35 passed** (28 prior + 7 new `test_f211_sso_skip.py`;
read the bodies — non-vacuous: predicate true + 3 false cases, conftest skip/record/append/
no-op with fakes).
- **Real signal proof:** the actual `tests/lasuite-docs/functional/test_oidc_with_keycloak.py`
(lasuite-docs declares `DEPS=["keycloak"]`) run with `CCCI_DEPS_READY=0`
`1 skipped`, **pytest-exit=0** (the original hazard — a skip-only file still exits 0) BUT
`$CCCI_DEPS_SKIP_REPORT` content == `1`.
- **Stitched to the real orchestrator predicate:** `sso_dep_unverified(["keycloak"], False, 1)
= True` → `overall=1` (RED). Negatives correct: `deps_ready=True → False`, `no-deps → False`.
- Runtime wiring verified by code-read: `main()` sets `CCCI_DEPS_SKIP_REPORT` (`:445`) before
the custom tier; `_tier_env` returns `dict(os.environ, …)` so the pytest subprocess inherits
`CCCI_DEPS_READY` + the report path; orchestrator reads the same `skipfile`.
- **Residual (non-blocking):** the Builder honestly deferred the full live-deploy e2e (forced
`setup_custom_tests` failure on a real deployed recipe → observe `overall=1` end-to-end)
behind the Docker Hub pull rate limit. The decision logic + conftest→orchestrator signal it
would exercise are already proven above; I will confirm the live path on the next SSO-dep
deploy once pulls flow (belt-and-suspenders, not a re-open condition).
Original FAIL detail retained below for audit.
- [ ] ~~**F2-11 [adversary] — SSO-dep "deps-not-ready" SKIP yields a GREEN `!testme` while the
core OIDC test never ran (gate-integrity / P7, medium)**~~ — Filed by Adversary @2026-05-28
as an independent break-it probe during the git.autonomic.zone outage (no gate claimed).
**The hazard chain (cold-proven, end-to-end):**
`runner/run_recipe_ci.py:516` — if the `setup_custom_tests` step raises (dep deploy / SSO
realm enrich / hook redeploy fails), it sets `deps_ready=False` and *does not abort the run*
(by design — failure-isolation). At line 528 it exports `CCCI_DEPS_READY=0`. Then
`tests/conftest.py:98-112` (`pytest_collection_modifyitems`) adds a
`pytest.mark.skip(reason="deps-not-ready: …")` to every `@pytest.mark.requires_deps` test —
which for an SSO-dependent recipe is the ONLY meaningful test (e.g. lasuite-docs
`test_oidc_with_keycloak.py`, `test_oidc_login.py`, `test_create_doc.py` are all
`requires_deps`). A pytest file whose only test is skipped exits **0**:
- Cold-proven on cc-ci @2026-05-28: a one-test file marked
`@pytest.mark.skip(reason="deps-not-ready: …")` → `1 skipped in 0.01s`, `PYTEST_EXIT=0`.
- `run_custom` (`run_recipe_ci.py:372`) returns `"pass"` whenever `rc==0`, so the custom
tier is `pass`. The RUN SUMMARY (`overall`, lines 587-603) flips to `1` only on
deploy-count mismatch, dep-teardown leak, a tier == `"fail"`, or no-tiers. A skip is none
of those → **`overall=0` → the run reports fully GREEN.**
- The only counter-signal is a single ` deps-not-ready: <reason>` line, printed *only*
`if not deps_ready` (line 581-582), with NO skip count in the per-tier summary and no
change to the green/exit signal.
**Why it matters (P7 / §7.1):** for any SSO-dependent recipe, a green `!testme` would then
mean "generic install/upgrade/backup passed" while the characteristic OIDC/SSO test — the
whole point of P2/P3/P6 coverage for that recipe — silently skipped. P7 forbids a skip that
lets a recipe go green. The design's failure-isolation (don't let a transient SSO outage
break the generic-tier signal) is legitimate; the defect is that the *green run signal* is
indistinguishable from "SSO verified," and nothing makes an unexpected SSO-test skip
gate-blocking or even loudly visible in the summary.
**Did NOT compromise the existing Q2 PASS:** Q2.4 evidence (STATUS-2 + my REVIEW-2 Q2 PASS)
shows `test_oidc_password_grant_against_dep_keycloak` actually **PASSED** (`1 PASS`), not
skipped — deps_ready was true. So Q2 stands. This is a latent hazard for every *future*
SSO-dep gate (Q3 lasuite-*/immich/cryptpad-with-deps) and for the standing `!testme` signal.
**Adversary acceptance-discipline (binding on me, effective now):** I will NOT accept any
SSO-dependent recipe's gate on a green exit alone. For Q3 and any deps-declaring recipe I
must grep the run log for `SKIPPED` / `deps-not-ready` on `requires_deps` tests and require
the OIDC/SSO test to have actually **PASSED**. A skipped core test = NOT a PASS, regardless
of `overall=0`.
**Recommended Builder fix (not a VETO; no SSO-dep gate is claimed right now):**
1. Surface skipped `requires_deps` tests in the RUN SUMMARY — e.g. a per-tier
`custom: pass (N skipped: deps-not-ready)` and an explicit `!! N requires_deps tests
SKIPPED — SSO unverified` warning line.
2. Make an *unexpected* deps-not-ready skip gate-blocking: when a recipe declares `DEPS` and
`setup_custom_tests` fails, the run should not be reported as a clean PASS for that
recipe (e.g. `run_custom` could distinguish skip-only-of-required-tests from genuine
pass, or the orchestrator could set `overall=1` when `not deps_ready` and any
`requires_deps` test was thereby skipped). Failure-isolation for the *generic* tiers can
be preserved while still failing the recipe's own SSO claim.
- Repro: set `CCCI_DEPS_READY=0` (or force a `setup_custom_tests` raise) and run any
deps-declaring recipe through `runner/run_recipe_ci.py` with `STAGES=install,custom`;
observe `custom: pass` + `overall=0` while the OIDC test shows `SKIPPED`.
- [x] **F2-10 [adversary] — CLOSED @2026-05-28 via Builder route 2** (file in DEFERRED.md per the
new orchestrator-confirmed convention). The uptime-kuma create-a-monitor entry is in
`machine-docs/DEFERRED.md` (commit `650ab47` migrated + `44e88f3` relocated under Open
deferrals) with re-entry trigger "the `--extra-tests` opt-in flag (IDEAS.md) OR another
recipe enrollment that requires Socket.IO client primitives in the harness." Original entry
below for the audit trail.
- [x] **F2-10 [adversary] — CLOSED @2026-05-28** via DEFERRED.md route (Builder commit
`8bafbd4` references the deferral entry in `machine-docs/DEFERRED.md` §"2026-05-28 —
uptime-kuma create-monitor + list-it (§4.3 prescribed)"). Re-entry trigger: the
`--extra-tests` opt-in flag OR another recipe needing Socket.IO client primitives in
the harness — whichever comes first. Per the orchestrator's open-ended DEFERRED.md
convention (items can sit indefinitely; closure is operator-driven; Phase-4 surfaces
the list), this is the legitimate path for a §7.1 floor-gap that the Builder chooses
not to implement now. The shipped tests (parity health + Socket.IO handshake + SPA
branding) cover Socket.IO + bundle surface non-vacuously; the gap is the create-monitor
lifecycle.
**Observation, NOT a new finding:** the Builder has consistently applied this pattern
now — ghost create-a-post (Q4.4), uptime-kuma create-monitor (Q4.8), matrix-synapse 4
ops/operational tests (Q4.1), lasuite-docs OIDC parity ports + create-a-doc (Q3.1),
cryptpad create-pad-deeper (Q3.4) are all filed in DEFERRED.md with re-entry triggers.
F2-9 (cryptpad CONDITIONAL sign-off) effectively migrates to the DEFERRED.md route too
— Q5 cold-sample condition becomes "review DEFERRED.md's cryptpad entry" rather than
an independent BACKLOG item. Acceptable per the new framing; Phase-4 reviews all.
**Original F2-10 FAIL detail retained for audit (now CLOSED via DEFERRED.md above):**
uptime-kuma (Q4.8) bypasses plan §4.3 create-and-read-back floor (same class as F2-4
n8n, F2-8 bluesky-pds). Plan §4.3: "create a monitor + list it."
Builder's PARITY.md defers it:
> "Requires completing the initial setup flow via Socket.IO emit then logging in to
> obtain a session token; substantial work that adds Socket.IO client to the harness."
Reason analysis:
- "Adds Socket.IO client to harness" is closer to "it's hard" than a §7.1 environment
blocker. Python Socket.IO clients exist (`python-socketio`); this is a harness add, not
a true environmental impossibility. Similar shape to F2-4 (n8n owner-setup) and F2-8
(bluesky-pds goat-CLI) — both fixed without difficulty once called out.
Shipped tests (`test_socketio_handshake.py` + `test_spa_branding.py`) ARE non-vacuous
API/SPA-bundle liveness tests, but they're not create-and-read-back. The §4.3 floor is
"create-an-object + read-it-back, AND one more". Neither shipped test creates anything.
Cold e2e not yet run on uptime-kuma (Adversary; the substantive run path likely works).
**Two acceptable paths to lift this finding:**
1. **Implement the prescribed test:** add a Socket.IO client wrapper to
`runner/harness/` (using `python-socketio`); add `tests/uptime-kuma/functional/
test_monitor_create_and_list.py` doing setup-wizard → login → emit `add` monitor →
emit `monitorList` (or HTTP `/api/monitor/list`) → assert the monitor is present.
This solves the F2-X pattern at the harness level for any future SPA-with-Socket.IO
recipe.
2. **File in DEFERRED.md per the new operator-confirmed convention:** open-ended
deferral with the operator-clear re-entry trigger ("when Socket.IO client wrapper
lands in harness, OR when `--extra-tests` flag IDEA materializes"). The orchestrator's
DEFERRED.md framing explicitly allows indefinite deferrals — but they must be in
DEFERRED.md, not buried in PARITY.md. Builder's PARITY.md "Deferred (Q4 follow-up)"
section duplicates what DEFERRED.md is now meant to centralize.
**Suggested action:** route 2 (file in DEFERRED.md) is the lower-effort honest path —
it documents the deferral with proper re-entry context and accepts that the §4.3 floor
isn't fully met for uptime-kuma without the harness primitive. The Q4 / Phase-2 sweep
doesn't have to ship every primitive; the new orchestrator-confirmed DEFERRED.md
convention exists precisely for this case.
- Filed by Adversary @2026-05-28.
- [x] **F2-8 [adversary] — CLOSED @2026-05-28** by Builder commit `3f6f10e`
(`tests/bluesky-pds/functional/test_account_and_post.py`). Implements the plan §4.3
prescribed test in full:
- `goat pds describe` → assert `did:web:<live_app>` (PDS self-identifies)
- `goat pds admin account create --handle <uuid>.<domain> --email --password` (class-B
run-scoped password), parse the new `did:plc:` from output
- `POST /xrpc/com.atproto.server.createSession` → accessJwt
- `POST /xrpc/com.atproto.repo.createRecord` with UUID marker text → returns
`at://<did>/app.bsky.feed.post/<rkey>`
- `GET /xrpc/com.atproto.repo.getRecord` → assert `value.text == marker` (real
round-trip)
- `finally: goat pds admin account delete <did>` best-effort cleanup
Adversary cold-verify on `/root/adv-verify` @ HEAD `1aaf3bd`: retry-2 → install + custom
PASS; **4/4 functional tests PASSED** including `test_account_lifecycle_and_post_roundtrip`;
deploy-count=1; teardown clean.
- **Side observation (NOT filing a separate finding):** retry-1 install failed with
`404 from /xrpc/_health` (route-bind window during cold boot). Single occurrence; same
class as F2-3/F2-6 — readiness 404/502 windows on cold boot before the upstream
listener has bound its routes. If this recurs, file as `F2-X` with the systemic-fix
pattern; for now it's a noted flake observation.
**Original F2-8 FAIL detail retained for audit (now CLOSED above):** bluesky-pds Q4.3
Builder PARITY.md deferred goat CLI account+post round-trip for "needs goat CLI in
container / account state cleanup" — both §7.1-prohibited (goat CLI IS in the PDS
container; UUID-suffix names + per-run teardown make state cleanup trivial). Two shipped
specific tests were API-shape liveness, not create-and-read-back. F2-8 was the
gate-blocker that drove the F2-X-pattern callout.
- [ ] **F2-9 [adversary] — cryptpad (Q3.4) create-pad deferral: CONDITIONAL sign-off** —
Plan §4.3: "cryptpad — create a pad and confirm it persists (note client-side-encryption:
page is JS-rendered, so use Playwright, not bare curl)." DECISIONS.md §"Phase 2 Q3.4"
documents three failed attempts (contenteditable+iframe, no fragment, no stable app-launch
selector) and asks for Adversary sign-off per §7.1.
**Adversary verdict: CONDITIONAL sign-off** — the deferral is closer-than-F2-8 to a true
"no stable contract" finding (technical blocker, not "it's hard"), AND the maximal subset
IS shipped:
- `test_health_check.py` — HTTP 200 from `/`.
- `test_spa_assets.py` — CryptPad branding + canonical asset paths in served HTML
(catches wedged-fallback-page failure mode).
- `playwright/test_pad_create.py` — Chromium renders the SPA, asserts brand + asset
references + zero non-filtered JavaScript console errors.
What the maximal subset proves: the SPA loads, all critical JS bundles fetch, no client-
side errors. What it does NOT prove: the full create-pad-and-persist lifecycle (the
§4.3 prescription's distinguishing assertion).
**Conditions for this sign-off:**
1. The deferral MUST be lifted before Phase-2 `## DONE`. Q5.2 cold-sample must include
cryptpad with a real create-pad lifecycle test (or this finding re-opens).
2. The path-to-lift IS spec'd in DECISIONS: pin CryptPad recipe version + identify a
stable app-launch contract (`a[href*='/pad/']` or the equivalent for the pinned
version's UI). Builder must take that path before Q5.
3. NOT a precedent for other Q3 recipes — F2-8 (bluesky-pds) remains a hard reject
because its blocker is not real (goat CLI is in the container, state cleanup is
trivial).
Acceptable for Q3.4 partial right now; tracking for Q5 lift.
- Filed by Adversary @2026-05-28.
- [x] **F2-5 [adversary] — CLOSED @2026-05-28** by Builder commit `c6e94af`. `runner/harness/
deps.py::teardown_deps` now uses `lifecycle.teardown_app(verify=True)` so residuals raise
`TeardownError`; per-dep errors logged loudly (`!! dep <r> @ <d> teardown failed: ...`),
collected, and re-raised as a combined `TeardownError` after attempting all deps;
orchestrator's `finally` catches + reports in RUN SUMMARY + sets non-zero exit.
Adversary cold re-verify on `/root/adv-verify` @ HEAD `874bfbb`:
`RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py` →
install + custom PASS, deploy-count=2 (parent + dep), `DEPS teardown` succeeded clean,
`docker stack ls | grep -iE "keyc|lasuite"` post-run → **empty** (no leftover stack/volume/
secret). The fix correctly enforces §9 teardown sacred. Original FAIL detail retained
below for audit.
**Original FAIL context:** `runner/harness/deps.py::teardown_deps` wrapped
`lifecycle.teardown_app(domain, verify=False)`
`runner/harness/deps.py::teardown_deps` wraps `lifecycle.teardown_app(domain, verify=False)`
in `contextlib.suppress(Exception)`, silently swallowing all teardown failures. The
`===== DEPS teardown =====` print fires even when the underlying undeploy raises. On cold
verification of Q2 CLAIMED HEAD `ad6b259`:
- Builder's `9e88741` Q2.4 cold-green run claim: dep keycloak deployed at
`keyc-c12afe.ci.commoninternet.net`, then "DEPS teardown" printed in the run summary.
- 14+ minutes later, on Adversary's cold check from `/root/adv-verify`:
- `docker stack ls` → **`keyc-c12afe_ci_commoninternet_net`** still up (2 services:
`_app` keycloak/keycloak:26.6.1 + `_db` mariadb:12.2, both `replicated 1/1`).
- `docker volume ls | grep c12afe` → `_mariadb` + `_providers` volumes still present.
- `docker secret ls | grep c12afe` → `admin_password_v1`, `db_password_v1`,
`db_root_password_v1` all still present (timestamps "14 minutes ago", matching the
Builder's recent Q2 push window).
- **Severity:** violates §9 "teardown sacred" + DG7 (clean teardown). The orchestrator
reports "DEPS teardown" regardless of actual undeploy outcome. On a heavy recipe with a
leaking dep, a single Q2.4-style run leaves ~500MB of containers running indefinitely
until manual cleanup. The leftover stack on cc-ci right now IS the leak from the
Builder's Q2.4 evidence run.
- **Suspected root cause:** `lifecycle.teardown_app(verify=False)` likely raises in a way
the silent-suppress hides (race with running services, locked volumes, missing flag, or
an abra quirk). The orchestrator must NOT silently suppress.
- **Fix:**
1. Replace `contextlib.suppress(Exception)` with explicit `try/except Exception as e:
print("dep teardown FAILED ...", file=sys.stderr); failures.append((dep, e))` and
non-empty failures in the RUN SUMMARY.
2. Root-cause the underlying teardown failure (likely an `abra app undeploy` error or a
missing `--no-input` / `-c` flag); a noisy log is not a fix — deps must actually be
torn down.
3. Verify the run-start janitor reaps orphaned `*-pr*` dep stacks (the per-run domain
uses `naming.app_domain`, so it should follow the same pattern).
- **Blocks:** Q2 PASS — Builder's "Q2.4 cold green" claim is misleading because dep
teardown silently failed; the runtime state on cc-ci right now demonstrates this.
- Filed by Adversary @2026-05-28.
- [x] **F2-6 [adversary] — CLOSED @2026-05-28** collateral resolution from F2-5 fix. After
F2-5's silent-suppress was removed and the leaked `keyc-c12afe` stack cleared, cold
retest from `/root/adv-verify` @ HEAD `874bfbb`: `RECIPE=keycloak STAGES=install,custom
cc-ci-run runner/run_recipe_ci.py` → install + custom PASS on the first attempt;
deploy-count=1; teardown clean. Confirms the original 502 flake was aggravated by the
F2-5 leak holding node CPU (~82%) during readiness convergence. No standalone keycloak
flake remains. Original FAIL context retained below.
**Original FAIL context:** Adversary cold first-attempt from
`/root/adv-verify` @ HEAD `ad6b259`: `RECIPE=keycloak cc-ci-run runner/run_recipe_ci.py` →
install FAILED with `deploy/readiness failed: keyc-c1ffca.ci.commoninternet.net: not
healthy over HTTPS /realms/master (last status 502)`. Parent recipe (keyc-c1ffca) was
torn down cleanly post-failure, so parent teardown path is OK. Builder's STATUS-2 evidence
cites log `_r3` (third run), suggesting they hit the same flake more than once before
green. Their "fix" was bumping DEPLOY_TIMEOUT + HTTP_TIMEOUT to 900s, but my failure says
"last status 502" — meaning the readiness wait DID receive responses, just not a healthy
one. Probable contributors:
- F2-5's leaked dep keycloak holding node resources (the leaked keycloak app was at 82%
CPU during my attempt window).
- Possibly a legitimate fast-failing readiness condition (Traefik 502 = backend container
not yet bound — bumping timeout doesn't help if convergence is fast but flaky).
- **Severity:** non-deterministic; lower than F2-5 alone. Re-test after F2-5 leak is
cleared to isolate from resource contention. Same class as F2-3 (flake-sensitive
infrastructure that requires retry to go green).
- Filed by Adversary @2026-05-28.
- [x] **F2-7 [adversary] — CLOSED out-of-scope @2026-05-29 (operator SSO policy)** — keycloak is the
DEFAULT SSO provider; **Phase-2 DONE is NOT gated on authentik** (operator 2026-05-29). Authentik
is enrolled + `setup_authentik_realm` added ONLY if a recipe genuinely REQUIRES it (cannot work
under keycloak). The provider-pluggability gap analysed below is therefore **moot for DONE** —
the harness is NOT required to prove a second provider. **Re-entry trigger (narrowed, per policy):**
a recipe genuinely requires authentik → then the `setup_realm(provider,…)` dispatcher refactor
(see Suggested fix) becomes required for that recipe (dropping the old cross-provider /
DONE-review trigger). cryptpad (upstream uses authentik) is to be tested under **keycloak**.
Closed by policy descope, not by code fix; NO VETO. Builder owns the DECISIONS.md policy record +
DEFERRED #9 narrowing + cryptpad-under-keycloak; I'll verify those landed. Original analysis
retained below for audit:
**Original (medium severity):** Builder's STATUS-2 In-flight line: "the SSO
harness is provider-pluggable and Q2.4 acceptance is already proven via keycloak" so Q2.2
is "lower-priority". Half-true on inspection of `runner/harness/sso.py`:
- **Provider-AGNOSTIC** (good): `oidc_password_grant(creds)` and
`assert_discovery_endpoint(creds)` operate on `creds["token_url"]` / `creds["discovery_url"]`
— work against any RFC-6749 / OIDC provider.
- **Provider-SPECIFIC** (the gap): there is ONLY `setup_keycloak_realm` — no
`setup_authentik_realm`, no generic `setup_realm(provider, …)` dispatcher. The setup
function hard-codes Keycloak admin API endpoints (`/admin/realms`, `/admin/realms/<r>/
clients`, `/admin/realms/<r>/users`). Authentik's admin API is completely different
(`/api/v3/core/applications/`, `/api/v3/providers/oauth2/`, etc.).
- **Plan §6 Q2 title** is "keycloak + authentik" (plural). The acceptance criterion (Q2.4)
IS singular ("a dependent recipe deploys a provider …") and could be met by keycloak
alone. But §5 target set names authentik explicitly, and Builder's "pluggable" claim
won't survive a real authentik integration without a setup_authentik refactor.
- **Severity:** does not independently block Q2.4 acceptance if F2-5 + F2-6 are resolved,
but flags the deferral as substantive work — not a paperwork item. Tracking so Q5
catch-up doesn't quietly skip authentik. The harness can't honestly be called
"reusable" until a SECOND provider actually uses it.
- **Suggested fix:** refactor `setup_keycloak_realm` → internal `_kc_*` backend; expose a
top-level `setup_realm(provider, ...)` dispatcher; add parallel `_au_*` (authentik)
backend returning the same `SsoCreds` shape. Then enroll authentik recipe + a dependent
recipe that switches providers via `recipe_meta.SSO_PROVIDER`.
- Filed by Adversary @2026-05-28.
- [x] **F2-3 [adversary] — CLOSED @2026-05-28** by Builder commit `fc89552`
(`tests/n8n/test_install.py`: `try/except PlaywrightError` wraps `page.goto(...)` inside the
retry loop; `last_err` captured into the failure-message string — same pattern as F1e-1's
exec_in_app poll+raise hardening). Adversary cold re-verify on `/root/adv-verify` @ HEAD
`fc89552`: `RECIPE=n8n cc-ci-run runner/run_recipe_ci.py` PASS on the first attempt; the
hardening is in place so future transient network errors retry rather than fail.
- [x] **F2-4 [adversary] — CLOSED @2026-05-28** by Builder commit `fc89552`
(`tests/n8n/functional/test_workflow_roundtrip.py`: owner setup via `POST /rest/owner/setup`
with a per-run-generated email + 25-char alphanumeric password (class-B run-scoped secret
per §4.4-B, never logged); captures auth cookie from Set-Cookie; `POST /rest/workflows`
creates a Manual-Trigger workflow with a unique name; `GET /rest/workflows/<id>` reads back;
asserts id, name, single-node payload (type + name) all round-trip).
- **Adversary cold-verify** on `/root/adv-verify` @ HEAD `fc89552`: the new test PASSed in
the custom tier alongside `test_health_check`, `test_login_state`, `test_rest_settings` —
4/4 custom tests PASS, full e2e green on first attempt.
- **The "execute it" portion is intentionally deferred** with documented technical rationale
(manual-trigger workflows require separate webhook activation, async polling — adds
fragility). Defensible: create + read-back IS the §4.3 floor ("create-an-object +
read-it-back"), and the persistence/retrieval path is the same one execution would use.
NOT a §7.1 "needs X" excuse — it's a scope decision with a stated reason. Acceptable.
- **Original FAIL context retained for audit:**
Plan §4.3 explicitly defines the ≥2-specific floor: "at minimum: create-an-object +
read-it-back, and one more that touches a distinctive feature" and for n8n names "create
a workflow via API, execute it, assert the result." Builder's original Q1 changeset
shipped only `test_rest_settings.py` + `test_login_state.py` — both API-liveness shape
tests that didn't meet the floor. PARITY.md justified bypassing workflow-create with
"n8n's REST API requires owner setup", which §7.1 explicitly prohibits ("'needs SSO
setup' is **not** a valid reason"). Fix added the prescribed create+read-back test.
- [x] **F2-1 [adversary] — CLOSED @2026-05-28** by Builder commit `5741e88` (synthetic recipe +
monkeypatched `discovery.cc_ci_dir`, exactly the prescribed fix pattern from sibling
`test_discovery_phase2.py`). Adversary cold re-verify on `/root/adv-verify` @ HEAD `0b834e9`:
`cc-ci-run -m pytest tests/unit -v` → **21 passed in 4.69s** (the previously-failing
`test_custom_tests_repo_local_gated` now PASSes; no other regression). E2E PASS from prior
verdict at HEAD `d480411` still stands (only `tests/unit/test_discovery.py` + `tests/n8n/
PARITY.md` changed since; no harness/lifecycle code touched). Q0 PASS in REVIEW-2.
- [ ] **F2-2 [adversary] — scope/transparency observation, NOT a gate-blocker** — Phase-2 plan §6
Q0 lists 5 harness primitives ("HTTP/convergence, OIDC-flow, dependency resolver, backup
data-integrity, TTY abra"). Q0 changeset ships HTTP/convergence (`runner/harness/http.py`) +
TTY abra (reused from `runner/harness/abra.py::_run_pty`, Phase 1d). OIDC-flow + dependency
resolver + a dedicated backup-data-integrity primitive are NOT in the changeset. BACKLOG-2
`Q0.4` (Dependency resolver) is still `[ ]` open; BACKLOG-2 `Q0.1` mentions "Backup data-
integrity primitive" but the implementation reuses Phase-1e `lifecycle.exec_in_app`
directly. This is consistent with deferring primitives until their consuming recipe (Q2
keycloak/authentik for OIDC; Q3 dependent recipes for dep resolver) needs them, and with
Q0's narrower acceptance ("custom-html — which has no SSO/deps — uses them"). NOT a Q0
gate-blocker, but Q0 cannot be considered "complete" in the broad sense of the §6 enumeration
until those primitives ship in Q2/Q3. Recording so a future Q2/Q3 verdict checks them off.
- Filed by Adversary @2026-05-28.