# BACKLOG — Phase 2 (per-recipe test authoring)

Phase-namespaced backlog. Builder edits `## Build backlog`; Adversary edits `## Adversary findings`.
Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`

## Build backlog

### Q0 — Harness additions
- [x] **Q0.1** — `runner/harness/http.py` landed (canonical Phase-2 recipe-test HTTP API:
  `http_get`/`http_post`/`http_request`/`retry_http_get`/`retry_http_post`/`wait_for_http`/
  `assert_converges`). TTY abra wrapper already present (`runner/harness/abra.py::_run_pty`)
  from Phase 1d. 11 unit tests landed.
- [x] **Q0.2** — `discovery.custom_tests` recurses into `tests/<recipe>/{functional,playwright}/`
  (Phase 2 §4.1 layout); 2 unit tests landed.
- [x] **Q0.3** — `tests/custom-html/PARITY.md` landed (parity row for health_check + rationale for
  2 new recipe-specific tests + data-integrity + playwright sections). Parity port:
  `tests/custom-html/functional/test_health_check.py` (SOURCE comment present).
- [ ] **Q0.4** — Dependency resolver harness primitive (read `tests/<recipe>/recipe.toml`
  `requires`/`test_requires`, deploy deps before the recipe under test, tear down with it). Mind
  `MAX_TESTS`/node budget; sequence heavy ones. **Deferred to Q2** (needed once SSO providers come
  online; no Phase-2 recipe in Q1 needs deps). Tracked in BACKLOG.
- [x] **Q0.5** — **RE-CLAIMED @2026-05-28** (commit `5741e88` adds F2-1 fix to original Q0).
  Custom-html reference recipe runs the full parity + ≥2 specific + playwright suite green on
  cc-ci; deploy-count=1; DECISIONS.md Phase-2 section in place. F2-1 closed by Builder; 21/21
  unit tests PASS cold. Awaiting Adversary cold re-verify.
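
Q0.1 names `wait_for_http` and `assert_converges` without showing their signatures. As a rough illustration of the polling idea only, here is a minimal sketch of a convergence helper. The signature, the defaults, and the injectable `clock`/`sleep` parameters are assumptions made for this example and are not taken from `runner/harness/http.py`.

```python
import time
from typing import Callable


def assert_converges(
    check: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `check` until it returns True; raise AssertionError once `timeout` elapses.

    Hypothetical signature for illustration, not the real harness API.
    Returns the number of attempts made, which is useful in run logs.
    """
    deadline = clock() + timeout
    attempts = 0
    last_error = None
    while True:
        attempts += 1
        try:
            if check():
                return attempts
            last_error = None
        except Exception as exc:  # a raising probe counts as "not yet converged"
            last_error = exc
        if clock() >= deadline:
            detail = f" (last error: {last_error!r})" if last_error else ""
            raise AssertionError(f"did not converge after {attempts} attempts{detail}")
        sleep(interval)
```

A readiness probe such as "GET / returns 200" would be passed in as `check`; injecting `clock` and `sleep` keeps unit tests of the helper fast and deterministic.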

### Q1 — Pattern proof (custom-html + n8n)
- [x] **Q1.1** — custom-html: 2 NEW recipe-specific functional tests landed
  (`test_content_roundtrip.py` + `test_content_type_header.py`); already cold-verified in Q0 PASS.
- [x] **Q1.2** — n8n enrolled under cc-ci. Parity port `tests/n8n/functional/test_health_check.py` +
  **3 recipe-specific functional tests**: `test_workflow_roundtrip.py` (the plan §4.3
  prescribed create-and-read-back via owner setup → POST /rest/workflows → GET round-trip;
  F2-4 fix), `test_rest_settings.py` (REST bootstrap surface), `test_login_state.py` (auth
  subsystem). Install overlay's Playwright now wraps `page.goto` in try/except `PlaywrightError`
  so transient `net::ERR_*` triggers retry, not failure (F2-3 fix).
- [x] **Q1.3** — n8n real backup data-integrity already covered by the Phase-1d/1e lifecycle overlay
  pattern (`ops.pre_backup` seeds "original" in /home/node/.n8n; `pre_restore` mutates; restore
  must return "original" — passed in the Q1.2 e2e run).
- [x] **Q1.4** — **RE-CLAIMED @2026-05-28** (commit `fc89552` F2-3+F2-4 on top of `2f3d5aa`). Both
  recipes green via the run path; both PARITY.md complete; Adversary findings F2-3 + F2-4 closed
  by Builder. Awaiting Adversary cold re-verify.
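
The F2-3 fix in Q1.2 (retry on a transient navigation error instead of failing) follows a common pattern. The sketch below shows that pattern in isolation; `retry_transient` and its parameters are hypothetical names for this example, not the code in the install overlay.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_transient(
    action: Callable[[], T],
    transient: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `action`, retrying only when it raises one of the `transient` exception types."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except transient:
            if attempt == attempts:
                raise  # out of retries: surface the real failure
            sleep(delay)
    raise AssertionError("unreachable")
```

With Playwright's sync API this would presumably be called as `retry_transient(lambda: page.goto(url), (PlaywrightError,))`, assuming `PlaywrightError` is an alias for `playwright.sync_api.Error`. Any other exception type propagates on the first attempt, so genuine test failures are not masked.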

### Q2 — SSO providers (keycloak + authentik)
- [x] **Q2.1** — keycloak: parity-port `test_health_check.py` + 2 NEW recipe-specific functional
  tests. Bumped timeouts to 900s. Full e2e green (commit `d5f5e86`).
- [ ] **Q2.2** — authentik: **deferred (lower priority).** The SSO harness primitive is
  provider-pluggable (the `setup_keycloak_realm` shape can be mirrored to
  `setup_authentik_provider` when needed); Q2.4 acceptance is already proven via keycloak. Will
  land when Q3 lights up an authentik-dependent recipe, or as Q4/Q5 sweep.
- [x] **Q2.3** — Dep resolver (`runner/harness/deps.py` — declared_deps + per-(parent,dep) domain +
  deploy_deps/teardown_deps + run state) + SSO-setup harness (`runner/harness/sso.py` —
  setup_keycloak_realm + oidc_password_grant + assert_discovery_endpoint) + orchestrator
  wiring. 7 new unit tests; 28/28 PASS. **Subsumes Q0.4.** Commit `4d6b040`.
- [x] **Q2.4** — **RE-CLAIMED @2026-05-28** (commit `c6e94af` F2-5 fix on top of `9e88741`).
  `tests/lasuite-docs/recipe_meta.py DEPS = ["keycloak"]`; `test_oidc_with_keycloak.py`
  proves the full SSO flow. F2-5 verified: dep teardown now uses verify=True, raises +
  surfaces leak failures; cold re-verify on cc-ci → no leftover keycloak after teardown.
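
Q2.3 lists an `oidc_password_grant` primitive but does not show it. For orientation, the sketch below builds the kind of token request a password grant against a Keycloak realm involves. The function name and parameters are assumptions for this example; only the token endpoint path and the form fields follow Keycloak's standard OpenID Connect endpoint.

```python
import urllib.parse
import urllib.request


def build_password_grant_request(
    base_url: str,
    realm: str,
    client_id: str,
    client_secret: str,
    username: str,
    password: str,
) -> urllib.request.Request:
    """Build (but do not send) an OAuth2 password-grant request for a Keycloak realm."""
    token_url = (
        f"{base_url.rstrip('/')}/realms/{urllib.parse.quote(realm)}"
        "/protocol/openid-connect/token"
    )
    form = urllib.parse.urlencode(
        {
            "grant_type": "password",
            "client_id": client_id,
            "client_secret": client_secret,
            "username": username,
            "password": password,
            "scope": "openid",
        }
    ).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Sending the request with `urllib.request.urlopen` and reading `access_token` from the JSON response would complete the grant. Building the request separately keeps the URL and form encoding testable without a live keycloak.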

### Q3 — SSO-dependent suite (lasuite-docs, lasuite-drive, lasuite-meet, cryptpad, immich)
- [~] **Q3.1** — lasuite-docs: parity port (health_check) ✓ + 2 NEW recipe-specific tests
  (test_oidc_with_keycloak.py — Q2.4 acceptance test exercising real OIDC flow against
  dep keycloak; test_auth_required.py — protected backend API requires auth). Open
  follow-up: oidc_login.py + upload_conversion.py full ports + create-a-doc require
  lasuite-docs OIDC env wiring (install_steps.sh wires dep keycloak's client_secret +
  OIDC env into lasuite-docs's .env at install time). Documented in
  tests/lasuite-docs/PARITY.md.
- [~] **Q3.2** — lasuite-drive: enrolled (mirrored). Maximal testable subset GREEN @2026-05-29
  (`/root/ccci-drive-subset.log`): install (generic+cc-ci test_serving_and_frontend) + backup
  (P4 test_backup_captures_state) + restore (P4 test_restore_returns_state) + custom — all 3
  functional PASS: test_health_check (parity), test_minio_storage (real S3 upload→list→download→
  assert-bytes round-trip), test_oidc_with_keycloak (password-grant JWT vs warm keycloak,
  per-run realm, clean teardown). deploy-count=1, deps=['keycloak'] (warm-reused). **Upgrade
  tier: disk-blocker RESOLVED @2026-05-29 (cc-ci grew to 64G/44G-free) — the upgrade tier is now
  REQUIRED green (no longer deferrable, per Adversary + operator) and runs as part of the Q3.2a
  rework. It stays a veto-eligible OPEN obligation until run green (incl. real prev→PR-head office
  crossover) + Adversary cold-verified.** Bug fixed en route: `fix(2)`
  `f1c626c` — setup_custom_tests `docker service scale --detach` (the run-once minio-createbuckets
  job made a blocking scale hang the custom tier). **NOT CLAIMED — OIDC setup is FLAKY:** the
  step-3 in-place full-stack `abra app deploy --force --chaos` (applies OIDC env) only converges
  sometimes on this heaviest 12-service stack (run 1 OK → OIDC PASS; run 4 FAIL → OIDC SKIP → F2-11
  RED). Test assertions are all correct (run 1 proved health+MinIO+OIDC green); the flakiness is in
  the redeploy infra. **Two open issues block a reliable Q3.2 green:** (a) [Q3.2a] flaky OIDC
  redeploy — see below; (b) the upgrade tier, disk-unblocked per the note above but not yet run
  green. See JOURNAL-2 2026-05-29.
- [ ] **Q3.2a** — Make lasuite-drive OIDC wiring reliable. **PLAN:**
  `cc-ci-plan/plan-lasuite-drive-oidc-robustness.md` (orchestrator, 2026-05-29). The full
  12-service `--chaos` redeploy to apply OIDC env exposes collabora's flaky reconverge (+ transient
  backend gunicorn-perms / WOPI-404). Structured as: **Step 0** capture real failure logs first;
  **Part A** (cc-ci harness) — create the per-run realm/client in the live-WARM keycloak + set OIDC
  env in `.env` BEFORE a single `abra app deploy` (deploy ONCE, NO mid-run `--chaos` reconverge);
  REAL abra commands only (no `docker service update/scale` patching); verify full suite green **3×
  in a row**. **Part B** — lasuite-drive RECIPE PR (collabora WOPI healthcheck-gating + backend
  retry; gunicorn-perms entrypoint fix; lazy/retrying OIDC discovery); "working" ONLY once cc-ci
  runs the full suite (incl. upgrade tier, now disk-unblocked) on the PR repeatedly-green +
  Adversary cold-verified → operator merges. Q3.2 claimed + this item closed only after A+B green.
- [ ] **Q3.3** — lasuite-meet: parity (health_check, oidc_login, meeting_flow, webrtc-media,
  webrtc-relay) + specific (create-a-room, two-user LiveKit token issuance, ICE-candidate gathering).
- [~] **Q3.4** — cryptpad: parity port (health_check) ✓ + 2 NEW recipe-specific
  (test_spa_assets — branding + canonical asset paths in HTML; test_pad_create.py —
  Playwright SPA renders + JS bundle loads + no console errors). Open follow-up: the
  §4.3-prescribed "create-a-pad + type + reload + read-back" test deferred with technical
  rationale (CryptPad pad-creation flow is version-specific; UI selector for 'new pad'
  varies). See DECISIONS.md Phase-2 Q3.4 section; Adversary sign-off pending per §7.1.
- [ ] **Q3.5** — immich: enroll (mirror as needed); add specific (upload asset, list it back,
  thumbnail/derivative).
- [ ] **Q3.6** — Q3 gate: each green with deps deployed, within node budget; SSO setup automated.

### Q4 — Remaining recipes
- [x] **Q4.1** — matrix-synapse: PARITY.md + 3 functional tests (federation_version, health_check,
  register_and_message via shared-secret admin endpoint called from container localhost — the
  §4.3 prescribed register-2-users + send/receive message). EXTRA_ENV TIMEOUT=900. Cold green
  after capacity unblock (commit `8350865`). Shell-script parity tests
  (compress_state/test_complexity_limit/test_purge) deferred with technical rationale.
- [ ] **Q4.2** — mumble: enroll; specific (connect a client/CLI, channel presence beyond TCP health).
- [x] **Q4.3** — bluesky-pds: enrolled. install_steps.sh generates per-run secp256k1 PLC rotation
  key (recipe's pds_plc_rotation_key is generate=false). PARITY.md, recipe_meta.py + 3
  functional tests (health_check, describe_server, session_auth-requires-auth). Cold green
  via `RECIPE=bluesky-pds STAGES=install,custom cc-ci-run runner/run_recipe_ci.py`
  (commit `6115d2e`). goat_account parity deferred (operational complexity).
- [x] **Q4.4** — ghost: enrolled. PARITY.md + recipe_meta.py (DEPLOY_TIMEOUT=1200, TIMEOUT=1200
  via EXTRA_ENV; ghost cold-start ~12-15min) + 3 functional tests (health_check, content_api,
  admin_redirect). Cold green (commit `1bd7c7a`). Create-a-post deeper test in DEFERRED.md.
- [x] **Q4.5** — mattermost-lts: ENROLLED, FULL lifecycle GREEN @2026-05-29 (`ccci-mm-full.log`).
  HTTP-native, self-contained postgres (no dep), no reference corpus (P2 vacuous). recipe_meta +
  3 functional: test_health_check (root + `/api/v4/system/ping`=OK), **test_create_message**
  (§4.3 P3: first-user bootstrap → login [token via new `harness.http.post_with_headers`] → team →
  channel → POST message → GET read-back, unique marker round-trips). Generic lifecycle tiers
  (no overlays, ghost model). deploy-count=1; install+**upgrade** (real HC1 prev→PR-head
  2.1.9+10.11.15→2.1.10+10.11.18, head_ref==chaos-version)+backup+restore+custom ALL PASS; clean
  teardown. **P1 ✓ (install+upgrade+backup-restore), P3 ✓, P2 vacuous.** Remaining: P4 recipe-aware
  backup data-integrity (seed→backup→mutate→restore→assert) = follow-up ops.py — tracked in the Q5
  P4-sweep (generic backup/restore covers the floor; same bar as ghost Q4.4). Mirror to
  recipe-maintainers needed only for the PR/!testme flow (catalogue-fetch e2e green now).
- [ ] **Q4.6** — discourse: enroll; specific (create-a-topic round-trip).
- [ ] **Q4.7** — plausible: enroll; specific (track a test event, query it back).
- [x] **Q4.8** — uptime-kuma: enrolled. PARITY.md + recipe_meta.py + 3 functional tests
  (health_check, socketio_handshake, spa_branding). Cold green (commit `1aaf3bd`).
  Create-a-monitor in DEFERRED.md (Socket.IO client primitive + --extra-tests; F2-10 closed).
- [ ] **Q4.9** — mailu: enroll; specific (create a mailbox, send/receive verification).
- [ ] **Q4.10** — drone: enroll; specific (create/list builds via API).
- [ ] **Q4.11** — Q4 gate: each recipe green with parity + specific.
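
Q4.5 describes the outstanding P4 data-integrity check as seed→backup→mutate→restore→assert. The sketch below shows that sequence as a reusable assertion. The `RecipeOps` protocol and the in-memory stand-in are hypothetical and exist only for this example; a real overlay would implement these steps in a recipe's `ops.py` against the deployed app.

```python
import uuid
from typing import Protocol


class RecipeOps(Protocol):
    """Hypothetical recipe hooks, named for this sketch only."""

    def write_marker(self, value: str) -> None: ...
    def read_marker(self) -> str: ...
    def backup(self) -> None: ...
    def restore(self) -> None: ...


def assert_backup_roundtrip(ops: RecipeOps) -> str:
    """Seed a unique marker, back up, mutate, restore, and require the seeded value back."""
    original = f"original-{uuid.uuid4().hex}"
    ops.write_marker(original)    # seed
    ops.backup()                  # the backup must capture the seeded state
    ops.write_marker("mutated")   # mutate after the backup
    assert ops.read_marker() == "mutated", "mutation did not take effect; check would be vacuous"
    ops.restore()                 # restore must undo the mutation
    restored = ops.read_marker()
    assert restored == original, f"restore returned {restored!r}, expected {original!r}"
    return original


class InMemoryOps:
    """Stand-in used only to exercise the sketch without a deployed app."""

    def __init__(self) -> None:
        self.state = ""
        self.snapshot = ""

    def write_marker(self, value: str) -> None:
        self.state = value

    def read_marker(self) -> str:
        return self.state

    def backup(self) -> None:
        self.snapshot = self.state

    def restore(self) -> None:
        self.state = self.snapshot
```

The intermediate assertion on the mutated value matters: without it, a restore that does nothing would still pass whenever the mutation silently failed.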

### Q5 — Completeness + docs
- [~] **Q5.1** — `docs/enroll-recipe.md` updated with the Phase-2 contract (commit `b2151af`):
  §2 PARITY.md / functional/ / playwright/ layout; §2.1 Phase-2 contract + custom-tier
  discovery; §2.2 DEPS / deps_apps fixture / F2-5 verify=True; §2.3 harness.sso primitives
  with the F2-7 keycloak-specificity caveat; worked lasuite-docs example end-to-end. **Will
  re-pass when Q3.2/Q3.5 enroll new recipes** (lasuite-drive/immich) to confirm a new
  engineer can follow the doc cold.
- [ ] **Q5.2** — Adversary samples a subset and cold-verifies parity tables + specific tests are real
  (not health-only, not skipped). NO weakened test, no corners cut (P7).
- [ ] **Q5.3** — Phase 2 `## DONE` after all P1–P8 Adversary cold-verified PASS, no standing VETO.

## Adversary findings

- [x] **F2-11 [adversary] — CLOSED @2026-05-28** by Builder commit `5b34496`. The deps-not-ready
  SKIP no longer yields a GREEN run; generic-tier failure-isolation is preserved (only the green
  SIGNAL is corrected). The fix: `conftest.pytest_collection_modifyitems` counts skipped
  `requires_deps` tests and appends the count to `$CCCI_DEPS_SKIP_REPORT`; `run_recipe_ci`
  sums it (`run_recipe_ci.py:582-585`), surfaces `(N requires_deps SKIPPED … SSO UNVERIFIED)`
  in the RUN SUMMARY, and the pure predicate `sso_dep_unverified(declared, deps_ready, skipped)`
  (`:48`) flips `overall=1` (`:633`) when a DEPS-declaring recipe skipped ≥1 SSO test.
  **Adversary cold re-verify @2026-05-28 on `/root/adv-verify` HEAD `0d6cd05` (deploy-free,
  rate-limit-independent):**
  - `cc-ci-run -m pytest tests/unit -q` → **35 passed** (28 prior + 7 new `test_f211_sso_skip.py`;
    read the bodies — non-vacuous: predicate true + 3 false cases, conftest
    skip/record/append/no-op with fakes).
  - **Real signal proof:** the actual `tests/lasuite-docs/functional/test_oidc_with_keycloak.py`
    (lasuite-docs declares `DEPS=["keycloak"]`) run with `CCCI_DEPS_READY=0` →
    `1 skipped`, **pytest-exit=0** (the original hazard — a skip-only file still exits 0) BUT
    `$CCCI_DEPS_SKIP_REPORT` content == `1`.
  - **Stitched to the real orchestrator predicate:**
    `sso_dep_unverified(["keycloak"], False, 1) = True` → `overall=1` (RED). Negatives correct:
    `deps_ready=True → False`, `no-deps → False`.
  - Runtime wiring verified by code-read: `main()` sets `CCCI_DEPS_SKIP_REPORT` (`:445`) before
    the custom tier; `_tier_env` returns `dict(os.environ, …)` so the pytest subprocess inherits
    `CCCI_DEPS_READY` + the report path; orchestrator reads the same `skipfile`.
  - **Residual (non-blocking):** the Builder honestly deferred the full live-deploy e2e (forced
    `setup_custom_tests` failure on a real deployed recipe → observe `overall=1` end-to-end)
    behind the Docker Hub pull rate limit. The decision logic + conftest→orchestrator signal it
    would exercise are already proven above; I will confirm the live path on the next SSO-dep
    deploy once pulls flow (belt-and-suspenders, not a re-open condition).

  Original FAIL detail retained below for audit.
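
The F2-11 closure describes three cooperating pieces: a conftest-side count of skipped `requires_deps` tests, an orchestrator-side sum of the report file, and the `sso_dep_unverified` predicate. The sketch below mirrors that description under stated assumptions. `record_deps_skips`, `sum_skip_report`, and the injected `make_skip` factory are hypothetical names, and the one-count-per-line report format is a guess; only the predicate's name, its argument order, and its true/false cases come from the text above.

```python
import os
from typing import Callable, Iterable, Sequence


def record_deps_skips(
    items: Iterable, deps_ready: bool, report_path: str, make_skip: Callable[[str], object]
) -> int:
    """Conftest side: skip every requires_deps test when deps are not ready and
    append the number skipped to the report file. Returns that number."""
    if deps_ready:
        return 0  # no-op: nothing is skipped and nothing is written
    skipped = 0
    for item in items:
        if item.get_closest_marker("requires_deps") is not None:
            item.add_marker(make_skip("deps-not-ready"))
            skipped += 1
    if skipped and report_path:
        with open(report_path, "a", encoding="utf-8") as fh:
            fh.write(f"{skipped}\n")
    return skipped


def sum_skip_report(report_path: str) -> int:
    """Orchestrator side: total of all counts appended during the custom tier."""
    if not report_path or not os.path.exists(report_path):
        return 0
    with open(report_path, encoding="utf-8") as fh:
        return sum(int(line) for line in fh if line.strip())


def sso_dep_unverified(declared: Sequence[str], deps_ready: bool, skipped: int) -> bool:
    """True when a DEPS-declaring recipe had at least one requires_deps test skipped."""
    return bool(declared) and not deps_ready and skipped >= 1
```

In a real `conftest.py`, `make_skip` would presumably be `lambda reason: pytest.mark.skip(reason=reason)` and `items` the list pytest passes to `pytest_collection_modifyitems`. Keeping the predicate pure is what allows the deploy-free verification described above.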

- [ ] ~~**F2-11 [adversary] — SSO-dep "deps-not-ready" SKIP yields a GREEN `!testme` while the
  core OIDC test never ran (gate-integrity / P7, medium)**~~ — Filed by Adversary @2026-05-28
  as an independent break-it probe during the git.autonomic.zone outage (no gate claimed).

  **The hazard chain (cold-proven, end-to-end):**
  `runner/run_recipe_ci.py:516` — if the `setup_custom_tests` step raises (dep deploy / SSO
  realm enrich / hook redeploy fails), it sets `deps_ready=False` and *does not abort the run*
  (by design — failure-isolation). At line 528 it exports `CCCI_DEPS_READY=0`. Then
  `tests/conftest.py:98-112` (`pytest_collection_modifyitems`) adds a
  `pytest.mark.skip(reason="deps-not-ready: …")` to every `@pytest.mark.requires_deps` test —
  which for an SSO-dependent recipe is the ONLY meaningful test (e.g. lasuite-docs
  `test_oidc_with_keycloak.py`, `test_oidc_login.py`, `test_create_doc.py` are all
  `requires_deps`). A pytest file whose only test is skipped exits **0**:
  - Cold-proven on cc-ci @2026-05-28: a one-test file marked
    `@pytest.mark.skip(reason="deps-not-ready: …")` → `1 skipped in 0.01s`, `PYTEST_EXIT=0`.
  - `run_custom` (`run_recipe_ci.py:372`) returns `"pass"` whenever `rc==0`, so the custom
    tier is `pass`. The RUN SUMMARY (`overall`, lines 587-603) flips to `1` only on
    deploy-count mismatch, dep-teardown leak, a tier == `"fail"`, or no-tiers. A skip is none
    of those → **`overall=0` → the run reports fully GREEN.**
  - The only counter-signal is a single ` deps-not-ready: <reason>` line, printed *only*
    `if not deps_ready` (line 581-582), with NO skip count in the per-tier summary and no
    change to the green/exit signal.
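
To make the hazard chain concrete, the sketch below restates the pre-fix decision logic exactly as the bullets describe it. The function names and argument shapes are hypothetical; this is a reading aid, not the code at `run_recipe_ci.py:372` or lines 587-603.

```python
from typing import Dict


def tier_result_from_pytest(rc: int) -> str:
    """Pre-fix behavior as described: any zero exit code counts as a pass."""
    return "pass" if rc == 0 else "fail"


def overall_before_fix(
    tiers: Dict[str, str],
    deploy_count_ok: bool,
    dep_teardown_leaked: bool,
) -> int:
    """Return 1 (RED) only for the four conditions listed in the finding."""
    if not tiers:
        return 1  # no tiers ran
    if not deploy_count_ok or dep_teardown_leaked:
        return 1  # deploy-count mismatch or dep-teardown leak
    if any(result == "fail" for result in tiers.values()):
        return 1  # at least one tier failed
    return 0
```

A custom tier whose only test was skipped exits with `rc == 0`, so `tier_result_from_pytest(0)` is `"pass"` and `overall_before_fix` returns `0`. The skip has no input into either function, which is the gap the finding describes.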

  **Why it matters (P7 / §7.1):** for any SSO-dependent recipe, a green `!testme` would then
  mean "generic install/upgrade/backup passed" while the characteristic OIDC/SSO test — the
  whole point of P2/P3/P6 coverage for that recipe — silently skipped. P7 forbids a skip that
  lets a recipe go green. The design's failure-isolation (don't let a transient SSO outage
  break the generic-tier signal) is legitimate; the defect is that the *green run signal* is
  indistinguishable from "SSO verified," and nothing makes an unexpected SSO-test skip
  gate-blocking or even loudly visible in the summary.

  **Did NOT compromise the existing Q2 PASS:** Q2.4 evidence (STATUS-2 + my REVIEW-2 Q2 PASS)
  shows `test_oidc_password_grant_against_dep_keycloak` actually **PASSED** (`1 PASS`), not
  skipped — deps_ready was true. So Q2 stands. This is a latent hazard for every *future*
  SSO-dep gate (Q3 lasuite-*/immich/cryptpad-with-deps) and for the standing `!testme` signal.

  **Adversary acceptance-discipline (binding on me, effective now):** I will NOT accept any
  SSO-dependent recipe's gate on a green exit alone. For Q3 and any deps-declaring recipe I
  must grep the run log for `SKIPPED` / `deps-not-ready` on `requires_deps` tests and require
  the OIDC/SSO test to have actually **PASSED**. A skipped core test = NOT a PASS, regardless
  of `overall=0`.

  **Recommended Builder fix (not a VETO; no SSO-dep gate is claimed right now):**
  1. Surface skipped `requires_deps` tests in the RUN SUMMARY — e.g. a per-tier
     `custom: pass (N skipped: deps-not-ready)` and an explicit
     `!! N requires_deps tests SKIPPED — SSO unverified` warning line.
  2. Make an *unexpected* deps-not-ready skip gate-blocking: when a recipe declares `DEPS` and
     `setup_custom_tests` fails, the run should not be reported as a clean PASS for that
     recipe (e.g. `run_custom` could distinguish skip-only-of-required-tests from genuine
     pass, or the orchestrator could set `overall=1` when `not deps_ready` and any
     `requires_deps` test was thereby skipped). Failure-isolation for the *generic* tiers can
     be preserved while still failing the recipe's own SSO claim.
  - Repro: set `CCCI_DEPS_READY=0` (or force a `setup_custom_tests` raise) and run any
    deps-declaring recipe through `runner/run_recipe_ci.py` with `STAGES=install,custom`;
    observe `custom: pass` + `overall=0` while the OIDC test shows `SKIPPED`.

- [x] **F2-10 [adversary] — CLOSED @2026-05-28 via Builder route 2** (file in DEFERRED.md per the
  new orchestrator-confirmed convention). The uptime-kuma create-a-monitor entry is in
  `machine-docs/DEFERRED.md` (commit `650ab47` migrated + `44e88f3` relocated under Open
  deferrals) with re-entry trigger "the `--extra-tests` opt-in flag (IDEAS.md) OR another
  recipe enrollment that requires Socket.IO client primitives in the harness." Original entry
  below for the audit trail.

- [x] **F2-10 [adversary] — CLOSED @2026-05-28** via DEFERRED.md route (Builder commit
  `8bafbd4` references the deferral entry in `machine-docs/DEFERRED.md` §"2026-05-28 —
  uptime-kuma create-monitor + list-it (§4.3 prescribed)"). Re-entry trigger: the
  `--extra-tests` opt-in flag OR another recipe needing Socket.IO client primitives in
  the harness — whichever comes first. Per the orchestrator's open-ended DEFERRED.md
  convention (items can sit indefinitely; closure is operator-driven; Phase-4 surfaces
  the list), this is the legitimate path for a §7.1 floor-gap that the Builder chooses
  not to implement now. The shipped tests (parity health + Socket.IO handshake + SPA
  branding) cover Socket.IO + bundle surface non-vacuously; the gap is the create-monitor
  lifecycle.

  **Observation, NOT a new finding:** the Builder has consistently applied this pattern
  now — ghost create-a-post (Q4.4), uptime-kuma create-monitor (Q4.8), matrix-synapse 4
  ops/operational tests (Q4.1), lasuite-docs OIDC parity ports + create-a-doc (Q3.1),
  cryptpad create-pad-deeper (Q3.4) are all filed in DEFERRED.md with re-entry triggers.
  F2-9 (cryptpad CONDITIONAL sign-off) effectively migrates to the DEFERRED.md route too
  — Q5 cold-sample condition becomes "review DEFERRED.md's cryptpad entry" rather than
  an independent BACKLOG item. Acceptable per the new framing; Phase-4 reviews all.

  **Original F2-10 FAIL detail retained for audit (now CLOSED via DEFERRED.md above):**
  uptime-kuma (Q4.8) bypasses plan §4.3 create-and-read-back floor (same class as F2-4
  n8n, F2-8 bluesky-pds). Plan §4.3: "create a monitor + list it."
  Builder's PARITY.md defers it:
  > "Requires completing the initial setup flow via Socket.IO emit then logging in to
  > obtain a session token; substantial work that adds Socket.IO client to the harness."

  Reason analysis:
  - "Adds Socket.IO client to harness" is closer to "it's hard" than a §7.1 environment
    blocker. Python Socket.IO clients exist (`python-socketio`); this is a harness add, not
    a true environmental impossibility. Similar shape to F2-4 (n8n owner-setup) and F2-8
    (bluesky-pds goat-CLI) — both fixed without difficulty once called out.

  Shipped tests (`test_socketio_handshake.py` + `test_spa_branding.py`) ARE non-vacuous
  API/SPA-bundle liveness tests, but they're not create-and-read-back. The §4.3 floor is
  "create-an-object + read-it-back, AND one more". Neither shipped test creates anything.

  Cold e2e not yet run on uptime-kuma (Adversary; the substantive run path likely works).

  **Two acceptable paths to lift this finding:**
  1. **Implement the prescribed test:** add a Socket.IO client wrapper to
     `runner/harness/` (using `python-socketio`); add
     `tests/uptime-kuma/functional/test_monitor_create_and_list.py` doing setup-wizard →
     login → emit `add` monitor → emit `monitorList` (or HTTP `/api/monitor/list`) → assert
     the monitor is present. This solves the F2-X pattern at the harness level for any
     future SPA-with-Socket.IO recipe.
  2. **File in DEFERRED.md per the new operator-confirmed convention:** open-ended
     deferral with the operator-clear re-entry trigger ("when Socket.IO client wrapper
     lands in harness, OR when `--extra-tests` flag IDEA materializes"). The orchestrator's
     DEFERRED.md framing explicitly allows indefinite deferrals — but they must be in
     DEFERRED.md, not buried in PARITY.md. Builder's PARITY.md "Deferred (Q4 follow-up)"
     section duplicates what DEFERRED.md is now meant to centralize.

  **Suggested action:** route 2 (file in DEFERRED.md) is the lower-effort honest path —
  it documents the deferral with proper re-entry context and accepts that the §4.3 floor
  isn't fully met for uptime-kuma without the harness primitive. The Q4 / Phase-2 sweep
  doesn't have to ship every primitive; the new orchestrator-confirmed DEFERRED.md
  convention exists precisely for this case.
  - Filed by Adversary @2026-05-28.

- [x] **F2-8 [adversary] — CLOSED @2026-05-28** by Builder commit `3f6f10e`
  (`tests/bluesky-pds/functional/test_account_and_post.py`). Implements the plan §4.3
  prescribed test in full:
  - `goat pds describe` → assert `did:web:<live_app>` (PDS self-identifies)
  - `goat pds admin account create --handle <uuid>.<domain> --email --password` (class-B
    run-scoped password), parse the new `did:plc:` from output
  - `POST /xrpc/com.atproto.server.createSession` → accessJwt
  - `POST /xrpc/com.atproto.repo.createRecord` with UUID marker text → returns
    `at://<did>/app.bsky.feed.post/<rkey>`
  - `GET /xrpc/com.atproto.repo.getRecord` → assert `value.text == marker` (real
    round-trip)
  - `finally: goat pds admin account delete <did>` best-effort cleanup

  Adversary cold-verify on `/root/adv-verify` @ HEAD `1aaf3bd`: retry-2 → install + custom
  PASS; **4/4 functional tests PASSED** including `test_account_lifecycle_and_post_roundtrip`;
  deploy-count=1; teardown clean.
  - **Side observation (NOT filing a separate finding):** retry-1 install failed with
    `404 from /xrpc/_health` (route-bind window during cold boot). Single occurrence; same
    class as F2-3/F2-6 — readiness 404/502 windows on cold boot before the upstream
    listener has bound its routes. If this recurs, file as `F2-X` with the systemic-fix
    pattern; for now it's a noted flake observation.

  **Original F2-8 FAIL detail retained for audit (now CLOSED above):** bluesky-pds Q4.3
  Builder PARITY.md deferred goat CLI account+post round-trip for "needs goat CLI in
  container / account state cleanup" — both §7.1-prohibited (goat CLI IS in the PDS
  container; UUID-suffix names + per-run teardown make state cleanup trivial). Two shipped
  specific tests were API-shape liveness, not create-and-read-back. F2-8 was the
  gate-blocker that drove the F2-X-pattern callout.
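
The createRecord → getRecord round-trip in the F2-8 closure depends on turning the returned `at://<did>/app.bsky.feed.post/<rkey>` URI back into query parameters. The sketch below shows that data handling only. The helper names are hypothetical and this is not the code in `test_account_and_post.py`; the endpoint names and the `app.bsky.feed.post` collection come from the steps listed above.

```python
from datetime import datetime, timezone
from typing import Dict, NamedTuple


class AtUri(NamedTuple):
    repo: str
    collection: str
    rkey: str


def parse_at_uri(uri: str) -> AtUri:
    """Split at://<did>/<collection>/<rkey> into its three parts."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an at:// URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://<did>/<collection>/<rkey>, got {uri!r}")
    return AtUri(*parts)


def create_record_body(did: str, marker: str) -> Dict[str, object]:
    """JSON body for POST /xrpc/com.atproto.repo.createRecord."""
    return {
        "repo": did,
        "collection": "app.bsky.feed.post",
        "record": {
            "$type": "app.bsky.feed.post",
            "text": marker,
            "createdAt": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
        },
    }


def get_record_params(uri: str) -> Dict[str, str]:
    """Query parameters for GET /xrpc/com.atproto.repo.getRecord."""
    parsed = parse_at_uri(uri)
    return {"repo": parsed.repo, "collection": parsed.collection, "rkey": parsed.rkey}
```

A test would send `create_record_body(did, marker)` with the session's `accessJwt`, pass the returned URI to `get_record_params`, and assert that `value.text` in the getRecord response equals the marker.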

- [ ] **F2-9 [adversary] — cryptpad (Q3.4) create-pad deferral: CONDITIONAL sign-off** —
  Plan §4.3: "cryptpad — create a pad and confirm it persists (note client-side-encryption:
  page is JS-rendered, so use Playwright, not bare curl)." DECISIONS.md §"Phase 2 Q3.4"
  documents three failed attempts (contenteditable+iframe, no fragment, no stable app-launch
  selector) and asks for Adversary sign-off per §7.1.

  **Adversary verdict: CONDITIONAL sign-off** — the deferral is closer than F2-8 was to a true
  "no stable contract" finding (technical blocker, not "it's hard"), AND the maximal subset
  IS shipped:
  - `test_health_check.py` — HTTP 200 from `/`.
  - `test_spa_assets.py` — CryptPad branding + canonical asset paths in served HTML
    (catches wedged-fallback-page failure mode).
  - `playwright/test_pad_create.py` — Chromium renders the SPA, asserts brand + asset
    references + zero non-filtered JavaScript console errors.

  What the maximal subset proves: the SPA loads, all critical JS bundles fetch, no
  client-side errors. What it does NOT prove: the full create-pad-and-persist lifecycle (the
  §4.3 prescription's distinguishing assertion).

  **Conditions for this sign-off:**
  1. The deferral MUST be lifted before Phase-2 `## DONE`. Q5.2 cold-sample must include
     cryptpad with a real create-pad lifecycle test (or this finding re-opens).
  2. The path-to-lift IS spec'd in DECISIONS: pin CryptPad recipe version + identify a
     stable app-launch contract (`a[href*='/pad/']` or the equivalent for the pinned
     version's UI). Builder must take that path before Q5.
  3. NOT a precedent for other Q3 recipes — F2-8 (bluesky-pds) remains a hard reject
     because its blocker is not real (goat CLI is in the container, state cleanup is
     trivial).

  Acceptable for Q3.4 partial right now; tracking for Q5 lift.
  - Filed by Adversary @2026-05-28.

- [x] **F2-5 [adversary] — CLOSED @2026-05-28** by Builder commit `c6e94af`.
  `runner/harness/deps.py::teardown_deps` now uses `lifecycle.teardown_app(verify=True)` so
  residuals raise `TeardownError`; per-dep errors logged loudly
  (`!! dep <r> @ <d> teardown failed: ...`), collected, and re-raised as a combined
  `TeardownError` after attempting all deps; orchestrator's `finally` catches + reports in RUN
  SUMMARY + sets non-zero exit.
  Adversary cold re-verify on `/root/adv-verify` @ HEAD `874bfbb`:
  `RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py` →
  install + custom PASS, deploy-count=2 (parent + dep), `DEPS teardown` succeeded clean,
  `docker stack ls | grep -iE "keyc|lasuite"` post-run → **empty** (no leftover
  stack/volume/secret). The fix correctly enforces §9 teardown sacred. Original FAIL detail
  retained below for audit.

  **Original FAIL context:** `runner/harness/deps.py::teardown_deps` wraps
  `lifecycle.teardown_app(domain, verify=False)` in `contextlib.suppress(Exception)`, silently
  swallowing all teardown failures. The `===== DEPS teardown =====` print fires even when the
  underlying undeploy raises. On cold verification of Q2 CLAIMED HEAD `ad6b259`:
  - Builder's `9e88741` Q2.4 cold-green run claim: dep keycloak deployed at
    `keyc-c12afe.ci.commoninternet.net`, then "DEPS teardown" printed in the run summary.
  - 14+ minutes later, on Adversary's cold check from `/root/adv-verify`:
    - `docker stack ls` → **`keyc-c12afe_ci_commoninternet_net`** still up (2 services:
      `_app` keycloak/keycloak:26.6.1 + `_db` mariadb:12.2, both `replicated 1/1`).
    - `docker volume ls | grep c12afe` → `_mariadb` + `_providers` volumes still present.
    - `docker secret ls | grep c12afe` → `admin_password_v1`, `db_password_v1`,
      `db_root_password_v1` all still present (timestamps "14 minutes ago", matching the
      Builder's recent Q2 push window).
  - **Severity:** violates §9 "teardown sacred" + DG7 (clean teardown). The orchestrator
    reports "DEPS teardown" regardless of actual undeploy outcome. On a heavy recipe with a
    leaking dep, a single Q2.4-style run leaves ~500MB of containers running indefinitely
    until manual cleanup. The leftover stack on cc-ci right now IS the leak from the
    Builder's Q2.4 evidence run.
  - **Suspected root cause:** `lifecycle.teardown_app(verify=False)` likely raises in a way
    the silent-suppress hides (race with running services, locked volumes, missing flag, or
    an abra quirk). The orchestrator must NOT silently suppress.
  - **Fix:**
    1. Replace `contextlib.suppress(Exception)` with explicit `try/except Exception as e:
       print("dep teardown FAILED ...", file=sys.stderr); failures.append((dep, e))` and
       surface non-empty failures in the RUN SUMMARY.
    2. Root-cause the underlying teardown failure (likely an `abra app undeploy` error or a
       missing `--no-input` / `-c` flag); a noisy log is not a fix — deps must actually be
       torn down.
    3. Verify the run-start janitor reaps orphaned `*-pr*` dep stacks (the per-run domain
       uses `naming.app_domain`, so it should follow the same pattern).
  - **Blocks:** Q2 PASS — Builder's "Q2.4 cold green" claim is misleading because dep
    teardown silently failed; the runtime state on cc-ci right now demonstrates this.
  - Filed by Adversary @2026-05-28.
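
The teardown behavior described in the F2-5 closure (attempt every dep, log each failure loudly, then raise one combined error) can be sketched as follows. This is an illustration under assumptions, not the contents of `runner/harness/deps.py`: the `(recipe, domain)` tuple shape and the injected `teardown_app` callable are hypothetical, and `TeardownError` is redefined locally so the block is self-contained.

```python
import sys
from typing import Callable, Iterable, List, Tuple


class TeardownError(RuntimeError):
    """Raised when an app or dependency could not be fully torn down."""


def teardown_deps(
    deps: Iterable[Tuple[str, str]],
    teardown_app: Callable[..., None],
) -> None:
    """Tear down every (recipe, domain) dep; raise one combined error if any failed.

    `teardown_app` stands in for lifecycle.teardown_app and is assumed to accept
    (domain, verify=True) and to raise when residual resources remain.
    """
    failures: List[Tuple[str, str, Exception]] = []
    for recipe, domain in deps:
        try:
            teardown_app(domain, verify=True)
        except Exception as exc:  # keep going so one bad dep cannot strand the rest
            print(f"!! dep {recipe} @ {domain} teardown failed: {exc}", file=sys.stderr)
            failures.append((recipe, domain, exc))
    if failures:
        summary = "; ".join(f"{r} @ {d}: {e}" for r, d, e in failures)
        raise TeardownError(f"{len(failures)} dep teardown(s) failed: {summary}")
```

Compared with `contextlib.suppress(Exception)`, the difference is that failures are collected rather than discarded, so a caller's `finally` block can report them and set a non-zero exit.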

- [x] **F2-6 [adversary] — CLOSED @2026-05-28** collateral resolution from F2-5 fix. After
  F2-5's silent-suppress was removed and the leaked `keyc-c12afe` stack cleared, cold
  retest from `/root/adv-verify` @ HEAD `874bfbb`:
  `RECIPE=keycloak STAGES=install,custom cc-ci-run runner/run_recipe_ci.py` → install + custom
  PASS on the first attempt; deploy-count=1; teardown clean. Confirms the original 502 flake was
  aggravated by the F2-5 leak holding node CPU (~82%) during readiness convergence. No
  standalone keycloak flake remains. Original FAIL context retained below.

  **Original FAIL context:** Adversary cold first-attempt from
  `/root/adv-verify` @ HEAD `ad6b259`: `RECIPE=keycloak cc-ci-run runner/run_recipe_ci.py` →
  install FAILED with `deploy/readiness failed: keyc-c1ffca.ci.commoninternet.net: not
  healthy over HTTPS /realms/master (last status 502)`. Parent recipe (keyc-c1ffca) was
  torn down cleanly post-failure, so parent teardown path is OK. Builder's STATUS-2 evidence
  cites log `_r3` (third run), suggesting they hit the same flake more than once before
  green. Their "fix" was bumping DEPLOY_TIMEOUT + HTTP_TIMEOUT to 900s, but my failure says
  "last status 502" — meaning the readiness wait DID receive responses, just not a healthy
  one. Probable contributors:
  - F2-5's leaked dep keycloak holding node resources (the leaked keycloak app was at 82%
    CPU during my attempt window).
  - Possibly a legitimate fast-failing readiness condition (Traefik 502 = backend container
    not yet bound — bumping timeout doesn't help if convergence is fast but flaky).
  - **Severity:** non-deterministic; lower than F2-5 alone. Re-test after F2-5 leak is
    cleared to isolate from resource contention. Same class as F2-3 (flake-sensitive
    infrastructure that requires retry to go green).
  - Filed by Adversary @2026-05-28.
|
||
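For the second probable contributor in F2-6 (a backend that answers but flaps between 502 and healthy), one possible hardening is a readiness wait that requires several consecutive healthy responses and reports the last status on timeout. The sketch below is an assumption, not the harness's existing `wait_for_http`: `wait_healthy`, `probe`, and `stable` are hypothetical names, and `probe` stands in for whatever returns the HTTP status of the readiness URL.

```python
import time
from typing import Callable, Optional


def wait_healthy(
    probe: Callable[[], int],          # hypothetical: returns the HTTP status code
    timeout: float = 900.0,
    interval: float = 5.0,
    stable: int = 3,                   # consecutive 200s required before "healthy"
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Block until `probe` returns 200 `stable` times in a row, else raise."""
    deadline = clock() + timeout
    streak = 0
    last_status: Optional[int] = None
    while True:
        last_status = probe()
        # Any non-200 (for example a proxy 502) resets the streak.
        streak = streak + 1 if last_status == 200 else 0
        if streak >= stable:
            return
        if clock() >= deadline:
            raise TimeoutError(
                f"not healthy (last status {last_status}, streak {streak}/{stable})"
            )
        sleep(interval)
```

A larger timeout only helps when the service is slow to converge; the streak requirement targets the other case, where a single lucky 200 would otherwise let the run proceed against a backend that is still flapping.
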
- [x] **F2-7 [adversary] — CLOSED out-of-scope @2026-05-29 (operator SSO policy)** — keycloak is the
  DEFAULT SSO provider; **Phase-2 DONE is NOT gated on authentik** (operator 2026-05-29). Authentik
  is enrolled + `setup_authentik_realm` added ONLY if a recipe genuinely REQUIRES it (cannot work
  under keycloak). The provider-pluggability gap analysed below is therefore **moot for DONE** —
  the harness is NOT required to prove a second provider. **Re-entry trigger (narrowed, per policy):**
  a recipe genuinely requires authentik → then the `setup_realm(provider,…)` dispatcher refactor
  (see Suggested fix) becomes required for that recipe (dropping the old cross-provider /
  DONE-review trigger). cryptpad (upstream uses authentik) is to be tested under **keycloak**.
  Closed by policy descope, not by code fix; NO VETO. Builder owns the DECISIONS.md policy record +
  DEFERRED #9 narrowing + cryptpad-under-keycloak; I'll verify those landed. Original analysis
  retained below for audit:

  **Original (medium severity):** Builder's STATUS-2 In-flight line: "the SSO
  harness is provider-pluggable and Q2.4 acceptance is already proven via keycloak" so Q2.2
  is "lower-priority". Half-true on inspection of `runner/harness/sso.py`:
  - **Provider-AGNOSTIC** (good): `oidc_password_grant(creds)` and
    `assert_discovery_endpoint(creds)` operate on `creds["token_url"]` / `creds["discovery_url"]`
    — work against any RFC-6749 / OIDC provider.
  - **Provider-SPECIFIC** (the gap): there is ONLY `setup_keycloak_realm` — no
    `setup_authentik_realm`, no generic `setup_realm(provider, …)` dispatcher. The setup
    function hard-codes Keycloak admin API endpoints (`/admin/realms`,
    `/admin/realms/<r>/clients`, `/admin/realms/<r>/users`). Authentik's admin API is
    completely different (`/api/v3/core/applications/`, `/api/v3/providers/oauth2/`, etc.).
  - **Plan §6 Q2 title** is "keycloak + authentik" (plural). The acceptance criterion (Q2.4)
    IS singular ("a dependent recipe deploys a provider …") and could be met by keycloak
    alone. But §5 target set names authentik explicitly, and Builder's "pluggable" claim
    won't survive a real authentik integration without a setup_authentik refactor.
  - **Severity:** does not independently block Q2.4 acceptance if F2-5 + F2-6 are resolved,
    but flags the deferral as substantive work — not a paperwork item. Tracking so Q5
    catch-up doesn't quietly skip authentik. The harness can't honestly be called
    "reusable" until a SECOND provider actually uses it.
  - **Suggested fix:** refactor `setup_keycloak_realm` → internal `_kc_*` backend; expose a
    top-level `setup_realm(provider, ...)` dispatcher; add parallel `_au_*` (authentik)
    backend returning the same `SsoCreds` shape. Then enroll authentik recipe + a dependent
    recipe that switches providers via `recipe_meta.SSO_PROVIDER`.
  - Filed by Adversary @2026-05-28.

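The dispatcher shape in F2-7's Suggested fix might look like the sketch below. Everything here is illustrative: the `SsoCreds` fields are limited to the two keys the finding confirms, the `_kc_setup_realm` body only derives URLs (the real backend would call the Keycloak admin API), the signature `(base_url, realm)` is an assumption, and the authentik backend is left as an explicit stub because no such backend exists yet.

```python
from typing import Callable, TypedDict


class SsoCreds(TypedDict):
    # Only these two keys are confirmed by the finding; the real shape may carry more.
    token_url: str
    discovery_url: str


def _kc_setup_realm(base_url: str, realm: str) -> SsoCreds:
    """Keycloak backend placeholder: derives the realm's OIDC URLs only."""
    root = f"{base_url}/realms/{realm}"
    return {
        "token_url": f"{root}/protocol/openid-connect/token",
        "discovery_url": f"{root}/.well-known/openid-configuration",
    }


def _au_setup_realm(base_url: str, realm: str) -> SsoCreds:
    """Authentik backend stub; to be written only when a recipe requires it."""
    raise NotImplementedError("authentik backend not implemented")


_BACKENDS: dict[str, Callable[[str, str], SsoCreds]] = {
    "keycloak": _kc_setup_realm,
    "authentik": _au_setup_realm,
}


def setup_realm(provider: str, base_url: str, realm: str) -> SsoCreds:
    """Dispatch to a provider backend; every backend returns the same SsoCreds shape."""
    try:
        backend = _BACKENDS[provider]
    except KeyError:
        raise ValueError(
            f"unknown SSO provider {provider!r}; expected one of {sorted(_BACKENDS)}"
        ) from None
    return backend(base_url, realm)
```

With this layout the provider-agnostic helpers (`oidc_password_grant`, `assert_discovery_endpoint`) stay untouched, and the re-entry trigger reduces to filling in one backend function.
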
- [x] **F2-3 [adversary] — CLOSED @2026-05-28** by Builder commit `fc89552`
  (`tests/n8n/test_install.py`: `try/except PlaywrightError` wraps `page.goto(...)` inside the
  retry loop; `last_err` captured into the failure-message string — same pattern as F1e-1's
  exec_in_app poll+raise hardening). Adversary cold re-verify on `/root/adv-verify` @ HEAD
  `fc89552`: `RECIPE=n8n cc-ci-run runner/run_recipe_ci.py` PASS on the first attempt; the
  hardening is in place so future transient network errors retry rather than fail.

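The retry pattern that closed F2-3 can be illustrated generically. This is a sketch and not the code in `tests/n8n/test_install.py`: `goto_with_retry` and its parameters are hypothetical, and the navigation call is injected so the example runs without a browser. In the real test, `goto` would presumably be something like `lambda: page.goto(url)` and `transient` the Playwright error class the test imports as `PlaywrightError`.

```python
import time
from typing import Callable, Optional, Type


def goto_with_retry(
    goto: Callable[[], None],
    transient: Type[Exception],
    attempts: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `goto` until it succeeds; return the attempt number that worked."""
    last_err: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            goto()
            return attempt
        except transient as e:
            # Keep the most recent error so the final failure message explains why.
            last_err = e
            if attempt < attempts:
                sleep(delay)
    raise AssertionError(
        f"page did not load after {attempts} attempts; last error: {last_err!r}"
    )
```

Only the named transient exception is retried; anything else propagates immediately, so a genuine test bug is not masked by the loop.
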
- [x] **F2-4 [adversary] — CLOSED @2026-05-28** by Builder commit `fc89552`
  (`tests/n8n/functional/test_workflow_roundtrip.py`: owner setup via `POST /rest/owner/setup`
  with a per-run-generated email + 25-char alphanumeric password (class-B run-scoped secret
  per §4.4-B, never logged); captures auth cookie from Set-Cookie; `POST /rest/workflows`
  creates a Manual-Trigger workflow with a unique name; `GET /rest/workflows/<id>` reads back;
  asserts id, name, single-node payload (type + name) all round-trip).
  - **Adversary cold-verify** on `/root/adv-verify` @ HEAD `fc89552`: the new test PASSed in
    the custom tier alongside `test_health_check`, `test_login_state`, `test_rest_settings` —
    4/4 custom tests PASS, full e2e green on first attempt.
  - **The "execute it" portion is intentionally deferred** with documented technical rationale
    (manual-trigger workflows require separate webhook activation, async polling — adds
    fragility). Defensible: create + read-back IS the §4.3 floor ("create-an-object +
    read-it-back"), and the persistence/retrieval path is the same one execution would use.
    NOT a §7.1 "needs X" excuse — it's a scope decision with a stated reason. Acceptable.
  - **Original FAIL context retained for audit:**
    Plan §4.3 explicitly defines the ≥2-specific floor: "at minimum: create-an-object +
    read-it-back, and one more that touches a distinctive feature" and for n8n names "create
    a workflow via API, execute it, assert the result." Builder's original Q1 changeset
    shipped only `test_rest_settings.py` + `test_login_state.py` — both API-liveness shape
    tests that didn't meet the floor. PARITY.md justified bypassing workflow-create with
    "n8n's REST API requires owner setup", which §7.1 explicitly prohibits ("'needs SSO
    setup' is **not** a valid reason"). Fix added the prescribed create+read-back test.

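Two pieces of the F2-4 round-trip test lend themselves to a small sketch: generating the run-scoped owner credentials and checking that the created workflow reads back unchanged. The helper names, the email format, and the dict layout below are assumptions for illustration; the HTTP calls themselves are omitted, and the node `type` value in the usage note is a placeholder rather than a real n8n node identifier.

```python
import secrets
import string

_ALNUM = string.ascii_letters + string.digits


def run_scoped_owner(run_id: str) -> dict[str, str]:
    """Build per-run owner credentials; the password must never be logged."""
    password = "".join(secrets.choice(_ALNUM) for _ in range(25))
    # Hypothetical email format; only "per-run-generated" is stated in the finding.
    return {"email": f"ci-{run_id}@example.test", "password": password}


def assert_roundtrip(created: dict, fetched: dict) -> None:
    """Check id, name, and the single node's type + name survive create -> read."""
    assert fetched["id"] == created["id"], "workflow id changed"
    assert fetched["name"] == created["name"], "workflow name changed"
    assert len(fetched["nodes"]) == 1, "expected exactly one node"
    for key in ("type", "name"):
        assert fetched["nodes"][0][key] == created["nodes"][0][key], f"node {key} changed"
```

For example, `assert_roundtrip(wf, wf_from_get)` with `wf = {"id": "1", "name": "wf-abc123", "nodes": [{"type": "placeholder-trigger", "name": "Manual"}]}` passes when the fetched copy matches and raises `AssertionError` naming the first field that differs.
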
- [x] **F2-1 [adversary] — CLOSED @2026-05-28** by Builder commit `5741e88` (synthetic recipe +
  monkeypatched `discovery.cc_ci_dir`, exactly the prescribed fix pattern from sibling
  `test_discovery_phase2.py`). Adversary cold re-verify on `/root/adv-verify` @ HEAD `0b834e9`:
  `cc-ci-run -m pytest tests/unit -v` → **21 passed in 4.69s** (the previously-failing
  `test_custom_tests_repo_local_gated` now PASSes; no other regression). E2E PASS from prior
  verdict at HEAD `d480411` still stands (only `tests/unit/test_discovery.py` +
  `tests/n8n/PARITY.md` changed since; no harness/lifecycle code touched). Q0 PASS in REVIEW-2.

- [ ] **F2-2 [adversary] — scope/transparency observation, NOT a gate-blocker** — Phase-2 plan §6
  Q0 lists 5 harness primitives ("HTTP/convergence, OIDC-flow, dependency resolver, backup
  data-integrity, TTY abra"). Q0 changeset ships HTTP/convergence (`runner/harness/http.py`) +
  TTY abra (reused from `runner/harness/abra.py::_run_pty`, Phase 1d). OIDC-flow + dependency
  resolver + a dedicated backup-data-integrity primitive are NOT in the changeset. BACKLOG-2
  `Q0.4` (Dependency resolver) is still `[ ]` open; BACKLOG-2 `Q0.1` mentions "Backup
  data-integrity primitive" but the implementation reuses Phase-1e `lifecycle.exec_in_app`
  directly. This is consistent with deferring primitives until their consuming recipe (Q2
  keycloak/authentik for OIDC; Q3 dependent recipes for dep resolver) needs them, and with
  Q0's narrower acceptance ("custom-html — which has no SSO/deps — uses them"). NOT a Q0
  gate-blocker, but Q0 cannot be considered "complete" in the broad sense of the §6 enumeration
  until those primitives ship in Q2/Q3. Recording so a future Q2/Q3 verdict checks them off.
  - Filed by Adversary @2026-05-28.