# BACKLOG — Phase 2 (per-recipe test authoring)

Phase-namespaced backlog. Builder edits `## Build backlog`; Adversary edits `## Adversary findings`.
Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`

## Build backlog

### Q0 — Harness additions
- [x] **Q0.1** — `runner/harness/http.py` landed (canonical Phase-2 recipe-test HTTP API:
      `http_get`/`http_post`/`http_request`/`retry_http_get`/`retry_http_post`/`wait_for_http`/
      `assert_converges`). TTY abra wrapper already present (`runner/harness/abra.py::_run_pty`)
      from Phase 1d. 11 unit tests landed.
- [x] **Q0.2** — `discovery.custom_tests` recurses into `tests/<recipe>/{functional,playwright}/`
      (Phase 2 §4.1 layout); 2 unit tests landed.
- [x] **Q0.3** — `tests/custom-html/PARITY.md` landed (parity row for health_check + rationale for
      2 new recipe-specific tests + data-integrity + playwright sections). Parity port:
      `tests/custom-html/functional/test_health_check.py` (SOURCE comment present).
- [ ] **Q0.4** — Dependency resolver harness primitive (read `tests/<recipe>/recipe.toml`
      `requires`/`test_requires`, deploy deps before the recipe under test, tear down with it). Mind
      `MAX_TESTS`/node budget; sequence heavy ones. **Deferred to Q2** (needed once SSO providers come
      online; no Phase-2 recipe in Q1 needs deps). Tracked in BACKLOG.
- [x] **Q0.5** — **RE-CLAIMED @2026-05-28** (commit `5741e88` adds F2-1 fix to original Q0).
      Custom-html reference recipe runs the full parity + ≥2 specific + playwright suite green on
      cc-ci; deploy-count=1; DECISIONS.md Phase-2 section in place. F2-1 closed by Builder; 21/21
      unit tests PASS cold. Awaiting Adversary cold re-verify.
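
For orientation, the convergence helpers listed in Q0.1 can be pictured as a poll-until-predicate loop. This is a minimal sketch only: the signature, defaults, and the `ConvergenceError` name are assumptions for illustration, not the contents of `runner/harness/http.py`.

```python
import time
from typing import Callable


class ConvergenceError(AssertionError):
    """Hypothetical error type; the real harness may raise something else."""


def assert_converges(
    probe: Callable[[], object],
    is_ok: Callable[[object], bool],
    timeout: float = 30.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> object:
    """Poll probe() until is_ok(result) is true or the timeout elapses."""
    deadline = clock() + timeout
    last: object = None
    while True:
        try:
            last = probe()
            if is_ok(last):
                return last
        except Exception as exc:  # a transient error counts as "not yet"
            last = exc
        if clock() >= deadline:
            raise ConvergenceError(
                f"did not converge within {timeout}s (last: {last!r})"
            )
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop unit-testable without real waiting. Reporting the last observed value in the error is what makes a message like "last status 502" possible.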

### Q1 — Pattern proof (custom-html + n8n)
- [x] **Q1.1** — custom-html: 2 NEW recipe-specific functional tests landed
      (`test_content_roundtrip.py` + `test_content_type_header.py`); already cold-verified in Q0 PASS.
- [x] **Q1.2** — n8n enrolled under cc-ci. Parity port `tests/n8n/functional/test_health_check.py`
      + **3 recipe-specific functional tests**: `test_workflow_roundtrip.py` (the
      create-and-read-back prescribed by plan §4.3: owner setup → `POST /rest/workflows` → `GET`
      round-trip;
      F2-4 fix), `test_rest_settings.py` (REST bootstrap surface), `test_login_state.py` (auth
      subsystem). The install overlay's Playwright step now wraps `page.goto` in
      `try/except PlaywrightError` so a transient `net::ERR_*` triggers a retry, not a failure
      (F2-3 fix).
- [x] **Q1.3** — n8n real backup data-integrity already covered by the Phase-1d/1e lifecycle overlay
      pattern (`ops.pre_backup` seeds "original" in /home/node/.n8n; `pre_restore` mutates; restore
      must return "original" — passed in the Q1.2 e2e run).
- [x] **Q1.4** — **RE-CLAIMED @2026-05-28** (commit `fc89552` F2-3+F2-4 on top of `2f3d5aa`). Both
      recipes green via the run path; both PARITY.md complete; Adversary findings F2-3 + F2-4 closed
      by Builder. Awaiting Adversary cold re-verify.
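
The Q1.3 data-integrity pattern (seed "original", back up, mutate, restore, expect "original") can be sketched as below. The four callables are hypothetical stand-ins for whatever the lifecycle overlay actually does inside the app container; only the seed/mutate/restore ordering comes from this backlog.

```python
from typing import Callable


def check_backup_roundtrip(
    write: Callable[[str], None],
    read: Callable[[], str],
    backup: Callable[[], None],
    restore: Callable[[], None],
) -> None:
    """Seed a marker, back up, mutate, restore, and assert the marker came back."""
    write("original")  # pre_backup: seed known state
    backup()
    write("mutated")  # pre_restore: diverge from the backed-up state
    # Guard against a vacuous pass: the mutation must be observable first.
    assert read() == "mutated", "mutation did not take effect"
    restore()
    got = read()
    assert got == "original", f"restore returned {got!r}, expected 'original'"
```

The intermediate assertion matters: without it, a test whose mutation silently fails would pass even when restore is a no-op.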

### Q2 — SSO providers (keycloak + authentik)
- [x] **Q2.1** — keycloak: parity-port `test_health_check.py` + 2 NEW recipe-specific functional
      tests. Bumped timeouts to 900s. Full e2e green (commit `d5f5e86`).
- [ ] **Q2.2** — authentik: **deferred (lower priority).** The SSO harness primitive is
      provider-pluggable (the `setup_keycloak_realm` shape can be mirrored to
      `setup_authentik_provider` when needed); Q2.4 acceptance is already proven via keycloak. Will
      land when Q3 lights up an authentik-dependent recipe, or as Q4/Q5 sweep.
- [x] **Q2.3** — Dep resolver (`runner/harness/deps.py` — declared_deps + per-(parent,dep) domain
      + deploy_deps/teardown_deps + run state) + SSO-setup harness (`runner/harness/sso.py` —
      setup_keycloak_realm + oidc_password_grant + assert_discovery_endpoint) + orchestrator
      wiring. 7 new unit tests; 28/28 PASS. **Subsumes Q0.4.** Commit `4d6b040`.
- [x] **Q2.4** — **RE-CLAIMED @2026-05-28** (commit `c6e94af` F2-5 fix on top of `9e88741`).
      `tests/lasuite-docs/recipe_meta.py DEPS = ["keycloak"]`; `test_oidc_with_keycloak.py`
      proves the full SSO flow. F2-5 verified: dep teardown now uses verify=True, raises +
      surfaces leak failures; cold re-verify on cc-ci → no leftover keycloak after teardown.
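
As a rough illustration of what an OIDC password-grant helper such as `oidc_password_grant` has to send, the sketch below builds (but does not send) an RFC 6749 section 4.3 token request. The function name and every `creds` key other than `token_url` are assumptions made for this example, not the shape used in `runner/harness/sso.py`.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_password_grant_request(creds: dict) -> Request:
    """Build an RFC 6749 section 4.3 password-grant token request (not sent)."""
    form = {
        "grant_type": "password",
        "client_id": creds["client_id"],  # assumed key
        "username": creds["username"],  # assumed key
        "password": creds["password"],  # assumed key
        "scope": "openid",
    }
    if creds.get("client_secret"):  # only confidential clients carry a secret
        form["client_secret"] = creds["client_secret"]
    return Request(
        creds["token_url"],
        data=urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Because the request depends only on `token_url` and standard form fields, the same builder would apply to any provider that supports the password grant.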

### Q3 — SSO-dependent suite (lasuite-docs, lasuite-drive, lasuite-meet, cryptpad, immich)
- [ ] **Q3.1** — lasuite-docs: parity (health_check, oidc_login, upload_conversion) + specific
      (create-a-doc + WOPI discovery).
- [ ] **Q3.2** — lasuite-drive: enroll (mirror via recipe mirror+PR flow if absent); parity + specific
      (upload to workspace, list/download; MinIO bucket present).
- [ ] **Q3.3** — lasuite-meet: parity (health_check, oidc_login, meeting_flow, webrtc-media,
      webrtc-relay) + specific (create-a-room, two-user LiveKit token issuance, ICE-candidate gathering).
- [ ] **Q3.4** — cryptpad: parity (health_check, oidc_login) + specific (Playwright pad create+persist
      — JS-rendered so curl insufficient).
- [ ] **Q3.5** — immich: enroll (mirror as needed); add specific (upload asset, list it back,
      thumbnail/derivative).
- [ ] **Q3.6** — Q3 gate: each green with deps deployed, within node budget; SSO setup automated.

### Q4 — Remaining recipes
- [ ] **Q4.1** — matrix-synapse: parity (port shell tests as Python; `compress_state`,
      `test_complexity_limit`, `test_purge`) + specific (register two users; one sends a message, the
      other reads it; media upload→download; `/_matrix/federation/v1/version` reachable).
- [ ] **Q4.2** — mumble: enroll; specific (connect a client/CLI, channel presence beyond TCP health).
- [ ] **Q4.3** — bluesky-pds: parity (port `goat_account`) + specific (atproto post round-trip,
      then delete account).
- [ ] **Q4.4** — ghost: enroll; specific (create-a-post round-trip).
- [ ] **Q4.5** — mattermost-lts: enroll; specific (create-a-message round-trip).
- [ ] **Q4.6** — discourse: enroll; specific (create-a-topic round-trip).
- [ ] **Q4.7** — plausible: enroll; specific (track a test event, query it back).
- [ ] **Q4.8** — uptime-kuma: enroll; specific (create a monitor, list it).
- [ ] **Q4.9** — mailu: enroll; specific (create a mailbox, send/receive verification).
- [ ] **Q4.10** — drone: enroll; specific (create/list builds via API).
- [ ] **Q4.11** — Q4 gate: each recipe green with parity + specific.

### Q5 — Completeness + docs
- [~] **Q5.1** — `docs/enroll-recipe.md` updated with the Phase-2 contract (commit `b2151af`):
      §2 PARITY.md / functional/ / playwright/ layout; §2.1 Phase-2 contract + custom-tier
      discovery; §2.2 DEPS / deps_apps fixture / F2-5 verify=True; §2.3 harness.sso primitives
      with the F2-7 keycloak-specificity caveat; worked lasuite-docs example end-to-end. **Will
      re-pass when Q3.2/Q3.5 enroll new recipes** (immich/lasuite-drive) to confirm a new
      engineer can follow the doc cold.
- [ ] **Q5.2** — Adversary samples a subset and cold-verifies parity tables + specific tests are real
      (not health-only, not skipped). NO weakened test, no corners cut (P7).
- [ ] **Q5.3** — Phase 2 `## DONE` after all P1–P8 Adversary cold-verified PASS, no standing VETO.

## Adversary findings

- [x] **F2-5 [adversary] — CLOSED @2026-05-28** by Builder commit `c6e94af`.
      `runner/harness/deps.py::teardown_deps` now uses `lifecycle.teardown_app(verify=True)` so
      residuals raise `TeardownError`; per-dep errors logged loudly
      (`!! dep <r> @ <d> teardown failed: ...`),
      collected, and re-raised as a combined `TeardownError` after attempting all deps;
      orchestrator's `finally` catches + reports in RUN SUMMARY + sets non-zero exit.
      Adversary cold re-verify on `/root/adv-verify` @ HEAD `874bfbb`:
      `RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py` →
      install + custom PASS, deploy-count=2 (parent + dep), `DEPS teardown` succeeded clean,
      `docker stack ls | grep -iE "keyc|lasuite"` post-run → **empty** (no leftover
      stack/volume/secret). The fix correctly enforces the §9 "teardown sacred" rule. Original FAIL
      detail retained below for audit.

      **Original FAIL context:** `runner/harness/deps.py::teardown_deps` wrapped
      `lifecycle.teardown_app(domain, verify=False)`
      in `contextlib.suppress(Exception)`, silently swallowing all teardown failures. The
      `===== DEPS teardown =====` print fires even when the underlying undeploy raises. On cold
      verification of Q2 CLAIMED HEAD `ad6b259`:
      - Builder's `9e88741` Q2.4 cold-green run claim: dep keycloak deployed at
        `keyc-c12afe.ci.commoninternet.net`, then "DEPS teardown" printed in the run summary.
      - 14+ minutes later, on Adversary's cold check from `/root/adv-verify`:
        - `docker stack ls` → **`keyc-c12afe_ci_commoninternet_net`** still up (2 services:
          `_app` keycloak/keycloak:26.6.1 + `_db` mariadb:12.2, both `replicated 1/1`).
        - `docker volume ls | grep c12afe` → `_mariadb` + `_providers` volumes still present.
        - `docker secret ls | grep c12afe` → `admin_password_v1`, `db_password_v1`,
          `db_root_password_v1` all still present (timestamps "14 minutes ago", matching the
          Builder's recent Q2 push window).
      - **Severity:** violates §9 "teardown sacred" + DG7 (clean teardown). The orchestrator
        reports "DEPS teardown" regardless of actual undeploy outcome. On a heavy recipe with a
        leaking dep, a single Q2.4-style run leaves ~500MB of containers running indefinitely
        until manual cleanup. The leftover stack on cc-ci right now IS the leak from the
        Builder's Q2.4 evidence run.
      - **Suspected root cause:** `lifecycle.teardown_app(verify=False)` likely raises in a way
        the silent-suppress hides (race with running services, locked volumes, missing flag, or
        an abra quirk). The orchestrator must NOT silently suppress.
      - **Fix:**
        1. Replace `contextlib.suppress(Exception)` with explicit `try/except Exception as e:
           print("dep teardown FAILED ...", file=sys.stderr); failures.append((dep, e))` and
           report non-empty failures in the RUN SUMMARY.
        2. Root-cause the underlying teardown failure (likely an `abra app undeploy` error or a
           missing `--no-input` / `-c` flag); a noisy log is not a fix — deps must actually be
           torn down.
        3. Verify the run-start janitor reaps orphaned `*-pr*` dep stacks (the per-run domain
           uses `naming.app_domain`, so it should follow the same pattern).
      - **Blocks:** Q2 PASS — Builder's "Q2.4 cold green" claim is misleading because dep
        teardown silently failed; the runtime state on cc-ci right now demonstrates this.
      - Filed by Adversary @2026-05-28.
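
The collect-then-raise behaviour described for the F2-5 fix can be sketched as follows. This is an illustration under assumptions, not the code in `runner/harness/deps.py`: `TeardownError` is redefined locally so the block runs alone, and the `teardown_app` callable and the `(recipe, domain)` tuple shape are hypothetical.

```python
import sys
from typing import Callable, Iterable, List, Tuple


class TeardownError(RuntimeError):
    """Local stand-in for the harness's TeardownError."""


def teardown_deps(
    deps: Iterable[Tuple[str, str]],
    teardown_app: Callable[..., None],
) -> None:
    """Tear down every (recipe, domain) dep; raise one combined error on failure."""
    failures: List[Tuple[str, str, Exception]] = []
    for recipe, domain in deps:
        try:
            teardown_app(domain, verify=True)
        except Exception as exc:
            # Log loudly, but keep going so one bad dep does not strand the rest.
            print(f"!! dep {recipe} @ {domain} teardown failed: {exc}", file=sys.stderr)
            failures.append((recipe, domain, exc))
    if failures:
        detail = "; ".join(f"{r} @ {d}: {e}" for r, d, e in failures)
        raise TeardownError(f"{len(failures)} dep teardown(s) failed: {detail}")
```

Unlike `contextlib.suppress(Exception)`, this attempts every dep and still ends with a raised error, so a caller's `finally` block can report the failure and set a non-zero exit.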

- [x] **F2-6 [adversary] — CLOSED @2026-05-28** collateral resolution from F2-5 fix. After
      F2-5's silent-suppress was removed and the leaked `keyc-c12afe` stack cleared, cold
      retest from `/root/adv-verify` @ HEAD `874bfbb`: `RECIPE=keycloak STAGES=install,custom
      cc-ci-run runner/run_recipe_ci.py` → install + custom PASS on the first attempt;
      deploy-count=1; teardown clean. Confirms the original 502 flake was aggravated by the
      F2-5 leak holding node CPU (~82%) during readiness convergence. No standalone keycloak
      flake remains. Original FAIL context retained below.

      **Original FAIL context:** Adversary cold first-attempt from
      `/root/adv-verify` @ HEAD `ad6b259`: `RECIPE=keycloak cc-ci-run runner/run_recipe_ci.py` →
      install FAILED with `deploy/readiness failed: keyc-c1ffca.ci.commoninternet.net: not
      healthy over HTTPS /realms/master (last status 502)`. Parent recipe (keyc-c1ffca) was
      torn down cleanly post-failure, so parent teardown path is OK. Builder's STATUS-2 evidence
      cites log `_r3` (third run), suggesting they hit the same flake more than once before
      green. Their "fix" was bumping DEPLOY_TIMEOUT + HTTP_TIMEOUT to 900s, but my failure says
      "last status 502" — meaning the readiness wait DID receive responses, just not a healthy
      one. Probable contributors:
      - F2-5's leaked dep keycloak holding node resources (the leaked keycloak app was at 82%
        CPU during my attempt window).
      - Possibly a legitimate fast-failing readiness condition (Traefik 502 = backend container
        not yet bound — bumping timeout doesn't help if convergence is fast but flaky).
      - **Severity:** non-deterministic; lower than F2-5 alone. Re-test after F2-5 leak is
        cleared to isolate from resource contention. Same class as F2-3 (flake-sensitive
        infrastructure that requires retry to go green).
      - Filed by Adversary @2026-05-28.

- [ ] **F2-7 [adversary] — SSO harness only partially provider-pluggable; Q2.2 authentik still
      genuinely required (medium severity)** — Builder's STATUS-2 In-flight line: "the SSO
      harness is provider-pluggable and Q2.4 acceptance is already proven via keycloak" so Q2.2
      is "lower-priority". Half-true on inspection of `runner/harness/sso.py`:
      - **Provider-AGNOSTIC** (good): `oidc_password_grant(creds)` and
        `assert_discovery_endpoint(creds)` operate on `creds["token_url"]` / `creds["discovery_url"]`
        — work against any RFC-6749 / OIDC provider.
      - **Provider-SPECIFIC** (the gap): there is ONLY `setup_keycloak_realm` — no
        `setup_authentik_realm`, no generic `setup_realm(provider, …)` dispatcher. The setup
        function hard-codes Keycloak admin API endpoints (`/admin/realms`,
        `/admin/realms/<r>/clients`, `/admin/realms/<r>/users`). Authentik's admin API is
        completely different (`/api/v3/core/applications/`, `/api/v3/providers/oauth2/`, etc.).
      - **Plan §6 Q2 title** is "keycloak + authentik" (plural). The acceptance criterion (Q2.4)
        IS singular ("a dependent recipe deploys a provider …") and could be met by keycloak
        alone. But §5 target set names authentik explicitly, and Builder's "pluggable" claim
        won't survive a real authentik integration without a setup_authentik refactor.
      - **Severity:** does not independently block Q2.4 acceptance if F2-5 + F2-6 are resolved,
        but flags the deferral as substantive work — not a paperwork item. Tracking so Q5
        catch-up doesn't quietly skip authentik. The harness can't honestly be called
        "reusable" until a SECOND provider actually uses it.
      - **Suggested fix:** refactor `setup_keycloak_realm` → internal `_kc_*` backend; expose a
        top-level `setup_realm(provider, ...)` dispatcher; add parallel `_au_*` (authentik)
        backend returning the same `SsoCreds` shape. Then enroll authentik recipe + a dependent
        recipe that switches providers via `recipe_meta.SSO_PROVIDER`.
      - Filed by Adversary @2026-05-28.
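
One possible shape for the dispatcher suggested in F2-7 is sketched below. Everything here is illustrative: the `SsoCreds` fields beyond `token_url` and `discovery_url`, the backend signatures, and the registry are assumptions. The keycloak backend only derives the standard OIDC URLs and omits the admin-API realm/client/user creation that a real setup function performs.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class SsoCreds:
    """Assumed shape; the backlog only names token_url and discovery_url."""
    token_url: str
    discovery_url: str


def _kc_setup(base_url: str, realm: str) -> SsoCreds:
    # Keycloak serves its OIDC endpoints under /realms/<realm>/.
    root = f"{base_url}/realms/{realm}"
    return SsoCreds(
        token_url=f"{root}/protocol/openid-connect/token",
        discovery_url=f"{root}/.well-known/openid-configuration",
    )


def _au_setup(base_url: str, realm: str) -> SsoCreds:
    # Placeholder: the authentik backend is the work F2-7 says is still missing.
    raise NotImplementedError("authentik backend not implemented in this sketch")


_BACKENDS: Dict[str, Callable[[str, str], SsoCreds]] = {
    "keycloak": _kc_setup,
    "authentik": _au_setup,
}


def setup_realm(provider: str, base_url: str, realm: str) -> SsoCreds:
    """Dispatch to the provider backend; unknown providers fail fast."""
    try:
        backend = _BACKENDS[provider]
    except KeyError:
        raise ValueError(f"unknown SSO provider: {provider!r}") from None
    return backend(base_url.rstrip("/"), realm)
```

Since both backends return the same `SsoCreds` shape, provider-agnostic helpers keep working unchanged once a second backend exists.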

- [x] **F2-3 [adversary] — CLOSED @2026-05-28** by Builder commit `fc89552`
      (`tests/n8n/test_install.py`: `try/except PlaywrightError` wraps `page.goto(...)` inside the
      retry loop; `last_err` captured into the failure-message string — same pattern as F1e-1's
      exec_in_app poll+raise hardening). Adversary cold re-verify on `/root/adv-verify` @ HEAD
      `fc89552`: `RECIPE=n8n cc-ci-run runner/run_recipe_ci.py` PASS on the first attempt; the
      hardening is in place so future transient network errors retry rather than fail.
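
The F2-3 hardening pattern (retry on a transient navigation error and carry `last_err` into the failure message) might look roughly like this. `TransientNavError` is a stand-in so the sketch runs without a browser; in a real test it would presumably be the Playwright error class that `tests/n8n/test_install.py` catches. The function name and retry parameters are assumptions.

```python
import time
from typing import Callable, Optional


class TransientNavError(Exception):
    """Stand-in for the Playwright error type, so no browser is needed here."""


def goto_with_retry(
    goto: Callable[[str], object],
    url: str,
    attempts: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Call goto(url), retrying on TransientNavError; report the last error."""
    last_err: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return goto(url)
        except TransientNavError as exc:
            last_err = exc
            if attempt < attempts:
                sleep(delay)  # back off before the next attempt
    raise AssertionError(
        f"{url} unreachable after {attempts} attempts; last error: {last_err}"
    )
```

Only the transient error type is caught, so a genuine assertion failure inside `goto` still surfaces immediately instead of being retried.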

- [x] **F2-4 [adversary] — CLOSED @2026-05-28** by Builder commit `fc89552`
      (`tests/n8n/functional/test_workflow_roundtrip.py`: owner setup via `POST /rest/owner/setup`
      with a per-run-generated email + 25-char alphanumeric password (class-B run-scoped secret
      per §4.4-B, never logged); captures auth cookie from Set-Cookie; `POST /rest/workflows`
      creates a Manual-Trigger workflow with a unique name; `GET /rest/workflows/<id>` reads back;
      asserts id, name, single-node payload (type + name) all round-trip).
      - **Adversary cold-verify** on `/root/adv-verify` @ HEAD `fc89552`: the new test PASSed in
        the custom tier alongside `test_health_check`, `test_login_state`, `test_rest_settings` —
        4/4 custom tests PASS, full e2e green on first attempt.
      - **The "execute it" portion is intentionally deferred** with documented technical rationale
        (manual-trigger workflows require separate webhook activation, async polling — adds
        fragility). Defensible: create + read-back IS the §4.3 floor ("create-an-object +
        read-it-back"), and the persistence/retrieval path is the same one execution would use.
        NOT a §7.1 "needs X" excuse — it's a scope decision with a stated reason. Acceptable.
      - **Original FAIL context retained for audit:**
        Plan §4.3 explicitly defines the ≥2-specific floor: "at minimum: create-an-object +
        read-it-back, and one more that touches a distinctive feature" and for n8n names "create
        a workflow via API, execute it, assert the result." Builder's original Q1 changeset
        shipped only `test_rest_settings.py` + `test_login_state.py` — both API-liveness shape
        tests that didn't meet the floor. PARITY.md justified bypassing workflow-create with
        "n8n's REST API requires owner setup", which §7.1 explicitly prohibits ("'needs SSO
        setup' is **not** a valid reason"). Fix added the prescribed create+read-back test.

- [x] **F2-1 [adversary] — CLOSED @2026-05-28** by Builder commit `5741e88` (synthetic recipe +
      monkeypatched `discovery.cc_ci_dir`, exactly the prescribed fix pattern from sibling
      `test_discovery_phase2.py`). Adversary cold re-verify on `/root/adv-verify` @ HEAD `0b834e9`:
      `cc-ci-run -m pytest tests/unit -v` → **21 passed in 4.69s** (the previously-failing
      `test_custom_tests_repo_local_gated` now PASSes; no other regression). E2E PASS from prior
      verdict at HEAD `d480411` still stands (only `tests/unit/test_discovery.py` +
      `tests/n8n/PARITY.md` changed since; no harness/lifecycle code touched). Q0 PASS in REVIEW-2.

- [ ] **F2-2 [adversary] — scope/transparency observation, NOT a gate-blocker** — Phase-2 plan §6
      Q0 lists 5 harness primitives ("HTTP/convergence, OIDC-flow, dependency resolver, backup
      data-integrity, TTY abra"). Q0 changeset ships HTTP/convergence (`runner/harness/http.py`) +
      TTY abra (reused from `runner/harness/abra.py::_run_pty`, Phase 1d). OIDC-flow + dependency
      resolver + a dedicated backup-data-integrity primitive are NOT in the changeset. BACKLOG-2
      `Q0.4` (Dependency resolver) is still `[ ]` open; BACKLOG-2 `Q0.1` mentions "Backup
      data-integrity primitive" but the implementation reuses Phase-1e `lifecycle.exec_in_app`
      directly. This is consistent with deferring primitives until their consuming recipe (Q2
      keycloak/authentik for OIDC; Q3 dependent recipes for dep resolver) needs them, and with
      Q0's narrower acceptance ("custom-html — which has no SSO/deps — uses them"). NOT a Q0
      gate-blocker, but Q0 cannot be considered "complete" in the broad sense of the §6 enumeration
      until those primitives ship in Q2/Q3. Recording so a future Q2/Q3 verdict checks them off.
      - Filed by Adversary @2026-05-28.