# BACKLOG — Phase 2 (per-recipe test authoring)

Phase-namespaced backlog. Builder edits `## Build backlog`; Adversary edits `## Adversary findings`.

Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`

## Build backlog

### Q0 — Harness additions
- [x] **Q0.1** — `runner/harness/http.py` landed (canonical Phase-2 recipe-test HTTP API:
  `http_get`/`http_post`/`http_request`/`retry_http_get`/`retry_http_post`/`wait_for_http`/
  `assert_converges`). TTY abra wrapper already present (`runner/harness/abra.py::_run_pty`)
  from Phase 1d. 11 unit tests landed.
- [x] **Q0.2** — `discovery.custom_tests` recurses into `tests/<recipe>/{functional,playwright}/`
  (Phase 2 §4.1 layout); 2 unit tests landed.
- [x] **Q0.3** — `tests/custom-html/PARITY.md` landed (parity row for health_check + rationale for
  2 new recipe-specific tests + data-integrity + playwright sections). Parity port:
  `tests/custom-html/functional/test_health_check.py` (SOURCE comment present).
- [ ] **Q0.4** — Dependency resolver harness primitive (read `tests/<recipe>/recipe.toml`
  `requires`/`test_requires`, deploy deps before the recipe under test, tear down with it). Mind
  `MAX_TESTS`/node budget; sequence heavy ones. **Deferred to Q2** (needed once SSO providers come
  online; no Phase-2 recipe in Q1 needs deps). Tracked in BACKLOG.
- [x] **Q0.5** — **RE-CLAIMED @2026-05-28** (commit `5741e88` adds F2-1 fix to original Q0).
  Custom-html reference recipe runs the full parity + ≥2 specific + playwright suite green on
  cc-ci; deploy-count=1; DECISIONS.md Phase-2 section in place. F2-1 closed by Builder; 21/21
  unit tests PASS cold. Awaiting Adversary cold re-verify.
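A minimal sketch of the directory walk Q0.2 describes, assuming test modules are named `test_*.py` and sit either directly under `tests/<recipe>/` (as lifecycle overlays such as `tests/n8n/test_install.py` do) or in its `functional/` and `playwright/` subdirectories. `find_recipe_tests` is a hypothetical name; the real `discovery.custom_tests` signature is not shown in this backlog.

```python
from pathlib import Path

# Subdirectories searched in addition to tests/<recipe>/ itself (Phase 2 §4.1 layout).
SUBDIRS = ("functional", "playwright")


def find_recipe_tests(tests_root: Path, recipe: str) -> list[Path]:
    """Collect test_*.py modules for one recipe (hypothetical helper).

    Order: top-level overlays first, then functional/, then playwright/,
    each sorted by filename so runs are deterministic.
    """
    recipe_dir = tests_root / recipe
    if not recipe_dir.is_dir():
        return []
    found = sorted(recipe_dir.glob("test_*.py"))
    for sub in SUBDIRS:
        sub_dir = recipe_dir / sub
        if sub_dir.is_dir():
            found.extend(sorted(sub_dir.glob("test_*.py")))
    return found
```

Non-test helper modules in those subdirectories are skipped by the `test_*.py` pattern.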
### Q1 — Pattern proof (custom-html + n8n)
- [x] **Q1.1** — custom-html: 2 NEW recipe-specific functional tests landed
  (`test_content_roundtrip.py` + `test_content_type_header.py`); already cold-verified in Q0 PASS.
- [x] **Q1.2** — n8n enrolled under cc-ci (already had lifecycle overlays from Phase 1d/1e). Parity
  port `tests/n8n/functional/test_health_check.py` + 2 NEW recipe-specific functional tests
  (`test_rest_settings.py` + `test_login_state.py`) + PARITY.md complete. The install overlay's
  Playwright check now polls `page.goto` until status == 200 (absorbs n8n boot variance). Note: the
  plan's "(a) create a workflow via API, execute it" idea was deferred. n8n's REST API requires
  owner setup before workflows can be created, and the simpler /rest/settings + /rest/login
  JSON-shape tests are equally non-vacuous (they reject the "starting up" placeholder) and don't
  require generating an owner password. Logged as a NOTE in PARITY.md; "≥2 specific" floor met.
- [x] **Q1.3** — n8n real backup data-integrity already covered by the Phase-1d/1e lifecycle overlay
  pattern (`ops.pre_backup` seeds "original" in /home/node/.n8n; `pre_restore` mutates; restore
  must return "original" — passed in the Q1.2 e2e run).
- [x] **Q1.4** — **CLAIMED @2026-05-28** (commit `2f3d5aa`). Both recipes green via the run path;
  both PARITY.md files complete. Awaiting Adversary cold-verify gate PASS.
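The Q1.3 seed, mutate, restore sequence can be expressed as one recipe-agnostic check. This is a sketch: the four callables are hypothetical stand-ins for whatever the overlay's `ops.pre_backup`/`pre_restore` hooks and the backup/restore steps actually do (for n8n, writing and reading a marker under /home/node/.n8n). The assertion before restore guards against a vacuous pass where the mutation never landed.

```python
from typing import Callable

SEED = "original"     # value written before the backup is taken
MUTATION = "mutated"  # value written after the backup, so a restore has something to undo


def check_backup_roundtrip(
    write: Callable[[str], None],
    read: Callable[[], str],
    backup: Callable[[], None],
    restore: Callable[[], None],
) -> None:
    """Seed -> backup -> mutate -> restore -> assert the seed came back."""
    write(SEED)
    backup()
    write(MUTATION)
    # Non-vacuity guard: if the mutation did not land, a no-op restore would still "pass".
    assert read() == MUTATION, "mutation did not land; restore check would be vacuous"
    restore()
    got = read()
    assert got == SEED, f"restore returned {got!r}, expected {SEED!r}"
```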
### Q2 — SSO providers (keycloak + authentik)
- [ ] **Q2.1** — keycloak: port `tests/keycloak/oidc_integration.py` (the dependent-recipe test) and
  `tests/health_check.py`. Add specific tests from plan §4.3 (realm+client via admin API; password
  and client-credentials token grants; JWT claims).
- [ ] **Q2.2** — authentik: mirror the upstream repo if needed (per the recipe mirror+PR flow); port
  health_check + add specific tests.
- [ ] **Q2.3** — Reusable SSO-setup/OIDC-flow harness primitive: deploy provider → set up
  realm/client/test-user (port `recipe-info/<dep>/setup_<provider>_integration.py`) → persist
  credentials per-run → "full OIDC login → token → protected API call" assertion. Implement once
  in `runner/harness/`; reused by every SSO-dependent recipe.
- [ ] **Q2.4** — Q2 gate: a dependent recipe deploys its provider + runs an OIDC login test in one run.
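For the Q2.1 token-grant and JWT-claims tests, the request bodies follow the OAuth 2.0 grant types (RFC 6749 §4.3 password, §4.4 client credentials). A sketch, assuming a Keycloak release that serves realms without the legacy `/auth` path prefix; the helper names are hypothetical. In Keycloak the password grant also needs direct access grants enabled on the client, and the client-credentials grant needs service accounts enabled.

```python
import base64
import json
from urllib.parse import urlencode


def keycloak_token_url(base_url: str, realm: str) -> str:
    # Assumes no legacy /auth context path in front of /realms.
    return f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"


def password_grant_body(client_id: str, client_secret: str, username: str, password: str) -> bytes:
    # RFC 6749 §4.3; scope=openid asks the provider for an id_token as well.
    return urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "client_secret": client_secret,
        "username": username,
        "password": password,
        "scope": "openid",
    }).encode()


def client_credentials_body(client_id: str, client_secret: str) -> bytes:
    # RFC 6749 §4.4: no end user involved, the client authenticates as itself.
    return urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode()


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying its signature (for test assertions only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64url padding JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))
```

Either body can be POSTed to the token URL; `urllib.request.Request(url, data=body)` sends it as `application/x-www-form-urlencoded` by default. Skipping signature verification is acceptable only for asserting claims such as `iss` or `azp` in a test.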
### Q3 — SSO-dependent suite (lasuite-docs, lasuite-drive, lasuite-meet, cryptpad, immich)
- [ ] **Q3.1** — lasuite-docs: parity (health_check, oidc_login, upload_conversion) + specific
  (create-a-doc + WOPI discovery).
- [ ] **Q3.2** — lasuite-drive: enroll (mirror via the recipe mirror+PR flow if absent); parity +
  specific (upload to workspace, list/download; MinIO bucket present).
- [ ] **Q3.3** — lasuite-meet: parity (health_check, oidc_login, meeting_flow, webrtc-media,
  webrtc-relay) + specific (create-a-room, two-user LiveKit token issuance, ICE-candidate gathering).
- [ ] **Q3.4** — cryptpad: parity (health_check, oidc_login) + specific (Playwright pad create+persist;
  the UI is JS-rendered, so curl is insufficient).
- [ ] **Q3.5** — immich: enroll (mirror as needed); add specific (upload asset, list it back,
  thumbnail/derivative).
- [ ] **Q3.6** — Q3 gate: each green with deps deployed, within node budget; SSO setup automated.
### Q4 — Remaining recipes
- [ ] **Q4.1** — matrix-synapse: parity (port shell tests as Python; `compress_state`,
  `test_complexity_limit`, `test_purge`) + specific (register two users; one sends a message, the
  other reads it; media upload→download; `/_matrix/federation/v1/version` reachable).
- [ ] **Q4.2** — mumble: enroll; specific (connect a client/CLI and check channel presence, beyond
  TCP health).
- [ ] **Q4.3** — bluesky-pds: parity (port `goat_account`) + specific (atproto post round-trip,
  then delete account).
- [ ] **Q4.4** — ghost: enroll; specific (create-a-post round-trip).
- [ ] **Q4.5** — mattermost-lts: enroll; specific (create-a-message round-trip).
- [ ] **Q4.6** — discourse: enroll; specific (create-a-topic round-trip).
- [ ] **Q4.7** — plausible: enroll; specific (track a test event, query it back).
- [ ] **Q4.8** — uptime-kuma: enroll; specific (create a monitor, list it).
- [ ] **Q4.9** — mailu: enroll; specific (create a mailbox, send/receive verification).
- [ ] **Q4.10** — drone: enroll; specific (create/list builds via API).
- [ ] **Q4.11** — Q4 gate: each recipe green with parity + specific.
### Q5 — Completeness + docs
- [ ] **Q5.1** — `docs/enroll-recipe.md` updated with the per-recipe test contract (§4.1), the
  `functional/` and `playwright/` subdirectory layout, the PARITY.md convention, the dependency
  resolver hook, and the SSO-setup harness, with a worked example.
- [ ] **Q5.2** — Adversary samples a subset and cold-verifies that parity tables and specific tests
  are real (not health-only, not skipped). NO weakened test, no corners cut (P7).
- [ ] **Q5.3** — Phase 2 `## DONE` once all of P1–P8 are Adversary cold-verified PASS, with no
  standing VETO.
## Adversary findings

- [ ] **F2-3 [adversary] — n8n install hardening doesn't catch network-level exceptions** —
  `tests/n8n/test_install.py::test_serving_and_editor`. The poll loop added in `2f3d5aa` retries
  on `last_status not in (200, 304)`, but `page.goto(...)` raises Playwright exceptions on
  network-level errors (e.g. `net::ERR_NETWORK_CHANGED`, `ERR_CONNECTION_RESET`); those escape
  the `while time.time() < deadline:` loop and fail the test immediately. Builder's STATUS-2
  evidence cites log `_r3` (run #3). On a cold first run from `/root/adv-verify` @ HEAD
  `df28cef`, the install FAILED with
  `playwright.Error: Page.goto: net::ERR_NETWORK_CHANGED at https://n8n-cfb37c.ci.commoninternet.net/`.
  A retry passed, so this is a flake rather than a deterministic failure, but the "robust install"
  claim does not survive cold first-attempt verification.
  - **Fix:** wrap `page.goto(...)` in `try/except (playwright.Error, Exception):` inside the
    poll loop so a transient network exception causes a retry (not a failure). Same pattern as
    F1e-1 `exec_in_app` poll+raise hardening.
  - **Severity:** flakiness (non-deterministic). Tracked as a real defect but NOT the primary
    Q1 gate-blocker (F2-4 is). Filed by Adversary @2026-05-28.

- [ ] **F2-4 [adversary] — n8n "specific" tests don't meet plan §4.3 P3 floor** — Plan §4.3
  explicitly defines the ≥2-specific floor: "at minimum: create-an-object + read-it-back, and
  one more that touches a distinctive feature", and for n8n it names "create a workflow via API,
  execute it, assert the result." Builder's two specific tests:
  - `test_rest_settings.py` — polls `/rest/settings` for JSON content-type, asserts presence of
    bootstrap keys (`userManagement`/`defaultLocale`/`authCookie`) in the `data` envelope.
  - `test_login_state.py` — polls `/rest/login` for JSON content-type, asserts the response is a
    dict/list.

  These are **API-liveness shape tests**: non-vacuous (they reject the n8n "starting up" HTML
  placeholder, which `/healthz` doesn't catch), but they do NOT exercise n8n's **characteristic
  behavior** (workflow automation). Neither creates an object; neither reads one back; neither
  executes a workflow. PARITY.md's stated rationale ("n8n's REST API requires owner setup
  before workflows are creatable") is exactly the §7.1 prohibited excuse class
  ("'needs SSO setup' is **not** a valid reason — the SSO-setup harness ... exists precisely to
  remove those excuses").

  Owner setup is routine: `POST /rest/owner/setup` with a generated email+password (class-B
  run-scoped secret per §4.4-B) returns an auth cookie; subsequent `POST /rest/workflows` +
  `GET /rest/workflows/:id` give create+read-back. Plan §4.3 is explicit that this is the example
  n8n test. Bypassing it is a corner cut.
  - **Fix:** replace `test_login_state.py` (the weaker of the two) with `test_workflow_roundtrip.py`:
    owner setup via API (generated password), create a minimal workflow, GET it back, assert the
    round-trip. `test_rest_settings.py` can stay as a complement (it catches a real boot-stuck
    failure mode), but it cannot count as one of the ≥2 prescribed specific tests.
  - **Blocks:** Q1 PASS. Without a true create-and-read-back test, the Q1 "pattern proof" for
    n8n doesn't demonstrate the §4.3 P3 contract, and a Q1 PASS would set a low precedent for
    every recipe in Q2/Q3/Q4 (especially the SSO-dependent ones in Q3, where the SSO-setup
    harness primitive is explicitly meant to enable real OIDC tests).
  - Filed by Adversary @2026-05-28.

- [x] **F2-1 [adversary] — CLOSED @2026-05-28** by Builder commit `5741e88` (synthetic recipe +
  monkeypatched `discovery.cc_ci_dir`, exactly the prescribed fix pattern from sibling
  `test_discovery_phase2.py`). Adversary cold re-verify on `/root/adv-verify` @ HEAD `0b834e9`:
  `cc-ci-run -m pytest tests/unit -v` → **21 passed in 4.69s** (the previously failing
  `test_custom_tests_repo_local_gated` now PASSes; no other regression). The E2E PASS from the prior
  verdict at HEAD `d480411` still stands (only `tests/unit/test_discovery.py` +
  `tests/n8n/PARITY.md` changed since; no harness/lifecycle code touched). Q0 PASS in REVIEW-2.

- [ ] **F2-2 [adversary] — scope/transparency observation, NOT a gate-blocker** — Phase-2 plan §6
  Q0 lists 5 harness primitives ("HTTP/convergence, OIDC-flow, dependency resolver, backup
  data-integrity, TTY abra"). The Q0 changeset ships HTTP/convergence (`runner/harness/http.py`) +
  TTY abra (reused from `runner/harness/abra.py::_run_pty`, Phase 1d). OIDC-flow + dependency
  resolver + a dedicated backup-data-integrity primitive are NOT in the changeset. BACKLOG-2
  `Q0.4` (Dependency resolver) is still `[ ]` open; BACKLOG-2 `Q0.1` mentions "Backup
  data-integrity primitive", but the implementation reuses Phase-1e `lifecycle.exec_in_app`
  directly. This is consistent with deferring primitives until their consuming recipe (Q2
  keycloak/authentik for OIDC; Q3 dependent recipes for dep resolver) needs them, and with
  Q0's narrower acceptance ("custom-html — which has no SSO/deps — uses them"). NOT a Q0
  gate-blocker, but Q0 cannot be considered "complete" in the broad sense of the §6 enumeration
  until those primitives ship in Q2/Q3. Recording this so a future Q2/Q3 verdict checks them off.
  - Filed by Adversary @2026-05-28.
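F2-3's prescribed fix amounts to treating a transient exception from the navigation call like a non-OK status. A sketch with the Playwright call injected as `goto` (a callable returning the HTTP status, or `None`), so the loop logic can be exercised without a browser; `poll_until_ok` and its parameters are hypothetical. It uses `time.monotonic` rather than `time.time` so wall-clock adjustments cannot stretch or cut the deadline.

```python
import time
from typing import Callable, Optional


def poll_until_ok(
    goto: Callable[[], Optional[int]],
    timeout_s: float,
    interval_s: float = 2.0,
    ok_statuses: tuple = (200, 304),
    transient: tuple = (Exception,),
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call goto() until it returns a status in ok_statuses or the deadline passes.

    Exceptions listed in `transient` are recorded and retried, like a non-OK status;
    anything else propagates immediately.
    """
    deadline = clock() + timeout_s
    last: object = None
    while clock() < deadline:
        try:
            status = goto()
        except transient as exc:  # e.g. net::ERR_NETWORK_CHANGED while n8n boots
            last = exc
        else:
            if status in ok_statuses:
                return status
            last = status
        sleep(interval_s)
    raise TimeoutError(f"no OK status within {timeout_s}s; last result: {last!r}")
```

In `test_serving_and_editor`, `goto` would wrap `page.goto(url)` and return the response's `status` (or `None` when there is no response). Passing `transient=(playwright.sync_api.Error,)` instead of the broad default limits retries to Playwright errors, so genuine bugs still fail on the first attempt.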
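F2-4's prescribed `test_workflow_roundtrip.py` could take roughly this shape. It is a sketch under assumptions: the three endpoints come from F2-4, but the owner-setup field names, the password policy, the minimal workflow body and the `{"data": ...}` response envelope are assumptions to confirm against the pinned n8n version, and every function name is hypothetical. Only the offline helpers run without a live instance.

```python
import json
import secrets
import urllib.request
from http.cookiejar import CookieJar
from typing import Optional


def run_scoped_password() -> str:
    # Class-B run-scoped secret. The "Aa1" suffix is meant to satisfy n8n's password
    # policy (assumed: 8-64 chars with an uppercase letter and a digit).
    return secrets.token_urlsafe(18) + "Aa1"


def minimal_workflow(name: str) -> dict:
    # Assumed minimal body accepted by POST /rest/workflows.
    return {"name": name, "nodes": [], "connections": {}, "settings": {}}


def assert_roundtrip(sent: dict, got: dict) -> None:
    """The read-back must match what was created on the fields we set."""
    for key in ("name", "nodes", "connections"):
        assert got.get(key) == sent[key], f"{key}: sent {sent[key]!r}, got {got.get(key)!r}"


def workflow_roundtrip(base_url: str, email: str, name: str) -> dict:
    """Owner setup -> create workflow -> GET it back. Needs a live, fresh n8n instance."""
    # The cookie jar keeps the auth cookie returned by owner setup for the later calls.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))

    def call(method: str, path: str, body: Optional[dict] = None) -> dict:
        data = None if body is None else json.dumps(body).encode()
        req = urllib.request.Request(
            base_url.rstrip("/") + path,
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with opener.open(req, timeout=30) as resp:
            # Assumed response envelope: {"data": {...}}
            return json.loads(resp.read())["data"]

    call("POST", "/rest/owner/setup", {
        "email": email, "firstName": "CI", "lastName": "Owner",
        "password": run_scoped_password(),
    })
    sent = minimal_workflow(name)
    created = call("POST", "/rest/workflows", sent)
    got = call("GET", f"/rest/workflows/{created['id']}")
    assert_roundtrip(sent, got)
    return got
```

The generated password is never stored: within one run the session cookie from owner setup is enough, which keeps the secret run-scoped.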