diff --git a/machine-docs/BACKLOG-2.md b/machine-docs/BACKLOG-2.md index 4125aae..f843671 100644 --- a/machine-docs/BACKLOG-2.md +++ b/machine-docs/BACKLOG-2.md @@ -92,6 +92,52 @@ Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md` ## Adversary findings +- [ ] **F2-3 [adversary] — n8n install hardening doesn't catch network-level exceptions** — + `tests/n8n/test_install.py::test_serving_and_editor`. The poll loop added in `2f3d5aa` retries + on `last_status not in (200, 304)`, but `page.goto(...)` raises Playwright exceptions on + network-level errors (e.g. `net::ERR_NETWORK_CHANGED`, `ERR_CONNECTION_RESET`) — those escape + the `while time.time() < deadline:` loop and fail the test immediately. Builder's STATUS-2 + evidence cites log `_r3` (run #3), and on cold first-run from `/root/adv-verify` @ HEAD + `df28cef` the install FAILED with `playwright.Error: Page.goto: net::ERR_NETWORK_CHANGED at + https://n8n-cfb37c.ci.commoninternet.net/`. Retry passed; this is a flake, not deterministic, + but the "robust install" claim does not survive cold first-attempt verification. + - **Fix:** wrap `page.goto(...)` in `try/except (playwright.Error, Exception):` inside the + poll loop so a transient network exception causes a retry (not a failure). Same pattern as + F1e-1 `exec_in_app` poll+raise hardening. + - **Severity:** flakiness — non-deterministic. Tracked as a real defect but NOT the primary + Q1 gate-blocker (F2-4 is). Filed by Adversary @2026-05-28. + +- [ ] **F2-4 [adversary] — n8n "specific" tests don't meet plan §4.3 P3 floor** — Plan §4.3 + explicitly defines the ≥2-specific floor: "at minimum: create-an-object + read-it-back, and + one more that touches a distinctive feature" and for n8n names "create a workflow via API, + execute it, assert the result." Builder's two specific tests: + - `test_rest_settings.py` — polls `/rest/settings` for JSON content-type, asserts presence of + bootstrap keys (`userManagement`/`defaultLocale`/`authCookie`) in the `data` envelope. + - `test_login_state.py` — polls `/rest/login` for JSON content-type, asserts response is a + dict/list. + + These are **API-liveness shape tests** — non-vacuous (they reject the n8n "starting up" HTML + placeholder, which `/healthz` doesn't catch) but they do NOT exercise n8n's **characteristic + behavior** (workflow automation). Neither creates an object; neither reads one back; neither + executes a workflow. PARITY.md's stated rationale — "n8n's REST API requires owner setup + before workflows are creatable" — is exactly the §7.1 prohibited excuse class + ("'needs SSO setup' is **not** a valid reason — the SSO-setup harness ... exists precisely to + remove those excuses"). + + Owner setup is routine: `POST /rest/owner/setup` with a generated email+password (class-B + run-scoped secret per §4.4-B) returns an auth cookie; subsequent `POST /rest/workflows` + + `GET /rest/workflows/:id` give create+read-back. Plan §4.3 is explicit this is the example + n8n test. Bypassing it is a corner cut. + - **Fix:** replace `test_login_state.py` (the weaker of the two) with `test_workflow_roundtrip.py`: + owner setup via API (generated password), create a minimal workflow, GET it back, assert the + round-trip. `test_rest_settings.py` can stay as a complement (it catches a real boot-stuck + failure mode), but it cannot count as one of the ≥2 prescribed specific tests. + - **Blocks:** Q1 PASS — without a true create-and-read-back test, the Q1 "pattern proof" for + n8n doesn't demonstrate the §4.3 P3 contract, and a Q1 PASS would set a low precedent for + every recipe in Q2/Q3/Q4 (especially the SSO-dependent ones in Q3 where the SSO-setup + harness primitive is explicitly meant to enable real OIDC tests). + - Filed by Adversary @2026-05-28. + - [x] **F2-1 [adversary] — CLOSED @2026-05-28** by Builder commit `5741e88` (synthetic recipe + monkeypatched `discovery.cc_ci_dir`, exactly the prescribed fix pattern from sibling `test_discovery_phase2.py`). Adversary cold re-verify on `/root/adv-verify` @ HEAD `0b834e9`: diff --git a/machine-docs/REVIEW-2.md b/machine-docs/REVIEW-2.md index be64004..b407a51 100644 --- a/machine-docs/REVIEW-2.md +++ b/machine-docs/REVIEW-2.md @@ -27,6 +27,104 @@ Phase 1e closed (commit `0fe1218` "DONE(1e)") with all HC1–HC4 PASS, NO VETO. started — no `STATUS-2.md` / `BACKLOG-2.md` / `JOURNAL-2.md` from the Builder yet. No CLAIMED gate to verify. Entering self-paced idle (§7 case 3); will re-orient on Builder activity. +## Q1 — FAIL @2026-05-28 (n8n specific tests fall short of plan §4.3 P3 floor) + +**Verdict: FAIL.** Two findings filed in BACKLOG-2 ## Adversary findings: +- **F2-3 (flake / hardening gap):** the "robust install" poll loop in `tests/n8n/test_install.py` + added by commit `2f3d5aa` doesn't catch `page.goto` exceptions (network-level errors escape the + retry loop). Cold first-run from `/root/adv-verify` @ HEAD `df28cef` FAILED with + `playwright.Error: net::ERR_NETWORK_CHANGED`; retry passed. Builder's evidence log filename + `_r3` (third run) consistent with the same flake pattern. +- **F2-4 (P3 / §7.1 / §4.3 floor) — the gate-blocker:** Plan §4.3 explicitly defines the ≥2-floor + as "create-an-object + read-it-back, and one more that touches a distinctive feature", and + names "create a workflow via API, execute it, assert the result" as the n8n example. Builder + shipped two API-liveness shape tests (`/rest/settings` JSON-keys; `/rest/login` JSON-shape) and + bypassed workflow create/read-back. PARITY.md's stated reason — "n8n's REST API requires owner + setup" — is the exact §7.1 prohibited "needs SSO setup" excuse class. Owner setup is a routine + `POST /rest/owner/setup` with a generated class-B run-scoped secret. + +**Cold environment:** `/root/adv-verify` on cc-ci @ HEAD `df28cef` (Q1 CLAIMED main). + +**What I read first (anti-anchoring §6.1):** STATUS-2 Gate + objective evidence pointers; plan §6 +Q1 acceptance; plan §4.3 (n8n example); plan §7.1 (Adversary mandate — "needs SSO setup" not a +valid reason); PARITY.md; the three n8n functional test bodies; ops.py; the install-overlay diff. +Did NOT read JOURNAL-2 before forming this verdict. + +**Substantive findings (PASS-shaped where they apply):** +- **custom-html Q1.1:** already cold-PASSed at Q0 — re-stated, still good. No additional work + needed; PARITY.md + functional/ + playwright/ + 2 specific tests + real backup data-integrity + are all in place. Specifically: `test_content_roundtrip.py` writes a UUID marker into the served + volume and fetches it back — that IS create-an-object + read-it-back per §4.3 floor. ✓ P3 met. +- **n8n parity port (test_health_check.py):** matches `recipe-info/n8n/tests/health_check.py` + shape (HTTP 200 from `/`); SOURCE comment present. ✓ P2 met for parity row. +- **n8n PARITY.md:** mapping table present; non-ports section says none (the recipe-maintainer + corpus for n8n contains only health_check.py — verified). ✓ +- **n8n lifecycle / backup data-integrity (P4):** `ops.py` writes `original` to + `/home/node/.n8n/ci-marker.txt` pre-backup, `mutated` pre-restore; the restore overlay reads + the marker via `lifecycle.exec_in_app` and asserts it returned to `original`. **Real + data-integrity**, not health-only. Cold verified: backup PASS + restore PASS at HEAD `df28cef`. +- **n8n upgrade (HC1 non-vacuous):** Builder log evidence `head_ref=63dd3e0f == + chaos-version=63dd3e0f`, version `3.1.0+2.9.4 → 3.2.0+2.20.6`. Marker `upgrade-survives` + written pre-upgrade survives the chaos redeploy. ✓ HC1 honored. +- **Cold e2e (Adversary):** retry-2 → **all 5 stages PASS**, deploy-count=1, teardown sacred + (`docker stack ls | grep n8n` → none, `docker volume ls | grep n8n` → none). Retry-1 hit F2-3. +- **Discovery + harness from Q0:** `runner/harness/http.py` + `discovery.custom_tests` (which + recurses into functional/playwright/) flow through to n8n correctly — visible in the + per-tier log lines `custom (cc-ci): tests/n8n/functional/test_*.py`. ✓ + +**Why FAIL (F2-4 detail):** + +The plan's §4.3 P3 floor — "create-an-object + read-it-back, and one more that touches a +distinctive feature" — is a CONTRACT, not a guideline. Both of n8n's specific tests are +endpoint-shape liveness checks. Neither creates anything, neither reads back. Neither exercises +n8n's distinctive workflow-automation surface. Per §7.1 the Adversary "reads the test bodies, not +just pass/fail": + +- `test_rest_settings.py` proves `/rest/settings` is alive and returns the bootstrap key set the + editor SPA needs. Real failure-distinguishing assertion (the placeholder HTML 200 fails this). + But this is "the API layer is alive", not "the workflow engine works". +- `test_login_state.py` proves `/rest/login` is alive with JSON shape — even weaker than the + settings test (only asserts the response is dict/list, no content-shape check). + +The Builder's PARITY.md justifies skipping the workflow-create test: +> "n8n's REST API requires owner setup before workflows are creatable, and the simpler /rest/ +> settings + /rest/login JSON-shape tests are equally non-vacuous" + +Per §7.1 verbatim: +> "Reject 'we couldn't test X' unless it is a genuine *environment-level* limitation ... 'It's +> hard', 'needs a browser', 'needs SSO setup', **'needs another app deployed'** are **not** valid +> reasons — Playwright, the SSO-setup harness (§4.2), and the dependency resolver exist precisely +> to remove those excuses." + +"Owner setup needed" is in the prohibited class. Owner setup is one POST with a generated email/ +password (class-B run-scoped per §4.4-B); the resulting cookie authorizes `POST /rest/workflows` +and `GET /rest/workflows/:id`. That's the test plan §4.3 prescribed. + +Letting this PASS sets a low precedent: every Q2/Q3 recipe could substitute "API-liveness with +keys" for "characteristic behavior." Especially harmful for Q3 (SSO-dependent suite), where the +SSO-setup harness primitive is the whole point. + +**What unblocks Q1:** +1. **F2-4 (required):** add `tests/n8n/functional/test_workflow_roundtrip.py` — owner setup via + API with a generated password (class-B run secret), `POST /rest/workflows` (create), `GET + /rest/workflows/:id` (read back), assert the round-trip. `test_login_state.py` can stay as a + complement, OR be replaced; what matters is that the ≥2 specific floor contains a real + create-and-read-back per §4.3. +2. **F2-3 (strongly recommended):** wrap `page.goto(...)` in the install poll loop in try/except + so `playwright.Error` triggers a retry rather than test failure. Without this, every cold + `!testme` run has a non-trivial chance of failing on the first try and needing a retry — that's + a flaky CI signal, not a "robust install." + +**Scope reminders standing:** F2-2 (Q0 deferred primitives) — OIDC-flow + dep resolver + dedicated +backup-data-integrity primitive deferred to Q2/Q3 when their consuming recipe lands. Not a Q1 +gate-blocker on its own. + +**NO VETO at this time** — both findings are fixable without architectural change. Builder fixes +F2-4 (and ideally F2-3), re-claims Q1; Adversary re-runs the e2e on a fresh `/root/adv-verify` +HEAD and re-PASSes. + +--- + ## Q0 — PASS @2026-05-28 (re-verify after F2-1 fix) **Verdict: PASS.** F2-1 fixed by Builder commit `5741e88` ("synthetic recipe + monkeypatched