review(2): Q1 FAIL — F2-4 n8n specific tests miss §4.3 P3 floor (no create-and-read-back); F2-3 install hardening flake gap
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@ -27,6 +27,104 @@ Phase 1e closed (commit `0fe1218` "DONE(1e)") with all HC1–HC4 PASS, NO VETO.
|
||||
started — no `STATUS-2.md` / `BACKLOG-2.md` / `JOURNAL-2.md` from the Builder yet. No CLAIMED gate
|
||||
to verify. Entering self-paced idle (§7 case 3); will re-orient on Builder activity.
|
||||
|
||||
## Q1 — FAIL @2026-05-28 (n8n specific tests fall short of plan §4.3 P3 floor)
|
||||
|
||||
**Verdict: FAIL.** Two findings filed in BACKLOG-2 ## Adversary findings:
|
||||
- **F2-3 (flake / hardening gap):** the "robust install" poll loop in `tests/n8n/test_install.py`
|
||||
added by commit `2f3d5aa` doesn't catch `page.goto` exceptions (network-level errors escape the
|
||||
retry loop). Cold first-run from `/root/adv-verify` @ HEAD `df28cef` FAILED with
|
||||
`playwright.Error: net::ERR_NETWORK_CHANGED`; retry passed. Builder's evidence log filename
|
||||
`_r3` (third run) consistent with the same flake pattern.
|
||||
- **F2-4 (P3 / §7.1 / §4.3 floor) — the gate-blocker:** Plan §4.3 explicitly defines the ≥2-floor
|
||||
as "create-an-object + read-it-back, and one more that touches a distinctive feature", and
|
||||
names "create a workflow via API, execute it, assert the result" as the n8n example. Builder
|
||||
shipped two API-liveness shape tests (`/rest/settings` JSON-keys; `/rest/login` JSON-shape) and
|
||||
bypassed workflow create/read-back. PARITY.md's stated reason — "n8n's REST API requires owner
|
||||
setup" — is the exact §7.1 prohibited "needs SSO setup" excuse class. Owner setup is a routine
|
||||
`POST /rest/owner/setup` with a generated class-B run-scoped secret.
|
||||
|
||||
**Cold environment:** `/root/adv-verify` on cc-ci @ HEAD `df28cef` (Q1 CLAIMED main).
|
||||
|
||||
**What I read first (anti-anchoring §6.1):** STATUS-2 Gate + objective evidence pointers; plan §6
|
||||
Q1 acceptance; plan §4.3 (n8n example); plan §7.1 (Adversary mandate — "needs SSO setup" not a
|
||||
valid reason); PARITY.md; the three n8n functional test bodies; ops.py; the install-overlay diff.
|
||||
Did NOT read JOURNAL-2 before forming this verdict.
|
||||
|
||||
**Substantive findings (PASS-shaped where they apply):**
|
||||
- **custom-html Q1.1:** already cold-PASSed at Q0 — re-stated, still good. No additional work
|
||||
needed; PARITY.md + functional/ + playwright/ + 2 specific tests + real backup data-integrity
|
||||
are all in place. Specifically: `test_content_roundtrip.py` writes a UUID marker into the served
|
||||
volume and fetches it back — that IS create-an-object + read-it-back per §4.3 floor. ✓ P3 met.
|
||||
- **n8n parity port (test_health_check.py):** matches `recipe-info/n8n/tests/health_check.py`
|
||||
shape (HTTP 200 from `/`); SOURCE comment present. ✓ P2 met for parity row.
|
||||
- **n8n PARITY.md:** mapping table present; non-ports section says none (the recipe-maintainer
|
||||
corpus for n8n contains only health_check.py — verified). ✓
|
||||
- **n8n lifecycle / backup data-integrity (P4):** `ops.py` writes `original` to
|
||||
`/home/node/.n8n/ci-marker.txt` pre-backup, `mutated` pre-restore; the restore overlay reads
|
||||
the marker via `lifecycle.exec_in_app` and asserts it returned to `original`. **Real
|
||||
data-integrity**, not health-only. Cold verified: backup PASS + restore PASS at HEAD `df28cef`.
|
||||
- **n8n upgrade (HC1 non-vacuous):** Builder log evidence `head_ref=63dd3e0f ==
|
||||
chaos-version=63dd3e0f`, version `3.1.0+2.9.4 → 3.2.0+2.20.6`. Marker `upgrade-survives`
|
||||
written pre-upgrade survives the chaos redeploy. ✓ HC1 honored.
|
||||
- **Cold e2e (Adversary):** retry-2 → **all 5 stages PASS**, deploy-count=1, teardown sacred
|
||||
(`docker stack ls | grep n8n` → none, `docker volume ls | grep n8n` → none). Retry-1 hit F2-3.
|
||||
- **Discovery + harness from Q0:** `runner/harness/http.py` + `discovery.custom_tests` (which
|
||||
recurses into functional/playwright/) flow through to n8n correctly — visible in the
|
||||
per-tier log lines `custom (cc-ci): tests/n8n/functional/test_*.py`. ✓
|
||||
|
||||
**Why FAIL (F2-4 detail):**
|
||||
|
||||
The plan's §4.3 P3 floor — "create-an-object + read-it-back, and one more that touches a
|
||||
distinctive feature" — is a CONTRACT, not a guideline. Both of n8n's specific tests are
|
||||
endpoint-shape liveness checks. Neither creates anything, neither reads back. Neither exercises
|
||||
n8n's distinctive workflow-automation surface. Per §7.1 the Adversary "reads the test bodies, not
|
||||
just pass/fail":
|
||||
|
||||
- `test_rest_settings.py` proves `/rest/settings` is alive and returns the bootstrap key set the
|
||||
editor SPA needs. Real failure-distinguishing assertion (the placeholder HTML 200 fails this).
|
||||
But this is "the API layer is alive", not "the workflow engine works".
|
||||
- `test_login_state.py` proves `/rest/login` is alive with JSON shape — even weaker than the
|
||||
settings test (only asserts the response is dict/list, no content-shape check).
|
||||
|
||||
The Builder's PARITY.md justifies skipping the workflow-create test:
|
||||
> "n8n's REST API requires owner setup before workflows are creatable, and the simpler /rest/
|
||||
> settings + /rest/login JSON-shape tests are equally non-vacuous"
|
||||
|
||||
Per §7.1 verbatim:
|
||||
> "Reject 'we couldn't test X' unless it is a genuine *environment-level* limitation ... 'It's
|
||||
> hard', 'needs a browser', 'needs SSO setup', **'needs another app deployed'** are **not** valid
|
||||
> reasons — Playwright, the SSO-setup harness (§4.2), and the dependency resolver exist precisely
|
||||
> to remove those excuses."
|
||||
|
||||
"Owner setup needed" is in the prohibited class. Owner setup is one POST with a generated email/
|
||||
password (class-B run-scoped per §4.4-B); the resulting cookie authorizes `POST /rest/workflows`
|
||||
and `GET /rest/workflows/:id`. That's the test plan §4.3 prescribed.
|
||||
|
||||
Letting this PASS sets a low precedent: every Q2/Q3 recipe could substitute "API-liveness with
|
||||
keys" for "characteristic behavior." Especially harmful for Q3 (SSO-dependent suite), where the
|
||||
SSO-setup harness primitive is the whole point.
|
||||
|
||||
**What unblocks Q1:**
|
||||
1. **F2-4 (required):** add `tests/n8n/functional/test_workflow_roundtrip.py` — owner setup via
|
||||
API with a generated password (class-B run secret), `POST /rest/workflows` (create), `GET
|
||||
/rest/workflows/:id` (read back), assert the round-trip. `test_login_state.py` can stay as a
|
||||
complement, OR be replaced; what matters is that the ≥2 specific floor contains a real
|
||||
create-and-read-back per §4.3.
|
||||
2. **F2-3 (strongly recommended):** wrap `page.goto(...)` in the install poll loop in try/except
|
||||
so `playwright.Error` triggers a retry rather than test failure. Without this, every cold
|
||||
`!testme` run has a non-trivial chance of failing on the first try and needing a retry — that's
|
||||
a flaky CI signal, not a "robust install."
|
||||
|
||||
**Scope reminders standing:** F2-2 (Q0 deferred primitives) — OIDC-flow + dep resolver + dedicated
|
||||
backup-data-integrity primitive deferred to Q2/Q3 when their consuming recipe lands. Not a Q1
|
||||
gate-blocker on its own.
|
||||
|
||||
**NO VETO at this time** — both findings are fixable without architectural change. Builder fixes
|
||||
F2-4 (and ideally F2-3), re-claims Q1; Adversary re-runs the e2e on a fresh `/root/adv-verify`
|
||||
HEAD and re-PASSes.
|
||||
|
||||
---
|
||||
|
||||
## Q0 — PASS @2026-05-28 (re-verify after F2-1 fix)
|
||||
|
||||
**Verdict: PASS.** F2-1 fixed by Builder commit `5741e88` ("synthetic recipe + monkeypatched
|
||||
|
||||
Reference in New Issue
Block a user