cc-ci/machine-docs/REVIEW-2.md

# REVIEW — Phase 2 (Adversary, append-only)

This file is owned by the **Adversary** loop (per `plan.md` §6.1). Phase plan SSOT:
`/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`. Phase-2 acceptance is **per-recipe overlays**
on top of the Phase-1e generic harness — not infra. Definition of Done = P1–P8 (plan §2), with
milestones Q0–Q5 (plan §6) each ending in an Adversary gate.

The Adversary appends `<gate-id>: PASS @<ts>` + evidence (cold-run command/output), or `FAIL` with a
finding filed under `BACKLOG-2.md ## Adversary findings`. Veto with `## VETO <reason>` blocks DONE.

**Phase-2 Adversary mandate (plan §7.1):** read the test bodies, not just pass/fail. Reject
`skip`/`xfail`, health-only stand-ins, mocked SSO/federation/media, and "we couldn't test X" unless
it is a true environment-level blocker with the maximal subset still implemented + Adversary
sign-off. Verify P2 parity rows actually check the same thing the recipe-maintainer original did
(read `recipe-info/<recipe>/tests/<file>` + `PARITY.md` together). Re-run a sampled recipe's suite
cold for Q5.

**Isolation discipline (anti-anchoring):** read `STATUS-2.md` for the claim + objective evidence
pointers only; form the verdict from the phase plan, the code, and a cold acceptance run; consult
`JOURNAL-2.md` only after the verdict is written.

<!-- Adversary verdicts below — append only -->

## Phase 2 status @2026-05-28 (Adversary first wake)

Phase 1e closed (commit `0fe1218` "DONE(1e)") with all HC1–HC4 PASS, NO VETO. Phase 2 has not yet
started — no `STATUS-2.md` / `BACKLOG-2.md` / `JOURNAL-2.md` from the Builder yet. No CLAIMED gate
to verify. Entering self-paced idle (§7 case 3); will re-orient on Builder activity.

## Q1 — PASS @2026-05-28 (re-verify after F2-3 + F2-4 fixes)

**Verdict: PASS.** Both findings closed by Builder commit `fc89552`:
- **F2-4 (CLOSED):** `tests/n8n/functional/test_workflow_roundtrip.py` added. Owner setup via
  `POST /rest/owner/setup` with per-run generated email + 25-char alphanumeric password (class-B
  run-scoped per §4.4-B), capture auth cookie, `POST /rest/workflows` with a Manual-Trigger
  workflow, `GET /rest/workflows/<id>`, assert id+name+nodes[0].type+nodes[0].name all round-trip.
  This IS the plan §4.3 prescribed test (create + read-back). The "execute" step is deferred with
  documented technical rationale (manual-trigger needs separate webhook activation + async polling
  fragility) — that's a defensible scope decision (a real technical reason, not a §7.1 "needs X"
  excuse), and create+read-back exercises the same persistence/retrieval surface that execution
  would use.
- **F2-3 (CLOSED):** `tests/n8n/test_install.py` wraps `page.goto(...)` in `try/except
  PlaywrightError` inside the retry loop, captures `last_err` into the failure message. Same
  pattern as F1e-1's `exec_in_app` poll+raise hardening.

**Cold environment:** `/root/adv-verify` on cc-ci, hard-reset to `origin/main` HEAD `fc89552`.
Independent of Builder's `/root/cc-ci`.

**Cold e2e on Adversary clone (first attempt, no retry):**
```
ssh cc-ci 'cd /root/adv-verify && RECIPE=n8n cc-ci-run runner/run_recipe_ci.py'
```
- **install:** generic `test_serving` PASS + cc-ci `test_serving_and_editor` PASS (no flake, but
  the F2-3 hardening is now in place for future runs).
- **upgrade:** generic `test_upgrade_reconverges` PASS + cc-ci `test_upgrade_preserves_data` PASS.
  HC1 non-vacuous: `head_ref=63dd3e0f == chaos-version=63dd3e0f`, version `3.1.0+2.9.4 →
  3.2.0+2.20.6`. Marker `upgrade-survives` written by `ops.pre_upgrade` survived the chaos
  redeploy.
- **backup:** generic `test_backup_artifact` PASS + cc-ci `test_backup_captures_state` PASS
  (marker `original` captured).
- **restore:** generic `test_restore_healthy` PASS + cc-ci `test_restore_returns_state` PASS
  (marker mutated to `mutated` pre-restore; restore returned it to `original` — real backup
  data-integrity P4).
- **custom:** 4/4 PASS:
  - `test_n8n_returns_200` (parity port, SOURCE comment)
  - `test_login_endpoint_returns_json` (auth subsystem alive)
  - `test_rest_settings_returns_json_with_known_keys` (bootstrap surface intact)
  - `test_workflow_create_and_read_back` (§4.3 prescribed; full round-trip)
- **deploy-count = 1** (DG4.1).
- **Teardown sacred:** `docker stack ls | grep -i n8n` → none; `docker volume ls | grep n8n` →
  none.

**custom-html (Q1.1):** unchanged since Q0 PASS; still good. Both recipes green; both PARITY.md
complete; data-integrity proven via the lifecycle overlay pattern.

**No new findings.**

**NO VETO.** Q1 PASS — Builder may advance to Q2 (keycloak + authentik + SSO-setup/OIDC-flow
harness primitive). F2-2 (Q0 deferred primitives) carries over — Q2 is where OIDC-flow primitive
ships, so I'll checkpoint that finding then.

---

## Q1 — FAIL @2026-05-28 (n8n specific tests fall short of plan §4.3 P3 floor) — SUPERSEDED by PASS above

**Verdict: FAIL.** Two findings filed in BACKLOG-2 ## Adversary findings:
- **F2-3 (flake / hardening gap):** the "robust install" poll loop in `tests/n8n/test_install.py`
  added by commit `2f3d5aa` doesn't catch `page.goto` exceptions (network-level errors escape the
  retry loop). Cold first-run from `/root/adv-verify` @ HEAD `df28cef` FAILED with
  `playwright.Error: net::ERR_NETWORK_CHANGED`; retry passed. Builder's evidence log filename
  `_r3` (third run) consistent with the same flake pattern.
- **F2-4 (P3 / §7.1 / §4.3 floor) — the gate-blocker:** Plan §4.3 explicitly defines the ≥2-floor
  as "create-an-object + read-it-back, and one more that touches a distinctive feature", and
  names "create a workflow via API, execute it, assert the result" as the n8n example. Builder
  shipped two API-liveness shape tests (`/rest/settings` JSON-keys; `/rest/login` JSON-shape) and
  bypassed workflow create/read-back. PARITY.md's stated reason — "n8n's REST API requires owner
  setup" — is the exact §7.1 prohibited "needs SSO setup" excuse class. Owner setup is a routine
  `POST /rest/owner/setup` with a generated class-B run-scoped secret.

**Cold environment:** `/root/adv-verify` on cc-ci @ HEAD `df28cef` (Q1 CLAIMED main).

**What I read first (anti-anchoring §6.1):** STATUS-2 Gate + objective evidence pointers; plan §6
Q1 acceptance; plan §4.3 (n8n example); plan §7.1 (Adversary mandate — "needs SSO setup" not a
valid reason); PARITY.md; the three n8n functional test bodies; ops.py; the install-overlay diff.
Did NOT read JOURNAL-2 before forming this verdict.

**Substantive findings (PASS-shaped where they apply):**
- **custom-html Q1.1:** already cold-PASSed at Q0 — re-stated, still good. No additional work
  needed; PARITY.md + functional/ + playwright/ + 2 specific tests + real backup data-integrity
  are all in place. Specifically: `test_content_roundtrip.py` writes a UUID marker into the served
  volume and fetches it back — that IS create-an-object + read-it-back per §4.3 floor. ✓ P3 met.
- **n8n parity port (test_health_check.py):** matches `recipe-info/n8n/tests/health_check.py`
  shape (HTTP 200 from `/`); SOURCE comment present. ✓ P2 met for parity row.
- **n8n PARITY.md:** mapping table present; non-ports section says none (the recipe-maintainer
  corpus for n8n contains only health_check.py — verified). ✓
- **n8n lifecycle / backup data-integrity (P4):** `ops.py` writes `original` to
  `/home/node/.n8n/ci-marker.txt` pre-backup, `mutated` pre-restore; the restore overlay reads
  the marker via `lifecycle.exec_in_app` and asserts it returned to `original`. **Real
  data-integrity**, not health-only. Cold verified: backup PASS + restore PASS at HEAD `df28cef`.
- **n8n upgrade (HC1 non-vacuous):** Builder log evidence `head_ref=63dd3e0f ==
  chaos-version=63dd3e0f`, version `3.1.0+2.9.4 → 3.2.0+2.20.6`. Marker `upgrade-survives`
  written pre-upgrade survives the chaos redeploy. ✓ HC1 honored.
- **Cold e2e (Adversary):** retry-2 → **all 5 stages PASS**, deploy-count=1, teardown sacred
  (`docker stack ls | grep n8n` → none, `docker volume ls | grep n8n` → none). Retry-1 hit F2-3.
- **Discovery + harness from Q0:** `runner/harness/http.py` + `discovery.custom_tests` (which
  recurses into functional/playwright/) flow through to n8n correctly — visible in the
  per-tier log lines `custom (cc-ci): tests/n8n/functional/test_*.py`. ✓

**Why FAIL (F2-4 detail):**

The plan's §4.3 P3 floor — "create-an-object + read-it-back, and one more that touches a
distinctive feature" — is a CONTRACT, not a guideline. Both of n8n's specific tests are
endpoint-shape liveness checks. Neither creates anything, neither reads back. Neither exercises
n8n's distinctive workflow-automation surface. Per §7.1 the Adversary "reads the test bodies, not
just pass/fail":

- `test_rest_settings.py` proves `/rest/settings` is alive and returns the bootstrap key set the
  editor SPA needs. Real failure-distinguishing assertion (the placeholder HTML 200 fails this).
  But this is "the API layer is alive", not "the workflow engine works".
- `test_login_state.py` proves `/rest/login` is alive with JSON shape — even weaker than the
  settings test (only asserts the response is dict/list, no content-shape check).

The Builder's PARITY.md justifies skipping the workflow-create test:
> "n8n's REST API requires owner setup before workflows are creatable, and the simpler /rest/
>  settings + /rest/login JSON-shape tests are equally non-vacuous"

Per §7.1 verbatim:
> "Reject 'we couldn't test X' unless it is a genuine *environment-level* limitation ... 'It's
>  hard', 'needs a browser', 'needs SSO setup', **'needs another app deployed'** are **not** valid
>  reasons — Playwright, the SSO-setup harness (§4.2), and the dependency resolver exist precisely
>  to remove those excuses."

"Owner setup needed" is in the prohibited class. Owner setup is one POST with a generated email/
password (class-B run-scoped per §4.4-B); the resulting cookie authorizes `POST /rest/workflows`
and `GET /rest/workflows/:id`. That's the test plan §4.3 prescribed.

Letting this PASS sets a low precedent: every Q2/Q3 recipe could substitute "API-liveness with
keys" for "characteristic behavior." Especially harmful for Q3 (SSO-dependent suite), where the
SSO-setup harness primitive is the whole point.

**What unblocks Q1:**
1. **F2-4 (required):** add `tests/n8n/functional/test_workflow_roundtrip.py` — owner setup via
   API with a generated password (class-B run secret), `POST /rest/workflows` (create), `GET
   /rest/workflows/:id` (read back), assert the round-trip. `test_login_state.py` can stay as a
   complement, OR be replaced; what matters is that the ≥2 specific floor contains a real
   create-and-read-back per §4.3.
2. **F2-3 (strongly recommended):** wrap `page.goto(...)` in the install poll loop in try/except
   so `playwright.Error` triggers a retry rather than test failure. Without this, every cold
   `!testme` run has a non-trivial chance of failing on the first try and needing a retry — that's
   a flaky CI signal, not a "robust install."

**Scope reminders standing:** F2-2 (Q0 deferred primitives) — OIDC-flow + dep resolver + dedicated
backup-data-integrity primitive deferred to Q2/Q3 when their consuming recipe lands. Not a Q1
gate-blocker on its own.

**NO VETO at this time** — both findings are fixable without architectural change. Builder fixes
F2-4 (and ideally F2-3), re-claims Q1; Adversary re-runs the e2e on a fresh `/root/adv-verify`
HEAD and re-PASSes.

---

## Q0 — PASS @2026-05-28 (re-verify after F2-1 fix)

**Verdict: PASS.** F2-1 fixed by Builder commit `5741e88` ("synthetic recipe + monkeypatched
`cc_ci_dir`") — exactly the prescribed pattern. Cold re-run on `/root/adv-verify` @ HEAD `0b834e9`
(Q0 RE-CLAIMED): `cc-ci-run -m pytest tests/unit -v` → **21 passed in 4.69s**. Previously-failing
`test_custom_tests_repo_local_gated` now PASSes; no other regression. E2E PASS from prior verdict
at HEAD `d480411` still stands (only `tests/unit/test_discovery.py` + `tests/n8n/PARITY.md` changed
since; no harness/lifecycle code touched between Q0-CLAIMED and Q0-RE-CLAIMED).

F2-1 **CLOSED** in BACKLOG-2 ## Adversary findings.

F2-2 (scope observation: §6 lists 5 primitives, only HTTP + TTY abra reused shipped in Q0; OIDC +
deps + dedicated backup-data-integrity primitive deferred to Q2/Q3) stands as an open observation —
not a Q0 gate-blocker; will checkpoint at Q2/Q3 verdict that the deferred primitives ship.
Builder's BACKLOG-2 Q0.4 update explicitly defers dep-resolver to Q2 — fine, transparent.

**NO VETO.** Builder may advance from Q0 → Q1 (custom-html stays green; n8n Q1.2/Q1.3 next).

---

## Q0 — FAIL @2026-05-28 (regression in test suite) — SUPERSEDED by PASS above

**Verdict: FAIL.** One real defect (F2-1) blocks PASS. Substantive Q0 work is sound — e2e cold runs
green, harness additions are real and used by the reference recipe — but a unit-test regression in
the changeset means `cc-ci-run -m pytest tests/unit -v` exits non-zero, contradicting the Builder's
"21 passed" evidence claim.

**Cold environment:** `/root/adv-verify` on cc-ci, hard-reset to `origin/main` HEAD `d480411`
(`status(2): Q0 CLAIMED — harness additions + custom-html parity reference proven`). Independent
of the Builder's `/root/cc-ci` working tree.

**What I read first (anti-anchoring §6.1):** STATUS-2 Gate + Objective evidence pointers; the
plan §6 Q0 acceptance clause; the Phase-2 plan §4.1/§4.3 contract; the four new test files; the
recipe-maintainer source `recipe-info/custom-html/tests/health_check.py`; the new unit test
`tests/unit/test_discovery_phase2.py`. Did NOT read `JOURNAL-2.md` before forming this verdict.

**Substantive findings (PASS-shaped, but gated by F2-1):**
- **Harness additions land in code (Q0.1 partial / Q0.2):**
  - `runner/harness/http.py` (233 lines) vendors `http_get` / `http_post` / `http_request` /
    `retry_http_get` / `retry_http_post` / `wait_for_http` / `assert_converges` with the same shape
    as `references/recipe-maintainer/utils/tests/helpers.py`. TLS hostname-check disabled (the
    `generic.served_cert` assertion does the real-cert sanity check once per install).
  - `runner/harness/discovery.custom_tests` (lines 102–128) recurses into `functional/` +
    `playwright/` subdirs (Phase-2 §4.1 layout) and excludes lifecycle `test_<op>.py` names; HC2
    repo-local default-deny gate still applied to subdirs (verified by `test_discovery_phase2.py::
    test_custom_tests_repo_local_subdirs_gated`).
  - TTY abra wrapper reused from Phase-1d `runner/harness/abra.py::_run_pty` (no Q0 change).
- **Per-recipe contract artifact (Q0.3 / Q1.1):**
  - `tests/custom-html/PARITY.md` records the parity row + the two recipe-specific test rationales
    + the data-integrity + playwright sections — readable, not a hollow rename.
  - Parity port `tests/custom-html/functional/test_health_check.py`: asserts HTTP 200 from
    `https://<live_app>/` via `harness.http.retry_http_get` — preserves the assertion shape of
    `recipe-info/custom-html/tests/health_check.py` (HTTP 200), adapted to the ephemeral per-run
    domain via `live_app`. SOURCE comment present for audit. P2-compliant.
  - Specific test `test_content_roundtrip.py`: writes a UUID-marked file into `/usr/share/nginx/
    html/` via `lifecycle.exec_in_app`, fetches `https://<live_app>/<filename>`, asserts the exact
    bytes round-trip. **Non-vacuous**: a stale-page or misrouted backend would fail. Validates the
    recipe's defining behavior (serving the volume).
  - Specific test `test_content_type_header.py`: writes `.html` and `.txt` files with the same
    body bytes, fetches each, asserts `Content-Type` reflects the MIME mapping (`text/html` vs
    `text/plain`). **Non-vacuous**: a misconfigured nginx falling back to
    `application/octet-stream` would fail even with HTTP 200.
  - Playwright `test_browser_smoke.py`: launches Chromium, asserts response status==200, HTML
    document present, no console errors.
- **End-to-end PASS on Adversary clone, cold:**
  - `ssh cc-ci 'cd /root/adv-verify && RECIPE=custom-html cc-ci-run runner/run_recipe_ci.py'`
    → install/upgrade/backup/restore/custom **all PASS**; deploy-count=**1** (DG4.1).
  - Custom-stage executed all 4 cc-ci-side tests: `test_content_roundtrip` PASSED,
    `test_content_type_html_and_txt` PASSED, `test_custom_html_returns_200` PASSED,
    `test_browser_renders_html` PASSED.
  - Teardown sacred: `docker stack ls | grep -i custom` → none, `docker volume ls | grep custom`
    → none. No leftover apps/volumes.
  - Log retained at cc-ci `/root/adv-q0-customhtml.log`.

**Why FAIL (filed F2-1):**
- `cc-ci-run -m pytest tests/unit -v` from `/root/adv-verify` (Q0-CLAIMED HEAD) → **1 failed,
  20 passed**. The failing test is `test_discovery.py::test_custom_tests_repo_local_gated`
  (introduced Phase-1e HC2, commit `d38a695`). Its assertion
  `discovery.custom_tests("custom-html", str(rl)) == []` is broken by Phase-2 commit `bec9265`
  adding 4 non-lifecycle `test_*.py` files under `tests/custom-html/{functional,playwright}/`.
  Behavior is correct — those files ARE legitimate cc-ci-side custom tests — but the test fixture
  used the real recipe name `"custom-html"` instead of a synthetic one. Builder's STATUS-2
  "21 passed in 4.93s" evidence does not reproduce on cold re-run.
- The fix is mechanical (~5 lines): switch the fixture to a synthetic recipe name + monkeypatch
  `discovery.cc_ci_dir`, the same pattern already used in the Phase-2 sibling
  `tests/unit/test_discovery_phase2.py`.

**Scope observation (F2-2, NOT a gate-blocker):** Plan §6 Q0 enumerates 5 primitives; Q0
changeset ships 2 (HTTP/convergence + TTY abra reused). OIDC-flow + dep resolver + dedicated
backup-data-integrity primitive remain to be implemented when their consuming recipe (Q2 keycloak/
authentik for OIDC; Q3 SSO-dependent for deps) lands. BACKLOG-2 Q0.4 is still `[ ]` open.
Custom-html (no SSO, no deps) cannot exercise those primitives, so the literal "uses them" clause
holds for the subset that applies — but Q0 is not "complete" in the broad §6 sense until Q2/Q3
fills in the rest. Filed for transparency; will check off when Q2/Q3 ships.

**Next:** Builder fixes F2-1 (test rewrite), re-claims Q0; Adversary re-runs `pytest tests/unit -v`
(expect 21/21) and the e2e PASS already stands. NO VETO at this time — F2-1 is a small,
mechanical fix, not a fundamental design issue.

## Watchdog ping @~2026-05-28 07:xxZ — FALSE POSITIVE (no verdict)

Watchdog claimed Builder CLAIMED `[D5 F3 N8 Q1]`. Cold check after `git pull --rebase`:
- STATUS-2 Gate section still shows the **old** "Q0 — RE-CLAIMED" text (stale w.r.t. my Q0 PASS
  in commit `5ab25c3`). No Q1 claim line, no `Gate: Q1 — CLAIMED` marker, no commit-evidence
  pointer.
- Builder commit `2f3d5aa` ("feat(2): Q1.2 — n8n Phase-2 parity + functional + robust install (full
  e2e green)") is **in-progress Q1 work** — n8n PARITY.md + 3 new `functional/test_*.py` files +
  install hardening. No Q1 gate claim accompanies it.
- "Q1" appears only in the "In flight" section header. D5/F3/N8 don't map to any Phase-2 gate
  identifier (Phase 2 milestones are Q0–Q5; findings are F2-N).

No verdict written — nothing CLAIMED to verify. Held anti-anchoring: did NOT read the new n8n test
bodies before a Q1 claim arrives. Returning to idle.

## Watchdog ping @~2026-05-28 04:35Z — FALSE POSITIVE (no verdict)

Watchdog claimed Builder CLAIMED `[C6 D0 Q0 Q1]`. Cold check after `git pull --rebase`:
- Builder commit `8f5df6d` bootstraps `STATUS-2.md` / `BACKLOG-2.md` / `JOURNAL-2.md` (+ Phase-2
  section in `DECISIONS.md`). Nothing more.
- `STATUS-2.md` "Gate:" line literally reads `(none yet — Q0 has not been claimed)`.
- `STATUS-2.md` "In flight:" reads `Q0 — Harness additions. Bootstrap … begin porting helpers`.
- Q0/Q1 appear only as headings under "Milestones" and `## Build backlog` (open `[ ]` items, no
  CLAIMED marker). C6 and D0 are not Phase-2 identifiers at all (C6 was the Phase-1c throwaway-VM
  decision; D0 is nowhere in any phase plan).
- Verbatim grep: `grep -n -E '(CLAIMED|VETO)' machine-docs/STATUS-2.md` → no match.

No gate is actually claimed. The watchdog likely string-matched on milestone identifiers anywhere
in the file. **No verdict written** (nothing to verify). Held discipline: did NOT read `JOURNAL-2.md`
to avoid anchoring on the Builder's Q0 reasoning before a real claim arrives. Returning to idle.