REVIEW — Phase 2 (Adversary, append-only)
This file is owned by the Adversary loop (per plan.md §6.1). Phase plan SSOT:
/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md. Phase-2 acceptance is per-recipe overlays
on top of the Phase-1e generic harness — not infra. Definition of Done = P1–P8 (plan §2), with
milestones Q0–Q5 (plan §6) each ending in an Adversary gate.
The Adversary appends <gate-id>: PASS @<ts> + evidence (cold-run command/output), or FAIL with a
finding filed under BACKLOG-2.md ## Adversary findings. Veto with ## VETO <reason> blocks DONE.
Phase-2 Adversary mandate (plan §7.1): read the test bodies, not just pass/fail. Reject
skip/xfail, health-only stand-ins, mocked SSO/federation/media, and "we couldn't test X" unless
it is a true environment-level blocker with the maximal subset still implemented + Adversary
sign-off. Verify P2 parity rows actually check the same thing the recipe-maintainer original did
(read recipe-info/<recipe>/tests/<file> + PARITY.md together). Re-run a sampled recipe's suite
cold for Q5.
Isolation discipline (anti-anchoring): read STATUS-2.md for the claim + objective evidence
pointers only; form the verdict from the phase plan, the code, and a cold acceptance run; consult
JOURNAL-2.md only after the verdict is written.
Phase 2 status @2026-05-28 (Adversary first wake)
Phase 1e closed (commit 0fe1218 "DONE(1e)") with all HC1–HC4 PASS, NO VETO. Phase 2 has not yet
started — no STATUS-2.md / BACKLOG-2.md / JOURNAL-2.md from the Builder yet. No CLAIMED gate
to verify. Entering self-paced idle (§7 case 3); will re-orient on Builder activity.
Q3/Q4 partial checkpoint @2026-05-28 (informal, no gate verdict)
Context: Builder commit 076fa31 STATUS-2 In-flight: "Q4.1+Q4.3 GREEN; Q3.1+Q3.4 partial;
pausing for Adversary cold-verify." No Gate: Q3 — CLAIMED or Gate: Q4 — CLAIMED line in
STATUS-2 — this is an explicit mid-milestone request for adversarial review of recent partials,
not a formal §6.1 gate handoff. So: no Q3/Q4 PASS/FAIL verdict (no gate to verdict). What
follows are findings + cold-verify results to feed back into the Builder's continued work.
Cold environment: /root/adv-verify on cc-ci, HEAD 076fa31; capacity unblocked (cc-ci
RAM 4→8 GB per operator note).
Q4.1 matrix-synapse (substantively complete):
- Cold RECIPE=matrix-synapse STAGES=install,custom → install + custom PASS, deploy-count=1, teardown sacred (docker stack ls | grep -i matrix → empty).
- test_register_and_message.py is the §4.3 prescribed test: 2 users registered via the shared-secret admin API (HMAC-SHA1 nonce flow, via container localhost; well-rationalized since the recipe doesn't route /_synapse/admin/* publicly), both log in via the public client API, room create + invite + join, marker message send + read-back. Each step exercises a different synapse layer. ✓ §4.3 floor met substantively.
- test_federation_version.py is the second specific: asserts server.name == "Synapse" from /_matrix/federation/v1/version. Non-vacuous.
- 3 recipe-maintainer shell-script tests deferred (state-compression, complexity-limit, purge) with a documented technical reason: they target persistent-instance operational state, not recipe behavior. Defensible, not §7.1 corner-cuts.
- Media upload/download absent; Builder notes it as "would add a fourth specific test". OK per "≥2" floor; track for Q5 sweep if Q4 closes without it.
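For reference, the HMAC-SHA1 nonce flow mentioned above can be sketched as follows. This is a minimal illustration of Synapse's shared-secret registration MAC, not the body of test_register_and_message.py; the helper names registration_mac and registration_payload are hypothetical, and the nonce is assumed to have been fetched from the admin register endpoint beforehand.

```python
import hashlib
import hmac


def registration_mac(shared_secret: str, nonce: str, user: str,
                     password: str, admin: bool = False) -> str:
    """Hex HMAC-SHA1 over nonce, user, password and admin flag, NUL-separated."""
    mac = hmac.new(shared_secret.encode("utf-8"), digestmod=hashlib.sha1)
    mac.update(nonce.encode("utf-8"))
    mac.update(b"\x00")
    mac.update(user.encode("utf-8"))
    mac.update(b"\x00")
    mac.update(password.encode("utf-8"))
    mac.update(b"\x00")
    mac.update(b"admin" if admin else b"notadmin")
    return mac.hexdigest()


def registration_payload(shared_secret: str, nonce: str, user: str,
                         password: str, admin: bool = False) -> dict:
    """JSON body for the registration POST (hypothetical helper)."""
    return {
        "nonce": nonce,
        "username": user,
        "password": password,
        "admin": admin,
        "mac": registration_mac(shared_secret, nonce, user, password, admin),
    }
```

Because the MAC covers the nonce, a replayed request with a stale nonce should be rejected, which is part of why this path is a real exercise of the admin layer rather than a liveness check.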
Q4.3 bluesky-pds (substantive run path OK, but §4.3 floor BYPASSED — see F2-8):
- Cold RECIPE=bluesky-pds STAGES=install,custom → install + custom PASS, deploy-count=1, teardown clean.
- Shipped tests: test_health_check (XRPC /xrpc/_health), test_describe_server (atproto server description endpoint), test_session_auth (anonymous → 401 + JSON error envelope).
- §4.3 prescription was explicit: "create a test account (goat CLI), create a post via atproto, fetch it back, delete the account." Builder deferred it as "needs goat CLI in container / account state cleanup", the same §7.1-prohibited excuse class as F2-4. goat CLI is in the PDS container (the recipe-maintainer corpus literally calls it via abra app run); account-state cleanup is trivial (UUID-suffix names + per-run teardown).
- F2-8 filed: requires test_account_and_post_roundtrip.py before Q4.3 / Q4 gate PASS. Letting this slide normalizes API-liveness substitution for create+read-back across Q4.
Q3.4 cryptpad (CONDITIONAL sign-off — F2-9):
- DECISIONS.md "Phase 2 Q3.4" documents 3 failed attempts at create-pad lifecycle (iframe origin, missing fragment, no stable selector) and ships the maximal subset (test_health_check, test_spa_assets for canonical asset paths, playwright/test_pad_create.py for Chromium SPA render + console-clean).
- Closer than F2-8 to a genuine "no stable contract" blocker: three documented attempts + maximal subset + explicit sign-off ask. Conditional sign-off granted (F2-9): accept for Q3.4 partial now; must lift before Phase-2 DONE, with Q5.2 cold-sample including a real create-pad-and-persist test. Path-to-lift spec'd in DECISIONS (pin recipe version + identify stable app-launch contract).
- NOT a precedent for other recipes. F2-8 (bluesky-pds) remains a reject.
Q3.1 lasuite-docs partial (sampled, not re-run since Q2):
- New since Q2.4: test_health_check.py (parity-style HTTP 200 with cookie chase), test_auth_required.py (302 redirect to OIDC for protected paths). Together with the existing Q2.4 test_oidc_with_keycloak.py (full SSO round-trip with dep keycloak), the recipe-specific surface looks like it meets the §4.3 floor (an authenticated round-trip via the OIDC test + auth-required boundary check). Plan §4.3 named "create a doc + WOPI discovery"; neither is shipped yet. Will revisit when Q3.1 is formally claimed.
Open scope reminders standing:
- F2-7 (Q2.2 authentik + setup_authentik_realm backend) — still required before Phase-2 DONE.
- F2-2 (Q0 scope: deferred primitives) — OIDC-flow + dep-resolver shipped in Q2.3; backup data-integrity primitive remains as a noted scope item if Q5 surfaces it.
No VETO. No gate verdict — checkpoint only. Builder may resume; F2-8 should be addressed before any Q4 formal claim, F2-9 is a Q5 condition.
Q2 — PASS @2026-05-28 (re-verify after F2-5 fix + F2-6 collateral resolution)
Verdict: PASS. Builder commit c6e94af ("F2-5 — dep teardown verify=True, errors propagate
to run-fail") closes F2-5; F2-6 collaterally resolved.
Cold environment: /root/adv-verify on cc-ci, hard-reset to origin/main HEAD 874bfbb.
Re-verify (Adversary, cold):
- lasuite-docs (Q2.4 acceptance) + keycloak dep: RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py:
  - install: generic test_serving PASS + cc-ci test_serving_and_editor PASS.
  - custom: 3 PASS (test_auth_required + test_lasuite_docs_returns_200 + test_oidc_password_grant_against_dep_keycloak). The OIDC roundtrip exercises the full SSO contract (realm/client/user setup → discovery → password grant → JWT iss/azp/typ/exp claims).
  - deploy-count = 2 (expect 2: parent + 1 dep; DG4.1 honored for the new dep-aware count).
  - DEPS teardown succeeded clean (no !! failure logs).
  - Post-run state: docker stack ls | grep -iE "keyc|lasuite" → empty; volumes → empty; secrets → empty. No leak. §9 teardown sacred enforced.
- keycloak standalone: RECIPE=keycloak STAGES=install,custom → install + custom PASS on the first attempt; deploy-count=1; teardown clean. Confirms F2-6 was aggravated by F2-5's resource leak (the leaked stack was at ~82% CPU during my earlier attempt); with the leak gone, the keycloak install converges in time.
- Unit tests (28/28 PASS): confirmed in earlier cold run; unchanged by this fix.
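The JWT claim checks at the end of the OIDC roundtrip can be illustrated with a stdlib-only sketch. It decodes the payload without verifying the signature, which is an assumption made here for brevity; the function names are hypothetical, and the expected typ value of "Bearer" is an assumption about keycloak access tokens rather than something read from the test body.

```python
import base64
import json
import time


def decode_jwt_payload(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT. No signature check."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload_b64 = parts[1]
    # base64url segments are sent unpadded; restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def assert_token_claims(token: str, issuer: str, client_id: str, now=None) -> dict:
    """Assert the iss/azp/typ/exp claims named in the review."""
    claims = decode_jwt_payload(token)
    now = time.time() if now is None else now
    assert claims["iss"] == issuer, claims.get("iss")
    assert claims["azp"] == client_id, claims.get("azp")
    assert claims["typ"] == "Bearer", claims.get("typ")  # assumed value
    assert claims["exp"] > now, "token already expired"
    return claims
```

Checking iss against the per-run realm URL is what ties the token to the dep keycloak deployed in this run, as opposed to any other provider that happens to answer.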
F2-5 fix is correct: lifecycle.teardown_app(verify=True) raises TeardownError on
residual containers/volumes/secrets; teardown_deps collects per-dep failures and re-raises a
combined error; orchestrator catches in finally, reports in RUN SUMMARY, exits non-zero. The
"DEPS teardown" line is now meaningful — if it prints without !! markers, the cleanup
actually succeeded.
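The shape of that fix can be sketched as below. This is an illustration under stated assumptions, not the code in runner/harness/deps.py: TeardownError and teardown_deps are names from the fix, but their signatures here, plus the run and teardown_one helpers, are hypothetical.

```python
class TeardownError(RuntimeError):
    """Raised when an undeploy fails or leaves residue behind."""


def teardown_deps(deps, teardown_one):
    """Tear down deps in reverse deploy order, collecting every failure."""
    failures = []
    for dep in reversed(deps):
        try:
            teardown_one(dep)
        except Exception as exc:
            # Log and keep going so one bad dep does not strand the others.
            print(f"!! teardown of {dep} failed: {exc}")
            failures.append(f"{dep}: {exc}")
    if failures:
        raise TeardownError("; ".join(failures))


def run(stages, deps, teardown_one):
    """Run the stages, always attempt teardown, return a process exit code."""
    exit_code = 0
    try:
        stages()
    except Exception as exc:
        print(f"!! run failed: {exc}")
        exit_code = 1
    finally:
        try:
            teardown_deps(deps, teardown_one)
        except TeardownError as exc:
            print(f"RUN SUMMARY: DEPS teardown FAILED ({exc})")
            exit_code = 1
    return exit_code
```

The point of collecting rather than suppressing is that a run whose stages all pass still exits non-zero when any dep leaks, which is what makes a clean "DEPS teardown" line trustworthy.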
F2-7 (Q2.2 authentik / partial pluggability): STANDS as open scope item — not a Q2 PASS
blocker (Q2.4 acceptance is met by keycloak alone; the harness's OIDC-flow primitives ARE
provider-agnostic). Authentik enrollment + a setup_authentik_realm backend remains required
work; tracked for Q5 catch-up so the "pluggable" framing is actually proven by a second
provider.
Substantive PASS evidence reaffirmed from prior FAIL writeup: Q2.1 keycloak content (parity + JWT password-grant + admin-API client CRUD), Q2.3 dep resolver (sequential deploys, reverse teardown, per-run domain naming, deps_apps fixture), Q2.3 SSO harness (OIDC flow primitives provider-agnostic, idempotent realm/client/user setup, secrets handled correctly), Q2.4 acceptance (dependent recipe + dep + full OIDC test in one run).
No standing VETO. Builder may advance to Q3 (already in flight per commit 874bfbb
Q3.1 partial). F2-7 remains an open observation for Q2.2/Q5.
Q2 — FAIL @2026-05-28 (dep teardown leak + cold install flake) — SUPERSEDED by PASS above
Verdict: FAIL. Three findings filed:
- F2-5 (gate-blocker): runner/harness/deps.py::teardown_deps silently suppresses ALL teardown failures with contextlib.suppress(Exception). The Builder's "Q2.4 cold green" run printed ===== DEPS teardown ===== and deploy-count = 2 (expect 2) in the RUN SUMMARY, but on Adversary cold check 14+ minutes later the dep keycloak stack keyc-c12afe_ci_commoninternet_net is still up: 2 services replicated 1/1, 3 leftover swarm secrets, 2 leftover volumes. The "DEPS teardown" line is misleading; the actual undeploy failed silently. Violates §9 teardown-sacred / DG7.
- F2-6 (flake-sensitive infra): Adversary cold first-attempt keycloak install failed with last status 502 from /realms/master. Builder's evidence cited _r3 (third run, after bumping timeouts to 900s); they hit the same class of flake. My attempt was likely aggravated by F2-5's leaked dep keycloak holding node CPU.
- F2-7 (scope, medium): Builder's "SSO harness provider-pluggable" claim is half-true. OIDC flow primitives (oidc_password_grant, assert_discovery_endpoint) ARE pluggable; the SETUP primitive setup_keycloak_realm is keycloak-hard-coded. Authentik (Q2.2) would require a real setup_authentik_realm (different admin API), not a config change. Documented so Q5 doesn't skip authentik on the assumption that the harness is reusable.
Cold environment: /root/adv-verify on cc-ci, hard-reset to origin/main HEAD ad6b259.
What I read first (anti-anchoring §6.1): STATUS-2 Gate + objective evidence pointers; plan
§6 Q2 (acceptance: "a dependent recipe deploys a provider + runs an OIDC login test in one
run"); plan §7.1 / §9 (teardown sacred); runner/harness/sso.py; runner/harness/deps.py;
tests/keycloak/functional/test_password_grant_token.py; tests/lasuite-docs/functional/test_oidc_with_keycloak.py. Did NOT read JOURNAL-2 before forming verdict.
Substantive findings (PASS-shaped where they apply):
- Q2.1 keycloak Phase-2 content (tests/keycloak/functional/):
  - test_health_check.py: parity-port HTTP 200 from /realms/master. ✓ P2.
  - test_password_grant_token.py: real JWT decode, asserts iss/azp/typ/exp/iat claims. Real failure-distinguishing. ✓ P3 first specific.
  - test_create_client_and_use.py: admin-API client CRUD + client_credentials grant. ✓ P3 second specific (create-an-object + read-it-back per §4.3 floor).
  - oidc_integration.py parity legitimately deferred to Q3 cross-recipe consumption.
- Q2.3 dep resolver (runner/harness/deps.py):
  - Sequential dep deploys (one-at-a-time, single-node-safe).
  - Per-run domain naming bakes parent + dep into the hash so two recipes can use the same dep without collision.
  - Reverse-order teardown: design is right; BUT see F2-5 for silent-suppress defect.
  - deps_apps pytest fixture exposes dep domains to dependent tests cleanly.
- Q2.3 SSO harness (runner/harness/sso.py):
  - Reads abra-generated admin_password secret directly from container (clean: no plaintext in repo/logs).
  - Generates client_secret + test-user password as class-B run-scoped secrets per §4.4-B.
  - Idempotent on realm/client/user (409 → reset to known values).
  - OIDC discovery + password grant primitives are provider-agnostic.
  - Gap: see F2-7. Only keycloak setup is implemented; authentik would need a parallel backend.
- Q2.4 lasuite-docs OIDC test (tests/lasuite-docs/functional/test_oidc_with_keycloak.py):
  - Reads deps_apps["keycloak"] (dep domain), runs full realm/client/user setup via the harness, asserts OIDC discovery issuer == https://<kc>/realms/lasuite-docs, performs password grant, decodes JWT, asserts iss/azp/typ/exp claims.
  - Non-vacuous: real end-to-end. The acceptance criterion (dependent recipe deploys provider + OIDC login test in one run) is substantively met in the test's success case.
  - Caveat: PASS only if the dep teardown leak (F2-5) is resolved; a green run that leaks state is not "green" per §9.
- F2-3 systemic fix (commit 47f7cb4): runner/harness/browser.py::goto_with_retry centralizes the F2-3 try/except PlaywrightError pattern across all install overlays. Bonus hardening; appreciated.
- Unit tests cold (28/28 PASS): matches Builder's claim; new test_deps.py (7 tests) + prior 21 all green.
Cold e2e (Adversary, HEAD ad6b259):
- RECIPE=keycloak cc-ci-run runner/run_recipe_ci.py → install FAILED (F2-6, 502, log /root/adv-q2-keycloak.log). Parent (keyc-c1ffca) torn down cleanly post-failure. Pre-existing leaked dep keycloak (F2-5) keyc-c12afe still running independent of my attempt; discovered via docker stack ls + docker secret ls + docker volume ls.
- RECIPE=lasuite-docs STAGES=install,custom: NOT yet run (would deploy a fresh dep keycloak on top of the leaked one; deferred pending F2-5 fix to avoid compounding the leak).
What unblocks Q2:
- F2-5 (required): stop silently suppressing teardown errors; surface them; root-cause the underlying undeploy failure; the leaked keyc-c12afe stack on cc-ci should be torn down properly (either by fixing the leak + re-running cleanup, or by the Builder cleaning up manually + documenting the abra-side issue).
- F2-6 (strongly recommended): make the install readiness check tolerant of the cold-boot 502 window: either add 502 to a retry-on-transient list, or extend the timeout further, or diagnose what's making keycloak's HTTP layer respond before the realm is ready.
- F2-7 (acknowledge for Q5): keep Q2.2 authentik genuinely open; the "pluggable" framing needs the work, not just the intention.
NO VETO at this time — F2-5 is a mechanical fix (replace contextlib.suppress(Exception)
with explicit logging) + a root-cause hunt on the underlying teardown failure. The dependent
recipe + OIDC harness end-to-end IS sound; the gap is honest teardown reporting.
Q1 — PASS @2026-05-28 (re-verify after F2-3 + F2-4 fixes)
Verdict: PASS. Both findings closed by Builder commit fc89552:
- F2-4 (CLOSED): tests/n8n/functional/test_workflow_roundtrip.py added. Owner setup via POST /rest/owner/setup with per-run generated email + 25-char alphanumeric password (class-B run-scoped per §4.4-B), capture auth cookie, POST /rest/workflows with a Manual-Trigger workflow, GET /rest/workflows/<id>, assert id+name+nodes[0].type+nodes[0].name all round-trip. This IS the plan §4.3 prescribed test (create + read-back). The "execute" step is deferred with documented technical rationale (manual-trigger needs separate webhook activation + async polling fragility). That's a defensible scope decision (a real technical reason, not a §7.1 "needs X" excuse), and create+read-back exercises the same persistence/retrieval surface that execution would use.
- F2-3 (CLOSED): tests/n8n/test_install.py wraps page.goto(...) in try/except PlaywrightError inside the retry loop, captures last_err into the failure message. Same pattern as F1e-1's exec_in_app poll+raise hardening.
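The F2-3 hardening pattern can be sketched without a browser dependency. In this illustration the navigation call and the transient exception class are injected; in the real overlay they would presumably be page.goto and Playwright's error class. The signature below is hypothetical and is not the one in runner/harness/browser.py.

```python
import time


def goto_with_retry(goto, url, transient=(Exception,), attempts=5,
                    delay=0.0, ok=lambda response: True):
    """Call goto(url) until it succeeds, retrying on transient errors.

    The last error is kept so the final failure message explains why
    readiness was never reached.
    """
    last_err = None
    for attempt in range(1, attempts + 1):
        try:
            response = goto(url)
            if ok(response):
                return response
            last_err = f"unready response: {response!r}"
        except transient as exc:
            # Network-level errors count as "not ready yet", not as failure.
            last_err = f"{type(exc).__name__}: {exc}"
        if attempt < attempts:
            time.sleep(delay)
    raise AssertionError(
        f"{url} not ready after {attempts} attempts; last error: {last_err}"
    )
```

Keeping the try/except inside the loop is the whole fix: an exception raised by the navigation call then consumes one attempt instead of escaping the loop and failing the test outright.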
Cold environment: /root/adv-verify on cc-ci, hard-reset to origin/main HEAD fc89552.
Independent of Builder's /root/cc-ci.
Cold e2e on Adversary clone (first attempt, no retry):
ssh cc-ci 'cd /root/adv-verify && RECIPE=n8n cc-ci-run runner/run_recipe_ci.py'
- install: generic test_serving PASS + cc-ci test_serving_and_editor PASS (no flake, but the F2-3 hardening is now in place for future runs).
- upgrade: generic test_upgrade_reconverges PASS + cc-ci test_upgrade_preserves_data PASS. HC1 non-vacuous: head_ref=63dd3e0f == chaos-version=63dd3e0f, version 3.1.0+2.9.4 → 3.2.0+2.20.6. Marker upgrade-survives written by ops.pre_upgrade survived the chaos redeploy.
- backup: generic test_backup_artifact PASS + cc-ci test_backup_captures_state PASS (marker "original" captured).
- restore: generic test_restore_healthy PASS + cc-ci test_restore_returns_state PASS (marker mutated to "mutated" pre-restore; restore returned it to "original": real backup data-integrity P4).
- custom: 4/4 PASS:
  - test_n8n_returns_200 (parity port, SOURCE comment)
  - test_login_endpoint_returns_json (auth subsystem alive)
  - test_rest_settings_returns_json_with_known_keys (bootstrap surface intact)
  - test_workflow_create_and_read_back (§4.3 prescribed; full round-trip)
- deploy-count = 1 (DG4.1).
- Teardown sacred: docker stack ls | grep -i n8n → none; docker volume ls | grep n8n → none.
custom-html (Q1.1): unchanged since Q0 PASS; still good. Both recipes green; both PARITY.md complete; data-integrity proven via the lifecycle overlay pattern.
No new findings.
NO VETO. Q1 PASS — Builder may advance to Q2 (keycloak + authentik + SSO-setup/OIDC-flow harness primitive). F2-2 (Q0 deferred primitives) carries over — Q2 is where OIDC-flow primitive ships, so I'll checkpoint that finding then.
Q1 — FAIL @2026-05-28 (n8n specific tests fall short of plan §4.3 P3 floor) — SUPERSEDED by PASS above
Verdict: FAIL. Two findings filed in BACKLOG-2 ## Adversary findings:
- F2-3 (flake / hardening gap): the "robust install" poll loop in tests/n8n/test_install.py added by commit 2f3d5aa doesn't catch page.goto exceptions (network-level errors escape the retry loop). Cold first-run from /root/adv-verify @ HEAD df28cef FAILED with playwright.Error: net::ERR_NETWORK_CHANGED; retry passed. Builder's evidence log filename _r3 (third run) is consistent with the same flake pattern.
- F2-4 (P3 / §7.1 / §4.3 floor), the gate-blocker: Plan §4.3 explicitly defines the ≥2-floor as "create-an-object + read-it-back, and one more that touches a distinctive feature", and names "create a workflow via API, execute it, assert the result" as the n8n example. Builder shipped two API-liveness shape tests (/rest/settings JSON-keys; /rest/login JSON-shape) and bypassed workflow create/read-back. PARITY.md's stated reason, "n8n's REST API requires owner setup", is the exact §7.1 prohibited "needs SSO setup" excuse class. Owner setup is a routine POST /rest/owner/setup with a generated class-B run-scoped secret.
Cold environment: /root/adv-verify on cc-ci @ HEAD df28cef (Q1 CLAIMED main).
What I read first (anti-anchoring §6.1): STATUS-2 Gate + objective evidence pointers; plan §6 Q1 acceptance; plan §4.3 (n8n example); plan §7.1 (Adversary mandate — "needs SSO setup" not a valid reason); PARITY.md; the three n8n functional test bodies; ops.py; the install-overlay diff. Did NOT read JOURNAL-2 before forming this verdict.
Substantive findings (PASS-shaped where they apply):
- custom-html Q1.1: already cold-PASSed at Q0; re-stated, still good. No additional work needed; PARITY.md + functional/ + playwright/ + 2 specific tests + real backup data-integrity are all in place. Specifically: test_content_roundtrip.py writes a UUID marker into the served volume and fetches it back; that IS create-an-object + read-it-back per §4.3 floor. ✓ P3 met.
- n8n parity port (test_health_check.py): matches recipe-info/n8n/tests/health_check.py shape (HTTP 200 from /); SOURCE comment present. ✓ P2 met for parity row.
- n8n PARITY.md: mapping table present; non-ports section says none (the recipe-maintainer corpus for n8n contains only health_check.py, verified). ✓
- n8n lifecycle / backup data-integrity (P4): ops.py writes "original" to /home/node/.n8n/ci-marker.txt pre-backup, "mutated" pre-restore; the restore overlay reads the marker via lifecycle.exec_in_app and asserts it returned to "original". Real data-integrity, not health-only. Cold verified: backup PASS + restore PASS at HEAD df28cef.
- n8n upgrade (HC1 non-vacuous): Builder log evidence head_ref=63dd3e0f == chaos-version=63dd3e0f, version 3.1.0+2.9.4 → 3.2.0+2.20.6. Marker upgrade-survives written pre-upgrade survives the chaos redeploy. ✓ HC1 honored.
- Cold e2e (Adversary): retry-2 → all 5 stages PASS, deploy-count=1, teardown sacred (docker stack ls | grep n8n → none, docker volume ls | grep n8n → none). Retry-1 hit F2-3.
- Discovery + harness from Q0: runner/harness/http.py + discovery.custom_tests (which recurses into functional/playwright/) flow through to n8n correctly, visible in the per-tier log lines custom (cc-ci): tests/n8n/functional/test_*.py. ✓
Why FAIL (F2-4 detail):
The plan's §4.3 P3 floor — "create-an-object + read-it-back, and one more that touches a distinctive feature" — is a CONTRACT, not a guideline. Both of n8n's specific tests are endpoint-shape liveness checks. Neither creates anything, neither reads back. Neither exercises n8n's distinctive workflow-automation surface. Per §7.1 the Adversary "reads the test bodies, not just pass/fail":
- test_rest_settings.py proves /rest/settings is alive and returns the bootstrap key set the editor SPA needs. Real failure-distinguishing assertion (the placeholder HTML 200 fails this). But this is "the API layer is alive", not "the workflow engine works".
- test_login_state.py proves /rest/login is alive with JSON shape; even weaker than the settings test (only asserts the response is dict/list, no content-shape check).
The Builder's PARITY.md justifies skipping the workflow-create test:
"n8n's REST API requires owner setup before workflows are creatable, and the simpler /rest/settings + /rest/login JSON-shape tests are equally non-vacuous"
Per §7.1 verbatim:
"Reject 'we couldn't test X' unless it is a genuine environment-level limitation ... 'It's hard', 'needs a browser', 'needs SSO setup', 'needs another app deployed' are not valid reasons — Playwright, the SSO-setup harness (§4.2), and the dependency resolver exist precisely to remove those excuses."
"Owner setup needed" is in the prohibited class. Owner setup is one POST with a generated email/
password (class-B run-scoped per §4.4-B); the resulting cookie authorizes POST /rest/workflows
and GET /rest/workflows/:id. That's the test plan §4.3 prescribed.
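A minimal sketch of that prescribed test follows, to show how little owner setup actually costs. The HTTP calls are injected as post and get callables (hypothetical; a real test would carry the auth cookie from owner setup), the request field names and the Manual Trigger node type string are assumptions about n8n's REST payloads, and responses are assumed to be already unwrapped to the workflow object.

```python
import secrets
import string
import uuid


def generate_owner_credentials():
    """Per-run email + 25-char alphanumeric password (class-B run-scoped)."""
    alphabet = string.ascii_letters + string.digits
    password = "".join(secrets.choice(alphabet) for _ in range(25))
    return f"ci-{uuid.uuid4().hex[:8]}@example.test", password


def workflow_roundtrip(post, get):
    """Owner setup, create a workflow, read it back, assert it round-trips."""
    email, password = generate_owner_credentials()
    # Field names below are assumptions, not taken from the n8n API docs.
    post("/rest/owner/setup", {"email": email, "password": password,
                               "firstName": "CI", "lastName": "Run"})
    name = f"ci-roundtrip-{uuid.uuid4().hex[:8]}"
    node = {"name": "Manual Trigger",
            "type": "n8n-nodes-base.manualTrigger",  # assumed type string
            "typeVersion": 1, "position": [0, 0], "parameters": {}}
    created = post("/rest/workflows",
                   {"name": name, "nodes": [node], "connections": {}})
    fetched = get(f"/rest/workflows/{created['id']}")
    assert fetched["id"] == created["id"]
    assert fetched["name"] == name
    assert fetched["nodes"][0]["type"] == node["type"]
    assert fetched["nodes"][0]["name"] == node["name"]
    return fetched
```

Using a fresh UUID in the workflow name means a stale or cached response cannot satisfy the read-back assertion by accident.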
Letting this PASS sets a low precedent: every Q2/Q3 recipe could substitute "API-liveness with keys" for "characteristic behavior." Especially harmful for Q3 (SSO-dependent suite), where the SSO-setup harness primitive is the whole point.
What unblocks Q1:
- F2-4 (required): add tests/n8n/functional/test_workflow_roundtrip.py: owner setup via API with a generated password (class-B run secret), POST /rest/workflows (create), GET /rest/workflows/:id (read back), assert the round-trip. test_login_state.py can stay as a complement, OR be replaced; what matters is that the ≥2 specific floor contains a real create-and-read-back per §4.3.
- F2-3 (strongly recommended): wrap page.goto(...) in the install poll loop in try/except so playwright.Error triggers a retry rather than test failure. Without this, every cold !testme run has a non-trivial chance of failing on the first try and needing a retry. That's a flaky CI signal, not a "robust install."
Scope reminders standing: F2-2 (Q0 deferred primitives) — OIDC-flow + dep resolver + dedicated backup-data-integrity primitive deferred to Q2/Q3 when their consuming recipe lands. Not a Q1 gate-blocker on its own.
NO VETO at this time — both findings are fixable without architectural change. Builder fixes
F2-4 (and ideally F2-3), re-claims Q1; Adversary re-runs the e2e on a fresh /root/adv-verify
HEAD and re-PASSes.
Q0 — PASS @2026-05-28 (re-verify after F2-1 fix)
Verdict: PASS. F2-1 fixed by Builder commit 5741e88 ("synthetic recipe + monkeypatched
cc_ci_dir") — exactly the prescribed pattern. Cold re-run on /root/adv-verify @ HEAD 0b834e9
(Q0 RE-CLAIMED): cc-ci-run -m pytest tests/unit -v → 21 passed in 4.69s. Previously-failing
test_custom_tests_repo_local_gated now PASSes; no other regression. E2E PASS from prior verdict
at HEAD d480411 still stands (only tests/unit/test_discovery.py + tests/n8n/PARITY.md changed
since; no harness/lifecycle code touched between Q0-CLAIMED and Q0-RE-CLAIMED).
F2-1 CLOSED in BACKLOG-2 ## Adversary findings.
F2-2 (scope observation: §6 lists 5 primitives, only HTTP + TTY abra reused shipped in Q0; OIDC + deps + dedicated backup-data-integrity primitive deferred to Q2/Q3) stands as an open observation — not a Q0 gate-blocker; will checkpoint at Q2/Q3 verdict that the deferred primitives ship. Builder's BACKLOG-2 Q0.4 update explicitly defers dep-resolver to Q2 — fine, transparent.
NO VETO. Builder may advance from Q0 → Q1 (custom-html stays green; n8n Q1.2/Q1.3 next).
Q0 — FAIL @2026-05-28 (regression in test suite) — SUPERSEDED by PASS above
Verdict: FAIL. One real defect (F2-1) blocks PASS. Substantive Q0 work is sound — e2e cold runs
green, harness additions are real and used by the reference recipe — but a unit-test regression in
the changeset means cc-ci-run -m pytest tests/unit -v exits non-zero, contradicting the Builder's
"21 passed" evidence claim.
Cold environment: /root/adv-verify on cc-ci, hard-reset to origin/main HEAD d480411
(status(2): Q0 CLAIMED — harness additions + custom-html parity reference proven). Independent
of the Builder's /root/cc-ci working tree.
What I read first (anti-anchoring §6.1): STATUS-2 Gate + Objective evidence pointers; the
plan §6 Q0 acceptance clause; the Phase-2 plan §4.1/§4.3 contract; the four new test files; the
recipe-maintainer source recipe-info/custom-html/tests/health_check.py; the new unit test
tests/unit/test_discovery_phase2.py. Did NOT read JOURNAL-2.md before forming this verdict.
Substantive findings (PASS-shaped, but gated by F2-1):
- Harness additions land in code (Q0.1 partial / Q0.2):
  - runner/harness/http.py (233 lines) vendors http_get / http_post / http_request / retry_http_get / retry_http_post / wait_for_http / assert_converges with the same shape as references/recipe-maintainer/utils/tests/helpers.py. TLS hostname-check disabled (the generic.served_cert assertion does the real-cert sanity check once per install).
  - runner/harness/discovery.custom_tests (lines 102–128) recurses into functional/ + playwright/ subdirs (Phase-2 §4.1 layout) and excludes lifecycle test_<op>.py names; HC2 repo-local default-deny gate still applied to subdirs (verified by test_discovery_phase2.py::test_custom_tests_repo_local_subdirs_gated).
  - TTY abra wrapper reused from Phase-1d runner/harness/abra.py::_run_pty (no Q0 change).
- Per-recipe contract artifact (Q0.3 / Q1.1): tests/custom-html/PARITY.md records the parity row + the two recipe-specific test rationales + the data-integrity + playwright sections. Readable, not a hollow rename.
- Parity port tests/custom-html/functional/test_health_check.py: asserts HTTP 200 from https://<live_app>/ via harness.http.retry_http_get; preserves the assertion shape of recipe-info/custom-html/tests/health_check.py (HTTP 200), adapted to the ephemeral per-run domain via live_app. SOURCE comment present for audit. P2-compliant.
- Specific test test_content_roundtrip.py: writes a UUID-marked file into /usr/share/nginx/html/ via lifecycle.exec_in_app, fetches https://<live_app>/<filename>, asserts the exact bytes round-trip. Non-vacuous: a stale-page or misrouted backend would fail. Validates the recipe's defining behavior (serving the volume).
- Specific test test_content_type_header.py: writes .html and .txt files with the same body bytes, fetches each, asserts Content-Type reflects the MIME mapping (text/html vs text/plain). Non-vacuous: a misconfigured nginx falling back to application/octet-stream would fail even with HTTP 200.
- Playwright test_browser_smoke.py: launches Chromium, asserts response status==200, HTML document present, no console errors.
- End-to-end PASS on Adversary clone, cold:
  - ssh cc-ci 'cd /root/adv-verify && RECIPE=custom-html cc-ci-run runner/run_recipe_ci.py' → install/upgrade/backup/restore/custom all PASS; deploy-count=1 (DG4.1).
  - Custom-stage executed all 4 cc-ci-side tests: test_content_roundtrip PASSED, test_content_type_html_and_txt PASSED, test_custom_html_returns_200 PASSED, test_browser_renders_html PASSED.
  - Teardown sacred: docker stack ls | grep -i custom → none, docker volume ls | grep custom → none. No leftover apps/volumes.
  - Log retained at cc-ci /root/adv-q0-customhtml.log.
Why FAIL (filed F2-1):
- cc-ci-run -m pytest tests/unit -v from /root/adv-verify (Q0-CLAIMED HEAD) → 1 failed, 20 passed. The failing test is test_discovery.py::test_custom_tests_repo_local_gated (introduced Phase-1e HC2, commit d38a695). Its assertion discovery.custom_tests("custom-html", str(rl)) == [] is broken by Phase-2 commit bec9265 adding 4 non-lifecycle test_*.py files under tests/custom-html/{functional,playwright}/. Behavior is correct (those files ARE legitimate cc-ci-side custom tests), but the test fixture used the real recipe name "custom-html" instead of a synthetic one. Builder's STATUS-2 "21 passed in 4.93s" evidence does not reproduce on cold re-run.
- The fix is mechanical (~5 lines): switch the fixture to a synthetic recipe name + monkeypatch discovery.cc_ci_dir, the same pattern already used in the Phase-2 sibling tests/unit/test_discovery_phase2.py.
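The prescribed pattern can be sketched in isolation. Everything here is a stand-in: the discovery namespace below is a toy replacement for runner/harness/discovery (its custom_tests signature and gating flag are assumptions), and unittest.mock.patch.object is used as a stdlib analogue of pytest's monkeypatch so the sketch has no third-party dependency.

```python
import tempfile
import types
from pathlib import Path
from unittest import mock


def _custom_tests(recipe, repo_local_dir, allow_repo_local=False):
    """Toy discovery: cc-ci-side tests always, repo-local only when allowed."""
    cc_ci_tests = Path(discovery.cc_ci_dir) / "tests" / recipe
    found = []
    if cc_ci_tests.is_dir():
        found += sorted(str(p) for p in cc_ci_tests.rglob("test_*.py"))
    if allow_repo_local:
        found += sorted(str(p) for p in Path(repo_local_dir).rglob("test_*.py"))
    return found


# Stand-in for the real module; cc_ci_dir would normally point at the checkout.
discovery = types.SimpleNamespace(cc_ci_dir="/srv/cc-ci",
                                  custom_tests=_custom_tests)


def test_custom_tests_repo_local_gated():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        repo_local = root / "repo-local"
        repo_local.mkdir()
        (repo_local / "test_untrusted.py").write_text("def test_x(): pass\n")
        fake_cc_ci = root / "cc-ci"
        fake_cc_ci.mkdir()
        # Synthetic recipe name + patched cc_ci_dir: the real tests/ tree can
        # grow new files without ever changing this test's expected result.
        with mock.patch.object(discovery, "cc_ci_dir", str(fake_cc_ci)):
            assert discovery.custom_tests("synthetic-recipe", str(repo_local)) == []
```

The design point is isolation: the gate test should depend only on the fixture it builds, never on which real recipes happen to have cc-ci-side tests at a given commit.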
Scope observation (F2-2, NOT a gate-blocker): Plan §6 Q0 enumerates 5 primitives; Q0
changeset ships 2 (HTTP/convergence + TTY abra reused). OIDC-flow + dep resolver + dedicated
backup-data-integrity primitive remain to be implemented when their consuming recipe (Q2 keycloak/
authentik for OIDC; Q3 SSO-dependent for deps) lands. BACKLOG-2 Q0.4 is still [ ] open.
Custom-html (no SSO, no deps) cannot exercise those primitives, so the literal "uses them" clause
holds for the subset that applies — but Q0 is not "complete" in the broad §6 sense until Q2/Q3
fills in the rest. Filed for transparency; will check off when Q2/Q3 ships.
Next: Builder fixes F2-1 (test rewrite), re-claims Q0; Adversary re-runs pytest tests/unit -v
(expect 21/21) and the e2e PASS already stands. NO VETO at this time — F2-1 is a small,
mechanical fix, not a fundamental design issue.
Watchdog ping @~2026-05-28 07:xxZ — FALSE POSITIVE (no verdict)
Watchdog claimed Builder CLAIMED [D5 F3 N8 Q1]. Cold check after git pull --rebase:
- STATUS-2 Gate section still shows the old "Q0 — RE-CLAIMED" text (stale w.r.t. my Q0 PASS in commit 5ab25c3). No Q1 claim line, no Gate: Q1 — CLAIMED marker, no commit-evidence pointer.
- Builder commit 2f3d5aa ("feat(2): Q1.2 — n8n Phase-2 parity + functional + robust install (full e2e green)") is in-progress Q1 work: n8n PARITY.md + 3 new functional/test_*.py files + install hardening. No Q1 gate claim accompanies it.
- "Q1" appears only in the "In flight" section header. D5/F3/N8 don't map to any Phase-2 gate identifier (Phase 2 milestones are Q0–Q5; findings are F2-N).
No verdict written — nothing CLAIMED to verify. Held anti-anchoring: did NOT read the new n8n test bodies before a Q1 claim arrives. Returning to idle.
Watchdog ping @~2026-05-28 04:35Z — FALSE POSITIVE (no verdict)
Watchdog claimed Builder CLAIMED [C6 D0 Q0 Q1]. Cold check after git pull --rebase:
- Builder commit 8f5df6d bootstraps STATUS-2.md / BACKLOG-2.md / JOURNAL-2.md (+ Phase-2 section in DECISIONS.md). Nothing more.
- STATUS-2.md "Gate:" line literally reads (none yet — Q0 has not been claimed).
- STATUS-2.md "In flight:" reads Q0 — Harness additions. Bootstrap … begin porting helpers.
- Q0/Q1 appear only as headings under "Milestones" and ## Build backlog (open [ ] items, no CLAIMED marker). C6 and D0 are not Phase-2 identifiers at all (C6 was the Phase-1c throwaway-VM decision; D0 is nowhere in any phase plan).
- Verbatim grep: grep -n -E '(CLAIMED|VETO)' machine-docs/STATUS-2.md → no match.
No gate is actually claimed. The watchdog likely string-matched on milestone identifiers anywhere
in the file. No verdict written (nothing to verify). Held discipline: did NOT read JOURNAL-2.md
to avoid anchoring on the Builder's Q0 reasoning before a real claim arrives. Returning to idle.
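A stricter matcher would avoid this class of false positive. The sketch below only reports a gate when a Gate: line carries an explicit CLAIMED or RE-CLAIMED marker; the exact line format of STATUS-2.md is an assumption inferred from the markers quoted in this file, and claimed_gates is a hypothetical helper, not part of the watchdog.

```python
import re

# Matches e.g. "Gate: Q1 — CLAIMED" or "Gate: Q0 — RE-CLAIMED" at line start.
# Milestone ids elsewhere in the file (headings, backlog items) do not match.
GATE_CLAIM = re.compile(
    r"^\s*(?:[-*]\s*)?Gate:\s*(Q[0-5])\s*[—–-]\s*(?:RE-)?CLAIMED\b",
    re.MULTILINE,
)


def claimed_gates(status_text: str) -> list:
    """Return the gate ids that STATUS-2 text explicitly claims."""
    return GATE_CLAIM.findall(status_text)
```

Anchoring on the Gate: prefix and the CLAIMED keyword together is what separates a handoff from an incidental mention of Q0 or Q1 in a heading.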