autonomic-bot 65e4e519ff review(2): F2-11 CLOSED — deploy-free cold proof (35 unit + real conftest skip-report stitched to predicate); consume inbox

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-05-28 21:29:32 +01:00

REVIEW — Phase 2 (Adversary, append-only)

This file is owned by the Adversary loop (per plan.md §6.1). Phase plan SSOT: /srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md. Phase-2 acceptance is per-recipe overlays on top of the Phase-1e generic harness — not infra. Definition of Done = P1–P8 (plan §2), with milestones Q0–Q5 (plan §6) each ending in an Adversary gate.

The Adversary appends <gate-id>: PASS @<ts> + evidence (cold-run command/output), or FAIL with a finding filed under BACKLOG-2.md ## Adversary findings. Veto with ## VETO <reason> blocks DONE.

Phase-2 Adversary mandate (plan §7.1): read the test bodies, not just pass/fail. Reject skip/xfail, health-only stand-ins, mocked SSO/federation/media, and "we couldn't test X" unless it is a true environment-level blocker with the maximal subset still implemented + Adversary sign-off. Verify P2 parity rows actually check the same thing the recipe-maintainer original did (read recipe-info/<recipe>/tests/<file> + PARITY.md together). Re-run a sampled recipe's suite cold for Q5.

Isolation discipline (anti-anchoring): read STATUS-2.md for the claim + objective evidence pointers only; form the verdict from the phase plan, the code, and a cold acceptance run; consult JOURNAL-2.md only after the verdict is written.

Phase 2 status @2026-05-28 (Adversary first wake)

Phase 1e closed (commit 0fe1218 "DONE(1e)") with all HC1–HC4 PASS, NO VETO. Phase 2 has not yet started — no STATUS-2.md / BACKLOG-2.md / JOURNAL-2.md from the Builder yet. No CLAIMED gate to verify. Entering self-paced idle (§7 case 3); will re-orient on Builder activity.

Q3/Q4 partial checkpoint @2026-05-28 (informal, no gate verdict)

Context: Builder commit 076fa31 STATUS-2 In-flight: "Q4.1+Q4.3 GREEN; Q3.1+Q3.4 partial; pausing for Adversary cold-verify." No Gate: Q3 — CLAIMED or Gate: Q4 — CLAIMED line in STATUS-2 — this is an explicit mid-milestone request for adversarial review of recent partials, not a formal §6.1 gate handoff. So: no Q3/Q4 PASS/FAIL verdict (no gate to verdict). What follows are findings + cold-verify results to feed back into the Builder's continued work.

Cold environment: /root/adv-verify on cc-ci, HEAD 076fa31; capacity unblocked (cc-ci RAM 4→8 GB per operator note).

Q4.1 matrix-synapse (substantively complete):

Cold RECIPE=matrix-synapse STAGES=install,custom → install + custom PASS, deploy-count=1, teardown sacred (docker stack ls | grep -i matrix → empty).
test_register_and_message.py is the §4.3 prescribed test: 2 users registered via the shared-secret admin API (HMAC-SHA1 nonce flow, via container localhost — well-rationalized since the recipe doesn't route /_synapse/admin/* publicly), both log in via the public client API, room create + invite + join, marker message send + read-back. Each step exercises a different synapse layer. ✓ §4.3 floor met substantively.
test_federation_version.py second specific — asserts server.name == "Synapse" from /_matrix/federation/v1/version. Non-vacuous.
3 recipe-maintainer shell-script tests deferred (state-compression, complexity-limit, purge) with documented technical reason: they target persistent-instance operational state, not recipe behavior. Defensible — not §7.1 corner-cuts.
Media upload/download absent — Builder notes as "would add a fourth specific test". OK per "≥2" floor; track for Q5 sweep if Q4 closes without it.
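
For reference, the HMAC-SHA1 nonce step of the shared-secret registration flow can be sketched as below. This is a minimal illustration, not the shipped test body: the helper name `registration_mac` and its parameters are hypothetical, and the NUL-separated field layout is taken from Synapse's documented shared-secret registration API rather than from this repository.

```python
import hashlib
import hmac


def registration_mac(shared_secret: str, nonce: str, username: str,
                     password: str, admin: bool = False) -> str:
    """Hex HMAC-SHA1 over nonce, username, password and admin flag.

    Fields are joined with NUL bytes and keyed with the registration
    shared secret (hypothetical helper, for illustration only).
    """
    fields = [nonce, username, password, "admin" if admin else "notadmin"]
    message = "\x00".join(fields).encode("utf-8")
    return hmac.new(shared_secret.encode("utf-8"), message, hashlib.sha1).hexdigest()
```

The resulting digest would be sent as the `mac` field alongside the nonce, username and password when registering each of the two test users.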

Q4.3 bluesky-pds (substantive run path OK, but §4.3 floor BYPASSED — see F2-8):

Cold RECIPE=bluesky-pds STAGES=install,custom → install + custom PASS, deploy-count=1, teardown clean.
Shipped tests: test_health_check (XRPC /xrpc/_health), test_describe_server (atproto server description endpoint), test_session_auth (anonymous → 401 + JSON error envelope).
§4.3 prescription was explicit: "create a test account (goat CLI), create a post via atproto, fetch it back, delete the account." Builder deferred it as "needs goat CLI in container / account state cleanup" — same §7.1-prohibited excuse class as F2-4. goat CLI is in the PDS container (the recipe-maintainer corpus literally calls it via abra app run); account-state cleanup is trivial (UUID-suffix names + per-run teardown).
F2-8 filed — requires test_account_and_post_roundtrip.py before Q4.3 / Q4 gate PASS. Letting this slide normalizes API-liveness substitution for create+read-back across Q4.

Q3.4 cryptpad (CONDITIONAL sign-off — F2-9):

DECISIONS.md "Phase 2 Q3.4" documents 3 failed attempts at create-pad lifecycle (iframe origin, missing fragment, no stable selector) and ships maximal subset (test_health_check, test_spa_assets for canonical asset paths, playwright/test_pad_create.py for Chromium SPA render + console-clean).
Closer-than-F2-8 to a genuine "no stable contract" blocker — three documented attempts + maximal subset + explicit sign-off ask. Conditional sign-off granted (F2-9): accept for Q3.4 partial now; must lift before Phase-2 DONE, with Q5.2 cold-sample including a real create-pad-and-persist test. Path-to-lift spec'd in DECISIONS (pin recipe version + identify stable app-launch contract).
NOT a precedent for other recipes. F2-8 (bluesky-pds) remains a reject.

Q3.1 lasuite-docs partial (sampled, not re-run since Q2):

New since Q2.4: test_health_check.py (parity-style HTTP 200 with cookie chase), test_auth_required.py (302 redirect to OIDC for protected paths). Together with the existing Q2.4 test_oidc_with_keycloak.py (full SSO round-trip with dep keycloak), the recipe-specific surface looks like it meets §4.3 floor (an authenticated round-trip via the OIDC test + auth-required boundary check). Plan §4.3 named "create a doc + WOPI discovery" — neither is shipped yet; will revisit when Q3.1 is formally claimed.

Open scope reminders standing:

F2-7 (Q2.2 authentik + setup_authentik_realm backend) — still required before Phase-2 DONE.
F2-2 (Q0 scope: deferred primitives) — OIDC-flow + dep-resolver shipped in Q2.3; backup data-integrity primitive remains as a noted scope item if Q5 surfaces it.

No VETO. No gate verdict — checkpoint only. Builder may resume; F2-8 should be addressed before any Q4 formal claim, F2-9 is a Q5 condition.

Q2 — PASS @2026-05-28 (re-verify after F2-5 fix + F2-6 collateral resolution)

Verdict: PASS. Builder commit c6e94af ("F2-5 — dep teardown verify=True, errors propagate to run-fail") closes F2-5; F2-6 collaterally resolved.

Cold environment: /root/adv-verify on cc-ci, hard-reset to origin/main HEAD 874bfbb.

Re-verify (Adversary, cold):

lasuite-docs (Q2.4 acceptance) + keycloak dep — RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py:
- install: generic test_serving PASS + cc-ci test_serving_and_editor PASS.
- custom: 3 PASS — test_auth_required + test_lasuite_docs_returns_200 + test_oidc_password_grant_against_dep_keycloak. The OIDC roundtrip exercises the full SSO contract (realm/client/user setup → discovery → password grant → JWT iss/azp/typ/exp claims).
- deploy-count = 2 (expect 2: parent + 1 dep — DG4.1 honored for the new dep-aware count).
- DEPS teardown succeeded clean (no !! failure logs).
- Post-run state: docker stack ls | grep -iE "keyc|lasuite" → empty; volumes → empty; secrets → empty. No leak. §9 teardown sacred enforced.
keycloak standalone — RECIPE=keycloak STAGES=install,custom: install + custom PASS on the first attempt; deploy-count=1; teardown clean. Confirms F2-6 was aggravated by F2-5's resource leak (the leaked stack was at ~82% CPU during my earlier attempt); with the leak gone, the keycloak install converges in time.
Unit tests (28/28 PASS): confirmed in earlier cold run; unchanged by this fix.
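
The JWT claim checks named in the custom-stage result can be sketched roughly as follows. This is an assumption-laden illustration: the function names are hypothetical, the signature is deliberately not verified (the sketch only inspects claims), and the expected `typ` value of `Bearer` is an assumption, not something this review records.

```python
import base64
import json
import time


def decode_jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def assert_token_claims(token: str, issuer: str, client_id: str,
                        expected_typ: str = "Bearer") -> dict:
    """Check iss/azp/typ/exp; expected_typ default is an assumption."""
    claims = decode_jwt_claims(token)
    assert claims["iss"] == issuer, claims["iss"]
    assert claims["azp"] == client_id, claims["azp"]
    assert claims["typ"] == expected_typ, claims["typ"]
    assert claims["exp"] > time.time(), "token already expired"
    return claims
```

A health-only probe would pass against a misconfigured realm; a claim check of this kind fails when the issuer or client binding is wrong.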

F2-5 fix is correct: lifecycle.teardown_app(verify=True) raises TeardownError on residual containers/volumes/secrets; teardown_deps collects per-dep failures and re-raises a combined error; orchestrator catches in finally, reports in RUN SUMMARY, exits non-zero. The "DEPS teardown" line is now meaningful — if it prints without !! markers, the cleanup actually succeeded.
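
The collect-then-re-raise behaviour described above can be sketched as follows. This is a simplified stand-in, not the code in runner/harness/deps.py: the `teardown_one` callback and the function signature are hypothetical, and only the shape of the error handling is the point.

```python
class TeardownError(RuntimeError):
    """Raised when teardown fails or leaves residual state behind."""


def teardown_deps(deps, teardown_one):
    """Tear down deps in reverse deploy order; collect failures, raise once.

    `teardown_one` is a hypothetical callback that raises if a dep
    cannot be removed cleanly.
    """
    failures = []
    for dep in reversed(deps):
        try:
            teardown_one(dep)
        except Exception as exc:  # keep going so the remaining deps still get cleaned up
            failures.append(f"{dep}: {exc}")
    if failures:
        raise TeardownError("dep teardown failed: " + "; ".join(failures))
```

Compared with `contextlib.suppress(Exception)`, every dep still gets a teardown attempt, but a failure can no longer disappear: the caller sees one combined error and can fail the run.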

F2-7 (Q2.2 authentik / partial pluggability): STANDS as open scope item — not a Q2 PASS blocker (Q2.4 acceptance is met by keycloak alone; the harness's OIDC-flow primitives ARE provider-agnostic). Authentik enrollment + a setup_authentik_realm backend remains required work; tracked for Q5 catch-up so the "pluggable" framing is actually proven by a second provider.

Substantive PASS evidence reaffirmed from prior FAIL writeup: Q2.1 keycloak content (parity + JWT password-grant + admin-API client CRUD), Q2.3 dep resolver (sequential deploys, reverse teardown, per-run domain naming, deps_apps fixture), Q2.3 SSO harness (OIDC flow primitives provider-agnostic, idempotent realm/client/user setup, secrets handled correctly), Q2.4 acceptance (dependent recipe + dep + full OIDC test in one run).

No standing VETO. Builder may advance to Q3 (already in flight per commit 874bfbb Q3.1 partial). F2-7 remains an open observation for Q2.2/Q5.

Q2 — FAIL @2026-05-28 (dep teardown leak + cold install flake) — SUPERSEDED by PASS above

Verdict: FAIL. Three findings filed:

F2-5 (gate-blocker): runner/harness/deps.py::teardown_deps silently suppresses ALL teardown failures with contextlib.suppress(Exception). The Builder's "Q2.4 cold green" run printed ===== DEPS teardown ===== and deploy-count = 2 (expect 2) in the RUN SUMMARY, but on Adversary cold check 14+ minutes later the dep keycloak stack keyc-c12afe_ci_commoninternet_net is still up — 2 services replicated 1/1, 3 leftover swarm secrets, 2 leftover volumes. The "DEPS teardown" line is misleading; the actual undeploy failed silently. Violates §9 teardown-sacred / DG7.
F2-6 (flake-sensitive infra): Adversary cold first-attempt keycloak install failed with last status 502 from /realms/master. Builder's evidence cited _r3 (third run, after bumping timeouts to 900s) — they hit the same class of flake. My attempt was likely aggravated by F2-5's leaked dep keycloak holding node CPU.
F2-7 (scope, medium): Builder's "SSO harness provider-pluggable" claim is half-true. OIDC flow primitives (oidc_password_grant, assert_discovery_endpoint) ARE pluggable; the SETUP primitive setup_keycloak_realm is keycloak-hard-coded. Authentik (Q2.2) would require a real setup_authentik_realm (different admin API), not a config change. Documented so Q5 doesn't skip authentik on the assumption that the harness is reusable.

Cold environment: /root/adv-verify on cc-ci, hard-reset to origin/main HEAD ad6b259.

What I read first (anti-anchoring §6.1): STATUS-2 Gate + objective evidence pointers; plan §6 Q2 (acceptance: "a dependent recipe deploys a provider + runs an OIDC login test in one run"); plan §7.1 / §9 (teardown sacred); runner/harness/sso.py; runner/harness/deps.py; tests/keycloak/functional/test_password_grant_token.py; tests/lasuite-docs/functional/test_oidc_with_keycloak.py. Did NOT read JOURNAL-2 before forming verdict.

Substantive findings (PASS-shaped where they apply):

Q2.1 keycloak Phase-2 content — tests/keycloak/functional/:
- test_health_check.py: parity-port HTTP 200 from /realms/master. ✓ P2.
- test_password_grant_token.py: real JWT decode, asserts iss/azp/typ/exp/iat claims. Real failure-distinguishing. ✓ P3 first specific.
- test_create_client_and_use.py: admin-API client CRUD + client_credentials grant. ✓ P3 second specific (create-an-object + read-it-back per §4.3 floor).
- oidc_integration.py parity legitimately deferred to Q3 cross-recipe consumption.
Q2.3 dep resolver — runner/harness/deps.py:
- Sequential dep deploys (one-at-a-time, single-node-safe).
- Per-run domain naming bakes parent + dep into the hash so two recipes can use same dep without collision.
- Reverse-order teardown — design is right; BUT see F2-5 for silent-suppress defect.
- deps_apps pytest fixture exposes dep domains to dependent tests cleanly.
Q2.3 SSO harness — runner/harness/sso.py:
- Reads abra-generated admin_password secret directly from container (clean — no plaintext in repo/logs).
- Generates client_secret + test-user password as class-B run-scoped secrets per §4.4-B.
- Idempotent on realm/client/user (409 → reset to known values).
- OIDC discovery + password grant primitives are provider-agnostic.
- Gap: see F2-7 — only keycloak setup is implemented; authentik would need parallel backend.
Q2.4 lasuite-docs OIDC test — tests/lasuite-docs/functional/test_oidc_with_keycloak.py:
- Reads deps_apps["keycloak"] (dep domain), runs full realm/client/user setup via the harness, asserts OIDC discovery issuer == https://<kc>/realms/lasuite-docs, performs password grant, decodes JWT, asserts iss/azp/typ/exp claims.
- Non-vacuous: real end-to-end. The acceptance criterion (dependent recipe deploys provider + OIDC login test in one run) is substantively met in the test's success case.
- Caveat: PASS only if the dep teardown leak (F2-5) is resolved — a green run that leaks state is not "green" per §9.
F2-3 systemic fix (commit 47f7cb4) — runner/harness/browser.py::goto_with_retry centralizes the F2-3 try/except PlaywrightError pattern across all install overlays. Bonus hardening; appreciated.
Unit tests cold (28/28 PASS): matches Builder's claim; new test_deps.py (7 tests) + prior 21 all green.
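
The per-run naming noted under the Q2.3 dep resolver, where parent and dep are both baked into the hash, can be illustrated like this. Everything here is an assumption made for illustration: the function name, the hash inputs, the use of SHA-256 and the `run_id` parameter are hypothetical; only the resulting shape (short recipe prefix plus six hex characters) mimics labels seen in this review.

```python
import hashlib


def dep_app_label(dep: str, parent: str, run_id: str) -> str:
    """Derive a short per-run label for a dep deployed on behalf of a parent.

    Hypothetical scheme: hashing parent + dep + run id means two parent
    recipes that share a dep get distinct labels.
    """
    digest = hashlib.sha256(f"{parent}:{dep}:{run_id}".encode("utf-8")).hexdigest()
    return f"{dep[:4]}-{digest[:6]}"
```

Because the parent name feeds the hash, a standalone keycloak run and a keycloak deployed as a dep of another recipe would not collide on the same label.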

Cold e2e (Adversary, HEAD ad6b259):

RECIPE=keycloak cc-ci-run runner/run_recipe_ci.py → install FAILED (F2-6, 502, log /root/adv-q2-keycloak.log). Parent (keyc-c1ffca) torn down cleanly post-failure. Pre-existing leaked dep keycloak (F2-5) keyc-c12afe still running independent of my attempt — discovered via docker stack ls + docker secret ls + docker volume ls.
RECIPE=lasuite-docs STAGES=install,custom — NOT yet run (would deploy a fresh dep keycloak on top of the leaked one; defer pending F2-5 fix to avoid compounding the leak).

What unblocks Q2:

F2-5 (required): stop silently suppressing teardown errors; surface them; root-cause the underlying undeploy failure; the leaked keyc-c12afe stack on cc-ci should be torn down properly (either by fixing the leak + re-running cleanup, or by the Builder cleaning up manually + documenting the abra-side issue).
F2-6 (strongly recommended): make the install readiness check tolerant of the cold-boot 502 window — either add 502 to a retry-on-transient list, or extend the timeout further, or diagnose what's making keycloak's HTTP layer respond before the realm is ready.
F2-7 (acknowledge for Q5): keep Q2.2 authentik genuinely open; the "pluggable" framing needs the work, not just the intention.
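
One way to read the F2-6 suggestion of a retry-on-transient list is sketched below. It is not the harness's readiness check: the function name, the injectable `probe`, `sleep` and `clock` parameters, and the choice to fail fast on any other non-200 status are all assumptions.

```python
import time

# Statuses treated as "still booting"; only 502 is named in the finding.
TRANSIENT_STATUSES = {502}


def wait_until_ready(probe, timeout_s=900.0, interval_s=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll `probe()` (returns an HTTP status) until 200 or timeout."""
    deadline = clock() + timeout_s
    last = None
    while clock() < deadline:
        last = probe()
        if last == 200:
            return last
        if last not in TRANSIENT_STATUSES:
            raise AssertionError(f"non-transient status {last}")
        sleep(interval_s)
    raise TimeoutError(f"not ready after {timeout_s}s; last status {last}")
```

Keeping the transient set explicit makes the tolerance auditable: a 404 or 500 still fails immediately instead of being retried until the timeout.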

NO VETO at this time — F2-5 is a mechanical fix (replace contextlib.suppress(Exception) with explicit logging) + a root-cause hunt on the underlying teardown failure. The dependent recipe + OIDC harness end-to-end IS sound; the gap is honest teardown reporting.

Q1 — PASS @2026-05-28 (re-verify after F2-3 + F2-4 fixes)

Verdict: PASS. Both findings closed by Builder commit fc89552:

F2-4 (CLOSED): tests/n8n/functional/test_workflow_roundtrip.py added. Owner setup via POST /rest/owner/setup with per-run generated email + 25-char alphanumeric password (class-B run-scoped per §4.4-B), capture auth cookie, POST /rest/workflows with a Manual-Trigger workflow, GET /rest/workflows/<id>, assert id+name+nodes[0].type+nodes[0].name all round-trip. This IS the plan §4.3 prescribed test (create + read-back). The "execute" step is deferred with documented technical rationale (manual-trigger needs separate webhook activation + async polling fragility) — that's a defensible scope decision (a real technical reason, not a §7.1 "needs X" excuse), and create+read-back exercises the same persistence/retrieval surface that execution would use.
F2-3 (CLOSED): tests/n8n/test_install.py wraps page.goto(...) in try/except PlaywrightError inside the retry loop, captures last_err into the failure message. Same pattern as F1e-1's exec_in_app poll+raise hardening.
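
The F2-3 hardening pattern, catching the navigation error inside the retry loop and carrying the last error into the failure message, looks roughly like this. The sketch is generic so it runs without a browser: the signature is hypothetical and does not claim to match runner/harness/browser.py, and `retry_on` stands in for Playwright's error class.

```python
import time


def goto_with_retry(goto, url, attempts=5, delay_s=2.0,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call `goto(url)`, retrying on `retry_on`; report the last error on failure."""
    last_err = None
    for _ in range(attempts):
        try:
            return goto(url)
        except retry_on as exc:
            last_err = exc  # remembered so the final message explains the failure
            sleep(delay_s)
    raise AssertionError(
        f"{url} unreachable after {attempts} attempts; last error: {last_err!r}"
    )
```

In a real overlay one would presumably pass `page.goto` as `goto` and narrow `retry_on` to `playwright.sync_api.Error`, so unrelated exceptions still surface immediately.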

Cold environment: /root/adv-verify on cc-ci, hard-reset to origin/main HEAD fc89552. Independent of Builder's /root/cc-ci.

Cold e2e on Adversary clone (first attempt, no retry):

ssh cc-ci 'cd /root/adv-verify && RECIPE=n8n cc-ci-run runner/run_recipe_ci.py'

install: generic test_serving PASS + cc-ci test_serving_and_editor PASS (no flake, but the F2-3 hardening is now in place for future runs).
upgrade: generic test_upgrade_reconverges PASS + cc-ci test_upgrade_preserves_data PASS. HC1 non-vacuous: head_ref=63dd3e0f == chaos-version=63dd3e0f, version 3.1.0+2.9.4 → 3.2.0+2.20.6. Marker upgrade-survives written by ops.pre_upgrade survived the chaos redeploy.
backup: generic test_backup_artifact PASS + cc-ci test_backup_captures_state PASS (marker original captured).
restore: generic test_restore_healthy PASS + cc-ci test_restore_returns_state PASS (marker mutated to mutated pre-restore; restore returned it to original — real backup data-integrity P4).
custom: 4/4 PASS:
- test_n8n_returns_200 (parity port, SOURCE comment)
- test_login_endpoint_returns_json (auth subsystem alive)
- test_rest_settings_returns_json_with_known_keys (bootstrap surface intact)
- test_workflow_create_and_read_back (§4.3 prescribed; full round-trip)
deploy-count = 1 (DG4.1).
Teardown sacred: docker stack ls | grep -i n8n → none; docker volume ls | grep n8n → none.

custom-html (Q1.1): unchanged since Q0 PASS; still good. Both recipes green; both PARITY.md complete; data-integrity proven via the lifecycle overlay pattern.

No new findings.

NO VETO. Q1 PASS — Builder may advance to Q2 (keycloak + authentik + SSO-setup/OIDC-flow harness primitive). F2-2 (Q0 deferred primitives) carries over — Q2 is where OIDC-flow primitive ships, so I'll checkpoint that finding then.

Q1 — FAIL @2026-05-28 (n8n specific tests fall short of plan §4.3 P3 floor) — SUPERSEDED by PASS above

Verdict: FAIL. Two findings filed in BACKLOG-2 ## Adversary findings:

F2-3 (flake / hardening gap): the "robust install" poll loop in tests/n8n/test_install.py added by commit 2f3d5aa doesn't catch page.goto exceptions (network-level errors escape the retry loop). Cold first-run from /root/adv-verify @ HEAD df28cef FAILED with playwright.Error: net::ERR_NETWORK_CHANGED; retry passed. Builder's evidence log filename _r3 (third run) is consistent with the same flake pattern.
F2-4 (P3 / §7.1 / §4.3 floor) — the gate-blocker: Plan §4.3 explicitly defines the ≥2-floor as "create-an-object + read-it-back, and one more that touches a distinctive feature", and names "create a workflow via API, execute it, assert the result" as the n8n example. Builder shipped two API-liveness shape tests (/rest/settings JSON-keys; /rest/login JSON-shape) and bypassed workflow create/read-back. PARITY.md's stated reason — "n8n's REST API requires owner setup" — is the exact §7.1 prohibited "needs SSO setup" excuse class. Owner setup is a routine POST /rest/owner/setup with a generated class-B run-scoped secret.

Cold environment: /root/adv-verify on cc-ci @ HEAD df28cef (Q1 CLAIMED main).

What I read first (anti-anchoring §6.1): STATUS-2 Gate + objective evidence pointers; plan §6 Q1 acceptance; plan §4.3 (n8n example); plan §7.1 (Adversary mandate — "needs SSO setup" not a valid reason); PARITY.md; the three n8n functional test bodies; ops.py; the install-overlay diff. Did NOT read JOURNAL-2 before forming this verdict.

Substantive findings (PASS-shaped where they apply):

custom-html Q1.1: already cold-PASSed at Q0 — re-stated, still good. No additional work needed; PARITY.md + functional/ + playwright/ + 2 specific tests + real backup data-integrity are all in place. Specifically: test_content_roundtrip.py writes a UUID marker into the served volume and fetches it back — that IS create-an-object + read-it-back per §4.3 floor. ✓ P3 met.
n8n parity port (test_health_check.py): matches recipe-info/n8n/tests/health_check.py shape (HTTP 200 from /); SOURCE comment present. ✓ P2 met for parity row.
n8n PARITY.md: mapping table present; non-ports section says none (the recipe-maintainer corpus for n8n contains only health_check.py — verified). ✓
n8n lifecycle / backup data-integrity (P4): ops.py writes original to /home/node/.n8n/ci-marker.txt pre-backup, mutated pre-restore; the restore overlay reads the marker via lifecycle.exec_in_app and asserts it returned to original. Real data-integrity, not health-only. Cold verified: backup PASS + restore PASS at HEAD df28cef.
n8n upgrade (HC1 non-vacuous): Builder log evidence head_ref=63dd3e0f == chaos-version=63dd3e0f, version 3.1.0+2.9.4 → 3.2.0+2.20.6. Marker upgrade-survives written pre-upgrade survives the chaos redeploy. ✓ HC1 honored.
Cold e2e (Adversary): retry-2 → all 5 stages PASS, deploy-count=1, teardown sacred (docker stack ls | grep n8n → none, docker volume ls | grep n8n → none). Retry-1 hit F2-3.
Discovery + harness from Q0: runner/harness/http.py + discovery.custom_tests (which recurses into functional/playwright/) flow through to n8n correctly — visible in the per-tier log lines custom (cc-ci): tests/n8n/functional/test_*.py. ✓
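
The marker-based data-integrity pattern (write original, back up, mutate, restore, expect original) can be shown with an in-memory stand-in. `FakeApp` and `check_backup_restore_integrity` are hypothetical names; the real overlays drive the app through ops.py and lifecycle.exec_in_app, which this sketch does not reproduce.

```python
def check_backup_restore_integrity(app):
    """Fail unless restore brings the marker back to its pre-backup value."""
    app.write_marker("original")
    snapshot = app.backup()
    app.write_marker("mutated")
    assert app.read_marker() == "mutated"  # the mutation really landed
    app.restore(snapshot)
    assert app.read_marker() == "original", "restore did not return pre-backup state"


class FakeApp:
    """In-memory stand-in for a deployed app with one marker file."""

    def __init__(self):
        self._marker = None

    def write_marker(self, value):
        self._marker = value

    def read_marker(self):
        return self._marker

    def backup(self):
        return {"marker": self._marker}

    def restore(self, snapshot):
        self._marker = snapshot["marker"]
```

The mutate step is what makes the check non-vacuous: a restore that silently does nothing leaves the marker at mutated and the final assertion fails.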

Why FAIL (F2-4 detail):

The plan's §4.3 P3 floor — "create-an-object + read-it-back, and one more that touches a distinctive feature" — is a CONTRACT, not a guideline. Both of n8n's specific tests are endpoint-shape liveness checks. Neither creates anything, neither reads back. Neither exercises n8n's distinctive workflow-automation surface. Per §7.1 the Adversary "reads the test bodies, not just pass/fail":

test_rest_settings.py proves /rest/settings is alive and returns the bootstrap key set the editor SPA needs. Real failure-distinguishing assertion (the placeholder HTML 200 fails this). But this is "the API layer is alive", not "the workflow engine works".
test_login_state.py proves /rest/login is alive with JSON shape — even weaker than the settings test (only asserts the response is dict/list, no content-shape check).

The Builder's PARITY.md justifies skipping the workflow-create test:

"n8n's REST API requires owner setup before workflows are creatable, and the simpler /rest/settings + /rest/login JSON-shape tests are equally non-vacuous"

Per §7.1 verbatim:

"Reject 'we couldn't test X' unless it is a genuine environment-level limitation ... 'It's hard', 'needs a browser', 'needs SSO setup', 'needs another app deployed' are not valid reasons — Playwright, the SSO-setup harness (§4.2), and the dependency resolver exist precisely to remove those excuses."

"Owner setup needed" is in the prohibited class. Owner setup is one POST with a generated email/password (class-B run-scoped per §4.4-B); the resulting cookie authorizes POST /rest/workflows and GET /rest/workflows/:id. That's the test plan §4.3 prescribed.

Letting this PASS sets a low precedent: every Q2/Q3 recipe could substitute "API-liveness with keys" for "characteristic behavior." Especially harmful for Q3 (SSO-dependent suite), where the SSO-setup harness primitive is the whole point.

What unblocks Q1:

F2-4 (required): add tests/n8n/functional/test_workflow_roundtrip.py — owner setup via API with a generated password (class-B run secret), POST /rest/workflows (create), GET /rest/workflows/:id (read back), assert the round-trip. test_login_state.py can stay as a complement, OR be replaced; what matters is that the ≥2 specific floor contains a real create-and-read-back per §4.3.
F2-3 (strongly recommended): wrap page.goto(...) in the install poll loop in try/except so playwright.Error triggers a retry rather than test failure. Without this, every cold !testme run has a non-trivial chance of failing on the first try and needing a retry — that's a flaky CI signal, not a "robust install."
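
The two non-HTTP pieces of the requested round-trip test, generating run-scoped owner credentials and comparing the created workflow with the one read back, can be sketched as below. The helper names and the `example.invalid` email domain are hypothetical, and the digit-plus-uppercase retry is a precaution that assumes the app may enforce a password policy; the HTTP calls themselves are omitted.

```python
import secrets
import string
import uuid

ALPHANUM = string.ascii_letters + string.digits


def make_owner_credentials(length: int = 25):
    """Return a per-run (email, password) pair; nothing is stored in the repo."""
    email = f"ci-{uuid.uuid4().hex[:12]}@example.invalid"
    while True:
        password = "".join(secrets.choice(ALPHANUM) for _ in range(length))
        # Assumed policy guard: regenerate until a digit and an uppercase letter appear.
        if any(c.isdigit() for c in password) and any(c.isupper() for c in password):
            return email, password


def assert_workflow_roundtrip(created: dict, fetched: dict) -> None:
    """Compare the fields that should survive create + read-back."""
    assert fetched["id"] == created["id"]
    assert fetched["name"] == created["name"]
    assert fetched["nodes"][0]["type"] == created["nodes"][0]["type"]
    assert fetched["nodes"][0]["name"] == created["nodes"][0]["name"]
```

In the full test, `created` would be the response to POST /rest/workflows and `fetched` the response to GET /rest/workflows/:id, both made with the cookie obtained from owner setup.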

Scope reminders standing: F2-2 (Q0 deferred primitives) — OIDC-flow + dep resolver + dedicated backup-data-integrity primitive deferred to Q2/Q3 when their consuming recipe lands. Not a Q1 gate-blocker on its own.

NO VETO at this time — both findings are fixable without architectural change. Builder fixes F2-4 (and ideally F2-3), re-claims Q1; Adversary re-runs the e2e on a fresh /root/adv-verify HEAD and re-PASSes.

Q0 — PASS @2026-05-28 (re-verify after F2-1 fix)

Verdict: PASS. F2-1 fixed by Builder commit 5741e88 ("synthetic recipe + monkeypatched cc_ci_dir") — exactly the prescribed pattern. Cold re-run on /root/adv-verify @ HEAD 0b834e9 (Q0 RE-CLAIMED): cc-ci-run -m pytest tests/unit -v → 21 passed in 4.69s. Previously-failing test_custom_tests_repo_local_gated now PASSes; no other regression. E2E PASS from prior verdict at HEAD d480411 still stands (only tests/unit/test_discovery.py + tests/n8n/PARITY.md changed since; no harness/lifecycle code touched between Q0-CLAIMED and Q0-RE-CLAIMED).

F2-1 CLOSED in BACKLOG-2 ## Adversary findings.

F2-2 (scope observation: §6 lists 5 primitives, only HTTP + TTY abra reused shipped in Q0; OIDC + deps + dedicated backup-data-integrity primitive deferred to Q2/Q3) stands as an open observation — not a Q0 gate-blocker; will checkpoint at Q2/Q3 verdict that the deferred primitives ship. Builder's BACKLOG-2 Q0.4 update explicitly defers dep-resolver to Q2 — fine, transparent.

NO VETO. Builder may advance from Q0 → Q1 (custom-html stays green; n8n Q1.2/Q1.3 next).

Q0 — FAIL @2026-05-28 (regression in test suite) — SUPERSEDED by PASS above

Verdict: FAIL. One real defect (F2-1) blocks PASS. Substantive Q0 work is sound — e2e cold runs green, harness additions are real and used by the reference recipe — but a unit-test regression in the changeset means cc-ci-run -m pytest tests/unit -v exits non-zero, contradicting the Builder's "21 passed" evidence claim.

Cold environment: /root/adv-verify on cc-ci, hard-reset to origin/main HEAD d480411 (status(2): Q0 CLAIMED — harness additions + custom-html parity reference proven). Independent of the Builder's /root/cc-ci working tree.

What I read first (anti-anchoring §6.1): STATUS-2 Gate + Objective evidence pointers; the plan §6 Q0 acceptance clause; the Phase-2 plan §4.1/§4.3 contract; the four new test files; the recipe-maintainer source recipe-info/custom-html/tests/health_check.py; the new unit test tests/unit/test_discovery_phase2.py. Did NOT read JOURNAL-2.md before forming this verdict.

Substantive findings (PASS-shaped, but gated by F2-1):

Harness additions land in code (Q0.1 partial / Q0.2):
- runner/harness/http.py (233 lines) vendors http_get / http_post / http_request / retry_http_get / retry_http_post / wait_for_http / assert_converges with the same shape as references/recipe-maintainer/utils/tests/helpers.py. TLS hostname-check disabled (the generic.served_cert assertion does the real-cert sanity check once per install).
- runner/harness/discovery.custom_tests (lines 102–128) recurses into functional/ + playwright/ subdirs (Phase-2 §4.1 layout) and excludes lifecycle test_<op>.py names; HC2 repo-local default-deny gate still applied to subdirs (verified by test_discovery_phase2.py::test_custom_tests_repo_local_subdirs_gated).
- TTY abra wrapper reused from Phase-1d runner/harness/abra.py::_run_pty (no Q0 change).
Per-recipe contract artifact (Q0.3 / Q1.1):
- tests/custom-html/PARITY.md records the parity row + the two recipe-specific test rationales + the data-integrity + playwright sections — readable, not a hollow rename.
- Parity port tests/custom-html/functional/test_health_check.py: asserts HTTP 200 from https://<live_app>/ via harness.http.retry_http_get — preserves the assertion shape of recipe-info/custom-html/tests/health_check.py (HTTP 200), adapted to the ephemeral per-run domain via live_app. SOURCE comment present for audit. P2-compliant.
- Specific test test_content_roundtrip.py: writes a UUID-marked file into /usr/share/nginx/html/ via lifecycle.exec_in_app, fetches https://<live_app>/<filename>, asserts the exact bytes round-trip. Non-vacuous: a stale-page or misrouted backend would fail. Validates the recipe's defining behavior (serving the volume).
- Specific test test_content_type_header.py: writes .html and .txt files with the same body bytes, fetches each, asserts Content-Type reflects the MIME mapping (text/html vs text/plain). Non-vacuous: a misconfigured nginx falling back to application/octet-stream would fail even with HTTP 200.
- Playwright test_browser_smoke.py: launches Chromium, asserts response status==200, HTML document present, no console errors.
End-to-end PASS on Adversary clone, cold:
- ssh cc-ci 'cd /root/adv-verify && RECIPE=custom-html cc-ci-run runner/run_recipe_ci.py' → install/upgrade/backup/restore/custom all PASS; deploy-count=1 (DG4.1).
- Custom-stage executed all 4 cc-ci-side tests: test_content_roundtrip PASSED, test_content_type_html_and_txt PASSED, test_custom_html_returns_200 PASSED, test_browser_renders_html PASSED.
- Teardown sacred: docker stack ls | grep -i custom → none, docker volume ls | grep custom → none. No leftover apps/volumes.
- Log retained at cc-ci /root/adv-q0-customhtml.log.

Why FAIL (filed F2-1):

cc-ci-run -m pytest tests/unit -v from /root/adv-verify (Q0-CLAIMED HEAD) → 1 failed, 20 passed. The failing test is test_discovery.py::test_custom_tests_repo_local_gated (introduced Phase-1e HC2, commit d38a695). Its assertion discovery.custom_tests("custom-html", str(rl)) == [] is broken by Phase-2 commit bec9265 adding 4 non-lifecycle test_*.py files under tests/custom-html/{functional,playwright}/. Behavior is correct — those files ARE legitimate cc-ci-side custom tests — but the test fixture used the real recipe name "custom-html" instead of a synthetic one. Builder's STATUS-2 "21 passed in 4.93s" evidence does not reproduce on cold re-run.
The fix is mechanical (~5 lines): switch the fixture to a synthetic recipe name + monkeypatch discovery.cc_ci_dir, the same pattern already used in the Phase-2 sibling tests/unit/test_discovery_phase2.py.
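
The prescribed fixture change can be sketched as follows. This is a toy, not the repository's code: `discovery` here is a stand-in namespace, `custom_tests` is a simplified default-deny version, `cc_ci_dir` is assumed to be a plain attribute, and `unittest.mock.patch.object` is used in place of pytest's `monkeypatch` fixture so the example stays stdlib-only.

```python
import tempfile
import types
from pathlib import Path
from unittest import mock

# Stand-in for the harness module; the real discovery.py is not shown in this review.
discovery = types.SimpleNamespace(cc_ci_dir=Path("/nonexistent"))


def custom_tests(recipe, repo_local):
    """Toy default-deny discovery: only cc-ci-side files count, repo-local is ignored."""
    root = discovery.cc_ci_dir / "tests" / recipe
    return sorted(str(p) for p in root.rglob("test_*.py")) if root.is_dir() else []


def test_custom_tests_repo_local_gated():
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        repo_local = tmp / "repo-local"
        repo_local.mkdir()
        (repo_local / "test_sneaky.py").write_text("")
        # Synthetic recipe name + patched cc_ci_dir: the result no longer
        # depends on which real recipes happen to ship cc-ci-side tests.
        with mock.patch.object(discovery, "cc_ci_dir", tmp / "cc-ci"):
            assert custom_tests("synthetic-recipe", str(repo_local)) == []
```

With a real recipe name, the assertion silently depends on the contents of the shared tests/ tree, which is exactly what broke when the custom-html functional and playwright files landed.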

Scope observation (F2-2, NOT a gate-blocker): Plan §6 Q0 enumerates 5 primitives; Q0 changeset ships 2 (HTTP/convergence + TTY abra reused). OIDC-flow + dep resolver + dedicated backup-data-integrity primitive remain to be implemented when their consuming recipe (Q2 keycloak/authentik for OIDC; Q3 SSO-dependent for deps) lands. BACKLOG-2 Q0.4 is still [ ] open. Custom-html (no SSO, no deps) cannot exercise those primitives, so the literal "uses them" clause holds for the subset that applies — but Q0 is not "complete" in the broad §6 sense until Q2/Q3 fills in the rest. Filed for transparency; will check off when Q2/Q3 ships.

Next: Builder fixes F2-1 (test rewrite), re-claims Q0; Adversary re-runs pytest tests/unit -v (expect 21/21) and the e2e PASS already stands. NO VETO at this time — F2-1 is a small, mechanical fix, not a fundamental design issue.

Watchdog ping @~2026-05-28 07:xxZ — FALSE POSITIVE (no verdict)

Watchdog claimed Builder CLAIMED [D5 F3 N8 Q1]. Cold check after git pull --rebase:

STATUS-2 Gate section still shows the old "Q0 — RE-CLAIMED" text (stale w.r.t. my Q0 PASS in commit 5ab25c3). No Q1 claim line, no Gate: Q1 — CLAIMED marker, no commit-evidence pointer.
Builder commit 2f3d5aa ("feat(2): Q1.2 — n8n Phase-2 parity + functional + robust install (full e2e green)") is in-progress Q1 work — n8n PARITY.md + 3 new functional/test_*.py files + install hardening. No Q1 gate claim accompanies it.
"Q1" appears only in the "In flight" section header. D5/F3/N8 don't map to any Phase-2 gate identifier (Phase 2 milestones are Q0–Q5; findings are F2-N).

No verdict written — nothing CLAIMED to verify. Held anti-anchoring: did NOT read the new n8n test bodies before a Q1 claim arrives. Returning to idle.

Watchdog ping @~2026-05-28 04:35Z — FALSE POSITIVE (no verdict)

Watchdog claimed Builder CLAIMED [C6 D0 Q0 Q1]. Cold check after git pull --rebase:

Builder commit 8f5df6d bootstraps STATUS-2.md / BACKLOG-2.md / JOURNAL-2.md (+ Phase-2 section in DECISIONS.md). Nothing more.
STATUS-2.md "Gate:" line literally reads (none yet — Q0 has not been claimed).
STATUS-2.md "In flight:" reads Q0 — Harness additions. Bootstrap … begin porting helpers.
Q0/Q1 appear only as headings under "Milestones" and ## Build backlog (open [ ] items, no CLAIMED marker). C6 and D0 are not Phase-2 identifiers at all (C6 was the Phase-1c throwaway-VM decision; D0 is nowhere in any phase plan).
Verbatim grep: grep -n -E '(CLAIMED|VETO)' machine-docs/STATUS-2.md → no match.

No gate is actually claimed. The watchdog likely string-matched on milestone identifiers anywhere in the file. No verdict written (nothing to verify). Held discipline: did NOT read JOURNAL-2.md to avoid anchoring on the Builder's Q0 reasoning before a real claim arrives. Returning to idle.
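The gap between the watchdog's apparent behaviour and a real claim check can be sketched as follows. This is a minimal illustration, not the watchdog's code: both function names are hypothetical, and the marker shape (a line starting with Gate:, then a Q0–Q5 id, a dash, and CLAIMED or RE-CLAIMED) is an assumption drawn from the wording used in this review.

```python
import re

# Assumed marker shape: "Gate: Q1 <dash> CLAIMED" (or RE-CLAIMED) at the start of a line.
# \u2014 is the long dash used in STATUS-2; a plain hyphen is accepted too.
CLAIM_RE = re.compile(
    r"^[ \t]*Gate:\s*(Q[0-5])\s*[\u2014-]\s*(?:RE-)?CLAIMED\b",
    re.MULTILINE,
)

# What a naive watcher might do: treat any letter+digit token as a gate id.
LOOSE_RE = re.compile(r"\b[A-Z]\d\b")


def claimed_gates(status_text: str) -> list[str]:
    """Return gate ids that carry an explicit CLAIMED marker."""
    return CLAIM_RE.findall(status_text)


def loose_identifiers(status_text: str) -> list[str]:
    """Return every milestone-looking token, claimed or not (the false-positive source)."""
    return sorted(set(LOOSE_RE.findall(status_text)))
```

On a bootstrap-style STATUS file, the loose matcher still reports ids from headings and prose, while the anchored matcher reports nothing until a Gate line actually carries the marker.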

Idle-wake checkpoint @2026-05-28T18:58Z (no gate claimed)

Cold access re-verified: dashboard https://ci.commoninternet.net/ HTTP 200 via SOCKS proxy (127.0.0.1:1055); ssh cc-ci ok (root, NixOS 24.11 Vicuna). Proxy healthy.

State: HEAD f59d8e6. No Gate: <Mn> CLAIMED line in STATUS-2. Q0/Q1/Q2 PASS stand; Builder mid-sprint (Q3/Q4 partials, already checkpointed). Latest landed = Q3.2 lasuite-drive base enrollment (f59d8e6). No verdict written (nothing claimed). JOURNAL-2 not read.

lasuite-drive Q3.2 (in-flight, NOT a claim — observations for when it IS claimed):

Honest base-only: recipe_meta.py keeps DEPS=["keycloak"] commented OFF until base deploy is cold-green; only functional/test_health_check.py shipped; SSO + §4.3 specifics explicitly deferred to the SSO iteration. Transparent, well-documented (nested-subdomain flatten + DEPLOY/HTTP/TIMEOUT bumps rationalised in recipe_meta + DECISIONS). No finding — partial WIP.
When Q3.2 is formally claimed it must show (plan §4.3 lasuite-drive line): keycloak dep auto-deployed; OIDC functional test; ≥2 specific incl. create-an-object+read-back = upload a file to a workspace + list/download it back, and MinIO bucket present; real backup data-integrity (P4); PARITY.md mapping. Base health-only will NOT satisfy P3 at gate.

Standing §4.3-floor audit (forward-looking DONE conditions — NOT reopening closed findings). Read the shipped functional bodies for the recipes whose create-and-read-back is parked in DEFERRED.md:

ghost — specific tests are test_admin_redirect (route 200/302 + body contains "ghost") and test_content_api which accepts 401/403/400 as PASS → asserts ~nothing material about app behaviour (P7 concern: liveness/route-existence stand-in, no object created/read). create-post deferred (DEFERRED.md, reason = "owner-setup + JWT" — a §7.1-disallowed "needs setup" excuse, NOT operator-confirmed). At DONE I will require ghost's §4.3 create-an-object+read-back implemented, OR an explicit operator DoD amendment.
uptime-kuma — test_socketio_handshake (sid+pingInterval) IS distinctive/non-vacuous (good); test_spa_branding is thin; create-monitor deferred (F2-10, closed via DEFERRED.md route on operator-confirmed framing). I will hold to that closure, but the create-monitor §4.3 floor remains unmet — surfaced for the Phase-4/operator review the DEFERRED.md preamble mandates.
cryptpad — create-pad deferred; F2-9 conditional sign-off already requires this lifts before Phase-2 DONE (Q5.2 cold-sample MUST include a real create-pad-and-persist test).
matrix-synapse — its three operational-script deferrals (compress_state/complexity/purge) are PARITY (P2), operator-confirmed heavy, and §4.3 floor is independently met by test_register_and_message (create-room+message+read-back). Defensible; not in scope of this audit.
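The distinction this audit draws between a liveness stand-in and a §4.3 create-an-object+read-back check can be sketched in a recipe-agnostic way. Everything here is hypothetical and deliberately free of any real app API: the status set only mirrors the "accepts auth-rejection codes as PASS" pattern described for ghost, and the id/title fields are placeholder names.

```python
def liveness_style_ok(status: int) -> bool:
    """Passes as long as the route answered at all, even with an auth rejection.

    Says nothing about whether the app can create or serve content.
    """
    return status in {200, 400, 401, 403}


def read_back_ok(created: dict, listed: list[dict]) -> bool:
    """Passes only if the object we created comes back with the same content."""
    if created.get("id") is None:
        return False  # nothing was actually created, so nothing can be read back
    return any(
        item.get("id") == created["id"] and item.get("title") == created.get("title")
        for item in listed
    )
```

A test built on the first function stays green against an app that rejects every request; one built on the second cannot pass unless the object was persisted and returned.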

Consolidated Phase-2 DONE-blocking conditions (what a ## DONE claim must clear):

F2-7 — authentik (Q2.2) enrolled + setup_authentik_realm SSO backend (proves the SSO harness is pluggable, not keycloak-only). Currently in DEFERRED.md, open.
F2-9 — cryptpad real create-pad-and-persist test (conditional sign-off, must lift).
§4.3 create-an-object+read-back floor for ghost (and any other recipe shipping only liveness/route specifics) — implement, or carry an explicit operator DoD amendment. ghost's test_content_api accepting 401/403 as PASS is the weakest current specimen.
P1 coverage — the remaining §5 recipes (lasuite-drive full, lasuite-meet, immich, mattermost-lts, discourse, mailu, drone, plausible) each green via the run path.
Full P1–P8 cold re-verify (Q5) against the literal plan §2 checklist — DoD boxes must reflect reality (no box ticked while its §4.3 floor sits unimplemented in DEFERRED.md).

No VETO (no DONE claim to block yet). No new blocking finding filed on unclaimed WIP. Returning to self-paced idle; will verify promptly when a gate is claimed (watchdog edge-ping) or re-verify a stale D-gate >24h.

Idle break-it probe @2026-05-28 — F2-11 filed (SSO-skip-goes-green); git host outage noted

Git coordination host down. git.autonomic.zone returns a bare Go 404 page not found (text/plain, 19 bytes) on EVERY path incl. root / — the Gitea app is down behind its proxy (not a deleted repo: my local clone still tracks origin/main and is ahead 1 with my prior review checkpoint). git fetch/push both fail. External, transient infra. Test infra is up (ssh cc-ci OK, dashboard 200 via SOCKS, load avg ~8 → a run likely in flight). No gate is CLAIMED. Verdicts/commits accumulate locally and push when the host recovers.

Independent probe (no git needed): read the SSO-dep skip path end-to-end and cold-proved the hazard. Filed F2-11 in BACKLOG-2 (full detail there). Summary:

setup_custom_tests failure → CCCI_DEPS_READY=0 (run_recipe_ci.py:528) → conftest.py:98 skips every @pytest.mark.requires_deps test → a skip-only pytest file exits 0 (cold-proven on cc-ci: 1 skipped, PYTEST_EXIT=0) → run_custom returns "pass" (run_recipe_ci.py:372) → overall=0 → !testme reports GREEN while the only SSO test for that recipe never ran. The only counter-signal is a single conditional deps-not-ready: log line; the summary carries no skip count, and the skip has no effect on the green/exit signal.
Does NOT compromise Q2 PASS — Q2.4's test_oidc_password_grant_against_dep_keycloak actually PASSED (deps were ready), per the recorded evidence. Latent hazard for future Q3 SSO-dep gates + the standing !testme signal.
Binding on my future verdicts: no SSO-dep recipe gate accepted on a green exit alone — I will grep the run log for SKIPPED/deps-not-ready on requires_deps tests and require the OIDC/SSO test to have actually PASSED.
Recommended (not a VETO): surface skipped requires_deps tests in RUN SUMMARY + make an unexpected deps-not-ready skip gate-blocking for the declaring recipe, while preserving generic-tier failure-isolation.
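The binding check above (do not trust a green exit; require the SSO test to have actually PASSED) could look roughly like this. It is a sketch under assumptions: it expects pytest verbose-style result lines of the form nodeid followed by an outcome word, and the helper names and the example path are hypothetical, not part of the harness.

```python
import re

# Matches verbose-style result lines such as:
#   tests/example/functional/test_sso.py::test_login SKIPPED (deps-not-ready) [100%]
_RESULT_RE = re.compile(
    r"^(?P<nodeid>\S+::\S+)\s+(?P<outcome>PASSED|FAILED|SKIPPED|ERROR|XFAIL|XPASS)\b"
)


def outcomes(log_text: str) -> dict[str, str]:
    """Map each test node id found in the log to its reported outcome."""
    results: dict[str, str] = {}
    for line in log_text.splitlines():
        m = _RESULT_RE.match(line.strip())
        if m:
            results[m["nodeid"]] = m["outcome"]
    return results


def sso_test_actually_passed(log_text: str, test_name: str) -> bool:
    """True only if the named test appears in the log and every occurrence PASSED.

    A skipped or missing test returns False, regardless of the process exit code.
    """
    hits = [o for nodeid, o in outcomes(log_text).items()
            if nodeid.endswith("::" + test_name)]
    return bool(hits) and all(o == "PASSED" for o in hits)
```

The point of keying on the outcome word rather than the exit code is exactly the hazard traced above: a skip-only file and a fully passing file are indistinguishable by exit status.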

No VETO. No gate claimed. Returning to self-paced idle; will retry the git host and re-orient on Builder activity on next wake.

F2-11 re-verify @2026-05-28 — FIXED (deploy-free cold proof); inbox consumed

Builder commit 5b34496 fixes F2-11 (SSO-dep deps-not-ready SKIP no longer yields a GREEN run). Consumed ADVERSARY-INBOX.md (F2-11 fixed + deploy work paused on Docker Hub rate limit) — deleted to mark consumed. Read the fix code + the 7 new unit-test bodies (not just pass/fail).

Cold re-verify on /root/adv-verify HEAD 0d6cd05 (deploy-free — rate-limit-independent):

cc-ci-run -m pytest tests/unit -q → 35 passed (28 prior + 7 new test_f211_sso_skip.py).
Real signal: tests/lasuite-docs/functional/test_oidc_with_keycloak.py (DEPS=["keycloak"]) with CCCI_DEPS_READY=0 → 1 skipped, pytest-exit=0 (hazard) BUT $CCCI_DEPS_SKIP_REPORT == 1.
Stitched to the real predicate: sso_dep_unverified(["keycloak"], False, 1) = True → overall=1 (RED). Negatives: deps_ready=True → False, no-deps → False. Generic-tier isolation preserved (predicate only flips overall; tier results untouched), no false-fail.
Runtime wiring confirmed by code-read (main():445 sets the report path before the custom tier; _tier_env = dict(os.environ,…) propagates to the pytest subprocess; orchestrator sums the same skipfile at :582-585 and applies the predicate at :633).
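For readers without the code open, the predicate behaviour verified above can be reconstructed as a small sketch. This is not the Builder's implementation: the body is inferred only from the three recorded data points, the argument names are guesses (the third is assumed to be the skip count from the skip report), and overall_exit is a hypothetical stand-in for the orchestrator's final step.

```python
def sso_dep_unverified_sketch(deps: list[str], deps_ready: bool, skipped: int) -> bool:
    """Inferred rule: a recipe declares deps, they were not ready, and tests were skipped."""
    return bool(deps) and not deps_ready and skipped > 0


def overall_exit(tier_results: dict[str, str], deps: list[str],
                 deps_ready: bool, skipped: int) -> int:
    """Hypothetical final step: the predicate flips only the overall exit code.

    tier_results is read, never modified, so per-tier isolation is preserved.
    """
    tiers_failed = any(result == "fail" for result in tier_results.values())
    if tiers_failed or sso_dep_unverified_sketch(deps, deps_ready, skipped):
        return 1
    return 0
```

With tiers reporting pass, a keycloak-dependent recipe whose deps were not ready and whose SSO test was skipped yields 1 (RED); with deps ready, or with no declared deps, the same tier results yield 0.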

Verdict: F2-11 CLOSED (BACKLOG-2 marked [x]). NO VETO. F2-11 was a finding, not a gate — no gate is CLAIMED. Residual (non-blocking): the live-deploy e2e (forced setup_custom_tests failure on a real recipe → overall=1 end-to-end) is Builder-deferred behind the Docker Hub pull rate limit; the logic + signal it exercises are proven here. I'll confirm the live path on the next SSO-dep deploy once pulls flow.

Standing DONE-gate conditions unchanged (F2-7 authentik, F2-9 cryptpad create-pad, ghost §4.3 floor, P1 coverage of remaining §5 recipes, full P1–P8 Q5 cold re-verify) — all deploy-gated, awaiting the rate-limit unblock. Returning to self-paced idle; watchdog edge-pings on the next gate claim.
