
autonomic-bot 9f2e120ec0 review(2): F2-10 CLOSED via DEFERRED.md route — accept new operator-confirmed framing; F2-9 effectively migrates too (Phase-4 review)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-28 17:33:31 +01:00


BACKLOG — Phase 2 (per-recipe test authoring)

Phase-namespaced backlog. Builder edits ## Build backlog; Adversary edits ## Adversary findings. Phase plan: /srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md

Build backlog

Q0 — Harness additions

Q0.1 — runner/harness/http.py landed (canonical Phase-2 recipe-test HTTP API: http_get/http_post/http_request/retry_http_get/retry_http_post/wait_for_http/assert_converges). TTY abra wrapper already present (runner/harness/abra.py::_run_pty) from Phase 1d. 11 unit tests landed.
Q0.2 — discovery.custom_tests recurses into tests/<recipe>/{functional,playwright}/ (Phase 2 §4.1 layout); 2 unit tests landed.
Q0.3 — tests/custom-html/PARITY.md landed (parity row for health_check + rationale for 2 new recipe-specific tests + data-integrity + playwright sections). Parity port: tests/custom-html/functional/test_health_check.py (SOURCE comment present).
Q0.4 — Dependency resolver harness primitive (read tests/<recipe>/recipe.toml requires/test_requires, deploy deps before the recipe under test, tear down with it). Mind MAX_TESTS/node budget; sequence heavy ones. Deferred to Q2 (needed once SSO providers come online; no Phase-2 recipe in Q1 needs deps). Tracked in BACKLOG.
Q0.5 — RE-CLAIMED @2026-05-28 (commit 5741e88 adds F2-1 fix to original Q0). Custom-html reference recipe runs the full parity + ≥2 specific + playwright suite green on cc-ci; deploy-count=1; DECISIONS.md Phase-2 section in place. F2-1 closed by Builder; 21/21 unit tests PASS cold. Awaiting Adversary cold re-verify.
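
As an illustration of the convergence primitive named in Q0.1, the sketch below polls a URL until it returns the expected status and reports the last status seen on timeout (the "last status 502" style of message quoted in F2-6). Only the name `wait_for_http` comes from this backlog; the signature, the `fetch_status` helper, and the injectable `fetch`/`sleep`/`clock` parameters are assumptions made so the loop can be exercised offline. It is not the contents of runner/harness/http.py.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET, or 0 if no response arrived at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx: a response did arrive, just not a healthy one
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, socket timeout, ...


def wait_for_http(
    url: str,
    expect: int = 200,
    deadline_s: float = 60.0,
    interval_s: float = 2.0,
    fetch: Callable[[str], int] = fetch_status,      # hypothetical injection point
    sleep: Callable[[float], None] = time.sleep,     # hypothetical injection point
    clock: Callable[[], float] = time.monotonic,     # hypothetical injection point
) -> int:
    """Poll `url` until it answers with `expect`; return the number of attempts.

    Raises TimeoutError carrying the last observed status, so a 502 window
    can be told apart from a host that never answered (status 0).
    """
    stop_at = clock() + deadline_s
    last: Optional[int] = None
    attempts = 0
    while True:
        last = fetch(url)
        attempts += 1
        if last == expect:
            return attempts
        if clock() >= stop_at:
            raise TimeoutError(
                f"{url}: not {expect} after {attempts} attempts (last status {last})"
            )
        sleep(interval_s)
```

Keeping the last status in the error message matters for triage: a timeout with status 0 points at routing or DNS, while a timeout with 502 or 404 points at a backend that has not bound its routes yet.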

Q1 — Pattern proof (custom-html + n8n)

Q1.1 — custom-html: 2 NEW recipe-specific functional tests landed (test_content_roundtrip.py + test_content_type_header.py); already cold-verified in Q0 PASS.
Q1.2 — n8n enrolled under cc-ci. Parity port tests/n8n/functional/test_health_check.py + 3 recipe-specific functional tests: test_workflow_roundtrip.py (the plan §4.3 prescribed create-and-read-back via owner setup → POST /rest/workflows → GET round-trip; F2-4 fix), test_rest_settings.py (REST bootstrap surface), test_login_state.py (auth subsystem). Install overlay's Playwright now wraps page.goto in try/except PlaywrightError so transient net::ERR_* triggers retry, not failure (F2-3 fix).
Q1.3 — n8n real backup data-integrity already covered by the Phase-1d/1e lifecycle overlay pattern (ops.pre_backup seeds "original" in /home/node/.n8n; pre_restore mutates; restore must return "original" — passed in the Q1.2 e2e run).
Q1.4 — RE-CLAIMED @2026-05-28 (commit fc89552 F2-3+F2-4 on top of 2f3d5aa). Both recipes green via the run path; both PARITY.md complete; Adversary findings F2-3 + F2-4 closed by Builder. Awaiting Adversary cold re-verify.
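
The F2-3 hardening mentioned in Q1.2 (wrap the navigation in try/except inside the retry loop, and carry the last error into the failure message) can be sketched generically as below. The helper name `goto_with_retry` and its parameters are hypothetical; the real change lives in tests/n8n/test_install.py and may differ in shape.

```python
import time
from typing import Callable, Optional, Tuple, Type


def goto_with_retry(
    goto: Callable[[str], object],
    url: str,
    attempts: int = 5,
    delay_s: float = 3.0,
    transient: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Call `goto(url)`, retrying when it raises one of the `transient` errors.

    The last error is kept and surfaced in the final failure message, so a
    run that exhausts its retries still says why navigation failed.
    """
    last_err: Optional[BaseException] = None
    for attempt in range(1, attempts + 1):
        try:
            return goto(url)
        except transient as exc:
            last_err = exc
            if attempt < attempts:
                sleep(delay_s)  # back off before the next navigation attempt
    raise RuntimeError(
        f"{url}: navigation failed after {attempts} attempts; last error: {last_err!r}"
    )
```

With Playwright's sync API this would be called roughly as `goto_with_retry(page.goto, url, transient=(PlaywrightError,))`, where `PlaywrightError` is `playwright.sync_api.Error`. Narrowing `transient` to that class keeps genuine test bugs (for example an `AttributeError`) from being retried.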

Q2 — SSO providers (keycloak + authentik)

Q2.1 — keycloak: parity-port test_health_check.py + 2 NEW recipe-specific functional tests. Bumped timeouts to 900s. Full e2e green (commit d5f5e86).
Q2.2 — authentik: deferred (lower priority). The SSO harness primitive is provider-pluggable (the setup_keycloak_realm shape can be mirrored to setup_authentik_provider when needed); Q2.4 acceptance is already proven via keycloak. Will land when Q3 lights up an authentik-dependent recipe, or as Q4/Q5 sweep.
Q2.3 — Dep resolver (runner/harness/deps.py — declared_deps + per-(parent,dep) domain + deploy_deps/teardown_deps + run state) + SSO-setup harness (runner/harness/sso.py — setup_keycloak_realm + oidc_password_grant + assert_discovery_endpoint) + orchestrator wiring. 7 new unit tests; 28/28 PASS. Subsumes Q0.4. Commit 4d6b040.
Q2.4 — RE-CLAIMED @2026-05-28 (commit c6e94af F2-5 fix on top of 9e88741). tests/lasuite-docs/recipe_meta.py DEPS = ["keycloak"]; test_oidc_with_keycloak.py proves the full SSO flow. F2-5 verified: dep teardown now uses verify=True, raises + surfaces leak failures; cold re-verify on cc-ci → no leftover keycloak after teardown.

Q3 — SSO-dependent suite (lasuite-docs, lasuite-drive, lasuite-meet, cryptpad, immich)

[~] Q3.1 — lasuite-docs: parity port (health_check) ✓ + 2 NEW recipe-specific tests (test_oidc_with_keycloak.py — Q2.4 acceptance test exercising real OIDC flow against dep keycloak; test_auth_required.py — protected backend API requires auth). Open follow-up: oidc_login.py + upload_conversion.py full ports + create-a-doc require lasuite-docs OIDC env wiring (install_steps.sh wires dep keycloak's client_secret + OIDC env into lasuite-docs's .env at install time). Documented in tests/lasuite-docs/PARITY.md.
Q3.2 — lasuite-drive: enroll (mirror via recipe mirror+PR flow if absent); parity + specific (upload to workspace, list/download; MinIO bucket present).
Q3.3 — lasuite-meet: parity (health_check, oidc_login, meeting_flow, webrtc-media, webrtc-relay) + specific (create-a-room, two-user LiveKit token issuance, ICE-candidate gathering).
[~] Q3.4 — cryptpad: parity port (health_check) ✓ + 2 NEW recipe-specific (test_spa_assets — branding + canonical asset paths in HTML; test_pad_create.py — Playwright SPA renders + JS bundle loads + no console errors). Open follow-up: the §4.3-prescribed "create-a-pad + type + reload + read-back" test deferred with technical rationale (CryptPad pad-creation flow is version-specific; UI selector for 'new pad' varies). See DECISIONS.md Phase-2 Q3.4 section; Adversary sign-off pending per §7.1.
Q3.5 — immich: enroll (mirror as needed); add specific (upload asset, list it back, thumbnail/derivative).
Q3.6 — Q3 gate: each green with deps deployed, within node budget; SSO setup automated.

Q4 — Remaining recipes

Q4.1 — matrix-synapse: PARITY.md + 3 functional tests (federation_version, health_check, register_and_message via shared-secret admin endpoint called from container localhost — the §4.3 prescribed register-2-users + send/receive message). EXTRA_ENV TIMEOUT=900. Cold green after capacity unblock (commit 8350865). Shell-script parity tests (compress_state/test_complexity_limit/test_purge) deferred with technical rationale.
Q4.2 — mumble: enroll; specific (connect a client/CLI, channel presence beyond TCP health).
Q4.3 — bluesky-pds: enrolled. install_steps.sh generates per-run secp256k1 PLC rotation key (recipe's pds_plc_rotation_key is generate=false). PARITY.md, recipe_meta.py + 3 functional tests (health_check, describe_server, session_auth-requires-auth). Cold green via RECIPE=bluesky-pds STAGES=install,custom cc-ci-run runner/run_recipe_ci.py (commit 6115d2e). goat_account parity deferred (operational complexity).
Q4.4 — ghost: enrolled. PARITY.md + recipe_meta.py (DEPLOY_TIMEOUT=1200, TIMEOUT=1200 via EXTRA_ENV; ghost cold-start ~12-15min) + 3 functional tests (health_check, content_api, admin_redirect). Cold green (commit 1bd7c7a). Create-a-post deeper test in DEFERRED.md.
Q4.5 — mattermost-lts: enroll; specific (create-a-message round-trip).
Q4.6 — discourse: enroll; specific (create-a-topic round-trip).
Q4.7 — plausible: enroll; specific (track a test event, query it back).
Q4.8 — uptime-kuma: enrolled. PARITY.md + recipe_meta.py + 3 functional tests (health_check, socketio_handshake, spa_branding). Cold green (commit 1aaf3bd). Create-a-monitor in DEFERRED.md (Socket.IO client primitive + --extra-tests; F2-10 closed).
Q4.9 — mailu: enroll; specific (create a mailbox, send/receive verification).
Q4.10 — drone: enroll; specific (create/list builds via API).
Q4.11 — Q4 gate: each recipe green with parity + specific.

Q5 — Completeness + docs

[~] Q5.1 — docs/enroll-recipe.md updated with the Phase-2 contract (commit b2151af): §2 PARITY.md / functional/ / playwright/ layout; §2.1 Phase-2 contract + custom-tier discovery; §2.2 DEPS / deps_apps fixture / F2-5 verify=True; §2.3 harness.sso primitives with the F2-7 keycloak-specificity caveat; worked lasuite-docs example end-to-end. Will re-pass when Q3.2/Q3.5 enroll new recipes (immich/lasuite-drive) to confirm a new engineer can follow the doc cold.
Q5.2 — Adversary samples a subset and cold-verifies parity tables + specific tests are real (not health-only, not skipped). NO weakened test, no corners cut (P7).
Q5.3 — Phase 2 ## DONE after all P1–P8 Adversary cold-verified PASS, no standing VETO.

Adversary findings

F2-10 [adversary] — CLOSED @2026-05-28 via Builder route 2 (file in DEFERRED.md per the new orchestrator-confirmed convention). The uptime-kuma create-a-monitor entry is in machine-docs/DEFERRED.md (commit 650ab47 migrated + 44e88f3 relocated under Open deferrals) with re-entry trigger "the --extra-tests opt-in flag (IDEAS.md) OR another recipe enrollment that requires Socket.IO client primitives in the harness." Original entry below for the audit trail.

F2-10 [adversary] — CLOSED @2026-05-28 via DEFERRED.md route (Builder commit 8bafbd4 references the deferral entry in machine-docs/DEFERRED.md §"2026-05-28 — uptime-kuma create-monitor + list-it (§4.3 prescribed)"). Re-entry trigger: the --extra-tests opt-in flag OR another recipe needing Socket.IO client primitives in the harness — whichever comes first. Per the orchestrator's open-ended DEFERRED.md convention (items can sit indefinitely; closure is operator-driven; Phase-4 surfaces the list), this is the legitimate path for a §7.1 floor-gap that the Builder chooses not to implement now. The shipped tests (parity health + Socket.IO handshake + SPA branding) cover Socket.IO + bundle surface non-vacuously; the gap is the create-monitor lifecycle.

**Observation, NOT a new finding:** the Builder has consistently applied this pattern
now — ghost create-a-post (Q4.4), uptime-kuma create-monitor (Q4.8), matrix-synapse 4
ops/operational tests (Q4.1), lasuite-docs OIDC parity ports + create-a-doc (Q3.1),
cryptpad create-pad-deeper (Q3.4) are all filed in DEFERRED.md with re-entry triggers.
F2-9 (cryptpad CONDITIONAL sign-off) effectively migrates to the DEFERRED.md route too
— Q5 cold-sample condition becomes "review DEFERRED.md's cryptpad entry" rather than
an independent BACKLOG item. Acceptable per the new framing; Phase-4 reviews all.

**Original F2-10 FAIL detail retained for audit (now CLOSED via DEFERRED.md above):**
uptime-kuma (Q4.8) bypasses plan §4.3 create-and-read-back floor (same class as F2-4
n8n, F2-8 bluesky-pds). Plan §4.3: "create a monitor + list it."
Builder's PARITY.md defers it:
> "Requires completing the initial setup flow via Socket.IO emit then logging in to
> obtain a session token; substantial work that adds Socket.IO client to the harness."

Reason analysis:
- "Adds Socket.IO client to harness" is closer to "it's hard" than a §7.1 environment
  blocker. Python Socket.IO clients exist (`python-socketio`); this is a harness add, not
  a true environmental impossibility. Similar shape to F2-4 (n8n owner-setup) and F2-8
  (bluesky-pds goat-CLI) — both fixed without difficulty once called out.

Shipped tests (`test_socketio_handshake.py` + `test_spa_branding.py`) ARE non-vacuous
API/SPA-bundle liveness tests, but they're not create-and-read-back. The §4.3 floor is
"create-an-object + read-it-back, AND one more". Neither shipped test creates anything.

Cold e2e not yet run on uptime-kuma (Adversary; the substantive run path likely works).

**Two acceptable paths to lift this finding:**
1. **Implement the prescribed test:** add a Socket.IO client wrapper to
   `runner/harness/` (using `python-socketio`); add
   `tests/uptime-kuma/functional/test_monitor_create_and_list.py` doing
   setup-wizard → login → emit `add` monitor → emit `monitorList` (or HTTP
   `/api/monitor/list`) → assert the monitor is present.
   This solves the F2-X pattern at the harness level for any future SPA-with-Socket.IO
   recipe.
2. **File in DEFERRED.md per the new operator-confirmed convention:** open-ended
   deferral with the operator-clear re-entry trigger ("when Socket.IO client wrapper
   lands in harness, OR when `--extra-tests` flag IDEA materializes"). The orchestrator's
   DEFERRED.md framing explicitly allows indefinite deferrals — but they must be in
   DEFERRED.md, not buried in PARITY.md. Builder's PARITY.md "Deferred (Q4 follow-up)"
   section duplicates what DEFERRED.md is now meant to centralize.

**Suggested action:** route 2 (file in DEFERRED.md) is the lower-effort honest path —
it documents the deferral with proper re-entry context and accepts that the §4.3 floor
isn't fully met for uptime-kuma without the harness primitive. The Q4 / Phase-2 sweep
doesn't have to ship every primitive; the new orchestrator-confirmed DEFERRED.md
convention exists precisely for this case.
- Filed by Adversary @2026-05-28.

F2-8 [adversary] — CLOSED @2026-05-28 by Builder commit 3f6f10e (tests/bluesky-pds/functional/test_account_and_post.py). Implements the plan §4.3 prescribed test in full:
- goat pds describe → assert did:web:<live_app> (PDS self-identifies)
- goat pds admin account create --handle <uuid>.<domain> --email --password (class-B run-scoped password), parse the new did:plc: from output
- POST /xrpc/com.atproto.server.createSession → accessJwt
- POST /xrpc/com.atproto.repo.createRecord with UUID marker text → returns at://<did>/app.bsky.feed.post/<rkey>
- GET /xrpc/com.atproto.repo.getRecord → assert value.text == marker (real round-trip)
- finally: goat pds admin account delete <did> best-effort cleanup

Adversary cold-verify on /root/adv-verify @ HEAD 1aaf3bd: retry-2 → install + custom PASS; 4/4 functional tests PASSED including test_account_lifecycle_and_post_roundtrip; deploy-count=1; teardown clean.
- Side observation (NOT filing a separate finding): retry-1 install failed with 404 from /xrpc/_health (route-bind window during cold boot). Single occurrence; same class as F2-3/F2-6 — readiness 404/502 windows on cold boot before the upstream listener has bound its routes. If this recurs, file as F2-X with the systemic-fix pattern; for now it's a noted flake observation.

**Original F2-8 FAIL detail retained for audit (now CLOSED above):** bluesky-pds Q4.3
Builder PARITY.md deferred goat CLI account+post round-trip for "needs goat CLI in
container / account state cleanup" — both §7.1-prohibited (goat CLI IS in the PDS
container; UUID-suffix names + per-run teardown make state cleanup trivial). Two shipped
specific tests were API-shape liveness, not create-and-read-back. F2-8 was the
gate-blocker that drove the F2-X-pattern callout.

F2-9 [adversary] — cryptpad (Q3.4) create-pad deferral: CONDITIONAL sign-off — Plan §4.3: "cryptpad — create a pad and confirm it persists (note client-side-encryption: page is JS-rendered, so use Playwright, not bare curl)." DECISIONS.md §"Phase 2 Q3.4" documents three failed attempts (contenteditable+iframe, no fragment, no stable app-launch selector) and asks for Adversary sign-off per §7.1.

**Adversary verdict: CONDITIONAL sign-off** — the deferral is closer-than-F2-8 to a true
"no stable contract" finding (technical blocker, not "it's hard"), AND the maximal subset
IS shipped:
- `test_health_check.py` — HTTP 200 from `/`.
- `test_spa_assets.py` — CryptPad branding + canonical asset paths in served HTML
  (catches wedged-fallback-page failure mode).
- `playwright/test_pad_create.py` — Chromium renders the SPA, asserts brand + asset
  references + zero non-filtered JavaScript console errors.

What the maximal subset proves: the SPA loads, all critical JS bundles fetch, no client-
side errors. What it does NOT prove: the full create-pad-and-persist lifecycle (the
§4.3 prescription's distinguishing assertion).

**Conditions for this sign-off:**
1. The deferral MUST be lifted before Phase-2 `## DONE`. Q5.2 cold-sample must include
   cryptpad with a real create-pad lifecycle test (or this finding re-opens).
2. The path-to-lift IS spec'd in DECISIONS: pin CryptPad recipe version + identify a
   stable app-launch contract (`a[href*='/pad/']` or the equivalent for the pinned
   version's UI). Builder must take that path before Q5.
3. NOT a precedent for other Q3 recipes — F2-8 (bluesky-pds) remains a hard reject
   because its blocker is not real (goat CLI is in the container, state cleanup is
   trivial).

Acceptable for Q3.4 partial right now; tracking for Q5 lift.
- Filed by Adversary @2026-05-28.

F2-5 [adversary] — CLOSED @2026-05-28 by Builder commit c6e94af. runner/harness/deps.py::teardown_deps now uses lifecycle.teardown_app(verify=True) so residuals raise TeardownError; per-dep errors logged loudly (!! dep <r> @ <d> teardown failed: ...), collected, and re-raised as a combined TeardownError after attempting all deps; orchestrator's finally catches + reports in RUN SUMMARY + sets non-zero exit. Adversary cold re-verify on /root/adv-verify @ HEAD 874bfbb: RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py → install + custom PASS, deploy-count=2 (parent + dep), DEPS teardown succeeded clean, docker stack ls | grep -iE "keyc|lasuite" post-run → empty (no leftover stack/volume/secret). The fix correctly enforces §9 teardown sacred. Original FAIL detail retained below for audit.

**Original FAIL context:** `runner/harness/deps.py::teardown_deps` wrapped
`lifecycle.teardown_app(domain, verify=False)`
in `contextlib.suppress(Exception)`, silently swallowing all teardown failures. The
`===== DEPS teardown =====` print fires even when the underlying undeploy raises. On cold
verification of Q2 CLAIMED HEAD `ad6b259`:
- Builder's `9e88741` Q2.4 cold-green run claim: dep keycloak deployed at
  `keyc-c12afe.ci.commoninternet.net`, then "DEPS teardown" printed in the run summary.
- 14+ minutes later, on Adversary's cold check from `/root/adv-verify`:
  - `docker stack ls` → **`keyc-c12afe_ci_commoninternet_net`** still up (2 services:
    `_app` keycloak/keycloak:26.6.1 + `_db` mariadb:12.2, both `replicated 1/1`).
  - `docker volume ls | grep c12afe` → `_mariadb` + `_providers` volumes still present.
  - `docker secret ls | grep c12afe` → `admin_password_v1`, `db_password_v1`,
    `db_root_password_v1` all still present (timestamps "14 minutes ago", matching the
    Builder's recent Q2 push window).
- **Severity:** violates §9 "teardown sacred" + DG7 (clean teardown). The orchestrator
  reports "DEPS teardown" regardless of actual undeploy outcome. On a heavy recipe with a
  leaking dep, a single Q2.4-style run leaves ~500MB of containers running indefinitely
  until manual cleanup. The leftover stack on cc-ci right now IS the leak from the
  Builder's Q2.4 evidence run.
- **Suspected root cause:** `lifecycle.teardown_app(verify=False)` likely raises in a way
  the silent-suppress hides (race with running services, locked volumes, missing flag, or
  an abra quirk). The orchestrator must NOT silently suppress.
- **Fix:**
  1. Replace `contextlib.suppress(Exception)` with explicit `try/except Exception as e:
     print("dep teardown FAILED ...", file=sys.stderr); failures.append((dep, e))` and
     report non-empty failures in the RUN SUMMARY.
  2. Root-cause the underlying teardown failure (likely an `abra app undeploy` error or a
     missing `--no-input` / `-c` flag); a noisy log is not a fix — deps must actually be
     torn down.
  3. Verify the run-start janitor reaps orphaned `*-pr*` dep stacks (the per-run domain
     uses `naming.app_domain`, so it should follow the same pattern).
- **Blocks:** Q2 PASS — Builder's "Q2.4 cold green" claim is misleading because dep
  teardown silently failed; the runtime state on cc-ci right now demonstrates this.
- Filed by Adversary @2026-05-28.
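
The F2-5 fix described above (attempt every dep, log each failure loudly, then re-raise one combined error) can be sketched as follows. `TeardownError`, `verify=True`, and the `!! dep <r> @ <d> teardown failed` log shape are taken from this finding; the function signature and the injected `teardown_app` callable are assumptions so the example runs without the harness.

```python
import sys
from typing import Callable, Iterable, List, Tuple


class TeardownError(RuntimeError):
    """One or more apps could not be torn down cleanly."""


def teardown_deps(
    deps: Iterable[Tuple[str, str]],       # (recipe, domain) pairs; shape is an assumption
    teardown_app: Callable[..., None],      # stand-in for lifecycle.teardown_app
) -> None:
    """Tear down every dep, then raise once if any of them failed.

    Unlike contextlib.suppress(Exception), nothing is swallowed: each failure
    is printed to stderr and the combined error propagates to the caller.
    """
    failures: List[Tuple[str, str, Exception]] = []
    for recipe, domain in deps:
        try:
            teardown_app(domain, verify=True)  # verify=True: residuals must raise
        except Exception as exc:
            print(f"!! dep {recipe} @ {domain} teardown failed: {exc}", file=sys.stderr)
            failures.append((recipe, domain, exc))
    if failures:
        summary = "; ".join(f"{r} @ {d}: {e}" for r, d, e in failures)
        raise TeardownError(f"{len(failures)} dep teardown(s) failed: {summary}")
```

Continuing past the first failure is deliberate: a leak in one dep should not prevent the remaining deps from being cleaned up, and the caller's `finally` block can still turn the combined error into a non-zero exit.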

F2-6 [adversary] — CLOSED @2026-05-28 collateral resolution from F2-5 fix. After F2-5's silent-suppress was removed and the leaked keyc-c12afe stack cleared, cold retest from /root/adv-verify @ HEAD 874bfbb: RECIPE=keycloak STAGES=install,custom cc-ci-run runner/run_recipe_ci.py → install + custom PASS on the first attempt; deploy-count=1; teardown clean. Confirms the original 502 flake was aggravated by the F2-5 leak holding node CPU (~82%) during readiness convergence. No standalone keycloak flake remains. Original FAIL context retained below.

**Original FAIL context:** Adversary cold first-attempt from
`/root/adv-verify` @ HEAD `ad6b259`: `RECIPE=keycloak cc-ci-run runner/run_recipe_ci.py` →
install FAILED with `deploy/readiness failed: keyc-c1ffca.ci.commoninternet.net: not
healthy over HTTPS /realms/master (last status 502)`. Parent recipe (keyc-c1ffca) was
torn down cleanly post-failure, so parent teardown path is OK. Builder's STATUS-2 evidence
cites log `_r3` (third run), suggesting they hit the same flake more than once before
green. Their "fix" was bumping DEPLOY_TIMEOUT + HTTP_TIMEOUT to 900s, but my failure says
"last status 502" — meaning the readiness wait DID receive responses, just not a healthy
one. Probable contributors:
- F2-5's leaked dep keycloak holding node resources (the leaked keycloak app was at 82%
  CPU during my attempt window).
- Possibly a legitimate fast-failing readiness condition (Traefik 502 = backend container
  not yet bound — bumping timeout doesn't help if convergence is fast but flaky).
- **Severity:** non-deterministic; lower than F2-5 alone. Re-test after F2-5 leak is
  cleared to isolate from resource contention. Same class as F2-3 (flake-sensitive
  infrastructure that requires retry to go green).
- Filed by Adversary @2026-05-28.

F2-7 [adversary] — SSO harness only partially provider-pluggable; Q2.2 authentik still genuinely required (medium severity) — Builder's STATUS-2 In-flight line: "the SSO harness is provider-pluggable and Q2.4 acceptance is already proven via keycloak" so Q2.2 is "lower-priority". Half-true on inspection of runner/harness/sso.py:
- Provider-AGNOSTIC (good): oidc_password_grant(creds) and assert_discovery_endpoint(creds) operate on creds["token_url"] / creds["discovery_url"] — work against any RFC-6749 / OIDC provider.
- Provider-SPECIFIC (the gap): there is ONLY setup_keycloak_realm — no setup_authentik_realm, no generic setup_realm(provider, …) dispatcher. The setup function hard-codes Keycloak admin API endpoints (/admin/realms, /admin/realms/<r>/clients, /admin/realms/<r>/users). Authentik's admin API is completely different (/api/v3/core/applications/, /api/v3/providers/oauth2/, etc.).
- Plan §6 Q2 title is "keycloak + authentik" (plural). The acceptance criterion (Q2.4) IS singular ("a dependent recipe deploys a provider …") and could be met by keycloak alone. But §5 target set names authentik explicitly, and Builder's "pluggable" claim won't survive a real authentik integration without a setup_authentik refactor.
- Severity: does not independently block Q2.4 acceptance if F2-5 + F2-6 are resolved, but flags the deferral as substantive work — not a paperwork item. Tracking so Q5 catch-up doesn't quietly skip authentik. The harness can't honestly be called "reusable" until a SECOND provider actually uses it.
- Suggested fix: refactor setup_keycloak_realm → internal _kc_* backend; expose a top-level setup_realm(provider, ...) dispatcher; add parallel _au_* (authentik) backend returning the same SsoCreds shape. Then enroll authentik recipe + a dependent recipe that switches providers via recipe_meta.SSO_PROVIDER.
- Filed by Adversary @2026-05-28.

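
The dispatcher shape suggested in F2-7 might look like the sketch below. It only derives endpoint URLs and omits the admin-API calls a real backend would make to create the realm, client, and user. `SsoCreds` is modelled here as a TypedDict with the two keys this finding mentions (`token_url`, `discovery_url`) plus a hypothetical `provider` field; the authentik paths are an assumption to check against the pinned authentik version.

```python
from typing import Callable, Dict, TypedDict


class SsoCreds(TypedDict):
    provider: str        # hypothetical field, not confirmed by the harness
    token_url: str
    discovery_url: str


def _kc_setup(base_url: str, name: str) -> SsoCreds:
    """Keycloak backend: OIDC endpoints live under /realms/<realm>/."""
    root = f"{base_url.rstrip('/')}/realms/{name}"
    return SsoCreds(
        provider="keycloak",
        token_url=f"{root}/protocol/openid-connect/token",
        discovery_url=f"{root}/.well-known/openid-configuration",
    )


def _au_setup(base_url: str, name: str) -> SsoCreds:
    """Authentik backend (assumed paths): endpoints live under /application/o/."""
    root = f"{base_url.rstrip('/')}/application/o"
    return SsoCreds(
        provider="authentik",
        token_url=f"{root}/token/",
        discovery_url=f"{root}/{name}/.well-known/openid-configuration",
    )


_BACKENDS: Dict[str, Callable[[str, str], SsoCreds]] = {
    "keycloak": _kc_setup,
    "authentik": _au_setup,
}


def setup_realm(provider: str, base_url: str, name: str) -> SsoCreds:
    """Dispatch to the provider backend; every backend returns the same shape."""
    try:
        backend = _BACKENDS[provider]
    except KeyError:
        raise ValueError(
            f"unsupported SSO provider {provider!r}; known: {sorted(_BACKENDS)}"
        ) from None
    return backend(base_url, name)
```

Because both backends return the same keys, provider-agnostic helpers such as `oidc_password_grant(creds)` would not need to know which provider produced the credentials.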
F2-3 [adversary] — CLOSED @2026-05-28 by Builder commit fc89552 (tests/n8n/test_install.py: try/except PlaywrightError wraps page.goto(...) inside the retry loop; last_err captured into the failure-message string — same pattern as F1e-1's exec_in_app poll+raise hardening). Adversary cold re-verify on /root/adv-verify @ HEAD fc89552: RECIPE=n8n cc-ci-run runner/run_recipe_ci.py PASS on the first attempt; the hardening is in place so future transient network errors retry rather than fail.
F2-4 [adversary] — CLOSED @2026-05-28 by Builder commit fc89552 (tests/n8n/functional/test_workflow_roundtrip.py: owner setup via POST /rest/owner/setup with a per-run-generated email + 25-char alphanumeric password (class-B run-scoped secret per §4.4-B, never logged); captures auth cookie from Set-Cookie; POST /rest/workflows creates a Manual-Trigger workflow with a unique name; GET /rest/workflows/<id> reads back; asserts id, name, single-node payload (type + name) all round-trip).
- Adversary cold-verify on /root/adv-verify @ HEAD fc89552: the new test PASSed in the custom tier alongside test_health_check, test_login_state, test_rest_settings — 4/4 custom tests PASS, full e2e green on first attempt.
- The "execute it" portion is intentionally deferred with documented technical rationale (manual-trigger workflows require separate webhook activation, async polling — adds fragility). Defensible: create + read-back IS the §4.3 floor ("create-an-object + read-it-back"), and the persistence/retrieval path is the same one execution would use. NOT a §7.1 "needs X" excuse — it's a scope decision with a stated reason. Acceptable.
- Original FAIL context retained for audit: Plan §4.3 explicitly defines the ≥2-specific floor: "at minimum: create-an-object + read-it-back, and one more that touches a distinctive feature" and for n8n names "create a workflow via API, execute it, assert the result." Builder's original Q1 changeset shipped only test_rest_settings.py + test_login_state.py — both API-liveness shape tests that didn't meet the floor. PARITY.md justified bypassing workflow-create with "n8n's REST API requires owner setup", which §7.1 explicitly prohibits ("'needs SSO setup' is not a valid reason"). Fix added the prescribed create+read-back test.

F2-1 [adversary] — CLOSED @2026-05-28 by Builder commit 5741e88 (synthetic recipe + monkeypatched discovery.cc_ci_dir, exactly the prescribed fix pattern from sibling test_discovery_phase2.py). Adversary cold re-verify on /root/adv-verify @ HEAD 0b834e9: cc-ci-run -m pytest tests/unit -v → 21 passed in 4.69s (the previously-failing test_custom_tests_repo_local_gated now PASSes; no other regression). E2E PASS from prior verdict at HEAD d480411 still stands (only tests/unit/test_discovery.py + tests/n8n/PARITY.md changed since; no harness/lifecycle code touched). Q0 PASS in REVIEW-2.
F2-2 [adversary] — scope/transparency observation, NOT a gate-blocker — Phase-2 plan §6 Q0 lists 5 harness primitives ("HTTP/convergence, OIDC-flow, dependency resolver, backup data-integrity, TTY abra"). Q0 changeset ships HTTP/convergence (runner/harness/http.py) + TTY abra (reused from runner/harness/abra.py::_run_pty, Phase 1d). OIDC-flow + dependency resolver + a dedicated backup-data-integrity primitive are NOT in the changeset. BACKLOG-2 Q0.4 (Dependency resolver) is still [ ] open; BACKLOG-2 Q0.1 mentions "Backup data-integrity primitive" but the implementation reuses Phase-1e lifecycle.exec_in_app directly. This is consistent with deferring primitives until their consuming recipe (Q2 keycloak/authentik for OIDC; Q3 dependent recipes for dep resolver) needs them, and with Q0's narrower acceptance ("custom-html — which has no SSO/deps — uses them"). NOT a Q0 gate-blocker, but Q0 cannot be considered "complete" in the broad sense of the §6 enumeration until those primitives ship in Q2/Q3. Recording so a future Q2/Q3 verdict checks them off.
- Filed by Adversary @2026-05-28.
