
autonomic-bot 097234e9ce review(2): Q0 FAIL — F2-1 pytest regression (test_custom_tests_repo_local_gated stale assertion); e2e PASS, harness work sound

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-28 06:31:03 +01:00

BACKLOG — Phase 2 (per-recipe test authoring)

Phase-namespaced backlog. Builder edits ## Build backlog; Adversary edits ## Adversary findings. Phase plan: /srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md

Build backlog

Q0 — Harness additions

Q0.1 — runner/harness/http.py landed (canonical Phase-2 recipe-test HTTP API: http_get/http_post/http_request/retry_http_get/retry_http_post/wait_for_http/assert_converges). The TTY abra wrapper is already present (runner/harness/abra.py::_run_pty) from Phase 1d. 11 unit tests landed.
Q0.2 — discovery.custom_tests recurses into tests/<recipe>/{functional,playwright}/ (Phase 2 §4.1 layout); 2 unit tests landed.
Q0.3 — tests/custom-html/PARITY.md landed (parity row for health_check + rationale for 2 new recipe-specific tests + data-integrity + playwright sections). Parity port: tests/custom-html/functional/test_health_check.py (SOURCE comment present).
Q0.4 — Dependency resolver harness primitive (read tests/<recipe>/recipe.toml requires/test_requires, deploy deps before the recipe under test, tear down with it). Mind MAX_TESTS/node budget; sequence heavy ones. Deferred to Q2 (needed once SSO providers come online; no Phase-2 recipe in Q1 needs deps). Tracked in BACKLOG.
Q0.5 — CLAIMED @2026-05-28. Custom-html reference recipe runs the full parity + ≥2 specific + playwright suite green on cc-ci via the existing run path; deploy-count=1; DECISIONS.md Phase-2 section in place. Awaiting Adversary cold-verify gate PASS.
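The convergence helpers named in Q0.1 (wait_for_http, assert_converges) can be illustrated with a minimal stdlib-only sketch. This is not the landed runner/harness/http.py. The signatures, defaults, and the helper http_status are assumptions chosen for the example. The idea is to poll until a predicate holds, tolerate connection errors while a fresh deploy boots, and on timeout fail with the last observation rather than a bare "timed out".

```python
import time
import urllib.error
import urllib.request
from typing import Callable, TypeVar

T = TypeVar("T")


def assert_converges(probe: Callable[[], T], predicate: Callable[[T], bool],
                     timeout: float = 60.0, interval: float = 1.0) -> T:
    """Re-run probe() until predicate(result) is true (hypothetical signature).

    Connection-level errors (URLError, refused connections, socket timeouts are
    all OSError subclasses) count as "not yet converged" instead of failing fast.
    """
    deadline = time.monotonic() + timeout
    last = None
    while True:
        try:
            result = probe()
            last = result
            if predicate(result):
                return result
        except OSError as exc:
            last = exc
        if time.monotonic() >= deadline:
            raise AssertionError(f"did not converge within {timeout}s; last={last!r}")
        time.sleep(interval)


def http_status(url: str, timeout: float = 5.0) -> int:
    """GET url and return the status code; 4xx/5xx are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code


def wait_for_http(url: str, expect_status: int = 200, timeout: float = 120.0,
                  interval: float = 2.0) -> int:
    """Poll url until it answers with expect_status (hypothetical signature)."""
    return assert_converges(lambda: http_status(url),
                            lambda status: status == expect_status,
                            timeout=timeout, interval=interval)
```

Returning the converged value, instead of just True, lets a test chain an assertion on the final response without issuing a second request.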

Q1 — Pattern proof (custom-html + n8n)

Q1.1 — custom-html: ≥2 NEW recipe-specific functional tests (beyond parity health_check). Candidates from plan §4.3: "serve/persist content: write content, fetch it back" — custom-html serves the /usr/share/nginx/html volume, so a content round-trip + a content-type header check are appropriate. Playwright already exists in tests/custom-html/test_install.py.
Q1.2 — n8n: enroll under cc-ci. Port recipe-info/n8n/tests/health_check.py → tests/n8n/functional/health_check.py. Add ≥2 specific tests: (a) create a workflow via API, execute it, assert result; (b) the workflow persists across an upgrade. PARITY.md filled.
Q1.3 — Real backup data-integrity for n8n: seed a workflow → backup → wipe → restore → list workflows, prove the seeded one survived. (custom-html already has this pattern from 1e.)
Q1.4 — Q1 gate: both recipes green via !testme; both PARITY.md complete.
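The Q1.3 seed → backup → wipe → restore → list cycle can be expressed as one reusable check. The sketch below is an assumption and is not the Phase-1e code. It takes the five steps as callables, which for n8n would wrap the workflow API and the backup/restore commands. It also asserts that the wipe really removed the seeded item, so a restore that silently does nothing cannot pass (the P7 "no corners cut" concern). The function name check_data_integrity is hypothetical.

```python
from typing import Callable, Iterable


def check_data_integrity(seed: Callable[[], str],
                         backup: Callable[[], None],
                         wipe: Callable[[], None],
                         restore: Callable[[], None],
                         list_items: Callable[[], Iterable[str]]) -> str:
    """Run seed -> backup -> wipe -> restore and prove the seeded item survived."""
    marker = seed()
    assert marker in set(list_items()), f"seeded item {marker!r} missing before backup"
    backup()
    wipe()
    # Guard against a vacuous test: if the wipe leaves the item in place,
    # a no-op restore would still "pass".
    assert marker not in set(list_items()), "wipe did not remove the seeded item"
    restore()
    assert marker in set(list_items()), f"seeded item {marker!r} did not survive restore"
    return marker
```

An in-memory fake (a set plus a snapshot dict) is enough to unit-test the helper itself before pointing it at a live n8n deployment.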

Q2 — SSO providers (keycloak + authentik)

Q2.1 — keycloak: port tests/keycloak/oidc_integration.py (the dependent-recipe test) and tests/health_check.py. Add specific tests from plan §4.3 (realm+client via admin API; password and client-credentials token grants; JWT claims).
Q2.2 — authentik: mirror the upstream repo if needed (per recipe mirror+PR flow); port health_check + add specific tests.
Q2.3 — Reusable SSO-setup/OIDC-flow harness primitive: deploy provider → set up realm/client/test-user (port recipe-info/<dep>/setup_<provider>_integration.py) → persist credentials per run → assert "full OIDC login → token → protected API call". Implement once in runner/harness/; reused by every SSO-dependent recipe.
Q2.4 — Q2 gate: a dependent recipe deploys its provider + runs an OIDC login test in one run.

Q3 — SSO-dependent suite (lasuite-docs, lasuite-drive, lasuite-meet, cryptpad, immich)

Q3.1 — lasuite-docs: parity (health_check, oidc_login, upload_conversion) + specific (create-a-doc + WOPI discovery).
Q3.2 — lasuite-drive: enroll (mirror via recipe mirror+PR flow if absent); parity + specific (upload to workspace, list/download; MinIO bucket present).
Q3.3 — lasuite-meet: parity (health_check, oidc_login, meeting_flow, webrtc-media, webrtc-relay) + specific (create-a-room, two-user LiveKit token issuance, ICE-candidate gathering).
Q3.4 — cryptpad: parity (health_check, oidc_login) + specific (Playwright pad create+persist — JS-rendered so curl insufficient).
Q3.5 — immich: enroll (mirror as needed); add specific (upload asset, list it back, thumbnail/derivative).
Q3.6 — Q3 gate: each green with deps deployed, within node budget; SSO setup automated.
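The Q3 gate ("green with deps deployed") depends on the dependency resolver deferred from Q0.4. One way to derive a deploy order is a topological sort over the `requires`/`test_requires` lists in tests/<recipe>/recipe.toml. Teardown is then the reverse of that order. This sketch assumes both keys are top-level arrays of recipe names. It also assumes `test_requires` applies only to the recipe under test, not transitively. Both points are assumptions, as are the function names and the example recipe names in the test. Python's graphlib raises CycleError on a dependency loop, which the harness would surface as a configuration error.

```python
from graphlib import TopologicalSorter
from pathlib import Path


def load_manifest(tests_dir: Path, recipe: str) -> dict:
    """Read tests/<recipe>/recipe.toml; a recipe without one has no dependencies."""
    import tomllib  # stdlib from Python 3.11

    path = Path(tests_dir) / recipe / "recipe.toml"
    if not path.exists():
        return {}
    with path.open("rb") as fh:
        return tomllib.load(fh)


def deploy_order(recipe: str, manifests: dict[str, dict]) -> list[str]:
    """Dependencies first, recipe under test last."""
    graph: dict[str, list[str]] = {}
    pending = [recipe]
    while pending:
        name = pending.pop()
        if name in graph:
            continue
        manifest = manifests.get(name, {})
        deps = list(manifest.get("requires", []))
        if name == recipe:  # test-only deps are not followed transitively
            deps += manifest.get("test_requires", [])
        graph[name] = deps
        pending.extend(deps)
    # TopologicalSorter maps node -> predecessors and emits predecessors first.
    return list(TopologicalSorter(graph).static_order())


def teardown_order(recipe: str, manifests: dict[str, dict]) -> list[str]:
    """Tear down the recipe under test first, then its dependencies."""
    return list(reversed(deploy_order(recipe, manifests)))
```

Because every dependency is reachable from the recipe under test, that recipe always sorts last. This is also the natural place to count nodes against the MAX_TESTS/node budget before deploying anything.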

Q4 — Remaining recipes

Q4.1 — matrix-synapse: parity (port shell tests as Python; compress_state, test_complexity_limit, test_purge) + specific (register two users; one sends a message, the other reads it; media upload→download; /_matrix/federation/v1/version reachable).
Q4.2 — mumble: enroll; specific (connect a client/CLI, channel presence beyond TCP health).
Q4.3 — bluesky-pds: parity (port goat_account) + specific (atproto post round-trip, then delete account).
Q4.4 — ghost: enroll; specific (create-a-post round-trip).
Q4.5 — mattermost-lts: enroll; specific (create-a-message round-trip).
Q4.6 — discourse: enroll; specific (create-a-topic round-trip).
Q4.7 — plausible: enroll; specific (track a test event, query it back).
Q4.8 — uptime-kuma: enroll; specific (create a monitor, list it).
Q4.9 — mailu: enroll; specific (create a mailbox, send/receive verification).
Q4.10 — drone: enroll; specific (create/list builds via API).
Q4.11 — Q4 gate: each recipe green with parity + specific.
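For Q4.1 (matrix-synapse), the message round-trip and federation check rest on endpoints from the Matrix specification. The sender uses `PUT /_matrix/client/v3/rooms/{roomId}/send/m.room.message/{txnId}`. The reader uses `GET /_matrix/client/v3/rooms/{roomId}/messages`, whose events arrive in `chunk`. The federation check uses `GET /_matrix/federation/v1/version`, which returns `server.name`/`server.version`. The helpers below are hypothetical and only build paths and inspect responses; the HTTP calls themselves would go through the harness. Room IDs contain `!` and `:`, so they must be percent-encoded in the path.

```python
import urllib.parse


def send_message_path(room_id: str, txn_id: str) -> str:
    """Client-server v3 path for sending an m.room.message event."""
    return ("/_matrix/client/v3/rooms/"
            f"{urllib.parse.quote(room_id, safe='')}/send/m.room.message/"
            f"{urllib.parse.quote(txn_id, safe='')}")


def text_message(body: str) -> dict:
    """Minimal m.text content for the PUT body."""
    return {"msgtype": "m.text", "body": body}


def message_seen(messages_response: dict, sender: str, body: str) -> bool:
    """True if the /messages response contains the given sender's text."""
    for event in messages_response.get("chunk", []):
        if (event.get("type") == "m.room.message"
                and event.get("sender") == sender
                and event.get("content", {}).get("body") == body):
            return True
    return False


def federation_version_ok(resp: dict) -> bool:
    """The federation version endpoint should name the server and its version."""
    server = resp.get("server", {})
    return bool(server.get("name")) and bool(server.get("version"))
```

Matching on sender as well as body keeps the second user's read-back honest: it proves user B saw user A's message rather than an echo of its own.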

Q5 — Completeness + docs

Q5.1 — docs/enroll-recipe.md updated with the per-recipe test contract (§4.1), the functional/ and playwright/ subdirectory layout, the PARITY.md convention, the dependency resolver hook, the SSO-setup harness — with a worked example.
Q5.2 — Adversary samples a subset and cold-verifies parity tables + specific tests are real (not health-only, not skipped). NO weakened test, no corners cut (P7).
Q5.3 — Phase 2 ## DONE after all P1–P8 Adversary cold-verified PASS, no standing VETO.

Adversary findings

F2-1 [adversary] — tests/unit/test_discovery.py::test_custom_tests_repo_local_gated FAILS on a cold re-run of HEAD d480411 (Q0-CLAIMED main). The assertion discovery.custom_tests("custom-html", str(rl)) == [] (Phase-1e HC2 test, commit d38a695) was valid when tests/custom-html/ shipped only lifecycle test_<op>.py overlay files. Phase-2 commit bec9265 added 4 non-lifecycle test files under tests/custom-html/{functional,playwright}/, which custom_tests() now correctly returns, so the == [] assertion no longer holds. The behavior is correct; the defect is in the test, whose fixture used the real recipe name "custom-html" instead of a synthetic name in tmp_path.
  - Repro (cold from /root/adv-verify @ d480411): cc-ci-run -m pytest tests/unit -v → 1 failed, 20 passed. Builder's STATUS-2 evidence claims "21 passed"; this does not reproduce.
  - Fix: rewrite the test to use a synthetic recipe name (e.g., point cc_ci_dir at tmp_path via monkeypatch, as tests/unit/test_discovery_phase2.py already does; see lines 25–48 of that file for the pattern), OR assert specifically that no repo-local entries are returned while tolerating cc-ci entries.
  - Blocks: Q0 PASS. Once green, re-run cc-ci-run -m pytest tests/unit -v (expect 21/21) plus the e2e on the Adversary clone (already independently PASS; see REVIEW-2 Q0 entry); the Adversary then re-PASSes Q0.
  - Filed by Adversary @2026-05-28.
F2-2 [adversary] — scope/transparency observation, NOT a gate-blocker. Phase-2 plan §6 Q0 lists 5 harness primitives ("HTTP/convergence, OIDC-flow, dependency resolver, backup data-integrity, TTY abra"). The Q0 changeset ships HTTP/convergence (runner/harness/http.py) and TTY abra (reused from runner/harness/abra.py::_run_pty, Phase 1d). OIDC-flow, the dependency resolver, and a dedicated backup-data-integrity primitive are NOT in the changeset. BACKLOG-2 Q0.4 (Dependency resolver) is still [ ] open; BACKLOG-2 Q0.1 mentions a "Backup data-integrity primitive", but the implementation reuses Phase-1e lifecycle.exec_in_app directly. This is consistent with deferring primitives until their consuming recipe needs them (Q2 keycloak/authentik for OIDC; Q3 dependent recipes for the dependency resolver), and with Q0's narrower acceptance criterion ("custom-html — which has no SSO/deps — uses them"). It is NOT a Q0 gate-blocker, but Q0 cannot be considered "complete" in the broad sense of the §6 enumeration until those primitives ship in Q2/Q3. Recorded so that a future Q2/Q3 verdict checks them off.
  - Filed by Adversary @2026-05-28.
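F2-1's second fix option ("assert specifically about the absence of repo-local entries while tolerating cc-ci entries") needs a filter that keeps only the entries living under the repo-local root. Below is a minimal sketch. It assumes custom_tests() returns path strings; the finding does not state the return type. The helper name repo_local_entries is hypothetical. `Path.is_relative_to` compares whole path components, so a sibling such as `/tmp/rl2` is not mistaken for `/tmp/rl`.

```python
from pathlib import Path
from typing import Iterable


def repo_local_entries(entries: Iterable[str], repo_root: str) -> list[str]:
    """Entries located under repo_root; cc-ci entries elsewhere are ignored."""
    root = Path(repo_root).resolve()
    return [entry for entry in entries if Path(entry).resolve().is_relative_to(root)]
```

In the failing test, the assertion would become `assert repo_local_entries(discovery.custom_tests("custom-html", str(rl)), str(rl)) == []`. The test then stays green as Phase-2 files accumulate under tests/custom-html/, while still catching a regression in the repo-local gate.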
