autonomic-bot b4a6aaea7e plan: queue cf48 — Opus 4.8 post-cfold coverage-loss review (cross-check of cf55 GPT-5.5)

Second independent review of the cfold custom-folder collapse, by Opus 4.8
instead of GPT-5.5, inserted after cf55 (queue ...cfold;cf55;cf48;pvfix;...).
Per-phase overrides .loop-model[-adv]-cf48=claude-opus-4-8 on the claude backend.

2026-06-13 05:15:05 +00:00


Phase `cf48` — Opus 4.8 post-cfold coverage-loss review

Mission: run a SECOND, independent post-cfold coverage-loss review — identical in scope to phase cf55, but performed by Claude Opus 4.8 instead of GPT-5.5 — so the cfold custom-folder collapse is cross-validated by two different models. Confirm no custom test, assertion, fixture, screenshot hook, lifecycle overlay, or result-level behavior was lost. Review-only: do not implement new feature work unless the review finds a concrete regression that must be fixed before continuing.

State files live under machine-docs/: STATUS-cf48.md, BACKLOG-cf48.md, REVIEW-cf48.md, JOURNAL-cf48.md.

Model Requirement

This phase must run on Claude Opus 4.8 (claude-opus-4-8), on the claude backend (NOT opencode). The orchestrator sets per-phase model override files:

/srv/cc-ci/.cc-ci-logs/.loop-model-cf48 = claude-opus-4-8
/srv/cc-ci/.cc-ci-logs/.loop-model-adv-cf48 = claude-opus-4-8

The orchestrator must also ensure .loop-backend = claude is in effect when this phase starts. The backend cannot be overridden per phase in launch.py, so if the loops ran on opencode for cf55, the orchestrator restarts them on the claude backend before cf48. Builder and Adversary should each record the model their session reports in their first STATUS-cf48.md / REVIEW-cf48.md entry; if it is not Opus 4.8, stop and ask the orchestrator to fix the launcher state.
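A preflight check along the following lines could catch a wrong launcher state before the phase starts. This is a minimal sketch, not the orchestrator's actual logic: it assumes each override file holds only the model id as plain text and that .loop-backend sits in the same log directory as the model files. The function name `preflight_problems` is hypothetical.

```python
from pathlib import Path

EXPECTED_MODEL = "claude-opus-4-8"
EXPECTED_BACKEND = "claude"


def preflight_problems(log_dir: Path, phase: str = "cf48") -> list[str]:
    """Return human-readable problems; an empty list means the state looks right."""
    expected = {
        f".loop-model-{phase}": EXPECTED_MODEL,
        f".loop-model-adv-{phase}": EXPECTED_MODEL,
        # Assumption: .loop-backend lives next to the model override files.
        ".loop-backend": EXPECTED_BACKEND,
    }
    problems = []
    for name, want in expected.items():
        path = log_dir / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        # Assumption: the file content is just the value, possibly with a newline.
        got = path.read_text(encoding="utf-8").strip()
        if got != want:
            problems.append(f"{name}: expected {want!r}, found {got!r}")
    return problems
```

Calling `preflight_problems(Path("/srv/cc-ci/.cc-ci-logs"))` and stopping on a non-empty result would mirror the "stop and ask the orchestrator" rule. The session-reported model should still be recorded by hand, since a file check cannot prove which model is actually answering.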

Inputs

plan-phase-cfold-custom-folder.md
machine-docs/STATUS-cfold.md, machine-docs/REVIEW-cfold.md
the final cfold implementation commits on cc-ci main
the pre-cfold baseline matrix recorded by cfold (custom tests across all enrolled recipes)
any cfold full-sweep evidence and artifacts
machine-docs/STATUS-cf55.md / REVIEW-cf55.md — the GPT-5.5 review. Read it, but review INDEPENDENTLY first and form your own verdict; THEN reconcile: note any place the two models disagree (a finding one caught and the other missed is itself worth surfacing).

Required Review

Same seven categories as cf55 — review them independently, do not just confirm cf55:

Diff review — exact cfold commit range, line-by-line for coverage loss (discovery, manifest/reporting, docs, unit tests, imports, fixtures, lifecycle overlays, screenshot hooks, result rendering).
Discovery parity — recompute the discovered custom-test inventory after cfold vs the pre-cfold baseline: same recipes, same custom-test count, same assertions, same helper coverage, only folder paths changed functional/|playwright/ → custom/.
Assertion preservation — no weakened/removed assertions, skipped tests, relaxed waits, or renamed tests without equivalent coverage. Mechanical import/path updates are allowed.
Old-folder behavior — deprecated functional/ and playwright/ folders handled exactly as cfold decided: no silent coverage loss for recipe-local tests; loud warnings / documented compat.
Lifecycle-overlay separation — top-level lifecycle overlays stay distinct from custom tests and were not moved into / discovered through custom/.
Evidence audit — verify cfold's claimed full sweep actually ran custom tests with no silent level drops; classify any skip/level change as expected, cfold-neutral, or blocker.
Cleanliness — no unintended root coordination files, leaked test stacks, stale temp scripts, or uncommitted implementation files.
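The discovery-parity category can be partly mechanized by mapping each pre-cfold path to where it should live after the collapse and diffing the two inventories. The sketch below assumes an inventory shaped as a mapping from recipe name to a set of relative test paths; that shape and the names `normalize` and `parity_report` are illustrative, not the format cfold actually recorded. It checks paths only, so assertion and helper coverage still need the line-by-line review.

```python
import re

# Old per-type folders that cfold collapsed into custom/.
_OLD_FOLDER = re.compile(r"(^|/)(functional|playwright)/")


def normalize(path: str) -> str:
    """Map a pre-cfold test path onto its expected post-cfold location."""
    return _OLD_FOLDER.sub(r"\1custom/", path, count=1)


def parity_report(
    before: dict[str, set[str]], after: dict[str, set[str]]
) -> dict[str, dict[str, list[str]]]:
    """Return per-recipe differences; an empty dict means path-level parity."""
    report = {}
    for recipe in sorted(before.keys() | after.keys()):
        normalized = [normalize(p) for p in before.get(recipe, set())]
        expected = set(normalized)
        actual = after.get(recipe, set())
        entry = {
            # Tests that existed before cfold but were not discovered after it.
            "missing": sorted(expected - actual),
            # Tests discovered after cfold with no pre-cfold counterpart.
            "unexpected": sorted(actual - expected),
            # Two old paths that collapse to one new path can hide a lost test.
            "collisions": sorted(p for p in expected if normalized.count(p) > 1),
        }
        if any(entry.values()):
            report[recipe] = entry
    return report
```

Collisions are reported separately because a functional/ test and a playwright/ test with the same file name would otherwise look like a single preserved test.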

Gates

M1 — Opus 4.8 cold review complete. Builder produces a review matrix in STATUS-cf48.md covering every recipe and every category above, plus a short cf55-vs-cf48 agreement note. Adversary independently verifies and records PASS/FAIL in REVIEW-cf48.md.

M2 — No-loss verdict. Adversary either confirms NO COVERAGE LOST with concrete evidence, or records specific blocking findings. If findings exist, Builder fixes only those, evidence is refreshed, Adversary re-checks. No move to pvfix until M2 has a fresh PASS.
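The two gates can be read as a completeness check followed by a verdict check over the review matrix. The sketch below assumes the matrix is held as a mapping from recipe to category to a "PASS"/"FAIL" string; that representation and the function names are hypothetical, since the plan does not prescribe how STATUS-cf48.md encodes the matrix.

```python
CATEGORIES = (
    "Diff review",
    "Discovery parity",
    "Assertion preservation",
    "Old-folder behavior",
    "Lifecycle-overlay separation",
    "Evidence audit",
    "Cleanliness",
)


def matrix_gaps(
    recipes: list[str], matrix: dict[str, dict[str, str]]
) -> list[tuple[str, str]]:
    """Return (recipe, category) cells that the matrix does not cover (M1)."""
    return [
        (recipe, category)
        for recipe in recipes
        for category in CATEGORIES
        if category not in matrix.get(recipe, {})
    ]


def no_loss_verdict(recipes: list[str], matrix: dict[str, dict[str, str]]) -> bool:
    """True only if every cell is present and marked PASS (M2)."""
    if matrix_gaps(recipes, matrix):
        return False
    return all(matrix[r][c] == "PASS" for r in recipes for c in CATEGORIES)
```

A single FAIL or missing cell keeps the verdict false, which matches the rule that the phase does not move on to pvfix until M2 has a fresh PASS.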

Guardrails

Review pass, not a redesign. Minimal fixes, directly tied to a found regression.
Do not weaken or delete tests to make the review pass.
Do not rerun a full fleet sweep unless the existing cfold evidence is incomplete or the review finds a specific reason it must be repeated.
Do not touch proxy/ghost work here; those are queued after this review.

Definition of Done

STATUS-cf48.md contains the Opus 4.8 review matrix + final verdict (incl. cf55 agreement note), REVIEW-cf48.md contains Adversary's independent Opus 4.8 PASS, and the verdict explicitly states whether cfold preserved the full pre-cfold custom-test set. Builder writes ## DONE only after Adversary records M1 and M2 PASSes.
