Second independent review of the cfold custom-folder collapse, by Opus 4.8 instead of GPT-5.5, inserted after cf55 (queue ...cfold;cf55;cf48;pvfix;...). Per-phase overrides .loop-model[-adv]-cf48=claude-opus-4-8 on the claude backend.
Phase cf48 — Opus 4.8 post-cfold coverage-loss review
Mission: run a SECOND, independent post-cfold coverage-loss review — identical in scope
to phase cf55, but performed by Claude Opus 4.8 instead of GPT-5.5 — so the cfold
custom-folder collapse is cross-validated by two different models. Confirm no custom test,
assertion, fixture, screenshot hook, lifecycle overlay, or result-level behavior was lost.
Review-only: do not implement new feature work unless the review finds a concrete regression
that must be fixed before continuing.
State files live under machine-docs/: STATUS-cf48.md, BACKLOG-cf48.md,
REVIEW-cf48.md, JOURNAL-cf48.md.
Model Requirement
This phase must run on Claude Opus 4.8 (claude-opus-4-8), on the claude backend
(NOT opencode). The orchestrator sets per-phase model override files:
- /srv/cc-ci/.cc-ci-logs/.loop-model-cf48 = claude-opus-4-8
- /srv/cc-ci/.cc-ci-logs/.loop-model-adv-cf48 = claude-opus-4-8
The orchestrator must also ensure .loop-backend = claude is in effect when this phase starts.
The backend cannot be overridden per phase in launch.py, so if the loops were on opencode for
cf55, the orchestrator restarts them on the claude backend before cf48. Builder and Adversary
should each record the model their session reports in their first entry (Builder in
STATUS-cf48.md, Adversary in REVIEW-cf48.md); if it is not Opus 4.8, stop and ask the
orchestrator to fix the launcher state.
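The launcher-state requirement above can be scripted as a pre-flight step. This is a hypothetical sketch: it assumes each override file holds only the model id (claude-opus-4-8) on one line, and it assumes .loop-backend sits in the same /srv/cc-ci/.cc-ci-logs/ directory, which this phase text does not actually state. check_launcher_state is an illustrative name, not part of launch.py.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical pre-flight check. The directory layout and the
# one-value-per-file format are assumptions, not launch.py's contract.
EXPECTED_MODEL = "claude-opus-4-8"
EXPECTED_BACKEND = "claude"


def read_setting(path: Path) -> Optional[str]:
    """Return the file's contents with surrounding whitespace stripped, or None if absent."""
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return None


def check_launcher_state(log_dir: Path, phase: str = "cf48") -> List[str]:
    """Return one message per launcher setting that would block this phase."""
    problems = []
    # Builder and Adversary each get their own per-phase model override file.
    for name in (f".loop-model-{phase}", f".loop-model-adv-{phase}"):
        value = read_setting(log_dir / name)
        if value != EXPECTED_MODEL:
            problems.append(f"{name}: expected {EXPECTED_MODEL!r}, found {value!r}")
    # The backend is not per phase, so it is checked as a single global file.
    backend = read_setting(log_dir / ".loop-backend")
    if backend != EXPECTED_BACKEND:
        problems.append(f".loop-backend: expected {EXPECTED_BACKEND!r}, found {backend!r}")
    return problems
```

Refusing to start the loops when check_launcher_state(Path("/srv/cc-ci/.cc-ci-logs")) returns anything mirrors the "stop and ask the orchestrator" rule. The in-session model report is still needed, because correct files on disk do not prove the running session picked them up.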
Inputs
- plan-phase-cfold-custom-folder.md
- machine-docs/STATUS-cfold.md, machine-docs/REVIEW-cfold.md
- the final cfold implementation commits on cc-ci main
- the pre-cfold baseline matrix recorded by cfold (custom tests across all enrolled recipes)
- any cfold full-sweep evidence and artifacts
- machine-docs/STATUS-cf55.md and machine-docs/REVIEW-cf55.md: the GPT-5.5 review. Read it, but review INDEPENDENTLY first and form your own verdict; THEN reconcile, noting every place the two models disagree (a finding that one model caught and the other missed is itself worth surfacing).
Required Review
Same seven categories as cf55. Review each one independently; do not just confirm cf55's findings:
- Diff review: read the exact cfold commit range line by line for coverage loss in discovery, manifest/reporting, docs, unit tests, imports, fixtures, lifecycle overlays, screenshot hooks, and result rendering.
- Discovery parity: recompute the discovered custom-test inventory after cfold and compare it with the pre-cfold baseline. Expect the same recipes, the same custom-test count, the same assertions, and the same helper coverage; only folder paths may change (functional/ or playwright/ → custom/).
- Assertion preservation: no weakened or removed assertions, skipped tests, relaxed waits, or renamed tests without equivalent coverage. Mechanical import/path updates are allowed.
- Old-folder behavior: the deprecated functional/ and playwright/ folders are handled exactly as cfold decided, with no silent coverage loss for recipe-local tests (loud warnings or documented compatibility behavior).
- Lifecycle-overlay separation: top-level lifecycle overlays stay distinct from custom tests and were not moved into, or discovered through, custom/.
- Evidence audit: verify that cfold's claimed full sweep actually ran custom tests with no silent level drops; classify every skip or level change as expected, cfold-neutral, or blocker.
- Cleanliness: no unintended root coordination files, leaked test stacks, stale temp scripts, or uncommitted implementation files.
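Discovery parity reduces to a keyed comparison once the old folder prefix is mapped onto custom/. The sketch below assumes a hypothetical inventory shape (recipe → test path relative to the recipe → assertion count), for example exported from the pre-cfold baseline matrix and from a post-cfold discovery run; parity_report and normalize are illustrative names, not cc-ci functions. It flags missing recipes, undiscovered tests, lowered assertion counts, and two old tests collapsing onto one custom/ path.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical inventory shape: recipe -> {test path relative to the recipe: assertion count}.
Inventory = Dict[str, Dict[str, int]]

# Folders that cfold collapsed into custom/.
OLD_FOLDERS = ("functional", "playwright")


def normalize(path: str) -> str:
    """Rewrite functional/... or playwright/... to custom/...; leave other paths unchanged."""
    head, sep, rest = path.partition("/")
    if sep and head in OLD_FOLDERS:
        return "custom/" + rest
    return path


def parity_report(baseline: Inventory, after: Inventory) -> List[str]:
    """List every difference that is not a pure functional|playwright -> custom rename."""
    findings = []
    for recipe in sorted(set(baseline) ^ set(after)):
        where = "missing after cfold" if recipe in baseline else "new after cfold"
        findings.append(f"{recipe}: recipe {where}")
    for recipe in sorted(set(baseline) & set(after)):
        # Two baseline tests landing on one custom/ path would hide a loss behind the rename.
        renamed = Counter(normalize(p) for p in baseline[recipe])
        for path, n in sorted(renamed.items()):
            if n > 1:
                findings.append(f"{recipe}: {n} baseline tests collapse onto {path}")
        expected = {normalize(p): count for p, count in baseline[recipe].items()}
        for path, want in sorted(expected.items()):
            got = after[recipe].get(path)
            if got is None:
                findings.append(f"{recipe}: {path} not discovered after cfold")
            elif got < want:
                findings.append(f"{recipe}: {path} assertions {want} -> {got}")
        # Extra tests are not a loss, but may be a rename that needs an equivalence check.
        for path in sorted(set(after[recipe]) - set(expected)):
            findings.append(f"{recipe}: {path} is new; check it is not an unmatched rename")
    return findings
```

An empty report is necessary but not sufficient: assertion counts are a coarse proxy, so relaxed waits, weakened assertions that keep the same count, and helper coverage still need the line-by-line diff read.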
Gates
M1 — Opus 4.8 cold review complete. Builder produces a review matrix in STATUS-cf48.md
covering every recipe and every category above, plus a short cf55-vs-cf48 agreement note.
Adversary independently verifies and records PASS/FAIL in REVIEW-cf48.md.
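M1's coverage requirement (every recipe × every category) can be checked mechanically before Adversary signs off. In this sketch the recipe list is assumed to come from the pre-cfold baseline matrix, and the STATUS-cf48.md matrix is assumed to be parsed into recipe → category → verdict; matrix_gaps and the PASS/FAIL cell vocabulary are hypothetical.

```python
from typing import Dict, List, Sequence

# The seven review categories listed for this phase.
CATEGORIES = (
    "Diff review",
    "Discovery parity",
    "Assertion preservation",
    "Old-folder behavior",
    "Lifecycle-overlay separation",
    "Evidence audit",
    "Cleanliness",
)
# Hypothetical cell vocabulary; adjust to whatever STATUS-cf48.md actually uses.
VERDICTS = {"PASS", "FAIL"}


def matrix_gaps(recipes: Sequence[str], matrix: Dict[str, Dict[str, str]]) -> List[str]:
    """Return every recipe/category cell that is missing or holds an unexpected verdict."""
    gaps = []
    for recipe in sorted(recipes):
        row = matrix.get(recipe, {})
        for category in CATEGORIES:
            verdict = row.get(category)
            if verdict is None:
                gaps.append(f"{recipe}: no {category} entry")
            elif verdict not in VERDICTS:
                gaps.append(f"{recipe}: {category} has unexpected verdict {verdict!r}")
    return gaps
```

Driving the recipe list from the baseline rather than from post-cfold discovery output is the point of the design: a recipe that cfold accidentally dropped would otherwise vanish from the matrix instead of showing up as a gap.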
M2 — No-loss verdict. Adversary either confirms NO COVERAGE LOST with concrete
evidence or records specific blocking findings. If findings exist, Builder fixes only those
findings, the evidence is refreshed, and Adversary re-checks. Do not move to pvfix until M2 has a fresh PASS.
Guardrails
- Review pass, not a redesign. Minimal fixes, directly tied to a found regression.
- Do not weaken or delete tests to make the review pass.
- Do not rerun a full fleet sweep unless the existing cfold evidence is incomplete or the review finds a specific reason it must be repeated.
- Do not touch proxy/ghost work here; those are queued after this review.
Definition of Done
STATUS-cf48.md contains the Opus 4.8 review matrix and final verdict (including the cf55
agreement note); REVIEW-cf48.md contains the Adversary's independent Opus 4.8 PASS; and the
verdict explicitly states whether cfold preserved the full pre-cfold custom-test set.
Builder writes ## DONE only after Adversary has recorded PASS for both M1 and M2.
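Taking the latest verdict per gate captures M2's "fresh PASS" rule: an older PASS followed by a FAIL must not unlock ## DONE. The line format assumed here (for example "M1: PASS" or "M2: FAIL (...)") is a guess about how REVIEW-cf48.md records verdicts, and latest_verdicts and may_write_done are hypothetical helpers.

```python
import re
from typing import Dict

# Hypothetical verdict line format in REVIEW-cf48.md, e.g. "M2: PASS" or "M2: FAIL (...)".
GATE_LINE = re.compile(r"^\s*(M[12])\s*:\s*(PASS|FAIL)\b", re.MULTILINE)


def latest_verdicts(review_text: str) -> Dict[str, str]:
    """Map each gate to its most recent verdict; later lines overwrite earlier ones."""
    verdicts: Dict[str, str] = {}
    for gate, verdict in GATE_LINE.findall(review_text):
        verdicts[gate] = verdict
    return verdicts


def may_write_done(review_text: str) -> bool:
    """True only when the latest M1 and M2 verdicts are both PASS."""
    latest = latest_verdicts(review_text)
    return latest.get("M1") == "PASS" and latest.get("M2") == "PASS"
```

The check cannot tell whether the evidence behind the final PASS was actually refreshed; that judgment stays with Adversary.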