Phase `cf55` — GPT-5.5 post-cfold coverage-loss review

Mission: after phase cfold finishes, run one independent GPT-5.5 review pass over the custom-folder collapse implementation and confirm that no custom test, assertion, fixture, screenshot hook, lifecycle overlay, or result-level behavior was lost. This is a review-only phase: do not implement new feature work unless the review finds a concrete regression that must be fixed before continuing.

State files live under machine-docs/: STATUS-cf55.md, BACKLOG-cf55.md, REVIEW-cf55.md, JOURNAL-cf55.md.

Model Requirement

This phase must run on GPT-5.5. The orchestrator sets per-phase model override files:

/srv/cc-ci/.cc-ci-logs/.loop-model-cf55 = openai/gpt-5.5
/srv/cc-ci/.cc-ci-logs/.loop-model-adv-cf55 = openai/gpt-5.5

Builder and Adversary should explicitly record the model shown by their OpenCode session in their first STATUS-cf55.md / REVIEW-cf55.md entries. If the session is not GPT-5.5, stop and ask the orchestrator to fix the launcher state before reviewing.
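
As a pre-flight aid, the override files can be checked before either role starts reviewing. The sketch below is illustrative only: it assumes each override file contains nothing but the model identifier on a single line, which this plan does not confirm, and the helper names `check_override` and `check_all` are hypothetical. It does not replace recording the model shown by the OpenCode session itself.

```python
from pathlib import Path
from typing import List, Optional

EXPECTED_MODEL = "openai/gpt-5.5"

# Paths taken from this plan. ASSUMPTION: each file holds only the model id.
OVERRIDE_FILES = [
    Path("/srv/cc-ci/.cc-ci-logs/.loop-model-cf55"),
    Path("/srv/cc-ci/.cc-ci-logs/.loop-model-adv-cf55"),
]


def check_override(path: Path, expected: str = EXPECTED_MODEL) -> Optional[str]:
    """Return None when the override matches, else a description of the problem."""
    if not path.is_file():
        return f"{path}: override file missing"
    actual = path.read_text(encoding="utf-8").strip()
    if actual != expected:
        return f"{path}: expected {expected!r}, found {actual!r}"
    return None


def check_all(paths=OVERRIDE_FILES) -> List[str]:
    """Collect every override problem; an empty list means both files look right."""
    results = (check_override(p) for p in paths)
    return [problem for problem in results if problem is not None]
```

If `check_all()` returns any entries, that is the point at which to stop and ask the orchestrator to fix the launcher state.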

Inputs

plan-phase-cfold-custom-folder.md
machine-docs/STATUS-cfold.md
machine-docs/REVIEW-cfold.md
the final cfold implementation commits on cc-ci main
the pre-cfold baseline matrix recorded by cfold (64 custom tests across 20 recipes)
any cfold full-sweep evidence and artifacts

Required Review

Diff review. Identify the exact cfold implementation commit range and review it line-by-line for coverage loss. Pay special attention to discovery, manifest/reporting, docs, unit tests, imports, fixtures, lifecycle overlays, screenshot hooks, and result rendering.
Discovery parity. Recompute the discovered custom-test inventory after cfold and compare it to the pre-cfold baseline. The expected result is the same logical test set: same recipes, same custom-test count, same assertions, and same helper coverage, with only the folder paths changed from functional/ or playwright/ to custom/.
Assertion preservation. Check that test bodies were not weakened during the move. Mechanical import/path updates are allowed; removed assertions, skipped tests, relaxed waits, or renamed tests without equivalent coverage are findings.
Old-folder behavior. Confirm the intended behavior for the deprecated functional/ and playwright/ folders is implemented exactly as cfold decided: no silent coverage loss for recipe-local tests, and loud warnings or documented compatibility as appropriate.
Lifecycle-overlay separation. Confirm top-level lifecycle overlays remain distinct from custom tests and were not accidentally moved into or discovered through custom/.
Evidence audit. Review cfold M1/M2 evidence. If cfold claims a full recipe sweep, verify custom tests actually ran and levels did not silently drop. If any recipe was skipped or changed level, classify it as expected, cfold-neutral, or a blocker.
Cleanliness. Confirm no unintended root coordination files, leaked test stacks, stale temporary scripts, or uncommitted implementation files remain.
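
The discovery-parity check above can be sketched as a set comparison in which legacy folder prefixes are normalized to custom/ before comparing. This is a minimal illustration, not the cc-ci discovery code: the `CustomTest` record, the `normalize` and `parity_report` helpers, and the recipe and file names in the usage note are all hypothetical, and how the baseline and post-cfold inventories are actually produced is left to the real tooling.

```python
from typing import Iterable, NamedTuple

# Folder names taken from this plan; everything else here is an assumption.
LEGACY_FOLDERS = ("functional", "playwright")


class CustomTest(NamedTuple):
    recipe: str
    path: str  # path relative to the recipe's test root
    name: str  # test name as reported by discovery


def normalize(test: CustomTest) -> CustomTest:
    """Rewrite a leading functional/ or playwright/ to custom/, leave others alone."""
    head, sep, rest = test.path.partition("/")
    if sep and head in LEGACY_FOLDERS:
        return test._replace(path=f"custom/{rest}")
    return test


def parity_report(baseline: Iterable[CustomTest], current: Iterable[CustomTest]) -> dict:
    """Compare logical test sets, ignoring only the folder rename."""
    baseline, current = list(baseline), list(current)
    expected = {normalize(t) for t in baseline}
    actual = {normalize(t) for t in current}
    return {
        "missing": sorted(expected - actual),        # coverage lost
        "unexpected": sorted(actual - expected),     # new or renamed tests
        "legacy_paths": sorted(t for t in current if normalize(t) != t),
        # Non-zero means two baseline tests collapse to one id after the rename.
        "baseline_collisions": len(set(baseline)) - len(expected),
    }
```

For example, a baseline holding `CustomTest("wiki", "functional/test_a.py", "test_login")` and a post-cfold inventory holding `CustomTest("wiki", "custom/test_a.py", "test_login")` yields empty `missing` and `unexpected` lists. Any entry in `missing` is a candidate coverage-loss finding, and an entry in `unexpected` may be a rename that still needs an equivalence check. This covers test identity only; assertion preservation still requires reading the test bodies.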

Gates

M1 — GPT-5.5 cold review complete. Builder produces a review matrix in STATUS-cf55.md covering every recipe and every required review category above. Adversary independently verifies the matrix and records PASS/FAIL in REVIEW-cf55.md.

M2 — No-loss verdict. Adversary either confirms NO COVERAGE LOST with concrete evidence, or records specific blocking findings. If findings exist, Builder fixes only those findings, cfold/cf55 evidence is refreshed, and Adversary re-checks. No move to pvfix until M2 has a fresh PASS.
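
One way to read the M2 rule is that a blocking finding stays open until it has been both fixed and re-checked by the Adversary against refreshed evidence. The sketch below encodes that reading under stated assumptions: the `Finding` record and `m2_verdict` function are hypothetical names, and the real verdict is whatever the Adversary records in REVIEW-cf55.md.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Finding:
    summary: str
    blocking: bool
    fixed: bool = False      # Builder applied a minimal fix
    rechecked: bool = False  # Adversary re-verified after evidence was refreshed


def m2_verdict(findings: Iterable[Finding]) -> str:
    """PASS only when no blocking finding is still open."""
    open_blockers = [
        f for f in findings
        if f.blocking and not (f.fixed and f.rechecked)
    ]
    return "FAIL" if open_blockers else "PASS"
```

Under this reading, a fix alone is not enough: a blocking finding with `fixed=True` but `rechecked=False` still yields FAIL, which matches the requirement for a fresh PASS.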

Guardrails

This is a review pass, not a redesign. Keep fixes minimal and directly tied to a found regression.
Do not weaken or delete tests to make the review pass.
Do not rerun a full fleet sweep unless the existing cfold evidence is incomplete or the review finds a specific reason the sweep must be repeated.
Do not touch proxy/ghost work in this phase; those are queued after this review.

Definition of Done

STATUS-cf55.md contains the GPT-5.5 review matrix and final verdict, REVIEW-cf55.md contains Adversary's independent GPT-5.5 PASS, and the verdict explicitly says whether cfold preserved the full pre-cfold custom-test set. Builder writes ## DONE only after Adversary records M1 and M2 PASSes.
