Phase cf55 — GPT-5.5 post-cfold coverage-loss review
Mission: after phase cfold finishes, run one independent GPT-5.5 review pass over
the custom-folder collapse implementation and confirm that no custom test, assertion,
fixture, screenshot hook, lifecycle overlay, or result-level behavior was lost. This is a
review-only phase: do not implement new feature work unless the review finds a concrete
regression that must be fixed before continuing.
State files live under machine-docs/: STATUS-cf55.md, BACKLOG-cf55.md,
REVIEW-cf55.md, JOURNAL-cf55.md.
Model Requirement
This phase must run on GPT-5.5. The orchestrator sets per-phase model override files:
- /srv/cc-ci/.cc-ci-logs/.loop-model-cf55 = openai/gpt-5.5
- /srv/cc-ci/.cc-ci-logs/.loop-model-adv-cf55 = openai/gpt-5.5
Builder and Adversary should explicitly record the model shown by their OpenCode session in
their first STATUS-cf55.md / REVIEW-cf55.md entries. If the session is not GPT-5.5,
stop and ask the orchestrator to fix the launcher state before reviewing.
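As a minimal sketch of the launcher-state check, the snippet below reads the two override files named above and reports any that are missing or do not hold the required model. It assumes each file contains only the model identifier as plain text; that format and the helper name check_model_overrides are illustrative assumptions, not something this plan specifies. It only inspects the files, so Builder and Adversary still need to record the model their OpenCode session actually shows.

```python
from pathlib import Path

REQUIRED_MODEL = "openai/gpt-5.5"
OVERRIDE_FILES = (
    "/srv/cc-ci/.cc-ci-logs/.loop-model-cf55",
    "/srv/cc-ci/.cc-ci-logs/.loop-model-adv-cf55",
)


def check_model_overrides(paths=OVERRIDE_FILES, required=REQUIRED_MODEL):
    """Return a list of problems; an empty list means every override matches.

    Assumption: each override file holds just the model identifier,
    optionally followed by whitespace or a trailing newline.
    """
    problems = []
    for raw in paths:
        path = Path(raw)
        if not path.is_file():
            problems.append(f"{path}: missing override file")
            continue
        value = path.read_text(encoding="utf-8").strip()
        if value != required:
            problems.append(f"{path}: expected {required!r}, found {value!r}")
    return problems


if __name__ == "__main__":
    issues = check_model_overrides()
    for issue in issues:
        print(issue)
    # A non-zero exit code signals that the orchestrator must fix launcher state.
    raise SystemExit(1 if issues else 0)
```

If the list is non-empty, the safe reading of the rule above is to stop and escalate rather than start the review.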
Inputs
- plan-phase-cfold-custom-folder.md
- machine-docs/STATUS-cfold.md
- machine-docs/REVIEW-cfold.md
- the final cfold implementation commits on cc-ci main
- the pre-cfold baseline matrix recorded by cfold (64 custom tests across 20 recipes)
- any cfold full-sweep evidence and artifacts
Required Review
- Diff review. Identify the exact cfold implementation commit range and review it line-by-line for coverage loss. Pay special attention to discovery, manifest/reporting, docs, unit tests, imports, fixtures, lifecycle overlays, screenshot hooks, and result rendering.
- Discovery parity. Recompute the discovered custom-test inventory after cfold and compare it to the pre-cfold baseline. The expected result is the same logical test set: same recipes, same custom-test count, same assertions, same helper coverage, only folder paths changed from functional/ or playwright/ to custom/.
- Assertion preservation. Check that test bodies were not weakened during the move. Mechanical import/path updates are allowed; removed assertions, skipped tests, relaxed waits, or renamed tests without equivalent coverage are findings.
- Old-folder behavior. Confirm the intended behavior for deprecated functional/ and playwright/ folders is implemented exactly as cfold decided: no silent coverage loss for recipe-local tests, and loud warnings or documented compatibility as appropriate.
- Lifecycle-overlay separation. Confirm top-level lifecycle overlays remain distinct from custom tests and were not accidentally moved into or discovered through custom/.
- Evidence audit. Review cfold M1/M2 evidence. If cfold claims a full recipe sweep, verify custom tests actually ran and levels did not silently drop. If any recipe was skipped or changed level, classify it as expected, cfold-neutral, or a blocker.
- Cleanliness. Confirm no unintended root coordination files, leaked test stacks, stale temporary scripts, or uncommitted implementation files remain.
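The discovery-parity comparison in the list above could be sketched roughly as follows. The inventory shape is an assumption for illustration: each entry is a (recipe, relative path) pair whose first path segment is the test folder. The helper names logical_key and compare_inventories and the recipe names in the usage note are hypothetical, and the sketch covers only path and count parity, not assertion content or helper coverage.

```python
from collections import Counter

OLD_FOLDERS = ("functional", "playwright")
NEW_FOLDER = "custom"


def logical_key(recipe, rel_path):
    """Identify a test independently of which test folder it lives in.

    Assumption: rel_path looks like "<folder>/<rest>", where <folder> is
    functional, playwright, or custom.
    """
    folder, sep, rest = rel_path.partition("/")
    if sep and (folder in OLD_FOLDERS or folder == NEW_FOLDER):
        return (recipe, rest)
    return (recipe, rel_path)


def compare_inventories(baseline, current):
    """Compare pre-cfold and post-cfold inventories of (recipe, rel_path) pairs."""
    before = Counter(logical_key(r, p) for r, p in baseline)
    after = Counter(logical_key(r, p) for r, p in current)
    return {
        # In the baseline but no longer discovered: candidate coverage loss.
        "missing": sorted((before - after).elements()),
        # Discovered now but absent from the baseline: needs an explanation.
        "unexpected": sorted((after - before).elements()),
        # Still discovered under a deprecated folder after the collapse.
        "stale_old_paths": sorted(
            (r, p) for r, p in current if p.split("/", 1)[0] in OLD_FOLDERS
        ),
    }
```

For example, with a baseline of ("wiki", "functional/test_login.py") and ("wiki", "playwright/test_ui.py") and a post-cfold inventory holding only ("wiki", "custom/test_login.py"), the result lists ("wiki", "test_ui.py") under "missing". Counters are used instead of sets so that two old tests collapsing onto the same custom/ path still show up as a count mismatch.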
Gates
M1 — GPT-5.5 cold review complete. Builder produces a review matrix in
STATUS-cf55.md covering every recipe and every required review category above.
Adversary independently verifies the matrix and records PASS/FAIL in REVIEW-cf55.md.
M2 — No-loss verdict. Adversary either confirms NO COVERAGE LOST with concrete
evidence, or records specific blocking findings. If findings exist, Builder fixes only
those findings, cfold/cf55 evidence is refreshed, and Adversary re-checks. Do not move on
to pvfix until M2 has a fresh PASS.
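One way to make the M1 completeness requirement checkable is sketched below. It assumes the review matrix is a full grid keyed by (recipe, category) with a non-empty verdict string in each cell; that layout and the helper names missing_cells and m1_ready are illustrative assumptions, since this plan does not fix a matrix format. The category names are taken from the Required Review list.

```python
CATEGORIES = (
    "diff review",
    "discovery parity",
    "assertion preservation",
    "old-folder behavior",
    "lifecycle-overlay separation",
    "evidence audit",
    "cleanliness",
)


def missing_cells(matrix, recipes, categories=CATEGORIES):
    """Return (recipe, category) pairs that have no recorded verdict.

    Assumption: matrix maps (recipe, category) to a verdict string, and a
    blank or absent entry means the cell was never reviewed.
    """
    return [
        (recipe, category)
        for recipe in recipes
        for category in categories
        if not str(matrix.get((recipe, category), "")).strip()
    ]


def m1_ready(matrix, recipes, categories=CATEGORIES):
    """True only when every recipe has a verdict for every review category."""
    return not missing_cells(matrix, recipes, categories)
```

A check like this only shows that the matrix is complete; the Adversary's independent verification of each cell is still what decides PASS or FAIL.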
Guardrails
- This is a review pass, not a redesign. Keep fixes minimal and directly tied to a found regression.
- Do not weaken or delete tests to make the review pass.
- Do not rerun a full fleet sweep unless the existing cfold evidence is incomplete or the review finds a specific reason the sweep must be repeated.
- Do not touch proxy/ghost work in this phase; those are queued after this review.
Definition of Done
STATUS-cf55.md contains the GPT-5.5 review matrix and final verdict,
REVIEW-cf55.md contains Adversary's independent GPT-5.5 PASS, and the verdict explicitly
says whether cfold preserved the full pre-cfold custom-test set. Builder writes ## DONE
only after Adversary records M1 and M2 PASSes.