# Phase `cf55` — GPT-5.5 post-cfold coverage-loss review

**Mission:** after phase `cfold` finishes, run one independent GPT-5.5 review pass over
the custom-folder collapse implementation and confirm that no custom test, assertion,
fixture, screenshot hook, lifecycle overlay, or result-level behavior was lost. This is a
review-only phase: do not implement new feature work unless the review finds a concrete
regression that must be fixed before continuing.

State files live under `machine-docs/`: `STATUS-cf55.md`, `BACKLOG-cf55.md`,
`REVIEW-cf55.md`, `JOURNAL-cf55.md`.

## Model Requirement

This phase must run on **GPT-5.5**. The orchestrator sets per-phase model override files:

- `/srv/cc-ci/.cc-ci-logs/.loop-model-cf55 = openai/gpt-5.5`
- `/srv/cc-ci/.cc-ci-logs/.loop-model-adv-cf55 = openai/gpt-5.5`

Builder and Adversary should explicitly record the model shown by their OpenCode session in
their first `STATUS-cf55.md` / `REVIEW-cf55.md` entries. If the session is not GPT-5.5,
stop and ask the orchestrator to fix the launcher state before reviewing.

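As a rough illustration of the launcher-state check, the sketch below reads the two override files and reports any that are missing or hold a different model. It assumes each file contains only the model identifier as plain text; that file format, and the function names `check_override` and `find_problems`, are hypothetical and not confirmed by this plan. It does not replace recording the model actually shown by the OpenCode session.

```python
from __future__ import annotations

from pathlib import Path

# Paths and the expected value come from this plan. The assumption that each
# file holds just the model identifier as plain text is hypothetical.
EXPECTED_MODEL = "openai/gpt-5.5"
OVERRIDE_FILES = [
    Path("/srv/cc-ci/.cc-ci-logs/.loop-model-cf55"),
    Path("/srv/cc-ci/.cc-ci-logs/.loop-model-adv-cf55"),
]


def check_override(path: Path, expected: str = EXPECTED_MODEL) -> str | None:
    """Return a problem description, or None if the override matches."""
    if not path.is_file():
        return f"{path}: override file is missing"
    actual = path.read_text(encoding="utf-8").strip()
    if actual != expected:
        return f"{path}: expected {expected!r}, found {actual!r}"
    return None


def find_problems(paths=OVERRIDE_FILES, expected: str = EXPECTED_MODEL) -> list[str]:
    """Collect the problems for every override file; an empty list means all match."""
    problems = []
    for path in paths:
        problem = check_override(Path(path), expected)
        if problem is not None:
            problems.append(problem)
    return problems
```

Calling `find_problems()` with no arguments checks the two paths listed above. A non-empty result would be a reason to stop and ask the orchestrator to fix the launcher state.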
## Inputs

- `plan-phase-cfold-custom-folder.md`
- `machine-docs/STATUS-cfold.md`
- `machine-docs/REVIEW-cfold.md`
- the final cfold implementation commits on cc-ci main
- the pre-cfold baseline matrix recorded by cfold (`64` custom tests across `20` recipes)
- any cfold full-sweep evidence and artifacts

## Required Review

1. **Diff review.** Identify the exact cfold implementation commit range and review it
   line-by-line for coverage loss. Pay special attention to discovery, manifest/reporting,
   docs, unit tests, imports, fixtures, lifecycle overlays, screenshot hooks, and result
   rendering.
2. **Discovery parity.** Recompute the discovered custom-test inventory after cfold and
   compare it to the pre-cfold baseline. The expected result is the same logical test set:
   same recipes, same custom-test count, same assertions, same helper coverage, with only
   folder paths changed from `functional/` or `playwright/` to `custom/`.
3. **Assertion preservation.** Check that test bodies were not weakened during the move.
   Mechanical import/path updates are allowed; removed assertions, skipped tests, relaxed
   waits, or renamed tests without equivalent coverage are findings.
4. **Old-folder behavior.** Confirm the intended behavior for the deprecated
   `functional/` and `playwright/` folders is implemented exactly as cfold decided: no
   silent coverage loss for recipe-local tests, and loud warnings or documented
   compatibility as appropriate.
5. **Lifecycle-overlay separation.** Confirm top-level lifecycle overlays remain distinct
   from custom tests and were not accidentally moved into or discovered through `custom/`.
6. **Evidence audit.** Review cfold M1/M2 evidence. If cfold claims a full recipe sweep,
   verify custom tests actually ran and levels did not silently drop. If any recipe was
   skipped or changed level, classify it as expected, cfold-neutral, or a blocker.
7. **Cleanliness.** Confirm no unintended root coordination files, leaked test stacks,
   stale temporary scripts, or uncommitted implementation files remain.

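The discovery-parity comparison in item 2 can be sketched as a path-normalizing diff. This is a minimal sketch under stated assumptions: inventories are taken to be lists of relative paths shaped like `<recipe>/<folder>/<test file>`, which is a hypothetical layout, and the names `normalize`, `compare`, and `summarize` are illustrative only. It checks path parity, not assertion or helper coverage, so it supports the review rather than replacing it.

```python
from __future__ import annotations

from collections import Counter
from pathlib import PurePosixPath

OLD_FOLDERS = {"functional", "playwright"}  # deprecated folder names from the plan
NEW_FOLDER = "custom"


def normalize(test_path: str) -> str:
    """Map a pre-cfold path onto the location expected after cfold."""
    parts = [
        NEW_FOLDER if part in OLD_FOLDERS else part
        for part in PurePosixPath(test_path).parts
    ]
    return str(PurePosixPath(*parts))


def compare(baseline: list[str], current: list[str]) -> dict[str, list[str]]:
    """Diff the normalized baseline inventory against the post-cfold inventory."""
    expected = Counter(normalize(path) for path in baseline)
    actual = Counter(current)
    return {
        # Tests in the baseline with no counterpart after the move.
        "missing": sorted((expected - actual).elements()),
        # Tests that appear after the move without a baseline counterpart.
        "unexpected": sorted((actual - expected).elements()),
        # Two old tests that collapse onto one new path (possible overwrite).
        "collisions": sorted(p for p, n in expected.items() if n > 1),
    }


def summarize(paths: list[str]) -> tuple[int, int]:
    """Return (recipe count, test count), assuming the first component is the recipe."""
    recipes = {PurePosixPath(path).parts[0] for path in paths if path}
    return len(recipes), len(paths)
```

With real inventories, `summarize` on the post-cfold list would be expected to match the recorded baseline of `64` custom tests across `20` recipes, and all three lists returned by `compare` would be expected to be empty. Anything else is a candidate finding to investigate by hand.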
## Gates

**M1 — GPT-5.5 cold review complete.** Builder produces a review matrix in
`STATUS-cf55.md` covering every recipe and every required review category above.
Adversary independently verifies the matrix and records PASS/FAIL in `REVIEW-cf55.md`.

**M2 — No-loss verdict.** Adversary either confirms `NO COVERAGE LOST` with concrete
evidence, or records specific blocking findings. If findings exist, Builder fixes only
those findings, cfold/cf55 evidence is refreshed, and Adversary re-checks. No move to
`pvfix` until M2 has a fresh PASS.

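One way to make the M1 requirement of "every recipe and every required review category" checkable is to treat the matrix as a mapping from `(recipe, category)` to a verdict and look for empty cells. The sketch below assumes that representation and a Markdown table layout; both are hypothetical, since this plan does not prescribe a matrix format. The category names are taken from the Required Review list.

```python
from __future__ import annotations

# Category names mirror the seven Required Review items in this plan.
CATEGORIES = [
    "Diff review",
    "Discovery parity",
    "Assertion preservation",
    "Old-folder behavior",
    "Lifecycle-overlay separation",
    "Evidence audit",
    "Cleanliness",
]


def missing_cells(recipes: list[str], matrix: dict[tuple[str, str], str]) -> list[tuple[str, str]]:
    """List every (recipe, category) pair that has no recorded verdict yet."""
    return [
        (recipe, category)
        for recipe in recipes
        for category in CATEGORIES
        if not matrix.get((recipe, category), "").strip()
    ]


def render_matrix(recipes: list[str], matrix: dict[tuple[str, str], str]) -> str:
    """Render the matrix as a Markdown table, marking empty cells as TODO."""
    header = "| Recipe | " + " | ".join(CATEGORIES) + " |"
    divider = "|" + " --- |" * (len(CATEGORIES) + 1)
    rows = []
    for recipe in recipes:
        cells = [
            matrix.get((recipe, category), "").strip() or "TODO"
            for category in CATEGORIES
        ]
        rows.append("| " + " | ".join([recipe] + cells) + " |")
    return "\n".join([header, divider] + rows)
```

Under this sketch, M1 would not be ready for Adversary verification while `missing_cells` returns anything, and the rendered table could be pasted into `STATUS-cf55.md` as the review matrix.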
## Guardrails

- This is a review pass, not a redesign. Keep fixes minimal and directly tied to a found
  regression.
- Do not weaken or delete tests to make the review pass.
- Do not rerun a full fleet sweep unless the existing cfold evidence is incomplete or the
  review finds a specific reason the sweep must be repeated.
- Do not touch proxy/ghost work in this phase; those are queued after this review.

## Definition of Done

`STATUS-cf55.md` contains the GPT-5.5 review matrix and final verdict,
`REVIEW-cf55.md` contains Adversary's independent GPT-5.5 PASS, and the verdict explicitly
says whether cfold preserved the full pre-cfold custom-test set. Builder writes `## DONE`
only after Adversary records M1 and M2 PASSes.