plan: queue cf48 — Opus 4.8 post-cfold coverage-loss review (cross-check of cf55 GPT-5.5)
Second independent review of the cfold custom-folder collapse, by Opus 4.8 instead of GPT-5.5, inserted after cf55 (queue ...cfold;cf55;cf48;pvfix;...). Per-phase overrides .loop-model[-adv]-cf48=claude-opus-4-8 on the claude backend.
@ -607,3 +607,14 @@ session cc-ci-orchestrator-stale can be killed; recipe-mirrors org still private
a manual nudge landed as a shell command, so restarted only `cc-ci-builder` via `start_agent`; it came
back on OpenCode GPT-5.4 and is actively planning the M2 sweep. Adversary accepted its nudge and wrote
`WAITING-UNTIL: 2026-06-12T21:55:28Z` while awaiting Builder's formal M2 claim.

## 2026-06-13 ~05:15 — queued cf48 (Opus 4.8 cfold review, second model)

- Operator: add a second independent post-cfold coverage-loss review by Opus 4.8 (cross-check
  of the cf55 GPT-5.5 review). Plan: plan-phase-cf48-opus-cfold-review.md, with the same 7 review
  categories as cf55, an independent verdict, and a cf55-vs-cf48 agreement note.
- Inserted cf48 after cf55: queue = ...cfold;cf55;cf48;pvfix;pvcheck;ghost (15 phases, idx 10=cf55).
- Per-phase model overrides: .loop-model-cf48 = .loop-model-adv-cf48 = claude-opus-4-8 (claude
  backend; current .loop-backend=claude, so it'll run on opus; orchestrator must keep backend=claude
  when cf48 starts since per-phase backend isn't overridable).
- Also this session: relaunched after a session restart (NOT a host reboot, 13d uptime); loops+watchdog
  were stopped in a claude-sonnet limit window on cf55 → restarted via RESUME_PHASE=1 launch.py start.

81
cc-ci-plan/plan-phase-cf48-opus-cfold-review.md
Normal file
@ -0,0 +1,81 @@
# Phase `cf48` — Opus 4.8 post-cfold coverage-loss review

**Mission:** run a SECOND, independent post-cfold coverage-loss review, identical in scope
to phase `cf55` but performed by **Claude Opus 4.8** instead of GPT-5.5, so the cfold
custom-folder collapse is cross-validated by two different models. Confirm no custom test,
assertion, fixture, screenshot hook, lifecycle overlay, or result-level behavior was lost.
Review-only: do not implement new feature work unless the review finds a concrete regression
that must be fixed before continuing.

State files live under `machine-docs/`: `STATUS-cf48.md`, `BACKLOG-cf48.md`,
`REVIEW-cf48.md`, `JOURNAL-cf48.md`.

## Model Requirement

This phase must run on **Claude Opus 4.8** (`claude-opus-4-8`), on the **`claude`** backend
(NOT opencode). The orchestrator sets per-phase model override files:

- `/srv/cc-ci/.cc-ci-logs/.loop-model-cf48 = claude-opus-4-8`
- `/srv/cc-ci/.cc-ci-logs/.loop-model-adv-cf48 = claude-opus-4-8`

and must ensure `.loop-backend = claude` is in effect when this phase starts (per-phase
backend is not overridable in launch.py; if the loops were on opencode for cf55, the
orchestrator restarts them on the claude backend before cf48). Builder and Adversary should
record the model their session reports in their first `STATUS-cf48.md` / `REVIEW-cf48.md`
entries; if it is not Opus 4.8, stop and ask the orchestrator to fix the launcher state.

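The launcher-state check described above could be automated with a small preflight. What follows is a minimal sketch, not the actual `launch.py` logic. It assumes each override file holds only its value as plain text, and that `.loop-backend` sits in the same log directory as the model overrides; neither is confirmed by this plan. The function name `check_phase_overrides` is hypothetical.

```python
from pathlib import Path


def check_phase_overrides(log_dir, phase, model, backend):
    """Return a list of problems; an empty list means the state looks right.

    Assumptions (not confirmed by the plan): each file contains only its
    value as plain text, and .loop-backend lives next to the overrides.
    """
    expected = {
        f".loop-model-{phase}": model,      # main loop override (presumably Builder)
        f".loop-model-adv-{phase}": model,  # "-adv" override (presumably Adversary)
        ".loop-backend": backend,           # global setting, not per-phase
    }
    problems = []
    for name, want in expected.items():
        path = Path(log_dir) / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        got = path.read_text().strip()
        if got != want:
            problems.append(f"{name}: expected {want!r}, found {got!r}")
    return problems
```

A call such as `check_phase_overrides("/srv/cc-ci/.cc-ci-logs", "cf48", "claude-opus-4-8", "claude")` would return an empty list only when all three files exist and match. It does not replace the requirement that Builder and Adversary record the model their session actually reports.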
## Inputs

- `plan-phase-cfold-custom-folder.md`
- `machine-docs/STATUS-cfold.md`, `machine-docs/REVIEW-cfold.md`
- the final cfold implementation commits on cc-ci main
- the pre-cfold baseline matrix recorded by cfold (custom tests across all enrolled recipes)
- any cfold full-sweep evidence and artifacts
- **`machine-docs/STATUS-cf55.md` / `REVIEW-cf55.md`**: the GPT-5.5 review. Read it, but only
  after reviewing INDEPENDENTLY and forming your own verdict; THEN reconcile: note any place the
  two models disagree (a finding one caught and the other missed is itself worth surfacing).

## Required Review

Same seven categories as `cf55`; review them independently, do not just confirm cf55:

1. **Diff review**: exact cfold commit range, line-by-line for coverage loss (discovery,
   manifest/reporting, docs, unit tests, imports, fixtures, lifecycle overlays, screenshot
   hooks, result rendering).
2. **Discovery parity**: recompute the discovered custom-test inventory after cfold vs the
   pre-cfold baseline: same recipes, same custom-test count, same assertions, same helper
   coverage, with only folder paths changed (`functional/` or `playwright/` → `custom/`).
3. **Assertion preservation**: no weakened/removed assertions, skipped tests, relaxed waits,
   or renamed tests without equivalent coverage. Mechanical import/path updates are allowed.
4. **Old-folder behavior**: deprecated `functional/`/`playwright/` handled exactly as cfold
   decided: no silent coverage loss for recipe-local tests; loud warnings / documented compat.
5. **Lifecycle-overlay separation**: top-level lifecycle overlays stay distinct from custom
   tests and were not moved into / discovered through `custom/`.
6. **Evidence audit**: verify cfold's claimed full sweep actually ran custom tests with no
   silent level drops; classify any skip/level change as expected, cfold-neutral, or blocker.
7. **Cleanliness**: no unintended root coordination files, leaked test stacks, stale temp
   scripts, or uncommitted implementation files.

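The path side of the discovery-parity check (category 2) can be sketched as a multiset comparison. This is a minimal illustration, not cc-ci's discovery code: it assumes inventories are available as `{recipe: [test paths]}` with paths relative to each recipe's test root, and the names `normalize` and `parity_report` are hypothetical.

```python
from collections import Counter

OLD_FOLDERS = ("functional", "playwright")


def normalize(path):
    """Map an old-layout test path onto the post-cfold custom/ layout.

    Only the leading folder is rewritten; anything else is left alone.
    """
    head, sep, rest = path.partition("/")
    return f"custom/{rest}" if sep and head in OLD_FOLDERS else path


def parity_report(before, after):
    """Return {recipe: (lost, added)} for every recipe whose inventory differs.

    Counters (multisets) are used so that two old tests collapsing onto the
    same custom/ path still surface as a count mismatch.
    """
    report = {}
    for recipe in sorted(set(before) | set(after)):
        old = Counter(normalize(p) for p in before.get(recipe, []))
        new = Counter(after.get(recipe, []))
        lost = sorted((old - new).elements())
        added = sorted((new - old).elements())
        if lost or added:
            report[recipe] = (lost, added)
    return report
```

An empty report means every recipe has the same test paths and counts modulo the folder rename. It says nothing about assertions or helper coverage, which still need the line-by-line review in categories 1 and 3.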
## Gates

**M1: Opus 4.8 cold review complete.** Builder produces a review matrix in `STATUS-cf48.md`
covering every recipe and every category above, plus a short cf55-vs-cf48 agreement note.
Adversary independently verifies and records PASS/FAIL in `REVIEW-cf48.md`.

**M2: No-loss verdict.** Adversary either confirms `NO COVERAGE LOST` with concrete
evidence, or records specific blocking findings. If findings exist, Builder fixes only those,
evidence is refreshed, Adversary re-checks. No move to `pvfix` until M2 has a fresh PASS.

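The gate ordering above can be read as a small state machine. The sketch below is one possible interpretation under stated assumptions: it treats "fresh" as "no Builder fix recorded after the latest M2 PASS", and it assumes a fix does not invalidate M1, which the plan does not say either way. The `Event` type and `may_advance` are hypothetical names, and the event list is not a format that `REVIEW-cf48.md` is known to use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str          # "M1", "M2", or "FIX"
    verdict: str = ""  # "PASS" or "FAIL" for gates; unused for "FIX"


def may_advance(events):
    """True when M1 has passed and the latest M2 PASS postdates every fix."""
    m1_pass = False
    m2_fresh = False
    for e in events:  # events are assumed to be in chronological order
        if e.kind == "FIX":
            m2_fresh = False  # a fix makes any earlier M2 PASS stale
        elif e.kind == "M1":
            m1_pass = e.verdict == "PASS"
        elif e.kind == "M2":
            m2_fresh = e.verdict == "PASS"
    return m1_pass and m2_fresh
```

Under this reading, the sequence M1 PASS, M2 FAIL, FIX, M2 PASS allows the move to `pvfix`, while M1 PASS, M2 PASS, FIX does not until Adversary re-checks.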
## Guardrails

- Review pass, not a redesign. Minimal fixes, directly tied to a found regression.
- Do not weaken or delete tests to make the review pass.
- Do not rerun a full fleet sweep unless the existing cfold evidence is incomplete or the
  review finds a specific reason it must be repeated.
- Do not touch proxy/ghost work here; those are queued after this review.

## Definition of Done

`STATUS-cf48.md` contains the Opus 4.8 review matrix + final verdict (including the cf55
agreement note), `REVIEW-cf48.md` contains Adversary's independent Opus 4.8 PASS, and the verdict
explicitly states whether cfold preserved the full pre-cfold custom-test set. Builder writes
`## DONE` only after Adversary records M1 and M2 PASSes.