From b4a6aaea7e67ba43d9d91b4c0c79baf980a3ec13 Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Sat, 13 Jun 2026 05:15:05 +0000
Subject: [PATCH] =?UTF-8?q?plan:=20queue=20cf48=20=E2=80=94=20Opus=204.8?=
 =?UTF-8?q?=20post-cfold=20coverage-loss=20review=20(cross-check=20of=20cf?=
 =?UTF-8?q?55=20GPT-5.5)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Second independent review of the cfold custom-folder collapse, by Opus 4.8
instead of GPT-5.5, inserted after cf55 (queue ...cfold;cf55;cf48;pvfix;...).
Per-phase overrides .loop-model[-adv]-cf48=claude-opus-4-8 on the claude backend.
---
 cc-ci-plan/JOURNAL.md                         | 11 +++
 .../plan-phase-cf48-opus-cfold-review.md      | 81 +++++++++++++++++++
 2 files changed, 92 insertions(+)
 create mode 100644 cc-ci-plan/plan-phase-cf48-opus-cfold-review.md

diff --git a/cc-ci-plan/JOURNAL.md b/cc-ci-plan/JOURNAL.md
index a38c7a0..28a2d91 100644
--- a/cc-ci-plan/JOURNAL.md
+++ b/cc-ci-plan/JOURNAL.md
@@ -607,3 +607,14 @@ session cc-ci-orchestrator-stale can be killed; recipe-mirrors org still private
 a manual nudge landed as a shell command, so restarted only `cc-ci-builder` via
 `start_agent`; it came back on OpenCode GPT-5.4 and is actively planning the M2 sweep. Adversary
 accepted its nudge and wrote `WAITING-UNTIL: 2026-06-12T21:55:28Z` while awaiting Builder's formal M2 claim.
+
+## 2026-06-13 ~05:15 — queued cf48 (Opus 4.8 cfold review, second model)
+- Operator: add a second independent post-cfold coverage-loss review by Opus 4.8 (cross-check
+  of the cf55 GPT-5.5 review). Plan: plan-phase-cf48-opus-cfold-review.md — same 7 review
+  categories as cf55, independent verdict + a cf55-vs-cf48 agreement note.
+- Inserted cf48 after cf55: queue = ...cfold;cf55;cf48;pvfix;pvcheck;ghost (15 phases, idx 10=cf55).
+- Per-phase model overrides: .loop-model-cf48 = .loop-model-adv-cf48 = claude-opus-4-8 (claude
+  backend — current .loop-backend=claude, so it'll run on opus; orchestrator must keep backend=claude
+  when cf48 starts since per-phase backend isn't overridable).
+- Also this session: relaunched after a session restart (NOT a host reboot, 13d uptime); loops+watchdog
+  were stopped in a claude-sonnet limit window on cf55 → restarted via RESUME_PHASE=1 launch.py start.
diff --git a/cc-ci-plan/plan-phase-cf48-opus-cfold-review.md b/cc-ci-plan/plan-phase-cf48-opus-cfold-review.md
new file mode 100644
index 0000000..8f63334
--- /dev/null
+++ b/cc-ci-plan/plan-phase-cf48-opus-cfold-review.md
@@ -0,0 +1,81 @@
+# Phase `cf48` — Opus 4.8 post-cfold coverage-loss review
+
+**Mission:** run a SECOND, independent post-cfold coverage-loss review — identical in scope
+to phase `cf55`, but performed by **Claude Opus 4.8** instead of GPT-5.5 — so the cfold
+custom-folder collapse is cross-validated by two different models. Confirm no custom test,
+assertion, fixture, screenshot hook, lifecycle overlay, or result-level behavior was lost.
+Review-only: do not implement new feature work unless the review finds a concrete regression
+that must be fixed before continuing.
+
+State files live under `machine-docs/`: `STATUS-cf48.md`, `BACKLOG-cf48.md`,
+`REVIEW-cf48.md`, `JOURNAL-cf48.md`.
+
+## Model Requirement
+
+This phase must run on **Claude Opus 4.8** (`claude-opus-4-8`), on the **`claude`** backend
+(NOT opencode). The orchestrator sets per-phase model override files:
+
+- `/srv/cc-ci/.cc-ci-logs/.loop-model-cf48 = claude-opus-4-8`
+- `/srv/cc-ci/.cc-ci-logs/.loop-model-adv-cf48 = claude-opus-4-8`
+
+and must ensure `.loop-backend = claude` is in effect when this phase starts (per-phase
+backend is not overridable in launch.py; if the loops were on opencode for cf55, the
+orchestrator restarts them on the claude backend before cf48). Builder and Adversary should
+record the model their session reports in their first `STATUS-cf48.md` / `REVIEW-cf48.md`
+entries; if it is not Opus 4.8, stop and ask the orchestrator to fix the launcher state.
+
+## Inputs
+
+- `plan-phase-cfold-custom-folder.md`
+- `machine-docs/STATUS-cfold.md`, `machine-docs/REVIEW-cfold.md`
+- the final cfold implementation commits on cc-ci main
+- the pre-cfold baseline matrix recorded by cfold (custom tests across all enrolled recipes)
+- any cfold full-sweep evidence and artifacts
+- **`machine-docs/STATUS-cf55.md` / `REVIEW-cf55.md`** — the GPT-5.5 review. Read it, but
+  review INDEPENDENTLY first and form your own verdict; THEN reconcile: note any place the
+  two models disagree (a finding one caught and the other missed is itself worth surfacing).
+
+## Required Review
+
+Same seven categories as `cf55` — review them independently, do not just confirm cf55:
+1. **Diff review** — exact cfold commit range, line-by-line for coverage loss (discovery,
+   manifest/reporting, docs, unit tests, imports, fixtures, lifecycle overlays, screenshot
+   hooks, result rendering).
+2. **Discovery parity** — recompute the discovered custom-test inventory after cfold vs the
+   pre-cfold baseline: same recipes, same custom-test count, same assertions, same helper
+   coverage, only folder paths changed `functional/`|`playwright/` → `custom/`.
+3. **Assertion preservation** — no weakened/removed assertions, skipped tests, relaxed waits,
+   or renamed tests without equivalent coverage. Mechanical import/path updates are allowed.
+4. **Old-folder behavior** — deprecated `functional/`/`playwright/` handled exactly as cfold
+   decided: no silent coverage loss for recipe-local tests; loud warnings / documented compat.
+5. **Lifecycle-overlay separation** — top-level lifecycle overlays stay distinct from custom
+   tests and were not moved into / discovered through `custom/`.
+6. **Evidence audit** — verify cfold's claimed full sweep actually ran custom tests with no
+   silent level drops; classify any skip/level change as expected, cfold-neutral, or blocker.
+7. **Cleanliness** — no unintended root coordination files, leaked test stacks, stale temp
+   scripts, or uncommitted implementation files.
+
+## Gates
+
+**M1 — Opus 4.8 cold review complete.** Builder produces a review matrix in `STATUS-cf48.md`
+covering every recipe and every category above, plus a short cf55-vs-cf48 agreement note.
+Adversary independently verifies and records PASS/FAIL in `REVIEW-cf48.md`.
+
+**M2 — No-loss verdict.** Adversary either confirms `NO COVERAGE LOST` with concrete
+evidence, or records specific blocking findings. If findings exist, Builder fixes only those,
+evidence is refreshed, Adversary re-checks. No move to `pvfix` until M2 has a fresh PASS.
+
+## Guardrails
+
+- Review pass, not a redesign. Minimal fixes, directly tied to a found regression.
+- Do not weaken or delete tests to make the review pass.
+- Do not rerun a full fleet sweep unless the existing cfold evidence is incomplete or the
+  review finds a specific reason it must be repeated.
+- Do not touch proxy/ghost work here; those are queued after this review.
+
+## Definition of Done
+
+`STATUS-cf48.md` contains the Opus 4.8 review matrix + final verdict (incl. cf55 agreement
+note), `REVIEW-cf48.md` contains Adversary's independent Opus 4.8 PASS, and the verdict
+explicitly states whether cfold preserved the full pre-cfold custom-test set. Builder writes
+`## DONE` only after Adversary records M1 and M2 PASSes.
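
Reviewer note on category 2 (discovery parity): the comparison the plan asks for could be sketched roughly as below. This is only an illustration under stated assumptions, not the cc-ci implementation. It assumes each inventory is a mapping from recipe name to a set of test ids written as slash-separated paths; the names `normalize`, `parity_report`, and the inventory shape are hypothetical and do not come from the patch.

```python
from pathlib import PurePosixPath

# Folder names taken from the plan text: functional/ and playwright/ collapse into custom/.
LEGACY_FOLDERS = {"functional", "playwright"}
NEW_FOLDER = "custom"


def normalize(test_id: str) -> str:
    """Rewrite legacy folder components to `custom` so ids compare across the collapse."""
    parts = [NEW_FOLDER if p in LEGACY_FOLDERS else p for p in PurePosixPath(test_id).parts]
    return str(PurePosixPath(*parts))


def parity_report(baseline: dict[str, set[str]], current: dict[str, set[str]]) -> dict[str, dict]:
    """Return per-recipe differences; an empty dict means the inventories match."""
    report = {}
    for recipe in sorted(baseline.keys() | current.keys()):
        raw_before = baseline.get(recipe, set())
        before = {normalize(t) for t in raw_before}
        after = {normalize(t) for t in current.get(recipe, set())}
        entry = {}
        if before - after:
            entry["lost"] = before - after
        if after - before:
            entry["added"] = after - before
        # Two legacy tests that normalize to the same id would mask a lost test,
        # so flag any recipe where normalization shrank the baseline set.
        if len(before) != len(raw_before):
            entry["collision"] = True
        if entry:
            report[recipe] = entry
    return report
```

A non-empty report for any recipe would be a candidate M2 blocking finding. The collision flag matters because a `functional/` test and a `playwright/` test with the same file and test name would otherwise look like a single surviving test after the collapse.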