From 85498931d136315da7f3dcfc316e5a6bcae08a69 Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Fri, 12 Jun 2026 16:07:48 +0000
Subject: [PATCH] plan: add gpt55 cfold review phase

---
 cc-ci-plan/JOURNAL.md                              | 13 +++
 .../plan-phase-cf55-gpt55-cfold-review.md          | 82 +++++++++++++++++++
 2 files changed, 95 insertions(+)
 create mode 100644 cc-ci-plan/plan-phase-cf55-gpt55-cfold-review.md

diff --git a/cc-ci-plan/JOURNAL.md b/cc-ci-plan/JOURNAL.md
index c2d0567..841f838 100644
--- a/cc-ci-plan/JOURNAL.md
+++ b/cc-ci-plan/JOURNAL.md
@@ -578,3 +578,16 @@ session cc-ci-orchestrator-stale can be killed; recipe-mirrors org still private
 - Verified live tmux mapping: `cc-ci-builder`, `cc-ci-adv`, and `cc-ci-orchestrator` are all `opencode`;
   `cc-ci-watchdog` is running. The watchdog will hourly-wake `cc-ci-orchestrator` via the existing
   `ORCH_WAKE_INTERVAL=3600` path and will apply the existing limit-window nudge/restart handling.
+
+## 2026-06-12 ~16:05 — queued GPT-5.5 post-cfold no-loss review
+- Operator requested one additional pass at the end of `cfold`: GPT-5.5 reviews the cfold
+  implementation and confirms no custom test/coverage/assertion/fixture behavior was lost.
+- Added `plan-phase-cf55-gpt55-cfold-review.md` and inserted `cf55` immediately after `cfold`, before
+  `pvfix`. Persisted queue is now
+  `rcust;shot;lvl5;bsky;dstamp;mailu;kuma;mailu;drone;cfold;cf55;pvfix;pvcheck;ghost`.
+- Set per-phase model overrides in `.cc-ci-logs`: `.loop-model-cf55=openai/gpt-5.5` and
+  `.loop-model-adv-cf55=openai/gpt-5.5`. Current `cfold` loops stay on OpenCode GPT-5.4; when cfold
+  writes real `## DONE`, watchdog should auto-transition to `cf55` and start both loops on GPT-5.5.
+- Bounced only `cc-ci-watchdog` so it loaded the 14-phase queue without interrupting the active builder
+  and adversary sessions. Verified `launch.sh status`: `cfold [10/14]`, builder/adv/watchdog RUNNING;
+  watched sessions are still `opencode` for builder/adv/orchestrator.
diff --git a/cc-ci-plan/plan-phase-cf55-gpt55-cfold-review.md b/cc-ci-plan/plan-phase-cf55-gpt55-cfold-review.md
new file mode 100644
index 0000000..aa8c4aa
--- /dev/null
+++ b/cc-ci-plan/plan-phase-cf55-gpt55-cfold-review.md
@@ -0,0 +1,82 @@
+# Phase `cf55` — GPT-5.5 post-cfold coverage-loss review
+
+**Mission:** after phase `cfold` finishes, run one independent GPT-5.5 review pass over
+the custom-folder collapse implementation and confirm that no custom test, assertion,
+fixture, screenshot hook, lifecycle overlay, or result-level behavior was lost. This is a
+review-only phase: do not implement new feature work unless the review finds a concrete
+regression that must be fixed before continuing.
+
+State files live under `machine-docs/`: `STATUS-cf55.md`, `BACKLOG-cf55.md`,
+`REVIEW-cf55.md`, `JOURNAL-cf55.md`.
+
+## Model Requirement
+
+This phase must run on **GPT-5.5**. The orchestrator sets per-phase model override files:
+
+- `/srv/cc-ci/.cc-ci-logs/.loop-model-cf55 = openai/gpt-5.5`
+- `/srv/cc-ci/.cc-ci-logs/.loop-model-adv-cf55 = openai/gpt-5.5`
+
+Builder and Adversary should explicitly record the model shown by their OpenCode session in
+their first `STATUS-cf55.md` / `REVIEW-cf55.md` entries. If the session is not GPT-5.5,
+stop and ask the orchestrator to fix the launcher state before reviewing.
+
+## Inputs
+
+- `plan-phase-cfold-custom-folder.md`
+- `machine-docs/STATUS-cfold.md`
+- `machine-docs/REVIEW-cfold.md`
+- the final cfold implementation commits on cc-ci main
+- the pre-cfold baseline matrix recorded by cfold (`64` custom tests across `20` recipes)
+- any cfold full-sweep evidence and artifacts
+
+## Required Review
+
+1. **Diff review.** Identify the exact cfold implementation commit range and review it
+   line-by-line for coverage loss. Pay special attention to discovery, manifest/reporting,
+   docs, unit tests, imports, fixtures, lifecycle overlays, screenshot hooks, and result
+   rendering.
+2. **Discovery parity.** Recompute the discovered custom-test inventory after cfold and
+   compare it to the pre-cfold baseline. The expected result is the same logical test set:
+   same recipes, same custom-test count, same assertions, same helper coverage, only folder
+   paths changed from `functional/` or `playwright/` to `custom/`.
+3. **Assertion preservation.** Check that test bodies were not weakened during the move.
+   Mechanical import/path updates are allowed; removed assertions, skipped tests, relaxed
+   waits, or renamed tests without equivalent coverage are findings.
+4. **Old-folder behavior.** Confirm the intended behavior for deprecated
+   `functional/`/`playwright/` folders is implemented exactly as cfold decided: no silent
+   coverage loss for recipe-local tests, and loud warnings or documented compatibility as
+   appropriate.
+5. **Lifecycle-overlay separation.** Confirm top-level lifecycle overlays remain distinct
+   from custom tests and were not accidentally moved into or discovered through `custom/`.
+6. **Evidence audit.** Review cfold M1/M2 evidence. If cfold claims a full recipe sweep,
+   verify custom tests actually ran and levels did not silently drop. If any recipe was
+   skipped or changed level, classify it as expected, cfold-neutral, or a blocker.
+7. **Cleanliness.** Confirm no unintended root coordination files, leaked test stacks,
+   stale temporary scripts, or uncommitted implementation files remain.
+
+## Gates
+
+**M1 — GPT-5.5 cold review complete.** Builder produces a review matrix in
+`STATUS-cf55.md` covering every recipe and every required review category above.
+Adversary independently verifies the matrix and records PASS/FAIL in `REVIEW-cf55.md`.
+
+**M2 — No-loss verdict.** Adversary either confirms `NO COVERAGE LOST` with concrete
+evidence, or records specific blocking findings. If findings exist, Builder fixes only
+those findings, cfold/cf55 evidence is refreshed, and Adversary re-checks. No move to
+`pvfix` until M2 has a fresh PASS.
+
+## Guardrails
+
+- This is a review pass, not a redesign. Keep fixes minimal and directly tied to a found
+  regression.
+- Do not weaken or delete tests to make the review pass.
+- Do not rerun a full fleet sweep unless the existing cfold evidence is incomplete or the
+  review finds a specific reason the sweep must be repeated.
+- Do not touch proxy/ghost work in this phase; those are queued after this review.
+
+## Definition of Done
+
+`STATUS-cf55.md` contains the GPT-5.5 review matrix and final verdict,
+`REVIEW-cf55.md` contains Adversary's independent GPT-5.5 PASS, and the verdict explicitly
+says whether cfold preserved the full pre-cfold custom-test set. Builder writes `## DONE`
+only after Adversary records M1 and M2 PASSes.
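
Review note on required-review item 2 (discovery parity): the comparison it describes could be sketched roughly as below. This is a minimal Python sketch and not part of the patch or the cc-ci tooling. The inventory shape (`(recipe, relative_path, test_name)` tuples) and the names `normalize` and `parity_report` are assumptions made for illustration; the real discovery output may look different.

```python
import re
from collections import Counter

# Assumption: each inventory entry is a (recipe, relative_path, test_name) tuple.
# Old per-type folders are mapped to the new folder so that a pure path move
# does not show up as a difference.
OLD_FOLDERS = re.compile(r"^(functional|playwright)/")


def normalize(entry):
    """Rewrite a deprecated folder prefix to `custom/`, leaving the rest intact."""
    recipe, path, test_name = entry
    return (recipe, OLD_FOLDERS.sub("custom/", path), test_name)


def parity_report(baseline, current):
    """Return tests lost or added between two inventories, ignoring folder moves.

    Counters are used instead of sets so that two old tests collapsing onto the
    same normalized path still count as two entries.
    """
    before = Counter(normalize(e) for e in baseline)
    after = Counter(normalize(e) for e in current)
    return {
        "lost": sorted((before - after).elements()),
        "added": sorted((after - before).elements()),
    }


if __name__ == "__main__":
    # Hypothetical inventories for a single made-up recipe.
    baseline = [
        ("example-recipe", "functional/test_login.py", "test_login"),
        ("example-recipe", "playwright/test_ui.py", "test_ui"),
    ]
    current = [
        ("example-recipe", "custom/test_login.py", "test_login"),
    ]
    print(parity_report(baseline, current))
```

An empty `lost` list only shows that the same test names are still discovered per recipe. It says nothing about assertion strength inside the test bodies, which item 3 covers separately.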