cc-ci-orchestrator/cc-ci-plan/plan-phase4-final-review-polish-cleanup.md

# cc-ci Phase 4 — Final review, polish & cleanup (capstone) (Autonomous Build Plan)

**Status:** QUEUED — the **LAST** phase, runs after Phase 3 (`plan-phase3-results-ux.md`). A bounded
final review/lint/cleanup pass over the **entire** codebase as it stands after all phases, ending in a
**full cold re-verification that nothing regressed**.
**Transition:** auto (last in the launcher sequence); after it, the whole build is done.
**Builds on:** everything — Phase 1 + 1c + 1b + 1d + 1e + 2 + 2b + 3 (flake/`nix/` modules, the
runner/harness + generic suite + recipe overlays, the comment-bridge, Drone, the dashboard/results
UX, docs, `machine-docs/`).
**Owner agents:** same Builder + Adversary loops (`plan.md` §6/§7); Adversary cold-verifies.
**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase4-final-review-polish-cleanup.md`
**Phase order:** 1c → 1b → 1d → 1e → 2 → 2b → 3 → **4 (final)**.

---

## 0. Why this phase (and why it's bounded)

This is the analogue of Phase 1b, but final and whole-codebase. By now the tree has grown a lot —
recipe overlays (Phase 2), performance changes (2b), and results/dashboard UX (3) — all layered on
the foundation. Before calling the build done, do one **bounded** pass to clean and harden it, and —
critically — **re-verify from a cold start that none of the growth/cleanup regressed any earlier
guarantee.** Same discipline as 1b: **good-enough + enforceable**, style→tooling, judgment→checklist,
don't reopen settled design, and **never weaken a test** to satisfy a nit.

---

## 1. Definition of Done (Phase 4 exit condition)

Phase 4 terminates when every item below holds **and the Adversary has independently cold-verified
it** (logged in `machine-docs/REVIEW-4.md`):

- [ ] **F1 — Lint/format green across the whole codebase.** Re-run the 1b toolchain (alejandra/
      statix/deadnix, ruff, shellcheck/shfmt, yamllint) over everything added since 1b (Phase-2
      overlays, 2b changes, 3 UX/dashboard); extend the lint config to any new languages/areas (e.g.
      dashboard front-end) so it's covered going forward. The `.drone.yml` lint stage still passes
      from a clean checkout; prove with a break-it probe.
- [ ] **F2 — White-box review checklist over all post-1b code.** Run the §3 checklist; fix every
      **blocking** finding, triage advisories to `BACKLOG`/`IDEAS`. Findings + resolutions in
      `machine-docs/REVIEW-4.md`.
- [ ] **F3 — Cleanup.** Remove dead code/scaffolding and stale TODOs; consistent naming/structure;
      reconcile `machine-docs/` (BACKLOG/IDEAS/DECISIONS current, no contradictions); docs match the
      final state. No behavior change beyond what F2 mandates.
- [ ] **F4 — FULL cold re-verification (the final gate).** *After* F1–F3 land, the Adversary
      **independently re-verifies every prior Definition-of-Done from a cold start**, to the same bar
      each phase used — fresh PASS + evidence + timestamps in `machine-docs/REVIEW-4.md` within 24h,
      **nothing weakened/skipped/softened** by the cleanup:
      - **Phase 1 D1–D10** (incl. the genuine **D8** byte-identical fresh-clone rebuild + a
        category-spanning live `!testme` e2e through the public gateway).
      - **Phase 1c C1–C7** (secrets-in-git, cert-in-sops, honest reproducibility).
      - **Phase 1d DG1–DG8** (generic install/upgrade/backup/restore, deploy-once `DG4.1`, override
        floor) **as amended by 1e**.
      - **Phase 1e HC1–HC3** (upgrade→PR-head via `deploy --chaos`; repo-local gated to approved
        recipes; generic-by-default + explicit opt-out).
      - **Phase 2** recipe-coverage criteria (every enrolled recipe's overlays/ported tests real,
        DRY, green).
      - **Phase 2b** performance claims (the measured improvements still hold; no test weakened to
        get them).
      - **Phase 3** results/level/UX criteria (per-run level honest, PR comment + dashboard correct).
- [ ] **F5 — Documented + cold-verified.** Final `docs/` accurate (install reproduces from scratch;
      enroll-recipe + overlay/approval flow correct); accepted deviations in `DECISIONS.md`; the
      Adversary confirms F1–F4 with no standing VETO and no open `[adversary]` finding.

When F1–F5 hold and are confirmed, write `## DONE` to `machine-docs/STATUS-4.md` — the build is
complete.
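
As a rough illustration of the F4 gate, the check below flags any enumerated criterion that lacks a fresh PASS. It is only a sketch under stated assumptions: the `Evidence` record shape and the `unverified` helper are hypothetical (the real `machine-docs/REVIEW-4.md` format is not specified here), "within 24h" is read as "no older than 24 hours at the time of the check", and Phases 2, 2b and 3 are left out because this plan gives them no numbered IDs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Criterion IDs enumerated in F4. Phases 2, 2b and 3 are omitted because the
# plan does not assign them numbered IDs.
REQUIRED = (
    [f"D{i}" for i in range(1, 11)]      # Phase 1  D1-D10
    + [f"C{i}" for i in range(1, 8)]     # Phase 1c C1-C7
    + [f"DG{i}" for i in range(1, 9)]    # Phase 1d DG1-DG8
    + [f"HC{i}" for i in range(1, 4)]    # Phase 1e HC1-HC3
)


@dataclass(frozen=True)
class Evidence:
    """Hypothetical record shape, not the real REVIEW-4.md format."""
    criterion: str
    status: str            # "PASS" or "FAIL"
    verified_at: datetime


def unverified(entries, now, max_age=timedelta(hours=24)):
    """Return required criteria whose latest entry is not a PASS newer than max_age."""
    latest = {}
    for e in entries:
        # Keep only the most recent entry per criterion, so a later FAIL
        # overrides an earlier PASS.
        current = latest.get(e.criterion)
        if current is None or e.verified_at > current.verified_at:
            latest[e.criterion] = e
    missing = []
    for cid in REQUIRED:
        e = latest.get(cid)
        if e is None or e.status != "PASS" or now - e.verified_at > max_age:
            missing.append(cid)
    return missing


# Usage: everything passed an hour ago, then D8 was re-run and failed.
now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
entries = [Evidence(c, "PASS", now - timedelta(hours=1)) for c in REQUIRED]
entries.append(Evidence("D8", "FAIL", now))
print(unverified(entries, now))  # ['D8']
```

Under this reading, `## DONE` would only be written once `unverified(...)` returns an empty list and the Phase 2/2b/3 criteria have been confirmed separately.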

---

## 2. Method
1. **Lint/format first** (F1) — re-run + extend; auto-fix style, don't deliberate.
2. **Review checklist** (F2, §3) — classify blocking vs advisory; fix blocking, triage rest.
3. **Cleanup** (F3) — dead code, naming, docs, `machine-docs/` reconciliation.
4. **Full cold re-verification LAST** (F4) — once everything has landed, the Adversary re-runs the
   entire cross-phase acceptance from cold. Order matters: tooling → review/fixes → cleanup → *then*
   full re-verify. Cleanup must regress nothing.
5. **Bound it** — a pass, not a rewrite; record dead-ends/deviations and stop.
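
Step 1 above could be driven by something like the following. This is a minimal sketch, not the project's actual lint stage: `LINT_COMMANDS` and `run_lint` are hypothetical names, the target paths are assumptions, and `shellcheck` is left out because it takes an explicit file list rather than a directory. The flags shown are the tools' check-only modes.

```python
import subprocess

# Check-only invocations for part of the 1b toolchain (paths are assumptions).
LINT_COMMANDS = [
    ["alejandra", "--check", "."],   # Nix formatting
    ["statix", "check", "."],        # Nix anti-patterns
    ["deadnix", "--fail", "."],      # unused Nix bindings
    ["ruff", "check", "."],          # Python lint
    ["shfmt", "-d", "."],            # shell formatting (diff mode)
    ["yamllint", "."],               # YAML lint
]


def run_lint(commands, run=subprocess.run):
    """Run every command without early exit; return the names of tools that failed."""
    failed = []
    for cmd in commands:
        try:
            result = run(cmd, capture_output=True, text=True)
            ok = result.returncode == 0
        except FileNotFoundError:
            # A missing tool counts as a failure, never as a silent skip.
            ok = False
        if not ok:
            failed.append(cmd[0])
    return failed
```

Calling `run_lint(LINT_COMMANDS)` from a clean checkout and failing the stage on a non-empty result keeps the gate mechanical. A break-it probe then amounts to planting one deliberate violation and confirming the corresponding tool name appears in the returned list.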

## 3. White-box review checklist (teeth, not taste) — whole codebase
Blocking unless noted (plan-relevant invariants visible only by reading code):
- **Tests are real** (blocking) — every generic/overlay/custom test asserts actual app state; no
  `skip`/`xfail`/can't-fail; per-op `pass/fail/skip` honest; the 1d/1e anti-vacuous guards
  (`assert_serving` routing proof, `do_upgrade` "moved", deploy-count==1) intact.
- **1e corrections intact** (blocking) — repo-local code still gated to approved recipes; generic
  still runs by default (opt-out explicit); upgrade still targets the PR head.
- **Generic-first / custom-additive invariant** (blocking — `docs/testing.md`). Confirm no path
  makes the generic tier depend on custom: deps deploy + `setup_custom_tests` run **after** all
  generic tiers, never before; a forced `setup_custom_tests` failure still yields a clean
  generic-tier `pass/pass/pass/pass` + `skip(deps-not-ready)` for `@requires_deps` custom tests
  (re-exercise the forced-failure case). Future maintainers must be able to operate cc-ci with
  the generic tier alone — verify that path stays viable.
- **Harness DRY** (blocking-ish) — recipe quirks are data (`recipe_meta.py`), not shared-harness
  conditionals; overlays are thin; no per-recipe copy-paste of lifecycle logic.
- **Server state Nix-declared & idempotent** (blocking) — no imperative drift / run-once sentinels /
  manual post-rebuild steps; the `nix/` layout is clean.
- **No footguns** (blocking) — no bare `sleep` for readiness (poll); teardown in `finally`; secrets
  reused per run, not regenerated; no hardcoded versions/domains that break when upstream changes.
- **No secrets in code/committed files** (blocking) — grep source/configs/`.drone.yml`/fixtures;
  log/dashboard redaction real (incl. any new Phase-3 UX surface that echoes run data).
- **Phase-3 UX correctness** (advisory→blocking on real drift) — the displayed level/badge/screenshot
  reflect the true per-op results; no misleading "pass".
- **Architecture matches the plans; deviations in `DECISIONS.md`** (advisory→blocking on real drift).
- **Readability & docs** (advisory) — clear names, dead code removed, docs reproduce from scratch.
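
To make the "No footguns" item concrete, the pattern a reviewer would look for resembles the sketch below: readiness is polled against a deadline instead of a bare `sleep`, and teardown sits in `finally`. The helper names `wait_until` and `run_with_teardown` are hypothetical and do not come from the cc-ci harness.

```python
import time


def wait_until(probe, timeout=60.0, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True; raise TimeoutError once the deadline passes."""
    deadline = clock() + timeout
    while True:
        if probe():
            return
        if clock() >= deadline:
            raise TimeoutError(f"not ready after {timeout}s")
        sleep(interval)


def run_with_teardown(deploy, is_ready, test, teardown,
                      timeout=60.0, interval=1.0):
    """Deploy, wait for readiness, run the test; teardown runs on every path."""
    try:
        deploy()
        wait_until(is_ready, timeout=timeout, interval=interval)
        return test()
    finally:
        # Runs after success, test failure, readiness timeout, or a
        # partially failed deploy.
        teardown()
```

The `clock` and `sleep` parameters exist only so the polling loop can be exercised in a unit test without real waiting.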

## 4. Guardrails
- **Never weaken a test** to satisfy a lint/review/cleanup nit (cardinal rule wins).
- **Don't reopen settled design** — clean + harden + re-verify; bigger ideas → `IDEAS.md`.
- **Bounded** — one pass; cap iterations; record + stop.
- **Cleanup regresses nothing** — F4 is the proof; if a cleanup breaks a prior guarantee, revert the
  cleanup, not the guarantee.

## 5. Open decisions (log in `machine-docs/DECISIONS.md`)
- Any new linters/formatters for Phase-3 front-end / new areas, and their strictness.
- The precise blocking-vs-advisory split for the §3 checklist on the new code.
- Whether to add Python type-checking now or defer to `IDEAS.md` (carried from 1b).