# cc-ci Phase 4 — Final review, polish & cleanup (capstone) (Autonomous Build Plan) **Status:** QUEUED — the **LAST** phase, runs after Phase 3 (`plan-phase3-results-ux.md`). A bounded final review/lint/cleanup pass over the **entire** codebase as it stands after all phases, ending in a **full cold re-verification that nothing regressed**. **Transition:** auto (last in the launcher sequence); after it, the whole build is done. **Builds on:** everything — Phase 1 + 1c + 1b + 1d + 1e + 2 + 2b + 3 (flake/`nix/` modules, the runner/harness + generic suite + recipe overlays, the comment-bridge, Drone, the dashboard/results UX, docs, `machine-docs/`). **Owner agents:** same Builder + Adversary loops (`plan.md` §6/§7); Adversary cold-verifies. **This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase4-final-review-polish-cleanup.md` **Phase order:** 1c → 1b → 1d → 1e → 2 → 2b → 3 → **4 (final)**. --- ## 0. Why this phase (and why it's bounded) This is the analogue of Phase 1b, but final and whole-codebase. By now the tree has grown a lot — recipe overlays (Phase 2), performance changes (2b), and results/dashboard UX (3) — all layered on the foundation. Before calling the build done, do one **bounded** pass to clean and harden it, and — critically — **re-verify from a cold start that none of the growth/cleanup regressed any earlier guarantee.** Same discipline as 1b: **good-enough + enforceable**, style→tooling, judgment→checklist, don't reopen settled design, and **never weaken a test** to satisfy a nit. --- ## 1. Definition of Done (Phase 4 exit condition) Terminates when every item holds **and the Adversary has independently cold-verified** (logged in `machine-docs/REVIEW-4.md`): - [ ] **F1 — Lint/format green across the whole codebase.** Re-run the 1b toolchain (alejandra/ statix/deadnix, ruff, shellcheck/shfmt, yamllint) over everything added since 1b (Phase-2 overlays, 2b changes, 3 UX/dashboard); extend the lint config to any new languages/areas (e.g. dashboard front-end) so it's covered going forward. The `.drone.yml` lint stage still passes from a clean checkout; prove with a break-it probe. - [ ] **F2 — White-box review checklist over all post-1b code.** Run the §3 checklist; fix every **blocking** finding, triage advisories to `BACKLOG`/`IDEAS`. Findings + resolutions in `machine-docs/REVIEW-4.md`. - [ ] **F3 — Cleanup.** Remove dead code/scaffolding and stale TODOs; consistent naming/structure; reconcile `machine-docs/` (BACKLOG/IDEAS/DECISIONS current, no contradictions); docs match the final state. No behavior change beyond what F2 mandates. - [ ] **F4 — FULL cold re-verification (the final gate).** *After* F1–F3 land, the Adversary **independently re-verifies every prior Definition-of-Done from a cold start**, to the same bar each phase used — fresh PASS + evidence + timestamps in `machine-docs/REVIEW-4.md` within 24h, **nothing weakened/skipped/softened** by the cleanup: - **Phase 1 D1–D10** (incl. the genuine **D8** byte-identical fresh-clone rebuild + a category-spanning live `!testme` e2e through the public gateway). - **Phase 1c C1–C7** (secrets-in-git, cert-in-sops, honest reproducibility). - **Phase 1d DG1–DG8** (generic install/upgrade/backup/restore, deploy-once `DG4.1`, override floor) **as amended by 1e**. - **Phase 1e HC1–HC3** (upgrade→PR-head via `deploy --chaos`; repo-local gated to approved recipes; generic-by-default + explicit opt-out). - **Phase 2** recipe-coverage criteria (every enrolled recipe's overlays/ported tests real, DRY, green). - **Phase 2b** performance claims (the measured improvements still hold; no test weakened to get them). - **Phase 3** results/level/UX criteria (per-run level honest, PR comment + dashboard correct). - [ ] **F5 — Documented + cold-verified.** Final `docs/` accurate (install reproduces from scratch; enroll-recipe + overlay/approval flow correct); accepted deviations in `DECISIONS.md`; the Adversary confirms F1–F4 with no standing VETO and no open `[adversary]` finding. When F1–F5 hold and are confirmed, write `## DONE` to `machine-docs/STATUS-4.md` — the build is complete. --- ## 2. Method 1. **Lint/format first** (F1) — re-run + extend; auto-fix style, don't deliberate. 2. **Review checklist** (F2, §3) — classify blocking vs advisory; fix blocking, triage rest. 3. **Cleanup** (F3) — dead code, naming, docs, `machine-docs/` reconciliation. 4. **Full cold re-verification LAST** (F4) — once everything has landed, the Adversary re-runs the entire cross-phase acceptance from cold. Order matters: tooling → review/fixes → cleanup → *then* full re-verify. Cleanup must regress nothing. 5. **Bound it** — a pass, not a rewrite; record dead-ends/deviations and stop. ## 3. White-box review checklist (teeth, not taste) — whole codebase Blocking unless noted (plan-relevant invariants visible only by reading code): - **Tests are real** (blocking) — every generic/overlay/custom test asserts actual app state; no `skip`/`xfail`/can't-fail; per-op `pass/fail/skip` honest; the 1d/1e anti-vacuous guards (`assert_serving` routing proof, `do_upgrade` "moved", deploy-count==1) intact. - **1e corrections intact** (blocking) — repo-local code still gated to approved recipes; generic still runs by default (opt-out explicit); upgrade still targets the PR head. - **Generic-first / custom-additive invariant** (blocking — `docs/testing.md`). Confirm no path makes the generic tier depend on custom: deps deploy + `setup_custom_tests` run **after** all generic tiers, never before; a forced `setup_custom_tests` failure still yields a clean generic-tier `pass/pass/pass/pass` + `skip(deps-not-ready)` for `@requires_deps` custom tests (re-exercise the forced-failure case). Future maintainers must be able to operate cc-ci with the generic tier alone — verify that path stays viable. - **Harness DRY** (blocking-ish) — recipe quirks are data (`recipe_meta.py`), not shared-harness conditionals; overlays are thin; no per-recipe copy-paste of lifecycle logic. - **Server state Nix-declared & idempotent** (blocking) — no imperative drift / run-once sentinels / manual post-rebuild steps; the `nix/` layout clean. - **No footguns** (blocking) — no bare `sleep` for readiness (poll); teardown in `finally`; secrets reused per run not regenerated; no hardcoded versions/domains that break upstream. - **No secrets in code/committed files** (blocking) — grep source/configs/`.drone.yml`/fixtures; log/dashboard redaction real (incl. any new Phase-3 UX surface that echoes run data). - **Phase-3 UX correctness** (advisory→blocking on real drift) — the displayed level/badge/screenshot reflect the true per-op results; no misleading "pass". - **Architecture matches the plans; deviations in `DECISIONS.md`** (advisory→blocking on real drift). - **Readability & docs** (advisory) — clear names, dead code removed, docs reproduce from scratch. ## 4. Guardrails - **Never weaken a test** to satisfy a lint/review/cleanup nit (cardinal rule wins). - **Don't reopen settled design** — clean + harden + re-verify; bigger ideas → `IDEAS.md`. - **Bounded** — one pass; cap iterations; record + stop. - **Cleanup regresses nothing** — F4 is the proof; if a cleanup breaks a prior guarantee, revert the cleanup, not the guarantee. ## 5. Open decisions (log in machine-docs/DECISIONS.md) - Any new linters/formatters for Phase-3 front-end / new areas, and their strictness. - The precise blocking-vs-advisory split for the §3 checklist on the new code. - Whether to add Python type-checking now or defer to `IDEAS.md` (carried from 1b).