cc-ci Phase 4 — Final review, polish & cleanup (capstone) (Autonomous Build Plan)
Status: QUEUED — the LAST phase, runs after Phase 3 (plan-phase3-results-ux.md). A bounded
final review/lint/cleanup pass over the entire codebase as it stands after all phases, ending in a
full cold re-verification that nothing regressed.
Transition: auto (last in the launcher sequence); after it, the whole build is done.
Builds on: everything — Phase 1 + 1c + 1b + 1d + 1e + 2 + 2b + 3 (flake/nix/ modules, the
runner/harness + generic suite + recipe overlays, the comment-bridge, Drone, the dashboard/results
UX, docs, machine-docs/).
Owner agents: same Builder + Adversary loops (plan.md §6/§7); Adversary cold-verifies.
This file's path: /srv/cc-ci/cc-ci-plan/plan-phase4-final-review-polish-cleanup.md
Phase order: 1c → 1b → 1d → 1e → 2 → 2b → 3 → 4 (final).
0. Why this phase (and why it's bounded)
This is the analogue of Phase 1b, but final and whole-codebase. By now the tree has grown a lot — recipe overlays (Phase 2), performance changes (2b), and results/dashboard UX (3) — all layered on the foundation. Before calling the build done, do one bounded pass to clean and harden it, and — critically — re-verify from a cold start that none of the growth/cleanup regressed any earlier guarantee. Same discipline as 1b: good-enough + enforceable, style→tooling, judgment→checklist, don't reopen settled design, and never weaken a test to satisfy a nit.
1. Definition of Done (Phase 4 exit condition)
Terminates when every item holds and the Adversary has independently cold-verified (logged in
machine-docs/REVIEW-4.md):
- F1: Lint/format green across the whole codebase. Re-run the 1b toolchain (alejandra/statix/deadnix, ruff, shellcheck/shfmt, yamllint) over everything added since 1b (Phase-2 overlays, 2b changes, 3 UX/dashboard); extend the lint config to any new languages/areas (e.g. dashboard front-end) so it's covered going forward. The .drone.yml lint stage still passes from a clean checkout; prove with a break-it probe.
- F2: White-box review checklist over all post-1b code. Run the §3 checklist; fix every blocking finding, triage advisories to BACKLOG/IDEAS. Findings + resolutions in machine-docs/REVIEW-4.md.
- F3: Cleanup. Remove dead code/scaffolding and stale TODOs; consistent naming/structure; reconcile machine-docs/ (BACKLOG/IDEAS/DECISIONS current, no contradictions); docs match the final state. No behavior change beyond what F2 mandates.
- F4: FULL cold re-verification (the final gate). After F1–F3 land, the Adversary independently re-verifies every prior Definition-of-Done from a cold start, to the same bar each phase used: fresh PASS + evidence + timestamps in machine-docs/REVIEW-4.md within 24h, nothing weakened/skipped/softened by the cleanup:
  - Phase 1 D1–D10 (incl. the genuine D8 byte-identical fresh-clone rebuild + a category-spanning live !testme e2e through the public gateway).
  - Phase 1c C1–C7 (secrets-in-git, cert-in-sops, honest reproducibility).
  - Phase 1d DG1–DG8 (generic install/upgrade/backup/restore, deploy-once DG4.1, override floor) as amended by 1e.
  - Phase 1e HC1–HC3 (upgrade→PR-head via deploy --chaos; repo-local gated to approved recipes; generic-by-default + explicit opt-out).
  - Phase 2 recipe-coverage criteria (every enrolled recipe's overlays/ported tests real, DRY, green).
  - Phase 2b performance claims (the measured improvements still hold; no test weakened to get them).
  - Phase 3 results/level/UX criteria (per-run level honest, PR comment + dashboard correct).
- F5: Documented + cold-verified. Final docs/ accurate (install reproduces from scratch; enroll-recipe + overlay/approval flow correct); accepted deviations in DECISIONS.md; the Adversary confirms F1–F4 with no standing VETO and no open [adversary] finding.
When F1–F5 hold and are confirmed, write ## DONE to machine-docs/STATUS-4.md — the build is
complete.
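The F4 freshness requirement lends itself to a mechanical check. The following is a minimal sketch, assuming the PASS records have already been parsed out of machine-docs/REVIEW-4.md into a mapping of criterion ID to (verdict, timestamp). That record shape, the function name `unmet_criteria`, the sample data, and the reading of "within 24h" as "no older than 24 hours at check time" are all assumptions for illustration; the plan does not specify them.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: criterion ID -> (verdict, time the evidence was logged).
Evidence = Dict[str, Tuple[str, datetime]]


def unmet_criteria(
    required: Iterable[str],
    evidence: Evidence,
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
) -> List[str]:
    """Return one reason per required criterion that lacks a fresh PASS."""
    unmet = []
    for cid in required:
        entry = evidence.get(cid)
        if entry is None:
            unmet.append(f"{cid}: no evidence")
            continue
        verdict, logged_at = entry
        if verdict != "PASS":
            unmet.append(f"{cid}: verdict is {verdict}")
        elif now - logged_at > max_age:
            unmet.append(f"{cid}: evidence is stale")
    return unmet


# Illustrative data only; real IDs and timestamps would come from REVIEW-4.md.
now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
evidence = {
    "D8": ("PASS", now - timedelta(hours=3)),
    "C1": ("PASS", now - timedelta(hours=30)),
    "HC2": ("FAIL", now - timedelta(hours=1)),
}
print(unmet_criteria(["D8", "C1", "HC2", "DG4.1"], evidence, now))
```

An empty result would be the precondition for writing ## DONE; anything else names the criterion that still needs a fresh cold verification.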
2. Method
- Lint/format first (F1): re-run + extend; auto-fix style, don't deliberate.
- Review checklist (F2, §3): classify blocking vs advisory; fix blocking, triage rest.
- Cleanup (F3): dead code, naming, docs, machine-docs/ reconciliation.
- Full cold re-verification LAST (F4): once everything has landed, the Adversary re-runs the entire cross-phase acceptance from cold. Order matters: tooling → review/fixes → cleanup → then full re-verify. Cleanup must regress nothing.
- Bound it: a pass, not a rewrite; record dead-ends/deviations and stop.
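The ordering rule (tooling, then review/fixes, then cleanup, then the full re-verify) can be sketched as a tiny stage runner that refuses to start a later stage once an earlier one has failed. This is an illustrative sketch only: `run_in_order`, the stage labels, and the lambdas standing in for real work are hypothetical, not part of the launcher.

```python
from typing import Callable, List, Tuple

# A stage is a label plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_in_order(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in the given order; after the first failure, mark the rest not-run."""
    results = []
    blocked = False
    for name, step in stages:
        if blocked:
            results.append((name, "not-run"))
            continue
        ok = step()
        results.append((name, "pass" if ok else "fail"))
        blocked = not ok
    return results


# Placeholder steps; the simulated F3 failure shows that F4 never starts early.
stages = [
    ("F1 lint/format", lambda: True),
    ("F2 review/fixes", lambda: True),
    ("F3 cleanup", lambda: False),
    ("F4 full cold re-verify", lambda: True),
]
for name, outcome in run_in_order(stages):
    print(f"{name}: {outcome}")
```

Keeping F4 strictly last means its evidence always describes the tree as it stands after every cleanup, which is the point of the final gate.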
3. White-box review checklist (teeth, not taste) — whole codebase
Blocking unless noted (plan-relevant invariants visible only by reading code):
- Tests are real (blocking): every generic/overlay/custom test asserts actual app state; no skip/xfail/can't-fail; per-op pass/fail/skip honest; the 1d/1e anti-vacuous guards (assert_serving routing proof, do_upgrade "moved", deploy-count==1) intact.
- 1e corrections intact (blocking): repo-local code still gated to approved recipes; generic still runs by default (opt-out explicit); upgrade still targets the PR head.
- Generic-first / custom-additive invariant (blocking; docs/testing.md). Confirm no path makes the generic tier depend on custom: deps deploy + setup_custom_tests run after all generic tiers, never before; a forced setup_custom_tests failure still yields a clean generic-tier pass/pass/pass/pass + skip(deps-not-ready) for @requires_deps custom tests (re-exercise the forced-failure case). Future maintainers must be able to operate cc-ci with the generic tier alone; verify that path stays viable.
- Harness DRY (blocking-ish): recipe quirks are data (recipe_meta.py), not shared-harness conditionals; overlays are thin; no per-recipe copy-paste of lifecycle logic.
- Server state Nix-declared & idempotent (blocking): no imperative drift / run-once sentinels / manual post-rebuild steps; the nix/ layout clean.
- No footguns (blocking): no bare sleep for readiness (poll); teardown in finally; secrets reused per run not regenerated; no hardcoded versions/domains that break upstream.
- No secrets in code/committed files (blocking): grep source/configs/.drone.yml/fixtures; log/dashboard redaction real (incl. any new Phase-3 UX surface that echoes run data).
- Phase-3 UX correctness (advisory→blocking on real drift): the displayed level/badge/screenshot reflect the true per-op results; no misleading "pass".
- Architecture matches the plans; deviations in DECISIONS.md (advisory→blocking on real drift).
- Readability & docs (advisory): clear names, dead code removed, docs reproduce from scratch.
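For the "no footguns" item, the two patterns a reviewer looks for (poll for readiness instead of a bare sleep, and teardown in `finally`) might look roughly like the sketch below. The helper names `wait_until` and `run_with_teardown`, their parameters, and the callables passed in are hypothetical and not taken from the cc-ci harness.

```python
import time
from typing import Callable


def wait_until(ready: Callable[[], bool], timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll `ready` until it returns True or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def run_with_teardown(
    deploy: Callable[[], None],
    ready: Callable[[], bool],
    test: Callable[[], bool],
    teardown: Callable[[], None],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> str:
    """Deploy, wait for readiness by polling, run the test, and always tear down."""
    try:
        deploy()
        if not wait_until(ready, timeout_s, interval_s):
            return "fail"
        return "pass" if test() else "fail"
    finally:
        # Runs on success, on failure, and when deploy/ready/test raise.
        teardown()


# Simulated app that becomes ready on the third probe.
events = []
attempts = iter([False, False, True])
outcome = run_with_teardown(
    deploy=lambda: events.append("deploy"),
    ready=lambda: next(attempts),
    test=lambda: True,
    teardown=lambda: events.append("teardown"),
    timeout_s=5.0,
    interval_s=0.0,
)
print(outcome, events)
```

A readiness timeout is reported as "fail" rather than "skip" here so that an app which never comes up cannot be recorded as a vacuous success; whether that matches the harness's own per-op semantics is for the review to confirm.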
4. Guardrails
- Never weaken a test to satisfy a lint/review/cleanup nit (cardinal rule wins).
- Don't reopen settled design: clean + harden + re-verify; bigger ideas → IDEAS.md.
- Bounded: one pass; cap iterations; record + stop.
- Cleanup regresses nothing: F4 is the proof; if a cleanup breaks a prior guarantee, revert the cleanup, not the guarantee.
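The "bounded" and "revert the cleanup, not the guarantee" guardrails combine into a simple loop. The sketch below is hypothetical throughout: `Cleanup`, `apply_bounded`, and `still_green` are illustrative names, and `still_green` stands in for whatever regression check is cheap enough to run after each cleanup. F4 remains the final proof.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Cleanup:
    name: str
    apply: Callable[[], None]
    revert: Callable[[], None]


def apply_bounded(
    cleanups: List[Cleanup],
    still_green: Callable[[], bool],
    max_cleanups: int,
) -> Dict[str, List[str]]:
    """Apply at most `max_cleanups`; revert any that breaks the check; record the rest."""
    report: Dict[str, List[str]] = {"kept": [], "reverted": [], "deferred": []}
    for index, cleanup in enumerate(cleanups):
        if index >= max_cleanups:
            report["deferred"].append(cleanup.name)  # record + stop, don't keep iterating
            continue
        cleanup.apply()
        if still_green():
            report["kept"].append(cleanup.name)
        else:
            cleanup.revert()  # the guarantee wins; the cleanup goes
            report["reverted"].append(cleanup.name)
    return report


# Simulated tree state: a set of applied cleanup names.
applied = set()


def make(name: str) -> Cleanup:
    return Cleanup(name, lambda: applied.add(name), lambda: applied.discard(name))


cleanups = [make("remove-dead-helper"), make("drop-guard"), make("rename-module"), make("tidy-docs")]
report = apply_bounded(cleanups, lambda: "drop-guard" not in applied, max_cleanups=3)
print(report)
```

Deferred items would go to the backlog rather than trigger another pass, which keeps the phase a pass and not a rewrite.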
5. Open decisions (log in machine-docs/DECISIONS.md)
- Any new linters/formatters for Phase-3 front-end / new areas, and their strictness.
- The precise blocking-vs-advisory split for the §3 checklist on the new code.
- Whether to add Python type-checking now or defer to IDEAS.md (carried from 1b).