
autonomic-bot 36a6c9872a orchestrator: reboot-resilience + session auto-resume + full session plan/tooling

Reboot survival for the Pi orchestrator host:
- systemd unit cc-ci-plan/systemd/cc-ci-loops.service (installed + enabled): on boot
  records the reboot, starts loops+watchdog (RESUME_PHASE=1), and resumes the
  orchestrator session.
- reboot-log.sh: boot_id-gated reboot record -> REBOOTS.md (manual restarts don't count).
- launch-orchestrator.sh: injects an AGENTS.md startup nudge so an auto-resumed
  orchestrator announces itself (PushNotification) + reports reboots.
- AGENTS.md: on-startup notify routine documented.

Plans/tooling accumulated this session:
- plan-phase1d (generic suite), 1e (harness corrections), phase4 (final review),
  sso-dep-testing, orchestrator-migration (parked), test-e2e-testme-acceptance.
- launch.sh: 1d/1e/2/2b/3/4 phase sequence, machine-docs-aware state resolution,
  limit-stall re-nudge, INBOX side-channel detection.
- plan.md §6.1/§7: artifact-layer isolation, INBOX, 5-min long-run polling, DEFERRED.
- prompts: isolation discipline + INBOX + pacing.
- .gitignore: harden (.sops/, cc-ci-secrets/, .claude/, *.tmp.*).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-05-28 20:28:10 +01:00


cc-ci Phase 4 — Final review, polish & cleanup (capstone) (Autonomous Build Plan)

Status: QUEUED. This is the LAST phase and runs after Phase 3 (plan-phase3-results-ux.md). A bounded final review/lint/cleanup pass over the entire codebase as it stands after all phases, ending in a full cold re-verification that nothing regressed.

Transition: auto (last in the launcher sequence); after it, the whole build is done.

Builds on: everything, i.e. Phase 1 + 1c + 1b + 1d + 1e + 2 + 2b + 3 (flake/nix/ modules, the runner/harness + generic suite + recipe overlays, the comment-bridge, Drone, the dashboard/results UX, docs, machine-docs/).

Owner agents: same Builder + Adversary loops (plan.md §6/§7); Adversary cold-verifies.

This file's path: /srv/cc-ci/cc-ci-plan/plan-phase4-final-review-polish-cleanup.md

Phase order: 1c → 1b → 1d → 1e → 2 → 2b → 3 → 4 (final).

0. Why this phase (and why it's bounded)

This is the analogue of Phase 1b, but final and whole-codebase. By now the tree has grown a lot — recipe overlays (Phase 2), performance changes (2b), and results/dashboard UX (3) — all layered on the foundation. Before calling the build done, do one bounded pass to clean and harden it, and — critically — re-verify from a cold start that none of the growth/cleanup regressed any earlier guarantee. Same discipline as 1b: good-enough + enforceable, style→tooling, judgment→checklist, don't reopen settled design, and never weaken a test to satisfy a nit.

1. Definition of Done (Phase 4 exit condition)

Phase 4 terminates when every item below holds and the Adversary has independently cold-verified each one (logged in machine-docs/REVIEW-4.md):

F1 — Lint/format green across the whole codebase. Re-run the 1b toolchain (alejandra/statix/deadnix, ruff, shellcheck/shfmt, yamllint) over everything added since 1b (Phase-2 overlays, 2b changes, 3 UX/dashboard); extend the lint config to any new languages/areas (e.g. dashboard front-end) so it's covered going forward. The .drone.yml lint stage still passes from a clean checkout; prove with a break-it probe.
F2 — White-box review checklist over all post-1b code. Run the §3 checklist; fix every blocking finding, triage advisories to BACKLOG/IDEAS. Findings + resolutions in machine-docs/REVIEW-4.md.
F3 — Cleanup. Remove dead code/scaffolding and stale TODOs; consistent naming/structure; reconcile machine-docs/ (BACKLOG/IDEAS/DECISIONS current, no contradictions); docs match the final state. No behavior change beyond what F2 mandates.
F4 — FULL cold re-verification (the final gate). After F1–F3 land, the Adversary independently re-verifies every prior Definition-of-Done from a cold start, to the same bar each phase used. Each needs a fresh PASS + evidence + timestamps in machine-docs/REVIEW-4.md within 24h, with nothing weakened/skipped/softened by the cleanup:
- Phase 1 D1–D10 (incl. the genuine D8 byte-identical fresh-clone rebuild + a category-spanning live !testme e2e through the public gateway).
- Phase 1c C1–C7 (secrets-in-git, cert-in-sops, honest reproducibility).
- Phase 1d DG1–DG8 (generic install/upgrade/backup/restore, deploy-once DG4.1, override floor) as amended by 1e.
- Phase 1e HC1–HC3 (upgrade→PR-head via deploy --chaos; repo-local gated to approved recipes; generic-by-default + explicit opt-out).
- Phase 2 recipe-coverage criteria (every enrolled recipe's overlays/ported tests real, DRY, green).
- Phase 2b performance claims (the measured improvements still hold; no test weakened to get them).
- Phase 3 results/level/UX criteria (per-run level honest, PR comment + dashboard correct).
F5 — Documented + cold-verified. Final docs/ accurate (install reproduces from scratch; enroll-recipe + overlay/approval flow correct); accepted deviations in DECISIONS.md; the Adversary confirms F1–F4 with no standing VETO and no open [adversary] finding.

When F1–F5 hold and are confirmed, write ## DONE to machine-docs/STATUS-4.md — the build is complete.
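The F4 freshness rule (a fresh PASS with a timestamp inside 24h for every prior criterion) can be sketched as a small check. This is a minimal illustration only: the `Evidence` record, the `unmet_criteria` helper, and the reading of "within 24h" as "no older than 24 hours at check time" are all assumptions, and how entries are parsed out of machine-docs/REVIEW-4.md is left out because the plan does not define that format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

# Assumption: "within 24h" means the verification is at most 24 hours old.
MAX_AGE = timedelta(hours=24)


@dataclass(frozen=True)
class Evidence:
    """Hypothetical record of one cold-verification entry."""
    criterion: str          # ID as used in the plans, e.g. "D8" or "DG4.1"
    verdict: str            # "PASS" counts; anything else does not
    verified_at: datetime   # timezone-aware timestamp of the verification


def unmet_criteria(required: Iterable[str],
                   evidence: Iterable[Evidence],
                   now: datetime) -> List[str]:
    """Return required criteria that lack a fresh PASS, sorted by ID."""
    fresh_pass = {
        e.criterion
        for e in evidence
        # Reject stale entries and entries timestamped in the future.
        if e.verdict == "PASS" and timedelta(0) <= now - e.verified_at <= MAX_AGE
    }
    return sorted(set(required) - fresh_pass)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    log = [
        Evidence("D8", "PASS", now - timedelta(hours=2)),    # fresh pass
        Evidence("C1", "PASS", now - timedelta(hours=30)),   # stale
        Evidence("HC1", "FAIL", now - timedelta(hours=1)),   # not a pass
    ]
    # "DG1" has no entry at all, so it is reported as unmet too.
    print(unmet_criteria(["D8", "C1", "HC1", "DG1"], log, now))
```

An empty result would be the precondition for writing ## DONE; any ID returned means that criterion has to be re-verified rather than carried over from an earlier phase.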

2. Method

Lint/format first (F1) — re-run + extend; auto-fix style, don't deliberate.
Review checklist (F2, §3) — classify blocking vs advisory; fix blocking, triage rest.
Cleanup (F3) — dead code, naming, docs, machine-docs/ reconciliation.
Full cold re-verification LAST (F4) — once everything has landed, the Adversary re-runs the entire cross-phase acceptance from cold. Order matters: tooling → review/fixes → cleanup → then full re-verify. Cleanup must regress nothing.
Bound it — a pass, not a rewrite; record dead-ends/deviations and stop.
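The ordering and the bound described above can be sketched as a tiny driver: stages run strictly in sequence, a red stage is retried up to a cap, and hitting the cap records a dead-end and stops instead of looping. The stage names, the `run_bounded_pass` helper, and the cap of 3 are illustrative assumptions; the plan only says to cap iterations and does not fix a number.

```python
from typing import Callable, List, Sequence, Tuple

# (stage name, check that returns True when the stage is green)
Stage = Tuple[str, Callable[[], bool]]


def run_bounded_pass(stages: Sequence[Stage],
                     max_attempts: int = 3) -> Tuple[bool, List[str]]:
    """Run stages in order; retry a red stage up to max_attempts, then stop."""
    log: List[str] = []
    for name, check in stages:
        for attempt in range(1, max_attempts + 1):
            ok = check()
            log.append(f"{name}: attempt {attempt} {'green' if ok else 'red'}")
            if ok:
                break
        else:
            # Cap reached without a green run: record it and stop the pass.
            log.append(f"{name}: cap reached, recorded as dead-end")
            return False, log
    return True, log


if __name__ == "__main__":
    review_outcomes = iter([False, True])  # red once, green after a fix
    stages = [
        ("lint", lambda: True),
        ("review", lambda: next(review_outcomes)),
        ("cleanup", lambda: True),
        ("cold-reverify", lambda: True),   # always last, after everything lands
    ]
    ok, log = run_bounded_pass(stages)
    print(ok)
    print("\n".join(log))
```

Later stages never start while an earlier one is red, which mirrors the tooling, then review/fixes, then cleanup, then full re-verify order.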

3. White-box review checklist (teeth, not taste) — whole codebase

Blocking unless noted (plan-relevant invariants visible only by reading code):

Tests are real (blocking) — every generic/overlay/custom test asserts actual app state; no skip/xfail/can't-fail; per-op pass/fail/skip honest; the 1d/1e anti-vacuous guards (assert_serving routing proof, do_upgrade "moved", deploy-count==1) intact.
1e corrections intact (blocking) — repo-local code still gated to approved recipes; generic still runs by default (opt-out explicit); upgrade still targets the PR head.
Generic-first / custom-additive invariant (blocking — docs/testing.md). Confirm no path makes the generic tier depend on custom: deps deploy + setup_custom_tests run after all generic tiers, never before; a forced setup_custom_tests failure still yields a clean generic-tier pass/pass/pass/pass + skip(deps-not-ready) for @requires_deps custom tests (re-exercise the forced-failure case). Future maintainers must be able to operate cc-ci with the generic tier alone — verify that path stays viable.
Harness DRY (blocking-ish) — recipe quirks are data (recipe_meta.py), not shared-harness conditionals; overlays are thin; no per-recipe copy-paste of lifecycle logic.
Server state Nix-declared & idempotent (blocking) — no imperative drift / run-once sentinels / manual post-rebuild steps; the nix/ layout clean.
No footguns (blocking) — no bare sleep for readiness (poll); teardown in finally; secrets reused per run not regenerated; no hardcoded versions/domains that break upstream.
No secrets in code/committed files (blocking) — grep source/configs/.drone.yml/fixtures; log/dashboard redaction real (incl. any new Phase-3 UX surface that echoes run data).
Phase-3 UX correctness (advisory→blocking on real drift) — the displayed level/badge/screenshot reflect the true per-op results; no misleading "pass".
Architecture matches the plans; deviations in DECISIONS.md (advisory→blocking on real drift).
Readability & docs (advisory) — clear names, dead code removed, docs reproduce from scratch.
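The generic-first / custom-additive item in the checklist above can be illustrated with a toy runner. It is a sketch under stated assumptions, not the harness itself: `run_recipe`, the tuple shape of `custom_tests`, and the test names are hypothetical, and the four generic tiers are assumed to be the install/upgrade/backup/restore operations named for Phase 1d. Only `setup_custom_tests` and the `skip(deps-not-ready)` result string come from the plan.

```python
from typing import Callable, Dict, Tuple

# Assumption: the four generic tiers are the Phase 1d operations.
GENERIC_TIERS = ("install", "upgrade", "backup", "restore")


def run_recipe(
    generic_ops: Dict[str, Callable[[], bool]],
    setup_custom_tests: Callable[[], None],
    custom_tests: Dict[str, Tuple[bool, Callable[[], bool]]],
) -> Dict[str, str]:
    """Run all generic tiers first, then custom setup, then custom tests."""
    results: Dict[str, str] = {}

    # Generic tiers run first and never look at custom state.
    for tier in GENERIC_TIERS:
        results[tier] = "pass" if generic_ops[tier]() else "fail"

    # Custom setup (deps deploy etc.) happens only after every generic tier.
    try:
        setup_custom_tests()
        deps_ready = True
    except Exception:
        deps_ready = False  # a setup failure must not touch generic results

    # custom_tests maps name -> (requires_deps, test callable).
    for name, (requires_deps, test) in custom_tests.items():
        if requires_deps and not deps_ready:
            results[name] = "skip(deps-not-ready)"
        else:
            results[name] = "pass" if test() else "fail"
    return results


if __name__ == "__main__":
    def forced_failure() -> None:
        raise RuntimeError("forced setup_custom_tests failure")

    ops = {tier: (lambda: True) for tier in GENERIC_TIERS}
    custom = {
        "sso_login": (True, lambda: True),        # needs deps
        "homepage_title": (False, lambda: True),  # does not need deps
    }
    for name, outcome in run_recipe(ops, forced_failure, custom).items():
        print(f"{name}: {outcome}")
```

With the forced failure, the four generic tiers still report pass and only the deps-dependent custom test is skipped, which is the behavior the forced-failure re-exercise is meant to confirm.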

4. Guardrails

Never weaken a test to satisfy a lint/review/cleanup nit (cardinal rule wins).
Don't reopen settled design — clean + harden + re-verify; bigger ideas → IDEAS.md.
Bounded — one pass; cap iterations; record + stop.
Cleanup regresses nothing — F4 is the proof; if a cleanup breaks a prior guarantee, revert the cleanup, not the guarantee.

5. Open decisions (log in machine-docs/DECISIONS.md)

Any new linters/formatters for Phase-3 front-end / new areas, and their strictness.
The precise blocking-vs-advisory split for the §3 checklist on the new code.
Whether to add Python type-checking now or defer to IDEAS.md (carried from 1b).
