plan: queue canon — make the canonical sweep actually work (substitute for hollow nightly sweep)

Operator 2026-06-17. The nightly-sweep timer fires green but is a no-op: only
custom-html is WARM_CANONICAL and zero canonical.json records exist -> no
canonical has ever been promoted end-to-end. canon makes it real + proven:
fix/prove the promote path, broaden enrollment, add upstream mirror-sync +
skip-when-unchanged, verify end-to-end (incl. run-twice no-op). Schedule is
incidental; correctness is the deliverable. Replaces the hollow sweep. opus.
2026-06-17 04:06:59 +00:00
parent 03292c6f57
commit 05e2635019
2 changed files with 109 additions and 0 deletions


@@ -160,4 +160,6 @@ phases = [
{ id = "regall", plan = "plan-phase-regall-recipe-regression.md", status = "STATUS-regall.md", models = { builder = "claude-sonnet-4-6", adversary = "claude-sonnet-4-6" } },
# same-version upgrade-base gap: step back to newest-older-published when last-green==head (opus, design A; B in IDEAS) — see plan-phase-samever-*.md (operator 2026-06-17)
{ id = "samever", plan = "plan-phase-samever-older-base-fallback.md", status = "STATUS-samever.md", models = { builder = "claude-opus-4-8", adversary = "claude-opus-4-8" } },
# make the canonical sweep ACTUALLY work (substitute for the hollow nightly sweep) + upstream-sync + skip-unchanged; verify end-to-end (opus) — see plan-phase-canon-*.md (operator 2026-06-17)
{ id = "canon", plan = "plan-phase-canon-canonical-sweep.md", status = "STATUS-canon.md", models = { builder = "claude-opus-4-8", adversary = "claude-opus-4-8" } },
]


@@ -0,0 +1,107 @@
# Phase `canon` — make the canonical sweep actually work (the real "nightly sweep") + verify it
**Mission (operator-specified 2026-06-17):** the "nightly sweep" was specified in theory but **was never
actually doing anything** — confirmed live: `nightly-sweep.timer` is deployed and fires green
(`nightly_sweep.py`, last run 2026-06-17 03:09 UTC exit 0), but **only `custom-html` is
`WARM_CANONICAL`-enrolled and ZERO `canonical.json` records exist** — i.e. the machinery has **never actually promoted a
canonical end-to-end**. This phase makes it **real and proven**, as the **substitute for** that hollow
nightly sweep, with two additions the operator wants:
1. **Sync each recipe mirror's `main`** on `git.autonomic.zone/recipe-maintainers/<recipe>` to its
**upstream** (`git.coopcloud.tech/coop-cloud/<recipe>`) first, so the sweep tests true upstream latest.
2. **Skip a recipe whose `main` is unchanged** vs its current canonical (no rerun needed).
…then **run CI cold-on-`main` for each recipe and actually promote the canonical for any that pass**
and **prove the whole thing works**. **The schedule/cadence is NOT the point** (nightly vs weekly is
trivially retunable later) — *correctness of the machinery, verified end-to-end,* is the deliverable.
Keep the existing nightly timer slot; this REPLACES the hollow sweep, it is not a parallel job.
State files: `STATUS-canon.md`, `BACKLOG-canon.md`, `REVIEW-canon.md`, `JOURNAL-canon.md`. DECISIONS.md shared.
## 1. Verified starting state (2026-06-17)
- `nightly-sweep.timer` enabled + active (next ~03:00 UTC); `nightly_sweep.py` runs and exits 0. The
timer/service plumbing already works — **reuse it, don't rebuild it.**
- **Only `custom-html` sets `WARM_CANONICAL = True`.** The sweep iterates `canonical.enrolled_recipes()`
→ essentially one recipe → near-no-op across the fleet.
- **No `canonical.json` exists** on the host → the promote path (`should_promote_canonical` →
`promote_canonical` → `write_registry`) has **never successfully produced a canonical**, even for
custom-html. This is the crux of "theory, not actually doing it."
- The sweep does **not** reconcile mirrors to upstream, and does **not** skip-when-unchanged.
## 2. The work
**A. Prove + fix the promote path FIRST (the core).** On `custom-html` (already enrolled), make a green
cold-on-latest run **actually write `canonical.json`** (recipe/version/commit/status) AND prove a
subsequent `--quick` warm-reattach uses it (`deploy_canonical` reattaches the retained volume). If it
doesn't happen today, find and fix why (this is the real defect behind the hollow sweep). A canonical
must demonstrably exist and be reusable before anything else is meaningful.
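
A minimal sketch of what "actually write `canonical.json`" could look like, assuming the record is a flat JSON object with the four fields named above (recipe/version/commit/status). The function names, the file location and the atomic-rename approach are illustrative assumptions, not the existing `write_registry` implementation.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def write_canonical_record(path: Path, recipe: str, version: str,
                           commit: str, status: str) -> dict:
    """Atomically write a canonical record (hypothetical layout)."""
    record = {"recipe": recipe, "version": version,
              "commit": commit, "status": status}
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename over the target,
    # so an interrupted run never leaves a half-written canonical.json.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(record, handle, indent=2, sort_keys=True)
            handle.write("\n")
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return record


def read_canonical_record(path: Path) -> Optional[dict]:
    """Return the stored record, or None if no canonical was ever promoted."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return None
```

A `None` from `read_canonical_record` corresponds to today's verified state (no record on the host); the proof for part A is that the same call returns a populated record after a green cold run.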
**B. Realize "promote the canonical for any recipe that passes."** Today only custom-html is enrolled, so
"each recipe" is vacuous. Decide + document the **enrollment scope**:
- Default intent (operator): broaden `WARM_CANONICAL` so the sweep tracks the real recipe set — at
minimum prove it across **several** recipes, not one.
- **Flag the resource cost:** each warm canonical retains a data volume on the single node — enrolling
everything has a disk budget. If "promote for all that pass" is wanted but warm-volumes-for-all is too
much disk, consider **decoupling** the cheap *version record* (last-green {version,commit}, promote for
all) from the expensive *warm volume* (retain selectively). Surface this in DECISIONS; get the
operator's enrollment set rather than silently enrolling all.
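
To make the decoupling option concrete, here is one possible shape for a per-recipe policy that separates the cheap version record from the expensive warm volume. `EnrollmentPolicy` and `actions_after_run` are hypothetical names for illustration; the current code has only the single `WARM_CANONICAL` flag.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class EnrollmentPolicy:
    """Hypothetical split of the single WARM_CANONICAL flag into two knobs."""
    track_version: bool = True        # cheap: record last-green {version, commit}
    retain_warm_volume: bool = False  # expensive: keep the data volume on the node


def actions_after_run(policy: EnrollmentPolicy, passed: bool) -> Dict[str, bool]:
    """Decide what a finished run may do; a failed run never changes anything."""
    return {
        "write_version_record": passed and policy.track_version,
        "retain_warm_volume": passed and policy.retain_warm_volume,
    }
```

With defaults like these, every passing recipe would get a version record while warm volumes stay opt-in, so the disk budget is spent only where the operator chooses.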
**C. Add the upstream mirror-sync step.** Before the per-recipe CI, reconcile each mirror's `main` + tags
to coopcloud upstream — reuse `recipe-upgrade`'s `open-recipe-pr.sh <recipe> --reconcile-only` (handles
go-git private-mirror auth, fetches coopcloud via an `upstream` remote, closes already-merged-upstream
PRs, leaves unrelated PRs). This is a **faithful mirror sync, not a push of our own changes.**
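
A hedged sketch of how the sweep might invoke that reconcile step per recipe. The only grounded part is the command shape `open-recipe-pr.sh <recipe> --reconcile-only`; the script's location, the timeout value and the choice to treat a failed sync as "do not run CI for this recipe" are assumptions.

```python
import subprocess
from typing import List

# Assumption: the script is reachable under this name; its real path is not given here.
RECONCILE_SCRIPT = "open-recipe-pr.sh"


def reconcile_command(recipe: str, script: str = RECONCILE_SCRIPT) -> List[str]:
    """Build the argv for a reconcile-only mirror sync of one recipe."""
    # Reject names that could be read as an option or a path.
    if not recipe or recipe.startswith("-") or "/" in recipe:
        raise ValueError(f"suspicious recipe name: {recipe!r}")
    return [script, recipe, "--reconcile-only"]


def sync_mirror(recipe: str, timeout_s: int = 600) -> bool:
    """Run the reconcile step; True only when it exits 0 within the timeout."""
    try:
        done = subprocess.run(reconcile_command(recipe),
                              timeout=timeout_s, check=False)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return done.returncode == 0
```

Passing an argv list (no shell) keeps the recipe name from being interpreted by a shell, and the boolean result lets the sweep log a sync failure instead of testing a stale `main`.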
**D. Add skip-when-unchanged.** After sync, if the recipe's `main` commit == its canonical record's commit
(no change since the last promotion) → **skip** (log `SKIP unchanged`). This is the operator's efficiency
ask and is also the determinism property (see M2 run-twice proof).
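
The skip rule reduces to a commit-equality check. A minimal sketch, assuming the canonical record is a dict with a `commit` key holding a full SHA and that `main_commit` is read after the sync; `decide` is a hypothetical helper name.

```python
from typing import Optional


def decide(recipe: str, main_commit: str, canonical: Optional[dict]) -> str:
    """Return the log line for one recipe: skip only on exact commit equality."""
    # No record yet means nothing was ever promoted, so the recipe must run.
    if canonical is not None and canonical.get("commit") == main_commit:
        return f"{recipe}: SKIP unchanged"
    return f"{recipe}: RUN"
```

Comparing full SHAs avoids false matches that abbreviated hashes could produce; a missing record or missing `commit` key falls through to `RUN`, which is the safe direction.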
**E. Keep it deterministic + AI-free at runtime** (it already is — a script + timer). The additions must
stay pure code: no AI calls during the run. AI (the loops) only authors + verifies.
**Note — exercises `samever`:** the sweep's cold-on-latest upgrade tier hits the same-version case as its
steady state (canonical == latest after a promotion); this phase is also a real-world validation that the
`samever` step-back (previous-published base) behaves under the sweep.
## 3. Gates
**M1 — machinery works locally, each piece proven.** (A) a real `canonical.json` is produced by a green
cold run on ≥1 recipe and reused by a warm reattach — **demonstrated, not assumed**. (C) mirror-sync and
(D) skip-when-unchanged implemented, reusing the existing reconcile + sweep code, with unit tests
(skip = commit-equality; sync invoked per recipe; promote still gated on green+cold+latest+enrolled).
(B) enrollment scope decided + recorded (≥several recipes enrolled, or the decouple decision). Adversary
cold-verifies: a canonical actually exists + reattaches; skip-logic correct; sync is faithful-mirror-only;
a RED recipe does NOT promote (prior known-good intact); no AI at runtime.
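
The promotion gate named in M1 (green+cold+latest+enrolled) and the "RED does not promote" property can be expressed as a small pure function that is easy to unit-test. This is an illustrative sketch with hypothetical names, not the existing `should_promote_canonical`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunResult:
    green: bool      # CI passed
    cold: bool       # run started from a cold deploy, not a warm reattach
    on_latest: bool  # run tested the latest main
    enrolled: bool   # recipe is in the enrollment set


def promotion_allowed(result: RunResult) -> bool:
    """All four conditions must hold; there is no override."""
    return result.green and result.cold and result.on_latest and result.enrolled


def next_canonical(prior: Optional[dict], candidate: dict,
                   result: RunResult) -> Optional[dict]:
    """Return the candidate on a qualifying run, else the prior record untouched."""
    return candidate if promotion_allowed(result) else prior
```

Because `next_canonical` returns the prior object itself on any failing condition, a test can assert identity (`is prior`) to show the known-good record was left intact.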
**M2 — proven end-to-end in real CI (the heart of this phase).** A full sweep run across the enrolled set
on cc-ci: mirrors synced to upstream, **canonicals actually promoted for the green recipes** (records
exist with correct version+commit), red recipes left intact, unchanged recipes skipped — with a
per-recipe results log. **Determinism proof: run the sweep a SECOND time immediately → it SKIPS every
recipe** (all `main` == the canonicals just promoted) = a clean no-op, no CI rerun. Confirm the **deployed
timer fires the real (non-hollow) job** — after a fire, canonicals have advanced (evidence), not exit-0
on an empty set. No AI in the loop. Fresh Adversary PASS on both milestones → `## DONE`.
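
The run-twice determinism proof can be rehearsed against an in-memory model before it is run on cc-ci. The sketch below is a simplified stand-in for the sweep loop, assuming commit-equality skipping and promote-on-green; `sweep`, its arguments and the log strings other than `SKIP unchanged` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def sweep(mains: Dict[str, str], canonicals: Dict[str, str],
          run_ci: Callable[[str, str], bool]) -> List[Tuple[str, str]]:
    """One serial pass: skip unchanged recipes, run CI on the rest, promote greens."""
    log: List[Tuple[str, str]] = []
    for recipe in sorted(mains):
        commit = mains[recipe]
        if canonicals.get(recipe) == commit:
            log.append((recipe, "SKIP unchanged"))
            continue
        if run_ci(recipe, commit):
            canonicals[recipe] = commit  # promote: canonical now equals main
            log.append((recipe, "PROMOTED"))
        else:
            log.append((recipe, "RED kept prior"))  # prior canonical untouched
    return log


if __name__ == "__main__":
    mains = {"custom-html": "c1", "example-recipe": "c2"}  # placeholder data
    canonicals: Dict[str, str] = {}
    always_green = lambda recipe, commit: True
    print(sweep(mains, canonicals, always_green))  # first pass promotes both
    print(sweep(mains, canonicals, always_green))  # second pass skips both
```

Under this rule a recipe that went red is not skipped on the second pass, since its canonical did not advance; the clean no-op therefore holds for the recipes that were just promoted.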
## 4. Guardrails
- **Correctness over cadence.** The schedule is incidental; do not spend effort tuning nightly-vs-weekly.
The bar is: the machinery *demonstrably promotes canonicals, syncs mirrors, and skips unchanged.*
- **No AI at runtime** — pure script + systemd timer; AI only builds/verifies.
- **Single-node safety:** serial; skip the whole run if a Drone/test build is in flight (reuse the
existing nightly guard); tear down every deploy; bound total runtime; mind the warm-volume disk budget.
- **Never force-promote / never weaken:** promote only on green-cold-latest-enrolled; a red recipe keeps
its prior known-good. Never weaken a test to make a recipe promote.
- **Faithful mirror sync only:** force-sync `main`/tags to coopcloud upstream; never push our own changes
to mirror `main`; never merge/disturb unrelated PRs.
- **Nix/host changes** (enrollment is recipe-meta; any timer/module tweak is a nixos-rebuild): loops may
deploy if clean and **verify host health after**; else file for the orchestrator. Commit author
`autonomic-bot <autonomic-bot@noreply.git.autonomic.zone>`; push every commit; abra over a pseudo-TTY.
## 5. Definition of Done
The canonical sweep **actually works and is proven**: a green cold-on-latest run produces a real,
reusable `canonical.json`; the sweep reconciles each recipe mirror's `main` to upstream, skips recipes
whose `main` is unchanged vs canonical, runs CI on the rest, and promotes the canonical for any that pass
— across a real multi-recipe set, demonstrated end-to-end in CI, including the run-twice no-op determinism
proof and a real (non-hollow) timer fire. Enrollment scope + the warm-volume budget decided/recorded; the
runtime job is AI-free; it is the substitute for the hollow nightly sweep (not a parallel job). M1 + M2
fresh Adversary PASSes in REVIEW-canon.md.