Operator 2026-06-17: UPGRADE_BASE_VERSION is still used (plausible pins 3.0.1+v2.0.0 to dodge the broken 3.0.0 base; bluesky-pds references it as a future re-enable path). Once canon establishes plausible's canonical at 3.0.1, the dynamic base resolves correctly without the pin -> strip the key (meta/resolver/docs/tests) + migrate plausible + update the bluesky-pds note. GATED: keep it if plausible genuinely still needs the escape hatch (never drop upgrade coverage).
Phase canon — make the canonical sweep actually work (the real "nightly sweep") + verify it
Mission (operator-specified 2026-06-17): the "nightly sweep" was specified in theory but never
actually did anything. Confirmed live: nightly-sweep.timer is deployed and fires green
(nightly_sweep.py, last run 2026-06-17 03:09 UTC, exit 0), but only custom-html is WARM_CANONICAL-enrolled
and ZERO canonical.json records exist, i.e. the machinery has never actually promoted a
canonical end-to-end. This phase makes it real and proven, as the substitute for that hollow
nightly sweep, with the operator's refinements (2026-06-17):
- Sync each recipe mirror's main on git.autonomic.zone/recipe-maintainers/<recipe> to its upstream (git.coopcloud.tech/coop-cloud/<recipe>) first, so the sweep sees the true upstream tags and latest state.
- Trigger on a new RELEASED VERSION, not a new commit. Test a recipe only when its latest release tag on the synced main is newer than its current canonical version; skip when there is no new version, even if main has new untagged commits. The sweep tests releases, not arbitrary commits.
- Promote the canonical only to a TAGGED release. A canonical advances only to a version that has a real release tag (a published release), never to an arbitrary untagged commit.
…then run CI cold for each recipe that has a new release tag on main and actually promote the canonical for
any that pass, and prove the whole thing works. The deliverable is correctness, verified end-to-end, and the
operator specifically wants confidence that it plays nicely with the samever upgrade-base work (§2
"Plays-nice-with-samever"). Operator decisions (2026-06-17): all recipes are enrolled (§2.B), and the
cadence is weekly (change the existing daily timer to weekly, a one-line OnCalendar tune; the exact
day/time is not critical). This REPLACES the hollow nightly sweep; it is not a parallel job.
State files: STATUS-canon.md, BACKLOG-canon.md, REVIEW-canon.md, JOURNAL-canon.md. DECISIONS.md shared.
1. Verified starting state (2026-06-17)
- nightly-sweep.timer is enabled + active (next fire ~03:00 UTC); nightly_sweep.py runs and exits 0. The timer/service plumbing already works: reuse it, don't rebuild it.
- Only custom-html sets WARM_CANONICAL = True. The sweep iterates canonical.enrolled_recipes() → essentially one recipe → a near-no-op across the fleet.
- No canonical.json exists on the host → the promote path (should_promote_canonical → promote_canonical → write_registry) has never successfully produced a canonical, even for custom-html. This is the crux of "theory, not actually doing it."
- The sweep does not reconcile mirrors to upstream, and does not skip when unchanged.
2. The work
A. Prove + fix the promote path FIRST (the core). On custom-html (already enrolled), make a green
cold-on-latest run actually write canonical.json (recipe/version/commit/status) AND prove a
subsequent --quick warm-reattach uses it (deploy_canonical reattaches the retained volume). If it
doesn't happen today, find and fix why (this is the real defect behind the hollow sweep). A canonical
must demonstrably exist and be reusable before anything else is meaningful.
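As a minimal sketch of what "actually write canonical.json" could look like, assuming a flat JSON object with the four fields named above (recipe/version/commit/status) stored at a per-recipe path: the directory layout, the helper names and the "green" status value are hypothetical, not the real write_registry API. The temp-file-then-rename write is one reasonable way to make sure a crash never leaves a half-written canonical for a later warm reattach to trip over.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def canonical_path(root: Path, recipe: str) -> Path:
    # Hypothetical layout: <root>/<recipe>/canonical.json
    return root / recipe / "canonical.json"


def write_canonical_record(root: Path, recipe: str, version: str,
                           commit: str, status: str = "green") -> Path:
    """Atomically write one recipe's canonical record (illustrative helper)."""
    path = canonical_path(root, recipe)
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"recipe": recipe, "version": version,
              "commit": commit, "status": status}
    # Write a temp file in the same directory, then rename it over the target:
    # readers see either the previous record or the complete new one.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(record, fh, indent=2, sort_keys=True)
        os.replace(tmp, path)
    except BaseException:
        # Don't leave a stray temp file behind if the write or rename fails.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return path


def read_canonical_record(root: Path, recipe: str) -> Optional[dict]:
    """Return the record, or None if the recipe has never been promoted."""
    path = canonical_path(root, recipe)
    if not path.exists():
        return None
    with path.open() as fh:
        return json.load(fh)
```

A missing file reads back as None, which matches the state observed on the host today (no canonical for any recipe).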
- Promote-gate addition (operator 2026-06-17): only promote to a TAGGED release. Extend should_promote_canonical so a promote ALSO requires the tested version to correspond to a published release tag (warm_reconcile.recipe_tags): green + cold + latest + enrolled + tagged. The canonical must always be a real release, never an arbitrary untagged main commit. An untagged state must never be written as a canonical.
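One way to picture the extended gate, as a sketch: every condition is required, and "tagged" is simply membership of the tested version in the recipe's published release tags (for example as returned by warm_reconcile.recipe_tags, whose real signature is not assumed here). RunResult and should_promote_sketch are hypothetical stand-ins, not the real should_promote_canonical.

```python
from dataclasses import dataclass
from typing import Collection


@dataclass(frozen=True)
class RunResult:
    """Hypothetical summary of one CI run (illustration only)."""
    recipe: str
    version: str      # version label of the state that was deployed and tested
    green: bool       # every tier passed
    cold: bool        # fresh deploy, not a warm reattach
    on_latest: bool   # tested the latest state, not an older one


def should_promote_sketch(run: RunResult, enrolled: bool,
                          release_tags: Collection[str]) -> bool:
    """All five conditions are required; any single miss blocks the promote."""
    # "tagged": the tested version is a published release tag,
    # so an untagged main commit can never become the canonical.
    tagged = run.version in release_tags
    return all((run.green, run.cold, run.on_latest, enrolled, tagged))
```

Keeping "tagged" as its own explicit condition, rather than folding it into "latest", makes the untagged case easy to unit-test in isolation.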
B. Enroll ALL recipes (operator decision 2026-06-17). Set WARM_CANONICAL = True for every recipe
cc-ci tracks (the used-recipes.md set) — the sweep promotes a canonical for each that passes, not just
custom-html.
- Watch the warm-volume disk budget: ~21 recipes each retaining a data volume on the single node adds up to real disk usage. Verify headroom, lean on the existing WC8 disk hygiene / ci-docker-prune, and if disk becomes the binding limit, raise it rather than silently dropping recipes (a fallback if needed: decouple the cheap last-green version record, kept for all, from the expensive retained volume). Default remains all-enrolled.
- If a specific recipe genuinely cannot be enrolled (e.g. unbounded data, no stable health), record the exception + reason in DECISIONS; don't silently skip it.
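A rough way to "verify headroom" before enrolling everything, as a sketch: subtract the estimated size of every warm volume not yet on disk, plus a safety reserve, from the free space on the filesystem holding Docker's data. The 20 GiB reserve, the /var/lib/docker path (Docker's default data root, which may differ on this host) and the per-recipe estimates are assumptions; shutil.disk_usage is the standard-library call that reports free space.

```python
import shutil
from typing import Mapping

GiB = 1024 ** 3


def headroom_bytes(free_bytes: int, new_volume_estimates: Mapping[str, int],
                   reserve_bytes: int) -> int:
    """Free space left after the not-yet-retained warm volumes and a reserve.

    A negative result means disk would become the binding limit.
    """
    return free_bytes - sum(new_volume_estimates.values()) - reserve_bytes


def check_docker_disk(new_volume_estimates: Mapping[str, int],
                      reserve_bytes: int = 20 * GiB,
                      path: str = "/var/lib/docker") -> int:
    # shutil.disk_usage reports (total, used, free) for the filesystem holding `path`.
    return headroom_bytes(shutil.disk_usage(path).free,
                          new_volume_estimates, reserve_bytes)
```

Per-recipe estimates could come from measuring each retained volume after its first promote (docker system df -v lists local volume sizes); a negative headroom is the signal to raise the issue rather than quietly un-enroll recipes.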
C. Add the upstream mirror-sync step. Before the per-recipe CI, reconcile each mirror's main + tags
to coopcloud upstream — reuse recipe-upgrade's open-recipe-pr.sh <recipe> --reconcile-only (handles
go-git private-mirror auth, fetches coopcloud via an upstream remote, closes already-merged-upstream
PRs, leaves unrelated PRs). This is a faithful mirror sync, not a push of our own changes.
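The per-recipe sync step could be driven like this, as a sketch that only shells out to the existing helper. The script path is a placeholder, the command line is assumed to be exactly open-recipe-pr.sh <recipe> --reconcile-only as quoted above, and the runner is injectable so the loop can be unit-tested without touching any mirror.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List

# Placeholder location of recipe-upgrade's helper script (assumption).
OPEN_RECIPE_PR = Path("recipe-upgrade/open-recipe-pr.sh")


def sync_mirrors(recipes: Iterable[str],
                 run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
                 ) -> List[str]:
    """Reconcile each mirror's main + tags to upstream; return recipes whose sync failed."""
    failed = []
    for recipe in recipes:
        # --reconcile-only: mirror upstream faithfully; no PR of our own, no push of our changes.
        proc = run([str(OPEN_RECIPE_PR), recipe, "--reconcile-only"],
                   capture_output=True, text=True)
        if proc.returncode != 0:
            failed.append(recipe)
    return failed
```

One reasonable design choice (an assumption, not stated in the plan) is to leave a recipe whose sync failed out of that run's CI, since testing against a stale mirror could miss or misjudge a new release tag.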
D. Trigger on a new RELEASED VERSION (skip when no new version). After sync, compute the recipe's
latest release tag version reachable on main and compare it to the canonical record's version:
- latest release tag == canonical version → skip (SKIP no-new-version), even if main has new untagged commits. The sweep tests releases, not arbitrary commits.
- latest release tag newer than canonical → run CI cold on that tagged version → promote on green (tagged, per §2.A).
- no release tag at all (recipe never released) → skip with a recorded reason.
This is the operator's trigger refinement (version/tag-keyed, not commit-keyed) and the determinism property (M2 run-twice → everything skips).
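The trigger rule can be sketched as a pure function of the release tags and the canonical version. Commits are deliberately not an input, so new untagged commits on main cannot cause a run. The sketch assumes Co-op Cloud style tags such as 3.0.1+v2.0.0, ordered by the X.Y.Z before the +, with the + suffix treated like SemVer build metadata that does not affect ordering; it also assumes a recipe with no canonical record yet should run. sweep_decision and version_key are hypothetical names, and "SKIP no-release-tag" is an illustrative label.

```python
from typing import Iterable, Optional, Tuple


def version_key(tag: str) -> Tuple[int, ...]:
    """Order tags like '3.0.1+v2.0.0' by the recipe part before '+'."""
    core = tag.split("+", 1)[0].lstrip("v")
    return tuple(int(part) for part in core.split("."))


def sweep_decision(release_tags: Iterable[str],
                   canonical_version: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (action, version) for one recipe after the mirror sync."""
    tags = list(release_tags)
    if not tags:
        return ("SKIP no-release-tag", None)      # never released: record the reason
    latest = max(tags, key=version_key)
    # Assumption: no canonical yet means every release counts as new.
    if canonical_version is None or version_key(latest) > version_key(canonical_version):
        return ("RUN", latest)                    # cold-test exactly this tagged version
    return ("SKIP no-new-version", latest)
```

Tags that are not plain X.Y.Z before the + (pre-releases, for instance) would make version_key raise, which is arguably safer than silently misordering them.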
E. Keep it deterministic + AI-free at runtime (it already is — a script + timer). The additions must stay pure code: no AI calls during the run. AI (the loops) only authors + verifies.
F. Make the timer weekly (operator preference): change the existing daily OnCalendar to weekly. The
exact day/time is not critical; pick a low-traffic slot, it's a one-line tune. Set Persistent = true so
a missed run is caught up. This is the only schedule work; do not over-invest in it.
G. Retire UPGRADE_BASE_VERSION if plausible no longer needs it (operator 2026-06-17). Today it is
still used: plausible sets UPGRADE_BASE_VERSION = "3.0.1+v2.0.0" (the old static [-2] default
picked 3.0.0, whose clickhouse entrypoint 404s on amd64 → base never converges; the pin forces the
newest published 3.0.1); bluesky-pds only references it (in an EXPECTED_NA upgrade-skip note as
a future re-enable path). Once this phase enrolls plausible and promotes its canonical to its latest green
release (3.0.1), the dynamic base resolves to 3.0.1 on its own — the correct base, avoiding the
broken 3.0.0 — so the explicit pin becomes redundant. Therefore:
- With plausible's canonical established at 3.0.1, remove the pin from tests/plausible/recipe_meta.py and confirm its upgrade tier still resolves the correct base (3.0.1) and passes under the dynamic resolver.
- If that holds, strip UPGRADE_BASE_VERSION entirely: the meta key (runner/harness/meta.py KEYS), the override branch in resolve_upgrade_base (run_recipe_ci.py), the docs (recipe-customization.md §4/§5, testing.md), and the unit tests (test_meta.py, test_upgrade_base.py); and update bluesky-pds's comment so its re-enable path is the dynamic base, not the removed key.
- GATE (do not force it): if plausible genuinely still can't get the right base dynamically (e.g. 3.0.1 itself won't cold-deploy green, so there is no canonical), KEEP UPGRADE_BASE_VERSION as the escape hatch and record why in DECISIONS; never drop a recipe's upgrade coverage to delete a key.
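Why the pin becomes redundant can be sketched as a precedence order, assuming the dynamic resolver prefers the canonical record when one exists and otherwise falls back to the old static [-2] choice. resolve_upgrade_base_sketch is hypothetical, not the real resolve_upgrade_base in run_recipe_ci.py: with ascending tags ending 3.0.0, 3.0.1 and no canonical it returns 3.0.0 (the broken base), and once the canonical is 3.0.1 it returns 3.0.1 with or without the pin.

```python
from typing import List, Optional


def resolve_upgrade_base_sketch(tags_ascending: List[str],
                                canonical_version: Optional[str],
                                pinned: Optional[str] = None) -> Optional[str]:
    """Pick the version the upgrade tier deploys first and then upgrades from."""
    if pinned is not None:
        # The UPGRADE_BASE_VERSION escape hatch this phase aims to retire.
        return pinned
    if canonical_version is not None:
        # Dynamic base: the last promoted, known-good release.
        return canonical_version
    # Assumed fallback: the old static default, the second-newest tag.
    return tags_ascending[-2] if len(tags_ascending) >= 2 else None
```

Under this model, deleting the pin is safe exactly when the canonical branch is reached, which is what the GATE bullet above checks for.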
Plays-nice-with-samever (operator wants this CONFIRMED). The release-tag trigger (D) makes the
sweep and samever orthogonal — confirm they don't interfere:
- In the sweep, a recipe runs only when a new release tag exists, so the version under test is always newer than the canonical → the upgrade tier's base (the previous canonical/released version) is strictly older → samever's same-version step-back never fires in the sweep (the tag trigger already prevents a vX→vX run; no-new-version recipes are skipped outright).
- samever remains the guard for the PR path (!testme), where a PR can carry the same version label as the canonical without cutting a release. That is where the step-back matters, and it is owned/proven by the samever phase, not here.
- So in the sweep, verify only: (a) no new release tag → recipe SKIPPED (no upgrade-tier run, no promote); (b) new release tag → canonical (older) → new tagged version, a real delta, promote (tagged). The sweep must never promote an untagged version and never run a same-version upgrade.
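As a quick sanity check of that argument, the sketch below enumerates (canonical, latest tag) pairs and confirms that whenever the tag trigger lets a recipe run, base and target differ, so a same-version step-back has nothing to act on. samever_step_back_fires is a hypothetical one-line model of samever's trigger condition, not its real code, and tags are ordered by their X.Y.Z part only (a simplifying assumption); the real proof remains the M2 evidence.

```python
from itertools import product
from typing import Iterable, Tuple


def version_key(tag: str) -> Tuple[int, ...]:
    # Simplifying assumption: order tags by the X.Y.Z before any '+' suffix.
    return tuple(int(p) for p in tag.split("+", 1)[0].split("."))


def sweep_runs(latest_tag: str, canonical: str) -> bool:
    # Tag trigger: run only when the latest release tag is strictly newer.
    return version_key(latest_tag) > version_key(canonical)


def samever_step_back_fires(base: str, target: str) -> bool:
    # Hypothetical model of samever: it only steps back on a same-version upgrade.
    return version_key(base) == version_key(target)


def check_orthogonal(versions: Iterable[str]) -> int:
    """Count (canonical, latest) pairs that would run; assert none trigger samever."""
    vs = list(versions)
    runs = 0
    for canonical, latest in product(vs, repeat=2):
        if sweep_runs(latest, canonical):
            runs += 1
            # The upgrade tier goes canonical (older) -> latest tag (newer).
            assert not samever_step_back_fires(canonical, latest)
    return runs
```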
3. Gates
M1 — machinery works locally, each piece proven. (A) a real canonical.json is produced by a green
cold run on ≥1 recipe and reused by a warm reattach — demonstrated, not assumed — and the promote gate
now also requires a release tag (untagged → no promote). (C) mirror-sync and (D) the new-release-tag
trigger implemented, reusing the existing reconcile + sweep code, with unit tests (trigger = latest
release tag vs canonical version, NOT commit; sync invoked per recipe; promote gated on
green+cold+latest+enrolled+tagged). (B) all recipes enrolled. Adversary cold-verifies: a canonical
actually exists + reattaches; an untagged state never promotes; the trigger skips no-new-tag recipes
and runs new-tag ones; sync is faithful-mirror-only; a RED recipe does NOT promote; no AI at runtime.
M2 — proven end-to-end in real CI (the heart of this phase). A full sweep run across the enrolled set on cc-ci: mirrors synced to upstream, canonicals actually promoted for the green recipes (records exist with correct version+commit), red recipes left intact, unchanged recipes skipped — with a per-recipe results log. Determinism proof: run the sweep a SECOND time immediately → it SKIPS every recipe (latest release tag == canonical version for all → skip all) = a clean no-op, no CI rerun. Confirm the deployed timer fires the real (non-hollow) job — after a fire, canonicals have advanced (evidence), not exit-0 on an empty set. Tagged-promote proven: show a green run on an untagged state does NOT promote, and a green run on a tagged release DOES.
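The run-twice determinism proof can be modelled in a few lines: a simulated sweep that skips when the latest tag is not newer than the canonical and promotes on green. Everything here (sweep_once, the dict inputs, the simplified ordering by X.Y.Z) is a hypothetical model, not the real nightly_sweep.py.

```python
from typing import Dict, List, Optional, Tuple


def version_key(tag: str) -> Tuple[int, ...]:
    """Order 'X.Y.Z+meta' tags by X.Y.Z (simplifying assumption)."""
    return tuple(int(p) for p in tag.split("+", 1)[0].split("."))


def sweep_once(latest_tags: Dict[str, str],
               canonicals: Dict[str, Optional[str]],
               ci_is_green: Dict[str, bool]) -> List[str]:
    """One simulated sweep; updates `canonicals` in place, returns recipes that ran CI."""
    ran = []
    for recipe in sorted(latest_tags):            # serial, deterministic order
        latest = latest_tags[recipe]
        current = canonicals.get(recipe)
        if current is not None and version_key(latest) <= version_key(current):
            continue                              # SKIP no-new-version
        ran.append(recipe)
        if ci_is_green[recipe]:
            canonicals[recipe] = latest           # promote to the tagged release
    return ran
```

In this model a second pass is a no-op for every recipe that promoted. A red recipe keeps its older canonical, so its new tag still looks new and it would be retried on the second pass; which behaviour the real sweep has for red recipes is worth pinning down in the M2 evidence.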
samever orthogonality proven (operator-required). Demonstrate, with evidence, the two sweep paths:
(1) no new release tag (latest tag == canonical version, even with new untagged commits on main) →
recipe SKIPPED — no upgrade-tier run, no promote; (2) new release tag → cold-test the new tagged
version, upgrade canonical(older) → new, a real delta, promote (because tagged). Confirm samever's
step-back never fires inside the sweep (the tag trigger prevents same-version runs) — its same-version
behavior is owned/proven by the samever phase on the PR path. Construct scenarios if the live recipe set
doesn't cover both.
No AI in the loop. Fresh Adversary PASS on both milestones → ## DONE.
4. Guardrails
- Correctness over cadence. The bar is that the machinery demonstrably promotes canonicals, syncs mirrors, skips unchanged recipes, and plays nicely with samever. The cadence is decided (weekly): set it in one OnCalendar line and move on; don't agonize over the exact slot.
- No AI at runtime: pure script + systemd timer; AI only builds/verifies.
- Single-node safety: serial; skip the whole run if a Drone/test build is in flight (reuse the existing nightly guard); tear down every deploy; bound total runtime; mind the warm-volume disk budget.
- Never force-promote / never weaken: promote only on green-cold-latest-enrolled-tagged; a red recipe keeps its prior known-good canonical. Never weaken a test to make a recipe promote.
- Faithful mirror sync only: force-sync main/tags to coopcloud upstream; never push our own changes to the mirror's main; never merge or disturb unrelated PRs.
- Nix/host changes (enrollment is recipe-meta; any timer/module tweak is a nixos-rebuild): loops may deploy if clean and verify host health afterwards; otherwise file it for the orchestrator.
- Commit author autonomic-bot <autonomic-bot@noreply.git.autonomic.zone>; push every commit; run abra over a pseudo-TTY.
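The single-node safety guardrail can be sketched as a serial loop with a whole-run guard and a runtime budget. build_in_flight and run_one are hypothetical hooks: the first stands in for the existing nightly Drone/test-build guard, the second for one recipe's deploy, test, promote and teardown. The budget value is left to the operator, and the clock is injectable only so the loop can be tested.

```python
import time
from typing import Callable, Iterable, List, Tuple


def run_serial_sweep(recipes: Iterable[str],
                     build_in_flight: Callable[[], bool],
                     run_one: Callable[[str], None],
                     budget_seconds: float,
                     clock: Callable[[], float] = time.monotonic,
                     ) -> Tuple[List[str], List[str]]:
    """Run recipes one at a time; return (left unprocessed, crashed)."""
    pending = list(recipes)
    failed: List[str] = []
    if build_in_flight():
        return pending, failed              # skip the WHOLE run, touch nothing
    deadline = clock() + budget_seconds
    while pending:
        if clock() >= deadline:
            break                           # bound total runtime; the rest waits for the next run
        recipe = pending.pop(0)             # strictly serial on the single node
        try:
            run_one(recipe)                 # expected to tear down its own deploy
        except Exception:
            failed.append(recipe)           # one crashing recipe must not stop the others
    return pending, failed
```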
5. Definition of Done
The canonical sweep actually works and is proven: a green cold run on a tagged release produces a
real, reusable canonical.json (an untagged state never promotes); the sweep reconciles each recipe
mirror's main to upstream, skips recipes with no new release tag (even if main has new untagged
commits), runs CI cold on the new tagged version for the rest, and promotes the canonical only to that
tagged release — across all enrolled recipes, demonstrated end-to-end in CI, including the run-twice no-op
determinism proof, the tagged-promote proof, and a real (non-hollow) timer fire. samever confirmed
orthogonal (never fires in the sweep). All recipes enrolled + warm-volume budget recorded. UPGRADE_BASE_VERSION
retired (key + resolver branch + docs + tests removed, plausible migrated to the dynamic base) if
plausible works without it — else kept with a recorded reason (§2.G). The runtime job is AI-free; it is the
substitute for the hollow nightly sweep (not a parallel job). M1 + M2 fresh Adversary PASSes in REVIEW-canon.md.