From 23c02c59b6bd52b1d2fac0f29e7c98aed82f8126 Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Wed, 17 Jun 2026 06:28:35 +0000
Subject: [PATCH] =?UTF-8?q?status(canon):=20bootstrap=20phase=20canon=20?=
 =?UTF-8?q?=E2=80=94=20state=20files,=20hollow-sweep=20root=20cause,=20M1/?=
 =?UTF-8?q?M2=20backlog?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Opus 4.8
---
 machine-docs/BACKLOG-canon.md | 54 ++++++++++++++++++++
 machine-docs/JOURNAL-canon.md | 94 +++++++++++++++++++++++++++++++++++
 machine-docs/STATUS-canon.md  | 29 +++++++++++
 3 files changed, 177 insertions(+)
 create mode 100644 machine-docs/BACKLOG-canon.md
 create mode 100644 machine-docs/JOURNAL-canon.md
 create mode 100644 machine-docs/STATUS-canon.md

diff --git a/machine-docs/BACKLOG-canon.md b/machine-docs/BACKLOG-canon.md
new file mode 100644
index 0000000..1ae3866
--- /dev/null
+++ b/machine-docs/BACKLOG-canon.md
@@ -0,0 +1,54 @@
+# BACKLOG — phase `canon`
+
+## Build backlog (Builder-owned)
+
+Milestone map → Definition of Done (§5). M1 = machinery + unit tests (Adversary cold-verifies the
+pieces). M2 = proven end-to-end in real CI.
+
+### M1 — machinery works locally, each piece proven
+
+- [ ] **M1.1 Tagged-promote gate (§2.A).** Extend `should_promote_canonical` to ALSO require the
+  tested head version corresponds to a published release tag. Add a `tagged: bool` param computed
+  at the call site (`head_version in recipe_tags(recipe)`); keep the function pure. Untagged head
+  → no promote. Unit tests: enrolled+green+cold+not-ref+tagged → True; each missing condition
+  (incl. untagged) → False.
+- [ ] **M1.2 Release-tag trigger + mirror-sync in the sweep (§2.C/§2.D).** New pure helper
+  `sweep_decision(recipe, latest_tag, canon_version)` → `run` | `skip:no-new-version` |
+  `skip:never-released`, keyed on `version_key` (NOT commit). Wire `nightly_sweep.sweep()` to, per
+  enrolled recipe: (1) faithful mirror-sync main+tags to upstream (reuse open-recipe-pr.sh
+  `--reconcile-only`, vendored into the repo for reproducibility); (2) compute latest release tag
+  vs canonical; (3) skip or run cold ON THE TAG (checkout tag + `CCCI_SKIP_FETCH=1`). Unit tests
+  for `sweep_decision` (new tag → run; equal → skip; older/no tag → skip).
+- [ ] **M1.3 Enroll all recipes (§2.B).** Set `WARM_CANONICAL = True` in each of the 21 used-recipes
+  `tests//recipe_meta.py`. Leave fixtures (custom-html-*-bad, concurrency, regression) alone.
+- [ ] **M1.4 Hollow-sweep fix (root cause).** Make the deployed sweep read the REAL tests/ + run
+  current code: set `CCCI_REPO=/etc/cc-ci` in the sweep service and run `nightly_sweep.py` from
+  the checkout (not the store copy). Deploy procedure pulls `/etc/cc-ci` before nixos-rebuild.
+- [ ] **M1.5 Weekly timer (§2.F).** `nightly-sweep.nix` `OnCalendar` daily → weekly (one line),
+  `Persistent=true` (already set). Low-traffic slot.
+
+### M2 — proven end-to-end in real CI
+
+- [ ] **M2.1 Deploy** the M1 changes: `git -C /etc/cc-ci pull` + `nixos-rebuild switch`; verify host
+  health after.
+- [ ] **M2.2 Full sweep run** across the enrolled set on cc-ci: mirrors synced, canonicals promoted
+  for green recipes (records with correct version+commit), red recipes left intact, no-new-tag
+  recipes skipped. Per-recipe results log captured.
+- [ ] **M2.3 Determinism proof:** run the sweep a SECOND time immediately → every recipe SKIPS
+  (latest tag == canonical for all) = clean no-op, no CI rerun.
+- [ ] **M2.4 Tagged-promote proof:** a green run on an UNTAGGED state does NOT promote; a green run
+  on a TAGGED release DOES. Construct if the live set doesn't cover it.
+- [ ] **M2.5 Real (non-hollow) timer fire:** after a timer fire, canonicals have ADVANCED (evidence),
+  not exit-0 on an empty set.
+- [ ] **M2.6 samever orthogonality:** (a) no new tag (even with untagged commits on main) → SKIP, no
+  upgrade-tier run, no promote; (b) new tag → cold-test new tag, canonical(older)→new, promote.
+  Show step-back never fires inside the sweep.
+- [ ] **M2.7 Disk budget recorded;** all recipes enrolled (or documented exception in DECISIONS).
+- [ ] **M2.8 §2.G UPGRADE_BASE_VERSION retirement** — after plausible's canonical lands at 3.0.1:
+  remove the pin, confirm dynamic base resolves 3.0.1 + passes; if it holds, strip the key
+  (meta KEYS, resolver branch, docs, unit tests) + update bluesky-pds comment. Else KEEP with a
+  recorded reason in DECISIONS.
+
+## Notes
+- Order within M1: M1.1 → M1.2 (depend on version helpers) → M1.3/M1.4/M1.5 (config). Claim M1 only
+  when all unit tests green + tree clean + pushed.
diff --git a/machine-docs/JOURNAL-canon.md b/machine-docs/JOURNAL-canon.md
new file mode 100644
index 0000000..f496026
--- /dev/null
+++ b/machine-docs/JOURNAL-canon.md
@@ -0,0 +1,94 @@
+# JOURNAL — phase `canon` (canonical sweep, make it real)
+
+Builder reasoning log. WHY lives here; WHAT/HOW/EXPECTED/WHERE live in STATUS-canon.md.
+
+## 2026-06-17 — bootstrap / code survey
+
+Read the phase canon (`plan-phase-canon-canonical-sweep.md`) + plan.md §6.1/§7/§9. Surveyed the
+existing canonical/sweep machinery before designing. Key findings:
+
+### Clone identity
+`/srv/cc-ci` is a symlink → `/srv/cc-ci-orch`; the env's two "working dirs" are the same directory.
+This IS the Builder clone (reflog shows the `claim(M2)`/`status(samever) ## DONE` commits). The
+Adversary cold-verifies from its own fresh clones. No collision.
+
+### What already works (phase doc is partly stale)
+- The phase doc says "ZERO canonical.json exist". **Not true any more**: a real canonical for
+  `custom-html` exists on the host at `/var/lib/ci-warm/custom-html/canonical.json`
+  (`version 1.13.0+1.31.1`, commit `2b82eba…`, status idle, ts `20260617T050314Z`) with its retained
+  data volume `warm-custom-html_..._content`. It was produced by a **manual** cold run during the
+  `samever` phase, NOT by the timer. So the *promote primitive* (seed_canonical → write_registry +
+  warmsnap) demonstrably works; the **sweep that should drive it is what's hollow.**
+
+### The real "hollow sweep" defect (root cause, confirmed live)
+The deployed `nightly-sweep.timer` fired 2026-06-17 03:09 and logged:
+`===== nightly cold sweep: enrolled canonicals = [] =====` → a true no-op.
+Cause: `nightly_sweep.py` does `REPO = os.environ.get("CCCI_REPO", "/root/cc-ci")` then
+`sys.path.insert(0, REPO/runner); from harness import canonical`. The systemd unit
+(`nix/modules/nightly-sweep.nix`) sets **no `CCCI_REPO`**, and `/root/cc-ci` **does not exist** on the
+host. So the import falls through to the harness packaged in the **nix store** (`runnerSrc=../../runner`
+— runner/ only, NO tests/). `meta.TESTS_DIR = ROOT/tests` then points at a nonexistent dir →
+`enrolled_recipes()` swallows the OSError → `[]`. Even though `custom-html` is enrolled in the repo,
+the deployed timer never sees it. **This is the machinery that was "specified but never doing
+anything."** Fix: point the sweep at a real, current checkout that has `tests/`.
+
+### How current code stays live on the host
+- Normal recipe CI: Drone `exec` pipeline auto-clones cc-ci per build into its workspace, then runs
+  `cc-ci-run runner/run_recipe_ci.py` from that fresh clone → tests/runner always current.
+- `/etc/cc-ci` is a **git clone** (the nixos flake source: `nixos-rebuild --flake /etc/cc-ci#…`).
+  It is currently STALE (`e60415d`, far behind main) because recent phases only touched `runner/`
+  (picked up by Drone's fresh clone) and needed no nixos-rebuild. The sweep is the first thing that
+  needs `/etc/cc-ci` current.
+- Plan: sweep service sets `CCCI_REPO=/etc/cc-ci` and runs `nightly_sweep.py` FROM the checkout
+  (change the nix to exec `$CCCI_REPO/runner/nightly_sweep.py`, not the store copy) → after a deploy
+  that does `git -C /etc/cc-ci pull && nixos-rebuild`, the sweep reads current tests/ + runner. This
+  reuses the flake-source checkout (declarative, reproducible) rather than inventing a new clone.
+
+### Promote path (the core, §2.A)
+- `should_promote_canonical(recipe, ref, overall, quick)` = enrolled & green & cold(not quick) &
+  not-ref (no PR head). `promote_canonical` deploys `latest_version(recipe_tags(recipe))` (the latest
+  git tag) fresh/in-place, waits healthy, undeploys, `seed_canonical` (snapshot + write_registry).
+- **Tagged-promote addition needed:** the green gate currently tests *whatever fetch_recipe checked
+  out* (catalogue `main` HEAD for a cold run), which can be untagged-ahead of the latest tag, while
+  promote always writes the latest TAG. Per operator: a canonical must only ever be a real release.
+  Add a `tagged` requirement: the tested head version (`abra.head_compose_version`, the compose
+  `version` label) must equal a published release tag (`recipe_tags`). When main HEAD == latest
+  release (the common just-cut case) head_version == latest tag → promote; when main is untagged-ahead
+  → no promote.
+
+### Trigger on a NEW RELEASE TAG (§2.D) + test the tag (not main)
+- Version ordering is centralized in `warm_reconcile.version_key` / `latest_version` /
+  `newest_older_version` (already used by samever step-back). Reuse them.
+- Trigger (pure, in the sweep, per recipe): after mirror-sync, `latest = latest_version(recipe_tags)`;
+  `canon = read_registry(recipe).version`. No tag → SKIP (never released). `latest <= canon` (by
+  version_key) → SKIP no-new-version (even if main has untagged commits — we compare tags not
+  commits). `latest > canon` → run cold on the tag.
+- **Test the TAG cold:** to honour "run CI cold on that tagged version" (and so a green gate proves
+  the exact thing that gets promoted), check out the latest tag in `~/.abra/recipes/` and run
+  with `CCCI_SKIP_FETCH=1` (the existing staging mechanism) → head_version = tag, head_ref = tag
+  commit, REF empty (so `not ref` still holds → promote allowed). The upgrade-base resolver then sees
+  canonical(older) < head(new tag) → real delta (samever step-back never fires: tag>canon by
+  construction).
+
+### samever orthogonality (operator-required)
+The release-tag trigger guarantees, in the sweep, version-under-test > canonical, so the upgrade
+base is strictly older → `samever`'s same-version step-back never fires. (a) no new tag → SKIP, no
+upgrade-tier run; (b) new tag → canonical(older)→new, real delta, promote. samever's same-version
+behaviour stays owned by the samever phase on the PR path. Will demonstrate both in M2.
+
+### Enroll-all set (§2.B)
+Authoritative inventory = `cc-ci-plan/used-recipes.md` (21 rows: 20 `weekly` + `uptime-kuma`
+`external`). NOT the test fixtures (custom-html-bkp-bad / -rst-bad, concurrency, regression,
+_generic). custom-html-tiny IS in used-recipes (weekly) → enroll it too.
+
+### Disk budget (§2.B watch-item)
+Host `/`: 150G total, 104G used, **40G free (73%)**. `du` of /var/lib/ci-warm today: custom-html 32K,
+keycloak 159M. Retaining ~21 fresh-install data volumes should be a few GB; immich/matrix/mailu are
+the ones to watch. Will measure during the M2 full sweep and record the real budget; raise the VM
+disk (orchestrator) rather than silently drop recipes if it binds.
+
+### §2.G UPGRADE_BASE_VERSION retirement — gated on M2
+`plausible` pins `UPGRADE_BASE_VERSION="3.0.1+v2.0.0"`; `bluesky-pds` only references it in a comment.
+Retirement requires plausible's canonical to actually land at its latest green release so the dynamic
+resolver picks the right base — so this is sequenced AFTER M2 promotes plausible. Keep the pin if
+plausible can't go green dynamically (record why).
diff --git a/machine-docs/STATUS-canon.md b/machine-docs/STATUS-canon.md
new file mode 100644
index 0000000..894d022
--- /dev/null
+++ b/machine-docs/STATUS-canon.md
@@ -0,0 +1,29 @@
+# STATUS — phase `canon` (canonical sweep, make it real)
+
+Gate: M1 IN PROGRESS (not yet claimed).
+
+WHAT/HOW/EXPECTED/WHERE for the Adversary. Reasoning lives in JOURNAL-canon.md.
+
+## Phase summary
+Make the canonical sweep actually promote canonicals end-to-end (it is currently hollow), add the
+mirror-sync + new-release-tag trigger + tagged-promote gate, enroll all recipes, make the timer
+weekly, and prove it in real CI. DoD = §5 of `cc-ci-plan/plan-phase-canon-canonical-sweep.md`.
+
+## Verified starting state (2026-06-17, Builder cold-checked)
+- HOLLOW-SWEEP ROOT CAUSE (confirmed): the deployed `nightly-sweep.timer` fired 03:09 UTC and logged
+  `enrolled canonicals = []`. Cause: the unit sets no `CCCI_REPO`; default `/root/cc-ci` does not
+  exist; the import falls back to the nix-store harness whose `TESTS_DIR` has no `tests/` →
+  `enrolled_recipes()=[]`. Verify: `ssh cc-ci 'journalctl -u nightly-sweep.service | grep "enrolled canonicals"'`.
+- A real canonical DOES exist (made by a manual run, not the timer): `ssh cc-ci 'cat
+  /var/lib/ci-warm/custom-html/canonical.json'` → version `1.13.0+1.31.1`, status idle, retained
+  volume present (`docker volume ls | grep warm-custom-html`).
+- Enroll set (authoritative) = `cc-ci-plan/used-recipes.md` (21 recipes). Only `custom-html` is
+  currently enrolled: `grep -rl 'WARM_CANONICAL = True' tests/*/recipe_meta.py`.
+- Timer is daily: `nix/modules/nightly-sweep.nix` `OnCalendar = "*-*-* 03:00:00"`.
+- Disk `/`: 40G free / 73% used (`ssh cc-ci df -h /`).
+
+## Claims awaiting verification
+(none yet — M1 work in progress)
+
+## Blocked
+(none)