cc-ci/machine-docs/STATUS-canon.md

STATUS — phase canon (canonical sweep, make it real)

Gate: M1 PASS (Adversary 3bdd5d1, no VETO). M2 CLAIMED, awaiting Adversary.

This file gives the Adversary WHAT/HOW/EXPECTED/WHERE; reasoning lives in JOURNAL-canon.md.

Phase summary

Make the canonical sweep actually promote canonicals end-to-end (it was hollow), add the mirror-sync + new-release-tag trigger + tagged-promote gate, enroll all recipes, make the timer weekly, prove it in real CI. DoD = §5 of cc-ci-plan/plan-phase-canon-canonical-sweep.md.

M1 — PASS (Adversary 3bdd5d1, no VETO)

Machinery proven locally: tagged-promote gate (untagged never promotes), release-tag trigger (version-keyed not commit-keyed), faithful mirror-sync, all-recipes-enrolled, hollow-sweep root-cause fix, weekly timer — all cold-verified. See REVIEW-canon.md M1 verdict (07:12Z).


M2 CLAIM — proven end-to-end in real CI

All M2 code is committed and pushed, and deployed to /etc/cc-ci (HEAD contains every fix). Enrolled = 20 (keycloak de-enrolled as a documented exception, see DECISIONS). The authoritative end-to-end evidence is a REAL timer fire under a Drone-parity env plus an immediate determinism re-sweep. NO AI at runtime (pure script + systemd timer).

M2.1 deploy — DONE

git -C /etc/cc-ci pull + nixos-rebuild switch --flake 'git+file:///etc/cc-ci?submodules=1#cc-ci-hetzner' (Result=success). Deployed runner = the checkout WITH tests/ (hollow-sweep fixed).

  • HOW: ssh cc-ci 'systemctl cat nightly-sweep.timer | grep OnCalendar' → Sun *-*-* 03:00:00 (weekly, §2.F); ssh cc-ci 'git -C /etc/cc-ci rev-parse HEAD' contains the parity fix 2c61f2f (merge-base --is-ancestor ✓).
  • HOST PARITY (DEFECT-3 fix): the nightly-sweep ExecStart prepends the host system PATH /run/current-system/sw/bin:/run/wrappers/bin, which byte-matches drone-runner-exec.service Environment="PATH=…". ssh cc-ci 'ls /run/current-system/sw/bin/git-lfs' shows git-lfs present, so the timer sweep validates recipes in the SAME env as Drone CI.
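For readers checking the schedule: `Sun *-*-* 03:00:00` fires once a week, on Sunday at 03:00. The sketch below is not systemd's calendar logic (on the host, `systemd-analyze calendar 'Sun *-*-* 03:00:00'` gives the authoritative next elapse). It is a hypothetical helper, `next_weekly_fire`, that assumes naive local time and ignores timer options such as `Persistent=` and `RandomizedDelaySec=`.

```python
from datetime import datetime, timedelta


def next_weekly_fire(now: datetime, weekday: int = 6, hour: int = 3) -> datetime:
    """Hypothetical helper: next `weekday` (Mon=0 .. Sun=6) at hour:00:00 strictly after `now`.

    Approximates OnCalendar=Sun *-*-* 03:00:00 using naive local time only.
    """
    # Move to the target time of day on the current date...
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    # ...then forward to the target weekday (0..6 days ahead).
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    # If that instant is not in the future, the next fire is one week later.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    # 2026-06-17 (the sweep date in this file) is a Wednesday.
    print(next_weekly_fire(datetime(2026, 6, 17, 13, 1, 1)))  # 2026-06-21 03:00:00
```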

M2.2 + M2.5 — REAL (non-hollow) TIMER FIRE, full sweep, under production/Drone-parity env — DONE

The deployed nightly-sweep.service was fired as a real systemd unit (not a manual script run): active 13:01:01Z → completed 14:37:22Z, Result=success, ExecMainStatus=0, a single serial run (no second sweep or run_recipe_ci process). This is BOTH the authoritative M2.2 full-sweep evidence AND the M2.5 real-fire proof: the prior hollow timer logged enrolled canonicals = [], whereas this one ran the 20-recipe job and ADVANCED a canonical.

  • HOW: ssh cc-ci 'journalctl -u nightly-sweep.service --since "2026-06-17 13:00" | grep -E "sweep: |FULL SWEEP done|Result="'.
  • EXPECTED/OBSERVED per-recipe summary:
    • custom-html: PASS (promoted 1.13.0+1.31.1) — a REAL non-hollow timer ADVANCE 1.11.0 → 1.13.0 in production env (also the M2.6 constructed older→new advance, see below).
    • 14 SKIP no-new-version: cryptpad, custom-html-tiny, drone, ghost, hedgedoc, immich, lasuite-{docs,drive,meet}, mailu, matrix-synapse, n8n, plausible, uptime-kuma.
    • 5 documented exceptions (all in DECISIONS, none silent; the sixth, keycloak, is de-enrolled and not swept): gitea (GREEN-BUT-PROMOTE-FAILED: cold green via lfs PASS, but the warm advance hits the app.ini exception, so 3.5.3 is kept); bluesky-pds (GREEN-BUT-PROMOTE-FAILED, warm-routing); discourse/mattermost-lts/mumble (genuine reds, canonical unchanged).
  • CANONICALS ON DISK (16 promoted): ssh cc-ci 'for f in /var/lib/ci-warm/*/canonical.json; do echo "$f:"; cat "$f"; echo; done' → cryptpad, custom-html(1.13.0+1.31.1), custom-html-tiny, drone(1.9.0+2.26.0), ghost, gitea(3.5.3+1.24.2-rootless, known-good kept), hedgedoc, immich, lasuite-{docs,drive,meet}, mailu, matrix-synapse, n8n, plausible, uptime-kuma. Each commit == the tested release tag's commit (re-derive: git -C ~/.abra/recipes/<r> rev-list -n1 <version> == canonical.json commit).
  • REDS LEFT INTACT: discourse/mattermost-lts/mumble have NO canonical (or unchanged) — never force-promoted. bluesky-pds no canonical (warm-routing). gitea kept 3.5.3 (advance failed safely).
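The per-canonical re-derivation above (canonical.json commit == git rev-list -n1 <version>) can be scripted. This is a minimal sketch under several assumptions. It assumes canonical.json stores the promoted tag under a "version" key and the full commit SHA under "commit"; those key names are guesses. It also assumes each /var/lib/ci-warm/<r>/ directory is named after its recipe in ~/.abra/recipes/<r>. The helper names are hypothetical.

```python
import json
import subprocess
from pathlib import Path
from typing import Callable


def git_tag_commit(recipe_dir: Path, tag: str) -> str:
    """Commit a release tag resolves to, via `git -C <dir> rev-list -n1 <tag>`."""
    out = subprocess.run(
        ["git", "-C", str(recipe_dir), "rev-list", "-n1", tag],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def canonical_matches_tag(canonical: dict, resolve: Callable[[str], str]) -> bool:
    """True iff the recorded commit equals the commit its release tag resolves to.

    ASSUMPTION: `canonical` has "version" (release tag) and "commit" (full SHA) keys.
    """
    return resolve(canonical["version"]) == canonical["commit"]


if __name__ == "__main__":
    warm = Path("/var/lib/ci-warm")
    recipes = Path.home() / ".abra" / "recipes"
    if warm.is_dir():  # only meaningful on the CI host
        for f in sorted(warm.glob("*/canonical.json")):
            repo = recipes / f.parent.name  # ASSUMPTION: warm dir name == recipe name
            data = json.loads(f.read_text())
            ok = canonical_matches_tag(data, lambda tag: git_tag_commit(repo, tag))
            print(f"{f.parent.name}: {'OK' if ok else 'MISMATCH'}")
```

Injecting `resolve` keeps the comparison testable without a git checkout.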

M2.3 — DETERMINISM run-twice → promoted-at-latest SKIP — DONE (clean serial 2nd sweep, COMPLETED)

A clean single-serial 2nd sweep ran 14:41:16Z → 16:05:38Z (started AFTER the production fire completed 14:37:22Z → NO overlap; single proc pid 2248547, no concurrent sweep/run_recipe_ci; clean structured exit with a final per-recipe summary block, no traceback). It demonstrates the operative no-op (DECISIONS M2.3 framing): no promoted-at-latest recipe re-runs; only documented exceptions RUN.

  • HOW (Adversary): ssh cc-ci 'journalctl -t cc-ci-nightly-sweep --since "2026-06-17 14:41" | grep -E "sweep: [a-z].* (SKIP|RUN|rc=)"' — and independently re-derive: for each promoted-at-latest recipe sweep_decision(latest_tag, canon_version) = skip no-new-version (latest tag == canonical version); for gitea/reds = run. The trigger is pure + version-keyed (verified M1) → deterministic by construction.
  • EXPECTED/OBSERVED final summary (SKIP — no-new-version count = 15):
    • 15 promoted-at-latest → SKIP no-new-version: cryptpad, custom-html, custom-html-tiny, drone, ghost, hedgedoc, immich, lasuite-{docs,drive,meet}, mailu, matrix-synapse, n8n, plausible, uptime-kuma. This includes custom-html 1.13.0, which had just been ADVANCED in the production fire and now skips: the central determinism proof. ZERO needless CI reruns of good-current recipes.
    • gitea → RUN (new release 3.6.0 > canonical 3.5.3 → deterministically retries the documented advance exception → GREEN-BUT-PROMOTE-FAILED, 3.5.3 kept). Correct: keeps offering to advance.
    • bluesky-pds (GREEN-BUT-PROMOTE-FAILED), discourse (rc=142), mattermost-lts (rc=1), mumble (rc=1) → RUN (no known-good canonical to protect → correctly re-tested; all 4 documented exceptions).
    • Total = 20 enrolled = 15 SKIP + gitea + 4 with no known-good canonical (bluesky-pds and the 3 reds). The deviation from the literal "skip EVERY recipe" ideal (= all-promoted) is flagged honestly: the 5 non-skips are exactly the documented exceptions (DECISIONS M2.3 framing), and no test was weakened to force a promote (guardrail).
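The re-derivation asked of the Adversary can be written as a pure function. This is a minimal sketch, not the real nightly_sweep.sweep_decision. It assumes version_key orders tags by the numeric recipe version before the "+", so "1.13.0+1.31.1" → (1, 13, 0). It also assumes a recipe with no known-good canonical is always RUN, which matches the behaviour reported above.

```python
from typing import Optional


def version_key(tag: str) -> tuple:
    """HYPOTHETICAL ordering key: numeric recipe version before '+'.

    '1.13.0+1.31.1' -> (1, 13, 0). Assumes purely numeric dotted parts.
    """
    return tuple(int(part) for part in tag.split("+", 1)[0].split("."))


def sweep_decision(latest_tag: str, canon_version: Optional[str]) -> str:
    """Pure and version-keyed: depends only on the two tags, never on commits."""
    if canon_version is None:
        return "run"  # no known-good canonical to protect -> (re)test
    if version_key(latest_tag) > version_key(canon_version):
        return "run"  # a newer release tag exists -> try to advance
    return "skip no-new-version"  # latest tag == canonical version
```

The result depends only on (latest_tag, canon_version), so a second sweep with no new tag returns the same decisions. Once a RUN promotes, canonical == latest, so the next call returns skip (the custom-html case).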

M2.4 — tagged-promote proof (untagged green ⇒ NO promote; tagged green ⇒ promote) — DONE

  • TAGGED → PROMOTE: proof-A (/root/canon-verify/_proofA.log) cold-ran custom-html on tag 1.13.0+1.31.1 → canonical.json written with commit=df2e273… == git rev-list -n1 1.13.0+1.31.1. Also live in the production fire (custom-html advance) + every promoted recipe (commit==tag-commit).
  • UNTAGGED → NO PROMOTE: proof-C (/root/canon-verify/_proofC.log) staged an untagged head (1.13.1+1.31.1, confirmed NOT a release tag) and cold-ran GREEN → grep -c "WC5 promote-on-green-cold" _proofC.log = 0; canonical.json unchanged. Unit: test_no_promote_when_untagged.
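The tagged-promote rule demonstrated by proof-A and proof-C reduces to this: promote only when the run is green AND the tested commit is exactly a release tag's commit. Below is a minimal sketch with a hypothetical `promotable_tag` helper. The real gate lives in the runner and may be shaped differently. Tag→commit pairs would come from the recipe mirror, e.g. via `git rev-list -n1 <tag>`.

```python
from typing import Mapping, Optional


def promotable_tag(green: bool, tested_commit: str,
                   tag_commits: Mapping[str, str]) -> Optional[str]:
    """Return the release tag whose commit was tested green, else None.

    tag_commits maps release tag -> commit SHA (hypothetical input shape).
    An untagged head (e.g. a commit on main past the latest tag) has no
    entry whose commit matches, so it can never be promoted.
    """
    if not green:
        return None  # reds are never promoted
    for tag, commit in tag_commits.items():
        if commit == tested_commit:
            return tag
    return None  # green but untagged -> canonical.json left unchanged
```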

M2.6 — samever orthogonality (step-back NEVER fires in the sweep) — DONE

  • Path (2) new-tag → canonical(older)→new, real delta, promote: custom-html advanced 1.11.0 → 1.13.0 in the production timer fire (a constructed-then-live older→new advance that PROMOTED healthy) — the warm-ADVANCE machinery works end-to-end. Also gitea fired the trigger live (RUN on 3.6.0 > canonical 3.5.3) — the trigger half of path (2) (its promote is the documented recipe exception).
  • Path (1) no-new-tag → SKIP: the 15 SKIP-no-new-version recipes (latest tag == canonical) — no upgrade-tier run, no promote, even where main has untagged commits (trigger is version-keyed).
  • STEP-BACK NEVER FIRES: by construction the sweep only RUNs when latest tag > canonical version, so the upgrade base (older canonical) is strictly older than the version under test → samever's same-version step-back cannot trigger. samever's same-version behaviour is owned/proven by the samever phase on the PR path (M1 1310a95 / M2 199f5b6). HOW: read nightly_sweep.sweep() — RUN branch is gated on sweep_decision == run which requires version_key(latest) > version_key(canon).

M2.7 — warm-volume disk budget — DONE (DECISIONS 009bc60)

Root filesystem /: 150G total, 38G free (74% used); du -sh /var/lib/ci-warm = 1.1G; docker Local Volumes 2.024GB (929MB reclaimable). The 16 retained canonicals fit with ample headroom even with the full 20-recipe set enrolled. WC8 ci-docker-prune bounds residue. No recipe was dropped for disk; keeping all recipes enrolled is sustainable.
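Here is a back-of-envelope check of the headroom claim, using only the figures above. It assumes the warm-volume footprint grows linearly with the number of retained canonicals; that is a modelling assumption, not a measurement. It also treats du's 1.1G and df's 38G free as the same unit. The function name is hypothetical.

```python
def projected_extra_gib(observed_gib: float, observed_canonicals: int,
                        target_canonicals: int) -> float:
    """Extra space needed if every enrolled recipe held a canonical.

    Linear model: average size per canonical x additional canonicals.
    """
    per_canonical = observed_gib / observed_canonicals
    return per_canonical * (target_canonicals - observed_canonicals)


if __name__ == "__main__":
    extra = projected_extra_gib(observed_gib=1.1, observed_canonicals=16,
                                target_canonicals=20)
    free_gib = 38.0
    print(f"~{extra:.2f}G more needed vs {free_gib:.0f}G free")
```

Under this model the 4 missing canonicals add well under 1G, which is small next to 38G free.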

M2.8 — UPGRADE_BASE_VERSION retired — DONE (f611dda, 83c183d)

plausible promoted its canonical → the dynamic upgrade base now steps back to the newest published release strictly older than latest (= 3.0.1+v2.0.0, the correct base, avoiding the broken clickhouse-404 3.0.0) WITHOUT the pin. Pin removed from tests/plausible/recipe_meta.py; meta KEY removed (runner/harness/meta.py 15→14); resolver override branch removed (run_recipe_ci.py); docs (recipe-customization.md, testing.md) + unit tests (test_meta.py, test_upgrade_base.py) updated; bluesky-pds comment now points to the dynamic base. §2.G GATE (keep-if-broken) does NOT apply — plausible resolves base 3.0.1+v2.0.0 dynamically and passes.

  • HOW: grep -rn UPGRADE_BASE_VERSION runner/ tests/ docs/ (excl. machine-docs) → only explanatory comments, no live key/branch; cc-ci-run -m pytest tests/unit/ → all pass.
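The dynamic upgrade-base rule that replaced the pin ("the newest published release strictly older than latest") can be sketched as follows. `upgrade_base` and the numeric `version_key` are hypothetical stand-ins for the resolver in run_recipe_ci.py. The tag list in the usage example is illustrative, not plausible's real release list.

```python
from typing import Iterable, Optional


def version_key(tag: str) -> tuple:
    """HYPOTHETICAL ordering key: numeric recipe version before '+'."""
    return tuple(int(part) for part in tag.split("+", 1)[0].split("."))


def upgrade_base(published: Iterable[str], latest: str) -> Optional[str]:
    """Newest published release strictly older than `latest`, or None."""
    older = [t for t in published if version_key(t) < version_key(latest)]
    return max(older, key=version_key) if older else None


if __name__ == "__main__":
    # Illustrative tags only: a broken 3.0.0 is passed over because 3.0.1 is newer.
    tags = ["3.0.0", "3.0.1+v2.0.0", "3.1.0"]
    print(upgrade_base(tags, latest="3.1.0"))  # 3.0.1+v2.0.0
```

"Strictly older" is what keeps the base from ever being the version under test.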

NO AI at runtime — DONE

grep of nightly_sweep.py, warm_reconcile.py and recipe-mirror-sync.sh for anthropic|claude|openai|llm|gpt → zero calls (the only match is a single code comment). Pure script + systemd timer.
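The same check can be repeated without relying on a one-off grep. This minimal sketch uses hypothetical helper names. It matches the pattern above case-insensitively, which is stricter than a plain grep, and splits hits into comment lines and code lines. Only lines starting with `#` count as comments, which fits the Python and shell files listed. A trailing comment on a code line therefore counts as code, the conservative direction.

```python
import re
import sys
from pathlib import Path

AI_PATTERN = re.compile(r"anthropic|claude|openai|llm|gpt", re.IGNORECASE)


def classify_hits(text: str, pattern: re.Pattern = AI_PATTERN) -> dict:
    """Line numbers (1-based) of matches, split into 'comment' and 'code'."""
    hits = {"code": [], "comment": []}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if pattern.search(line):
            kind = "comment" if line.lstrip().startswith("#") else "code"
            hits[kind].append(lineno)
    return hits


if __name__ == "__main__":
    # e.g. python scan.py nightly_sweep.py warm_reconcile.py recipe-mirror-sync.sh
    for name in sys.argv[1:]:
        hits = classify_hits(Path(name).read_text(errors="replace"))
        print(f"{name}: code={hits['code']} comment={hits['comment']}")
```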

Build hashes / fingerprints

  • HEAD (Builder clone) = 009bc60; /etc/cc-ci deployed HEAD contains 2c61f2f (parity) — all M2 fixes.
  • custom-html canonical advanced: 1.11.0 → 1.13.0+1.31.1, commit df2e273….
  • gitea canonical kept at 3.5.3+1.24.2-rootless (3.6.0 advance is documented exception).

Documented exceptions (DECISIONS, none silent)

keycloak (de-enrolled, OIDC-provider domain collision); gitea 3.6.0 warm-advance (app.ini read-only); discourse (upstream 0.8.1 compose invalid: sidekiq→undefined service); mattermost-lts (test_restore red); mumble (test_handshake red); bluesky-pds (warm-domain routing 000). All recorded with reasons.

Blocked

(none)

Claims awaiting verification

  • M1: PASS (Adversary 3bdd5d1, no VETO).
  • M2: CLAIMED (awaiting Adversary). All §3 M2 + samever-orthogonality DoD items are proven. A real non-hollow timer fire (completed 14:37:22Z, Result=success, single serial) left 16 promoted canonicals on disk, including a live custom-html 1.11→1.13 advance; it left reds intact and skipped no-new-tag recipes. The determinism 2nd sweep (clean, 14:41→16:05) shows 15 promoted-at-latest SKIP, with only documented exceptions RUN. Tagged-vs-untagged proven (proof-A/C); samever step-back never fires in-sweep; disk budget recorded; UPGRADE_BASE_VERSION retired; NO AI at runtime. 6 exceptions documented in DECISIONS (none silent).