cc-ci/machine-docs/BACKLOG-canon.md

# BACKLOG — phase `canon`

## Build backlog (Builder-owned)

Milestone map → Definition of Done (§5). M1 = machinery + unit tests (Adversary cold-verifies the
pieces). M2 = proven end-to-end in real CI.

### M1 — machinery works locally, each piece proven

- [x] **M1.1 Tagged-promote gate (§2.A).** Extend `should_promote_canonical` to ALSO require that the
      tested head version corresponds to a published release tag. Add a `tagged: bool` param computed
      at the call site (`head_version in recipe_tags(recipe)`); keep the function pure. Untagged head
      → no promote. Unit tests: enrolled+green+cold+not-ref+tagged → True; each missing condition
      (incl. untagged) → False.
- [x] **M1.2 Release-tag trigger + mirror-sync in the sweep (§2.C/§2.D).** New pure helper
      `sweep_decision(recipe, latest_tag, canon_version)` → `run` | `skip:no-new-version` |
      `skip:never-released`, keyed on `version_key` (NOT commit). Wire `nightly_sweep.sweep()` to, per
      enrolled recipe: (1) faithful mirror-sync main+tags to upstream (reuse `open-recipe-pr.sh`
      `--reconcile-only`, vendored into the repo for reproducibility); (2) compute latest release tag
      vs canonical; (3) skip or run cold ON THE TAG (checkout tag + `CCCI_SKIP_FETCH=1`). Unit tests
      for `sweep_decision` (new tag → run; equal → skip; older/no tag → skip).
- [x] **M1.3 Enroll all recipes (§2.B).** Set `WARM_CANONICAL = True` in `tests/<r>/recipe_meta.py` for
      each of the 21 used recipes. Leave fixtures (custom-html-*-bad, concurrency, regression) alone.
- [x] **M1.4 Hollow-sweep fix (root cause).** Make the deployed sweep read the REAL tests/ + run
      current code: set `CCCI_REPO=/etc/cc-ci` in the sweep service and run `nightly_sweep.py` from
      the checkout (not the store copy). Deploy procedure pulls `/etc/cc-ci` before nixos-rebuild.
- [x] **M1.5 Weekly timer (§2.F).** `nightly-sweep.nix` `OnCalendar` daily → weekly (one line),
      `Persistent=true` (already set). Low-traffic slot.
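
The two pure helpers from M1.1 and M1.2 can be sketched roughly as below. This is an illustration under stated assumptions, not the code in the repo: the parameter names other than `tagged`, the body of `version_key`, and the choice to `run` when no canonical exists yet are all hypothetical.

```python
import re
from typing import Optional, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Hypothetical ordering key: the integer runs found in a version string.

    The real `version_key` may parse tags differently; this only shows why
    the comparison is keyed on versions rather than on commits.
    """
    return tuple(int(part) for part in re.findall(r"\d+", version))


def should_promote_canonical(
    enrolled: bool, green: bool, cold: bool, is_ref: bool, tagged: bool
) -> bool:
    """Pure gate: promote only when every condition holds.

    `tagged` is computed by the caller (for example
    `head_version in recipe_tags(recipe)`), so this function does no I/O.
    """
    return bool(enrolled and green and cold and not is_ref and tagged)


def sweep_decision(
    recipe: str, latest_tag: Optional[str], canon_version: Optional[str]
) -> str:
    """Decide whether the sweep should cold-run `recipe` on its latest tag.

    `recipe` is unused here and kept only to mirror the documented signature.
    """
    if latest_tag is None:
        return "skip:never-released"
    if canon_version is None:
        # Assumption: a released recipe with no canonical yet gets a run.
        return "run"
    if version_key(latest_tag) > version_key(canon_version):
        return "run"
    # Equal or older tag: nothing new to test.
    return "skip:no-new-version"
```

Keeping both functions free of I/O is what makes the unit tests listed in M1.1 and M1.2 cheap: each case is a plain call with literal arguments.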

### M2 — proven end-to-end in real CI

- [ ] **M2.1 Deploy** the M1 changes: `git -C /etc/cc-ci pull` + `nixos-rebuild switch`; verify host
      health after.
- [ ] **M2.2 Full sweep run** across the enrolled set on cc-ci: mirrors synced, canonicals promoted
      for green recipes (records with correct version+commit), red recipes left intact, no-new-tag
      recipes skipped. Per-recipe results log captured.
- [ ] **M2.3 Determinism proof:** run the sweep a SECOND time immediately → every recipe SKIPS
      (latest tag == canonical for all) = clean no-op, no CI rerun.
- [ ] **M2.4 Tagged-promote proof:** a green run on an UNTAGGED state does NOT promote; a green run
      on a TAGGED release DOES. Construct if the live set doesn't cover it.
- [ ] **M2.5 Real (non-hollow) timer fire:** after a timer fire, canonicals have ADVANCED (evidence),
      not exit-0 on an empty set.
- [ ] **M2.6 samever orthogonality:** (a) no new tag (even with untagged commits on main) → SKIP, no
      upgrade-tier run, no promote; (b) new tag → cold-test new tag, canonical(older)→new, promote.
      Show step-back never fires inside the sweep.
- [ ] **M2.7 Disk budget recorded;** all recipes enrolled (or documented exception in DECISIONS).
- [ ] **M2.8 §2.G UPGRADE_BASE_VERSION retirement** — after plausible's canonical lands at 3.0.1:
      remove the pin, confirm dynamic base resolves 3.0.1 + passes; if it holds, strip the key
      (meta KEYS, resolver branch, docs, unit tests) + update bluesky-pds comment. Else KEEP with a
      recorded reason in DECISIONS.
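
The determinism claim in M2.3 (a second sweep right after the first is a clean no-op) can be illustrated with a small in-memory model. Everything here is hypothetical: the recipe names, the `run:promoted` / `run:red` result strings, the `_key` ordering helper, and the `cold_run` callback stand in for the real sweep, registry, and CI run.

```python
from typing import Callable, Dict, List, Optional, Tuple


def _key(version: str) -> Tuple[int, ...]:
    # Hypothetical ordering key; stands in for the real `version_key`.
    return tuple(int(part) for part in version.split("."))


def sweep_once(
    latest_tags: Dict[str, Optional[str]],
    canonicals: Dict[str, str],
    cold_run: Callable[[str, str], bool],
) -> Dict[str, str]:
    """One sweep pass; mutates `canonicals` when a green run promotes."""
    results: Dict[str, str] = {}
    for recipe, latest in sorted(latest_tags.items()):
        canon = canonicals.get(recipe)
        if latest is None:
            results[recipe] = "skip:never-released"
        elif canon is not None and _key(latest) <= _key(canon):
            results[recipe] = "skip:no-new-version"
        elif cold_run(recipe, latest):
            canonicals[recipe] = latest  # green on the tag: promote
            results[recipe] = "run:promoted"
        else:
            results[recipe] = "run:red"  # red: canonical left intact
    return results


# Demo with made-up recipes: two released, one never released.
tags: Dict[str, Optional[str]] = {"alpha": "1.2.0", "beta": "2.0.1", "gamma": None}
canon: Dict[str, str] = {"alpha": "1.1.0"}
calls: List[Tuple[str, str]] = []


def fake_cold_run(recipe: str, tag: str) -> bool:
    calls.append((recipe, tag))
    return True  # pretend every cold run is green


first = sweep_once(tags, canon, fake_cold_run)
second = sweep_once(tags, canon, fake_cold_run)
```

In this model the first pass runs and promotes `alpha` and `beta`, and the second pass skips every recipe without calling `cold_run` again, which is the shape of evidence M2.3 asks for.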

## Notes
- Order within M1: M1.1 → M1.2 (both depend on the version helpers) → M1.3/M1.4/M1.5 (config). Claim M1 only
  when all unit tests green + tree clean + pushed.

## Adversary findings

- [x] **DEFECT-1 [adversary] (M2.2 results-label untrustworthy)** — CLOSED @16:14Z (M2 PASS). The
  production timer fire labels honestly: gitea/bluesky show `GREEN-BUT-PROMOTE-FAILED` (NOT a false
  `PASS (promoted)`), and the 16 `PASS (promoted)` labels each correspond to an on-disk canonical at the
  tested tag (commit==tag re-derived for all 16). Label now derives from the registry, not rc. ↓ orig:
  `nightly_sweep.sweep()` labelled `PASS (promoted)` off `rc==0`, but `promote_canonical` is non-fatal
  (swallows its exception), so a FAILED promote on a green cold run still showed `PASS (promoted)`
  though NO canonical was written. The per-recipe results log (DoD evidence "canonicals actually
  promoted for the greens") was therefore misleading. Repro (run-1 evidence captured): `grep "WC5
  promote failed" _sweep.log` vs `grep "PASS (promoted)" _sweep.log` — failed promotes appeared in
  BOTH. Builder fix f94de22 derives the label from `canonical.read_registry(r).version == latest`
  (PASS / GREEN-BUT-PROMOTE-FAILED / FAIL). **Close only after I re-run the sweep and confirm the
  label matches the on-disk registry for every recipe.**
- [x] **DEFECT-2 [adversary] (M2.2 promote path failing broadly)** — CLOSED @16:14Z (M2 PASS). The
  faithful-install promote (f94de22) + fresh-seed teardown (ca89d44) + cold-dep lock-release (655a999)
  fixed all 4 failure classes: 16 recipes promote clean (commit==tag re-derived), incl. ghost,
  custom-html-tiny, drone (clean-promoted 11:50 in the post-fix sweep, no 600s timeout). Determinism
  holds: the 2nd sweep SKIPs all 15 promoted-at-latest, only documented exceptions RUN. ↓ orig:
  Run-1: 4 of 5 completed promotes FAILED across 4 modes though cold CI was green — ghost (`abra app
  new` FATA dirty tree), bluesky-pds (missing `pds_plc_rotation_key`), custom-html-tiny (404, no
  seeded index), drone (warm deploy timed out 600s). The bare `abra app deploy` in `promote_canonical`
  lacked the cold install's wiring. Net-new canonical run-1 = 1 (cryptpad). Builder fix f94de22:
  promote now does a faithful install (clean tree → provision deps → `deploy_app` w/ install_steps +
  overlay + ready-probes). **Close only after a fresh full sweep where the green recipes actually
  write canonicals at the tested tag (incl. the 4 failure classes), AND determinism (M2.3) holds
  (run-twice → skip-all).** Note the drone 600s timeout may be node-contention, not wiring — watch it.
- [x] **DEFECT-3 [adversary] (deployed nightly-sweep.service env missing git-lfs → manual-sweep env ≠
  production-timer env)** — CLOSED @16:14Z (M2 PASS). Fix 2c61f2f prepends the host system PATH so the
  sweep runs recipes in Drone's exact env: `nightly-sweep` ExecStart line 17 byte-matches
  `drone-runner-exec.service` PATH; git-lfs present at `/run/current-system/sw/bin`. Behaviorally proven
  in the REAL timer fire (13:01:01→14:37:22Z, Result=success): `test_lfs_roundtrip PASSED` (gitea flips
  cold-green) and the timer ITSELF re-validated the promoted set under production env — 14 SKIP, custom-html
  advanced 1.11→1.13, no NEW promote failures the manual env hid. Methodological gap closed: the
  authoritative evidence is now a production-timer fire, not a richer manual env. ↓ orig:
  The REAL timer fire (12:34Z, nightly-sweep.service, /etc/cc-ci@cebd293)
  reds gitea at the custom tier: `tests/gitea/custom/test_lfs_roundtrip.py` → `git: 'lfs' is not a git
  command` → level 3/5 → rc=1. Same bug-class as the missing-`bash` gap (cebd293): the systemd
  service's nix `runtimeInputs` lacks `git-lfs`. BUT in the MANUAL authoritative sweep gitea cold-PASSED
  (rc=0, git-lfs present) and only the warm-advance failed. So: (a) real deploy defect — add `git-lfs`
  (and audit runtimeInputs for any other tool the manual env has but the service lacks: openssl, jq,
  curl, rsync, restic, etc.); (b) METHODOLOGICAL — the manual M2.2 authoritative sweep ran in a RICHER
  environment than the production timer, so its 16 promoted canonicals are NOT proven to reproduce under
  the real timer. The DoD is "proven end-to-end in REAL CI (the timer)". Repro: `journalctl -u
  nightly-sweep.service | grep -A40 "sweep: gitea RUN"`. **Close only after: git-lfs (+ any other missing
  tool) added to runtimeInputs, redeployed, and a REAL TIMER FIRE re-validates the promoted set in the
  production environment (the manually-promoted canonicals hold, OR are re-promoted by the timer itself).**
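
The label rule behind the DEFECT-1 fix (derive the results label from the on-disk registry, not from the exit code alone) can be sketched as a pure function. This is a minimal illustration, not the code in f94de22: the function name `result_label` and its parameters are hypothetical, and `registry_version` stands for whatever `canonical.read_registry(r).version` returns (or `None` when no canonical exists).

```python
from typing import Optional


def result_label(rc: int, registry_version: Optional[str], latest: str) -> str:
    """Label one recipe's sweep result.

    A green run (rc == 0) only counts as promoted when the registry
    actually records the tested tag; a swallowed promote failure
    therefore cannot show up as a pass.
    """
    if rc != 0:
        return "FAIL"
    if registry_version == latest:
        return "PASS (promoted)"
    return "GREEN-BUT-PROMOTE-FAILED"
```

With this shape, a green cold run followed by a failed promote yields `GREEN-BUT-PROMOTE-FAILED`, because the registry still holds the old version or nothing at all.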