REVIEW-canon — Adversary verdicts for the canon (canonical-sweep) phase
SSOT for what is being verified: /srv/cc-ci/cc-ci-plan/plan-phase-canon-canonical-sweep.md.
Gates: M1 (machinery works locally, each piece proven) and M2 (proven end-to-end in real CI),
plus the operator-required samever-orthogonality proof. ## DONE only after fresh PASS on both.
Orientation @ 2026-06-17T06:18Z — Adversary online for canon phase; no gate claimed yet
Prior phase samever is DONE + Adversary-verified (M1 1310a95, M2 199f5b6, no VETO). The canon
phase has not been bootstrapped by the Builder yet: no STATUS-canon.md / BACKLOG-canon.md, no
claim( / status( canon commits, no inbox. I am idling per liveness protocol and will verify promptly
when M1 is CLAIMED (watchdog will ping on the claim).
Independent COLD baseline of the claimed starting state (§1) — captured before any canon work
Verified from my own clone + a cold ssh cc-ci, NOT from the Builder:
- Enrollment: exactly one recipe sets WARM_CANONICAL = True → custom-html. (grep -rl 'WARM_CANONICAL *= *True' tests/*/recipe_meta.py → 1 hit.) Matches §1 "only custom-html enrolled".
- canonical.json records on cc-ci: exactly one, for custom-html: /var/lib/ci-warm/custom-html/canonical.json = {recipe: custom-html, version: 1.13.0+1.31.1, commit: 2b82ebabde74a9d9b1fd4cb49722a7037b18a176, status: idle, ts: 20260617T050314Z}, retained volume warm-custom-html_..._content present.
- NOTE: plan §1 is now slightly stale. The plan (authored 04:43Z) says "ZERO canonical.json records exist." That was true at authoring, but the just-completed samever M2 e2e (custom-html two-run) wrote this record at 05:03:14Z. So there is now exactly one canonical, produced by samever's promote path. This is favorable evidence for canon M1(A): the promote path already demonstrably writes a real, reusable record and retains the volume for custom-html. But the Builder must NOT cite custom-html's pre-existing canonical as proof of canon's new work (tagged-gate, trigger, all-enrolled, mirror-sync). I will require fresh, canon-attributable evidence for each M1/M2 sub-claim.
- Timer: nightly-sweep.timer enabled+active, daily OnCalendar (NEXT 2026-06-18 03:00:24 UTC), last fired 2026-06-17 03:09:20 UTC exit 0. So the timer plumbing works; the job was a near-no-op (only custom-html enrolled). Phase must (F) move this to weekly and (M2) prove a real fire advances canonicals, not exit-0 on an empty set.
What I will adversarially probe when claimed (from the plan, not the Builder's narrative)
- M1(A): a canon-attributable green cold run writes canonical.json AND --quick warm-reattach reuses it; promote now ALSO requires a release tag: feed an UNTAGGED state, confirm NO promote.
- M1(C): mirror-sync is faithful upstream sync only: never pushes our changes to mirror main, never disturbs unrelated PRs. Will diff before/after on a mirror.
- M1(D): trigger keyed on latest release tag vs canonical version, NOT commit: new untagged commits on main with same tag ⇒ SKIP; newer tag ⇒ run cold on that tag.
- M1(B): all ~21 recipes enrolled; warm-volume disk budget recorded (not silently dropped).
- M2: full sweep promotes greens / leaves reds intact / skips unchanged; run-twice ⇒ skip-all determinism; real (non-hollow) timer fire; tagged-promote proof (untagged green ⇒ no promote).
- samever orthogonality: (a) no-new-tag ⇒ SKIPPED; (b) new-tag ⇒ canonical(older)→new, real delta, promote; step-back NEVER fires in the sweep. Construct scenarios if the live set doesn't cover both.
- §2.G: if plausible's canonical lands at 3.0.1, UPGRADE_BASE_VERSION retired cleanly (key + resolver branch + docs + tests) AND plausible still resolves base 3.0.1 dynamically + passes; else kept with a recorded DECISIONS reason. Will re-derive, not trust.
- Guardrail: NO AI at runtime (pure script + timer).
Pre-claim code read @ 2026-06-17T06:41Z — M1 still IN PROGRESS (M1.2 not yet committed)
Builder has landed 4 of 5 M1 items (27e0628 M1.1, 136100f M1.3, f8c0e53 M1.4+M1.5). M1.2 (the
release-tag trigger sweep_decision + mirror-sync wiring into nightly_sweep.sweep()) is not yet
committed — M1 is correctly not-yet-claimed. Read the landed code (NOT JOURNAL); points to scrutinize
when claimed:
- M1.1 (27e0628): should_promote_canonical gained tagged param; caller computes tagged = warm_reconcile.is_released_version(recipe, head_version). ⚠️ PROBE: the gate checks head_version (code under test) but promote_canonical records latest_version(recipe_tags(recipe)) (newest tag). Confirm these can't diverge, e.g. a manual latest run where main sits on a tagged commit OLDER than latest tag would gate on the older tag yet promote the newer. In the sweep path (D) the tag is checked out so head==tag; verify the manual/RECIPE=<r> path too.
- M1.4 (f8c0e53): root cause = sweep service ran the nix-STORE runner copy (no tests/) so TESTS_DIR missing → enrolled_recipes()=[]. Fix sets CCCI_REPO=/etc/cc-ci + cd + execs $CCCI_REPO/runner/nightly_sweep.py. ⚠️ PROBE at M2: confirm /etc/cc-ci actually exists on cc-ci, has runner/ AND tests/, and is git-pulled before nixos-rebuild (else still hollow). The fix also means sweep-logic ships via checkout pull, NOT a store rebuild; verify deploy procedure pulls it.
- M1.5 (f8c0e53): OnCalendar daily → Sun *-*-* 03:00:00, Persistent kept. Trivial; verify the deployed timer shows the weekly schedule after M2.1 nixos-rebuild.
- M1.3 (136100f): enroll all 21; verify the count is exactly the used-recipes.md set and that fixtures (custom-html-*-bad, concurrency, regression) were NOT enrolled.
- Still owed for M1 claim: M1.2 sweep_decision(recipe, latest_tag, canon_version) → run|skip:no-new-version|skip:never-released keyed on version_key NOT commit; mirror-sync via open-recipe-pr.sh --reconcile-only (faithful, vendored); cold-run ON THE TAG. Unit tests for all.
M1: PASS @ 2026-06-17T07:12Z — machinery cold-verified (claim 626badd, code @ d4cc9e4)
Verified from a COLD start: my own clone for code/pure-logic, a fresh independent clone on cc-ci
(/tmp/adv-canon @ 626badd) for the unit suite, and a cold ssh cc-ci for live state. I did NOT
read JOURNAL-canon.md before forming this verdict. Every M1 sub-claim re-derived against the plan,
not the Builder's narrative.
M1.1 tagged-promote gate (§2.A) — PASS.
- Code: should_promote_canonical returns is_enrolled and overall==0 and not quick and not ref and tagged; caller computes tagged = is_released_version(recipe, head_version); promote_canonical now records the TESTED head_version (commit d4cc9e4), not a re-derived latest_version. My prior PROBE (head_version-vs-latest_version divergence on a manual RECIPE=<r> run) is CLOSED by d4cc9e4: read the diff, it promotes exactly the tested version.
- Unit: ran tests/unit/test_promote.py myself in the fresh cc-ci clone: all 6 pass, each gate clause individually exercised (test_no_promote_when_untagged asserts tagged=False → False; all-conditions asserts tagged=True → True). Not hollow.
- Live PROMOTE: re-derived git rev-list -n1 1.13.0+1.31.1 = df2e27339f983a25da548fc8b8d56e9af8645f83 and /var/lib/ci-warm/custom-html/canonical.json records EXACTLY that commit + version 1.13.0+1.31.1, status idle, retained volume warm-custom-html_..._content present. So the promote recorded the tag's own commit (correcting samever's earlier 2b82eba merge-commit record): the divergence fix is live-proven, not just unit-tested.
- Live UNTAGGED → NO PROMOTE: independently confirmed 1.13.1+1.31.1 is NOT-A-TAG in the custom-html clone → is_released_version returns False → gate blocks. canonical.json is unchanged (still df2e273). The full live tagged-vs-untagged e2e is M2.4; at M1 the code + unit + live-not-a-tag + unchanged-canonical chain is sufficient.
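For reference, a minimal sketch of the gate as read above. Only the boolean expression comes from the code I read; the positional signature and type hints are my assumption, and the loop simply shows that flipping any single clause blocks the promote.

```python
from typing import Optional


def should_promote_canonical(is_enrolled: bool, overall: int, quick: bool,
                             ref: Optional[str], tagged: bool) -> bool:
    """Promote only an enrolled, green, cold, un-pinned run of a released tag.

    The signature is assumed for illustration; the expression mirrors the
    one quoted in the verdict above.
    """
    return bool(is_enrolled and overall == 0 and not quick and not ref and tagged)


if __name__ == "__main__":
    all_ok = dict(is_enrolled=True, overall=0, quick=False, ref=None, tagged=True)
    print("all conditions ->", should_promote_canonical(**all_ok))

    # Flip one clause at a time: each one alone must block the promote.
    single_flips = {
        "is_enrolled": False,
        "overall": 1,
        "quick": True,
        "ref": "some-ref",
        "tagged": False,
    }
    for name, bad_value in single_flips.items():
        args = dict(all_ok, **{name: bad_value})
        print(f"{name}={bad_value!r} ->", should_promote_canonical(**args))
```

This is the same shape as the per-clause unit coverage noted above: one all-conditions case, plus one case per clause where only that clause is violated.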
M1.2 release-tag trigger + faithful mirror-sync (§2.C/§2.D) — PASS.
- sweep_decision re-derived directly (no pytest): truth table exactly right and VERSION-keyed, not commit-keyed: new>canon→run; equal→skip no-new-version; older→skip; no tag→skip never-released; no canon→run(seed). The function takes only (latest_tag, canon_version); it CANNOT see commits, so new untagged commits on main can never trigger a run. That IS the operator's refinement.
- scripts/recipe-mirror-sync.sh read in full: pins an explicit coopcloud upstream remote, force-syncs mirror main := upstream/main + all tags, pushes NOTHING of our own. PR close is gated on git merge-tree --write-tree NEW_MAIN_SHA <pr-head> == upstream MAIN_TREE (i.e. the PR's merge is a no-op because it's already in upstream) → close; otherwise "left as-is". Faithful, never merges, never disturbs unrelated PRs.
- nightly_sweep.sweep() wiring read: per enrolled recipe mirror_sync → fetch_recipe → sweep_decision → run_on_tag (checkout the release tag + CCCI_SKIP_FETCH=1 so head IS the tag → tagged-gate passes; REF popped → cold → promote allowed). Pure script.
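The truth table above can be sketched as follows. This is not the runner's code: version_key here is a hypothetical stand-in that orders on the numeric fields before the "+", and mapping the "older" case to skip:no-new-version is my assumption (the table only says it skips).

```python
import re
from typing import Optional, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Hypothetical ordering key: numeric fields of the part before '+'."""
    recipe_part = version.split("+", 1)[0]
    return tuple(int(n) for n in re.findall(r"\d+", recipe_part))


def sweep_decision(latest_tag: Optional[str], canon_version: Optional[str]) -> str:
    """Decide run/skip from versions only; commits are not an input at all."""
    if latest_tag is None:
        return "skip:never-released"
    if canon_version is None:
        return "run"  # seed: no canonical recorded yet
    if version_key(latest_tag) > version_key(canon_version):
        return "run"
    # Equal or older tag: nothing new to test (label for 'older' is assumed).
    return "skip:no-new-version"


if __name__ == "__main__":
    cases = [
        ("1.13.1+1.31.1", "1.13.0+1.31.1"),  # new > canon
        ("1.13.0+1.31.1", "1.13.0+1.31.1"),  # equal
        ("1.12.0+1.31.1", "1.13.0+1.31.1"),  # older
        (None, "1.13.0+1.31.1"),             # no tag
        ("1.13.0+1.31.1", None),             # no canon
    ]
    for latest, canon in cases:
        print(latest, canon, "->", sweep_decision(latest, canon))
```

Because the only inputs are two version strings, untagged commits on main have no path into the decision, which is the property the verdict relies on.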
M1.3 all recipes enrolled (§2.B) — PASS. My grep -rl 'WARM_CANONICAL = True' set is EXACTLY the
21 used-recipes.md rows (incl. uptime-kuma, the lone external row — correctly enrolled for
CI/canonical even though excluded from weekly upgrade). Fixtures (custom-html-*-bad, concurrency,
regression) NOT enrolled.
M1.4 hollow-sweep fix — PASS (code; live is M2.1). nix/modules/nightly-sweep.nix exports
CCCI_REPO=/etc/cc-ci, cds there, and execs $CCCI_REPO/runner/nightly_sweep.py — the checkout WITH
tests/, replacing the store copy whose missing tests/ caused enrolled_recipes()=[]. Root cause
correctly addressed in code. ⚠️ CARRIED TO M2: /etc/cc-ci is currently STALE — git -C /etc/cc-ci
HEAD is e60415d (Phase-3 era), canon code NOT yet there. M2.1 deploy MUST git -C /etc/cc-ci pull
before nixos-rebuild, else the deployed timer stays hollow. I will verify the pull + a real fire at
M2.5.
M1.5 weekly timer (§2.F) — PASS (code). OnCalendar = "Sun *-*-* 03:00:00", Persistent = true.
Deployed-timer schedule verified at M2.
Guardrail NO-AI-at-runtime — PASS. grep of nightly_sweep.py / warm_reconcile.py /
recipe-mirror-sync.sh for anthropic|claude|openai|llm|gpt|ai_ → only one code COMMENT match, zero
calls. Pure script + systemd timer.
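A rough Python equivalent of that grep, separating comment matches from code matches. The pattern is the one quoted above; case-insensitive matching and the naive "everything after # is a comment" split are my assumptions (a # inside a string literal would be misread), so treat it as a triage aid, not a proof.

```python
import re
from typing import List, Tuple

AI_PATTERN = re.compile(r"anthropic|claude|openai|llm|gpt|ai_", re.IGNORECASE)


def classify_ai_matches(source: str) -> Tuple[List[int], List[int]]:
    """Return (comment_hits, code_hits) as 1-based line numbers."""
    comment_hits: List[int] = []
    code_hits: List[int] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Naive split: text before the first '#' is code, the rest is comment.
        code, _, comment = line.partition("#")
        if AI_PATTERN.search(code):
            code_hits.append(lineno)
        elif AI_PATTERN.search(comment):
            comment_hits.append(lineno)
    return comment_hits, code_hits


if __name__ == "__main__":
    sample = (
        "# no LLM is called at runtime\n"
        "def sweep():\n"
        "    return run_recipes()\n"
    )
    comments, code = classify_ai_matches(sample)
    print("comment-only matches:", comments)
    print("code matches:", code)
```

The guardrail holds when code matches come back empty; comment-only matches are the "one code COMMENT match, zero calls" situation described above.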
Full unit suite — PASS. Ran cc-ci-run -m pytest tests/unit/ in the fresh independent cc-ci clone
@ 626badd → 295 passed in 5.60s, matching the claim. Enrolling 21 recipes broke nothing.
Minor narrative note (not a defect): the claim cites proof-A ts 065027Z but live canonical ts is
065532Z; promoting the same tag again yields the same version+commit (only ts moves), so this is a
benign re-run, not a divergence — the recorded version/commit are correct either way.
Verdict: M1 PASS. No VETO. All M1 DoD items cold-verified; the deployed-state items (M1.4 live, M1.5 timer schedule) are honestly scoped by the Builder to M2 and I will hold them there. (Consulted JOURNAL-canon.md only AFTER writing this verdict: no surprises — confirms the proof-A/C sequence.)
Pre-claim observation @ 2026-06-17T07:23Z — M2.1 deploy verified live (NOT a gate verdict)
Builder inbox: M1 PASS consumed; M2.1 deploy done; M2.2 full sweep started (long, serial, hours). M2 NOT yet claimed — no formal verdict here, just an opportunistic READ-ONLY check that resolves my two carried-to-M2 code-only probes (favorable; I'll still re-verify the live proofs at the M2 claim):
- /etc/cc-ci now at 3bdd5d1 (current main; was stale e60415d Phase-3 era), with tests/ + runner/nightly_sweep.py present → the deploy DID git -C /etc/cc-ci pull. My M1.4 "deploy must pull or stays hollow" risk is cleared.
- Deployed timer: systemctl cat nightly-sweep.timer → OnCalendar=Sun *-*-* 03:00:00, Persistent=true (weekly, live). M1.5 deployed-schedule probe cleared.
- Deployed code path is the non-hollow one: the in-flight sweep (PID 1620630) runs nightly_sweep.sweep() from /etc/cc-ci/runner, and run_recipe_ci.py runs from /etc/cc-ci/runner/, i.e. the checkout WITH tests/, not the store copy. Root cause fixed live.
STILL OWED at the M2 claim (I will cold-verify, not trust the sweep log): canonicals actually promoted for greens / reds left intact / no-new-tag skipped (M2.2); run-twice→skip-all (M2.3); live tagged-vs-untagged (M2.4); real timer fire advances canonicals via full main() incl. roll (M2.5); samever never fires in-sweep (M2.6); disk budget recorded (M2.7); §2.G UPGRADE_BASE_VERSION retirement (M2.8). Staying read-only while the sweep is in flight (single node).
Pre-claim finding @ 2026-06-17T08:40Z — M2.2 sweep: PASS-labelled but promotes mostly FAILING (evidence captured)
NOT a verdict (M2 unclaimed). Read-only capture from /root/canon-verify/_sweep.log so the evidence
survives log growth. Per-recipe promote outcomes observed (alphabetical sweep, ~7 recipes deep):
- bluesky-pds: cold rc=0; WC5 promote failed: abra app deploy warm-bluesky-pds… failed (1) → NO canonical; logged PASS (promoted).
- cryptpad: cold rc=0; canonical cryptpad advanced to known-good 0.6.0+v2026.5.1 → canonical WRITTEN. ✓ (the only real promote so far)
- custom-html: SKIP no-new-version (pre-existing canonical). ✓ expected.
- custom-html-tiny: cold rc=0; WC5 promote failed: warm-custom-html-tiny… not healthy over HTTPS / (404) → NO canonical; logged PASS (promoted).
- discourse: cold rc=142 (deploy timeout, the 51m wedge I flagged) → FAIL (canonical unchanged). Legit red.
- drone: cold rc=0; WC5 promote failed: …warm-drone… timed out after 600 seconds → NO canonical; logged PASS (promoted).
- ghost: cold rc=0; WC5 promote failed: abra app new ghost… failed (1) → NO canonical; logged PASS (promoted).
- gitea: promote in progress at capture.
Live /var/lib/ci-warm/*/canonical.json = {cryptpad, custom-html} only. NET NEW this sweep = 1 (cryptpad). Leftover warm volumes w/ NO registry record: drone, gitea, custom-html-tiny (partial-promote residue).
DEFECT-1 [adversary] (results-label): nightly_sweep.sweep() line ~119 sets
results[r] = "PASS (promoted)" if rc==0 else "FAIL …". Because promote_canonical is non-fatal
(swallows its own exception so it "never fails a green run"), a FAILED promote still yields rc=0 →
the summary asserts "PASS (promoted)" when NO canonical was written. The per-recipe results log — the
DoD's evidence that "canonicals actually promoted for the green recipes" — is therefore UNTRUSTWORTHY.
Repro: grep "WC5 promote failed" _sweep.log vs grep "PASS (promoted)" _sweep.log — failed promotes
appear in BOTH. Fix direction: label from "does a canonical record now exist at the tested version",
not from rc.
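A minimal sketch of that fix direction, assuming the registry layout seen on cc-ci (<registry>/<recipe>/canonical.json with a "version" key). The function names, parameters and the exact label strings are illustrative, not the Builder's implementation.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def canonical_version(registry_dir: Path, recipe: str) -> Optional[str]:
    """Version recorded for the recipe, or None if the record is absent or unreadable."""
    try:
        record = json.loads((registry_dir / recipe / "canonical.json").read_text())
    except (OSError, json.JSONDecodeError):
        return None
    return record.get("version")


def result_label(rc: int, recipe: str, tested_version: str, registry_dir: Path) -> str:
    """Label from the on-disk record, never from rc alone."""
    if rc != 0:
        return "FAIL (red; canonical unchanged)"
    recorded = canonical_version(registry_dir, recipe)
    if recorded == tested_version:
        return "PASS (promoted)"
    # Green cold run, but no record at the tested version: promote did not land.
    return (f"GREEN-BUT-PROMOTE-FAILED (canonical={recorded or 'none'}, "
            f"expected {tested_version})")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        registry = Path(tmp)
        (registry / "cryptpad").mkdir()
        (registry / "cryptpad" / "canonical.json").write_text(
            json.dumps({"recipe": "cryptpad", "version": "0.6.0+v2026.5.1"}))
        print(result_label(0, "cryptpad", "0.6.0+v2026.5.1", registry))
        print(result_label(0, "bluesky-pds", "0.3.0+v0.4.219", registry))
```

With this shape a swallowed promote exception can no longer surface as a pass, because rc=0 only gets the run as far as the registry check.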
DEFECT-2 [adversary] (promote path failing broadly): 4 of 5 completed promotes FAILED across 4
modes (warm app deploy failed(1) / timed-out 600s / unhealthy-404 / app new failed(1)). Cold CI is
green for each, so this is specifically the WARM-CANONICAL promote deploy failing — the exact
end-to-end step this phase exists to make real. Root cause TBD (node contention on the long serial
run / unclean cold-test teardown / discourse residue / flat 600s warm timeout) — Builder's to diagnose.
Determinism risk (M2.3): every recipe left without a canonical (bluesky-pds, custom-html-tiny,
drone, ghost, discourse…) will sweep_decision(latest, None) → run on a second sweep, NOT skip — so
run-twice ≠ skip-all until promotes actually succeed. I will hard-test this at the M2 claim.
Sent the Builder a BUILDER-INBOX heads-up (ba28a88). When M2 is claimed I will cold-verify, per recipe,
that a canonical record exists at the tested tag version (not trust the PASS label), and re-run the
determinism no-op myself. If promotes are still failing / mislabelled, M2 FAILs.
Pre-claim note @ 2026-06-17T09:11Z — fix f94de22 validated by Builder; M2 re-run in flight (NOT a verdict)
Consumed ADVERSARY-INBOX (Builder ~09:10Z): DEFECT-1/DEFECT-2 fix validated live — custom-html-tiny PROMOTED (1.2.0+2.43.0, was 404) and ghost PROMOTED (1.4.0+6.45.0-alpine, was app-new dirty-tree FATA); label now derives from "canonical record exists at tested version". 7 canonicals claimed (cryptpad, custom-html, custom-html-tiny, ghost, gitea, hedgedoc, immich). Full sweep re-run in flight. M2 unclaimed. Staying read-only off the node (sweep in flight, single node).
bluesky-pds "documented RED" — must scrutinise at M2 claim, two ways it could be wrong:
- The conservative direction is CORRECT per guardrail (no force-promote; prior known-good kept). But I must confirm bluesky has NO stale/partial canonical written, and that it is recorded as an exception in DECISIONS (plan §2.B: "don't silently skip" / §4 "documented exception"), not just left silent.
- The real risk: Builder says warm health fails because traefik doesn't route the WARM domain (warm-bluesky-pds… → 000) though internal localhost:3000 = 200, and "cold domain worked." I must verify this is genuinely bluesky-SPECIFIC and not a warm-canonical-deploy machinery defect (warm domain label/overlay/router rule) that could equally hit other recipes. If the warm-domain routing is systemically flaky, a recipe could intermittently fail to promote (or, worse, a health probe could pass spuriously). At claim I will: (a) confirm OTHER promoted recipes (custom-html-tiny, ghost, immich) actually answered 200 over HTTPS on THEIR warm domains during promote (grep ready-probe lines), and (b) independently curl a couple of the live warm canonical domains. If warm-domain routing is broadly unreliable, the promote evidence is suspect and M2 is not done.
Pre-claim observation @ 2026-06-17T09:34Z — read-only sweep-progress peek (NOT a verdict)
Sweep re-run still in flight (proc 1712141 from /etc/cc-ci/runner); 7 canonicals on disk. Captured
from _sweep.log so it survives log growth:
- DEFECT-1 fix is LIVE and honest: sweep: bluesky-pds rc=0 (GREEN-BUT-PROMOTE-FAILED (canonical=none, expected 0.3.0+v0.4.219)); the label no longer claims PASS (promoted) on a failed promote. Favorable; I will still confirm the label matches the on-disk registry per recipe at claim before closing DEFECT-1.
- cryptpad / custom-html / custom-html-tiny → SKIP no-new-version (latest tag == canonical). The skip path works for promoted recipes.
- discourse rc=143 → FAIL (red; canonical unchanged): legit red (timeout/SIGTERM), canonical kept.
- NEW: sweep: mirror-sync drone rc=128 (non-fatal — continuing): drone's faithful mirror-sync FAILED (git rc=128) yet the sweep proceeded to RUN drone against the un-synced mirror. SCRUTINISE at claim: plan §2.C requires the mirror be reconciled to upstream FIRST; a swallowed sync failure means the recipe may be tested against a stale mirror (wrong tags/version), so the trigger (D) and tagged promote then rest on un-synced state. Is rc=128 a benign "already up to date / no upstream" case or a real sync failure? Must check what drone's sync hit and whether the tested tag is genuinely upstream's.
- DETERMINISM (M2.3), central risk crystallising: bluesky-pds (promote-failed) and discourse (red) both end canonical=none, so a 2nd sweep → sweep_decision(latest, None) → RUN, NOT skip. Plan M2.3 literally requires run-twice → "SKIPS every recipe." That can hold ONLY if every enrolled recipe actually promoted. Red/promote-failed recipes legitimately re-run (no known-good to protect), which is arguably correct behaviour but is NOT "skip every recipe." At the M2 claim I will require the Builder's determinism evidence to honestly reconcile this with §3/§5: either (i) every recipe promotes so run-twice is a true no-op, or (ii) a reasoned, plan-consistent argument that the no-op property applies to the promoted set and red recipes correctly retry. I'll judge it against the plan, not accept a partial skip-all relabelled as success.
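The determinism point can be made concrete with a small sketch. It assumes the first sweep tested each recipe's latest tag and that a canonical is never ahead of that tag, so a recipe skips on the second pass only when its record equals the tag. The helper names are hypothetical; the versions are the ones observed in the log above.

```python
from typing import Dict, List, Optional


def second_sweep_reruns(latest_tags: Dict[str, str],
                        canonicals: Dict[str, Optional[str]]) -> List[str]:
    """Recipes a second sweep would RUN again (no record at the latest tag)."""
    return sorted(recipe for recipe, tag in latest_tags.items()
                  if canonicals.get(recipe) != tag)


def is_skip_all(latest_tags: Dict[str, str],
                canonicals: Dict[str, Optional[str]]) -> bool:
    """True only when every enrolled recipe already holds its latest tag."""
    return not second_sweep_reruns(latest_tags, canonicals)


if __name__ == "__main__":
    latest = {
        "bluesky-pds": "0.3.0+v0.4.219",
        "cryptpad": "0.6.0+v2026.5.1",
        "custom-html": "1.13.0+1.31.1",
        "custom-html-tiny": "1.2.0+2.43.0",
    }
    on_disk = {
        "bluesky-pds": None,  # green cold run, promote failed
        "cryptpad": "0.6.0+v2026.5.1",
        "custom-html": "1.13.0+1.31.1",
        "custom-html-tiny": "1.2.0+2.43.0",
    }
    print("would re-run:", second_sweep_reruns(latest, on_disk))
    print("skip-all:", is_skip_all(latest, on_disk))
```

A single recipe without a record is enough to make skip-all false, which is why the run-twice evidence has to be reconciled with the red and promote-failed set rather than asserted.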