STATUS — phase canon (canonical sweep, make it real)
Gate: M1 PASS (Adversary 3bdd5d1, no VETO). M2 IN PROGRESS.
M2 progress
- M2.1 DEPLOY — DONE.
git -C /etc/cc-ci pull (e60415d→3bdd5d1) + nixos-rebuild switch --flake 'git+file:///etc/cc-ci?submodules=1#cc-ci-hetzner' (Result=success). Only nix delta vs the running 06-15 config was nightly-sweep.nix (verified: git diff --stat 2d865f0 HEAD -- nix/ = nightly-sweep only). VERIFY: ssh cc-ci 'systemctl cat nightly-sweep.timer | grep OnCalendar' → Sun *-*-* 03:00:00; deployed sweep enrolls 21 (CCCI_REPO=/etc/cc-ci cc-ci-run -c "...enrolled_recipes()" → 21), so hollow-sweep is fixed; host health: docker/drone-runner-exec/deploy-proxy/deploy-bridge/warm-keycloak all active.
WHAT/HOW/EXPECTED/WHERE for the Adversary. Reasoning lives in JOURNAL-canon.md.
Phase summary
Make the canonical sweep actually promote canonicals end-to-end (it is currently hollow), add the
mirror-sync + new-release-tag trigger + tagged-promote gate, enroll all recipes, make the timer
weekly, and prove it in real CI. DoD = §5 of cc-ci-plan/plan-phase-canon-canonical-sweep.md.
Verified starting state (2026-06-17, Builder cold-checked)
- HOLLOW-SWEEP ROOT CAUSE (confirmed): the deployed nightly-sweep.timer fired 03:09 UTC and logged enrolled canonicals = []. Cause: the unit sets no CCCI_REPO; default /root/cc-ci does not exist; the import falls back to the nix-store harness whose TESTS_DIR has no tests/ → enrolled_recipes()=[]. Verify: ssh cc-ci 'journalctl -u nightly-sweep.service | grep "enrolled canonicals"'.
- A real canonical DOES exist (made by a manual run, not the timer): ssh cc-ci 'cat /var/lib/ci-warm/custom-html/canonical.json' → version 1.13.0+1.31.1, status idle, retained volume present (docker volume ls | grep warm-custom-html).
- Enroll set (authoritative) = cc-ci-plan/used-recipes.md (21 recipes). Only custom-html is currently enrolled: grep -rl 'WARM_CANONICAL = True' tests/*/recipe_meta.py.
- Timer is daily: nix/modules/nightly-sweep.nix OnCalendar = "*-*-* 03:00:00".
- Disk /: 40G free / 73% used (ssh cc-ci df -h /).
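The hollow-sweep cause above can be reproduced in miniature. This is a sketch only: resolve_repo and this enrolled_recipes are hypothetical stand-ins, not the harness code, and the assumption that enrollment is detected by scanning tests/*/recipe_meta.py for the WARM_CANONICAL = True line is taken from the grep used in this file.

```python
import tempfile
from pathlib import Path


def resolve_repo(env, default, fallback):
    """Hypothetical: pick $CCCI_REPO if set, else the default checkout, else a fallback copy."""
    explicit = env.get("CCCI_REPO")
    if explicit:
        return Path(explicit)
    return default if default.is_dir() else fallback


def enrolled_recipes(repo):
    """Hypothetical: list recipes whose recipe_meta.py sets WARM_CANONICAL = True."""
    tests_dir = repo / "tests"
    if not tests_dir.is_dir():
        # No tests/ under the resolved root: nothing can be enrolled (the hollow no-op).
        return []
    return sorted(
        meta.parent.name
        for meta in tests_dir.glob("*/recipe_meta.py")
        if "WARM_CANONICAL = True" in meta.read_text()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        checkout = root / "checkout"            # stands in for a checkout WITH tests/
        recipe_dir = checkout / "tests" / "custom-html"
        recipe_dir.mkdir(parents=True)
        (recipe_dir / "recipe_meta.py").write_text("WARM_CANONICAL = True\n")
        store_copy = root / "store-copy"        # stands in for a runner copy WITHOUT tests/
        store_copy.mkdir()
        missing_default = root / "no-such-default"

        hollow = resolve_repo({}, missing_default, store_copy)
        print(enrolled_recipes(hollow))         # []
        fixed = resolve_repo({"CCCI_REPO": str(checkout)}, missing_default, store_copy)
        print(enrolled_recipes(fixed))          # ['custom-html']
```

With no CCCI_REPO and a missing default, resolution lands on the copy without tests/ and the enroll set is empty; setting CCCI_REPO to a real checkout restores it.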
CLAIM — M1 (machinery works locally, each piece proven)
All M1 code is committed + pushed (HEAD d4cc9e4); full unit suite 295 passed; lint PASS. Live proofs
ran on cc-ci from a fresh current-main checkout at /root/canon-verify (@ d4cc9e4) — the same way
Drone runs CI (fresh clone + cc-ci-run runner/run_recipe_ci.py). Proof logs on cc-ci:
/root/canon-verify/_proofA.log (promote), _proofB.log (reattach), _proofC.log (untagged).
WHAT is claimed → HOW to verify → EXPECTED → WHERE
M1.1 — tagged-promote gate (§2.A). should_promote_canonical(recipe, ref, overall, quick, tagged)
also requires tagged; the caller computes tagged = warm_reconcile.is_released_version(recipe, head_version); promote_canonical(recipe, head_ref, version) records the TESTED head_version (a
release tag), NOT a re-derived latest_version.
- HOW (unit): pytest tests/unit/test_promote.py tests/unit/test_warm_reconcile.py → all pass (incl. test_no_promote_when_untagged, test_is_released_version).
- HOW (live PROMOTE): the new code path ran on cc-ci via nightly_sweep.run_on_tag('custom-html', '1.13.0+1.31.1') (proof-A). EXPECTED+OBSERVED: /var/lib/ci-warm/custom-html/canonical.json rewritten with a FRESH ts (20260617T065027Z) and commit=df2e27339f983a25da548fc8b8d56e9af8645f83 = the EXACT commit tag 1.13.0+1.31.1 points to (git -C ~/.abra/recipes/custom-html rev-list -n1 1.13.0+1.31.1). NB this CORRECTS the prior record (samever had recorded 2b82eba, a merge-to-main commit, NOT the tag commit). Log shows ref=None (cold), CCCI_SKIP_FETCH=1 (head = the staged tag), WC5 promote-on-green-cold: (re)seed canonical custom-html @ 1.13.0+1.31.1.
- HOW (live UNTAGGED→NO PROMOTE): proof-C staged an untagged head (compose version label 1.13.1+1.31.1, confirmed NOT a release tag, image unchanged) and ran a COLD run. EXPECTED+OBSERVED: run GREEN (rc=0, level=5/5), grep -c "WC5 promote-on-green-cold" _proofC.log = 0, and canonical.json version+commit+ts identical before/after (1.13.0+1.31.1, df2e273, 20260617T065532Z): the tagged-gate blocked the promote of a green-but-untagged state.
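A minimal sketch of the tagged-promote gate claimed in M1.1, for orientation only. The signature matches the one quoted above; everything else is an assumption: that overall is the string "pass" on a green run, that quick runs never promote, and that an empty ref means a cold run. The real function in the runner may differ.

```python
def should_promote_canonical(recipe, ref, overall, quick, tagged):
    """Sketch: promote only a green, full, cold run whose head is a release tag.

    `recipe` is kept for signature parity with the status text; this sketch
    does not need it.
    """
    if ref:
        # Assumption: a non-empty ref means a PR/branch run, never a canonical source.
        return False
    if quick:
        # Assumption: quick runs skip tiers, so they cannot vouch for a canonical.
        return False
    if overall != "pass":
        # Assumption: "pass" is the green label; any other value blocks the promote.
        return False
    # The M1.1 addition: a green-but-untagged head must not be promoted.
    return bool(tagged)


if __name__ == "__main__":
    # Shape of proof-A: cold run on a release tag.
    print(should_promote_canonical("custom-html", None, "pass", False, True))   # True
    # Shape of proof-C: green cold run on an untagged head.
    print(should_promote_canonical("custom-html", None, "pass", False, False))  # False
```

Keeping tagged as an argument computed by the caller keeps the gate pure, which is what lets test_no_promote_when_untagged run without a recipe checkout.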
M1.2 — release-tag trigger + faithful mirror-sync (§2.C/§2.D).
- warm_reconcile.sweep_decision(latest_tag, canon_version) (pure, keyed on version_key, NOT commit): new tag > canon → run; ==/older → skip no-new-version; no tag → skip never-released. HOW (unit): pytest tests/unit/test_warm_reconcile.py -k sweep_decision. HOW (live SKIP): on cc-ci, custom-html latest_tag == canonical (1.13.0+1.31.1) → sweep_decision=('skip','no-new-version (latest release 1.13.0+1.31.1 <= canonical 1.13.0+1.31.1)'). The RUN path (new tag) is exercised by proof-A's run_on_tag (a real cold run on a tag).
- scripts/recipe-mirror-sync.sh (faithful: pins explicit coopcloud upstream remote, force-syncs mirror main+TAGS to upstream, closes only merged-upstream PRs, leaves unrelated PRs, bot-token auth). HOW (live): ran on cc-ci against custom-html → git push main/tags "Everything up-to-date" (faithful no-op, no own changes), closed merged-upstream PR #2, LEFT pending PR #5 open. Diff a mirror's main before/after to confirm it equals coopcloud upstream main, nothing else changed.
- nightly_sweep.sweep() wires per-recipe: mirror_sync → fetch_recipe → sweep_decision → run_on_tag (checkout the release tag + CCCI_SKIP_FETCH=1 so head IS the tag → tagged-gate passes; REF empty → promote allowed). NO AI at runtime (pure script).
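The trigger rule above (run on a newer release tag, skip otherwise) can be sketched as a pure function. This is illustrative, not the warm_reconcile code: the version_key shown here (all integer runs in the version string, compared as a tuple) is an assumed ordering, and the "run" reason string and the handling of a missing canonical are invented for the example. Only the skip message format is copied from the live output quoted above.

```python
import re


def version_key(version):
    """Assumed ordering: '1.13.0+1.31.1' -> (1, 13, 0, 1, 31, 1); a 'v' prefix is ignored."""
    return tuple(int(n) for n in re.findall(r"\d+", version))


def sweep_decision(latest_tag, canon_version):
    """Sketch: decide run/skip from release versions only, never from commits."""
    if not latest_tag:
        return ("skip", "never-released")
    if canon_version is None:
        # Assumption: no canonical yet means the latest release should be run.
        return ("run", f"new release {latest_tag}")
    if version_key(latest_tag) > version_key(canon_version):
        return ("run", f"new release {latest_tag}")
    return (
        "skip",
        f"no-new-version (latest release {latest_tag} <= canonical {canon_version})",
    )


if __name__ == "__main__":
    print(sweep_decision("1.13.0+1.31.1", "1.13.0+1.31.1"))
    # ('skip', 'no-new-version (latest release 1.13.0+1.31.1 <= canonical 1.13.0+1.31.1)')
    print(sweep_decision("3.6.0", "3.5.3")[0])  # run
    print(sweep_decision(None, "3.5.3"))        # ('skip', 'never-released')
```

Keying on the version rather than the commit means a moved or re-pointed commit under the same tag does not retrigger a run, which matches the "NOT commit" note above.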
M1.3 — all recipes enrolled (§2.B). WARM_CANONICAL=True in every tests/<r>/recipe_meta.py of
the 21 cc-ci-plan/used-recipes.md rows. HOW: grep -rl 'WARM_CANONICAL = True' tests/*/recipe_meta.py | wc -l → 21; python3 -c "import sys;sys.path.insert(0,'runner');from harness import canonical; print(len(canonical.enrolled_recipes()))" → 21. Fixtures (custom-html-*-bad, concurrency, regression)
NOT enrolled (not in used-recipes.md).
M1.4 — hollow-sweep root-cause fix. nix/modules/nightly-sweep.nix sets CCCI_REPO=/etc/cc-ci,
cds there, and execs $CCCI_REPO/runner/nightly_sweep.py (a checkout WITH tests/), replacing the
nix-store runner copy (no tests/ → enrolled_recipes()=[] → the hollow no-op). Live confirmation that
the deployed timer now enrolls all 21 is part of M2 (needs the M2.1 deploy: git -C /etc/cc-ci pull +
nixos-rebuild). HOW (code): read the module.
M1.5 — weekly timer (§2.F). OnCalendar = "Sun *-*-* 03:00:00", Persistent=true. HOW (code):
read the module; deployed-timer check is M2.
Build hashes / fingerprints
- HEAD: d4cc9e4 (M1 code). custom-html canonical: {version 1.13.0+1.31.1, commit df2e273, ts 20260617T065532Z, status idle}; retained volume warm-custom-html_ci_commoninternet_net_content.
- tag 1.13.0+1.31.1 → commit df2e27339f983a25da548fc8b8d56e9af8645f83.
Out of M1 scope (M2): full multi-recipe sweep e2e, run-twice determinism, real timer fire, samever
orthogonality live, disk budget, §2.G UPGRADE_BASE_VERSION retirement.
- M2.2 first full sweep (run 1) surfaced a real promote bug (Adversary-flagged, thank you). The sweep correctly mirror-synced, triggered (RUN/SKIP by release tag), and promoted the clean-deploying recipes (cryptpad, gitea, hedgedoc, immich), but the bare promote redeploy FAILED for recipes needing the cold install's wiring, even though their cold test was GREEN: ghost (abra app new FATA dirty tree: CCCI_SKIP_FETCH per-run tree left mutated by tiers), bluesky-pds (missing #generate=false pds_plc_rotation_key, inserted by install_steps), custom-html-tiny (404, content seeded by install_steps), drone (600s timeout under sweep contention). Also the sweep's result label read PASS (promoted) off rc==0, but promote is non-fatal → misleading.
- FIX (f94de22): promote_canonical now does a FAITHFUL warm install (clean tree via re-checkout tag + git clean -fd → provision DEPS → deploy_app with install_steps_hook + overlay + ready-probes), and the sweep label derives from whether the canonical was actually written. Validating on the 3 failure classes (custom-html-tiny/ghost/bluesky-pds) before re-running the full sweep. /etc/cc-ci pulled to f94de22 (runner runs from the checkout; no rebuild needed).
- M2.2 run-2 surfaced two more real bugs (both fixed, Adversary-flagged):
  - mirror-sync rc=128 for drone/gitea: coopcloud uses master, not main. FIX (655a999 incl.): recipe-mirror-sync.sh resolves the upstream HEAD symref + fetches that branch. Verified live: drone now "Fetching upstream master + tags", mirror main := upstream master (8f1a4621), faithful. (The trigger was always correct, since local tags come from abra recipe fetch = upstream; only the mirror push was being skipped.)
  - cold-dep promote SELF-DEADLOCK: drone (DEPS=[gitea], a COLD dep). The cold test holds gitea's app-lock for the process lifetime; promote's dep re-provision re-acquired it in-process → hung. FIX (655a999): lifecycle.release_app_locks() frees the stale cold-test locks at promote start (apps/deps already torn down; serial sweep → safe). lasuite-* (warm keycloak dep) unaffected.
  - Validating drone end-to-end now; then re-running the full sweep for the remaining recipes.
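The master-vs-main fix above relies on asking the remote which branch its HEAD points at. A sketch of that lookup in Python, assuming the output format of git ls-remote --symref (a "ref: refs/heads/<branch>" line followed by a tab and HEAD). The function names are hypothetical, the remote name upstream follows the text, and the actual fix lives in the shell script, not in this code.

```python
import subprocess


def default_branch(ls_remote_output):
    """Parse `git ls-remote --symref <remote> HEAD` output; return the branch name or None."""
    prefix = "refs/heads/"
    for line in ls_remote_output.splitlines():
        # The symref line looks like: "ref: refs/heads/master<TAB>HEAD"
        if line.startswith("ref: ") and line.endswith("\tHEAD"):
            target = line[len("ref: "):].split("\t", 1)[0]
            if target.startswith(prefix):
                return target[len(prefix):]
    return None


def upstream_default_branch(repo_dir, remote="upstream"):
    """Hypothetical wrapper: query the remote's HEAD symref from a local clone."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "ls-remote", "--symref", remote, "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return default_branch(result.stdout)


if __name__ == "__main__":
    # Placeholder SHA, not a real commit.
    sample = "ref: refs/heads/master\tHEAD\n" + "0" * 40 + "\tHEAD\n"
    print(default_branch(sample))  # master
```

Resolving the branch per recipe avoids hard-coding main, so a recipe whose upstream still uses master syncs without a special case.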
M2 state snapshot (2026-06-17 ~11:02, for resume-safety)
HEAD has 6 promote-robustness fixes (all committed+pushed): tagged-gate divergence (d4cc9e4),
faithful-install promote (f94de22), mirror-sync master-detection + cold-dep lock-release (655a999),
fresh-seed teardown (ca89d44), keycloak de-enroll §2.B exception (d072d7e). Enrolled=20 (keycloak out).
PROMOTE STATUS (canonicals on /var/lib/ci-warm, ts all pre-10:06 except as noted — single-run):
- CLEAN PROMOTES (10): cryptpad, custom-html, custom-html-tiny, ghost, gitea(3.5.3), hedgedoc, immich, lasuite-docs(0.3.5+v5.2.1), lasuite-drive(0.10.0+v0.19.0), lasuite-meet(0.4.1+v1.19.0). (lasuite-* use the warm keycloak dep — proves warm-dep promote works.)
- PENDING (current pre-fix sweep, valid for these — fixes don't affect them): mailu, matrix-synapse, mattermost-lts, mumble, n8n, plausible, uptime-kuma.
- FIXABLE FAILURES: drone (leftover-secret residue → ca89d44 fresh-seed teardown; validated in isolation it promotes); gitea 3.6.0 advance (600s timeout, 3.5.3 canonical preserved; retry).
- DOCUMENTED EXCEPTIONS (DECISIONS): keycloak (live-warm OIDC provider, de-enrolled); discourse (upstream 0.8.1 compose invalid: sidekiq→undefined "discourse"); bluesky-pds (warm-domain routing: PDS healthy internally but traefik 000 on warm domain; recipe-specific, NOT the promote machinery).
PLAN (Adversary recency criterion: authoritative M2.2 sweep must be launched with /etc/cc-ci at a HEAD containing BOTH ca89d44+d072d7e, enrolled=20, single serial):
- Let current pre-fix sweep finish (promotes mailu/matrix/etc. — valid canonicals).
- Deploy fixes: git -C /etc/cc-ci pull. Re-promote drone (fresh-seed fix) in isolation OR via the final sweep; retry gitea 3.6.0.
- Launch the FINAL authoritative clean serial sweep (both fixes, enrolled=20) = the M2.2 evidence: SKIPs all promoted (determinism), RUNs drone(promote)/gitea(3.6.0)/bluesky+discourse(red).
- M2.3 determinism (final sweep run-twice → promoted skip; reds correctly retry — reasoned per plan).
- M2.6 samever orthogonality (gitea 3.5.3→3.6.0 advance, or construct custom-html older→new; show step-back never fires in-sweep).
- M2.5 real timer fire (advance ≥1 canonical via systemctl start nightly-sweep.service).
- M2.7 disk budget (du /var/lib/ci-warm). M2.8 plausible UPGRADE_BASE_VERSION retirement.
- Claim M2.
Claims awaiting verification
- M1: PASS (Adversary 3bdd5d1, no VETO). M2 work in progress (not yet claimed).
Blocked
(none)