# STATUS — phase `canon` (canonical sweep, make it real)

Gate: M1 PASS (Adversary 3bdd5d1, no VETO). M2 IN PROGRESS.

## M2 progress

- **M2.1 DEPLOY — DONE.** `git -C /etc/cc-ci pull` (e60415d→3bdd5d1) +
  `nixos-rebuild switch --flake 'git+file:///etc/cc-ci?submodules=1#cc-ci-hetzner'` (Result=success).
  The only nix delta vs the running 06-15 config was nightly-sweep.nix (verified:
  `git diff --stat 2d865f0 HEAD -- nix/` shows nightly-sweep only). VERIFY:
  `ssh cc-ci 'systemctl cat nightly-sweep.timer | grep OnCalendar'` → `Sun *-*-* 03:00:00`; the
  deployed sweep enrolls 21 (`CCCI_REPO=/etc/cc-ci cc-ci-run -c "...enrolled_recipes()"` → 21), so
  the hollow sweep is fixed; host health: docker/drone-runner-exec/deploy-proxy/deploy-bridge/
  warm-keycloak all active.

WHAT/HOW/EXPECTED/WHERE for the Adversary. Reasoning lives in JOURNAL-canon.md.

## Phase summary

Make the canonical sweep actually promote canonicals end-to-end (it is currently hollow), add the
mirror-sync + new-release-tag trigger + tagged-promote gate, enroll all recipes, make the timer
weekly, and prove it in real CI. DoD = §5 of `cc-ci-plan/plan-phase-canon-canonical-sweep.md`.

## Verified starting state (2026-06-17, Builder cold-checked)

- HOLLOW-SWEEP ROOT CAUSE (confirmed): the deployed `nightly-sweep.timer` fired at 03:09 UTC and
  logged `enrolled canonicals = []`. Cause: the unit sets no `CCCI_REPO`; the default `/root/cc-ci`
  does not exist; the import falls back to the nix-store harness, whose `TESTS_DIR` has no `tests/`,
  so `enrolled_recipes()=[]`. Verify:
  `ssh cc-ci 'journalctl -u nightly-sweep.service | grep "enrolled canonicals"'`.
- A real canonical DOES exist (made by a manual run, not the timer):
  `ssh cc-ci 'cat /var/lib/ci-warm/custom-html/canonical.json'` → version `1.13.0+1.31.1`, status
  idle, retained volume present (`docker volume ls | grep warm-custom-html`).
- Enroll set (authoritative) = `cc-ci-plan/used-recipes.md` (21 recipes). Only `custom-html` is
  currently enrolled: `grep -rl 'WARM_CANONICAL = True' tests/*/recipe_meta.py`.
- Timer is daily: `nix/modules/nightly-sweep.nix` `OnCalendar = "*-*-* 03:00:00"`.
- Disk `/`: 40G free / 73% used (`ssh cc-ci df -h /`).
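
The root-cause chain in the first bullet can be reproduced with a small sketch. It assumes that enrollment is a text scan of `tests/<recipe>/recipe_meta.py` for `WARM_CANONICAL = True` under the resolved repo root, and that the root falls back from `CCCI_REPO` to `/root/cc-ci` and then to the harness's own (nix-store) location. `resolve_repo` and this scan are hypothetical illustrations, not the harness's actual `enrolled_recipes()`.

```python
import tempfile
from pathlib import Path

DEFAULT_REPO = "/root/cc-ci"  # default repo path named in this STATUS file


def resolve_repo(env: dict, harness_dir: str) -> Path:
    """Hypothetical lookup: $CCCI_REPO, else the default, else the harness's own copy."""
    repo = Path(env.get("CCCI_REPO", DEFAULT_REPO))
    if repo.is_dir():
        return repo
    return Path(harness_dir)  # nix-store harness copy: ships no tests/ tree


def enrolled_recipes(repo: Path) -> list:
    """Recipes whose tests/<r>/recipe_meta.py contains WARM_CANONICAL = True (text scan)."""
    tests_dir = repo / "tests"
    if not tests_dir.is_dir():
        return []  # the hollow-sweep case: nothing is ever enrolled
    return sorted(
        meta.parent.name
        for meta in tests_dir.glob("*/recipe_meta.py")
        if "WARM_CANONICAL = True" in meta.read_text()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        checkout = Path(tmp, "checkout")
        meta = checkout / "tests" / "custom-html" / "recipe_meta.py"
        meta.parent.mkdir(parents=True)
        meta.write_text("WARM_CANONICAL = True\n")
        harness = Path(tmp, "nix-store-harness")
        harness.mkdir()  # no tests/ inside, like the store copy
        missing = str(Path(tmp, "missing"))  # stands in for the absent /root/cc-ci
        print(enrolled_recipes(resolve_repo({"CCCI_REPO": missing}, str(harness))))
        print(enrolled_recipes(resolve_repo({"CCCI_REPO": str(checkout)}, str(harness))))
```

Passing `os.environ` as `env` would mirror the real unit; the first call reproduces the hollow `[]`, and the second corresponds to pointing `CCCI_REPO` at a checkout that has `tests/`.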

## CLAIM — M1 (machinery works locally, each piece proven)

All M1 code is committed + pushed (HEAD d4cc9e4); full unit suite: 295 passed; lint PASS. Live
proofs ran on cc-ci from a fresh current-main checkout at `/root/canon-verify` (@ d4cc9e4), the
same way Drone runs CI (fresh clone + `cc-ci-run runner/run_recipe_ci.py`). Proof logs on cc-ci:
`/root/canon-verify/_proofA.log` (promote), `_proofB.log` (reattach), `_proofC.log` (untagged).

### WHAT is claimed → HOW to verify → EXPECTED → WHERE

**M1.1 — tagged-promote gate (§2.A).** `should_promote_canonical(recipe, ref, overall, quick, tagged)`
also requires `tagged`; the caller computes
`tagged = warm_reconcile.is_released_version(recipe, head_version)`; and
`promote_canonical(recipe, head_ref, version)` records the TESTED `head_version` (a release tag),
NOT a re-derived `latest_version`.

- HOW (unit): `pytest tests/unit/test_promote.py tests/unit/test_warm_reconcile.py` → all pass
  (incl. `test_no_promote_when_untagged`, `test_is_released_version`).
- HOW (live PROMOTE): the new code path ran on cc-ci via
  `nightly_sweep.run_on_tag('custom-html', '1.13.0+1.31.1')` (proof-A). EXPECTED+OBSERVED:
  `/var/lib/ci-warm/custom-html/canonical.json` was rewritten with a FRESH ts (`20260617T065027Z`)
  and `commit=df2e27339f983a25da548fc8b8d56e9af8645f83`, the EXACT commit that tag `1.13.0+1.31.1`
  points to (`git -C ~/.abra/recipes/custom-html rev-list -n1 1.13.0+1.31.1`). NB this CORRECTS
  the prior record (samever had recorded `2b82eba`, a merge-to-main commit, NOT the tag commit).
  The log shows `ref=None` (cold), `CCCI_SKIP_FETCH=1` (head = the staged tag), and
  `WC5 promote-on-green-cold: (re)seed canonical custom-html @ 1.13.0+1.31.1`.
- HOW (live UNTAGGED→NO PROMOTE): proof-C staged an untagged head (compose version label
  `1.13.1+1.31.1`, confirmed NOT a release tag, image unchanged) and ran a COLD run.
  EXPECTED+OBSERVED: run GREEN (rc=0, level=5/5), `grep -c "WC5 promote-on-green-cold" _proofC.log`
  = **0**, and canonical.json version+commit+ts **identical** before/after (`1.13.0+1.31.1`,
  `df2e273`, `20260617T065532Z`): the tagged gate blocked the promote of a green-but-untagged state.
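
A minimal sketch of the M1.1 gate, reusing the signature quoted above. It assumes promotion requires an enrolled recipe, a cold run (empty `ref`), a green `overall`, and a non-quick run, with `tagged` as the new §2.A condition. `ENROLLED`, `RELEASE_TAGS`, and the `"PASS"` value are hypothetical stand-ins for what the harness reads from `recipe_meta.py` and the recipe's git tags; the two versions are the proof-A and proof-C heads.

```python
from typing import Optional

# Hypothetical stand-ins for recipe_meta.py enrollment and `git tag` output.
ENROLLED = {"custom-html"}
RELEASE_TAGS = {"custom-html": {"1.13.0+1.31.1"}}


def is_released_version(recipe: str, head_version: str) -> bool:
    """True only when the tested head version is exactly a release tag of the recipe."""
    return head_version in RELEASE_TAGS.get(recipe, set())


def should_promote_canonical(recipe: str, ref: Optional[str], overall: str,
                             quick: bool, tagged: bool) -> bool:
    return (
        recipe in ENROLLED      # WARM_CANONICAL = True
        and not ref             # cold run: ref=None, no REF override
        and overall == "PASS"   # assumed encoding of a green run
        and not quick           # assumed: quick runs never promote
        and tagged              # §2.A: head must be a release tag
    )


if __name__ == "__main__":
    for head in ("1.13.0+1.31.1", "1.13.1+1.31.1"):  # proof-A head, proof-C head
        tagged = is_released_version("custom-html", head)
        print(head, should_promote_canonical("custom-html", None, "PASS", False, tagged))
```

Only the first head promotes; the second is green but untagged, which is the situation proof-C exercised.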

**M1.2 — release-tag trigger + faithful mirror-sync (§2.C/§2.D).**

- `warm_reconcile.sweep_decision(latest_tag, canon_version)` (pure, keyed on `version_key`, NOT
  commit): new tag > canonical → run; equal/older → skip `no-new-version`; no tag → skip
  `never-released`. HOW (unit): `pytest tests/unit/test_warm_reconcile.py -k sweep_decision`.
  HOW (live SKIP): on cc-ci, custom-html `latest_tag == canonical (1.13.0+1.31.1)` →
  `sweep_decision` = `('skip','no-new-version (latest release 1.13.0+1.31.1 <= canonical 1.13.0+1.31.1)')`.
  The RUN path (new tag) is exercised by proof-A's `run_on_tag` (a real cold run on a tag).
- `scripts/recipe-mirror-sync.sh` (faithful: pins an explicit coopcloud `upstream` remote,
  force-syncs mirror main+TAGS to upstream, closes only merged-upstream PRs, leaves unrelated PRs
  open, uses bot-token auth). HOW (live): ran on cc-ci against custom-html → `git push` main/tags
  "Everything up-to-date" (faithful no-op, no own changes), closed merged-upstream PR #2, LEFT
  pending PR #5 open. HOW (diff): compare a mirror's main before/after; it should equal coopcloud
  upstream main, with nothing else changed.
- `nightly_sweep.sweep()` wires per recipe:
  `mirror_sync → fetch_recipe → sweep_decision → run_on_tag` (check out the release tag +
  `CCCI_SKIP_FETCH=1`, so head IS the tag and the tagged gate passes; REF empty → promote allowed).
  NO AI at runtime (pure script).
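
A sketch of the pure trigger and the per-recipe wiring described in M1.2. `version_key` here is an assumption (numeric components of the recipe part, then of the upstream part after `+`); the real key may differ. The `run` reason text, the seed-when-no-canonical branch, and the injected `mirror_sync`/`fetch_recipe`/`run_on_tag` callables are hypothetical; only the `skip` reason strings mirror the ones quoted above.

```python
import re
from typing import Callable, Optional


def version_key(version: str) -> tuple:
    """Assumed ordering: numeric parts of the recipe version, then of the upstream part."""
    recipe, _, upstream = version.partition("+")

    def numbers(part: str) -> tuple:
        return tuple(int(n) for n in re.findall(r"\d+", part))

    return (numbers(recipe), numbers(upstream))


def sweep_decision(latest_tag: Optional[str], canon_version: Optional[str]) -> tuple:
    """Pure trigger: compares versions, never commits."""
    if latest_tag is None:
        return ("skip", "never-released")
    if canon_version is None:
        return ("run", f"no canonical yet; seed from {latest_tag}")  # assumed branch
    if version_key(latest_tag) > version_key(canon_version):
        return ("run", f"new release {latest_tag} > canonical {canon_version}")
    return ("skip", f"no-new-version (latest release {latest_tag} <= canonical {canon_version})")


def sweep_one(recipe: str, canon_version: Optional[str],
              mirror_sync: Callable[[str], None],
              fetch_recipe: Callable[[str], Optional[str]],
              run_on_tag: Callable[[str, str], None]) -> tuple:
    """One recipe: mirror_sync -> fetch_recipe -> sweep_decision -> run_on_tag."""
    mirror_sync(recipe)
    latest_tag = fetch_recipe(recipe)  # assumed to return the newest release tag or None
    action, reason = sweep_decision(latest_tag, canon_version)
    if action == "run":
        run_on_tag(recipe, latest_tag)  # real one: check out the tag, CCCI_SKIP_FETCH=1
    return action, reason
```

Injecting the side-effecting steps keeps `sweep_decision` a pure function of two versions, so the run-twice expectation can be reasoned about directly: once a tag is promoted, `canon_version == latest_tag` and the same inputs always yield `skip`.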

**M1.3 — all recipes enrolled (§2.B).** `WARM_CANONICAL = True` in every `tests/<r>/recipe_meta.py`
of the 21 `cc-ci-plan/used-recipes.md` rows. HOW:
`grep -rl 'WARM_CANONICAL = True' tests/*/recipe_meta.py | wc -l` → 21;
`python3 -c "import sys;sys.path.insert(0,'runner');from harness import canonical;print(len(canonical.enrolled_recipes()))"`
→ 21. Fixtures (custom-html-*-bad, concurrency, regression) are NOT enrolled (they are not in
used-recipes.md).

**M1.4 — hollow-sweep root-cause fix.** `nix/modules/nightly-sweep.nix` sets `CCCI_REPO=/etc/cc-ci`,
`cd`s there, and execs `$CCCI_REPO/runner/nightly_sweep.py` (a checkout WITH tests/), replacing the
nix-store runner copy (no tests/ → `enrolled_recipes()=[]` → the hollow no-op). Live confirmation
that the deployed timer now enrolls all 21 is part of M2 (it needs the M2.1 deploy:
`git -C /etc/cc-ci pull` + `nixos-rebuild`). HOW (code): read the module.

**M1.5 — weekly timer (§2.F).** `OnCalendar = "Sun *-*-* 03:00:00"`, `Persistent=true`. HOW (code):
read the module; the deployed-timer check is part of M2.
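
To predict when the deployed timer should next fire (useful for the M2 timer check), the weekly rule can be worked through in a few lines. This is a sketch, not systemd's calendar parser: it assumes the host clock is UTC (systemd evaluates `OnCalendar` in local time unless a zone is given) and ignores any randomized delay. With `Persistent=true`, a run missed while the host was down is triggered once when the timer is next activated, e.g. at boot.

```python
from datetime import datetime, timedelta

SUNDAY = 6  # datetime.weekday(): Monday=0 ... Sunday=6


def next_weekly_fire(now: datetime, weekday: int = SUNDAY, hour: int = 3) -> datetime:
    """Next instant matching '<weekday> *-*-* HH:00:00' strictly after `now` (naive, same zone)."""
    days_ahead = (weekday - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        candidate += timedelta(days=7)  # this week's slot has already passed
    return candidate


if __name__ == "__main__":
    # 2026-06-17 is a Wednesday; the next Sunday 03:00 slot is 2026-06-21 03:00:00.
    print(next_weekly_fire(datetime(2026, 6, 17, 6, 50)))
```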

### Build hashes / fingerprints

- HEAD: `d4cc9e4` (M1 code). custom-html canonical:
  `{version 1.13.0+1.31.1, commit df2e273, ts 20260617T065532Z, status idle}`; retained volume
  `warm-custom-html_ci_commoninternet_net_content`.
- tag `1.13.0+1.31.1` → commit `df2e27339f983a25da548fc8b8d56e9af8645f83`.

### Out of M1 scope (M2)

Full multi-recipe sweep e2e, run-twice determinism, real timer fire, live samever orthogonality,
disk budget, and §2.G UPGRADE_BASE_VERSION retirement.

- **M2.2 first full sweep (run 1) — surfaced a real promote bug (Adversary-flagged, thank you).**
  The sweep correctly mirror-synced, triggered (RUN/SKIP by release tag), and promoted the
  clean-deploying recipes (cryptpad, gitea, hedgedoc, immich). But the bare promote redeploy FAILED
  for recipes needing the cold install's wiring, even though their cold test was GREEN: ghost
  (`abra app new` FATA: dirty tree, because the CCCI_SKIP_FETCH per-run tree was left mutated by
  the tiers), bluesky-pds (missing the `#generate=false` secret `pds_plc_rotation_key`, which
  install_steps inserts), custom-html-tiny (404; its content is seeded by install_steps), and drone
  (600s timeout under sweep contention). Also, the sweep's result label read `PASS (promoted)`
  based on rc==0 alone, but promote is non-fatal, so the label was misleading.
- **FIX (f94de22):** `promote_canonical` now does a FAITHFUL warm install: clean tree (re-checkout
  tag + `git clean -fd`) → provision DEPS → `deploy_app` with `install_steps_hook` + overlay +
  ready-probes. The sweep label now derives from whether the canonical was actually written.
  Validating against the 3 failure classes (custom-html-tiny/ghost/bluesky-pds) before re-running
  the full sweep. /etc/cc-ci pulled to f94de22 (the runner runs from the checkout, so no rebuild is
  needed).
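
The label fix amounts to reading `canonical.json` before and after the run instead of trusting `rc`; it is the same before/after comparison proof-C used. A sketch, assuming the file carries `version`, `commit`, and `ts` keys (matching the fingerprint fields recorded in this file); `canonical_fingerprint`, `sweep_label`, and the non-promoted label strings are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def canonical_fingerprint(path: Path) -> Optional[tuple]:
    """(version, commit, ts) from canonical.json, or None when no canonical exists yet."""
    if not path.exists():
        return None
    data = json.loads(path.read_text())
    return (data.get("version"), data.get("commit"), data.get("ts"))


def sweep_label(rc: int, before: Optional[tuple], after: Optional[tuple]) -> str:
    """Label from what was written, not from rc alone: promote is non-fatal."""
    if rc != 0:
        return "FAIL"
    if after is not None and after != before:
        return "PASS (promoted)"  # a fresh canonical (at least a new ts) was written
    return "PASS (not promoted)"  # green test, but no canonical written


if __name__ == "__main__":
    path = Path("/var/lib/ci-warm/custom-html/canonical.json")
    before = canonical_fingerprint(path)
    rc = 0  # placeholder: run the recipe's cold test + promote here
    print(sweep_label(rc, before, canonical_fingerprint(path)))
```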

- **M2.2 run-2 surfaced two more real bugs (both fixed, Adversary-flagged):**
  - mirror-sync `rc=128` for drone/gitea: their coopcloud upstreams use `master`, not `main`. FIX
    (included in 655a999): `recipe-mirror-sync.sh` resolves the upstream HEAD symref and fetches
    that branch. Verified live: drone now logs "Fetching upstream master + tags", and
    mirror main := upstream master (8f1a4621), faithful. (The trigger was always correct, since
    local tags come from `abra recipe fetch` = upstream; only the mirror push was being skipped.)
  - cold-dep promote SELF-DEADLOCK: drone (DEPS=[gitea], a COLD dep). The cold test holds gitea's
    app-lock for the process lifetime; promote's dep re-provision tried to re-acquire it in the same
    process and hung. FIX (655a999): `lifecycle.release_app_locks()` frees the stale cold-test locks
    at promote start (safe: the apps/deps are already torn down and the sweep is serial). lasuite-*
    (warm keycloak dep) are unaffected.
  - Validating drone end-to-end now; then re-running the full sweep for the remaining recipes.
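
Why a process can deadlock on its own lock: if the app-locks are `flock(2)` locks taken through separately opened file descriptors (an assumption; the real `lifecycle` module may lock differently), a second `open()` + `LOCK_EX` on the same lock file conflicts with the lock the same process already holds. The sketch adds `LOCK_NB` so the conflict surfaces as `BlockingIOError` instead of a hang. `AppLocks` and `release_all` are hypothetical analogues of the app-lock registry and `lifecycle.release_app_locks()`; Unix only.

```python
import fcntl
import os
import tempfile


class AppLocks:
    """Hypothetical per-process registry of flock-based app locks."""

    def __init__(self, lock_dir: str):
        self.lock_dir = lock_dir
        self.held = []  # open file objects whose flock this process holds

    def try_acquire(self, app: str) -> bool:
        """Exclusive lock on <lock_dir>/<app>.lock through a NEW file descriptor."""
        f = open(os.path.join(self.lock_dir, f"{app}.lock"), "w")
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            f.close()
            return False  # without LOCK_NB this call would wait forever: the self-deadlock
        self.held.append(f)
        return True

    def release_all(self) -> None:
        """Analogue of release_app_locks(): drop every lock this process still holds."""
        for f in self.held:
            fcntl.flock(f, fcntl.LOCK_UN)
            f.close()
        self.held.clear()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        locks = AppLocks(d)
        print(locks.try_acquire("gitea"))  # cold test takes gitea's lock
        print(locks.try_acquire("gitea"))  # promote's dep re-provision, same process
        locks.release_all()                # stale cold-test locks freed at promote start
        print(locks.try_acquire("gitea"))
        locks.release_all()
```

As in the fix, releasing everything is only safe because the locked apps are already torn down and the sweep handles one recipe at a time.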

## M2 state snapshot (2026-06-17 ~11:02, for resume-safety)

HEAD has 6 promote-robustness fixes (all committed+pushed): tagged-gate divergence (d4cc9e4),
faithful-install promote (f94de22), mirror-sync master-detection + cold-dep lock-release (655a999),
fresh-seed teardown (ca89d44), and keycloak de-enroll §2.B exception (d072d7e). Enrolled=20
(keycloak out).

PROMOTE STATUS (canonicals on /var/lib/ci-warm, ts all pre-10:06 except as noted; single-run):

- CLEAN PROMOTES (10): cryptpad, custom-html, custom-html-tiny, ghost, gitea (3.5.3), hedgedoc,
  immich, lasuite-docs (0.3.5+v5.2.1), lasuite-drive (0.10.0+v0.19.0), lasuite-meet (0.4.1+v1.19.0).
  (lasuite-* use the warm keycloak dep, which proves warm-dep promote works.)
- PENDING (current pre-fix sweep; valid for these, since the fixes don't affect them): mailu,
  matrix-synapse, mattermost-lts, mumble, n8n, plausible, uptime-kuma.
- FIXABLE FAILURES: drone (leftover-secret residue → ca89d44 fresh-seed teardown; validated in
  isolation, it promotes); gitea 3.6.0 advance (600s timeout; the 3.5.3 canonical is preserved;
  retry).
- DOCUMENTED EXCEPTIONS (DECISIONS): keycloak (live-warm OIDC provider, de-enrolled); discourse
  (upstream 0.8.1 compose invalid: sidekiq → undefined "discourse"); bluesky-pds (warm-domain
  routing: PDS healthy internally but traefik returns 000 on the warm domain; recipe-specific, NOT
  the promote machinery).

PLAN (Adversary recency criterion: the authoritative M2.2 sweep must be launched with /etc/cc-ci at
a HEAD containing BOTH ca89d44 and d072d7e, with enrolled=20, as a single serial run):

1. Let the current pre-fix sweep finish (it promotes mailu/matrix/etc.; these are valid canonicals).
2. Deploy fixes: `git -C /etc/cc-ci pull`. Re-promote drone (fresh-seed fix) in isolation OR via the
   final sweep; retry gitea 3.6.0.
3. Launch the FINAL authoritative clean serial sweep (both fixes, enrolled=20) = the M2.2 evidence:
   SKIPs for all promoted recipes (determinism); RUNs for drone (promote), gitea (3.6.0), and
   bluesky-pds + discourse (red).
4. M2.3 determinism (run the final sweep twice: promoted recipes skip; reds correctly retry,
   reasoned per plan).
5. M2.6 samever orthogonality (gitea 3.5.3→3.6.0 advance, or construct a custom-html older→new
   case; show the step-back never fires in-sweep).
6. M2.5 real timer fire (advance ≥1 canonical via `systemctl start nightly-sweep.service`).
7. M2.7 disk budget (`du /var/lib/ci-warm`). M2.8 plausible UPGRADE_BASE_VERSION retirement.
8. Claim M2.

## Claims awaiting verification

- **M1 — PASS** (Adversary 3bdd5d1, no VETO). M2 work in progress (not yet claimed).

## Blocked

(none)