Files
cc-ci/machine-docs/REVIEW-canon.md

332 lines
26 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# REVIEW-canon — Adversary verdicts for the `canon` (canonical-sweep) phase
SSOT for what is being verified: `/srv/cc-ci/cc-ci-plan/plan-phase-canon-canonical-sweep.md`.
Gates: **M1** (machinery works locally, each piece proven) and **M2** (proven end-to-end in real CI),
plus the operator-required **samever-orthogonality** proof. `## DONE` only after fresh PASS on both.
---
## Orientation @ 2026-06-17T06:18Z — Adversary online for canon phase; no gate claimed yet
Prior phase `samever` is DONE + Adversary-verified (M1 1310a95, M2 199f5b6, no VETO). The `canon`
phase has **not** been bootstrapped by the Builder yet: no STATUS-canon.md / BACKLOG-canon.md, no
`claim(`/`status(canon` commits, no inbox. I am idling per liveness protocol and will verify promptly
when M1 is CLAIMED (watchdog will ping on the claim).
### Independent COLD baseline of the claimed starting state (§1) — captured before any canon work
Verified from my own clone + a cold `ssh cc-ci`, NOT from the Builder:
- **Enrollment:** exactly **one** recipe sets `WARM_CANONICAL = True``custom-html`. (`grep -rl
'WARM_CANONICAL *= *True' tests/*/recipe_meta.py` → 1 hit.) Matches §1 "only custom-html enrolled".
- **canonical.json records on cc-ci:** exactly **one**, for `custom-html`:
`/var/lib/ci-warm/custom-html/canonical.json` =
`{recipe: custom-html, version: 1.13.0+1.31.1, commit: 2b82ebabde74a9d9b1fd4cb49722a7037b18a176,
status: idle, ts: 20260617T050314Z}`, retained volume `warm-custom-html_..._content` present.
- **NOTE — plan §1 is now slightly stale.** The plan (authored 04:43Z) says "ZERO canonical.json
records exist." That was true at authoring, but the just-completed **samever M2** e2e
(custom-html two-run) wrote this record at **05:03:14Z**. So there is now exactly one canonical,
produced by samever's promote path. This is *favorable* evidence for canon M1(A) — the promote
path already demonstrably writes a real, reusable record + retains the volume for custom-html —
but the Builder must NOT cite custom-html's pre-existing canonical as proof of canon's *new*
work (tagged-gate, trigger, all-enrolled, mirror-sync). I will require fresh, canon-attributable
evidence for each M1/M2 sub-claim.
- **Timer:** `nightly-sweep.timer` enabled+active, daily `OnCalendar` (NEXT 2026-06-18 03:00:24 UTC),
last fired 2026-06-17 03:09:20 UTC exit 0. So the timer plumbing works; the job was a near-no-op
(only custom-html enrolled). Phase must (F) move this to **weekly** and (M2) prove a real fire
advances canonicals, not exit-0 on an empty set.
### What I will adversarially probe when claimed (from the plan, not the Builder's narrative)
- M1(A): a canon-attributable green cold run writes canonical.json AND `--quick` warm-reattach reuses
it; promote now ALSO requires a **release tag** — feed an UNTAGGED state, confirm NO promote.
- M1(C): mirror-sync is *faithful upstream sync only* — never pushes our changes to mirror `main`,
never disturbs unrelated PRs. Will diff before/after on a mirror.
- M1(D): trigger keyed on **latest release tag vs canonical version**, NOT commit — new untagged
commits on `main` with same tag ⇒ SKIP; newer tag ⇒ run cold on that tag.
- M1(B): all ~21 recipes enrolled; warm-volume disk budget recorded (not silently dropped).
- M2: full sweep promotes greens / leaves reds intact / skips unchanged; **run-twice ⇒ skip-all**
determinism; real (non-hollow) timer fire; tagged-promote proof (untagged green ⇒ no promote).
- samever orthogonality: (a) no-new-tag ⇒ SKIPPED; (b) new-tag ⇒ canonical(older)→new, real delta,
promote; step-back NEVER fires in the sweep. Construct scenarios if the live set doesn't cover both.
- §2.G: if plausible's canonical lands at 3.0.1, `UPGRADE_BASE_VERSION` retired cleanly (key +
resolver branch + docs + tests) AND plausible still resolves base 3.0.1 dynamically + passes — else
kept with a recorded DECISIONS reason. Will re-derive, not trust.
- Guardrail: NO AI at runtime (pure script + timer).
## Pre-claim code read @ 2026-06-17T06:41Z — M1 still IN PROGRESS (M1.2 not yet committed)
Builder has landed 4 of 5 M1 items (27e0628 M1.1, 136100f M1.3, f8c0e53 M1.4+M1.5). M1.2 (the
release-tag trigger `sweep_decision` + mirror-sync wiring into `nightly_sweep.sweep()`) is **not yet
committed** — M1 is correctly not-yet-claimed. Read the landed code (NOT JOURNAL); points to scrutinize
when claimed:
- **M1.1 (27e0628):** `should_promote_canonical` gained `tagged` param; caller computes
`tagged = warm_reconcile.is_released_version(recipe, head_version)`. ⚠️ PROBE: the gate checks
`head_version` (code under test) but `promote_canonical` records `latest_version(recipe_tags(recipe))`
(newest tag). Confirm these can't diverge — e.g. a manual latest run where `main` sits on a tagged
commit OLDER than `latest` tag would gate on the older tag yet promote the newer. In the sweep path
(D) the tag is checked out so head==tag; verify the manual/`RECIPE=<r>` path too.
- **M1.4 (f8c0e53):** root cause = sweep service ran the nix-STORE runner copy (no `tests/`) so
`TESTS_DIR` missing → `enrolled_recipes()=[]`. Fix sets `CCCI_REPO=/etc/cc-ci` + `cd` + execs
`$CCCI_REPO/runner/nightly_sweep.py`. ⚠️ PROBE at M2: confirm `/etc/cc-ci` actually exists on cc-ci,
has runner/ AND tests/, and is git-pulled before nixos-rebuild (else still hollow). The fix also
means sweep-logic ships via checkout pull, NOT a store rebuild — verify deploy procedure pulls it.
- **M1.5 (f8c0e53):** `OnCalendar` daily → `Sun *-*-* 03:00:00`, Persistent kept. Trivial; verify the
deployed timer shows the weekly schedule after M2.1 nixos-rebuild.
- **M1.3 (136100f):** enroll all 21 — verify the count is exactly the `used-recipes.md` set and that
fixtures (custom-html-*-bad, concurrency, regression) were NOT enrolled.
- **Still owed for M1 claim:** M1.2 `sweep_decision(recipe, latest_tag, canon_version)` →
run|skip:no-new-version|skip:never-released keyed on `version_key` NOT commit; mirror-sync via
`open-recipe-pr.sh --reconcile-only` (faithful, vendored); cold-run ON THE TAG. Unit tests for all.
---
## M1: PASS @ 2026-06-17T07:12Z — machinery cold-verified (claim 626badd, code @ d4cc9e4)
Verified from a COLD start: my own clone for code/pure-logic, a fresh independent clone on cc-ci
(`/tmp/adv-canon` @ 626badd) for the unit suite, and a cold `ssh cc-ci` for live state. I did NOT
read JOURNAL-canon.md before forming this verdict. Every M1 sub-claim re-derived against the plan,
not the Builder's narrative.
**M1.1 tagged-promote gate (§2.A) — PASS.**
- Code: `should_promote_canonical` returns `is_enrolled and overall==0 and not quick and not ref and
tagged`; caller computes `tagged = is_released_version(recipe, head_version)`; `promote_canonical`
now records the TESTED `head_version` (commit d4cc9e4), not a re-derived `latest_version`. My prior
PROBE (head_version-vs-latest_version divergence on a manual `RECIPE=<r>` run) is CLOSED by d4cc9e4
— read the diff, it promotes exactly the tested version.
- Unit: ran `tests/unit/test_promote.py` myself in the fresh cc-ci clone — all 6 pass, each gate
clause individually exercised (`test_no_promote_when_untagged` asserts `tagged=False → False`;
all-conditions asserts `tagged=True → True`). Not hollow.
- Live PROMOTE: re-derived `git rev-list -n1 1.13.0+1.31.1` = `df2e27339f983a25da548fc8b8d56e9af8645f83`
and `/var/lib/ci-warm/custom-html/canonical.json` records EXACTLY that commit + version
`1.13.0+1.31.1`, status idle, retained volume `warm-custom-html_..._content` present. So the promote
recorded the tag's own commit (correcting samever's earlier `2b82eba` merge-commit record) — the
divergence fix is live-proven, not just unit-tested.
- Live UNTAGGED → NO PROMOTE: independently confirmed `1.13.1+1.31.1` is `NOT-A-TAG` in the custom-html
clone → `is_released_version` returns False → gate blocks. canonical.json is unchanged (still
df2e273). The full live tagged-vs-untagged e2e is M2.4; at M1 the code + unit + live-not-a-tag +
unchanged-canonical chain is sufficient.
**M1.2 release-tag trigger + faithful mirror-sync (§2.C/§2.D) — PASS.**
- `sweep_decision` re-derived directly (no pytest) — truth table exactly right and VERSION-keyed, not
commit-keyed: new>canon→run; equal→skip no-new-version; older→skip; no tag→skip never-released; no
canon→run(seed). The function takes only (latest_tag, canon_version) — it CANNOT see commits, so new
untagged commits on `main` can never trigger a run. That IS the operator's refinement.
- `scripts/recipe-mirror-sync.sh` read in full: pins an explicit coopcloud `upstream` remote, force-
syncs mirror `main := upstream/main` + all tags, pushes NOTHING of our own. PR close is gated on
`git merge-tree --write-tree NEW_MAIN_SHA <pr-head>` == upstream `MAIN_TREE` (i.e. the PR's merge is
a no-op because it's already in upstream) → close; otherwise "left as-is". Faithful, never merges,
never disturbs unrelated PRs.
- `nightly_sweep.sweep()` wiring read: per enrolled recipe `mirror_sync → fetch_recipe →
sweep_decision → run_on_tag` (checkout the release tag + `CCCI_SKIP_FETCH=1` so head IS the tag →
tagged-gate passes; REF popped → cold → promote allowed). Pure script.
**M1.3 all recipes enrolled (§2.B) — PASS.** My `grep -rl 'WARM_CANONICAL = True'` set is EXACTLY the
21 `used-recipes.md` rows (incl. `uptime-kuma`, the lone `external` row — correctly enrolled for
CI/canonical even though excluded from weekly upgrade). Fixtures (`custom-html-*-bad`, `concurrency`,
`regression`) NOT enrolled.
**M1.4 hollow-sweep fix — PASS (code; live is M2.1).** `nix/modules/nightly-sweep.nix` exports
`CCCI_REPO=/etc/cc-ci`, `cd`s there, and execs `$CCCI_REPO/runner/nightly_sweep.py` — the checkout WITH
`tests/`, replacing the store copy whose missing `tests/` caused `enrolled_recipes()=[]`. Root cause
correctly addressed in code. ⚠️ CARRIED TO M2: `/etc/cc-ci` is currently STALE — `git -C /etc/cc-ci`
HEAD is `e60415d` (Phase-3 era), canon code NOT yet there. M2.1 deploy MUST `git -C /etc/cc-ci pull`
before `nixos-rebuild`, else the deployed timer stays hollow. I will verify the pull + a real fire at
M2.5.
**M1.5 weekly timer (§2.F) — PASS (code).** `OnCalendar = "Sun *-*-* 03:00:00"`, `Persistent = true`.
Deployed-timer schedule verified at M2.
**Guardrail NO-AI-at-runtime — PASS.** grep of `nightly_sweep.py` / `warm_reconcile.py` /
`recipe-mirror-sync.sh` for anthropic|claude|openai|llm|gpt|ai_ → only one code COMMENT match, zero
calls. Pure script + systemd timer.
**Full unit suite — PASS.** Ran `cc-ci-run -m pytest tests/unit/` in the fresh independent cc-ci clone
@ 626badd → **295 passed in 5.60s**, matching the claim. Enrolling 21 recipes broke nothing.
**Minor narrative note (not a defect):** the claim cites proof-A ts `065027Z` but live canonical ts is
`065532Z`; promoting the same tag again yields the same version+commit (only ts moves), so this is a
benign re-run, not a divergence — the recorded version/commit are correct either way.
**Verdict: M1 PASS.** No VETO. All M1 DoD items cold-verified; the deployed-state items (M1.4 live,
M1.5 timer schedule) are honestly scoped by the Builder to M2 and I will hold them there. (Consulted
JOURNAL-canon.md only AFTER writing this verdict: no surprises — confirms the proof-A/C sequence.)
---
## Pre-claim observation @ 2026-06-17T07:23Z — M2.1 deploy verified live (NOT a gate verdict)
Builder inbox: M1 PASS consumed; M2.1 deploy done; M2.2 full sweep started (long, serial, hours).
M2 NOT yet claimed — no formal verdict here, just an opportunistic READ-ONLY check that resolves my
two carried-to-M2 code-only probes (favorable; I'll still re-verify the live proofs at the M2 claim):
- **/etc/cc-ci now at `3bdd5d1`** (current main; was stale `e60415d` Phase-3 era), with `tests/` +
`runner/nightly_sweep.py` present → the deploy DID `git -C /etc/cc-ci pull`. My M1.4 "deploy must
pull or stays hollow" risk is cleared.
- **Deployed timer:** `systemctl cat nightly-sweep.timer` → `OnCalendar=Sun *-*-* 03:00:00`,
`Persistent=true` (weekly, live). M1.5 deployed-schedule probe cleared.
- **Deployed code path is the non-hollow one:** the in-flight sweep (PID 1620630) runs
`nightly_sweep.sweep()` from `/etc/cc-ci/runner`, and `run_recipe_ci.py` runs from
`/etc/cc-ci/runner/` — i.e. the checkout WITH `tests/`, not the store copy. Root cause fixed live.
STILL OWED at the M2 claim (I will cold-verify, not trust the sweep log): canonicals actually promoted
for greens / reds left intact / no-new-tag skipped (M2.2); run-twice→skip-all (M2.3); live tagged-vs-
untagged (M2.4); real timer fire advances canonicals via full main() incl. roll (M2.5); samever never
fires in-sweep (M2.6); disk budget recorded (M2.7); §2.G UPGRADE_BASE_VERSION retirement (M2.8).
Staying read-only while the sweep is in flight (single node).
---
## Pre-claim finding @ 2026-06-17T08:40Z — M2.2 sweep: PASS-labelled but promotes mostly FAILING (evidence captured)
NOT a verdict (M2 unclaimed). Read-only capture from `/root/canon-verify/_sweep.log` so the evidence
survives log growth. Per-recipe promote outcomes observed (alphabetical sweep, ~7 recipes deep):
- bluesky-pds: cold rc=0; `WC5 promote failed: abra app deploy warm-bluesky-pds… failed (1)` → NO canonical; logged `PASS (promoted)`.
- cryptpad: cold rc=0; `canonical cryptpad advanced to known-good 0.6.0+v2026.5.1` → canonical WRITTEN. ✓ (the only real promote so far)
- custom-html: SKIP no-new-version (pre-existing canonical). ✓ expected.
- custom-html-tiny: cold rc=0; `WC5 promote failed: warm-custom-html-tiny… not healthy over HTTPS / (404)` → NO canonical; logged `PASS (promoted)`.
- discourse: cold rc=142 (deploy timeout — the 51m wedge I flagged) → `FAIL (canonical unchanged)`. Legit red.
- drone: cold rc=0; `WC5 promote failed: …warm-drone… timed out after 600 seconds` → NO canonical; logged `PASS (promoted)`.
- ghost: cold rc=0; `WC5 promote failed: abra app new ghost… failed (1)` → NO canonical; logged `PASS (promoted)`.
- gitea: promote in progress at capture.
Live `/var/lib/ci-warm/*/canonical.json` = {cryptpad, custom-html} only. NET NEW this sweep = 1 (cryptpad).
Leftover warm volumes w/ NO registry record: drone, gitea, custom-html-tiny (partial-promote residue).
**DEFECT-1 [adversary] (results-label):** `nightly_sweep.sweep()` line ~119 sets
`results[r] = "PASS (promoted)" if rc==0 else "FAIL …"`. Because `promote_canonical` is non-fatal
(swallows its own exception so it "never fails a green run"), a FAILED promote still yields rc=0 →
the summary asserts "PASS (promoted)" when NO canonical was written. The per-recipe results log — the
DoD's evidence that "canonicals actually promoted for the green recipes" — is therefore UNTRUSTWORTHY.
Repro: `grep "WC5 promote failed" _sweep.log` vs `grep "PASS (promoted)" _sweep.log` — failed promotes
appear in BOTH. Fix direction: label from "does a canonical record now exist at the tested version",
not from rc.
**DEFECT-2 [adversary] (promote path failing broadly):** 4 of 5 completed promotes FAILED across 4
modes (warm `app deploy` failed(1) / timed-out 600s / unhealthy-404 / `app new` failed(1)). Cold CI is
green for each, so this is specifically the WARM-CANONICAL promote deploy failing — the exact
end-to-end step this phase exists to make real. Root cause TBD (node contention on the long serial
run / unclean cold-test teardown / discourse residue / flat 600s warm timeout) — Builder's to diagnose.
**Determinism risk (M2.3):** every recipe left without a canonical (bluesky-pds, custom-html-tiny,
drone, ghost, discourse…) will `sweep_decision(latest, None) → run` on a second sweep, NOT skip — so
run-twice ≠ skip-all until promotes actually succeed. I will hard-test this at the M2 claim.
Sent the Builder a BUILDER-INBOX heads-up (ba28a88). When M2 is claimed I will cold-verify, per recipe,
that a canonical record exists at the tested tag version (not trust the PASS label), and re-run the
determinism no-op myself. If promotes are still failing / mislabelled, M2 FAILs.
## Pre-claim note @ 2026-06-17T09:11Z — fix f94de22 validated by Builder; M2 re-run in flight (NOT a verdict)
Consumed ADVERSARY-INBOX (Builder ~09:10Z): DEFECT-1/DEFECT-2 fix validated live — custom-html-tiny
PROMOTED (1.2.0+2.43.0, was 404) and ghost PROMOTED (1.4.0+6.45.0-alpine, was app-new dirty-tree FATA);
label now derives from "canonical record exists at tested version". 7 canonicals claimed (cryptpad,
custom-html, custom-html-tiny, ghost, gitea, hedgedoc, immich). Full sweep re-run in flight. M2 unclaimed.
Staying read-only off the node (sweep in flight, single node).
**bluesky-pds "documented RED" — must scrutinise at M2 claim, two ways it could be wrong:**
1. The conservative direction is CORRECT per guardrail (no force-promote; prior known-good kept). But I
must confirm bluesky has NO stale/partial canonical written, and that it is recorded as an exception
in DECISIONS (plan §2.B: "don't silently skip" / §4 "documented exception"), not just left silent.
2. **The real risk:** Builder says warm health fails because traefik doesn't route the WARM domain
(`warm-bluesky-pds…` → 000) though internal localhost:3000 = 200, and "cold domain worked." I must
verify this is genuinely bluesky-SPECIFIC and not a warm-canonical-deploy machinery defect (warm
domain label/overlay/router rule) that could equally hit other recipes — if the warm-domain routing
is systemically flaky, a recipe could intermittently fail to promote (or, worse, a health probe could
pass spuriously). At claim I will: (a) confirm OTHER promoted recipes (custom-html-tiny, ghost, immich)
actually answered 200 over HTTPS on THEIR warm domains during promote (grep ready-probe lines), and
(b) independently curl a couple of the live warm canonical domains. If warm-domain routing is broadly
unreliable, the promote evidence is suspect and M2 is not done.
## Pre-claim observation @ 2026-06-17T09:34Z — read-only sweep-progress peek (NOT a verdict)
Sweep re-run still in flight (proc 1712141 from `/etc/cc-ci/runner`); 7 canonicals on disk. Captured
from `_sweep.log` so it survives log growth:
- **DEFECT-1 fix is LIVE and honest:** `sweep: bluesky-pds rc=0 (GREEN-BUT-PROMOTE-FAILED
(canonical=none, expected 0.3.0+v0.4.219))` — the label no longer claims `PASS (promoted)` on a
failed promote. Favorable; I will still confirm the label matches the on-disk registry per recipe at
claim before closing DEFECT-1.
- `cryptpad / custom-html / custom-html-tiny` → `SKIP no-new-version` (latest tag == canonical). The
skip path works for promoted recipes.
- `discourse rc=143 → FAIL (red; canonical unchanged)` — legit red (timeout/SIGTERM), canonical kept.
- **NEW — `sweep: mirror-sync drone rc=128 (non-fatal — continuing)`:** drone's faithful mirror-sync
FAILED (git rc=128) yet the sweep proceeded to RUN drone against the un-synced mirror. SCRUTINISE at
claim: plan §2.C requires the mirror be reconciled to upstream FIRST; a swallowed sync failure means
the recipe may be tested against a stale mirror (wrong tags/version) — the trigger (D) and tagged
promote then rest on un-synced state. Is rc=128 a benign "already up to date / no upstream" case or a
real sync failure? Must check what drone's sync hit and whether the tested tag is genuinely upstream's.
- **DETERMINISM (M2.3) — central risk crystallising:** bluesky-pds (promote-failed) and discourse (red)
both end `canonical=none`, so a 2nd sweep → `sweep_decision(latest, None) → RUN`, NOT skip. Plan M2.3
literally requires run-twice → "SKIPS every recipe." That can hold ONLY if every enrolled recipe
actually promoted. Red/promote-failed recipes legitimately re-run (no known-good to protect) — which
is arguably correct behaviour but is NOT "skip every recipe." At the M2 claim I will require the
Builder's determinism evidence to honestly reconcile this with §3/§5: either (i) every recipe promotes
so run-twice is a true no-op, or (ii) a reasoned, plan-consistent argument that the no-op property
applies to the promoted set and red recipes correctly retry — and I'll judge it against the plan, not
accept a partial skip-all relabelled as success.
## Pre-claim observation @ 2026-06-17T10:20Z — TWO concurrent sweeps (transient process state, captured)
Read-only `ps` on cc-ci caught a non-serial condition while M2 is mid-development (NOT a verdict; M2
unclaimed):
- PID **1712141** = OLD sweep (started 09:10:40, code f94de22) — WEDGED: child PID 1720589
(`run_recipe_ci.py`, started 09:33:58, alive ~46 min) is the drone cold-dep self-deadlock the
lock-release fix (655a999) addresses. The old sweep process is still ALIVE, holding cold-test locks.
- PID **1736506** = NEW sweep (started 10:16:27, code 655a999), already cold-testing recipe 1.
So at 10:20Z two `nightly_sweep.sweep()` ran simultaneously. This violates §4 SERIAL and, more
pointedly, **invalidates the documented precondition of `release_app_locks()`** ("serial sweep → no
concurrent run relies on these locks") — the wedged old run still holds drone/gitea locks, so the two
can collide. **Any M2 promote/determinism/log evidence from a sweep that overlapped the wedged one is
non-serial and I will not accept it.** Canonical count is 8 (drone now promoted → lock-release fix
works), so the fix itself is good; the issue is the leftover concurrent process. Sent BUILDER-INBOX
asking the Builder to kill the wedged old sweep, confirm a clean single serial run, and regenerate M2
evidence. **SCRUTINY CARRIED TO CLAIM:** confirm the claimed M2 sweep ran with exactly ONE sweep
process and no overlap (check run start time vs old-sweep kill time); and verify `release_app_locks()`
cannot free a lock still guarding a live app under any interleaving the in-flight guard permits.
**Update @ 10:24Z:** Builder consumed the alert and acted correctly — SIGKILLed both sweeps + the
wedged drone child, cleared stale `/run/lock/cc-ci-app-*.lock`, confirmed no leftover warm-*/dep stacks,
**discarded drone's concurrency-tainted canonical** (promoted by a standalone validation at 10:06:45
that overlapped the wedged old sweep), kept the 7 single-run canonicals, and relaunched ONE clean serial
sweep (pid 1741209, code 655a999) as the M2.2 evidence run. Concurrency window was ~10:0610:24 (old
sweep 1712141 alive 09:10→killed 10:24). **CARRIED TO CLAIM:** independently confirm each of the 7 kept
canonicals (cryptpad, custom-html, custom-html-tiny, ghost, gitea, hedgedoc, immich) has a ts OUTSIDE
the concurrency window and was produced single-run — do NOT take the Builder's accounting on faith;
check `canonical.json` ts per recipe vs the 09:1010:24 overlap. And confirm the claimed sweep (1741209)
ran start→finish with no second sweep process alive.
## Pre-claim observation @ 2026-06-17T10:47Z — clean serial sweep progress (NOT a verdict)
ONE sweep proc confirmed (serial intact). Transient `_sweep.log` lines captured before rotation:
- **CONCERN — `drone rc=0 GREEN-BUT-PROMOTE-FAILED (canonical=none, expected 1.9.0+2.26.0)` in the
CLEAN serial run.** Drone promoted under the discarded tainted validation but FAILS to promote
clean-serial — and it no longer hangs (returns cleanly), so the lock-release fix (655a999) cured the
46-min deadlock but drone's warm promote still fails for a DIFFERENT reason (likely warm gitea-dep
provisioning or warm deploy/health). Net: the lock fix is necessary-but-not-sufficient for drone;
drone will lack a canonical → hits both promote-evidence and determinism (run-twice) at the claim.
Builder will see it in their own running log; their diagnose. I'll require drone to either promote
clean or be a recorded DECISIONS exception (like bluesky) at claim — a silent no-canonical is not OK.
- **FAVORABLE — `gitea RUN — new release 3.6.0+1.24.2-rootless > canonical 3.5.3+1.24.2-rootless;
cold-testing tagged release 3.6.0…`** — a LIVE instance of the new-release-tag trigger advancing an
existing canonical (older→newer TAGGED), i.e. exactly the M2.6 samever-orthogonality path (2):
canonical(older)→new tagged, real delta, promote-if-green. If gitea promotes to 3.6.0 this is strong
M2.6 evidence (no constructed scenario needed). VERIFY AT CLAIM: gitea's canonical advances 3.5.3→3.6.0
with the new tag's own commit, and samever's same-version step-back NEVER fired in the run (the tag
trigger guarantees vX→vY, Y>X, so no vX→vX). Watch that gitea actually promotes (not GREEN-BUT-FAILED).
- SKIPs (cryptpad/custom-html/custom-html-tiny/ghost = no-new-version) and discourse rc=143 red:
consistent with prior runs.
## Pre-claim note @ 2026-06-17T10:59Z — two more Builder fixes; M2-evidence-sweep recency criterion
Builder landed ca89d44 (promote clears stale warm-stack on FRESH SEED only — fixes the failed-promote
secret residue, e.g. drone's gitea `client_secret_v1` blocking `abra app secret insert` on retry;
correctly does NOT teardown when a canonical exists → retained volume safe) and d072d7e (de-enroll
keycloak — structural collision with the live-warm OIDC provider on `warm-keycloak.ci...`; thorough
DECISIONS entry; enrolled now 20 + 1 documented exception). Both reasonable. The residue fix is the
likely root cause of the clean-serial drone promote-fail I flagged.
**M2-EVIDENCE RECENCY CRITERION (new, checkable):** the in-flight sweep pid 1741209 launched ~10:16 —
BEFORE ca89d44 (10:51) and d072d7e (10:54) — so its parent-process enrolled set still includes keycloak
and its sweep logic predates the residue fix (only per-recipe run_recipe_ci.py picks up new code if
/etc/cc-ci is pulled mid-run; nightly_sweep.sweep()'s enrolled list + decisioning is fixed at launch).
Therefore the authoritative M2.2 sweep I accept MUST be one launched with /etc/cc-ci at a HEAD that
contains BOTH fixes, enrolled=20 (keycloak absent), single serial proc. At claim: check the evidence
sweep's launch time vs these commit times, and confirm drone now PROMOTES (residue fix) or is a recorded
exception. Also verify ca89d44's fresh-seed teardown can't nuke a shared/retained volume (guarded by
`if not read_registry(recipe)` — only when no canonical exists, so nothing known-good to lose; confirm).