note(canon): pre-claim finding — sweep PASS-label vs actual promote failures (4/5), determinism risk; evidence captured for M2 verification

2026-06-17 08:40:41 +00:00
parent ba28a8897a
commit d933585e92
1 changed files with 40 additions and 0 deletions
--- a/machine-docs/REVIEW-canon.md
+++ b/machine-docs/REVIEW-canon.md
@@ -171,3 +171,43 @@ for greens / reds left intact / no-new-tag skipped (M2.2); run-twice→skip-all
 untagged (M2.4); real timer fire advances canonicals via full main() incl. roll (M2.5); samever never
 fires in-sweep (M2.6); disk budget recorded (M2.7); §2.G UPGRADE_BASE_VERSION retirement (M2.8).
 Staying read-only while the sweep is in flight (single node).
+
+---
+
+## Pre-claim finding @ 2026-06-17T08:40Z — M2.2 sweep: PASS-labelled but promotes mostly FAILING (evidence captured)
+
+NOT a verdict (M2 unclaimed). Read-only capture from `/root/canon-verify/_sweep.log` so the evidence
+survives log growth. Per-recipe promote outcomes observed (alphabetical sweep, ~7 recipes deep):
+- bluesky-pds: cold rc=0; `WC5 promote failed: abra app deploy warm-bluesky-pds… failed (1)` → NO canonical; logged `PASS (promoted)`.
+- cryptpad: cold rc=0; `canonical cryptpad advanced to known-good 0.6.0+v2026.5.1` → canonical WRITTEN. ✓ (the only real promote so far)
+- custom-html: SKIP no-new-version (pre-existing canonical). ✓ expected.
+- custom-html-tiny: cold rc=0; `WC5 promote failed: warm-custom-html-tiny… not healthy over HTTPS / (404)` → NO canonical; logged `PASS (promoted)`.
+- discourse: cold rc=142 (deploy timeout — the 51m wedge I flagged) → `FAIL (canonical unchanged)`. Legit red.
+- drone: cold rc=0; `WC5 promote failed: …warm-drone… timed out after 600 seconds` → NO canonical; logged `PASS (promoted)`.
+- ghost: cold rc=0; `WC5 promote failed: abra app new ghost… failed (1)` → NO canonical; logged `PASS (promoted)`.
+- gitea: promote in progress at capture.
+Live `/var/lib/ci-warm/*/canonical.json` = {cryptpad, custom-html} only. NET NEW this sweep = 1 (cryptpad).
+Leftover warm volumes w/ NO registry record: drone, gitea, custom-html-tiny (partial-promote residue).
+
+**DEFECT-1 [adversary] (results-label):** `nightly_sweep.sweep()` line ~119 sets
+`results[r] = "PASS (promoted)" if rc==0 else "FAIL …"`. Because `promote_canonical` is non-fatal
+(swallows its own exception so it "never fails a green run"), a FAILED promote still yields rc=0 →
+the summary asserts "PASS (promoted)" when NO canonical was written. The per-recipe results log — the
+DoD's evidence that "canonicals actually promoted for the green recipes" — is therefore UNTRUSTWORTHY.
+Repro: compare `grep "WC5 promote failed" _sweep.log` with `grep "PASS (promoted)" _sweep.log`: the
+recipes whose promote failed appear in BOTH outputs. Fix direction: label from "does a canonical
+record now exist at the tested version", not from rc.
+
+**DEFECT-2 [adversary] (promote path failing broadly):** 4 of 5 completed promotes FAILED across 4
+modes (warm `app deploy` failed(1) / timed-out 600s / unhealthy-404 / `app new` failed(1)). Cold CI is
+green for each, so this is specifically the WARM-CANONICAL promote deploy failing — the exact
+end-to-end step this phase exists to make real. Root cause TBD (node contention on the long serial
+run / unclean cold-test teardown / discourse residue / flat 600s warm timeout) — Builder's to diagnose.
+
+**Determinism risk (M2.3):** every recipe left without a canonical (bluesky-pds, custom-html-tiny,
+drone, ghost, discourse…) will `sweep_decision(latest, None) → run` on a second sweep, NOT skip — so
+run-twice ≠ skip-all until promotes actually succeed. I will hard-test this at the M2 claim.
+
+Sent the Builder a BUILDER-INBOX heads-up (ba28a88). When M2 is claimed I will cold-verify, per recipe,
+that a canonical record exists at the tested tag version (not trust the PASS label), and re-run the
+determinism no-op myself. If promotes are still failing / mislabelled, M2 FAILs.
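
A minimal sketch of the per-recipe check described above: derive the label from whether a canonical record exists at the tested version rather than from rc, and list which recipes a second sweep would re-run. This is not the code in `nightly_sweep`. The `/var/lib/ci-warm/<recipe>/canonical.json` path comes from the finding, but the `"version"` key inside it, the helper names (`canonical_version`, `label_result`, `recipes_that_would_rerun`), and the third label string are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional

# Path pattern taken from the finding; the JSON schema ("version" key) is ASSUMED.
WARM_ROOT = Path("/var/lib/ci-warm")


def canonical_version(recipe: str, root: Path = WARM_ROOT) -> Optional[str]:
    """Return the version stored in <root>/<recipe>/canonical.json, or None."""
    path = root / recipe / "canonical.json"
    try:
        record = json.loads(path.read_text())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed record all count as "no canonical".
        return None
    if not isinstance(record, dict):
        return None
    version = record.get("version")  # hypothetical field name
    return version if isinstance(version, str) else None


def label_result(recipe: str, rc: int, tested_version: str,
                 root: Path = WARM_ROOT) -> str:
    """Label a sweep result from the on-disk canonical record, not from rc alone."""
    if rc != 0:
        return "FAIL (canonical unchanged)"
    if canonical_version(recipe, root) == tested_version:
        return "PASS (promoted)"
    # Cold run was green but no canonical exists at the tested version.
    return "PASS (promote FAILED, no canonical at tested version)"


def recipes_that_would_rerun(latest: Dict[str, str],
                             root: Path = WARM_ROOT) -> List[str]:
    """Recipes whose canonical does not match the latest version.

    Stand-in for the run/skip decision: a missing canonical means "run".
    """
    return sorted(r for r, v in latest.items()
                  if canonical_version(r, root) != v)
```

Under these assumptions, `label_result("cryptpad", 0, "0.6.0+v2026.5.1")` reports `PASS (promoted)` only if the record on disk carries that version, and an empty list from `recipes_that_would_rerun` is the run-twice skip-all condition.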