note(canon): pre-claim finding — sweep PASS-label vs actual promote failures (4/5), determinism risk; evidence captured for M2 verification
All checks were successful
continuous-integration/drone/push Build is passing
@@ -171,3 +171,43 @@ for greens / reds left intact / no-new-tag skipped (M2.2); run-twice→skip-all
untagged (M2.4); real timer fire advances canonicals via full main() incl. roll (M2.5); samever never
fires in-sweep (M2.6); disk budget recorded (M2.7); §2.G UPGRADE_BASE_VERSION retirement (M2.8).
Staying read-only while the sweep is in flight (single node).

---

## Pre-claim finding @ 2026-06-17T08:40Z: M2.2 sweep PASS-labelled but promotes mostly FAILING (evidence captured)

NOT a verdict (M2 unclaimed). Read-only capture from `/root/canon-verify/_sweep.log` so the evidence
survives log growth. Per-recipe promote outcomes observed (alphabetical sweep, ~7 recipes deep):
- bluesky-pds: cold rc=0; `WC5 promote failed: abra app deploy warm-bluesky-pds… failed (1)` → NO canonical; logged `PASS (promoted)`.
- cryptpad: cold rc=0; `canonical cryptpad advanced to known-good 0.6.0+v2026.5.1` → canonical WRITTEN. ✓ (the only real promote so far)
- custom-html: SKIP no-new-version (pre-existing canonical). ✓ expected.
- custom-html-tiny: cold rc=0; `WC5 promote failed: warm-custom-html-tiny… not healthy over HTTPS / (404)` → NO canonical; logged `PASS (promoted)`.
- discourse: cold rc=142 (deploy timeout, the 51m wedge I flagged) → `FAIL (canonical unchanged)`. Legit red.
- drone: cold rc=0; `WC5 promote failed: …warm-drone… timed out after 600 seconds` → NO canonical; logged `PASS (promoted)`.
- ghost: cold rc=0; `WC5 promote failed: abra app new ghost… failed (1)` → NO canonical; logged `PASS (promoted)`.
- gitea: promote in progress at capture.

Live `/var/lib/ci-warm/*/canonical.json` = {cryptpad, custom-html} only. NET NEW this sweep = 1 (cryptpad).
Leftover warm volumes w/ NO registry record: drone, gitea, custom-html-tiny (partial-promote residue).

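The tally behind these outcomes can be reproduced mechanically. A minimal sketch, assuming the bullets are transcribed by hand into two small tables; the names `COMPLETED_PROMOTES`, `LOGGED_PASS_PROMOTED` and `tally` are illustrative and not part of `nightly_sweep`:

```python
# Completed promote attempts from the capture: recipe -> was a canonical written?
# gitea (still in progress), custom-html (skipped) and discourse (cold FAIL,
# so no promote outcome) are deliberately left out.
COMPLETED_PROMOTES = {
    "bluesky-pds": False,
    "cryptpad": True,
    "custom-html-tiny": False,
    "drone": False,
    "ghost": False,
}

# Recipes whose summary line was quoted in the capture as "PASS (promoted)".
LOGGED_PASS_PROMOTED = {"bluesky-pds", "custom-html-tiny", "drone", "ghost"}


def tally(promotes, logged_pass):
    """Count failed promotes and list the ones still labelled as promoted."""
    failed = sorted(r for r, ok in promotes.items() if not ok)
    mislabelled = sorted(r for r in failed if r in logged_pass)
    return {"completed": len(promotes), "failed": failed, "mislabelled": mislabelled}


summary = tally(COMPLETED_PROMOTES, LOGGED_PASS_PROMOTED)
print(f"{len(summary['failed'])} of {summary['completed']} completed promotes failed")
print("labelled PASS (promoted) with no canonical:", ", ".join(summary["mislabelled"]))
```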
**DEFECT-1 [adversary] (results-label):** `nightly_sweep.sweep()` line ~119 sets
`results[r] = "PASS (promoted)" if rc==0 else "FAIL …"`. Because `promote_canonical` is non-fatal
(swallows its own exception so it "never fails a green run"), a FAILED promote still yields rc=0 →
the summary asserts "PASS (promoted)" when NO canonical was written. The per-recipe results log (the
DoD's evidence that "canonicals actually promoted for the green recipes") is therefore UNTRUSTWORTHY.
Repro: `grep "WC5 promote failed" _sweep.log` vs `grep "PASS (promoted)" _sweep.log`; failed promotes
appear in BOTH. Fix direction: label from "does a canonical record now exist at the tested version",
not from rc.

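The fix direction can be sketched as follows. This is a hedged illustration, not the Builder's implementation: `read_canonical_version` and `label_result` are hypothetical helpers, the `"version"` key in `canonical.json` is an assumed schema, and the third label string is invented for the example. Only the `<root>/<recipe>/canonical.json` layout and the two existing labels come from the capture.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def read_canonical_version(warm_root: Path, recipe: str) -> Optional[str]:
    """Return the version recorded in <warm_root>/<recipe>/canonical.json, or None.

    ASSUMPTION: the record is a JSON object with a "version" key.
    """
    path = warm_root / recipe / "canonical.json"
    try:
        data = json.loads(path.read_text())
    except (OSError, ValueError):
        return None  # missing or unreadable record counts as "no canonical"
    if not isinstance(data, dict):
        return None
    version = data.get("version")
    return version if isinstance(version, str) else None


def label_result(rc: int, tested_version: str, canonical_version: Optional[str]) -> str:
    """Label from the canonical record on disk, not from rc alone."""
    if rc != 0:
        return "FAIL (canonical unchanged)"
    if canonical_version == tested_version:
        return "PASS (promoted)"
    # Hypothetical label: cold run green, but the promote left no matching record.
    return "PASS (promote FAILED, no canonical at tested version)"


def demo() -> dict:
    """Build a throwaway warm root: cryptpad has a record, drone does not."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "cryptpad").mkdir()
        (root / "cryptpad" / "canonical.json").write_text(
            json.dumps({"version": "0.6.0+v2026.5.1"})
        )
        (root / "drone").mkdir()  # leftover warm dir, no canonical.json
        # The drone version is a placeholder; the capture does not state it.
        tested = {"cryptpad": "0.6.0+v2026.5.1", "drone": "0.0.0-placeholder"}
        return {
            recipe: label_result(0, version, read_canonical_version(root, recipe))
            for recipe, version in tested.items()
        }


if __name__ == "__main__":
    for recipe, label in demo().items():
        print(f"{recipe}: {label}")
```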
**DEFECT-2 [adversary] (promote path failing broadly):** 4 of 5 completed promotes FAILED across 4
modes (warm `app deploy` failed(1) / timed-out 600s / unhealthy-404 / `app new` failed(1)). Cold CI is
green for each, so this is specifically the WARM-CANONICAL promote deploy failing, i.e. the exact
end-to-end step this phase exists to make real. Root cause TBD (node contention on the long serial
run / unclean cold-test teardown / discourse residue / flat 600s warm timeout); Builder's to diagnose.

**Determinism risk (M2.3):** every recipe left without a canonical (bluesky-pds, custom-html-tiny,
drone, ghost, discourse…) will get `sweep_decision(latest, None) → run` on a second sweep, NOT skip, so
run-twice ≠ skip-all until promotes actually succeed. I will hard-test this at the M2 claim.

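The run-twice consequence can be illustrated with a stand-in for the decision helper. The real `sweep_decision` is not shown in this note: only the `(latest, None) → run` case is taken from it, and the "skip when the canonical equals latest" branch, the `second_sweep_plan` helper and the placeholder version strings are assumptions made for the sketch.

```python
from typing import Dict, Optional


def sweep_decision(latest: str, canonical: Optional[str]) -> str:
    """Hypothetical stand-in: run unless a canonical exists at the latest version."""
    if canonical is None or canonical != latest:
        return "run"
    return "skip"


def second_sweep_plan(latest: Dict[str, str], canonicals: Dict[str, str]) -> Dict[str, str]:
    """Decide, per recipe, what a second sweep would do given the records on disk."""
    return {recipe: sweep_decision(version, canonicals.get(recipe))
            for recipe, version in latest.items()}


# cryptpad's version is from the capture; the others are placeholders.
latest = {
    "bluesky-pds": "latest-placeholder",
    "cryptpad": "0.6.0+v2026.5.1",
    "drone": "latest-placeholder",
    "ghost": "latest-placeholder",
}
# Only cryptpad got a canonical written this sweep.
canonicals = {"cryptpad": "0.6.0+v2026.5.1"}

plan = second_sweep_plan(latest, canonicals)
skip_all = all(decision == "skip" for decision in plan.values())
print(plan)
print("run-twice is a no-op:", skip_all)
```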
Sent the Builder a BUILDER-INBOX heads-up (ba28a88). When M2 is claimed I will cold-verify, per recipe,
that a canonical record exists at the tested tag version (not trust the PASS label), and re-run the
determinism no-op myself. If promotes are still failing / mislabelled, M2 FAILs.