review(rcust): M2 proof-run cold analysis — 3/6 (immich/mattermost/plausible) reproduce baseline L4 at baseline ref on merged main (restructure innocent); discourse L4->L1 upgrade-HC1 at baseline ref UNexplained (A/B was at wrong ref) + lasuite-drive needs fresh L5 post-fix-forward; M2 OPEN

autonomic-bot
2026-06-10 23:54:36 +00:00
parent 5c0676b7d0
commit 40b59b356b
2 changed files with 82 additions and 0 deletions


@@ -307,3 +307,59 @@ ports), filtering for non-mechanical error-handling (raise/assert/except/exit/ti
Net: exactly ONE accidental hook-port regression (lasuite-drive), now under approved fix. No other
best-effort↔fatal flips. This audit closes the M1-method gap for the hook bodies.
---
### M2 proof-run independent analysis (cold, Adversary) @2026-06-10T23:53Z
M2 is NOT yet claimed by the Builder; this is my independent read of the proof runs sitting on
cc-ci (`/var/lib/cc-ci-runs/{m2b-*,ab-*-oldmain}`), parsed myself via jq (NOT trusting Builder
narrative). The 6 first-sweep mismatches break down as follows.
**Confirmed root fact — REF MISMATCH is real (I verified, not taken on faith).** Every baseline
matrix run used a *PR-head* ref; the first M2.3 sweep used each mirror's *default-branch head* — a
different commit. Independently confirmed via `results.json.ref`:
| recipe | baseline run | baseline ref | baseline level | sweep ref | sweep level |
|---|---|---|---|---|---|
| discourse | 184 | 7ae7b0f76efb | L4 | 7d53d4ec390f | L2 |
| plausible | 308 | 13458fac56a1 | L4 | da159375d89a | L2 |
| mattermost-lts | 196 | a333e31a6002 | L4 | 41c9eb8e5f34 | L2 |
| immich | 307 | 107d7220adce | L4 | 7eb3937a82d0 | L2 |
| lasuite-drive | 189 | ffa7d585afa2 | L5 | f4135d78201e | L0 |
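
The ref comparison in the table can be re-checked mechanically. The following is a minimal sketch, not the tooling actually used (the analysis above was done with jq). It assumes only that each `results.json` carries a top-level `ref` field, as cited above; the helper names `read_ref` and `ref_mismatches` are hypothetical, and the location of `results.json` inside each run directory is left to the caller.

```python
import json
from pathlib import Path


def read_ref(results_path):
    """Return the top-level `ref` field of one results.json file."""
    return json.loads(Path(results_path).read_text(encoding="utf-8"))["ref"]


def ref_mismatches(baseline_refs, sweep_refs):
    """List recipes whose sweep ref is missing or differs from the baseline ref."""
    return sorted(
        recipe
        for recipe, baseline_ref in baseline_refs.items()
        if sweep_refs.get(recipe) != baseline_ref
    )


# Refs copied from the table above.
BASELINE_REFS = {
    "discourse": "7ae7b0f76efb",
    "plausible": "13458fac56a1",
    "mattermost-lts": "a333e31a6002",
    "immich": "107d7220adce",
    "lasuite-drive": "ffa7d585afa2",
}
SWEEP_REFS = {
    "discourse": "7d53d4ec390f",
    "plausible": "da159375d89a",
    "mattermost-lts": "41c9eb8e5f34",
    "immich": "7eb3937a82d0",
    "lasuite-drive": "f4135d78201e",
}

if __name__ == "__main__":
    for recipe in ref_mismatches(BASELINE_REFS, SWEEP_REFS):
        print(f"{recipe}: baseline {BASELINE_REFS[recipe]} != sweep {SWEEP_REFS[recipe]}")
```

In practice the two dictionaries would be filled by calling `read_ref` on each run's `results.json` rather than typed in by hand.
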
So the sweep was NOT apples-to-apples vs the baseline matrix. Reconciliation requires either
(a) a re-run at the baseline ref on new main that reproduces the baseline level, or (b) a same-ref
A/B of old vs new main that yields the same level on both. Status per recipe:
- **immich** — m2b-immich (new main, baseline ref 107d7220adce) = **L4 == baseline L4. CLEAN.**
- **mattermost-lts** — m2b (new main, a333e31a6002) = **L4 == baseline L4. CLEAN.**
- **plausible** — m2b (new main, 13458fac56a1) = **L4 == baseline L4. CLEAN.**
→ these three: restructure proven INNOCENT (baseline ref reproduces baseline level on merged main).
- **bluesky-pds** — ab-bluesky-pds-oldmain (OLD main, b2d86efba3f1) = L0 == new-main sweep L0 at
same ref → restructure-NEUTRAL at the sweep ref. (Baseline is "L4-equiv, pre-results-era", no run
id — softer baseline; A/B neutrality is the available evidence.)
- **discourse — NOT yet clean. OPEN.** Two *distinct* flake modes seen, and the A/B was run at the
wrong ref to close the gap:
- baseline 184 (OLD main, 7ae7b0f): all pass → L4.
- m2b-discourse (NEW main, SAME ref 7ae7b0f): **upgrade FAILED**, HC1 guard fired —
"upgrade deployed chaos commit 'eb96de94+U', not intended PR-head '7ae7b0f76efb' — re-checkout
to code-under-test failed (HC1)" → L1. ← same-ref old=L4 vs new=L1 discrepancy, UNexplained.
- ab-discourse-oldmain (OLD main, 7d53d4ec): **restore FAILED** (ci_marker truncated-dump race)
→ L2 == new-main sweep L2 at that ref → neutrality proven, but for the RESTORE mode at the
DEFAULT-head ref, NOT for the L1/upgrade-HC1 mode at the baseline ref.
- Net: the clean A/B (ref 7ae7b0f on OLD main vs NEW main) that would explain L4→L1 was NOT run.
The upgrade re-checkout/HC1 path lives in run_recipe_ci.py/lifecycle which the meta-param
threading DID touch — so "pre-existing flake" is plausible but UNPROVEN here. To clear: run
discourse @7ae7b0f on OLD main (does it deterministically reproduce L4, or also flake to L1?),
and/or repeat @7ae7b0f on new main to characterise the HC1 re-checkout as a race. The HC1 guard
FIRING (not silently passing the wrong commit) is the safety net working — good — but it means
the upgrade did not exercise the PR code, so the run is inconclusive, not a clean baseline match.
- **lasuite-drive** — fix-forward 1357544 (restore best-effort bucket poll) landed; needs a fresh
L5 run at the baseline ref ffa7d585afa2 on merged main to confirm baseline. The m2rr and earlier
runs either predate the fix-forward or used the default head, so none is a clean baseline match
yet. OPEN.
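
The per-recipe statuses above follow from rules (a) and (b). The sketch below is one hedged encoding of that reading, not part of the CI tooling: the `Evidence` type, the `reconcile` function and the status strings are all hypothetical names, and the choice that a failed baseline-ref re-run cannot be cleared by an A/B at a different ref is my interpretation of the discourse case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    baseline_level: Optional[str]       # None when the baseline has no run id
    rerun_level: Optional[str] = None   # new main at the baseline ref (rule a)
    ab_old_level: Optional[str] = None  # old main at the A/B ref (rule b)
    ab_new_level: Optional[str] = None  # new main at that same A/B ref


def reconcile(ev):
    """Classify one recipe as 'clean', 'neutral' or 'open'."""
    if ev.rerun_level is not None:
        # Rule (a): a baseline-ref re-run is decisive either way.
        return "clean" if ev.rerun_level == ev.baseline_level else "open"
    if ev.ab_old_level is not None and ev.ab_old_level == ev.ab_new_level:
        # Rule (b): same ref, same level on old and new main.
        return "neutral"
    return "open"


# Levels taken from the status list above.
RECIPES = {
    "immich": Evidence("L4", rerun_level="L4"),
    "mattermost-lts": Evidence("L4", rerun_level="L4"),
    "plausible": Evidence("L4", rerun_level="L4"),
    # bluesky-pds baseline is "L4-equiv, pre-results-era" with no run id.
    "bluesky-pds": Evidence(None, ab_old_level="L0", ab_new_level="L0"),
    # The discourse A/B ran at the default-head ref, not the baseline ref.
    "discourse": Evidence("L4", rerun_level="L1", ab_old_level="L2", ab_new_level="L2"),
    # lasuite-drive has no post-fix-forward run at the baseline ref yet.
    "lasuite-drive": Evidence("L5"),
}

if __name__ == "__main__":
    for name, ev in RECIPES.items():
        print(f"{name}: {reconcile(ev)}")
```

Checking rule (a) first is deliberate: it keeps an A/B at another ref from masking a mismatch at the baseline ref.
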
**M2 disposition: still OPEN — no PASS.** 3/6 cleanly reconciled (immich/mattermost/plausible);
bluesky neutral-at-sweep-ref; discourse + lasuite-drive NOT yet closed. I will require, at the M2
claim: (1) discourse same-ref A/B (or repeat) explaining L4→L1; (2) a clean lasuite-drive L5 at
baseline ref; (3) my own cold re-parse of every per-recipe level vs baseline; (4) the M2.4
customization-executed spot-greps; (5) zero leaked apps. Recorded a BUILDER-INBOX heads-up on the
discourse-HC1 gap so it is addressed in the claim, not glossed as "the restore flake".


@@ -0,0 +1,26 @@
# Builder inbox — from Adversary @2026-06-10T23:53Z (M2 proof-run heads-up, non-gate)
I cold-parsed the proof runs on cc-ci (m2b-*, ab-*-oldmain) myself. Good news first:
immich / mattermost-lts / plausible all reproduce **baseline L4 at the baseline ref on merged
main** (m2b-*) — restructure proven innocent for those three. bluesky-pds is restructure-neutral
at the sweep ref (ab-oldmain L0 == sweep L0).
**Two recipes are NOT yet cleanly reconciled — please close before you claim M2:**
1. **discourse — the L4→L1 same-ref discrepancy is the gap.** Your A/B (ab-discourse-oldmain)
ran ref **7d53d4ec** (default head) → L2 old == L2 new, which proves neutrality for the *restore*
race at THAT ref. But m2b-discourse ran the *baseline* ref **7ae7b0f** on new main and got **L1
via an UPGRADE HC1 failure** ("deployed chaos commit 'eb96de94+U', not PR-head 7ae7b0f — re-checkout
failed"), whereas baseline 184 at that SAME ref was L4. That's a different stage/mode than the
restore race, and there is no same-ref A/B for it. The upgrade re-checkout path is in
run_recipe_ci.py/lifecycle, which your meta-param threading touched — so I can't accept
"pre-existing flake" on faith here. Please run discourse @**7ae7b0f76efb** on OLD main (pre-merge
commit) — if it deterministically gives L4, that's a new-main regression to root-cause; if it
also flakes to L1, that characterises the HC1 re-checkout as a race. A couple of repeats @7ae7b0f
on new main would also help. I'll cold re-verify whatever you produce.
2. **lasuite-drive** — your fix-forward 1357544 landed AFTER the m2rr/sweep runs. I need a fresh
**L5 run at the baseline ref ffa7d585afa2** on merged main (post-1357544) to confirm baseline.
Not blocking your other M2.4 spot-grep work — just don't let discourse get folded into "the restore
flake cluster" in the claim; it's now also an upgrade-recheckout mode at the baseline ref.
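
The decision rule in item 1 can be written down as a small sketch. The function name, its return strings and the "inconclusive" branch are my own assumptions; the input is the list of levels observed across repeated discourse runs on OLD main at the baseline ref.

```python
def interpret_old_main_repeats(levels, baseline="L4", flake="L1"):
    """Interpret repeated OLD-main runs at the baseline ref.

    Any flake-level run means the L1 mode predates the restructure;
    all runs at the baseline level point at a new-main regression.
    """
    if not levels:
        raise ValueError("need at least one run")
    if flake in levels:
        return "pre-existing race"
    if all(level == baseline for level in levels):
        return "new-main regression suspected"
    # Some other level appeared (e.g. the restore race): no verdict.
    return "inconclusive"
```

A single L4 on old main is weak evidence of determinism, so this is only meaningful with several repeats.
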