review(rcust): M2 proof-run cold analysis — 3/6 (immich/mattermost/plausible) reproduce baseline L4 at baseline ref on merged main (restructure innocent); discourse L4->L1 upgrade-HC1 at baseline ref UNexplained (A/B was at wrong ref) + lasuite-drive needs fresh L5 post-fix-forward; M2 OPEN

2026-06-10 23:54:36 +00:00
parent 5c0676b7d0
commit 40b59b356b
2 changed files with 82 additions and 0 deletions
--- a/REVIEW-rcust.md
+++ b/REVIEW-rcust.md
@@ -307,3 +307,59 @@ ports), filtering for non-mechanical error-handling (raise/assert/except/exit/ti

 Net: exactly ONE accidental hook-port regression (lasuite-drive), now under approved fix. No other
 best-effort↔fatal flips. This audit closes the M1-method gap for the hook bodies.
+
+---
+
+### M2 proof-run independent analysis (cold, Adversary) @2026-06-10T23:53Z
+
+M2 is NOT yet claimed by the Builder; this is my independent read of the proof runs sitting on
+cc-ci (`/var/lib/cc-ci-runs/{m2b-*,ab-*-oldmain}`), parsed myself via jq (NOT trusting Builder
+narrative). The 6 first-sweep mismatches break down as follows.
+
+**Confirmed root fact: the REF MISMATCH is real (verified by me, not taken on faith).** Every baseline
+matrix run used a *PR-head* ref; the first M2.3 sweep used each mirror's *default-branch head*, a
+different commit. Confirmed independently via `results.json.ref` (bluesky-pds omitted: no baseline run id):
+| recipe | baseline run/ref/level | sweep ref/level |
+|---|---|---|
+| discourse | 184 / 7ae7b0f76efb / L4 | 7d53d4ec390f / L2 |
+| plausible | 308 / 13458fac56a1 / L4 | da159375d89a / L2 |
+| mattermost-lts | 196 / a333e31a6002 / L4 | 41c9eb8e5f34 / L2 |
+| immich | 307 / 107d7220adce / L4 | 7eb3937a82d0 / L2 |
+| lasuite-drive | 189 / ffa7d585afa2 / L5 | f4135d78201e / L0 |
+So the sweep was NOT apples-to-apples vs the baseline matrix. Reconciliation requires either (a) a
+re-run at the baseline ref on new main that reproduces the baseline level, or (b) a same-ref A/B of
+old main vs new main that yields the same level on both. Status per recipe:
+
+- **immich** — m2b-immich (new main, baseline ref 107d7220adce) = **L4 == baseline L4. CLEAN.**
+- **mattermost-lts** — m2b (new main, a333e31a6002) = **L4 == baseline L4. CLEAN.**
+- **plausible** — m2b (new main, 13458fac56a1) = **L4 == baseline L4. CLEAN.**
+  → these three: restructure proven INNOCENT (baseline ref reproduces baseline level on merged main).
+- **bluesky-pds** — ab-bluesky-pds-oldmain (OLD main, b2d86efba3f1) = L0 == new-main sweep L0 at
+  same ref → restructure-NEUTRAL at the sweep ref. (Baseline is "L4-equiv, pre-results-era", no run
+  id — softer baseline; A/B neutrality is the available evidence.)
+- **discourse — NOT yet clean. OPEN.** Two *distinct* flake modes seen, and the A/B was run at the
+  wrong ref to close the gap:
+  - baseline 184 (OLD main, 7ae7b0f): all pass → L4.
+  - m2b-discourse (NEW main, SAME ref 7ae7b0f): **upgrade FAILED**, HC1 guard fired —
+    "upgrade deployed chaos commit 'eb96de94+U', not intended PR-head '7ae7b0f76efb' — re-checkout
+    to code-under-test failed (HC1)" → L1.  ← same-ref old=L4 vs new=L1 discrepancy, UNexplained.
+  - ab-discourse-oldmain (OLD main, 7d53d4ec): **restore FAILED** (ci_marker truncated-dump race)
+    → L2 == new-main sweep L2 at that ref → neutrality proven, but for the RESTORE mode at the
+    DEFAULT-head ref, NOT for the L1/upgrade-HC1 mode at the baseline ref.
+  - Net: the clean A/B (ref 7ae7b0f on OLD main vs NEW main) that would explain L4→L1 was NOT run.
+    The upgrade re-checkout/HC1 path lives in run_recipe_ci.py/lifecycle which the meta-param
+    threading DID touch — so "pre-existing flake" is plausible but UNPROVEN here. To clear: run
+    discourse @7ae7b0f on OLD main (does it deterministically reproduce L4, or also flake to L1?),
+    and/or repeat @7ae7b0f on new main to characterise the HC1 re-checkout as a race. The HC1 guard
+    FIRING (not silently passing the wrong commit) is the safety net working — good — but it means
+    the upgrade did not exercise the PR code, so the run is inconclusive, not a clean baseline match.
+- **lasuite-drive** — fix-forward 1357544 (restore best-effort bucket poll) landed; needs a fresh
+  L5 run at the baseline ref ffa7d585afa2 on merged main to confirm baseline. The m2rr/earlier runs
+  either predate the fix or used the default head, so none is a clean baseline match yet. OPEN.
+
+**M2 disposition: still OPEN — no PASS.** 3/6 cleanly reconciled (immich/mattermost/plausible);
+bluesky neutral-at-sweep-ref; discourse + lasuite-drive NOT yet closed. I will require, at the M2
+claim: (1) discourse same-ref A/B (or repeat) explaining L4→L1; (2) a clean lasuite-drive L5 at
+baseline ref; (3) my own cold re-parse of every per-recipe level vs baseline; (4) the M2.4
+customization-executed spot-greps; (5) zero leaked apps. Recorded a BUILDER-INBOX heads-up on the
+discourse-HC1 gap so it is addressed in the claim, not glossed as "the restore flake".
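The reconciliation rule applied in the review above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the tooling used on cc-ci: the `Run` type, the `classify` function, and the `"old"`/`"new"` tags are hypothetical names introduced here, and the refs and levels are copied from the table and bullets in the review. Rule (a) is checked first, so a mismatch at the baseline ref keeps a recipe OPEN even when an A/B at a different ref is neutral, which is the discourse situation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Run:
    main: str   # "old" or "new": which harness main ran (hypothetical tag)
    ref: str    # recipe commit under test
    level: str  # reported level, e.g. "L4"


def same_ref(a: str, b: str) -> bool:
    """Treat an abbreviated hash and a full hash as equal when one prefixes the other."""
    return bool(a) and bool(b) and (a.startswith(b) or b.startswith(a))


def classify(baseline: Optional[Run], runs: list[Run]) -> str:
    # Rule (a): new-main runs at the baseline ref must reproduce the baseline level.
    if baseline is not None:
        at_base = [r for r in runs if r.main == "new" and same_ref(r.ref, baseline.ref)]
        if at_base:
            return "CLEAN" if all(r.level == baseline.level for r in at_base) else "OPEN"
    # Rule (b): an old-main and a new-main run at the same ref report the same level.
    for old in (r for r in runs if r.main == "old"):
        for new in (r for r in runs if r.main == "new"):
            if same_ref(old.ref, new.ref) and old.level == new.level:
                return "NEUTRAL"
    return "OPEN"


# Data taken from the review text; bluesky-pds has no baseline run id, hence None.
results = {
    "immich": classify(
        Run("old", "107d7220adce", "L4"),
        [Run("new", "107d7220adce", "L4"), Run("new", "7eb3937a82d0", "L2")],
    ),
    "discourse": classify(
        Run("old", "7ae7b0f76efb", "L4"),
        [Run("new", "7ae7b0f76efb", "L1"),   # m2b-discourse, upgrade HC1
         Run("old", "7d53d4ec390f", "L2"),   # ab-discourse-oldmain
         Run("new", "7d53d4ec390f", "L2")],  # first sweep
    ),
    "bluesky-pds": classify(
        None,
        [Run("old", "b2d86efba3f1", "L0"), Run("new", "b2d86efba3f1", "L0")],
    ),
    "lasuite-drive": classify(
        Run("old", "ffa7d585afa2", "L5"),
        [Run("new", "f4135d78201e", "L0")],
    ),
}
print(results)
```

Checking rule (a) before rule (b) is a deliberate choice in this sketch: it encodes the point that neutrality at the default-head ref does not explain a level change at the baseline ref.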
--- a/machine-docs/BUILDER-INBOX.md
+++ b/machine-docs/BUILDER-INBOX.md
@@ -0,0 +1,26 @@
+# Builder inbox — from Adversary @2026-06-10T23:53Z (M2 proof-run heads-up, non-gate)
+
+I cold-parsed the proof runs on cc-ci (m2b-*, ab-*-oldmain) myself. Good news first:
+immich / mattermost-lts / plausible all reproduce **baseline L4 at the baseline ref on merged
+main** (m2b-*) — restructure proven innocent for those three. bluesky-pds is restructure-neutral
+at the sweep ref (ab-oldmain L0 == sweep L0).
+
+**Two recipes are NOT yet cleanly reconciled — please close before you claim M2:**
+
+1. **discourse — the L4→L1 same-ref discrepancy is the gap.** Your A/B (ab-discourse-oldmain)
+   ran ref **7d53d4ec** (default head) → L2 old == L2 new, which proves neutrality for the *restore*
+   race at THAT ref. But m2b-discourse ran the *baseline* ref **7ae7b0f** on new main and got **L1
+   via an UPGRADE HC1 failure** ("deployed chaos commit 'eb96de94+U', not PR-head 7ae7b0f — re-checkout
+   failed"), whereas baseline 184 at that SAME ref was L4. That's a different stage/mode than the
+   restore race, and there is no same-ref A/B for it. The upgrade re-checkout path is in
+   run_recipe_ci.py/lifecycle, which your meta-param threading touched — so I can't accept
+   "pre-existing flake" on faith here. Please run discourse @**7ae7b0f76efb** on OLD main (pre-merge
+   commit) — if it deterministically gives L4, that's a new-main regression to root-cause; if it
+   also flakes to L1, that characterises the HC1 re-checkout as a race. A couple of repeats @7ae7b0f on
+   new main would also help. I'll cold re-verify whatever you produce.
+
+2. **lasuite-drive** — your fix-forward 1357544 landed AFTER the m2rr/sweep runs. I need a fresh
+   **L5 run at the baseline ref ffa7d585afa2** on merged main (post-1357544) to confirm baseline.
+
+Not blocking your other M2.4 spot-grep work — just don't let discourse get folded into "the restore
+flake cluster" in the claim; it's now also an upgrade-recheckout mode at the baseline ref.
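Since both open items turn on whether a rerun actually used the baseline ref, a small check along the following lines could guard against another wrong-ref A/B. It is a sketch under assumptions: the review only establishes that `results.json` carries a top-level `ref`; the file sitting directly inside the run directory, and the helper names `refs_match` and `ran_at_ref`, are hypothetical. The demo writes a throwaway file rather than touching real run directories.

```python
import json
import tempfile
from pathlib import Path


def refs_match(a: str, b: str) -> bool:
    """True when one non-empty hash is a prefix of the other (abbreviated vs full)."""
    return bool(a) and bool(b) and (a.startswith(b) or b.startswith(a))


def ran_at_ref(run_dir: str, expected_ref: str) -> bool:
    """Compare the top-level `ref` in <run_dir>/results.json with expected_ref.

    ASSUMPTION: results.json sits directly inside the run directory.
    """
    text = (Path(run_dir) / "results.json").read_text(encoding="utf-8")
    data = json.loads(text)
    return refs_match(str(data.get("ref", "")), expected_ref)


# Demo: a temporary directory stands in for a run directory.
with tempfile.TemporaryDirectory() as demo_dir:
    payload = json.dumps({"ref": "7ae7b0f76efb"})
    (Path(demo_dir) / "results.json").write_text(payload, encoding="utf-8")
    at_baseline = ran_at_ref(demo_dir, "7ae7b0f")  # abbreviated baseline ref
    at_default = ran_at_ref(demo_dir, "7d53d4ec")  # default-head ref

print(at_baseline, at_default)
```

Prefix matching is used because the notes mix abbreviated and full hashes for the same commit (for example 7ae7b0f and 7ae7b0f76efb).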