Files
cc-ci/REVIEW-rcust.md

542 lines
40 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# REVIEW-rcust.md — Adversary ledger for the recipe-customization restructure phase
SSOT for this phase: `/srv/cc-ci/cc-ci-plan/recipe-custom-restructure-full-plan.md`.
Gates: **M1** (implementation verified — branch `restructure/recipe-custom`, unit+concurrency+lint
green on cold clone, resolved-customization diff clean for all 21 recipes, adversarial diff review)
and **M2** (merged + real-CI regression sweep matching baseline matrix). DONE requires fresh PASS
for both with no open VETO.
I own this file and the `## Adversary findings` section of BACKLOG-rcust.md only.
---
## Standing watch items (what I will hunt at M1/M2)
- **Coverage loss** (cardinal risk): for every migrated recipe, old loaders' effective customization
values must equal new `meta.load()` values. Throwaway diff script over all 21 recipe dirs; any
delta = finding.
- **Assertion weakening** in `tests/<recipe>/` diffs — migrations must be mechanical only (signatures,
fixture/key renames, underscore prefixes). Any changed assert/expected value = VETO.
- **Deleted-code fallout** — dangling refs to `_recipe_meta`, `_load_meta`, `_recipe_extra_env`,
`_recipe_meta_flag`, `declared_deps`, `is_canonical_enrolled`, `OIDC_AT_INSTALL`,
`CHAOS_BASE_DEPLOY`, `SKIP_GENERIC`, `setup_custom_tests`, `deps_apps`, `deps_creds`, `deployed_app`.
- **Validation gaps** — typo'd key / wrong type / callable-on-data-key must raise MetaError, not pass.
- **R2 fixed end-to-end** — orchestrator load path delivers SCREENSHOT to screenshot.py.
- **HC2 / F2-11 integrity** — repo-local default-deny, requires_deps skip-report, generic floor
semantics all unchanged.
---
## Verdicts
_(no GATE verdict yet — M1 is not claimed. M1 only claims after P1P6 are all on the branch;
Builder has landed P1 (472a68b) + P2 (8cd72fd) and is mid-P3. The interim pre-review below is
front-loaded break-it work on the FROZEN P1/P2 commits — NOT an M1 PASS.)_
### Interim pre-review of frozen P1+P2 (branch @ 8cd72fd) — @2026-06-10, cold from upstream clone
Done as idle-time break-it work while no gate is pending. P1/P2 phase commits won't be rewritten
(Builder adds P3+ on top), so reviewing them now is non-wasted and front-loads M1. Cold clone of
`origin/restructure/recipe-custom` into `/tmp/rcust-verify` from the true upstream remote.
**No defects found so far.** Results:
1. **Deleted-code fallout — CLEAN.** Grepped `runner/ tests/ scripts/` for live refs to every deleted
symbol (`_recipe_meta`, `_load_meta`, `_recipe_extra_env`, `_recipe_meta_flag`, `declared_deps`,
`is_canonical_enrolled`, `OIDC_AT_INSTALL`, `CHAOS_BASE_DEPLOY`, `SKIP_GENERIC`,
`setup_custom_tests`, `deps_apps`, `deps_creds`, `deployed_app`). All hits are comments/docstrings
explaining the deletion, test names, or the intentionally-RETAINED `CCCI_SKIP_GENERIC*` env form
(kept per P2c). Zero live call-sites. `setup_custom_tests.sh` files gone.
2. **All-recipes-load-clean (typo gate) — PASS, independently.** Ran `meta.load()` (pure stdlib) over
all 21 recipe dirs cold via plain python3 (did NOT trust the Builder's test_meta.py). All 21 load;
non-default key sets sane. Every ALL-CAPS key used in any recipe_meta.py is in the 14-key registry.
3. **Coverage-loss diff (CARDINAL check) — ZERO deltas on data keys + hook presence.** Throwaway
harness (`/tmp/diff_meta.py`) reproduces main's six-loader effective resolution (`_load_meta`,
`declared_deps`, `is_enrolled`, `_recipe_extra_env`) from MAIN's recipe_meta files and diffs vs the
BRANCH's `meta.load()` for all 21 recipes. After correcting one harness artifact (EXTRA_ENV default
is `{}` not None), **0/21 recipes show any delta** for HEALTH_PATH/HEALTH_OK/DEPLOY_TIMEOUT/
HTTP_TIMEOUT/BACKUP_CAPABLE/EXPECTED_NA/UPGRADE_BASE_VERSION/DEPS/WARM_CANONICAL + presence of
READY_PROBE/BACKUP_VERIFY/UPGRADE_EXTRA_ENV/EXTRA_ENV/SCREENSHOT.
4. **Validation gaps — CLOSED.** Crafted tmp recipe_metas: typo'd key → MetaError (with "did you mean
DEPLOY_TIMEOUT?"); wrong type (`DEPLOY_TIMEOUT="str"`) → MetaError; callable on data key
(`DEPLOY_TIMEOUT=lambda ctx:...`) → MetaError; `_PRIVATE`/lowercase-helper → loads clean (exemption
works). All four behave per the locked decision.
5. **meta.py read** — single `exec()`, frozen `RecipeMeta` generated from `KEYS`, `_coerce` rejects
bool-as-int and callable-on-data-key; `non_default` compares vs registry default. No issues.
**Still UNVERIFIED for M1 (do NOT treat above as M1 PASS):** full `pytest tests/unit -q` +
`pytest tests/concurrency -q` + `scripts/lint.sh` cold on the cc-ci host; R2 end-to-end through the
real orchestrator screenshot path; P3 ctx-hook signature migration (assert byte-identical, legacy
`lambda domain:` raises clear MetaError); P4/P5/P6; re-run the coverage diff on the FINAL branch
(P3 changes hook signatures); recipe-test diffs are mechanical-only (no assertion weakening);
HC2/F2-11/generic-floor integrity. These wait for the `claim(rcust): M1`.
### Interim pre-review of frozen P3 (branch @ fd02d9f) — @2026-06-10, cold from upstream clone
Builder landed P3 (uniform ctx hook convention) and moved to P4, so P3 is frozen. Pre-reviewed it.
**No defects found.**
1. **Mechanical-migration discipline — HELD (no VETO trigger).** `git diff 8cd72fd..fd02d9f` over
`tests/*/` shows ZERO changed assert/expected literals. Every hook change is purely
`def HOOK(domain[, meta])``def HOOK(ctx)` + `domain``ctx.domain` in the body. Spot-checked
cryptpad/mumble/ghost/lasuite-drive recipe_meta.py + lasuite-drive ops.py: seeded values, return
dicts, paths, status codes, and the `pre_restore` `assert _psql(...) in (...)` are byte-identical
apart from the `ctx.` deref.
2. **HookCtx — present + complete.** `meta.HookCtx` frozen dataclass has all 5 documented fields
(`.domain`, `.base_url`, `.meta`, `.deps`, `.op`); `meta.hook_ctx(domain, meta, op=…)` factory
builds it and pulls `deps` from `$CCCI_DEPS_FILE`. All call sites migrated: run_recipe_ci
`pre_<op>`, BACKUP_VERIFY; lifecycle `extra_env` + READY_PROBE; screenshot `SCREENSHOT(page, ctx)`.
(NB my first pass falsely flagged "no HookCtx" — that was a STALE WORKTREE at P2; corrected by
checking out fd02d9f. Logged here for honesty.)
3. **Legacy-signature guard (P3.4) — PRESENT + works, live-probed.** `meta.check_hook_signature`
exact-matches positional params and raises a CLEAR MetaError naming the P3 migration + HookCtx
fields. Wired into both `load()` (recipe_meta hooks; SCREENSHOT expects `(page, ctx)`, rest
`(ctx)`) and the orchestrator (ops.py `pre_<op>`). Crafted tmp metas: legacy `READY_PROBE(domain)`,
`SCREENSHOT(page, domain, meta)`, `EXTRA_ENV(domain)` all → MetaError at load; `READY_PROBE(ctx)`
loads clean. No silent mid-run TypeError path.
4. **Coverage diff re-run at P3 head — still 0/21 deltas** (hook presence + all data keys unchanged).
Net: P1+P2+P3 all clean under cold adversarial probing. M1 still gated on full unit+concurrency+lint
on the cc-ci host, P4P6, R2 end-to-end via the real screenshot orchestrator path, and a final
coverage re-diff. No findings filed; no VETO.
### Interim pre-review of frozen P4 (branch @ 29a28e2) — @2026-06-10T18:55Z, cold from fresh host clone
Builder landed P4 (custom-test ergonomics) and moved to P5, so P4 is frozen. Pre-reviewed it cold.
**No defects found.** NOT an M1 verdict — M1 stays gated (see "Still UNVERIFIED" below).
Cold acceptance (fresh `git clone` on cc-ci host at 29a28e2, my own checkout — not the Builder's):
- `cc-ci-run -m pytest tests/unit -q`**184 passed** (exact match to claim; full suite, no
cross-fixture pollution from the session-scoped `deps` fixture).
- `cc-ci-run -m pytest tests/unit/test_discovery.py test_discovery_phase2.py
test_conftest_fixtures.py -q` → 14 passed.
- `nix develop .#lint --command scripts/lint.sh` → **lint: PASS** (ruff format/check, deadnix,
shfmt, shellcheck, yamllint all clean).
Correctness probes:
1. **Placement-rule claim ("zero in-repo users of top-level custom tests") — HOLDS.** Filesystem
sweep of every `tests/<recipe>/test_*.py`: ALL are lifecycle names (test_{install,upgrade,
backup,restore}.py). No top-level non-lifecycle custom exists in-repo, so dropping the top-level
glob in `discovery.custom_tests` loses ZERO coverage. The lifecycle-name exclusion is retained
inside functional/playwright as the double-run safety net.
2. **Discovery diff — clean.** Top-level `glob(test_*.py)` branch removed; functional/ + playwright/
subdir globs retained with `basename not in lifecycle_names` guard. Docstring + module header
updated to state the placement RULE.
3. **Test changes are adaptation + strengthening, NOT weakening (no VETO trigger).**
- `test_discovery_phase2`: renamed to `..._placement_rule_...`; now ASSERTS the top-level
`test_sso_smoke.py` is `not in names` (new negative assertion proving the behavior change),
while functional/playwright customs are still `in names` and lifecycle name excluded.
- `test_discovery::test_custom_tests_repo_local_gated`: repo-local custom moved from top-level
into `functional/`; HC2 default-deny (`== []` when unapproved) and approved-case
(`functional/test_sso.py in names`, `test_install.py` excluded) both INTACT. HC2 integrity
preserved.
4. **op_state fixture — correct.** Skips with clear reason on unset env / missing file / non-JSON
(`except ValueError` catches JSONDecodeError); reads & returns parsed dict otherwise. Tests
cover 3 of 4 paths (the non-JSON skip path is untested — minor coverage gap, not a defect; the
branch is trivially correct by inspection).
Net: P1+P2+P3+P4 all clean under cold adversarial probing; both halves of every phase claim
(unit count + lint) reproduced cold on a fresh clone. No findings filed; no VETO.
**Still UNVERIFIED for M1 (do NOT treat above as M1 PASS):** P5 (manifest) + P6 (docs);
`pytest tests/concurrency -q` cold; R2 end-to-end through the real orchestrator screenshot path;
final coverage re-diff on the COMPLETE branch (P1P6, all 21 recipes, effective customization set
unchanged); recipe-test diffs mechanical-only across the whole branch; HC2/F2-11/generic-floor
integrity at the final head. These wait for `claim(rcust): M1`.
### Interim pre-review of frozen P5 (branch @ 68954be) — @2026-06-10T19:06Z, cold from fresh host clone
Builder landed P5 (customization manifest) and moved to P6, so P5 is frozen. Pre-reviewed it cold.
**No blocking defect; one secret-SURFACE observation raised (heads-up to Builder, NOT a VETO, NOT
an M1 secret-leak failure).** NOT an M1 verdict.
Cold acceptance (fresh `git clone` on cc-ci host at 68954be, my own checkout):
- `cc-ci-run -m pytest tests/unit -q` → **191 passed** (exact match to claim).
- `nix develop .#lint --command scripts/lint.sh` → **lint: PASS**.
Primary adversarial target — SECRET LEAKAGE via the new manifest surface (D-gate: published logs +
dashboard contain NO secrets, incl. generated app passwords):
1. **Generated/runtime secrets — NOT exposed (gate holds).** `manifest.build` collects only:
`meta_non_default` (static recipe_meta), hook NAMES (pre-ops/install_steps.sh/compose.ccci.yml),
overlay FILENAMES, custom-test COUNTS, and env-override KEY names (printed `KEY=1`, value never
rendered). It never touches `deps` (client_secret), `op_state`, abra-generated app passwords, or
any env VALUE. The cardinal concern — generated app passwords on the dashboard — is structurally
absent from this surface.
2. **Cold all-recipes sweep.** Built+rendered the manifest for all 21 recipes on the host; grepped
the rendered blocks AND the results.json `customization` payload for secret/password/token/key/
credential and for any 32+ char high-entropy string. The ONLY hit, across every recipe, is
plausible's `EXTRA_ENV.SECRET_KEY_BASE` =
`"ccciplausibletestkeybase64charsexactlyforCIephemeral4567890123"`.
3. **OBSERVATION (not a leak):** that value is a HARDCODED, committed, PUBLIC dummy CI constant
(tests/plausible/recipe_meta.py, in the open-source repo) — not a generated or real secret.
`meta_non_default` dumps EXTRA_ENV literal dicts verbatim into the log AND results.json (→
dashboard), so a field literally named `SECRET_KEY_BASE` with a value now appears on the
dashboard. No real secret is exposed (it's public), so this is NOT a D-gate failure and does NOT
block P5. BUT it's a standing surface: (a) a dashboard secret-scan gets a true-positive-shaped
hit on a public dummy (noise that could mask a real leak), and (b) if any recipe ever set a real
secret-ish literal in a meta dict, the manifest would surface it unredacted. Flagged to Builder
via BUILDER-INBOX as a heads-up to consider redacting values of sensitive-named meta keys before
M1. Will re-examine on the real dashboard at the M1 cold-verify.
4. **HC2-honoring — confirmed.** Manifest routes ALL repo-local reads through `discovery._gated`
(ops.py loop direct; `install_steps`/`resolve_overlay_op`/`custom_tests` each call `_gated`
internally). An unapproved repo-local recipe contributes nothing to the manifest.
5. **Pure presentation — holds.** `build()` only reads files/env and returns a dict; `render()`
formats a string. Called at run_recipe_ci.py:889-890 (print) + embedded at :1261 into results;
no state mutation, no verdict influence. `_jsonable` renders callables as `'<hook>'` (so a
callable EXTRA_ENV/READY_PROBE never leaks closure internals) and tuples→lists for JSON.
Net: P1P5 all clean under cold adversarial probing; every phase claim (unit count + lint)
reproduced cold. No findings filed; no VETO. One non-blocking secret-surface heads-up sent.
**Still UNVERIFIED for M1:** P6 (docs); `pytest tests/concurrency -q` cold; R2 end-to-end via the
real orchestrator screenshot path; final coverage re-diff on the COMPLETE branch (all 21 recipes,
effective customization unchanged); recipe-test diffs mechanical-only across the whole branch;
HC2/F2-11/generic-floor integrity at final head; AND — at the M1 dashboard check — confirm the
SECRET_KEY_BASE-named field on the real dashboard is the accepted public dummy (or redacted).
These wait for `claim(rcust): M1`.
## M1 — implementation verified: **PASS** @2026-06-10T19:27Z (branch `restructure/recipe-custom` @ 858e0f5)
Cold-verified from TWO fresh clones on the cc-ci host (NEW=858e0f5, OLD=main pre-restructure;
merge-base 49fb818 confirmed → `main..858e0f5` is exactly P1P6). Verdict formed from the phase plan
(SSOT), the code/git history, the STATUS verification facts, and my own cold re-runs — NOT from
JOURNAL rationale (isolation discipline; I did not need to consult JOURNAL).
**All M1 Definition-of-Done items PASS:**
1. **Cold test suites — match claim exactly.** Fresh clone @858e0f5:
`cc-ci-run -m pytest tests/unit -q` → **192 passed**; `tests/concurrency -q` → **23 passed**
(untouched by this plan, proven); `nix develop .#lint --command scripts/lint.sh` → **lint: PASS**.
2. **Coverage diff (cardinal risk) — 0 REAL deltas / 21 recipes.** Wrote throwaway extractors that
resolve EVERY recipe's effective customization in BOTH worlds — OLD via the legacy loaders
(`_load_meta` + `lifecycle._recipe_extra_env` + `deps.declared_deps` + `_recipe_meta_flag`),
NEW via `meta.load()` + `meta.extra_env/upgrade_extra_env` — for the common keys (HEALTH_*,
timeouts, DEPS, EXTRA_ENV resolved at a fixed domain, UPGRADE_EXTRA_ENV, BACKUP_CAPABLE,
EXPECTED_NA, UPGRADE_BASE_VERSION, READY_PROBE/BACKUP_VERIFY presence). Diff = **0 behavioral
deltas**; the only raw diffs were 20× `UPGRADE_EXTRA_ENV: None→{}` (unset default representation,
behaviorally identical) and mumble (most-customized: callable EXTRA_ENV→dict, UPGRADE_EXTRA_ENV,
READY_PROBE) is **byte-identical** old↔new.
Deleted keys accounted for (no silent loss): `SKIP_GENERIC` (0 recipe users); `CHAOS_BASE_DEPLOY`
→ overlay-presence (discourse+ghost, exactly the two shipping compose.ccci.yml — perfect 1:1, no
change either direction); `OIDC_AT_INSTALL` → install-time made universal (drive+meet were
already install-time). **lasuite-docs** declared DEPS but NOT OIDC_AT_INSTALL → OLD post-install,
NEW install-time: an INTENTIONAL P2b consolidation, not a drop — flagged below for M2 validation.
3. **Assertion weakening (VETO-class) — NONE.** Full branch diff over all recipe test files
(excl. harness unit/concurrency/regression): 18 removed asserts, 18 added. After mechanical
normalization (`domain`→`ctx.domain`, `deps_creds`→`deps`, `MAX_USERS`→`_MAX_USERS`, whitespace)
the removed and added assert sets are **IDENTICAL** — zero unmatched in either direction. Every
change is a pure signature/fixture/constant rename; no expected value altered, no assert deleted.
Spot-confirmed discourse/ghost `_psql(domain,…ci_marker…) in (…)` → `ctx.domain` only (expected
tuple + SQL byte-identical). **No VETO.**
4. **Deleted-code fallout — clean.** No dangling LIVE refs to any of the 13 deleted symbols
(`_recipe_meta`/`_load_meta`/`_recipe_extra_env`/`_recipe_meta_flag`/`declared_deps`/
`is_canonical_enrolled`/`OIDC_AT_INSTALL`/`CHAOS_BASE_DEPLOY`/`SKIP_GENERIC`/`setup_custom_tests`/
`deps_apps`/`deps_creds`/`deployed_app`). Only residue: stale DOC/comment mentions of
`OIDC_AT_INSTALL` + `setup_custom_tests.sh` in PARITY.md files (non-blocking P6 cosmetic nit).
5. **Validation gaps — closed.** Cold-probed `meta.load()` with synthetic bad metas: typo'd key,
str-on-int, bool-as-int, callable-on-data-key, legacy hook sig `READY_PROBE(domain)`, and unknown
key ALL → `MetaError` (clear, names the offending file/key). Clean + underscore-private-helper
metas load fine (no false positives). No silent pass.
6. **R2 fixed end-to-end.** Cold proof through the REAL load path: a recipe declaring
`def SCREENSHOT(page, ctx)` is surfaced by `meta.load()` and resolved callable by
`screenshot._load_screenshot_hook` (old L1 allowlist dropped it — now arrives); orchestrator wires
it `run_recipe_ci.py:1029 capture(…, recipe_meta=meta)` → `hook(page, hook_ctx(domain, meta))`.
Absent recipe → None (default landing-page path). Legacy `SCREENSHOT(page, domain, meta)` sig
rejected at load.
7. **HC2 / F2-11 / generic-floor integrity — preserved.** Cold-probed `discovery.custom_tests` +
`install_steps`: UNAPPROVED repo-local → `[]` / `None` (default-deny holds); APPROVED → surfaced.
`sso_dep_unverified` (F2-11) logic UNCHANGED (only a comment edited) — a deps-not-ready run that
skips ≥1 `requires_deps` test still suppresses the green signal. Generic floor `_skip_generic`
default = run (additive); opt-out now env-only (same env vars as before; the 0-user meta key
removed) and surfaced LOUDLY in CI + flagged `!!` in the manifest — strictly stronger, never
silent.
8. **(Bonus) P5 secret-surface heads-up RESOLVED + verified.** The Builder landed `858e0f5`
redacting secret-named meta values in the manifest (my P5 BUILDER-INBOX ask). Cold-verified:
`plausible.EXTRA_ENV.SECRET_KEY_BASE` → `<redacted>` in BOTH the log block and results.json;
recursive into nested dict keys; word-segment `(^|_)KEY(_|$)` regex avoids over-match
(KEYCLOAK_* passes). All-21-recipe sweep: exactly 1 redaction, ZERO over-redaction, ZERO
under-redaction (no secret-shaped value remains). Regression test
`test_manifest_redacts_sensitive_named_values` present.
**Verdict: M1 PASS.** No findings filed, no VETO.
**This does NOT clear `## DONE`.** Per the phase DoD, DONE requires a fresh Adversary PASS for BOTH
M1 *and* M2. M2 (merged-main real-CI regression sweep vs the committed baseline matrix) is still
unverified. M2 watch-items I will specifically re-check from run logs:
- **lasuite-docs OIDC is now install-time** (post→install change above) — must pass a real run with
OIDC wired at install (skip-count 0 on its `requires_deps` tests).
- the customization spot-checks the plan §M2.4 enumerates (mumble READY_PROBE tcp lines, cryptpad
SANDBOX_DOMAIN, ghost/discourse BACKUP_VERIFY + overlay copy + auto-chaos base deploy, lasuite-*
deps provisioning + OIDC tests ran, immich ops.py seeds, manifest block present in every log,
screenshot.png where capture succeeded).
- canary suite (RED canaries still caught at intended tier) + per-recipe level == baseline matrix.
- zero leaked apps after teardown.
### M2-prep — independent hook-port audit (shell→python / best-effort↔fatal drift) @2026-06-10T20:55Z
Triggered by the lasuite-drive regression (below), which my M1 PASS MISSED: my M1 coverage diff
compared recipe_meta KEYS (resolved values), not ops.py hook BODIES, and my assertion scan matched
`assert ` not `raise AssertionError`. So a hook that flipped best-effort→fatal was invisible to my
M1 method. M2 (real-CI sweep) caught it — the safety net working as designed. I then audited ALL
hook ports cold (`git diff c2508c7..origin/main` per recipe ops.py + the 2 setup_custom_tests.sh
ports), filtering for non-mechanical error-handling (raise/assert/except/exit/timeout/poll changes):
- **lasuite-drive `pre_install`** — GENUINE rcust regression (Builder-disclosed, I confirmed):
OLD setup_custom_tests.sh bucket poll fell through on 90s timeout (best-effort, no failure; the
custom-tier `test_minio_storage.py` upload→list→download is the real gate); NEW port added a
terminal `raise AssertionError` → deterministic install RED when the bucket appears just after
90s. Fix-forward APPROVED (restore best-effort print+return, scoped to line-54 only; conditioned
on an L5 re-run + my diff re-verify). See approval entry in BUILDER-INBOX history (commit 57c66ad).
- **lasuite-docs `install_steps.sh`** — INTENTIONAL P2b change, NOT a defect: OLD setup_custom_tests
did `exit 1` on missing deps/null KC creds; NEW does `exit 0` (no-op) for missing-deps (gated now
by F2-11: the `@requires_deps` OIDC test skips → `sso_dep_unverified` suppresses green) BUT
preserves `exit 1` on secret-insert failure. Consistent with the install-time-deps redesign.
WATCH-ITEM (residual): the missing-deps path now relies entirely on F2-11; the sweep didn't
exercise it (deps were ready, skip-count 0). Mechanism verified present at M1; not blocking.
- **All other ops.py** (cryptpad, discourse, ghost, immich, keycloak, lasuite-meet, matrix-synapse,
mattermost-lts, mumble, n8n, plausible, custom-html) — pure mechanical ctx migration
(`domain`→`ctx.domain`, `meta`→`ctx.meta`); expected tuples/strings byte-identical (spot-checked
keycloak 201/409 + 204/200, discourse/ghost _psql ci_marker). No error-handling drift.
Net: exactly ONE accidental hook-port regression (lasuite-drive), now under approved fix. No other
best-effort↔fatal flips. This audit closes the M1-method gap for the hook bodies.
---
### M2 proof-run independent analysis (cold, Adversary) @2026-06-10T23:53Z
M2 is NOT yet claimed by the Builder; this is my independent read of the proof runs sitting on
cc-ci (`/var/lib/cc-ci-runs/{m2b-*,ab-*-oldmain}`), parsed myself via jq (NOT trusting Builder
narrative). The 6 first-sweep mismatches break down as follows.
**Confirmed root fact — REF MISMATCH is real (I verified, not taken on faith).** Every baseline
matrix run used a *PR-head* ref; the first M2.3 sweep used each mirror's *default-branch head* — a
different commit. Independently confirmed via `results.json.ref`:
| recipe | baseline run/ref/level | sweep ref/level |
|---|---|---|
| discourse | 184 / 7ae7b0f76efb / L4 | 7d53d4ec390f / L2 |
| plausible | 308 / 13458fac56a1 / L4 | da159375d89a / L2 |
| mattermost-lts | 196 / a333e31a6002 / L4 | 41c9eb8e5f34 / L2 |
| immich | 307 / 107d7220adce / L4 | 7eb3937a82d0 / L2 |
| lasuite-drive | 189 / ffa7d585afa2 / L5 | f4135d78201e / L0 |
So the sweep was NOT apples-to-apples vs the baseline matrix. Reconciliation requires either
(a) re-run at the baseline ref on new main == baseline level, or (b) A/B same-ref old-vs-new main
== same level. Status per recipe:
- **immich** — m2b-immich (new main, baseline ref 107d7220adce) = **L4 == baseline L4. CLEAN.**
- **mattermost-lts** — m2b (new main, a333e31a6002) = **L4 == baseline L4. CLEAN.**
- **plausible** — m2b (new main, 13458fac56a1) = **L4 == baseline L4. CLEAN.**
→ these three: restructure proven INNOCENT (baseline ref reproduces baseline level on merged main).
- **bluesky-pds** — ab-bluesky-pds-oldmain (OLD main, b2d86efba3f1) = L0 == new-main sweep L0 at
same ref → restructure-NEUTRAL at the sweep ref. (Baseline is "L4-equiv, pre-results-era", no run
id — softer baseline; A/B neutrality is the available evidence.)
- **discourse — NOT yet clean. OPEN.** Two *distinct* flake modes seen, and the A/B was run at the
wrong ref to close the gap:
- baseline 184 (OLD main, 7ae7b0f): all pass → L4.
- m2b-discourse (NEW main, SAME ref 7ae7b0f): **upgrade FAILED**, HC1 guard fired —
"upgrade deployed chaos commit 'eb96de94+U', not intended PR-head '7ae7b0f76efb' — re-checkout
to code-under-test failed (HC1)" → L1. ← same-ref old=L4 vs new=L1 discrepancy, UNexplained.
- ab-discourse-oldmain (OLD main, 7d53d4ec): **restore FAILED** (ci_marker truncated-dump race)
→ L2 == new-main sweep L2 at that ref → neutrality proven, but for the RESTORE mode at the
DEFAULT-head ref, NOT for the L1/upgrade-HC1 mode at the baseline ref.
- Net: the clean A/B (ref 7ae7b0f on OLD main vs NEW main) that would explain L4→L1 was NOT run.
The upgrade re-checkout/HC1 path lives in run_recipe_ci.py/lifecycle which the meta-param
threading DID touch — so "pre-existing flake" is plausible but UNPROVEN here. To clear: run
discourse @7ae7b0f on OLD main (does it deterministically reproduce L4, or also flake to L1?),
and/or repeat @7ae7b0f on new main to characterise the HC1 re-checkout as a race. The HC1 guard
FIRING (not silently passing the wrong commit) is the safety net working — good — but it means
the upgrade did not exercise the PR code, so the run is inconclusive, not a clean baseline match.
- **lasuite-drive** — fix-forward 1357544 (restore best-effort bucket poll) landed; needs a fresh
L5 run at the baseline ref ffa7d585afa2 on merged main to confirm baseline. m2rr/earlier runs
predate or used the default head — NOT yet a clean baseline match. OPEN.
**M2 disposition: still OPEN — no PASS.** 3/6 cleanly reconciled (immich/mattermost/plausible);
bluesky neutral-at-sweep-ref; discourse + lasuite-drive NOT yet closed. I will require, at the M2
claim: (1) discourse same-ref A/B (or repeat) explaining L4→L1; (2) a clean lasuite-drive L5 at
baseline ref; (3) my own cold re-parse of every per-recipe level vs baseline; (4) the M2.4
customization-executed spot-greps; (5) zero leaked apps. Recorded a BUILDER-INBOX heads-up on the
discourse-HC1 gap so it is addressed in the claim, not glossed as "the restore flake".
### M2 proof-run progress + self-correction @2026-06-11T00:05Z
Builder is running (independently, matching my inbox ask) the decisive A/B serially on the box:
`m2-proof.sh` → lasuite-drive @ffa7d585afa2 PR=1 (post-fix-forward 1357544) on merged main 5c0676b,
then discourse @7ae7b0f76efb **PR=2** on merged main (m2p-discourse); `m2-proof2.sh` (queued) →
discourse @7ae7b0f76efb **PR=2** on OLD main (/root/m2-oldmain, ab-discourse-7ae7b0f-oldmain).
**Self-correction to my 23:53Z discourse analysis:** my m2b-discourse run used **PR=0**, but the
upgrade HC1 guard resolves the *PR head* for the re-checkout. The L1 failure message ("deployed
chaos commit 'eb96de94+U', not PR-head 7ae7b0f — re-checkout failed") is plausibly a **PR=0
artifact** (no real PR to resolve the head from), NOT a restructure regression. The Builder's proof
runs correctly use PR=2 (matching baseline run 184's pr=2). So the apples-to-apples comparison I
need is m2p-discourse (PR=2, new main) vs ab-discourse-7ae7b0f-oldmain (PR=2, old main) vs baseline
184 (PR=2, old main, L4). I will cold-verify those three when they land; my L4→L1 concern is on
hold pending the PR=2 result, not yet a confirmed regression. Live lasu-f68b63 stack = active
lasuite-drive proof run (expected, not a leak).
### M2 fix-forward APPROVE: be2026a (services_converged completed-one-shot rule) @2026-06-11T00:31Z
Builder proposed a 2nd lasuite-drive P2b fix on branch `fix/converged-oneshot @ be2026a` and asked
approval before merging to main (M2 "trivial fix-forward w/ Adversary approval" path). Cold-verified
independently (fresh clone of be2026a at /root/adv-be2026a on cc-ci, NOT the Builder's working tree):
- **Diff** (`git diff origin/main..be2026a runner/harness/lifecycle.py`, read myself): in
`services_converged`, a `cur != want` deficit now passes ONLY if `docker service ps <svc>` shows
ALL task states == `Complete`. Conservative: any Running/Preparing/Pending (spinning up) or
Failed/Rejected (broken) in the deficit still returns False; no-tasks-yet still False; plain N/N
and 0/0 unchanged. Targeted addition, not a rewrite.
- **False-green analysis (my own):** only `restart_policy:none` one-shots ever show `Complete`; a
normal crashed service shows Failed/Running(restarting), never Complete. Even if converge passed
on a completed-but-ineffective one-shot, two INDEPENDENT gates still catch it — the generic
`test_serving` HTTP floor and the custom-tier functional test (lasuite-drive
`test_minio_storage.py` upload→list→download is the real bucket gate). Defense-in-depth holds; I
could not construct a false-green path.
- **Tests** `tests/unit/test_converged_oneshot.py` (read + cold-ran): 7 cases pin exactly the
non-vacuity criteria — completed→converged, Failed→NOT, mixed Complete+Failed→NOT (covers the
`docker service ps` history concern), Preparing→NOT, no-tasks→NOT, N/N→converged, 0/0→converged.
- **Cold suite+lint from fresh be2026a checkout:** `cc-ci-run -m pytest tests/unit -q` → **199
passed**; the 7 new tests pass alone; `nix develop .#lint --command scripts/lint.sh` → **lint:
PASS**. Matches Builder's claim.
- **Root cause judged genuine P2b regression** (hook moved into ops.py pre_install runs BEFORE the
install assert; the completed one-shot's 0/1 then burns DEPLOY_TIMEOUT in the converge poll). The
fix accepts a genuinely-healthy deploy (HTTP 200, all other services 1/1) the old `cur!=want`
wrongly rejected — correction, not masking.
- **Not on main** — confirmed `all(s == "Complete")` absent from origin/main; Builder held the gate.
- **Disclosed semantic delta** (a failing one-shot now blocks install convergence earlier vs later
at custom-tier): ACCEPTED — both paths RED, no false-green, no enrolled recipe has a
baseline-failing one-shot.
**VERDICT: fix-forward be2026a APPROVED, conditional on:**
1. Post-merge lasuite-drive proof re-run @ffa7d585afa2 PR=1 lands **L5** (binding end-to-end proof
the fix resolves the converge hang — if it doesn't, the diagnosis was wrong and approval voids).
2. I re-verify the MERGED diff == be2026a diff (no extra change sneaks in at merge).
3. discourse PR=2 A/B pair (m2p-discourse / ab-discourse-7ae7b0f-oldmain — no one-shots, unaffected
by this fix) completes and I cold-verify those levels too.
This APPROVE does NOT clear M2; M2 still needs all per-recipe levels reconciled + my independent
sample re-check + zero-leak teardown.
### be2026a merge cold-verify — condition #2 SATISFIED @2026-06-11T00:42Z
Builder merged be2026a as 6cabbe7 (build 350 green, origin/main now b4505ac). Independently checked:
`diff origin/main:runner/harness/lifecycle.py be2026a:...` → **IDENTICAL**; the merged
`tests/unit/test_converged_oneshot.py` → **IDENTICAL** to be2026a. Clean merge, no extra change
slipped in — approval condition #2 met. m2p-lasuite-drive (pre-fix) landed L0 (install/converge
timeout) = the diagnosed symptom (Builder disclosed b4505ac it SIGINT-shortcut the doomed burn;
binding proof is the post-fix m2p2 re-run). REMAINING be2026a conditions: #1 post-fix lasuite-drive
L5, #3 discourse PR=2 A/B cold-check — both pending (m2p-discourse running, then ab-oldmain, then
m2p2-lasuite-drive).
### be2026a conditions CLEARED + SSO-baseline staleness finding (independent) @2026-06-11T01:12Z
Reached the conclusions below COLD (own git archaeology + run-dir jq) BEFORE reading the Builder's
01:10Z inbox — which then concurred. Anti-anchoring preserved (no JOURNAL read; inbox read after my
own derivation).
**be2026a fix-forward — ALL 3 CONDITIONS SATISFIED → fix-forward FULLY CLEARED:**
1. **Post-fix lasuite-drive (m2p2, merged main 6cabbe7, ffa7d585afa2, PR=1): L4, rc=0, 3m19s.**
Independently verified: flags clean_teardown=true + no_secret_leak=true; all 4 essential rungs
pass; `test_minio_storage::...object_roundtrip` PASSED; `test_oidc_..._keycloak` PASSED. The
install converge no longer hangs — both fix-forwards (1357544 best-effort poll + 6cabbe7
completed-one-shot converge) exercised in one run. The literal "L5" in my condition is
**unmeetable on current code and NOT an rcust effect** — see staleness finding below; I accept
the L4-equivalence. Fix works end-to-end.
2. **Merged diff == branch diff** — verified earlier (4428e76): lifecycle.py + test file
byte-identical to be2026a.
3. **discourse A/B — restructure-NEUTRAL.** m2p-discourse (NEW main, 7ae7b0f, PR=2) = L1 and
ab-discourse-7ae7b0f-oldmain (OLD main, SAME ref, SAME PR=2) = L1, SAME stage (upgrade), SAME
message (`eb96de94+U` HC1 re-checkout). old==new byte-identical → rcust did NOT regress discourse.
The L4(184)→L1 vs baseline is pre-existing env drift since 06-05 (filed below), not rcust.
**FINDING [adversary] — M2 baseline matrix has 3 STALE L5 entries (lasuite-docs/drive/meet).**
Independently established: the level ladder dropped 6-rung(L5)→4-rung(max L4, integration &
recipe-local now OPTIONAL/non-laddered) in mainline PR#6 (c51cd84 "4-rung ladder", + 46e2cdb),
which `git merge-base --is-ancestor c51cd84 01e6d49^` confirms is an ANCESTOR OF PRE-RCUST MAIN.
The rcust merge touches level.py NOT AT ALL and results.py by +4 cosmetic P5 lines; compute_level
+ derive_rungs are byte-identical old-main↔merged-main. So NO current-code run (rcust or pre-rcust)
can produce L5; baselines 188/189/204 (L5, integration:pass) were recorded under the OLD schema
(run 204 ran 06-09 hours before the refactor deployed). **rcust is INNOCENT of L4≠L5.** Integration
coverage is NOT lost: the requires_deps OIDC tests EXECUTE and PASS (skip-count 0) on current code —
verified in m2p2 AND the sweep's m2r-lasuite-docs (`test_oidc_login_via_keycloak` +
`test_oidc_password_grant_...` PASSED) and m2r-lasuite-meet (`...password_grant...` PASSED).
ACCEPTED equivalence for the M2 matrix: **old L5 ≡ new L4 (all 4 essential rungs pass) + requires_deps
OIDC test PASSED (skip-count 0)**. Under this, lasuite-docs (m2r L4) / lasuite-meet (m2r L4) /
lasuite-drive (m2p2 L4) all MATCH. (Note: this validates — but corrects the basis of — the Builder's
first-sweep "lasuite-docs/meet matched baseline"; they are L4+OIDC, not numeric L5.) This is a
matrix-staleness correction, NOT a rcust regression; no VETO.
**Still OPEN for the M2 verdict (my side):** (a) per-recipe levels reconciled vs the CORRECTED
baseline for all 21; (b) bluesky-pds is L0 on BOTH old & new main (upstream image
`Cannot find module index.js`) — restructure-neutral but also cannot match its L4-equiv baseline on
ANY current run → needs a DECISIONS/DEFERRED note as non-rcust upstream breakage, not a silent
mismatch; (c) the 2 drone-path !testme runs (immich#2/plausible#3); (d) zero-leak teardown sweep;
(e) my own independent re-check of ≥5 recipes' logs + ALL mismatches before any M2 PASS.
---
## M2 — merged-main real-CI regression sweep: **PASS** @2026-06-11T01:15Z
Cold-verified the M2 claim (STATUS gate "M2 CLAIMED ~01:30Z") from my own clone + direct on cc-ci,
re-running/ re-parsing rather than trusting Builder logs. Every M2.0M2.4 item holds.
**M2.2 canaries — cold RE-RAN myself** from a fresh `origin/main` checkout (/root/adv-be2026a @
origin/main): `cc-ci-run -m pytest tests/regression/ -m canary -v` → **7/7 passed (301s)**, incl.
`bad-false-green` (the false-green detector) + all four RED canaries (bad-install/upgrade/backup/
restore) caught at their designed tier. The level system is NOT inflating. (log /root/adv-canary.log)
**M2.3 per-recipe — all 21 reconciled (cold jq on each run dir):**
- 13 clean: cryptpad/custom-html/ghost/hedgedoc/keycloak/matrix-synapse/n8n/uptime-kuma = L4;
mailu/custom-html-tiny = L2 (backup_restore N/A); mumble = L4 (deploy-count=1) — all == baseline,
clean_teardown=true.
- 2 designed-bad canaries genuinely exercised: bkp-bad rungs backup_restore=**fail** (backup=fail);
rst-bad backup_restore=**fail** (backup=pass→restore=fail). The L1 cap is upgrade-N/A ladder
semantics; the designed failure is recorded in the rung (verified — NOT a coincidental
level-match).
- immich/mattermost-lts/plausible: **L4 @ exact baseline refs** (m2b-*) — baseline REPRODUCED on the
restructured harness (cold-verified earlier this session).
- discourse: m2p-discourse (NEW main) == ab-discourse-7ae7b0f-oldmain (OLD main) — SAME ref/PR=2,
SAME stage, SAME upgrade-HC1 message (`eb96de94+U`), SAME L1. **old==new ⇒ rcust-neutral**; the
L4(184)→L1 is pre-existing env drift since 06-05 (DEFERRED.md), NOT caused by the restructure.
- lasuite-docs/-meet/-drive: L4 all-rungs-pass + requires_deps OIDC test PASSED (skip-count 0)
[lasuite-drive m2p2 also MinIO PASSED, post-both-fixes, rc=0]. Their "L5" baselines are STALE:
the 6→4-rung ladder landed in mainline c51cd84 (PR#6), which `git merge-base --is-ancestor
c51cd84 01e6d49^` confirms PREDATES the rcust merge; level.py untouched by the merge, derive_rungs
byte-identical old↔new. **rcust-innocent; integration coverage preserved** (OIDC tests execute &
pass). Accepted equivalence old L5 ≡ new L4-all-pass + OIDC-pass.
- bluesky-pds: EXCLUDED — `Cannot find module /app/index.js` crash-loop on BOTH old & new main at
every ref → upstream image breakage, rcust-neutral. DEFERRED.md note present.
**M2.3 drone→harness path:** drone builds **356 (immich) + 357 (plausible)** = `build_event=custom`
(bridge-triggered; distinct from push builds 358-361), trigger=autonomic-bot, both **success**
(verified in drone sqlite DB); run dirs 356/357 = immich L4 pr=2 / plausible L4 pr=3, customization
manifest present, clean_teardown=true.
**M2.4 customizations actually executed (cold-grep):** manifest block **21/21** logs; mumble
`ready-probe OK (tcp 3x) 127.0.0.1:64738`; ghost `ccci-overlay: provided compose.ccci.yml ...
base deploy auto-chaos` (P2a first-class path live); cryptpad `EXTRA_ENV='<hook>'`; immich
`ops.py[pre_backup,pre_restore,pre_upgrade]` + `pre-op seed` lines (migrated ctx hooks run).
**Teardown:** `docker stack ls` = infra (backups/bridge/dashboard/reports/drone/traefik) +
warm-keycloak ONLY, **zero leaked app stacks** (checked after ALL runs incl. drone-path).
**Fix-forwards (both Adversary-approved, additive):** 1357544 (lasuite-drive best-effort poll, appr
57c66ad) + be2026a/6cabbe7 (services_converged completed-one-shot, appr a531746) — merged diff ==
branch diff, all 3 be2026a conditions cleared (24a203a). Cold unit suite on post-fix main = 199
passed, lint PASS.
**VERDICT: M2 PASS.** No regression CAUSED BY the restructure: every deviation from the baseline
matrix is proven rcust-neutral by same-ref old-vs-new A/B (discourse, bluesky) or is a pre-rcust
stale-schema artifact with coverage preserved (3 lasuite), all documented in DEFERRED.md — not a
silent mismatch. The false-green detector is green on my own cold canary run. No findings filed,
no VETO.
**M1 PASS (01f9f70) + M2 PASS (this entry) both stand** → the phase DoD handshake is satisfied; the
Builder may write `## DONE` to STATUS-rcust.md. (M1's unit+lint acceptance still holds on post-fix
main: 199 passed / lint PASS, the fix-forwards being additive + separately approved.)