Compare commits

73 Commits

Author SHA1 Message Date
3d8d286cf3 chore(lvl5): ruff format lint.py
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 07:49:47 +00:00
1d3b61c6c2 fix(lvl5): lint table parser — abra renders HEAVY box verticals (┃ U+2503); accept both; meta registry EXPECTED_NA/BACKUP_CAPABLE wording → regenerated doc table
Some checks failed
continuous-integration/drone/push Build is failing
Found by real-abra smoke on cc-ci: hedgedoc clean → pass; +lightweight tag →
fail R014. Full suite 246 passed on cc-ci venv.
2026-06-11 07:49:29 +00:00
af7488a498 docs(lvl5): results-ux.md → 5-rung de-capped ladder + schema 2; recipe-customization.md EXPECTED_NA/BACKUP_CAPABLE rows to new semantics
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 07:45:18 +00:00
392f7df48f decisions(lvl5): level-semantics de-cap record, N/A classification table, lint mirror-context decision
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 07:43:25 +00:00
e219a7891d feat(lvl5): P1 — 5-rung ladder (L5=abra recipe lint) + de-capped level semantics
All checks were successful
continuous-integration/drone/push Build is passing
level.py: RUNGS += lint; statuses {pass,fail,skip,unver}; compute_level = max passed
rung with all below pass-or-skip (fail/unver block); cap_reason/capped DELETED.
harness/lint.py: lint executor — pristine scratch clone of the per-run tree at the
exact tested ref (mirror-origin + untracked-overlay pollution solved by context, no
rule filtered), PTY via script -qec, 60s hard budget, lint.txt artifact, table-parse
classifier (rc only signals FATA), unver on any non-run (never silent pass).
results.py: derive_rungs classifies every N/A source (structural/declared → skip,
else unver), lint rung + synthetic lint stage + lint block in results.json, schema 2,
cap fields removed. run_recipe_ci.py: lint call before tiers (double-wrapped,
verdict-neutral), badge = level only. card/dashboard: 0-5 ramp, cap line → 'level N
of {4|5}', unverified rows, badge number+colour only, lint.txt servable, old schema-1
artifacts render untouched. Unit suite rewritten: 245 passed on cc-ci venv.
2026-06-11 07:42:30 +00:00
df301a5917 status(lvl5): phase open — state files bootstrapped, orientation done, probing abra lint next
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 07:22:53 +00:00
4822115b2b status(shot): ## DONE — M1 (ae10b55) + M2 (2b54adb) both PASS, A1 closed, no VETO; phase complete
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 07:19:09 +00:00
2b54adbe46 review(shot): M2 PASS — all 19 enrolled cold-verified. 18/18 final PNGs Read (real, representative, credential-free; every login/setup form EMPTY-field, mattermost real login NOT interstitial, keycloak/immich/etc SPA paint-race fixed); no verdict/level regression (all pass at baseline); 2 GENUINE drone !testme (370 immich#2 comment 14321 + 371 plausible#3 comment 14322, bridge-triggered per ccci-bridge logs, NOT manual); durations 199→198/209→166 no balloon; R7 intact (call site outside-deploy+double-wrapped+untouched by shot phase, capture swallows, 60s budget); dashboard/screenshot/badge live 200; screenshot 12/12 + card 10/10 unit tests GREEN cold on real harness; no_secret_leak=true. bluesky N/A re-confirmed; mumble N/A-variant AGREED (reverses M1 on new evidence: connect-dialog DOM absent + perpetual spinner). A1 closed. No VETO — DoD handshake satisfied, Builder may write ## DONE.
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 07:18:05 +00:00
196156e497 claim(shot): M2 — all 19 recipes OK or documented-N/A (bluesky-pds upstream-broken; mumble best-available loader + DEFERRED); fixes on main (harness settle+keep-larger retry, plausible 62→68ch SECRET_KEY_BASE root-cause, mattermost click-through hook); 10 fresh proof runs incl drone !testme 370+371, levels=baselines, durations 198/166s vs 199/209s; every PNG Builder-Read, credential-free; dashboard/card/badge verified
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 07:06:04 +00:00
2b2a7ba823 status(shot): M2 evidence assembled — P3/P4 ledgers complete, proof table, durations, dashboard checks
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 07:05:52 +00:00
6104a9970d chore(shot): DEFERRED — mumble-web client never paints for anonymous visitors (upstream question; loader frame is the honest web-surface view; voice fully tested via protocol tests)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 07:02:49 +00:00
3c33129ebd fix(shot): mattermost hook v2 — interstitial appears on ANY first-visit route incl /login (proven byte-identical PNG); click 'View in Browser' best-effort then settle; unit test covers click + no-interstitial fallback; 207 pass, lint PASS
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 06:45:43 +00:00
5fc86991dd review(shot): finding A1 CLOSED — fix 7ad7d1f re-verified cold by independent probe (filed case [9999,4801]->keeps 9999, no temp leak; 4 original cases intact; R7 preserved). 5/5 pass.
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 06:33:02 +00:00
58d3505ea7 journal(shot): proof sweep progress + A1 fix + mumble probe plan
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 06:32:42 +00:00
7ad7d1f20d fix(shot): A1 — blank-retry keeps the LARGER frame (retry snapped to temp path, os.replace only if >= first; worse late frame discarded + temp cleaned); regression test [9999,4801]->9999; 207 unit tests pass, lint PASS
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 06:24:01 +00:00
ea0e3e9d2f review(shot): finding A1 [adversary] — blank-retry overwrites unconditionally, can REGRESS a larger frame (9999B->4801B) to a worse one; LOW/non-blocking (R7 holds, visual M2 check is backstop); trivial max(first,retry) guard suggested. Independent cold probe, 9/9 R7 checks otherwise pass.
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 06:20:12 +00:00
80e5713c5c feat(shot): mattermost-lts SCREENSHOT hook → /login (default lands the desktop-or-browser interstitial; watch-list wants the real sign-in form) + public screenshot.settle() for hooks; unit test via real loader; 206 unit tests pass, lint PASS
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 06:19:39 +00:00
b8414a8fdb journal(shot): plausible root-cause story + P4 proof-run kickoff
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 06:00:11 +00:00
b98a471dac fix(shot): plausible SECRET_KEY_BASE 62→68 chars — Phoenix cookie store requires >=64 bytes, so EVERY HTML render 500'd (the real cause of screenshot:null on all runs; /api/* unaffected which is why tiers passed). Default capture now lands the real registration page; verified: shot-fix-plausible run install=pass, screenshot.png 64132B real form, no hook needed
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is passing
2026-06-11 05:55:43 +00:00
ce50f641cc feat(shot): harness default capture fix — bounded networkidle settle after domcontentloaded + blank-frame retry (≤60s wait budget, R7 best-effort preserved); 6 unit tests; lint PASS, 205 unit tests pass via cc-ci-run
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 01:31:03 +00:00
ae10b553b0 review(shot): M1 PASS — audit matrix 19/19 cold-verified (enrolled set complete, no omissions), all non-OK root-causes evidence-backed (plausible 500-by-design via drone build-357 log; bluesky deploy-gated; BLANK/LOADING=domcontentloaded paint race; mumble NOT N/A via mumble-web), 11 PNGs independently Read incl plausible+multiple 4801B, every matrix read matched reality. N/A args agreed (bluesky justified, mumble denied). No VETO.
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 01:29:55 +00:00
e005897cb9 claim(shot): M1 — audit matrix 19/19 (every PNG visually inspected), all non-OK rows root-caused with evidence (plausible 500-by-design via drone build-357 log; blank/loading = domcontentloaded paint race, 4801B fingerprint; bluesky-pds deploy-gated N/A; mumble NOT N/A), N/A candidates argued
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 01:26:50 +00:00
8978fa6ae3 status(shot): phase open — P1 audit matrix complete (19/19 recipes, every PNG visually inspected) + P2 root causes (plausible /-500s-by-design via build-357 log; blank/loading = domcontentloaded paint race; bluesky-pds deploy-gated; mumble has real web UI; custom-html nginx-welcome is honest fresh-install content)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 01:26:23 +00:00
4f3a74759d review(shot): phase open — independent cold pre-audit ground truth (immich/n8n/cryptpad blank 4801-2B, keycloak/lasuite-docs loading-spinner, plausible null); awaiting M1 claim
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 01:19:52 +00:00
1bcb2ed8fe status(rcust): ## DONE — M1 (01f9f70) + M2 (3245150) both PASS, no VETO; phase complete
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 01:16:27 +00:00
3245150982 review(rcust): M2 PASS — merged-main regression sweep cold-verified. Canaries 7/7 (re-ran myself incl. false-green detector); all 21 recipes reconciled (every baseline deviation proven rcust-neutral via same-ref old-vs-new A/B or stale-schema w/ coverage preserved, all in DEFERRED); drone-path 356/357 custom success; customizations execute (manifest 21/21, mumble tcp, ghost overlay+chaos, immich seeds); zero leaks; both fix-forwards cleared. M1+M2 both PASS → DoD handshake satisfied, Builder may write ## DONE. No VETO.
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 01:15:45 +00:00
f7b9b6f167 status(rcust): Current section → M2 CLAIMED
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 01:07:13 +00:00
d7f85c3f28 claim(rcust): M2 — merge+2 approved fix-forwards green, canaries 7/7, 21/21 reconciled vs corrected baseline (3 lasuite via accepted L5≡L4+OIDC equivalence, bluesky-pds justified exclusion), drone path covered (356/357), zero leaks
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 01:06:48 +00:00
89dec5188f inbox(rcust): consumed 01:12Z be2026a-cleared note; bluesky-pds filed in DEFERRED.md as non-rcust upstream image breakage (justified M2 exclusion, A/B-proven harness-neutral)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 01:00:32 +00:00
24a203a098 review(rcust): be2026a fix-forward CLEARED (all 3 conditions met, independently verified) + ACCEPT L5≡L4+OIDC-pass equivalence — lasuite-* L5 baselines stale (c51cd84 4-rung predates rcust, git-proven), rcust innocent, OIDC coverage preserved. Consumed 01:10Z inbox. M2 still open: bluesky upstream-breakage note, drone-path runs, zero-leak, my sample re-check
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 00:59:29 +00:00
f359069d40 inbox(rcust): m2p2 GREEN rc=0 3m19s (both fix-forwards exercised end-to-end; OIDC+MinIO pass) — level=4 vs condition-1 'L5' explained: 6-rung ladder removed on MAINLINE 06-09 (46e2cdb/c51cd84 PR#6) pre-merge; equivalence proposed (L4 all-pass + requires_deps OIDC PASSED)
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is passing
2026-06-11 00:57:12 +00:00
a13a83a775 status(rcust): discourse A/B CLOSED — old==new byte-identical upgrade-HC1 at baseline ref+invocation (harness-neutral, env drift since 06-05; branch-tip/tag/abra-pin drift eliminated); m2p2 lasuite-drive binding proof started
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 00:51:10 +00:00
4428e76f48 review(rcust): be2026a merge cold-verified — merged lifecycle.py + test file byte-identical to branch (condition #2 met); m2p-lasuite-drive L0 = diagnosed pre-fix symptom; awaiting discourse A/B + post-fix L5
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 00:42:54 +00:00
b4505acbbd status(rcust): disclosed SIGINT shortcut of doomed m2p overlay install burn (KeyboardInterrupt at the diagnosed converge line); m2p2 is the binding proof
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 00:39:44 +00:00
9715ab5c50 status(rcust): be2026a merged as 6cabbe7 (build 350 green on 914c166); m2p2-lasuite-drive post-fix proof queued behind discourse A/B
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 00:38:06 +00:00
914c1663b5 inbox(rcust): consumed 00:31Z conditional APPROVE — merging be2026a, post-merge lasuite-drive re-run queued behind discourse A/B pair
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 00:33:07 +00:00
6cabbe73b7 fix(harness): merge fix/converged-oneshot @ be2026a — services_converged completed-one-shot rule (rcust M2 fix-forward #2, Adversary-approved a531746)
2026-06-11 00:33:07 +00:00
a531746e53 review(rcust): APPROVE fix-forward be2026a (services_converged completed-one-shot rule) — cold-verified diff+7 tests+199 unit+lint on fresh checkout, no false-green path (HTTP floor + minio custom test independent); conditional on post-merge lasuite-drive L5 + merged-diff==branch-diff + discourse PR=2 A/B cold re-check. Consumed 00:40Z inbox
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 00:31:54 +00:00
49d796d9ac status(rcust): m2p-lasuite-drive WILL land L0 — second P2b regression (completed one-shot 0/1 vs services_converged) root-caused live; fix on branch be2026a awaiting approval
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 00:28:33 +00:00
73421dabb4 inbox(rcust): lasuite-drive SECOND P2b regression root-caused live (completed one-shot 0/1 poisons services_converged after hook moved pre-assert) — fix-forward on branch fix/converged-oneshot @ be2026a, 199 unit + lint green, awaiting approval
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 00:27:49 +00:00
be2026aafb fix(harness): services_converged — a replica deficit explained entirely by Complete tasks is converged (triggered one-shot, rcust M2 lasuite-drive root cause)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 00:26:53 +00:00
77a9415b37 inbox(rcust): consumed Builder 00:20Z reply — proof runs confirmed queued; m2b-discourse/sidekiq/bluesky facts noted for independent cold-verify (not taken on trust)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 00:06:42 +00:00
4dcfb5ba96 review(rcust): M2 proof in flight — Builder running discourse PR=2 A/B (new vs old main) + lasuite-drive post-fix; self-correct my m2b L1 finding (PR=0 confound on HC1 re-checkout) — awaiting PR=2 results to cold-verify
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 00:06:16 +00:00
1ec0e772e8 inbox(rcust): consumed 23:53Z asks — lasuite-drive proof RUNNING, discourse same-ref 2x2 queued (new-main PR=2 + old-main PR=2 @7ae7b0f); m2b-discourse HC1 facts pinned (re-checkout persisted, eb96de94=base tag, sidekiq line benign); bluesky-pds = upstream image breakage (MODULE_NOT_FOUND x3, harness-neutral)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 00:06:13 +00:00
40b59b356b review(rcust): M2 proof-run cold analysis — 3/6 (immich/mattermost/plausible) reproduce baseline L4 at baseline ref on merged main (restructure innocent); discourse L4->L1 upgrade-HC1 at baseline ref UNexplained (A/B was at wrong ref) + lasuite-drive needs fresh L5 post-fix-forward; M2 OPEN
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 23:54:36 +00:00
5c0676b7d0 note(rcust): M2-prep hook-port audit — only lasuite-drive flipped best-effort->fatal (fix approved); lasuite-docs exit1->exit0 is intentional P2b (F2-11-gated); all other ops.py pure mechanical ctx migration. Closes M1-method gap (key-diff missed hook bodies)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:55:01 +00:00
efd7efc32b inbox(rcust): consumed 20:53Z approval — fix-forward pushed as 57c66ad; proof re-run at baseline REF queued behind tests 2+3
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:53:52 +00:00
1357544301 fix(tests): restore best-effort semantics of lasuite-drive pre_install bucket trigger (rcust M2 regression)
All checks were successful
continuous-integration/drone/push Build is passing
The P2b port of setup_custom_tests.sh -> ops.py::pre_install made the 90s bucket-poll timeout a
fatal AssertionError; the original shell hook fell through on timeout BY DESIGN (best-effort) and
the custom-tier MinIO storage test is the real gate for a genuinely missing bucket. Live evidence:
in both M2 sweep failures the bucket landed just after the window and every later tier including
the custom MinIO test passed. Warn loudly + continue, exactly the old semantics.

Adversary-approved fix-forward (REVIEW-rcust 57c66ad, scoped to this raise).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 20:53:31 +00:00
57c66add51 review(rcust): APPROVE lasuite-drive pre_install fix-forward (scoped to line-54 bucket-poll raise→best-effort; verified old=best-effort, custom MinIO test is real gate, no coverage loss); conditioned on L5 re-run + my diff re-verify. Auditing other shell->python hook ports for same drift
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:52:53 +00:00
a95fad4fa0 inbox(rcust): lasuite-drive P2b port regression root-caused (best-effort poll became fatal assert) — trivial fix-forward proposed, awaiting Adversary approval
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:50:31 +00:00
b9abf48116 inbox(rcust): consumed 20:33Z ACK — ref-mismatch independently confirmed; tests 2+3 concurred; proceeding
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:34:36 +00:00
4cb1f57e2c inbox(rcust): consumed Builder 20:35Z ref-mismatch heads-up + ACK — independently confirmed sweep ran default-branch heads (7d53d4ec/da159375) != baseline PR refs; concur tests 2+3 separate harness×content; will run own cold A/B at claim
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:33:56 +00:00
e30a414ce1 inbox(rcust): heads-up — restore cluster is a REF-mismatch vs baseline (sweep ran old default heads; baselines were PR-head runs); baseline-REF re-runs + old-main A/B queued
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:32:33 +00:00
41033b4500 inbox(rcust): consumed 20:15Z follow-up — restore cluster confirmed pre-existing, VETO threat withdrawn; proceeding to satisfy the 4 M2 PASS conditions (re-runs at baseline, canary+zero-leak, log sample, !testme x2)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:19:12 +00:00
a7a558ada3 note(rcust): M2 follow-up — confirmed restore cluster is the PRE-EXISTING truncated-dump race (documented in discourse BACKUP_VERIFY docstring on pre-merge 49fb818); VETO-threat withdrawn; stated M2 PASS conditions (re-runs at baseline + spot-checks)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:18:26 +00:00
37dcfab07d inbox(rcust): consumed Adversary 20:13Z restore-cluster heads-up — ACK: serial re-runs of all 6 already in flight (/root/m2-rerun-logs/, results m2rr-*); will ALSO run immich on OLD main (pre-merge c2508c7) serially in the same env as the requested A/B regardless of re-run outcome; no M2 claim until both legs are documented in STATUS
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:18:13 +00:00
ffc88848f3 note(rcust): M2 heads-up — restore-failure cluster (discourse/immich/plausible/mattermost ci_marker-missing) blocks M2 PASS; evidence says infra/pre-existing not restructure (restore orchestration unchanged, no BACKUP_VERIFY correlation, peers pass); suggest A/B vs old main (NOT a verdict)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:17:14 +00:00
85d14101ef status(rcust): M2 sweep first pass — canaries 7/7, 15/21 at baseline, 6 flake-shaped reds re-running serially; spot-grep evidence + zero leaks
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:14:05 +00:00
9aa0c5d624 status(rcust): fix stale Current section — M2 in progress
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 19:33:23 +00:00
4d342a2c5d status(rcust): M1 PASS — merged to main 01e6d49, push build 326 green; M2 canaries running, sweep driver staged
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 19:33:05 +00:00
01e6d497ba Merge branch 'restructure/recipe-custom' — recipe-customization restructure (rcust M1 PASS @858e0f5, REVIEW-rcust 01f9f70)
All checks were successful
continuous-integration/drone/push Build is passing
Single registry-backed meta loader, legacy key/path deletion, uniform ctx hooks, custom-test
placement rule + fixtures, customization manifest, docs. M2 real-CI regression sweep follows.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 19:28:38 +00:00
01f9f70970 review(rcust): M1 PASS @858e0f5 — cold unit 192+conc 23+lint PASS; coverage diff 0 real deltas/21 (mumble byte-identical, deleted keys all accounted); 18=18 asserts no weakening (no VETO); validation gaps closed; R2 delivered end-to-end; HC2/F2-11/generic-floor intact; manifest secret-redaction verified surgical. DONE still gated on M2 (real-CI sweep).
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 19:27:49 +00:00
c2508c7fd2 claim(rcust): M1 — P1–P6 complete on restructure/recipe-custom @ 858e0f5; unit 192 + concurrency 23 + lint PASS; baseline matrix committed
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 19:13:36 +00:00
8984b57b35 status(rcust): P6 complete (da558ca) + Adversary inbox consumed — manifest redaction landed (858e0f5); M1 prep starting
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 19:10:00 +00:00
5ccc0d1c34 note(rcust): interim pre-review of frozen P5 (68954be) — cold unit 191 + lint PASS reproduced; manifest exposes NO generated/real secrets (HC2-honoring, pure presentation); one non-blocking heads-up re plausible SECRET_KEY_BASE public-dummy on dashboard (NOT an M1 verdict)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 19:07:24 +00:00
52f5266dfb status(rcust): P5 complete on branch (68954be) — unit 191 green + lint PASS; starting P6
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 18:58:33 +00:00
270476beb3 note(rcust): interim pre-review of frozen P4 (29a28e2) — cold unit 184 + lint PASS reproduced; placement-rule claim holds (0 non-lifecycle top-level customs), HC2 intact, tests strengthened not weakened (NOT an M1 verdict)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 18:53:32 +00:00
ff09c4075b status(rcust): P4 complete on branch (29a28e2) — unit 184 green + lint PASS; starting P5
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 17:14:38 +00:00
63befd05b0 note(rcust): interim pre-review of frozen P3 — mechanical migration held (0 changed asserts), HookCtx complete, legacy-sig guard live-probed PASS, coverage diff still 0/21 (NOT M1)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 17:14:37 +00:00
802b2792a7 note(rcust): interim pre-review of frozen P1+P2 — fallout clean, typo gate PASS, coverage diff 0/21 deltas, validation gaps closed (NOT an M1 verdict; M1 unclaimed)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 17:11:41 +00:00
0264af72c7 status(rcust): P3 complete on branch (fd02d9f) — unit 180 green + lint PASS; starting P4
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 17:10:45 +00:00
8945d13674 status(rcust): P2 complete on branch (8cd72fd) — unit 175 green + lint PASS; starting P3
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 17:01:58 +00:00
f5119a9703 status(rcust): P1 complete on branch (472a68b) — unit 175 green + lint PASS; starting P2
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is passing
2026-06-10 16:47:35 +00:00
34 changed files with 3199 additions and 402 deletions

18
BACKLOG-lvl5.md Normal file

@ -0,0 +1,18 @@
# BACKLOG — Phase lvl5
## Build backlog
- [ ] B1 (P1) `level.py`: append rung `lint` (L5); new status vocabulary {pass, fail, skip, unver}; `compute_level()` → new formula (level = max i: rung_i pass ∧ ∀j<i: status_j ∈ {pass, skip}); DELETE cap_reason/capped concepts.
- [ ] B2 (P1) lint executor (`harness/lint.py`): `abra recipe lint <recipe>` against the exact tested ref; hard ~60s timeout; rc + full output → `lint.txt` artifact; pass/fail/unver classification (missing abra / timeout / exception → unver, never pass, never skip); mirror-context handling per phase-plan §2.3 (probe abra behavior first; any filtering = named + unit-tested + DECISIONS.md).
- [ ] B3 (P1) `results.py`: wire lint into `derive_rungs` + explicit intentional-vs-unintentional classification of EVERY N/A source; drop level_cap_reason/level_cap_rung from schema; `skips()` reflects new statuses; orchestrator (`run_recipe_ci.py`) runs lint executor at the tested-ref point + passes result through; verdict-neutral (R7 wrap).
- [ ] B4 (P1) unit tests: rewrite test_level.py/test_results.py to new semantics incl. mission worked examples (fail-blocks L1; intentional-skip climbs L5; unver-blocks L2; lint unver → L4; unclassifiable N/A → unver default); lint executor tests; old-artifact rendering compat tests.
- [ ] B5 (P2) `card.py`: 0-5 color ramp; cap line removed ("level N of 5" neutral); rung table renders ✔/✘/intentional-skip/unverified; level_badge_svg loses cap_skip third segment (badge = number+color only); tolerate old artifacts.
- [ ] B6 (P2) `dashboard.py`: _LEVEL_COLOR 5-scale; _level_pill/badge SVG number-only; legend text; old results.json (cap_reason present, lint absent) render without KeyError.
- [ ] B7 (P2) docs: results-ux.md, testing.md, recipe-customization.md §EXPECTED_NA wording → L5 ladder, de-cap semantics.
- [ ] B8 (P1) DECISIONS.md: semantics change record (replaces Phase-3 "N/A caps"); N/A classification table (every derive_rungs N/A source intentional|unintentional); mirror-filter decision for lint (if any filtering).
- [ ] B9 gate M1: claim (branch w/ P1+P2; clean tree; cold-verifiable).
- [ ] B10 (P3) lint sweep over ALL enrolled recipes (scratch clones never touch ~/.abra/recipes during builds); matrix here (pass/fail + rule hits); mechanical fixes → mirror PRs (never push main/never merge); rest → DEFERRED.md.
- [ ] B11 (P4) real-CI proofs: 1 genuine L5; 1 lint-blocked L4 (synth branch ok); 1 N/A-skip climb; 2× drone !testme; canary suite at re-derived designed levels; 1 synthesized unver-blocks run; before/after level table for ALL enrolled recipes; card/dashboard PNG/SVG visually verified.
- [ ] B12 gate M2: claim; then ## DONE after fresh PASS.
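
The B1 formula can be sketched in a few lines of Python. This is a minimal illustration of the stated rule only (level = highest passing rung whose lower rungs are all pass-or-skip), not the contents of `level.py`; the positional `statuses` list and the function signature are assumptions made for the example.

```python
from typing import Sequence

# Status vocabulary from B1.
PASS, FAIL, SKIP, UNVER = "pass", "fail", "skip", "unver"


def compute_level(statuses: Sequence[str]) -> int:
    """Return the level for an ordered ladder of rung statuses.

    statuses[0] is rung 1, statuses[-1] is the top rung (hypothetical layout).
    A rung counts only if it passed and every rung below it is pass or skip;
    the first fail/unver blocks everything above it.
    """
    level = 0
    for rung_number, status in enumerate(statuses, start=1):
        if status == PASS:
            level = rung_number      # all rungs below were pass-or-skip
        elif status == SKIP:
            continue                 # intentional skip: does not block, does not count
        else:
            break                    # fail, unver, or unknown status blocks the climb
    return level
```

Under this sketch a skipped rung lets the level climb past it, while an `unver` on the top (lint) rung leaves an otherwise green five-rung run at level 4, matching the worked examples listed in B4.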
## Adversary findings

128
BACKLOG-shot.md Normal file

@ -0,0 +1,128 @@
# BACKLOG-shot.md — phase `shot` (recipe screenshot audit & repair)
SSOT: /srv/cc-ci/cc-ci-plan/plan-phase-shot-screenshots.md. Gates: M1 (audit+diagnosis), M2 (all OK / agreed N/A).
## Build backlog
### P1 — Audit matrix (status: complete, all 19 PNGs visually inspected 2026-06-11)
Enrolled set (19) = `tests/<r>/recipe_meta.py` minus fixtures (`_generic`, `regression`, `concurrency`,
`custom-html-bkp-bad`, `custom-html-rst-bad`). Evidence: `/var/lib/cc-ci-runs/<run>/` on cc-ci;
PNGs pulled to /tmp/shot-audit/ on the builder host and each one Read (visually).
| recipe | latest run w/ artifacts | screenshot field | PNG bytes | visual content (I looked) | class |
|---|---|---|---|---|---|
| bluesky-pds | ab-bluesky-pds-oldmain | null | — | no PNG; install=fail level=0 (upstream image breakage, rcust DEFERRED) → capture correctly skipped (`if deploy_ok`) | N-A-candidate (blocked upstream) |
| cryptpad | m2r-cryptpad | screenshot.png | 4802 | solid light-grey frame, nothing else | BLANK |
| custom-html | m2r-custom-html | screenshot.png | 35707 | "Welcome to nginx!" default page | OK? (diagnose: is this the recipe's true fresh-install content?) |
| custom-html-tiny | m2r-custom-html-tiny | screenshot.png | 12950 | seeded CI content ("cc-ci custom-html-tiny … DG5") | OK |
| discourse | m2p-discourse | screenshot.png | 66121 | real forum UI, welcome topic, Sign Up/Log In | OK |
| ghost | m2r-ghost | screenshot.png | 444183 | real blog landing ("Thoughts, stories and ideas") | OK |
| hedgedoc | m2r-hedgedoc | screenshot.png | 131967 | real landing (logo, Sign In, feature intro) | OK |
| immich | 356 | screenshot.png | 4801 | pure white frame | BLANK |
| keycloak | m2r-keycloak | screenshot.png | 8764 | spinner + "Loading the Administration Console" | LOADING |
| lasuite-docs | m2r-lasuite-docs | screenshot.png | 6022 | lone spinner on white | LOADING |
| lasuite-drive | m2p2-lasuite-drive | screenshot.png | 5895 | lone spinner on white | LOADING |
| lasuite-meet | m2r-lasuite-meet | screenshot.png | 4801 | pure white frame | BLANK |
| mailu | m2r-mailu | screenshot.png | 33800 | real sign-in page (empty fields) | OK |
| matrix-synapse | m2r-matrix-synapse | screenshot.png | 33296 | "It works! Synapse is running" landing | OK |
| mattermost-lts | m2b-mattermost-lts | screenshot.png | 242139 | brand splash/loading screen (logo on blue), NOT the login form | LOADING (borderline — brand-recognizable but a loading state) |
| mumble | m2r-mumble | screenshot.png | 7913 | spinner on grey — a web page IS served on the domain | LOADING (diagnose what serves it; N/A may NOT be justified) |
| n8n | m2r-n8n | screenshot.png | 4801 | off-white blank frame. Flaky: run 197 (30256 B) shows the real "Set up owner account" form (empty fields, credential-free) | BLANK (flaky) |
| plausible | 357 | null | — | no PNG on ANY run (122→357) | NULL |
| uptime-kuma | m2r-uptime-kuma | screenshot.png | 30858 | real "Create your admin account" setup form (empty fields) | OK |
PNG-size note: 4801/4802 B at 1280×800 is a byte-stable blank-frame fingerprint (3 different apps, same size).
### P2 — Root-cause diagnoses
- [x] **NULL — plausible** (evidence: Drone build 357 ci-step log, t=73s):
`screenshot: capture failed (non-fatal, verdict unaffected): page.goto(https://plau-b51425.ci.commoninternet.net/) never returned a status in (200, 301, 302, 303, 401, 403) after 15 attempts (45s); last status=500`.
Plausible's `/` 500s **by design** under `DISABLE_AUTH=true` (auth_controller; documented in
`tests/plausible/functional/test_health_check.py` docstring and recipe_meta — that's why HEALTH_PATH
is `/api/health`). Default landing-page capture can NEVER succeed → needs a per-recipe SCREENSHOT
hook to a path that actually renders (probe live: e.g. /login or /sites).
- [x] **NULL — bluesky-pds**: install fails (level=0) before the app is up → `if deploy_ok:` gate in
runner/run_recipe_ci.py:1024 correctly skips capture. Not a screenshot defect; upstream image
breakage already filed in machine-docs/DEFERRED.md (rcust). → documented N/A while upstream is broken.
- [x] **BLANK class — immich, lasuite-meet, n8n(flaky), cryptpad**: SPA paint race. capture() navigates
with `wait_until="domcontentloaded"` (runner/harness/screenshot.py:91) and screenshots immediately;
SPA shell HTML has loaded but JS hasn't painted → solid 4801-2 B frame. n8n flakiness = same race,
sometimes JS wins (run 197 captured the real form).
- [x] **LOADING class — keycloak, lasuite-docs, lasuite-drive, mumble, mattermost-lts(borderline)**:
same race, caught mid-paint (spinner/splash rendered, app JS still loading/connecting).
- [x] **mumble** web stack identified: recipe deploys a `web` service (mumble-web client) on the domain —
spinner is its connecting state; landing renders a connect dialog once JS settles. NOT an N/A.
- [x] **custom-html** nginx-welcome question: the recipe's fresh install genuinely serves the nginx
default page at `/` (no content seeded for this recipe's install; only custom-html-tiny seeds via
install_steps.sh). Screenshot is an honest representative view of a fresh install. → OK as-is.
### P3 — Fixes (all merged to main)
- [x] Harness default improvement (ce50f64 + A1 hardening 7ad7d1f): bounded networkidle settle
(10s) + 0.5s render grace after domcontentloaded; blank/spinner-frame detect (<10000 B) → ONE
retry with 4s settle, larger frame kept (A1). Wait budget 45+10+0.5+4+0.5 = 60s, unit-tested.
8 new unit tests; 207 pass; lint PASS.
- [x] plausible NOT a hook in the end: the real root cause was EXTRA_ENV SECRET_KEY_BASE being
62 chars (<64-byte Phoenix cookie-store minimum), so every HTML render 500'd. Fixed to 68 chars
(b98a471); default capture then lands the genuine registration page. Stale auth_controller
comments corrected (no assertion touched).
- [x] mattermost-lts SCREENSHOT hook (80e5713 + 3c33129): interstitial appears on ANY first-visit
route incl /login (proven byte-identical PNG), so the hook navigates /login, clicks "View in Browser"
best-effort, settles; lands the real login form. First real hook; public screenshot.settle().
- [x] keycloak / lasuite-docs / lasuite-drive / lasuite-meet / immich / cryptpad / n8n: fixed by
the harness default alone (no hooks needed; proof PNGs below).
- [x] mumble: NOT fixable harness-side: the pinned mumble-web:0.5 client never paints UI for an
anonymous browser (≥90s DOM/console/network observation: no errors, no failed requests,
connect-dialog elements absent, no autoconnect overrides). Loader frame = the genuine anonymous
web view; voice (the recipe's function) fully covered by protocol tests. DEFERRED.md entry filed
(upstream question for the operator).
- [x] bluesky-pds: documented N/A while upstream image broken (rcust DEFERRED; Adversary-agreed at
M1, contingent re-check at M2; latest failing evidence ab-bluesky-pds-oldmain, 2026-06-11).
### P4 — Proof runs (fresh, post-fix; every PNG visually Read by Builder)
| recipe | proof run (dir on cc-ci) | level (baseline) | PNG B | visual |
|---|---|---|---|---|
| immich | 370 (drone !testme immich#2) | 4 (=356:4) | 234351 | real "Welcome to Immich" onboarding |
| plausible | 371 (drone !testme plausible#3) | 4 (=357:4) | 64132 | real registration form, empty fields |
| keycloak | shot-proof-keycloak | 4 | 215587 | real "Sign in to your account" form |
| cryptpad | shot-proof-cryptpad | 4 | 57310 | real landing + document-type picker |
| lasuite-meet | shot-proof-lasuite-meet | 4 | 225686 | real video-conferencing landing |
| lasuite-docs | shot-proof-lasuite-docs | 4 | 284769 | real Docs landing |
| lasuite-drive | shot-proof2-lasuite-drive | 4 | 132037 | real Drive landing |
| n8n | shot-proof-n8n | 4 | 26433 | real "Set up owner account", empty fields (now deterministic) |
| mattermost-lts | shot-proof3-mattermost-lts | 2 (=m2r:2) | 178367 | real "Log in to your account" form (hook v2) |
| mumble | shot-proof-mumble | 4 | 7980 | loader frame; best-available (see P3/DEFERRED) |
Drone durations pre/post (same recipe+PR): immich 199s→198s; plausible 209s→166s (faster: capture
no longer burns 45s failing). Healthy class (ghost, hedgedoc, discourse, custom-html,
custom-html-tiny, mailu, matrix-synapse, uptime-kuma): existing artifacts cited in P1 matrix, each
visually verified real + credential-free; no new runs needed per plan §3 P4.
Dashboard/card: grid thumbnails for runs 370/371 served 200, summary.html embeds screenshot.png,
/badge/immich.svg 200.
## Adversary findings
### [adversary] A1 — blank-retry can REGRESS a larger frame to a worse one (LOW, non-blocking) — CLOSED @2026-06-11T06:32Z
**CLOSED:** fixed in 7ad7d1f (retry snapped to a temp path; `os.replace` only if `retry >= first`,
else discard + cleanup in `finally`). Re-verified COLD with my own probe (not the Builder's test):
the exact filed case `[9999,4801]` now keeps **9999** (retry discarded, no temp leak); originals
intact (`[4801,30256]`→30256, `[4801,4802]`→4802, `[35707]`→1 shot, `[5000,5000]`→replace). 5/5 pass.
R7 contract preserved (retry-raise still propagates to capture's swallow → None; first frame on disk).
--- original finding (for the record) ---
**Where:** `runner/harness/screenshot.py` `_snap_with_blank_retry` (ce50f64).
**What:** the retry overwrites `out_path` *unconditionally* with the second screenshot. The code/comment
claim "the retry only ever replaces a tiny frame with a later one" but *later ≠ better*. If the first
frame is e.g. 9999 B (a partial render, just under `BLANK_SIZE_BYTES=10000`) and the page regresses in the
extra 4 s settle (redirect, session-timeout splash, error overlay), the retry can yield a 4801 B blank that
**overwrites the better 9999 B frame**. The Builder's unit test only covers blank→blank (4801→4802); the
bigger→smaller regression is untested.
**Repro (cold, my independent probe, not the Builder's test file):** fake page returning sizes
`[9999, 4801]` → `_snap_with_blank_retry` keeps **4801** (the worse frame).
**Severity:** LOW. R7 holds (cosmetic only, never affects verdict); my M2 per-PNG visual check is the
backstop: any actually-blank final PNG will FAIL that recipe regardless. Filed for hardening, not a veto.
**Suggested guard (trivial, strictly safer):** keep the larger frame: only overwrite if
`getsize(retry) >= getsize(first)` (or snap retry to a temp path and pick `max`). Then extend the unit
test with a bigger→smaller case asserting the larger frame survives.
**Closes:** only I close this, after re-test. Non-blocking for an M2 claim, but I will re-check at M2.
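
The guard described in this finding can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in 7ad7d1f: `snap_with_blank_retry`, the `snap(path)` callable and the `settle` callback are hypothetical stand-ins for the harness's page screenshot and wait calls; only `BLANK_SIZE_BYTES=10000` and the "replace only if `retry >= first`, clean up in `finally`" rule come from the text above.

```python
import os
import tempfile

BLANK_SIZE_BYTES = 10_000  # threshold named in the finding


def snap_with_blank_retry(snap, out_path, settle=lambda: None):
    """Take a screenshot; retry once if it looks blank, keeping the larger frame.

    snap(path) is a hypothetical callable that writes a PNG to path.
    Returns the number of shots taken (1 or 2).
    """
    snap(out_path)
    first = os.path.getsize(out_path)
    if first >= BLANK_SIZE_BYTES:
        return 1  # frame is big enough, no retry

    settle()  # e.g. the extra 4 s wait before the second attempt
    fd, tmp = tempfile.mkstemp(suffix=".png", dir=os.path.dirname(out_path) or ".")
    os.close(fd)
    try:
        snap(tmp)
        # Only replace when the retry is at least as large as the first frame.
        if os.path.getsize(tmp) >= first:
            os.replace(tmp, out_path)
    finally:
        # Discarded retries (and any failure mid-retry) must not leak temp files.
        if os.path.exists(tmp):
            os.remove(tmp)
    return 2
```

If `snap` raises during the retry, the exception propagates after cleanup and the first frame stays on disk, which matches the R7 behavior described in the closing note.
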

JOURNAL-lvl5.md Normal file

@ -0,0 +1,19 @@
# JOURNAL — Phase lvl5
## 2026-06-11 bootstrap
- Read plan-phase-lvl5-lint-rung.md in full + plan.md §6/§6.1/§7/§9. Phase files created.
- Orientation reads: level.py (RUNGS 4, compute_level gap-caps, backup_restore_status, tier_to_rung), results.py derive_rungs/build_results (cap fields at :215-229), card.py (LEVEL_COLOR 0-6!, cap line :246, level_badge_svg cap_skip third segment), dashboard.py (_LEVEL_COLOR :68, _level_pill :245, cap div :277, render_level_badge :363), run_recipe_ci.py build_results call :1248 + badge wiring :1296-1320, bridge.py :224 (badge embed — number-only already, no cap text → likely untouched), docs (results-ux.md has cap language; recipe-customization.md EXPECTED_NA row).
- Notable: card.py LEVEL_COLOR already has keys 0-6 (5=green, 6=bright green) — only 0-4 reachable today; dashboard._LEVEL_COLOR needs checking for the same.
- Lint context: abra.py:105-127 documents the R014/lightweight-tag + origin-repoint/go-git history. Per-run recipe tree = $ABRA_DIR/recipes/<recipe>, origin = private mirror (SRC) on PR runs, upstream tags fetched in by fetch_recipe. OPEN QUESTION for B2: what does `abra recipe lint` actually touch (origin fetch? auth? R014 against which tags?) — probe on cc-ci host next, in a scratch clone, both origin-shapes (mirror-origin vs canonical-origin).
- Next: probe abra lint behavior on cc-ci (scratch clones, no shared-checkout touch), then B1.
## 2026-06-11 abra lint probe (B2 design input) — all on cc-ci, scratch ABRA_DIR=/tmp/lvl5-lint-probe/abra
- `abra recipe lint hedgedoc` (fresh canonical clone): FATA "inappropriate ioctl for device" rc=1, so it needs a PTY even with `-n`. Under `script -qec "abra recipe lint -n hedgedoc" /dev/null`: rc=0, 21-line unicode table R001-R016 (cols: ref|rule|severity|satisfied ✅/❌|skipped|how-to-fix), maxlen 146 no wrapping, wall time 0.7s.
- rc SEMANTICS: rc≠0 ONLY on FATA (cannot lint). Probes:
- rm .env.sample + commit → rc=1 FATA "unable to validate recipe: .env.sample ... no such file" (content-attributable FATA).
- lightweight tag added → table renders R014 error ❌, final line `WARN critical errors present in <recipe> config`, **rc=0**. So pass/fail MUST be parsed from the table (error-severity ❌ rows), sentinel line as cross-check. Baseline warn-only ❌ (R015) → NO sentinel, rc=0 → pass.
- untracked compose.ccci.yml (CI overlay) in tree → FATA "version mismatched between two composefiles" rc=1 — abra lint globs compose*.yml INCLUDING untracked harness overlays ⇒ lint MUST run on a pristine clone of the exact ref, not the deploy tree.
- origin repointed to auth-required mirror URL → rc=1 FATA "unable to fetch tags in ...: repository not found" — lint force-fetches tags from origin ⇒ scratch clone's origin must be fetchable without auth. Cloning FROM the per-run tree (local path origin) satisfies this offline and preserves the run's true tag set (fetch_recipe pulls upstream tags into the per-run tree).
- run_quick emits no results.json/card (build_results only at run_recipe_ci.py:1248, cold path) → lint rung wiring is full-path only.
- Executor design settled (DECISIONS.md entry to come with B2): scratch ABRA_DIR (recipes/<r> = `git clone <per-run-tree>` + `checkout -f <exact tested sha>`; catalogue/servers symlinks to canonical), `script -qec "abra recipe lint -n <r>"`, hard 60s timeout, full output → lint.txt artifact, parse table rows; status = fail iff any error-severity row ❌(not skipped) or content-attributable FATA ("unable to validate recipe"); pass iff table rendered & no error-row ❌; anything else (timeout, abra missing, fetch FATA, unparseable) → unver + loud log. No rule filtering needed (mirror pollution solved by context, not by ignoring rules).
- Tier-skip sources mapped for derive_rungs classification (run_recipe_ci.py:1040-1131): upgrade skip ⟺ `prev` falsy ("only one published version", structural-intentional) given install passed; backup/restore skip ⟺ not backup_cap (structural-intentional); install-fail → downstream tiers skip (unintentional); custom skip ⟺ no custom tests (unintentional unless EXPECTED_NA declares functional); tier absent from `stages` (CCCI_STAGES dev escape) → missing key (unintentional).
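
The classification rule settled in the executor-design bullet above (fail on an unskipped error-severity ❌ row or a content-attributable FATA, pass on a rendered table with no such row, unver otherwise) can be sketched as below. This is a hedged illustration, not harness/lint.py: `classify_lint` is a hypothetical name, and the assumptions that the severity cell reads `error`/`warn` and that a skipped rule shows ✅ in the skipped column are guesses about the table rendering. Both light (│) and heavy (┃) box verticals are accepted.

```python
import re

VERTICALS = "│┃"  # light (U+2502) and heavy (U+2503) box-drawing verticals
ROW = re.compile(rf"^\s*[{VERTICALS}]\s*R\d{{3}}\s*[{VERTICALS}]")


def classify_lint(output: str, timed_out: bool = False) -> str:
    """Return 'pass', 'fail' or 'unver' for captured lint output."""
    if timed_out:
        return "unver"
    if "FATA" in output:
        # Only a FATA caused by recipe content counts as a lint failure.
        return "fail" if "unable to validate recipe" in output else "unver"

    rows = []
    for line in output.splitlines():
        if not ROW.match(line):
            continue
        cells = [c.strip() for c in re.split(f"[{VERTICALS}]", line)[1:-1]]
        if len(cells) >= 5:
            rows.append(cells)
    if not rows:
        return "unver"  # no table rendered: never a silent pass

    for _ref, _rule, severity, satisfied, skipped, *_ in rows:
        # Assumed cell contents: severity text, ❌ when unsatisfied, ✅ when skipped.
        if severity.lower().startswith("error") and "❌" in satisfied and "✅" not in skipped:
            return "fail"
    return "pass"
```

The return code is deliberately not an input here, since the probes above show rc=0 even when critical errors are present.
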


@ -8,3 +8,300 @@ be `restructure/recipe-custom` off main @ 76a4b6b. Starting P1: reading the six
(run_recipe_ci.py::_load_meta, conftest.py::_recipe_meta, lifecycle.py::_recipe_extra_env,
lifecycle.py::_recipe_meta_flag, deps.py::declared_deps, canonical.py::is_canonical_enrolled)
before writing harness/meta.py.
## 2026-06-10 P1 — single loader + registry (branch 472a68b)
Wrote runner/harness/meta.py: KEYS registry (14 keys + CHAOS_BASE_DEPLOY/OIDC_AT_INSTALL/
SKIP_GENERIC kept registered as deprecated=True so P1 lands green before P2 deletes them),
RecipeMeta generated from KEYS via dataclasses.make_dataclass (frozen; field set cannot drift from
the registry), load() = the only exec() of recipe_meta.py, MetaError on unknown ALL-CAPS/type
mismatch/callable-on-data-key, difflib suggestion in the unknown-key message. BACKUP_CAPABLE keeps
its tri-state via default None (None = auto-detect — preserves the old `"BACKUP_CAPABLE" in meta`
semantics in generic.backup_capable).
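
A registry-generated frozen dataclass of this kind can be sketched roughly as follows. It is an illustration only: the three keys and their defaults are placeholders, `build_meta` is a hypothetical helper that takes an already-evaluated namespace instead of exec'ing a file, and the real meta.py has 14 keys and additional checks.

```python
import dataclasses
import difflib

# Hypothetical slice of the registry: name -> (type, default).
KEYS = {
    "DEPLOY_TIMEOUT": (int, 600),
    "HEALTH_PATH": (str, "/"),
    "BACKUP_CAPABLE": (bool, None),  # None = auto-detect (tri-state)
}


class MetaError(Exception):
    """Raised for unknown ALL-CAPS keys or type mismatches."""


# The field set is derived from KEYS, so it cannot drift from the registry.
RecipeMeta = dataclasses.make_dataclass(
    "RecipeMeta",
    [(name, typ, dataclasses.field(default=default)) for name, (typ, default) in KEYS.items()],
    frozen=True,
)


def build_meta(namespace: dict):
    values = {}
    for name, value in namespace.items():
        if not name.isupper():
            continue  # helper functions, imports, dunders
        if name not in KEYS:
            hint = difflib.get_close_matches(name, list(KEYS), n=1)
            suffix = f" (did you mean {hint[0]}?)" if hint else ""
            raise MetaError(f"unknown key {name}{suffix}")
        typ, _default = KEYS[name]
        if value is not None and not isinstance(value, typ):
            raise MetaError(f"{name} must be {typ.__name__}")
        values[name] = value
    return RecipeMeta(**values)
```

Because unset keys fall back to the registry default, `BACKUP_CAPABLE` stays `None` unless a recipe sets it, which is how the tri-state can be preserved.
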
Migrations: orchestrator loads once + passes meta down (deploy_app/perform_upgrade/_perform_op/
run_lifecycle_tier all take the object); conftest meta fixture returns full RecipeMeta (R3 closed);
lifecycle._recipe_extra_env/_recipe_meta_flag and deps.declared_deps deleted; canonical.is_enrolled
+ enrolled_recipes go through meta.load (tests monkeypatch meta.TESTS_DIR now instead of
canonical.__file__); screenshot._load_screenshot_hook reads the attribute (R2 fixed — unit test
proves SCREENSHOT survives the real orchestrator load path). deploy_app keeps an optional
meta=None fallback (loads via the single loader) for fixture/manual callers — exec still happens
in exactly one function.
Effective-value safety check before committing: dumped non_default() for all 21 recipe dirs through
the new loader — every recipe's customized key set matches its recipe_meta.py source (e.g. mumble:
DEPLOY_TIMEOUT/EXTRA_ENV/HEALTH_OK/READY_PROBE/UPGRADE_EXTRA_ENV). One intentional delta class:
deps.deploy_deps' fallback timeouts for a MISSING dep meta change from literal 900/600 to loading
the dep's real meta (orchestrator path always supplied metas, so CI behavior is identical).
Verified on cc-ci (rsynced working tree before committing):
cc-ci-run -m pytest tests/unit -q -> 175 passed
nix develop .#lint --command scripts/lint.sh -> lint: PASS
Three pre-existing f212 unit tests passed dicts to wait_ready_probes — updated mechanically to
construct RecipeMeta via dataclasses.replace (assertions untouched).
Next: P2a compose.ccci.yml first-class + auto-chaos.
## 2026-06-10 P2 — legacy keys & paths deleted (branch 8cd72fd)
P2a: lifecycle.provide_ccci_overlay copies tests/<recipe>/compose.ccci.yml into the per-run
checkout (after install_steps hook, before prepull/deploy); pinned base deploys auto-chaos on
overlay presence (has_ccci_overlay replaces the meta.CHAOS_BASE_DEPLOY elif). ghost/discourse
install_steps.sh were copy-only -> deleted whole; their metas keep COMPOSE_FILE in EXTRA_ENV
(unchanged wiring, the harness now owns the copy).
P2b: oidc_at_install condition removed — `if declared:` provisions before the single deploy,
legacy post-deploy block + _run_setup_custom_tests_hook deleted. lasuite-docs install_steps.sh is
the meet/drive hook with docs' exact env names (diffed against the deleted setup_custom_tests.sh:
same keys incl. OIDC_OP_DISCOVERY_ENDPOINT + scopes 'openid email profile'; secret-insert bump
identical; only the abra-redeploy step is gone — the single deploy reads the env instead).
lasuite-drive's MinIO bucket one-shot -> ops.py pre_install (runs at install-tier start, post-
deploy; bucket lives in the minio volume so it survives upgrade/restore; same scale --detach +
30x3s poll as the shell version). run_quick: deps still provision (realm/creds), hook call gone —
no quick-enrolled recipe declares DEPS today; noted inline.
P2c: SKIP_GENERIC out of the registry; _skip_generic(op) env-only; skip_generic_env_overrides()
prints a `!!` warning when active under DRONE (P5 will embed in the manifest).
P2d: conftest deps fixture = dict of _DepEntry (dict subclass w/ attribute sugar) — the 6 lasuite
files only ever used deps_creds, renamed param to deps, zero assertion changes. NOTE for Adversary:
some assert MESSAGE strings ('setup_custom_tests should have populated this.' -> 'dep
provisioning...') and docstrings updated — message text only, no assert logic/expected values.
Verified on cc-ci (rsync of working tree): cc-ci-run -m pytest tests/unit -q -> 175 passed;
nix develop .#lint --command scripts/lint.sh -> PASS. Doc table regenerated to the 14-key registry
(doc-sync unit test pins it).
Next: P3 — HookCtx + ctx-hook signatures everywhere.
## 2026-06-10 P3 — uniform ctx hook convention (branch fd02d9f)
HookCtx frozen dataclass + hook_ctx() constructor in harness/meta.py; ctx.deps read straight from
$CCCI_DEPS_FILE (json, both shapes) — meta.py stays import-cycle-free (deps.py imports lifecycle
which imports meta). Registry keys carry hook_params; meta.load() enforces the expected positional
names per hook key (READY_PROBE/BACKUP_VERIFY/EXTRA_ENV/UPGRADE_EXTRA_ENV=(ctx,),
SCREENSHOT=(page, ctx)); _run_pre_hook applies meta.check_hook_signature(fn, ("ctx",)) to ops.py
hooks before calling. Conversion of 17 ops.py + 8 recipe_meta hooks was scripted (def-line regex +
bare `domain` -> `ctx.domain` inside the pre_*/hook function bodies only) and diff-reviewed; the
only manual fixes: keycloak pre_restore passed `meta` -> `ctx.meta`, and two comment lines in
lasuite-drive/-meet metas that the regex over-replaced were restored. wait_ready_probes gained
op= (install/upgrade call sites pass it) so probes can know the phase.
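
One way to enforce expected positional names on a hook is via `inspect.signature`. The sketch below borrows the name `check_hook_signature` from the entry above, but its body is an assumption about how such a check might look, not the harness implementation.

```python
import inspect


class MetaError(Exception):
    """Raised when a hook does not follow the ctx convention."""


def check_hook_signature(fn, expected):
    """Require fn's positional parameters to be named exactly as expected."""
    positional_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    names = tuple(
        p.name
        for p in inspect.signature(fn).parameters.values()
        if p.kind in positional_kinds
    )
    if names != tuple(expected):
        raise MetaError(
            f"{fn.__name__}: expected parameters {tuple(expected)}, got {names}"
        )
```

A hook such as `def READY_PROBE(ctx): ...` would pass with `("ctx",)`, while a legacy `def pre_restore(domain): ...` would be rejected before it is called.
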
Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 180 passed; lint PASS.
Next: P4 — discovery placement rule + op_state/deps fixtures + migrate hand-parsers.
## 2026-06-10 P4 — custom-test ergonomics (branch 29a28e2)
Pre-change sweeps confirmed the plan's zero-users claims: no top-level non-lifecycle test_*.py in
any recipe dir; no recipe test file reads os.environ / CCCI_OP_STATE_FILE directly (the only
op-state consumers are the generic assertions via harness.generic.op_state — harness-side, fine).
So P4 = discovery glob removal + new op_state fixture + pinning tests; no test migrations needed.
test_discovery.py's HC2 gate test moved its repo-local custom fixture under functional/ (the rule);
test_discovery_phase2.py now asserts top-level custom is NOT discovered. op_state fixture skips
(clear reason) when env unset / file missing / unparseable; tested via request.getfixturevalue.
Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 184 passed; lint PASS.
Next: P5 — customization manifest (print block + results.json key).
## 2026-06-10 P5 — customization manifest (branch 68954be)
(Resumed after a usage-limit pause mid-P5; working tree carried the in-flight manifest.py.)
New runner/harness/manifest.py: build() collects {meta_non_default, hooks, overlays, custom_tests,
env_overrides} via the SAME discovery/meta functions the run uses (so the manifest can never
disagree with what actually executes — incl. the HC2 _gated() repo-local gate), render() prints
the block. Orchestrator builds+prints right after meta load / repo-local snapshot, BEFORE the
quick-lane branch (both lanes get the block); the dict rides into build_results(customization=...)
verbatim. run_quick writes no results.json, so the single build_results call site covers all.
Hooks render as "<hook>", tuples as lists (JSON-clean); ops.py pre-ops listed by cheap source
scan (same approach as discovery._module_defines — no import at manifest time).
Lint flagged: C408 dict() literal, import-block order (manifest after deps), ruff-format on the
new test file — all fixed. Verified on cc-ci (rsync of working tree): cc-ci-run -m pytest
tests/unit -q -> 191 passed; nix develop .#lint --command scripts/lint.sh -> lint: PASS.
Next: P6 docs, then M1 prep (tests/concurrency proof run + 21-recipe baseline matrix).
## 2026-06-10 P6 — docs (branch da558ca) + inbox response (858e0f5)
Rewrote the three docs to the restructured end state; kept the generated §4 table byte-identical
(doc-sync test pins it). recipe-customization.md flipped from review spec to reference; §8 is now
the R1-R9 resolution ledger. Facts double-checked against code before writing: R2 proof lives in
test_screenshot.py::test_screenshot_reachable_through_real_load_path (not test_meta.py — fixed a
first-draft error); mumble's post-F2-14c shape has NO install_steps.sh/CHAOS_BASE_DEPLOY (base =
mumbleweb-only COMPOSE_FILE, host-ports added at head via UPGRADE_EXTRA_ENV); lasuite-docs now
ships install_steps.sh (P2b migration); deps file shape is dict recipe->entry; custom_tests
discovery is NON-recursive over functional/+playwright/ (old doc said recursive — corrected).
Adversary inbox (19:06Z, non-blocking): manifest dumps meta values verbatim -> dashboard shows a
field named SECRET_KEY_BASE (plausible's committed CI dummy — public, no real leak). Took the
redaction option: _jsonable masks values whose key NAME matches
SECRET|PASSWORD|TOKEN|CREDENTIAL|word-segment-KEY, recursing into dict values (the plausible case
is a NESTED key under EXTRA_ENV); names stay visible. KEYCLOAK_URL deliberately not matched
(word-segment KEY). Unit test pins redacted+passthrough both.
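
The key-name redaction described here can be sketched as below. The regex and the `redact` helper are illustrative assumptions (the source names the function `_jsonable` and does not show its pattern or mask string); the point is that `KEY` only matches as a whole underscore-delimited segment.

```python
import re

# SECRET/PASSWORD/TOKEN/CREDENTIAL match anywhere; KEY only as a whole segment.
SENSITIVE = re.compile(r"SECRET|PASSWORD|TOKEN|CREDENTIAL|(?:^|_)KEY(?:_|$)")
MASK = "***"  # hypothetical mask value


def redact(value, key=""):
    """Mask values whose key NAME looks sensitive; names stay visible."""
    if SENSITIVE.search(str(key).upper()):
        return MASK
    if isinstance(value, dict):
        # Recurse so nested keys (e.g. under EXTRA_ENV) are checked too.
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [redact(v, key) for v in value]
    return value
```

With this pattern `SECRET_KEY_BASE` and `API_KEY` are masked, while `KEYCLOAK_URL` passes through because `KEY` is followed by `CLOAK` rather than a segment boundary.
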
Verified on cc-ci (rsync of working tree): cc-ci-run -m pytest tests/unit -q -> 192 passed;
nix develop .#lint --command scripts/lint.sh -> lint: PASS.
Next: M1 prep — tests/concurrency proof run on the branch + the 21-dir baseline matrix.
## 2026-06-10 M1 prep + claim
Concurrency proof run on branch head 858e0f5 (rsynced tree on cc-ci): cc-ci-run -m pytest
tests/concurrency -q -> 23 passed in 11.46s (suite untouched by the restructure, as planned).
Baseline matrix: pulled every /var/lib/cc-ci-runs/*/results.json (141 files) and took the most
recent per recipe. 19/21 dirs covered by results.json; mumble's last full run predates the
results system (log ~/ccci-mumble-f214c.log, 5 tiers pass 05-31); bluesky-pds likewise
(Adversary Phase-2 cold verify e45e0ee). plausible's weekly-report RED was its PR branch
(pg13->14, build 200); its default-branch baseline is run 308 (06-10) L4 — runs 307/308 are
today's, from the conc-phase M2 sweep. Bad canaries recorded at their designed-fail tier.
Claimed M1. While waiting: nothing else unblocked in this phase (M2 is gated on M1) — will hold
with short fallback polls per §7 case 2.
## 2026-06-11 M2 reconciliation — discourse upgrade-HC1 root-cause hunt + bluesky re-characterization
Resumed after a loop stall (~21:18Z-23:50Z): the m2b/ab sweeps had finished but nothing processed
them. Adversary's 23:53Z inbox asked for (1) a same-ref A/B for the m2b-discourse upgrade-HC1 L1
and (2) a fresh post-fix lasuite-drive L5 at baseline ref — both now queued/running.
Discourse dig (why I don't yet have a mechanism): first hypothesis was my own invocation error —
m2b ran PR=0 where baseline 184 ran PR=2, and I guessed the PR-head sha was unreachable without
the PR fetch. WRONG: fetch_recipe clones all mirror branches and `git checkout <sha>` is check=True
— and the preserved per-run clone sits at HEAD=7ae7b0f, so the re-checkout ran AND persisted.
Second hypothesis (prepull resets the checkout): also wrong — prepull_images is pure
`docker compose config --images` in cwd, never touches git. The scary
`service "sidekiq" depends on undefined service "discourse"` line turned out benign: it appears in
the PASSING m2r/m2rr upgrade sections verbatim (the published compose ships a dangling depends_on;
swarm ignores it — documented in the overlay NOTE). What's left: abra stamped the PREV-TAG commit
(eb96de94 = 0.7.0+3.3.1) on the chaos redeploy while the tree was at 7ae7b0f. One live hypothesis:
the cc-ci overlay clamps app+sidekiq images to bitnamilegacy/discourse:3.3.1; at this PR head
(0.9.0+3.5.0 bump) the redeploy spec may end up close enough to the base spec that the label
update path degenerates — but that requires abra-internals knowledge I can't verify analytically,
and m2r at 7d53d4ec (which also post-dates the 3.5.0 bump?) stamped correctly with the same
overlay, so content-difference-between-refs is doing SOMETHING. Decision: stop theorizing, let the
2x2 complete — m2p-discourse (new main, PR=2, @7ae7b0f) distinguishes PR=0-artifact/race from
deterministic; ab-discourse-7ae7b0f-oldmain (old main, PR=2, @7ae7b0f) distinguishes regression
from pre-existing. Run 184 left no orchestrator log (drone-side), so its chaos stamp is unknowable
— the old-main re-run stands in for it.
lifecycle.py diff c2508c7..main re-read for the upgrade path: overlay copy moved from per-recipe
install_steps.sh to first-class auto-chaos (P2a) but the copied FILE and its untracked-persistence
semantics are byte-identical; run_upgrade order (checkout → upgrade_env → prepull → chaos
redeploy -c → own wait_healthy) unchanged from old main. Nothing jumps out as the delta.
bluesky-pds: pulled the swarm service logs from all three failed runs — identical
`Cannot find module '/app/index.js'` crash-loop (Node v24.15.0) on new main @ mirror head, new
main serial re-run, AND old main @ old default head. The earlier "deploy timed out during
concurrent image pulls" guess in STATUS was wrong (the 600s timeout was the SYMPTOM; the ~2min
A/B failure exposed the crash-loop). Upstream re-published the pinned tag with a different image
layout — no harness can deploy it. Filed in STATUS as restructure-neutral with grep-able evidence.
## 2026-06-11 lasuite-drive root cause #2 — completed one-shot poisons convergence (caught live)
Watching the m2p proof run instead of just waiting paid off: the fix-forward's best-effort line
printed (so #1 is fixed), but the install assert then sat in pytest for 25+ minutes. Live state:
app serving 200, every service 1/1 EXCEPT minio-createbuckets 0/1 with its task **Complete 28
minutes ago**. services_converged demands cur==want for every service; a completed
restart_policy-none one-shot never returns to 1/1, so the bounded converge poll (DEPLOY_TIMEOUT
1800s for this recipe) was always going to burn to the deadline and fail install.
Why nobody ever saw this before P2b: the old setup_custom_tests.sh ran AFTER the install asserts
(post-deploy hook path), so converge never observed desired=1 on the one-shot, and the upgrade
tier's chaos redeploy reapplied the compose spec (replicas: 0) before its own converge checks.
P2b folded the trigger into ops.py pre_install — which the orchestrator runs BEFORE the generic
install assert. Also explains m2rr's odd "install fail but upgrade/backup/restore/custom all pass"
shape exactly (redeploy resets the spec).
Fix options weighed: (a) hook scales the one-shot back to 0 after the poll — rejected: on the
timeout path the task is typically still Preparing (image pull) and scale-to-0 CANCELS it, so the
observed "bucket lands just after the window" runs would become custom-tier RED, i.e. strictly
worse than baseline; (b) move the trigger to a post-assert hook point — no such hook exists in the
new convention and inventing one mid-M2 is scope creep; (c) teach services_converged that a
replica deficit consisting entirely of Complete tasks IS converged — chosen: semantically correct
(the one-shot did its job), restores baseline behavior for any triggered one-shot, and the
converge window doubles as the late-landing grace. Disclosed delta: a genuinely FAILING one-shot
now reds at install (converge timeout) instead of at the custom bucket test — both red, no false
green. Guard: Failed/mixed/spinning-up/no-tasks-yet still block (unit-pinned, 7 cases).
Branch fix/converged-oneshot @ be2026a, proposal in ADVERSARY-INBOX, awaiting approval per the M2
fix-forward protocol. Unit suite 199 passed + lint PASS from the cc-ci working-tree rsync.
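
Option (c) can be illustrated with a small model of the convergence check. This is a sketch under assumptions: the `Service` shape, the task-state strings and the function names are hypothetical, and the real services_converged reads swarm state rather than an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    current: int                      # running replicas
    want: int                         # desired replicas
    task_states: list = field(default_factory=list)  # e.g. ["Running", "Complete"]


def service_converged(svc: Service) -> bool:
    if svc.current == svc.want:
        return True
    deficit = svc.want - svc.current
    if deficit < 0 or not svc.task_states:
        return False  # over-provisioned or no tasks yet: keep waiting
    not_running = [s for s in svc.task_states if s != "Running"]
    # A deficit made up entirely of Complete tasks means the one-shot did its job.
    return len(not_running) >= deficit and all(s == "Complete" for s in not_running)


def services_converged(services) -> bool:
    return all(service_converged(s) for s in services)
```

A Failed, Preparing or mixed task set still blocks, so a genuinely failing one-shot times out at converge instead of turning green.
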
## 2026-06-11 ~01:00Z — merge landed, queue shortened
be2026a approved (REVIEW a531746, cold-verified independently) and merged as 6cabbe7; drone build
350 green on the push head 914c166. Merged diff verified == branch diff (empty git diff be2026a..
main for the two files). Post-fix proof m2p2-lasuite-drive queued from a FRESH clone
/root/m2-postfix @6cabbe7 rather than git-updating /root/m2-sweep, because the serial queue's
discourse runs exec from m2-sweep and swapping code under an active/imminent run is how you get
unexplainable results. The discourse A/B therefore runs at 5c0676b (pre-converge-fix) — irrelevant
to discourse (no one-shots), and the Adversary's approval explicitly noted that.
Shortened the doomed m2p run: the generic install assert had already burned its 1800s converge
deadline and failed; the overlay install test then started an IDENTICAL second 1800s burn (same
assert_serving). SIGINT'd the overlay pytest child only — KeyboardInterrupt surfaced at
generic.py:97, the exact diagnosed converge-poll line (a nice live confirmation), and the
orchestrator advanced to the upgrade tier on its normal path. Teardown semantics untouched.
Disclosed in STATUS so the log's KeyboardInterrupt is pre-explained.
Drone API note for future me: no token on disk; fastest read-only check is docker cp the drone
sqlite out and query builds (documented in STATUS). The Gitea statuses API returned empty for
these shas (drone evidently doesn't post commit statuses here).
## 2026-06-11 ~00:55Z — discourse A/B closed (harness-neutral), mechanism still unattributed
m2p-discourse (new main, PR=2, @7ae7b0f) and ab-discourse-7ae7b0f-oldmain (old main, PR=2, same
ref) failed the upgrade IDENTICALLY: HC1, chaos-version=eb96de94+U, all other tiers pass, L2.
Same invocation as baseline 184 which was L4 five days ago. So: deterministic, harness-neutral,
and something outside both harnesses drifted since 06-05. Eliminated: branch-tip existence (7ae7b0f
still tips upgrade-0.8.0+3.5.0 + pr/2), upstream tag set (0.7.0+3.3.1 still latest), abra pin
(flake.lock untouched by the restructure). Not eliminated: abra-internal interaction with repo/app
state (the chaos stamp lands on the prev-base TAG commit despite the tree being at the PR head —
my best guess remains something in how abra resolves the version/commit for the chaos label when
COMPOSE_FILE includes the overlay and the project normalizes invalid, but m2r at 7d53d4ec stamping
correctly with the same dangling depends_on kills the simple version of that theory). The
`service "sidekiq" depends on...` line appears in passing AND failing upgrades, position-identical,
so it discriminates nothing. M2-wise the question is settled — the restructure is exonerated by
byte-identical old==new failure; chasing abra's stamp resolution further is post-phase work, filed
as a DEFERRED note rather than burning more M2 wall-clock on a non-rcust mechanism.
m2p2-lasuite-drive (the binding post-fix proof) auto-started at 00:48:58Z from /root/m2-postfix
@6cabbe7. Watching for: no 1800s converge burn after the one-shot completes, then L5.
## 2026-06-11 ~01:10Z — m2p2 green; "L5" turned out to be a moved goalpost (mainline, not ours)
m2p2-lasuite-drive: rc=0, 3m19s, all stages pass, OIDC + MinIO custom tests green, and the
fix-forward pair demonstrably exercised (one-shot overshot 90s again → best-effort line → late
Complete → converge fix admitted it). But results.json said level=4 where the binding condition
said L5 — heart-stopper until the git archaeology: run 189's level-5 + "L6 recipe-local N/A" cap
didn't match ANY derive_rungs I could find in either world, because the 6-rung ladder was removed
on MAIN by 46e2cdb+c51cd84 (PR #6) on 06-09, between the baseline runs and the merge — by the
mirror/report phase, not rcust. The merge didn't touch level.py (checked 01e6d49^1..01e6d49), and
run 204 on 06-09 (hours pre-deploy of the refactor) still shows 6 rungs — clean timeline. So the
baseline matrix's "L5" rows need a schema-equivalence reading, declared in STATUS BEFORE the claim
rather than negotiated after the Adversary trips on it. Lesson re-learned: a baseline matrix
should pin the SCHEMA VERSION of its evidence, not just the level number.
## 2026-06-11 ~01:30Z — M2 claim assembled
Drone-path runs landed green (356 immich#2 L4, 357 plausible#3 L4, both with embedded
customization manifests + clean flags, triggered by real !testme comments). Zero-leak verified
after everything. Plausible's missing screenshot.png checked against its other runs — it never
produces one (no screenshot surface), so not a capture regression. Claimed M2 with the full
21-recipe reconciliation table against the corrected baseline; the three lasuite rows ride the
Adversary-accepted L5≡L4+OIDC equivalence, bluesky-pds is the one justified exclusion, discourse
is reconciled as env-drift with byte-identical old==new evidence. Nothing else unblocked in this
phase while the verdict is out — holding per §7 case 2.
## 2026-06-11 ~01:20Z — M2 PASS → ## DONE
Adversary cold-verified the whole claim independently (re-ran the canaries themselves, jq'd all 21
run dirs, re-checked the drone DB and the zero-leak state) and passed M2 with no findings and no
VETO. M1 + M2 both stand; ## DONE written. Phase summary: 6 plan phases landed on one branch,
merged after M1; the real-CI sweep then caught exactly TWO genuine regressions (both in the same
lasuite-drive P2b hook port: raise-on-timeout, and one-shot-vs-converge ordering), both root-caused
live, fixed forward under approval, and proven end-to-end — plus it surfaced two pre-existing
environment drifts (discourse upgrade-HC1, bluesky-pds upstream image) that the A/B discipline
kept from being misattributed to the restructure. The sweep-as-safety-net worked as designed.

JOURNAL-shot.md Normal file

@@ -0,0 +1,105 @@
# JOURNAL-shot.md — Builder journal, phase `shot`
## 2026-06-11 ~01:17-01:35Z — phase open, P1+P2 in one sweep
Read the phase plan + plan.md §6.1/§7/§9. Enumerated enrolled recipes (19). Pulled per-recipe
latest-run data off cc-ci (`results.json` screenshot field + PNG size for all ~190 run dirs),
scp'd 18 PNGs to /tmp/shot-audit/ and Read every one of them.
Findings vs the orchestrator pre-audit: all four 4801-2B suspects are indeed blank frames
(immich pure white, lasuite-meet white, n8n off-white, cryptpad grey). keycloak 8.7KB is a
"Loading the Administration Console" spinner — NOT a sparse login page as §2 guessed.
lasuite-docs/drive ~5.9KB are lone spinners. Two surprises: (1) mattermost-lts 242KB, classed
healthy by size, is actually the brand splash/loading screen, not the login form — size
heuristics lie in both directions; (2) mumble serves a real web page (mumble-web client per
compose.mumbleweb.yml, deployed since Phase 2 for HTTP health) showing its connecting spinner —
so mumble is fixable, not an N/A.
plausible root cause: traced via Drone sqlite (no python3 on host; ran alpine+sqlite3 against
the drone data volume). Build 357 log t=73s: capture failed, last status=500 after 45s. Cross-ref
tests/plausible/functional/test_health_check.py: `/` 500s via auth_controller under
DISABLE_AUTH=true — permanent, not an init race. So the default landing capture can never work;
plausible needs a SCREENSHOT hook to a path that renders (will probe /login, /sites on a live
deploy during P3).
bluesky-pds: null because install fails at level 0 (upstream image breakage, already in
DEFERRED.md from rcust) — capture gated on deploy_ok, correctly skipped. N/A while upstream broken.
custom-html nginx-welcome: verified no install-time seeding exists for this recipe (custom-html-tiny
has install_steps.sh; custom-html only seeds in pre_backup/pre_upgrade ops, after capture). The
nginx default page IS the honest fresh-install view. Leaving OK; flagged in matrix for Adversary.
Adversary opened REVIEW-shot.md with its own cold pre-audit (4f3a747) before my first push —
good: my visual reads agree with theirs on every overlapping row.
Design thinking for P3 (next iteration): default-path improvement = after goto(domcontentloaded),
try a bounded `wait_for_load_state("networkidle")` (~10-15s cap) and/or wait for a non-trivial
painted body, then screenshot; then a blank-detect (PNG < ~6KB or near-uniform) → one retry with
a longer settle. Keep total ≤ ~60s worst case, all inside the existing capture() try/except so R7
(cosmetics never block) is preserved. Unit tests: blank-detector pure function + retry logic with
a fake page. Per-recipe hooks only for plausible (500 root) + whatever the re-audit still shows.
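The P3 design above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the harness code: `looks_blank`, `capture_with_retry`, `safe_capture`, the `shoot` callable (standing in for "settle, then screenshot") and the settle values are hypothetical names and numbers; only the ~6KB threshold and the "cosmetics never block" rule come from the notes.

```python
from typing import Callable, Optional

BLANK_MAX_BYTES = 6_000  # assumed cut-off, taken from the "~6KB" figure in the notes


def looks_blank(png: bytes, max_bytes: int = BLANK_MAX_BYTES) -> bool:
    """Size-only heuristic: a very small PNG is probably an unpainted frame."""
    return len(png) < max_bytes


def capture_with_retry(shoot: Callable[[float], bytes],
                       settle_s: float = 10.0,
                       retry_settle_s: float = 20.0) -> bytes:
    """Shoot once; if the frame looks blank, retry exactly once with a longer settle."""
    first = shoot(settle_s)
    if not looks_blank(first):
        return first
    second = shoot(retry_settle_s)
    # Keep the retry only when it actually cleared the blank threshold.
    return second if not looks_blank(second) else first


def safe_capture(shoot: Callable[[float], bytes]) -> Optional[bytes]:
    """R7-style wrapper: a capture problem is logged, never raised."""
    try:
        return capture_with_retry(shoot)
    except Exception as exc:  # cosmetics never block the run
        print(f"screenshot skipped: {exc}")
        return None
```

A fake page is just a callable that returns canned byte strings, which keeps the retry logic unit-testable without a browser. The size check alone is known to mislead in both directions, so a real detector would likely also inspect the pixels.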
## 2026-06-11 ~05:45-06:00Z — plausible root cause was a 62-char SECRET_KEY_BASE; M1 PASSed meanwhile
M1 PASS (ae10b55) with a watch-list. P3 done in two commits: ce50f64 (harness settle+blank-retry,
6 unit tests, 205 pass, lint PASS) and b98a471 (plausible fix). The plausible story changed under
probing: three live probes (shot-probe{,2,3}-plausible) showed / and every HTML route 302→/register
which 500s; app logs gave the smoking gun: `(ArgumentError) cookie store expects conn.secret_key_base
to be at least 64 bytes`. Our EXTRA_ENV value — comment claimed "64-char" — measures 62. So every
page render 500'd while /api/* (no cookie store) passed all tiers. NOT auth_controller/DISABLE_AUTH
as the old comments claimed; corrected both stale comments. Fix = 68-char value; verified
shot-fix-plausible run: install pass, screenshot.png 64132B = real registration page (empty fields,
placeholders only — same safe shape the Adversary blessed for n8n/uptime-kuma). No hook needed.
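A length guard of this kind would have caught the 62-char value before any deploy. The sketch below is an assumption-laden illustration: `secret_key_base_problem` is a hypothetical helper, not something in the harness, and the only fact taken from the logs is the 64-byte minimum.

```python
from typing import Optional

MIN_KEY_BYTES = 64  # minimum quoted in the cookie-store ArgumentError


def secret_key_base_problem(value: str) -> Optional[str]:
    """Return a human-readable problem, or None when the key is long enough."""
    size = len(value.encode("utf-8"))  # the error speaks of bytes, not characters
    if size < MIN_KEY_BYTES:
        return f"SECRET_KEY_BASE is {size} bytes, cookie store needs >= {MIN_KEY_BYTES}"
    return None
```

Measuring the value instead of trusting a "64-char" comment is the whole point: the comment and the string had drifted apart.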
P4 started: !testme posted 05:56:32Z on immich#2 + plausible#3 (drone builds 370+371 running,
concurrent). Manual full proof run keycloak launched (shot-proof-keycloak). Remaining queue:
mattermost-lts, cryptpad, lasuite-meet, lasuite-docs, lasuite-drive, n8n, mumble.
## 2026-06-11 ~06:05-06:30Z — proof sweep underway; A1 fixed; mumble is the holdout
Proofs verified visually so far (each level matches its baseline): drone 370 immich L4 234KB real
onboarding card (was 4801B); drone 371 plausible L4 64KB registration page (was null); keycloak L4
real sign-in form (was loading spinner); cryptpad L4 real landing w/ document picker (was grey blank);
lasuite-meet L4 real product landing (was white blank); mattermost-lts L2(=m2r baseline L2) — real
page but it's the desktop-or-browser interstitial, so per the watch-list I added the first
SCREENSHOT hook (80e5713, → /login + public settle()); re-run pending.
A1 (blank-retry could regress a larger frame): fixed in 7ad7d1f — retry goes to a temp path and
only replaces via os.replace when >= first; regression test [9999,4801]→9999. 207 unit, lint PASS.
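The A1 fix pattern, sketched under assumptions: `retry_keep_larger` and `shoot_to` are hypothetical names, and the real change in 7ad7d1f may be structured differently. The idea shown is only "retry into a temp path, promote with `os.replace` when the retry is not smaller".

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def retry_keep_larger(dest: Path, shoot_to: Callable[[Path], object]) -> int:
    """Re-shoot into a temp file next to `dest`; replace `dest` only if not smaller.

    Returns the size of whatever ends up at `dest`.
    """
    first_size = dest.stat().st_size
    fd, tmp_name = tempfile.mkstemp(suffix=".png", dir=dest.parent)
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        shoot_to(tmp)
        if tmp.stat().st_size >= first_size:
            os.replace(tmp, dest)  # same directory, so the rename is atomic
    finally:
        tmp.unlink(missing_ok=True)  # no-op when the file was promoted
    return dest.stat().st_size
```

With a first frame of 9999 bytes and a retry of 4801 bytes the original file is left untouched, which is the regression case named above.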
mumble: proof run still spinner after settle+retry (7980B). Probing live what mumble-web does over
90s (it printed real mumble-web HTML while up; suspect autoconnect overlay that never resolves
because the websocket voice path may not be browser-reachable). Orchestrated probe2 running.
Also in flight: n8n + lasuite-docs proofs from the A1-fixed tree. Queue: lasuite-drive, mattermost
re-run; then ghost/hedgedoc/etc. healthy-class citations + dashboard/card check + runtime compare.
## 2026-06-11 ~06:40-07:15Z — mattermost solved via click-through; mumble settled as best-available; M2 assembled
mattermost: hook v1 (/login) produced a byte-identical interstitial PNG — mattermost shows the
desktop-or-browser chooser on ANY first-visit route. Hook v2 clicks "View in Browser" (best-effort,
suppress) → shot-proof3 PNG is the genuine "Log in to your account" form at L2=baseline. That's
watch-list item 3 satisfied the hard way.
mumble: three live probes. probe4 (90s DOM+console watch): localization loads, NO errors, NO failed
requests, connect-dialog selectors match nothing, page stays at loading-container forever. orch5:
websockify serves everything (its own 404s on /ws,/websocket; config.local.js = untouched sample, no
autoconnect). Conclusion: the pinned mumble-web:0.5 client never paints for an anonymous visitor —
not a capture bug, not fixable harness-side without changing the deploy (guardrail says upstream).
Filed DEFERRED (6104a99); claiming the loader frame as documented best-available. Voice = the
recipe's function and is protocol-tested; the Adversary may still want a different disposition —
their call at the gate.
Ops lessons this stretch: 3 simultaneous run launches race on abra catalogue fetch (lasuite-drive
died "unable to update catalogue"; reran solo green) — stagger launches. Backgrounded one-shot ssh
launchers with `cd X && nohup A & nohup B &` apply the cd to A only (B starts in the login dir); give each its own cd.
M2 evidence: 10 fixed-class proof runs (table in BACKLOG-shot P4, every PNG Read by me), 2 of them
real !testme drone builds (370/371, durations 198s/166s vs 199s/209s baselines — plausible FASTER
since capture stops burning its 45s fail window), healthy-class cited from P1, dashboard grid/card/
badge all 200. Claiming M2.
## 2026-06-11 ~07:20Z — phase complete
M2 PASS (2b54adb): 18/18 PNGs independently Read, both !testme proofs confirmed genuine via bridge
logs, durations/levels/R7 all verified, mumble N/A-variant agreed (Adversary reversed its M1 stance
on the new DOM evidence), bluesky-pds N/A re-confirmed. Wrote ## DONE. Loop ends.


@@ -29,6 +29,513 @@ I own this file and the `## Adversary findings` section of BACKLOG-rcust.md only
## Verdicts
_(none yet — phase just started; Builder has not yet created STATUS-rcust.md or branch
`restructure/recipe-custom`. Only the reference spec doc `76a4b6b` has landed. Awaiting first
`claim(rcust): M1` from the Builder.)_
_(no GATE verdict yet — M1 is not claimed. M1 only claims after P1-P6 are all on the branch;
Builder has landed P1 (472a68b) + P2 (8cd72fd) and is mid-P3. The interim pre-review below is
front-loaded break-it work on the FROZEN P1/P2 commits — NOT an M1 PASS.)_
### Interim pre-review of frozen P1+P2 (branch @ 8cd72fd) — @2026-06-10, cold from upstream clone
Done as idle-time break-it work while no gate is pending. P1/P2 phase commits won't be rewritten
(Builder adds P3+ on top), so reviewing them now is non-wasted and front-loads M1. Cold clone of
`origin/restructure/recipe-custom` into `/tmp/rcust-verify` from the true upstream remote.
**No defects found so far.** Results:
1. **Deleted-code fallout — CLEAN.** Grepped `runner/ tests/ scripts/` for live refs to every deleted
symbol (`_recipe_meta`, `_load_meta`, `_recipe_extra_env`, `_recipe_meta_flag`, `declared_deps`,
`is_canonical_enrolled`, `OIDC_AT_INSTALL`, `CHAOS_BASE_DEPLOY`, `SKIP_GENERIC`,
`setup_custom_tests`, `deps_apps`, `deps_creds`, `deployed_app`). All hits are comments/docstrings
explaining the deletion, test names, or the intentionally-RETAINED `CCCI_SKIP_GENERIC*` env form
(kept per P2c). Zero live call-sites. `setup_custom_tests.sh` files gone.
2. **All-recipes-load-clean (typo gate) — PASS, independently.** Ran `meta.load()` (pure stdlib) over
all 21 recipe dirs cold via plain python3 (did NOT trust the Builder's test_meta.py). All 21 load;
non-default key sets sane. Every ALL-CAPS key used in any recipe_meta.py is in the 14-key registry.
3. **Coverage-loss diff (CARDINAL check) — ZERO deltas on data keys + hook presence.** Throwaway
harness (`/tmp/diff_meta.py`) reproduces main's six-loader effective resolution (`_load_meta`,
`declared_deps`, `is_enrolled`, `_recipe_extra_env`) from MAIN's recipe_meta files and diffs vs the
BRANCH's `meta.load()` for all 21 recipes. After correcting one harness artifact (EXTRA_ENV default
is `{}` not None), **0/21 recipes show any delta** for HEALTH_PATH/HEALTH_OK/DEPLOY_TIMEOUT/
HTTP_TIMEOUT/BACKUP_CAPABLE/EXPECTED_NA/UPGRADE_BASE_VERSION/DEPS/WARM_CANONICAL + presence of
READY_PROBE/BACKUP_VERIFY/UPGRADE_EXTRA_ENV/EXTRA_ENV/SCREENSHOT.
4. **Validation gaps — CLOSED.** Crafted tmp recipe_metas: typo'd key → MetaError (with "did you mean
DEPLOY_TIMEOUT?"); wrong type (`DEPLOY_TIMEOUT="str"`) → MetaError; callable on data key
(`DEPLOY_TIMEOUT=lambda ctx:...`) → MetaError; `_PRIVATE`/lowercase-helper → loads clean (exemption
works). All four behave per the locked decision.
5. **meta.py read** — single `exec()`, frozen `RecipeMeta` generated from `KEYS`, `_coerce` rejects
bool-as-int and callable-on-data-key; `non_default` compares vs registry default. No issues.
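The validation behaviours probed in items 4 and 5 can be illustrated with a small sketch. It is not `meta.py`: the three-key `REGISTRY`, its defaults, and `validate` are hypothetical stand-ins for the 14-key registry and `_coerce`, and hook keys (which legitimately hold callables) are left out.

```python
import difflib


class MetaError(Exception):
    """Raised when a recipe_meta namespace fails validation."""


# Hypothetical registry of data keys: name -> (expected type, default).
REGISTRY = {
    "DEPLOY_TIMEOUT": (int, 600),
    "HEALTH_PATH": (str, "/"),
    "BACKUP_CAPABLE": (bool, True),
}


def validate(namespace: dict) -> dict:
    """Return registry defaults overlaid with the validated ALL-CAPS keys."""
    out = {key: default for key, (_, default) in REGISTRY.items()}
    for key, value in namespace.items():
        if key.startswith("_") or not key.isupper():
            continue  # _PRIVATE names and lowercase helpers are exempt
        if key not in REGISTRY:
            hint = difflib.get_close_matches(key, list(REGISTRY), n=1)
            suffix = f" (did you mean {hint[0]}?)" if hint else ""
            raise MetaError(f"unknown key {key}{suffix}")
        expected, _ = REGISTRY[key]
        if callable(value):
            raise MetaError(f"{key} is a data key and cannot be callable")
        if isinstance(value, bool) and expected is not bool:
            raise MetaError(f"{key}: bool is not accepted where {expected.__name__} is expected")
        if not isinstance(value, expected):
            raise MetaError(f"{key}: expected {expected.__name__}, got {type(value).__name__}")
        out[key] = value
    return out
```

The explicit bool check matters because `isinstance(True, int)` is true in Python, so a plain type test would let `DEPLOY_TIMEOUT=True` through.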
**Still UNVERIFIED for M1 (do NOT treat above as M1 PASS):** full `pytest tests/unit -q` +
`pytest tests/concurrency -q` + `scripts/lint.sh` cold on the cc-ci host; R2 end-to-end through the
real orchestrator screenshot path; P3 ctx-hook signature migration (assert byte-identical, legacy
`lambda domain:` raises clear MetaError); P4/P5/P6; re-run the coverage diff on the FINAL branch
(P3 changes hook signatures); recipe-test diffs are mechanical-only (no assertion weakening);
HC2/F2-11/generic-floor integrity. These wait for the `claim(rcust): M1`.
### Interim pre-review of frozen P3 (branch @ fd02d9f) — @2026-06-10, cold from upstream clone
Builder landed P3 (uniform ctx hook convention) and moved to P4, so P3 is frozen. Pre-reviewed it.
**No defects found.**
1. **Mechanical-migration discipline — HELD (no VETO trigger).** `git diff 8cd72fd..fd02d9f` over
`tests/*/` shows ZERO changed assert/expected literals. Every hook change is purely
`def HOOK(domain[, meta])` → `def HOOK(ctx)` + `domain` → `ctx.domain` in the body. Spot-checked
cryptpad/mumble/ghost/lasuite-drive recipe_meta.py + lasuite-drive ops.py: seeded values, return
dicts, paths, status codes, and the `pre_restore` `assert _psql(...) in (...)` are byte-identical
apart from the `ctx.` deref.
2. **HookCtx — present + complete.** `meta.HookCtx` frozen dataclass has all 5 documented fields
(`.domain`, `.base_url`, `.meta`, `.deps`, `.op`); `meta.hook_ctx(domain, meta, op=…)` factory
builds it and pulls `deps` from `$CCCI_DEPS_FILE`. All call sites migrated: run_recipe_ci
`pre_<op>`, BACKUP_VERIFY; lifecycle `extra_env` + READY_PROBE; screenshot `SCREENSHOT(page, ctx)`.
(NB my first pass falsely flagged "no HookCtx" — that was a STALE WORKTREE at P2; corrected by
checking out fd02d9f. Logged here for honesty.)
3. **Legacy-signature guard (P3.4) — PRESENT + works, live-probed.** `meta.check_hook_signature`
exact-matches positional params and raises a CLEAR MetaError naming the P3 migration + HookCtx
fields. Wired into both `load()` (recipe_meta hooks; SCREENSHOT expects `(page, ctx)`, rest
`(ctx)`) and the orchestrator (ops.py `pre_<op>`). Crafted tmp metas: legacy `READY_PROBE(domain)`,
`SCREENSHOT(page, domain, meta)`, `EXTRA_ENV(domain)` all → MetaError at load; `READY_PROBE(ctx)`
loads clean. No silent mid-run TypeError path.
4. **Coverage diff re-run at P3 head — still 0/21 deltas** (hook presence + all data keys unchanged).
Net: P1+P2+P3 all clean under cold adversarial probing. M1 still gated on full unit+concurrency+lint
on the cc-ci host, P4-P6, R2 end-to-end via the real screenshot orchestrator path, and a final
coverage re-diff. No findings filed; no VETO.
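For reference, a legacy-signature guard of the kind probed in item 3 can be built on `inspect.signature`. This is a sketch under assumptions, not the branch's `meta.check_hook_signature`: the `EXPECTED_PARAMS` table and the error wording are hypothetical, and only the `(page, ctx)` / `(ctx)` convention comes from the review.

```python
import inspect


class MetaError(Exception):
    """Raised when a hook does not follow the ctx convention."""


# SCREENSHOT takes (page, ctx); every other hook takes (ctx).
EXPECTED_PARAMS = {"SCREENSHOT": ("page", "ctx")}


def check_hook_signature(name: str, hook) -> None:
    """Exact-match the hook's positional parameter names against the convention."""
    expected = EXPECTED_PARAMS.get(name, ("ctx",))
    positional = (inspect.Parameter.POSITIONAL_ONLY,
                  inspect.Parameter.POSITIONAL_OR_KEYWORD)
    actual = tuple(p.name for p in inspect.signature(hook).parameters.values()
                   if p.kind in positional)
    if actual != expected:
        raise MetaError(
            f"{name}({', '.join(actual)}) uses a legacy signature; "
            f"expected {name}({', '.join(expected)})"
        )
```

Checking at load time is what turns a mid-run `TypeError` into an early, named failure.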
### Interim pre-review of frozen P4 (branch @ 29a28e2) — @2026-06-10T18:55Z, cold from fresh host clone
Builder landed P4 (custom-test ergonomics) and moved to P5, so P4 is frozen. Pre-reviewed it cold.
**No defects found.** NOT an M1 verdict — M1 stays gated (see "Still UNVERIFIED" below).
Cold acceptance (fresh `git clone` on cc-ci host at 29a28e2, my own checkout — not the Builder's):
- `cc-ci-run -m pytest tests/unit -q` → **184 passed** (exact match to claim; full suite, no
cross-fixture pollution from the session-scoped `deps` fixture).
- `cc-ci-run -m pytest tests/unit/test_discovery.py test_discovery_phase2.py
test_conftest_fixtures.py -q` → 14 passed.
- `nix develop .#lint --command scripts/lint.sh` → **lint: PASS** (ruff format/check, deadnix,
shfmt, shellcheck, yamllint all clean).
Correctness probes:
1. **Placement-rule claim ("zero in-repo users of top-level custom tests") — HOLDS.** Filesystem
sweep of every `tests/<recipe>/test_*.py`: ALL are lifecycle names (test_{install,upgrade,
backup,restore}.py). No top-level non-lifecycle custom exists in-repo, so dropping the top-level
glob in `discovery.custom_tests` loses ZERO coverage. The lifecycle-name exclusion is retained
inside functional/playwright as the double-run safety net.
2. **Discovery diff — clean.** Top-level `glob(test_*.py)` branch removed; functional/ + playwright/
subdir globs retained with `basename not in lifecycle_names` guard. Docstring + module header
updated to state the placement RULE.
3. **Test changes are adaptation + strengthening, NOT weakening (no VETO trigger).**
- `test_discovery_phase2`: renamed to `..._placement_rule_...`; now ASSERTS the top-level
`test_sso_smoke.py` is `not in names` (new negative assertion proving the behavior change),
while functional/playwright customs are still `in names` and lifecycle name excluded.
- `test_discovery::test_custom_tests_repo_local_gated`: repo-local custom moved from top-level
into `functional/`; HC2 default-deny (`== []` when unapproved) and approved-case
(`functional/test_sso.py in names`, `test_install.py` excluded) both INTACT. HC2 integrity
preserved.
4. **op_state fixture — correct.** Skips with clear reason on unset env / missing file / non-JSON
(`except ValueError` catches JSONDecodeError); reads & returns parsed dict otherwise. Tests
cover 3 of 4 paths (the non-JSON skip path is untested — minor coverage gap, not a defect; the
branch is trivially correct by inspection).
Net: P1+P2+P3+P4 all clean under cold adversarial probing; both halves of every phase claim
(unit count + lint) reproduced cold on a fresh clone. No findings filed; no VETO.
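The placement rule from probes 1 and 2 reduces to a few lines. The sketch below is illustrative only: `custom_tests` here is a simplified stand-in for `discovery.custom_tests`, it omits the HC2 approval gate entirely, and the constant names are assumptions.

```python
from pathlib import Path
from typing import List

LIFECYCLE_NAMES = {"test_install.py", "test_upgrade.py",
                   "test_backup.py", "test_restore.py"}
CUSTOM_SUBDIRS = ("functional", "playwright")


def custom_tests(recipe_dir: Path) -> List[str]:
    """Collect custom tests from the two subdirs only; top-level files are ignored."""
    found: List[str] = []
    for sub in CUSTOM_SUBDIRS:
        subdir = recipe_dir / sub
        if not subdir.is_dir():
            continue
        for path in sorted(subdir.glob("test_*.py")):
            if path.name not in LIFECYCLE_NAMES:  # double-run safety net
                found.append(f"{sub}/{path.name}")
    return found
```

A top-level `test_sso_smoke.py` is simply never globbed, while a lifecycle-named file dropped into `functional/` is filtered out by name.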
**Still UNVERIFIED for M1 (do NOT treat above as M1 PASS):** P5 (manifest) + P6 (docs);
`pytest tests/concurrency -q` cold; R2 end-to-end through the real orchestrator screenshot path;
final coverage re-diff on the COMPLETE branch (P1-P6, all 21 recipes, effective customization set
unchanged); recipe-test diffs mechanical-only across the whole branch; HC2/F2-11/generic-floor
integrity at the final head. These wait for `claim(rcust): M1`.
### Interim pre-review of frozen P5 (branch @ 68954be) — @2026-06-10T19:06Z, cold from fresh host clone
Builder landed P5 (customization manifest) and moved to P6, so P5 is frozen. Pre-reviewed it cold.
**No blocking defect; one secret-SURFACE observation raised (heads-up to Builder, NOT a VETO, NOT
an M1 secret-leak failure).** NOT an M1 verdict.
Cold acceptance (fresh `git clone` on cc-ci host at 68954be, my own checkout):
- `cc-ci-run -m pytest tests/unit -q` → **191 passed** (exact match to claim).
- `nix develop .#lint --command scripts/lint.sh` → **lint: PASS**.
Primary adversarial target — SECRET LEAKAGE via the new manifest surface (D-gate: published logs +
dashboard contain NO secrets, incl. generated app passwords):
1. **Generated/runtime secrets — NOT exposed (gate holds).** `manifest.build` collects only:
`meta_non_default` (static recipe_meta), hook NAMES (pre-ops/install_steps.sh/compose.ccci.yml),
overlay FILENAMES, custom-test COUNTS, and env-override KEY names (printed `KEY=1`, value never
rendered). It never touches `deps` (client_secret), `op_state`, abra-generated app passwords, or
any env VALUE. The cardinal concern — generated app passwords on the dashboard — is structurally
absent from this surface.
2. **Cold all-recipes sweep.** Built+rendered the manifest for all 21 recipes on the host; grepped
the rendered blocks AND the results.json `customization` payload for secret/password/token/key/
credential and for any 32+ char high-entropy string. The ONLY hit, across every recipe, is
plausible's `EXTRA_ENV.SECRET_KEY_BASE` =
`"ccciplausibletestkeybase64charsexactlyforCIephemeral4567890123"`.
3. **OBSERVATION (not a leak):** that value is a HARDCODED, committed, PUBLIC dummy CI constant
(tests/plausible/recipe_meta.py, in the open-source repo) — not a generated or real secret.
`meta_non_default` dumps EXTRA_ENV literal dicts verbatim into the log AND results.json (→
dashboard), so a field literally named `SECRET_KEY_BASE` with a value now appears on the
dashboard. No real secret is exposed (it's public), so this is NOT a D-gate failure and does NOT
block P5. BUT it's a standing surface: (a) a dashboard secret-scan gets a true-positive-shaped
hit on a public dummy (noise that could mask a real leak), and (b) if any recipe ever set a real
secret-ish literal in a meta dict, the manifest would surface it unredacted. Flagged to Builder
via BUILDER-INBOX as a heads-up to consider redacting values of sensitive-named meta keys before
M1. Will re-examine on the real dashboard at the M1 cold-verify.
4. **HC2-honoring — confirmed.** Manifest routes ALL repo-local reads through `discovery._gated`
(ops.py loop direct; `install_steps`/`resolve_overlay_op`/`custom_tests` each call `_gated`
internally). An unapproved repo-local recipe contributes nothing to the manifest.
5. **Pure presentation — holds.** `build()` only reads files/env and returns a dict; `render()`
formats a string. Called at run_recipe_ci.py:889-890 (print) + embedded at :1261 into results;
no state mutation, no verdict influence. `_jsonable` renders callables as `'<hook>'` (so a
callable EXTRA_ENV/READY_PROBE never leaks closure internals) and tuples→lists for JSON.
Net: P1-P5 all clean under cold adversarial probing; every phase claim (unit count + lint)
reproduced cold. No findings filed; no VETO. One non-blocking secret-surface heads-up sent.
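The `_jsonable` behaviour described in item 5 (callables rendered as `'<hook>'`, tuples turned into lists) might look like the sketch below. `jsonable` is a hypothetical reimplementation written from that description, not the manifest module's code.

```python
def jsonable(value):
    """Make a recipe_meta value JSON-safe without exposing hook internals."""
    if callable(value):
        return "<hook>"  # never render closures or source of a callable
    if isinstance(value, dict):
        return {str(key): jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [jsonable(item) for item in value]
    return value
```

Because the callable check comes first, a callable `EXTRA_ENV` is reduced to a marker before anything tries to serialise it.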
**Still UNVERIFIED for M1:** P6 (docs); `pytest tests/concurrency -q` cold; R2 end-to-end via the
real orchestrator screenshot path; final coverage re-diff on the COMPLETE branch (all 21 recipes,
effective customization unchanged); recipe-test diffs mechanical-only across the whole branch;
HC2/F2-11/generic-floor integrity at final head; AND — at the M1 dashboard check — confirm the
SECRET_KEY_BASE-named field on the real dashboard is the accepted public dummy (or redacted).
These wait for `claim(rcust): M1`.
## M1 — implementation verified: **PASS** @2026-06-10T19:27Z (branch `restructure/recipe-custom` @ 858e0f5)
Cold-verified from TWO fresh clones on the cc-ci host (NEW=858e0f5, OLD=main pre-restructure;
merge-base 49fb818 confirmed → `main..858e0f5` is exactly P1-P6). Verdict formed from the phase plan
(SSOT), the code/git history, the STATUS verification facts, and my own cold re-runs — NOT from
JOURNAL rationale (isolation discipline; I did not need to consult JOURNAL).
**All M1 Definition-of-Done items PASS:**
1. **Cold test suites — match claim exactly.** Fresh clone @858e0f5:
`cc-ci-run -m pytest tests/unit -q` → **192 passed**; `tests/concurrency -q` → **23 passed**
(untouched by this plan, proven); `nix develop .#lint --command scripts/lint.sh` → **lint: PASS**.
2. **Coverage diff (cardinal risk) — 0 REAL deltas / 21 recipes.** Wrote throwaway extractors that
resolve EVERY recipe's effective customization in BOTH worlds — OLD via the legacy loaders
(`_load_meta` + `lifecycle._recipe_extra_env` + `deps.declared_deps` + `_recipe_meta_flag`),
NEW via `meta.load()` + `meta.extra_env/upgrade_extra_env` — for the common keys (HEALTH_*,
timeouts, DEPS, EXTRA_ENV resolved at a fixed domain, UPGRADE_EXTRA_ENV, BACKUP_CAPABLE,
EXPECTED_NA, UPGRADE_BASE_VERSION, READY_PROBE/BACKUP_VERIFY presence). Diff = **0 behavioral
deltas**; the only raw diffs were 20× `UPGRADE_EXTRA_ENV: None→{}` (unset default representation,
behaviorally identical) and mumble (most-customized: callable EXTRA_ENV→dict, UPGRADE_EXTRA_ENV,
READY_PROBE) is **byte-identical** old↔new.
Deleted keys accounted for (no silent loss): `SKIP_GENERIC` (0 recipe users); `CHAOS_BASE_DEPLOY`
→ overlay-presence (discourse+ghost, exactly the two shipping compose.ccci.yml — perfect 1:1, no
change either direction); `OIDC_AT_INSTALL` → install-time made universal (drive+meet were
already install-time). **lasuite-docs** declared DEPS but NOT OIDC_AT_INSTALL → OLD post-install,
NEW install-time: an INTENTIONAL P2b consolidation, not a drop — flagged below for M2 validation.
3. **Assertion weakening (VETO-class) — NONE.** Full branch diff over all recipe test files
(excl. harness unit/concurrency/regression): 18 removed asserts, 18 added. After mechanical
normalization (`domain`→`ctx.domain`, `deps_creds`→`deps`, `MAX_USERS`→`_MAX_USERS`, whitespace)
the removed and added assert sets are **IDENTICAL** — zero unmatched in either direction. Every
change is a pure signature/fixture/constant rename; no expected value altered, no assert deleted.
Spot-confirmed discourse/ghost `_psql(domain,…ci_marker…) in (…)` → `ctx.domain` only (expected
tuple + SQL byte-identical). **No VETO.**
4. **Deleted-code fallout — clean.** No dangling LIVE refs to any of the 13 deleted symbols
(`_recipe_meta`/`_load_meta`/`_recipe_extra_env`/`_recipe_meta_flag`/`declared_deps`/
`is_canonical_enrolled`/`OIDC_AT_INSTALL`/`CHAOS_BASE_DEPLOY`/`SKIP_GENERIC`/`setup_custom_tests`/
`deps_apps`/`deps_creds`/`deployed_app`). Only residue: stale DOC/comment mentions of
`OIDC_AT_INSTALL` + `setup_custom_tests.sh` in PARITY.md files (non-blocking P6 cosmetic nit).
5. **Validation gaps — closed.** Cold-probed `meta.load()` with synthetic bad metas: typo'd key,
str-on-int, bool-as-int, callable-on-data-key, legacy hook sig `READY_PROBE(domain)`, and unknown
key ALL → `MetaError` (clear, names the offending file/key). Clean + underscore-private-helper
metas load fine (no false positives). No silent pass.
6. **R2 fixed end-to-end.** Cold proof through the REAL load path: a recipe declaring
`def SCREENSHOT(page, ctx)` is surfaced by `meta.load()` and resolved callable by
`screenshot._load_screenshot_hook` (old L1 allowlist dropped it — now arrives); orchestrator wires
it `run_recipe_ci.py:1029 capture(…, recipe_meta=meta)` → `hook(page, hook_ctx(domain, meta))`.
Absent recipe → None (default landing-page path). Legacy `SCREENSHOT(page, domain, meta)` sig
rejected at load.
7. **HC2 / F2-11 / generic-floor integrity — preserved.** Cold-probed `discovery.custom_tests` +
`install_steps`: UNAPPROVED repo-local → `[]` / `None` (default-deny holds); APPROVED → surfaced.
`sso_dep_unverified` (F2-11) logic UNCHANGED (only a comment edited) — a deps-not-ready run that
skips ≥1 `requires_deps` test still suppresses the green signal. Generic floor `_skip_generic`
default = run (additive); opt-out now env-only (same env vars as before; the 0-user meta key
removed) and surfaced LOUDLY in CI + flagged `!!` in the manifest — strictly stronger, never
silent.
8. **(Bonus) P5 secret-surface heads-up RESOLVED + verified.** The Builder landed `858e0f5`
redacting secret-named meta values in the manifest (my P5 BUILDER-INBOX ask). Cold-verified:
`plausible.EXTRA_ENV.SECRET_KEY_BASE` → `<redacted>` in BOTH the log block and results.json;
recursive into nested dict keys; word-segment `(^|_)KEY(_|$)` regex avoids over-match
(KEYCLOAK_* passes). All-21-recipe sweep: exactly 1 redaction, ZERO over-redaction, ZERO
under-redaction (no secret-shaped value remains). Regression test
`test_manifest_redacts_sensitive_named_values` present.
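The word-segment redaction in item 8 can be sketched as follows. Only the `(^|_)KEY(_|$)` shape and the `<redacted>` marker come from the review; the other sensitive words in the pattern and the `redact` helper itself are assumptions made for illustration.

```python
import re

# KEY comes from the review; SECRET/PASSWORD/TOKEN are assumed additions.
SENSITIVE = re.compile(r"(^|_)(SECRET|PASSWORD|TOKEN|KEY)(_|$)", re.IGNORECASE)


def redact(value, key: str = ""):
    """Recursively replace values stored under sensitive-looking key names."""
    if isinstance(value, dict):
        return {k: redact(v, str(k)) for k, v in value.items()}
    if SENSITIVE.search(key):
        return "<redacted>"
    return value
```

Matching whole underscore-delimited segments is what lets `KEYCLOAK_URL` through while `SECRET_KEY_BASE` is caught; a plain substring match on `KEY` would over-redact.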
**Verdict: M1 PASS.** No findings filed, no VETO.
**This does NOT clear `## DONE`.** Per the phase DoD, DONE requires a fresh Adversary PASS for BOTH
M1 *and* M2. M2 (merged-main real-CI regression sweep vs the committed baseline matrix) is still
unverified. M2 watch-items I will specifically re-check from run logs:
- **lasuite-docs OIDC is now install-time** (post→install change above) — must pass a real run with
OIDC wired at install (skip-count 0 on its `requires_deps` tests).
- the customization spot-checks the plan §M2.4 enumerates (mumble READY_PROBE tcp lines, cryptpad
SANDBOX_DOMAIN, ghost/discourse BACKUP_VERIFY + overlay copy + auto-chaos base deploy, lasuite-*
deps provisioning + OIDC tests ran, immich ops.py seeds, manifest block present in every log,
screenshot.png where capture succeeded).
- canary suite (RED canaries still caught at intended tier) + per-recipe level == baseline matrix.
- zero leaked apps after teardown.
### M2-prep — independent hook-port audit (shell→python / best-effort↔fatal drift) @2026-06-10T20:55Z
Triggered by the lasuite-drive regression (below), which my M1 PASS MISSED: my M1 coverage diff
compared recipe_meta KEYS (resolved values), not ops.py hook BODIES, and my assertion scan matched
`assert ` not `raise AssertionError`. So a hook that flipped best-effort→fatal was invisible to my
M1 method. M2 (real-CI sweep) caught it — the safety net working as designed. I then audited ALL
hook ports cold (`git diff c2508c7..origin/main` per recipe ops.py + the 2 setup_custom_tests.sh
ports), filtering for non-mechanical error-handling (raise/assert/except/exit/timeout/poll changes):
- **lasuite-drive `pre_install`** — GENUINE rcust regression (Builder-disclosed, I confirmed):
OLD setup_custom_tests.sh bucket poll fell through on 90s timeout (best-effort, no failure; the
custom-tier `test_minio_storage.py` upload→list→download is the real gate); NEW port added a
terminal `raise AssertionError` → deterministic install RED when the bucket appears just after
90s. Fix-forward APPROVED (restore best-effort print+return, scoped to line-54 only; conditioned
on an L5 re-run + my diff re-verify). See approval entry in BUILDER-INBOX history (commit 57c66ad).
- **lasuite-docs `install_steps.sh`** — INTENTIONAL P2b change, NOT a defect: OLD setup_custom_tests
did `exit 1` on missing deps/null KC creds; NEW does `exit 0` (no-op) for missing-deps (gated now
by F2-11: the `@requires_deps` OIDC test skips → `sso_dep_unverified` suppresses green) BUT
preserves `exit 1` on secret-insert failure. Consistent with the install-time-deps redesign.
WATCH-ITEM (residual): the missing-deps path now relies entirely on F2-11; the sweep didn't
exercise it (deps were ready, skip-count 0). Mechanism verified present at M1; not blocking.
- **All other ops.py** (cryptpad, discourse, ghost, immich, keycloak, lasuite-meet, matrix-synapse,
mattermost-lts, mumble, n8n, plausible, custom-html) — pure mechanical ctx migration
(`domain`→`ctx.domain`, `meta`→`ctx.meta`); expected tuples/strings byte-identical (spot-checked
keycloak 201/409 + 204/200, discourse/ghost _psql ci_marker). No error-handling drift.
Net: exactly ONE accidental hook-port regression (lasuite-drive), now under approved fix. No other
best-effort↔fatal flips. This audit closes the M1-method gap for the hook bodies.
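The best-effort shape that the lasuite-drive fix-forward restores can be sketched like this. It is an illustration under assumptions: `wait_best_effort`, the `ready` callable, and the interval are hypothetical, and the real hook polls a MinIO bucket rather than a generic predicate.

```python
import time
from typing import Callable


def wait_best_effort(ready: Callable[[], bool],
                     timeout_s: float = 90.0,
                     interval_s: float = 2.0) -> bool:
    """Poll until `ready()` is true or the timeout passes; never raise on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if ready():
            return True
        time.sleep(interval_s)
    # Best-effort: report and fall through. The fatal variant would instead
    # `raise AssertionError(...)` here and turn a slow bucket into an install RED.
    print(f"not ready after {timeout_s:.0f}s; continuing (best-effort)")
    return False
```

The difference between the two variants is a single terminal statement, which is why a diff of resolved meta keys or a grep for `assert ` alone cannot see it.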
---
### M2 proof-run independent analysis (cold, Adversary) @2026-06-10T23:53Z
M2 is NOT yet claimed by the Builder; this is my independent read of the proof runs sitting on
cc-ci (`/var/lib/cc-ci-runs/{m2b-*,ab-*-oldmain}`), parsed myself via jq (NOT trusting Builder
narrative). The 6 first-sweep mismatches break down as follows.
**Confirmed root fact — REF MISMATCH is real (I verified, not taken on faith).** Every baseline
matrix run used a *PR-head* ref; the first M2.3 sweep used each mirror's *default-branch head* — a
different commit. Independently confirmed via `results.json.ref`:
| recipe | baseline run/ref/level | sweep ref/level |
|---|---|---|
| discourse | 184 / 7ae7b0f76efb / L4 | 7d53d4ec390f / L2 |
| plausible | 308 / 13458fac56a1 / L4 | da159375d89a / L2 |
| mattermost-lts | 196 / a333e31a6002 / L4 | 41c9eb8e5f34 / L2 |
| immich | 307 / 107d7220adce / L4 | 7eb3937a82d0 / L2 |
| lasuite-drive | 189 / ffa7d585afa2 / L5 | f4135d78201e / L0 |
So the sweep was NOT apples-to-apples vs the baseline matrix. Reconciliation requires either
(a) re-run at the baseline ref on new main == baseline level, or (b) A/B same-ref old-vs-new main
== same level. Status per recipe:
- **immich** — m2b-immich (new main, baseline ref 107d7220adce) = **L4 == baseline L4. CLEAN.**
- **mattermost-lts** — m2b (new main, a333e31a6002) = **L4 == baseline L4. CLEAN.**
- **plausible** — m2b (new main, 13458fac56a1) = **L4 == baseline L4. CLEAN.**
→ these three: restructure proven INNOCENT (baseline ref reproduces baseline level on merged main).
- **bluesky-pds** — ab-bluesky-pds-oldmain (OLD main, b2d86efba3f1) = L0 == new-main sweep L0 at
same ref → restructure-NEUTRAL at the sweep ref. (Baseline is "L4-equiv, pre-results-era", no run
id — softer baseline; A/B neutrality is the available evidence.)
- **discourse — NOT yet clean. OPEN.** Two *distinct* flake modes seen, and the A/B was run at the
wrong ref to close the gap:
- baseline 184 (OLD main, 7ae7b0f): all pass → L4.
- m2b-discourse (NEW main, SAME ref 7ae7b0f): **upgrade FAILED**, HC1 guard fired —
"upgrade deployed chaos commit 'eb96de94+U', not intended PR-head '7ae7b0f76efb' — re-checkout
to code-under-test failed (HC1)" → L1. ← same-ref old=L4 vs new=L1 discrepancy, UNexplained.
- ab-discourse-oldmain (OLD main, 7d53d4ec): **restore FAILED** (ci_marker truncated-dump race)
→ L2 == new-main sweep L2 at that ref → neutrality proven, but for the RESTORE mode at the
DEFAULT-head ref, NOT for the L1/upgrade-HC1 mode at the baseline ref.
- Net: the clean A/B (ref 7ae7b0f on OLD main vs NEW main) that would explain L4→L1 was NOT run.
The upgrade re-checkout/HC1 path lives in run_recipe_ci.py/lifecycle which the meta-param
threading DID touch — so "pre-existing flake" is plausible but UNPROVEN here. To clear: run
discourse @7ae7b0f on OLD main (does it deterministically reproduce L4, or also flake to L1?),
and/or repeat @7ae7b0f on new main to characterise the HC1 re-checkout as a race. The HC1 guard
FIRING (not silently passing the wrong commit) is the safety net working — good — but it means
the upgrade did not exercise the PR code, so the run is inconclusive, not a clean baseline match.
- **lasuite-drive** — fix-forward 1357544 (restore best-effort bucket poll) landed; needs a fresh
L5 run at the baseline ref ffa7d585afa2 on merged main to confirm baseline. m2rr/earlier runs
predate or used the default head — NOT yet a clean baseline match. OPEN.
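The reconciliation rule above (option (a) baseline-ref re-run, option (b) same-ref old-vs-new A/B) can be sketched as a small classifier. This is an illustration only: `Run`, `reconcile`, and the four result labels are hypothetical names, not harness code, and the sketch keeps one level per (main, ref), so it cannot represent a flake that yields two different levels at the same ref.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Run:
    main: str   # "old" or "new" harness main (hypothetical field)
    ref: str    # recipe ref under test
    level: int  # results.json level


def reconcile(baseline_ref: str, baseline_level: int, runs: Iterable[Run]) -> str:
    """Classify one recipe against its baseline."""
    runs = list(runs)
    new = {r.ref: r.level for r in runs if r.main == "new"}
    old = {r.ref: r.level for r in runs if r.main == "old"}
    # (a) baseline ref reproduces the baseline level on new main
    if new.get(baseline_ref) == baseline_level:
        return "clean"
    # (b) same ref gives the same level on old and new main
    if baseline_ref in new and old.get(baseline_ref) == new[baseline_ref]:
        return "neutral"
    # A/B agreement exists, but only at some other ref: the gap stays open
    if any(ref in old and old[ref] == lvl for ref, lvl in new.items()):
        return "neutral-at-other-ref"
    return "open"
```

With the discourse runs listed above (new main at 7ae7b0f = L1, old and new main at 7d53d4ec = L2, baseline L4 at 7ae7b0f) the sketch returns `neutral-at-other-ref`, which is why that recipe stays OPEN.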
**M2 disposition: still OPEN — no PASS.** 3/6 cleanly reconciled (immich/mattermost/plausible);
bluesky neutral-at-sweep-ref; discourse + lasuite-drive NOT yet closed. I will require, at the M2
claim: (1) discourse same-ref A/B (or repeat) explaining L4→L1; (2) a clean lasuite-drive L5 at
baseline ref; (3) my own cold re-parse of every per-recipe level vs baseline; (4) the M2.4
customization-executed spot-greps; (5) zero leaked apps. Recorded a BUILDER-INBOX heads-up on the
discourse-HC1 gap so it is addressed in the claim, not glossed as "the restore flake".
### M2 proof-run progress + self-correction @2026-06-11T00:05Z
Builder is running (independently, matching my inbox ask) the decisive A/B serially on the box:
`m2-proof.sh` → lasuite-drive @ffa7d585afa2 PR=1 (post-fix-forward 1357544) on merged main 5c0676b,
then discourse @7ae7b0f76efb **PR=2** on merged main (m2p-discourse); `m2-proof2.sh` (queued) →
discourse @7ae7b0f76efb **PR=2** on OLD main (/root/m2-oldmain, ab-discourse-7ae7b0f-oldmain).
**Self-correction to my 23:53Z discourse analysis:** my m2b-discourse run used **PR=0**, but the
upgrade HC1 guard resolves the *PR head* for the re-checkout. The L1 failure message ("deployed
chaos commit 'eb96de94+U', not PR-head 7ae7b0f — re-checkout failed") is plausibly a **PR=0
artifact** (no real PR to resolve the head from), NOT a restructure regression. The Builder's proof
runs correctly use PR=2 (matching baseline run 184's pr=2). So the apples-to-apples comparison I
need is m2p-discourse (PR=2, new main) vs ab-discourse-7ae7b0f-oldmain (PR=2, old main) vs baseline
184 (PR=2, old main, L4). I will cold-verify those three when they land; my L4→L1 concern is on
hold pending the PR=2 result, not yet a confirmed regression. Live lasu-f68b63 stack = active
lasuite-drive proof run (expected, not a leak).
### M2 fix-forward APPROVE: be2026a (services_converged completed-one-shot rule) @2026-06-11T00:31Z
Builder proposed a 2nd lasuite-drive P2b fix on branch `fix/converged-oneshot @ be2026a` and asked
approval before merging to main (M2 "trivial fix-forward w/ Adversary approval" path). Cold-verified
independently (fresh clone of be2026a at /root/adv-be2026a on cc-ci, NOT the Builder's working tree):
- **Diff** (`git diff origin/main..be2026a runner/harness/lifecycle.py`, read myself): in
`services_converged`, a `cur != want` deficit now passes ONLY if `docker service ps <svc>` shows
ALL task states == `Complete`. Conservative: any Running/Preparing/Pending (spinning up) or
Failed/Rejected (broken) in the deficit still returns False; no-tasks-yet still False; plain N/N
and 0/0 unchanged. Targeted addition, not a rewrite.
- **False-green analysis (my own):** only `restart_policy:none` one-shots ever show `Complete`; a
normal crashed service shows Failed/Running(restarting), never Complete. Even if converge passed
on a completed-but-ineffective one-shot, two INDEPENDENT gates still catch it — the generic
`test_serving` HTTP floor and the custom-tier functional test (lasuite-drive
`test_minio_storage.py` upload→list→download is the real bucket gate). Defense-in-depth holds; I
could not construct a false-green path.
- **Tests** `tests/unit/test_converged_oneshot.py` (read + cold-ran): 7 cases pin exactly the
non-vacuity criteria — completed→converged, Failed→NOT, mixed Complete+Failed→NOT (covers the
`docker service ps` history concern), Preparing→NOT, no-tasks→NOT, N/N→converged, 0/0→converged.
- **Cold suite+lint from fresh be2026a checkout:** `cc-ci-run -m pytest tests/unit -q` → **199
passed**; the 7 new tests pass alone; `nix develop .#lint --command scripts/lint.sh` → **lint:
PASS**. Matches Builder's claim.
- **Root cause judged genuine P2b regression** (hook moved into ops.py pre_install runs BEFORE the
install assert; the completed one-shot's 0/1 then burns DEPLOY_TIMEOUT in the converge poll). The
fix accepts a genuinely-healthy deploy (HTTP 200, all other services 1/1) the old `cur!=want`
wrongly rejected — correction, not masking.
- **Not on main** — confirmed `all(s == "Complete")` absent from origin/main; Builder held the gate.
- **Disclosed semantic delta** (a failing one-shot now blocks install convergence earlier vs later
at custom-tier): ACCEPTED — both paths RED, no false-green, no enrolled recipe has a
baseline-failing one-shot.
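A minimal sketch of the completed-one-shot rule as reviewed above, assuming the task states have already been parsed out of `docker service ps <svc>` into a list of strings. The function name and signature are hypothetical; only the `all(s == "Complete")` condition and the seven cases come from the review.

```python
from typing import Sequence


def service_converged(cur: int, want: int, task_states: Sequence[str]) -> bool:
    """Return True when a service counts as converged.

    cur/want are the running and desired replica counts; task_states are
    the current-state words of the service's tasks (assumed pre-parsed).
    """
    if cur == want:
        # Plain N/N and 0/0 are unchanged by the fix.
        return True
    # A deficit passes only if tasks exist and every one is Complete.
    # Running/Preparing/Pending/Failed/Rejected, or no tasks yet, still block.
    return bool(task_states) and all(s == "Complete" for s in task_states)
```

The empty-list guard matters: `all([])` is `True` in Python, so without it a service with no tasks yet would be accepted.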
**VERDICT: fix-forward be2026a APPROVED, conditional on:**
1. Post-merge lasuite-drive proof re-run @ffa7d585afa2 PR=1 lands **L5** (binding end-to-end proof
the fix resolves the converge hang — if it doesn't, the diagnosis was wrong and approval voids).
2. I re-verify the MERGED diff == be2026a diff (no extra change sneaks in at merge).
3. discourse PR=2 A/B pair (m2p-discourse / ab-discourse-7ae7b0f-oldmain — no one-shots, unaffected
by this fix) completes and I cold-verify those levels too.
This APPROVE does NOT clear M2; M2 still needs all per-recipe levels reconciled + my independent
sample re-check + zero-leak teardown.
### be2026a merge cold-verify — condition #2 SATISFIED @2026-06-11T00:42Z
Builder merged be2026a as 6cabbe7 (build 350 green, origin/main now b4505ac). Independently checked:
`diff origin/main:runner/harness/lifecycle.py be2026a:...` → **IDENTICAL**; the merged
`tests/unit/test_converged_oneshot.py` → **IDENTICAL** to be2026a. Clean merge, no extra change
slipped in — approval condition #2 met. m2p-lasuite-drive (pre-fix) landed L0 (install/converge
timeout) = the diagnosed symptom (Builder disclosed in b4505ac that it SIGINT-shortcut the doomed burn;
binding proof is the post-fix m2p2 re-run). REMAINING be2026a conditions: #1 post-fix lasuite-drive
L5, #3 discourse PR=2 A/B cold-check — both pending (m2p-discourse running, then ab-oldmain, then
m2p2-lasuite-drive).
### be2026a conditions CLEARED + SSO-baseline staleness finding (independent) @2026-06-11T01:12Z
Reached the conclusions below COLD (own git archaeology + run-dir jq) BEFORE reading the Builder's
01:10Z inbox — which then concurred. Anti-anchoring preserved (no JOURNAL read; inbox read after my
own derivation).
**be2026a fix-forward — ALL 3 CONDITIONS SATISFIED → fix-forward FULLY CLEARED:**
1. **Post-fix lasuite-drive (m2p2, merged main 6cabbe7, ffa7d585afa2, PR=1): L4, rc=0, 3m19s.**
Independently verified: flags clean_teardown=true + no_secret_leak=true; all 4 essential rungs
pass; `test_minio_storage::...object_roundtrip` PASSED; `test_oidc_..._keycloak` PASSED. The
install converge no longer hangs — both fix-forwards (1357544 best-effort poll + 6cabbe7
completed-one-shot converge) exercised in one run. The literal "L5" in my condition is
**unmeetable on current code and NOT an rcust effect** — see staleness finding below; I accept
the L4-equivalence. Fix works end-to-end.
2. **Merged diff == branch diff** — verified earlier (4428e76): lifecycle.py + test file
byte-identical to be2026a.
3. **discourse A/B — restructure-NEUTRAL.** m2p-discourse (NEW main, 7ae7b0f, PR=2) = L1 and
ab-discourse-7ae7b0f-oldmain (OLD main, SAME ref, SAME PR=2) = L1, SAME stage (upgrade), SAME
message (`eb96de94+U` HC1 re-checkout). old==new byte-identical → rcust did NOT regress discourse.
The L4(184)→L1 vs baseline is pre-existing env drift since 06-05 (filed below), not rcust.
**FINDING [adversary] — M2 baseline matrix has 3 STALE L5 entries (lasuite-docs/drive/meet).**
Independently established: the level ladder dropped 6-rung(L5)→4-rung(max L4, integration &
recipe-local now OPTIONAL/non-laddered) in mainline PR#6 (c51cd84 "4-rung ladder", + 46e2cdb),
which `git merge-base --is-ancestor c51cd84 01e6d49^` confirms is an ANCESTOR OF PRE-RCUST MAIN.
The rcust merge touches level.py NOT AT ALL and results.py by +4 cosmetic P5 lines; compute_level
+ derive_rungs are byte-identical old-main↔merged-main. So NO current-code run (rcust or pre-rcust)
can produce L5; baselines 188/189/204 (L5, integration:pass) were recorded under the OLD schema
(run 204 ran 06-09 hours before the refactor deployed). **rcust is INNOCENT of L4≠L5.** Integration
coverage is NOT lost: the requires_deps OIDC tests EXECUTE and PASS (skip-count 0) on current code —
verified in m2p2 AND the sweep's m2r-lasuite-docs (`test_oidc_login_via_keycloak` +
`test_oidc_password_grant_...` PASSED) and m2r-lasuite-meet (`...password_grant...` PASSED).
ACCEPTED equivalence for the M2 matrix: **old L5 ≡ new L4 (all 4 essential rungs pass) + requires_deps
OIDC test PASSED (skip-count 0)**. Under this, lasuite-docs (m2r L4) / lasuite-meet (m2r L4) /
lasuite-drive (m2p2 L4) all MATCH. (Note: this validates — but corrects the basis of — the Builder's
first-sweep "lasuite-docs/meet matched baseline"; they are L4+OIDC, not numeric L5.) This is a
matrix-staleness correction, NOT a rcust regression; no VETO.
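The accepted equivalence can be written down as a predicate, which makes it harder to apply loosely. This is a sketch under assumptions: the rung names in `ESSENTIAL` and the argument names are illustrative, not the results.json schema.

```python
from typing import Mapping

# Assumed names for the four essential rungs; the real keys may differ.
ESSENTIAL = ("install", "upgrade", "backup_restore", "functional")


def matches_stale_l5(level: int, rungs: Mapping[str, str],
                     oidc_passed: int, oidc_skipped: int) -> bool:
    """True when a current-code run is equivalent to an old-schema L5 baseline."""
    all_essential_pass = all(rungs.get(name) == "pass" for name in ESSENTIAL)
    # Integration coverage must have executed: at least one OIDC test passed
    # and none were skipped.
    oidc_ran = oidc_passed > 0 and oidc_skipped == 0
    return level == 4 and all_essential_pass and oidc_ran
```

A run at L4 whose OIDC tests were skipped does not match, so the equivalence cannot hide lost integration coverage.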
**Still OPEN for the M2 verdict (my side):** (a) per-recipe levels reconciled vs the CORRECTED
baseline for all 21; (b) bluesky-pds is L0 on BOTH old & new main (upstream image
`Cannot find module index.js`) — restructure-neutral but also cannot match its L4-equiv baseline on
ANY current run → needs a DECISIONS/DEFERRED note as non-rcust upstream breakage, not a silent
mismatch; (c) the 2 drone-path !testme runs (immich#2/plausible#3); (d) zero-leak teardown sweep;
(e) my own independent re-check of ≥5 recipes' logs + ALL mismatches before any M2 PASS.
---
## M2 — merged-main real-CI regression sweep: **PASS** @2026-06-11T01:15Z
Cold-verified the M2 claim (STATUS gate "M2 CLAIMED ~01:30Z") from my own clone + direct on cc-ci,
re-running/re-parsing rather than trusting Builder logs. Every M2.0–M2.4 item holds.
**M2.2 canaries — cold RE-RAN myself** from a fresh `origin/main` checkout (/root/adv-be2026a @
origin/main): `cc-ci-run -m pytest tests/regression/ -m canary -v` → **7/7 passed (301s)**, incl.
`bad-false-green` (the false-green detector) + all four RED canaries (bad-install/upgrade/backup/
restore) caught at their designed tier. The level system is NOT inflating. (log /root/adv-canary.log)
**M2.3 per-recipe — all 21 reconciled (cold jq on each run dir):**
- 11 clean: cryptpad/custom-html/ghost/hedgedoc/keycloak/matrix-synapse/n8n/uptime-kuma = L4;
mailu/custom-html-tiny = L2 (backup_restore N/A); mumble = L4 (deploy-count=1) — all == baseline,
clean_teardown=true.
- 2 designed-bad canaries genuinely exercised: bkp-bad rungs backup_restore=**fail** (backup=fail);
rst-bad backup_restore=**fail** (backup=pass→restore=fail). The L1 cap is upgrade-N/A ladder
semantics; the designed failure is recorded in the rung (verified — NOT a coincidental
level-match).
- immich/mattermost-lts/plausible: **L4 @ exact baseline refs** (m2b-*) — baseline REPRODUCED on the
restructured harness (cold-verified earlier this session).
- discourse: m2p-discourse (NEW main) == ab-discourse-7ae7b0f-oldmain (OLD main) — SAME ref/PR=2,
SAME stage, SAME upgrade-HC1 message (`eb96de94+U`), SAME L1. **old==new ⇒ rcust-neutral**; the
L4(184)→L1 is pre-existing env drift since 06-05 (DEFERRED.md), NOT caused by the restructure.
- lasuite-docs/-meet/-drive: L4 all-rungs-pass + requires_deps OIDC test PASSED (skip-count 0)
[lasuite-drive m2p2 also MinIO PASSED, post-both-fixes, rc=0]. Their "L5" baselines are STALE:
the 6→4-rung ladder landed in mainline c51cd84 (PR#6), which `git merge-base --is-ancestor
c51cd84 01e6d49^` confirms PREDATES the rcust merge; level.py untouched by the merge, derive_rungs
byte-identical old↔new. **rcust-innocent; integration coverage preserved** (OIDC tests execute &
pass). Accepted equivalence old L5 ≡ new L4-all-pass + OIDC-pass.
- bluesky-pds: EXCLUDED — `Cannot find module /app/index.js` crash-loop on BOTH old & new main at
every ref → upstream image breakage, rcust-neutral. DEFERRED.md note present.
**M2.3 drone→harness path:** drone builds **356 (immich) + 357 (plausible)** = `build_event=custom`
(bridge-triggered; distinct from push builds 358-361), trigger=autonomic-bot, both **success**
(verified in drone sqlite DB); run dirs 356/357 = immich L4 pr=2 / plausible L4 pr=3, customization
manifest present, clean_teardown=true.
**M2.4 customizations actually executed (cold-grep):** manifest block **21/21** logs; mumble
`ready-probe OK (tcp 3x) 127.0.0.1:64738`; ghost `ccci-overlay: provided compose.ccci.yml ...
base deploy auto-chaos` (P2a first-class path live); cryptpad `EXTRA_ENV='<hook>'`; immich
`ops.py[pre_backup,pre_restore,pre_upgrade]` + `pre-op seed` lines (migrated ctx hooks run).
**Teardown:** `docker stack ls` = infra (backups/bridge/dashboard/reports/drone/traefik) +
warm-keycloak ONLY, **zero leaked app stacks** (checked after ALL runs incl. drone-path).
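The zero-leak check amounts to a set difference between the live stack names and an infra allowlist. A sketch, assuming the names have been collected beforehand (for example with `docker stack ls --format '{{.Name}}'`) and that the allowlist entries below are the literal stack names, which is a guess from the labels in this entry:

```python
from typing import Iterable, List

# Assumed stack names, taken from the labels in this log; the real names may differ.
INFRA_STACKS = frozenset({
    "backups", "bridge", "dashboard", "reports", "drone", "traefik",
    "warm-keycloak",
})


def leaked_stacks(stack_names: Iterable[str],
                  allowed: Iterable[str] = INFRA_STACKS) -> List[str]:
    """Return every live stack that is not on the infra allowlist."""
    return sorted(set(stack_names) - set(allowed))
```

An empty result is the pass condition. A stack left over from a proof run, such as the `lasu-f68b63` stack noted earlier, shows up as the only entry.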
**Fix-forwards (both Adversary-approved, additive):** 1357544 (lasuite-drive best-effort poll, appr
57c66ad) + be2026a/6cabbe7 (services_converged completed-one-shot, appr a531746) — merged diff ==
branch diff, all 3 be2026a conditions cleared (24a203a). Cold unit suite on post-fix main = 199
passed, lint PASS.
**VERDICT: M2 PASS.** No regression CAUSED BY the restructure: every deviation from the baseline
matrix is proven rcust-neutral by same-ref old-vs-new A/B (discourse, bluesky) or is a pre-rcust
stale-schema artifact with coverage preserved (3 lasuite), all documented in DEFERRED.md — not a
silent mismatch. The false-green detector is green on my own cold canary run. No findings filed,
no VETO.
**M1 PASS (01f9f70) + M2 PASS (this entry) both stand** → the phase DoD handshake is satisfied; the
Builder may write `## DONE` to STATUS-rcust.md. (M1's unit+lint acceptance still holds on post-fix
main: 199 passed / lint PASS, the fix-forwards being additive + separately approved.)

REVIEW-shot.md Normal file

@@ -0,0 +1,184 @@
# REVIEW-shot.md — Adversary verdicts, phase `shot` (recipe screenshot audit & repair)
Owner: Adversary loop. Append-only verdict log. Gates: M1 (audit+diagnosis), M2 (all working).
SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase-shot-screenshots.md`.
No gate CLAIMED yet (phase just opened; Builder has not bootstrapped STATUS-shot.md). Doing
independent cold ground-truth prep below so M1/M2 cold-verify is fast and un-anchored.
---
## Independent cold pre-audit (Adversary, @2026-06-11T01:20Z)
Method: ssh cc-ci, scanned `/var/lib/cc-ci-runs/*/results.json` for recipe + `screenshot` field +
on-disk `screenshot.png` size; scp'd suspect PNGs locally and **looked at them** (Read tool).
This is MY ground truth, formed before any Builder claim — to compare against the Builder's matrix.
PNG sizes from latest representative runs (m2r-* sweep + numbered drone runs):
| recipe | PNG bytes | my visual read | class |
|---|---|---|---|
| immich | 4801 | pure blank white frame | **BLANK** |
| n8n | 4801 | blank near-white frame | **BLANK** |
| lasuite-meet | 4801 | (size-identical to immich/n8n 4801B — blank tell) | BLANK (to confirm visually) |
| cryptpad | 4802 | blank light-grey frame | **BLANK** |
| keycloak | 8764 | spinner + "Loading the Administration Console" — paint-race loading state, NOT a real login form | **BLANK/LOADING** (not the "genuine sparse login" §2 guessed) |
| lasuite-docs | 6022 | bare spinner on white | **BLANK/LOADING** |
| lasuite-drive | ~5.9K | (size sibling of lasuite-docs — likely same spinner) | BLANK (to confirm) |
| plausible | null / NO PNG | every run null (122→357 incl. 357); run dir has no screenshot.png; capture stdout not in run dir (goes to Drone build log) — root cause still to trace | **NULL** |
| ghost | 444183 | (reference healthy, §2) | OK (visual-confirm at M2) |
| mattermost-lts | 242139 | reference healthy | OK |
| hedgedoc | 131967 | reference healthy | OK |
| discourse | 66-67K | reference healthy | OK |
| custom-html | 35707 | reference healthy | OK |
| mailu | 33800 | reference healthy | OK |
| matrix-synapse | 33296 | reference healthy | OK |
| uptime-kuma | 30858 | reference healthy | OK |
| custom-html-tiny | 12950 | reference healthy | OK |
| mumble | 7913 | voice server — web-UI N/A candidate (confirm) | N/A? |
Confirmed defect classes match the orchestrator pre-audit (§2): SPA paint-race (domcontentloaded
fires before JS paints) → immich/n8n/cryptpad fully blank, keycloak/lasuite-docs/-drive caught at
loading spinner; plausible never captures (null on every run). **The 4801B byte-identical size is a
reliable blank-frame fingerprint.**
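The size fingerprint can be turned into a triage pass over the per-recipe PNG sizes. This is a sketch only: the function name, the labels, and the 10,000-byte threshold are assumptions, and the output is a list of files to look at, not a verdict.

```python
from collections import defaultdict
from typing import Dict, Mapping, Optional


def blank_suspects(sizes: Mapping[str, Optional[int]],
                   small_bytes: int = 10_000) -> Dict[str, str]:
    """Flag recipes whose screenshot size suggests a blank or missing frame.

    sizes maps recipe -> PNG byte size, or None when no PNG was produced.
    small_bytes is an assumed threshold, not a harness constant.
    """
    by_size = defaultdict(list)
    for recipe, n in sizes.items():
        if n is not None:
            by_size[n].append(recipe)

    suspects: Dict[str, str] = {}
    for recipe, n in sizes.items():
        if n is None:
            suspects[recipe] = "NULL"
        elif len(by_size[n]) > 1:
            # Different apps producing byte-identical PNGs is the blank tell.
            suspects[recipe] = "BLANK"
        elif n < small_bytes:
            suspects[recipe] = "CHECK"
    return suspects
```

Applied to the table above, immich, n8n and lasuite-meet (all 4801 B) come out as `BLANK`, cryptpad and keycloak as `CHECK`, and plausible as `NULL`. A large file is not proof of a healthy frame, so every suspect and a sample of the rest still need a visual read.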
Open items I must still resolve when verifying:
- plausible NULL root cause — need the Drone build log for a plausible run (capture stdout: "capture
failed" vs "produced no file" vs step never reached). Run dir alone doesn't have it.
- lasuite-meet / lasuite-drive / mumble — visual confirm.
- Authoritative enrolled-recipe set: every `tests/<recipe>/recipe_meta.py` minus fixtures
(`_generic`, `regression`, `concurrency`, `custom-html-bkp-bad`, `custom-html-rst-bad`).
No verdict yet. Awaiting `claim(shot): M1`.
---
## M1: PASS @2026-06-11T01:38Z (audit + diagnosis complete)
Claim: `claim(shot): M1` commit e005897; matrix+diagnoses at 8978fa6. STATUS-shot.md "M1 claim".
Verified COLD from my own clone + ssh cc-ci, **without reading JOURNAL-shot.md** (anti-anchoring).
My independent pre-audit (commit 4f3a747, formed BEFORE reading the Builder's matrix) already
agreed on every BLANK/LOADING/NULL read I had pre-formed — no anchoring.
**Enrolled set — complete, no omissions.** `ls tests/*/recipe_meta.py` = 21. Minus the two harness
canaries `custom-html-bkp-bad`, `custom-html-rst-bad` (plan §2 explicitly excludes both) = **19**.
The 19 matrix rows are *exactly* that set (diffed by hand) and exactly the plan §2 expected set.
`_generic`/`regression`/`concurrency`/`unit` have no recipe_meta.py → correctly absent. ✓
**Every non-OK row has evidence-backed root cause (independently re-derived):**
- plausible NULL — ran the Builder's drone-log command myself: build 357 step log shows
`capture failed … page.goto(https://plau-…/) never returned a status in (200,301,302,303,401,403)
after 15 attempts (45s); last status=500`. `/` 500s by design (DISABLE_AUTH) → default landing
capture can never succeed; needs a SCREENSHOT hook to a rendering path. Confirmed. ✓
- bluesky-pds NULL — capture is `if deploy_ok:`-gated, OUTSIDE the deploy try/except
(runner/run_recipe_ci.py:1024, read it). install=fail level=0 → capture correctly skipped. Not a
screenshot defect; upstream image breakage already in DEFERRED.md (rcust). ✓
- BLANK/LOADING — screenshot.py:84-93 navigates `wait_until="domcontentloaded"` then screenshots
immediately, no paint wait; accept_statuses excludes 500 (plausible mechanism). Read the code. ✓
- mumble NOT N/A — tests/mumble/recipe_meta.py header: deploys `compose.mumbleweb.yml`, a mumble-web
HTTP client routed through Traefik, HEALTH_PATH "/". A real web surface IS served → correctly the
HARDER (non-N/A) call. ✓
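The plausible failure mode follows from a status-gated retry loop of roughly this shape. The sketch is not the code in screenshot.py: `fetch_status` stands in for the `page.goto` call, and the 3-second delay is inferred from "15 attempts (45s)".

```python
import time
from typing import Callable, Optional

ACCEPT_STATUSES = (200, 301, 302, 303, 401, 403)  # as quoted in the build log


def wait_for_accepted_status(fetch_status: Callable[[], Optional[int]],
                             attempts: int = 15,
                             delay_s: float = 3.0,
                             sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll until the landing page answers with an accepted status."""
    last: Optional[int] = None
    for _ in range(attempts):
        try:
            last = fetch_status()
        except Exception:
            last = None  # a navigation error counts as "no status yet"
        if last in ACCEPT_STATUSES:
            return last
        sleep(delay_s)
    raise RuntimeError(
        f"never returned a status in {ACCEPT_STATUSES} "
        f"after {attempts} attempts; last status={last}"
    )
```

Because 500 is not in the accepted set, a landing path that always answers 500 exhausts every attempt, which matches the build 357 log and explains why the default capture can never succeed for plausible.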
**Independent visual spot-checks (Read tool) — 11 artifacts, matrix matched reality on every one:**
immich 4801B = pure white; n8n 4801B = blank; cryptpad 4802B = blank grey; lasuite-meet 4801B =
pure white; keycloak 8764B = "Loading the Administration Console" spinner (NOT a real login — the
§2 "might be a genuine login" guess was wrong, Builder classed it LOADING correctly); lasuite-docs
6022B = bare spinner; mumble 7913B = spinner ring on grey; mattermost-lts 242139B = blue brand
splash + logo, NO login form (correctly LOADING despite large size — size alone is NOT a sufficient
signal, good catch); n8n run 197 30256B = real "Set up owner account" form, empty fields,
credential-free (flaky-pass + secret-safe, confirmed); custom-html 35707B = genuine "Welcome to
nginx!" (honest fresh-install view for a bare static host — OK); plausible = NULL via drone log.
Includes plausible ✓ and multiple 4801B cases ✓ (M1 minimum was ≥5 incl. those — exceeded).
**N/A arguments — agreed:**
- bluesky-pds → justified N/A (deploy-gated: can't screenshot what can't deploy; upstream breakage
is pre-existing/DEFERRED, not a screenshot defect). Agreed, contingent on the upstream image still
being broken at M2 — if it becomes deployable, it re-enters as a real recipe.
- mumble → NOT N/A. Agreed (real mumble-web surface, evidence above).
No omissions, no fabricated visual reads, diagnoses are causal not symptomatic. **M1 PASS.**
Watch-list for M2 (so the Builder has it early — NOT blocking M1):
1. Harness default-wait fix must stay within NAV_DEADLINE_S=45 / step worst-case ≤~60s and must
NEVER affect a verdict on screenshot failure (R7) — I will test the failure path has teeth but
no verdict impact, and compare pre/post run durations.
2. plausible SCREENSHOT hook must land on a credential-free *rendering* path (not /login showing a
generated secret; not a 500 page).
3. mattermost-lts proof: a bigger PNG is NOT acceptance — I will visually confirm the real login,
not a brand splash.
4. Secret-safety: every final PNG must show no generated credentials (install wizards, secrets
pages). n8n's "Set up owner account" with EMPTY fields is the safe shape; a pre-filled one is not.
5. M2 requires ≥2 proof runs via the drone `!testme` path + me Reading *every* final PNG.
Did not read JOURNAL-shot.md before this verdict. No finding filed (audit is accurate). No VETO.
---
## M2: PASS @2026-06-11T07:17:53Z — all screenshots working (cold-verified from scratch)
Verified independently from a cold start (my own clone, my own scp/Read/re-runs; did NOT read
JOURNAL before this verdict). Claim commit 196156e. Every M2 DoD item checked:
**1. Every final PNG Read (18/18) — real, representative, credential-free.** Pulled each PNG by scp,
Read it with the image tool, byte-size matched the claim on all 18:
- Fixed-class (10): immich 234351B "Welcome to Immich" onboarding; plausible 64132B real
registration form (EMPTY fields); keycloak 215587B real "Sign in to your account" (EMPTY) — was
the 8764B "Loading Admin Console" spinner at M1, settle fix resolved it; cryptpad 57310B real
landing + doc-type picker; lasuite-meet 225686B real video-conf landing; lasuite-docs 284769B real
Docs landing; lasuite-drive 132037B real "Fichiers" landing; n8n 26433B "Set up owner account"
(ALL fields EMPTY — secret-safe, now deterministic); mattermost-lts 178367B **real "Log in to your
account" form (EMPTY) — NOT the byte-identical interstitial** (hook v2 click-through works — my
sharpest watch-item, resolved); mumble 7980B loader spinner (see §N/A).
- Healthy-class (8): ghost 444183B blog landing; hedgedoc 131967B landing; discourse 66121B forum +
welcome topic; custom-html 35707B "Welcome to nginx!" (honest fresh-install); custom-html-tiny
12950B seeded content; mailu 33800B sign-in (EMPTY); matrix-synapse 33296B "It works!"; uptime-kuma
30858B "Create your admin account" (EMPTY).
Every login/setup form has EMPTY fields — NO generated credential is shown anywhere. Secret-safety
cardinal guardrail holds across all 18.
**2. No verdict/level regression.** All 10 proof runs status=pass at their baseline level (immich
/plausible/keycloak/cryptpad/lasuite-*/n8n/mumble=4, mattermost-lts=2). screenshot field populated
on every one. no_secret_leak=true on every proof run I sampled (370/371/keycloak/n8n/mattermost
/mumble).
**3. ≥2 genuine drone `!testme` proofs — confirmed end-to-end, NOT manual.** ccci-bridge_app logs:
`[poll] triggered build 370 for immich@107d7220 (PR #2, comment 14321) by autonomic-bot` and
`...build 371 for plausible@13458fac (PR #3, comment 14322)...`, both `reflected outcome ...:
success`. The bridge polled Gitea, found real !testme comments, triggered the builds, reflected
verdicts back — the full comment→build path. Drone params {RECIPE,PR,REF,SRC}, event=custom,
trigger/sender=autonomic-bot — matches the Phase-1c bridge-!testme fingerprint (REVIEW-1c:110).
**4. Durations unaffected (no balloon).** Drone same-recipe pre/post: immich 199s→198s, plausible
209s→166s (faster — capture no longer burns 45s failing on the 500). Screenshot step wait budget =
60000ms exactly (unit test_wait_budget_within_step_cap + my own cold probe). ≤~60s holds.
**5. R7 (cosmetics never block) — intact.** Call site run_recipe_ci.py ~1024-1037 is OUTSIDE the
deploy try/except AND double-wrapped in its own try/except (`_scrub`-bed log) — and git log proves
NO shot-phase commit touched run_recipe_ci.py (call site unchanged). capture() swallows everything →
None → placeholder. I cold-probed the new helpers independently: _settle swallows all exceptions,
_snap keeps the larger frame (A1 fix, 5/5), 60s budget — 9/9+5/5 pass. Screenshot unit suite 12/12
+ card suite 10/10 ran GREEN cold on the real harness (cc-ci-run) from my scp'd clone.
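The three properties probed in item 5 (settle swallows exceptions, the larger frame wins, capture failure yields None) can be sketched against any page object that offers `wait_for_load_state` and `screenshot`, as Playwright's sync `Page` does. The function names and the choice of the `networkidle` state are assumptions, not the harness helpers.

```python
from typing import Optional


def settle(page, timeout_ms: int) -> None:
    """Give the page time to paint; never raise."""
    try:
        # Waiting on "networkidle" is an assumption about what settling means.
        page.wait_for_load_state("networkidle", timeout=timeout_ms)
    except Exception:
        pass  # a slow or never-idle page must not fail the capture


def capture(page, settle_ms: int = 5000) -> Optional[bytes]:
    """Return PNG bytes, or None so the caller can show a placeholder."""
    try:
        first = page.screenshot()
        settle(page, settle_ms)
        second = page.screenshot()
        # Keep the larger frame: a painted page compresses to more bytes
        # than a blank one, so a late blank frame cannot replace a good one.
        return max(first, second, key=len)
    except Exception:
        return None  # cosmetics never block the verdict
```

Returning None rather than raising is what keeps the screenshot step verdict-neutral: the caller only has to handle the placeholder case.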
**6. Dashboard/card/badge render — live 200.** GET dashboard / → 200; runs/370+371/screenshot.png →
200 image/png; badge/immich.svg + badge/plausible.svg → 200 image/svg+xml.
**7. N/A set (19/19 enrolled, no omissions) — AGREED.**
- bluesky-pds → N/A, re-confirmed at M2 (ab-bluesky-pds-oldmain: install=fail, level=0,
screenshot=null → placeholder correct; upstream MODULE_NOT_FOUND still broken, DEFERRED).
- mumble → N/A-variant, AGREED — **this reverses my M1 "NOT N/A" stance, on NEW evidence not
available at M1.** rankenstein/mumble-web:0.5 renders no usable UI for an anonymous browser:
connect-dialog DOM genuinely absent (probe4 console: `#connect-dialog_input_address ... did not
match any element`), perpetual loading-container spinner at 5/15/30/60/90s (probe2) — corroborated
by my own Read of the 7980B spinner PNG. The loader frame is the literal web-surface reality every
visitor gets; mumble's actual function (voice) is fully protocol-tested; fix needs a recipe/overlay
change (out of scope, guardrail prefers upstream). Documented in DEFERRED with an upstream
question. NOTE (not a defect, not a veto): the dashboard shows the honest loader frame rather than
the "no screenshot" placeholder — acceptable as a documented, agreed limitation, NOT a healthy-app
screenshot.
Finding A1 (blank-retry regression) was filed, fixed (7ad7d1f), and CLOSED after my cold re-test.
No open findings. No fabricated reads — every matrix/claim value matched what I independently
observed. **M2 PASS. No VETO.** With M1 PASS (ae10b55) + M2 PASS both fresh and A1 closed, the DoD
handshake (§6.1) is satisfied — the Builder may write `## DONE` to STATUS-shot.md.
(Consulted no JOURNAL-shot.md before forming this verdict.)

STATUS-lvl5.md Normal file

@@ -0,0 +1,6 @@
# STATUS — Phase lvl5 (L5 lint rung + de-cap)
Phase: lvl5 — OPEN (bootstrapped 2026-06-11)
Gate: none claimed yet
In flight: P1 — level.py new semantics + lint executor design (abra lint behavior probe on CI host first)
Blockers: none


@@ -1,22 +1,293 @@
# STATUS — sub-phase rcust (recipe-customization restructure)
## DONE
Phase complete 2026-06-11: M1 PASS (REVIEW-rcust.md 01f9f70, 2026-06-10) + M2 PASS (REVIEW-rcust.md
3245150, 2026-06-11) — both fresh, Adversary-verified, no standing VETO. Restructure merged to main
(01e6d49 + approved fix-forwards 1357544, 6cabbe7); all 21 recipes reconciled vs corrected
baseline; canaries 7/7 (Adversary's own cold run); drone path covered; zero leaked apps.
Non-rcust follow-ups filed in machine-docs/DEFERRED.md (discourse abra-stamp env drift,
bluesky-pds upstream image breakage re-pin).
Plan: /srv/cc-ci/cc-ci-plan/recipe-custom-restructure-full-plan.md (SSOT for this phase).
Reference spec: docs/recipe-customization.md @ 76a4b6b.
Work branch: `restructure/recipe-custom` (one commit per phase P1–P6; merged to main only after M1 PASS).
## Phase progress
- [ ] P1 — harness/meta.py single loader + key registry + migrate L1–L6 + unit tests + doc gen
- [ ] P2 — delete legacy keys/paths (CHAOS_BASE_DEPLOY, OIDC_AT_INSTALL, SKIP_GENERIC meta, conftest cleanup)
- [ ] P3 — uniform ctx hook convention
- [ ] P4 — custom-test ergonomics (placement rule, op_state/deps fixtures)
- [ ] P5 — customization manifest
- [ ] P6 — docs
- [x] P1 — single loader + key registry + migrate L1–L6 + unit tests + doc gen
(branch commit 472a68b)
- [x] P2 — delete legacy keys/paths: compose.ccci.yml first-class+auto-chaos; install-time deps only
(lasuite-docs migrated, setup_custom_tests.sh gone); SKIP_GENERIC meta deleted (env dev-only +
loud CI warning); conftest cleanup (deployed/deployed_app/app_domain gone, one `deps` fixture)
(branch commit 8cd72fd)
- [x] P3 — uniform ctx hook convention: HookCtx(.domain/.base_url/.meta/.deps/.op); all hooks
take ctx; legacy signatures raise MetaError at load naming the migration (branch fd02d9f)
- [x] P4 — custom-test ergonomics: placement rule (custom under functional/+playwright/ only),
op_state fixture, deps fixture tests (branch 29a28e2)
- [x] P5 — customization manifest: one block at run start (non-default meta keys, hooks, overlays,
custom-test counts, active CCCI_SKIP_GENERIC* env overrides with !! CI flag) printed +
embedded verbatim in results.json under "customization"; pure presentation, HC2-honoring
(branch commit 68954be — new runner/harness/manifest.py + tests/unit/test_manifest.py)
- [x] P6 — docs rewritten to the end state: recipe-customization.md is now the REFERENCE (was
review spec) — §8 records R1–R9 resolutions, §4 keeps the generated table + HookCtx, §5 the
end-state shapes; testing.md invariant updated to install-time-deps isolation, generic
opt-out documented dev-only; enroll-recipe.md worked examples (lasuite-docs install-time
OIDC, mumble post-F2-14c), deps fixture, ctx signatures (branch commit da558ca)
- [x] Adversary inbox 19:06Z (P5 manifest dashboard hygiene) — addressed: secret-NAMED meta
values (top-level + nested dict keys) render as '<redacted>' in manifest + results.json;
key names stay visible; unit-test pinned (branch commit 858e0f5)
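The redaction rule described in the last item (secret-named keys at top level and inside nested dicts render as '<redacted>', key names stay visible) can be sketched as a recursive copy. The name pattern below is an assumption about what counts as secret-named; the real list lives in the manifest code and may differ.

```python
import re
from typing import Any, Dict, Mapping

# Assumed pattern for "secret-named" keys; illustrative only.
SECRET_NAME = re.compile(r"secret|password|passwd|token|api_key", re.IGNORECASE)


def redact(meta: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of meta that is safe to print in the manifest."""
    out: Dict[str, Any] = {}
    for key, value in meta.items():
        if SECRET_NAME.search(str(key)):
            out[key] = "<redacted>"      # hide the value, keep the key name
        elif isinstance(value, Mapping):
            out[key] = redact(value)     # nested dict keys get the same rule
        else:
            out[key] = value
    return out
```

Redacting by key name rather than by value keeps the manifest useful for review: a reader can still see that a secret-bearing key was customized without seeing what it was set to.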
## P1–P6 verification facts (for the eventual M1 cold-verify)
- WHERE: branch `restructure/recipe-custom`, P1=472a68b, P2=8cd72fd, P3=fd02d9f, P4=29a28e2,
P5=68954be, P6=da558ca, manifest-redaction fix=858e0f5 (branch head).
- HOW: `cc-ci-run -m pytest tests/unit -q` and `nix develop .#lint --command scripts/lint.sh`
from a clean checkout of the branch.
- EXPECTED: 192 passed; `lint: PASS`.
- New single loader: `runner/harness/meta.py::load()`; all-recipes typo gate + R2 proof in
`tests/unit/test_meta.py`; docs §4 table generated by `scripts/gen-meta-docs.py` (sync pinned
by unit test).
## M2 baseline matrix (built BEFORE merge, per plan M2.1)
Expected outcome per recipe dir for the post-merge regression sweep = most recent known-good
evidence. Levels are results.json `level`; evidence = run id under /var/lib/cc-ci-runs/<id>/
(on cc-ci) unless noted. Bad canaries are EXPECTED to fail at their designed tier.
| Recipe | Expected | Evidence |
|---|---|---|
| bluesky-pds | full lifecycle green: 5 tiers + 4 custom pass, deploy-count=1 (L4-equiv; pre-results-era) | Adversary cold run, REVIEW e45e0ee (Phase 2 Q4.3); weekly 06-05: up-to-date |
| cryptpad | L4 (all four essential rungs pass) | run 181 (06-05) |
| custom-html | L4 | run 182 (06-05) |
| custom-html-bkp-bad | DESIGNED-BAD: backup tier fail → backup_restore=fail, L1 | run regression-bad-restore-2 (06-02) |
| custom-html-rst-bad | DESIGNED-BAD: restore tier fail → backup_restore=fail, L1 | run regression-bad-restore-3 (06-02) |
| custom-html-tiny | L2 (backup_restore N/A — declared EXPECTED_NA; functional N/A) | run 205 (06-09) |
| discourse | L4 | run 184 (06-05) |
| ghost | L4 | run 185 (06-05) |
| hedgedoc | L4 | run 113 (06-02) |
| immich | L4 | run 307 (06-10) |
| keycloak | L4 | run 187 (06-05) |
| lasuite-docs | L5 (integration pass) | run 188 (06-05) |
| lasuite-drive | L5 (integration pass) | run 189 (06-05) |
| lasuite-meet | L5 (integration pass) | run 204 (06-09) |
| mailu | L2 (backup_restore N/A — no backupbot labels; functional pass) | run 191 (06-05) |
| matrix-synapse | L4 | run 203 (06-08) |
| mattermost-lts | L4 | run 196 (06-05) |
| mumble | all 5 tiers pass, deploy-count=1 (L4-equiv; pre-results-era) | log ~/ccci-mumble-f214c.log on cc-ci (05-31) |
| n8n | L4 | run 197 (06-05) |
| plausible | L4 | run 308 (06-10) |
| uptime-kuma | L4 | run 165 (06-02) |
Customization-executed spot-greps for M2.4 (mumble READY_PROBE tcp lines, cryptpad
SANDBOX_DOMAIN, ghost/discourse BACKUP_VERIFY + overlay copy + chaos base, lasuite-* deps
provisioning + OIDC skip-count 0, immich ops.py seeds, manifest block in every log) apply on the
sweep runs, not retroactively here.
## Gate
(none claimed yet — phase bootstrap)
**Gate: M2 CLAIMED 2026-06-11 ~01:30Z, awaiting Adversary.**
### M2 claim — WHAT / HOW / EXPECTED / WHERE
WHAT: plan M2.0-M2.4 complete on merged main. Merge 01e6d49 (build 326 green) + two
Adversary-approved fix-forwards: 1357544 (lasuite-drive best-effort bucket poll, approval 57c66ad)
and 6cabbe7 = merge of be2026a (services_converged completed-one-shot rule, approval a531746,
build 350 green on 914c166, merged-diff==branch-diff verified 4428e76). Canaries 7/7. All 21
recipe dirs reconciled vs the CORRECTED baseline (the Adversary-accepted L5≡L4+OIDC equivalence
for the three stale lasuite-* rows; one justified exclusion: bluesky-pds, non-rcust upstream image
breakage, DEFERRED.md). Drone→harness path covered (2 PR !testme runs green). Zero leaked apps.
RECONCILIATION (final evidence per recipe; run dirs under /var/lib/cc-ci-runs/):
| Recipe | Baseline | Final evidence | Match |
|---|---|---|---|
| bluesky-pds | full green (pre-results-era) | m2r L0 == m2rr L0 == ab-oldmain L0, all `Cannot find module /app/index.js` crash-loop | EXCLUDED: upstream image breakage, harness-neutral (DEFERRED.md) |
| cryptpad | L4 | m2r-cryptpad L4 | ✓ |
| custom-html | L4 | m2r-custom-html L4 | ✓ |
| custom-html-bkp-bad | designed backup fail, L1 | m2r: backup fail exactly | ✓ |
| custom-html-rst-bad | designed restore fail, L1 | m2r: backup pass → restore fail exactly | ✓ |
| custom-html-tiny | L2 (declared EXPECTED_NA) | m2r-custom-html-tiny L2 | ✓ |
| discourse | L4 (184, 06-05) | m2r/m2b/m2p + ab-oldmain×2: ALL deviations byte-identical old==new harness (restore race @default head: L2==L2; upgrade-HC1 @baseline ref PR=2: L1==L1, stamp eb96de94+U both) | env drift since 06-05, rcust-neutral (Adversary-verified, condition 3 of a531746) |
| ghost | L4 | m2r-ghost L4 | ✓ |
| hedgedoc | L4 | m2r-hedgedoc L4 | ✓ |
| immich | L4 | m2b-immich L4 @baseline ref + drone-path run 356 L4 | ✓ |
| keycloak | L4 | m2r-keycloak L4 | ✓ |
| lasuite-docs | L5 (stale schema) | m2r-lasuite-docs L4 all-pass + OIDC PASSED skip-0 | ✓ (accepted equivalence) |
| lasuite-drive | L5 (stale schema) | m2p2-lasuite-drive L4 all-pass + OIDC + MinIO PASSED, rc=0, post-both-fixes | ✓ (accepted equivalence) |
| lasuite-meet | L5 (stale schema) | m2r-lasuite-meet L4 all-pass + OIDC PASSED | ✓ (accepted equivalence) |
| mailu | L2 | m2r-mailu L2 | ✓ |
| matrix-synapse | L4 | m2r-matrix-synapse L4 | ✓ |
| mattermost-lts | L4 | m2b-mattermost-lts L4 @baseline ref | ✓ |
| mumble | all 5 tiers (pre-results-era) | m2r-mumble all tiers pass, deploy-count=1 | ✓ |
| n8n | L4 | m2r-n8n L4 | ✓ |
| plausible | L4 | m2b-plausible L4 @baseline ref + drone-path run 357 L4 | ✓ |
| uptime-kuma | L4 | m2r-uptime-kuma L4 | ✓ |
HOW (cold, from the Adversary's own clone / direct on cc-ci):
- per-recipe: `jq '{recipe,level,rungs,flags}' /var/lib/cc-ci-runs/<id>/results.json` for every id
above; logs in /root/m2-logs/, /root/m2-baseline-logs/, /root/m2-proof-logs/, /root/m2-ab-logs/.
- canaries: /root/m2-canary.log (7/7, fresh clone of merged main).
- drone path: builds 356 (immich#2) + 357 (plausible#3) `custom` events SUCCESS in drone DB
(`docker cp <drone_cid>:/data/database.sqlite` + sqlite query, as documented above); run dirs
356/357 carry `customization` manifest keys + clean flags; triggered by real `!testme` comments
(gitea comment ids 14317/14318).
- M2.4 spot-greps: section above (manifest 21/21, mumble tcp probe, ghost/discourse overlay+
BACKUP_VERIFY, lasuite deps+OIDC, immich seeds, cryptpad EXTRA_ENV hook+playwright).
- zero-leak: `docker stack ls` on cc-ci → infra (backups/bridge/dashboard/reports/drone/traefik)
+ warm-keycloak ONLY (checked 01:27Z, after ALL runs incl. drone-path).
- tree: origin/main, working tree clean, every claim-referenced commit pushed.
EXPECTED: every check above reproduces as stated; no recipe regresses vs the corrected baseline.
WHERE: origin/main @ (this commit); REVIEW-rcust.md holds M1 PASS (01f9f70), be2026a approval +
all-conditions-cleared (a531746, 24a203a); DEFERRED.md holds the two non-rcust follow-ups
(discourse abra-stamp mechanism, bluesky-pds upstream re-pin).
**Gate history: M2 IN PROGRESS** — M1 PASS in REVIEW-rcust.md (01f9f70, 2026-06-10).
- M2.0 merge: `restructure/recipe-custom` merged to main as 01e6d49 (merge commit, no force);
push build green: drone build **326 success** on 01e6d49 (API-verified).
- M2.2 canary suite: **7/7 PASSED** in 286s (fresh clone of merged main at /root/m2-sweep on
cc-ci, log /root/m2-canary.log) — green canaries pass, all four RED canaries still caught at
their designed tiers (bad-install/bad-upgrade/bad-backup/bad-restore).
- M2.3 per-recipe sweep (driver /root/m2-driver.sh, 2 concurrent, REF = mirror heads; logs
/root/m2-logs/<r>.log; results /var/lib/cc-ci-runs/m2r-<r>/): first pass **15/21 matched
baseline** —
hedgedoc/custom-html/custom-html-tiny/uptime-kuma/n8n/cryptpad/ghost/keycloak/mumble/mailu/
matrix-synapse/lasuite-docs/lasuite-meet at baseline level; both DESIGNED-BAD canaries failed
at exactly their designed tier (bkp-bad: backup fail; rst-bad: backup pass→restore fail).
6 below baseline, ALL flake-shaped (known modes, not new assertion semantics):
discourse+plausible+mattermost-lts+immich restore data-integrity (the documented pre-existing
truncated-dump capture race — discourse BACKUP_VERIFY honestly failed 3/3 attempts, its
docstring + the 06-05 weekly report record this exact mode pre-restructure; seeds verified
committed by ops.py read-back asserts, i.e. the migrated ctx hooks executed correctly);
bluesky-pds abra `FATA deploy timed out` at default 600s during concurrent image pulls;
lasuite-drive pre_install MinIO one-shot 90s timeout (bucket appeared later — every
subsequent tier passed). Serial re-runs (MAX=1, /root/m2-rerun.sh, logs /root/m2-rerun-logs/,
results m2rr-<r>/) completed 20:44Z — but ran default heads, not baseline refs (superseded by
the targeted runs below).
- M2.3 reconciliation runs (serial, MAX=1):
- **Baseline-ref re-runs on merged main** (/root/m2-baseline-runs.sh, logs /root/m2-baseline-logs/,
results m2b-<r>/): **plausible L4, mattermost-lts L4, immich L4** at their exact baseline refs —
baseline REPRODUCED on the restructured harness; restore-race cluster closed for those three.
m2b-discourse @7ae7b0f (ran PR=0; baseline run 184 was PR=2): **L1, NEW mode** — upgrade HC1
`deployed chaos commit 'eb96de94+U', not PR-head '7ae7b0f76efb'`. Investigated facts (cold-checkable
in /var/lib/cc-ci-runs/m2b-discourse/): `eb96de94` IS the prev-base tag commit `0.7.0+3.3.1`
(`git -C .../abra/recipes/discourse rev-list -n1 0.7.0+3.3.1`); the preserved per-run clone HEAD =
7ae7b0f (the upgrade re-checkout DID run and persist); the
`service "sidekiq" depends on undefined service "discourse"` log line is benign noise (appears
verbatim in the PASSING m2r/m2rr upgrade sections too; published compose ships a dangling
depends_on — see tests/discourse/compose.ccci.yml NOTE). So the chaos redeploy itself left the
base stamp in place at this ref. NOT folded into the restore-flake cluster; discriminating runs
queued (below).
- **Old-main A/B at the m2r ref** (/root/m2-ab.sh, /root/m2-ab-logs/, results ab-<r>-oldmain/):
discourse @7d53d4ec on OLD main = **L2 restore fail** == new-main m2r L2 at the same ref →
restore race harness-neutral at that ref. bluesky-pds @b2d86ef on OLD main = **L0 install fail**.
- **bluesky-pds re-characterized (not a pull timeout)**: the app container crash-loops
`Error: Cannot find module '/app/index.js'` (MODULE_NOT_FOUND, Node v24.15.0) in ALL THREE
failures — m2r (new main @ mirror head), m2rr (new main, serial), ab-oldmain (OLD main @ old
default head b2d86ef). Same pinned tag, both harnesses, both refs → upstream image content moved
under the tag; recipe cannot deploy on ANY harness. Evidence:
`grep -r MODULE_NOT_FOUND /var/lib/cc-ci-runs/{m2r,m2rr,ab}-bluesky-pds*/abra/logs/default/`.
Restructure-neutral (old==new L0).
- M2.3 in-flight proof runs (serial queue /root/m2-proof.sh + /root/m2-proof2.sh, logs
/root/m2-proof-logs/, driver /root/m2-proof-logs/driver.log):
1. **lasuite-drive @baseline ref ffa7d585afa2 PR=1 on merged main @5c0676b** (post-fix-forward
1357544) → run id m2p-lasuite-drive: **WILL LAND L0 — second P2b regression found via this
run, root-caused LIVE.** The 1357544 best-effort path WORKED (`!!` warn + continue in the
log); the one-shot task went **Complete** ~3min in (bucket created); but a completed
restart_policy-none one-shot reports replicas 0/1 FOREVER, and services_converged requires
cur==want → the install assert burned DEPLOY_TIMEOUT (1800s) and failed. Old world never saw
this: setup_custom_tests.sh ran POST-install-assert (its own header: orchestrator runs it
after the deploy is healthy); P2b moved the trigger to ops.py pre_install = PRE-assert.
Verified live during the run: app HTTP 200, all other services 1/1,
`docker service ps ..._minio-createbuckets` = Complete, pytest in converge loop 27+ min.
**Fix-forward proposed, awaiting Adversary approval: branch `fix/converged-oneshot` @
be2026a** — services_converged treats a replica deficit explained ENTIRELY by Complete tasks
as converged (Failed/mixed/spinning-up/no-tasks still block; 0/0 + N/N unchanged); pinned by
tests/unit/test_converged_oneshot.py (7 cases). Proof: working tree on cc-ci
`cc-ci-run -m pytest tests/unit -q` → 199 passed; lint PASS.
**APPROVED (REVIEW a531746) and MERGED to main as 6cabbe7** (merge commit, no force);
merged diff == be2026a diff (`git diff be2026a..main -- runner/harness/lifecycle.py
tests/unit/test_converged_oneshot.py` = empty). Push build green: drone build **350
success** on 914c166 (branch head incl. the merge; verify on cc-ci:
`docker cp <drone_cid>:/data/database.sqlite /tmp/d.sqlite && sqlite3 /tmp/d.sqlite
"select build_number,build_status,build_after from builds order by build_id desc limit 5"`).
Post-fix re-run QUEUED: /root/m2-proof3.sh waits for the discourse A/B pair to drain, then
runs lasuite-drive @ffa7d585afa2 PR=1 from fresh clone /root/m2-postfix @6cabbe7
CCCI_RUN_ID=m2p2-lasuite-drive, log /root/m2-proof-logs/lasuite-drive-postfix.log.
EXPECTED **L5** (binding condition 1 of the approval).
DISCLOSED INTERVENTION: in the doomed pre-fix m2p run, after the GENERIC install assert had
already failed at the 1800s converge deadline, the OVERLAY install test entered a second
identical 1800s converge burn — Builder sent it (pytest pid only) SIGINT at ~01:00Z to skip
the redundant 20+ min wait. The log therefore shows `KeyboardInterrupt` at generic.py:97
(the converge poll — the exact diagnosed line). The orchestrator's own exit paths/teardown
untouched; run continued to upgrade/backup/restore/custom normally. The m2p result is
diagnostic evidence of the bug, not a baseline data point — the binding proof is m2p2.
2. **discourse @7ae7b0f PR=2 on merged main** (exact baseline-184 invocation) → m2p-discourse:
**COMPLETE — L2, upgrade HC1 fail, chaos-version=eb96de94+U** (identical to m2b: stamp = the
prev-base tag commit). Deterministic at this ref on new main; NOT a PR=0 artifact, NOT a race.
install/backup/restore/custom all pass.
3. **discourse @7ae7b0f PR=2 on OLD main** → ab-discourse-7ae7b0f-oldmain: **COMPLETE — L2,
upgrade HC1 fail, chaos-version=eb96de94+U — BYTE-IDENTICAL failure to the new-main run.**
**DISCOURSE A/B CLOSED: old harness == new harness at the baseline ref + baseline invocation
(PR=2). The upgrade-HC1 mode is HARNESS-NEUTRAL — not an rcust regression.** Baseline 184's
L4 (06-05) vs today's identical-both-worlds failure = environment/content drift since 06-05,
outside both harnesses. Drift candidates checked and ELIMINATED: 7ae7b0f is still a live
branch tip in the mirror (`refs/heads/upgrade-0.8.0+3.5.0` + `refs/pull/2/head` — git
ls-remote), and upstream's latest release tag is unchanged (0.7.0+3.3.1 = eb96de94, no new
tag since 06-05). flake.lock (abra pin) identical in both worlds. HC1 firing rather than
false-greening is the guard working as designed.
Cold-verify: results.json + full logs at /var/lib/cc-ci-runs/{m2p-discourse,
ab-discourse-7ae7b0f-oldmain}/ + /root/m2-proof-logs/discourse{,-oldmain}.log.
4. **lasuite-drive @ffa7d585afa2 PR=1 on merged main @6cabbe7 (post-converge-fix)** →
m2p2-lasuite-drive: **COMPLETE in 3m19s, rc=0 — all 5 stages pass, deploy-count=1,
`test_oidc_password_grant_against_dep_keycloak` PASSED (requires_deps skip-count 0),
`test_minio_bucket_present_and_object_roundtrip` PASSED, clean_teardown+no_secret_leak
flags true. NO converge burn: the one-shot again exceeded its 90s window (`!!` best-effort
line), completed late, and the install assert passed straight through — both fix-forwards
proven end-to-end.** results.json `level=4`, NOT 5 — see schema note below.
- **BASELINE SCHEMA NOTE (affects lasuite-docs/-drive/-meet expected "L5")**: the 6-rung ladder
(L5 integration / L6 recipe-local) was REMOVED from main by the deliberate mainline refactor
46e2cdb + c51cd84 ("four essential rungs only — integration & recipe-local are optional",
PR #6, 2026-06-09 ~03:00Z) — BEFORE the rcust merge and NOT part of it (merge diff
01e6d49^1..01e6d49 touches level.py not at all and results.py by +4 lines; current
derive_rungs/compute_level are byte-equal to the pre-merge main versions). Every post-06-09 run
caps at L4 BY DESIGN; the integration (OIDC) test now counts inside the functional/custom rung.
Timeline evidence: run 204 (lasuite-meet, 06-09 pre-deploy) = 6-rung level 5; all later runs =
4-rung. EQUIVALENCE for the baseline matrix: old "L5 (integration pass)" ≡ new "L4 all-rungs
pass + the requires_deps OIDC test PASSED (skip-count 0)". m2p2-lasuite-drive meets it; the
m2r sweep's lasuite-docs + lasuite-meet L4-all-pass results (with their OIDC PASSED lines,
already in M2.4 spot-greps) meet it identically.
- M2.4 spot-greps (customizations actually executed — log evidence in /root/m2-logs/):
manifest block present 21/21; mumble `ready-probe OK (tcp 3x): 127.0.0.1:64738`; ghost+discourse
`ccci-overlay: provided compose.ccci.yml ... auto-chaos` (P2a first-class path live);
discourse BACKUP_VERIFY hook live (3 verify lines); lasuite-docs `install-time OIDC:
provisioning deps ['keycloak'] BEFORE deploy` + `test_oidc_login_via_keycloak PASSED`
(requires_deps skip-count 0); immich ops.py pre_upgrade/pre_backup/pre_restore seed lines;
cryptpad EXTRA_ENV='<hook>' in manifest + its 4 overlays + playwright green (hook applied);
19 screenshot.png across m2r-* dirs.
- Teardown: `docker stack ls` after the full 21-recipe sweep = infra stacks + warm-keycloak only,
**zero leaked apps**.
- Drone→harness path: !testme on two open recipe PRs pending after the re-runs.
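The completed-one-shot rule from the be2026a fix-forward (a replica deficit explained ENTIRELY by Complete tasks counts as converged) can be sketched as follows. This is a hedged illustration, not the code in `runner/harness/lifecycle.py`: the function name, its parameters, and the flat list of lowercase task states are all assumptions made for the example.

```python
# Illustrative sketch only; names and inputs are hypothetical, not the harness API.
def services_converged_sketch(current: int, wanted: int, task_states: list) -> bool:
    """Decide convergence for one service from its replica counts and task states."""
    if current == wanted:
        return True  # the unchanged cases: 0/0 and N/N
    deficit = wanted - current
    if deficit < 0 or not task_states:
        return False  # over-replicated, or no tasks to explain the deficit
    finished = [state for state in task_states if state != "running"]
    # Every non-running task must be "complete" (failed, mixed and spinning-up
    # states still block), and there must be enough of them to cover the deficit.
    return len(finished) >= deficit and all(state == "complete" for state in finished)


print(services_converged_sketch(0, 1, ["complete"]))            # completed one-shot
print(services_converged_sketch(0, 1, ["failed"]))              # failed one-shot
print(services_converged_sketch(0, 2, ["complete", "failed"]))  # mixed
```

Under these assumptions the completed one-shot converges while the failed and mixed cases keep blocking, which is the behaviour the m2p2-lasuite-drive run relied on.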
**Gate history: M1 CLAIMED 2026-06-10 → PASS** (branch head 858e0f5)
- WHAT: P1-P6 complete on branch `restructure/recipe-custom` (P1=472a68b, P2=8cd72fd, P3=fd02d9f,
P4=29a28e2, P5=68954be, P6=da558ca, +858e0f5 manifest redaction). Working tree clean, all pushed.
- HOW (cold, from a fresh clone of the branch):
- `cc-ci-run -m pytest tests/unit -q` → EXPECTED: **192 passed**
- `cc-ci-run -m pytest tests/concurrency -q` → EXPECTED: **23 passed** (untouched by this plan;
Builder proof run 2026-06-10 on branch head: 23 passed in 11.46s)
- `nix develop .#lint --command scripts/lint.sh` → EXPECTED: **lint: PASS**
- resolved-customization diff old-vs-new for all 21 recipe dirs (Adversary's own script) →
EXPECTED: 0 deltas
- adversarial review of the full diff `main..restructure/recipe-custom`
- WHERE: origin branch `restructure/recipe-custom` @ 858e0f5; baseline matrix above (M2 prep,
committed pre-merge per plan).
## Current
Bootstrapping phase; starting P1.
M2 CLAIMED (see Gate above) — awaiting Adversary cold-verify. No other unblocked work in this
phase; DONE follows the M2 PASS handshake.

STATUS-shot.md Normal file

@ -0,0 +1,65 @@
# STATUS-shot.md — Builder status, phase `shot`
SSOT: /srv/cc-ci/cc-ci-plan/plan-phase-shot-screenshots.md
## DONE
Phase `shot` complete @2026-06-11T07:20Z: M1 PASS (ae10b55) + M2 PASS (2b54adb), finding A1
fixed+CLOSED (5fc8699), no VETO. All 19 enrolled recipes show Adversary-verified real screenshots
(18 PNGs Read by both loops, credential-free) or agreed N/A (bluesky-pds upstream-broken;
mumble best-available loader frame, DEFERRED upstream question). Fixes on main through 196156e.
## Gate history
Gate: M1 PASS (REVIEW-shot.md ae10b55). Finding A1 CLOSED (5fc8699).
Gate: M2 PASS (REVIEW-shot.md 2b54adb).
## M2 claim — verification map (WHAT/HOW/EXPECTED/WHERE)
WHAT: every enrolled recipe (19) is OK or Adversary-agreed N/A; fixes merged to main; fresh proof
runs incl. 2 via drone !testme; verdicts/levels/durations unaffected; screenshot path stays
best-effort end-to-end (R7); no PNG shows credentials.
Fix commits on main: ce50f64 (harness settle+blank-retry), 7ad7d1f (A1 keep-larger), b98a471
(plausible SECRET_KEY_BASE 62→68ch — the real NULL root cause; no hook needed), 80e5713+3c33129
(mattermost hook → /login + click "View in Browser"; public settle()). Unit: 207 pass
(`cc-ci-run -m pytest tests/unit -q`), lint PASS (`nix develop .#lint --command scripts/lint.sh`).
HOW to verify per recipe — artifacts on cc-ci `/var/lib/cc-ci-runs/<run>/{results.json,
screenshot.png,summary.html}`; scp the PNG and Read it. Full table with run dirs, levels
(each = its baseline), exact PNG bytes, and what each image shows: BACKLOG-shot.md "P4 — Proof
runs". Fixed-class proofs: immich=370 (drone !testme immich#2, posted 05:56:32Z), plausible=371
(drone !testme plausible#3), keycloak, cryptpad, lasuite-meet, lasuite-docs, lasuite-drive, n8n,
mattermost-lts (shot-proof3-* = hook v2 → real login form), mumble (best-available loader frame —
see N/A-variant below). Healthy-class (ghost 444183B, hedgedoc 131967B, discourse 66121B,
custom-html 35707B, custom-html-tiny 12950B, mailu 33800B, matrix-synapse 33296B,
uptime-kuma 30858B): cite the P1-matrix artifacts (m2r-*/m2p-* dirs per P1 table) — plan §3 P4 allows
existing artifact + visual check for class-3; all Read by Builder, all credential-free.
EXPECTED on re-run of any fixed recipe: results.json `screenshot: "screenshot.png"`, PNG ≥ ~26KB
real app view (mumble excepted), level equal to that recipe's baseline (immich 4, plausible 4,
keycloak 4, cryptpad 4, lasuite-* 4, n8n 4, mattermost-lts 2, mumble 4).
R7 / budget: wait components 45(nav, only-on-failure)+10(settle)+0.5+4(blank retry)+0.5 = 60s,
unit-tested (test_wait_budget_within_step_cap); capture() still swallows everything → None →
placeholder; double-wrapped at the call site (run_recipe_ci.py:1024-1037, unchanged).
Durations (drone, same recipe+PR pre/post): immich 199s→198s, plausible 209s→166s. Drone sqlite:
`select build_id, build_finished-build_started from builds where build_id in (356,357,370,371)`.
Dashboard/card: `https://ci.commoninternet.net/` grid references runs/370+371 screenshot.png (both
HTTP 200); summary.html embeds screenshot.png; /badge/immich.svg 200.
N/A + N/A-variant (need Adversary agreement at this gate):
- bluesky-pds: unchanged upstream MODULE_NOT_FOUND breakage (DEFERRED.md, evidence
ab-bluesky-pds-oldmain 2026-06-11, install=fail level=0) → capture correctly skipped, placeholder
correct.
- mumble: web client (rankenstein/mumble-web:0.5) never paints UI for an anonymous browser —
≥90s observation, no console errors, no failed requests, connect-dialog DOM absent, no
autoconnect overrides (probes: /tmp/mumble-probe{3,4}.out, /tmp/mumble-orch{4,5}.log on cc-ci).
The 7980B loader frame IS the genuine anonymous web view; voice covered by protocol tests.
DEFERRED.md entry filed (upstream question). Claimed as documented best-available, not a defect.
## Blocked
(nothing)


@ -38,6 +38,7 @@ _RUN_FILES = {
"screenshot.png": "image/png",
"badge.svg": "image/svg+xml",
"summary.html": "text/html; charset=utf-8",
"lint.txt": "text/plain; charset=utf-8",
}
_RUN_ID_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")
@ -71,8 +72,7 @@ _LEVEL_COLOR = {
2: "#e0823d",
3: "#d9b343",
4: "#a0b93f",
5: "#57ab5a",
6: "#3fb950",
5: "#3fb950", # bright green — full 5-rung climb incl. lint (phase lvl5)
}
@ -152,7 +152,6 @@ def _build_row(b):
"ref": ref[:8],
"version": res.get("version") or ref[:12] or "",
"level": res.get("level"),
"level_cap_reason": res.get("level_cap_reason") or "",
"has_screenshot": bool(res.get("screenshot")),
"flags": res.get("flags") or {},
"finished": b.get("finished") or 0,
@ -220,7 +219,6 @@ a{color:#58a6ff;text-decoration:none} a:hover{text-decoration:underline}
.name{font-weight:700;font-size:1.05rem;color:#e6edf3}
.row{display:flex;align-items:center;gap:.5rem;flex-wrap:wrap;font-size:.82rem}
.pill{color:#fff;padding:.08rem .5rem;border-radius:.5rem;font-size:.75rem;font-weight:600}
.cap{color:#8b949e;font-size:.75rem}
code{background:#0d1117;border:1px solid #21262d;border-radius:.3rem;padding:0 .3rem;font-size:.78rem;color:#c9d1d9}
.flags{display:flex;gap:.4rem;font-size:.72rem;color:#8b949e}
.foot{margin-top:auto;display:flex;justify-content:space-between;font-size:.8rem;padding-top:.3rem;border-top:1px solid #21262d}
@ -274,17 +272,12 @@ def _card(r):
f'<a class="shot" href="{run_url}" title="open run">'
f'<span class="ph">no screenshot</span>{_level_pill(r["level"])}</a>'
)
cap = (
f'<div class="cap">{html.escape(r["level_cap_reason"])}</div>'
if r["level_cap_reason"]
else ""
)
return (
f'<div class="card">{shot}<div class="body">'
f'<div class="name">{html.escape(r["recipe"])}</div>'
f'<div class="row"><span class="pill" style="background:{color}">{html.escape(r["status"])}</span>'
f'<code>{html.escape(r["version"])}</code></div>'
f"{cap}{_flags_html(r['flags'])}"
f"{_flags_html(r['flags'])}"
f'<div class="foot"><a href="{run_url}">run #{num} · {_ago(r["finished"])}</a>'
f'<a href="/recipe/{html.escape(r["recipe"])}">history →</a></div>'
f"</div></div>"


@ -115,8 +115,8 @@ _This table is GENERATED from the `runner/harness/meta.py` KEYS registry by `scr
| `HEALTH_OK` | `tuple[int]` | `(200, 301, 302)` | Acceptable HTTP status codes for health. |
| `DEPLOY_TIMEOUT` | `int` | `600` | Max seconds to wait for swarm convergence per deploy. |
| `HTTP_TIMEOUT` | `int` | `300` | Max seconds to wait for HTTP health after convergence. |
| `BACKUP_CAPABLE` | `bool` | `None` | Override the backup-tier capability auto-detect (compose `backupbot.backup` labels). `False` forces N/A; `True` forces the tier on; unset = auto-detect. |
| `EXPECTED_NA` | `dict` | `None` | Declare an N/A rung intentional: `{rung: reason}`. The cap stands either way; only the report wording changes. |
| `BACKUP_CAPABLE` | `bool` | `None` | Override the backup-tier capability auto-detect (compose `backupbot.backup` labels). `False` forces an intentional skip of the backup/restore rung; `True` forces the tier on; unset = auto-detect. |
| `EXPECTED_NA` | `dict` | `None` | Declare a non-run rung an INTENTIONAL skip: `{rung: reason}` — the level climbs past it; an undeclared non-run rung is *unverified* and blocks the level above it (classification table: machine-docs/DECISIONS.md phase lvl5). Never overrides an exercised pass/fail; the `lint` rung has no escape hatch. |
| `READY_PROBE` | `hook` | `None` | Callable `(ctx) -> [probe, ...]` returning extra readiness probes, run after install AND after upgrade: HTTP `{host, path, ok}` or TCP `{tcp_host, tcp_port, stable}`. |
| `UPGRADE_BASE_VERSION` | `str` | `None` | Exact published tag overriding the upgrade tier's base (default: `recipe_versions[-2]`). |
| `BACKUP_VERIFY` | `hook` | `None` | Callable `(ctx) -> bool` post-backup data-capture check; `False` re-runs the backup (truncated-dump race guard), retried up to 3 attempts. |


@ -10,12 +10,9 @@ It is the R8 reference for Phase 3 (`plan-phase3-results-ux.md`).
---
## 1. The level ladder (R1)
## 1. The level ladder (phase lvl5 semantics, operator-decided 2026-06-11)
Every run earns a single integer **level 0-6**. The ladder is cumulative with **YunoHost
gap-caps-the-level** semantics: you earn level `L` only if **every rung 1..L was a clean PASS**. The
first rung that is not a clean PASS — a real **FAIL** *or* genuinely **N/A** for this recipe — stops
the climb, and `level_cap_reason` records which rung and why.
Every run earns a single integer **level 0-5** over the FIVE essential rungs:
| Level | Rung | Earned when |
|------:|------|-------------|
@ -24,42 +21,52 @@ the climb, and `level_cap_reason` records which rung and why.
| **L2** | upgrade | previous published version → PR/latest, stays healthy, data intact. |
| **L3** | backup/restore | seeded data survives backup → wipe → restore. |
| **L4** | functional | the recipe-specific functional tests pass. |
| **L5** | integration | SSO/OIDC + cross-app integration tests pass. |
| **L6** | recipe-local | the recipe repo's own `tests/` (D4) pass and are merged. |
| **L5** | lint | `abra recipe lint` passes against the exact ref under test. |
**N/A caps, fairly.** A rung that does not apply to a recipe (only one published version → no
upgrade; not backup-capable; no SSO/integration surface; no recipe-local tests) is **N/A**, which
caps the climb at the rung below it with a recorded reason — it is *not* counted as a failure. This is
the only fair reading of "a missing lower rung caps the level": e.g. a recipe with **no integration
surface caps at L4 by definition**, shown as `level_cap_reason = "L5 integration … N/A"`. A stateless
app whose functional tests pass but which cannot be backed up is honestly capped at **L2** (`"L3
backup/restore … N/A"`) rather than shown as L4 — understating is safe; overstating is forbidden.
Each rung has one of FOUR statuses, and the level is:
Worked examples (real runs):
- `uptime-kuma` — install+upgrade+backup+restore+functional all pass, no SSO surface → **L4**
(`cap = "L5 integration (SSO/OIDC + cross-app) N/A"`).
- `custom-html-tiny` — stateless, not backup-capable: install+upgrade pass, backup/restore N/A →
**L2** (`cap = "L3 backup/restore (data integrity) N/A"`).
level = the highest rung that PASSED, where every rung below it is "pass" or an intentional skip
- **pass / fail** — the rung was exercised. A FAIL blocks: no rung above it counts, however green.
- **skip (intentional)** — the rung *genuinely does not apply*, from a declared or structural fact:
not backup-capable (declared), only one published version (no upgrade target), or a declared
`EXPECTED_NA`. Intentional skips are **climbed past** — a stateless recipe with passing
functional tests and a clean lint reaches **L5**, not the old "capped at 2".
- **unver (unverified)** — the rung *should* have run but didn't: infra error, missing tool,
harness exception, prior-stage abort, timeout. **The level cannot rise above an unverified
rung** — it blocks exactly like a fail (we never claim what we didn't check). Anything
unclassifiable defaults to unver (conservative).
There is **no capping concept** (no `cap_reason`, no `capped`): the per-rung table
(✔ / ✘ / intentional-skip / unverified) on the card and in `results.json.rungs` is the sole
carrier of "why isn't this level higher". Worked examples:
- install ✔, upgrade ✘, backup ✔, functional ✔, lint ✔ → **level 1** (fail blocks).
- install ✔, upgrade ✔, backup skip (not capable), functional ✔, lint ✔ → **level 5**.
- install ✔, upgrade ✔, backup unver (harness error), functional ✔, lint ✔ → **level 2**.
- all four ✔, lint unver (abra missing) → **level 4** (an unverified top rung isn't earned).
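The level formula and the four worked examples above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `runner/harness/level.py`: the name `compute_level_sketch` is hypothetical, the rung names are taken from `results.json.rungs`, and treating a missing rung as `unver` is an assumption that follows the "unclassifiable defaults to unver" rule.

```python
# Illustrative sketch only; the real scorer is runner/harness/level.py::compute_level.
RUNGS = ("install", "upgrade", "backup_restore", "functional", "lint")


def compute_level_sketch(rungs: dict) -> int:
    """Return the highest passed rung whose lower rungs are all pass or skip."""
    level = 0
    for number, name in enumerate(RUNGS, start=1):
        status = rungs.get(name, "unver")  # assumption: a missing rung is unverified
        if status == "pass":
            level = number  # earned: everything below was pass or skip
        elif status != "skip":
            break  # fail or unver blocks every rung above it
        # an intentional skip earns nothing itself but is climbed past
    return level


def example(*statuses: str) -> dict:
    """Pair statuses with the rung names, in ladder order."""
    return dict(zip(RUNGS, statuses))


print(compute_level_sketch(example("pass", "fail", "pass", "pass", "pass")))   # 1
print(compute_level_sketch(example("pass", "pass", "skip", "pass", "pass")))   # 5
print(compute_level_sketch(example("pass", "pass", "unver", "pass", "pass")))  # 2
print(compute_level_sketch(example("pass", "pass", "pass", "pass", "unver")))  # 4
```

A single forward pass is enough because the first fail or unver ends the climb; no cap bookkeeping is needed.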
Integration (SSO/OIDC + cross-app) and recipe-local tests are **optional capabilities**, not
rungs — they never affect the level (SSO remains enforced for the run VERDICT).
### How tiers map to rungs (the translation layer)
`run_recipe_ci.py` holds the run's per-tier results (`install/upgrade/backup/restore/custom`) +
deps/SSO signals; `runner/harness/results.py::derive_rungs` maps them to the rung-status dict that
`runner/harness/level.py::compute_level` scores. The mapping (also in `DECISIONS.md`, Phase 3):
structural signals; `runner/harness/results.py::derive_rungs` maps them to the rung-status dict
that `runner/harness/level.py::compute_level` scores. The full intentional-vs-unintentional
classification table for every N/A source is in `machine-docs/DECISIONS.md` (phase lvl5). Summary:
- **install** ← install tier (pass/fail).
- **upgrade** ← upgrade tier; `skip` → **na** (only one published version).
- **install** ← install tier (pass/fail; a non-run is unver — install always applies).
- **upgrade** ← upgrade tier; tier skipped with no upgrade target (single published version,
structural) → skip; declared `EXPECTED_NA` → skip; otherwise unver.
- **backup_restore** ← backup AND restore tiers both pass → pass; either fail → fail; not
backup-capable → **na**.
- **functional** ← the custom tier minus its SSO tests; a custom failure conservatively fails this
rung (we don't split functional-vs-SSO failure → never inflate); no custom tests → **na**.
- **integration** ← applies only if the recipe declares deps; pass iff deps wired and SSO verified and
custom didn't fail; recipes with no declared deps → **na** (the "caps at L4" rule).
- **recipe_local** ← the recipe repo's own `tests/` (discovery source `repo-local`) ran and passed;
none present → **na**.
The pure scorer is exhaustively unit-tested + fuzz-verified (all 729 rung combinations: level ==
count of leading consecutive passes, zero inflation).
backup-capable (structural/declared) → skip; unverified-while-capable → unver.
- **functional** ← the custom tier; a custom failure conservatively fails this rung; no custom
tests is a coverage GAP → unver, unless declared `EXPECTED_NA["functional"]` → skip.
- **lint** ← the lint executor (`runner/harness/lint.py`): `abra recipe lint` on a pristine
scratch clone of the run's recipe tree at the exact tested sha, 60s hard budget, full output in
the run artifact `lint.txt`. pass/fail only — when lint can't run the rung is **unver** (never
a silent pass, never an intentional skip). Lint never changes the run verdict.
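The tier-to-rung summary above can be read as the following Python sketch. It is an illustration under stated assumptions, not the code in `runner/harness/results.py`: the function name and every parameter (`has_upgrade_target`, `backup_capable`, `lint_status`, `expected_na`) are hypothetical, and the authoritative classification remains the table in `machine-docs/DECISIONS.md`.

```python
# Illustrative sketch only; the real mapping is runner/harness/results.py::derive_rungs.
def derive_rungs_sketch(tiers, *, has_upgrade_target, backup_capable,
                        lint_status, expected_na=None):
    """Map per-tier results to rung statuses: pass, fail, skip or unver.

    tiers maps install/upgrade/backup/restore/custom to "pass", "fail",
    "skip" or None. All names here are assumptions made for the example.
    """
    declared = expected_na or {}

    def exercised(status):
        # Only a real pass/fail counts as exercised; it is never overridden.
        return status if status in ("pass", "fail") else None

    def non_run(rung, structural=False):
        # A non-run rung is an intentional skip only when a structural or
        # declared fact explains it; anything else is unverified.
        return "skip" if structural or rung in declared else "unver"

    rungs = {}
    # install always applies, so a non-run install is never an intentional skip
    rungs["install"] = exercised(tiers.get("install")) or "unver"
    rungs["upgrade"] = exercised(tiers.get("upgrade")) or non_run(
        "upgrade", structural=not has_upgrade_target)

    backup, restore = tiers.get("backup"), tiers.get("restore")
    if "fail" in (backup, restore):
        rungs["backup_restore"] = "fail"
    elif backup == restore == "pass":
        rungs["backup_restore"] = "pass"
    else:
        rungs["backup_restore"] = non_run(
            "backup_restore", structural=not backup_capable)

    # no custom tests is a coverage gap unless declared EXPECTED_NA["functional"]
    rungs["functional"] = exercised(tiers.get("custom")) or non_run("functional")
    # lint has no escape hatch: anything but a real pass/fail is unverified
    rungs["lint"] = lint_status if lint_status in ("pass", "fail") else "unver"
    return rungs


stateless = {"install": "pass", "upgrade": "pass",
             "backup": "skip", "restore": "skip", "custom": "pass"}
print(derive_rungs_sketch(stateless, has_upgrade_target=True,
                          backup_capable=False, lint_status="pass"))
```

For the stateless input this yields `backup_restore` as `skip` with every other rung `pass`, the same shape as the `rungs` block in the schema 2 example.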
### Invariant flags (shown, not climbed)
@ -77,19 +84,29 @@ build number, or the run's unique app domain for a hand-run). Schema:
```json
{
"schema": 1, "run_id": "...", "recipe": "...", "version": "...", "pr": "...", "ref": "...",
"schema": 2, "run_id": "...", "recipe": "...", "version": "...", "pr": "...", "ref": "...",
"finished": 0.0,
"level": 4, "level_cap_reason": "L5 integration (SSO/OIDC + cross-app) N/A",
"rungs": {"install":"pass","upgrade":"pass","backup_restore":"pass","functional":"pass",
"integration":"na","recipe_local":"na"},
"level": 5,
"rungs": {"install":"pass","upgrade":"pass","backup_restore":"skip","functional":"pass",
"lint":"pass"},
"lint": {"status":"pass","detail":"","rules_failed":[]},
"skips": {"intentional": {"backup_restore": "not backup-capable (no backupbot labels / declared)"},
"unintentional": []},
"stages": [{"name":"install","status":"pass",
"tests":[{"name":"test_serving","status":"pass","ms":168,"source":"generic"}]}],
"results": {"install":"pass","upgrade":"pass","backup":"pass","restore":"pass","custom":"pass"},
"results": {"install":"pass","upgrade":"pass","backup":"skip","restore":"skip","custom":"pass"},
"flags": {"clean_teardown": true, "no_secret_leak": true},
"screenshot": "screenshot.png", "summary_card": "summary.png"
}
```
`rungs` carries the four-status vocabulary above; `skips.intentional` maps each intentionally
skipped rung to its (declared or structural) reason and `skips.unintentional` lists the
unverified rungs. `lint` carries the L5 rung outcome + failing rule ids; the full
`abra recipe lint` output is served at `/runs/<run_id>/lint.txt`. Pre-lvl5 artifacts
(`"schema": 1`, 4-rung ladder, `level_cap_reason`/`level_cap_rung` present, `"na"` statuses)
are still rendered as-is by the dashboard/card — their stored level is never recomputed.
Assembly is **best-effort**: a failure to build/write `results.json` is logged but never changes the
run's exit code (cosmetics never block the pipeline, R7).
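A consumer that has to cope with both schema generations can read everything it needs from the fields described above. The following is a minimal sketch, not the dashboard's actual code: `summarize` is a hypothetical helper and both sample dicts are made up for illustration. Only the field names (`level`, `rungs`, `skips`, `lint`) come from the schema shown here.

```python
def summarize(data: dict) -> dict:
    """Summarise a results.json dict without recomputing its stored level."""
    rungs = data.get("rungs") or {}
    skips = data.get("skips") or {}
    lint = data.get("lint") or {}
    return {
        # The stored level is taken as-is, for old and new artifacts alike.
        "level": int(data.get("level", 0)),
        # Pre-lvl5 artifacts have no "lint" rung, so their ladder tops out at 4.
        "ladder_top": 5 if "lint" in rungs else 4,
        "intentional": dict(skips.get("intentional") or {}),
        "unverified": list(skips.get("unintentional") or []),
        "lint_rules_failed": list(lint.get("rules_failed") or []),
    }


# Illustrative inputs (assumed shapes, trimmed to the fields used above).
schema2 = {
    "schema": 2,
    "level": 5,
    "rungs": {"install": "pass", "upgrade": "pass", "backup_restore": "skip",
              "functional": "pass", "lint": "pass"},
    "lint": {"status": "pass", "detail": "", "rules_failed": []},
    "skips": {"intentional": {"backup_restore": "not backup-capable"},
              "unintentional": []},
}
schema1 = {
    "schema": 1,
    "level": 4,
    "rungs": {"install": "pass", "upgrade": "pass", "backup_restore": "pass",
              "functional": "pass"},
}

print(summarize(schema2))
print(summarize(schema1))
```

Reading the ladder height off the rungs that are actually present, rather than off the `schema` number, keeps the "of N" label honest for old artifacts.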


@ -1295,3 +1295,61 @@ the abra CLI and abra.recipe_dir()). No test assertion, gate, or overlay content
phase guardrail's "never touch tests/<recipe>/ content" is read as protecting test/gate SEMANTICS;
this is required P3 fallout, equivalent to the harness-side path routing. Flagged here for the
Adversary's gate-integrity review.
## Phase lvl5 — L5 lint rung + level semantics de-cap (SETTLED 2026-06-11, operator-specified)
**The level formula (replaces the Phase-3 "N/A caps" stance).** Operator decision 2026-06-11
(explicit Q&A, recorded verbatim in plan-phase-lvl5-lint-rung.md): with per-rung statuses
{pass, fail, skip (intentional), unver (unintentional/not-verified)}:
level = max i such that rung_i == "pass" and all j < i have status in {"pass","skip"}; else 0.
A real FAIL blocks. An INTENTIONAL skip (the rung genuinely does not apply, from a declared or
structural fact) is climbed past — this is the de-cap: a non-backup-capable recipe is no longer
stuck at L2. An UNVERIFIED rung (should have run, wasn't checked) blocks exactly like a fail —
this preserves the honest core of the old N/A-caps rule: never claim what wasn't checked. The
words cap/capped/cap_reason are deleted from code, schema (results.json schema 2), card,
dashboard, badge and docs; the per-rung table (✔/✘/intentional-skip/unverified) is the SOLE
carrier of "why isn't the level higher". The big level badges (card corner, dashboard pill,
/badge/<recipe>.svg) show ONLY number + colour (operator-specified). Old schema-1 artifacts are
rendered as-is (their stored level, their 4-rung ladder) — no retroactive relabeling.
**The ladder is now five rungs:** install(1) upgrade(2) backup_restore(3) functional(4)
**lint(5) = `abra recipe lint` passes against the exact ref under test** (PR head on PR builds).
Lint is a LEVEL RUNG, not a run gate: no lint outcome ever changes the run verdict.
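The formula above can be exercised with a few concrete ladders. This is a minimal sketch under the stated semantics, not a copy of `level.py`; `level_of` is a hypothetical name, and the sample ladders are invented for illustration.

```python
RUNGS = ("install", "upgrade", "backup_restore", "functional", "lint")


def level_of(rungs: dict) -> int:
    """Highest i with rung_i == "pass" and every rung below it "pass" or "skip"."""
    level = 0
    for idx, name in enumerate(RUNGS, start=1):
        status = rungs[name]
        if status == "pass":
            level = idx      # earned: everything below was pass or skip
        elif status == "skip":
            continue         # intentional skip: climbed past, earns nothing itself
        else:
            break            # fail / unver: nothing above this rung can count
    return level


# A non-backup-capable recipe is no longer stuck at L2.
decapped = dict(install="pass", upgrade="pass", backup_restore="skip",
                functional="pass", lint="pass")
# An unverified rung blocks exactly like a fail.
blocked = dict(install="pass", upgrade="pass", backup_restore="unver",
               functional="pass", lint="pass")
# A lint failure only costs the top rung.
lint_fail = dict(install="pass", upgrade="pass", backup_restore="pass",
                 functional="pass", lint="fail")

print(level_of(decapped), level_of(blocked), level_of(lint_fail))
```

Note that a skipped rung never raises the level by itself: a ladder of install pass followed by four skips is still level 1.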
**N/A classification table (derive_rungs, results.py — every N/A source, Adversary-reviewed).
Default for anything unclassifiable: UNVER (conservative).**
| rung | source of non-pass/fail | class | status |
|---|---|---|---|
| install | tier skipped / missing (any reason — install always applies) | unintentional | unver |
| upgrade | tier skipped by orchestrator AND no upgrade target (`prev is None`: only one published version — structural) | intentional | skip |
| upgrade | declared `EXPECTED_NA["upgrade"]` (tier not pass/fail) | intentional | skip |
| upgrade | tier skipped though a target exists (install failed → downstream abort), or tier missing (CCCI_STAGES dev escape) | unintentional | unver |
| backup_restore | not backup-capable (no backupbot labels / `BACKUP_CAPABLE=False` — structural/declared) | intentional | skip |
| backup_restore | declared `EXPECTED_NA["backup_restore"]` (tiers not pass/fail) | intentional | skip |
| backup_restore | backup-capable but either tier did not produce pass/fail (abort, partial run) | unintentional | unver |
| functional | declared `EXPECTED_NA["functional"]` (no custom tests / tier skipped) | intentional | skip |
| functional | no custom tests / tier skipped, undeclared — absent functional coverage is a GAP, not a property | unintentional | unver |
| lint | executor could not produce pass/fail (timeout, abra/script missing, env FATA, unparseable output) — NO escape hatch, `EXPECTED_NA["lint"]` is ignored | unintentional | unver |
EXPECTED_NA never overrides an exercised rung: pass/fail always stand.
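To make the table concrete, here is how its three upgrade rows might be expressed. This is a sketch only: `classify_upgrade` and its parameters are hypothetical and do not reproduce `derive_rungs`. It assumes `tier` is the raw tier result (`"pass"`, `"fail"`, `"skip"` or `None` when the tier is missing), `has_target` says whether a previous published version exists, and `declared_na` says whether `EXPECTED_NA["upgrade"]` is declared.

```python
def classify_upgrade(tier, has_target: bool, declared_na: bool) -> str:
    """Map the upgrade tier outcome to a rung status per the table rows."""
    if tier in ("pass", "fail"):
        return tier        # an exercised rung always stands; EXPECTED_NA never overrides it
    if declared_na:
        return "skip"      # declared: intentional
    if tier == "skip" and not has_target:
        return "skip"      # structural: only one published version
    return "unver"         # skipped despite a target, or tier missing: conservative default


print(classify_upgrade("skip", has_target=False, declared_na=False))
print(classify_upgrade("skip", has_target=True, declared_na=False))
print(classify_upgrade(None, has_target=True, declared_na=False))
print(classify_upgrade("fail", has_target=True, declared_na=True))
```

The ordering matters: the pass/fail check comes first so that a declaration can never turn a real failure into a skip.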
**Lint executor mirror-context decision (plan-phase-lvl5 §2.3).** Probed on cc-ci 2026-06-11
(JOURNAL-lvl5): (a) abra lint globs every `compose*.yml` in the recipe tree, so the CI's
untracked install_steps overlays (e.g. compose.ccci.yml) FATA it — harness artifact; (b) abra
lint force-fetches tags from `origin`, so a PR run's private-mirror origin (token never written
to .git/config) FATAs "unable to fetch tags" — harness artifact; (c) `abra recipe lint` exits
non-zero ONLY on FATA — rule verdicts live in its table (error-severity ❌ rows + a trailing
"WARN critical errors present" sentinel, rc still 0). Decision: the executor (harness/lint.py)
lints a PRISTINE SCRATCH CLONE of the per-run recipe tree checked out at the exact tested sha —
origin becomes a local path (offline tag fetch, no auth) and the run's true tag set rides along
(fetch_recipe already fetches the canonical upstream version tags into the per-run tree, so
R014 evaluates the recipe's real tags). **No lint rule is filtered or ignored** — the
plumbing pollution is solved by context, not by exemptions. Classifier: fail iff an
error-severity rule is unsatisfied (or the FATA is content-attributable: "unable to validate
recipe"); pass iff the table rendered clean; anything else unver + loud log. Hard 60s budget
(observed ~0.7s); executor runs before the tiers (tree at tested ref), double-wrapped, R7
verdict-neutral. Full output → run artifact `lint.txt` (dashboard-served); status + failing
rule ids → results.json `lint`.
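The classifier decision can be summarised as a short decision order. The sketch below is a simplification under assumptions, not the code in `harness/lint.py`: `decide` is a hypothetical function that takes rows already parsed into `(rule_id, severity, satisfied)` tuples, it leaves out the sentinel cross-check, and the sample rows are invented.

```python
CONTENT_FATA = "unable to validate recipe"


def decide(rc, output: str, rows: list) -> tuple:
    """Return (status, failed_rule_ids) for one lint invocation."""
    if rc is None:
        return "unver", []             # lint never ran (timeout, missing binary)
    if rc != 0:
        # Non-zero rc only signals a FATA; it is a fail only when the recipe is at fault.
        return ("fail" if CONTENT_FATA in output else "unver"), []
    if not rows:
        return "unver", []             # rc 0 but no table rendered: never a silent pass
    failed = [rule for rule, severity, ok in rows if severity == "error" and not ok]
    return ("fail", failed) if failed else ("pass", [])


# Invented sample rows: one unsatisfied error rule, one unsatisfied warning.
rows = [("R014", "error", False), ("R001", "warn", False)]
print(decide(0, "", rows))
print(decide(1, "FATA unable to fetch tags", []))
```

Only error-severity rows decide the verdict; an unsatisfied warning leaves the rung at pass.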


@ -335,3 +335,28 @@ before the build is called done) — but does **not** force closure.
- **Re-entry trigger:** Builder authors recipe-PR Q4.7b (cache tarball on a volume / wget
retry+backoff / drop `2>/dev/null` / `set +e` w/ fallback), then runs plausible-full green + claims.
- **Linked:** REVIEW-2 `e850281` (root-cause + DENY), `71af595` (§4.3 floor); DECISIONS 2026-05-30.
- discourse upgrade-HC1 @7ae7b0f stamps prev-base tag commit (eb96de94+U) on BOTH old+new harness since ~06-10 (baseline 184 was L4 on 06-05); harness-neutral (rcust exonerated, M2-closed) but abra stamp-resolution mechanism UNATTRIBUTED — worth a standalone dig outside rcust. Evidence: /var/lib/cc-ci-runs/{m2p-discourse,ab-discourse-7ae7b0f-oldmain}, JOURNAL-rcust 2026-06-11.
- bluesky-pds: UPSTREAM IMAGE BREAKAGE (non-rcust, M2-justified exclusion from baseline match).
The app container crash-loops `Error: Cannot find module '/app/index.js'` (MODULE_NOT_FOUND,
Node v24.15.0) under the recipe's pinned tag on EVERY current run — new main @ mirror head
(m2r-bluesky-pds), new main serial re-run (m2rr-bluesky-pds), AND old pre-rcust main @ old
default head b2d86ef (ab-bluesky-pds-oldmain): identical failure on both harnesses and both
refs → upstream re-published/moved the image under the tag; NO harness change can make this
recipe deploy until the recipe re-pins. Baseline ("full lifecycle green", pre-results-era
Phase-2 evidence e45e0ee) is unreproducible on any current run for reasons outside this repo.
Evidence: `grep -r MODULE_NOT_FOUND /var/lib/cc-ci-runs/{m2r,m2rr,ab}-bluesky-pds*/abra/logs/
default/`; REVIEW-rcust.md 2026-06-11 entries. Follow-up (post-phase): file/propose a re-pin PR
against the bluesky-pds recipe mirror.
- mumble-web client never paints UI for an anonymous browser (phase-shot, 2026-06-11). The recipe's
pinned web client (rankenstein/mumble-web:0.5 via compose.mumbleweb.yml, served by websockify)
stays at its `loading-container` spinner ≥90s with NO console errors, NO failed asset/requests,
connect-dialog DOM elements absent, and no autoconnect overrides in config.local.js (defaults
untouched) — so the CI screenshot's best-available frame is the genuine loader view every visitor
gets. The voice server itself is fully exercised (protocol handshake/config tests pass; that is
mumble's actual function). A harness-side fix is impossible without changing what the recipe
deploys (guardrail: prefer upstream over cc-ci overlays). **Operator input needed:** whether to
pursue an upstream recipe issue/PR (newer mumble-web image or one that renders its connect dialog)
— until then the dashboard shows the loader frame as the recipe's web-surface reality.
Evidence: /tmp/mumble-probe{2,3,4}.out + /tmp/mumble-orch{4,5}.log on cc-ci (90s DOM/console/
network observation; websockify reachable, /ws & /websocket 404 from websockify itself);
/var/lib/cc-ci-runs/shot-proof-mumble/screenshot.png (L4 run, loader frame).


@ -21,23 +21,24 @@ from __future__ import annotations
import html
import os
# Level → colour ramp (YunoHost-ish): red at the floor, climbing to green at the top.
# Level → colour ramp (YunoHost-ish): red at the floor, climbing to green at the top (L5 = full
# clean climb incl. lint — phase lvl5).
LEVEL_COLOR = {
0: "#e5534b", # red — install failed
1: "#e0823d", # orange
2: "#e0823d",
3: "#d9b343", # amber
4: "#a0b93f", # yellow-green
5: "#57ab5a", # green
6: "#3fb950", # bright green — full climb
4: "#a0b93f", # yellow-green — above functional, lint not earned
5: "#3fb950", # bright green — full climb (lint passed)
}
STATUS_MARK = {"pass": "", "fail": "", "skip": "", "error": "", "na": ""}
STATUS_MARK = {"pass": "", "fail": "", "skip": "", "error": "", "na": "", "unver": ""}
STATUS_COLOR = {
"pass": "#3fb950",
"fail": "#f85149",
"error": "#f85149",
"skip": "#8b949e",
"na": "#8b949e",
"unver": "#d29922", # amber: the rung should have run but was not verified
}
@ -79,44 +80,15 @@ def render_badge_svg(label: str, message: str, color: str) -> str:
)
# Third-segment colours for the level badge: amber = an UNINTENTIONAL skip (a rung skipped but not
# in the recipe's intentional list — likely missing coverage) capped the climb; muted = an
# INTENTIONAL skip (declared in recipe_meta.EXPECTED_NA — nothing to fix). Font-safe text labels
# (no emoji) so the SVG renders anywhere.
# Amber for UNVERIFIED rung rows in the table (a rung that should have run and wasn't checked).
GAP_COLOR = "#d29922"
EXPECT_COLOR = "#6e7681"
def level_badge_svg(level: int, cap_reason: str = "", cap_skip: str = "") -> str:
"""Per-recipe/-run LEVEL badge: 'cc-ci | level N' coloured by level (R6), with a THIRD segment
that differentiates *why* the climb stopped when a SKIP capped it (`cap_skip`):
- "unintentional" (a rung skipped but not in the recipe's intentional list): amber 'gap?'.
- "intentional" (a skip declared in recipe_meta.EXPECTED_NA): muted 'expected'.
- "" (clean cap / full climb / a real failure): no third segment (the level + card carry it).
The badge never inflates — it only annotates the cap the level already reflects."""
label, msg = "cc-ci", f"level {int(level)}"
lw, mw = _text_width(label), _text_width(msg)
third: tuple[str, str] | None = None
if cap_skip == "unintentional":
third = ("gap?", GAP_COLOR)
elif cap_skip == "intentional":
third = ("expected", EXPECT_COLOR)
if third is None:
return render_badge_svg(label, msg, level_color(level))
txt, tcolor = third
tw = _text_width(txt)
w = lw + mw + tw
return (
f'<svg xmlns="http://www.w3.org/2000/svg" width="{w}" height="20" role="img" '
f'aria-label="{html.escape(label)}: {html.escape(msg)} ({html.escape(txt)})">'
f'<rect width="{lw}" height="20" fill="#555"/>'
f'<rect x="{lw}" width="{mw}" height="20" fill="{level_color(level)}"/>'
f'<rect x="{lw + mw}" width="{tw}" height="20" fill="{tcolor}"/>'
f'<g fill="#fff" font-family="Verdana,Geneva,sans-serif" font-size="11">'
f'<text x="6" y="14">{html.escape(label)}</text>'
f'<text x="{lw + 6}" y="14">{html.escape(msg)}</text>'
f'<text x="{lw + mw + 6}" y="14">{html.escape(txt)}</text></g></svg>'
)
def level_badge_svg(level: int) -> str:
"""Per-recipe/-run LEVEL badge: 'cc-ci | level N' coloured by level — NUMBER + COLOUR ONLY
(operator-specified, phase lvl5). 'Why isn't it higher' lives in the card's per-rung table,
never on the badge."""
return render_badge_svg("cc-ci", f"level {int(level)}", level_color(level))
def _stage_rows(stages: list[dict]) -> str:
@ -141,12 +113,13 @@ def _stage_rows(stages: list[dict]) -> str:
return "\n".join(rows) or '<tr><td colspan="3">no stages</td></tr>'
# Friendly rung labels for the skip rows (the four essential rungs).
# Friendly rung labels for the skip/unverified rows (the five essential rungs).
RUNG_LABEL = {
"install": "install",
"upgrade": "upgrade",
"backup_restore": "backup/restore",
"functional": "functional",
"lint": "lint",
}
SKIP_GREEN = (
"#57ab5a" # muted green — an intentional skip reads like a pass (but labelled, never inflating)
@ -154,9 +127,10 @@ SKIP_GREEN = (
def _skip_rows(skips: dict) -> str:
"""Render SKIPPED rungs as stage-like rows. An intentional (declared) skip looks like a pass row
but its status says 'INTENTIONAL SKIP' (muted green) with the declared reason on the line below;
an unintentional skip is amber 'UNINTENTIONAL SKIP' with a prompt to add a test or declare it."""
"""Render the non-run rungs as stage-like rows (phase lvl5 semantics). An INTENTIONAL skip
(declared/structural — the rung does not apply, the climb continues past it) is muted green
with its reason on the line below; an UNVERIFIED rung (should have run, wasn't checked — the
level cannot rise above it) is amber 'unverified'."""
rows = []
for rung, reason in (skips.get("intentional") or {}).items():
rows.append(
@ -171,11 +145,11 @@ def _skip_rows(skips: dict) -> str:
rows.append(
f'<tr class="stage"><td colspan="2"><span class="mark" style="color:{GAP_COLOR}">⊘</span>'
f"<b>{html.escape(RUNG_LABEL.get(rung, rung))}</b></td>"
f'<td class="st" style="color:{GAP_COLOR}">unintentional skip</td></tr>'
f'<td class="st" style="color:{GAP_COLOR}">unverified</td></tr>'
)
rows.append(
'<tr class="skipreason"><td></td><td colspan="2">not declared in EXPECTED_NA — add the '
"missing test/label, or declare the skip with a reason</td></tr>"
'<tr class="skipreason"><td></td><td colspan="2">rung did not run / could not be '
"checked — the level cannot rise above an unverified rung</td></tr>"
)
return "\n".join(rows)
@ -184,13 +158,15 @@ def render_card_html(data: dict, screenshot_rel: str | None = "screenshot.png")
"""Build the summary-card HTML from a results.json dict. `screenshot_rel` is the relative path to
the screenshot PNG (same dir as the card) — omitted from the card if None / absent.
The card shows exactly what the data says: recipe + version, the level badge + cap reason, the
per-stage/per-test ✔/✘ table, the invariant flags, and the app screenshot. No computation here."""
The card shows exactly what the data says: recipe + version, the level, the per-stage/per-test
✔/✘ table (+ skip/unverified rung rows — the SOLE carrier of "why isn't the level higher"),
the invariant flags, and the app screenshot. No computation here. Tolerates old (schema-1)
artifacts: the ladder height is read off the rungs the artifact actually has."""
recipe = html.escape(str(data.get("recipe", "?")))
version = html.escape(str(data.get("version") or data.get("ref") or ""))
level = int(data.get("level", 0))
cap_reason = str(data.get("level_cap_reason") or "")
cap = html.escape(cap_reason)
# Old (pre-lvl5) artifacts have a 4-rung ladder — render their "of N" honestly.
ladder_top = 5 if "lint" in (data.get("rungs") or {}) else 4
sk = data.get("skips", {}) or {}
color = level_color(level)
flags = data.get("flags", {}) or {}
@ -221,7 +197,7 @@ body{{margin:0;font-family:system-ui,-apple-system,Segoe UI,sans-serif;backgroun
.lvl .num{{display:inline-block;min-width:64px;padding:.3rem .7rem;border-radius:10px;
font-size:1.6rem;font-weight:700;color:#0d1117;background:{color}}}
.lvl .lbl{{display:block;color:#8b949e;font-size:.72rem;text-transform:uppercase;margin-top:.2rem}}
.cap{{padding:.4rem 1.3rem;color:#8b949e;font-size:.82rem;border-bottom:1px solid #21262d}}
.ladder{{padding:.4rem 1.3rem;color:#8b949e;font-size:.82rem;border-bottom:1px solid #21262d}}
.body{{display:flex;gap:1rem;padding:1rem 1.3rem}}
.tbl{{flex:1}}
table{{border-collapse:collapse;width:100%;font-size:.85rem}}
@ -238,12 +214,12 @@ tr.skipreason td{{color:#8b949e;font-size:.78rem;font-style:italic;padding-top:0
.shot.noshot{{display:flex;align-items:center;justify-content:center;height:225px;color:#8b949e;font-size:.85rem}}
.flags{{display:flex;gap:.6rem;padding:.6rem 1.3rem 1rem}}
.flag{{border:1px solid;border-radius:6px;padding:.15rem .5rem;font-size:.78rem;color:#c9d1d9}}
.cap b{{color:#c9d1d9}}
.ladder b{{color:#c9d1d9}}
</style></head><body><div class="card">
<div class="hd">{FLOWER_SVG}
<div class="title"><h1>{recipe}</h1><span class="ver">{version}</span></div>
<div class="lvl"><span class="num">{level}</span><span class="lbl">level</span></div></div>
<div class="cap">{("<b>capped:</b> " + cap) if cap else "<b>full clean climb</b> — top level (4)"}</div>
<div class="ladder"><b>level {level} of {ladder_top}</b></div>
<div class="body"><div class="tbl"><table>{rows}</table></div>{shot_html}</div>
<div class="flags">{"".join(flag_bits)}</div>
</div></body></html>"""


@ -1,67 +1,67 @@
"""Phase 3 — the level ladder (plan-phase3-results-ux.md §4.1, R1).
"""The level ladder — five rungs, no capping (phase lvl5, plan-phase-lvl5-lint-rung.md).
A single integer **level** summarising how far up the quality ladder a recipe run climbed, with
YunoHost semantics: **a gap caps the level** — you only earn level L if every rung 1..L was a clean
PASS. The first rung that is not a clean PASS (a real FAIL *or* genuinely N/A for this recipe) stops
the climb; `cap_reason` records why. This is deliberately conservative: presentation must NEVER make
a run look greener than its tests (plan §6 cardinal guardrail), so an N/A rung caps just like a fail
— with a recorded reason so the level is *fair*, not inflated.
The ladder is the FOUR essential rungs every recipe is held to:
A single integer **level** summarising how far up the quality ladder a recipe run climbed:
L0 — install failed / app never became healthy.
L1 — Installs: deploys + passes health/readiness.
L2 — Upgrades: previous published version → PR version, stays healthy, data intact.
L3 — Backup/restore: seeded data survives backup → wipe → restore.
L4 — Functional: recipe-specific functional tests pass.
L5 — Lint: `abra recipe lint` passes against the exact ref under test.
Integration (SSO/OIDC + cross-app) and recipe-local (the recipe repo's own tests/) are **OPTIONAL**
capabilities — they are NOT part of the level ladder and never cap it. They still run when present
(and SSO is still enforced for the run VERDICT via the deps/SSO checks in run_recipe_ci.py), but a
recipe without an SSO surface or without repo-local tests is simply not penalised on the level.
Semantics (operator-decided 2026-06-11, recorded in DECISIONS.md — replaces the Phase-3
"N/A caps" rule):
This module is PURE (no I/O) so it is cheaply unit-testable and the Adversary can re-run the unit
test cold (`cc-ci-run -m pytest tests/unit/test_level.py -q`). The orchestrator
(`run_recipe_ci.py`) is responsible for translating its raw per-tier results into the rung-status
dict this function consumes; that mapping is documented in DECISIONS.md (Phase 3).
level = max i such that rung_i == "pass" and every rung j < i is "pass" or "skip"; 0 if none.
Rung status vocabulary (each rung ∈ these three):
"pass" the rung was exercised and passed.
"fail" the rung was exercised and failed.
"na" the rung does not apply to this recipe (e.g. only one published version → no upgrade;
not backup-capable). N/A is NOT a failure, but it DOES cap the climb (with a distinct
cap_reason) so the level never overstates what was actually verified.
A rung has one of FOUR statuses:
"pass" — exercised and passed.
"fail" — exercised and failed. Blocks: no rung above it can count.
"skip" — INTENTIONAL skip: the rung genuinely does not apply to this recipe, from a
declared or structural fact (not backup-capable; only one published version;
declared in recipe_meta.EXPECTED_NA). Does NOT stop the climb.
"unver" — UNINTENTIONAL not-verified: the rung SHOULD have run but didn't (infra error,
missing tool, harness exception, prior-stage abort, timeout). Blocks exactly
like a fail — the level never rises above a rung that wasn't actually checked.
The per-rung table (results.json `rungs`, card, dashboard) is the SOLE carrier of "why isn't
this level higher" — there is no cap_reason. The classification of every N/A source into
skip-vs-unver lives in derive_rungs (results.py) and is tabulated in DECISIONS.md; anything
unclassifiable defaults to "unver" (conservative: never claim what wasn't checked).
Integration (SSO/OIDC + cross-app) and recipe-local (the recipe repo's own tests/) remain
OPTIONAL capabilities — not rungs, never counted (SSO is still enforced for the run VERDICT
via the deps/SSO checks in run_recipe_ci.py).
This module is PURE (no I/O) so it is cheaply unit-testable and the Adversary can re-run the
unit test cold (`cc-ci-run -m pytest tests/unit/test_level.py -q`).
"""
from __future__ import annotations
# The climbable rungs in ascending order. install (L1) is the foundation; L0 means install itself
# did not pass. Each later rung requires every earlier rung to be a clean PASS. These four are the
# ESSENTIAL rungs — integration/recipe-local are optional and deliberately NOT in this tuple.
RUNGS = ("install", "upgrade", "backup_restore", "functional")
# The climbable rungs in ascending order. install (L1) is the foundation; L0 means install
# itself did not pass. These five are the ESSENTIAL rungs — integration/recipe-local are
# optional and deliberately NOT in this tuple.
RUNGS = ("install", "upgrade", "backup_restore", "functional", "lint")
# Human-readable label per rung level, for cap_reason + the summary card.
# Human-readable label per rung level, for the summary card / docs.
RUNG_LABEL = {
1: "install (deploy + health)",
2: "upgrade (prev published → PR)",
3: "backup/restore (data integrity)",
4: "functional (recipe-specific tests)",
5: "lint (abra recipe lint)",
}
VALID = {"pass", "fail", "na"}
VALID = {"pass", "fail", "skip", "unver"}
def compute_level(rungs: dict[str, str]) -> tuple[int, str]:
"""Map a rung-status dict → (level 0..4, cap_reason).
def compute_level(rungs: dict[str, str]) -> int:
"""Map a rung-status dict → level 0..5.
`rungs` must contain a status in {"pass","fail","na"} for every name in RUNGS. The level is the
highest L such that rungs[1..L] are all "pass"; the first non-"pass" rung caps the climb. L0 is
returned when the install rung itself is not "pass" (install failed / never healthy).
cap_reason explains where the climb stopped:
- "" (empty) when the recipe earned the top rung (L4, full clean climb).
- "L<k> <label> FAILED" when a rung was exercised and failed.
- "L<k> <label> N/A" when a rung does not apply to this recipe.
Returns the reason for the FIRST rung that stopped the climb (the binding constraint).
`rungs` must contain a status in VALID for every name in RUNGS. The level is the highest
i such that rungs[i] == "pass" and every rung below i is "pass" or "skip" (an intentional
skip does not stop the climb). A "fail" or "unver" rung blocks: rungs above it cannot
count, however green. 0 when no rung qualifies.
"""
for name in RUNGS:
st = rungs.get(name)
@ -69,52 +69,44 @@ def compute_level(rungs: dict[str, str]) -> tuple[int, str]:
raise ValueError(
f"rung {name!r} has invalid status {st!r} (expect one of {sorted(VALID)})"
)
# L0: install did not pass.
if rungs["install"] != "pass":
if rungs["install"] == "fail":
return 0, "L1 " + RUNG_LABEL[1] + " FAILED"
# install N/A is not a real-world state for a deploy run, but handle it for totality.
return 0, "L1 " + RUNG_LABEL[1] + " N/A"
# Climb: stop at the first rung that is not a clean pass.
level = 0
for idx, name in enumerate(RUNGS, start=1):
if rungs[name] == "pass":
st = rungs[name]
if st == "pass":
level = idx
elif st == "skip":
continue
# first non-pass rung — caps the climb
kind = "FAILED" if rungs[name] == "fail" else "N/A"
return level, f"L{idx} {RUNG_LABEL[idx]} {kind}"
# Full clean climb to the top rung.
return level, ""
else: # fail / unver — nothing above this rung can count
break
return level
def backup_restore_status(backup: str | None, restore: str | None, backup_capable: bool) -> str:
"""Collapse the backup + restore tier results into the single L3 rung status.
Both tiers must pass for the rung to pass (the rung is "seeded data survives backup→wipe→restore",
which is only verified if BOTH the backup and the restore tier are green). If the recipe is not
backup-capable, both tiers skip → the rung is N/A (caps at L2, recorded). A fail in either tier
fails the rung.
Not backup-capable (a declared/structural fact: no backupbot labels, or
recipe_meta.BACKUP_CAPABLE=False) → "skip" — the rung genuinely does not apply.
Otherwise both tiers must pass for the rung to pass; a fail in either tier fails it; any
other shape (tier skipped or never ran while backup-capable — e.g. a prior-stage abort)
is "unver": the rung should have been verified and wasn't.
"""
if not backup_capable:
return "na"
return "skip"
vals = {backup, restore}
if "fail" in vals:
return "fail"
if backup == "pass" and restore == "pass":
return "pass"
# any skip/None while backup-capable → not verified → treat as N/A (cannot claim L3)
return "na"
return "unver"
def tier_to_rung(status: str | None) -> str:
"""Map a single tier result ('pass'|'fail'|'skip'|None) to a rung status. 'skip'/None → 'na'
(the tier did not apply / did not run), so it caps the climb without being counted as a failure."""
"""Map a single tier result ('pass'|'fail'|'skip'|None) to a rung status, with NO
intentionality information: a tier that did not produce a pass/fail is "unver" (it should
have run and wasn't verified). The caller (derive_rungs) upgrades "unver" to "skip" where
a declared/structural fact makes the skip intentional — never the other way around."""
if status == "pass":
return "pass"
if status == "fail":
return "fail"
return "na"
return "unver"


@ -348,8 +348,27 @@ def services_converged(domain: str) -> bool:
# `want == "0"` rejection wrongly treated those as never-converged, hanging the deploy
# forever. `cur == want` (with `want` present) is the correct convergence test; a service
# still spinning up shows e.g. "0/1" (cur != want) and is correctly not-yet-converged.
if not want or cur != want:
if not want:
return False
if cur != want:
# A TRIGGERED one-shot (restart_policy none, scaled 0→1, runs once, exits 0) reports
# "0/1" FOREVER after its task completes — swarm never restarts it, so a bare
# `cur != want` rejection would block convergence for the rest of the run (lasuite-drive
# minio-createbuckets, rcust M2: install assert burned the full DEPLOY_TIMEOUT after the
# P2b port moved the bucket trigger BEFORE the install assert; pre-restructure the
# trigger ran after it, so converge never saw the 0/1). A replica deficit explained
# entirely by COMPLETE tasks IS converged: the one-shot did its job and will never run
# again. Anything else in the deficit (Running/Starting/Pending = still spinning up;
# Failed/Rejected = genuinely broken) stays not-converged, and a desired>0 service with
# no tasks yet is still scheduling.
tasks = subprocess.run(
["docker", "service", "ps", name, "--format", "{{.CurrentState}}"],
capture_output=True,
text=True,
)
states = [ln.split()[0] for ln in tasks.stdout.split("\n") if ln.strip()]
if not (states and all(s == "Complete" for s in states)):
return False
# N/N alone is NOT convergence during a stop-first rolling update: a chaos redeploy that changes
# a non-app service image (e.g. immich's db pin) registers the update immediately, but swarm may
# not have cycled that service's task yet — the OLD task still shows 1/1, then dies seconds later

runner/harness/lint.py

@ -0,0 +1,174 @@
"""L5 lint rung — run `abra recipe lint` against the exact ref under test (phase lvl5).
Executor + classifier for the fifth ladder rung. Design constraints (plan-phase-lvl5 §2):
- **Lints the recipe's CONTENT, not the harness plumbing.** abra lint reads every
`compose*.yml` in the tree (including the CI's untracked install_steps overlays) and
force-fetches tags from `origin` (which on PR runs is the private mirror, unauthenticated
here → FATA). Both are harness artifacts, so the executor lints a PRISTINE scratch clone of
the per-run tree, checked out at the exact tested ref: `origin` becomes a local path (tag
fetch works offline, no auth) and the run's true tag set rides along (fetch_recipe pulls the
upstream version tags into the per-run tree). No lint rule is filtered or ignored.
- **rc is not the verdict.** `abra recipe lint` exits non-zero only when it cannot lint
(FATA); rule outcomes live in its table — error-severity ❌ rows print a trailing
"WARN critical errors present …" sentinel but still exit 0. So the classifier parses the
table: FAIL iff an error-severity rule is unsatisfied (or the FATA is content-attributable:
"unable to validate recipe" — the recipe config itself is invalid). PASS iff the table
rendered and no error rule failed. ANYTHING else — timeout, abra/script missing, tag-fetch
FATA, unparseable output — is "unver": loud, never a silent pass, never an intentional skip.
- **Best-effort + time-bounded.** Hard ~60s timeout (observed runtime ≈0.7s); the caller
wraps run_lint in try/except besides — a wedged lint can never hang or fail a run, and the
run VERDICT is untouched by any lint outcome (lint is a level rung, not a gate).
- Full command output (+ cmd, rc, ref header) is captured to `lint.txt` in the run artifact
dir; results.json carries status + short excerpt (failing rule ids).
abra needs a PTY even with -n ("inappropriate ioctl on device") → run via util-linux
`script -qec`, same trick as harness.abra._run_pty.
"""
from __future__ import annotations
import os
import re
import shlex
import shutil
import subprocess
import tempfile
from . import abra
LINT_TIMEOUT = 60 # hard budget, seconds; observed ~0.7s per recipe
# Strip ANSI escape sequences from PTY output before parsing.
_ANSI = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")
# A table row: ┃ R014 ┃ description ┃ error ┃ ✅/❌ ┃ skipped ┃ how-to-fix ┃ — abra renders the
# grid with HEAVY box-drawing verticals (┃ U+2503); accept the light variant (│ U+2502) too.
_ROW = re.compile(
r"^\s*[│┃]\s*(R\d+)\s*[│┃](.*?)[│┃]\s*(warn|error)\s*[│┃]\s*(✅|❌)\s*[│┃]\s*([^│┃]*)[│┃]"
)
# abra's trailing sentinel when any error-severity rule is unsatisfied (cross-check only).
_SENTINEL = "critical errors present"
# FATA classes that are the RECIPE's fault (its config cannot even be validated) — a lint
# FAIL, not an unverified rung. Everything else non-zero is environmental → unver.
_CONTENT_FATA = "unable to validate recipe"
def parse_table(output: str) -> list[dict]:
    """Parse the lint table → rows {rule, desc, severity, satisfied(bool), skipped(bool)}.
    Tolerant: lines that don't match are ignored; returns [] when no table rendered."""
    rows = []
    for line in _ANSI.sub("", output).replace("\r", "\n").splitlines():
        m = _ROW.match(line)
        if not m:
            continue
        rule, desc, severity, mark, skipped = m.groups()
        rows.append(
            {
                "rule": rule,
                "desc": desc.strip(),
                "severity": severity,
                "satisfied": mark == "✅",
                "skipped": skipped.strip() not in ("", "-"),
            }
        )
    return rows
def classify(rc: int | None, output: str) -> tuple[str, str, list[str]]:
    """(status, detail, failed_rule_ids) from a finished lint invocation.
    status ∈ {"pass","fail","unver"}; never a silent pass: pass requires a parsed table with
    zero unsatisfied error-severity rules AND no sentinel. `rc=None` means the run itself blew
    up (timeout/missing binary) — always unver; the caller supplies the detail.
    """
    if rc is None:
        return "unver", "lint did not run", []
    if rc != 0:
        first = next((ln for ln in _ANSI.sub("", output).splitlines() if "FATA" in ln), "").strip()
        if _CONTENT_FATA in output:
            # The recipe config itself failed validation — attributable to recipe content.
            return "fail", first or "recipe config failed validation", []
        return "unver", first or f"abra recipe lint exited {rc} with no table", []
    rows = parse_table(output)
    if not rows:
        return "unver", "no lint table in output (rc=0)", []
    failed = [
        r["rule"]
        for r in rows
        if r["severity"] == "error" and not r["satisfied"] and not r["skipped"]
    ]
    if failed:
        return "fail", f"error rule(s) unsatisfied: {', '.join(failed)}", failed
    if _SENTINEL in output:
        # abra says critical errors but our parse found none — distrust the parse, never inflate.
        return "fail", "abra reported critical errors (table parse found none)", []
    return "pass", "", []
def run_lint(recipe: str, ref: str | None, out_dir: str | None) -> dict:
"""Execute the lint rung for `recipe` at exactly `ref` (a sha; None → the per-run tree's
current HEAD). Returns {"status","detail","rules_failed"} and writes lint.txt into
`out_dir` (when given). Never raises: every failure mode is caught into status "unver"."""
scratch = None
rc: int | None = None
output = ""
try:
src_tree = abra.recipe_dir(recipe)
scratch = tempfile.mkdtemp(prefix="ccci-lint-")
lint_abra = os.path.join(scratch, "abra")
os.makedirs(os.path.join(lint_abra, "recipes"))
clone = os.path.join(lint_abra, "recipes", recipe)
subprocess.run(
["git", "clone", "--quiet", src_tree, clone],
check=True,
capture_output=True,
text=True,
timeout=LINT_TIMEOUT,
)
if ref:
subprocess.run(
["git", "-C", clone, "checkout", "-f", "--quiet", ref],
check=True,
capture_output=True,
text=True,
timeout=LINT_TIMEOUT,
)
# catalogue: R006 (published catalogue version) reads it; servers: harmless, some abra
# paths stat it. Symlink the live ones (read-only use).
for shared in ("catalogue", "servers"):
src = os.path.join(abra.abra_dir(), shared)
if os.path.exists(src):
os.symlink(os.path.realpath(src), os.path.join(lint_abra, shared))
env = dict(os.environ, ABRA_DIR=lint_abra)
proc = subprocess.run(
["script", "-qec", f"abra recipe lint -n {shlex.quote(recipe)}", "/dev/null"],
capture_output=True,
text=True,
timeout=LINT_TIMEOUT,
env=env,
)
rc, output = proc.returncode, proc.stdout + proc.stderr
status, detail, failed = classify(rc, output)
except subprocess.TimeoutExpired:
status, detail, failed = "unver", f"lint timed out after {LINT_TIMEOUT}s", []
except Exception as e: # noqa: BLE001 — rung must never break the run; unver is the honest floor
status, detail, failed = "unver", f"lint executor error: {e.__class__.__name__}: {e}", []
finally:
if scratch:
shutil.rmtree(scratch, ignore_errors=True)
if status == "unver":
print(f"!! lint rung UNVERIFIED for {recipe}: {detail}", flush=True)
if out_dir:
try:
os.makedirs(out_dir, exist_ok=True)
with open(os.path.join(out_dir, "lint.txt"), "w", encoding="utf-8") as f:
f.write(
f"$ abra recipe lint -n {recipe} (ref={ref or 'HEAD'})\n"
f"rc={rc} status={status} {detail}\n\n{output}"
)
except OSError as e:
print(f" lint: could not write lint.txt (non-fatal): {e}", flush=True)
return {"status": status, "detail": detail, "rules_failed": failed}
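A minimal sketch of the row classification used by the lint rung, run against a hypothetical two-row sample (real `abra recipe lint` output differs in wording and column widths). It reuses the same row pattern and shows that only unsatisfied, non-skipped error-severity rules count as failures. `SAMPLE` and `failed_error_rules` are illustrative names, not part of the harness.

```python
import re

# Same row shape as the _ROW pattern: rule id, description, severity, mark, skipped.
ROW = re.compile(
    r"^\s*[│┃]\s*(R\d+)\s*[│┃](.*?)[│┃]\s*(warn|error)\s*[│┃]\s*(✅|❌)\s*[│┃]\s*([^│┃]*)[│┃]"
)

# Hypothetical sample; descriptions and fixes are placeholders, not abra's real text.
SAMPLE = """\
┃ R001 ┃ example warn rule  ┃ warn  ┃ ✅ ┃ - ┃ example fix ┃
┃ R014 ┃ example error rule ┃ error ┃ ❌ ┃ - ┃ example fix ┃
"""


def failed_error_rules(output: str) -> list[str]:
    """Return the ids of error-severity rules that are unsatisfied and not skipped."""
    failed = []
    for line in output.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # tolerant: non-table lines are ignored
        rule, _desc, severity, mark, skipped = m.groups()
        is_skipped = skipped.strip() not in ("", "-")
        if severity == "error" and mark == "❌" and not is_skipped:
            failed.append(rule)
    return failed


print(failed_error_rules(SAMPLE))  # ['R014']
```

An empty result alone does not mean "pass": as in `classify`, a run with no parsed rows at all has to be treated as unverified.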

@ -70,13 +70,13 @@ KEYS: tuple[Key, ...] = (
"BACKUP_CAPABLE",
"bool",
None,
"Override the backup-tier capability auto-detect (compose `backupbot.backup` labels). `False` forces N/A; `True` forces the tier on; unset = auto-detect.",
"Override the backup-tier capability auto-detect (compose `backupbot.backup` labels). `False` forces an intentional skip of the backup/restore rung; `True` forces the tier on; unset = auto-detect.",
),
Key(
"EXPECTED_NA",
"dict",
None,
"Declare an N/A rung intentional: `{rung: reason}`. The cap stands either way; only the report wording changes.",
"Declare a non-run rung an INTENTIONAL skip: `{rung: reason}` — the level climbs past it; an undeclared non-run rung is *unverified* and blocks the level above it (classification table: machine-docs/DECISIONS.md phase lvl5). Never overrides an exercised pass/fail; the `lint` rung has no escape hatch.",
),
Key(
"READY_PROBE",

@ -1,20 +1,22 @@
"""Phase 3 — structured run results + results.json (plan-phase3-results-ux.md §4.2, R1/R3).
"""Structured run results + results.json (Phase 3 §4.2 R1/R3; level semantics: phase lvl5).
Turns a run's per-tier pytest outcomes into a single `results.json` artifact carrying, per the plan:
Turns a run's per-tier pytest outcomes into a single `results.json` artifact carrying:
{ recipe, version, pr, ref, run_id, finished, stages:[{name,status,tests:[{name,status,ms}]}],
level, level_cap_reason, level_cap_rung, rungs,
level, rungs, lint:{status,detail,rules_failed},
skips:{intentional:{rung:reason}, unintentional:[rung]},
flags:{clean_teardown,no_secret_leak}, screenshot, summary_card }
`skips` splits the N/A (skipped) rungs by a simple rule: a skip is INTENTIONAL iff the recipe lists
it (with a reason) in `recipe_meta.EXPECTED_NA = {rung: reason}`; any rung skipped but not listed is
UNINTENTIONAL (a coverage gap to fill or declare). Skips still cap the level either way — the harness
never claims a rung it did not verify; this only labels *why* a skip happened.
Rung statuses (phase lvl5, operator-decided — see harness.level + DECISIONS.md): every rung is
"pass" | "fail" | "skip" (INTENTIONAL — a declared/structural fact says the rung does not apply)
| "unver" (UNINTENTIONAL — the rung should have run and wasn't verified; blocks the level like a
fail). `derive_rungs` is the single place every N/A source is classified; anything it cannot
attribute to a declared/structural fact defaults to "unver" (conservative). `skips` mirrors that
split into results.json: intentional {rung: reason} / unintentional [rung] (= the unver rungs).
The per-test breakdown comes from JUnit XML emitted by each tier's pytest invocation (`--junitxml`),
parsed here with the stdlib (no new dep). The integer **level** is computed by harness.level from a
rung-status dict derived here (`derive_rungs`) from the tier results + deps/SSO signals the
orchestrator holds; that mapping is documented in DECISIONS.md (Phase 3).
rung-status dict derived here (`derive_rungs`) from the tier results + structural signals the
orchestrator holds; the classification table is in DECISIONS.md (phase lvl5).
This module is import-pure (no side effects at import). `write_results` is the only writer; the
orchestrator calls the build/write path inside a try/except so a results failure NEVER changes the
@ -138,53 +140,90 @@ def derive_rungs(
results: dict[str, str],
*,
backup_capable: bool,
has_custom: bool,
has_upgrade_target: bool,
expected_na: dict | None = None,
lint_status: str | None = None,
) -> dict[str, str]:
"""Translate the orchestrator's tier results into the rung-status dict harness.level consumes —
the FOUR essential rungs only. Conservative by design — never reports a rung 'pass' it can't
substantiate (cardinal guardrail: presentation never inflates).
"""Translate the orchestrator's tier results + structural signals into the rung-status dict
harness.level consumes — the FIVE essential rungs. This is the SINGLE place every N/A source
is classified intentional ("skip") vs unintentional ("unver"); the table lives in DECISIONS.md
(phase lvl5). Conservative by design: never reports "pass" it can't substantiate, and any
rung that did not produce a pass/fail and has NO declared/structural reason is "unver".
L1 install : install tier pass.
L2 upgrade : upgrade tier (skip → N/A: only one published version).
L3 backup/res : backup AND restore tiers pass (N/A if not backup-capable).
L4 functional : recipe-specific functional tests pass — the custom tier. N/A if none ran.
L1 install : install tier pass. Always applies — never "skip" (non-run → unver).
L2 upgrade : upgrade tier. Tier skipped + no upgrade target (only one published
version, structural) → "skip"; declared in EXPECTED_NA → "skip";
anything else non-pass/fail (prior-stage abort, tier excluded) → "unver".
L3 backup/res : backup AND restore tiers pass. Not backup-capable (declared/structural)
→ "skip"; EXPECTED_NA → "skip"; unverified-while-capable → "unver".
L4 functional : the custom tier. No custom tests / tier skipped: EXPECTED_NA-declared
→ "skip", else "unver" (absent functional coverage is a gap, not an
intentional property of the recipe).
L5 lint : from the lint executor (harness.lint). pass/fail only — every recipe can
be linted, so there is NO intentional-skip escape hatch: a lint that
could not run (timeout, abra missing, executor error) is "unver".
Integration (SSO/OIDC) and recipe-local are OPTIONAL and intentionally NOT rungs here — they
never cap the level (SSO is still enforced for the run VERDICT in run_recipe_ci.py).
never affect the level (SSO is still enforced for the run VERDICT in run_recipe_ci.py).
"""
expected = set((expected_na or {}).keys())
rungs: dict[str, str] = {}
rungs["install"] = level_mod.tier_to_rung(results.get("install"))
rungs["upgrade"] = level_mod.tier_to_rung(results.get("upgrade"))
rungs["backup_restore"] = level_mod.backup_restore_status(
up = results.get("upgrade")
if up in ("pass", "fail"):
rungs["upgrade"] = up
elif up == "skip" and not has_upgrade_target:
# The orchestrator skipped the tier for the structural reason: nothing to upgrade from.
rungs["upgrade"] = "skip"
elif "upgrade" in expected:
rungs["upgrade"] = "skip"
else:
rungs["upgrade"] = "unver"
br = level_mod.backup_restore_status(
results.get("backup"), results.get("restore"), backup_capable
)
if br == "unver" and "backup_restore" in expected:
br = "skip"
rungs["backup_restore"] = br
custom = results.get("custom")
if not has_custom or custom == "skip" or custom is None:
rungs["functional"] = "na"
elif custom == "fail":
rungs["functional"] = "fail"
else: # custom == "pass"
rungs["functional"] = "pass"
if custom in ("pass", "fail"):
rungs["functional"] = custom
elif "functional" in expected:
rungs["functional"] = "skip"
else:
rungs["functional"] = "unver"
rungs["lint"] = lint_status if lint_status in ("pass", "fail") else "unver"
return rungs
def skips(rungs: dict[str, str], expected_na: dict | None) -> dict:
"""Split the SKIPPED (N/A) rungs into intentional vs unintentional (operator model).
# Reasons attached to STRUCTURAL intentional skips (no EXPECTED_NA declaration needed — the
# fact is read off the recipe itself).
_STRUCTURAL_REASON = {
"upgrade": "only one published version — no upgrade target",
"backup_restore": "not backup-capable (no backupbot labels / declared)",
}
A recipe lists the rungs it intentionally skips, each with a reason, in
`recipe_meta.EXPECTED_NA = {rung: reason}`. The rule is dead simple: a skipped rung is
**intentional** iff it is in that list; any rung that is skipped and NOT in the list is
**unintentional** (a coverage gap someone should either fill or declare). N/A still caps the
level either way — the harness never claims a rung it did not verify — this only labels *why* a
skip happened. Returns:
{ "intentional": {rung: reason, ...}, # skipped AND declared in EXPECTED_NA
"unintentional": [rung, ...] } # skipped but NOT declared
"""
def skips(
rungs: dict[str, str],
expected_na: dict | None,
) -> dict:
"""Mirror the rung classification into results.json's `skips` block:
{ "intentional": {rung: reason, ...}, # status "skip" — declared/structural, with why
"unintentional": [rung, ...] } # status "unver" — should have run, wasn't verified
The reason is the recipe's EXPECTED_NA declaration when present, else the structural fact
derive_rungs skipped on. Purely descriptive — the level math lives in harness.level."""
expected = {str(k): str(v) for k, v in (expected_na or {}).items()}
na = [r for r, st in rungs.items() if st == "na"]
intentional = {r: expected[r] for r in na if r in expected}
unintentional = sorted(r for r in na if r not in expected)
intentional = {
r: expected.get(r) or _STRUCTURAL_REASON.get(r, "declared intentional")
for r, st in rungs.items()
if st == "skip"
}
unintentional = sorted(r for r, st in rungs.items() if st == "unver")
return {"intentional": intentional, "unintentional": unintentional}
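The level math itself lives in harness.level, which this diff does not show. The following is a hedged sketch of the rule as the commit message states it ("max passed rung with all below pass-or-skip; fail/unver block"), using the five rung keys that `derive_rungs` produces. The function body is an assumption for illustration, not the harness's actual implementation.

```python
# Rung order L1..L5, using the keys derive_rungs emits.
RUNGS = ("install", "upgrade", "backup_restore", "functional", "lint")


def compute_level(rungs: dict[str, str]) -> int:
    """Highest passed rung such that every rung below it is pass or skip."""
    level = 0
    for height, name in enumerate(RUNGS, start=1):
        status = rungs.get(name, "unver")  # a missing rung is treated as unverified
        if status == "pass":
            level = height
        elif status == "skip":
            continue  # intentional skip: the climb continues past it
        else:
            break  # "fail" or "unver" blocks everything above
    return level


full = dict.fromkeys(RUNGS, "pass")
print(compute_level(full))                              # 5
print(compute_level({**full, "upgrade": "skip"}))       # 5
print(compute_level({**full, "functional": "unver"}))   # 3
```

Under this reading a skipped rung never raises the level on its own; it only lets a later pass count.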
@ -200,6 +239,8 @@ def build_results(
clean_teardown: bool,
no_secret_leak: bool,
finished_ts: float | None,
has_upgrade_target: bool = True,
lint: dict | None = None,
screenshot: str | None = None,
summary_card: str | None = None,
expected_na: dict | None = None,
@ -207,17 +248,41 @@ def build_results(
) -> dict:
"""Assemble the full results.json dict (no I/O). `finished_ts` is passed in (the orchestrator
stamps it) so this stays pure and deterministic for unit tests. `expected_na` is the recipe's
declared intentional-skip map (recipe_meta.EXPECTED_NA) used to distinguish a deliberate skip from
accidentally-missing coverage."""
declared intentional-skip map (recipe_meta.EXPECTED_NA); `has_upgrade_target` is the structural
"a previous published version exists" fact; `lint` is harness.lint.run_lint's result dict
(None — e.g. an old caller — derives the lint rung as "unver": never a silent pass)."""
stages = collect_stages(records)
has_custom = any(r["tier"] == "custom" for r in records)
rungs = derive_rungs(results, backup_capable=backup_capable, has_custom=has_custom)
lvl, cap_reason = level_mod.compute_level(rungs)
# The rung that capped the climb (lowest non-pass), or None on a full climb — lets a consumer
# (card/badge) tell whether the cap was an intentional skip, an unintentional one, or a failure.
capped = level_mod.RUNGS[lvl] if cap_reason else None
lint = lint or {}
lint_status = lint.get("status")
rungs = derive_rungs(
results,
backup_capable=backup_capable,
has_upgrade_target=has_upgrade_target,
expected_na=expected_na,
lint_status=lint_status,
)
# Surface lint in the per-stage table too (it has no pytest/JUnit tier), so the card's
# stage breakdown carries all five rungs.
if rungs["lint"] != "skip": # lint is never "skip", but stay defensive
stages.append(
{
"name": "lint",
"status": rungs["lint"],
"tests": [
{
"name": "abra recipe lint",
"classname": "lint",
"source": "harness",
"status": rungs["lint"],
"ms": 0,
"message": str(lint.get("detail") or ""),
}
],
}
)
lvl = level_mod.compute_level(rungs)
return {
"schema": 1,
"schema": 2,
"run_id": run_id(),
"recipe": recipe,
"version": version,
@ -225,9 +290,12 @@ def build_results(
"ref": (ref or "")[:12],
"finished": finished_ts,
"level": lvl,
"level_cap_reason": cap_reason,
"level_cap_rung": capped,
"rungs": rungs,
"lint": {
"status": rungs["lint"],
"detail": str(lint.get("detail") or ""),
"rules_failed": list(lint.get("rules_failed") or []),
},
"skips": skips(rungs, expected_na),
"stages": stages,
"results": results,

@ -18,6 +18,7 @@ missing, app slow, navigation error) is swallowed and returns None so the run/ve
from __future__ import annotations
import contextlib
import os
from . import browser as harness_browser
@ -28,6 +29,73 @@ VIEWPORT = {"width": 1280, "height": 800}
# Hard cap so a wedged app can never hang the run on the screenshot step (R7 / Phase-1 timeouts).
NAV_DEADLINE_S = 45
# ---- post-navigation settle (phase-shot fix, 2026-06-11) ----
# SPAs (immich, n8n, cryptpad, the keycloak admin console, lasuite-*, mumble-web, mattermost) fire
# `domcontentloaded` on their empty HTML shell and only paint after the JS bundle loads — snapping
# immediately produced solid blank frames (byte-stable 4801-4802 B) or loading spinners. After nav,
# wait for network-idle up to SETTLE_TIMEOUT_MS (apps that never go idle — continuous polling —
# simply spend the cap; bounded, never raises), then RENDER_GRACE_MS for the final paint.
SETTLE_TIMEOUT_MS = 10_000
RENDER_GRACE_MS = 500
# A 1280x800 PNG below this is near-certainly a solid frame or a bare loading spinner (phase-shot
# audit: blank frames were 4801-4802 B across three different apps, lone spinners 5.9-8.8 KB; the
# smallest real page was 12950 B). One bounded retry with an extra settle, then keep what we get —
# an honest late frame beats none, and the retry only ever replaces a tiny frame with a later one.
BLANK_SIZE_BYTES = 10_000
BLANK_RETRY_SETTLE_MS = 4_000
# Wait-budget arithmetic (plan-phase-shot §3 P3: step worst case ≤ ~60s): NAV_DEADLINE_S (45s,
# spent only while the app isn't serving yet) + SETTLE_TIMEOUT_MS + RENDER_GRACE_MS +
# BLANK_RETRY_SETTLE_MS + RENDER_GRACE_MS = 60s of bounded waiting; tested in unit tests.
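The wait-budget arithmetic in the comment above can be checked mechanically. This is a small sketch that copies the constant values from this hunk; `worst_case_wait_s` is a hypothetical helper name, and the real unit test may be structured differently.

```python
# Values copied from the constants defined in this hunk.
NAV_DEADLINE_S = 45
SETTLE_TIMEOUT_MS = 10_000
RENDER_GRACE_MS = 500
BLANK_RETRY_SETTLE_MS = 4_000


def worst_case_wait_s() -> float:
    """Sum of every bounded wait on the default screenshot path, in seconds."""
    first_settle_ms = SETTLE_TIMEOUT_MS + RENDER_GRACE_MS      # settle before the first snap
    retry_settle_ms = BLANK_RETRY_SETTLE_MS + RENDER_GRACE_MS  # one blank-frame retry
    return NAV_DEADLINE_S + (first_settle_ms + retry_settle_ms) / 1000


print(worst_case_wait_s())  # 60.0
```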
def _settle(page, idle_timeout_ms: int) -> None:
"""Best-effort bounded settle: network-idle up to the cap, then a short render grace.
Never raises (R7) — a timeout just means the page kept polling; we snap what's painted."""
# cosmetic path (R7): a timeout on a never-idle app is expected — the cap IS the wait
with contextlib.suppress(Exception):
page.wait_for_load_state("networkidle", timeout=idle_timeout_ms)
with contextlib.suppress(Exception):
page.wait_for_timeout(RENDER_GRACE_MS)
def settle(page, idle_timeout_ms: int = SETTLE_TIMEOUT_MS) -> None:
"""Public settle for recipe SCREENSHOT hooks: after the hook navigates to its safe view, call
this so the snap happens post-paint. Same bounded best-effort contract as the default path."""
_settle(page, idle_timeout_ms)
def _snap_with_blank_retry(page, out_path: str) -> None:
"""Screenshot the page; if the PNG is blank/spinner-sized, retry ONCE after a longer settle.
The retry is snapped to a temp path and kept only if it is >= the first frame's size — later
is usually more painted, but a page can also regress (redirect, error overlay) and a worse
frame must never overwrite a better one (adversary finding A1)."""
page.screenshot(path=out_path, full_page=False)
try:
first = os.path.getsize(out_path)
except OSError:
return
if first >= BLANK_SIZE_BYTES:
return
print(
f" screenshot: frame looks blank/loading ({first} B < {BLANK_SIZE_BYTES} B) — "
"one retry after a longer settle",
flush=True,
)
_settle(page, BLANK_RETRY_SETTLE_MS)
retry_path = out_path + ".retry"
try:
page.screenshot(path=retry_path, full_page=False)
retry = os.path.getsize(retry_path)
if retry >= first:
os.replace(retry_path, out_path)
print(f" screenshot: retry frame kept ({retry} B >= {first} B)", flush=True)
else:
os.remove(retry_path)
print(f" screenshot: retry frame discarded ({retry} B < {first} B)", flush=True)
finally:
with contextlib.suppress(OSError):
os.remove(retry_path)
def screenshot_path(run_artifact_dir: str) -> str:
"""Canonical on-disk path for a run's app screenshot (pure)."""
@ -79,7 +147,7 @@ def capture(domain: str, out_path: str, *, recipe_meta: dict | None = None) -> s
# the uniform ctx convention (rcust P3).
hook(page, meta_mod.hook_ctx(domain, recipe_meta))
if not os.path.exists(out_path):
page.screenshot(path=out_path, full_page=False)
_snap_with_blank_retry(page, out_path)
else:
# Default: landing page. Accept any rendered status (200 or an auth redirect to a
# login form) — both are credential-free and representative of "the app is up".
@ -90,7 +158,9 @@ def capture(domain: str, out_path: str, *, recipe_meta: dict | None = None) -> s
deadline_seconds=NAV_DEADLINE_S,
wait_until="domcontentloaded",
)
page.screenshot(path=out_path, full_page=False)
# SPA paint race fix (phase-shot): settle before snapping, retry a blank frame.
_settle(page, SETTLE_TIMEOUT_MS)
_snap_with_blank_retry(page, out_path)
finally:
browser.close()
if os.path.exists(out_path) and os.path.getsize(out_path) > 0:

@ -58,6 +58,9 @@ from harness import ( # noqa: E402
from harness import ( # noqa: E402
deps as deps_mod,
)
from harness import ( # noqa: E402
lint as lint_mod,
)
from harness import ( # noqa: E402
manifest as manifest_mod,
)
@ -928,6 +931,24 @@ def main() -> int:
run_artifact_dir = os.path.join(results_mod.runs_dir(), results_mod.run_id())
junit_dir = os.path.join(run_artifact_dir, "junit")
records: list[dict] = []
# L5 lint rung (phase lvl5): `abra recipe lint` against the EXACT tested ref, in a pristine
# scratch clone (harness.lint — the per-run tree is still at head_ref here, before any
# version-pinning checkout). Level rung only — NEVER the verdict: run_lint catches every
# failure mode into status "unver" (60s hard budget) and this belt-and-braces wrap makes a
# crashed executor identical to "could not verify".
lint_result = {"status": "unver", "detail": "lint executor crashed", "rules_failed": []}
try:
lint_result = lint_mod.run_lint(recipe, head_ref, run_artifact_dir)
except Exception as e: # noqa: BLE001 — lint is a rung, not a gate; never touches the verdict
print(
f"!! lint rung executor crashed (non-fatal, rung=unver): {_scrub(str(e))}", flush=True
)
print(
f"lint rung: {lint_result['status']}"
f"{' - ' + lint_result['detail'] if lint_result.get('detail') else ''}",
flush=True,
)
with contextlib.suppress(OSError):
os.makedirs(junit_dir, exist_ok=True)
@ -1253,6 +1274,8 @@ def main() -> int:
records=records,
results=results,
backup_capable=backup_cap,
has_upgrade_target=prev is not None, # structural: a previous published version exists
lint=lint_result, # L5 rung (phase lvl5)
clean_teardown=clean_teardown,
no_secret_leak=True, # narrowed below by an actual scan of the serialised artifact
screenshot=screenshot_rel, # Phase 3 U1 (R4): relative PNG name iff capture succeeded
@ -1270,17 +1293,15 @@ def main() -> int:
file=sys.stderr,
)
path = results_mod.write_results(data)
print(
f"results.json written: {path} (level={data['level']}"
f"{'' + data['level_cap_reason'] if data['level_cap_reason'] else ''})",
flush=True,
)
# Surface UNINTENTIONAL skips in the CI log (non-blocking, R7): a rung that was skipped (N/A)
# but is not in the recipe's intentional list — either add the missing coverage or declare it.
print(f"results.json written: {path} (level={data['level']} of 5)", flush=True)
# Surface UNVERIFIED rungs in the CI log (non-blocking, R7): a rung that should have run
# and wasn't verified blocks the level above it — fill the coverage, or (where a
# declared/structural reason genuinely applies) declare it in EXPECTED_NA.
for rung in data.get("skips", {}).get("unintentional", []):
print(
f"⚠ coverage: rung '{rung}' was skipped (N/A) but is not declared intentional — add "
f"the missing test/label, or list it in tests/{recipe}/recipe_meta.py "
f"⚠ coverage: rung '{rung}' is UNVERIFIED (did not run / could not be checked) — "
f"the level cannot rise above it. Add the missing test/coverage, or declare a "
f"genuine inapplicability in tests/{recipe}/recipe_meta.py "
f"EXPECTED_NA = {{'{rung}': '<why>'}}.",
flush=True,
)
@ -1302,21 +1323,10 @@ def main() -> int:
with open(html_path, "w", encoding="utf-8") as f:
f.write(card_mod.render_card_html(data, screenshot_rel=data.get("screenshot")))
png = card_mod.render_card_png(html_path, os.path.join(run_artifact_dir, "summary.png"))
capped = data.get("level_cap_rung")
sk = data.get("skips", {})
cap_skip = (
"intentional"
if capped in (sk.get("intentional") or {})
else "unintentional"
if capped in (sk.get("unintentional") or [])
else ""
)
# Badge = level only (number + colour) — the per-rung table on the card is the sole
# carrier of "why isn't this higher" (operator-specified, phase lvl5).
with open(os.path.join(run_artifact_dir, "badge.svg"), "w", encoding="utf-8") as f:
f.write(
card_mod.level_badge_svg(
data["level"], data.get("level_cap_reason", ""), cap_skip
)
)
f.write(card_mod.level_badge_svg(data["level"]))
print(
f"summary card {'rendered ' + png if png else '(PNG render unavailable)'} + "
f"badge.svg written into {run_artifact_dir}",

@ -19,7 +19,12 @@ def pre_install(ctx):
NOT create the MinIO bucket: `minio-createbuckets` is a `replicas:0` one-shot (restart_policy:
none) that must be triggered. The MinIO storage test asserts the bucket exists, so trigger it
here and poll. `--detach` is REQUIRED: the job creates the bucket then EXITS 0, so it never
holds a steady 1/1 replica — a blocking scale would wait forever."""
holds a steady 1/1 replica — a blocking scale would wait forever.
BEST-EFFORT, like the setup_custom_tests.sh it replaced: on poll timeout we WARN and continue
(the one-shot often lands just after the window). The custom-tier MinIO storage test is the
real gate for a genuinely missing bucket — failing the install op here was an rcust M2
regression (the original hook fell through on timeout by design)."""
stack = ctx.domain.replace(".", "_")
print(" pre_install: creating MinIO bucket via the minio-createbuckets one-shot", flush=True)
subprocess.run(
@ -51,7 +56,12 @@ def pre_install(ctx):
)
return
time.sleep(3)
raise AssertionError("minio-createbuckets one-shot did not create drive-media-storage in 90s")
print(
" !! pre_install: minio-createbuckets one-shot did not create drive-media-storage in 90s "
"— continuing (best-effort, as the pre-restructure hook did); the custom-tier MinIO test "
"gates a genuinely missing bucket",
flush=True,
)
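The best-effort pattern described in the docstring (poll, then warn and continue instead of raising) can be sketched generically. `poll_best_effort` and its parameters are hypothetical names for illustration; the hook above inlines the loop with its own bucket check.

```python
import time
from typing import Callable


def poll_best_effort(
    check: Callable[[], bool],
    timeout_s: float,
    interval_s: float,
    *,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or the window closes.

    On timeout this warns and returns False rather than raising, so a later,
    stricter test remains the real gate.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if check():
            return True
        sleep(interval_s)
    print(f"  !! condition not met in {timeout_s}s; continuing (best-effort)", flush=True)
    return False


# Usage: a check that is already true returns immediately, without sleeping.
print(poll_best_effort(lambda: True, timeout_s=1, interval_s=0))  # True
```

Injecting `clock` and `sleep` keeps the helper testable without real waiting.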
def _wait_collabora_ready(domain, timeout=420):

@ -18,3 +18,31 @@ HEALTH_OK = (200, 302)
DEPLOY_TIMEOUT = 900
HTTP_TIMEOUT = 600
EXTRA_ENV = {"TIMEOUT": "600"}
def SCREENSHOT(page, ctx):
"""Land the real sign-in form for the CI card (phase-shot). Mattermost serves a
"view in desktop app or browser?" interstitial on a browser's FIRST visit to ANY route
(including /login — proven by shot-proof2-mattermost-lts: byte-identical interstitial PNG with
and without a plain /login hook); a real user clicks "View in Browser" to reach the login
form, so the hook does exactly that. Click + second settle are best-effort (if the
interstitial is absent we are already on the form). Credential-free (empty fields, R7
secret-safety); the harness snaps the PNG after this returns. Waits are kept short (8s/3s/8s)
so the realistic hook path stays well inside the ~60s step budget — the 45s nav deadline is
only burned when the app never serves, and then the hook raises before any settle."""
import contextlib
from harness import browser as harness_browser
from harness import screenshot as screenshot_mod
harness_browser.goto_with_retry(
page,
f"{ctx.base_url}/login",
accept_statuses=(200,),
deadline_seconds=screenshot_mod.NAV_DEADLINE_S,
wait_until="domcontentloaded",
)
screenshot_mod.settle(page, 8_000)
with contextlib.suppress(Exception):
page.click("text=View in Browser", timeout=3_000)
screenshot_mod.settle(page, 8_000)

@ -12,8 +12,10 @@ from harness import http as harness_http # noqa: E402
def test_plausible_root_serves(live_app):
"""GET /api/health → 200 (clickhouse+postgres ready).
`/` itself 500s via auth_controller under DISABLE_AUTH, so it is NOT a
reliable health probe; the dedicated /api/health endpoint is.
`/` is NOT a reliable health probe (500s during datastore init; 302s to
/register once ready — and 500'd permanently under the pre-2026-06-11
62-char SECRET_KEY_BASE, see recipe_meta.EXTRA_ENV); the dedicated
/api/health endpoint is.
"""
url = f"https://{live_app}/api/health"
status, _ = harness_http.retry_http_get(url, expect_status=(200,), max_wait=60, interval=3)

@ -7,9 +7,10 @@ HEALTH_OK = (200,)
# `events_db` but the service is named `plausible_events_db`, so swarm applies no ordering) and returns
# 500 until clickhouse + DB migrations finish — several minutes on a cold deploy. The dedicated
# /api/health endpoint returns 200 with {"clickhouse":"ok","postgres":"ok","sites_cache":"ok"} only
# once both datastores are ready, so it is a true readiness probe; `/` is unreliable (500s during init,
# 302s once ready, so it cannot distinguish "not ready" from "ready"). Give a wide HTTP window so the
# health poll waits out that init. [v1 failed at HTTP_TIMEOUT=600 polling `/`.]
# once both datastores are ready, so it is a true readiness probe; `/` is unreliable (500s during init;
# 302s to /register once ready — and with the pre-2026-06-11 62-char SECRET_KEY_BASE every HTML render
# 500'd permanently, see EXTRA_ENV). Give a wide HTTP window so the health poll waits out that init.
# [v1 failed at HTTP_TIMEOUT=600 polling `/`.]
DEPLOY_TIMEOUT = 1200
HTTP_TIMEOUT = 1200
@ -17,8 +18,12 @@ HTTP_TIMEOUT = 1200
EXTRA_ENV = {
"DISABLE_AUTH": "true",
"DISABLE_REGISTRATION": "true",
# 64-char stable value for CI — plausible (Phoenix) requires >= 64 chars
"SECRET_KEY_BASE": "ccciplausibletestkeybase64charsexactlyforCIephemeral4567890123",
# Stable CI value, 68 chars — Phoenix's cookie session store requires >= 64 BYTES and raises
# `ArgumentError ... at least 64 bytes` → HTTP 500 on EVERY page render (HTML routes only;
# /api/* never touches the cookie store, so health + event tests were unaffected) if it is
# shorter. The previous value was 62 chars, which is why every page (and the app screenshot)
# 500'd while the API tiers all passed (phase-shot root cause, 2026-06-11).
"SECRET_KEY_BASE": "ccciplausibletestkeybase64charsexactlyforCIephemeralrun4567890123456",
}
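A quick check of the length claim in the comment, comparing the previous and current CI values against the 64-byte minimum. `MIN_SECRET_BYTES` and `long_enough` are illustrative names; the threshold is the one the comment attributes to Phoenix's cookie session store.

```python
# The two CI values quoted in this hunk (previous and current).
OLD_KEY = "ccciplausibletestkeybase64charsexactlyforCIephemeral4567890123"
NEW_KEY = "ccciplausibletestkeybase64charsexactlyforCIephemeralrun4567890123456"

MIN_SECRET_BYTES = 64  # minimum stated in the comment above


def long_enough(secret: str) -> bool:
    """True when the secret meets the minimum length, measured in bytes."""
    return len(secret.encode("utf-8")) >= MIN_SECRET_BYTES


print(len(OLD_KEY), long_enough(OLD_KEY))  # 62 False
print(len(NEW_KEY), long_enough(NEW_KEY))  # 68 True
```

A guard like this in a unit test would catch a too-short key before it surfaces as HTTP 500 on HTML routes only.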
# The upgrade tier defaults its base to recipe_versions[-2]. For the 3.1.0 upgrade PR the

@ -1,8 +1,11 @@
"""Unit tests for the pure card/badge renderers (harness.card), Phase 3 U2 (R3/R6).
"""Unit tests for the pure card/badge renderers (harness.card) — phase lvl5 semantics.
Covers the deterministic HTML + SVG string builders (the PNG step needs Playwright + is exercised in
the U2 live demo). The cardinal check: the card REPORTS the data verbatim — level/marks come straight
from the dict, never recomputed. Run cold: cc-ci-run -m pytest tests/unit/test_card.py -q
Covers the deterministic HTML + SVG string builders (the PNG step needs Playwright + is exercised
live). The cardinal check: the card REPORTS the data verbatim — level/marks come straight from the
dict, never recomputed — the badge is NUMBER + COLOUR ONLY, and the per-rung table rows (incl.
intentional-skip / unverified) are the sole carrier of "why isn't the level higher". Old schema-1
artifacts (4-rung ladder, cap fields present) must render without error and without relabeling.
Run cold: cc-ci-run -m pytest tests/unit/test_card.py -q
"""
from __future__ import annotations
@ -14,12 +17,19 @@ sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner")
from harness import card as C # noqa: E402
def _data(level=3, cap="L4 functional (recipe-specific tests) N/A"):
return {
def _data(level=5, **kw):
d = {
"schema": 2,
"recipe": "uptime-kuma",
"version": "1.23.0",
"level": level,
"level_cap_reason": cap,
"rungs": {
"install": "pass",
"upgrade": "pass",
"backup_restore": "pass",
"functional": "pass",
"lint": "pass",
},
"flags": {"clean_teardown": True, "no_secret_leak": True},
"screenshot": "screenshot.png",
"stages": [
@ -36,46 +46,54 @@ def _data(level=3, cap="L4 functional (recipe-specific tests) N/A"):
{"name": "test_broken", "status": "fail", "ms": 5},
],
},
{
"name": "lint",
"status": "pass",
"tests": [{"name": "abra recipe lint", "status": "pass", "ms": 0}],
},
],
}
d.update(kw)
return d
def test_level_color_ramp():
assert C.level_color(0) != C.level_color(6)
assert C.level_color(6) == "#3fb950"
assert C.level_color(99) == "#8b949e" # unknown → grey
# 0 (red) … 5 (bright green — full 5-rung climb); unknown → grey.
assert C.level_color(0) != C.level_color(5)
assert C.level_color(5) == "#3fb950"
assert C.level_color(99) == "#8b949e"
def test_badge_svg_wellformed():
def test_badge_svg_is_number_and_color_only():
svg = C.level_badge_svg(4)
assert svg.startswith("<svg") and svg.endswith("</svg>")
assert "level 4" in svg
assert C.level_color(4) in svg
# plain cap (no intent) → two-box badge, no third segment
assert "expected" not in svg and "gap?" not in svg
# operator-specified (phase lvl5): NOTHING but the level on the badge, with no annotation
# segment of any kind, whatever the rung situation.
assert "expected" not in svg and "gap?" not in svg and "skip" not in svg
def test_badge_svg_differentiates_intentional_vs_unintentional_skip():
# an intentional (declared) skip capped the climb → muted "expected" third segment
exp = C.level_badge_svg(2, "L3 backup/restore N/A", "intentional")
assert "level 2" in exp and "expected" in exp and C.EXPECT_COLOR in exp
assert "gap?" not in exp
# an unintentional skip (not declared) → amber "gap?" third segment
gap = C.level_badge_svg(2, "L3 backup/restore N/A", "unintentional")
assert "level 2" in gap and "gap?" in gap and C.GAP_COLOR in gap
assert "expected" not in gap
def test_badge_svg_level5():
svg = C.level_badge_svg(5)
assert "level 5" in svg and "#3fb950" in svg
def test_skip_rows_intentional_and_unintentional():
def test_skip_rows_intentional_and_unverified():
html_out = C._skip_rows(
{"intentional": {"backup_restore": "no persistent data"}, "unintentional": ["functional"]}
)
# intentional skip: labelled row (muted green) + the reason on its own line
assert "intentional skip" in html_out and C.SKIP_GREEN in html_out
assert "backup/restore" in html_out and "no persistent data" in html_out
# unintentional skip: amber row + prompt to declare/add coverage
assert "unintentional skip" in html_out and C.GAP_COLOR in html_out
assert "functional" in html_out and "EXPECTED_NA" in html_out
# unverified rung: amber row + the blocks-the-level explanation
assert "unverified" in html_out and C.GAP_COLOR in html_out
assert "functional" in html_out and "cannot rise above" in html_out
def test_skip_rows_lint_label_known():
html_out = C._skip_rows({"intentional": {}, "unintentional": ["lint"]})
assert ">lint<" in html_out.replace("</b>", "<") # rung label renders, not a KeyError
def test_skip_rows_empty_when_no_skips():
@ -83,22 +101,68 @@ def test_skip_rows_empty_when_no_skips():
def test_card_html_reports_level_verbatim():
html = C.render_card_html(_data(level=2, cap="L3 backup/restore (data integrity) N/A"))
html = C.render_card_html(_data(level=2))
assert "uptime-kuma" in html
assert "1.23.0" in html
# the level shown is exactly what was passed (no recompute)
assert ">2<" in html
assert "L3 backup/restore" in html
assert "level 2 of 5" in html
assert C.level_color(2) in html
def test_card_html_shows_stage_and_test_marks():
def test_card_html_no_cap_language():
html = C.render_card_html(_data())
assert "capped" not in html and "cap_reason" not in html
assert "level 5 of 5" in html
def test_card_html_old_schema1_artifact_renders():
# history compatibility: a pre-lvl5 results.json (4-rung ladder, cap fields, "na" statuses)
# renders without KeyError and shows ITS OWN ladder height (no retroactive relabeling).
old = {
"schema": 1,
"recipe": "legacy",
"version": "0.9",
"level": 4,
"level_cap_reason": "",
"level_cap_rung": None,
"rungs": {
"install": "pass",
"upgrade": "pass",
"backup_restore": "pass",
"functional": "pass",
},
"skips": {"intentional": {}, "unintentional": []},
"flags": {"clean_teardown": True, "no_secret_leak": True},
"screenshot": None,
"stages": [],
}
html = C.render_card_html(old)
assert "legacy" in html
assert "level 4 of 4" in html # the old top, not 5
assert "capped" not in html
def test_card_html_shows_stage_and_test_marks_incl_lint():
html = C.render_card_html(_data())
assert "install" in html and "custom" in html
assert "abra recipe lint" in html
assert "test_serving" in html and "test_broken" in html
assert C.STATUS_MARK["pass"] in html and C.STATUS_MARK["fail"] in html
def test_card_html_unver_stage_mark_renders():
d = _data()
d["stages"][2] = {
"name": "lint",
"status": "unver",
"tests": [{"name": "abra recipe lint", "status": "unver", "ms": 0, "message": "timed out"}],
}
html = C.render_card_html(d)
assert C.STATUS_MARK["unver"] in html
assert C.STATUS_COLOR["unver"] in html
def test_card_html_flags_rendered():
html = C.render_card_html(_data())
assert "clean teardown" in html and "no secret leak" in html


@ -0,0 +1,96 @@
"""Unit tests for lifecycle.services_converged's completed-one-shot rule (rcust M2 fix-forward).
A TRIGGERED one-shot service (restart_policy none, scaled 0→1, runs once, exits 0) reports "0/1"
forever after its task completes — swarm never restarts it. A bare `cur != want` rejection then
blocks convergence for the REST OF THE RUN (lasuite-drive minio-createbuckets: the P2b port moved
the bucket trigger BEFORE the install assert, so the assert burned the full DEPLOY_TIMEOUT —
pre-restructure the trigger ran after the assert and converge never saw the 0/1).
Pins (the Adversary's non-vacuity criteria):
- deficit explained ENTIRELY by Complete tasks → converged (the one-shot did its job).
- deficit with a Failed task → NOT converged (a broken one-shot must not pass).
- deficit with a Running/Preparing task → NOT converged (still spinning up; no early green).
- deficit with NO tasks yet → NOT converged (still scheduling).
- plain N/N services still converge; plain 0/1-spinning-up still doesn't (regression guards).
"""
from __future__ import annotations
import os
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
from harness import lifecycle as lc # noqa: E402
class _R:
def __init__(self, stdout="", stderr="", returncode=0):
self.stdout, self.stderr, self.returncode = stdout, stderr, returncode
def _patch_docker(monkeypatch, replicas_rows, task_states_by_service=None, update_state=""):
"""Fake subprocess.run for the three docker calls services_converged makes."""
task_states_by_service = task_states_by_service or {}
def fake_run(args, **kw):
if args[:3] == ["docker", "stack", "services"]:
return _R(stdout="\n".join(replicas_rows) + "\n")
if args[:3] == ["docker", "service", "ps"]:
name = args[3]
return _R(stdout="\n".join(task_states_by_service.get(name, [])) + "\n")
if args[:3] == ["docker", "service", "inspect"]:
return _R(stdout=update_state + "\n")
raise AssertionError(f"unexpected docker call: {args}")
monkeypatch.setattr(lc.subprocess, "run", fake_run)
def test_completed_oneshot_deficit_is_converged(monkeypatch):
_patch_docker(
monkeypatch,
["stack_app 1/1", "stack_minio-createbuckets 0/1"],
{"stack_minio-createbuckets": ["Complete 28 minutes ago"]},
)
assert lc.services_converged("app.example.com") is True
def test_failed_oneshot_deficit_is_not_converged(monkeypatch):
_patch_docker(
monkeypatch,
["stack_app 1/1", "stack_minio-createbuckets 0/1"],
{"stack_minio-createbuckets": ["Failed 2 minutes ago"]},
)
assert lc.services_converged("app.example.com") is False
def test_mixed_complete_and_failed_tasks_not_converged(monkeypatch):
_patch_docker(
monkeypatch,
["stack_oneshot 0/1"],
{"stack_oneshot": ["Complete 5 minutes ago", "Failed 6 minutes ago"]},
)
assert lc.services_converged("app.example.com") is False
def test_still_spinning_up_not_converged(monkeypatch):
_patch_docker(
monkeypatch,
["stack_app 0/1"],
{"stack_app": ["Preparing 10 seconds ago"]},
)
assert lc.services_converged("app.example.com") is False
def test_deficit_with_no_tasks_yet_not_converged(monkeypatch):
_patch_docker(monkeypatch, ["stack_app 0/1"], {"stack_app": []})
assert lc.services_converged("app.example.com") is False
def test_all_full_replicas_still_converged(monkeypatch):
_patch_docker(monkeypatch, ["stack_app 1/1", "stack_db 1/1"])
assert lc.services_converged("app.example.com") is True
def test_on_demand_zero_zero_oneshot_still_converged(monkeypatch):
_patch_docker(monkeypatch, ["stack_app 1/1", "stack_minio-createbuckets 0/0"])
assert lc.services_converged("app.example.com") is True
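
The completed-one-shot rule pinned by the tests above can be sketched as a pure function. This is a minimal illustration, not the code in harness/lifecycle.py: the names `converged` and `deficit_is_completed_oneshot` are hypothetical, the inputs are assumed to be already-parsed `NAME CUR/WANT` rows and per-service task-state strings, and the update-state check the real function also makes is left out.

```python
from __future__ import annotations


def deficit_is_completed_oneshot(task_states: list[str]) -> bool:
    """True only if the service has at least one task and every task is Complete."""
    # Task-state strings look like "Complete 28 minutes ago"; the first word is the state.
    states = [s.split()[0] for s in task_states if s.strip()]
    return bool(states) and all(state == "Complete" for state in states)


def converged(replica_rows: list[str], task_states_by_service: dict[str, list[str]]) -> bool:
    """Hypothetical stand-in for the convergence decision exercised by the tests."""
    for row in replica_rows:
        name, replicas = row.split()
        cur, want = (int(n) for n in replicas.split("/"))
        if cur == want:
            continue  # plain N/N, including an on-demand 0/0 one-shot
        # A deficit is acceptable only when Complete tasks explain all of it.
        if not deficit_is_completed_oneshot(task_states_by_service.get(name, [])):
            return False
    return True
```

Requiring at least one task keeps the "no tasks yet" case from going green early, and requiring every task to be Complete keeps a mixed Complete/Failed history from passing.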

View File

@ -28,7 +28,6 @@ def _row(**kw):
"ref": "db9a9502",
"version": "db9a95024e9d",
"level": 4,
"level_cap_reason": "",
"has_screenshot": True,
"flags": {"clean_teardown": True, "no_secret_leak": True},
"finished": 0,
@ -40,7 +39,7 @@ def _row(**kw):
def test_level_color_ramp_and_fallback():
assert dashboard.level_color(0) == "#e5534b"
assert dashboard.level_color(6) == "#3fb950"
assert dashboard.level_color(5) == "#3fb950" # full 5-rung climb (phase lvl5)
assert dashboard.level_color(4) == "#a0b93f"
assert dashboard.level_color(99) == "#8b949e"
assert dashboard.level_color(None) == "#8b949e"
@ -61,20 +60,12 @@ def test_overview_grid_mirrors_results():
def test_overview_never_greener_than_data():
# A failed run at level 0 must show level 0 + the failure pill — never a green/high level.
out = dashboard.render_overview(
[
_row(
status="failure",
level=0,
has_screenshot=False,
flags={},
level_cap_reason="L1 install FAILED",
)
]
[_row(status="failure", level=0, has_screenshot=False, flags={})]
)
assert "level 0" in out
assert dashboard.level_color(0) in out # red
assert dashboard._COLORS["failure"] in out
assert "level 4" not in out and "level 5" not in out and "level 6" not in out
assert "level 4" not in out and "level 5" not in out
assert "no screenshot" in out # placeholder, no broken image
@ -104,7 +95,6 @@ def test_build_row_projects_results(monkeypatch):
lambda n: {
"version": "1.2.3",
"level": 2,
"level_cap_reason": "cap",
"screenshot": "screenshot.png",
"flags": {"clean_teardown": True},
},
@ -123,6 +113,38 @@ def test_build_row_projects_results(monkeypatch):
assert r["url"].endswith("/cc-ci/7")
def test_build_row_old_schema1_artifact_renders(monkeypatch):
# History compatibility (phase lvl5): pre-lvl5 results.json still carries cap fields and a
# 4-rung ladder — it must project + render without KeyError, level shown VERBATIM (no
# retroactive relabeling), and the old cap text simply isn't resurfaced anywhere.
monkeypatch.setattr(
dashboard,
"_results_for",
lambda n: {
"schema": 1,
"version": "0.9.1",
"level": 2,
"level_cap_reason": "L3 backup/restore (data integrity) N/A",
"level_cap_rung": "backup_restore",
"screenshot": "screenshot.png",
"flags": {"clean_teardown": True, "no_secret_leak": True},
},
)
b = {
"number": 11,
"status": "success",
"event": "custom",
"params": {"RECIPE": "legacy", "REF": "abc123"},
"finished": 5,
}
r = dashboard._build_row(b)
out = dashboard.render_overview([r])
assert "level 2" in out and dashboard.level_color(2) in out
assert "N/A" not in out and "capped" not in out # cap language gone from the surface
hist = dashboard.render_history("legacy", [r])
assert "L2" in hist
def test_build_row_degrades_without_results(monkeypatch):
# No results.json (e.g. an old run): grid still renders from Drone fields, level absent.
monkeypatch.setattr(dashboard, "_results_for", lambda n: {})

View File

@ -1,8 +1,14 @@
"""Unit tests for the Phase-3 level ladder (harness.level), plan-phase3-results-ux.md §4.1 / R1.
"""Unit tests for the level ladder (harness.level) — phase lvl5 semantics.
Pure function — no I/O. Proves the YunoHost gap-caps-the-level semantics, including the U0 gate
acceptance: a recipe that climbs through L4 reports 4, and one that fails at L2 is capped at 1
(the level just below the failed rung). Run cold with: cc-ci-run -m pytest tests/unit/test_level.py -q
Pure function — no I/O. Proves the operator-decided rule (plan-phase-lvl5-lint-rung.md,
DECISIONS.md phase lvl5):
level = max i such that rung_i == "pass" and every rung j < i is "pass" or "skip"
— a real FAIL blocks, an UNVERIFIED rung blocks exactly like a fail, an INTENTIONAL skip is
climbed past. Includes the mission's four worked examples verbatim, and the old N/A cases
(single-published-version recipe, non-backup-capable recipe) now climbing past their former
caps. Run cold with: cc-ci-run -m pytest tests/unit/test_level.py -q
"""
from __future__ import annotations
@ -19,69 +25,115 @@ def _rungs(
upgrade="pass",
backup_restore="pass",
functional="pass",
lint="pass",
):
return {
"install": install,
"upgrade": upgrade,
"backup_restore": backup_restore,
"functional": functional,
"lint": lint,
}
# ---- the ladder: four essential rungs, top is L4 (functional) ----
# ---- the ladder: five essential rungs, top is L5 (lint) ----
def test_full_clean_climb_to_L4():
# All four essential rungs pass → L4 (the top; integration/recipe-local are optional, not leveled).
lvl, reason = L.compute_level(_rungs())
assert lvl == 4
assert reason == ""
def test_full_clean_climb_is_L5():
assert L.compute_level(_rungs()) == 5
def test_fails_at_L2_capped_at_L1():
# GATE: upgrade fails → capped at L1 even though higher rungs would pass.
lvl, reason = L.compute_level(_rungs(upgrade="fail", backup_restore="pass", functional="pass"))
assert lvl == 1
assert "L2" in reason and "FAILED" in reason
def test_ladder_is_five_rungs_lint_on_top():
assert L.RUNGS == ("install", "upgrade", "backup_restore", "functional", "lint")
assert "lint" in L.RUNG_LABEL[5]
# ---- L0 / install ----
# ---- mission worked examples (operator Q&A 2026-06-11, verbatim) ----
def test_mission_example_fail_blocks():
# install ✔, upgrade ✘, backup ✔, functional ✔, lint ✔ → level 1 (fail blocks).
assert L.compute_level(_rungs(upgrade="fail")) == 1
def test_mission_example_intentional_skip_climbs():
# install ✔, upgrade ✔, backup skip (not capable), functional ✔, lint ✔ → level 5
# (previously capped at 2 — the confusing part the operator removed).
assert L.compute_level(_rungs(backup_restore="skip")) == 5
def test_mission_example_unverified_blocks():
# install ✔, upgrade ✔, backup UNVER (harness error), functional ✔, lint ✔ → level 2
# (we cannot claim what we didn't check).
assert L.compute_level(_rungs(backup_restore="unver")) == 2
def test_mission_example_unverified_top_rung_not_earned():
# all four ✔, lint unver (abra missing) → level 4.
assert L.compute_level(_rungs(lint="unver")) == 4
# ---- blocking semantics ----
def test_install_fail_is_L0():
lvl, reason = L.compute_level(_rungs(install="fail"))
assert lvl == 0
assert "L1" in reason and "FAILED" in reason
assert L.compute_level(_rungs(install="fail")) == 0
# ---- gap-caps semantics: a higher pass can't rescue a lower gap ----
def test_install_unver_is_L0():
assert L.compute_level(_rungs(install="unver")) == 0
def test_higher_pass_does_not_rescue_lower_na():
# backup/restore N/A (stateless app) caps at L2 even though functional would pass.
lvl, reason = L.compute_level(_rungs(backup_restore="na", functional="pass"))
assert lvl == 2
assert "L3" in reason and "N/A" in reason
def test_higher_pass_never_rescues_a_fail():
# everything above a failed rung is dead, however green.
assert L.compute_level(_rungs(upgrade="fail", backup_restore="pass", functional="pass")) == 1
def test_upgrade_na_caps_at_L1():
# only one published version → no upgrade possible → N/A caps at L1 (upgrade is essential).
lvl, reason = L.compute_level(_rungs(upgrade="na"))
assert lvl == 1
assert "L2" in reason and "N/A" in reason
def test_lint_fail_blocks_at_4():
assert L.compute_level(_rungs(lint="fail")) == 4
def test_functional_na_caps_at_L3():
# no recipe-specific functional tests → functional N/A caps at L3.
lvl, reason = L.compute_level(_rungs(functional="na"))
assert lvl == 3
assert "L4" in reason and "N/A" in reason
def test_unver_blocks_even_after_a_skip():
# skip at L2 is climbed past, but the unver at L3 still blocks → level 1.
assert L.compute_level(_rungs(upgrade="skip", backup_restore="unver")) == 1
def test_functional_fail_caps_at_L3():
lvl, reason = L.compute_level(_rungs(functional="fail"))
assert lvl == 3
assert "L4" in reason and "FAILED" in reason
# ---- intentional-skip climbing (the de-cap) ----
def test_single_version_recipe_climbs_past_upgrade_skip():
# old rule: upgrade N/A capped at L1. New rule: skip is climbed past → full climb 5.
assert L.compute_level(_rungs(upgrade="skip")) == 5
def test_stateless_recipe_climbs_past_backup_skip_to_lint():
assert L.compute_level(_rungs(upgrade="skip", backup_restore="skip")) == 5
def test_skip_does_not_count_as_pass():
# ALL skips → nothing passed → level 0 (a skip climbs, but never earns).
assert (
L.compute_level(
_rungs(
install="skip",
upgrade="skip",
backup_restore="skip",
functional="skip",
lint="skip",
)
)
== 0
)
def test_skip_then_pass_earns_the_higher_rung():
# skip at L4, pass at L5 → level 5 (the skip below doesn't stop the climb).
assert L.compute_level(_rungs(functional="skip")) == 5
def test_trailing_skip_keeps_last_pass():
# passes up to L3, skips above → level stays 3 (skips never raise).
assert L.compute_level(_rungs(functional="skip", lint="skip")) == 3
# ---- input validation ----
@ -89,7 +141,7 @@ def test_functional_fail_caps_at_L3():
def test_invalid_status_raises():
bad = _rungs()
bad["functional"] = "passed" # not in the vocabulary
bad["functional"] = "na" # the OLD vocabulary is no longer valid — every N/A is classified
try:
L.compute_level(bad)
except ValueError:
@ -97,6 +149,16 @@ def test_invalid_status_raises():
raise AssertionError("expected ValueError on invalid rung status")
def test_missing_rung_raises():
bad = _rungs()
del bad["lint"]
try:
L.compute_level(bad)
except ValueError:
return
raise AssertionError("expected ValueError on missing rung")
# ---- helpers: backup_restore_status ----
@ -104,8 +166,8 @@ def test_backup_restore_status_pass():
assert L.backup_restore_status("pass", "pass", True) == "pass"
def test_backup_restore_status_not_capable_is_na():
assert L.backup_restore_status("skip", "skip", False) == "na"
def test_backup_restore_status_not_capable_is_intentional_skip():
assert L.backup_restore_status("skip", "skip", False) == "skip"
def test_backup_restore_status_fail_on_either():
@ -113,16 +175,20 @@ def test_backup_restore_status_fail_on_either():
assert L.backup_restore_status("fail", "pass", True) == "fail"
def test_backup_restore_partial_is_na():
# backup-capable but restore didn't run cleanly (not pass, not fail) → cannot claim L3
assert L.backup_restore_status("pass", "skip", True) == "na"
def test_backup_restore_partial_is_unverified():
# backup-capable but restore didn't run cleanly (not pass, not fail) → cannot claim L3,
# and the non-run is NOT intentional → unver (blocks the level above it).
assert L.backup_restore_status("pass", "skip", True) == "unver"
assert L.backup_restore_status(None, None, True) == "unver"
# ---- helpers: tier_to_rung ----
def test_tier_to_rung_mapping():
def test_tier_to_rung_mapping_defaults_unverified():
assert L.tier_to_rung("pass") == "pass"
assert L.tier_to_rung("fail") == "fail"
assert L.tier_to_rung("skip") == "na"
assert L.tier_to_rung(None) == "na"
# no intentionality information here — a non-run is unver; derive_rungs upgrades to "skip"
# only on a declared/structural fact, never the other way.
assert L.tier_to_rung("skip") == "unver"
assert L.tier_to_rung(None) == "unver"
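
For reference, the rule stated in the docstring above (level = highest passed rung with every rung below it pass-or-skip) can be sketched as follows. This is an illustrative reimplementation under the stated rule, not the code in harness/level.py; `sketch_compute_level` is a hypothetical name, and the rung order and status vocabulary are taken from the tests.

```python
from __future__ import annotations

RUNGS = ("install", "upgrade", "backup_restore", "functional", "lint")
STATUSES = {"pass", "fail", "skip", "unver"}


def sketch_compute_level(rungs: dict[str, str]) -> int:
    """Walk the ladder bottom-up; a fail or unver stops the climb, a skip is stepped over."""
    # Validate everything first so a bad value above a blocking rung is still rejected.
    for name in RUNGS:
        if name not in rungs:
            raise ValueError(f"missing rung: {name}")
        if rungs[name] not in STATUSES:
            raise ValueError(f"invalid status for {name}: {rungs[name]!r}")

    level = 0
    for i, name in enumerate(RUNGS, start=1):
        status = rungs[name]
        if status == "pass":
            level = i  # earned: every rung below was pass or skip
        elif status == "skip":
            continue  # climbed past, but never earns a level by itself
        else:
            break  # fail and unver block identically
    return level
```

Because a skip only continues the loop, an all-skip ladder stays at 0 and trailing skips leave the last earned level unchanged.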

196
tests/unit/test_lint.py Normal file

@ -0,0 +1,196 @@
"""Unit tests for the L5 lint executor (harness.lint) — phase lvl5.
Covers the table parser + classifier against real abra-0.13 output shapes (probed on the CI
host 2026-06-11, JOURNAL-lvl5), and run_lint's never-raise / never-silent-pass guarantees via
a fake-PATH `script` shim (no real abra needed). Run cold:
cc-ci-run -m pytest tests/unit/test_lint.py -q
"""
from __future__ import annotations
import os
import stat
import subprocess
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
from harness import lint as L # noqa: E402
# Realistic abra lint table rows, as captured on cc-ci: abra renders HEAVY box-drawing
# verticals (┃ U+2503) — the parser must match those, not just the light │.
TABLE_OK = (
"┏━━━━━━┳━━━━━━┓\r\n"
"┃ R001 ┃ compose config has expected version ┃ warn ┃ ✅ ┃ - ┃ ensure ┃\r\n"
"┃ R015 ┃ long secret names ┃ warn ┃ ❌ ┃ - ┃ reduce ┃\r\n"
"┃ R008 ┃ .env.sample provided ┃ error ┃ ✅ ┃ - ┃ create ┃\r\n"
"┃ R014 ┃ only annotated tags used for recipe version ┃ error ┃ ✅ ┃ - ┃ retag ┃\r\n"
"┗━━━━━━┻━━━━━━┛\r\n"
"WARN secret session_secret is longer than 12 characters\r\n"
)
# The light-vertical variant must parse identically (defensive: abra theme/version drift).
TABLE_OK_LIGHT = TABLE_OK.replace("┃", "│")
TABLE_R014_FAIL = (
TABLE_OK.replace(
"┃ R014 ┃ only annotated tags used for recipe version ┃ error ┃ ✅",
"┃ R014 ┃ only annotated tags used for recipe version ┃ error ┃ ❌",
)
+ "WARN critical errors present in hedgedoc config\r\n"
)
TABLE_SKIPPED_ERROR = TABLE_OK.replace(
"┃ R014 ┃ only annotated tags used for recipe version ┃ error ┃ ✅ ┃ - ┃",
"┃ R014 ┃ only annotated tags used for recipe version ┃ error ┃ ❌ ┃ skipped ┃",
)
# ---- parse_table ----
def test_parse_table_rows_and_marks():
rows = L.parse_table(TABLE_OK)
by = {r["rule"]: r for r in rows}
assert set(by) == {"R001", "R015", "R008", "R014"}
assert by["R001"]["severity"] == "warn" and by["R001"]["satisfied"]
assert by["R015"]["severity"] == "warn" and not by["R015"]["satisfied"]
assert by["R014"]["severity"] == "error" and by["R014"]["satisfied"]
assert not any(r["skipped"] for r in rows)
def test_parse_table_strips_ansi():
rows = L.parse_table("\x1b[1m" + TABLE_OK + "\x1b[0m")
assert len(rows) == 4
def test_parse_table_light_verticals_too():
assert L.parse_table(TABLE_OK_LIGHT) == L.parse_table(TABLE_OK)
def test_parse_table_garbage_is_empty():
assert L.parse_table("FATA something exploded\r\n") == []
assert L.parse_table("") == []
# ---- classify ----
def test_classify_pass_with_warn_misses_only():
# warn-severity ❌ (R015) does NOT fail the rung — only error-severity rules do.
assert L.classify(0, TABLE_OK) == ("pass", "", [])
def test_classify_error_rule_fails():
status, detail, failed = L.classify(0, TABLE_R014_FAIL)
assert status == "fail"
assert failed == ["R014"]
assert "R014" in detail
def test_classify_skipped_error_rule_does_not_fail_but_sentinel_guards():
# a skipped error rule isn't counted as failed by the parser, but abra's own sentinel line
# (if present) still forces fail — the classifier never out-greens abra.
status, _, failed = L.classify(0, TABLE_SKIPPED_ERROR)
assert failed == []
assert status == "pass"
status2, detail2, _ = L.classify(
0, TABLE_SKIPPED_ERROR + "WARN critical errors present in x config\r\n"
)
assert status2 == "fail"
assert "critical errors" in detail2
def test_classify_rc0_without_table_is_unver():
# rc=0 but nothing parseable → cannot claim pass.
assert L.classify(0, "weird output")[0] == "unver"
def test_classify_content_fata_is_fail():
out = "FATA unable to validate recipe: .env.sample for x couldn't be read\r\n"
status, detail, _ = L.classify(1, out)
assert status == "fail"
assert "unable to validate recipe" in detail
def test_classify_environment_fata_is_unver():
out = "FATA unable to fetch tags in /x: repository not found: Not found.\r\n"
status, detail, _ = L.classify(1, out)
assert status == "unver"
assert "fetch tags" in detail
def test_classify_did_not_run_is_unver():
assert L.classify(None, "")[0] == "unver"
# ---- run_lint: never raises, never silently passes ----
def _mkrecipe(tmp_path):
repo = tmp_path / "abra" / "recipes" / "fakerec"
repo.mkdir(parents=True)
(repo / "compose.yml").write_text("version: '3.8'\n")
for cmd in (
["git", "init", "-q"],
["git", "add", "."],
["git", "-c", "user.email=t@t", "-c", "user.name=t", "commit", "-qm", "x"],
):
subprocess.run(cmd, cwd=repo, check=True)
return repo
def _shim(tmp_path, body):
"""Drop a fake `script` executable on PATH (run_lint invokes `script -qec "abra ..."`)."""
bindir = tmp_path / "bin"
bindir.mkdir(exist_ok=True)
sh = bindir / "script"
sh.write_text("#!/bin/sh\n" + body)
sh.chmod(sh.stat().st_mode | stat.S_IEXEC)
return str(bindir)
def test_run_lint_pass_via_shim(tmp_path, monkeypatch):
_mkrecipe(tmp_path)
monkeypatch.setenv("ABRA_DIR", str(tmp_path / "abra"))
out = TABLE_OK.replace("\r\n", "\\n")
monkeypatch.setenv(
"PATH", _shim(tmp_path, f'printf "{out}"\nexit 0\n') + os.pathsep + os.environ["PATH"]
)
res = L.run_lint("fakerec", None, str(tmp_path / "artifacts"))
assert res["status"] == "pass"
txt = (tmp_path / "artifacts" / "lint.txt").read_text()
assert "abra recipe lint -n fakerec" in txt and "R001" in txt
def test_run_lint_fail_via_shim(tmp_path, monkeypatch):
_mkrecipe(tmp_path)
monkeypatch.setenv("ABRA_DIR", str(tmp_path / "abra"))
out = TABLE_R014_FAIL.replace("\r\n", "\\n")
monkeypatch.setenv(
"PATH", _shim(tmp_path, f'printf "{out}"\nexit 0\n') + os.pathsep + os.environ["PATH"]
)
res = L.run_lint("fakerec", None, str(tmp_path / "artifacts"))
assert res["status"] == "fail"
assert res["rules_failed"] == ["R014"]
def test_run_lint_missing_recipe_is_unver_not_raise(tmp_path, monkeypatch):
monkeypatch.setenv("ABRA_DIR", str(tmp_path / "abra-none"))
res = L.run_lint("no-such-recipe", None, str(tmp_path / "artifacts"))
assert res["status"] == "unver"
assert res["detail"]
# lint.txt still written with the failure context (loud, never silent)
assert (tmp_path / "artifacts" / "lint.txt").exists()
def test_run_lint_abra_blowup_is_unver(tmp_path, monkeypatch):
_mkrecipe(tmp_path)
monkeypatch.setenv("ABRA_DIR", str(tmp_path / "abra"))
monkeypatch.setenv(
"PATH",
_shim(tmp_path, 'echo "FATA inappropriate ioctl for device"\nexit 1\n')
+ os.pathsep
+ os.environ["PATH"],
)
res = L.run_lint("fakerec", None, None)
assert res["status"] == "unver"
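
The table-parsing behaviour these tests pin (heavy or light verticals, ANSI stripped, only unsatisfied and unskipped error-severity rules counted) might look roughly like the sketch below. It is not the code in harness/lint.py: `parse_rows` and `failed_error_rules` are hypothetical names, and the column order (rule, description, severity, satisfied, skipped, fix) is an assumption read off the fixture rows.

```python
from __future__ import annotations

import re

ANSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")  # colour / style escape sequences
VERTICALS = re.compile("[┃│]")  # heavy (U+2503) and light (U+2502) box verticals
RULE_ID = re.compile(r"R\d+")


def parse_rows(output: str) -> list[dict]:
    """Extract one dict per rule row; borders, log lines and garbage are ignored."""
    rows = []
    for line in ANSI.sub("", output).splitlines():
        cells = [c.strip() for c in VERTICALS.split(line)]
        # A rule row splits into: "", rule, description, severity, satisfied, skipped, ...
        if len(cells) < 6 or not RULE_ID.fullmatch(cells[1]):
            continue
        rows.append(
            {
                "rule": cells[1],
                "severity": cells[3],
                "satisfied": "✅" in cells[4],
                "skipped": cells[5] == "skipped",
            }
        )
    return rows


def failed_error_rules(rows: list[dict]) -> list[str]:
    """Only error-severity rules that ran and were not satisfied fail the rung."""
    return [
        r["rule"]
        for r in rows
        if r["severity"] == "error" and not r["satisfied"] and not r["skipped"]
    ]
```

Splitting on a character class instead of a single vertical is what makes the heavy and light variants parse identically.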

View File

@ -1,7 +1,8 @@
"""Unit tests for Phase-3 results assembly (harness.results), plan-phase3-results-ux.md §4.2 / R1/R3.
"""Unit tests for results assembly (harness.results) — phase lvl5 semantics.
Covers JUnit parsing, stage roll-up, the tier→rung derivation (the documented mapping the level
depends on), and full results.json assembly incl. the U0 gate cases. Pure / tmp-file only. Run cold:
Covers JUnit parsing, stage roll-up, the tier→rung derivation (the SINGLE place every N/A source
is classified intentional-skip vs unverified — the table in DECISIONS.md phase lvl5), the L5 lint
rung wiring, and full results.json assembly. Pure / tmp-file only. Run cold:
cc-ci-run -m pytest tests/unit/test_results.py -q
"""
@ -27,6 +28,8 @@ JUNIT_MIXED = """<?xml version="1.0"?>
<testcase classname="tests.y" name="test_skipped" time="0"><skipped message="no deps"/></testcase>
</testsuite></testsuites>"""
LINT_PASS = {"status": "pass", "detail": "", "rules_failed": []}
def _write(tmp_path, name, content):
p = tmp_path / name
@ -90,7 +93,7 @@ def test_collect_stages_synthesizes_when_no_junit():
assert len(stages[0]["tests"]) == 1
# ---- derive_rungs: the documented mapping ----
# ---- derive_rungs: the documented N/A-classification mapping (DECISIONS.md phase lvl5) ----
def _results(**kw):
@ -105,34 +108,113 @@ def _results(**kw):
return base
def test_derive_rungs_full_climb_four_essential():
rungs = R.derive_rungs(_results(), backup_capable=True, has_custom=True)
# only the four essential rungs — integration/recipe-local are optional, not produced here.
def test_derive_rungs_full_climb_five_rungs():
rungs = R.derive_rungs(
_results(), backup_capable=True, has_upgrade_target=True, lint_status="pass"
)
# the five essential rungs — integration/recipe-local are optional, not produced here.
assert rungs == {
"install": "pass",
"upgrade": "pass",
"backup_restore": "pass",
"functional": "pass",
"lint": "pass",
}
def test_derive_rungs_stateless_backup_and_functional_na():
def test_derive_rungs_structural_skips_are_intentional():
# single published version (tier skipped, no upgrade target) + not backup-capable →
# both rungs are INTENTIONAL skips, not unverified.
rungs = R.derive_rungs(
_results(backup="skip", restore="skip", custom="skip"),
_results(upgrade="skip", backup="skip", restore="skip"),
backup_capable=False,
has_custom=False,
has_upgrade_target=False,
lint_status="pass",
)
assert rungs["backup_restore"] == "na"
assert rungs["functional"] == "na"
assert rungs["upgrade"] == "skip"
assert rungs["backup_restore"] == "skip"
assert "integration" not in rungs and "recipe_local" not in rungs
def test_derive_rungs_functional_fail():
rungs = R.derive_rungs(_results(custom="fail"), backup_capable=True, has_custom=True)
def test_derive_rungs_upgrade_skip_with_target_is_unverified():
# the tier skipped although an upgrade target exists (e.g. install failed → downstream
# skipped): NOT structural → unver.
rungs = R.derive_rungs(
_results(install="fail", upgrade="skip", backup="skip", restore="skip", custom="skip"),
backup_capable=True,
has_upgrade_target=True,
lint_status="pass",
)
assert rungs["install"] == "fail"
assert rungs["upgrade"] == "unver"
assert rungs["backup_restore"] == "unver"
assert rungs["functional"] == "unver"
def test_derive_rungs_missing_tier_is_unverified():
# a tier excluded from the run entirely (dev CCCI_STAGES escape) → no result key → unver,
# never an intentional skip (the recipe didn't declare anything).
res = {"install": "pass"}
rungs = R.derive_rungs(res, backup_capable=True, has_upgrade_target=True, lint_status="pass")
assert rungs["upgrade"] == "unver"
assert rungs["backup_restore"] == "unver"
assert rungs["functional"] == "unver"
def test_derive_rungs_expected_na_declares_intentional():
# EXPECTED_NA turns a non-run rung into an intentional skip (declared source).
rungs = R.derive_rungs(
_results(custom="skip"),
backup_capable=True,
has_upgrade_target=True,
expected_na={"functional": "no functional surface"},
lint_status="pass",
)
assert rungs["functional"] == "skip"
def test_derive_rungs_no_custom_tests_defaults_unverified():
# absent functional coverage with NO declaration is a gap → unver (conservative default).
rungs = R.derive_rungs(
_results(custom="skip"), backup_capable=True, has_upgrade_target=True, lint_status="pass"
)
assert rungs["functional"] == "unver"
def test_derive_rungs_expected_na_never_overrides_a_real_result():
# a declaration cannot soften an exercised rung: fail stays fail.
rungs = R.derive_rungs(
_results(custom="fail"),
backup_capable=True,
has_upgrade_target=True,
expected_na={"functional": "declared"},
lint_status="pass",
)
assert rungs["functional"] == "fail"
# ---- build_results: end-to-end incl level + flags ----
def test_derive_rungs_lint_never_skips():
# lint has NO intentional-skip escape hatch: pass/fail from the executor, anything else
# (None, "unver", junk) → unver — even if a recipe tries to declare it away.
for status, want in (("pass", "pass"), ("fail", "fail"), ("unver", "unver"), (None, "unver")):
rungs = R.derive_rungs(
_results(),
backup_capable=True,
has_upgrade_target=True,
expected_na={"lint": "nope"},
lint_status=status,
)
assert rungs["lint"] == want, status
def test_derive_rungs_functional_fail():
rungs = R.derive_rungs(
_results(custom="fail"), backup_capable=True, has_upgrade_target=True, lint_status="pass"
)
assert rungs["functional"] == "fail"
# ---- build_results: end-to-end incl level + lint + flags ----
def test_build_results_level_and_flags(tmp_path):
@ -163,17 +245,75 @@ def test_build_results_level_and_flags(tmp_path):
clean_teardown=True,
no_secret_leak=True,
finished_ts=1234.0,
lint=LINT_PASS,
)
# all four essential rungs pass → full climb to L4 (the top), no cap
assert data["level"] == 4
assert data["level_cap_reason"] == ""
# all five essential rungs pass → full climb to L5; no cap concept anywhere.
assert data["schema"] == 2
assert data["level"] == 5
assert "level_cap_reason" not in data and "level_cap_rung" not in data
assert data["recipe"] == "hedgedoc"
assert data["ref"] == "deadbeefcafe"
assert data["flags"] == {"clean_teardown": True, "no_secret_leak": True}
assert [s["name"] for s in data["stages"]] == ["install", "custom"]
# lint appears as a synthetic stage so the card's table carries all five rungs.
assert [s["name"] for s in data["stages"]] == ["install", "custom", "lint"]
assert data["lint"] == {"status": "pass", "detail": "", "rules_failed": []}
def test_build_results_capped_at_L1_on_upgrade_fail(tmp_path):
def test_build_results_lint_fail_blocks_at_4(tmp_path):
recs = [
{
"tier": "install",
"source": "generic",
"file": "g/test_install.py",
"rc": 0,
"junit": _write(tmp_path, "i.xml", JUNIT_PASS),
}
]
data = R.build_results(
recipe="x",
version=None,
pr="0",
ref=None,
records=recs,
results=_results(),
backup_capable=True,
clean_teardown=True,
no_secret_leak=True,
finished_ts=0.0,
lint={
"status": "fail",
"detail": "error rule(s) unsatisfied: R014",
"rules_failed": ["R014"],
},
)
assert data["level"] == 4
assert data["rungs"]["lint"] == "fail"
assert data["lint"]["rules_failed"] == ["R014"]
lint_stage = [s for s in data["stages"] if s["name"] == "lint"][0]
assert lint_stage["status"] == "fail"
assert "R014" in lint_stage["tests"][0]["message"]
def test_build_results_no_lint_given_is_unverified_never_pass(tmp_path):
# an old/lint-less caller must NEVER get a free L5: the rung derives as unver → level 4 max.
data = R.build_results(
recipe="x",
version=None,
pr="0",
ref=None,
records=[],
results=_results(),
backup_capable=True,
clean_teardown=True,
no_secret_leak=True,
finished_ts=0.0,
)
assert data["rungs"]["lint"] == "unver"
assert data["level"] == 4
assert "lint" in data["skips"]["unintentional"]
def test_build_results_level1_on_upgrade_fail(tmp_path):
recs = [
{
"tier": "install",
@@ -194,12 +334,13 @@ def test_build_results_capped_at_L1_on_upgrade_fail(tmp_path):
clean_teardown=True,
no_secret_leak=True,
finished_ts=0.0,
lint=LINT_PASS,
)
assert data["level"] == 1
assert "L2" in data["level_cap_reason"]
assert data["rungs"]["upgrade"] == "fail"
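The level assertions above follow the de-capped rule stated in the commit message: the level is the highest passed rung for which every rung below it is pass-or-skip, and a fail or unver blocks the climb. A minimal sketch of that rule, assuming the rung order used by `_rungs`; the `RUNGS` tuple and this `compute_level` body are illustrative and are not the code in level.py:

```python
# Hypothetical rung order, lowest first (taken from the test fixtures, not from level.py).
RUNGS = ("install", "upgrade", "backup_restore", "functional", "lint")


def compute_level(rungs):
    """Return the highest passed rung index (1-based), climbing past skips.

    A missing rung is treated as "unver" here; that default is an assumption.
    """
    level = 0
    for index, name in enumerate(RUNGS, start=1):
        status = rungs.get(name, "unver")
        if status == "pass":
            level = index  # a passed rung raises the level
        elif status == "skip":
            continue  # intentional skip: climbed past, but earns no level itself
        else:
            break  # "fail" or "unver" blocks everything above
    return level
```

Under this sketch a skip never raises the level on its own, so a recipe whose rungs above install are all skipped would stay at level 1.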
# ---- skips: intentional (declared) vs unintentional (everything else skipped) ----
# ---- skips: intentional (declared/structural, with reason) vs unintentional (= unver) ----
def _rungs(**kw):
@@ -208,24 +349,26 @@ def _rungs(**kw):
"upgrade": "pass",
"backup_restore": "pass",
"functional": "pass",
"lint": "pass",
}
base.update(kw)
return base
def test_skips_intentional_vs_unintentional():
rungs = _rungs(backup_restore="na", functional="na")
def test_skips_declared_reason_and_unverified_split():
rungs = _rungs(backup_restore="skip", functional="unver")
sk = R.skips(rungs, {"backup_restore": "stateless static server"})
# backup_restore is declared (intentional, with reason); functional skipped but not declared.
assert sk["intentional"] == {"backup_restore": "stateless static server"}
assert sk["unintentional"] == ["functional"]
def test_skips_none_declared_all_unintentional():
rungs = _rungs(backup_restore="na")
def test_skips_structural_reason_when_undeclared():
# a structural skip (derive_rungs) carries its structural reason even without EXPECTED_NA.
rungs = _rungs(upgrade="skip", backup_restore="skip")
sk = R.skips(rungs, None)
assert sk["intentional"] == {}
assert sk["unintentional"] == ["backup_restore"]
assert "only one published version" in sk["intentional"]["upgrade"]
assert "not backup-capable" in sk["intentional"]["backup_restore"]
assert sk["unintentional"] == []
def test_skips_declaration_only_counts_when_actually_skipped():
@@ -236,9 +379,9 @@ def test_skips_declaration_only_counts_when_actually_skipped():
assert "backup_restore" not in sk["unintentional"]
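The skip tests above suggest a simple split: a rung with status `skip` is intentional and carries a reason (the declared one if present, otherwise a structural one), while a rung with status `unver` is unintentional. A rough sketch of that classification, assuming only what the assertions show; the `STRUCTURAL_REASONS` wording and the function body are hypothetical, not the code in results.py:

```python
# Hypothetical structural reasons; the tests only check for these substrings.
STRUCTURAL_REASONS = {
    "upgrade": "only one published version, nothing to upgrade from",
    "backup_restore": "recipe is not backup-capable",
}


def skips(rungs, expected_na=None):
    """Split non-run rungs into intentional (with reason) and unintentional."""
    declared = expected_na or {}
    intentional = {}
    unintentional = []
    for name, status in rungs.items():
        if status == "skip":
            # A declared reason wins; otherwise fall back to the structural one.
            intentional[name] = declared.get(name) or STRUCTURAL_REASONS.get(name, "skipped")
        elif status == "unver":
            unintentional.append(name)
        # "pass" and "fail" are not skips, so a declaration for them is ignored.
    return {"intentional": intentional, "unintentional": unintentional}
```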
def test_build_results_threads_expected_na(tmp_path):
# Mirrors custom-html-tiny post-change: install + a passing functional (custom) test, but no
# backup surface (backup_restore declared intentionally skipped).
def test_build_results_stateless_recipe_climbs(tmp_path):
# custom-html-tiny shape: no backup surface (declared), single published version, passing
# functional — formerly capped at L2 by the N/A; now climbs to L5 (the de-cap, mission §2).
recs = [
{
"tier": "install",
@@ -261,23 +404,47 @@ def test_build_results_threads_expected_na(tmp_path):
pr="0",
ref=None,
records=recs,
results=_results(backup="skip", restore="skip"), # custom=pass (default) → functional pass
backup_capable=False, # no backupbot label → backup_restore skipped (N/A)
results=_results(upgrade="skip", backup="skip", restore="skip"),
backup_capable=False, # no backupbot label → structural intentional skip
has_upgrade_target=False, # single published version → structural intentional skip
clean_teardown=True,
no_secret_leak=True,
finished_ts=0.0,
lint=LINT_PASS,
expected_na={"backup_restore": "stateless static file server"},
)
# backup_restore skip still caps at L2 (never inflates) — even though functional passes above it,
# the skip caps the climb — but it's the declared (intentional) rung that capped.
assert data["level"] == 2
assert "L3" in data["level_cap_reason"]
assert data["level_cap_rung"] == "backup_restore"
assert data["rungs"]["functional"] == "pass"
assert data["level"] == 5 # skips are climbed past; nothing was inflated to get here
assert data["rungs"] == {
"install": "pass",
"upgrade": "skip",
"backup_restore": "skip",
"functional": "pass",
"lint": "pass",
}
assert data["skips"]["intentional"]["backup_restore"] == "stateless static file server"
assert (
data["skips"]["unintentional"] == []
) # backup_restore declared; functional passed → clean
assert "only one published version" in data["skips"]["intentional"]["upgrade"]
assert data["skips"]["unintentional"] == []
def test_build_results_unverified_backup_blocks(tmp_path):
# synthesized tier abort: backup-capable but the tiers never produced a result → unver → the
# level stays below the unverified rung (mission worked example #3).
data = R.build_results(
recipe="x",
version=None,
pr="0",
ref=None,
records=[],
results=_results(backup="skip", restore="skip"),
backup_capable=True,
clean_teardown=True,
no_secret_leak=True,
finished_ts=0.0,
lint=LINT_PASS,
)
assert data["rungs"]["backup_restore"] == "unver"
assert data["level"] == 2
assert data["skips"]["unintentional"] == ["backup_restore"]
def test_build_results_threads_customization(tmp_path):
@@ -310,6 +477,7 @@ def test_build_results_threads_customization(tmp_path):
"clean_teardown": True,
"no_secret_leak": True,
"finished_ts": 0.0,
"lint": LINT_PASS,
}
assert R.build_results(**kwargs, customization=cust)["customization"] == cust
assert R.build_results(**kwargs)["customization"] is None


@@ -32,6 +32,144 @@ def test_hook_returned_when_callable():
assert S._load_screenshot_hook({"SCREENSHOT": hook}) is hook
class _FakePage:
"""Minimal Playwright-page stand-in for the settle/blank-retry helpers (no browser needed)."""
def __init__(self, shot_sizes, idle_raises=False):
self._shot_sizes = list(shot_sizes) # bytes written per successive screenshot() call
self._idle_raises = idle_raises
self.idle_waits = [] # (state, timeout) per wait_for_load_state call
self.timeout_waits = [] # ms per wait_for_timeout call
self.shots = 0
def wait_for_load_state(self, state, timeout=None):
self.idle_waits.append((state, timeout))
if self._idle_raises:
raise TimeoutError(f"page kept polling past {timeout}ms")
def wait_for_timeout(self, ms):
self.timeout_waits.append(ms)
def screenshot(self, path, full_page=False):
self.shots += 1
with open(path, "wb") as f:
f.write(b"\x89PNG" + b"\0" * (self._shot_sizes.pop(0) - 4))
def test_settle_swallows_never_idle_pages():
"""R7: an app that never reaches network-idle (continuous polling) must not raise — the
timeout cap IS the wait."""
page = _FakePage([], idle_raises=True)
S._settle(page, 1234) # must not raise
assert page.idle_waits == [("networkidle", 1234)]
assert page.timeout_waits == [S.RENDER_GRACE_MS]
def test_snap_retries_blank_frame(tmp_path):
"""A blank-sized first frame (audit fingerprint: 4801 B) triggers exactly one retry with a
longer settle, overwriting the tiny frame with the later (painted) one."""
out = str(tmp_path / "shot.png")
page = _FakePage([4801, 30256])
S._snap_with_blank_retry(page, out)
assert page.shots == 2
assert page.idle_waits == [("networkidle", S.BLANK_RETRY_SETTLE_MS)]
assert os.path.getsize(out) == 30256
def test_snap_no_retry_for_real_frame(tmp_path):
"""A real-sized first frame is kept as-is — no second screenshot, no extra waiting."""
out = str(tmp_path / "shot.png")
page = _FakePage([35707])
S._snap_with_blank_retry(page, out)
assert page.shots == 1
assert page.idle_waits == []
assert os.path.getsize(out) == 35707
def test_snap_retry_keeps_late_frame_even_if_still_blank(tmp_path):
"""If the retry frame is still tiny we keep it (honest best-effort) — exactly one retry,
never a loop."""
out = str(tmp_path / "shot.png")
page = _FakePage([4801, 4801])
S._snap_with_blank_retry(page, out)
assert page.shots == 2
assert os.path.getsize(out) == 4801
assert not os.path.exists(out + ".retry"), "temp retry frame must be cleaned up"
def test_snap_retry_never_regresses_to_smaller_frame(tmp_path):
"""Adversary finding A1: a partial-but-real first frame (just under the threshold) must
survive a retry that comes back WORSE (page regressed to blank during the extra settle) —
the larger frame wins."""
out = str(tmp_path / "shot.png")
page = _FakePage([9999, 4801])
S._snap_with_blank_retry(page, out)
assert page.shots == 2
assert os.path.getsize(out) == 9999, "retry must never overwrite a larger frame (A1)"
assert not os.path.exists(out + ".retry"), "temp retry frame must be cleaned up"
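Taken together, the retry tests above describe one behaviour: snap once, and if the file looks blank, settle longer, snap once more into a temporary file, keep whichever frame is larger, and clean up. A minimal sketch under those assumptions; the constant values, the helper names without leading underscores, and the `FakePage`/`demo` scaffolding are illustrative and are not taken from the harness's screenshot module:

```python
import os
import tempfile

# Assumed values: the tests only require 8764 < BLANK_SIZE_BYTES < 12950.
BLANK_SIZE_BYTES = 10_000
BLANK_RETRY_SETTLE_MS = 5_000
RENDER_GRACE_MS = 500


def settle(page, timeout_ms):
    """Wait for network idle, but treat a timeout as 'waited long enough'."""
    try:
        page.wait_for_load_state("networkidle", timeout=timeout_ms)
    except Exception:
        pass  # a page that polls forever never goes idle; the cap is the wait
    page.wait_for_timeout(RENDER_GRACE_MS)


def snap_with_blank_retry(page, out):
    """Screenshot to `out`; on a blank-sized frame, retry exactly once."""
    page.screenshot(path=out)
    if os.path.getsize(out) >= BLANK_SIZE_BYTES:
        return  # real-sized frame: keep it, no extra waiting
    settle(page, BLANK_RETRY_SETTLE_MS)
    retry = out + ".retry"
    page.screenshot(path=retry)
    if os.path.getsize(retry) > os.path.getsize(out):
        os.replace(retry, out)  # later frame is larger: it wins
    else:
        os.remove(retry)  # never regress to a smaller frame


class FakePage:
    """Stand-in page that writes files of the given sizes, one per screenshot."""

    def __init__(self, sizes):
        self.sizes = list(sizes)

    def wait_for_load_state(self, state, timeout=None):
        pass

    def wait_for_timeout(self, ms):
        pass

    def screenshot(self, path, full_page=False):
        with open(path, "wb") as f:
            f.write(b"\0" * self.sizes.pop(0))


def demo(sizes):
    """Run the retry logic against a fake page and return the final file size."""
    with tempfile.TemporaryDirectory() as d:
        out = os.path.join(d, "shot.png")
        snap_with_blank_retry(FakePage(sizes), out)
        assert not os.path.exists(out + ".retry")
        return os.path.getsize(out)
```

Writing the retry to a sibling file and comparing sizes before replacing is what makes the "never overwrite a larger frame" case (A1) hold without a second retry.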
def test_blank_threshold_brackets_observed_sizes():
"""Threshold sits between the audited defect sizes (blank 4801-4802 B, lone spinners up to
8764 B) and the smallest real page (custom-html-tiny, 12950 B)."""
for defect in (4801, 4802, 5895, 6022, 7913, 8764):
assert defect < S.BLANK_SIZE_BYTES
assert S.BLANK_SIZE_BYTES < 12950
def test_wait_budget_within_step_cap():
"""plan-phase-shot §3 P3: the screenshot step's bounded waiting must stay ≤ ~60s worst case."""
total_ms = (
S.NAV_DEADLINE_S * 1000
+ S.SETTLE_TIMEOUT_MS
+ S.RENDER_GRACE_MS
+ S.BLANK_RETRY_SETTLE_MS
+ S.RENDER_GRACE_MS
)
assert total_ms <= 60_000, f"screenshot wait budget {total_ms}ms exceeds the ~60s step cap"
def test_mattermost_screenshot_hook_lands_login():
"""phase-shot: mattermost-lts ships the first real SCREENSHOT hook — `/` serves the
desktop-or-browser interstitial, so the hook must navigate to /login (the representative,
credential-free sign-in form) and settle; the harness then snaps the PNG."""
class _Resp:
status = 200
class _NavPage(_FakePage):
def __init__(self, click_raises=False):
super().__init__([])
self.urls = []
self.clicks = []
self._click_raises = click_raises
def goto(self, url, wait_until=None, timeout=None):
self.urls.append(url)
return _Resp()
def click(self, selector, timeout=None):
self.clicks.append(selector)
if self._click_raises:
raise TimeoutError("no interstitial")
tests_dir = os.path.join(os.path.dirname(__file__), "..")
meta = meta_mod.load("mattermost-lts", tests_dir=tests_dir)
hook = S._load_screenshot_hook(meta)
assert callable(hook), "mattermost-lts SCREENSHOT hook missing from the real load path"
page = _NavPage()
hook(page, meta_mod.hook_ctx("mm.example.org", meta))
assert page.urls == ["https://mm.example.org/login"]
assert page.clicks == ["text=View in Browser"], "hook must click through the interstitial"
assert len(page.idle_waits) == 2, "hook must settle after nav AND after the click"
# no interstitial (already on the form): the click times out and the hook still succeeds
page2 = _NavPage(click_raises=True)
hook(page2, meta_mod.hook_ctx("mm.example.org", meta))
assert page2.clicks == ["text=View in Browser"]
assert len(page2.idle_waits) == 1, "failed click must skip the second settle, not raise"
def test_screenshot_reachable_through_real_load_path(tmp_path):
"""R2 proof (rcust P1): a recipe SCREENSHOT hook declared in recipe_meta.py arrives at
screenshot._load_screenshot_hook through the REAL orchestrator load path (meta.load — the