Compare commits
73 Commits
restructur
...
phase-lvl5
| Author | SHA1 | Date |
|---|---|---|
| | 3d8d286cf3 | |
| | 1d3b61c6c2 | |
| | af7488a498 | |
| | 392f7df48f | |
| | e219a7891d | |
| | df301a5917 | |
| | 4822115b2b | |
| | 2b54adbe46 | |
| | 196156e497 | |
| | 2b2a7ba823 | |
| | 6104a9970d | |
| | 3c33129ebd | |
| | 5fc86991dd | |
| | 58d3505ea7 | |
| | 7ad7d1f20d | |
| | ea0e3e9d2f | |
| | 80e5713c5c | |
| | b8414a8fdb | |
| | b98a471dac | |
| | ce50f641cc | |
| | ae10b553b0 | |
| | e005897cb9 | |
| | 8978fa6ae3 | |
| | 4f3a74759d | |
| | 1bcb2ed8fe | |
| | 3245150982 | |
| | f7b9b6f167 | |
| | d7f85c3f28 | |
| | 89dec5188f | |
| | 24a203a098 | |
| | f359069d40 | |
| | a13a83a775 | |
| | 4428e76f48 | |
| | b4505acbbd | |
| | 9715ab5c50 | |
| | 914c1663b5 | |
| | 6cabbe73b7 | |
| | a531746e53 | |
| | 49d796d9ac | |
| | 73421dabb4 | |
| | be2026aafb | |
| | 77a9415b37 | |
| | 4dcfb5ba96 | |
| | 1ec0e772e8 | |
| | 40b59b356b | |
| | 5c0676b7d0 | |
| | efd7efc32b | |
| | 1357544301 | |
| | 57c66add51 | |
| | a95fad4fa0 | |
| | b9abf48116 | |
| | 4cb1f57e2c | |
| | e30a414ce1 | |
| | 41033b4500 | |
| | a7a558ada3 | |
| | 37dcfab07d | |
| | ffc88848f3 | |
| | 85d14101ef | |
| | 9aa0c5d624 | |
| | 4d342a2c5d | |
| | 01e6d497ba | |
| | 01f9f70970 | |
| | c2508c7fd2 | |
| | 8984b57b35 | |
| | 5ccc0d1c34 | |
| | 52f5266dfb | |
| | 270476beb3 | |
| | ff09c4075b | |
| | 63befd05b0 | |
| | 802b2792a7 | |
| | 0264af72c7 | |
| | 8945d13674 | |
| | f5119a9703 | |
18
BACKLOG-lvl5.md
Normal file
@ -0,0 +1,18 @@
# BACKLOG — Phase lvl5

## Build backlog

- [ ] B1 (P1) `level.py`: append rung `lint` (L5); new status vocabulary {pass, fail, skip, unver}; `compute_level()` → new formula (level = max i: rung_i pass ∧ ∀j<i status ∈ {pass,skip}); DELETE cap_reason/capped concepts.
- [ ] B2 (P1) lint executor (`harness/lint.py`): `abra recipe lint <recipe>` against the exact tested ref; hard ~60s timeout; rc+full output → `lint.txt` artifact; pass/fail/unver classification (missing abra / timeout / exception → unver, never pass, never skip); mirror-context handling per phase-plan §2.3 (probe abra behavior first; any filtering = named + unit-tested + DECISIONS.md).
- [ ] B3 (P1) `results.py`: wire lint into `derive_rungs` + explicit intentional-vs-unintentional classification of EVERY N/A source; drop level_cap_reason/level_cap_rung from schema; `skips()` reflects new statuses; orchestrator (`run_recipe_ci.py`) runs lint executor at the tested-ref point + passes result through; verdict-neutral (R7 wrap).
- [ ] B4 (P1) unit tests: rewrite test_level.py/test_results.py to new semantics incl. mission worked examples (fail-blocks → L1; intentional-skip climbs → L5; unver-blocks → L2; lint unver → L4; unclassifiable N/A → unver default); lint executor tests; old-artifact rendering compat tests.
- [ ] B5 (P2) `card.py`: 0–5 color ramp; cap line removed ("level N of 5" neutral); rung table renders ✔/✘/intentional-skip/unverified; level_badge_svg loses cap_skip third segment (badge = number+color only); tolerate old artifacts.
- [ ] B6 (P2) `dashboard.py`: _LEVEL_COLOR 5-scale; _level_pill/badge SVG number-only; legend text; old results.json (cap_reason present, lint absent) render without KeyError.
- [ ] B7 (P2) docs: results-ux.md, testing.md, recipe-customization.md §EXPECTED_NA wording — L5 ladder, de-cap semantics.
- [ ] B8 (P1) DECISIONS.md: semantics change record (replaces Phase-3 "N/A caps"); N/A classification table (every derive_rungs N/A source → intentional|unintentional); mirror-filter decision for lint (if any filtering).
- [ ] B9 — gate M1: claim (branch w/ P1+P2; clean tree; cold-verifiable).
- [ ] B10 (P3) lint sweep over ALL enrolled recipes (scratch clones — never touch ~/.abra/recipes during builds); matrix here (pass/fail + rule hits); mechanical fixes → mirror PRs (never push main/never merge); rest → DEFERRED.md.
- [ ] B11 (P4) real-CI proofs: ≥1 genuine L5; ≥1 lint-blocked L4 (synth branch ok); ≥1 N/A-skip climb; 2× drone !testme; canary suite at re-derived designed levels; 1 synthesized unver-blocks run; before/after level table for ALL enrolled recipes; card/dashboard PNG/SVG visually verified.
- [ ] B12 — gate M2: claim; then ## DONE after fresh PASS.

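The B1 level formula can be sketched as follows. This is a minimal illustration, not the contents of `level.py`: the function signature and the plain list of per-rung statuses are assumptions, and only the status vocabulary and the rule itself (highest passing rung with nothing but pass/skip beneath it) come from the backlog item.

```python
from typing import Sequence

# Status vocabulary from B1.
PASS, FAIL, SKIP, UNVER = "pass", "fail", "skip", "unver"


def compute_level(statuses: Sequence[str]) -> int:
    """Return the highest 1-based rung index i whose status is pass while
    every lower rung is pass or skip; return 0 if no rung qualifies.

    `statuses` is ordered from rung 1 upward (hypothetical input shape).
    """
    level = 0
    for i, status in enumerate(statuses, start=1):
        if status == PASS:
            level = i          # this rung counts; keep climbing
        elif status == SKIP:
            continue           # a skip does not count, but does not block
        else:
            break              # fail or unver blocks every rung above it
    return level
```

The B4 worked examples map onto it directly: a fail on rung 2 yields level 1, skips on rungs 2 and 3 still allow level 5, and an unver on rung 5 leaves the level at 4.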
## Adversary findings
128
BACKLOG-shot.md
Normal file
@ -0,0 +1,128 @@
# BACKLOG-shot.md — phase `shot` (recipe screenshot audit & repair)

SSOT: /srv/cc-ci/cc-ci-plan/plan-phase-shot-screenshots.md. Gates: M1 (audit+diagnosis), M2 (all OK / agreed N/A).

## Build backlog

### P1 — Audit matrix (status: complete, all 19 PNGs visually inspected 2026-06-11)

Enrolled set (19) = `tests/<r>/recipe_meta.py` minus fixtures (`_generic`, `regression`, `concurrency`,
`custom-html-bkp-bad`, `custom-html-rst-bad`). Evidence: `/var/lib/cc-ci-runs/<run>/` on cc-ci;
PNGs pulled to /tmp/shot-audit/ on the builder host and each one Read (visually).

| recipe | latest run w/ artifacts | screenshot field | PNG bytes | visual content (I looked) | class |
|---|---|---|---|---|---|
| bluesky-pds | ab-bluesky-pds-oldmain | null | — | no PNG; install=fail level=0 (upstream image breakage, rcust DEFERRED) → capture correctly skipped (`if deploy_ok`) | N-A-candidate (blocked upstream) |
| cryptpad | m2r-cryptpad | screenshot.png | 4802 | solid light-grey frame, nothing else | BLANK |
| custom-html | m2r-custom-html | screenshot.png | 35707 | "Welcome to nginx!" default page | OK? (diagnose: is this the recipe's true fresh-install content?) |
| custom-html-tiny | m2r-custom-html-tiny | screenshot.png | 12950 | seeded CI content ("cc-ci custom-html-tiny … DG5") | OK |
| discourse | m2p-discourse | screenshot.png | 66121 | real forum UI, welcome topic, Sign Up/Log In | OK |
| ghost | m2r-ghost | screenshot.png | 444183 | real blog landing ("Thoughts, stories and ideas") | OK |
| hedgedoc | m2r-hedgedoc | screenshot.png | 131967 | real landing (logo, Sign In, feature intro) | OK |
| immich | 356 | screenshot.png | 4801 | pure white frame | BLANK |
| keycloak | m2r-keycloak | screenshot.png | 8764 | spinner + "Loading the Administration Console" | LOADING |
| lasuite-docs | m2r-lasuite-docs | screenshot.png | 6022 | lone spinner on white | LOADING |
| lasuite-drive | m2p2-lasuite-drive | screenshot.png | 5895 | lone spinner on white | LOADING |
| lasuite-meet | m2r-lasuite-meet | screenshot.png | 4801 | pure white frame | BLANK |
| mailu | m2r-mailu | screenshot.png | 33800 | real sign-in page (empty fields) | OK |
| matrix-synapse | m2r-matrix-synapse | screenshot.png | 33296 | "It works! Synapse is running" landing | OK |
| mattermost-lts | m2b-mattermost-lts | screenshot.png | 242139 | brand splash/loading screen (logo on blue), NOT the login form | LOADING (borderline — brand-recognizable but a loading state) |
| mumble | m2r-mumble | screenshot.png | 7913 | spinner on grey — a web page IS served on the domain | LOADING (diagnose what serves it; N/A may NOT be justified) |
| n8n | m2r-n8n | screenshot.png | 4801 | off-white blank frame. Flaky: run 197 (30256 B) shows the real "Set up owner account" form (empty fields, credential-free) | BLANK (flaky) |
| plausible | 357 | null | — | no PNG on ANY run (122→357) | NULL |
| uptime-kuma | m2r-uptime-kuma | screenshot.png | 30858 | real "Create your admin account" setup form (empty fields) | OK |

PNG-size note: 4801/4802 B at 1280×800 is a byte-stable blank-frame fingerprint (3 different apps, same size).

### P2 — Root-cause diagnoses

- [x] **NULL — plausible** (evidence: Drone build 357 ci-step log, t=73s):
  `screenshot: capture failed (non-fatal, verdict unaffected): page.goto(https://plau-b51425.ci.commoninternet.net/) never returned a status in (200, 301, 302, 303, 401, 403) after 15 attempts (45s); last status=500`.
  Plausible's `/` 500s **by design** under `DISABLE_AUTH=true` (auth_controller; documented in
  `tests/plausible/functional/test_health_check.py` docstring and recipe_meta — that's why HEALTH_PATH
  is `/api/health`). Default landing-page capture can NEVER succeed → needs a per-recipe SCREENSHOT
  hook to a path that actually renders (probe live: e.g. /login or /sites).
- [x] **NULL — bluesky-pds**: install fails (level=0) before the app is up → `if deploy_ok:` gate in
  runner/run_recipe_ci.py:1024 correctly skips capture. Not a screenshot defect; upstream image
  breakage already filed in machine-docs/DEFERRED.md (rcust). → documented N/A while upstream is broken.
- [x] **BLANK class — immich, lasuite-meet, n8n(flaky), cryptpad**: SPA paint race. capture() navigates
  with `wait_until="domcontentloaded"` (runner/harness/screenshot.py:91) and screenshots immediately;
  SPA shell HTML has loaded but JS hasn't painted → solid 4801-2 B frame. n8n flakiness = same race,
  sometimes JS wins (run 197 captured the real form).
- [x] **LOADING class — keycloak, lasuite-docs, lasuite-drive, mumble, mattermost-lts(borderline)**:
  same race, caught mid-paint (spinner/splash rendered, app JS still loading/connecting).
- [x] **mumble** web stack identified: recipe deploys a `web` service (mumble-web client) on the domain —
  spinner is its connecting state; landing renders a connect dialog once JS settles. NOT an N/A.
- [x] **custom-html** nginx-welcome question: the recipe's fresh install genuinely serves the nginx
  default page at `/` (no content seeded for this recipe's install; only custom-html-tiny seeds via
  install_steps.sh). Screenshot is an honest representative view of a fresh install. → OK as-is.

### P3 — Fixes (all merged to main)

- [x] Harness default improvement (ce50f64 + A1 hardening 7ad7d1f): bounded networkidle settle
  (10s) + 0.5s render grace after domcontentloaded; blank/spinner-frame detect (<10000 B) → ONE
  retry with 4s settle, larger frame kept (A1). Wait budget 45+10+0.5+4+0.5 = 60s, unit-tested.
  8 new unit tests; 207 pass; lint PASS.
- [x] plausible — NOT a hook in the end: the real root cause was EXTRA_ENV SECRET_KEY_BASE being
  62 chars (<64-byte Phoenix cookie-store minimum) → every HTML render 500'd. Fixed to 68 chars
  (b98a471); default capture then lands the genuine registration page. Stale auth_controller
  comments corrected (no assertion touched).
- [x] mattermost-lts SCREENSHOT hook (80e5713 + 3c33129): interstitial appears on ANY first-visit
  route incl /login (proven byte-identical PNG) → hook navigates /login, clicks "View in Browser"
  best-effort, settles; lands the real login form. First real hook; public screenshot.settle().
- [x] keycloak / lasuite-docs / lasuite-drive / lasuite-meet / immich / cryptpad / n8n: fixed by
  the harness default alone (no hooks needed — proof PNGs below).
- [x] mumble: NOT fixable harness-side — pinned mumble-web:0.5 client never paints UI for an
  anonymous browser (≥90s DOM/console/network observation: no errors, no failed requests,
  connect-dialog elements absent, no autoconnect overrides). Loader frame = the genuine anonymous
  web view; voice (the recipe's function) fully covered by protocol tests. DEFERRED.md entry filed
  (upstream question for the operator).
- [x] bluesky-pds: documented N/A while upstream image broken (rcust DEFERRED; Adversary-agreed at
  M1, contingent re-check at M2 — latest failing evidence ab-bluesky-pds-oldmain, 2026-06-11).

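The harness-default settle described in the first P3 item can be sketched roughly as below. This is an illustration only, not the code in `runner/harness/screenshot.py`: the function name, the constant names, and the broad `except` are assumptions. `wait_for_load_state` and `wait_for_timeout` are Playwright's sync `Page` methods; the numbers come from the backlog item.

```python
# Wait budget from the P3 item, in seconds (constant names are hypothetical).
GOTO_RETRY_S = 45.0      # page.goto status polling
NETWORKIDLE_S = 10.0     # bounded networkidle settle
RENDER_GRACE_S = 0.5     # render grace after the settle
RETRY_SETTLE_S = 4.0     # extra settle before the single blank-frame retry
WAIT_BUDGET_S = (
    GOTO_RETRY_S + NETWORKIDLE_S + RENDER_GRACE_S + RETRY_SETTLE_S + RENDER_GRACE_S
)


def settle(page, idle_timeout_s: float = NETWORKIDLE_S,
           grace_s: float = RENDER_GRACE_S) -> None:
    """Wait for network idle for a bounded time, then allow a short render grace.

    `page` is expected to behave like a Playwright sync Page.
    """
    try:
        page.wait_for_load_state("networkidle", timeout=idle_timeout_s * 1000)
    except Exception:
        # Assumption: a page that never goes idle (websockets, polling) should
        # still be captured, so the timeout is swallowed rather than raised.
        pass
    page.wait_for_timeout(grace_s * 1000)
```

Bounding the idle wait matters because SPAs that hold a websocket open may never reach `networkidle`; the grace period then gives the already-loaded JS a moment to paint.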
### P4 — Proof runs (fresh, post-fix; every PNG visually Read by Builder)

| recipe | proof run (dir on cc-ci) | level (baseline) | PNG B | visual |
|---|---|---|---|---|
| immich | 370 (drone !testme immich#2) | 4 (=356:4) | 234351 | real "Welcome to Immich" onboarding |
| plausible | 371 (drone !testme plausible#3) | 4 (=357:4) | 64132 | real registration form, empty fields |
| keycloak | shot-proof-keycloak | 4 | 215587 | real "Sign in to your account" form |
| cryptpad | shot-proof-cryptpad | 4 | 57310 | real landing + document-type picker |
| lasuite-meet | shot-proof-lasuite-meet | 4 | 225686 | real video-conferencing landing |
| lasuite-docs | shot-proof-lasuite-docs | 4 | 284769 | real Docs landing |
| lasuite-drive | shot-proof2-lasuite-drive | 4 | 132037 | real Drive landing |
| n8n | shot-proof-n8n | 4 | 26433 | real "Set up owner account", empty fields (now deterministic) |
| mattermost-lts | shot-proof3-mattermost-lts | 2 (=m2r:2) | 178367 | real "Log in to your account" form (hook v2) |
| mumble | shot-proof-mumble | 4 | 7980 | loader frame — best-available (see P3/DEFERRED) |

Drone durations pre/post (same recipe+PR): immich 199s→198s; plausible 209s→166s (faster — capture
no longer burns 45s failing). Healthy class (ghost, hedgedoc, discourse, custom-html,
custom-html-tiny, mailu, matrix-synapse, uptime-kuma): existing artifacts cited in P1 matrix, each
visually verified real + credential-free; no new runs needed per plan §3 P4.
Dashboard/card: grid thumbnails for runs 370/371 served 200, summary.html embeds screenshot.png,
/badge/immich.svg 200.

## Adversary findings

### [adversary] A1 — blank-retry can REGRESS a larger frame to a worse one (LOW, non-blocking) — CLOSED @2026-06-11T06:32Z
**CLOSED:** fixed in 7ad7d1f (retry snapped to a temp path; `os.replace` only if `retry >= first`,
else discard + cleanup in `finally`). Re-verified COLD with my own probe (not the Builder's test):
the exact filed case `[9999,4801]` now keeps **9999** (retry discarded, no temp leak); originals
intact (`[4801,30256]`→30256, `[4801,4802]`→4802, `[35707]`→1 shot, `[5000,5000]`→replace). 5/5 pass.
R7 contract preserved (retry-raise still propagates to capture's swallow → None; first frame on disk).
--- original finding (for the record) ---
**Where:** `runner/harness/screenshot.py` `_snap_with_blank_retry` (ce50f64).
**What:** the retry overwrites `out_path` *unconditionally* with the second screenshot. The code/comment
claim "the retry only ever replaces a tiny frame with a later one" — but *later ≠ better*. If the first
frame is e.g. 9999 B (a partial render, just under `BLANK_SIZE_BYTES=10000`) and the page regresses in the
extra 4 s settle (redirect, session-timeout splash, error overlay), the retry can yield a 4801 B blank that
**overwrites the better 9999 B frame**. The Builder's unit test only covers blank→blank (4801→4802); the
bigger→smaller regression is untested.
**Repro (cold, my independent probe, not the Builder's test file):** fake page returning sizes
`[9999, 4801]` → `_snap_with_blank_retry` keeps **4801** (the worse frame).
**Severity:** LOW. R7 holds (cosmetic only, never affects verdict); my M2 per-PNG visual check is the
backstop — any actually-blank final PNG will FAIL that recipe regardless. Filed for hardening, not a veto.
**Suggested guard (trivial, strictly safer):** keep the larger frame — only overwrite if
`getsize(retry) >= getsize(first)` (or snap retry to a temp path and pick `max`). Then extend the unit
test with a bigger→smaller case asserting the larger frame survives.
**Closes:** only I close this, after re-test. Non-blocking for an M2 claim, but I will re-check at M2.
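The guard that closed A1 (retry to a temp path, replace only when the retry is at least as large, clean up in `finally`) can be sketched like this. It is a reconstruction from the finding's description, not the code in 7ad7d1f: the `snap` callable and the function name are assumptions, and the 4 s settle before the retry is omitted.

```python
import os
import tempfile
from typing import Callable

BLANK_SIZE_BYTES = 10_000  # threshold named in the finding


def snap_with_blank_retry(snap: Callable[[str], None], out_path: str) -> int:
    """Take one shot; if it looks blank, take ONE retry and keep the larger frame.

    `snap(path)` is a hypothetical callable that writes a PNG to `path`.
    Returns the number of shots taken.
    """
    snap(out_path)
    if os.path.getsize(out_path) >= BLANK_SIZE_BYTES:
        return 1  # first frame is big enough; no retry
    fd, tmp_path = tempfile.mkstemp(
        suffix=".png", dir=os.path.dirname(out_path) or "."
    )
    os.close(fd)
    try:
        snap(tmp_path)
        # Only a retry that is at least as large may replace the first frame.
        if os.path.getsize(tmp_path) >= os.path.getsize(out_path):
            os.replace(tmp_path, out_path)
    finally:
        # Discarded or failed retries must not leak a temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return 2
```

If the retry raises, the exception propagates to the caller while the first frame stays on disk, which is the behavior the R7 note in the finding relies on.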
19
JOURNAL-lvl5.md
Normal file
@ -0,0 +1,19 @@
# JOURNAL — Phase lvl5

## 2026-06-11 bootstrap
- Read plan-phase-lvl5-lint-rung.md in full + plan.md §6/§6.1/§7/§9. Phase files created.
- Orientation reads: level.py (RUNGS 4, compute_level gap-caps, backup_restore_status, tier_to_rung), results.py derive_rungs/build_results (cap fields at :215-229), card.py (LEVEL_COLOR 0-6!, cap line :246, level_badge_svg cap_skip third segment), dashboard.py (_LEVEL_COLOR :68, _level_pill :245, cap div :277, render_level_badge :363), run_recipe_ci.py build_results call :1248 + badge wiring :1296-1320, bridge.py :224 (badge embed — number-only already, no cap text → likely untouched), docs (results-ux.md has cap language; recipe-customization.md EXPECTED_NA row).
- Notable: card.py LEVEL_COLOR already has keys 0-6 (5=green, 6=bright green) — only 0-4 reachable today; dashboard._LEVEL_COLOR needs checking for the same.
- Lint context: abra.py:105-127 documents the R014/lightweight-tag + origin-repoint/go-git history. Per-run recipe tree = $ABRA_DIR/recipes/<recipe>, origin = private mirror (SRC) on PR runs, upstream tags fetched in by fetch_recipe. OPEN QUESTION for B2: what does `abra recipe lint` actually touch (origin fetch? auth? R014 against which tags?) — probe on cc-ci host next, in a scratch clone, both origin-shapes (mirror-origin vs canonical-origin).
- Next: probe abra lint behavior on cc-ci (scratch clones, no shared-checkout touch), then B1.

## 2026-06-11 abra lint probe (B2 design input) — all on cc-ci, scratch ABRA_DIR=/tmp/lvl5-lint-probe/abra
- `abra recipe lint hedgedoc` (fresh canonical clone): FATA "inappropriate ioctl for device" rc=1 — needs a PTY even with `-n`. Under `script -qec "abra recipe lint -n hedgedoc" /dev/null`: rc=0, 21-line unicode table R001–R016 (cols: ref|rule|severity|satisfied ✅/❌|skipped|how-to-fix), maxlen 146 no wrapping, wall time 0.7s.
- rc SEMANTICS: rc≠0 ONLY on FATA (cannot lint). Probes:
  - rm .env.sample + commit → rc=1 FATA "unable to validate recipe: .env.sample ... no such file" (content-attributable FATA).
  - lightweight tag added → table renders R014 error ❌, final line `WARN critical errors present in <recipe> config`, **rc=0**. So pass/fail MUST be parsed from the table (error-severity ❌ rows), sentinel line as cross-check. Baseline warn-only ❌ (R015) → NO sentinel, rc=0 → pass.
  - untracked compose.ccci.yml (CI overlay) in tree → FATA "version mismatched between two composefiles" rc=1 — abra lint globs compose*.yml INCLUDING untracked harness overlays ⇒ lint MUST run on a pristine clone of the exact ref, not the deploy tree.
  - origin repointed to auth-required mirror URL → rc=1 FATA "unable to fetch tags in ...: repository not found" — lint force-fetches tags from origin ⇒ scratch clone's origin must be fetchable without auth. Cloning FROM the per-run tree (local path origin) satisfies this offline and preserves the run's true tag set (fetch_recipe pulls upstream tags into the per-run tree).
- run_quick emits no results.json/card (build_results only at run_recipe_ci.py:1248, cold path) → lint rung wiring is full-path only.
- Executor design settled (DECISIONS.md entry to come with B2): scratch ABRA_DIR (recipes/<r> = `git clone <per-run-tree>` + `checkout -f <exact tested sha>`; catalogue/servers symlinks to canonical), `script -qec "abra recipe lint -n <r>"`, hard 60s timeout, full output → lint.txt artifact, parse table rows; status = fail iff any error-severity row ❌(not skipped) or content-attributable FATA ("unable to validate recipe"); pass iff table rendered & no error-row ❌; anything else (timeout, abra missing, fetch FATA, unparseable) → unver + loud log. No rule filtering needed (mirror pollution solved by context, not by ignoring rules).
- Tier-skip sources mapped for derive_rungs classification (run_recipe_ci.py:1040-1131): upgrade skip ⟺ `prev` falsy ("only one published version", structural-intentional) given install passed; backup/restore skip ⟺ not backup_cap (structural-intentional); install-fail → downstream tiers skip (unintentional); custom skip ⟺ no custom tests (unintentional unless EXPECTED_NA declares functional); tier absent from `stages` (CCCI_STAGES dev escape) → missing key (unintentional).
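The pass/fail/unver decision from the executor design can be sketched on already-parsed rows. This deliberately leaves out the table parsing, since the exact rendering of abra's table is not reproduced here: `LintRow`, its field names, the `"error"` severity string, and the `fata` argument are all hypothetical; only the decision rule follows the journal entry.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LintRow:
    """One parsed lint table row (hypothetical shape)."""
    rule: str        # e.g. "R014"
    severity: str    # assumed values: "error" or "warn"
    satisfied: bool  # True for a satisfied rule, False for a violated one
    skipped: bool


def classify_lint(rows: Sequence[LintRow], fata: Optional[str] = None,
                  timed_out: bool = False) -> str:
    """Return "pass", "fail", or "unver" for one lint run."""
    if timed_out:
        return "unver"
    if fata is not None:
        # Only a content-attributable FATA counts against the recipe.
        return "fail" if "unable to validate recipe" in fata else "unver"
    if not rows:
        return "unver"  # no table rendered or nothing parseable
    for row in rows:
        if row.severity == "error" and not row.satisfied and not row.skipped:
            return "fail"
    return "pass"
```

Keeping the return code out of the decision reflects the probe result that rc stays 0 even when an error-severity rule is violated.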
297
JOURNAL-rcust.md
@ -8,3 +8,300 @@ be `restructure/recipe-custom` off main @ 76a4b6b. Starting P1: reading the six
|
||||
(run_recipe_ci.py::_load_meta, conftest.py::_recipe_meta, lifecycle.py::_recipe_extra_env,
|
||||
lifecycle.py::_recipe_meta_flag, deps.py::declared_deps, canonical.py::is_canonical_enrolled)
|
||||
before writing harness/meta.py.
|
||||
|
||||
## 2026-06-10 P1 — single loader + registry (branch 472a68b)
|
||||
|
||||
Wrote runner/harness/meta.py: KEYS registry (14 keys + CHAOS_BASE_DEPLOY/OIDC_AT_INSTALL/
|
||||
SKIP_GENERIC kept registered as deprecated=True so P1 lands green before P2 deletes them),
|
||||
RecipeMeta generated from KEYS via dataclasses.make_dataclass (frozen; field set cannot drift from
|
||||
the registry), load() = the only exec() of recipe_meta.py, MetaError on unknown ALL-CAPS/type
|
||||
mismatch/callable-on-data-key, difflib suggestion in the unknown-key message. BACKUP_CAPABLE keeps
|
||||
its tri-state via default None (None = auto-detect — preserves the old `"BACKUP_CAPABLE" in meta`
|
||||
semantics in generic.backup_capable).
|
||||
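A registry-driven loader of the kind described in this entry might look like the sketch below. It is a simplified illustration, not `runner/harness/meta.py`: the three registry entries, their defaults, and the `load(source)` signature taking a source string are assumptions, and the type-mismatch and callable-on-data-key checks are left out.

```python
import dataclasses
import difflib
from typing import Optional


class MetaError(Exception):
    """Raised for an invalid recipe_meta.py."""


# Hypothetical three-key registry: name -> (type, default).
KEYS = {
    "DEPLOY_TIMEOUT": (int, 600),
    "HEALTH_PATH": (str, "/"),
    "BACKUP_CAPABLE": (Optional[bool], None),  # tri-state: None = auto-detect
}

# The field set is generated from KEYS, so it cannot drift from the registry.
RecipeMeta = dataclasses.make_dataclass(
    "RecipeMeta",
    [(name, typ, dataclasses.field(default=default))
     for name, (typ, default) in KEYS.items()],
    frozen=True,
)


def load(source: str):
    """Exec recipe_meta source once and build a frozen RecipeMeta from it."""
    namespace: dict = {}
    exec(source, namespace)  # the single exec() in this sketch
    values = {}
    for name, value in namespace.items():
        if not name.isupper():
            continue  # only ALL-CAPS names are treated as meta keys
        if name not in KEYS:
            hint = difflib.get_close_matches(name, list(KEYS), n=1)
            suffix = f" (did you mean {hint[0]}?)" if hint else ""
            raise MetaError(f"unknown key {name}{suffix}")
        values[name] = value
    return RecipeMeta(**values)
```

Because unset keys fall back to the registry defaults, `BACKUP_CAPABLE` stays `None` unless a recipe sets it, which is how the tri-state survives.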
|
||||
Migrations: orchestrator loads once + passes meta down (deploy_app/perform_upgrade/_perform_op/
|
||||
run_lifecycle_tier all take the object); conftest meta fixture returns full RecipeMeta (R3 closed);
|
||||
lifecycle._recipe_extra_env/_recipe_meta_flag and deps.declared_deps deleted; canonical.is_enrolled
|
||||
+ enrolled_recipes go through meta.load (tests monkeypatch meta.TESTS_DIR now instead of
|
||||
canonical.__file__); screenshot._load_screenshot_hook reads the attribute (R2 fixed — unit test
|
||||
proves SCREENSHOT survives the real orchestrator load path). deploy_app keeps an optional
|
||||
meta=None fallback (loads via the single loader) for fixture/manual callers — exec still happens
|
||||
in exactly one function.
|
||||
|
||||
Effective-value safety check before committing: dumped non_default() for all 21 recipe dirs through
|
||||
the new loader — every recipe's customized key set matches its recipe_meta.py source (e.g. mumble:
|
||||
DEPLOY_TIMEOUT/EXTRA_ENV/HEALTH_OK/READY_PROBE/UPGRADE_EXTRA_ENV). One intentional delta class:
|
||||
deps.deploy_deps' fallback timeouts for a MISSING dep meta change from literal 900/600 to loading
|
||||
the dep's real meta (orchestrator path always supplied metas, so CI behavior is identical).
|
||||
|
||||
Verified on cc-ci (rsynced working tree before committing):
|
||||
cc-ci-run -m pytest tests/unit -q -> 175 passed
|
||||
nix develop .#lint --command scripts/lint.sh -> lint: PASS
|
||||
Three pre-existing f212 unit tests passed dicts to wait_ready_probes — updated mechanically to
|
||||
construct RecipeMeta via dataclasses.replace (assertions untouched).
|
||||
|
||||
Next: P2a compose.ccci.yml first-class + auto-chaos.
|
||||
|
||||
## 2026-06-10 P2 — legacy keys & paths deleted (branch 8cd72fd)
|
||||
|
||||
P2a: lifecycle.provide_ccci_overlay copies tests/<recipe>/compose.ccci.yml into the per-run
|
||||
checkout (after install_steps hook, before prepull/deploy); pinned base deploys auto-chaos on
|
||||
overlay presence (has_ccci_overlay replaces the meta.CHAOS_BASE_DEPLOY elif). ghost/discourse
|
||||
install_steps.sh were copy-only -> deleted whole; their metas keep COMPOSE_FILE in EXTRA_ENV
|
||||
(unchanged wiring, the harness now owns the copy).
|
||||
|
||||
P2b: oidc_at_install condition removed — `if declared:` provisions before the single deploy,
|
||||
legacy post-deploy block + _run_setup_custom_tests_hook deleted. lasuite-docs install_steps.sh is
|
||||
the meet/drive hook with docs' exact env names (diffed against the deleted setup_custom_tests.sh:
|
||||
same keys incl. OIDC_OP_DISCOVERY_ENDPOINT + scopes 'openid email profile'; secret-insert bump
|
||||
identical; only the abra-redeploy step is gone — the single deploy reads the env instead).
|
||||
lasuite-drive's MinIO bucket one-shot -> ops.py pre_install (runs at install-tier start, post-
|
||||
deploy; bucket lives in the minio volume so it survives upgrade/restore; same scale --detach +
|
||||
30x3s poll as the shell version). run_quick: deps still provision (realm/creds), hook call gone —
|
||||
no quick-enrolled recipe declares DEPS today; noted inline.
|
||||
|
||||
P2c: SKIP_GENERIC out of the registry; _skip_generic(op) env-only; skip_generic_env_overrides()
|
||||
prints a `!!` warning when active under DRONE (P5 will embed in the manifest).
|
||||
|
||||
P2d: conftest deps fixture = dict of _DepEntry (dict subclass w/ attribute sugar) — the 6 lasuite
|
||||
files only ever used deps_creds, renamed param to deps, zero assertion changes. NOTE for Adversary:
|
||||
some assert MESSAGE strings ('setup_custom_tests should have populated this.' -> 'dep
|
||||
provisioning...') and docstrings updated — message text only, no assert logic/expected values.
|
||||
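The "dict subclass w/ attribute sugar" from P2d can be illustrated with a few lines. This is a sketch of the pattern only; the class name without the underscore and the example keys are hypothetical.

```python
class DepEntry(dict):
    """A dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods
        # such as .keys() and .get() keep working unchanged.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


entry = DepEntry(url="https://kc.example", realm="ci")  # hypothetical keys
```

Existing call sites that index with `entry["url"]` keep working, while new ones can write `entry.url`; a key that collides with a dict method name would still need the subscript form.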
|
||||
Verified on cc-ci (rsync of working tree): cc-ci-run -m pytest tests/unit -q -> 175 passed;
|
||||
nix develop .#lint --command scripts/lint.sh -> PASS. Doc table regenerated to the 14-key registry
|
||||
(doc-sync unit test pins it).
|
||||
|
||||
Next: P3 — HookCtx + ctx-hook signatures everywhere.
|
||||
|
||||
## 2026-06-10 P3 — uniform ctx hook convention (branch fd02d9f)
|
||||
|
||||
HookCtx frozen dataclass + hook_ctx() constructor in harness/meta.py; ctx.deps read straight from
|
||||
$CCCI_DEPS_FILE (json, both shapes) — meta.py stays import-cycle-free (deps.py imports lifecycle
|
||||
which imports meta). Registry keys carry hook_params; meta.load() enforces the expected positional
|
||||
names per hook key (READY_PROBE/BACKUP_VERIFY/EXTRA_ENV/UPGRADE_EXTRA_ENV=(ctx,),
|
||||
SCREENSHOT=(page, ctx)); _run_pre_hook applies meta.check_hook_signature(fn, ("ctx",)) to ops.py
|
||||
hooks before calling. Conversion of 17 ops.py + 8 recipe_meta hooks was scripted (def-line regex +
|
||||
bare `domain` -> `ctx.domain` inside the pre_*/hook function bodies only) and diff-reviewed; the
|
||||
only manual fixes: keycloak pre_restore passed `meta` -> `ctx.meta`, and two comment lines in
|
||||
lasuite-drive/-meet metas that the regex over-replaced were restored. wait_ready_probes gained
|
||||
op= (install/upgrade call sites pass it) so probes can know the phase.
|
||||
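A positional-name check like the `check_hook_signature(fn, ("ctx",))` call mentioned above could be written with `inspect.signature`. The sketch below is an assumption about how such a check might work, not the harness implementation; the error wording is invented.

```python
import inspect


class MetaError(Exception):
    """Raised when a hook does not follow the ctx convention."""


def check_hook_signature(fn, expected: tuple) -> None:
    """Require fn's positional parameter names to equal `expected` exactly."""
    positional_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    names = tuple(
        p.name
        for p in inspect.signature(fn).parameters.values()
        if p.kind in positional_kinds
    )
    if names != tuple(expected):
        raise MetaError(
            f"{fn.__name__}{names} does not match expected parameters {tuple(expected)}"
        )
```

Checking names rather than just arity catches a hook that was left on the old `domain` signature, which would otherwise be called with a ctx object and fail later in a less obvious place.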
|
||||
Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 180 passed; lint PASS.
|
||||
|
||||
Next: P4 — discovery placement rule + op_state/deps fixtures + migrate hand-parsers.
|
||||
|
||||
## 2026-06-10 P4 — custom-test ergonomics (branch 29a28e2)
|
||||
|
||||
Pre-change sweeps confirmed the plan's zero-users claims: no top-level non-lifecycle test_*.py in
|
||||
any recipe dir; no recipe test file reads os.environ / CCCI_OP_STATE_FILE directly (the only
|
||||
op-state consumers are the generic assertions via harness.generic.op_state — harness-side, fine).
|
||||
So P4 = discovery glob removal + new op_state fixture + pinning tests; no test migrations needed.
|
||||
test_discovery.py's HC2 gate test moved its repo-local custom fixture under functional/ (the rule);
|
||||
test_discovery_phase2.py now asserts top-level custom is NOT discovered. op_state fixture skips
|
||||
(clear reason) when env unset / file missing / unparseable; tested via request.getfixturevalue.
|
||||
|
||||
Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 184 passed; lint PASS.
|
||||
|
||||
Next: P5 — customization manifest (print block + results.json key).
|
||||
|
||||
## 2026-06-10 P5 — customization manifest (branch 68954be)
|
||||
|
||||
(Resumed after a usage-limit pause mid-P5; working tree carried the in-flight manifest.py.)
|
||||
New runner/harness/manifest.py: build() collects {meta_non_default, hooks, overlays, custom_tests,
|
||||
env_overrides} via the SAME discovery/meta functions the run uses (so the manifest can never
|
||||
disagree with what actually executes — incl. the HC2 _gated() repo-local gate), render() prints
|
||||
the block. Orchestrator builds+prints right after meta load / repo-local snapshot, BEFORE the
|
||||
quick-lane branch (both lanes get the block); the dict rides into build_results(customization=...)
|
||||
verbatim. run_quick writes no results.json, so the single build_results call site covers all.
|
||||
Hooks render as "<hook>", tuples as lists (JSON-clean); ops.py pre-ops listed by cheap source
|
||||
scan (same approach as discovery._module_defines — no import at manifest time).
|
||||
|
||||
Lint flagged: C408 dict() literal, import-block order (manifest after deps), ruff-format on the
|
||||
new test file — all fixed. Verified on cc-ci (rsync of working tree): cc-ci-run -m pytest
|
||||
tests/unit -q -> 191 passed; nix develop .#lint --command scripts/lint.sh -> lint: PASS.
|
||||
|
||||
Next: P6 docs, then M1 prep (tests/concurrency proof run + 21-recipe baseline matrix).
|
||||
|
||||
## 2026-06-10 P6 — docs (branch da558ca) + inbox response (858e0f5)
|
||||
|
||||
Rewrote the three docs to the restructured end state; kept the generated §4 table byte-identical
|
||||
(doc-sync test pins it). recipe-customization.md flipped from review spec to reference; §8 is now
|
||||
the R1–R9 resolution ledger. Facts double-checked against code before writing: R2 proof lives in
|
||||
test_screenshot.py::test_screenshot_reachable_through_real_load_path (not test_meta.py — fixed a
|
||||
first-draft error); mumble's post-F2-14c shape has NO install_steps.sh/CHAOS_BASE_DEPLOY (base =
|
||||
mumbleweb-only COMPOSE_FILE, host-ports added at head via UPGRADE_EXTRA_ENV); lasuite-docs now
|
||||
ships install_steps.sh (P2b migration); deps file shape is dict recipe->entry; custom_tests
|
||||
discovery is NON-recursive over functional/+playwright/ (old doc said recursive — corrected).
|
||||
|
||||
Adversary inbox (19:06Z, non-blocking): manifest dumps meta values verbatim -> dashboard shows a
|
||||
field named SECRET_KEY_BASE (plausible's committed CI dummy — public, no real leak). Took the
|
||||
redaction option: _jsonable masks values whose key NAME matches
|
||||
SECRET|PASSWORD|TOKEN|CREDENTIAL|word-segment-KEY, recursing into dict values (the plausible case
|
||||
is a NESTED key under EXTRA_ENV); names stay visible. KEYCLOAK_URL deliberately not matched
|
||||
(word-segment KEY). Unit test pins redacted+passthrough both.
|
||||
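The key-name redaction rule can be sketched as a small recursive function. The regex and the `"<redacted>"` placeholder are assumptions chosen to match the behavior described (names stay visible, nested keys are covered, `KEYCLOAK_URL` passes through); the real `_jsonable` may differ.

```python
import re

# KEY only counts as a whole word segment, e.g. API_KEY or KEY_FILE.
_SENSITIVE_NAME = re.compile(
    r"SECRET|PASSWORD|TOKEN|CREDENTIAL|(?:^|_)KEY(?:_|$)"
)


def redact(value, key_name: str = ""):
    """Mask values whose key NAME looks sensitive, recursing into dicts."""
    if key_name and _SENSITIVE_NAME.search(key_name.upper()):
        return "<redacted>"  # hypothetical placeholder text
    if isinstance(value, dict):
        return {k: redact(v, str(k)) for k, v in value.items()}
    return value


manifest = {
    "EXTRA_ENV": {"SECRET_KEY_BASE": "dummy", "KEYCLOAK_URL": "https://kc.example"},
    "DEPLOY_TIMEOUT": 900,
}
masked = redact(manifest)
```

Matching on the key name instead of the value keeps the manifest readable: a reviewer still sees that `SECRET_KEY_BASE` is customized without seeing what it was set to.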
|
||||
Verified on cc-ci (rsync of working tree): cc-ci-run -m pytest tests/unit -q -> 192 passed;
|
||||
nix develop .#lint --command scripts/lint.sh -> lint: PASS.
|
||||
|
||||
Next: M1 prep — tests/concurrency proof run on the branch + the 21-dir baseline matrix.
|
||||
|
||||
## 2026-06-10 M1 prep + claim
|
||||
|
||||
Concurrency proof run on branch head 858e0f5 (rsynced tree on cc-ci): cc-ci-run -m pytest
|
||||
tests/concurrency -q -> 23 passed in 11.46s (suite untouched by the restructure, as planned).
|
||||
|
||||
Baseline matrix: pulled every /var/lib/cc-ci-runs/*/results.json (141 files) and took the most
|
||||
recent per recipe. 19/21 dirs covered by results.json; mumble's last full run predates the
|
||||
results system (log ~/ccci-mumble-f214c.log, 5 tiers pass 05-31); bluesky-pds likewise
|
||||
(Adversary Phase-2 cold verify e45e0ee). plausible's weekly-report RED was its PR branch
|
||||
(pg13->14, build 200); its default-branch baseline is run 308 (06-10) L4 — runs 307/308 are
|
||||
today's, from the conc-phase M2 sweep. Bad canaries recorded at their designed-fail tier.
|
||||
|
||||
Claimed M1. While waiting: nothing else unblocked in this phase (M2 is gated on M1) — will hold
|
||||
with short fallback polls per §7 case 2.
|
||||
|
||||
## 2026-06-11 M2 reconciliation — discourse upgrade-HC1 root-cause hunt + bluesky re-characterization
|
||||
|
||||
Resumed after a loop stall (~21:18Z–23:50Z): the m2b/ab sweeps had finished but nothing processed
|
||||
them. Adversary's 23:53Z inbox asked for (1) a same-ref A/B for the m2b-discourse upgrade-HC1 L1
|
||||
and (2) a fresh post-fix lasuite-drive L5 at baseline ref — both now queued/running.
|
||||
|
||||
Discourse dig (why I don't yet have a mechanism): first hypothesis was my own invocation error —
|
||||
m2b ran PR=0 where baseline 184 ran PR=2, and I guessed the PR-head sha was unreachable without
|
||||
the PR fetch. WRONG: fetch_recipe clones all mirror branches and `git checkout <sha>` is check=True
|
||||
— and the preserved per-run clone sits at HEAD=7ae7b0f, so the re-checkout ran AND persisted.
|
||||
Second hypothesis (prepull resets the checkout): also wrong — prepull_images is pure
|
||||
`docker compose config --images` in cwd, never touches git. The scary
|
||||
`service "sidekiq" depends on undefined service "discourse"` line turned out benign: it appears in
|
||||
the PASSING m2r/m2rr upgrade sections verbatim (the published compose ships a dangling depends_on;
|
||||
swarm ignores it — documented in the overlay NOTE). What's left: abra stamped the PREV-TAG commit
|
||||
(eb96de94 = 0.7.0+3.3.1) on the chaos redeploy while the tree was at 7ae7b0f. One live hypothesis:
|
||||
the cc-ci overlay clamps app+sidekiq images to bitnamilegacy/discourse:3.3.1; at this PR head
|
||||
(0.9.0+3.5.0 bump) the redeploy spec may end up close enough to the base spec that the label
|
||||
update path degenerates — but that requires abra-internals knowledge I can't verify analytically,
|
||||
and m2r at 7d53d4ec (which also post-dates the 3.5.0 bump?) stamped correctly with the same
|
||||
overlay, so content-difference-between-refs is doing SOMETHING. Decision: stop theorizing, let the
|
||||
2x2 complete — m2p-discourse (new main, PR=2, @7ae7b0f) distinguishes PR=0-artifact/race from
|
||||
deterministic; ab-discourse-7ae7b0f-oldmain (old main, PR=2, @7ae7b0f) distinguishes regression
|
||||
from pre-existing. Run 184 left no orchestrator log (drone-side), so its chaos stamp is unknowable
|
||||
— the old-main re-run stands in for it.
|
||||
|
||||
lifecycle.py diff c2508c7..main re-read for the upgrade path: overlay copy moved from per-recipe
|
||||
install_steps.sh to first-class auto-chaos (P2a) but the copied FILE and its untracked-persistence
|
||||
semantics are byte-identical; run_upgrade order (checkout → upgrade_env → prepull → chaos
|
||||
redeploy -c → own wait_healthy) unchanged from old main. Nothing jumps out as the delta.
|
||||
|
||||
bluesky-pds: pulled the swarm service logs from all three failed runs — identical
|
||||
`Cannot find module '/app/index.js'` crash-loop (Node v24.15.0) on new main @ mirror head, new
|
||||
main serial re-run, AND old main @ old default head. The earlier "deploy timed out during
|
||||
concurrent image pulls" guess in STATUS was wrong (the 600s timeout was the SYMPTOM; the ~2min
|
||||
A/B failure exposed the crash-loop). Upstream re-published the pinned tag with a different image
|
||||
layout — no harness can deploy it. Filed in STATUS as restructure-neutral with grep-able evidence.
|
||||
|
||||
## 2026-06-11 lasuite-drive root cause #2 — completed one-shot poisons convergence (caught live)
|
||||
|
||||
Watching the m2p proof run instead of just waiting paid off: the fix-forward's best-effort line
|
||||
printed (so #1 is fixed), but the install assert then sat in pytest for 25+ minutes. Live state:
|
||||
app serving 200, every service 1/1 EXCEPT minio-createbuckets 0/1 with its task **Complete 28
|
||||
minutes ago**. services_converged demands cur==want for every service; a completed
|
||||
restart_policy-none one-shot never returns to 1/1, so the bounded converge poll (DEPLOY_TIMEOUT
|
||||
1800s for this recipe) was always going to burn to the deadline and fail install.
|
||||
|
||||
Why nobody ever saw this before P2b: the old setup_custom_tests.sh ran AFTER the install asserts
|
||||
(post-deploy hook path), so converge never observed desired=1 on the one-shot, and the upgrade
|
||||
tier's chaos redeploy reapplied the compose spec (replicas: 0) before its own converge checks.
|
||||
P2b folded the trigger into ops.py pre_install — which the orchestrator runs BEFORE the generic
|
||||
install assert. Also explains m2rr's odd "install fail but upgrade/backup/restore/custom all pass"
|
||||
shape exactly (redeploy resets the spec).
|
||||
|
||||
Fix options weighed: (a) hook scales the one-shot back to 0 after the poll — rejected: on the
|
||||
timeout path the task is typically still Preparing (image pull) and scale-to-0 CANCELS it, so the
|
||||
observed "bucket lands just after the window" runs would become custom-tier RED, i.e. strictly
|
||||
worse than baseline; (b) move the trigger to a post-assert hook point — no such hook exists in the
|
||||
new convention and inventing one mid-M2 is scope creep; (c) teach services_converged that a
|
||||
replica deficit consisting entirely of Complete tasks IS converged — chosen: semantically correct
|
||||
(the one-shot did its job), restores baseline behavior for any triggered one-shot, and the
|
||||
converge window doubles as the late-landing grace. Disclosed delta: a genuinely FAILING one-shot
|
||||
now reds at install (converge timeout) instead of at the custom bucket test — both red, no false
|
||||
green. Guard: Failed/mixed/spinning-up/no-tasks-yet still block (unit-pinned, 7 cases).
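
Option (c) can be sketched as a pure predicate. This is a simplified illustration under stated assumptions, not the code on fix/converged-oneshot: the function names, the input shape (a desired replica count plus a list of task state strings) and the exact state spellings are hypothetical.

```python
def service_converged(want, task_states):
    """True when running == want, or the whole deficit is Complete tasks."""
    running = sum(1 for state in task_states if state == "Running")
    if running == want:
        return True
    others = [state for state in task_states if state != "Running"]
    deficit = want - running
    # Failed, mixed, still-preparing and no-tasks-yet cases all fall through to False.
    return (
        deficit > 0
        and len(others) >= deficit
        and all(state == "Complete" for state in others)
    )


def services_converged(services):
    """services: iterable of (want, task_states) pairs, one per service."""
    return all(service_converged(want, states) for want, states in services)
```

With this shape a finished one-shot such as `(1, ["Complete"])` is admitted, while `(1, ["Preparing"])` keeps the poll waiting, which is what lets the converge window double as the late-landing grace.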
|
||||
|
||||
Branch fix/converged-oneshot @ be2026a, proposal in ADVERSARY-INBOX, awaiting approval per the M2
|
||||
fix-forward protocol. Unit suite 199 passed + lint PASS from the cc-ci working-tree rsync.
|
||||
|
||||
## 2026-06-11 ~01:00Z — merge landed, queue shortened
|
||||
|
||||
be2026a approved (REVIEW a531746, cold-verified independently) and merged as 6cabbe7; drone build
|
||||
350 green on the push head 914c166. Merged diff verified == branch diff (empty git diff be2026a..
|
||||
main for the two files). Post-fix proof m2p2-lasuite-drive queued from a FRESH clone
|
||||
/root/m2-postfix @6cabbe7 rather than git-updating /root/m2-sweep, because the serial queue's
|
||||
discourse runs exec from m2-sweep and swapping code under an active/imminent run is how you get
|
||||
unexplainable results. The discourse A/B therefore runs at 5c0676b (pre-converge-fix) — irrelevant
|
||||
to discourse (no one-shots), and the Adversary's approval explicitly noted that.
|
||||
|
||||
Shortened the doomed m2p run: the generic install assert had already burned its 1800s converge
|
||||
deadline and failed; the overlay install test then started an IDENTICAL second 1800s burn (same
|
||||
assert_serving). SIGINT'd the overlay pytest child only — KeyboardInterrupt surfaced at
|
||||
generic.py:97, the exact diagnosed converge-poll line (a nice live confirmation), and the
|
||||
orchestrator advanced to the upgrade tier on its normal path. Teardown semantics untouched.
|
||||
Disclosed in STATUS so the log's KeyboardInterrupt is pre-explained.
|
||||
|
||||
Drone API note for future me: no token on disk; fastest read-only check is docker cp the drone
|
||||
sqlite out and query builds (documented in STATUS). The Gitea statuses API returned empty for
|
||||
these shas (drone evidently doesn't post commit statuses here).
|
||||
|
||||
## 2026-06-11 ~00:55Z — discourse A/B closed (harness-neutral), mechanism still unattributed
|
||||
|
||||
m2p-discourse (new main, PR=2, @7ae7b0f) and ab-discourse-7ae7b0f-oldmain (old main, PR=2, same
|
||||
ref) failed the upgrade IDENTICALLY: HC1, chaos-version=eb96de94+U, all other tiers pass, L2.
|
||||
Same invocation as baseline 184 which was L4 five days ago. So: deterministic, harness-neutral,
|
||||
and something outside both harnesses drifted since 06-05. Eliminated: branch-tip existence (7ae7b0f
|
||||
still tips upgrade-0.8.0+3.5.0 + pr/2), upstream tag set (0.7.0+3.3.1 still latest), abra pin
|
||||
(flake.lock untouched by the restructure). Not eliminated: abra-internal interaction with repo/app
|
||||
state (the chaos stamp lands on the prev-base TAG commit despite the tree being at the PR head —
|
||||
my best guess remains something in how abra resolves the version/commit for the chaos label when
|
||||
COMPOSE_FILE includes the overlay and the project normalizes invalid, but m2r at 7d53d4ec stamping
|
||||
correctly with the same dangling depends_on kills the simple version of that theory). The
|
||||
`service "sidekiq" depends on...` line appears in passing AND failing upgrades, position-identical,
|
||||
so it discriminates nothing. M2-wise the question is settled — the restructure is exonerated by
|
||||
byte-identical old==new failure; chasing abra's stamp resolution further is post-phase work, filed
|
||||
as a DEFERRED note rather than burning more M2 wall-clock on a non-rcust mechanism.
|
||||
|
||||
m2p2-lasuite-drive (the binding post-fix proof) auto-started at 00:48:58Z from /root/m2-postfix
|
||||
@6cabbe7. Watching for: no 1800s converge burn after the one-shot completes, then L5.
|
||||
|
||||
## 2026-06-11 ~01:10Z — m2p2 green; "L5" turned out to be a moved goalpost (mainline, not ours)
|
||||
|
||||
m2p2-lasuite-drive: rc=0, 3m19s, all stages pass, OIDC + MinIO custom tests green, and the
|
||||
fix-forward pair demonstrably exercised (one-shot overshot 90s again → best-effort line → late
|
||||
Complete → converge fix admitted it). But results.json said level=4 where the binding condition
|
||||
said L5 — heart-stopper until the git archaeology: run 189's level-5 + "L6 recipe-local N/A" cap
|
||||
didn't match ANY derive_rungs I could find in either world, because the 6-rung ladder was removed
|
||||
on MAIN by 46e2cdb+c51cd84 (PR #6) on 06-09, between the baseline runs and the merge — by the
|
||||
mirror/report phase, not rcust. The merge didn't touch level.py (checked 01e6d49^1..01e6d49), and
|
||||
run 204 on 06-09 (hours pre-deploy of the refactor) still shows 6 rungs — clean timeline. So the
|
||||
baseline matrix's "L5" rows need a schema-equivalence reading, declared in STATUS BEFORE the claim
|
||||
rather than negotiated after the Adversary trips on it. Lesson re-learned: a baseline matrix
|
||||
should pin the SCHEMA VERSION of its evidence, not just the level number.
|
||||
|
||||
## 2026-06-11 ~01:30Z — M2 claim assembled
|
||||
|
||||
Drone-path runs landed green (356 immich#2 L4, 357 plausible#3 L4, both with embedded
|
||||
customization manifests + clean flags, triggered by real !testme comments). Zero-leak verified
|
||||
after everything. Plausible's missing screenshot.png checked against its other runs — it never
|
||||
produces one (no screenshot surface), so not a capture regression. Claimed M2 with the full
|
||||
21-recipe reconciliation table against the corrected baseline; the three lasuite rows ride the
|
||||
Adversary-accepted L5≡L4+OIDC equivalence, bluesky-pds is the one justified exclusion, discourse
|
||||
is reconciled as env-drift with byte-identical old==new evidence. Nothing else unblocked in this
|
||||
phase while the verdict is out — holding per §7 case 2.
|
||||
|
||||
## 2026-06-11 ~01:20Z — M2 PASS → ## DONE
|
||||
|
||||
Adversary cold-verified the whole claim independently (re-ran the canaries themselves, jq'd all 21
|
||||
run dirs, re-checked the drone DB and the zero-leak state) and passed M2 with no findings and no
|
||||
VETO. M1 + M2 both stand; ## DONE written. Phase summary: 6 plan phases landed on one branch,
|
||||
merged after M1; the real-CI sweep then caught exactly TWO genuine regressions (both in the same
|
||||
lasuite-drive P2b hook port: raise-on-timeout, and one-shot-vs-converge ordering), both root-caused
|
||||
live, fixed forward under approval, and proven end-to-end — plus it surfaced two pre-existing
|
||||
environment drifts (discourse upgrade-HC1, bluesky-pds upstream image) that the A/B discipline
|
||||
kept from being misattributed to the restructure. The sweep-as-safety-net worked as designed.
|
||||
|
||||
105
JOURNAL-shot.md
Normal file
@ -0,0 +1,105 @@
|
||||
# JOURNAL-shot.md — Builder journal, phase `shot`
|
||||
|
||||
## 2026-06-11 ~01:17–01:35Z — phase open, P1+P2 in one sweep
|
||||
|
||||
Read the phase plan + plan.md §6.1/§7/§9. Enumerated enrolled recipes (19). Pulled per-recipe
|
||||
latest-run data off cc-ci (`results.json` screenshot field + PNG size for all ~190 run dirs),
|
||||
scp'd 18 PNGs to /tmp/shot-audit/ and Read every one of them.
|
||||
|
||||
Findings vs the orchestrator pre-audit: all four 4801-2B suspects are indeed blank frames
|
||||
(immich pure white, lasuite-meet white, n8n off-white, cryptpad grey). keycloak 8.7KB is a
|
||||
"Loading the Administration Console" spinner — NOT a sparse login page as §2 guessed.
|
||||
lasuite-docs/drive ~5.9KB are lone spinners. Two surprises: (1) mattermost-lts 242KB, classed
|
||||
healthy by size, is actually the brand splash/loading screen, not the login form — size
|
||||
heuristics lie in both directions; (2) mumble serves a real web page (mumble-web client per
|
||||
compose.mumbleweb.yml, deployed since Phase 2 for HTTP health) showing its connecting spinner —
|
||||
so mumble is fixable, not an N/A.
|
||||
|
||||
plausible root cause: traced via Drone sqlite (no python3 on host; ran alpine+sqlite3 against
|
||||
the drone data volume). Build 357 log t=73s: capture failed, last status=500 after 45s. Cross-ref
|
||||
tests/plausible/functional/test_health_check.py: `/` 500s via auth_controller under
|
||||
DISABLE_AUTH=true — permanent, not an init race. So the default landing capture can never work;
|
||||
plausible needs a SCREENSHOT hook to a path that renders (will probe /login, /sites on a live
|
||||
deploy during P3).
|
||||
|
||||
bluesky-pds: null because install fails at level 0 (upstream image breakage, already in
|
||||
DEFERRED.md from rcust) — capture gated on deploy_ok, correctly skipped. N/A while upstream broken.
|
||||
|
||||
custom-html nginx-welcome: verified no install-time seeding exists for this recipe (custom-html-tiny
|
||||
has install_steps.sh; custom-html only seeds in pre_backup/pre_upgrade ops, after capture). The
|
||||
nginx default page IS the honest fresh-install view. Leaving OK; flagged in matrix for Adversary.
|
||||
|
||||
Adversary opened REVIEW-shot.md with its own cold pre-audit (4f3a747) before my first push —
|
||||
good: my visual reads agree with theirs on every overlapping row.
|
||||
|
||||
Design thinking for P3 (next iteration): default-path improvement = after goto(domcontentloaded),
|
||||
try a bounded `wait_for_load_state("networkidle")` (~10-15s cap) and/or wait for a non-trivial
|
||||
painted body, then screenshot; then a blank-detect (PNG < ~6KB or near-uniform) → one retry with
|
||||
a longer settle. Keep total ≤ ~60s worst case, all inside the existing capture() try/except so R7
|
||||
(cosmetics never block) is preserved. Unit tests: blank-detector pure function + retry logic with
|
||||
a fake page. Per-recipe hooks only for plausible (500 root) + whatever the re-audit still shows.
|
||||
|
||||
## 2026-06-11 ~05:45-06:00Z — plausible root cause was a 62-char SECRET_KEY_BASE; M1 PASSed meanwhile
|
||||
|
||||
M1 PASS (ae10b55) with a watch-list. P3 done in two commits: ce50f64 (harness settle+blank-retry,
|
||||
6 unit tests, 205 pass, lint PASS) and b98a471 (plausible fix). The plausible story changed under
|
||||
probing: three live probes (shot-probe{,2,3}-plausible) showed / and every HTML route 302→/register
|
||||
which 500s; app logs gave the smoking gun: `(ArgumentError) cookie store expects conn.secret_key_base
|
||||
to be at least 64 bytes`. Our EXTRA_ENV value — comment claimed "64-char" — measures 62. So every
|
||||
page render 500'd while /api/* (no cookie store) passed all tiers. NOT auth_controller/DISABLE_AUTH
|
||||
as the old comments claimed; corrected both stale comments. Fix = 68-char value; verified
|
||||
shot-fix-plausible run: install pass, screenshot.png 64132B = real registration page (empty fields,
|
||||
placeholders only — same safe shape the Adversary blessed for n8n/uptime-kuma). No hook needed.
|
||||
|
||||
P4 started: !testme posted 05:56:32Z on immich#2 + plausible#3 (drone builds 370+371 running,
|
||||
concurrent). Manual full proof run keycloak launched (shot-proof-keycloak). Remaining queue:
|
||||
mattermost-lts, cryptpad, lasuite-meet, lasuite-docs, lasuite-drive, n8n, mumble.
|
||||
|
||||
## 2026-06-11 ~06:05-06:30Z — proof sweep underway; A1 fixed; mumble is the holdout
|
||||
|
||||
Proofs verified visually so far (each level matches its baseline): drone 370 immich L4 234KB real
|
||||
onboarding card (was 4801B); drone 371 plausible L4 64KB registration page (was null); keycloak L4
|
||||
real sign-in form (was loading spinner); cryptpad L4 real landing w/ document picker (was grey blank);
|
||||
lasuite-meet L4 real product landing (was white blank); mattermost-lts L2(=m2r baseline L2) — real
|
||||
page but it's the desktop-or-browser interstitial, so per the watch-list I added the first
|
||||
SCREENSHOT hook (80e5713, → /login + public settle()); re-run pending.
|
||||
|
||||
A1 (blank-retry could regress a larger frame): fixed in 7ad7d1f — retry goes to a temp path and
|
||||
only replaces via os.replace when >= first; regression test [9999,4801]→9999. 207 unit, lint PASS.
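
A sketch of the A1 fix as described: the retry frame is written to a temp path and only swapped in when it is at least as large as the first frame. The names `capture_with_retry`, `looks_blank` and the `shoot(path)` callback are hypothetical, and the 6000-byte threshold is an assumed reading of "~6KB".

```python
import os
import tempfile

BLANK_BYTES = 6000  # assumed threshold


def looks_blank(path, threshold=BLANK_BYTES):
    """Size-only blank heuristic; the near-uniform pixel check is omitted here."""
    return os.path.getsize(path) < threshold


def capture_with_retry(shoot, dest, is_blank=looks_blank):
    """shoot(path) writes a PNG to path. Returns the final size of dest."""
    shoot(dest)
    if not is_blank(dest):
        return os.path.getsize(dest)
    fd, tmp = tempfile.mkstemp(suffix=".png", dir=os.path.dirname(os.path.abspath(dest)))
    os.close(fd)
    try:
        shoot(tmp)  # retry goes to a temp path, never straight onto dest
        if os.path.getsize(tmp) >= os.path.getsize(dest):
            os.replace(tmp, dest)  # atomic swap on the same filesystem
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)  # the retry lost, so discard it
    return os.path.getsize(dest)
```

Creating the temp file next to `dest` keeps `os.replace` on one filesystem, so the swap cannot leave a half-written screenshot behind.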
|
||||
|
||||
mumble: proof run still spinner after settle+retry (7980B). Probing live what mumble-web does over
|
||||
90s (it printed real mumble-web HTML while up; suspect autoconnect overlay that never resolves
|
||||
because the websocket voice path may not be browser-reachable). Orchestrated probe2 running.
|
||||
Also in flight: n8n + lasuite-docs proofs from the A1-fixed tree. Queue: lasuite-drive, mattermost
|
||||
re-run; then ghost/hedgedoc/etc. healthy-class citations + dashboard/card check + runtime compare.
|
||||
|
||||
## 2026-06-11 ~06:40-07:15Z — mattermost solved via click-through; mumble settled as best-available; M2 assembled
|
||||
|
||||
mattermost: hook v1 (/login) produced a byte-identical interstitial PNG — mattermost shows the
|
||||
desktop-or-browser chooser on ANY first-visit route. Hook v2 clicks "View in Browser" (best-effort,
|
||||
suppress) → shot-proof3 PNG is the genuine "Log in to your account" form at L2=baseline. That's
|
||||
watch-list item 3 satisfied the hard way.
|
||||
|
||||
mumble: three live probes. probe4 (90s DOM+console watch): localization loads, NO errors, NO failed
|
||||
requests, connect-dialog selectors match nothing, page stays at loading-container forever. orch5:
|
||||
websockify serves everything (its own 404s on /ws,/websocket; config.local.js = untouched sample, no
|
||||
autoconnect). Conclusion: the pinned mumble-web:0.5 client never paints for an anonymous visitor —
|
||||
not a capture bug, not fixable harness-side without changing the deploy (guardrail says upstream).
|
||||
Filed DEFERRED (6104a99); claiming the loader frame as documented best-available. Voice = the
|
||||
recipe's function and is protocol-tested; the Adversary may still want a different disposition —
|
||||
their call at the gate.
|
||||
|
||||
Ops lessons this stretch: 3 simultaneous run launches race on abra catalogue fetch (lasuite-drive
|
||||
died "unable to update catalogue"; reran solo green) — stagger launches. Backgrounded one-shot ssh
|
||||
launchers with `cd X && nohup A & nohup B &` only run the cd for A; B starts in the original directory. Give each its own cd.
|
||||
|
||||
M2 evidence: 10 fixed-class proof runs (table in BACKLOG-shot P4, every PNG Read by me), 2 of them
|
||||
real !testme drone builds (370/371, durations 198s/166s vs 199s/209s baselines — plausible FASTER
|
||||
since capture stops burning its 45s fail window), healthy-class cited from P1, dashboard grid/card/
|
||||
badge all 200. Claiming M2.
|
||||
|
||||
## 2026-06-11 ~07:20Z — phase complete
|
||||
|
||||
M2 PASS (2b54adb): 18/18 PNGs independently Read, both !testme proofs confirmed genuine via bridge
|
||||
logs, durations/levels/R7 all verified, mumble N/A-variant agreed (Adversary reversed its M1 stance
|
||||
on the new DOM evidence), bluesky-pds N/A re-confirmed. Wrote ## DONE. Loop ends.
|
||||
513
REVIEW-rcust.md
@ -29,6 +29,513 @@ I own this file and the `## Adversary findings` section of BACKLOG-rcust.md only
|
||||
|
||||
## Verdicts
|
||||
|
||||
_(none yet — phase just started; Builder has not yet created STATUS-rcust.md or branch
|
||||
`restructure/recipe-custom`. Only the reference spec doc `76a4b6b` has landed. Awaiting first
|
||||
`claim(rcust): M1` from the Builder.)_
|
||||
_(no GATE verdict yet — M1 is not claimed. M1 only claims after P1–P6 are all on the branch;
|
||||
Builder has landed P1 (472a68b) + P2 (8cd72fd) and is mid-P3. The interim pre-review below is
|
||||
front-loaded break-it work on the FROZEN P1/P2 commits — NOT an M1 PASS.)_
|
||||
|
||||
### Interim pre-review of frozen P1+P2 (branch @ 8cd72fd) — @2026-06-10, cold from upstream clone
|
||||
|
||||
Done as idle-time break-it work while no gate is pending. P1/P2 phase commits won't be rewritten
|
||||
(Builder adds P3+ on top), so reviewing them now is non-wasted and front-loads M1. Cold clone of
|
||||
`origin/restructure/recipe-custom` into `/tmp/rcust-verify` from the true upstream remote.
|
||||
|
||||
**No defects found so far.** Results:
|
||||
|
||||
1. **Deleted-code fallout — CLEAN.** Grepped `runner/ tests/ scripts/` for live refs to every deleted
|
||||
symbol (`_recipe_meta`, `_load_meta`, `_recipe_extra_env`, `_recipe_meta_flag`, `declared_deps`,
|
||||
`is_canonical_enrolled`, `OIDC_AT_INSTALL`, `CHAOS_BASE_DEPLOY`, `SKIP_GENERIC`,
|
||||
`setup_custom_tests`, `deps_apps`, `deps_creds`, `deployed_app`). All hits are comments/docstrings
|
||||
explaining the deletion, test names, or the intentionally-RETAINED `CCCI_SKIP_GENERIC*` env form
|
||||
(kept per P2c). Zero live call-sites. `setup_custom_tests.sh` files gone.
|
||||
2. **All-recipes-load-clean (typo gate) — PASS, independently.** Ran `meta.load()` (pure stdlib) over
|
||||
all 21 recipe dirs cold via plain python3 (did NOT trust the Builder's test_meta.py). All 21 load;
|
||||
non-default key sets sane. Every ALL-CAPS key used in any recipe_meta.py is in the 14-key registry.
|
||||
3. **Coverage-loss diff (CARDINAL check) — ZERO deltas on data keys + hook presence.** Throwaway
|
||||
harness (`/tmp/diff_meta.py`) reproduces main's six-loader effective resolution (`_load_meta`,
|
||||
`declared_deps`, `is_enrolled`, `_recipe_extra_env`) from MAIN's recipe_meta files and diffs vs the
|
||||
BRANCH's `meta.load()` for all 21 recipes. After correcting one harness artifact (EXTRA_ENV default
|
||||
is `{}` not None), **0/21 recipes show any delta** for HEALTH_PATH/HEALTH_OK/DEPLOY_TIMEOUT/
|
||||
HTTP_TIMEOUT/BACKUP_CAPABLE/EXPECTED_NA/UPGRADE_BASE_VERSION/DEPS/WARM_CANONICAL + presence of
|
||||
READY_PROBE/BACKUP_VERIFY/UPGRADE_EXTRA_ENV/EXTRA_ENV/SCREENSHOT.
|
||||
4. **Validation gaps — CLOSED.** Crafted tmp recipe_metas: typo'd key → MetaError (with "did you mean
|
||||
DEPLOY_TIMEOUT?"); wrong type (`DEPLOY_TIMEOUT="str"`) → MetaError; callable on data key
|
||||
(`DEPLOY_TIMEOUT=lambda ctx:...`) → MetaError; `_PRIVATE`/lowercase-helper → loads clean (exemption
|
||||
works). All four behave per the locked decision.
|
||||
5. **meta.py read** — single `exec()`, frozen `RecipeMeta` generated from `KEYS`, `_coerce` rejects
|
||||
bool-as-int and callable-on-data-key; `non_default` compares vs registry default. No issues.
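
The four validation behaviors probed in item 4 can be illustrated with the sketch below. It is not the real `meta.py`: the three-key registry, `HOOK_KEYS` and the `validate` function are hypothetical stand-ins (the review says the real registry has 14 keys), and the "did you mean" hint is produced here with `difflib`, which may differ from the actual mechanism.

```python
import difflib


class MetaError(Exception):
    """Raised when a recipe_meta namespace fails validation."""


# Hypothetical registry: key name -> expected type.
KEYS = {"DEPLOY_TIMEOUT": int, "HEALTH_PATH": str, "EXTRA_ENV": dict}
HOOK_KEYS = {"EXTRA_ENV"}  # assumed: keys that may also be callables


def validate(namespace):
    """Return the validated ALL-CAPS keys from an exec()'d recipe_meta namespace."""
    clean = {}
    for name, value in namespace.items():
        if name.startswith("_") or not name.isupper():
            continue  # _PRIVATE names and lowercase helpers are exempt
        if name not in KEYS:
            close = difflib.get_close_matches(name, list(KEYS), n=1)
            hint = f" (did you mean {close[0]}?)" if close else ""
            raise MetaError(f"unknown key {name}{hint}")
        expected = KEYS[name]
        if callable(value):
            if name not in HOOK_KEYS:
                raise MetaError(f"{name}: callable not allowed on a data key")
        elif (isinstance(value, bool) and expected is not bool) or not isinstance(value, expected):
            raise MetaError(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
        clean[name] = value
    return clean
```

The explicit bool check is there because `isinstance(True, int)` is true in Python, so a plain type check would let a bool through on an int key.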
|
||||
|
||||
**Still UNVERIFIED for M1 (do NOT treat above as M1 PASS):** full `pytest tests/unit -q` +
|
||||
`pytest tests/concurrency -q` + `scripts/lint.sh` cold on the cc-ci host; R2 end-to-end through the
|
||||
real orchestrator screenshot path; P3 ctx-hook signature migration (assert byte-identical, legacy
|
||||
`lambda domain:` raises clear MetaError); P4/P5/P6; re-run the coverage diff on the FINAL branch
|
||||
(P3 changes hook signatures); recipe-test diffs are mechanical-only (no assertion weakening);
|
||||
HC2/F2-11/generic-floor integrity. These wait for the `claim(rcust): M1`.
|
||||
|
||||
### Interim pre-review of frozen P3 (branch @ fd02d9f) — @2026-06-10, cold from upstream clone
|
||||
|
||||
Builder landed P3 (uniform ctx hook convention) and moved to P4, so P3 is frozen. Pre-reviewed it.
|
||||
**No defects found.**
|
||||
|
||||
1. **Mechanical-migration discipline — HELD (no VETO trigger).** `git diff 8cd72fd..fd02d9f` over
|
||||
`tests/*/` shows ZERO changed assert/expected literals. Every hook change is purely
|
||||
`def HOOK(domain[, meta])` → `def HOOK(ctx)` + `domain` → `ctx.domain` in the body. Spot-checked
|
||||
cryptpad/mumble/ghost/lasuite-drive recipe_meta.py + lasuite-drive ops.py: seeded values, return
|
||||
dicts, paths, status codes, and the `pre_restore` `assert _psql(...) in (...)` are byte-identical
|
||||
apart from the `ctx.` deref.
|
||||
2. **HookCtx — present + complete.** `meta.HookCtx` frozen dataclass has all 5 documented fields
|
||||
(`.domain`, `.base_url`, `.meta`, `.deps`, `.op`); `meta.hook_ctx(domain, meta, op=…)` factory
|
||||
builds it and pulls `deps` from `$CCCI_DEPS_FILE`. All call sites migrated: run_recipe_ci
|
||||
`pre_<op>`, BACKUP_VERIFY; lifecycle `extra_env` + READY_PROBE; screenshot `SCREENSHOT(page, ctx)`.
|
||||
(NB my first pass falsely flagged "no HookCtx" — that was a STALE WORKTREE at P2; corrected by
|
||||
checking out fd02d9f. Logged here for honesty.)
|
||||
3. **Legacy-signature guard (P3.4) — PRESENT + works, live-probed.** `meta.check_hook_signature`
|
||||
exact-matches positional params and raises a CLEAR MetaError naming the P3 migration + HookCtx
|
||||
fields. Wired into both `load()` (recipe_meta hooks; SCREENSHOT expects `(page, ctx)`, rest
|
||||
`(ctx)`) and the orchestrator (ops.py `pre_<op>`). Crafted tmp metas: legacy `READY_PROBE(domain)`,
|
||||
`SCREENSHOT(page, domain, meta)`, `EXTRA_ENV(domain)` all → MetaError at load; `READY_PROBE(ctx)`
|
||||
loads clean. No silent mid-run TypeError path.
|
||||
4. **Coverage diff re-run at P3 head — still 0/21 deltas** (hook presence + all data keys unchanged).
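
For orientation, the exact-match guard from item 3 could look roughly like this. It is a sketch under assumptions: the real `meta.check_hook_signature` may take different arguments and word its error differently; only the behavior (exact positional-parameter match, clear MetaError at load) comes from the review.

```python
import inspect


class MetaError(Exception):
    """Raised when a hook does not follow the ctx convention."""


def check_hook_signature(name, fn, expected=("ctx",)):
    """Require fn's positional parameter names to equal `expected` exactly."""
    positional = tuple(
        p.name
        for p in inspect.signature(fn).parameters.values()
        if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    )
    if positional != tuple(expected):
        raise MetaError(
            f"{name}{positional} uses a legacy signature; "
            f"expected {name}({', '.join(expected)}). "
            "Hooks take a HookCtx with .domain, .base_url, .meta, .deps, .op."
        )
```

Checking at load time turns a would-be mid-run `TypeError` into an immediate, named failure; SCREENSHOT would be checked with `expected=("page", "ctx")`.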
|
||||
|
||||
Net: P1+P2+P3 all clean under cold adversarial probing. M1 still gated on full unit+concurrency+lint
|
||||
on the cc-ci host, P4–P6, R2 end-to-end via the real screenshot orchestrator path, and a final
|
||||
coverage re-diff. No findings filed; no VETO.
|
||||
|
||||
### Interim pre-review of frozen P4 (branch @ 29a28e2) — @2026-06-10T18:55Z, cold from fresh host clone
|
||||
|
||||
Builder landed P4 (custom-test ergonomics) and moved to P5, so P4 is frozen. Pre-reviewed it cold.
|
||||
**No defects found.** NOT an M1 verdict — M1 stays gated (see "Still UNVERIFIED" below).
|
||||
|
||||
Cold acceptance (fresh `git clone` on cc-ci host at 29a28e2, my own checkout — not the Builder's):
|
||||
- `cc-ci-run -m pytest tests/unit -q` → **184 passed** (exact match to claim; full suite, no
|
||||
cross-fixture pollution from the session-scoped `deps` fixture).
|
||||
- `cc-ci-run -m pytest tests/unit/test_discovery.py test_discovery_phase2.py
|
||||
test_conftest_fixtures.py -q` → 14 passed.
|
||||
- `nix develop .#lint --command scripts/lint.sh` → **lint: PASS** (ruff format/check, deadnix,
|
||||
shfmt, shellcheck, yamllint all clean).
|
||||
|
||||
Correctness probes:
|
||||
1. **Placement-rule claim ("zero in-repo users of top-level custom tests") — HOLDS.** Filesystem
|
||||
sweep of every `tests/<recipe>/test_*.py`: ALL are lifecycle names (test_{install,upgrade,
|
||||
backup,restore}.py). No top-level non-lifecycle custom exists in-repo, so dropping the top-level
|
||||
glob in `discovery.custom_tests` loses ZERO coverage. The lifecycle-name exclusion is retained
|
||||
inside functional/playwright as the double-run safety net.
|
||||
2. **Discovery diff — clean.** Top-level `glob(test_*.py)` branch removed; functional/ + playwright/
|
||||
subdir globs retained with `basename not in lifecycle_names` guard. Docstring + module header
|
||||
updated to state the placement RULE.
|
||||
3. **Test changes are adaptation + strengthening, NOT weakening (no VETO trigger).**
|
||||
- `test_discovery_phase2`: renamed to `..._placement_rule_...`; now ASSERTS the top-level
|
||||
`test_sso_smoke.py` is `not in names` (new negative assertion proving the behavior change),
|
||||
while functional/playwright customs are still `in names` and lifecycle name excluded.
|
||||
- `test_discovery::test_custom_tests_repo_local_gated`: repo-local custom moved from top-level
|
||||
into `functional/`; HC2 default-deny (`== []` when unapproved) and approved-case
|
||||
(`functional/test_sso.py in names`, `test_install.py` excluded) both INTACT. HC2 integrity
|
||||
preserved.
|
||||
4. **op_state fixture — correct.** Skips with clear reason on unset env / missing file / non-JSON
|
||||
(`except ValueError` catches JSONDecodeError); reads & returns parsed dict otherwise. Tests
|
||||
cover 3 of 4 paths (the non-JSON skip path is untested — minor coverage gap, not a defect; the
|
||||
branch is trivially correct by inspection).
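
The four op_state paths from probe 4 can be written as a pure helper that a fixture would wrap, which also makes the untested non-JSON path easy to cover. This is a sketch only: the env var name `CCCI_OP_STATE_FILE` and the helper `load_op_state` are hypothetical, and the real fixture calls `pytest.skip` instead of returning a reason.

```python
import json
import os


def load_op_state(env=None, var="CCCI_OP_STATE_FILE"):
    """Return (state, None) on success or (None, skip_reason) on any soft failure."""
    env = os.environ if env is None else env
    path = env.get(var)
    if not path:
        return None, f"{var} is not set"
    try:
        with open(path) as handle:
            return json.load(handle), None
    except FileNotFoundError:
        return None, f"{path} does not exist"
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return None, f"{path} is not valid JSON"
```

A fixture built on it would skip with the returned reason whenever the state is `None`.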
|
||||
|
||||
Net: P1+P2+P3+P4 all clean under cold adversarial probing; both halves of every phase claim
|
||||
(unit count + lint) reproduced cold on a fresh clone. No findings filed; no VETO.
|
||||
|
||||
**Still UNVERIFIED for M1 (do NOT treat above as M1 PASS):** P5 (manifest) + P6 (docs);
|
||||
`pytest tests/concurrency -q` cold; R2 end-to-end through the real orchestrator screenshot path;
|
||||
final coverage re-diff on the COMPLETE branch (P1–P6, all 21 recipes, effective customization set
|
||||
unchanged); recipe-test diffs mechanical-only across the whole branch; HC2/F2-11/generic-floor
|
||||
integrity at the final head. These wait for `claim(rcust): M1`.
|
||||
|
||||
### Interim pre-review of frozen P5 (branch @ 68954be) — @2026-06-10T19:06Z, cold from fresh host clone
|
||||
|
||||
Builder landed P5 (customization manifest) and moved to P6, so P5 is frozen. Pre-reviewed it cold.
|
||||
**No blocking defect; one secret-SURFACE observation raised (heads-up to Builder, NOT a VETO, NOT
|
||||
an M1 secret-leak failure).** NOT an M1 verdict.
|
||||
|
||||
Cold acceptance (fresh `git clone` on cc-ci host at 68954be, my own checkout):
|
||||
- `cc-ci-run -m pytest tests/unit -q` → **191 passed** (exact match to claim).
|
||||
- `nix develop .#lint --command scripts/lint.sh` → **lint: PASS**.
|
||||
|
||||
Primary adversarial target — SECRET LEAKAGE via the new manifest surface (D-gate: published logs +
|
||||
dashboard contain NO secrets, incl. generated app passwords):
|
||||
1. **Generated/runtime secrets — NOT exposed (gate holds).** `manifest.build` collects only:
|
||||
`meta_non_default` (static recipe_meta), hook NAMES (pre-ops/install_steps.sh/compose.ccci.yml),
|
||||
overlay FILENAMES, custom-test COUNTS, and env-override KEY names (printed `KEY=1`, value never
|
||||
rendered). It never touches `deps` (client_secret), `op_state`, abra-generated app passwords, or
|
||||
any env VALUE. The cardinal concern — generated app passwords on the dashboard — is structurally
|
||||
absent from this surface.
|
||||
2. **Cold all-recipes sweep.** Built+rendered the manifest for all 21 recipes on the host; grepped
|
||||
the rendered blocks AND the results.json `customization` payload for secret/password/token/key/
|
||||
credential and for any 32+ char high-entropy string. The ONLY hit, across every recipe, is
|
||||
plausible's `EXTRA_ENV.SECRET_KEY_BASE` =
|
||||
`"ccciplausibletestkeybase64charsexactlyforCIephemeral4567890123"`.
|
||||
3. **OBSERVATION (not a leak):** that value is a HARDCODED, committed, PUBLIC dummy CI constant
|
||||
(tests/plausible/recipe_meta.py, in the open-source repo) — not a generated or real secret.
|
||||
`meta_non_default` dumps EXTRA_ENV literal dicts verbatim into the log AND results.json (→
|
||||
dashboard), so a field literally named `SECRET_KEY_BASE` with a value now appears on the
|
||||
dashboard. No real secret is exposed (it's public), so this is NOT a D-gate failure and does NOT
|
||||
block P5. BUT it's a standing surface: (a) a dashboard secret-scan gets a true-positive-shaped
|
||||
hit on a public dummy (noise that could mask a real leak), and (b) if any recipe ever set a real
|
||||
secret-ish literal in a meta dict, the manifest would surface it unredacted. Flagged to Builder
|
||||
via BUILDER-INBOX as a heads-up to consider redacting values of sensitive-named meta keys before
|
||||
M1. Will re-examine on the real dashboard at the M1 cold-verify.
|
||||
4. **HC2-honoring — confirmed.** Manifest routes ALL repo-local reads through `discovery._gated`
|
||||
(ops.py loop direct; `install_steps`/`resolve_overlay_op`/`custom_tests` each call `_gated`
|
||||
internally). An unapproved repo-local recipe contributes nothing to the manifest.
|
||||
5. **Pure presentation — holds.** `build()` only reads files/env and returns a dict; `render()`
|
||||
formats a string. Called at run_recipe_ci.py:889-890 (print) + embedded at :1261 into results;
|
||||
no state mutation, no verdict influence. `_jsonable` renders callables as `'<hook>'` (so a
|
||||
callable EXTRA_ENV/READY_PROBE never leaks closure internals) and tuples→lists for JSON.
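
A minimal sketch of the kind of secret-shaped sweep described in item 2, assuming the rendered manifest block or the `customization` payload is available as plain text. The word list mirrors the grep terms named above; the entropy threshold, the token character class, and the function names are illustrative assumptions, not the harness's own scanner.

```python
import math
import re
from collections import Counter

# Name fragments taken from the grep terms listed in the review entry.
SENSITIVE_WORDS = re.compile(r"secret|password|token|key|credential", re.IGNORECASE)
# Assumption: "32+ char" means an unbroken run of token-like characters.
LONG_TOKEN = re.compile(r"[A-Za-z0-9+/=_-]{32,}")


def shannon_entropy(s: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def scan(text: str, min_entropy: float = 3.5) -> list[str]:
    """Return one hit per sensitive-looking name or long high-entropy token."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), 1):
        if SENSITIVE_WORDS.search(line):
            hits.append(f"{lineno}: sensitive name")
        for tok in LONG_TOKEN.findall(line):
            # min_entropy is a hypothetical cut-off; tune it against known dummies.
            if shannon_entropy(tok) >= min_entropy:
                hits.append(f"{lineno}: high-entropy token")
    return hits
```

A sweep like this is deliberately broad: it flags a public dummy such as plausible's `SECRET_KEY_BASE` exactly as it would flag a real secret, which is the noise concern raised in item 3.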
|
||||
|
||||
Net: P1–P5 all clean under cold adversarial probing; every phase claim (unit count + lint)
|
||||
reproduced cold. No findings filed; no VETO. One non-blocking secret-surface heads-up sent.
|
||||
|
||||
**Still UNVERIFIED for M1:** P6 (docs); `pytest tests/concurrency -q` cold; R2 end-to-end via the
|
||||
real orchestrator screenshot path; final coverage re-diff on the COMPLETE branch (all 21 recipes,
|
||||
effective customization unchanged); recipe-test diffs mechanical-only across the whole branch;
|
||||
HC2/F2-11/generic-floor integrity at final head; AND — at the M1 dashboard check — confirm the
|
||||
SECRET_KEY_BASE-named field on the real dashboard is the accepted public dummy (or redacted).
|
||||
These wait for `claim(rcust): M1`.
|
||||
|
||||
## M1 — implementation verified: **PASS** @2026-06-10T19:27Z (branch `restructure/recipe-custom` @ 858e0f5)
|
||||
|
||||
Cold-verified from TWO fresh clones on the cc-ci host (NEW=858e0f5, OLD=main pre-restructure;
|
||||
merge-base 49fb818 confirmed → `main..858e0f5` is exactly P1–P6). Verdict formed from the phase plan
|
||||
(SSOT), the code/git history, the STATUS verification facts, and my own cold re-runs — NOT from
|
||||
JOURNAL rationale (isolation discipline; I did not need to consult JOURNAL).
|
||||
|
||||
**All M1 Definition-of-Done items PASS:**
|
||||
|
||||
1. **Cold test suites — match claim exactly.** Fresh clone @858e0f5:
|
||||
`cc-ci-run -m pytest tests/unit -q` → **192 passed**; `tests/concurrency -q` → **23 passed**
|
||||
(untouched by this plan, proven); `nix develop .#lint --command scripts/lint.sh` → **lint: PASS**.
|
||||
|
||||
2. **Coverage diff (cardinal risk) — 0 REAL deltas / 21 recipes.** Wrote throwaway extractors that
|
||||
resolve EVERY recipe's effective customization in BOTH worlds — OLD via the legacy loaders
|
||||
(`_load_meta` + `lifecycle._recipe_extra_env` + `deps.declared_deps` + `_recipe_meta_flag`),
|
||||
NEW via `meta.load()` + `meta.extra_env/upgrade_extra_env` — for the common keys (HEALTH_*,
|
||||
timeouts, DEPS, EXTRA_ENV resolved at a fixed domain, UPGRADE_EXTRA_ENV, BACKUP_CAPABLE,
|
||||
EXPECTED_NA, UPGRADE_BASE_VERSION, READY_PROBE/BACKUP_VERIFY presence). Diff = **0 behavioral deltas**:
the only raw diffs were 20× `UPGRADE_EXTRA_ENV: None→{}` (unset default representation,
behaviorally identical), and mumble (most-customized: callable EXTRA_ENV→dict, UPGRADE_EXTRA_ENV,
READY_PROBE) is **byte-identical** old↔new.
|
||||
Deleted keys accounted for (no silent loss): `SKIP_GENERIC` (0 recipe users); `CHAOS_BASE_DEPLOY`
|
||||
→ overlay-presence (discourse+ghost, exactly the two shipping compose.ccci.yml — perfect 1:1, no
|
||||
change either direction); `OIDC_AT_INSTALL` → install-time made universal (drive+meet were
|
||||
already install-time). **lasuite-docs** declared DEPS but NOT OIDC_AT_INSTALL → OLD post-install,
|
||||
NEW install-time: an INTENTIONAL P2b consolidation, not a drop — flagged below for M2 validation.
|
||||
|
||||
3. **Assertion weakening (VETO-class) — NONE.** Full branch diff over all recipe test files
|
||||
(excl. harness unit/concurrency/regression): 18 removed asserts, 18 added. After mechanical
|
||||
normalization (`domain`→`ctx.domain`, `deps_creds`→`deps`, `MAX_USERS`→`_MAX_USERS`, whitespace)
|
||||
the removed and added assert sets are **IDENTICAL** — zero unmatched in either direction. Every
|
||||
change is a pure signature/fixture/constant rename; no expected value altered, no assert deleted.
|
||||
Spot-confirmed discourse/ghost `_psql(domain,…ci_marker…) in (…)` → `ctx.domain` only (expected
|
||||
tuple + SQL byte-identical). **No VETO.**
|
||||
|
||||
4. **Deleted-code fallout — clean.** No dangling LIVE refs to any of the 13 deleted symbols
|
||||
(`_recipe_meta`/`_load_meta`/`_recipe_extra_env`/`_recipe_meta_flag`/`declared_deps`/
|
||||
`is_canonical_enrolled`/`OIDC_AT_INSTALL`/`CHAOS_BASE_DEPLOY`/`SKIP_GENERIC`/`setup_custom_tests`/
|
||||
`deps_apps`/`deps_creds`/`deployed_app`). Only residue: stale DOC/comment mentions of
|
||||
`OIDC_AT_INSTALL` + `setup_custom_tests.sh` in PARITY.md files (non-blocking P6 cosmetic nit).
|
||||
|
||||
5. **Validation gaps — closed.** Cold-probed `meta.load()` with synthetic bad metas: typo'd key,
|
||||
str-on-int, bool-as-int, callable-on-data-key, legacy hook sig `READY_PROBE(domain)`, and unknown
|
||||
key ALL → `MetaError` (clear, names the offending file/key). Clean + underscore-private-helper
|
||||
metas load fine (no false positives). No silent pass.
|
||||
|
||||
6. **R2 fixed end-to-end.** Cold proof through the REAL load path: a recipe declaring
|
||||
`def SCREENSHOT(page, ctx)` is surfaced by `meta.load()` and resolved callable by
|
||||
`screenshot._load_screenshot_hook` (old L1 allowlist dropped it — now arrives); orchestrator wires
|
||||
it `run_recipe_ci.py:1029 capture(…, recipe_meta=meta)` → `hook(page, hook_ctx(domain, meta))`.
|
||||
Absent recipe → None (default landing-page path). Legacy `SCREENSHOT(page, domain, meta)` sig
|
||||
rejected at load.
|
||||
|
||||
7. **HC2 / F2-11 / generic-floor integrity — preserved.** Cold-probed `discovery.custom_tests` +
|
||||
`install_steps`: UNAPPROVED repo-local → `[]` / `None` (default-deny holds); APPROVED → surfaced.
|
||||
`sso_dep_unverified` (F2-11) logic UNCHANGED (only a comment edited) — a deps-not-ready run that
|
||||
skips ≥1 `requires_deps` test still suppresses the green signal. Generic floor `_skip_generic`
|
||||
default = run (additive); opt-out now env-only (same env vars as before; the 0-user meta key
|
||||
removed) and surfaced LOUDLY in CI + flagged `!!` in the manifest — strictly stronger, never
|
||||
silent.
|
||||
|
||||
8. **(Bonus) P5 secret-surface heads-up RESOLVED + verified.** The Builder landed `858e0f5`
|
||||
redacting secret-named meta values in the manifest (my P5 BUILDER-INBOX ask). Cold-verified:
|
||||
`plausible.EXTRA_ENV.SECRET_KEY_BASE` → `<redacted>` in BOTH the log block and results.json;
|
||||
recursive into nested dict keys; word-segment `(^|_)KEY(_|$)` regex avoids over-match
|
||||
(KEYCLOAK_* passes). All-21-recipe sweep: exactly 1 redaction, ZERO over-redaction, ZERO
|
||||
under-redaction (no secret-shaped value remains). Regression test
|
||||
`test_manifest_redacts_sensitive_named_values` present.
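
The word-segment redaction verified in item 8 can be sketched as below. Only the `(^|_)KEY(_|$)` shape and the `<redacted>` placeholder come from the entry above; the extra sensitive words, the function name `redact`, and the recursion details are assumptions for illustration, not the code landed in `858e0f5`.

```python
import re

# KEY must be a whole underscore-delimited segment, so KEYCLOAK_URL is left
# alone while SECRET_KEY_BASE is caught.
# Assumption: SECRET/PASSWORD/TOKEN are illustrative additions, not a confirmed list.
SENSITIVE = re.compile(r"(^|_)(KEY|SECRET|PASSWORD|TOKEN)(_|$)", re.IGNORECASE)


def redact(value, name: str = ""):
    """Recursively replace values stored under sensitive-looking key names."""
    if isinstance(value, dict):
        # Each nested value is judged by its own key name.
        return {k: redact(v, str(k)) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples become lists, matching the JSON-friendly rendering noted earlier.
        return [redact(v, name) for v in value]
    if name and SENSITIVE.search(name):
        return "<redacted>"
    return value
```

Matching on key names rather than on value shape keeps the rule predictable: a value is hidden because of what it is called, so an over-match or under-match can be pinned by a unit test on names alone.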
|
||||
|
||||
**Verdict: M1 PASS.** No findings filed, no VETO.
|
||||
|
||||
**This does NOT clear `## DONE`.** Per the phase DoD, DONE requires a fresh Adversary PASS for BOTH
|
||||
M1 *and* M2. M2 (merged-main real-CI regression sweep vs the committed baseline matrix) is still
|
||||
unverified. M2 watch-items I will specifically re-check from run logs:
|
||||
- **lasuite-docs OIDC is now install-time** (post→install change above) — must pass a real run with
|
||||
OIDC wired at install (skip-count 0 on its `requires_deps` tests).
|
||||
- the customization spot-checks the plan §M2.4 enumerates (mumble READY_PROBE tcp lines, cryptpad
|
||||
SANDBOX_DOMAIN, ghost/discourse BACKUP_VERIFY + overlay copy + auto-chaos base deploy, lasuite-*
|
||||
deps provisioning + OIDC tests ran, immich ops.py seeds, manifest block present in every log,
|
||||
screenshot.png where capture succeeded).
|
||||
- canary suite (RED canaries still caught at intended tier) + per-recipe level == baseline matrix.
|
||||
- zero leaked apps after teardown.
|
||||
|
||||
### M2-prep — independent hook-port audit (shell→python / best-effort↔fatal drift) @2026-06-10T20:55Z
|
||||
|
||||
Triggered by the lasuite-drive regression (below), which my M1 PASS MISSED: my M1 coverage diff
|
||||
compared recipe_meta KEYS (resolved values), not ops.py hook BODIES, and my assertion scan matched
|
||||
`assert ` not `raise AssertionError`. So a hook that flipped best-effort→fatal was invisible to my
|
||||
M1 method. M2 (real-CI sweep) caught it — the safety net working as designed. I then audited ALL
|
||||
hook ports cold (`git diff c2508c7..origin/main` per recipe ops.py + the 2 setup_custom_tests.sh
|
||||
ports), filtering for non-mechanical error-handling (raise/assert/except/exit/timeout/poll changes):
|
||||
|
||||
- **lasuite-drive `pre_install`** — GENUINE rcust regression (Builder-disclosed, I confirmed):
|
||||
OLD setup_custom_tests.sh bucket poll fell through on 90s timeout (best-effort, no failure; the
|
||||
custom-tier `test_minio_storage.py` upload→list→download is the real gate); NEW port added a
|
||||
terminal `raise AssertionError` → deterministic install RED when the bucket appears just after
|
||||
90s. Fix-forward APPROVED (restore best-effort print+return, scoped to line-54 only; conditioned
|
||||
on an L5 re-run + my diff re-verify). See approval entry in BUILDER-INBOX history (commit 57c66ad).
|
||||
- **lasuite-docs `install_steps.sh`** — INTENTIONAL P2b change, NOT a defect: OLD setup_custom_tests
|
||||
did `exit 1` on missing deps/null KC creds; NEW does `exit 0` (no-op) for missing-deps (gated now
|
||||
by F2-11: the `@requires_deps` OIDC test skips → `sso_dep_unverified` suppresses green) BUT
|
||||
preserves `exit 1` on secret-insert failure. Consistent with the install-time-deps redesign.
|
||||
WATCH-ITEM (residual): the missing-deps path now relies entirely on F2-11; the sweep didn't
|
||||
exercise it (deps were ready, skip-count 0). Mechanism verified present at M1; not blocking.
|
||||
- **All other ops.py** (cryptpad, discourse, ghost, immich, keycloak, lasuite-meet, matrix-synapse,
|
||||
mattermost-lts, mumble, n8n, plausible, custom-html) — pure mechanical ctx migration
|
||||
(`domain`→`ctx.domain`, `meta`→`ctx.meta`); expected tuples/strings byte-identical (spot-checked
|
||||
keycloak 201/409 + 204/200, discourse/ghost _psql ci_marker). No error-handling drift.
|
||||
|
||||
Net: exactly ONE accidental hook-port regression (lasuite-drive), now under approved fix. No other
|
||||
best-effort↔fatal flips. This audit closes the M1-method gap for the hook bodies.
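
A hedged sketch of the filter used in this audit: keep only added or removed diff lines that touch error handling, so that mechanical `ctx` renames drop out. The keyword list is the one named above; the function name and the assumption that the input is plain `git diff` unified output are mine.

```python
import re

# Keywords from the audit description; word boundaries keep "exit" from
# matching inside unrelated identifiers.
ERROR_HANDLING = re.compile(
    r"\b(raise|assert|except|exit|timeout|poll)\b", re.IGNORECASE
)


def error_handling_changes(unified_diff: str) -> list[str]:
    """Return added/removed diff lines that mention error-handling keywords."""
    flagged = []
    for line in unified_diff.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers, not content
        if line.startswith(("+", "-")) and ERROR_HANDLING.search(line):
            flagged.append(line)
    return flagged
```

Note that `\bassert\b` alone would not match `AssertionError`; it is the `raise` keyword that catches the lasuite-drive case, which is the same gap the M1 assertion scan had.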
|
||||
|
||||
---
|
||||
|
||||
### M2 proof-run independent analysis (cold, Adversary) @2026-06-10T23:53Z
|
||||
|
||||
M2 is NOT yet claimed by the Builder; this is my independent read of the proof runs sitting on
|
||||
cc-ci (`/var/lib/cc-ci-runs/{m2b-*,ab-*-oldmain}`), parsed myself via jq (NOT trusting Builder
|
||||
narrative). The 6 first-sweep mismatches break down as follows.
|
||||
|
||||
**Confirmed root fact — REF MISMATCH is real (I verified, not taken on faith).** Every baseline
|
||||
matrix run used a *PR-head* ref; the first M2.3 sweep used each mirror's *default-branch head* — a
|
||||
different commit. Independently confirmed via `results.json.ref`:
|
||||
| recipe | baseline run/ref/level | sweep ref/level |
|---|---|---|
| discourse | 184 / 7ae7b0f76efb / L4 | 7d53d4ec390f / L2 |
| plausible | 308 / 13458fac56a1 / L4 | da159375d89a / L2 |
| mattermost-lts | 196 / a333e31a6002 / L4 | 41c9eb8e5f34 / L2 |
| immich | 307 / 107d7220adce / L4 | 7eb3937a82d0 / L2 |
| lasuite-drive | 189 / ffa7d585afa2 / L5 | f4135d78201e / L0 |
|
||||
So the sweep was NOT apples-to-apples vs the baseline matrix. Reconciliation requires either
|
||||
(a) re-run at the baseline ref on new main == baseline level, or (b) A/B same-ref old-vs-new main
|
||||
== same level. Status per recipe:
|
||||
|
||||
- **immich** — m2b-immich (new main, baseline ref 107d7220adce) = **L4 == baseline L4. CLEAN.**
|
||||
- **mattermost-lts** — m2b (new main, a333e31a6002) = **L4 == baseline L4. CLEAN.**
|
||||
- **plausible** — m2b (new main, 13458fac56a1) = **L4 == baseline L4. CLEAN.**
|
||||
→ these three: restructure proven INNOCENT (baseline ref reproduces baseline level on merged main).
|
||||
- **bluesky-pds** — ab-bluesky-pds-oldmain (OLD main, b2d86efba3f1) = L0 == new-main sweep L0 at
|
||||
same ref → restructure-NEUTRAL at the sweep ref. (Baseline is "L4-equiv, pre-results-era", no run
|
||||
id — softer baseline; A/B neutrality is the available evidence.)
|
||||
- **discourse — NOT yet clean. OPEN.** Two *distinct* flake modes seen, and the A/B was run at the
|
||||
wrong ref to close the gap:
|
||||
- baseline 184 (OLD main, 7ae7b0f): all pass → L4.
|
||||
- m2b-discourse (NEW main, SAME ref 7ae7b0f): **upgrade FAILED**, HC1 guard fired —
|
||||
"upgrade deployed chaos commit 'eb96de94+U', not intended PR-head '7ae7b0f76efb' — re-checkout
|
||||
to code-under-test failed (HC1)" → L1. ← same-ref old=L4 vs new=L1 discrepancy, UNexplained.
|
||||
- ab-discourse-oldmain (OLD main, 7d53d4ec): **restore FAILED** (ci_marker truncated-dump race)
|
||||
→ L2 == new-main sweep L2 at that ref → neutrality proven, but for the RESTORE mode at the
|
||||
DEFAULT-head ref, NOT for the L1/upgrade-HC1 mode at the baseline ref.
|
||||
- Net: the clean A/B (ref 7ae7b0f on OLD main vs NEW main) that would explain L4→L1 was NOT run.
|
||||
The upgrade re-checkout/HC1 path lives in run_recipe_ci.py/lifecycle which the meta-param
|
||||
threading DID touch — so "pre-existing flake" is plausible but UNPROVEN here. To clear: run
|
||||
discourse @7ae7b0f on OLD main (does it deterministically reproduce L4, or also flake to L1?),
|
||||
and/or repeat @7ae7b0f on new main to characterise the HC1 re-checkout as a race. The HC1 guard
|
||||
FIRING (not silently passing the wrong commit) is the safety net working — good — but it means
|
||||
the upgrade did not exercise the PR code, so the run is inconclusive, not a clean baseline match.
|
||||
- **lasuite-drive** — fix-forward 1357544 (restore best-effort bucket poll) landed; needs a fresh
|
||||
L5 run at the baseline ref ffa7d585afa2 on merged main to confirm baseline. m2rr/earlier runs
|
||||
predate or used the default head — NOT yet a clean baseline match. OPEN.
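
The apples-to-apples rule applied in the per-recipe status above can be sketched as follows: a level comparison only counts when both runs tested the same ref. The `ref` field is the one read from `results.json` in this entry; the `level` field name, the helper names, and the return labels are assumptions for illustration.

```python
import json
from pathlib import Path


def load_run(run_dir: Path) -> dict:
    """Load a run's results.json (path layout assumed from the run dirs above)."""
    return json.loads((run_dir / "results.json").read_text())


def compare(baseline: dict, candidate: dict) -> str:
    """Classify a candidate run against its baseline run."""
    if baseline["ref"] != candidate["ref"]:
        # Different commit under test: the levels are not comparable at all.
        return "ref-mismatch"
    if baseline["level"] == candidate["level"]:
        return "match"
    return "level-differs"
```

For example, discourse baseline 184 against the first sweep yields `ref-mismatch`, while the m2b-immich re-run at the baseline ref yields `match`.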
|
||||
|
||||
**M2 disposition: still OPEN — no PASS.** 3/6 cleanly reconciled (immich/mattermost/plausible);
|
||||
bluesky neutral-at-sweep-ref; discourse + lasuite-drive NOT yet closed. I will require, at the M2
|
||||
claim: (1) discourse same-ref A/B (or repeat) explaining L4→L1; (2) a clean lasuite-drive L5 at
|
||||
baseline ref; (3) my own cold re-parse of every per-recipe level vs baseline; (4) the M2.4
|
||||
customization-executed spot-greps; (5) zero leaked apps. Recorded a BUILDER-INBOX heads-up on the
|
||||
discourse-HC1 gap so it is addressed in the claim, not glossed as "the restore flake".
|
||||
|
||||
### M2 proof-run progress + self-correction @2026-06-11T00:05Z
|
||||
|
||||
Builder is running (independently, matching my inbox ask) the decisive A/B serially on the box:
|
||||
`m2-proof.sh` → lasuite-drive @ffa7d585afa2 PR=1 (post-fix-forward 1357544) on merged main 5c0676b,
|
||||
then discourse @7ae7b0f76efb **PR=2** on merged main (m2p-discourse); `m2-proof2.sh` (queued) →
|
||||
discourse @7ae7b0f76efb **PR=2** on OLD main (/root/m2-oldmain, ab-discourse-7ae7b0f-oldmain).
|
||||
|
||||
**Self-correction to my 23:53Z discourse analysis:** my m2b-discourse run used **PR=0**, but the
|
||||
upgrade HC1 guard resolves the *PR head* for the re-checkout. The L1 failure message ("deployed
|
||||
chaos commit 'eb96de94+U', not PR-head 7ae7b0f — re-checkout failed") is plausibly a **PR=0
|
||||
artifact** (no real PR to resolve the head from), NOT a restructure regression. The Builder's proof
|
||||
runs correctly use PR=2 (matching baseline run 184's pr=2). So the apples-to-apples comparison I
|
||||
need is m2p-discourse (PR=2, new main) vs ab-discourse-7ae7b0f-oldmain (PR=2, old main) vs baseline
|
||||
184 (PR=2, old main, L4). I will cold-verify those three when they land; my L4→L1 concern is on
|
||||
hold pending the PR=2 result, not yet a confirmed regression. Live lasu-f68b63 stack = active
|
||||
lasuite-drive proof run (expected, not a leak).
|
||||
|
||||
### M2 fix-forward APPROVE: be2026a (services_converged completed-one-shot rule) @2026-06-11T00:31Z
|
||||
|
||||
Builder proposed a 2nd lasuite-drive P2b fix on branch `fix/converged-oneshot @ be2026a` and asked
|
||||
approval before merging to main (M2 "trivial fix-forward w/ Adversary approval" path). Cold-verified
|
||||
independently (fresh clone of be2026a at /root/adv-be2026a on cc-ci, NOT the Builder's working tree):
|
||||
|
||||
- **Diff** (`git diff origin/main..be2026a runner/harness/lifecycle.py`, read myself): in
|
||||
`services_converged`, a `cur != want` deficit now passes ONLY if `docker service ps <svc>` shows
|
||||
ALL task states == `Complete`. Conservative: any Running/Preparing/Pending (spinning up) or
|
||||
Failed/Rejected (broken) in the deficit still returns False; no-tasks-yet still False; plain N/N
|
||||
and 0/0 unchanged. Targeted addition, not a rewrite.
|
||||
- **False-green analysis (my own):** only `restart_policy:none` one-shots ever show `Complete`; a
|
||||
normal crashed service shows Failed/Running(restarting), never Complete. Even if converge passed
|
||||
on a completed-but-ineffective one-shot, two INDEPENDENT gates still catch it — the generic
|
||||
`test_serving` HTTP floor and the custom-tier functional test (lasuite-drive
|
||||
`test_minio_storage.py` upload→list→download is the real bucket gate). Defense-in-depth holds; I
|
||||
could not construct a false-green path.
|
||||
- **Tests** `tests/unit/test_converged_oneshot.py` (read + cold-ran): 7 cases pin exactly the
|
||||
non-vacuity criteria — completed→converged, Failed→NOT, mixed Complete+Failed→NOT (covers the
|
||||
`docker service ps` history concern), Preparing→NOT, no-tasks→NOT, N/N→converged, 0/0→converged.
|
||||
- **Cold suite+lint from fresh be2026a checkout:** `cc-ci-run -m pytest tests/unit -q` → **199
|
||||
passed**; the 7 new tests pass alone; `nix develop .#lint --command scripts/lint.sh` → **lint:
|
||||
PASS**. Matches Builder's claim.
|
||||
- **Root cause judged genuine P2b regression** (hook moved into ops.py pre_install runs BEFORE the
|
||||
install assert; the completed one-shot's 0/1 then burns DEPLOY_TIMEOUT in the converge poll). The
|
||||
fix accepts a genuinely-healthy deploy (HTTP 200, all other services 1/1) the old `cur!=want`
|
||||
wrongly rejected — correction, not masking.
|
||||
- **Not on main** — confirmed `all(s == "Complete")` absent from origin/main; Builder held the gate.
|
||||
- **Disclosed semantic delta** (a failing one-shot now blocks install convergence earlier vs later
|
||||
at custom-tier): ACCEPTED — both paths RED, no false-green, no enrolled recipe has a
|
||||
baseline-failing one-shot.
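
A minimal sketch of the completed-one-shot rule as described in the diff bullet above, with the seven pinned cases as a usage example. This is a stand-in written from the description, not the code in `lifecycle.py`; the function name `converged` and its parameters are hypothetical, and the real function reads task states from `docker service ps` rather than taking them as an argument.

```python
def converged(cur: int, want: int, task_states: list[str]) -> bool:
    """Decide whether a service counts as converged under the one-shot rule."""
    if cur == want:
        return True  # plain N/N and 0/0 are unchanged
    if not task_states:
        return False  # deficit with no tasks yet: still spinning up
    # A deficit passes only if every task finished cleanly; any Running,
    # Preparing, Pending, Failed or Rejected task keeps it unconverged.
    return all(state == "Complete" for state in task_states)


cases = {
    "completed": converged(0, 1, ["Complete"]),
    "failed": converged(0, 1, ["Failed"]),
    "mixed": converged(0, 1, ["Complete", "Failed"]),
    "preparing": converged(0, 1, ["Preparing"]),
    "no-tasks": converged(0, 1, []),
    "n-of-n": converged(1, 1, ["Running"]),
    "zero-of-zero": converged(0, 0, []),
}
```

Requiring every task state to be `Complete`, rather than any, is what covers the task-history concern: one failed attempt in the history keeps the service unconverged.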
|
||||
|
||||
**VERDICT: fix-forward be2026a APPROVED, conditional on:**
|
||||
1. Post-merge lasuite-drive proof re-run @ffa7d585afa2 PR=1 lands **L5** (binding end-to-end proof
|
||||
the fix resolves the converge hang — if it doesn't, the diagnosis was wrong and approval voids).
|
||||
2. I re-verify the MERGED diff == be2026a diff (no extra change sneaks in at merge).
|
||||
3. discourse PR=2 A/B pair (m2p-discourse / ab-discourse-7ae7b0f-oldmain — no one-shots, unaffected
|
||||
by this fix) completes and I cold-verify those levels too.
|
||||
This APPROVE does NOT clear M2; M2 still needs all per-recipe levels reconciled + my independent
|
||||
sample re-check + zero-leak teardown.
|
||||
|
||||
### be2026a merge cold-verify — condition #2 SATISFIED @2026-06-11T00:42Z
|
||||
|
||||
Builder merged be2026a as 6cabbe7 (build 350 green, origin/main now b4505ac). Independently checked:
|
||||
`diff origin/main:runner/harness/lifecycle.py be2026a:...` → **IDENTICAL**; the merged
`tests/unit/test_converged_oneshot.py` → **IDENTICAL** to be2026a. Clean merge, no extra change
slipped in; approval condition #2 met. m2p-lasuite-drive (pre-fix) landed L0 (install/converge
timeout), which is the diagnosed symptom. (Builder disclosed, b4505ac, that it cut the doomed burn
short with SIGINT; the binding proof is the post-fix m2p2 re-run.) REMAINING be2026a conditions:
#1 post-fix lasuite-drive L5, #3 discourse PR=2 A/B cold-check. Both are pending (m2p-discourse
running, then ab-oldmain, then m2p2-lasuite-drive).
|
||||
|
||||
### be2026a conditions CLEARED + SSO-baseline staleness finding (independent) @2026-06-11T01:12Z
|
||||
|
||||
Reached the conclusions below COLD (own git archaeology + run-dir jq) BEFORE reading the Builder's
|
||||
01:10Z inbox — which then concurred. Anti-anchoring preserved (no JOURNAL read; inbox read after my
|
||||
own derivation).
|
||||
|
||||
**be2026a fix-forward — ALL 3 CONDITIONS SATISFIED → fix-forward FULLY CLEARED:**
|
||||
1. **Post-fix lasuite-drive (m2p2, merged main 6cabbe7, ffa7d585afa2, PR=1): L4, rc=0, 3m19s.**
|
||||
Independently verified: flags clean_teardown=true + no_secret_leak=true; all 4 essential rungs
|
||||
pass; `test_minio_storage::...object_roundtrip` PASSED; `test_oidc_..._keycloak` PASSED. The
|
||||
install converge no longer hangs — both fix-forwards (1357544 best-effort poll + 6cabbe7
|
||||
completed-one-shot converge) exercised in one run. The literal "L5" in my condition is
|
||||
**unmeetable on current code and NOT an rcust effect** — see staleness finding below; I accept
|
||||
the L4-equivalence. Fix works end-to-end.
|
||||
2. **Merged diff == branch diff** — verified earlier (4428e76): lifecycle.py + test file
|
||||
byte-identical to be2026a.
|
||||
3. **discourse A/B — restructure-NEUTRAL.** m2p-discourse (NEW main, 7ae7b0f, PR=2) = L1 and
|
||||
ab-discourse-7ae7b0f-oldmain (OLD main, SAME ref, SAME PR=2) = L1, SAME stage (upgrade), SAME
|
||||
message (`eb96de94+U` HC1 re-checkout). old==new byte-identical → rcust did NOT regress discourse.
|
||||
The L4(184)→L1 vs baseline is pre-existing env drift since 06-05 (filed below), not rcust.
|
||||
|
||||
**FINDING [adversary] — M2 baseline matrix has 3 STALE L5 entries (lasuite-docs/drive/meet).**
|
||||
Independently established: the level ladder dropped 6-rung(L5)→4-rung(max L4, integration &
|
||||
recipe-local now OPTIONAL/non-laddered) in mainline PR#6 (c51cd84 "4-rung ladder", + 46e2cdb),
|
||||
which `git merge-base --is-ancestor c51cd84 01e6d49^` confirms is an ANCESTOR OF PRE-RCUST MAIN.
|
||||
The rcust merge touches level.py NOT AT ALL and results.py by +4 cosmetic P5 lines; compute_level
|
||||
+ derive_rungs are byte-identical old-main↔merged-main. So NO current-code run (rcust or pre-rcust)
|
||||
can produce L5; baselines 188/189/204 (L5, integration:pass) were recorded under the OLD schema
|
||||
(run 204 ran 06-09 hours before the refactor deployed). **rcust is INNOCENT of L4≠L5.** Integration
|
||||
coverage is NOT lost: the requires_deps OIDC tests EXECUTE and PASS (skip-count 0) on current code —
|
||||
verified in m2p2 AND the sweep's m2r-lasuite-docs (`test_oidc_login_via_keycloak` +
|
||||
`test_oidc_password_grant_...` PASSED) and m2r-lasuite-meet (`...password_grant...` PASSED).
|
||||
ACCEPTED equivalence for the M2 matrix: **old L5 ≡ new L4 (all 4 essential rungs pass) + requires_deps
|
||||
OIDC test PASSED (skip-count 0)**. Under this, lasuite-docs (m2r L4) / lasuite-meet (m2r L4) /
|
||||
lasuite-drive (m2p2 L4) all MATCH. (Note: this validates — but corrects the basis of — the Builder's
|
||||
first-sweep "lasuite-docs/meet matched baseline"; they are L4+OIDC, not numeric L5.) This is a
|
||||
matrix-staleness correction, NOT a rcust regression; no VETO.
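
The accepted equivalence can be written down as a small predicate. This is a sketch of the rule stated in this finding only; the function and parameter names are hypothetical, and how the OIDC pass and skip counts are extracted from a run is left out.

```python
def matches_baseline(
    baseline_level: int,
    new_level: int,
    oidc_passed: bool = False,
    oidc_skipped: int = 0,
) -> bool:
    """Compare a current-code level with a baseline recorded under either schema."""
    if baseline_level == 5:
        # Stale 6-rung entry: current code tops out at L4, so require L4 plus
        # the requires_deps OIDC test having actually run and passed.
        return new_level == 4 and oidc_passed and oidc_skipped == 0
    return new_level == baseline_level
```

Keeping the OIDC condition inside the L5 branch is the point of the equivalence: a bare L4 with the integration test skipped would not count as a match.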
|
||||
|
||||
**Still OPEN for the M2 verdict (my side):** (a) per-recipe levels reconciled vs the CORRECTED
|
||||
baseline for all 21; (b) bluesky-pds is L0 on BOTH old & new main (upstream image
|
||||
`Cannot find module index.js`) — restructure-neutral but also cannot match its L4-equiv baseline on
|
||||
ANY current run → needs a DECISIONS/DEFERRED note as non-rcust upstream breakage, not a silent
|
||||
mismatch; (c) the 2 drone-path !testme runs (immich#2/plausible#3); (d) zero-leak teardown sweep;
|
||||
(e) my own independent re-check of ≥5 recipes' logs + ALL mismatches before any M2 PASS.
|
||||
|
||||
---
|
||||
|
||||
## M2 — merged-main real-CI regression sweep: **PASS** @2026-06-11T01:15Z
|
||||
|
||||
Cold-verified the M2 claim (STATUS gate "M2 CLAIMED ~01:30Z") from my own clone + direct on cc-ci,
re-running/re-parsing rather than trusting Builder logs. Every M2.0–M2.4 item holds.
|
||||
|
||||
**M2.2 canaries — cold RE-RAN myself** from a fresh `origin/main` checkout (/root/adv-be2026a @
|
||||
origin/main): `cc-ci-run -m pytest tests/regression/ -m canary -v` → **7/7 passed (301s)**, incl.
|
||||
`bad-false-green` (the false-green detector) + all four RED canaries (bad-install/upgrade/backup/
|
||||
restore) caught at their designed tier. The level system is NOT inflating. (log /root/adv-canary.log)
|
||||
|
||||
**M2.3 per-recipe — all 21 reconciled (cold jq on each run dir):**
|
||||
- 13 clean: cryptpad/custom-html/ghost/hedgedoc/keycloak/matrix-synapse/n8n/uptime-kuma = L4;
|
||||
mailu/custom-html-tiny = L2 (backup_restore N/A); mumble = L4 (deploy-count=1) — all == baseline,
|
||||
clean_teardown=true.
|
||||
- 2 designed-bad canaries genuinely exercised: bkp-bad rungs backup_restore=**fail** (backup=fail);
|
||||
rst-bad backup_restore=**fail** (backup=pass→restore=fail). The L1 cap is upgrade-N/A ladder
|
||||
semantics; the designed failure is recorded in the rung (verified — NOT a coincidental
|
||||
level-match).
|
||||
- immich/mattermost-lts/plausible: **L4 @ exact baseline refs** (m2b-*) — baseline REPRODUCED on the
|
||||
restructured harness (cold-verified earlier this session).
|
||||
- discourse: m2p-discourse (NEW main) == ab-discourse-7ae7b0f-oldmain (OLD main) — SAME ref/PR=2,
|
||||
SAME stage, SAME upgrade-HC1 message (`eb96de94+U`), SAME L1. **old==new ⇒ rcust-neutral**; the
|
||||
L4(184)→L1 is pre-existing env drift since 06-05 (DEFERRED.md), NOT caused by the restructure.
|
||||
- lasuite-docs/-meet/-drive: L4 all-rungs-pass + requires_deps OIDC test PASSED (skip-count 0)
|
||||
[lasuite-drive m2p2 also MinIO PASSED, post-both-fixes, rc=0]. Their "L5" baselines are STALE:
|
||||
the 6→4-rung ladder landed in mainline c51cd84 (PR#6), which `git merge-base --is-ancestor
|
||||
c51cd84 01e6d49^` confirms PREDATES the rcust merge; level.py untouched by the merge, derive_rungs
|
||||
byte-identical old↔new. **rcust-innocent; integration coverage preserved** (OIDC tests execute &
|
||||
pass). Accepted equivalence old L5 ≡ new L4-all-pass + OIDC-pass.
|
||||
- bluesky-pds: EXCLUDED — `Cannot find module /app/index.js` crash-loop on BOTH old & new main at
|
||||
every ref → upstream image breakage, rcust-neutral. DEFERRED.md note present.
|
||||
|
||||
**M2.3 drone→harness path:** drone builds **356 (immich) + 357 (plausible)** = `build_event=custom`
|
||||
(bridge-triggered; distinct from push builds 358-361), trigger=autonomic-bot, both **success**
|
||||
(verified in drone sqlite DB); run dirs 356/357 = immich L4 pr=2 / plausible L4 pr=3, customization
|
||||
manifest present, clean_teardown=true.
|
||||
|
||||
**M2.4 customizations actually executed (cold-grep):** manifest block **21/21** logs; mumble
|
||||
`ready-probe OK (tcp 3x) 127.0.0.1:64738`; ghost `ccci-overlay: provided compose.ccci.yml ...
|
||||
base deploy auto-chaos` (P2a first-class path live); cryptpad `EXTRA_ENV='<hook>'`; immich
|
||||
`ops.py[pre_backup,pre_restore,pre_upgrade]` + `pre-op seed` lines (migrated ctx hooks run).
|
||||
|
||||
**Teardown:** `docker stack ls` = infra (backups/bridge/dashboard/reports/drone/traefik) +
|
||||
warm-keycloak ONLY, **zero leaked app stacks** (checked after ALL runs incl. drone-path).
|
||||
|
||||
**Fix-forwards (both Adversary-approved, additive):** 1357544 (lasuite-drive best-effort poll, appr
|
||||
57c66ad) + be2026a/6cabbe7 (services_converged completed-one-shot, appr a531746) — merged diff ==
|
||||
branch diff, all 3 be2026a conditions cleared (24a203a). Cold unit suite on post-fix main = 199
|
||||
passed, lint PASS.
|
||||
|
||||
**VERDICT: M2 PASS.** No regression CAUSED BY the restructure: every deviation from the baseline
|
||||
matrix is proven rcust-neutral by same-ref old-vs-new A/B (discourse, bluesky) or is a pre-rcust
|
||||
stale-schema artifact with coverage preserved (3 lasuite), all documented in DEFERRED.md — not a
|
||||
silent mismatch. The false-green detector is green on my own cold canary run. No findings filed,
|
||||
no VETO.
|
||||
|
||||
**M1 PASS (01f9f70) + M2 PASS (this entry) both stand** → the phase DoD handshake is satisfied; the
|
||||
Builder may write `## DONE` to STATUS-rcust.md. (M1's unit+lint acceptance still holds on post-fix
|
||||
main: 199 passed / lint PASS, the fix-forwards being additive + separately approved.)
|
||||
|
||||
184
REVIEW-shot.md
Normal file
@ -0,0 +1,184 @@
|
||||
# REVIEW-shot.md — Adversary verdicts, phase `shot` (recipe screenshot audit & repair)
|
||||
|
||||
Owner: Adversary loop. Append-only verdict log. Gates: M1 (audit+diagnosis), M2 (all working).
|
||||
SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase-shot-screenshots.md`.
|
||||
|
||||
No gate CLAIMED yet (phase just opened; Builder has not bootstrapped STATUS-shot.md). Doing
|
||||
independent cold ground-truth prep below so M1/M2 cold-verify is fast and un-anchored.
|
||||
|
||||
---
|
||||
|
||||
## Independent cold pre-audit (Adversary, @2026-06-11T01:20Z)
|
||||
|
||||
Method: ssh cc-ci, scanned `/var/lib/cc-ci-runs/*/results.json` for recipe + `screenshot` field +
|
||||
on-disk `screenshot.png` size; scp'd suspect PNGs locally and **looked at them** (Read tool).
|
||||
This is MY ground truth, formed before any Builder claim — to compare against the Builder's matrix.
|
||||
|
||||
PNG sizes from latest representative runs (m2r-* sweep + numbered drone runs):
|
||||
|
||||
| recipe | PNG bytes | my visual read | class |
|---|---|---|---|
| immich | 4801 | pure blank white frame | **BLANK** |
| n8n | 4801 | blank near-white frame | **BLANK** |
| lasuite-meet | 4801 | (size-identical to immich/n8n 4801B; blank tell) | BLANK (to confirm visually) |
| cryptpad | 4802 | blank light-grey frame | **BLANK** |
| keycloak | 8764 | spinner + "Loading the Administration Console": paint-race loading state, NOT a real login form | **BLANK/LOADING** (not the "genuine sparse login" §2 guessed) |
| lasuite-docs | 6022 | bare spinner on white | **BLANK/LOADING** |
| lasuite-drive | ~5.9K | (size sibling of lasuite-docs; likely same spinner) | BLANK (to confirm) |
| plausible | null / NO PNG | every run null (122→357 incl. 357); run dir has no screenshot.png; capture stdout not in run dir (goes to Drone build log); root cause still to trace | **NULL** |
| ghost | 444183 | (reference healthy, §2) | OK (visual-confirm at M2) |
| mattermost-lts | 242139 | reference healthy | OK |
| hedgedoc | 131967 | reference healthy | OK |
| discourse | 66-67K | reference healthy | OK |
| custom-html | 35707 | reference healthy | OK |
| mailu | 33800 | reference healthy | OK |
| matrix-synapse | 33296 | reference healthy | OK |
| uptime-kuma | 30858 | reference healthy | OK |
| custom-html-tiny | 12950 | reference healthy | OK |
| mumble | 7913 | voice server; web-UI N/A candidate (confirm) | N/A? |
|
||||
|
||||
Confirmed defect classes match the orchestrator pre-audit (§2): SPA paint-race (domcontentloaded
fires before JS paints) → immich/n8n/cryptpad fully blank, keycloak/lasuite-docs/-drive caught at
loading spinner; plausible never captures (null on every run). **The 4801B byte-identical size is a
reliable blank-frame fingerprint.**

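The size fingerprint can be checked mechanically. The following is a minimal sketch, assuming the PNG sizes have already been collected into a plain mapping; the helper name `shared_size_groups` and the `sample` dictionary are illustrative and not part of the harness. It groups recipes by exact byte size and reports any size shared by two or more recipes, using sizes from the table above.

```python
from collections import defaultdict


def shared_size_groups(png_sizes):
    """Group recipes by exact PNG byte size and keep sizes shared by >= 2 recipes.

    png_sizes maps recipe name -> size in bytes, or None when no PNG exists.
    Hypothetical helper: identical sizes across unrelated apps hint at a blank frame.
    """
    groups = defaultdict(list)
    for recipe, size in png_sizes.items():
        if size is not None:  # NULL captures are a separate defect class
            groups[size].append(recipe)
    return {size: sorted(names) for size, names in groups.items() if len(names) >= 2}


# Sizes copied from the pre-audit table; plausible has no PNG at all.
sample = {
    "immich": 4801,
    "n8n": 4801,
    "lasuite-meet": 4801,
    "cryptpad": 4802,
    "ghost": 444183,
    "plausible": None,
}
suspects = shared_size_groups(sample)
print(suspects)  # {4801: ['immich', 'lasuite-meet', 'n8n']}
```

An exact-match check misses near-identical frames such as cryptpad's 4802 bytes, so a hit here is only a prompt for visual confirmation and not a verdict on its own.
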
Open items I must still resolve when verifying:
- plausible NULL root cause — need the Drone build log for a plausible run (capture stdout: "capture
  failed" vs "produced no file" vs step never reached). Run dir alone doesn't have it.
- lasuite-meet / lasuite-drive / mumble — visual confirm.
- Authoritative enrolled-recipe set: every `tests/<recipe>/recipe_meta.py` minus fixtures
  (`_generic`, `regression`, `concurrency`, `custom-html-bkp-bad`, `custom-html-rst-bad`).

No verdict yet. Awaiting `claim(shot): M1`.

---

## M1: PASS @2026-06-11T01:38Z (audit + diagnosis complete)

Claim: `claim(shot): M1` commit e005897; matrix+diagnoses at 8978fa6. STATUS-shot.md "M1 claim".
Verified COLD from my own clone + ssh cc-ci, **without reading JOURNAL-shot.md** (anti-anchoring).
The Builder's matrix agrees with every BLANK/LOADING/NULL read in my independent pre-audit (commit
4f3a747, formed BEFORE reading the Builder's matrix), so my reads were not anchored.

**Enrolled set — complete, no omissions.** `ls tests/*/recipe_meta.py` = 21. Minus the two harness
canaries `custom-html-bkp-bad`, `custom-html-rst-bad` (plan §2 explicitly excludes both) = **19**.
The 19 matrix rows are *exactly* that set (diffed by hand) and exactly the plan §2 expected set.
`_generic`/`regression`/`concurrency`/`unit` have no recipe_meta.py → correctly absent. ✓

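The enrolled-set arithmetic above (21 directories with a `recipe_meta.py`, minus the two canaries, giving 19) can be reproduced with a short script. This is a sketch under the assumption that it runs from a checkout whose recipe tests live under `tests/`; the function name `enrolled_recipes` and the `CANARIES` constant are illustrative only.

```python
from pathlib import Path

# Harness canaries excluded from the enrolled set (named in the paragraph above).
CANARIES = {"custom-html-bkp-bad", "custom-html-rst-bad"}


def enrolled_recipes(tests_dir, excluded=CANARIES):
    """Return sorted recipe names: dirs under tests_dir that hold recipe_meta.py, minus excluded."""
    found = {p.parent.name for p in Path(tests_dir).glob("*/recipe_meta.py")}
    return sorted(found - set(excluded))


# Usage (from the repository root): len(enrolled_recipes("tests")) should equal
# the number of matrix rows if the matrix is complete.
```

Directories without a `recipe_meta.py`, such as `_generic` or `unit`, drop out of the glob on their own, so only the two canaries need an explicit exclusion.
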
**Every non-OK row has evidence-backed root cause (independently re-derived):**
- plausible NULL — ran the Builder's drone-log command myself: build 357 step log shows
  `capture failed … page.goto(https://plau-…/) never returned a status in (200,301,302,303,401,403)
  after 15 attempts (45s); last status=500`. `/` 500s by design (DISABLE_AUTH) → default landing
  capture can never succeed; needs a SCREENSHOT hook to a rendering path. Confirmed. ✓
- bluesky-pds NULL — capture is `if deploy_ok:`-gated, OUTSIDE the deploy try/except
  (runner/run_recipe_ci.py:1024, read it). install=fail level=0 → capture correctly skipped. Not a
  screenshot defect; upstream image breakage already in DEFERRED.md (rcust). ✓
- BLANK/LOADING — screenshot.py:84-93 navigates `wait_until="domcontentloaded"` then screenshots
  immediately, no paint wait; accept_statuses excludes 500 (plausible mechanism). Read the code. ✓
- mumble NOT N/A — tests/mumble/recipe_meta.py header: deploys `compose.mumbleweb.yml`, a mumble-web
  HTTP client routed through Traefik, HEALTH_PATH "/". A real web surface IS served → correctly the
  HARDER (non-N/A) call. ✓

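The plausible failure mode in the list above is easier to see as a loop. The sketch below is not the code in screenshot.py; it assumes a zero-argument callable `fetch_status` standing in for `page.goto`, and a 3-second delay inferred from "15 attempts (45s)". Only the accepted status set is taken from the quoted log line.

```python
import time

ACCEPT_STATUSES = (200, 301, 302, 303, 401, 403)  # set quoted in the build 357 log


def wait_for_accepted_status(fetch_status, attempts=15, delay_s=3.0, sleep=time.sleep):
    """Poll fetch_status() until it returns an accepted status.

    Returns (status, None) on success, or (None, last_status) once attempts run out.
    fetch_status is a hypothetical stand-in for the real navigation call.
    """
    last = None
    for attempt in range(attempts):
        last = fetch_status()
        if last in ACCEPT_STATUSES:
            return last, None
        if attempt < attempts - 1:
            sleep(delay_s)  # assumed spacing between attempts
    return None, last


# A landing page that answers 500 on every attempt, as plausible's "/" does.
result = wait_for_accepted_status(lambda: 500, sleep=lambda s: None)
print(result)  # (None, 500)
```

Because 500 is outside the accepted set, no number of retries helps; the capture has to be pointed at a path that renders, which is what the SCREENSHOT hook is for.
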
**Independent visual spot-checks (Read tool) — 11 artifacts, matrix matched reality on every one:**
immich 4801B = pure white; n8n 4801B = blank; cryptpad 4802B = blank grey; lasuite-meet 4801B =
pure white; keycloak 8764B = "Loading the Administration Console" spinner (NOT a real login — the
§2 "might be a genuine login" guess was wrong, Builder classed it LOADING correctly); lasuite-docs
6022B = bare spinner; mumble 7913B = spinner ring on grey; mattermost-lts 242139B = blue brand
splash + logo, NO login form (correctly LOADING despite large size — size alone is NOT a sufficient
signal, good catch); n8n run 197 30256B = real "Set up owner account" form, empty fields,
credential-free (flaky-pass + secret-safe, confirmed); custom-html 35707B = genuine "Welcome to
nginx!" (honest fresh-install view for a bare static host — OK); plausible = NULL via drone log.
Includes plausible ✓ and multiple 4801B cases ✓ (M1 minimum was ≥5 incl. those — exceeded).

**N/A arguments — agreed:**
- bluesky-pds → justified N/A (deploy-gated: can't screenshot what can't deploy; upstream breakage
  is pre-existing/DEFERRED, not a screenshot defect). Agreed, contingent on the upstream image still
  being broken at M2 — if it becomes deployable, it re-enters as a real recipe.
- mumble → NOT N/A. Agreed (real mumble-web surface, evidence above).

No omissions, no fabricated visual reads, diagnoses are causal not symptomatic. **M1 PASS.**

Watch-list for M2 (so the Builder has it early — NOT blocking M1):
1. Harness default-wait fix must stay within NAV_DEADLINE_S=45 / step worst-case ≤~60s and must
   NEVER affect a verdict on screenshot failure (R7) — I will test the failure path has teeth but
   no verdict impact, and compare pre/post run durations.
2. plausible SCREENSHOT hook must land on a credential-free *rendering* path (not /login showing a
   generated secret; not a 500 page).
3. mattermost-lts proof: a bigger PNG is NOT acceptance — I will visually confirm the real login,
   not a brand splash.
4. Secret-safety: every final PNG must show no generated credentials (install wizards, secrets
   pages). n8n's "Set up owner account" with EMPTY fields is the safe shape; a pre-filled one is not.
5. M2 requires ≥2 proof runs via the drone `!testme` path + me Reading *every* final PNG.

Did not read JOURNAL-shot.md before this verdict. No finding filed (audit is accurate). No VETO.

---

## M2: PASS @2026-06-11T07:17:53Z — all screenshots working (cold-verified from scratch)

Verified independently from a cold start (my own clone, my own scp/Read/re-runs; did NOT read
JOURNAL before this verdict). Claim commit 196156e. Every M2 DoD item checked:

**1. Every final PNG Read (18/18) — real, representative, credential-free.** Pulled each PNG by scp,
Read it with the image tool, byte-size matched the claim on all 18:
- Fixed-class (10): immich 234351B "Welcome to Immich" onboarding; plausible 64132B real
  registration form (EMPTY fields); keycloak 215587B real "Sign in to your account" (EMPTY) — was
  the 8764B "Loading Admin Console" spinner at M1, settle fix resolved it; cryptpad 57310B real
  landing + doc-type picker; lasuite-meet 225686B real video-conf landing; lasuite-docs 284769B real
  Docs landing; lasuite-drive 132037B real "Fichiers" landing; n8n 26433B "Set up owner account"
  (ALL fields EMPTY — secret-safe, now deterministic); mattermost-lts 178367B **real "Log in to your
  account" form (EMPTY) — NOT the byte-identical interstitial** (hook v2 click-through works — my
  sharpest watch-item, resolved); mumble 7980B loader spinner (see §N/A).
- Healthy-class (8): ghost 444183B blog landing; hedgedoc 131967B landing; discourse 66121B forum +
  welcome topic; custom-html 35707B "Welcome to nginx!" (honest fresh-install); custom-html-tiny
  12950B seeded content; mailu 33800B sign-in (EMPTY); matrix-synapse 33296B "It works!"; uptime-kuma
  30858B "Create your admin account" (EMPTY).
Every login/setup form has EMPTY fields — NO generated credential is shown anywhere. Secret-safety
cardinal guardrail holds across all 18.

**2. No verdict/level regression.** All 10 proof runs status=pass at their baseline level (immich
/plausible/keycloak/cryptpad/lasuite-*/n8n/mumble=4, mattermost-lts=2). screenshot field populated
on every one. no_secret_leak=true on every proof run I sampled (370/371/keycloak/n8n/mattermost
/mumble).

**3. ≥2 genuine drone `!testme` proofs — confirmed end-to-end, NOT manual.** ccci-bridge_app logs:
`[poll] triggered build 370 for immich@107d7220 (PR #2, comment 14321) by autonomic-bot` and
`...build 371 for plausible@13458fac (PR #3, comment 14322)...`, both `reflected outcome ...:
success`. The bridge polled Gitea, found real !testme comments, triggered the builds, reflected
verdicts back — the full comment→build path. Drone params {RECIPE,PR,REF,SRC}, event=custom,
trigger/sender=autonomic-bot — matches the Phase-1c bridge-!testme fingerprint (REVIEW-1c:110).

**4. Durations unaffected (no balloon).** Drone same-recipe pre/post: immich 199s→198s, plausible
209s→166s (faster — capture no longer burns 45s failing on the 500). Screenshot step wait budget =
60000ms exactly (unit test_wait_budget_within_step_cap + my own cold probe). ≤~60s holds.

**5. R7 (cosmetics never block) — intact.** Call site run_recipe_ci.py ~1024-1037 is OUTSIDE the
deploy try/except AND double-wrapped in its own try/except (`_scrub`-bed log) — and git log proves
NO shot-phase commit touched run_recipe_ci.py (call site unchanged). capture() swallows everything →
None → placeholder. I cold-probed the new helpers independently: _settle swallows all exceptions,
_snap keeps the larger frame (A1 fix, 5/5), 60s budget — 9/9+5/5 pass. Screenshot unit suite 12/12
+ card suite 10/10 ran GREEN cold on the real harness (cc-ci-run) from my scp'd clone.

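The two properties probed in item 5 (failures collapse to `None`, and the larger of the captured frames wins) can be illustrated without a browser. This is a sketch only: `snap_larger` and `capture_sketch` are hypothetical stand-ins for the harness's `_snap` and `capture()`, `take_frame` is an assumed callable returning raw PNG bytes, and the 60s budget is not modelled.

```python
def snap_larger(take_frame, tries=2):
    """Take up to `tries` frames and keep the largest one (frames are raw PNG bytes)."""
    best = None
    for _ in range(tries):
        frame = take_frame()
        if frame is not None and (best is None or len(frame) > len(best)):
            best = frame
    return best


def capture_sketch(take_frame):
    """Never raise: any failure becomes None, which the caller maps to a placeholder."""
    try:
        return snap_larger(take_frame)
    except Exception:
        return None


def broken_frame():
    raise RuntimeError("browser crashed")


# A blank first frame followed by a painted second frame: the painted one is kept.
frames = iter([b"x" * 10, b"x" * 50])
kept = capture_sketch(lambda: next(frames))
failed = capture_sketch(broken_frame)
print(len(kept), failed)  # 50 None
```

Keeping the failure handling inside the capture function means the caller only ever sees bytes or `None`, so a screenshot problem cannot change a run's verdict.
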
**6. Dashboard/card/badge render — live 200.** GET dashboard / → 200; runs/370+371/screenshot.png →
200 image/png; badge/immich.svg + badge/plausible.svg → 200 image/svg+xml.

**7. N/A set (19/19 enrolled, no omissions) — AGREED.**
- bluesky-pds → N/A, re-confirmed at M2 (ab-bluesky-pds-oldmain: install=fail, level=0,
  screenshot=null → placeholder correct; upstream MODULE_NOT_FOUND still broken, DEFERRED).
- mumble → N/A-variant, AGREED — **this reverses my M1 "NOT N/A" stance, on NEW evidence not
  available at M1.** rankenstein/mumble-web:0.5 renders no usable UI for an anonymous browser:
  connect-dialog DOM genuinely absent (probe4 console: `#connect-dialog_input_address ... did not
  match any element`), perpetual loading-container spinner at 5/15/30/60/90s (probe2) — corroborated
  by my own Read of the 7980B spinner PNG. The loader frame is the literal web-surface reality every
  visitor gets; mumble's actual function (voice) is fully protocol-tested; fix needs a recipe/overlay
  change (out of scope, guardrail prefers upstream). Documented in DEFERRED with an upstream
  question. NOTE (not a defect, not a veto): the dashboard shows the honest loader frame rather than
  the "no screenshot" placeholder — acceptable as a documented, agreed limitation, NOT a healthy-app
  screenshot.

Finding A1 (blank-retry regression) was filed, fixed (7ad7d1f), and CLOSED after my cold re-test.
No open findings. No fabricated reads — every matrix/claim value matched what I independently
observed. **M2 PASS. No VETO.** With M1 PASS (ae10b55) + M2 PASS both fresh and A1 closed, the DoD
handshake (§6.1) is satisfied — the Builder may write `## DONE` to STATUS-shot.md.

(Did not consult JOURNAL-shot.md before forming this verdict.)
6
STATUS-lvl5.md
Normal file
@ -0,0 +1,6 @@
# STATUS — Phase lvl5 (L5 lint rung + de-cap)

Phase: lvl5 — OPEN (bootstrapped 2026-06-11)
Gate: none claimed yet
In flight: P1 — level.py new semantics + lint executor design (abra lint behavior probe on CI host first)
Blockers: none
287
STATUS-rcust.md
@ -1,22 +1,293 @@
# STATUS — sub-phase rcust (recipe-customization restructure)

## DONE

Phase complete 2026-06-11: M1 PASS (REVIEW-rcust.md 01f9f70, 2026-06-10) + M2 PASS (REVIEW-rcust.md
3245150, 2026-06-11) — both fresh, Adversary-verified, no standing VETO. Restructure merged to main
(01e6d49 + approved fix-forwards 1357544, 6cabbe7); all 21 recipes reconciled vs corrected
baseline; canaries 7/7 (Adversary's own cold run); drone path covered; zero leaked apps.
Non-rcust follow-ups filed in machine-docs/DEFERRED.md (discourse abra-stamp env drift,
bluesky-pds upstream image breakage re-pin).

Plan: /srv/cc-ci/cc-ci-plan/recipe-custom-restructure-full-plan.md (SSOT for this phase).
Reference spec: docs/recipe-customization.md @ 76a4b6b.
Work branch: `restructure/recipe-custom` (one commit per phase P1–P6; merged to main only after M1 PASS).

## Phase progress

- [ ] P1 — harness/meta.py single loader + key registry + migrate L1–L6 + unit tests + doc gen
- [ ] P2 — delete legacy keys/paths (CHAOS_BASE_DEPLOY, OIDC_AT_INSTALL, SKIP_GENERIC meta, conftest cleanup)
- [ ] P3 — uniform ctx hook convention
- [ ] P4 — custom-test ergonomics (placement rule, op_state/deps fixtures)
- [ ] P5 — customization manifest
- [ ] P6 — docs
- [x] P1 — single loader + key registry + migrate L1–L6 + unit tests + doc gen
  (branch commit 472a68b)
- [x] P2 — delete legacy keys/paths: compose.ccci.yml first-class+auto-chaos; install-time deps only
  (lasuite-docs migrated, setup_custom_tests.sh gone); SKIP_GENERIC meta deleted (env dev-only +
  loud CI warning); conftest cleanup (deployed/deployed_app/app_domain gone, one `deps` fixture)
  (branch commit 8cd72fd)
- [x] P3 — uniform ctx hook convention: HookCtx(.domain/.base_url/.meta/.deps/.op); all hooks
  take ctx; legacy signatures raise MetaError at load naming the migration (branch fd02d9f)
- [x] P4 — custom-test ergonomics: placement rule (custom under functional/+playwright/ only),
  op_state fixture, deps fixture tests (branch 29a28e2)
- [x] P5 — customization manifest: one block at run start (non-default meta keys, hooks, overlays,
  custom-test counts, active CCCI_SKIP_GENERIC* env overrides with !! CI flag) printed +
  embedded verbatim in results.json under "customization"; pure presentation, HC2-honoring
  (branch commit 68954be — new runner/harness/manifest.py + tests/unit/test_manifest.py)
- [x] P6 — docs rewritten to the end state: recipe-customization.md is now the REFERENCE (was
  review spec) — §8 records R1–R9 resolutions, §4 keeps the generated table + HookCtx, §5 the
  end-state shapes; testing.md invariant updated to install-time-deps isolation, generic
  opt-out documented dev-only; enroll-recipe.md worked examples (lasuite-docs install-time
  OIDC, mumble post-F2-14c), deps fixture, ctx signatures (branch commit da558ca)
- [x] Adversary inbox 19:06Z (P5 manifest dashboard hygiene) — addressed: secret-NAMED meta
  values (top-level + nested dict keys) render as '<redacted>' in manifest + results.json;
  key names stay visible; unit-test pinned (branch commit 858e0f5)

## P1–P6 verification facts (for the eventual M1 cold-verify)

- WHERE: branch `restructure/recipe-custom`, P1=472a68b, P2=8cd72fd, P3=fd02d9f, P4=29a28e2,
  P5=68954be, P6=da558ca, manifest-redaction fix=858e0f5 (branch head).
- HOW: `cc-ci-run -m pytest tests/unit -q` and `nix develop .#lint --command scripts/lint.sh`
  from a clean checkout of the branch.
- EXPECTED: 192 passed; `lint: PASS`.
- New single loader: `runner/harness/meta.py::load()`; all-recipes typo gate + R2 proof in
  `tests/unit/test_meta.py`; docs §4 table generated by `scripts/gen-meta-docs.py` (sync pinned
  by unit test).

## M2 baseline matrix (built BEFORE merge, per plan M2.1)

Expected outcome per recipe dir for the post-merge regression sweep = most recent known-good
evidence. Levels are results.json `level`; evidence = run id under /var/lib/cc-ci-runs/<id>/
(on cc-ci) unless noted. Bad canaries are EXPECTED to fail at their designed tier.

| Recipe | Expected | Evidence |
|---|---|---|
| bluesky-pds | full lifecycle green: 5 tiers + 4 custom pass, deploy-count=1 (L4-equiv; pre-results-era) | Adversary cold run, REVIEW e45e0ee (Phase 2 Q4.3); weekly 06-05: up-to-date |
| cryptpad | L4 (all four essential rungs pass) | run 181 (06-05) |
| custom-html | L4 | run 182 (06-05) |
| custom-html-bkp-bad | DESIGNED-BAD: backup tier fail → backup_restore=fail, L1 | run regression-bad-restore-2 (06-02) |
| custom-html-rst-bad | DESIGNED-BAD: restore tier fail → backup_restore=fail, L1 | run regression-bad-restore-3 (06-02) |
| custom-html-tiny | L2 (backup_restore N/A — declared EXPECTED_NA; functional N/A) | run 205 (06-09) |
| discourse | L4 | run 184 (06-05) |
| ghost | L4 | run 185 (06-05) |
| hedgedoc | L4 | run 113 (06-02) |
| immich | L4 | run 307 (06-10) |
| keycloak | L4 | run 187 (06-05) |
| lasuite-docs | L5 (integration pass) | run 188 (06-05) |
| lasuite-drive | L5 (integration pass) | run 189 (06-05) |
| lasuite-meet | L5 (integration pass) | run 204 (06-09) |
| mailu | L2 (backup_restore N/A — no backupbot labels; functional pass) | run 191 (06-05) |
| matrix-synapse | L4 | run 203 (06-08) |
| mattermost-lts | L4 | run 196 (06-05) |
| mumble | all 5 tiers pass, deploy-count=1 (L4-equiv; pre-results-era) | log ~/ccci-mumble-f214c.log on cc-ci (05-31) |
| n8n | L4 | run 197 (06-05) |
| plausible | L4 | run 308 (06-10) |
| uptime-kuma | L4 | run 165 (06-02) |

Customization-executed spot-greps for M2.4 (mumble READY_PROBE tcp lines, cryptpad
SANDBOX_DOMAIN, ghost/discourse BACKUP_VERIFY + overlay copy + chaos base, lasuite-* deps
provisioning + OIDC skip-count 0, immich ops.py seeds, manifest block in every log) apply on the
sweep runs, not retroactively here.

## Gate

(none claimed yet — phase bootstrap)
**Gate: M2 CLAIMED 2026-06-11 ~01:30Z, awaiting Adversary.**

### M2 claim — WHAT / HOW / EXPECTED / WHERE

WHAT: plan M2.0–M2.4 complete on merged main. Merge 01e6d49 (build 326 green) + two
Adversary-approved fix-forwards: 1357544 (lasuite-drive best-effort bucket poll, approval 57c66ad)
and 6cabbe7 = merge of be2026a (services_converged completed-one-shot rule, approval a531746,
build 350 green on 914c166, merged-diff==branch-diff verified 4428e76). Canaries 7/7. All 21
recipe dirs reconciled vs the CORRECTED baseline (the Adversary-accepted L5≡L4+OIDC equivalence
for the three stale lasuite-* rows; one justified exclusion: bluesky-pds, non-rcust upstream image
breakage, DEFERRED.md). Drone→harness path covered (2 PR !testme runs green). Zero leaked apps.

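The completed-one-shot rule named in the claim (fix-forward be2026a) treats a replica deficit as converged only when it is explained entirely by tasks in the `complete` state. The sketch below is one reading of that rule and not the code in runner/harness/lifecycle.py: the function name, its signature, and the flat list of task states are assumptions made for illustration.

```python
def services_converged_sketch(current, desired, task_states):
    """Return True when a service counts as converged.

    current/desired are replica counts; task_states lists the state of each of
    the service's latest tasks (for example "running", "complete", "failed").
    """
    if current == desired:
        return True  # 0/0 and N/N behave as before
    deficit = desired - current
    if deficit < 0 or not task_states:
        return False  # over-replicated, or no tasks yet: keep waiting
    if any(state not in ("running", "complete") for state in task_states):
        return False  # failed, mixed, or still spinning up
    return task_states.count("complete") >= deficit


# A finished one-shot (restart policy none) reports 0/1 forever but is done.
print(services_converged_sketch(0, 1, ["complete"]))  # True
print(services_converged_sketch(0, 1, ["failed"]))    # False
```

Restricting the exception to an all-`running`/`complete` task list keeps failed or half-started services blocking, so only the finished one-shot case is relaxed.
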
RECONCILIATION (final evidence per recipe; run dirs under /var/lib/cc-ci-runs/):

| Recipe | Baseline | Final evidence | Match |
|---|---|---|---|
| bluesky-pds | full green (pre-results-era) | m2r L0 == m2rr L0 == ab-oldmain L0, all `Cannot find module /app/index.js` crash-loop | EXCLUDED: upstream image breakage, harness-neutral (DEFERRED.md) |
| cryptpad | L4 | m2r-cryptpad L4 | ✓ |
| custom-html | L4 | m2r-custom-html L4 | ✓ |
| custom-html-bkp-bad | designed backup fail, L1 | m2r: backup fail exactly | ✓ |
| custom-html-rst-bad | designed restore fail, L1 | m2r: backup pass → restore fail exactly | ✓ |
| custom-html-tiny | L2 (declared EXPECTED_NA) | m2r-custom-html-tiny L2 | ✓ |
| discourse | L4 (184, 06-05) | m2r/m2b/m2p + ab-oldmain×2: ALL deviations byte-identical old==new harness (restore race @default head: L2==L2; upgrade-HC1 @baseline ref PR=2: L1==L1, stamp eb96de94+U both) | env drift since 06-05, rcust-neutral (Adversary-verified, condition 3 of a531746) |
| ghost | L4 | m2r-ghost L4 | ✓ |
| hedgedoc | L4 | m2r-hedgedoc L4 | ✓ |
| immich | L4 | m2b-immich L4 @baseline ref + drone-path run 356 L4 | ✓ |
| keycloak | L4 | m2r-keycloak L4 | ✓ |
| lasuite-docs | L5 (stale schema) | m2r-lasuite-docs L4 all-pass + OIDC PASSED skip-0 | ✓ (accepted equivalence) |
| lasuite-drive | L5 (stale schema) | m2p2-lasuite-drive L4 all-pass + OIDC + MinIO PASSED, rc=0, post-both-fixes | ✓ (accepted equivalence) |
| lasuite-meet | L5 (stale schema) | m2r-lasuite-meet L4 all-pass + OIDC PASSED | ✓ (accepted equivalence) |
| mailu | L2 | m2r-mailu L2 | ✓ |
| matrix-synapse | L4 | m2r-matrix-synapse L4 | ✓ |
| mattermost-lts | L4 | m2b-mattermost-lts L4 @baseline ref | ✓ |
| mumble | all 5 tiers (pre-results-era) | m2r-mumble all tiers pass, deploy-count=1 | ✓ |
| n8n | L4 | m2r-n8n L4 | ✓ |
| plausible | L4 | m2b-plausible L4 @baseline ref + drone-path run 357 L4 | ✓ |
| uptime-kuma | L4 | m2r-uptime-kuma L4 | ✓ |

HOW (cold, from the Adversary's own clone / direct on cc-ci):
- per-recipe: `jq '{recipe,level,rungs,flags}' /var/lib/cc-ci-runs/<id>/results.json` for every id
  above; logs in /root/m2-logs/, /root/m2-baseline-logs/, /root/m2-proof-logs/, /root/m2-ab-logs/.
- canaries: /root/m2-canary.log (7/7, fresh clone of merged main).
- drone path: builds 356 (immich#2) + 357 (plausible#3) `custom` events SUCCESS in drone DB
  (`docker cp <drone_cid>:/data/database.sqlite` + sqlite query, as documented above); run dirs
  356/357 carry `customization` manifest keys + clean flags; triggered by real `!testme` comments
  (gitea comment ids 14317/14318).
- M2.4 spot-greps: section above (manifest 21/21, mumble tcp probe, ghost/discourse overlay+
  BACKUP_VERIFY, lasuite deps+OIDC, immich seeds, cryptpad EXTRA_ENV hook+playwright).
- zero-leak: `docker stack ls` on cc-ci → infra (backups/bridge/dashboard/reports/drone/traefik)
  + warm-keycloak ONLY (checked 01:27Z, after ALL runs incl. drone-path).
- tree: origin/main, working tree clean, every claim-referenced commit pushed.

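For checking many run directories at once, the per-recipe `jq` projection in the list above has a straightforward Python equivalent. This is a sketch that assumes each run directory contains a `results.json` with the four keys named in the jq filter; `summarize_run` and `matches_baseline` are illustrative helper names, not harness functions.

```python
import json
from pathlib import Path

FIELDS = ("recipe", "level", "rungs", "flags")  # keys named in the jq filter above


def summarize_run(run_dir):
    """Load <run_dir>/results.json and keep only the fields used for reconciliation."""
    data = json.loads((Path(run_dir) / "results.json").read_text())
    return {key: data.get(key) for key in FIELDS}  # missing keys become None, like jq's null


def matches_baseline(summary, expected_level):
    """True when the recorded level equals the baseline level for that recipe."""
    return summary["level"] == expected_level


# Usage: matches_baseline(summarize_run("/var/lib/cc-ci-runs/m2r-ghost"), 4)
```

A level match is only part of the reconciliation; rows such as discourse and the lasuite-* trio still need the accompanying log evidence described in the table.
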
EXPECTED: every check above reproduces as stated; no recipe regresses vs the corrected baseline.

WHERE: origin/main @ (this commit); REVIEW-rcust.md holds M1 PASS (01f9f70), be2026a approval +
all-conditions-cleared (a531746, 24a203a); DEFERRED.md holds the two non-rcust follow-ups
(discourse abra-stamp mechanism, bluesky-pds upstream re-pin).

**Gate history: M2 IN PROGRESS** — M1 PASS in REVIEW-rcust.md (01f9f70, 2026-06-10).

- M2.0 merge: `restructure/recipe-custom` merged to main as 01e6d49 (merge commit, no force);
  push build green: drone build **326 success** on 01e6d49 (API-verified).
- M2.2 canary suite: **7/7 PASSED** in 286s (fresh clone of merged main at /root/m2-sweep on
  cc-ci, log /root/m2-canary.log) — green canaries pass, all four RED canaries still caught at
  their designed tiers (bad-install/bad-upgrade/bad-backup/bad-restore).
- M2.3 per-recipe sweep (driver /root/m2-driver.sh, 2 concurrent, REF = mirror heads; logs
  /root/m2-logs/<r>.log; results /var/lib/cc-ci-runs/m2r-<r>/): first pass **15/21 matched
  baseline** —
  hedgedoc/custom-html/custom-html-tiny/uptime-kuma/n8n/cryptpad/ghost/keycloak/mumble/mailu/
  matrix-synapse/lasuite-docs/lasuite-meet at baseline level; both DESIGNED-BAD canaries failed
  at exactly their designed tier (bkp-bad: backup fail; rst-bad: backup pass→restore fail).
  6 below baseline, ALL flake-shaped (known modes, not new assertion semantics):
  discourse+plausible+mattermost-lts+immich restore data-integrity (the documented pre-existing
  truncated-dump capture race — discourse BACKUP_VERIFY honestly failed 3/3 attempts, its
  docstring + the 06-05 weekly report record this exact mode pre-restructure; seeds verified
  committed by ops.py read-back asserts, i.e. the migrated ctx hooks executed correctly);
  bluesky-pds abra `FATA deploy timed out` at default 600s during concurrent image pulls;
  lasuite-drive pre_install MinIO one-shot 90s timeout (bucket appeared later — every
  subsequent tier passed). Serial re-runs (MAX=1, /root/m2-rerun.sh, logs /root/m2-rerun-logs/,
  results m2rr-<r>/) completed 20:44Z — but ran default heads, not baseline refs (superseded by
  the targeted runs below).
- M2.3 reconciliation runs (serial, MAX=1):
  - **Baseline-ref re-runs on merged main** (/root/m2-baseline-runs.sh, logs /root/m2-baseline-logs/,
    results m2b-<r>/): **plausible L4, mattermost-lts L4, immich L4** at their exact baseline refs —
    baseline REPRODUCED on the restructured harness; restore-race cluster closed for those three.
    m2b-discourse @7ae7b0f (ran PR=0; baseline run 184 was PR=2): **L1, NEW mode** — upgrade HC1
    `deployed chaos commit 'eb96de94+U', not PR-head '7ae7b0f76efb'`. Investigated facts (cold-checkable
    in /var/lib/cc-ci-runs/m2b-discourse/): `eb96de94` IS the prev-base tag commit `0.7.0+3.3.1`
    (`git -C .../abra/recipes/discourse rev-list -n1 0.7.0+3.3.1`); the preserved per-run clone HEAD =
    7ae7b0f (the upgrade re-checkout DID run and persist); the
    `service "sidekiq" depends on undefined service "discourse"` log line is benign noise (appears
    verbatim in the PASSING m2r/m2rr upgrade sections too; published compose ships a dangling
    depends_on — see tests/discourse/compose.ccci.yml NOTE). So the chaos redeploy itself left the
    base stamp in place at this ref. NOT folded into the restore-flake cluster; discriminating runs
    queued (below).
  - **Old-main A/B at the m2r ref** (/root/m2-ab.sh, /root/m2-ab-logs/, results ab-<r>-oldmain/):
    discourse @7d53d4ec on OLD main = **L2 restore fail** == new-main m2r L2 at the same ref →
    restore race harness-neutral at that ref. bluesky-pds @b2d86ef on OLD main = **L0 install fail**.
  - **bluesky-pds re-characterized (not a pull timeout)**: the app container crash-loops
    `Error: Cannot find module '/app/index.js'` (MODULE_NOT_FOUND, Node v24.15.0) in ALL THREE
    failures — m2r (new main @ mirror head), m2rr (new main, serial), ab-oldmain (OLD main @ old
    default head b2d86ef). Same pinned tag, both harnesses, both refs → upstream image content moved
    under the tag; recipe cannot deploy on ANY harness. Evidence:
    `grep -r MODULE_NOT_FOUND /var/lib/cc-ci-runs/{m2r,m2rr,ab}-bluesky-pds*/abra/logs/default/`.
    Restructure-neutral (old==new L0).
- M2.3 in-flight proof runs (serial queue /root/m2-proof.sh + /root/m2-proof2.sh, logs
  /root/m2-proof-logs/, driver /root/m2-proof-logs/driver.log):
  1. **lasuite-drive @baseline ref ffa7d585afa2 PR=1 on merged main @5c0676b** (post-fix-forward
     1357544) → run id m2p-lasuite-drive: **WILL LAND L0 — second P2b regression found via this
     run, root-caused LIVE.** The 1357544 best-effort path WORKED (`!!` warn + continue in the
     log); the one-shot task went **Complete** ~3min in (bucket created); but a completed
     restart_policy-none one-shot reports replicas 0/1 FOREVER, and services_converged requires
     cur==want → the install assert burned DEPLOY_TIMEOUT (1800s) and failed. Old world never saw
     this: setup_custom_tests.sh ran POST-install-assert (its own header: orchestrator runs it
     after the deploy is healthy); P2b moved the trigger to ops.py pre_install = PRE-assert.
     Verified live during the run: app HTTP 200, all other services 1/1,
     `docker service ps ..._minio-createbuckets` = Complete, pytest in converge loop 27+ min.
     **Fix-forward proposed, awaiting Adversary approval: branch `fix/converged-oneshot` @
     be2026a** — services_converged treats a replica deficit explained ENTIRELY by Complete tasks
     as converged (Failed/mixed/spinning-up/no-tasks still block; 0/0 + N/N unchanged); pinned by
     tests/unit/test_converged_oneshot.py (7 cases). Proof: working tree on cc-ci
     `cc-ci-run -m pytest tests/unit -q` → 199 passed; lint PASS.
     **APPROVED (REVIEW a531746) and MERGED to main as 6cabbe7** (merge commit, no force);
     merged diff == be2026a diff (`git diff be2026a..main -- runner/harness/lifecycle.py
     tests/unit/test_converged_oneshot.py` = empty). Push build green: drone build **350
     success** on 914c166 (branch head incl. the merge; verify on cc-ci:
     `docker cp <drone_cid>:/data/database.sqlite /tmp/d.sqlite && sqlite3 /tmp/d.sqlite
     "select build_number,build_status,build_after from builds order by build_id desc limit 5"`).
     Post-fix re-run QUEUED: /root/m2-proof3.sh waits for the discourse A/B pair to drain, then
     runs lasuite-drive @ffa7d585afa2 PR=1 from fresh clone /root/m2-postfix @6cabbe7 →
     CCCI_RUN_ID=m2p2-lasuite-drive, log /root/m2-proof-logs/lasuite-drive-postfix.log.
     EXPECTED **L5** (binding condition 1 of the approval).
     DISCLOSED INTERVENTION: in the doomed pre-fix m2p run, after the GENERIC install assert had
     already failed at the 1800s converge deadline, the OVERLAY install test entered a second
     identical 1800s converge burn — Builder sent it (pytest pid only) SIGINT at ~01:00Z to skip
     the redundant 20+ min wait. The log therefore shows `KeyboardInterrupt` at generic.py:97
     (the converge poll — the exact diagnosed line). The orchestrator's own exit paths/teardown
     untouched; run continued to upgrade/backup/restore/custom normally. The m2p result is
     diagnostic evidence of the bug, not a baseline data point — the binding proof is m2p2.
  2. **discourse @7ae7b0f PR=2 on merged main** (exact baseline-184 invocation) → m2p-discourse:
     **COMPLETE — L2, upgrade HC1 fail, chaos-version=eb96de94+U** (identical to m2b: stamp = the
     prev-base tag commit). Deterministic at this ref on new main; NOT a PR=0 artifact, NOT a race.
     install/backup/restore/custom all pass.
  3. **discourse @7ae7b0f PR=2 on OLD main** → ab-discourse-7ae7b0f-oldmain: **COMPLETE — L2,
     upgrade HC1 fail, chaos-version=eb96de94+U — BYTE-IDENTICAL failure to the new-main run.**
     **DISCOURSE A/B CLOSED: old harness == new harness at the baseline ref + baseline invocation
     (PR=2). The upgrade-HC1 mode is HARNESS-NEUTRAL — not an rcust regression.** Baseline 184's
     L4 (06-05) vs today's identical-both-worlds failure = environment/content drift since 06-05,
     outside both harnesses. Drift candidates checked and ELIMINATED: 7ae7b0f is still a live
     branch tip in the mirror (`refs/heads/upgrade-0.8.0+3.5.0` + `refs/pull/2/head` — git
     ls-remote), and upstream's latest release tag is unchanged (0.7.0+3.3.1 = eb96de94, no new
     tag since 06-05). flake.lock (abra pin) identical in both worlds. HC1 firing rather than
     false-greening is the guard working as designed.
     Cold-verify: results.json + full logs at /var/lib/cc-ci-runs/{m2p-discourse,
     ab-discourse-7ae7b0f-oldmain}/ + /root/m2-proof-logs/discourse{,-oldmain}.log.
  4. **lasuite-drive @ffa7d585afa2 PR=1 on merged main @6cabbe7 (post-converge-fix)** →
     m2p2-lasuite-drive: **COMPLETE in 3m19s, rc=0 — all 5 stages pass, deploy-count=1,
     `test_oidc_password_grant_against_dep_keycloak` PASSED (requires_deps skip-count 0),
     `test_minio_bucket_present_and_object_roundtrip` PASSED, clean_teardown+no_secret_leak
     flags true. NO converge burn: the one-shot again exceeded its 90s window (`!!` best-effort
     line), completed late, and the install assert passed straight through — both fix-forwards
     proven end-to-end.** results.json `level=4`, NOT 5 — see schema note below.
- **BASELINE SCHEMA NOTE (affects lasuite-docs/-drive/-meet expected "L5")**: the 6-rung ladder
|
||||
(L5 integration / L6 recipe-local) was REMOVED from main by the deliberate mainline refactor
|
||||
46e2cdb + c51cd84 ("four essential rungs only — integration & recipe-local are optional",
|
||||
PR #6, 2026-06-09 ~03:00Z) — BEFORE the rcust merge and NOT part of it (merge diff
|
||||
01e6d49^1..01e6d49 touches level.py not at all and results.py by +4 lines; current
|
||||
derive_rungs/compute_level are byte-equal to the pre-merge main versions). Every post-06-09 run
|
||||
caps at L4 BY DESIGN; the integration (OIDC) test now counts inside the functional/custom rung.
|
||||
Timeline evidence: run 204 (lasuite-meet, 06-09 pre-deploy) = 6-rung level 5; all later runs =
|
||||
4-rung. EQUIVALENCE for the baseline matrix: old "L5 (integration pass)" ≡ new "L4 all-rungs
|
||||
pass + the requires_deps OIDC test PASSED (skip-count 0)". m2p2-lasuite-drive meets it; the
|
||||
m2r sweep's lasuite-docs + lasuite-meet L4-all-pass results (with their OIDC PASSED lines,
|
||||
already in M2.4 spot-greps) meet it identically.
|
||||
- M2.4 spot-greps (customizations actually executed — log evidence in /root/m2-logs/):
|
||||
manifest block present 21/21; mumble `ready-probe OK (tcp 3x): 127.0.0.1:64738`; ghost+discourse
|
||||
`ccci-overlay: provided compose.ccci.yml ... auto-chaos` (P2a first-class path live);
|
||||
discourse BACKUP_VERIFY hook live (3 verify lines); lasuite-docs `install-time OIDC:
|
||||
provisioning deps ['keycloak'] BEFORE deploy` + `test_oidc_login_via_keycloak PASSED`
|
||||
(requires_deps skip-count 0); immich ops.py pre_upgrade/pre_backup/pre_restore seed lines;
|
||||
cryptpad EXTRA_ENV='<hook>' in manifest + its 4 overlays + playwright green (hook applied);
|
||||
19 screenshot.png across m2r-* dirs.
|
||||
- Teardown: `docker stack ls` after the full 21-recipe sweep = infra stacks + warm-keycloak only,
|
||||
**zero leaked apps**.
|
||||
- Drone→harness path: !testme on two open recipe PRs pending after the re-runs.
|
||||
|
||||
**Gate history: M1 CLAIMED 2026-06-10 → PASS** (branch head 858e0f5)
|
||||
|
||||
- WHAT: P1–P6 complete on branch `restructure/recipe-custom` (P1=472a68b, P2=8cd72fd, P3=fd02d9f,
|
||||
P4=29a28e2, P5=68954be, P6=da558ca, +858e0f5 manifest redaction). Working tree clean, all pushed.
|
||||
- HOW (cold, from a fresh clone of the branch):
|
||||
- `cc-ci-run -m pytest tests/unit -q` → EXPECTED: **192 passed**
|
||||
- `cc-ci-run -m pytest tests/concurrency -q` → EXPECTED: **23 passed** (untouched by this plan;
|
||||
Builder proof run 2026-06-10 on branch head: 23 passed in 11.46s)
|
||||
- `nix develop .#lint --command scripts/lint.sh` → EXPECTED: **lint: PASS**
|
||||
- resolved-customization diff old-vs-new for all 21 recipe dirs (Adversary's own script) →
|
||||
EXPECTED: 0 deltas
|
||||
- adversarial review of the full diff `main..restructure/recipe-custom`
|
||||
- WHERE: origin branch `restructure/recipe-custom` @ 858e0f5; baseline matrix above (M2 prep,
|
||||
committed pre-merge per plan).
|
||||
|
||||
## Current
|
||||
|
||||
Bootstrapping phase; starting P1.
|
||||
M2 CLAIMED (see Gate above) — awaiting Adversary cold-verify. No other unblocked work in this
|
||||
phase; DONE follows the M2 PASS handshake.
|
||||
|
||||
65
STATUS-shot.md
Normal file
@ -0,0 +1,65 @@
|
||||
# STATUS-shot.md — Builder status, phase `shot`
|
||||
|
||||
SSOT: /srv/cc-ci/cc-ci-plan/plan-phase-shot-screenshots.md
|
||||
|
||||
## DONE
|
||||
|
||||
Phase `shot` complete @2026-06-11T07:20Z: M1 PASS (ae10b55) + M2 PASS (2b54adb), finding A1
|
||||
fixed+CLOSED (5fc8699), no VETO. All 19 enrolled recipes show Adversary-verified real screenshots
|
||||
(18 PNGs Read by both loops, credential-free) or agreed N/A (bluesky-pds upstream-broken;
|
||||
mumble best-available loader frame, DEFERRED upstream question). Fixes on main through 196156e.
|
||||
|
||||
## Gate history
|
||||
|
||||
Gate: M1 PASS (REVIEW-shot.md ae10b55). Finding A1 CLOSED (5fc8699).
|
||||
Gate: M2 PASS (REVIEW-shot.md 2b54adb).
|
||||
|
||||
## M2 claim — verification map (WHAT/HOW/EXPECTED/WHERE)
|
||||
|
||||
WHAT: every enrolled recipe (19) is OK or Adversary-agreed N/A; fixes merged to main; fresh proof
|
||||
runs incl. 2 via drone !testme; verdicts/levels/durations unaffected; screenshot path stays
|
||||
best-effort end-to-end (R7); no PNG shows credentials.
|
||||
|
||||
Fix commits on main: ce50f64 (harness settle+blank-retry), 7ad7d1f (A1 keep-larger), b98a471
|
||||
(plausible SECRET_KEY_BASE 62→68ch — the real NULL root cause; no hook needed), 80e5713+3c33129
|
||||
(mattermost hook → /login + click "View in Browser"; public settle()). Unit: 207 pass
|
||||
(`cc-ci-run -m pytest tests/unit -q`), lint PASS (`nix develop .#lint --command scripts/lint.sh`).
|
||||
|
||||
HOW to verify per recipe — artifacts on cc-ci `/var/lib/cc-ci-runs/<run>/{results.json,
|
||||
screenshot.png,summary.html}`; scp the PNG and Read it. Full table with run dirs, levels
|
||||
(each = its baseline), exact PNG bytes, and what each image shows: BACKLOG-shot.md "P4 — Proof
|
||||
runs". Fixed-class proofs: immich=370 (drone !testme immich#2, posted 05:56:32Z), plausible=371
|
||||
(drone !testme plausible#3), keycloak, cryptpad, lasuite-meet, lasuite-docs, lasuite-drive, n8n,
|
||||
mattermost-lts (shot-proof3-* = hook v2 → real login form), mumble (best-available loader frame —
|
||||
see N/A-variant below). Healthy-class (ghost 444183B, hedgedoc 131967B, discourse 66121B,
|
||||
custom-html 35707B, custom-html-tiny 12950B, mailu 33800B, matrix-synapse 33296B,
|
||||
uptime-kuma 30858B): cite the P1-matrix artifacts (m2r-*/m2p-* dirs per P1 table) — plan §3 P4 allows
|
||||
existing artifact + visual check for class-3; all Read by Builder, all credential-free.
|
||||
|
||||
EXPECTED on re-run of any fixed recipe: results.json `screenshot: "screenshot.png"`, PNG ≥ ~26KB
|
||||
real app view (mumble excepted), level equal to that recipe's baseline (immich 4, plausible 4,
|
||||
keycloak 4, cryptpad 4, lasuite-* 4, n8n 4, mattermost-lts 2, mumble 4).
|
||||
|
||||
R7 / budget: wait components 45(nav, only-on-failure)+10(settle)+0.5+4(blank retry)+0.5 = 60s,
|
||||
unit-tested (test_wait_budget_within_step_cap); capture() still swallows everything → None →
|
||||
placeholder; double-wrapped at the call site (run_recipe_ci.py:1024-1037, unchanged).
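The budget arithmetic above can be checked mechanically. The following is only a sketch: the component names are hypothetical labels (the source gives the numbers, and names only `nav` and `settle`/`blank retry`), and the 60 s step cap is assumed from the stated total rather than read from the harness.

```python
# Hypothetical labels for the wait components listed in the R7 budget line.
# The two 0.5 s entries are unnamed in the source; the names here are placeholders.
WAIT_COMPONENTS_S = {
    "nav_only_on_failure": 45.0,
    "settle": 10.0,
    "pause_a": 0.5,
    "blank_retry": 4.0,
    "pause_b": 0.5,
}

# Assumed step cap, taken from the "= 60s" total in the status note.
STEP_CAP_S = 60.0


def total_wait_budget(components=WAIT_COMPONENTS_S):
    """Sum the worst-case wait components, in seconds."""
    return sum(components.values())


def within_step_cap(components=WAIT_COMPONENTS_S, cap=STEP_CAP_S):
    """True when the worst-case wait fits inside the step cap."""
    return total_wait_budget(components) <= cap
```

A check of this shape is presumably what `test_wait_budget_within_step_cap` asserts, though its actual body is not shown here.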
|
||||
|
||||
Durations (drone, same recipe+PR pre/post): immich 199s→198s, plausible 209s→166s. Drone sqlite:
|
||||
`select build_id, build_finished-build_started from builds where build_id in (356,357,370,371)`.
|
||||
|
||||
Dashboard/card: `https://ci.commoninternet.net/` grid references runs/370+371 screenshot.png (both
|
||||
HTTP 200); summary.html embeds screenshot.png; /badge/immich.svg 200.
|
||||
|
||||
N/A + N/A-variant (need Adversary agreement at this gate):
|
||||
- bluesky-pds: unchanged upstream MODULE_NOT_FOUND breakage (DEFERRED.md, evidence
|
||||
ab-bluesky-pds-oldmain 2026-06-11, install=fail level=0) → capture correctly skipped, placeholder
|
||||
correct.
|
||||
- mumble: web client (rankenstein/mumble-web:0.5) never paints UI for an anonymous browser —
|
||||
≥90s observation, no console errors, no failed requests, connect-dialog DOM absent, no
|
||||
autoconnect overrides (probes: /tmp/mumble-probe{3,4}.out, /tmp/mumble-orch{4,5}.log on cc-ci).
|
||||
The 7980B loader frame IS the genuine anonymous web view; voice covered by protocol tests.
|
||||
DEFERRED.md entry filed (upstream question). Claimed as documented best-available, not a defect.
|
||||
|
||||
## Blocked
|
||||
|
||||
(nothing)
|
||||
@ -38,6 +38,7 @@ _RUN_FILES = {
|
||||
"screenshot.png": "image/png",
|
||||
"badge.svg": "image/svg+xml",
|
||||
"summary.html": "text/html; charset=utf-8",
|
||||
"lint.txt": "text/plain; charset=utf-8",
|
||||
}
|
||||
_RUN_ID_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")
|
||||
|
||||
@ -71,8 +72,7 @@ _LEVEL_COLOR = {
|
||||
2: "#e0823d",
|
||||
3: "#d9b343",
|
||||
4: "#a0b93f",
|
||||
5: "#57ab5a",
|
||||
6: "#3fb950",
|
||||
5: "#3fb950", # bright green — full 5-rung climb incl. lint (phase lvl5)
|
||||
}
|
||||
|
||||
|
||||
@ -152,7 +152,6 @@ def _build_row(b):
|
||||
"ref": ref[:8],
|
||||
"version": res.get("version") or ref[:12] or "—",
|
||||
"level": res.get("level"),
|
||||
"level_cap_reason": res.get("level_cap_reason") or "",
|
||||
"has_screenshot": bool(res.get("screenshot")),
|
||||
"flags": res.get("flags") or {},
|
||||
"finished": b.get("finished") or 0,
|
||||
@ -220,7 +219,6 @@ a{color:#58a6ff;text-decoration:none} a:hover{text-decoration:underline}
|
||||
.name{font-weight:700;font-size:1.05rem;color:#e6edf3}
|
||||
.row{display:flex;align-items:center;gap:.5rem;flex-wrap:wrap;font-size:.82rem}
|
||||
.pill{color:#fff;padding:.08rem .5rem;border-radius:.5rem;font-size:.75rem;font-weight:600}
|
||||
.cap{color:#8b949e;font-size:.75rem}
|
||||
code{background:#0d1117;border:1px solid #21262d;border-radius:.3rem;padding:0 .3rem;font-size:.78rem;color:#c9d1d9}
|
||||
.flags{display:flex;gap:.4rem;font-size:.72rem;color:#8b949e}
|
||||
.foot{margin-top:auto;display:flex;justify-content:space-between;font-size:.8rem;padding-top:.3rem;border-top:1px solid #21262d}
|
||||
@ -274,17 +272,12 @@ def _card(r):
|
||||
f'<a class="shot" href="{run_url}" title="open run">'
|
||||
f'<span class="ph">no screenshot</span>{_level_pill(r["level"])}</a>'
|
||||
)
|
||||
cap = (
|
||||
f'<div class="cap">{html.escape(r["level_cap_reason"])}</div>'
|
||||
if r["level_cap_reason"]
|
||||
else ""
|
||||
)
|
||||
return (
|
||||
f'<div class="card">{shot}<div class="body">'
|
||||
f'<div class="name">{html.escape(r["recipe"])}</div>'
|
||||
f'<div class="row"><span class="pill" style="background:{color}">{html.escape(r["status"])}</span>'
|
||||
f'<code>{html.escape(r["version"])}</code></div>'
|
||||
f"{cap}{_flags_html(r['flags'])}"
|
||||
f"{_flags_html(r['flags'])}"
|
||||
f'<div class="foot"><a href="{run_url}">run #{num} · {_ago(r["finished"])}</a>'
|
||||
f'<a href="/recipe/{html.escape(r["recipe"])}">history →</a></div>'
|
||||
f"</div></div>"
|
||||
|
||||
@ -115,8 +115,8 @@ _This table is GENERATED from the `runner/harness/meta.py` KEYS registry by `scr
|
||||
| `HEALTH_OK` | `tuple[int]` | `(200, 301, 302)` | Acceptable HTTP status codes for health. |
|
||||
| `DEPLOY_TIMEOUT` | `int` | `600` | Max seconds to wait for swarm convergence per deploy. |
|
||||
| `HTTP_TIMEOUT` | `int` | `300` | Max seconds to wait for HTTP health after convergence. |
|
||||
| `BACKUP_CAPABLE` | `bool` | `None` | Override the backup-tier capability auto-detect (compose `backupbot.backup` labels). `False` forces N/A; `True` forces the tier on; unset = auto-detect. |
|
||||
| `EXPECTED_NA` | `dict` | `None` | Declare an N/A rung intentional: `{rung: reason}`. The cap stands either way; only the report wording changes. |
|
||||
| `BACKUP_CAPABLE` | `bool` | `None` | Override the backup-tier capability auto-detect (compose `backupbot.backup` labels). `False` forces an intentional skip of the backup/restore rung; `True` forces the tier on; unset = auto-detect. |
|
||||
| `EXPECTED_NA` | `dict` | `None` | Declare a non-run rung an INTENTIONAL skip: `{rung: reason}` — the level climbs past it; an undeclared non-run rung is *unverified* and blocks the level above it (classification table: machine-docs/DECISIONS.md phase lvl5). Never overrides an exercised pass/fail; the `lint` rung has no escape hatch. |
|
||||
| `READY_PROBE` | `hook` | `None` | Callable `(ctx) -> [probe, ...]` returning extra readiness probes, run after install AND after upgrade: HTTP `{host, path, ok}` or TCP `{tcp_host, tcp_port, stable}`. |
|
||||
| `UPGRADE_BASE_VERSION` | `str` | `None` | Exact published tag overriding the upgrade tier's base (default: `recipe_versions[-2]`). |
|
||||
| `BACKUP_VERIFY` | `hook` | `None` | Callable `(ctx) -> bool` post-backup data-capture check; `False` re-runs the backup (truncated-dump race guard), retried up to 3 attempts. |
|
||||
|
||||
@ -10,12 +10,9 @@ It is the R8 reference for Phase 3 (`plan-phase3-results-ux.md`).
|
||||
|
||||
---
|
||||
|
||||
## 1. The level ladder (R1)
|
||||
## 1. The level ladder (phase lvl5 semantics, operator-decided 2026-06-11)
|
||||
|
||||
Every run earns a single integer **level 0–6**. The ladder is cumulative with **YunoHost
|
||||
gap-caps-the-level** semantics: you earn level `L` only if **every rung 1..L was a clean PASS**. The
|
||||
first rung that is not a clean PASS — a real **FAIL** *or* genuinely **N/A** for this recipe — stops
|
||||
the climb, and `level_cap_reason` records which rung and why.
|
||||
Every run earns a single integer **level 0–5** over the FIVE essential rungs:
|
||||
|
||||
| Level | Rung | Earned when |
|
||||
|------:|------|-------------|
|
||||
@ -24,42 +21,52 @@ the climb, and `level_cap_reason` records which rung and why.
|
||||
| **L2** | upgrade | previous published version → PR/latest, stays healthy, data intact. |
|
||||
| **L3** | backup/restore | seeded data survives backup → wipe → restore. |
|
||||
| **L4** | functional | the recipe-specific functional tests pass. |
|
||||
| **L5** | integration | SSO/OIDC + cross-app integration tests pass. |
|
||||
| **L6** | recipe-local | the recipe repo's own `tests/` (D4) pass and are merged. |
|
||||
| **L5** | lint | `abra recipe lint` passes against the exact ref under test. |
|
||||
|
||||
**N/A caps, fairly.** A rung that does not apply to a recipe (only one published version → no
|
||||
upgrade; not backup-capable; no SSO/integration surface; no recipe-local tests) is **N/A**, which
|
||||
caps the climb at the rung below it with a recorded reason — it is *not* counted as a failure. This is
|
||||
the only fair reading of "a missing lower rung caps the level": e.g. a recipe with **no integration
|
||||
surface caps at L4 by definition**, shown as `level_cap_reason = "L5 integration … N/A"`. A stateless
|
||||
app whose functional tests pass but which cannot be backed up is honestly capped at **L2** (`"L3
|
||||
backup/restore … N/A"`) rather than shown as L4 — understating is safe; overstating is forbidden.
|
||||
Each rung has one of FOUR statuses, and the level is:
|
||||
|
||||
Worked examples (real runs):
|
||||
- `uptime-kuma` — install+upgrade+backup+restore+functional all pass, no SSO surface → **L4**
|
||||
(`cap = "L5 integration (SSO/OIDC + cross-app) N/A"`).
|
||||
- `custom-html-tiny` — stateless, not backup-capable: install+upgrade pass, backup/restore N/A →
|
||||
**L2** (`cap = "L3 backup/restore (data integrity) N/A"`).
|
||||
level = the highest rung that PASSED, where every rung below it is "pass" or an intentional skip
|
||||
|
||||
- **pass / fail** — the rung was exercised. A FAIL blocks: no rung above it counts, however green.
|
||||
- **skip (intentional)** — the rung *genuinely does not apply*, from a declared or structural fact:
|
||||
not backup-capable (declared), only one published version (no upgrade target), or a declared
|
||||
`EXPECTED_NA`. Intentional skips are **climbed past** — a stateless recipe with passing
|
||||
functional tests and a clean lint reaches **L5**, not the old "capped at 2".
|
||||
- **unver (unverified)** — the rung *should* have run but didn't: infra error, missing tool,
|
||||
harness exception, prior-stage abort, timeout. **The level cannot rise above an unverified
|
||||
rung** — it blocks exactly like a fail (we never claim what we didn't check). Anything
|
||||
unclassifiable defaults to unver (conservative).
|
||||
|
||||
There is **no capping concept** (no `cap_reason`, no `capped`): the per-rung table
|
||||
(✔ / ✘ / intentional-skip / unverified) on the card and in `results.json.rungs` is the sole
|
||||
carrier of "why isn't this level higher". Worked examples:
|
||||
|
||||
- install ✔, upgrade ✘, backup ✔, functional ✔, lint ✔ → **level 1** (fail blocks).
|
||||
- install ✔, upgrade ✔, backup skip (not capable), functional ✔, lint ✔ → **level 5**.
|
||||
- install ✔, upgrade ✔, backup unver (harness error), functional ✔, lint ✔ → **level 2**.
|
||||
- all four ✔, lint unver (abra missing) → **level 4** (an unverified top rung isn't earned).
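The formula and the four worked examples can be sketched in a few lines. This is an illustrative reimplementation under the rules stated above, not the project's `level.py`; the function name `sketch_compute_level` and the choice to treat a missing rung as `unver` are assumptions made for the example.

```python
# Rung order for the five-rung ladder (L1..L5), as listed in the table above.
RUNG_ORDER = ("install", "upgrade", "backup_restore", "functional", "lint")

# Statuses the climb may continue past.
CLIMBABLE = {"pass", "skip"}


def sketch_compute_level(rungs: dict) -> int:
    """Return the highest rung index that passed with only pass/skip below it.

    A missing or unknown status is treated as "unver" (assumption: this mirrors
    the "unclassifiable defaults to unver" rule), so it blocks like a fail.
    """
    level = 0
    for index, name in enumerate(RUNG_ORDER, start=1):
        status = rungs.get(name, "unver")
        if status == "pass":
            level = index      # a pass is earned only if we got this far
        elif status not in CLIMBABLE:
            break              # fail / unver: nothing above counts
        # "skip": climbed past, but earns no level by itself
    return level
```

Note that an intentional skip never raises the level on its own: a run whose only non-skipped rung is `install: pass` stays at level 1 until a higher rung actually passes.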
|
||||
|
||||
Integration (SSO/OIDC + cross-app) and recipe-local tests are **optional capabilities**, not
|
||||
rungs — they never affect the level (SSO remains enforced for the run VERDICT).
|
||||
|
||||
### How tiers map to rungs (the translation layer)
|
||||
|
||||
`run_recipe_ci.py` holds the run's per-tier results (`install/upgrade/backup/restore/custom`) +
|
||||
deps/SSO signals; `runner/harness/results.py::derive_rungs` maps them to the rung-status dict that
|
||||
`runner/harness/level.py::compute_level` scores. The mapping (also in `DECISIONS.md`, Phase 3):
|
||||
structural signals; `runner/harness/results.py::derive_rungs` maps them to the rung-status dict
|
||||
that `runner/harness/level.py::compute_level` scores. The full intentional-vs-unintentional
|
||||
classification table for every N/A source is in `machine-docs/DECISIONS.md` (phase lvl5). Summary:
|
||||
|
||||
- **install** ← install tier (pass/fail).
|
||||
- **upgrade** ← upgrade tier; `skip` → **na** (only one published version).
|
||||
- **install** ← install tier (pass/fail; a non-run is unver — install always applies).
|
||||
- **upgrade** ← upgrade tier; tier skipped with no upgrade target (single published version,
|
||||
structural) → skip; declared `EXPECTED_NA` → skip; otherwise unver.
|
||||
- **backup_restore** ← backup AND restore tiers both pass → pass; either fail → fail; not
|
||||
backup-capable → **na**.
|
||||
- **functional** ← the custom tier minus its SSO tests; a custom failure conservatively fails this
|
||||
rung (we don't split functional-vs-SSO failure → never inflate); no custom tests → **na**.
|
||||
- **integration** ← applies only if the recipe declares deps; pass iff deps wired and SSO verified and
|
||||
custom didn't fail; recipes with no declared deps → **na** (the "caps at L4" rule).
|
||||
- **recipe_local** ← the recipe repo's own `tests/` (discovery source `repo-local`) ran and passed;
|
||||
none present → **na**.
|
||||
|
||||
The pure scorer is exhaustively unit-tested + fuzz-verified (all 729 rung combinations: level ==
|
||||
count of leading consecutive passes, zero inflation).
|
||||
backup-capable (structural/declared) → skip; unverified-while-capable → unver.
|
||||
- **functional** ← the custom tier; a custom failure conservatively fails this rung; no custom
|
||||
tests is a coverage GAP → unver, unless declared `EXPECTED_NA["functional"]` → skip.
|
||||
- **lint** ← the lint executor (`runner/harness/lint.py`): `abra recipe lint` on a pristine
|
||||
scratch clone of the run's recipe tree at the exact tested sha, 60s hard budget, full output in
|
||||
the run artifact `lint.txt`. pass/fail only — when lint can't run the rung is **unver** (never
|
||||
a silent pass, never an intentional skip). Lint never changes the run verdict.
|
||||
|
||||
### Invariant flags (shown, not climbed)
|
||||
|
||||
@ -77,19 +84,29 @@ build number, or the run's unique app domain for a hand-run). Schema:
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": 1, "run_id": "...", "recipe": "...", "version": "...", "pr": "...", "ref": "...",
|
||||
"schema": 2, "run_id": "...", "recipe": "...", "version": "...", "pr": "...", "ref": "...",
|
||||
"finished": 0.0,
|
||||
"level": 4, "level_cap_reason": "L5 integration (SSO/OIDC + cross-app) N/A",
|
||||
"rungs": {"install":"pass","upgrade":"pass","backup_restore":"pass","functional":"pass",
|
||||
"integration":"na","recipe_local":"na"},
|
||||
"level": 5,
|
||||
"rungs": {"install":"pass","upgrade":"pass","backup_restore":"skip","functional":"pass",
|
||||
"lint":"pass"},
|
||||
"lint": {"status":"pass","detail":"","rules_failed":[]},
|
||||
"skips": {"intentional": {"backup_restore": "not backup-capable (no backupbot labels / declared)"},
|
||||
"unintentional": []},
|
||||
"stages": [{"name":"install","status":"pass",
|
||||
"tests":[{"name":"test_serving","status":"pass","ms":168,"source":"generic"}]}],
|
||||
"results": {"install":"pass","upgrade":"pass","backup":"pass","restore":"pass","custom":"pass"},
|
||||
"results": {"install":"pass","upgrade":"pass","backup":"skip","restore":"skip","custom":"pass"},
|
||||
"flags": {"clean_teardown": true, "no_secret_leak": true},
|
||||
"screenshot": "screenshot.png", "summary_card": "summary.png"
|
||||
}
|
||||
```
|
||||
|
||||
`rungs` carries the four-status vocabulary above; `skips.intentional` maps each intentionally
|
||||
skipped rung to its (declared or structural) reason and `skips.unintentional` lists the
|
||||
unverified rungs. `lint` carries the L5 rung outcome + failing rule ids; the full
|
||||
`abra recipe lint` output is served at `/runs/<run_id>/lint.txt`. Pre-lvl5 artifacts
|
||||
(`"schema": 1`, 4-rung ladder, `level_cap_reason`/`level_cap_rung` present, `"na"` statuses)
|
||||
are still rendered as-is by the dashboard/card — their stored level is never recomputed.
|
||||
|
||||
Assembly is **best-effort**: a failure to build/write `results.json` is logged but never changes the
|
||||
run's exit code (cosmetics never block the pipeline, R7).
|
||||
|
||||
|
||||
@ -1295,3 +1295,61 @@ the abra CLI and abra.recipe_dir()). No test assertion, gate, or overlay content
|
||||
phase guardrail's "never touch tests/<recipe>/ content" is read as protecting test/gate SEMANTICS;
|
||||
this is required P3 fallout, equivalent to the harness-side path routing. Flagged here for the
|
||||
Adversary's gate-integrity review.
|
||||
|
||||
## Phase lvl5 — L5 lint rung + level semantics de-cap (SETTLED 2026-06-11, operator-specified)
|
||||
|
||||
**The level formula (replaces the Phase-3 "N/A caps" stance).** Operator decision 2026-06-11
|
||||
(explicit Q&A, recorded verbatim in plan-phase-lvl5-lint-rung.md): with per-rung statuses
|
||||
{pass, fail, skip (intentional), unver (unintentional/not-verified)}:
|
||||
|
||||
level = max i such that rung_i == "pass" and all j < i have status in {"pass","skip"}; else 0.
|
||||
|
||||
A real FAIL blocks. An INTENTIONAL skip (the rung genuinely does not apply, from a declared or
|
||||
structural fact) is climbed past — this is the de-cap: a non-backup-capable recipe is no longer
|
||||
stuck at L2. An UNVERIFIED rung (should have run, wasn't checked) blocks exactly like a fail —
|
||||
this preserves the honest core of the old N/A-caps rule: never claim what wasn't checked. The
|
||||
words cap/capped/cap_reason are deleted from code, schema (results.json schema 2), card,
|
||||
dashboard, badge and docs; the per-rung table (✔/✘/intentional-skip/unverified) is the SOLE
|
||||
carrier of "why isn't the level higher". The big level badges (card corner, dashboard pill,
|
||||
/badge/<recipe>.svg) show ONLY number + colour (operator-specified). Old schema-1 artifacts are
|
||||
rendered as-is (their stored level, their 4-rung ladder) — no retroactive relabeling.
|
||||
|
||||
**The ladder is now five rungs:** install(1) upgrade(2) backup_restore(3) functional(4)
|
||||
**lint(5) = `abra recipe lint` passes against the exact ref under test** (PR head on PR builds).
|
||||
Lint is a LEVEL RUNG, not a run gate: no lint outcome ever changes the run verdict.
|
||||
|
||||
**N/A classification table (derive_rungs, results.py — every N/A source, Adversary-reviewed).
|
||||
Default for anything unclassifiable: UNVER (conservative).**
|
||||
|
||||
| rung | source of non-pass/fail | class | status |
|---|---|---|---|
| install | tier skipped / missing (any reason; install always applies) | unintentional | unver |
| upgrade | tier skipped by orchestrator AND no upgrade target (`prev is None`: only one published version; structural) | intentional | skip |
| upgrade | declared `EXPECTED_NA["upgrade"]` (tier not pass/fail) | intentional | skip |
| upgrade | tier skipped though a target exists (install failed → downstream abort), or tier missing (CCCI_STAGES dev escape) | unintentional | unver |
| backup_restore | not backup-capable (no backupbot labels / `BACKUP_CAPABLE=False`; structural/declared) | intentional | skip |
| backup_restore | declared `EXPECTED_NA["backup_restore"]` (tiers not pass/fail) | intentional | skip |
| backup_restore | backup-capable but either tier did not produce pass/fail (abort, partial run) | unintentional | unver |
| functional | declared `EXPECTED_NA["functional"]` (no custom tests / tier skipped) | intentional | skip |
| functional | no custom tests / tier skipped, undeclared: absent functional coverage is a GAP, not a property | unintentional | unver |
| lint | executor could not produce pass/fail (timeout, abra/script missing, env FATA, unparseable output); NO escape hatch, `EXPECTED_NA["lint"]` is ignored | unintentional | unver |
|
||||
|
||||
EXPECTED_NA never overrides an exercised rung: pass/fail always stand.
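As an illustration of how one rung's rows in the table might translate into code, here is a sketch for the `upgrade` rung only. It is not the real `derive_rungs`; the function name, the parameter names, and the use of `None` for a missing tier are assumptions for the example.

```python
from typing import Mapping, Optional

EXERCISED = ("pass", "fail")


def sketch_classify_upgrade(
    tier_status: Optional[str],      # "pass" | "fail" | "skip" | None (tier missing)
    has_upgrade_target: bool,        # False when only one published version exists
    expected_na: Mapping[str, str],  # recipe-declared {rung: reason}
) -> str:
    """Map the upgrade tier outcome to a rung status per the table above."""
    # An exercised rung always stands; EXPECTED_NA never overrides it.
    if tier_status in EXERCISED:
        return tier_status
    # Structural skip: orchestrator skipped the tier and there is no target.
    if tier_status == "skip" and not has_upgrade_target:
        return "skip"
    # Declared skip: the recipe says the rung intentionally does not apply.
    if "upgrade" in expected_na:
        return "skip"
    # Everything else (downstream abort, missing tier, unclassifiable) is unverified.
    return "unver"
```

The ordering matters: checking the exercised statuses first is what keeps a declared `EXPECTED_NA` from masking a real failure.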
|
||||
|
||||
**Lint executor mirror-context decision (plan-phase-lvl5 §2.3).** Probed on cc-ci 2026-06-11
|
||||
(JOURNAL-lvl5): (a) abra lint globs every `compose*.yml` in the recipe tree, so the CI's
|
||||
untracked install_steps overlays (e.g. compose.ccci.yml) FATA it — harness artifact; (b) abra
|
||||
lint force-fetches tags from `origin`, so a PR run's private-mirror origin (token never written
|
||||
to .git/config) FATAs "unable to fetch tags" — harness artifact; (c) `abra recipe lint` exits
|
||||
non-zero ONLY on FATA — rule verdicts live in its table (error-severity ❌ rows + a trailing
|
||||
"WARN critical errors present" sentinel, rc still 0). Decision: the executor (harness/lint.py)
|
||||
lints a PRISTINE SCRATCH CLONE of the per-run recipe tree checked out at the exact tested sha —
|
||||
origin becomes a local path (offline tag fetch, no auth) and the run's true tag set rides along
|
||||
(fetch_recipe already fetches the canonical upstream version tags into the per-run tree, so
|
||||
R014 evaluates the recipe's real tags). **No lint rule is filtered or ignored** — the
|
||||
plumbing pollution is solved by context, not by exemptions. Classifier: fail iff an
|
||||
error-severity rule is unsatisfied (or the FATA is content-attributable: "unable to validate
|
||||
recipe"); pass iff the table rendered clean; anything else unver + loud log. Hard 60s budget
|
||||
(observed ~0.7s); executor runs before the tiers (tree at tested ref), double-wrapped, R7
|
||||
verdict-neutral. Full output → run artifact `lint.txt` (dashboard-served); status + failing
|
||||
rule ids → results.json `lint`.
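The classifier rule in the paragraph above (fail on an unsatisfied error-severity rule or a content-attributable FATA, pass on a clean table, otherwise unver) could look roughly like the sketch below. It deliberately works on an already-parsed observation instead of raw `abra` output, because the exact output format is not reproduced here; `LintObservation` and its fields are hypothetical, not the structures used in `harness/lint.py`.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The one FATA text the decision record names as content-attributable.
CONTENT_FATA_MARKER = "unable to validate recipe"


@dataclass
class LintObservation:
    """Hypothetical summary of one lint attempt (all fields are assumptions)."""
    timed_out: bool = False           # hard budget exceeded
    tool_missing: bool = False        # abra or the wrapper script not found
    fata: Optional[str] = None        # FATA message text, if any
    table_rendered: bool = False      # the rule table was parsed successfully
    error_rules_failed: List[str] = field(default_factory=list)  # e.g. ["R014"]


def sketch_classify_lint(obs: LintObservation) -> str:
    """Return "pass", "fail" or "unver"; never "skip" (lint has no escape hatch)."""
    if obs.timed_out or obs.tool_missing:
        return "unver"
    if obs.fata is not None:
        # Only a content-attributable FATA counts against the recipe.
        return "fail" if CONTENT_FATA_MARKER in obs.fata else "unver"
    if obs.error_rules_failed:
        return "fail"
    if obs.table_rendered:
        return "pass"
    return "unver"  # unparseable or otherwise unclassifiable output
```

Defaulting to `unver` at the end keeps the classifier conservative: a pass has to be positively established by a rendered, clean table.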
|
||||
|
||||
@ -335,3 +335,28 @@ before the build is called done) — but does **not** force closure.
|
||||
- **Re-entry trigger:** Builder authors recipe-PR Q4.7b (cache tarball on a volume / wget
|
||||
retry+backoff / drop `2>/dev/null` / `set +e` w/ fallback), then runs plausible-full green + claims.
|
||||
- **Linked:** REVIEW-2 `e850281` (root-cause + DENY), `71af595` (§4.3 floor); DECISIONS 2026-05-30.
|
||||
- discourse upgrade-HC1 @7ae7b0f stamps prev-base tag commit (eb96de94+U) on BOTH old+new harness since ~06-10 (baseline 184 was L4 on 06-05); harness-neutral (rcust exonerated, M2-closed) but abra stamp-resolution mechanism UNATTRIBUTED — worth a standalone dig outside rcust. Evidence: /var/lib/cc-ci-runs/{m2p-discourse,ab-discourse-7ae7b0f-oldmain}, JOURNAL-rcust 2026-06-11.
|
||||
- bluesky-pds: UPSTREAM IMAGE BREAKAGE (non-rcust, M2-justified exclusion from baseline match).
|
||||
The app container crash-loops `Error: Cannot find module '/app/index.js'` (MODULE_NOT_FOUND,
|
||||
Node v24.15.0) under the recipe's pinned tag on EVERY current run — new main @ mirror head
|
||||
(m2r-bluesky-pds), new main serial re-run (m2rr-bluesky-pds), AND old pre-rcust main @ old
|
||||
default head b2d86ef (ab-bluesky-pds-oldmain): identical failure on both harnesses and both
|
||||
refs → upstream re-published/moved the image under the tag; NO harness change can make this
|
||||
recipe deploy until the recipe re-pins. Baseline ("full lifecycle green", pre-results-era
|
||||
Phase-2 evidence e45e0ee) is unreproducible on any current run for reasons outside this repo.
|
||||
Evidence: `grep -r MODULE_NOT_FOUND /var/lib/cc-ci-runs/{m2r,m2rr,ab}-bluesky-pds*/abra/logs/
|
||||
default/`; REVIEW-rcust.md 2026-06-11 entries. Follow-up (post-phase): file/propose a re-pin PR
|
||||
against the bluesky-pds recipe mirror.
|
||||
- mumble-web client never paints UI for an anonymous browser (phase-shot, 2026-06-11). The recipe's
|
||||
pinned web client (rankenstein/mumble-web:0.5 via compose.mumbleweb.yml, served by websockify)
|
||||
stays at its `loading-container` spinner ≥90s with NO console errors, NO failed asset/requests,
|
||||
connect-dialog DOM elements absent, and no autoconnect overrides in config.local.js (defaults
|
||||
untouched) — so the CI screenshot's best-available frame is the genuine loader view every visitor
|
||||
gets. The voice server itself is fully exercised (protocol handshake/config tests pass; that is
|
||||
mumble's actual function). A harness-side fix is impossible without changing what the recipe
|
||||
deploys (guardrail: prefer upstream over cc-ci overlays). **Operator input needed:** whether to
|
||||
pursue an upstream recipe issue/PR (newer mumble-web image or one that renders its connect dialog)
|
||||
— until then the dashboard shows the loader frame as the recipe's web-surface reality.
|
||||
Evidence: /tmp/mumble-probe{2,3,4}.out + /tmp/mumble-orch{4,5}.log on cc-ci (90s DOM/console/
|
||||
network observation; websockify reachable, /ws & /websocket 404 from websockify itself);
|
||||
/var/lib/cc-ci-runs/shot-proof-mumble/screenshot.png (L4 run, loader frame).
|
||||
|
||||
@ -21,23 +21,24 @@ from __future__ import annotations
|
||||
import html
|
||||
import os
|
||||
|
||||
# Level → colour ramp (YunoHost-ish): red at the floor, climbing to green at the top.
|
||||
# Level → colour ramp (YunoHost-ish): red at the floor, climbing to green at the top (L5 = full
|
||||
# clean climb incl. lint — phase lvl5).
|
||||
LEVEL_COLOR = {
|
||||
0: "#e5534b", # red — install failed
|
||||
1: "#e0823d", # orange
|
||||
2: "#e0823d",
|
||||
3: "#d9b343", # amber
|
||||
4: "#a0b93f", # yellow-green
|
||||
5: "#57ab5a", # green
|
||||
6: "#3fb950", # bright green — full climb
|
||||
4: "#a0b93f", # yellow-green — above functional, lint not earned
|
||||
5: "#3fb950", # bright green — full climb (lint passed)
|
||||
}
|
||||
STATUS_MARK = {"pass": "✔", "fail": "✘", "skip": "–", "error": "✘", "na": "–"}
|
||||
STATUS_MARK = {"pass": "✔", "fail": "✘", "skip": "–", "error": "✘", "na": "–", "unver": "⊘"}
|
||||
STATUS_COLOR = {
|
||||
"pass": "#3fb950",
|
||||
"fail": "#f85149",
|
||||
"error": "#f85149",
|
||||
"skip": "#8b949e",
|
||||
"na": "#8b949e",
|
||||
"unver": "#d29922", # amber: not exercised (should have run but wasn't verified)
|
||||
}
|
||||
|
||||
|
||||
@ -79,44 +80,15 @@ def render_badge_svg(label: str, message: str, color: str) -> str:
|
||||
)
|
||||
|
||||
|
||||
# Third-segment colours for the level badge: amber = an UNINTENTIONAL skip (a rung skipped but not
|
||||
# in the recipe's intentional list — likely missing coverage) capped the climb; muted = an
|
||||
# INTENTIONAL skip (declared in recipe_meta.EXPECTED_NA — nothing to fix). Font-safe text labels
|
||||
# (no emoji) so the SVG renders anywhere.
|
||||
# Amber for UNVERIFIED rung rows in the table (a rung that should have run and wasn't checked).
|
||||
GAP_COLOR = "#d29922"
|
||||
EXPECT_COLOR = "#6e7681"
|
||||
|
||||
|
||||
def level_badge_svg(level: int, cap_reason: str = "", cap_skip: str = "") -> str:
|
||||
"""Per-recipe/-run LEVEL badge: 'cc-ci | level N' coloured by level (R6), with a THIRD segment
|
||||
that differentiates *why* the climb stopped when a SKIP capped it (`cap_skip`):
|
||||
- "unintentional" (a rung skipped but not in the recipe's intentional list): amber 'gap?'.
|
||||
- "intentional" (a skip declared in recipe_meta.EXPECTED_NA): muted 'expected'.
|
||||
- "" (clean cap / full climb / a real failure): no third segment (the level + card carry it).
|
||||
The badge never inflates — it only annotates the cap the level already reflects."""
|
||||
label, msg = "cc-ci", f"level {int(level)}"
|
||||
lw, mw = _text_width(label), _text_width(msg)
|
||||
third: tuple[str, str] | None = None
|
||||
if cap_skip == "unintentional":
|
||||
third = ("gap?", GAP_COLOR)
|
||||
elif cap_skip == "intentional":
|
||||
third = ("expected", EXPECT_COLOR)
|
||||
if third is None:
|
||||
return render_badge_svg(label, msg, level_color(level))
|
||||
txt, tcolor = third
|
||||
tw = _text_width(txt)
|
||||
w = lw + mw + tw
|
||||
return (
|
||||
f'<svg xmlns="http://www.w3.org/2000/svg" width="{w}" height="20" role="img" '
|
||||
f'aria-label="{html.escape(label)}: {html.escape(msg)} ({html.escape(txt)})">'
|
||||
f'<rect width="{lw}" height="20" fill="#555"/>'
|
||||
f'<rect x="{lw}" width="{mw}" height="20" fill="{level_color(level)}"/>'
|
||||
f'<rect x="{lw + mw}" width="{tw}" height="20" fill="{tcolor}"/>'
|
||||
f'<g fill="#fff" font-family="Verdana,Geneva,sans-serif" font-size="11">'
|
||||
f'<text x="6" y="14">{html.escape(label)}</text>'
|
||||
f'<text x="{lw + 6}" y="14">{html.escape(msg)}</text>'
|
||||
f'<text x="{lw + mw + 6}" y="14">{html.escape(txt)}</text></g></svg>'
|
||||
)
|
||||
def level_badge_svg(level: int) -> str:
|
||||
"""Per-recipe/-run LEVEL badge: 'cc-ci | level N' coloured by level — NUMBER + COLOUR ONLY
|
||||
(operator-specified, phase lvl5). 'Why isn't it higher' lives in the card's per-rung table,
|
||||
never on the badge."""
|
||||
return render_badge_svg("cc-ci", f"level {int(level)}", level_color(level))
|
||||
|
||||
|
||||
def _stage_rows(stages: list[dict]) -> str:
|
||||
@ -141,12 +113,13 @@ def _stage_rows(stages: list[dict]) -> str:
|
||||
return "\n".join(rows) or '<tr><td colspan="3">no stages</td></tr>'
|
||||
|
||||
|
||||
# Friendly rung labels for the skip rows (the four essential rungs).
|
||||
# Friendly rung labels for the skip/unverified rows (the five essential rungs).
|
||||
RUNG_LABEL = {
|
||||
"install": "install",
|
||||
"upgrade": "upgrade",
|
||||
"backup_restore": "backup/restore",
|
||||
"functional": "functional",
|
||||
"lint": "lint",
|
||||
}
|
||||
SKIP_GREEN = (
|
||||
"#57ab5a" # muted green — an intentional skip reads like a pass (but labelled, never inflating)
|
||||
@ -154,9 +127,10 @@ SKIP_GREEN = (
|
||||
|
||||
|
||||
def _skip_rows(skips: dict) -> str:
|
||||
"""Render SKIPPED rungs as stage-like rows. An intentional (declared) skip looks like a pass row
|
||||
but its status says 'INTENTIONAL SKIP' (muted green) with the declared reason on the line below;
|
||||
an unintentional skip is amber 'UNINTENTIONAL SKIP' with a prompt to add a test or declare it."""
|
||||
"""Render the non-run rungs as stage-like rows (phase lvl5 semantics). An INTENTIONAL skip
|
||||
(declared/structural — the rung does not apply, the climb continues past it) is muted green
|
||||
with its reason on the line below; an UNVERIFIED rung (should have run, wasn't checked — the
|
||||
level cannot rise above it) is amber 'unverified'."""
|
||||
rows = []
|
||||
for rung, reason in (skips.get("intentional") or {}).items():
|
||||
rows.append(
|
||||
@ -171,11 +145,11 @@ def _skip_rows(skips: dict) -> str:
|
||||
rows.append(
|
||||
f'<tr class="stage"><td colspan="2"><span class="mark" style="color:{GAP_COLOR}">⊘</span>'
|
||||
f"<b>{html.escape(RUNG_LABEL.get(rung, rung))}</b></td>"
|
||||
f'<td class="st" style="color:{GAP_COLOR}">unintentional skip</td></tr>'
|
||||
f'<td class="st" style="color:{GAP_COLOR}">unverified</td></tr>'
|
||||
)
|
||||
rows.append(
|
||||
'<tr class="skipreason"><td></td><td colspan="2">not declared in EXPECTED_NA — add the '
|
||||
"missing test/label, or declare the skip with a reason</td></tr>"
|
||||
'<tr class="skipreason"><td></td><td colspan="2">rung did not run / could not be '
|
||||
"checked — the level cannot rise above an unverified rung</td></tr>"
|
||||
)
|
||||
return "\n".join(rows)
|
||||
|
||||
@ -184,13 +158,15 @@ def render_card_html(data: dict, screenshot_rel: str | None = "screenshot.png")
|
||||
"""Build the summary-card HTML from a results.json dict. `screenshot_rel` is the relative path to
|
||||
the screenshot PNG (same dir as the card) — omitted from the card if None / absent.
|
||||
|
||||
The card shows exactly what the data says: recipe + version, the level badge + cap reason, the
|
||||
per-stage/per-test ✔/✘ table, the invariant flags, and the app screenshot. No computation here."""
|
||||
The card shows exactly what the data says: recipe + version, the level, the per-stage/per-test
|
||||
✔/✘ table (+ skip/unverified rung rows — the SOLE carrier of "why isn't the level higher"),
|
||||
the invariant flags, and the app screenshot. No computation here. Tolerates old (schema-1)
|
||||
artifacts: the ladder height is read off the rungs the artifact actually has."""
|
||||
recipe = html.escape(str(data.get("recipe", "?")))
|
||||
version = html.escape(str(data.get("version") or data.get("ref") or ""))
|
||||
level = int(data.get("level", 0))
|
||||
cap_reason = str(data.get("level_cap_reason") or "")
|
||||
cap = html.escape(cap_reason)
|
||||
# Old (pre-lvl5) artifacts have a 4-rung ladder — render their "of N" honestly.
|
||||
ladder_top = 5 if "lint" in (data.get("rungs") or {}) else 4
|
||||
sk = data.get("skips", {}) or {}
|
||||
color = level_color(level)
|
||||
flags = data.get("flags", {}) or {}
|
||||
@ -221,7 +197,7 @@ body{{margin:0;font-family:system-ui,-apple-system,Segoe UI,sans-serif;backgroun
|
||||
.lvl .num{{display:inline-block;min-width:64px;padding:.3rem .7rem;border-radius:10px;
|
||||
font-size:1.6rem;font-weight:700;color:#0d1117;background:{color}}}
|
||||
.lvl .lbl{{display:block;color:#8b949e;font-size:.72rem;text-transform:uppercase;margin-top:.2rem}}
|
||||
.cap{{padding:.4rem 1.3rem;color:#8b949e;font-size:.82rem;border-bottom:1px solid #21262d}}
|
||||
.ladder{{padding:.4rem 1.3rem;color:#8b949e;font-size:.82rem;border-bottom:1px solid #21262d}}
|
||||
.body{{display:flex;gap:1rem;padding:1rem 1.3rem}}
|
||||
.tbl{{flex:1}}
|
||||
table{{border-collapse:collapse;width:100%;font-size:.85rem}}
|
||||
@ -238,12 +214,12 @@ tr.skipreason td{{color:#8b949e;font-size:.78rem;font-style:italic;padding-top:0
|
||||
.shot.noshot{{display:flex;align-items:center;justify-content:center;height:225px;color:#8b949e;font-size:.85rem}}
|
||||
.flags{{display:flex;gap:.6rem;padding:.6rem 1.3rem 1rem}}
|
||||
.flag{{border:1px solid;border-radius:6px;padding:.15rem .5rem;font-size:.78rem;color:#c9d1d9}}
|
||||
.cap b{{color:#c9d1d9}}
|
||||
.ladder b{{color:#c9d1d9}}
|
||||
</style></head><body><div class="card">
|
||||
<div class="hd">{FLOWER_SVG}
|
||||
<div class="title"><h1>{recipe}</h1><span class="ver">{version}</span></div>
|
||||
<div class="lvl"><span class="num">{level}</span><span class="lbl">level</span></div></div>
|
||||
<div class="cap">{("<b>capped:</b> " + cap) if cap else "<b>full clean climb</b> — top level (4)"}</div>
|
||||
<div class="ladder"><b>level {level} of {ladder_top}</b></div>
|
||||
<div class="body"><div class="tbl"><table>{rows}</table></div>{shot_html}</div>
|
||||
<div class="flags">{"".join(flag_bits)}</div>
|
||||
</div></body></html>"""
|
||||
|
||||
@ -1,67 +1,67 @@
|
||||
"""Phase 3 — the level ladder (plan-phase3-results-ux.md §4.1, R1).
|
||||
"""The level ladder — five rungs, no capping (phase lvl5, plan-phase-lvl5-lint-rung.md).
|
||||
|
||||
A single integer **level** summarising how far up the quality ladder a recipe run climbed, with
|
||||
YunoHost semantics: **a gap caps the level** — you only earn level L if every rung 1..L was a clean
|
||||
PASS. The first rung that is not a clean PASS (a real FAIL *or* genuinely N/A for this recipe) stops
|
||||
the climb; `cap_reason` records why. This is deliberately conservative: presentation must NEVER make
|
||||
a run look greener than its tests (plan §6 cardinal guardrail), so an N/A rung caps just like a fail
|
||||
— with a recorded reason so the level is *fair*, not inflated.
|
||||
|
||||
The ladder is the FOUR essential rungs every recipe is held to:
|
||||
A single integer **level** summarising how far up the quality ladder a recipe run climbed:
|
||||
L0 — install failed / app never became healthy.
|
||||
L1 — Installs: deploys + passes health/readiness.
|
||||
L2 — Upgrades: previous published version → PR version, stays healthy, data intact.
|
||||
L3 — Backup/restore: seeded data survives backup → wipe → restore.
|
||||
L4 — Functional: recipe-specific functional tests pass.
|
||||
L5 — Lint: `abra recipe lint` passes against the exact ref under test.
|
||||
|
||||
Integration (SSO/OIDC + cross-app) and recipe-local (the recipe repo's own tests/) are **OPTIONAL**
|
||||
capabilities — they are NOT part of the level ladder and never cap it. They still run when present
|
||||
(and SSO is still enforced for the run VERDICT via the deps/SSO checks in run_recipe_ci.py), but a
|
||||
recipe without an SSO surface or without repo-local tests is simply not penalised on the level.
|
||||
Semantics (operator-decided 2026-06-11, recorded in DECISIONS.md — replaces the Phase-3
|
||||
"N/A caps" rule):
|
||||
|
||||
This module is PURE (no I/O) so it is cheaply unit-testable and the Adversary can re-run the unit
|
||||
test cold (`cc-ci-run -m pytest tests/unit/test_level.py -q`). The orchestrator
|
||||
(`run_recipe_ci.py`) is responsible for translating its raw per-tier results into the rung-status
|
||||
dict this function consumes; that mapping is documented in DECISIONS.md (Phase 3).
|
||||
level = max i such that rung_i == "pass" and every rung j < i is "pass" or "skip"; 0 if none.
|
||||
|
||||
Rung status vocabulary (each rung ∈ these three):
|
||||
"pass" — the rung was exercised and passed.
|
||||
"fail" — the rung was exercised and failed.
|
||||
"na" — the rung does not apply to this recipe (e.g. only one published version → no upgrade;
|
||||
not backup-capable). N/A is NOT a failure, but it DOES cap the climb (with a distinct
|
||||
cap_reason) so the level never overstates what was actually verified.
|
||||
A rung has one of FOUR statuses:
|
||||
"pass" — exercised and passed.
|
||||
"fail" — exercised and failed. Blocks: no rung above it can count.
|
||||
"skip" — INTENTIONAL skip: the rung genuinely does not apply to this recipe, from a
|
||||
declared or structural fact (not backup-capable; only one published version;
|
||||
declared in recipe_meta.EXPECTED_NA). Does NOT stop the climb.
|
||||
"unver" — UNINTENTIONAL not-verified: the rung SHOULD have run but didn't (infra error,
|
||||
missing tool, harness exception, prior-stage abort, timeout). Blocks exactly
|
||||
like a fail — the level never rises above a rung that wasn't actually checked.
|
||||
|
||||
The per-rung table (results.json `rungs`, card, dashboard) is the SOLE carrier of "why isn't
|
||||
this level higher" — there is no cap_reason. The classification of every N/A source into
|
||||
skip-vs-unver lives in derive_rungs (results.py) and is tabulated in DECISIONS.md; anything
|
||||
unclassifiable defaults to "unver" (conservative: never claim what wasn't checked).
|
||||
|
||||
Integration (SSO/OIDC + cross-app) and recipe-local (the recipe repo's own tests/) remain
|
||||
OPTIONAL capabilities — not rungs, never counted (SSO is still enforced for the run VERDICT
|
||||
via the deps/SSO checks in run_recipe_ci.py).
|
||||
|
||||
This module is PURE (no I/O) so it is cheaply unit-testable and the Adversary can re-run the
|
||||
unit test cold (`cc-ci-run -m pytest tests/unit/test_level.py -q`).
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
# The climbable rungs in ascending order. install (L1) is the foundation; L0 means install itself
|
||||
# did not pass. Each later rung requires every earlier rung to be a clean PASS. These four are the
|
||||
# ESSENTIAL rungs — integration/recipe-local are optional and deliberately NOT in this tuple.
|
||||
RUNGS = ("install", "upgrade", "backup_restore", "functional")
|
||||
# The climbable rungs in ascending order. install (L1) is the foundation; L0 means install
|
||||
# itself did not pass. These five are the ESSENTIAL rungs — integration/recipe-local are
|
||||
# optional and deliberately NOT in this tuple.
|
||||
RUNGS = ("install", "upgrade", "backup_restore", "functional", "lint")
|
||||
|
||||
# Human-readable label per rung level, for cap_reason + the summary card.
|
||||
# Human-readable label per rung level, for the summary card / docs.
|
||||
RUNG_LABEL = {
|
||||
1: "install (deploy + health)",
|
||||
2: "upgrade (prev published → PR)",
|
||||
3: "backup/restore (data integrity)",
|
||||
4: "functional (recipe-specific tests)",
|
||||
5: "lint (abra recipe lint)",
|
||||
}
|
||||
|
||||
VALID = {"pass", "fail", "na"}
|
||||
VALID = {"pass", "fail", "skip", "unver"}
|
||||
|
||||
|
||||
def compute_level(rungs: dict[str, str]) -> tuple[int, str]:
|
||||
"""Map a rung-status dict → (level 0..4, cap_reason).
|
||||
def compute_level(rungs: dict[str, str]) -> int:
|
||||
"""Map a rung-status dict → level 0..5.
|
||||
|
||||
`rungs` must contain a status in {"pass","fail","na"} for every name in RUNGS. The level is the
|
||||
highest L such that rungs[1..L] are all "pass"; the first non-"pass" rung caps the climb. L0 is
|
||||
returned when the install rung itself is not "pass" (install failed / never healthy).
|
||||
|
||||
cap_reason explains where the climb stopped:
|
||||
- "" (empty) when the recipe earned the top rung (L4, full clean climb).
|
||||
- "L<k> <label> FAILED" when a rung was exercised and failed.
|
||||
- "L<k> <label> N/A" when a rung does not apply to this recipe.
|
||||
Returns the reason for the FIRST rung that stopped the climb (the binding constraint).
|
||||
`rungs` must contain a status in VALID for every name in RUNGS. The level is the highest
|
||||
i such that rungs[i] == "pass" and every rung below i is "pass" or "skip" (an intentional
|
||||
skip does not stop the climb). A "fail" or "unver" rung blocks: rungs above it cannot
|
||||
count, however green. 0 when no rung qualifies.
|
||||
"""
|
||||
for name in RUNGS:
|
||||
st = rungs.get(name)
|
||||
@ -69,52 +69,44 @@ def compute_level(rungs: dict[str, str]) -> tuple[int, str]:
|
||||
raise ValueError(
|
||||
f"rung {name!r} has invalid status {st!r} (expect one of {sorted(VALID)})"
|
||||
)
|
||||
|
||||
# L0: install did not pass.
|
||||
if rungs["install"] != "pass":
|
||||
if rungs["install"] == "fail":
|
||||
return 0, "L1 " + RUNG_LABEL[1] + " FAILED"
|
||||
# install N/A is not a real-world state for a deploy run, but handle it for totality.
|
||||
return 0, "L1 " + RUNG_LABEL[1] + " N/A"
|
||||
|
||||
# Climb: stop at the first rung that is not a clean pass.
|
||||
level = 0
|
||||
for idx, name in enumerate(RUNGS, start=1):
|
||||
if rungs[name] == "pass":
|
||||
st = rungs[name]
|
||||
if st == "pass":
|
||||
level = idx
|
||||
elif st == "skip":
|
||||
continue
|
||||
# first non-pass rung — caps the climb
|
||||
kind = "FAILED" if rungs[name] == "fail" else "N/A"
|
||||
return level, f"L{idx} {RUNG_LABEL[idx]} {kind}"
|
||||
|
||||
# Full clean climb to the top rung.
|
||||
return level, ""
|
||||
else: # fail / unver — nothing above this rung can count
|
||||
break
|
||||
return level
|
||||
|
||||
|
||||
def backup_restore_status(backup: str | None, restore: str | None, backup_capable: bool) -> str:
|
||||
"""Collapse the backup + restore tier results into the single L3 rung status.
|
||||
|
||||
Both tiers must pass for the rung to pass (the rung is "seeded data survives backup→wipe→restore",
|
||||
which is only verified if BOTH the backup and the restore tier are green). If the recipe is not
|
||||
backup-capable, both tiers skip → the rung is N/A (caps at L2, recorded). A fail in either tier
|
||||
fails the rung.
|
||||
Not backup-capable (a declared/structural fact: no backupbot labels, or
|
||||
recipe_meta.BACKUP_CAPABLE=False) → "skip" — the rung genuinely does not apply.
|
||||
Otherwise both tiers must pass for the rung to pass; a fail in either tier fails it; any
|
||||
other shape (tier skipped or never ran while backup-capable — e.g. a prior-stage abort)
|
||||
is "unver": the rung should have been verified and wasn't.
|
||||
"""
|
||||
if not backup_capable:
|
||||
return "na"
|
||||
return "skip"
|
||||
vals = {backup, restore}
|
||||
if "fail" in vals:
|
||||
return "fail"
|
||||
if backup == "pass" and restore == "pass":
|
||||
return "pass"
|
||||
# any skip/None while backup-capable → not verified → treat as N/A (cannot claim L3)
|
||||
return "na"
|
||||
return "unver"
|
||||
|
||||
|
||||
def tier_to_rung(status: str | None) -> str:
|
||||
"""Map a single tier result ('pass'|'fail'|'skip'|None) to a rung status. 'skip'/None → 'na'
|
||||
(the tier did not apply / did not run), so it caps the climb without being counted as a failure."""
|
||||
"""Map a single tier result ('pass'|'fail'|'skip'|None) to a rung status, with NO
|
||||
intentionality information: a tier that did not produce a pass/fail is "unver" (it should
|
||||
have run and wasn't verified). The caller (derive_rungs) upgrades "unver" to "skip" where
|
||||
a declared/structural fact makes the skip intentional — never the other way around."""
|
||||
if status == "pass":
|
||||
return "pass"
|
||||
if status == "fail":
|
||||
return "fail"
|
||||
return "na"
|
||||
return "unver"
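The new level formula is easier to follow outside the interleaved old/new lines. The following is a minimal sketch reconstructed from the added lines of `compute_level` above, assuming the validation loop and the climb loop make up the whole function; it is not a copy of the committed file. The `ladder` helper is hypothetical and exists only to keep the examples short.

```python
RUNGS = ("install", "upgrade", "backup_restore", "functional", "lint")
VALID = {"pass", "fail", "skip", "unver"}


def compute_level(rungs: dict[str, str]) -> int:
    """Highest i with rung_i == "pass" and every lower rung "pass" or "skip"."""
    for name in RUNGS:
        st = rungs.get(name)
        if st not in VALID:
            raise ValueError(f"rung {name!r} has invalid status {st!r}")
    level = 0
    for idx, name in enumerate(RUNGS, start=1):
        st = rungs[name]
        if st == "pass":
            level = idx
        elif st == "skip":
            continue  # intentional skip: the climb continues past it
        else:
            break  # fail / unver: nothing above this rung can count
    return level


def ladder(*statuses: str) -> dict[str, str]:
    """Hypothetical helper (not in the source): zip statuses onto RUNGS."""
    return dict(zip(RUNGS, statuses))


full = compute_level(ladder("pass", "pass", "pass", "pass", "pass"))
skipped_upgrade = compute_level(ladder("pass", "skip", "pass", "pass", "pass"))
unverified_lint = compute_level(ladder("pass", "pass", "pass", "pass", "unver"))
blocked = compute_level(ladder("pass", "fail", "pass", "pass", "pass"))
trailing_skip = compute_level(ladder("pass", "pass", "skip", "skip", "fail"))
print(full, skipped_upgrade, unverified_lint, blocked, trailing_skip)  # 5 5 4 1 2
```

A skip never raises the level by itself: in the last case the two skipped rungs are passed over, but the level stays at the last rung that actually passed.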
|
||||
|
||||
@ -348,8 +348,27 @@ def services_converged(domain: str) -> bool:
|
||||
# `want == "0"` rejection wrongly treated those as never-converged, hanging the deploy
|
||||
# forever. `cur == want` (with `want` present) is the correct convergence test; a service
|
||||
# still spinning up shows e.g. "0/1" (cur != want) and is correctly not-yet-converged.
|
||||
if not want or cur != want:
|
||||
if not want:
|
||||
return False
|
||||
if cur != want:
|
||||
# A TRIGGERED one-shot (restart_policy none, scaled 0→1, runs once, exits 0) reports
|
||||
# "0/1" FOREVER after its task completes — swarm never restarts it, so a bare
|
||||
# `cur != want` rejection would block convergence for the rest of the run (lasuite-drive
|
||||
# minio-createbuckets, rcust M2: install assert burned the full DEPLOY_TIMEOUT after the
|
||||
# P2b port moved the bucket trigger BEFORE the install assert; pre-restructure the
|
||||
# trigger ran after it, so converge never saw the 0/1). A replica deficit explained
|
||||
# entirely by COMPLETE tasks IS converged: the one-shot did its job and will never run
|
||||
# again. Anything else in the deficit (Running/Starting/Pending = still spinning up;
|
||||
# Failed/Rejected = genuinely broken) stays not-converged, and a desired>0 service with
|
||||
# no tasks yet is still scheduling.
|
||||
tasks = subprocess.run(
|
||||
["docker", "service", "ps", name, "--format", "{{.CurrentState}}"],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
states = [ln.split()[0] for ln in tasks.stdout.split("\n") if ln.strip()]
|
||||
if not (states and all(s == "Complete" for s in states)):
|
||||
return False
|
||||
# N/N alone is NOT convergence during a stop-first rolling update: a chaos redeploy that changes
|
||||
# a non-app service image (e.g. immich's db pin) registers the update immediately, but swarm may
|
||||
# not have cycled that service's task yet — the OLD task still shows 1/1, then dies seconds later
|
||||
|
||||
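The one-shot rule in the hunk above (a replica deficit counts as converged only when every task is Complete) can be isolated as a pure predicate. This is a sketch under assumptions: the function name is hypothetical, and the sample strings only approximate what `docker service ps --format "{{.CurrentState}}"` prints.

```python
def deficit_is_completed_one_shot(ps_output: str) -> bool:
    """True when there is at least one task and every task state is Complete."""
    # The first word of each non-empty line is the task state.
    states = [ln.split()[0] for ln in ps_output.split("\n") if ln.strip()]
    return bool(states) and all(s == "Complete" for s in states)


# Illustrative inputs (assumed shapes, not captured from a real run).
finished = deficit_is_completed_one_shot("Complete 3 minutes ago\n")
still_starting = deficit_is_completed_one_shot("Running 2 seconds ago\n")
broken = deficit_is_completed_one_shot("Complete 1 minute ago\nFailed 5 seconds ago\n")
not_scheduled = deficit_is_completed_one_shot("")
print(finished, still_starting, broken, not_scheduled)  # True False False False
```

The empty case matters: a service with desired replicas but no tasks yet is still scheduling, so it must not be treated as a finished one-shot.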
174
runner/harness/lint.py
Normal file
@ -0,0 +1,174 @@
|
||||
"""L5 lint rung — run `abra recipe lint` against the exact ref under test (phase lvl5).

Executor + classifier for the fifth ladder rung. Design constraints (plan-phase-lvl5 §2):

- **Lints the recipe's CONTENT, not the harness plumbing.** abra lint reads every
  `compose*.yml` in the tree (including the CI's untracked install_steps overlays) and
  force-fetches tags from `origin` (which on PR runs is the private mirror, unauthenticated
  here → FATA). Both are harness artifacts, so the executor lints a PRISTINE scratch clone of
  the per-run tree, checked out at the exact tested ref: `origin` becomes a local path (tag
  fetch works offline, no auth) and the run's true tag set rides along (fetch_recipe pulls the
  upstream version tags into the per-run tree). No lint rule is filtered or ignored.
- **rc is not the verdict.** `abra recipe lint` exits non-zero only when it cannot lint
  (FATA); rule outcomes live in its table — error-severity ❌ rows print a trailing
  "WARN critical errors present …" sentinel but still exit 0. So the classifier parses the
  table: FAIL iff an error-severity rule is unsatisfied (or the FATA is content-attributable:
  "unable to validate recipe" — the recipe config itself is invalid). PASS iff the table
  rendered and no error rule failed. ANYTHING else — timeout, abra/script missing, tag-fetch
  FATA, unparseable output — is "unver": loud, never a silent pass, never an intentional skip.
- **Best-effort + time-bounded.** Hard ~60s timeout (observed runtime ≈0.7s); the caller
  wraps run_lint in try/except besides — a wedged lint can never hang or fail a run, and the
  run VERDICT is untouched by any lint outcome (lint is a level rung, not a gate).
- Full command output (+ cmd, rc, ref header) is captured to `lint.txt` in the run artifact
  dir; results.json carries status + short excerpt (failing rule ids).

abra needs a PTY even with -n ("inappropriate ioctl on device") → run via util-linux
`script -qec`, same trick as harness.abra._run_pty.
"""
|
||||
from __future__ import annotations

import os
import re
import shlex
import shutil
import subprocess
import tempfile

from . import abra

LINT_TIMEOUT = 60  # hard budget, seconds; observed ~0.7s per recipe

# Strip ANSI escape sequences from PTY output before parsing.
_ANSI = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")

# A table row: ┃ R014 ┃ description ┃ error ┃ ✅/❌ ┃ skipped ┃ how-to-fix ┃ — abra renders the
# grid with HEAVY box-drawing verticals (┃ U+2503); accept the light variant (│ U+2502) too.
_ROW = re.compile(
    r"^\s*[│┃]\s*(R\d+)\s*[│┃](.*?)[│┃]\s*(warn|error)\s*[│┃]\s*(✅|❌)\s*[│┃]\s*([^│┃]*)[│┃]"
)

# abra's trailing sentinel when any error-severity rule is unsatisfied (cross-check only).
_SENTINEL = "critical errors present"

# FATA classes that are the RECIPE's fault (its config cannot even be validated) — a lint
# FAIL, not an unverified rung. Everything else non-zero is environmental → unver.
_CONTENT_FATA = "unable to validate recipe"
|
||||
|
||||
def parse_table(output: str) -> list[dict]:
    """Parse the lint table → rows {rule, desc, severity, satisfied(bool), skipped(bool)}.
    Tolerant: lines that don't match are ignored; returns [] when no table rendered."""
    rows = []
    for line in _ANSI.sub("", output).replace("\r", "\n").splitlines():
        m = _ROW.match(line)
        if not m:
            continue
        rule, desc, severity, mark, skipped = m.groups()
        rows.append(
            {
                "rule": rule,
                "desc": desc.strip(),
                "severity": severity,
                "satisfied": mark == "✅",
                "skipped": skipped.strip() not in ("", "-"),
            }
        )
    return rows
|
||||
|
||||
def classify(rc: int | None, output: str) -> tuple[str, str, list[str]]:
    """(status, detail, failed_rule_ids) from a finished lint invocation.

    status ∈ {"pass","fail","unver"}; never a silent pass: pass requires a parsed table with
    zero unsatisfied error-severity rules AND no sentinel. `rc=None` means the run itself blew
    up (timeout/missing binary) — always unver; the caller supplies the detail.
    """
    if rc is None:
        return "unver", "lint did not run", []
    if rc != 0:
        first = next((ln for ln in _ANSI.sub("", output).splitlines() if "FATA" in ln), "").strip()
        if _CONTENT_FATA in output:
            # The recipe config itself failed validation — attributable to recipe content.
            return "fail", first or "recipe config failed validation", []
        return "unver", first or f"abra recipe lint exited {rc} with no table", []
    rows = parse_table(output)
    if not rows:
        return "unver", "no lint table in output (rc=0)", []
    failed = [
        r["rule"]
        for r in rows
        if r["severity"] == "error" and not r["satisfied"] and not r["skipped"]
    ]
    if failed:
        return "fail", f"error rule(s) unsatisfied: {', '.join(failed)}", failed
    if _SENTINEL in output:
        # abra says critical errors but our parse found none — distrust the parse, never inflate.
        return "fail", "abra reported critical errors (table parse found none)", []
    return "pass", "", []
|
||||
|
||||
def run_lint(recipe: str, ref: str | None, out_dir: str | None) -> dict:
    """Execute the lint rung for `recipe` at exactly `ref` (a sha; None → the per-run tree's
    current HEAD). Returns {"status","detail","rules_failed"} and writes lint.txt into
    `out_dir` (when given). Never raises: every failure mode is caught into status "unver"."""
    scratch = None
    rc: int | None = None
    output = ""
    try:
        src_tree = abra.recipe_dir(recipe)
        scratch = tempfile.mkdtemp(prefix="ccci-lint-")
        lint_abra = os.path.join(scratch, "abra")
        os.makedirs(os.path.join(lint_abra, "recipes"))
        clone = os.path.join(lint_abra, "recipes", recipe)
        subprocess.run(
            ["git", "clone", "--quiet", src_tree, clone],
            check=True,
            capture_output=True,
            text=True,
            timeout=LINT_TIMEOUT,
        )
        if ref:
            subprocess.run(
                ["git", "-C", clone, "checkout", "-f", "--quiet", ref],
                check=True,
                capture_output=True,
                text=True,
                timeout=LINT_TIMEOUT,
            )
        # catalogue: R006 (published catalogue version) reads it; servers: harmless, some abra
        # paths stat it. Symlink the live ones (read-only use).
        for shared in ("catalogue", "servers"):
            src = os.path.join(abra.abra_dir(), shared)
            if os.path.exists(src):
                os.symlink(os.path.realpath(src), os.path.join(lint_abra, shared))
        env = dict(os.environ, ABRA_DIR=lint_abra)
        proc = subprocess.run(
            ["script", "-qec", f"abra recipe lint -n {shlex.quote(recipe)}", "/dev/null"],
            capture_output=True,
            text=True,
            timeout=LINT_TIMEOUT,
            env=env,
        )
        rc, output = proc.returncode, proc.stdout + proc.stderr
        status, detail, failed = classify(rc, output)
    except subprocess.TimeoutExpired:
        status, detail, failed = "unver", f"lint timed out after {LINT_TIMEOUT}s", []
    except Exception as e:  # noqa: BLE001 — rung must never break the run; unver is the honest floor
        status, detail, failed = "unver", f"lint executor error: {e.__class__.__name__}: {e}", []
    finally:
        if scratch:
            shutil.rmtree(scratch, ignore_errors=True)
    if status == "unver":
        print(f"!! lint rung UNVERIFIED for {recipe}: {detail}", flush=True)
    if out_dir:
        try:
            os.makedirs(out_dir, exist_ok=True)
            with open(os.path.join(out_dir, "lint.txt"), "w", encoding="utf-8") as f:
                f.write(
                    f"$ abra recipe lint -n {recipe} (ref={ref or 'HEAD'})\n"
                    f"rc={rc} status={status} {detail}\n\n{output}"
                )
        except OSError as e:
            print(f" lint: could not write lint.txt (non-fatal): {e}", flush=True)
    return {"status": status, "detail": detail, "rules_failed": failed}
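Because the exit code is not the verdict, the table parse carries the decision. The sketch below reuses the row pattern from `lint.py` on a made-up table to show which rows count as failures. The rule ids, descriptions, and fix hints in `SAMPLE` are invented for illustration, and `failed_error_rules` is a hypothetical condensed helper that leaves out the exit-code and sentinel checks `classify` performs.

```python
import re

# Same row pattern as in lint.py above.
_ROW = re.compile(
    r"^\s*[│┃]\s*(R\d+)\s*[│┃](.*?)[│┃]\s*(warn|error)\s*[│┃]\s*(✅|❌)\s*[│┃]\s*([^│┃]*)[│┃]"
)

# Made-up sample rows: not real abra output.
SAMPLE = "\n".join(
    [
        "┃ R001 ┃ example rule one   ┃ warn  ┃ ❌ ┃ - ┃ fix hint ┃",
        "┃ R002 ┃ example rule two   ┃ error ┃ ✅ ┃ - ┃ fix hint ┃",
        "┃ R003 ┃ example rule three ┃ error ┃ ❌ ┃ - ┃ fix hint ┃",
    ]
)


def failed_error_rules(output: str) -> list[str]:
    """Rule ids of unsatisfied, non-skipped, error-severity rows."""
    failed = []
    for line in output.splitlines():
        m = _ROW.match(line)
        if not m:
            continue  # tolerate borders, headers, and log lines
        rule, _desc, severity, mark, skipped = m.groups()
        is_skipped = skipped.strip() not in ("", "-")
        if severity == "error" and mark == "❌" and not is_skipped:
            failed.append(rule)
    return failed


failed = failed_error_rules(SAMPLE)
status = "fail" if failed else "pass"
print(status, failed)  # fail ['R003']
```

The unsatisfied `warn` row (R001) does not fail the rung; only error-severity rows do.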
|
||||
@ -70,13 +70,13 @@ KEYS: tuple[Key, ...] = (
|
||||
"BACKUP_CAPABLE",
|
||||
"bool",
|
||||
None,
|
||||
"Override the backup-tier capability auto-detect (compose `backupbot.backup` labels). `False` forces N/A; `True` forces the tier on; unset = auto-detect.",
|
||||
"Override the backup-tier capability auto-detect (compose `backupbot.backup` labels). `False` forces an intentional skip of the backup/restore rung; `True` forces the tier on; unset = auto-detect.",
|
||||
),
|
||||
Key(
|
||||
"EXPECTED_NA",
|
||||
"dict",
|
||||
None,
|
||||
"Declare an N/A rung intentional: `{rung: reason}`. The cap stands either way; only the report wording changes.",
|
||||
"Declare a non-run rung an INTENTIONAL skip: `{rung: reason}` — the level climbs past it; an undeclared non-run rung is *unverified* and blocks the level above it (classification table: machine-docs/DECISIONS.md phase lvl5). Never overrides an exercised pass/fail; the `lint` rung has no escape hatch.",
|
||||
),
|
||||
Key(
|
||||
"READY_PROBE",
|
||||
|
||||
@ -1,20 +1,22 @@
|
||||
"""Phase 3 — structured run results + results.json (plan-phase3-results-ux.md §4.2, R1/R3).
|
||||
"""Structured run results + results.json (Phase 3 §4.2 R1/R3; level semantics: phase lvl5).
 
-Turns a run's per-tier pytest outcomes into a single `results.json` artifact carrying, per the plan:
+Turns a run's per-tier pytest outcomes into a single `results.json` artifact carrying:
   { recipe, version, pr, ref, run_id, finished, stages:[{name,status,tests:[{name,status,ms}]}],
-    level, level_cap_reason, level_cap_rung, rungs,
+    level, rungs, lint:{status,detail,rules_failed},
     skips:{intentional:{rung:reason}, unintentional:[rung]},
     flags:{clean_teardown,no_secret_leak}, screenshot, summary_card }
 
-`skips` splits the N/A (skipped) rungs by a simple rule: a skip is INTENTIONAL iff the recipe lists
-it (with a reason) in `recipe_meta.EXPECTED_NA = {rung: reason}`; any rung skipped but not listed is
-UNINTENTIONAL (a coverage gap to fill or declare). Skips still cap the level either way — the harness
-never claims a rung it did not verify; this only labels *why* a skip happened.
+Rung statuses (phase lvl5, operator-decided — see harness.level + DECISIONS.md): every rung is
+"pass" | "fail" | "skip" (INTENTIONAL — a declared/structural fact says the rung does not apply)
+| "unver" (UNINTENTIONAL — the rung should have run and wasn't verified; blocks the level like a
+fail). `derive_rungs` is the single place every N/A source is classified; anything it cannot
+attribute to a declared/structural fact defaults to "unver" (conservative). `skips` mirrors that
+split into results.json: intentional {rung: reason} / unintentional [rung] (= the unver rungs).
 
 The per-test breakdown comes from JUnit XML emitted by each tier's pytest invocation (`--junitxml`),
 parsed here with the stdlib (no new dep). The integer **level** is computed by harness.level from a
-rung-status dict derived here (`derive_rungs`) from the tier results + deps/SSO signals the
-orchestrator holds; that mapping is documented in DECISIONS.md (Phase 3).
+rung-status dict derived here (`derive_rungs`) from the tier results + structural signals the
+orchestrator holds; the classification table is in DECISIONS.md (phase lvl5).
 
 This module is import-pure (no side effects at import). `write_results` is the only writer; the
 orchestrator calls the build/write path inside a try/except so a results failure NEVER changes the
@@ -138,53 +140,90 @@ def derive_rungs(
     results: dict[str, str],
     *,
     backup_capable: bool,
-    has_custom: bool,
+    has_upgrade_target: bool,
+    expected_na: dict | None = None,
+    lint_status: str | None = None,
 ) -> dict[str, str]:
-    """Translate the orchestrator's tier results into the rung-status dict harness.level consumes —
-    the FOUR essential rungs only. Conservative by design — never reports a rung 'pass' it can't
-    substantiate (cardinal guardrail: presentation never inflates).
+    """Translate the orchestrator's tier results + structural signals into the rung-status dict
+    harness.level consumes — the FIVE essential rungs. This is the SINGLE place every N/A source
+    is classified intentional ("skip") vs unintentional ("unver"); the table lives in DECISIONS.md
+    (phase lvl5). Conservative by design: never reports "pass" it can't substantiate, and any
+    rung that did not produce a pass/fail and has NO declared/structural reason is "unver".
 
-      L1 install    : install tier pass.
-      L2 upgrade    : upgrade tier (skip → N/A: only one published version).
-      L3 backup/res : backup AND restore tiers pass (N/A if not backup-capable).
-      L4 functional : recipe-specific functional tests pass — the custom tier. N/A if none ran.
+      L1 install    : install tier pass. Always applies — never "skip" (non-run → unver).
+      L2 upgrade    : upgrade tier. Tier skipped + no upgrade target (only one published
+                      version, structural) → "skip"; declared in EXPECTED_NA → "skip";
+                      anything else non-pass/fail (prior-stage abort, tier excluded) → "unver".
+      L3 backup/res : backup AND restore tiers pass. Not backup-capable (declared/structural)
+                      → "skip"; EXPECTED_NA → "skip"; unverified-while-capable → "unver".
+      L4 functional : the custom tier. No custom tests / tier skipped → EXPECTED_NA-declared
+                      "skip", else "unver" (absent functional coverage is a gap, not an
+                      intentional property of the recipe).
+      L5 lint       : from the lint executor (harness.lint). pass/fail only — every recipe can
+                      be linted, so there is NO intentional-skip escape hatch: a lint that
+                      could not run (timeout, abra missing, executor error) is "unver".
 
     Integration (SSO/OIDC) and recipe-local are OPTIONAL and intentionally NOT rungs here — they
-    never cap the level (SSO is still enforced for the run VERDICT in run_recipe_ci.py).
+    never affect the level (SSO is still enforced for the run VERDICT in run_recipe_ci.py).
     """
+    expected = set((expected_na or {}).keys())
     rungs: dict[str, str] = {}
     rungs["install"] = level_mod.tier_to_rung(results.get("install"))
-    rungs["upgrade"] = level_mod.tier_to_rung(results.get("upgrade"))
-    rungs["backup_restore"] = level_mod.backup_restore_status(
+
+    up = results.get("upgrade")
+    if up in ("pass", "fail"):
+        rungs["upgrade"] = up
+    elif up == "skip" and not has_upgrade_target:
+        # The orchestrator skipped the tier for the structural reason: nothing to upgrade from.
+        rungs["upgrade"] = "skip"
+    elif "upgrade" in expected:
+        rungs["upgrade"] = "skip"
+    else:
+        rungs["upgrade"] = "unver"
+
+    br = level_mod.backup_restore_status(
         results.get("backup"), results.get("restore"), backup_capable
     )
+    if br == "unver" and "backup_restore" in expected:
+        br = "skip"
+    rungs["backup_restore"] = br
+
     custom = results.get("custom")
-    if not has_custom or custom == "skip" or custom is None:
-        rungs["functional"] = "na"
-    elif custom == "fail":
-        rungs["functional"] = "fail"
-    else:  # custom == "pass"
-        rungs["functional"] = "pass"
+    if custom in ("pass", "fail"):
+        rungs["functional"] = custom
+    elif "functional" in expected:
+        rungs["functional"] = "skip"
+    else:
+        rungs["functional"] = "unver"
 
+    rungs["lint"] = lint_status if lint_status in ("pass", "fail") else "unver"
     return rungs
 
 
-def skips(rungs: dict[str, str], expected_na: dict | None) -> dict:
-    """Split the SKIPPED (N/A) rungs into intentional vs unintentional (operator model).
+# Reasons attached to STRUCTURAL intentional skips (no EXPECTED_NA declaration needed — the
+# fact is read off the recipe itself).
+_STRUCTURAL_REASON = {
+    "upgrade": "only one published version — no upgrade target",
+    "backup_restore": "not backup-capable (no backupbot labels / declared)",
+}
 
-    A recipe lists the rungs it intentionally skips, each with a reason, in
-    `recipe_meta.EXPECTED_NA = {rung: reason}`. The rule is dead simple: a skipped rung is
-    **intentional** iff it is in that list; any rung that is skipped and NOT in the list is
-    **unintentional** (a coverage gap someone should either fill or declare). N/A still caps the
-    level either way — the harness never claims a rung it did not verify — this only labels *why* a
-    skip happened. Returns:
-      { "intentional": {rung: reason, ...},   # skipped AND declared in EXPECTED_NA
-        "unintentional": [rung, ...] }        # skipped but NOT declared
-    """
+
+def skips(
+    rungs: dict[str, str],
+    expected_na: dict | None,
+) -> dict:
+    """Mirror the rung classification into results.json's `skips` block:
+      { "intentional": {rung: reason, ...},   # status "skip" — declared/structural, with why
+        "unintentional": [rung, ...] }        # status "unver" — should have run, wasn't verified
+    The reason is the recipe's EXPECTED_NA declaration when present, else the structural fact
+    derive_rungs skipped on. Purely descriptive — the level math lives in harness.level."""
     expected = {str(k): str(v) for k, v in (expected_na or {}).items()}
-    na = [r for r, st in rungs.items() if st == "na"]
-    intentional = {r: expected[r] for r in na if r in expected}
-    unintentional = sorted(r for r in na if r not in expected)
+    intentional = {
+        r: expected.get(r) or _STRUCTURAL_REASON.get(r, "declared intentional")
+        for r, st in rungs.items()
+        if st == "skip"
+    }
+    unintentional = sorted(r for r, st in rungs.items() if st == "unver")
     return {"intentional": intentional, "unintentional": unintentional}
 
 
@@ -200,6 +239,8 @@ def build_results(
     clean_teardown: bool,
     no_secret_leak: bool,
     finished_ts: float | None,
+    has_upgrade_target: bool = True,
+    lint: dict | None = None,
     screenshot: str | None = None,
     summary_card: str | None = None,
     expected_na: dict | None = None,
@@ -207,17 +248,41 @@ def build_results(
 ) -> dict:
     """Assemble the full results.json dict (no I/O). `finished_ts` is passed in (the orchestrator
     stamps it) so this stays pure and deterministic for unit tests. `expected_na` is the recipe's
-    declared intentional-skip map (recipe_meta.EXPECTED_NA) used to distinguish a deliberate skip from
-    accidentally-missing coverage."""
+    declared intentional-skip map (recipe_meta.EXPECTED_NA); `has_upgrade_target` is the structural
+    "a previous published version exists" fact; `lint` is harness.lint.run_lint's result dict
+    (None — e.g. an old caller — derives the lint rung as "unver": never a silent pass)."""
     stages = collect_stages(records)
-    has_custom = any(r["tier"] == "custom" for r in records)
-    rungs = derive_rungs(results, backup_capable=backup_capable, has_custom=has_custom)
-    lvl, cap_reason = level_mod.compute_level(rungs)
-    # The rung that capped the climb (lowest non-pass), or None on a full climb — lets a consumer
-    # (card/badge) tell whether the cap was an intentional skip, an unintentional one, or a failure.
-    capped = level_mod.RUNGS[lvl] if cap_reason else None
+    lint = lint or {}
+    lint_status = lint.get("status")
+    rungs = derive_rungs(
+        results,
+        backup_capable=backup_capable,
+        has_upgrade_target=has_upgrade_target,
+        expected_na=expected_na,
+        lint_status=lint_status,
+    )
+    # Surface lint in the per-stage table too (it has no pytest/JUnit tier), so the card's
+    # stage breakdown carries all five rungs.
+    if rungs["lint"] != "skip":  # lint is never "skip", but stay defensive
+        stages.append(
+            {
+                "name": "lint",
+                "status": rungs["lint"],
+                "tests": [
+                    {
+                        "name": "abra recipe lint",
+                        "classname": "lint",
+                        "source": "harness",
+                        "status": rungs["lint"],
+                        "ms": 0,
+                        "message": str(lint.get("detail") or ""),
+                    }
+                ],
+            }
+        )
+    lvl = level_mod.compute_level(rungs)
     return {
-        "schema": 1,
+        "schema": 2,
         "run_id": run_id(),
         "recipe": recipe,
         "version": version,
@@ -225,9 +290,12 @@ def build_results(
         "ref": (ref or "")[:12],
         "finished": finished_ts,
         "level": lvl,
-        "level_cap_reason": cap_reason,
-        "level_cap_rung": capped,
         "rungs": rungs,
+        "lint": {
+            "status": rungs["lint"],
+            "detail": str(lint.get("detail") or ""),
+            "rules_failed": list(lint.get("rules_failed") or []),
+        },
         "skips": skips(rungs, expected_na),
         "stages": stages,
         "results": results,
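The ladder rule described in the docstrings of this file (a rung counts only when it passes, an intentional "skip" neither counts nor blocks, and a "fail" or "unver" blocks everything above it) can be sketched in a few lines. This is a minimal illustration, not the project's `harness.level` code, which is not part of this diff: the `RUNGS` order and the body of `compute_level` below are assumptions inferred from the L1..L5 table and the backlog formula (level = max i: rung_i pass and every lower rung is pass or skip).

```python
# Hypothetical stand-in for harness.level (the real module is not shown in this diff).
RUNGS = ["install", "upgrade", "backup_restore", "functional", "lint"]  # L1..L5, assumed order


def compute_level(rungs: dict[str, str]) -> int:
    """Highest rung index i whose status is "pass" while every lower rung is "pass" or "skip"."""
    level = 0
    for i, name in enumerate(RUNGS, start=1):
        status = rungs.get(name, "unver")  # a missing rung is treated as unverified
        if status == "pass":
            level = i  # this rung counts, keep climbing
        elif status == "skip":
            continue  # intentional skip: does not count, does not block
        else:
            break  # "fail" or "unver" blocks everything above it
    return level


example = {
    "install": "pass",
    "upgrade": "skip",
    "backup_restore": "pass",
    "functional": "unver",
    "lint": "pass",
}
print(compute_level(example))
```

In the example the unverified functional rung stops the climb, so the passing lint rung above it does not raise the level past backup/restore.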
@@ -18,6 +18,7 @@ missing, app slow, navigation error) is swallowed and returns None so the run/ve
 
 from __future__ import annotations
 
+import contextlib
 import os
 
 from . import browser as harness_browser
@@ -28,6 +29,73 @@ VIEWPORT = {"width": 1280, "height": 800}
 # Hard cap so a wedged app can never hang the run on the screenshot step (R7 / Phase-1 timeouts).
 NAV_DEADLINE_S = 45
 
+# ---- post-navigation settle (phase-shot fix, 2026-06-11) ----
+# SPAs (immich, n8n, cryptpad, the keycloak admin console, lasuite-*, mumble-web, mattermost) fire
+# `domcontentloaded` on their empty HTML shell and only paint after the JS bundle loads — snapping
+# immediately produced solid blank frames (byte-stable 4801-2 B) or loading spinners. After nav,
+# wait for network-idle up to SETTLE_TIMEOUT_MS (apps that never go idle — continuous polling —
+# simply spend the cap; bounded, never raises), then RENDER_GRACE_MS for the final paint.
+SETTLE_TIMEOUT_MS = 10_000
+RENDER_GRACE_MS = 500
+# A 1280x800 PNG below this is near-certainly a solid frame or a bare loading spinner (phase-shot
+# audit: blank frames were 4801-2 B across three different apps, lone spinners 5.9-8.8 KB; the
+# smallest real page was 12950 B). One bounded retry with an extra settle, then keep what we get —
+# an honest late frame beats none, and the retry only ever replaces a tiny frame with a later one.
+BLANK_SIZE_BYTES = 10_000
+BLANK_RETRY_SETTLE_MS = 4_000
+# Wait-budget arithmetic (plan-phase-shot §3 P3: step worst case ≤ ~60s): NAV_DEADLINE_S (45s,
+# spent only while the app isn't serving yet) + SETTLE_TIMEOUT_MS + RENDER_GRACE_MS +
+# BLANK_RETRY_SETTLE_MS + RENDER_GRACE_MS = 60s of bounded waiting; tested in unit tests.
+
+
+def _settle(page, idle_timeout_ms: int) -> None:
+    """Best-effort bounded settle: network-idle up to the cap, then a short render grace.
+    Never raises (R7) — a timeout just means the page kept polling; we snap what's painted."""
+    # cosmetic path (R7): a timeout on a never-idle app is expected — the cap IS the wait
+    with contextlib.suppress(Exception):
+        page.wait_for_load_state("networkidle", timeout=idle_timeout_ms)
+    with contextlib.suppress(Exception):
+        page.wait_for_timeout(RENDER_GRACE_MS)
+
+
+def settle(page, idle_timeout_ms: int = SETTLE_TIMEOUT_MS) -> None:
+    """Public settle for recipe SCREENSHOT hooks: after the hook navigates to its safe view, call
+    this so the snap happens post-paint. Same bounded best-effort contract as the default path."""
+    _settle(page, idle_timeout_ms)
+
+
+def _snap_with_blank_retry(page, out_path: str) -> None:
+    """Screenshot the page; if the PNG is blank/spinner-sized, retry ONCE after a longer settle.
+    The retry is snapped to a temp path and kept only if it is >= the first frame's size — later
+    is usually more painted, but a page can also regress (redirect, error overlay) and a worse
+    frame must never overwrite a better one (adversary finding A1)."""
+    page.screenshot(path=out_path, full_page=False)
+    try:
+        first = os.path.getsize(out_path)
+    except OSError:
+        return
+    if first >= BLANK_SIZE_BYTES:
+        return
+    print(
+        f" screenshot: frame looks blank/loading ({first} B < {BLANK_SIZE_BYTES} B) — "
+        "one retry after a longer settle",
+        flush=True,
+    )
+    _settle(page, BLANK_RETRY_SETTLE_MS)
+    retry_path = out_path + ".retry"
+    try:
+        page.screenshot(path=retry_path, full_page=False)
+        retry = os.path.getsize(retry_path)
+        if retry >= first:
+            os.replace(retry_path, out_path)
+            print(f" screenshot: retry frame kept ({retry} B >= {first} B)", flush=True)
+        else:
+            os.remove(retry_path)
+            print(f" screenshot: retry frame discarded ({retry} B < {first} B)", flush=True)
+    finally:
+        with contextlib.suppress(OSError):
+            os.remove(retry_path)
+
 
 def screenshot_path(run_artifact_dir: str) -> str:
     """Canonical on-disk path for a run's app screenshot (pure)."""
@@ -79,7 +147,7 @@ def capture(domain: str, out_path: str, *, recipe_meta: dict | None = None) -> s
                 # the uniform ctx convention (rcust P3).
                 hook(page, meta_mod.hook_ctx(domain, recipe_meta))
                 if not os.path.exists(out_path):
-                    page.screenshot(path=out_path, full_page=False)
+                    _snap_with_blank_retry(page, out_path)
             else:
                 # Default: landing page. Accept any rendered status (200 or an auth redirect to a
                 # login form) — both are credential-free and representative of "the app is up".
@@ -90,7 +158,9 @@ def capture(domain: str, out_path: str, *, recipe_meta: dict | None = None) -> s
                     deadline_seconds=NAV_DEADLINE_S,
                     wait_until="domcontentloaded",
                 )
-                page.screenshot(path=out_path, full_page=False)
+                # SPA paint race fix (phase-shot): settle before snapping, retry a blank frame.
+                _settle(page, SETTLE_TIMEOUT_MS)
+                _snap_with_blank_retry(page, out_path)
         finally:
             browser.close()
     if os.path.exists(out_path) and os.path.getsize(out_path) > 0:
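The wait-budget comment in this file sums five bounded waits to 60s. A minimal sketch of that arithmetic follows, using the constant values from the diff; `worst_case_wait_s` is a hypothetical helper name, and the figure covers only the bounded waits, not the time the screenshot calls themselves take.

```python
# Constant values copied from the diff; the helper below is illustrative only.
NAV_DEADLINE_S = 45
SETTLE_TIMEOUT_MS = 10_000
RENDER_GRACE_MS = 500
BLANK_RETRY_SETTLE_MS = 4_000


def worst_case_wait_s() -> float:
    """Upper bound on bounded waiting for the default capture path, in seconds."""
    first_settle_ms = SETTLE_TIMEOUT_MS + RENDER_GRACE_MS  # network-idle cap + render grace
    retry_settle_ms = BLANK_RETRY_SETTLE_MS + RENDER_GRACE_MS  # one blank-frame retry
    return NAV_DEADLINE_S + (first_settle_ms + retry_settle_ms) / 1000


print(worst_case_wait_s())  # 60.0
```

The navigation deadline dominates the budget, and per the comment it is only spent while the app is not serving yet.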
@@ -58,6 +58,9 @@ from harness import ( # noqa: E402
 from harness import (  # noqa: E402
     deps as deps_mod,
 )
+from harness import (  # noqa: E402
+    lint as lint_mod,
+)
 from harness import (  # noqa: E402
     manifest as manifest_mod,
 )
@@ -928,6 +931,24 @@ def main() -> int:
     run_artifact_dir = os.path.join(results_mod.runs_dir(), results_mod.run_id())
     junit_dir = os.path.join(run_artifact_dir, "junit")
     records: list[dict] = []
+
+    # L5 lint rung (phase lvl5): `abra recipe lint` against the EXACT tested ref, in a pristine
+    # scratch clone (harness.lint — the per-run tree is still at head_ref here, before any
+    # version-pinning checkout). Level rung only — NEVER the verdict: run_lint catches every
+    # failure mode into status "unver" (60s hard budget) and this belt-and-braces wrap makes a
+    # crashed executor identical to "could not verify".
+    lint_result = {"status": "unver", "detail": "lint executor crashed", "rules_failed": []}
+    try:
+        lint_result = lint_mod.run_lint(recipe, head_ref, run_artifact_dir)
+    except Exception as e:  # noqa: BLE001 — lint is a rung, not a gate; never touches the verdict
+        print(
+            f"!! lint rung executor crashed (non-fatal, rung=unver): {_scrub(str(e))}", flush=True
+        )
+    print(
+        f"lint rung: {lint_result['status']}"
+        f"{' — ' + lint_result['detail'] if lint_result.get('detail') else ''}",
+        flush=True,
+    )
     with contextlib.suppress(OSError):
         os.makedirs(junit_dir, exist_ok=True)
 
@@ -1253,6 +1274,8 @@ def main() -> int:
             records=records,
             results=results,
             backup_capable=backup_cap,
+            has_upgrade_target=prev is not None,  # structural: a previous published version exists
+            lint=lint_result,  # L5 rung (phase lvl5)
             clean_teardown=clean_teardown,
             no_secret_leak=True,  # narrowed below by an actual scan of the serialised artifact
             screenshot=screenshot_rel,  # Phase 3 U1 (R4): relative PNG name iff capture succeeded
@@ -1270,17 +1293,15 @@ def main() -> int:
                 file=sys.stderr,
             )
         path = results_mod.write_results(data)
-        print(
-            f"results.json written: {path} (level={data['level']}"
-            f"{' — ' + data['level_cap_reason'] if data['level_cap_reason'] else ''})",
-            flush=True,
-        )
-        # Surface UNINTENTIONAL skips in the CI log (non-blocking, R7): a rung that was skipped (N/A)
-        # but is not in the recipe's intentional list — either add the missing coverage or declare it.
+        print(f"results.json written: {path} (level={data['level']} of 5)", flush=True)
+        # Surface UNVERIFIED rungs in the CI log (non-blocking, R7): a rung that should have run
+        # and wasn't verified blocks the level above it — fill the coverage, or (where a
+        # declared/structural reason genuinely applies) declare it in EXPECTED_NA.
         for rung in data.get("skips", {}).get("unintentional", []):
             print(
-                f"⚠ coverage: rung '{rung}' was skipped (N/A) but is not declared intentional — add "
-                f"the missing test/label, or list it in tests/{recipe}/recipe_meta.py "
+                f"⚠ coverage: rung '{rung}' is UNVERIFIED (did not run / could not be checked) — "
+                f"the level cannot rise above it. Add the missing test/coverage, or declare a "
+                f"genuine inapplicability in tests/{recipe}/recipe_meta.py "
                 f"EXPECTED_NA = {{'{rung}': '<why>'}}.",
                 flush=True,
             )
@@ -1302,21 +1323,10 @@ def main() -> int:
         with open(html_path, "w", encoding="utf-8") as f:
             f.write(card_mod.render_card_html(data, screenshot_rel=data.get("screenshot")))
         png = card_mod.render_card_png(html_path, os.path.join(run_artifact_dir, "summary.png"))
-        capped = data.get("level_cap_rung")
-        sk = data.get("skips", {})
-        cap_skip = (
-            "intentional"
-            if capped in (sk.get("intentional") or {})
-            else "unintentional"
-            if capped in (sk.get("unintentional") or [])
-            else ""
-        )
+        # Badge = level only (number + colour) — the per-rung table on the card is the sole
+        # carrier of "why isn't this higher" (operator-specified, phase lvl5).
         with open(os.path.join(run_artifact_dir, "badge.svg"), "w", encoding="utf-8") as f:
-            f.write(
-                card_mod.level_badge_svg(
-                    data["level"], data.get("level_cap_reason", ""), cap_skip
-                )
-            )
+            f.write(card_mod.level_badge_svg(data["level"]))
         print(
             f"summary card {'rendered ' + png if png else '(PNG render unavailable)'} + "
             f"badge.svg written into {run_artifact_dir}",
@@ -19,7 +19,12 @@ def pre_install(ctx):
     NOT create the MinIO bucket: `minio-createbuckets` is a `replicas:0` one-shot (restart_policy:
     none) that must be triggered. The MinIO storage test asserts the bucket exists, so trigger it
     here and poll. `--detach` is REQUIRED: the job creates the bucket then EXITS 0, so it never
-    holds a steady 1/1 replica — a blocking scale would wait forever."""
+    holds a steady 1/1 replica — a blocking scale would wait forever.
+
+    BEST-EFFORT, like the setup_custom_tests.sh it replaced: on poll timeout we WARN and continue
+    (the one-shot often lands just after the window). The custom-tier MinIO storage test is the
+    real gate for a genuinely missing bucket — failing the install op here was an rcust M2
+    regression (the original hook fell through on timeout by design)."""
     stack = ctx.domain.replace(".", "_")
     print(" pre_install: creating MinIO bucket via the minio-createbuckets one-shot", flush=True)
     subprocess.run(
@@ -51,7 +56,12 @@ def pre_install(ctx):
             )
             return
         time.sleep(3)
-    raise AssertionError("minio-createbuckets one-shot did not create drive-media-storage in 90s")
+    print(
+        " !! pre_install: minio-createbuckets one-shot did not create drive-media-storage in 90s "
+        "— continuing (best-effort, as the pre-restructure hook did); the custom-tier MinIO test "
+        "gates a genuinely missing bucket",
+        flush=True,
+    )
 
 
 def _wait_collabora_ready(domain, timeout=420):
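The change above turns a hard failure on poll timeout into a warning, leaving a later test as the real gate. A generic sketch of that best-effort polling shape is shown below; `poll_best_effort` and its parameters are hypothetical names for illustration and do not appear in the hook.

```python
import time
from typing import Callable


def poll_best_effort(
    check: Callable[[], bool], timeout_s: float, interval_s: float, what: str
) -> bool:
    """Poll `check` until it returns True or the deadline passes; warn instead of raising."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    # Best-effort: report the miss and let a later, authoritative test decide.
    print(f" !! {what} not ready after {timeout_s:g}s; continuing (best-effort)", flush=True)
    return False


ready = poll_best_effort(lambda: True, timeout_s=1, interval_s=0.01, what="bucket")
```

Returning a boolean rather than raising keeps the caller free to log the outcome without aborting the install step.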
@@ -18,3 +18,31 @@ HEALTH_OK = (200, 302)
 DEPLOY_TIMEOUT = 900
 HTTP_TIMEOUT = 600
 EXTRA_ENV = {"TIMEOUT": "600"}
+
+
+def SCREENSHOT(page, ctx):
+    """Land the real sign-in form for the CI card (phase-shot). Mattermost serves a
+    "view in desktop app or browser?" interstitial on a browser's FIRST visit to ANY route
+    (including /login — proven by shot-proof2-mattermost-lts: byte-identical interstitial PNG with
+    and without a plain /login hook); a real user clicks "View in Browser" to reach the login
+    form, so the hook does exactly that. Click + second settle are best-effort (if the
+    interstitial is absent we are already on the form). Credential-free (empty fields, R7
+    secret-safety); the harness snaps the PNG after this returns. Waits are kept short (8s/3s/8s)
+    so the realistic hook path stays well inside the ~60s step budget — the 45s nav deadline is
+    only burned when the app never serves, and then the hook raises before any settle."""
+    import contextlib
+
+    from harness import browser as harness_browser
+    from harness import screenshot as screenshot_mod
+
+    harness_browser.goto_with_retry(
+        page,
+        f"{ctx.base_url}/login",
+        accept_statuses=(200,),
+        deadline_seconds=screenshot_mod.NAV_DEADLINE_S,
+        wait_until="domcontentloaded",
+    )
+    screenshot_mod.settle(page, 8_000)
+    with contextlib.suppress(Exception):
+        page.click("text=View in Browser", timeout=3_000)
+    screenshot_mod.settle(page, 8_000)
@@ -12,8 +12,10 @@ from harness import http as harness_http # noqa: E402
 def test_plausible_root_serves(live_app):
     """GET /api/health → 200 (clickhouse+postgres ready).
 
-    `/` itself 500s via auth_controller under DISABLE_AUTH, so it is NOT a
-    reliable health probe; the dedicated /api/health endpoint is.
+    `/` is NOT a reliable health probe (500s during datastore init; 302s to
+    /register once ready — and 500'd permanently under the pre-2026-06-11
+    62-char SECRET_KEY_BASE, see recipe_meta.EXTRA_ENV); the dedicated
+    /api/health endpoint is.
     """
     url = f"https://{live_app}/api/health"
     status, _ = harness_http.retry_http_get(url, expect_status=(200,), max_wait=60, interval=3)
@@ -7,9 +7,10 @@ HEALTH_OK = (200,)
 # `events_db` but the service is named `plausible_events_db`, so swarm applies no ordering) and returns
 # 500 until clickhouse + DB migrations finish — several minutes on a cold deploy. The dedicated
 # /api/health endpoint returns 200 with {"clickhouse":"ok","postgres":"ok","sites_cache":"ok"} only
-# once both datastores are ready, so it is a true readiness probe; `/` is unreliable (500s during init,
-# 302s once ready, so it cannot distinguish "not ready" from "ready"). Give a wide HTTP window so the
-# health poll waits out that init. [v1 failed at HTTP_TIMEOUT=600 polling `/`.]
+# once both datastores are ready, so it is a true readiness probe; `/` is unreliable (500s during init;
+# 302s to /register once ready — and with the pre-2026-06-11 62-char SECRET_KEY_BASE every HTML render
+# 500'd permanently, see EXTRA_ENV). Give a wide HTTP window so the health poll waits out that init.
+# [v1 failed at HTTP_TIMEOUT=600 polling `/`.]
 DEPLOY_TIMEOUT = 1200
 HTTP_TIMEOUT = 1200
 
@@ -17,8 +18,12 @@ HTTP_TIMEOUT = 1200
 EXTRA_ENV = {
     "DISABLE_AUTH": "true",
     "DISABLE_REGISTRATION": "true",
-    # 64-char stable value for CI — plausible (Phoenix) requires >= 64 chars
-    "SECRET_KEY_BASE": "ccciplausibletestkeybase64charsexactlyforCIephemeral4567890123",
+    # Stable CI value, 68 chars — Phoenix's cookie session store requires >= 64 BYTES and raises
+    # `ArgumentError ... at least 64 bytes` → HTTP 500 on EVERY page render (HTML routes only;
+    # /api/* never touches the cookie store, so health + event tests were unaffected) if it is
+    # shorter. The previous value was 62 chars, which is why every page (and the app screenshot)
+    # 500'd while the API tiers all passed (phase-shot root cause, 2026-06-11).
+    "SECRET_KEY_BASE": "ccciplausibletestkeybase64charsexactlyforCIephemeralrun4567890123456",
 }
 
 # The upgrade tier defaults its base to recipe_versions[-2]. For the 3.1.0 upgrade PR the
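The root cause recorded above is a key that was two bytes short of the 64-byte minimum. A small guard of the kind sketched below would surface that before a deploy; `MIN_SECRET_BYTES` and `secret_long_enough` are hypothetical names used only for this illustration, and the two strings are the old and new CI values from the diff.

```python
MIN_SECRET_BYTES = 64  # minimum the comment above attributes to Phoenix's cookie session store


def secret_long_enough(value: str) -> bool:
    """True when the UTF-8 encoding of the key meets the 64-byte minimum."""
    return len(value.encode("utf-8")) >= MIN_SECRET_BYTES


old_key = "ccciplausibletestkeybase64charsexactlyforCIephemeral4567890123"
new_key = "ccciplausibletestkeybase64charsexactlyforCIephemeralrun4567890123456"

for label, key in (("old", old_key), ("new", new_key)):
    print(label, len(key.encode("utf-8")), secret_long_enough(key))
```

Measuring bytes rather than characters matters only for non-ASCII keys; for these ASCII values the two counts are the same.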
@@ -1,8 +1,11 @@
-"""Unit tests for the pure card/badge renderers (harness.card), Phase 3 U2 (R3/R6).
+"""Unit tests for the pure card/badge renderers (harness.card) — phase lvl5 semantics.
 
-Covers the deterministic HTML + SVG string builders (the PNG step needs Playwright + is exercised in
-the U2 live demo). The cardinal check: the card REPORTS the data verbatim — level/marks come straight
-from the dict, never recomputed. Run cold: cc-ci-run -m pytest tests/unit/test_card.py -q
+Covers the deterministic HTML + SVG string builders (the PNG step needs Playwright + is exercised
+live). The cardinal check: the card REPORTS the data verbatim — level/marks come straight from the
+dict, never recomputed — the badge is NUMBER + COLOUR ONLY, and the per-rung table rows (incl.
+intentional-skip / unverified) are the sole carrier of "why isn't the level higher". Old schema-1
+artifacts (4-rung ladder, cap fields present) must render without error and without relabeling.
+Run cold: cc-ci-run -m pytest tests/unit/test_card.py -q
 """
 
 from __future__ import annotations
@@ -14,12 +17,19 @@ sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner")
 from harness import card as C  # noqa: E402
 
 
-def _data(level=3, cap="L4 functional (recipe-specific tests) N/A"):
-    return {
+def _data(level=5, **kw):
+    d = {
+        "schema": 2,
         "recipe": "uptime-kuma",
         "version": "1.23.0",
         "level": level,
-        "level_cap_reason": cap,
+        "rungs": {
+            "install": "pass",
+            "upgrade": "pass",
+            "backup_restore": "pass",
+            "functional": "pass",
+            "lint": "pass",
+        },
         "flags": {"clean_teardown": True, "no_secret_leak": True},
         "screenshot": "screenshot.png",
         "stages": [
@@ -36,46 +46,54 @@ def _data(level=3, cap="L4 functional (recipe-specific tests) N/A"):
                     {"name": "test_broken", "status": "fail", "ms": 5},
                 ],
             },
+            {
+                "name": "lint",
+                "status": "pass",
+                "tests": [{"name": "abra recipe lint", "status": "pass", "ms": 0}],
+            },
         ],
     }
+    d.update(kw)
+    return d
 
 
 def test_level_color_ramp():
-    assert C.level_color(0) != C.level_color(6)
-    assert C.level_color(6) == "#3fb950"
-    assert C.level_color(99) == "#8b949e"  # unknown → grey
+    # 0 (red) … 5 (bright green — full 5-rung climb); unknown → grey.
+    assert C.level_color(0) != C.level_color(5)
+    assert C.level_color(5) == "#3fb950"
+    assert C.level_color(99) == "#8b949e"
 
 
-def test_badge_svg_wellformed():
+def test_badge_svg_is_number_and_color_only():
     svg = C.level_badge_svg(4)
     assert svg.startswith("<svg") and svg.endswith("</svg>")
     assert "level 4" in svg
     assert C.level_color(4) in svg
-    # plain cap (no intent) → two-box badge, no third segment
-    assert "expected" not in svg and "gap?" not in svg
+    # operator-specified (phase lvl5): NOTHING but the level on the badge — no annotation
+    # segment of any kind, whatever the rung situation.
+    assert "expected" not in svg and "gap?" not in svg and "skip" not in svg
 
 
-def test_badge_svg_differentiates_intentional_vs_unintentional_skip():
-    # an intentional (declared) skip capped the climb → muted "expected" third segment
-    exp = C.level_badge_svg(2, "L3 backup/restore N/A", "intentional")
-    assert "level 2" in exp and "expected" in exp and C.EXPECT_COLOR in exp
-    assert "gap?" not in exp
-    # an unintentional skip (not declared) → amber "gap?" third segment
-    gap = C.level_badge_svg(2, "L3 backup/restore N/A", "unintentional")
-    assert "level 2" in gap and "gap?" in gap and C.GAP_COLOR in gap
-    assert "expected" not in gap
+def test_badge_svg_level5():
+    svg = C.level_badge_svg(5)
+    assert "level 5" in svg and "#3fb950" in svg
 
 
-def test_skip_rows_intentional_and_unintentional():
+def test_skip_rows_intentional_and_unverified():
     html_out = C._skip_rows(
         {"intentional": {"backup_restore": "no persistent data"}, "unintentional": ["functional"]}
     )
     # intentional skip: labelled row (muted green) + the reason on its own line
     assert "intentional skip" in html_out and C.SKIP_GREEN in html_out
     assert "backup/restore" in html_out and "no persistent data" in html_out
-    # unintentional skip: amber row + prompt to declare/add coverage
-    assert "unintentional skip" in html_out and C.GAP_COLOR in html_out
-    assert "functional" in html_out and "EXPECTED_NA" in html_out
+    # unverified rung: amber row + the blocks-the-level explanation
+    assert "unverified" in html_out and C.GAP_COLOR in html_out
+    assert "functional" in html_out and "cannot rise above" in html_out
+
+
+def test_skip_rows_lint_label_known():
+    html_out = C._skip_rows({"intentional": {}, "unintentional": ["lint"]})
+    assert ">lint<" in html_out.replace("</b>", "<")  # rung label renders, not a KeyError
 
 
 def test_skip_rows_empty_when_no_skips():
@@ -83,22 +101,68 @@ def test_skip_rows_empty_when_no_skips():
 
 
 def test_card_html_reports_level_verbatim():
-    html = C.render_card_html(_data(level=2, cap="L3 backup/restore (data integrity) N/A"))
+    html = C.render_card_html(_data(level=2))
     assert "uptime-kuma" in html
     assert "1.23.0" in html
     # the level shown is exactly what was passed (no recompute)
     assert ">2<" in html
-    assert "L3 backup/restore" in html
+    assert "level 2 of 5" in html
     assert C.level_color(2) in html
 
 
-def test_card_html_shows_stage_and_test_marks():
+def test_card_html_no_cap_language():
     html = C.render_card_html(_data())
+    assert "capped" not in html and "cap_reason" not in html
+    assert "level 5 of 5" in html
+
+
+def test_card_html_old_schema1_artifact_renders():
+    # history compatibility: a pre-lvl5 results.json (4-rung ladder, cap fields, "na" statuses)
+    # renders without KeyError and shows ITS OWN ladder height (no retroactive relabeling).
+    old = {
+        "schema": 1,
+        "recipe": "legacy",
+        "version": "0.9",
+        "level": 4,
+        "level_cap_reason": "",
+        "level_cap_rung": None,
+        "rungs": {
+            "install": "pass",
+            "upgrade": "pass",
+            "backup_restore": "pass",
+            "functional": "pass",
+        },
+        "skips": {"intentional": {}, "unintentional": []},
+        "flags": {"clean_teardown": True, "no_secret_leak": True},
+        "screenshot": None,
+        "stages": [],
+    }
+    html = C.render_card_html(old)
+    assert "legacy" in html
+    assert "level 4 of 4" in html  # the old top, not 5
+    assert "capped" not in html
+
+
+def test_card_html_shows_stage_and_test_marks_incl_lint():
+    html = C.render_card_html(_data())
     assert "install" in html and "custom" in html
+    assert "abra recipe lint" in html
     assert "test_serving" in html and "test_broken" in html
     assert C.STATUS_MARK["pass"] in html and C.STATUS_MARK["fail"] in html
+
+
+def test_card_html_unver_stage_mark_renders():
+    d = _data()
+    d["stages"][2] = {
+        "name": "lint",
+        "status": "unver",
+        "tests": [{"name": "abra recipe lint", "status": "unver", "ms": 0, "message": "timed out"}],
+    }
+    html = C.render_card_html(d)
+    assert C.STATUS_MARK["unver"] in html
+    assert C.STATUS_COLOR["unver"] in html
|
||||
|
||||
|
||||
def test_card_html_flags_rendered():
|
||||
html = C.render_card_html(_data())
|
||||
assert "clean teardown" in html and "no secret leak" in html
|
||||
|
||||
96
tests/unit/test_converged_oneshot.py
Normal file
@ -0,0 +1,96 @@
|
||||
"""Unit tests for lifecycle.services_converged's completed-one-shot rule (rcust M2 fix-forward).

A TRIGGERED one-shot service (restart_policy none, scaled 0→1, runs once, exits 0) reports "0/1"
forever after its task completes: swarm never restarts it. A bare `cur != want` rejection then
blocks convergence for the REST OF THE RUN (lasuite-drive minio-createbuckets: the P2b port moved
the bucket trigger BEFORE the install assert, so the assert burned the full DEPLOY_TIMEOUT;
pre-restructure the trigger ran after the assert and converge never saw the 0/1).

Pins (the Adversary's non-vacuity criteria):
- deficit explained ENTIRELY by Complete tasks → converged (the one-shot did its job).
- deficit with a Failed task → NOT converged (a broken one-shot must not pass).
- deficit with a Running/Preparing task → NOT converged (still spinning up; no early green).
- deficit with NO tasks yet → NOT converged (still scheduling).
- plain N/N services still converge; plain 0/1-spinning-up still doesn't (regression guards).
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
import sys
|
||||
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
||||
from harness import lifecycle as lc # noqa: E402
|
||||
|
||||
|
||||
class _R:
|
||||
def __init__(self, stdout="", stderr="", returncode=0):
|
||||
self.stdout, self.stderr, self.returncode = stdout, stderr, returncode
|
||||
|
||||
|
||||
def _patch_docker(monkeypatch, replicas_rows, task_states_by_service=None, update_state=""):
|
||||
"""Fake subprocess.run for the three docker calls services_converged makes."""
|
||||
task_states_by_service = task_states_by_service or {}
|
||||
|
||||
def fake_run(args, **kw):
|
||||
if args[:3] == ["docker", "stack", "services"]:
|
||||
return _R(stdout="\n".join(replicas_rows) + "\n")
|
||||
if args[:3] == ["docker", "service", "ps"]:
|
||||
name = args[3]
|
||||
return _R(stdout="\n".join(task_states_by_service.get(name, [])) + "\n")
|
||||
if args[:3] == ["docker", "service", "inspect"]:
|
||||
return _R(stdout=update_state + "\n")
|
||||
raise AssertionError(f"unexpected docker call: {args}")
|
||||
|
||||
monkeypatch.setattr(lc.subprocess, "run", fake_run)
|
||||
|
||||
|
||||
def test_completed_oneshot_deficit_is_converged(monkeypatch):
|
||||
_patch_docker(
|
||||
monkeypatch,
|
||||
["stack_app 1/1", "stack_minio-createbuckets 0/1"],
|
||||
{"stack_minio-createbuckets": ["Complete 28 minutes ago"]},
|
||||
)
|
||||
assert lc.services_converged("app.example.com") is True
|
||||
|
||||
|
||||
def test_failed_oneshot_deficit_is_not_converged(monkeypatch):
|
||||
_patch_docker(
|
||||
monkeypatch,
|
||||
["stack_app 1/1", "stack_minio-createbuckets 0/1"],
|
||||
{"stack_minio-createbuckets": ["Failed 2 minutes ago"]},
|
||||
)
|
||||
assert lc.services_converged("app.example.com") is False
|
||||
|
||||
|
||||
def test_mixed_complete_and_failed_tasks_not_converged(monkeypatch):
|
||||
_patch_docker(
|
||||
monkeypatch,
|
||||
["stack_oneshot 0/1"],
|
||||
{"stack_oneshot": ["Complete 5 minutes ago", "Failed 6 minutes ago"]},
|
||||
)
|
||||
assert lc.services_converged("app.example.com") is False
|
||||
|
||||
|
||||
def test_still_spinning_up_not_converged(monkeypatch):
|
||||
_patch_docker(
|
||||
monkeypatch,
|
||||
["stack_app 0/1"],
|
||||
{"stack_app": ["Preparing 10 seconds ago"]},
|
||||
)
|
||||
assert lc.services_converged("app.example.com") is False
|
||||
|
||||
|
||||
def test_deficit_with_no_tasks_yet_not_converged(monkeypatch):
|
||||
_patch_docker(monkeypatch, ["stack_app 0/1"], {"stack_app": []})
|
||||
assert lc.services_converged("app.example.com") is False
|
||||
|
||||
|
||||
def test_all_full_replicas_still_converged(monkeypatch):
|
||||
_patch_docker(monkeypatch, ["stack_app 1/1", "stack_db 1/1"])
|
||||
assert lc.services_converged("app.example.com") is True
|
||||
|
||||
|
||||
def test_on_demand_zero_zero_oneshot_still_converged(monkeypatch):
|
||||
_patch_docker(monkeypatch, ["stack_app 1/1", "stack_minio-createbuckets 0/0"])
|
||||
assert lc.services_converged("app.example.com") is True
|
||||
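The completed-one-shot rule pinned by the tests above can be sketched as a pure function. This is only an illustration, not the actual `lifecycle.services_converged` code (which shells out to docker). The name `deficit_is_completed_oneshot` and its arguments are hypothetical. It assumes each task row begins with its state word, as in the fake `docker service ps` rows used by the tests.

```python
from __future__ import annotations


def deficit_is_completed_oneshot(cur: int, want: int, task_rows: list[str]) -> bool:
    """Decide whether one service's replica count may be treated as converged.

    Hypothetical helper: `task_rows` are lines such as "Complete 28 minutes ago",
    so the task state is assumed to be the first word of each row.
    """
    if cur == want:
        return True  # plain N/N (including an on-demand 0/0) converges as before
    states = [row.split()[0] for row in task_rows if row.strip()]
    if not states:
        return False  # a deficit with no tasks yet: still scheduling
    # The deficit must be explained ENTIRELY by Complete tasks; a Failed,
    # Running or Preparing task anywhere in the list blocks convergence.
    return len(states) >= want - cur and all(s == "Complete" for s in states)
```

Under these assumptions a stack would count as converged only when every service row satisfies the predicate, which is why a single failed one-shot keeps the whole stack red.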
@ -28,7 +28,6 @@ def _row(**kw):
|
||||
"ref": "db9a9502",
|
||||
"version": "db9a95024e9d",
|
||||
"level": 4,
|
||||
"level_cap_reason": "",
|
||||
"has_screenshot": True,
|
||||
"flags": {"clean_teardown": True, "no_secret_leak": True},
|
||||
"finished": 0,
|
||||
@ -40,7 +39,7 @@ def _row(**kw):
|
||||
|
||||
def test_level_color_ramp_and_fallback():
|
||||
assert dashboard.level_color(0) == "#e5534b"
|
||||
assert dashboard.level_color(6) == "#3fb950"
|
||||
assert dashboard.level_color(5) == "#3fb950" # full 5-rung climb (phase lvl5)
|
||||
assert dashboard.level_color(4) == "#a0b93f"
|
||||
assert dashboard.level_color(99) == "#8b949e"
|
||||
assert dashboard.level_color(None) == "#8b949e"
|
||||
@ -61,20 +60,12 @@ def test_overview_grid_mirrors_results():
|
||||
def test_overview_never_greener_than_data():
|
||||
# A failed run at level 0 must show level 0 + the failure pill — never a green/high level.
|
||||
out = dashboard.render_overview(
|
||||
[
|
||||
_row(
|
||||
status="failure",
|
||||
level=0,
|
||||
has_screenshot=False,
|
||||
flags={},
|
||||
level_cap_reason="L1 install FAILED",
|
||||
)
|
||||
]
|
||||
[_row(status="failure", level=0, has_screenshot=False, flags={})]
|
||||
)
|
||||
assert "level 0" in out
|
||||
assert dashboard.level_color(0) in out # red
|
||||
assert dashboard._COLORS["failure"] in out
|
||||
assert "level 4" not in out and "level 5" not in out and "level 6" not in out
|
||||
assert "level 4" not in out and "level 5" not in out
|
||||
assert "no screenshot" in out # placeholder, no broken image
|
||||
|
||||
|
||||
@ -104,7 +95,6 @@ def test_build_row_projects_results(monkeypatch):
|
||||
lambda n: {
|
||||
"version": "1.2.3",
|
||||
"level": 2,
|
||||
"level_cap_reason": "cap",
|
||||
"screenshot": "screenshot.png",
|
||||
"flags": {"clean_teardown": True},
|
||||
},
|
||||
@ -123,6 +113,38 @@ def test_build_row_projects_results(monkeypatch):
|
||||
assert r["url"].endswith("/cc-ci/7")
|
||||
|
||||
|
||||
def test_build_row_old_schema1_artifact_renders(monkeypatch):
|
||||
# History compatibility (phase lvl5): pre-lvl5 results.json still carries cap fields and a
|
||||
# 4-rung ladder — it must project + render without KeyError, level shown VERBATIM (no
|
||||
# retroactive relabeling), and the old cap text simply isn't resurfaced anywhere.
|
||||
monkeypatch.setattr(
|
||||
dashboard,
|
||||
"_results_for",
|
||||
lambda n: {
|
||||
"schema": 1,
|
||||
"version": "0.9.1",
|
||||
"level": 2,
|
||||
"level_cap_reason": "L3 backup/restore (data integrity) N/A",
|
||||
"level_cap_rung": "backup_restore",
|
||||
"screenshot": "screenshot.png",
|
||||
"flags": {"clean_teardown": True, "no_secret_leak": True},
|
||||
},
|
||||
)
|
||||
b = {
|
||||
"number": 11,
|
||||
"status": "success",
|
||||
"event": "custom",
|
||||
"params": {"RECIPE": "legacy", "REF": "abc123"},
|
||||
"finished": 5,
|
||||
}
|
||||
r = dashboard._build_row(b)
|
||||
out = dashboard.render_overview([r])
|
||||
assert "level 2" in out and dashboard.level_color(2) in out
|
||||
assert "N/A" not in out and "capped" not in out # cap language gone from the surface
|
||||
hist = dashboard.render_history("legacy", [r])
|
||||
assert "L2" in hist
|
||||
|
||||
|
||||
def test_build_row_degrades_without_results(monkeypatch):
|
||||
# No results.json (e.g. an old run): grid still renders from Drone fields, level absent.
|
||||
monkeypatch.setattr(dashboard, "_results_for", lambda n: {})
|
||||
|
||||
@ -1,8 +1,14 @@
|
||||
"""Unit tests for the Phase-3 level ladder (harness.level), plan-phase3-results-ux.md §4.1 / R1.
|
||||
"""Unit tests for the level ladder (harness.level) — phase lvl5 semantics.
|
||||
|
||||
Pure function — no I/O. Proves the YunoHost gap-caps-the-level semantics, including the U0 gate
|
||||
acceptance: a recipe that climbs through L4 reports 4, and one that fails at L2 is capped at 1
|
||||
(the level just below the failed rung). Run cold with: cc-ci-run -m pytest tests/unit/test_level.py -q
|
||||
Pure function — no I/O. Proves the operator-decided rule (plan-phase-lvl5-lint-rung.md,
|
||||
DECISIONS.md phase lvl5):
|
||||
|
||||
level = max i such that rung_i == "pass" and every rung j < i is "pass" or "skip"
|
||||
|
||||
— a real FAIL blocks, an UNVERIFIED rung blocks exactly like a fail, an INTENTIONAL skip is
|
||||
climbed past. Includes the mission's four worked examples verbatim, and the old N/A cases
|
||||
(single-published-version recipe, non-backup-capable recipe) now climbing past their former
|
||||
caps. Run cold with: cc-ci-run -m pytest tests/unit/test_level.py -q
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
@ -19,69 +25,115 @@ def _rungs(
|
||||
upgrade="pass",
|
||||
backup_restore="pass",
|
||||
functional="pass",
|
||||
lint="pass",
|
||||
):
|
||||
return {
|
||||
"install": install,
|
||||
"upgrade": upgrade,
|
||||
"backup_restore": backup_restore,
|
||||
"functional": functional,
|
||||
"lint": lint,
|
||||
}
|
||||
|
||||
|
||||
# ---- the ladder: four essential rungs, top is L4 (functional) ----
|
||||
# ---- the ladder: five essential rungs, top is L5 (lint) ----
|
||||
|
||||
|
||||
def test_full_clean_climb_to_L4():
|
||||
# All four essential rungs pass → L4 (the top; integration/recipe-local are optional, not leveled).
|
||||
lvl, reason = L.compute_level(_rungs())
|
||||
assert lvl == 4
|
||||
assert reason == ""
|
||||
def test_full_clean_climb_is_L5():
|
||||
assert L.compute_level(_rungs()) == 5
|
||||
|
||||
|
||||
def test_fails_at_L2_capped_at_L1():
|
||||
# GATE: upgrade fails → capped at L1 even though higher rungs would pass.
|
||||
lvl, reason = L.compute_level(_rungs(upgrade="fail", backup_restore="pass", functional="pass"))
|
||||
assert lvl == 1
|
||||
assert "L2" in reason and "FAILED" in reason
|
||||
def test_ladder_is_five_rungs_lint_on_top():
|
||||
assert L.RUNGS == ("install", "upgrade", "backup_restore", "functional", "lint")
|
||||
assert "lint" in L.RUNG_LABEL[5]
|
||||
|
||||
|
||||
# ---- L0 / install ----
|
||||
# ---- mission worked examples (operator Q&A 2026-06-11, verbatim) ----
|
||||
|
||||
|
||||
def test_mission_example_fail_blocks():
|
||||
# install ✔, upgrade ✘, backup ✔, functional ✔, lint ✔ → level 1 (fail blocks).
|
||||
assert L.compute_level(_rungs(upgrade="fail")) == 1
|
||||
|
||||
|
||||
def test_mission_example_intentional_skip_climbs():
|
||||
# install ✔, upgrade ✔, backup skip (not capable), functional ✔, lint ✔ → level 5
|
||||
# (previously capped at 2 — the confusing part the operator removed).
|
||||
assert L.compute_level(_rungs(backup_restore="skip")) == 5
|
||||
|
||||
|
||||
def test_mission_example_unverified_blocks():
|
||||
# install ✔, upgrade ✔, backup UNVER (harness error), functional ✔, lint ✔ → level 2
|
||||
# (we cannot claim what we didn't check).
|
||||
assert L.compute_level(_rungs(backup_restore="unver")) == 2
|
||||
|
||||
|
||||
def test_mission_example_unverified_top_rung_not_earned():
|
||||
# all four ✔, lint unver (abra missing) → level 4.
|
||||
assert L.compute_level(_rungs(lint="unver")) == 4
|
||||
|
||||
|
||||
# ---- blocking semantics ----
|
||||
|
||||
|
||||
def test_install_fail_is_L0():
|
||||
lvl, reason = L.compute_level(_rungs(install="fail"))
|
||||
assert lvl == 0
|
||||
assert "L1" in reason and "FAILED" in reason
|
||||
assert L.compute_level(_rungs(install="fail")) == 0
|
||||
|
||||
|
||||
# ---- gap-caps semantics: a higher pass can't rescue a lower gap ----
|
||||
def test_install_unver_is_L0():
|
||||
assert L.compute_level(_rungs(install="unver")) == 0
|
||||
|
||||
|
||||
def test_higher_pass_does_not_rescue_lower_na():
|
||||
# backup/restore N/A (stateless app) caps at L2 even though functional would pass.
|
||||
lvl, reason = L.compute_level(_rungs(backup_restore="na", functional="pass"))
|
||||
assert lvl == 2
|
||||
assert "L3" in reason and "N/A" in reason
|
||||
def test_higher_pass_never_rescues_a_fail():
|
||||
# everything above a failed rung is dead, however green.
|
||||
assert L.compute_level(_rungs(upgrade="fail", backup_restore="pass", functional="pass")) == 1
|
||||
|
||||
|
||||
def test_upgrade_na_caps_at_L1():
|
||||
# only one published version → no upgrade possible → N/A caps at L1 (upgrade is essential).
|
||||
lvl, reason = L.compute_level(_rungs(upgrade="na"))
|
||||
assert lvl == 1
|
||||
assert "L2" in reason and "N/A" in reason
|
||||
def test_lint_fail_blocks_at_4():
|
||||
assert L.compute_level(_rungs(lint="fail")) == 4
|
||||
|
||||
|
||||
def test_functional_na_caps_at_L3():
|
||||
# no recipe-specific functional tests → functional N/A caps at L3.
|
||||
lvl, reason = L.compute_level(_rungs(functional="na"))
|
||||
assert lvl == 3
|
||||
assert "L4" in reason and "N/A" in reason
|
||||
def test_unver_blocks_even_after_a_skip():
|
||||
# skip at L2 is climbed past, but the unver at L3 still blocks → level 1.
|
||||
assert L.compute_level(_rungs(upgrade="skip", backup_restore="unver")) == 1
|
||||
|
||||
|
||||
def test_functional_fail_caps_at_L3():
|
||||
lvl, reason = L.compute_level(_rungs(functional="fail"))
|
||||
assert lvl == 3
|
||||
assert "L4" in reason and "FAILED" in reason
|
||||
# ---- intentional-skip climbing (the de-cap) ----
|
||||
|
||||
|
||||
def test_single_version_recipe_climbs_past_upgrade_skip():
|
||||
# old rule: upgrade N/A capped at L1. New rule: skip is climbed past → full climb 5.
|
||||
assert L.compute_level(_rungs(upgrade="skip")) == 5
|
||||
|
||||
|
||||
def test_stateless_recipe_climbs_past_backup_skip_to_lint():
|
||||
assert L.compute_level(_rungs(upgrade="skip", backup_restore="skip")) == 5
|
||||
|
||||
|
||||
def test_skip_does_not_count_as_pass():
|
||||
# ALL skips → nothing passed → level 0 (a skip climbs, but never earns).
|
||||
assert (
|
||||
L.compute_level(
|
||||
_rungs(
|
||||
install="skip",
|
||||
upgrade="skip",
|
||||
backup_restore="skip",
|
||||
functional="skip",
|
||||
lint="skip",
|
||||
)
|
||||
)
|
||||
== 0
|
||||
)
|
||||
|
||||
|
||||
def test_skip_then_pass_earns_the_higher_rung():
|
||||
# skip at L4, pass at L5 → level 5 (the skip below doesn't stop the climb).
|
||||
assert L.compute_level(_rungs(functional="skip")) == 5
|
||||
|
||||
|
||||
def test_trailing_skip_keeps_last_pass():
|
||||
# passes up to L3, skips above → level stays 3 (skips never raise).
|
||||
assert L.compute_level(_rungs(functional="skip", lint="skip")) == 3
|
||||
|
||||
|
||||
# ---- input validation ----
|
||||
@ -89,7 +141,7 @@ def test_functional_fail_caps_at_L3():
|
||||
|
||||
def test_invalid_status_raises():
|
||||
bad = _rungs()
|
||||
bad["functional"] = "passed" # not in the vocabulary
|
||||
bad["functional"] = "na" # the OLD vocabulary is no longer valid — every N/A is classified
|
||||
try:
|
||||
L.compute_level(bad)
|
||||
except ValueError:
|
||||
@ -97,6 +149,16 @@ def test_invalid_status_raises():
|
||||
raise AssertionError("expected ValueError on invalid rung status")
|
||||
|
||||
|
||||
def test_missing_rung_raises():
|
||||
bad = _rungs()
|
||||
del bad["lint"]
|
||||
try:
|
||||
L.compute_level(bad)
|
||||
except ValueError:
|
||||
return
|
||||
raise AssertionError("expected ValueError on missing rung")
|
||||
|
||||
|
||||
# ---- helpers: backup_restore_status ----
|
||||
|
||||
|
||||
@ -104,8 +166,8 @@ def test_backup_restore_status_pass():
|
||||
assert L.backup_restore_status("pass", "pass", True) == "pass"
|
||||
|
||||
|
||||
def test_backup_restore_status_not_capable_is_na():
|
||||
assert L.backup_restore_status("skip", "skip", False) == "na"
|
||||
def test_backup_restore_status_not_capable_is_intentional_skip():
|
||||
assert L.backup_restore_status("skip", "skip", False) == "skip"
|
||||
|
||||
|
||||
def test_backup_restore_status_fail_on_either():
|
||||
@ -113,16 +175,20 @@ def test_backup_restore_status_fail_on_either():
|
||||
assert L.backup_restore_status("fail", "pass", True) == "fail"
|
||||
|
||||
|
||||
def test_backup_restore_partial_is_na():
|
||||
# backup-capable but restore didn't run cleanly (not pass, not fail) → cannot claim L3
|
||||
assert L.backup_restore_status("pass", "skip", True) == "na"
|
||||
def test_backup_restore_partial_is_unverified():
|
||||
# backup-capable but restore didn't run cleanly (not pass, not fail) → cannot claim L3,
|
||||
# and the non-run is NOT intentional → unver (blocks the level above it).
|
||||
assert L.backup_restore_status("pass", "skip", True) == "unver"
|
||||
assert L.backup_restore_status(None, None, True) == "unver"
|
||||
|
||||
|
||||
# ---- helpers: tier_to_rung ----
|
||||
|
||||
|
||||
def test_tier_to_rung_mapping():
|
||||
def test_tier_to_rung_mapping_defaults_unverified():
|
||||
assert L.tier_to_rung("pass") == "pass"
|
||||
assert L.tier_to_rung("fail") == "fail"
|
||||
assert L.tier_to_rung("skip") == "na"
|
||||
assert L.tier_to_rung(None) == "na"
|
||||
# no intentionality information here — a non-run is unver; derive_rungs upgrades to "skip"
|
||||
# only on a declared/structural fact, never the other way.
|
||||
assert L.tier_to_rung("skip") == "unver"
|
||||
assert L.tier_to_rung(None) == "unver"
|
||||
|
||||
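The level rule quoted in the docstring above (level = max i such that rung_i is "pass" and every rung j < i is "pass" or "skip") can be sketched as a single walk up the ladder. This is a minimal sketch under that stated rule, not the real `harness.level.compute_level`. The name `compute_level_sketch` is hypothetical, and the rung order is taken from the `L.RUNGS` assertion in the tests.

```python
from __future__ import annotations

RUNGS = ("install", "upgrade", "backup_restore", "functional", "lint")
STATUSES = {"pass", "fail", "skip", "unver"}


def compute_level_sketch(rungs: dict[str, str]) -> int:
    """Return the highest passed rung reachable through pass/skip rungs only."""
    # Validate up front so a bad value above a blocking rung is still reported.
    for name in RUNGS:
        if name not in rungs:
            raise ValueError(f"missing rung: {name}")
        if rungs[name] not in STATUSES:
            raise ValueError(f"invalid status for {name}: {rungs[name]!r}")
    level = 0
    for i, name in enumerate(RUNGS, start=1):
        status = rungs[name]
        if status == "pass":
            level = i  # a pass earns this rung
        elif status != "skip":
            break  # fail and unver both block everything above
        # a skip is climbed past but never earns a level by itself
    return level
```

The loop form makes the two asymmetries explicit: only "pass" raises the level, and only "fail" or "unver" stops the climb.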
196
tests/unit/test_lint.py
Normal file
@ -0,0 +1,196 @@
|
||||
"""Unit tests for the L5 lint executor (harness.lint) — phase lvl5.

Covers the table parser + classifier against real abra-0.13 output shapes (probed on the CI
host 2026-06-11, JOURNAL-lvl5), and run_lint's never-raise / never-silent-pass guarantees via
a fake-PATH `script` shim (no real abra needed). Run cold:
cc-ci-run -m pytest tests/unit/test_lint.py -q
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
import stat
|
||||
import subprocess
|
||||
import sys
|
||||
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
||||
from harness import lint as L # noqa: E402
|
||||
|
||||
# Realistic abra lint table rows, as captured on cc-ci: abra renders HEAVY box-drawing
|
||||
# verticals (┃ U+2503) — the parser must match those, not just the light │.
|
||||
TABLE_OK = (
|
||||
"┏━━━━━━┳━━━━━━┓\r\n"
|
||||
"┃ R001 ┃ compose config has expected version ┃ warn ┃ ✅ ┃ - ┃ ensure ┃\r\n"
|
||||
"┃ R015 ┃ long secret names ┃ warn ┃ ❌ ┃ - ┃ reduce ┃\r\n"
|
||||
"┃ R008 ┃ .env.sample provided ┃ error ┃ ✅ ┃ - ┃ create ┃\r\n"
|
||||
"┃ R014 ┃ only annotated tags used for recipe version ┃ error ┃ ✅ ┃ - ┃ retag ┃\r\n"
|
||||
"┗━━━━━━┻━━━━━━┛\r\n"
|
||||
"WARN secret session_secret is longer than 12 characters\r\n"
|
||||
)
|
||||
|
||||
# The light-vertical variant must parse identically (defensive: abra theme/version drift).
|
||||
TABLE_OK_LIGHT = TABLE_OK.replace("┃", "│")
|
||||
|
||||
TABLE_R014_FAIL = (
|
||||
TABLE_OK.replace(
|
||||
"┃ R014 ┃ only annotated tags used for recipe version ┃ error ┃ ✅",
|
||||
"┃ R014 ┃ only annotated tags used for recipe version ┃ error ┃ ❌",
|
||||
)
|
||||
+ "WARN critical errors present in hedgedoc config\r\n"
|
||||
)
|
||||
|
||||
TABLE_SKIPPED_ERROR = TABLE_OK.replace(
|
||||
"┃ R014 ┃ only annotated tags used for recipe version ┃ error ┃ ✅ ┃ - ┃",
|
||||
"┃ R014 ┃ only annotated tags used for recipe version ┃ error ┃ ❌ ┃ skipped ┃",
|
||||
)
|
||||
|
||||
|
||||
# ---- parse_table ----
|
||||
|
||||
|
||||
def test_parse_table_rows_and_marks():
|
||||
rows = L.parse_table(TABLE_OK)
|
||||
by = {r["rule"]: r for r in rows}
|
||||
assert set(by) == {"R001", "R015", "R008", "R014"}
|
||||
assert by["R001"]["severity"] == "warn" and by["R001"]["satisfied"]
|
||||
assert by["R015"]["severity"] == "warn" and not by["R015"]["satisfied"]
|
||||
assert by["R014"]["severity"] == "error" and by["R014"]["satisfied"]
|
||||
assert not any(r["skipped"] for r in rows)
|
||||
|
||||
|
||||
def test_parse_table_strips_ansi():
|
||||
rows = L.parse_table("\x1b[1m" + TABLE_OK + "\x1b[0m")
|
||||
assert len(rows) == 4
|
||||
|
||||
|
||||
def test_parse_table_light_verticals_too():
|
||||
assert L.parse_table(TABLE_OK_LIGHT) == L.parse_table(TABLE_OK)
|
||||
|
||||
|
||||
def test_parse_table_garbage_is_empty():
|
||||
assert L.parse_table("FATA something exploded\r\n") == []
|
||||
assert L.parse_table("") == []
|
||||
|
||||
|
||||
# ---- classify ----
|
||||
|
||||
|
||||
def test_classify_pass_with_warn_misses_only():
|
||||
# warn-severity ❌ (R015) does NOT fail the rung — only error-severity rules do.
|
||||
assert L.classify(0, TABLE_OK) == ("pass", "", [])
|
||||
|
||||
|
||||
def test_classify_error_rule_fails():
|
||||
status, detail, failed = L.classify(0, TABLE_R014_FAIL)
|
||||
assert status == "fail"
|
||||
assert failed == ["R014"]
|
||||
assert "R014" in detail
|
||||
|
||||
|
||||
def test_classify_skipped_error_rule_does_not_fail_but_sentinel_guards():
|
||||
# a skipped error rule isn't counted as failed by the parser, but abra's own sentinel line
|
||||
# (if present) still forces fail — the classifier never out-greens abra.
|
||||
status, _, failed = L.classify(0, TABLE_SKIPPED_ERROR)
|
||||
assert failed == []
|
||||
assert status == "pass"
|
||||
status2, detail2, _ = L.classify(
|
||||
0, TABLE_SKIPPED_ERROR + "WARN critical errors present in x config\r\n"
|
||||
)
|
||||
assert status2 == "fail"
|
||||
assert "critical errors" in detail2
|
||||
|
||||
|
||||
def test_classify_rc0_without_table_is_unver():
|
||||
# rc=0 but nothing parseable → cannot claim pass.
|
||||
assert L.classify(0, "weird output")[0] == "unver"
|
||||
|
||||
|
||||
def test_classify_content_fata_is_fail():
|
||||
out = "FATA unable to validate recipe: .env.sample for x couldn't be read\r\n"
|
||||
status, detail, _ = L.classify(1, out)
|
||||
assert status == "fail"
|
||||
assert "unable to validate recipe" in detail
|
||||
|
||||
|
||||
def test_classify_environment_fata_is_unver():
|
||||
out = "FATA unable to fetch tags in /x: repository not found: Not found.\r\n"
|
||||
status, detail, _ = L.classify(1, out)
|
||||
assert status == "unver"
|
||||
assert "fetch tags" in detail
|
||||
|
||||
|
||||
def test_classify_did_not_run_is_unver():
|
||||
assert L.classify(None, "")[0] == "unver"
|
||||
|
||||
|
||||
# ---- run_lint: never raises, never silently passes ----
|
||||
|
||||
|
||||
def _mkrecipe(tmp_path):
|
||||
repo = tmp_path / "abra" / "recipes" / "fakerec"
|
||||
repo.mkdir(parents=True)
|
||||
(repo / "compose.yml").write_text("version: '3.8'\n")
|
||||
for cmd in (
|
||||
["git", "init", "-q"],
|
||||
["git", "add", "."],
|
||||
["git", "-c", "user.email=t@t", "-c", "user.name=t", "commit", "-qm", "x"],
|
||||
):
|
||||
subprocess.run(cmd, cwd=repo, check=True)
|
||||
return repo
|
||||
|
||||
|
||||
def _shim(tmp_path, body):
|
||||
"""Drop a fake `script` executable on PATH (run_lint invokes `script -qec "abra ..."`)."""
|
||||
bindir = tmp_path / "bin"
|
||||
bindir.mkdir(exist_ok=True)
|
||||
sh = bindir / "script"
|
||||
sh.write_text("#!/bin/sh\n" + body)
|
||||
sh.chmod(sh.stat().st_mode | stat.S_IEXEC)
|
||||
return str(bindir)
|
||||
|
||||
|
||||
def test_run_lint_pass_via_shim(tmp_path, monkeypatch):
|
||||
_mkrecipe(tmp_path)
|
||||
monkeypatch.setenv("ABRA_DIR", str(tmp_path / "abra"))
|
||||
out = TABLE_OK.replace("\r\n", "\\n")
|
||||
monkeypatch.setenv(
|
||||
"PATH", _shim(tmp_path, f'printf "{out}"\nexit 0\n') + os.pathsep + os.environ["PATH"]
|
||||
)
|
||||
res = L.run_lint("fakerec", None, str(tmp_path / "artifacts"))
|
||||
assert res["status"] == "pass"
|
||||
txt = (tmp_path / "artifacts" / "lint.txt").read_text()
|
||||
assert "abra recipe lint -n fakerec" in txt and "R001" in txt
|
||||
|
||||
|
||||
def test_run_lint_fail_via_shim(tmp_path, monkeypatch):
|
||||
_mkrecipe(tmp_path)
|
||||
monkeypatch.setenv("ABRA_DIR", str(tmp_path / "abra"))
|
||||
out = TABLE_R014_FAIL.replace("\r\n", "\\n")
|
||||
monkeypatch.setenv(
|
||||
"PATH", _shim(tmp_path, f'printf "{out}"\nexit 0\n') + os.pathsep + os.environ["PATH"]
|
||||
)
|
||||
res = L.run_lint("fakerec", None, str(tmp_path / "artifacts"))
|
||||
assert res["status"] == "fail"
|
||||
assert res["rules_failed"] == ["R014"]
|
||||
|
||||
|
||||
def test_run_lint_missing_recipe_is_unver_not_raise(tmp_path, monkeypatch):
|
||||
monkeypatch.setenv("ABRA_DIR", str(tmp_path / "abra-none"))
|
||||
res = L.run_lint("no-such-recipe", None, str(tmp_path / "artifacts"))
|
||||
assert res["status"] == "unver"
|
||||
assert res["detail"]
|
||||
# lint.txt still written with the failure context (loud, never silent)
|
||||
assert (tmp_path / "artifacts" / "lint.txt").exists()
|
||||
|
||||
|
||||
def test_run_lint_abra_blowup_is_unver(tmp_path, monkeypatch):
|
||||
_mkrecipe(tmp_path)
|
||||
monkeypatch.setenv("ABRA_DIR", str(tmp_path / "abra"))
|
||||
monkeypatch.setenv(
|
||||
"PATH",
|
||||
_shim(tmp_path, 'echo "FATA inappropriate ioctl for device"\nexit 1\n')
|
||||
+ os.pathsep
|
||||
+ os.environ["PATH"],
|
||||
)
|
||||
res = L.run_lint("fakerec", None, None)
|
||||
assert res["status"] == "unver"
|
||||
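The parsing behaviour these tests describe (strip ANSI codes, accept heavy or light box-drawing verticals, fail only on unsatisfied, unskipped error-severity rules) could look roughly like the sketch below. It is not the actual `harness.lint` implementation. The names `parse_rows_sketch` and `failed_error_rules` are hypothetical. The column order (rule, description, severity, satisfied mark, skipped, how-to) is an assumption read off the fixtures above.

```python
from __future__ import annotations

import re

ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")  # colour / style escape sequences
RULE_RE = re.compile(r"R\d+")  # rule ids such as R014


def parse_rows_sketch(text: str) -> list[dict]:
    """Extract rule rows from a lint table; anything unparseable is ignored."""
    rows = []
    for line in ANSI_RE.sub("", text).splitlines():
        # Split on heavy (U+2503) or light (U+2502) verticals.
        cells = [c.strip() for c in re.split("[┃│]", line)]
        if len(cells) < 7 or not RULE_RE.fullmatch(cells[1]):
            continue  # border line, log line, or garbage
        rows.append(
            {
                "rule": cells[1],
                "severity": cells[3],
                "satisfied": cells[4] == "✅",
                "skipped": cells[5] == "skipped",
            }
        )
    return rows


def failed_error_rules(rows: list[dict]) -> list[str]:
    """Only error-severity rules that ran and were not satisfied count as failures."""
    return [
        r["rule"]
        for r in rows
        if r["severity"] == "error" and not r["satisfied"] and not r["skipped"]
    ]
```

A classifier built on this would still need the extra guards the tests name: an empty row list with rc=0 maps to unver, and abra's own "critical errors present" line forces fail.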
@ -1,7 +1,8 @@
|
||||
"""Unit tests for Phase-3 results assembly (harness.results), plan-phase3-results-ux.md §4.2 / R1/R3.
|
||||
"""Unit tests for results assembly (harness.results) — phase lvl5 semantics.
|
||||
|
||||
Covers JUnit parsing, stage roll-up, the tier→rung derivation (the documented mapping the level
|
||||
depends on), and full results.json assembly incl. the U0 gate cases. Pure / tmp-file only. Run cold:
|
||||
Covers JUnit parsing, stage roll-up, the tier→rung derivation (the SINGLE place every N/A source
|
||||
is classified intentional-skip vs unverified — the table in DECISIONS.md phase lvl5), the L5 lint
|
||||
rung wiring, and full results.json assembly. Pure / tmp-file only. Run cold:
|
||||
cc-ci-run -m pytest tests/unit/test_results.py -q
|
||||
"""
|
||||
|
||||
@ -27,6 +28,8 @@ JUNIT_MIXED = """<?xml version="1.0"?>
|
||||
<testcase classname="tests.y" name="test_skipped" time="0"><skipped message="no deps"/></testcase>
|
||||
</testsuite></testsuites>"""
|
||||
|
||||
LINT_PASS = {"status": "pass", "detail": "", "rules_failed": []}
|
||||
|
||||
|
||||
def _write(tmp_path, name, content):
|
||||
p = tmp_path / name
|
||||
@ -90,7 +93,7 @@ def test_collect_stages_synthesizes_when_no_junit():
|
||||
assert len(stages[0]["tests"]) == 1
|
||||
|
||||
|
||||
# ---- derive_rungs: the documented mapping ----
|
||||
# ---- derive_rungs: the documented N/A-classification mapping (DECISIONS.md phase lvl5) ----
|
||||
|
||||
|
||||
def _results(**kw):
|
||||
@ -105,34 +108,113 @@ def _results(**kw):
|
||||
return base
|
||||
|
||||
|
||||
def test_derive_rungs_full_climb_four_essential():
|
||||
rungs = R.derive_rungs(_results(), backup_capable=True, has_custom=True)
|
||||
# only the four essential rungs — integration/recipe-local are optional, not produced here.
|
||||
def test_derive_rungs_full_climb_five_rungs():
|
||||
rungs = R.derive_rungs(
|
||||
_results(), backup_capable=True, has_upgrade_target=True, lint_status="pass"
|
||||
)
|
||||
# the five essential rungs — integration/recipe-local are optional, not produced here.
|
||||
assert rungs == {
|
||||
"install": "pass",
|
||||
"upgrade": "pass",
|
||||
"backup_restore": "pass",
|
||||
"functional": "pass",
|
||||
"lint": "pass",
|
||||
}
|
||||
|
||||
|
||||
def test_derive_rungs_stateless_backup_and_functional_na():
|
||||
def test_derive_rungs_structural_skips_are_intentional():
|
||||
# single published version (tier skipped, no upgrade target) + not backup-capable →
|
||||
# both rungs are INTENTIONAL skips, not unverified.
|
||||
rungs = R.derive_rungs(
|
||||
_results(backup="skip", restore="skip", custom="skip"),
|
||||
_results(upgrade="skip", backup="skip", restore="skip"),
|
||||
backup_capable=False,
|
||||
has_custom=False,
|
||||
has_upgrade_target=False,
|
||||
lint_status="pass",
|
||||
)
|
||||
assert rungs["backup_restore"] == "na"
|
||||
assert rungs["functional"] == "na"
|
||||
assert rungs["upgrade"] == "skip"
|
||||
assert rungs["backup_restore"] == "skip"
|
||||
assert "integration" not in rungs and "recipe_local" not in rungs
|
||||
|
||||
|
||||
def test_derive_rungs_functional_fail():
|
||||
rungs = R.derive_rungs(_results(custom="fail"), backup_capable=True, has_custom=True)
|
||||
def test_derive_rungs_upgrade_skip_with_target_is_unverified():
|
||||
# the tier skipped although an upgrade target exists (e.g. install failed → downstream
|
||||
# skipped): NOT structural → unver.
|
||||
rungs = R.derive_rungs(
|
||||
_results(install="fail", upgrade="skip", backup="skip", restore="skip", custom="skip"),
|
||||
backup_capable=True,
|
||||
has_upgrade_target=True,
|
||||
lint_status="pass",
|
||||
)
|
||||
assert rungs["install"] == "fail"
|
||||
assert rungs["upgrade"] == "unver"
|
||||
assert rungs["backup_restore"] == "unver"
|
||||
assert rungs["functional"] == "unver"
|
||||
|
||||
|
||||
def test_derive_rungs_missing_tier_is_unverified():
|
||||
# a tier excluded from the run entirely (dev CCCI_STAGES escape) → no result key → unver,
|
||||
# never an intentional skip (the recipe didn't declare anything).
|
||||
res = {"install": "pass"}
|
||||
rungs = R.derive_rungs(res, backup_capable=True, has_upgrade_target=True, lint_status="pass")
|
||||
assert rungs["upgrade"] == "unver"
|
||||
assert rungs["backup_restore"] == "unver"
|
||||
assert rungs["functional"] == "unver"
|
||||
|
||||
|
||||
def test_derive_rungs_expected_na_declares_intentional():
|
||||
# EXPECTED_NA turns a non-run rung into an intentional skip (declared source).
|
||||
rungs = R.derive_rungs(
|
||||
_results(custom="skip"),
|
||||
backup_capable=True,
|
||||
has_upgrade_target=True,
|
||||
expected_na={"functional": "no functional surface"},
|
||||
lint_status="pass",
|
||||
)
|
||||
assert rungs["functional"] == "skip"
|
||||
|
||||
|
||||
def test_derive_rungs_no_custom_tests_defaults_unverified():
|
||||
# absent functional coverage with NO declaration is a gap → unver (conservative default).
|
||||
rungs = R.derive_rungs(
|
||||
_results(custom="skip"), backup_capable=True, has_upgrade_target=True, lint_status="pass"
|
||||
)
|
||||
assert rungs["functional"] == "unver"
|
||||
|
||||
|
||||
def test_derive_rungs_expected_na_never_overrides_a_real_result():
|
||||
# a declaration cannot soften an exercised rung: fail stays fail.
|
||||
rungs = R.derive_rungs(
|
||||
_results(custom="fail"),
|
||||
backup_capable=True,
|
||||
has_upgrade_target=True,
|
||||
expected_na={"functional": "declared"},
|
||||
lint_status="pass",
|
||||
)
|
||||
assert rungs["functional"] == "fail"
|
||||
|
||||
|
||||
# ---- build_results: end-to-end incl level + flags ----
|
||||
def test_derive_rungs_lint_never_skips():
|
||||
# lint has NO intentional-skip escape hatch: pass/fail from the executor, anything else
|
||||
# (None, "unver", junk) → unver — even if a recipe tries to declare it away.
|
||||
for status, want in (("pass", "pass"), ("fail", "fail"), ("unver", "unver"), (None, "unver")):
|
||||
rungs = R.derive_rungs(
|
||||
_results(),
|
||||
backup_capable=True,
|
||||
has_upgrade_target=True,
|
||||
expected_na={"lint": "nope"},
|
||||
lint_status=status,
|
||||
)
|
||||
assert rungs["lint"] == want, status
|
||||
|
||||
|
||||
def test_derive_rungs_functional_fail():
|
||||
rungs = R.derive_rungs(
|
||||
_results(custom="fail"), backup_capable=True, has_upgrade_target=True, lint_status="pass"
|
||||
)
|
||||
assert rungs["functional"] == "fail"
|
||||
|
||||
|
||||
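The `derive_rungs` tests above pin down one classification rule per rung: an exercised rung keeps its real result, a non-run rung is an intentional `skip` only when it is structural or declared, and everything else is `unver`. A minimal sketch of that rule follows, assuming a hypothetical helper `classify_rung` and hypothetical parameter names. It is not the real `R.derive_rungs`, which also folds several tiers (e.g. backup and restore) into one rung.

```python
from typing import Optional


def classify_rung(
    result: Optional[str],
    *,
    structural_skip: bool = False,
    declared_reason: Optional[str] = None,
    allow_skip: bool = True,
) -> str:
    """Hypothetical single-rung classifier (illustration only).

    result          -- raw tier result: "pass", "fail", "skip", or None if the tier never ran
    structural_skip -- True when the recipe structurally cannot exercise the rung
                       (e.g. not backup-capable, only one published version)
    declared_reason -- reason text from an EXPECTED_NA-style declaration, if any
    allow_skip      -- False for rungs with no skip escape hatch (lint in the tests)
    """
    if result in ("pass", "fail"):
        # A declaration never softens an exercised rung: fail stays fail.
        return result
    if allow_skip and (structural_skip or declared_reason):
        # Not run, but for a structural or declared reason: intentional skip.
        return "skip"
    # Not run and not explained: conservative default.
    return "unver"
```

For example, `classify_rung("skip", declared_reason="no functional surface")` yields `"skip"`, while `classify_rung(None, declared_reason="nope", allow_skip=False)` yields `"unver"`, mirroring the lint case.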
# ---- build_results: end-to-end incl level + lint + flags ----
|
||||
|
||||
|
||||
def test_build_results_level_and_flags(tmp_path):
|
||||
@ -163,17 +245,75 @@ def test_build_results_level_and_flags(tmp_path):
|
||||
clean_teardown=True,
|
||||
no_secret_leak=True,
|
||||
finished_ts=1234.0,
|
||||
lint=LINT_PASS,
|
||||
)
|
||||
# all four essential rungs pass → full climb to L4 (the top), no cap
|
||||
assert data["level"] == 4
|
||||
assert data["level_cap_reason"] == ""
|
||||
# all five essential rungs pass → full climb to L5; no cap concept anywhere.
|
||||
assert data["schema"] == 2
|
||||
assert data["level"] == 5
|
||||
assert "level_cap_reason" not in data and "level_cap_rung" not in data
|
||||
assert data["recipe"] == "hedgedoc"
|
||||
assert data["ref"] == "deadbeefcafe"
|
||||
assert data["flags"] == {"clean_teardown": True, "no_secret_leak": True}
|
||||
assert [s["name"] for s in data["stages"]] == ["install", "custom"]
|
||||
# lint appears as a synthetic stage so the card's table carries all five rungs.
|
||||
assert [s["name"] for s in data["stages"]] == ["install", "custom", "lint"]
|
||||
assert data["lint"] == {"status": "pass", "detail": "", "rules_failed": []}
|
||||
|
||||
|
||||
def test_build_results_capped_at_L1_on_upgrade_fail(tmp_path):
|
||||
def test_build_results_lint_fail_blocks_at_4(tmp_path):
|
||||
recs = [
|
||||
{
|
||||
"tier": "install",
|
||||
"source": "generic",
|
||||
"file": "g/test_install.py",
|
||||
"rc": 0,
|
||||
"junit": _write(tmp_path, "i.xml", JUNIT_PASS),
|
||||
}
|
||||
]
|
||||
data = R.build_results(
|
||||
recipe="x",
|
||||
version=None,
|
||||
pr="0",
|
||||
ref=None,
|
||||
records=recs,
|
||||
results=_results(),
|
||||
backup_capable=True,
|
||||
clean_teardown=True,
|
||||
no_secret_leak=True,
|
||||
finished_ts=0.0,
|
||||
lint={
|
||||
"status": "fail",
|
||||
"detail": "error rule(s) unsatisfied: R014",
|
||||
"rules_failed": ["R014"],
|
||||
},
|
||||
)
|
||||
assert data["level"] == 4
|
||||
assert data["rungs"]["lint"] == "fail"
|
||||
assert data["lint"]["rules_failed"] == ["R014"]
|
||||
lint_stage = [s for s in data["stages"] if s["name"] == "lint"][0]
|
||||
assert lint_stage["status"] == "fail"
|
||||
assert "R014" in lint_stage["tests"][0]["message"]
|
||||
|
||||
|
||||
def test_build_results_no_lint_given_is_unverified_never_pass(tmp_path):
|
||||
# an old/lint-less caller must NEVER get a free L5: the rung derives as unver → level 4 max.
|
||||
data = R.build_results(
|
||||
recipe="x",
|
||||
version=None,
|
||||
pr="0",
|
||||
ref=None,
|
||||
records=[],
|
||||
results=_results(),
|
||||
backup_capable=True,
|
||||
clean_teardown=True,
|
||||
no_secret_leak=True,
|
||||
finished_ts=0.0,
|
||||
)
|
||||
assert data["rungs"]["lint"] == "unver"
|
||||
assert data["level"] == 4
|
||||
assert "lint" in data["skips"]["unintentional"]
|
||||
|
||||
|
||||
def test_build_results_level1_on_upgrade_fail(tmp_path):
|
||||
recs = [
|
||||
{
|
||||
"tier": "install",
|
||||
@ -194,12 +334,13 @@ def test_build_results_capped_at_L1_on_upgrade_fail(tmp_path):
|
||||
clean_teardown=True,
|
||||
no_secret_leak=True,
|
||||
finished_ts=0.0,
|
||||
lint=LINT_PASS,
|
||||
)
|
||||
assert data["level"] == 1
|
||||
assert "L2" in data["level_cap_reason"]
|
||||
assert data["rungs"]["upgrade"] == "fail"
|
||||
|
||||
|
||||
# ---- skips: intentional (declared) vs unintentional (everything else skipped) ----
|
||||
# ---- skips: intentional (declared/structural, with reason) vs unintentional (= unver) ----
|
||||
|
||||
|
||||
def _rungs(**kw):
|
||||
@ -208,24 +349,26 @@ def _rungs(**kw):
|
||||
"upgrade": "pass",
|
||||
"backup_restore": "pass",
|
||||
"functional": "pass",
|
||||
"lint": "pass",
|
||||
}
|
||||
base.update(kw)
|
||||
return base
|
||||
|
||||
|
||||
def test_skips_intentional_vs_unintentional():
|
||||
rungs = _rungs(backup_restore="na", functional="na")
|
||||
def test_skips_declared_reason_and_unverified_split():
|
||||
rungs = _rungs(backup_restore="skip", functional="unver")
|
||||
sk = R.skips(rungs, {"backup_restore": "stateless static server"})
|
||||
# backup_restore is declared (intentional, with reason); functional skipped but not declared.
|
||||
assert sk["intentional"] == {"backup_restore": "stateless static server"}
|
||||
assert sk["unintentional"] == ["functional"]
|
||||
|
||||
|
||||
def test_skips_none_declared_all_unintentional():
|
||||
rungs = _rungs(backup_restore="na")
|
||||
def test_skips_structural_reason_when_undeclared():
|
||||
# a structural skip (derive_rungs) carries its structural reason even without EXPECTED_NA.
|
||||
rungs = _rungs(upgrade="skip", backup_restore="skip")
|
||||
sk = R.skips(rungs, None)
|
||||
assert sk["intentional"] == {}
|
||||
assert sk["unintentional"] == ["backup_restore"]
|
||||
assert "only one published version" in sk["intentional"]["upgrade"]
|
||||
assert "not backup-capable" in sk["intentional"]["backup_restore"]
|
||||
assert sk["unintentional"] == []
|
||||
|
||||
|
||||
def test_skips_declaration_only_counts_when_actually_skipped():
|
||||
@ -236,9 +379,9 @@ def test_skips_declaration_only_counts_when_actually_skipped():
|
||||
assert "backup_restore" not in sk["unintentional"]
|
||||
|
||||
|
||||
def test_build_results_threads_expected_na(tmp_path):
|
||||
# Mirrors custom-html-tiny post-change: install + a passing functional (custom) test, but no
|
||||
# backup surface (backup_restore declared intentionally skipped).
|
||||
def test_build_results_stateless_recipe_climbs(tmp_path):
|
||||
# custom-html-tiny shape: no backup surface (declared), single published version, passing
|
||||
# functional — formerly capped at L2 by the N/A; now climbs to L5 (the de-cap, mission §2).
|
||||
recs = [
|
||||
{
|
||||
"tier": "install",
|
||||
@ -261,23 +404,47 @@ def test_build_results_threads_expected_na(tmp_path):
|
||||
pr="0",
|
||||
ref=None,
|
||||
records=recs,
|
||||
results=_results(backup="skip", restore="skip"), # custom=pass (default) → functional pass
|
||||
backup_capable=False, # no backupbot label → backup_restore skipped (N/A)
|
||||
results=_results(upgrade="skip", backup="skip", restore="skip"),
|
||||
backup_capable=False, # no backupbot label → structural intentional skip
|
||||
has_upgrade_target=False, # single published version → structural intentional skip
|
||||
clean_teardown=True,
|
||||
no_secret_leak=True,
|
||||
finished_ts=0.0,
|
||||
lint=LINT_PASS,
|
||||
expected_na={"backup_restore": "stateless static file server"},
|
||||
)
|
||||
# backup_restore skip still caps at L2 (never inflates) — even though functional passes above it,
|
||||
# the skip caps the climb — but it's the declared (intentional) rung that capped.
|
||||
assert data["level"] == 2
|
||||
assert "L3" in data["level_cap_reason"]
|
||||
assert data["level_cap_rung"] == "backup_restore"
|
||||
assert data["rungs"]["functional"] == "pass"
|
||||
assert data["level"] == 5 # skips are climbed past; nothing was inflated to get here
|
||||
assert data["rungs"] == {
|
||||
"install": "pass",
|
||||
"upgrade": "skip",
|
||||
"backup_restore": "skip",
|
||||
"functional": "pass",
|
||||
"lint": "pass",
|
||||
}
|
||||
assert data["skips"]["intentional"]["backup_restore"] == "stateless static file server"
|
||||
assert (
|
||||
data["skips"]["unintentional"] == []
|
||||
) # backup_restore declared; functional passed → clean
|
||||
assert "only one published version" in data["skips"]["intentional"]["upgrade"]
|
||||
assert data["skips"]["unintentional"] == []
|
||||
|
||||
|
||||
def test_build_results_unverified_backup_blocks(tmp_path):
|
||||
# synthesized tier abort: backup-capable but the tiers never produced a result → unver → the
|
||||
# level stays below the unverified rung (mission worked example #3).
|
||||
data = R.build_results(
|
||||
recipe="x",
|
||||
version=None,
|
||||
pr="0",
|
||||
ref=None,
|
||||
records=[],
|
||||
results=_results(backup="skip", restore="skip"),
|
||||
backup_capable=True,
|
||||
clean_teardown=True,
|
||||
no_secret_leak=True,
|
||||
finished_ts=0.0,
|
||||
lint=LINT_PASS,
|
||||
)
|
||||
assert data["rungs"]["backup_restore"] == "unver"
|
||||
assert data["level"] == 2
|
||||
assert data["skips"]["unintentional"] == ["backup_restore"]
|
||||
|
||||
|
||||
def test_build_results_threads_customization(tmp_path):
|
||||
@ -310,6 +477,7 @@ def test_build_results_threads_customization(tmp_path):
|
||||
"clean_teardown": True,
|
||||
"no_secret_leak": True,
|
||||
"finished_ts": 0.0,
|
||||
"lint": LINT_PASS,
|
||||
}
|
||||
assert R.build_results(**kwargs, customization=cust)["customization"] == cust
|
||||
assert R.build_results(**kwargs)["customization"] is None
|
||||
|
||||
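The `build_results` level assertions above all follow the backlog formula (level = max i such that rung i passes and every earlier rung is `pass` or `skip`). A minimal sketch of that climb is shown here, assuming the rung order implied by the tests. The names `RUNG_ORDER` and `compute_level` as written are illustrative and may not match the real `level.py`.

```python
# Assumed rung order, L1..L5, taken from the rung dicts asserted in the tests.
RUNG_ORDER = ("install", "upgrade", "backup_restore", "functional", "lint")


def compute_level(rungs):
    """Return the highest passing rung reachable through pass/skip rungs only."""
    level = 0
    for i, name in enumerate(RUNG_ORDER, start=1):
        status = rungs.get(name, "unver")  # a missing rung is treated as unverified
        if status == "pass":
            level = i  # a passing rung raises the level
        elif status != "skip":
            break  # fail or unver blocks everything above it
        # "skip" is climbed past without raising the level
    return level
```

Under this sketch a stateless recipe (`upgrade` and `backup_restore` both `skip`, the rest `pass`) reaches 5, while an `unver` at `backup_restore` holds the level at 2 even if `functional` and `lint` pass.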
@ -32,6 +32,144 @@ def test_hook_returned_when_callable():
|
||||
assert S._load_screenshot_hook({"SCREENSHOT": hook}) is hook
|
||||
|
||||
|
||||
class _FakePage:
|
||||
"""Minimal Playwright-page stand-in for the settle/blank-retry helpers (no browser needed)."""
|
||||
|
||||
def __init__(self, shot_sizes, idle_raises=False):
|
||||
self._shot_sizes = list(shot_sizes) # bytes written per successive screenshot() call
|
||||
self._idle_raises = idle_raises
|
||||
self.idle_waits = [] # (state, timeout) per wait_for_load_state call
|
||||
self.timeout_waits = [] # ms per wait_for_timeout call
|
||||
self.shots = 0
|
||||
|
||||
def wait_for_load_state(self, state, timeout=None):
|
||||
self.idle_waits.append((state, timeout))
|
||||
if self._idle_raises:
|
||||
raise TimeoutError(f"page kept polling past {timeout}ms")
|
||||
|
||||
def wait_for_timeout(self, ms):
|
||||
self.timeout_waits.append(ms)
|
||||
|
||||
def screenshot(self, path, full_page=False):
|
||||
self.shots += 1
|
||||
with open(path, "wb") as f:
|
||||
f.write(b"\x89PNG" + b"\0" * (self._shot_sizes.pop(0) - 4))
|
||||
|
||||
|
||||
def test_settle_swallows_never_idle_pages():
|
||||
"""R7: an app that never reaches network-idle (continuous polling) must not raise — the
|
||||
timeout cap IS the wait."""
|
||||
page = _FakePage([], idle_raises=True)
|
||||
S._settle(page, 1234) # must not raise
|
||||
assert page.idle_waits == [("networkidle", 1234)]
|
||||
assert page.timeout_waits == [S.RENDER_GRACE_MS]
|
||||
|
||||
|
||||
def test_snap_retries_blank_frame(tmp_path):
|
||||
"""A blank-sized first frame (audit fingerprint: 4801 B) triggers exactly one retry with a
|
||||
longer settle, overwriting the tiny frame with the later (painted) one."""
|
||||
out = str(tmp_path / "shot.png")
|
||||
page = _FakePage([4801, 30256])
|
||||
S._snap_with_blank_retry(page, out)
|
||||
assert page.shots == 2
|
||||
assert page.idle_waits == [("networkidle", S.BLANK_RETRY_SETTLE_MS)]
|
||||
assert os.path.getsize(out) == 30256
|
||||
|
||||
|
||||
def test_snap_no_retry_for_real_frame(tmp_path):
|
||||
"""A real-sized first frame is kept as-is — no second screenshot, no extra waiting."""
|
||||
out = str(tmp_path / "shot.png")
|
||||
page = _FakePage([35707])
|
||||
S._snap_with_blank_retry(page, out)
|
||||
assert page.shots == 1
|
||||
assert page.idle_waits == []
|
||||
assert os.path.getsize(out) == 35707
|
||||
|
||||
|
||||
def test_snap_retry_keeps_late_frame_even_if_still_blank(tmp_path):
|
||||
"""If the retry frame is still tiny we keep it (honest best-effort) — exactly one retry,
|
||||
never a loop."""
|
||||
out = str(tmp_path / "shot.png")
|
||||
page = _FakePage([4801, 4801])
|
||||
S._snap_with_blank_retry(page, out)
|
||||
assert page.shots == 2
|
||||
assert os.path.getsize(out) == 4801
|
||||
assert not os.path.exists(out + ".retry"), "temp retry frame must be cleaned up"
|
||||
|
||||
|
||||
def test_snap_retry_never_regresses_to_smaller_frame(tmp_path):
|
||||
"""Adversary finding A1: a partial-but-real first frame (just under the threshold) must
|
||||
survive a retry that comes back WORSE (page regressed to blank during the extra settle) —
|
||||
the larger frame wins."""
|
||||
out = str(tmp_path / "shot.png")
|
||||
page = _FakePage([9999, 4801])
|
||||
S._snap_with_blank_retry(page, out)
|
||||
assert page.shots == 2
|
||||
assert os.path.getsize(out) == 9999, "retry must never overwrite a larger frame (A1)"
|
||||
assert not os.path.exists(out + ".retry"), "temp retry frame must be cleaned up"
|
||||
|
||||
|
||||
def test_blank_threshold_brackets_observed_sizes():
|
||||
    """Threshold sits between the audited defect sizes (blank 4801-4802 B, lone spinners up to
|
||||
8764 B) and the smallest real page (custom-html-tiny, 12950 B)."""
|
||||
for defect in (4801, 4802, 5895, 6022, 7913, 8764):
|
||||
assert defect < S.BLANK_SIZE_BYTES
|
||||
assert S.BLANK_SIZE_BYTES < 12950
|
||||
|
||||
|
||||
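Taken together, the settle and blank-retry tests above describe a small algorithm: snap once, and if the frame is under the blank threshold, settle longer, snap exactly once more into a temp file, and keep whichever frame is larger. A minimal sketch under stated assumptions follows. The constant values are placeholders (the tests only constrain the threshold to lie between 8764 and 12950 bytes), and the function names without a leading underscore are hypothetical stand-ins for the real `S._settle` / `S._snap_with_blank_retry`.

```python
import os

# Placeholder values, NOT taken from the source; only the ordering matters here.
BLANK_SIZE_BYTES = 10_000
BLANK_RETRY_SETTLE_MS = 5_000
RENDER_GRACE_MS = 500


def settle(page, timeout_ms):
    """Wait for network idle, tolerating pages that never go idle."""
    try:
        page.wait_for_load_state("networkidle", timeout=timeout_ms)
    except Exception:
        pass  # continuous polling: the timeout cap itself was the wait
    page.wait_for_timeout(RENDER_GRACE_MS)


def snap_with_blank_retry(page, out):
    """Screenshot to `out`; retry once if the first frame looks blank."""
    page.screenshot(path=out, full_page=True)
    if os.path.getsize(out) >= BLANK_SIZE_BYTES:
        return  # real-sized frame: keep as-is, no extra waiting
    settle(page, BLANK_RETRY_SETTLE_MS)
    retry = out + ".retry"
    page.screenshot(path=retry, full_page=True)
    if os.path.getsize(retry) >= os.path.getsize(out):
        os.replace(retry, out)  # later frame is at least as large: keep it
    else:
        os.remove(retry)  # page regressed: the larger first frame wins
```

Writing the retry to a separate file and comparing sizes is what lets the larger frame survive in the regression case; either branch removes the `.retry` temp file.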
def test_wait_budget_within_step_cap():
|
||||
"""plan-phase-shot §3 P3: the screenshot step's bounded waiting must stay ≤ ~60s worst case."""
|
||||
total_ms = (
|
||||
S.NAV_DEADLINE_S * 1000
|
||||
+ S.SETTLE_TIMEOUT_MS
|
||||
+ S.RENDER_GRACE_MS
|
||||
+ S.BLANK_RETRY_SETTLE_MS
|
||||
+ S.RENDER_GRACE_MS
|
||||
)
|
||||
assert total_ms <= 60_000, f"screenshot wait budget {total_ms}ms exceeds the ~60s step cap"
|
||||
|
||||
|
||||
def test_mattermost_screenshot_hook_lands_login():
|
||||
"""phase-shot: mattermost-lts ships the first real SCREENSHOT hook — `/` serves the
|
||||
desktop-or-browser interstitial, so the hook must navigate to /login (the representative,
|
||||
credential-free sign-in form) and settle; the harness then snaps the PNG."""
|
||||
|
||||
class _Resp:
|
||||
status = 200
|
||||
|
||||
class _NavPage(_FakePage):
|
||||
def __init__(self, click_raises=False):
|
||||
super().__init__([])
|
||||
self.urls = []
|
||||
self.clicks = []
|
||||
self._click_raises = click_raises
|
||||
|
||||
def goto(self, url, wait_until=None, timeout=None):
|
||||
self.urls.append(url)
|
||||
return _Resp()
|
||||
|
||||
def click(self, selector, timeout=None):
|
||||
self.clicks.append(selector)
|
||||
if self._click_raises:
|
||||
raise TimeoutError("no interstitial")
|
||||
|
||||
tests_dir = os.path.join(os.path.dirname(__file__), "..")
|
||||
meta = meta_mod.load("mattermost-lts", tests_dir=tests_dir)
|
||||
hook = S._load_screenshot_hook(meta)
|
||||
assert callable(hook), "mattermost-lts SCREENSHOT hook missing from the real load path"
|
||||
page = _NavPage()
|
||||
hook(page, meta_mod.hook_ctx("mm.example.org", meta))
|
||||
assert page.urls == ["https://mm.example.org/login"]
|
||||
assert page.clicks == ["text=View in Browser"], "hook must click through the interstitial"
|
||||
assert len(page.idle_waits) == 2, "hook must settle after nav AND after the click"
|
||||
|
||||
# no interstitial (already on the form): the click times out and the hook still succeeds
|
||||
page2 = _NavPage(click_raises=True)
|
||||
hook(page2, meta_mod.hook_ctx("mm.example.org", meta))
|
||||
assert page2.clicks == ["text=View in Browser"]
|
||||
assert len(page2.idle_waits) == 1, "failed click must skip the second settle, not raise"
|
||||
|
||||
|
||||
def test_screenshot_reachable_through_real_load_path(tmp_path):
|
||||
"""R2 proof (rcust P1): a recipe SCREENSHOT hook declared in recipe_meta.py arrives at
|
||||
screenshot._load_screenshot_hook through the REAL orchestrator load path (meta.load — the
|
||||
|
||||