# BACKLOG-shot.md — phase `shot` (recipe screenshot audit & repair)

SSOT: /srv/cc-ci/cc-ci-plan/plan-phase-shot-screenshots.md. Gates: M1 (audit+diagnosis), M2 (all OK / agreed N/A).

## Build backlog

### P1 — Audit matrix (status: complete, all 19 PNGs visually inspected 2026-06-11)

Enrolled set (19) = `tests//recipe_meta.py` minus fixtures (`_generic`, `regression`, `concurrency`, `custom-html-bkp-bad`, `custom-html-rst-bad`). Evidence: `/var/lib/cc-ci-runs//` on cc-ci; PNGs pulled to /tmp/shot-audit/ on the builder host and each one Read (visually).

| recipe | latest run w/ artifacts | screenshot field | PNG bytes | visual content (I looked) | class |
|---|---|---|---|---|---|
| bluesky-pds | ab-bluesky-pds-oldmain | null | — | no PNG; install=fail level=0 (upstream image breakage, rcust DEFERRED) → capture correctly skipped (`if deploy_ok`) | N-A-candidate (blocked upstream) |
| cryptpad | m2r-cryptpad | screenshot.png | 4802 | solid light-grey frame, nothing else | BLANK |
| custom-html | m2r-custom-html | screenshot.png | 35707 | "Welcome to nginx!" default page | OK? (diagnose: is this the recipe's true fresh-install content?) |
| custom-html-tiny | m2r-custom-html-tiny | screenshot.png | 12950 | seeded CI content ("cc-ci custom-html-tiny … DG5") | OK |
| discourse | m2p-discourse | screenshot.png | 66121 | real forum UI, welcome topic, Sign Up/Log In | OK |
| ghost | m2r-ghost | screenshot.png | 444183 | real blog landing ("Thoughts, stories and ideas") | OK |
| hedgedoc | m2r-hedgedoc | screenshot.png | 131967 | real landing (logo, Sign In, feature intro) | OK |
| immich | 356 | screenshot.png | 4801 | pure white frame | BLANK |
| keycloak | m2r-keycloak | screenshot.png | 8764 | spinner + "Loading the Administration Console" | LOADING |
| lasuite-docs | m2r-lasuite-docs | screenshot.png | 6022 | lone spinner on white | LOADING |
| lasuite-drive | m2p2-lasuite-drive | screenshot.png | 5895 | lone spinner on white | LOADING |
| lasuite-meet | m2r-lasuite-meet | screenshot.png | 4801 | pure white frame | BLANK |
| mailu | m2r-mailu | screenshot.png | 33800 | real sign-in page (empty fields) | OK |
| matrix-synapse | m2r-matrix-synapse | screenshot.png | 33296 | "It works! Synapse is running" landing | OK |
| mattermost-lts | m2b-mattermost-lts | screenshot.png | 242139 | brand splash/loading screen (logo on blue), NOT the login form | LOADING (borderline — brand-recognizable but a loading state) |
| mumble | m2r-mumble | screenshot.png | 7913 | spinner on grey — a web page IS served on the domain | LOADING (diagnose what serves it; N/A may NOT be justified) |
| n8n | m2r-n8n | screenshot.png | 4801 | off-white blank frame. Flaky: run 197 (30256 B) shows the real "Set up owner account" form (empty fields, credential-free) | BLANK (flaky) |
| plausible | 357 | null | — | no PNG on ANY run (122→357) | NULL |
| uptime-kuma | m2r-uptime-kuma | screenshot.png | 30858 | real "Create your admin account" setup form (empty fields) | OK |

PNG-size note: 4801/4802 B at 1280×800 is a byte-stable blank-frame fingerprint (3 different apps, same size).
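The size column of the matrix can be triaged mechanically. The sketch below is illustrative only: it replays the audited byte counts against the 10000 B threshold that the harness fix later adopts (`BLANK_SIZE_BYTES`), and the names `AUDIT`, `is_suspect`, `suspects`, `missed` and `false_alarms` are hypothetical, not taken from the harness. The `custom-html` row is recorded as OK, following its later diagnosis.

```python
# Size-based triage of the P1 audit matrix (illustrative sketch, not harness code).
BLANK_SIZE_BYTES = 10_000  # threshold value quoted in the P3 / A1 entries

# recipe -> (PNG bytes, audited class); rows without a PNG are omitted.
AUDIT = {
    "cryptpad": (4802, "BLANK"),
    "custom-html": (35707, "OK"),
    "custom-html-tiny": (12950, "OK"),
    "discourse": (66121, "OK"),
    "ghost": (444183, "OK"),
    "hedgedoc": (131967, "OK"),
    "immich": (4801, "BLANK"),
    "keycloak": (8764, "LOADING"),
    "lasuite-docs": (6022, "LOADING"),
    "lasuite-drive": (5895, "LOADING"),
    "lasuite-meet": (4801, "BLANK"),
    "mailu": (33800, "OK"),
    "matrix-synapse": (33296, "OK"),
    "mattermost-lts": (242139, "LOADING"),
    "mumble": (7913, "LOADING"),
    "n8n": (4801, "BLANK"),
    "uptime-kuma": (30858, "OK"),
}


def is_suspect(png_bytes: int) -> bool:
    """A frame smaller than the threshold is treated as blank or spinner-only."""
    return png_bytes < BLANK_SIZE_BYTES


# Recipes the size check flags.
suspects = sorted(name for name, (size, _) in AUDIT.items() if is_suspect(size))
# Recipes audited as bad that the size check does not flag.
missed = sorted(
    name for name, (size, cls) in AUDIT.items()
    if cls != "OK" and not is_suspect(size)
)
# Recipes audited as OK that the size check would flag anyway.
false_alarms = sorted(
    name for name, (size, cls) in AUDIT.items()
    if cls == "OK" and is_suspect(size)
)
```

On this data the size check flags every BLANK and LOADING row except `mattermost-lts`, whose 242139 B brand splash is far above the threshold, and it flags no OK row. Size alone therefore cannot replace the visual check.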
### P2 — Root-cause diagnoses

- [x] **NULL — plausible** (evidence: Drone build 357 ci-step log, t=73s): `screenshot: capture failed (non-fatal, verdict unaffected): page.goto(https://plau-b51425.ci.commoninternet.net/) never returned a status in (200, 301, 302, 303, 401, 403) after 15 attempts (45s); last status=500`. Plausible's `/` 500s **by design** under `DISABLE_AUTH=true` (auth_controller; documented in `tests/plausible/functional/test_health_check.py` docstring and recipe_meta — that's why HEALTH_PATH is `/api/health`). Default landing-page capture can NEVER succeed → needs a per-recipe SCREENSHOT hook to a path that actually renders (probe live: e.g. /login or /sites).
- [x] **NULL — bluesky-pds**: install fails (level=0) before the app is up → `if deploy_ok:` gate in runner/run_recipe_ci.py:1024 correctly skips capture. Not a screenshot defect; upstream image breakage already filed in machine-docs/DEFERRED.md (rcust). → documented N/A while upstream is broken.
- [x] **BLANK class — immich, lasuite-meet, n8n(flaky), cryptpad**: SPA paint race. capture() navigates with `wait_until="domcontentloaded"` (runner/harness/screenshot.py:91) and screenshots immediately; SPA shell HTML has loaded but JS hasn't painted → solid 4801-2 B frame. n8n flakiness = same race, sometimes JS wins (run 197 captured the real form).
- [x] **LOADING class — keycloak, lasuite-docs, lasuite-drive, mumble, mattermost-lts(borderline)**: same race, caught mid-paint (spinner/splash rendered, app JS still loading/connecting).
- [x] **mumble** web stack identified: recipe deploys a `web` service (mumble-web client) on the domain — spinner is its connecting state; landing renders a connect dialog once JS settles. NOT an N/A.
- [x] **custom-html** nginx-welcome question: the recipe's fresh install genuinely serves the nginx default page at `/` (no content seeded for this recipe's install; only custom-html-tiny seeds via install_steps.sh). Screenshot is an honest representative view of a fresh install. → OK as-is.

### P3 — Fixes (all merged to main)

- [x] Harness default improvement (ce50f64 + A1 hardening 7ad7d1f): bounded networkidle settle (10s) + 0.5s render grace after domcontentloaded; blank/spinner-frame detect (<10000 B) → ONE retry with 4s settle, larger frame kept (A1). Wait budget 45+10+0.5+4+0.5 = 60s, unit-tested. 8 new unit tests; 207 pass; lint PASS.
- [x] plausible — NOT a hook in the end: the real root cause was EXTRA_ENV SECRET_KEY_BASE being 62 chars (<64-byte Phoenix cookie-store minimum) → every HTML render 500'd. Fixed to 68 chars (b98a471); default capture then lands the genuine registration page. Stale auth_controller comments corrected (no assertion touched).
- [x] mattermost-lts SCREENSHOT hook (80e5713 + 3c33129): interstitial appears on ANY first-visit route incl /login (proven byte-identical PNG) → hook navigates /login, clicks "View in Browser" best-effort, settles; lands the real login form. First real hook; public screenshot.settle().
- [x] keycloak / lasuite-docs / lasuite-drive / lasuite-meet / immich / cryptpad / n8n: fixed by the harness default alone (no hooks needed — proof PNGs below).
- [x] mumble: NOT fixable harness-side — pinned mumble-web:0.5 client never paints UI for an anonymous browser (≥90s DOM/console/network observation: no errors, no failed requests, connect-dialog elements absent, no autoconnect overrides). Loader frame = the genuine anonymous web view; voice (the recipe's function) fully covered by protocol tests. DEFERRED.md entry filed (upstream question for the operator).
- [x] bluesky-pds: documented N/A while upstream image broken (rcust DEFERRED; Adversary-agreed at M1, contingent re-check at M2 — latest failing evidence ab-bluesky-pds-oldmain, 2026-06-11).
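The 60s wait budget quoted for the harness default can be spelled out term by term. This is a minimal sketch of the arithmetic only: the constant names and `worst_case_budget` are hypothetical, and reading the final 0.5s as a second render grace after the retry settle is an assumption about how the five terms map to phases.

```python
# Worst-case screenshot wait budget (illustrative; names are hypothetical).
GOTO_RETRY_S = 45            # 15 page.goto attempts, 45 s total per the build 357 log
NETWORKIDLE_SETTLE_S = 10    # bounded networkidle settle
RENDER_GRACE_S = 0.5         # render grace after the settle
BLANK_RETRY_SETTLE_S = 4     # extra settle before the single blank-frame retry


def worst_case_budget(blank_retry: bool = True) -> float:
    """Sum the phase ceilings; the retry terms apply only when a blank frame is detected."""
    budget = GOTO_RETRY_S + NETWORKIDLE_SETTLE_S + RENDER_GRACE_S
    if blank_retry:
        # Assumed: the retry pays its settle plus one more render grace.
        budget += BLANK_RETRY_SETTLE_S + RENDER_GRACE_S
    return budget
```

With the retry path taken this gives 45 + 10 + 0.5 + 4 + 0.5 = 60 seconds; a capture whose first frame is already above the blank threshold stays within 55.5 seconds.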
### P4 — Proof runs (fresh, post-fix; every PNG visually Read by Builder)

| recipe | proof run (dir on cc-ci) | level (baseline) | PNG B | visual |
|---|---|---|---|---|
| immich | 370 (drone !testme immich#2) | 4 (=356:4) | 234351 | real "Welcome to Immich" onboarding |
| plausible | 371 (drone !testme plausible#3) | 4 (=357:4) | 64132 | real registration form, empty fields |
| keycloak | shot-proof-keycloak | 4 | 215587 | real "Sign in to your account" form |
| cryptpad | shot-proof-cryptpad | 4 | 57310 | real landing + document-type picker |
| lasuite-meet | shot-proof-lasuite-meet | 4 | 225686 | real video-conferencing landing |
| lasuite-docs | shot-proof-lasuite-docs | 4 | 284769 | real Docs landing |
| lasuite-drive | shot-proof2-lasuite-drive | 4 | 132037 | real Drive landing |
| n8n | shot-proof-n8n | 4 | 26433 | real "Set up owner account", empty fields (now deterministic) |
| mattermost-lts | shot-proof3-mattermost-lts | 2 (=m2r:2) | 178367 | real "Log in to your account" form (hook v2) |
| mumble | shot-proof-mumble | 4 | 7980 | loader frame — best-available (see P3/DEFERRED) |

Drone durations pre/post (same recipe+PR): immich 199s→198s; plausible 209s→166s (faster — capture no longer burns 45s failing).

Healthy class (ghost, hedgedoc, discourse, custom-html, custom-html-tiny, mailu, matrix-synapse, uptime-kuma): existing artifacts cited in P1 matrix, each visually verified real + credential-free; no new runs needed per plan §3 P4.

Dashboard/card: grid thumbnails for runs 370/371 served 200, summary.html embeds screenshot.png, /badge/immich.svg 200.

## Adversary findings

### [adversary] A1 — blank-retry can REGRESS a larger frame to a worse one (LOW, non-blocking) — CLOSED @2026-06-11T06:32Z

**CLOSED:** fixed in 7ad7d1f (retry snapped to a temp path; `os.replace` only if `retry >= first`, else discard + cleanup in `finally`). Re-verified COLD with my own probe (not the Builder's test): the exact filed case `[9999,4801]` now keeps **9999** (retry discarded, no temp leak); originals intact (`[4801,30256]`→30256, `[4801,4802]`→4802, `[35707]`→1 shot, `[5000,5000]`→replace). 5/5 pass. R7 contract preserved (retry-raise still propagates to capture's swallow → None; first frame on disk).

--- original finding (for the record) ---

**Where:** `runner/harness/screenshot.py` `_snap_with_blank_retry` (ce50f64).

**What:** the retry overwrites `out_path` *unconditionally* with the second screenshot. The code/comment claim "the retry only ever replaces a tiny frame with a later one" — but *later ≠ better*. If the first frame is e.g. 9999 B (a partial render, just under `BLANK_SIZE_BYTES=10000`) and the page regresses in the extra 4 s settle (redirect, session-timeout splash, error overlay), the retry can yield a 4801 B blank that **overwrites the better 9999 B frame**. The Builder's unit test only covers blank→blank (4801→4802); the bigger→smaller regression is untested.

**Repro (cold, my independent probe, not the Builder's test file):** fake page returning sizes `[9999, 4801]` → `_snap_with_blank_retry` keeps **4801** (the worse frame).

**Severity:** LOW. R7 holds (cosmetic only, never affects verdict); my M2 per-PNG visual check is the backstop — any actually-blank final PNG will FAIL that recipe regardless. Filed for hardening, not a veto.

**Suggested guard (trivial, strictly safer):** keep the larger frame — only overwrite if `getsize(retry) >= getsize(first)` (or snap retry to a temp path and pick `max`). Then extend the unit test with a bigger→smaller case asserting the larger frame survives.

**Closes:** only I close this, after re-test. Non-blocking for an M2 claim, but I will re-check at M2.
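For reference, the keep-the-larger-frame guard described in A1 can be sketched as below. This is not the code from 7ad7d1f: the function name, its signature, and the `snap`/`settle` callables are hypothetical stand-ins for the Playwright screenshot and settle steps, and only the behaviour (temp path, `os.replace` when `retry >= first`, cleanup in `finally`) follows the description above.

```python
import os
import tempfile
from typing import Callable

BLANK_SIZE_BYTES = 10_000  # threshold named in the finding


def snap_with_blank_retry(
    snap: Callable[[str], None],
    out_path: str,
    settle: Callable[[], None] = lambda: None,
) -> int:
    """Take a screenshot; retry once if it looks blank, keeping the larger frame.

    `snap(path)` writes a PNG to `path`; `settle()` waits before the retry.
    Returns the number of shots taken (1 or 2).
    """
    snap(out_path)
    first = os.path.getsize(out_path)
    if first >= BLANK_SIZE_BYTES:
        return 1  # frame looks real, no retry needed

    settle()
    # Snap the retry next to out_path so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(suffix=".png", dir=os.path.dirname(out_path) or ".")
    os.close(fd)
    try:
        snap(tmp)
        if os.path.getsize(tmp) >= first:
            os.replace(tmp, out_path)  # retry is at least as large: keep it
    finally:
        # Discard a smaller retry (or a failed one); no-op after a replace.
        if os.path.exists(tmp):
            os.remove(tmp)
    return 2
```

If the retry `snap` raises, the exception propagates to the caller while the first frame stays on disk and the temp file is removed, which matches the R7 note above.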