# BACKLOG-shot.md — phase `shot` (recipe screenshot audit & repair)

SSOT: /srv/cc-ci/cc-ci-plan/plan-phase-shot-screenshots.md. Gates: M1 (audit+diagnosis), M2 (all OK / agreed N/A).

## Build backlog

### P1 — Audit matrix (status: complete, all 19 recipes audited, every PNG visually inspected 2026-06-11)

Enrolled set (19) = `tests/<r>/recipe_meta.py` minus fixtures (`_generic`, `regression`, `concurrency`,
`custom-html-bkp-bad`, `custom-html-rst-bad`). Evidence: `/var/lib/cc-ci-runs/<run>/` on cc-ci;
PNGs pulled to /tmp/shot-audit/ on the builder host and each one Read (visually).

| recipe | latest run w/ artifacts | screenshot field | PNG bytes | visual content (I looked) | class |
|---|---|---|---|---|---|
| bluesky-pds | ab-bluesky-pds-oldmain | null | — | no PNG; install=fail level=0 (upstream image breakage, rcust DEFERRED) → capture correctly skipped (`if deploy_ok`) | N-A-candidate (blocked upstream) |
| cryptpad | m2r-cryptpad | screenshot.png | 4802 | solid light-grey frame, nothing else | BLANK |
| custom-html | m2r-custom-html | screenshot.png | 35707 | "Welcome to nginx!" default page | OK? (diagnose: is this the recipe's true fresh-install content?) |
| custom-html-tiny | m2r-custom-html-tiny | screenshot.png | 12950 | seeded CI content ("cc-ci custom-html-tiny … DG5") | OK |
| discourse | m2p-discourse | screenshot.png | 66121 | real forum UI, welcome topic, Sign Up/Log In | OK |
| ghost | m2r-ghost | screenshot.png | 444183 | real blog landing ("Thoughts, stories and ideas") | OK |
| hedgedoc | m2r-hedgedoc | screenshot.png | 131967 | real landing (logo, Sign In, feature intro) | OK |
| immich | 356 | screenshot.png | 4801 | pure white frame | BLANK |
| keycloak | m2r-keycloak | screenshot.png | 8764 | spinner + "Loading the Administration Console" | LOADING |
| lasuite-docs | m2r-lasuite-docs | screenshot.png | 6022 | lone spinner on white | LOADING |
| lasuite-drive | m2p2-lasuite-drive | screenshot.png | 5895 | lone spinner on white | LOADING |
| lasuite-meet | m2r-lasuite-meet | screenshot.png | 4801 | pure white frame | BLANK |
| mailu | m2r-mailu | screenshot.png | 33800 | real sign-in page (empty fields) | OK |
| matrix-synapse | m2r-matrix-synapse | screenshot.png | 33296 | "It works! Synapse is running" landing | OK |
| mattermost-lts | m2b-mattermost-lts | screenshot.png | 242139 | brand splash/loading screen (logo on blue), NOT the login form | LOADING (borderline — brand-recognizable but a loading state) |
| mumble | m2r-mumble | screenshot.png | 7913 | spinner on grey — a web page IS served on the domain | LOADING (diagnose what serves it; N/A may NOT be justified) |
| n8n | m2r-n8n | screenshot.png | 4801 | off-white blank frame. Flaky: run 197 (30256 B) shows the real "Set up owner account" form (empty fields, credential-free) | BLANK (flaky) |
| plausible | 357 | null | — | no PNG on ANY run (122→357) | NULL |
| uptime-kuma | m2r-uptime-kuma | screenshot.png | 30858 | real "Create your admin account" setup form (empty fields) | OK |

PNG-size note: 4801/4802 B at 1280×800 is a byte-stable blank-frame fingerprint (immich, lasuite-meet and n8n all produce exactly 4801 B; cryptpad 4802 B).

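The size fingerprint lends itself to a cheap first-pass triage before anyone opens a PNG. The following is a minimal sketch, not harness code: `triage_png_size` and its labels are hypothetical, and the 10000 B cutoff is assumed to match the blank/spinner threshold mentioned in P3.

```python
from typing import Optional

# Assumption: same cutoff the harness uses for blank/spinner detection (see P3).
BLANK_SIZE_BYTES = 10_000
# Blank-frame sizes observed at 1280x800 in the P1 matrix.
BLANK_FINGERPRINTS = {4801, 4802}


def triage_png_size(size_bytes: Optional[int]) -> str:
    """Coarse, size-only label for a screenshot artifact (hypothetical helper)."""
    if size_bytes is None:
        return "NULL"      # no PNG was produced at all
    if size_bytes in BLANK_FINGERPRINTS:
        return "BLANK"     # byte-stable empty frame
    if size_bytes < BLANK_SIZE_BYTES:
        return "SUSPECT"   # small frame: likely a spinner or a partial paint
    return "LOOK"          # plausibly real, but still needs a visual check


for recipe, size in [("immich", 4801), ("keycloak", 8764),
                     ("plausible", None), ("ghost", 444183)]:
    print(recipe, triage_png_size(size))
```

Size is only a hint: mattermost-lts at 242139 B was still a loading splash, so the largest bucket is deliberately labelled "look", not "OK".
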
### P2 — Root-cause diagnoses

- [x] **NULL — plausible** (evidence: Drone build 357 ci-step log, t=73s):
  `screenshot: capture failed (non-fatal, verdict unaffected): page.goto(https://plau-b51425.ci.commoninternet.net/) never returned a status in (200, 301, 302, 303, 401, 403) after 15 attempts (45s); last status=500`.
  Plausible's `/` 500s **by design** under `DISABLE_AUTH=true` (auth_controller; documented in
  `tests/plausible/functional/test_health_check.py` docstring and recipe_meta — that's why HEALTH_PATH
  is `/api/health`). Default landing-page capture can NEVER succeed → needs a per-recipe SCREENSHOT
  hook to a path that actually renders (probe live: e.g. /login or /sites).
  Superseded by P3: the 500s traced to a too-short SECRET_KEY_BASE, so no hook was needed.
- [x] **NULL — bluesky-pds**: install fails (level=0) before the app is up → `if deploy_ok:` gate in
  runner/run_recipe_ci.py:1024 correctly skips capture. Not a screenshot defect; upstream image
  breakage already filed in machine-docs/DEFERRED.md (rcust). → documented N/A while upstream is broken.
- [x] **BLANK class — immich, lasuite-meet, n8n(flaky), cryptpad**: SPA paint race. capture() navigates
  with `wait_until="domcontentloaded"` (runner/harness/screenshot.py:91) and screenshots immediately;
  SPA shell HTML has loaded but JS hasn't painted → solid 4801/4802 B frame. n8n flakiness = same race,
  sometimes JS wins (run 197 captured the real form).
- [x] **LOADING class — keycloak, lasuite-docs, lasuite-drive, mumble, mattermost-lts(borderline)**:
  same race, caught mid-paint (spinner/splash rendered, app JS still loading/connecting).
- [x] **mumble** web stack identified: recipe deploys a `web` service (mumble-web client) on the domain —
  spinner is its connecting state; landing renders a connect dialog once JS settles. NOT an N/A.
- [x] **custom-html** nginx-welcome question: the recipe's fresh install genuinely serves the nginx
  default page at `/` (no content seeded for this recipe's install; only custom-html-tiny seeds via
  install_steps.sh). Screenshot is an honest representative view of a fresh install. → OK as-is.

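The plausible log line implies a bounded status probe in front of the capture: a fixed set of accepted statuses, 15 attempts, 45 s in total. A sketch of that shape follows. It is an illustration only, not the harness's code: `wait_for_renderable` and the injectable `fetch_status` / `sleep` callables are hypothetical, and the 3 s interval is inferred from 45 s / 15 attempts.

```python
import time
from typing import Callable, Optional, Tuple

ACCEPTED_STATUSES = (200, 301, 302, 303, 401, 403)  # taken from the log line


def wait_for_renderable(
    fetch_status: Callable[[], Optional[int]],
    attempts: int = 15,
    interval_s: float = 3.0,  # assumption: 45 s spread over 15 attempts
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, Optional[int]]:
    """Poll fetch_status() until it returns an accepted status or attempts run out.

    Returns (ok, last_status). A failure is reported, never raised, which keeps
    the capture non-fatal for the verdict.
    """
    last: Optional[int] = None
    for _ in range(attempts):
        last = fetch_status()
        if last in ACCEPTED_STATUSES:
            return True, last
        sleep(interval_s)  # wait before the next attempt
    return False, last


# A page that always answers 500 exhausts the budget, as plausible did.
ok, last = wait_for_renderable(lambda: 500, sleep=lambda s: None)
print(ok, last)  # False 500
```

A probe like this only reports that nothing renderable came back; it cannot say why the page returns 500.
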
### P3 — Fixes (all merged to main)

- [x] Harness default improvement (ce50f64 + A1 hardening 7ad7d1f): bounded networkidle settle
  (10s) + 0.5s render grace after domcontentloaded; blank/spinner-frame detect (<10000 B) → ONE
  retry with 4s settle, larger frame kept (A1). Wait budget 45+10+0.5+4+0.5 = 60s, unit-tested.
  8 new unit tests; 207 pass; lint PASS.
- [x] plausible — NOT a hook in the end: the real root cause was EXTRA_ENV SECRET_KEY_BASE being
  62 chars (<64-byte Phoenix cookie-store minimum) → every HTML render 500'd. Fixed to 68 chars
  (b98a471); default capture then lands the genuine registration page. Stale auth_controller
  comments corrected (no assertion touched).
- [x] mattermost-lts SCREENSHOT hook (80e5713 + 3c33129): interstitial appears on ANY first-visit
  route incl /login (proven byte-identical PNG) → hook navigates /login, clicks "View in Browser"
  best-effort, settles; lands the real login form. First real hook; public screenshot.settle().
- [x] keycloak / lasuite-docs / lasuite-drive / lasuite-meet / immich / cryptpad / n8n: fixed by
  the harness default alone (no hooks needed — proof PNGs below).
- [x] mumble: NOT fixable harness-side — pinned mumble-web:0.5 client never paints UI for an
  anonymous browser (≥90s DOM/console/network observation: no errors, no failed requests,
  connect-dialog elements absent, no autoconnect overrides). Loader frame = the genuine anonymous
  web view; voice (the recipe's function) fully covered by protocol tests. DEFERRED.md entry filed
  (upstream question for the operator).
- [x] bluesky-pds: documented N/A while upstream image broken (rcust DEFERRED; Adversary-agreed at
  M1, contingent re-check at M2 — latest failing evidence ab-bluesky-pds-oldmain, 2026-06-11).

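The settle step and its wait budget from the first fix can be sketched as below. This is a hedged illustration rather than the merged code: `settle_sketch` and the constant names are hypothetical, the mapping of the five budget terms to phases is an assumption (in particular that the trailing 0.5 s is a second render grace after the retry), and `page` is only assumed to expose Playwright's `wait_for_load_state` and `wait_for_timeout`.

```python
NAV_BUDGET_S = 45.0     # assumption: the status-probe window before capture
NETWORKIDLE_S = 10.0    # bounded networkidle settle
RENDER_GRACE_S = 0.5    # short grace so the last paint lands
RETRY_SETTLE_S = 4.0    # extra settle before the single blank retry


def worst_case_budget_s() -> float:
    """Sum of all waits when every phase runs to its limit."""
    return (NAV_BUDGET_S + NETWORKIDLE_S + RENDER_GRACE_S
            + RETRY_SETTLE_S + RENDER_GRACE_S)


def settle_sketch(page, idle_s: float = NETWORKIDLE_S,
                  grace_s: float = RENDER_GRACE_S) -> bool:
    """Wait (bounded) for network idle, then a render grace.

    Returns True if the page went idle within idle_s. A page that never goes
    idle is not an error here: the wait is best-effort by design.
    """
    try:
        page.wait_for_load_state("networkidle", timeout=idle_s * 1000)
        idle = True
    except Exception:  # Playwright raises its own TimeoutError on a busy page
        idle = False
    page.wait_for_timeout(grace_s * 1000)  # Playwright takes milliseconds
    return idle


print(worst_case_budget_s())  # 60.0
```

Bounding the networkidle wait matters for apps that keep a websocket or polling loop open: they would otherwise never count as idle and the capture would stall.
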
### P4 — Proof runs (fresh, post-fix; every PNG visually Read by Builder)

| recipe | proof run (dir on cc-ci) | level (baseline) | PNG B | visual |
|---|---|---|---|---|
| immich | 370 (drone !testme immich#2) | 4 (=356:4) | 234351 | real "Welcome to Immich" onboarding |
| plausible | 371 (drone !testme plausible#3) | 4 (=357:4) | 64132 | real registration form, empty fields |
| keycloak | shot-proof-keycloak | 4 | 215587 | real "Sign in to your account" form |
| cryptpad | shot-proof-cryptpad | 4 | 57310 | real landing + document-type picker |
| lasuite-meet | shot-proof-lasuite-meet | 4 | 225686 | real video-conferencing landing |
| lasuite-docs | shot-proof-lasuite-docs | 4 | 284769 | real Docs landing |
| lasuite-drive | shot-proof2-lasuite-drive | 4 | 132037 | real Drive landing |
| n8n | shot-proof-n8n | 4 | 26433 | real "Set up owner account", empty fields (now deterministic) |
| mattermost-lts | shot-proof3-mattermost-lts | 2 (=m2r:2) | 178367 | real "Log in to your account" form (hook v2) |
| mumble | shot-proof-mumble | 4 | 7980 | loader frame — best-available (see P3/DEFERRED) |

Drone durations pre/post (same recipe+PR): immich 199s→198s; plausible 209s→166s (faster — capture
no longer burns 45s failing).

Healthy class (ghost, hedgedoc, discourse, custom-html, custom-html-tiny, mailu, matrix-synapse,
uptime-kuma): existing artifacts cited in P1 matrix, each visually verified real + credential-free;
no new runs needed per plan §3 P4.

Dashboard/card: grid thumbnails for runs 370/371 served 200, summary.html embeds screenshot.png,
/badge/immich.svg 200.

## Adversary findings

### [adversary] A1 — blank-retry can REGRESS a larger frame to a worse one (LOW, non-blocking) — CLOSED @2026-06-11T06:32Z

**CLOSED:** fixed in 7ad7d1f (retry snapped to a temp path; `os.replace` only if `retry >= first`,
else discard + cleanup in `finally`). Re-verified COLD with my own probe (not the Builder's test):
the exact filed case `[9999,4801]` now keeps **9999** (retry discarded, no temp leak); originals
intact (`[4801,30256]`→30256, `[4801,4802]`→4802, `[35707]`→1 shot, `[5000,5000]`→replace). 5/5 pass.
R7 contract preserved (retry-raise still propagates to capture's swallow → None; first frame on disk).

--- original finding (for the record) ---

**Where:** `runner/harness/screenshot.py` `_snap_with_blank_retry` (ce50f64).

**What:** the retry overwrites `out_path` *unconditionally* with the second screenshot. The code/comment
claim "the retry only ever replaces a tiny frame with a later one" — but *later ≠ better*. If the first
frame is e.g. 9999 B (a partial render, just under `BLANK_SIZE_BYTES=10000`) and the page regresses in the
extra 4 s settle (redirect, session-timeout splash, error overlay), the retry can yield a 4801 B blank that
**overwrites the better 9999 B frame**. The Builder's unit test only covers blank→blank (4801→4802); the
bigger→smaller regression is untested.

**Repro (cold, my independent probe, not the Builder's test file):** fake page returning sizes
`[9999, 4801]` → `_snap_with_blank_retry` keeps **4801** (the worse frame).

**Severity:** LOW. R7 holds (cosmetic only, never affects verdict); my M2 per-PNG visual check is the
backstop — any actually-blank final PNG will FAIL that recipe regardless. Filed for hardening, not a veto.

**Suggested guard (trivial, strictly safer):** keep the larger frame — only overwrite if
`getsize(retry) >= getsize(first)` (or snap retry to a temp path and pick `max`). Then extend the unit
test with a bigger→smaller case asserting the larger frame survives.

**Closes:** only I close this, after re-test. Non-blocking for an M2 claim, but I will re-check at M2.

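The guard described for A1 (retry to a temp path, `os.replace` only if the retry is at least as large, cleanup in `finally`) can be sketched as follows. This is an independent illustration under stated assumptions, not the code in 7ad7d1f: `snap_keep_larger` and `FakePage` are hypothetical names, and `page` is only assumed to offer Playwright-style `screenshot(path=...)` and `wait_for_timeout(ms)`.

```python
import os
import tempfile

BLANK_SIZE_BYTES = 10_000   # threshold named in the finding
RETRY_SETTLE_MS = 4_000     # the 4 s retry settle from P3


def snap_keep_larger(page, out_path):
    """Screenshot to out_path; on a small frame retry once and keep the larger file.

    Returns the number of screenshots taken (1 or 2).
    """
    page.screenshot(path=out_path)
    first = os.path.getsize(out_path)
    if first >= BLANK_SIZE_BYTES:
        return 1
    # Temp file in the same directory, so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(suffix=".png", dir=os.path.dirname(out_path) or ".")
    os.close(fd)
    try:
        page.wait_for_timeout(RETRY_SETTLE_MS)
        page.screenshot(path=tmp)
        if os.path.getsize(tmp) >= first:
            os.replace(tmp, out_path)  # retry is at least as large: promote it
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)  # retry was smaller, or the retry raised
    return 2


class FakePage:
    """Stand-in for a browser page that writes files of scripted sizes."""

    def __init__(self, sizes):
        self.sizes = list(sizes)

    def wait_for_timeout(self, ms):
        pass  # no real waiting in the demo

    def screenshot(self, path):
        with open(path, "wb") as fh:
            fh.write(b"\0" * self.sizes.pop(0))


with tempfile.TemporaryDirectory() as d:
    out = os.path.join(d, "screenshot.png")
    shots = snap_keep_larger(FakePage([9999, 4801]), out)
    print(shots, os.path.getsize(out), os.listdir(d))  # 2 9999 ['screenshot.png']
```

If the retry screenshot raises, the exception still propagates after cleanup and the first frame stays on disk, which is the behaviour the R7 note relies on.
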