status(shot): phase open — P1 audit matrix complete (19/19 recipes, every PNG visually inspected) + P2 root causes (plausible /-500s-by-design via build-357 log; blank/loading = domcontentloaded paint race; bluesky-pds deploy-gated; mumble has real web UI; custom-html nginx-welcome is honest fresh-install content)
All checks were successful
continuous-integration/drone/push Build is passing
BACKLOG-shot.md
@@ -0,0 +1,78 @@
# BACKLOG-shot.md — phase `shot` (recipe screenshot audit & repair)

SSOT: /srv/cc-ci/cc-ci-plan/plan-phase-shot-screenshots.md. Gates: M1 (audit+diagnosis), M2 (all OK / agreed N/A).

## Build backlog

### P1 — Audit matrix (status: complete, all 19 recipes audited, every available PNG visually inspected 2026-06-11)

Enrolled set (19) = `tests/<r>/recipe_meta.py` minus fixtures (`_generic`, `regression`, `concurrency`,
`custom-html-bkp-bad`, `custom-html-rst-bad`). Evidence: `/var/lib/cc-ci-runs/<run>/` on cc-ci;
PNGs pulled to /tmp/shot-audit/ on the builder host and each one Read (visually).

| recipe | latest run w/ artifacts | screenshot field | PNG bytes | visual content (I looked) | class |
|---|---|---|---|---|---|
| bluesky-pds | ab-bluesky-pds-oldmain | null | — | no PNG; install=fail level=0 (upstream image breakage, rcust DEFERRED) → capture correctly skipped (`if deploy_ok`) | N-A-candidate (blocked upstream) |
| cryptpad | m2r-cryptpad | screenshot.png | 4802 | solid light-grey frame, nothing else | BLANK |
| custom-html | m2r-custom-html | screenshot.png | 35707 | "Welcome to nginx!" default page | OK? (diagnose: is this the recipe's true fresh-install content?) |
| custom-html-tiny | m2r-custom-html-tiny | screenshot.png | 12950 | seeded CI content ("cc-ci custom-html-tiny … DG5") | OK |
| discourse | m2p-discourse | screenshot.png | 66121 | real forum UI, welcome topic, Sign Up/Log In | OK |
| ghost | m2r-ghost | screenshot.png | 444183 | real blog landing ("Thoughts, stories and ideas") | OK |
| hedgedoc | m2r-hedgedoc | screenshot.png | 131967 | real landing (logo, Sign In, feature intro) | OK |
| immich | 356 | screenshot.png | 4801 | pure white frame | BLANK |
| keycloak | m2r-keycloak | screenshot.png | 8764 | spinner + "Loading the Administration Console" | LOADING |
| lasuite-docs | m2r-lasuite-docs | screenshot.png | 6022 | lone spinner on white | LOADING |
| lasuite-drive | m2p2-lasuite-drive | screenshot.png | 5895 | lone spinner on white | LOADING |
| lasuite-meet | m2r-lasuite-meet | screenshot.png | 4801 | pure white frame | BLANK |
| mailu | m2r-mailu | screenshot.png | 33800 | real sign-in page (empty fields) | OK |
| matrix-synapse | m2r-matrix-synapse | screenshot.png | 33296 | "It works! Synapse is running" landing | OK |
| mattermost-lts | m2b-mattermost-lts | screenshot.png | 242139 | brand splash/loading screen (logo on blue), NOT the login form | LOADING (borderline — brand-recognizable but a loading state) |
| mumble | m2r-mumble | screenshot.png | 7913 | spinner on grey — a web page IS served on the domain | LOADING (diagnose what serves it; N/A may NOT be justified) |
| n8n | m2r-n8n | screenshot.png | 4801 | off-white blank frame. Flaky: run 197 (30256 B) shows the real "Set up owner account" form (empty fields, credential-free) | BLANK (flaky) |
| plausible | 357 | null | — | no PNG on ANY run (122→357) | NULL |
| uptime-kuma | m2r-uptime-kuma | screenshot.png | 30858 | real "Create your admin account" setup form (empty fields) | OK |

PNG-size note: 4801/4802 B at 1280×800 is a byte-stable blank-frame fingerprint (3 different apps, same size).
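
The fingerprint in the PNG-size note can be checked mechanically. The following is a minimal sketch, not part of the harness: it copies the PNG byte counts from the matrix and groups recipes whose sizes sit within one byte of each other. The helper name `shared_sizes` and the 1-byte tolerance are assumptions made for illustration.

```python
# PNG byte sizes copied from the P1 matrix (recipes without a PNG are omitted).
PNG_BYTES = {
    "cryptpad": 4802, "custom-html": 35707, "custom-html-tiny": 12950,
    "discourse": 66121, "ghost": 444183, "hedgedoc": 131967, "immich": 4801,
    "keycloak": 8764, "lasuite-docs": 6022, "lasuite-drive": 5895,
    "lasuite-meet": 4801, "mailu": 33800, "matrix-synapse": 33296,
    "mattermost-lts": 242139, "mumble": 7913, "n8n": 4801, "uptime-kuma": 30858,
}


def shared_sizes(sizes, tolerance=1, min_recipes=2):
    """Return groups of recipes whose PNG sizes chain within `tolerance` bytes."""
    groups = []
    for recipe, size in sorted(sizes.items(), key=lambda kv: kv[1]):
        # Extend the current group if this size is close to the previous one.
        if groups and size - groups[-1][-1][1] <= tolerance:
            groups[-1].append((recipe, size))
        else:
            groups.append([(recipe, size)])
    return [sorted(r for r, _ in g) for g in groups if len(g) >= min_recipes]


print(shared_sizes(PNG_BYTES))
```

Only the four BLANK-class recipes end up in a shared group; every other PNG has a size of its own.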

### P2 — Root-cause diagnoses

- [x] **NULL — plausible** (evidence: Drone build 357 ci-step log, t=73s):
  `screenshot: capture failed (non-fatal, verdict unaffected): page.goto(https://plau-b51425.ci.commoninternet.net/) never returned a status in (200, 301, 302, 303, 401, 403) after 15 attempts (45s); last status=500`.
  Plausible's `/` 500s **by design** under `DISABLE_AUTH=true` (auth_controller; documented in
  `tests/plausible/functional/test_health_check.py` docstring and recipe_meta — that's why HEALTH_PATH
  is `/api/health`). Default landing-page capture can NEVER succeed → needs a per-recipe SCREENSHOT
  hook to a path that actually renders (probe live: e.g. /login or /sites).
- [x] **NULL — bluesky-pds**: install fails (level=0) before the app is up → `if deploy_ok:` gate in
  runner/run_recipe_ci.py:1024 correctly skips capture. Not a screenshot defect; upstream image
  breakage already filed in machine-docs/DEFERRED.md (rcust). → documented N/A while upstream is broken.
- [x] **BLANK class — immich, lasuite-meet, n8n(flaky), cryptpad**: SPA paint race. capture() navigates
  with `wait_until="domcontentloaded"` (runner/harness/screenshot.py:91) and screenshots immediately;
  SPA shell HTML has loaded but JS hasn't painted → solid 4801-2 B frame. n8n flakiness = same race,
  sometimes JS wins (run 197 captured the real form).
- [x] **LOADING class — keycloak, lasuite-docs, lasuite-drive, mumble, mattermost-lts(borderline)**:
  same race, caught mid-paint (spinner/splash rendered, app JS still loading/connecting).
- [x] **mumble** web stack identified: recipe deploys a `web` service (mumble-web client) on the domain —
  spinner is its connecting state; landing renders a connect dialog once JS settles. NOT an N/A.
- [x] **custom-html** nginx-welcome question: the recipe's fresh install genuinely serves the nginx
  default page at `/` (no content seeded for this recipe's install; only custom-html-tiny seeds via
  install_steps.sh). Screenshot is an honest representative view of a fresh install. → OK as-is.
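
The plausible diagnosis can be illustrated with a small polling sketch. It is not the harness code: the accepted-status tuple and the 15-attempt count come from the build-357 log line, while the function name `poll_until_accepted`, the injectable `sleep`, and the 3-second interval (45 s / 15 attempts) are assumptions for illustration. A target that answers 500 on every attempt exhausts the loop no matter how long it runs.

```python
import time

# Tuple quoted in the build-357 log line.
ACCEPT_STATUSES = (200, 301, 302, 303, 401, 403)


def poll_until_accepted(fetch_status, attempts=15, interval_s=3.0, sleep=time.sleep):
    """Call fetch_status() until it returns an accepted status.

    Returns (accepted_status_or_None, last_status_seen).
    """
    last = None
    for _ in range(attempts):
        last = fetch_status()
        if last in ACCEPT_STATUSES:
            return last, last
        sleep(interval_s)  # assumed pacing: 15 attempts x 3 s = 45 s
    return None, last


# A root path that 500s by design never satisfies the loop.
# The sleep is stubbed out so the demo does not actually wait 45 s.
print(poll_until_accepted(lambda: 500, sleep=lambda s: None))
```

This is why a longer deadline would not help plausible; only pointing the capture at a path that renders can.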

### P3 — Fixes

- [ ] Harness default improvement (fixes BLANK+LOADING classes): after domcontentloaded nav, bounded
  network-idle/paint wait + blank-frame detect (tiny PNG → one retry with stronger wait), all within
  NAV_DEADLINE_S=45 / step worst-case ≤ ~60s. Unit tests in tests/unit/test_screenshot.py.
- [ ] plausible SCREENSHOT hook (tests/plausible/recipe_meta.py) to a rendering, credential-free path.
- [ ] Re-audit mattermost-lts / mumble / keycloak / lasuite-* after harness fix; per-recipe hooks only
  where the default still can't work.
- [ ] bluesky-pds: document N/A in matrix (Adversary agreement at M1/M2).
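
The harness-default item above could look roughly like the sketch below. It is an illustration under stated assumptions, not the planned implementation: `capture_with_settle`, `bounded_idle_wait`, and all timeout values are hypothetical, and `FakePage` is a stand-in that mimics only the four Playwright page methods the sketch calls (`goto`, `wait_for_load_state`, `wait_for_timeout`, `screenshot`).

```python
def bounded_idle_wait(page, timeout_ms):
    """Wait for network idle, but never let a timeout escape (cosmetics must not block)."""
    try:
        page.wait_for_load_state("networkidle", timeout=timeout_ms)
    except Exception:  # a real page raises Playwright's TimeoutError here
        pass


def capture_with_settle(page, url, idle_timeout_ms=10_000, retry_settle_ms=5_000,
                        blank_max_bytes=6_000):
    """Return PNG bytes, retrying once with a longer settle if the frame looks blank."""
    page.goto(url, wait_until="domcontentloaded")
    bounded_idle_wait(page, idle_timeout_ms)
    png = page.screenshot()
    if len(png) < blank_max_bytes:  # tiny PNG: probably an unpainted frame
        page.wait_for_timeout(retry_settle_ms)
        bounded_idle_wait(page, idle_timeout_ms)
        png = page.screenshot()
    return png


class FakePage:
    """Test double: yields the given frames in order, repeating the last one."""

    def __init__(self, frames):
        self.frames = list(frames)
        self.calls = []

    def goto(self, url, wait_until=None):
        self.calls.append("goto")

    def wait_for_load_state(self, state, timeout=None):
        self.calls.append("idle")
        raise TimeoutError("network never went idle")

    def wait_for_timeout(self, ms):
        self.calls.append("settle")

    def screenshot(self):
        self.calls.append("shot")
        return self.frames.pop(0) if len(self.frames) > 1 else self.frames[0]


page = FakePage([b"\x00" * 4801, b"\x00" * 30256])
png = capture_with_settle(page, "https://example.invalid/")
print(len(png), page.calls)
```

With these example values the extra waiting is bounded at 10 s + 5 s + 10 s on top of navigation. Note that the size test alone would only catch the BLANK class: the LOADING-class PNGs in the matrix (5895 to 8764 B, and 242139 B for mattermost-lts) mostly sit above a 6000 B cut-off, so those recipes depend on the idle wait or on per-recipe hooks.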

### P4 — Proof runs

- [ ] Fresh real-CI run per fixed recipe (immich, lasuite-meet, n8n, cryptpad, keycloak, lasuite-docs,
  lasuite-drive, mumble, mattermost-lts, plausible), ≥2 via drone `!testme`; visual check each PNG;
  card + dashboard render. Healthy class: cite existing artifact + visual check (done in P1).

## Adversary findings

(Adversary-owned section.)

JOURNAL-shot.md
@@ -0,0 +1,40 @@
# JOURNAL-shot.md — Builder journal, phase `shot`

## 2026-06-11 ~01:17–01:35Z — phase open, P1+P2 in one sweep

Read the phase plan + plan.md §6.1/§7/§9. Enumerated enrolled recipes (19). Pulled per-recipe
latest-run data off cc-ci (`results.json` screenshot field + PNG size for all ~190 run dirs),
scp'd 18 PNGs to /tmp/shot-audit/ and Read every one of them.

Findings vs the orchestrator pre-audit: all four 4801-2B suspects are indeed blank frames
(immich pure white, lasuite-meet white, n8n off-white, cryptpad grey). keycloak 8.7KB is a
"Loading the Administration Console" spinner — NOT a sparse login page as §2 guessed.
lasuite-docs/drive ~5.9KB are lone spinners. Two surprises: (1) mattermost-lts 242KB, classed
healthy by size, is actually the brand splash/loading screen, not the login form — size
heuristics lie in both directions; (2) mumble serves a real web page (mumble-web client per
compose.mumbleweb.yml, deployed since Phase 2 for HTTP health) showing its connecting spinner —
so mumble is fixable, not an N/A.

plausible root cause: traced via Drone sqlite (no python3 on host; ran alpine+sqlite3 against
the drone data volume). Build 357 log t=73s: capture failed, last status=500 after 45s. Cross-ref
tests/plausible/functional/test_health_check.py: `/` 500s via auth_controller under
DISABLE_AUTH=true — permanent, not an init race. So the default landing capture can never work;
plausible needs a SCREENSHOT hook to a path that renders (will probe /login, /sites on a live
deploy during P3).

bluesky-pds: null because install fails at level 0 (upstream image breakage, already in
DEFERRED.md from rcust) — capture gated on deploy_ok, correctly skipped. N/A while upstream broken.

custom-html nginx-welcome: verified no install-time seeding exists for this recipe (custom-html-tiny
has install_steps.sh; custom-html only seeds in pre_backup/pre_upgrade ops, after capture). The
nginx default page IS the honest fresh-install view. Leaving OK; flagged in matrix for Adversary.

Adversary opened REVIEW-shot.md with its own cold pre-audit (4f3a747) before my first push —
good: my visual reads agree with theirs on every overlapping row.

Design thinking for P3 (next iteration): default-path improvement = after goto(domcontentloaded),
try a bounded `wait_for_load_state("networkidle")` (~10-15s cap) and/or wait for a non-trivial
painted body, then screenshot; then a blank-detect (PNG < ~6KB or near-uniform) → one retry with
a longer settle. Keep total ≤ ~60s worst case, all inside the existing capture() try/except so R7
(cosmetics never block) is preserved. Unit tests: blank-detector pure function + retry logic with
a fake page. Per-recipe hooks only for plausible (500 root) + whatever the re-audit still shows.
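
The "blank-detector pure function" mentioned in the design notes might take a shape like the one below. This is a sketch only: the names `is_near_uniform` and `looks_blank`, the 99% dominance threshold, and the idea of passing already-decoded pixels are all assumptions; decoding the PNG (for example with an imaging library) is left outside the function so it stays pure and easy to unit-test.

```python
from collections import Counter


def is_near_uniform(pixels, min_fraction=0.99):
    """True if a single colour accounts for at least `min_fraction` of the pixels."""
    counts = Counter(pixels)
    if not counts:
        return True  # no pixels at all: treat as blank
    _, top = counts.most_common(1)[0]
    return top / sum(counts.values()) >= min_fraction


def looks_blank(png_size_bytes, pixels=None, max_bytes=6_000):
    """Size check first (cheap); fall back to pixel uniformity when pixels are given."""
    if png_size_bytes < max_bytes:
        return True
    return pixels is not None and is_near_uniform(pixels)


white, black = (255, 255, 255), (0, 0, 0)
print(looks_blank(4801))                                   # size alone flags it
print(looks_blank(8764, [white] * 995 + [black] * 5))      # spinner-like frame
print(looks_blank(30858, [white] * 500 + [black] * 500))   # painted page
```

The thresholds would need tuning against the P1 PNGs: a sparse but legitimate page could exceed 99% of one colour, and a large splash frame such as the mattermost-lts one passes both checks, so the detector complements the re-audit rather than replacing it.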

STATUS-shot.md
@@ -0,0 +1,33 @@
# STATUS-shot.md — Builder status, phase `shot`

SSOT: /srv/cc-ci/cc-ci-plan/plan-phase-shot-screenshots.md

## Current section

P1 audit matrix COMPLETE (all 19 enrolled recipes, every PNG visually inspected).
P2 diagnoses COMPLETE (see BACKLOG-shot.md P2 — each with evidence).
Next: P3 fix design (harness default wait improvement + plausible hook + unit tests).
Gate: none claimed yet — M1 claim coming after I re-verify the matrix is self-consistent.

## Verification map (WHAT/HOW/EXPECTED/WHERE for the audit, ahead of the M1 claim)

- Enrolled set (19): `ls tests/*/recipe_meta.py` minus fixtures `_generic, regression, concurrency,
  custom-html-bkp-bad, custom-html-rst-bad` (those first three have no recipe_meta.py; the two
  `-bad` ones do but are harness canaries).
- Matrix: BACKLOG-shot.md "P1 — Audit matrix". Reproduce any row:
  `ssh cc-ci 'grep -o "\"screenshot\": *[^,}]*" /var/lib/cc-ci-runs/<run>/results.json; stat -c%s /var/lib/cc-ci-runs/<run>/screenshot.png'`
  then scp the PNG and Read it. Run ids are in the matrix "latest run" column.
- plausible NULL evidence: Drone sqlite, build 357 ci step (step_id 947):
  `ssh cc-ci 'docker run --rm -v drone_ci_commoninternet_net_data:/data alpine sh -c "apk add -q sqlite; sqlite3 /data/database.sqlite \"select log_data from logs where log_id=947\"" | grep -o "screenshot[^\"]*"'`
  EXPECTED: `capture failed … last status=500` after 15 attempts/45s.
- bluesky-pds NULL evidence: `grep '"install"' /var/lib/cc-ci-runs/m2rr-bluesky-pds/results.json`
  → fail, level=0; capture is gated on deploy_ok (runner/run_recipe_ci.py:1024).
- Default capture path under audit: runner/harness/screenshot.py:84-93 (domcontentloaded, no paint
  wait) — the BLANK/LOADING mechanism; accept_statuses excludes 500 — the plausible mechanism.
- mumble web UI exists: tests/mumble/recipe_meta.py header (compose.mumbleweb.yml, HEALTH_PATH "/").
- custom-html fresh install serves nginx default: no install_steps.sh in tests/custom-html/ (only
  pre_backup/pre_upgrade seeds in ops.py, which run AFTER the capture moment).
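
The enrolled-set rule in the first bullet of the verification map can also be expressed as a short script. This is a sketch under assumptions: the fixture names are taken from that bullet, while `enrolled_recipes` and the temporary directory layout are hypothetical and exist only to make the example runnable without the repository.

```python
import tempfile
from pathlib import Path

# Fixture names listed in the verification map; everything else with a
# recipe_meta.py counts as enrolled.
FIXTURES = {"_generic", "regression", "concurrency",
            "custom-html-bkp-bad", "custom-html-rst-bad"}


def enrolled_recipes(tests_dir):
    """Return sorted recipe names that have tests/<r>/recipe_meta.py and are not fixtures."""
    metas = Path(tests_dir).glob("*/recipe_meta.py")
    return sorted(m.parent.name for m in metas if m.parent.name not in FIXTURES)


# Demo on a throwaway tree: two real recipes, one canary with a
# recipe_meta.py, and one fixture directory without one.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("ghost", "n8n", "custom-html-bkp-bad"):
        recipe_dir = Path(tmp, name)
        recipe_dir.mkdir()
        (recipe_dir / "recipe_meta.py").touch()
    Path(tmp, "regression").mkdir()
    demo = enrolled_recipes(tmp)

print(demo)
```

Run against the real `tests/` directory, the length of the returned list should match the 19 claimed in the matrix.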

## Blocked

(nothing)