JOURNAL-shot.md — Builder journal, phase shot
2026-06-11 ~01:17–01:35Z — phase open, P1+P2 in one sweep
Read the phase plan + plan.md §6.1/§7/§9. Enumerated enrolled recipes (19). Pulled per-recipe
latest-run data off cc-ci (results.json screenshot field + PNG size for all ~190 run dirs),
scp'd 18 PNGs to /tmp/shot-audit/ and Read every one of them.
Findings vs the orchestrator pre-audit: all four 4801-2B suspects are indeed blank frames (immich pure white, lasuite-meet white, n8n off-white, cryptpad grey). keycloak 8.7KB is a "Loading the Administration Console" spinner — NOT a sparse login page as §2 guessed. lasuite-docs/drive ~5.9KB are lone spinners. Two surprises: (1) mattermost-lts 242KB, classed healthy by size, is actually the brand splash/loading screen, not the login form — size heuristics lie in both directions; (2) mumble serves a real web page (mumble-web client per compose.mumbleweb.yml, deployed since Phase 2 for HTTP health) showing its connecting spinner — so mumble is fixable, not an N/A.
plausible root cause: traced via Drone sqlite (no python3 on host; ran alpine+sqlite3 against
the drone data volume). Build 357 log t=73s: capture failed, last status=500 after 45s. Cross-ref
tests/plausible/functional/test_health_check.py: GET / 500s via auth_controller under
DISABLE_AUTH=true, which is permanent, not an init race. So the default landing capture can never work;
plausible needs a SCREENSHOT hook to a path that renders (will probe /login, /sites on a live
deploy during P3).
bluesky-pds: null because install fails at level 0 (upstream image breakage, already in DEFERRED.md from rcust) — capture gated on deploy_ok, correctly skipped. N/A while upstream broken.
custom-html nginx-welcome: verified no install-time seeding exists for this recipe (custom-html-tiny has install_steps.sh; custom-html only seeds in pre_backup/pre_upgrade ops, after capture). The nginx default page IS the honest fresh-install view. Leaving OK; flagged in matrix for Adversary.
Adversary opened REVIEW-shot.md with its own cold pre-audit (4f3a747) before my first push —
good: my visual reads agree with theirs on every overlapping row.
Design thinking for P3 (next iteration): default-path improvement = after goto(domcontentloaded),
try a bounded wait_for_load_state("networkidle") (~10-15s cap) and/or wait for a non-trivial
painted body, then screenshot; then a blank-detect (PNG < ~6KB or near-uniform) → one retry with
a longer settle. Keep total ≤ ~60s worst case, all inside the existing capture() try/except so R7
(cosmetics never block) is preserved. Unit tests: blank-detector pure function + retry logic with
a fake page. Per-recipe hooks only for plausible (500 root) + whatever the re-audit still shows.
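A minimal sketch of the two default-path pieces planned above, assuming a Playwright-style sync page object (only wait_for_load_state is used). The names settle and looks_blank, the 6,000-byte cut (from the ~6KB audit figure) and the luma stddev threshold are all hypothetical tuning choices, not the committed harness code. looks_blank is kept pure so it unit-tests without a browser; decoding the PNG into luma samples is left to the caller.

```python
from contextlib import suppress
from statistics import pstdev
from typing import Optional, Sequence

# Hypothetical thresholds: ~6KB comes from the audit above; the stddev cut is a guess.
MIN_PNG_BYTES = 6_000
MAX_LUMA_STDDEV = 2.0


def settle(page, cap_ms: int = 15_000) -> None:
    """Best-effort bounded wait for network idle. A page that never goes idle
    (long-polling, websockets) just hits the cap and capture carries on (R7)."""
    with suppress(Exception):
        page.wait_for_load_state("networkidle", timeout=cap_ms)


def looks_blank(png_size: int, luma_samples: Optional[Sequence[int]] = None) -> bool:
    """Heuristic blank-frame check for a screenshot.

    png_size: PNG file size in bytes (flat frames compress to almost nothing).
    luma_samples: optional 0-255 grayscale samples taken from the frame.
    """
    if png_size < MIN_PNG_BYTES:
        return True
    if luma_samples is not None and len(luma_samples) > 1:
        # Near-constant brightness across the samples means a flat fill.
        return pstdev(luma_samples) <= MAX_LUMA_STDDEV
    return False
```

As the audit showed, neither check catches a spinner or splash with real content (keycloak at 8.7KB, mattermost at 242KB); the uniformity test only helps with flat white/grey fills.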
2026-06-11 ~05:45-06:00Z — plausible root cause was a 62-char SECRET_KEY_BASE; M1 PASSed meanwhile
M1 PASS (ae10b55) with a watch-list. P3 done in two commits: ce50f64 (harness settle+blank-retry,
6 unit tests, 205 pass, lint PASS) and b98a471 (plausible fix). The plausible story changed under
probing: three live probes (shot-probe{,2,3}-plausible) showed / and every HTML route 302→/register
which 500s; app logs gave the smoking gun: (ArgumentError) cookie store expects conn.secret_key_base to be at least 64 bytes. Our EXTRA_ENV value — comment claimed "64-char" — measures 62. So every
page render 500'd while /api/* (no cookie store) passed all tiers. NOT auth_controller/DISABLE_AUTH
as the old comments claimed; corrected both stale comments. Fix = 68-char value; verified
shot-fix-plausible run: install pass, screenshot.png 64132B = real registration page (empty fields,
placeholders only — same safe shape the Adversary blessed for n8n/uptime-kuma). No hook needed.
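A cheap guard against this class of bug, sketched under assumptions: measure the value's byte length before deploying instead of trusting a comment. The 64-byte floor comes from the cookie-store error above; check_secret_key_base and new_secret_key_base are hypothetical helpers, not part of the recipe. secrets.token_urlsafe(51) yields 68 URL-safe characters, the same length as the fix value.

```python
import secrets

# Floor taken from the app error: the cookie store wants >= 64 bytes.
MIN_SECRET_KEY_BASE_BYTES = 64


def check_secret_key_base(value: str) -> None:
    """Raise if value is too short. Counts bytes, since that is what the error counts."""
    size = len(value.encode("utf-8"))
    if size < MIN_SECRET_KEY_BASE_BYTES:
        raise ValueError(
            f"SECRET_KEY_BASE is {size} bytes; need at least {MIN_SECRET_KEY_BASE_BYTES}"
        )


def new_secret_key_base(nbytes: int = 51) -> str:
    """Random URL-safe secret; 51 random bytes base64-encode to 68 characters."""
    value = secrets.token_urlsafe(nbytes)
    check_secret_key_base(value)
    return value
```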
P4 started: !testme posted 05:56:32Z on immich#2 + plausible#3 (drone builds 370+371 running, concurrent). Manual full proof run keycloak launched (shot-proof-keycloak). Remaining queue: mattermost-lts, cryptpad, lasuite-meet, lasuite-docs, lasuite-drive, n8n, mumble.
2026-06-11 ~06:05-06:30Z — proof sweep underway; A1 fixed; mumble is the holdout
Proofs verified visually so far (each level matches its baseline): drone 370 immich L4 234KB real
onboarding card (was 4801B); drone 371 plausible L4 64KB registration page (was null); keycloak L4
real sign-in form (was loading spinner); cryptpad L4 real landing w/ document picker (was grey blank);
lasuite-meet L4 real product landing (was white blank); mattermost-lts L2 (= m2r baseline L2): a real
page, but it's the desktop-or-browser interstitial, so per the watch-list I added the first
SCREENSHOT hook (80e5713, → /login + public settle()); re-run pending.
A1 (blank-retry could regress a larger frame): fixed in 7ad7d1f. The retry now writes to a temp path
and only replaces the first frame via os.replace when it is at least as large; regression test
[9999,4801]→9999. 207 unit tests pass, lint PASS.
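One way to express the A1 rule, as a sketch: the retry writes to a sibling temp file and only os.replace()s the original when the new frame is at least as large. shoot and is_blank are hypothetical injected callables (in a harness they might wrap page.screenshot and a blank detector), which also makes the [9999, 4801] → 9999 regression case easy to reproduce with a fake.

```python
import os
from pathlib import Path
from typing import Callable


def capture_with_retry(
    shoot: Callable[[Path, int], None],
    dest: Path,
    is_blank: Callable[[int], bool],
) -> int:
    """Screenshot into dest; if the frame looks blank, retry once into a temp
    file and keep whichever frame is larger. Returns the size left at dest."""
    shoot(dest, 0)  # attempt 0: normal settle
    first = dest.stat().st_size
    if not is_blank(first):
        return first
    tmp = dest.with_name(dest.name + ".retry")
    try:
        shoot(tmp, 1)  # attempt 1: the caller may settle longer here
        second = tmp.stat().st_size
        if second >= first:
            # Sibling paths share a filesystem, so this rename is atomic on POSIX.
            os.replace(tmp, dest)
            return second
        return first  # the retry regressed: keep the original frame
    finally:
        tmp.unlink(missing_ok=True)  # no-op once os.replace has moved it
```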
mumble: the proof run still shows the spinner after settle+retry (7980B). Probing live what mumble-web does over 90s: it served real mumble-web HTML while up, so I suspect an autoconnect overlay that never resolves because the websocket voice path may not be browser-reachable. Orchestrated probe2 running. Also in flight: n8n + lasuite-docs proofs from the A1-fixed tree. Queue: lasuite-drive, mattermost re-run; then ghost/hedgedoc/etc. healthy-class citations + dashboard/card check + runtime compare.
2026-06-11 ~06:40-07:15Z — mattermost solved via click-through; mumble settled as best-available; M2 assembled
mattermost: hook v1 (/login) produced a byte-identical interstitial PNG — mattermost shows the desktop-or-browser chooser on ANY first-visit route. Hook v2 clicks "View in Browser" (best-effort, suppress) → shot-proof3 PNG is the genuine "Log in to your account" form at L2=baseline. That's watch-list item 3 satisfied the hard way.
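Roughly what a click-through hook like v2 could look like, assuming the harness hands hooks a Playwright sync-API page and a base URL (the hook name and signature here are hypothetical). The click is best-effort: if the chooser is absent or its text differs, the exception is swallowed and capture proceeds on whatever rendered, which keeps R7 intact.

```python
from contextlib import suppress


def mattermost_screenshot_hook(page, base_url: str, click_timeout_ms: int = 5_000) -> None:
    """Hypothetical SCREENSHOT hook: get past Mattermost's desktop-or-browser
    chooser so the capture lands on the login form."""
    page.goto(base_url.rstrip("/") + "/login", wait_until="domcontentloaded")
    # The chooser shows on any first-visit route, so navigating alone is not enough.
    # Best-effort click: a missing button or a timeout is swallowed (R7).
    with suppress(Exception):
        page.get_by_text("View in Browser").click(timeout=click_timeout_ms)
    # Settling before the screenshot is left to the harness's own settle step.
```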
mumble: three live probes. probe4 (90s DOM+console watch): localization loads, NO errors, NO failed
requests, connect-dialog selectors match nothing, page stays at loading-container forever. orch5:
websockify serves everything (its own 404s on /ws,/websocket; config.local.js = untouched sample, no
autoconnect). Conclusion: the pinned mumble-web:0.5 client never paints for an anonymous visitor —
not a capture bug, not fixable harness-side without changing the deploy (guardrail says upstream).
Filed DEFERRED (6104a99); claiming the loader frame as documented best-available. Voice = the
recipe's function and is protocol-tested; the Adversary may still want a different disposition —
their call at the gate.
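The probe4-style watch can be sketched as follows, assuming Playwright's sync API (page.on for the "console" and "requestfailed" events, query_selector, wait_for_timeout). The function name, the selector list and the 5s poll step are hypothetical; the point is to record console output, failed requests and which UI selectors ever appear over a fixed window without interacting, which is the kind of evidence the conclusion above rests on.

```python
from typing import Dict, List, Sequence


def watch_page(page, url: str, selectors: Sequence[str],
               duration_s: int = 90, step_s: int = 5) -> Dict[str, List[str]]:
    """Load url and observe it for about duration_s seconds without interacting."""
    console: List[str] = []
    failed: List[str] = []
    page.on("console", lambda msg: console.append(f"{msg.type}: {msg.text}"))
    page.on("requestfailed", lambda req: failed.append(req.url))
    page.goto(url, wait_until="domcontentloaded")
    matched: List[str] = []
    for _ in range(max(1, duration_s // step_s)):
        page.wait_for_timeout(step_s * 1000)
        for sel in selectors:
            # Record each selector the first time it shows up in the DOM.
            if sel not in matched and page.query_selector(sel) is not None:
                matched.append(sel)
    return {"console": console, "failed_requests": failed, "matched": matched}
```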
Ops lessons this stretch: 3 simultaneous run launches race on the abra catalogue fetch (lasuite-drive
died "unable to update catalogue"; reran solo green), so stagger launches. Backgrounded one-shot ssh
launchers written as cd X && nohup A & nohup B & only cd for the first job: the first & backgrounds
"cd X && nohup A" as one list in a subshell, so B runs in the original directory. Give each its own cd.
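The two lessons, sketched locally in Python rather than as the ssh one-liners actually used: each job gets its own explicit working directory, and launches are spaced to reduce the chance that their catalogue fetches overlap. launch_staggered is a hypothetical helper, the 30s default gap is a guess rather than a measured value, and start_new_session=True only approximates nohup (it detaches the child into its own session instead of ignoring SIGHUP).

```python
import subprocess
import time
from pathlib import Path
from typing import List, Sequence, Tuple


def launch_staggered(
    jobs: Sequence[Tuple[Path, Sequence[str]]], gap_s: float = 30.0
) -> List[subprocess.Popen]:
    """Start each (cwd, argv) job in the background, gap_s seconds apart."""
    procs: List[subprocess.Popen] = []
    for i, (cwd, argv) in enumerate(jobs):
        if i:
            time.sleep(gap_s)  # give the previous launch time to finish its fetch
        # cwd is set per job, so no job silently inherits another job's directory.
        procs.append(subprocess.Popen(list(argv), cwd=cwd, start_new_session=True))
    return procs
```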
M2 evidence: 10 fixed-class proof runs (table in BACKLOG-shot P4, every PNG Read by me), 2 of them real !testme drone builds (370 immich 198s vs 199s baseline; 371 plausible 166s vs 209s, i.e. 43s FASTER, close to the 45s fail window the capture no longer burns), healthy-class cited from P1, dashboard grid/card/badge all 200. Claiming M2.
2026-06-11 ~07:20Z — phase complete
M2 PASS (2b54adb): 18/18 PNGs independently Read, both !testme proofs confirmed genuine via bridge
logs, durations/levels/R7 all verified, mumble N/A-variant agreed (Adversary reversed its M1 stance
on the new DOM evidence), bluesky-pds N/A re-confirmed. Wrote ## DONE. Loop ends.