cc-ci/JOURNAL-shot.md
autonomic-bot 4822115b2b
status(shot): ## DONE — M1 (ae10b55) + M2 (2b54adb) both PASS, A1 closed, no VETO; phase complete
2026-06-11 07:19:09 +00:00


JOURNAL-shot.md — Builder journal, phase shot

2026-06-11 ~01:17-01:35Z — phase open, P1+P2 in one sweep

Read the phase plan + plan.md §6.1/§7/§9. Enumerated enrolled recipes (19). Pulled per-recipe latest-run data off cc-ci (results.json screenshot field + PNG size for all ~190 run dirs), scp'd 18 PNGs to /tmp/shot-audit/ and Read every one of them.

Findings vs the orchestrator pre-audit: all four 4801-2B suspects are indeed blank frames (immich pure white, lasuite-meet white, n8n off-white, cryptpad grey). keycloak 8.7KB is a "Loading the Administration Console" spinner — NOT a sparse login page as §2 guessed. lasuite-docs/drive ~5.9KB are lone spinners. Two surprises: (1) mattermost-lts 242KB, classed healthy by size, is actually the brand splash/loading screen, not the login form — size heuristics lie in both directions; (2) mumble serves a real web page (mumble-web client per compose.mumbleweb.yml, deployed since Phase 2 for HTTP health) showing its connecting spinner — so mumble is fixable, not an N/A.

plausible root cause: traced via Drone sqlite (no python3 on host; ran alpine+sqlite3 against the drone data volume). Build 357 log t=73s: capture failed, last status=500 after 45s. Cross-ref tests/plausible/functional/test_health_check.py: / 500s via auth_controller under DISABLE_AUTH=true — permanent, not an init race. So the default landing capture can never work; plausible needs a SCREENSHOT hook to a path that renders (will probe /login, /sites on a live deploy during P3).

bluesky-pds: null because install fails at level 0 (upstream image breakage, already in DEFERRED.md from rcust) — capture gated on deploy_ok, correctly skipped. N/A while upstream broken.

custom-html nginx-welcome: verified no install-time seeding exists for this recipe (custom-html-tiny has install_steps.sh; custom-html only seeds in pre_backup/pre_upgrade ops, after capture). The nginx default page IS the honest fresh-install view. Leaving OK; flagged in matrix for Adversary.

Adversary opened REVIEW-shot.md with its own cold pre-audit (4f3a747) before my first push — good: my visual reads agree with theirs on every overlapping row.

Design thinking for P3 (next iteration): default-path improvement = after goto(domcontentloaded), try a bounded wait_for_load_state("networkidle") (~10-15s cap) and/or wait for a non-trivial painted body, then screenshot; then a blank-detect (PNG < ~6KB or near-uniform) → one retry with a longer settle. Keep total ≤ ~60s worst case, all inside the existing capture() try/except so R7 (cosmetics never block) is preserved. Unit tests: blank-detector pure function + retry logic with a fake page. Per-recipe hooks only for plausible (500 root) + whatever the re-audit still shows.
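A minimal sketch of the settle + blank-detect + single-retry flow described above, not the harness code itself. It assumes a Playwright-style sync `page` object; the constants, `looks_blank`, `settle`, and this `capture` signature are illustrative names, and the near-uniform pixel check is left out (size only).

```python
import os

BLANK_BYTES = 6 * 1024    # assumed threshold, from the "~6KB" above
SETTLE_MS = 12_000        # assumed cap inside the 10-15s range
RETRY_SETTLE_MS = 20_000  # assumed "longer settle" for the single retry


def looks_blank(png_size: int, threshold: int = BLANK_BYTES) -> bool:
    """Pure size check; the near-uniform test is omitted in this sketch."""
    return png_size < threshold


def settle(page, timeout_ms: int) -> None:
    """Bounded wait for network idle; hitting the cap is not an error."""
    try:
        page.wait_for_load_state("networkidle", timeout=timeout_ms)
    except Exception:
        pass  # chatty pages never go idle; shoot whatever is painted


def capture(page, url: str, path: str) -> bool:
    """Return True if a screenshot was written; never raise (R7)."""
    try:
        page.goto(url, wait_until="domcontentloaded")
        settle(page, SETTLE_MS)
        page.screenshot(path=path)
        if looks_blank(os.path.getsize(path)):
            # One retry only, after a fixed longer wait.
            page.wait_for_timeout(RETRY_SETTLE_MS)
            page.screenshot(path=path)
        return True
    except Exception:
        return False
```

Because `capture` only needs `goto`, `wait_for_load_state`, `wait_for_timeout`, and `screenshot`, the retry logic can be unit-tested with a fake page that writes files of chosen sizes. Note that this sketch overwrites the first frame unconditionally on retry.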

2026-06-11 ~05:45-06:00Z — plausible root cause was a 62-char SECRET_KEY_BASE; M1 PASSed meanwhile

M1 PASS (ae10b55) with a watch-list. P3 done in two commits: ce50f64 (harness settle+blank-retry, 6 unit tests, 205 pass, lint PASS) and b98a471 (plausible fix). The plausible story changed under probing: three live probes (shot-probe{,2,3}-plausible) showed / and every HTML route 302→/register which 500s; app logs gave the smoking gun: (ArgumentError) cookie store expects conn.secret_key_base to be at least 64 bytes. Our EXTRA_ENV value — comment claimed "64-char" — measures 62. So every page render 500'd while /api/* (no cookie store) passed all tiers. NOT auth_controller/DISABLE_AUTH as the old comments claimed; corrected both stale comments. Fix = 68-char value; verified shot-fix-plausible run: install pass, screenshot.png 64132B = real registration page (empty fields, placeholders only — same safe shape the Adversary blessed for n8n/uptime-kuma). No hook needed.
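The 62-vs-64 mismatch is the kind of thing a measured check catches where a comment does not. A small sketch, assuming the 64-byte minimum from the error quoted above; `secret_key_ok` and `new_secret` are hypothetical helpers, not part of the harness, and say nothing about how the actual 68-char value was produced.

```python
import secrets

MIN_SECRET_BYTES = 64  # from the ArgumentError quoted above


def secret_key_ok(value: str, minimum: int = MIN_SECRET_BYTES) -> bool:
    """Measure the value in bytes instead of trusting a comment."""
    return len(value.encode("utf-8")) >= minimum


def new_secret(nbytes: int = 34) -> str:
    """Hex-encode nbytes random bytes; 34 bytes gives 68 characters."""
    return secrets.token_hex(nbytes)
```

A check like `secret_key_ok(value)` in a unit test over the recipe's env values would have failed on the 62-char string before any deploy.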

P4 started: !testme posted 05:56:32Z on immich#2 + plausible#3 (drone builds 370+371 running, concurrent). Manual full proof run keycloak launched (shot-proof-keycloak). Remaining queue: mattermost-lts, cryptpad, lasuite-meet, lasuite-docs, lasuite-drive, n8n, mumble.

2026-06-11 ~06:05-06:30Z — proof sweep underway; A1 fixed; mumble is the holdout

Proofs verified visually so far (each level matches its baseline): drone 370 immich L4 234KB real onboarding card (was 4801B); drone 371 plausible L4 64KB registration page (was null); keycloak L4 real sign-in form (was loading spinner); cryptpad L4 real landing w/ document picker (was grey blank); lasuite-meet L4 real product landing (was white blank); mattermost-lts L2(=m2r baseline L2) — real page but it's the desktop-or-browser interstitial, so per the watch-list I added the first SCREENSHOT hook (80e5713, → /login + public settle()); re-run pending.

A1 (blank-retry could regress a larger frame): fixed in 7ad7d1f — retry goes to a temp path and only replaces via os.replace when >= first; regression test [9999,4801]→9999. 207 unit, lint PASS.
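The shape of the A1 fix, sketched under assumptions: the function name and the `take_shot` callback are illustrative, and the real change in 7ad7d1f may be structured differently. The retake lands in a temp file next to the target and only replaces it when it is at least as large.

```python
import os
import tempfile
from typing import Callable


def retake_if_better(path: str, take_shot: Callable[[str], None]) -> int:
    """Retake into a temp file; replace `path` only if the retake is >= in size.

    Returns the size in bytes of the file left at `path`.
    """
    first = os.path.getsize(path)
    # Same directory as the target so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(suffix=".png", dir=os.path.dirname(path) or ".")
    os.close(fd)
    try:
        take_shot(tmp)
        if os.path.getsize(tmp) >= first:
            os.replace(tmp, path)
    finally:
        # After a successful replace the temp name is gone; otherwise clean up.
        if os.path.exists(tmp):
            os.remove(tmp)
    return os.path.getsize(path)
```

With a first frame of 9999 bytes and a retake of 4801 bytes, the 9999-byte file survives, which is the regression case named above.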

mumble: proof run still spinner after settle+retry (7980B). Probing live what mumble-web does over 90s (it printed real mumble-web HTML while up; suspect autoconnect overlay that never resolves because the websocket voice path may not be browser-reachable). Orchestrated probe2 running. Also in flight: n8n + lasuite-docs proofs from the A1-fixed tree. Queue: lasuite-drive, mattermost re-run; then ghost/hedgedoc/etc. healthy-class citations + dashboard/card check + runtime compare.

2026-06-11 ~06:40-07:15Z — mattermost solved via click-through; mumble settled as best-available; M2 assembled

mattermost: hook v1 (/login) produced a byte-identical interstitial PNG — mattermost shows the desktop-or-browser chooser on ANY first-visit route. Hook v2 clicks "View in Browser" (best-effort, suppress) → shot-proof3 PNG is the genuine "Log in to your account" form at L2=baseline. That's watch-list item 3 satisfied the hard way.
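A sketch of what a best-effort click-through hook could look like, assuming a Playwright-style sync `page` and a `settle` callable passed in by the harness. The hook name, its signature, and the 5s click timeout are assumptions for illustration, not the real SCREENSHOT hook interface.

```python
import contextlib


def mattermost_screenshot_hook(page, settle) -> None:
    """Hypothetical hook: dismiss the desktop-or-browser chooser if present."""
    # Best-effort: if the chooser is absent or the click times out,
    # fall through and capture whatever the page shows.
    with contextlib.suppress(Exception):
        page.get_by_text("View in Browser").click(timeout=5_000)
    settle(page)
```

Suppressing the click failure keeps the hook within R7: a missing button degrades to the default capture instead of failing the run.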

mumble: three live probes. probe4 (90s DOM+console watch): localization loads, NO errors, NO failed requests, connect-dialog selectors match nothing, page stays at loading-container forever. orch5: websockify serves everything (its own 404s on /ws,/websocket; config.local.js = untouched sample, no autoconnect). Conclusion: the pinned mumble-web:0.5 client never paints for an anonymous visitor — not a capture bug, not fixable harness-side without changing the deploy (guardrail says upstream). Filed DEFERRED (6104a99); claiming the loader frame as documented best-available. Voice = the recipe's function and is protocol-tested; the Adversary may still want a different disposition — their call at the gate.

Ops lessons this stretch: (1) three simultaneous run launches race on the abra catalogue fetch (lasuite-drive died with "unable to update catalogue"; reran solo, green), so stagger launches. (2) In a backgrounded one-shot ssh launcher of the form cd X && nohup A & nohup B &, the cd applies only to the first command, because the & ends the list and B starts in the original directory; give each command its own cd.
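Both lessons can be folded into how the remote command string is built. A sketch, with `build_launch` as a hypothetical helper: each command gets its own subshell and its own cd, with a sleep between launches. Commands are inserted as raw shell snippets, and the sleep keeps the ssh session open for the stagger period.

```python
import shlex
from typing import Sequence


def build_launch(workdir: str, commands: Sequence[str], stagger_s: int = 30) -> str:
    """Build one shell line: every command gets its own cd, launches are staggered."""
    quoted = shlex.quote(workdir)
    parts = [
        # Subshell per command so the cd cannot leak or be skipped.
        f"(cd {quoted} && nohup {cmd} >/dev/null 2>&1 &)"
        for cmd in commands
    ]
    return f" ; sleep {stagger_s} ; ".join(parts)


print(build_launch("/srv/x", ["./a.sh", "./b.sh"]))
# (cd /srv/x && nohup ./a.sh >/dev/null 2>&1 &) ; sleep 30 ; (cd /srv/x && nohup ./b.sh >/dev/null 2>&1 &)
```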

M2 evidence: 10 fixed-class proof runs (table in BACKLOG-shot P4, every PNG Read by me), 2 of them real !testme drone builds (370/371, durations 198s/166s vs 199s/209s baselines; plausible is FASTER since capture stops burning its 45s fail window), healthy-class cited from P1, dashboard grid/card/badge all 200. Claiming M2.

2026-06-11 ~07:20Z — phase complete

M2 PASS (2b54adb): 18/18 PNGs independently Read, both !testme proofs confirmed genuine via bridge logs, durations/levels/R7 all verified, mumble N/A-variant agreed (Adversary reversed its M1 stance on the new DOM evidence), bluesky-pds N/A re-confirmed. Wrote ## DONE. Loop ends.