cc-ci/JOURNAL-shot.md
autonomic-bot b8414a8fdb
journal(shot): plausible root-cause story + P4 proof-run kickoff
2026-06-11 06:00:11 +00:00

JOURNAL-shot.md — Builder journal, phase shot

2026-06-11 ~01:17-01:35Z — phase open, P1+P2 in one sweep

Read the phase plan + plan.md §6.1/§7/§9. Enumerated enrolled recipes (19). Pulled per-recipe latest-run data off cc-ci (results.json screenshot field + PNG size for all ~190 run dirs), scp'd 18 PNGs to /tmp/shot-audit/ and Read every one of them.
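A rough sketch of the per-run tabulation described above, for repeating the audit later. The directory layout is an assumption: one sub-directory per run, each holding a `results.json` whose `screenshot` field is a PNG path relative to the run dir (or null). `audit_screenshots` is a hypothetical helper name, not something in the harness.

```python
import json
from pathlib import Path


def audit_screenshots(runs_root):
    """Return (run_dir_name, screenshot_field, png_size_or_None) per run dir.

    Assumed layout: <runs_root>/<run>/results.json with a "screenshot" key.
    """
    rows = []
    for results in sorted(Path(runs_root).glob("*/results.json")):
        run_dir = results.parent
        try:
            data = json.loads(results.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # unreadable run dir: skip rather than abort the audit
        shot = data.get("screenshot")  # assumed: relative PNG path, or null
        size = None
        if shot:
            png = run_dir / shot
            if png.is_file():
                size = png.stat().st_size
        rows.append((run_dir.name, shot, size))
    return rows
```

Size alone only flags candidates; as the findings below show, each PNG still has to be looked at.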

Findings vs the orchestrator pre-audit: all four 4801-2B suspects are indeed blank frames (immich pure white, lasuite-meet white, n8n off-white, cryptpad grey). keycloak 8.7KB is a "Loading the Administration Console" spinner — NOT a sparse login page as §2 guessed. lasuite-docs/drive ~5.9KB are lone spinners. Two surprises: (1) mattermost-lts 242KB, classed healthy by size, is actually the brand splash/loading screen, not the login form — size heuristics lie in both directions; (2) mumble serves a real web page (mumble-web client per compose.mumbleweb.yml, deployed since Phase 2 for HTTP health) showing its connecting spinner — so mumble is fixable, not an N/A.

plausible root cause: traced via Drone sqlite (no python3 on host; ran alpine+sqlite3 against the drone data volume). Build 357 log t=73s: capture failed, last status=500 after 45s. Cross-ref tests/plausible/functional/test_health_check.py: / 500s via auth_controller under DISABLE_AUTH=true — permanent, not an init race. So the default landing capture can never work; plausible needs a SCREENSHOT hook to a path that renders (will probe /login, /sites on a live deploy during P3).

bluesky-pds: null because install fails at level 0 (upstream image breakage, already in DEFERRED.md from rcust) — capture gated on deploy_ok, correctly skipped. N/A while upstream broken.

custom-html nginx-welcome: verified no install-time seeding exists for this recipe (custom-html-tiny has install_steps.sh; custom-html only seeds in pre_backup/pre_upgrade ops, after capture). The nginx default page IS the honest fresh-install view. Leaving OK; flagged in matrix for Adversary.

Adversary opened REVIEW-shot.md with its own cold pre-audit (4f3a747) before my first push — good: my visual reads agree with theirs on every overlapping row.

Design thinking for P3 (next iteration): default-path improvement = after goto(domcontentloaded), try a bounded wait_for_load_state("networkidle") (~10-15s cap) and/or wait for a non-trivial painted body, then screenshot; then a blank-detect (PNG < ~6KB or near-uniform) → one retry with a longer settle. Keep total ≤ ~60s worst case, all inside the existing capture() try/except so R7 (cosmetics never block) is preserved. Unit tests: blank-detector pure function + retry logic with a fake page. Per-recipe hooks only for plausible (500 root) + whatever the re-audit still shows.
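A minimal sketch of that default path, to pin down the shape before writing it for real. It assumes a Playwright-style sync `page` object (`goto`, `wait_for_load_state`, `screenshot`). The names `looks_blank` and `settle_and_shoot` and all three constants are placeholders, not the harness's actual API. The near-uniform check is left out; only the size cutoff is shown.

```python
import time
from typing import Optional

BLANK_MAX_BYTES = 6 * 1024  # assumed cutoff, from the "~6KB" figure above
IDLE_CAP_MS = 15_000        # assumed cap on the networkidle wait
RETRY_SETTLE_S = 10.0       # assumed longer settle before the single retry


def looks_blank(png: bytes, max_bytes: int = BLANK_MAX_BYTES) -> bool:
    """Size-only blank heuristic: a tiny PNG is very likely an empty frame."""
    return len(png) < max_bytes


def settle_and_shoot(page, url: str, sleep=time.sleep) -> Optional[bytes]:
    """Navigate, wait a bounded time to settle, screenshot, retry once if blank.

    Never raises: any failure returns None so capture cannot block a run (R7).
    """
    try:
        page.goto(url, wait_until="domcontentloaded")
        try:
            page.wait_for_load_state("networkidle", timeout=IDLE_CAP_MS)
        except Exception:
            pass  # chatty pages never go idle; shoot whatever is painted
        png = page.screenshot()
        if looks_blank(png):
            sleep(RETRY_SETTLE_S)  # one retry only, after a longer settle
            png = page.screenshot()
        return png
    except Exception:
        return None
```

Passing `sleep` as a parameter is what lets the retry logic be unit-tested with a fake page and no real waiting.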

2026-06-11 ~05:45-06:00Z — plausible root cause was a 62-char SECRET_KEY_BASE; M1 PASSed meanwhile

M1 PASS (ae10b55) with a watch-list. P3 done in two commits: ce50f64 (harness settle+blank-retry, 6 unit tests, 205 pass, lint PASS) and b98a471 (plausible fix). The plausible story changed under probing: three live probes (shot-probe{,2,3}-plausible) showed / and every HTML route 302→/register which 500s; app logs gave the smoking gun: (ArgumentError) cookie store expects conn.secret_key_base to be at least 64 bytes. Our EXTRA_ENV value — comment claimed "64-char" — measures 62. So every page render 500'd while /api/* (no cookie store) passed all tiers. NOT auth_controller/DISABLE_AUTH as the old comments claimed; corrected both stale comments. Fix = 68-char value; verified shot-fix-plausible run: install pass, screenshot.png 64132B = real registration page (empty fields, placeholders only — same safe shape the Adversary blessed for n8n/uptime-kuma). No hook needed.
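A small guard that would have caught the 62-vs-64 mismatch before deploy, as a sketch only. The 64-byte minimum is taken from the ArgumentError quoted above; `secret_long_enough` and `new_secret` are hypothetical helper names, and nothing here reflects how the recipe's EXTRA_ENV value is actually produced.

```python
import secrets

MIN_SECRET_BYTES = 64  # minimum quoted in the ArgumentError above


def secret_long_enough(value: str, minimum: int = MIN_SECRET_BYTES) -> bool:
    """True if the value is at least `minimum` bytes once UTF-8 encoded."""
    return len(value.encode("utf-8")) >= minimum


def new_secret() -> str:
    """Generate a fresh 64-character URL-safe secret."""
    # 48 random bytes encode to exactly 64 base64 characters (no padding).
    return secrets.token_urlsafe(48)
```

Measuring the value in a test, rather than trusting a "64-char" comment beside it, keeps the comment and the value from drifting apart again.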

P4 started: !testme posted 05:56:32Z on immich#2 + plausible#3 (drone builds 370+371 running, concurrent). Manual full proof run keycloak launched (shot-proof-keycloak). Remaining queue: mattermost-lts, cryptpad, lasuite-meet, lasuite-docs, lasuite-drive, n8n, mumble.