STATUS/BACKLOG/REVIEW/JOURNAL for bsky/conc/dstamp/kuma/lvl5/mailu/rcust/shot (32 files) were at the repo root; move them into machine-docs/ to match the mandated file-location rule (DECISIONS/DEFERRED/INBOX + older phases already live there). AGENTS.md gains an explicit File-location rule. No content change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# JOURNAL-shot.md — Builder journal, phase `shot`
## 2026-06-11 ~01:17–01:35Z — phase open, P1+P2 in one sweep

Read the phase plan + plan.md §6.1/§7/§9. Enumerated enrolled recipes (19). Pulled per-recipe
latest-run data off cc-ci (`results.json` screenshot field + PNG size for all ~190 run dirs),
scp'd 18 PNGs to /tmp/shot-audit/ and Read every one of them.

Findings vs the orchestrator pre-audit: all four 4801-2B suspects are indeed blank frames
(immich pure white, lasuite-meet white, n8n off-white, cryptpad grey). keycloak 8.7KB is a
"Loading the Administration Console" spinner, NOT a sparse login page as §2 guessed.
lasuite-docs/drive ~5.9KB are lone spinners. Two surprises: (1) mattermost-lts 242KB, classed
healthy by size, is actually the brand splash/loading screen, not the login form, so size
heuristics lie in both directions; (2) mumble serves a real web page (mumble-web client per
compose.mumbleweb.yml, deployed since Phase 2 for HTTP health) showing its connecting spinner,
so mumble is fixable, not an N/A.

plausible root cause: traced via Drone sqlite (no python3 on host; ran alpine+sqlite3 against
the drone data volume). Build 357 log t=73s: capture failed, last status=500 after 45s. Cross-ref
tests/plausible/functional/test_health_check.py: `/` 500s via auth_controller under
DISABLE_AUTH=true, which is permanent, not an init race. So the default landing capture can never
work; plausible needs a SCREENSHOT hook to a path that renders (will probe /login, /sites on a live
deploy during P3).

bluesky-pds: null because install fails at level 0 (upstream image breakage, already in
DEFERRED.md from rcust); capture is gated on deploy_ok, so it is correctly skipped. N/A while
upstream is broken.

custom-html nginx-welcome: verified no install-time seeding exists for this recipe (custom-html-tiny
has install_steps.sh; custom-html only seeds in pre_backup/pre_upgrade ops, after capture). The
nginx default page IS the honest fresh-install view. Leaving OK; flagged in matrix for Adversary.

Adversary opened REVIEW-shot.md with its own cold pre-audit (4f3a747) before my first push.
Good: my visual reads agree with theirs on every overlapping row.

Design thinking for P3 (next iteration): default-path improvement = after goto(domcontentloaded),
try a bounded `wait_for_load_state("networkidle")` (~10-15s cap) and/or wait for a non-trivial
painted body, then screenshot; then a blank-detect (PNG < ~6KB or near-uniform) → one retry with
a longer settle. Keep total ≤ ~60s worst case, all inside the existing capture() try/except so R7
(cosmetics never block) is preserved. Unit tests: blank-detector pure function + retry logic with
a fake page. Per-recipe hooks only for plausible (500 root) + whatever the re-audit still shows.

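
A minimal sketch of the blank-detect + single-retry policy described in the P3 design note, not the harness code itself. `shoot` is a hypothetical callable that settles for the given number of seconds and returns PNG bytes; the 6000-byte cutoff and the 10s/25s settle values are assumptions chosen to fit the "~6KB" and "≤ ~60s" figures, and the near-uniform check is left out.

```python
from typing import Callable, Optional

BLANK_MAX_BYTES = 6_000  # assumed cutoff, taken from the "~6KB" note


def looks_blank(png: bytes, max_bytes: int = BLANK_MAX_BYTES) -> bool:
    """Size-only heuristic: a very small PNG is almost always a uniform frame."""
    return len(png) < max_bytes


def capture_with_retry(
    shoot: Callable[[float], bytes],  # hypothetical: settle N seconds, return PNG bytes
    settle_s: float = 10.0,
    retry_settle_s: float = 25.0,
) -> Optional[bytes]:
    """Return the best frame, or None if capture raised (cosmetics never block)."""
    try:
        first = shoot(settle_s)
        if not looks_blank(first):
            return first
        # Exactly one retry with a longer settle; keep whichever frame is larger.
        second = shoot(retry_settle_s)
        return max(first, second, key=len)
    except Exception:
        return None
```

A size-only check would not have flagged the 242KB mattermost-lts splash screen, so this only covers the "too small" direction of the heuristic.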
## 2026-06-11 ~05:45-06:00Z — plausible root cause was a 62-char SECRET_KEY_BASE; M1 PASSed meanwhile

M1 PASS (ae10b55) with a watch-list. P3 done in two commits: ce50f64 (harness settle+blank-retry,
6 unit tests, 205 pass, lint PASS) and b98a471 (plausible fix). The plausible story changed under
probing: three live probes (shot-probe{,2,3}-plausible) showed / and every HTML route 302→/register,
which 500s; app logs gave the smoking gun: `(ArgumentError) cookie store expects conn.secret_key_base
to be at least 64 bytes`. Our EXTRA_ENV value (comment claimed "64-char") measures 62. So every
page render 500'd while /api/* (no cookie store) passed all tiers. NOT auth_controller/DISABLE_AUTH
as the old comments claimed; corrected both stale comments. Fix = 68-char value; verified by the
shot-fix-plausible run: install pass, screenshot.png 64132B = real registration page (empty fields,
placeholders only; same safe shape the Adversary blessed for n8n/uptime-kuma). No hook needed.

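
The 62-vs-64 mismatch is the kind of thing a one-line length check catches before deploy. A sketch, assuming the only constraint is the 64-byte minimum quoted in the error; `check_secret_key_base` and `new_secret_key_base` are hypothetical helper names, not part of the harness.

```python
import secrets

MIN_SECRET_BYTES = 64  # minimum quoted in the ArgumentError


def check_secret_key_base(value: str) -> None:
    """Raise if the value is shorter than the cookie store's minimum."""
    n = len(value.encode("utf-8"))
    if n < MIN_SECRET_BYTES:
        raise ValueError(f"SECRET_KEY_BASE is {n} bytes; need >= {MIN_SECRET_BYTES}")


def new_secret_key_base(nbytes: int = 48) -> str:
    """Generate a fresh value; 48 random bytes encode to 64 URL-safe characters."""
    return secrets.token_urlsafe(nbytes)
```

Measuring the value in code rather than trusting a comment is the point: the stale "64-char" comment is what hid the bug.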

P4 started: !testme posted 05:56:32Z on immich#2 + plausible#3 (drone builds 370+371 running,
concurrent). Manual full proof run for keycloak launched (shot-proof-keycloak). Remaining queue:
mattermost-lts, cryptpad, lasuite-meet, lasuite-docs, lasuite-drive, n8n, mumble.

## 2026-06-11 ~06:05-06:30Z — proof sweep underway; A1 fixed; mumble is the holdout

Proofs verified visually so far (each level matches its baseline): drone 370 immich L4 234KB real
onboarding card (was 4801B); drone 371 plausible L4 64KB registration page (was null); keycloak L4
real sign-in form (was loading spinner); cryptpad L4 real landing w/ document picker (was grey blank);
lasuite-meet L4 real product landing (was white blank); mattermost-lts L2(=m2r baseline L2): real
page, but it's the desktop-or-browser interstitial, so per the watch-list I added the first
SCREENSHOT hook (80e5713, → /login + public settle()); re-run pending.

A1 (blank-retry could regress a larger frame): fixed in 7ad7d1f. Retry goes to a temp path and
only replaces via os.replace when >= first; regression test [9999,4801]→9999. 207 unit, lint PASS.

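
The A1 fix can be sketched as follows, assuming the first frame is already on disk at `final`. `shoot_to` is a hypothetical callable that writes the retry screenshot to the path it is given; the real fix in 7ad7d1f may be structured differently.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def retry_keep_larger(final: Path, shoot_to: Callable[[Path], None]) -> int:
    """Retry into a temp file; replace `final` only if the retry is >= in size.

    Returns the size of whatever ends up at `final`.
    """
    first_size = final.stat().st_size
    # Temp file lives next to `final` so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(suffix=".png", dir=final.parent)
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        shoot_to(tmp)
        if tmp.stat().st_size >= first_size:
            os.replace(tmp, final)
    finally:
        # No-op when the temp file was already moved into place.
        tmp.unlink(missing_ok=True)
    return final.stat().st_size
```

With a 9999-byte first frame and a 4801-byte retry, `final` stays at 9999 bytes, which is the regression case the journal entry names.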

mumble: proof run still spinner after settle+retry (7980B). Probing live what mumble-web does over
90s (it printed real mumble-web HTML while up; suspect autoconnect overlay that never resolves
because the websocket voice path may not be browser-reachable). Orchestrated probe2 running.
Also in flight: n8n + lasuite-docs proofs from the A1-fixed tree. Queue: lasuite-drive, mattermost
re-run; then ghost/hedgedoc/etc. healthy-class citations + dashboard/card check + runtime compare.

## 2026-06-11 ~06:40-07:15Z — mattermost solved via click-through; mumble settled as best-available; M2 assembled

mattermost: hook v1 (/login) produced a byte-identical interstitial PNG; mattermost shows the
desktop-or-browser chooser on ANY first-visit route. Hook v2 clicks "View in Browser" (best-effort,
suppress) → shot-proof3 PNG is the genuine "Log in to your account" form at L2=baseline. That's
watch-list item 3 satisfied the hard way.

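
A sketch of what a best-effort click-through hook like v2 could look like. The hook signature `(page, settle)` and the 5s click timeout are assumptions; `page` is expected to behave like a Playwright sync `Page`, and `settle` stands in for the harness's public settle() helper.

```python
import contextlib
from typing import Any, Callable


def mattermost_screenshot_hook(page: Any, settle: Callable[[Any], None]) -> None:
    """Dismiss the desktop-or-browser chooser if present, then settle.

    Hypothetical signature: `page` is a Playwright-style page object.
    """
    # Best-effort: a missing link or a timeout must never fail the capture.
    with contextlib.suppress(Exception):
        page.get_by_text("View in Browser").click(timeout=5_000)
    settle(page)
```

Suppressing the click failure keeps the hook safe on deployments where the chooser does not appear; the settle still runs either way.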

mumble: three live probes. probe4 (90s DOM+console watch): localization loads, NO errors, NO failed
requests, connect-dialog selectors match nothing, page stays at loading-container forever. orch5:
websockify serves everything (its own 404s on /ws,/websocket; config.local.js = untouched sample, no
autoconnect). Conclusion: the pinned mumble-web:0.5 client never paints for an anonymous visitor.
That is not a capture bug, and not fixable harness-side without changing the deploy (guardrail says
upstream). Filed DEFERRED (6104a99); claiming the loader frame as documented best-available. Voice =
the recipe's function and is protocol-tested; the Adversary may still want a different disposition,
which is their call at the gate.

Ops lessons this stretch: 3 simultaneous run launches race on abra catalogue fetch (lasuite-drive
died "unable to update catalogue"; reran solo green), so stagger launches. Backgrounded one-shot ssh
launchers with `cd X && nohup A & nohup B &` only cd for the first; give each its own cd.

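
Both ops lessons (stagger the launches, give every job its own working directory) can be expressed in one launcher. This is an illustrative sketch, not the launcher actually used; `launch_staggered` and the 5s default gap are assumptions.

```python
import subprocess
import time
from typing import List, Sequence, Tuple


def launch_staggered(
    jobs: Sequence[Tuple[str, Sequence[str]]],  # (working directory, argv) per job
    gap_s: float = 5.0,
) -> List[subprocess.Popen]:
    """Start each job in its own cwd, sleeping `gap_s` between launches."""
    procs = []
    for i, (cwd, argv) in enumerate(jobs):
        if i:
            time.sleep(gap_s)  # avoid racing on shared startup work
        procs.append(
            subprocess.Popen(
                list(argv),
                cwd=cwd,  # explicit per job; nothing is inherited from a prior cd
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        )
    return procs
```

Passing `cwd` per process removes the shell pitfall entirely, since no job depends on an earlier `cd` having taken effect.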

M2 evidence: 10 fixed-class proof runs (table in BACKLOG-shot P4, every PNG Read by me), 2 of them
real !testme drone builds (370/371, durations 198s/166s vs 199s/209s baselines; plausible is FASTER
since capture stops burning its 45s fail window), healthy-class cited from P1, dashboard grid/card/
badge all 200. Claiming M2.

## 2026-06-11 ~07:20Z — phase complete

M2 PASS (2b54adb): 18/18 PNGs independently Read, both !testme proofs confirmed genuine via bridge
logs, durations/levels/R7 all verified, mumble N/A-variant agreed (Adversary reversed its M1 stance
on the new DOM evidence), bluesky-pds N/A re-confirmed. Wrote ## DONE. Loop ends.