cc-ci-orchestrator/cc-ci-plan/plan-phase-shot-screenshots.md
autonomic-bot 7c042c2f2a plan: phase 'shot' — recipe screenshot audit & repair (queued after rcust)
Audit every enrolled recipe's CI badge/card screenshot, diagnose defects
(plausible null-every-run; ~4.8KB blank-frame SPAs: immich/lasuite-meet/
cryptpad/flaky n8n), fix via harness default-wait improvement first, per-recipe
SCREENSHOT hooks second; M1 audit matrix + M2 visually-verified PNGs on fresh
real-CI runs (>=2 !testme). Cosmetics-never-block and secret-safety guardrails
binding. Also: temporary hourly-wake instruction to verify the new limit-wait
system tonight; journal entry.
2026-06-11 01:17:32 +00:00


Phase shot — recipe screenshot audit & repair

Mission: every enrolled recipe's CI badge/dashboard card shows a real, representative screenshot of the deployed app. Iterate through ALL enrolled recipes, audit the screenshot that their latest runs produced, diagnose every defect, fix it, and prove the fix with a fresh real-CI run whose PNG is visually verified (you can Read a PNG — look at it).

State files for this phase (phase-namespaced, per §6.1 of plan.md): STATUS-shot.md, BACKLOG-shot.md, REVIEW-shot.md, JOURNAL-shot.md. DECISIONS.md shared.


1. System under audit (file map)

  • runner/harness/screenshot.py (94 lines) — capture(domain, out_path, recipe_meta=): Playwright chromium, viewport 1280×800, NAV_DEADLINE_S = 45. Default path: goto_with_retry(..., wait_until="domcontentloaded") then page.screenshot(). Optional per-recipe SCREENSHOT(page, ctx) hook from recipe_meta (delivered since the rcust single-loader landed; it was a dead knob before — spec §8 R2).
  • runner/run_recipe_ci.py:~1017-1035 — capture invoked while the app is up, outside the deploy try/except; double-wrapped so it can NEVER affect the verdict. screenshot field in results.json = relative PNG name iff capture succeeded, else null.
  • runner/harness/card.py:183-208 — summary card embeds the PNG; "no screenshot" placeholder when null.
  • dashboard/dashboard.py:144-156, 265-275 — grid thumbnail from /runs/<n>/screenshot.png, has_screenshot from results.json; /badge/<recipe>.svg.
  • Artifacts: /var/lib/cc-ci-runs/<run>/ on the CI server (ssh alias cc-ci; NOTE: no python3 in the default PATH on that host, so use shell/grep/stat for remote inspection, or scp the PNGs/JSON locally to look at them).
  • Unit tests: tests/unit/test_screenshot.py, test_card.py, test_dashboard.py.
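The best-effort contract described for run_recipe_ci.py (screenshot field = relative PNG name iff capture succeeded, else null, and never any effect on the verdict) can be sketched as below. This is a minimal illustration, not the actual harness code: the function name best_effort_screenshot, the injected capture callable, and the "stat failed" message are hypothetical. Only the two log phrases "capture failed" and "produced no file" echo the prints this plan mentions.

```python
from pathlib import Path
from typing import Callable, Optional


def best_effort_screenshot(capture: Callable[[Path], None], run_dir: Path,
                           name: str = "screenshot.png") -> Optional[str]:
    """Return the relative PNG name iff capture left a non-empty file, else None.

    Hypothetical helper: the real call site in run_recipe_ci.py may differ.
    """
    out_path = run_dir / name
    try:
        capture(out_path)
    except Exception as exc:  # cosmetics never block: swallow and report
        print(f"screenshot: capture failed: {exc!r}")
        return None
    try:
        if out_path.is_file() and out_path.stat().st_size > 0:
            return name
    except OSError as exc:  # second wrap: even the existence check must not raise
        print(f"screenshot: stat failed: {exc!r}")
        return None
    print("screenshot: produced no file")
    return None
```

The caller would store the return value directly as the screenshot field of results.json, so a failed capture serializes as null and the card and dashboard fall back to the placeholder.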

2. Evidence already in hand (orchestrator pre-audit, 2026-06-11, last ~120 runs)

Three classes observed in /var/lib/cc-ci-runs/*/results.json + PNG sizes:

  1. Never captured — screenshot: null on EVERY run: plausible (runs 122→357, all null). Root cause unknown; find it. Candidates: capture failing, the step never being reached on plausible's run shape, or an exception swallowed by design. Check the run logs for the "screenshot: capture failed" / "produced no file" prints.
  2. Suspected blank frames — implausibly small PNGs, byte-identical sizes across different apps: immich 4801B (every run), lasuite-meet 4801B, n8n sometimes 4801B (other runs 29-30KB — flaky), cryptpad 4802B. A 1280×800 PNG at ~4.8KB is near-certainly a solid white/blank page: SPA shells where domcontentloaded fires before the JS app paints. Borderline, verify visually: lasuite-docs ~5.9KB, lasuite-drive ~5.9KB, keycloak ~8.7KB (might be a genuine sparse login page).
  3. Healthy (sanity reference): ghost 444KB, mattermost-lts 242KB, hedgedoc 132KB, discourse 67KB, custom-html 35KB, mailu 34KB, matrix-synapse 33KB, uptime-kuma 31KB, custom-html-tiny 13KB.
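The three classes above suggest a cheap size-based first pass before any visual check. A minimal sketch follows; the thresholds (5,000 and 10,000 bytes) and the labels are assumptions chosen only to separate the sizes listed above, not values from the harness, and a size bucket never replaces actually looking at the PNG.

```python
from typing import Optional

# ASSUMPTION: thresholds picked to separate the sizes observed in the pre-audit
# (4801-4802 B blank suspects, ~5.9-8.7 KB borderline, >= 13 KB healthy).
BLANK_MAX_BYTES = 5_000
BORDERLINE_MAX_BYTES = 10_000


def triage_png_size(size_bytes: Optional[int]) -> str:
    """Bucket a screenshot by file size; None means results.json had screenshot: null."""
    if size_bytes is None:
        return "NULL"
    if size_bytes <= BLANK_MAX_BYTES:
        return "SUSPECT-BLANK"
    if size_bytes <= BORDERLINE_MAX_BYTES:
        return "BORDERLINE"
    return "LIKELY-OK"
```

For example, triage_png_size(4801) falls in the suspect-blank bucket and triage_png_size(13_000) in the likely-OK bucket; everything in between still needs a visual verdict.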

Recipe enumeration is YOURS to make authoritative: every tests/<recipe>/recipe_meta.py directory EXCEPT the harness fixtures (_generic, regression, concurrency, custom-html-bkp-bad, custom-html-rst-bad). Expected real set ≈ bluesky-pds, cryptpad, custom-html, custom-html-tiny, discourse, ghost, hedgedoc, immich, keycloak, lasuite-docs, lasuite-drive, lasuite-meet, mailu, matrix-synapse, mattermost-lts, mumble, n8n, plausible, uptime-kuma. Some recipes (e.g. mumble — a voice server) may have no meaningful web UI: a justified N/A (documented in the matrix, Adversary-agreed) is an acceptable outcome for those; the dashboard placeholder is then correct behavior, not a defect.
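The enumeration rule above (every tests/<recipe>/ directory containing recipe_meta.py, minus the harness fixtures) can be sketched as follows. The function name enrolled_recipes is hypothetical; the fixture names are the ones listed in the paragraph above. Run it locally against a cc-ci checkout, since the CI host has no python3 in its default PATH.

```python
from pathlib import Path
from typing import List

# Harness fixtures named in this plan; they are not real recipes.
FIXTURES = {
    "_generic",
    "regression",
    "concurrency",
    "custom-html-bkp-bad",
    "custom-html-rst-bad",
}


def enrolled_recipes(tests_root: str) -> List[str]:
    """List directories under tests_root that hold a recipe_meta.py, minus fixtures."""
    root = Path(tests_root)
    return sorted(
        meta.parent.name
        for meta in root.glob("*/recipe_meta.py")
        if meta.parent.name not in FIXTURES
    )
```

Comparing the result against the expected set in the paragraph above gives a quick check that the audit matrix has no omissions.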

3. Work plan

P1 — Audit matrix. For every enrolled recipe, record in BACKLOG-shot.md: latest run(s) with artifacts; screenshot null/present; PNG size; visual content (Read the PNG: app UI / login page / blank / error page); has_screenshot as the dashboard sees it. Pull PNGs locally (scp) and actually look at each one. Classify: OK / BLANK / NULL / ERROR-PAGE / N-A-candidate.
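One possible shape for a matrix entry is sketched below, assuming BACKLOG-shot.md holds the matrix as a Markdown table. The AuditRow type, its field names, and the column order are hypothetical; only the five classification labels come from P1.

```python
from dataclasses import dataclass
from typing import Optional

CLASSES = ("OK", "BLANK", "NULL", "ERROR-PAGE", "N-A-candidate")


@dataclass
class AuditRow:
    recipe: str
    run: str                    # run id whose artifacts were inspected
    screenshot: Optional[str]   # screenshot field from results.json
    png_bytes: Optional[int]    # PNG size, None when there is no file
    visual: str                 # what the PNG actually shows
    has_screenshot: bool        # as the dashboard sees it
    classification: str

    def __post_init__(self) -> None:
        if self.classification not in CLASSES:
            raise ValueError(f"unknown classification: {self.classification}")

    def to_markdown(self) -> str:
        """Render the row as one Markdown table line."""
        cells = [
            self.recipe,
            self.run,
            self.screenshot or "null",
            "-" if self.png_bytes is None else str(self.png_bytes),
            self.visual,
            "yes" if self.has_screenshot else "no",
            self.classification,
        ]
        return "| " + " | ".join(cells) + " |"
```

Rejecting unknown labels at construction time keeps the matrix limited to the five classes, which makes the Adversary's spot-check easier to diff against reality.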

P2 — Diagnose. Per defective recipe, find the root cause, not the symptom. Expected buckets: (a) SPA paint race — needs a smarter default wait; (b) plausible's null — step not reached or capture raising, find which from run logs; (c) app-specific (redirect to an external/IdP page, splash screen, etc.) — needs a per-recipe SCREENSHOT hook.
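For bucket (c), a per-recipe hook might look roughly like the sketch below. Several things here are assumptions rather than facts from the harness: that ctx is a dict exposing a "base_url" key, that the hook only prepares the page and the harness still takes the screenshot afterwards, and the selector itself. The page methods used (goto, wait_for_selector) are standard Playwright sync API calls.

```python
def SCREENSHOT(page, ctx):
    """Hypothetical hook: stay on the landing page and wait for visible content.

    ASSUMPTIONS: ctx["base_url"] exists; the harness calls page.screenshot()
    after this hook returns. Check the real ctx signature before copying.
    """
    base_url = ctx["base_url"]
    # Landing page only: no login, no install wizard, no secrets pages.
    page.goto(base_url, wait_until="domcontentloaded", timeout=20_000)
    # Wait until the app has rendered something; the selector is illustrative.
    page.wait_for_selector("main, form, #app > *", state="visible", timeout=15_000)
```

The two timeouts sum to 35 s, which leaves headroom inside the 45 s NAV_DEADLINE_S budget described in the file map.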

P3 — Fix. Preference order:

  1. Harness default improvement (fixes whole classes): e.g. after domcontentloaded, wait briefly for network-idle or first rendered content, with a blank-detection retry (a captured PNG under a few KB → one retry with a stronger wait) — all WITHIN the existing NAV_DEADLINE_S=45 budget. Do not balloon run times: the screenshot step's total worst-case must stay ≤ ~60s.
  2. Per-recipe SCREENSHOT(page, ctx) hooks in tests/<recipe>/recipe_meta.py only where the default genuinely can't work (uniform ctx signature per rcust P3).
  3. results.json/card/dashboard plumbing fixes if the defect is downstream of capture. Unit tests accompany harness changes (tests/unit/test_screenshot.py).
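The blank-detection retry from item 1 can be sketched without tying it to Playwright by injecting the capture step. Everything named here is hypothetical (capture_with_blank_retry, the shoot callable and its two arguments, the 5,000-byte threshold). In a real implementation the "strong" path could, for instance, call Playwright's page.wait_for_load_state("networkidle", timeout=...) before page.screenshot().

```python
import time
from typing import Callable, Optional

BLANK_MAX_BYTES = 5_000  # ASSUMPTION: "under a few KB" read as 5,000 bytes


def capture_with_blank_retry(
    shoot: Callable[[bool, float], Optional[bytes]],
    budget_s: float = 45.0,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[bytes]:
    """Take one shot; if it looks blank, retry once with a stronger wait.

    shoot(strong, timeout_s) returns PNG bytes or None and must honor timeout_s.
    Both attempts share one budget, so the step cannot outlive budget_s by more
    than shoot's own overrun.
    """
    deadline = clock() + budget_s
    first = shoot(False, max(0.0, deadline - clock()))
    if first is not None and len(first) > BLANK_MAX_BYTES:
        return first  # looks like real content, no retry needed
    remaining = deadline - clock()
    if remaining <= 0:
        return first  # budget spent: keep whatever we have
    second = shoot(True, remaining)
    # Prefer the larger image; a tiny PNG is still better than nothing.
    candidates = [c for c in (first, second) if c is not None]
    return max(candidates, key=len, default=None)
```

Passing the clock in makes the deadline logic unit-testable without sleeping, which fits the requirement that harness changes come with tests in tests/unit/test_screenshot.py.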

P4 — Prove. Fresh real-CI run per fixed recipe; verify the new artifact: PNG exists, visually shows the app (login page fine; blank/error not), card + dashboard render it. At least 2 of the proof runs via the drone !testme path. Healthy-class recipes (class 3 above) need no new run — cite the existing artifact + your visual check of it.

4. Gates (Builder claims via claim(shot): ... commit; Adversary verdicts in REVIEW-shot.md)

M1 — Audit + diagnosis complete. The full matrix exists (every enrolled recipe, no omissions), every non-OK entry has a root-cause diagnosis with evidence (log lines, PNG inspection), N/A candidates argued. Adversary independently spot-checks ≥5 recipes' artifacts (including at least plausible + one 4801B case) and verifies the matrix matches reality.

M2 — All screenshots working. Every enrolled recipe is OK or Adversary-agreed N/A: fixes merged to main, fresh proof runs done (≥2 via !testme), and the Adversary has looked at every final PNG (Read tool) and confirms it is a real, representative, credential-free view of the app, and that the dashboard/card shows it. Verdicts, levels, and run durations unaffected (compare runtimes pre/post; screenshot step ≤ ~60s worst case). ## DONE in STATUS-shot.md only after a fresh Adversary M2 PASS.

5. Guardrails (binding)

  • Cosmetics never block (R7 is law): the screenshot path stays best-effort end to end — it must NEVER affect a verdict, fail a run, or hang past its deadline. Any change that could flip a verdict on screenshot failure is an automatic Adversary FAIL.
  • Secret-safety (cardinal): never capture a page displaying generated credentials (install wizards with admin passwords, secrets pages). Default = landing page. Any new SCREENSHOT hook must keep this guarantee and the Adversary must check it explicitly.
  • No gate weakening anywhere — tests/assertions in the suite are untouchable except the screenshot-specific unit tests you extend.
  • Changes live in the cc-ci repo only (harness + tests/<recipe>/). Never push to recipe mirror repos' main and never merge their PRs.
  • Real-CI etiquette: ≤2-3 concurrent live deploys; NEVER git-checkout ~/.abra/recipes/<recipe> on cc-ci while its build runs; tear down every dev deploy on every exit path; never re-print secrets/tokens into logs or commits.
  • The CI host has no python3 in default PATH for remote one-liners — use shell, or run python via the harness venv (cc-ci-run) on the host, or copy files local.
  • Commit author: autonomic-bot <autonomic-bot@noreply.git.autonomic.zone>; push to git.autonomic.zone right after every commit.

6. Definition of Done

All enrolled recipes' CI cards show verified-real screenshots (or documented N/A), M1 + M2 Adversary-PASSed fresh, harness changes unit-tested, no verdict/runtime regressions, all work merged to cc-ci main and pushed.