cc-ci-orchestrator/cc-ci-plan/plan-phase-shot-screenshots.md
autonomic-bot 7c042c2f2a plan: phase 'shot' — recipe screenshot audit & repair (queued after rcust)
Audit every enrolled recipe's CI badge/card screenshot, diagnose defects
(plausible null-every-run; ~4.8KB blank-frame SPAs: immich/lasuite-meet/
cryptpad/flaky n8n), fix via harness default-wait improvement first, per-recipe
SCREENSHOT hooks second; M1 audit matrix + M2 visually-verified PNGs on fresh
real-CI runs (>=2 !testme). Cosmetics-never-block and secret-safety guardrails
binding. Also: temporary hourly-wake instruction to verify the new limit-wait
system tonight; journal entry.
2026-06-11 01:17:32 +00:00


# Phase `shot` — recipe screenshot audit & repair
**Mission:** every enrolled recipe's CI badge/dashboard card shows a real, representative
screenshot of the deployed app. Iterate through ALL enrolled recipes, audit the screenshot
that their latest runs produced, diagnose every defect, fix it, and prove the fix with a
fresh real-CI run whose PNG is **visually verified** (you can Read a PNG — look at it).
State files for this phase (phase-namespaced, per §6.1 of plan.md):
`STATUS-shot.md`, `BACKLOG-shot.md`, `REVIEW-shot.md`, `JOURNAL-shot.md`. DECISIONS.md shared.
---
## 1. System under audit (file map)
- `runner/harness/screenshot.py` (94 lines) — `capture(domain, out_path, recipe_meta=)`:
Playwright chromium, viewport 1280×800, `NAV_DEADLINE_S = 45`. Default path:
`goto_with_retry(..., wait_until="domcontentloaded")` then `page.screenshot()`.
Optional per-recipe `SCREENSHOT(page, ctx)` hook from recipe_meta (delivered since the
rcust single-loader landed; it was a dead knob before — spec §8 R2).
- `runner/run_recipe_ci.py:~1017-1035` — capture invoked while the app is up, outside the
deploy try/except; double-wrapped so it can NEVER affect the verdict. `screenshot` field
in results.json = relative PNG name iff capture succeeded, else null.
- `runner/harness/card.py:183-208` — summary card embeds the PNG; "no screenshot"
placeholder when null.
- `dashboard/dashboard.py:144-156, 265-275` — grid thumbnail from
`/runs/<n>/screenshot.png`, `has_screenshot` from results.json; `/badge/<recipe>.svg`.
- Artifacts: `/var/lib/cc-ci-runs/<run>/` on the CI server (ssh alias `cc-ci`).
NOTE: that host has no python3 in its default PATH (see §5); use shell/grep/stat for
remote inspection, or scp the PNGs/JSON locally to look at them.
- Unit tests: `tests/unit/test_screenshot.py`, `test_card.py`, `test_dashboard.py`.
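
The "can NEVER affect the verdict" contract described for `run_recipe_ci.py` can be sketched as a wrapper that swallows every failure and reports only a PNG name or nothing. This is a minimal illustration, not the harness code: `best_effort_screenshot`, the `capture_fn` stand-in, and the default file name are assumptions, and the printed strings only mirror the log prints named in §2.

```python
from pathlib import Path
from typing import Callable, Optional


def best_effort_screenshot(
    capture_fn: Callable[[str, Path], None],  # hypothetical stand-in for capture()
    domain: str,
    run_dir: Path,
    png_name: str = "screenshot.png",  # assumed file name
) -> Optional[str]:
    """Return the relative PNG name iff a non-empty file was produced, else None.

    Never raises: every failure is printed and swallowed, so the caller's
    verdict cannot change because of a screenshot problem.
    """
    out_path = run_dir / png_name
    try:
        capture_fn(domain, out_path)
        if out_path.is_file() and out_path.stat().st_size > 0:
            return png_name
        print("screenshot: capture produced no file")
    except Exception as exc:  # cosmetic step: swallow everything
        print(f"screenshot: capture failed: {exc!r}")
    return None
```

The return value maps directly onto the `screenshot` field semantics above: a relative name on success, `None` (JSON `null`) otherwise.
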
## 2. Evidence already in hand (orchestrator pre-audit, 2026-06-11, last ~120 runs)
Three classes observed in `/var/lib/cc-ci-runs/*/results.json` + PNG sizes:
1. **Never captured — `screenshot: null` on EVERY run:** `plausible` (runs 122→357, all
null). Root cause unknown; find it. Candidates: the capture failing, the step never being
reached on plausible's run shape, or an exception swallowed by design. Check the run logs
for the `screenshot: capture failed` / `produced no file` prints.
2. **Suspected blank frames — implausibly small PNGs, byte-identical sizes across
different apps:** `immich` 4801B (every run), `lasuite-meet` 4801B, `n8n` sometimes
4801B (other runs 29-30KB — flaky), `cryptpad` 4802B. A 1280×800 PNG at ~4.8KB is
near-certainly a solid white/blank page: SPA shells where `domcontentloaded` fires
before the JS app paints. Borderline, verify visually: `lasuite-docs` ~5.9KB,
`lasuite-drive` ~5.9KB, `keycloak` ~8.7KB (might be a genuine sparse login page).
3. **Healthy (sanity reference):** ghost 444KB, mattermost-lts 242KB, hedgedoc 132KB,
discourse 67KB, custom-html 35KB, mailu 34KB, matrix-synapse 33KB, uptime-kuma 31KB,
custom-html-tiny 13KB.
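
The two signals used in this pre-audit (implausibly small files, and byte-identical sizes across unrelated apps) can be turned into a quick triage pass before any PNG is opened. The sketch below is illustrative only: the function names and the 5,000 / 10,000 byte thresholds are assumptions chosen to separate the sizes listed above, and a size bucket is never a substitute for looking at the image.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def triage_png_size(size_bytes: Optional[int]) -> str:
    """Bucket a screenshot by file size alone (thresholds are assumptions)."""
    if size_bytes is None:
        return "NULL"
    if size_bytes < 5_000:
        return "BLANK-suspect"
    if size_bytes < 10_000:
        return "borderline"
    return "healthy-looking"


def identical_size_groups(sizes: Dict[str, int]) -> Dict[int, List[str]]:
    """Return sizes shared by two or more recipes, a hint that the frames are the same blank page."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for recipe, size in sizes.items():
        groups[size].append(recipe)
    return {size: sorted(names) for size, names in groups.items() if len(names) > 1}


# Exact byte counts quoted in the evidence above.
observed = {"immich": 4801, "lasuite-meet": 4801, "cryptpad": 4802}
print(identical_size_groups(observed))
print({name: triage_png_size(size) for name, size in observed.items()})
```
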
Recipe enumeration is YOURS to make authoritative: every `tests/<recipe>/` directory that
contains a `recipe_meta.py`, EXCEPT the harness fixtures (`_generic`, `regression`, `concurrency`,
`custom-html-bkp-bad`, `custom-html-rst-bad`). Expected real set ≈ bluesky-pds, cryptpad,
custom-html, custom-html-tiny, discourse, ghost, hedgedoc, immich, keycloak, lasuite-docs,
lasuite-drive, lasuite-meet, mailu, matrix-synapse, mattermost-lts, mumble, n8n, plausible,
uptime-kuma. Some recipes (e.g. mumble — a voice server) may have no meaningful web UI:
a **justified N/A** (documented in the matrix, Adversary-agreed) is an acceptable outcome
for those; the dashboard placeholder is then correct behavior, not a defect.
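
The enumeration rule above can be sketched as a glob over the tests directory. This assumes a local checkout of the cc-ci repo; `enrolled_recipes` and `HARNESS_FIXTURES` are illustrative names, and the fixture list is copied from the paragraph above.

```python
from pathlib import Path
from typing import List

# Harness fixtures named in this plan; they are not real recipes.
HARNESS_FIXTURES = frozenset({
    "_generic",
    "regression",
    "concurrency",
    "custom-html-bkp-bad",
    "custom-html-rst-bad",
})


def enrolled_recipes(tests_root: Path) -> List[str]:
    """List every tests/<recipe>/ directory holding a recipe_meta.py, minus fixtures."""
    return sorted(
        meta.parent.name
        for meta in tests_root.glob("*/recipe_meta.py")
        if meta.parent.name not in HARNESS_FIXTURES
    )
```

Running `enrolled_recipes(Path("tests"))` from the repo root and diffing the result against the expected set above would surface any recipe that was added or dropped since this plan was written.
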
## 3. Work plan
**P1 — Audit matrix.** For every enrolled recipe, record in `BACKLOG-shot.md`: latest
run(s) with artifacts; screenshot null/present; PNG size; **visual content** (Read the
PNG: app UI / login page / blank / error page); `has_screenshot` as the dashboard sees it.
Pull PNGs locally (scp) and actually look at each one. Classify: OK / BLANK / NULL /
ERROR-PAGE / N-A-candidate.
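
The mechanical columns of the matrix can be filled from run directories that were copied locally, leaving the visual columns for a human (or Read-tool) pass. A minimal sketch, assuming `results.json` is a JSON object with the `screenshot` field described in §1; `AuditRow`, `audit_row`, and `to_markdown` are hypothetical helpers, not part of the harness.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Optional


@dataclass
class AuditRow:
    recipe: str
    run: str
    screenshot: Optional[str]      # value of the `screenshot` field in results.json
    png_bytes: Optional[int]       # size on disk, None if the PNG is missing
    visual: str = "TODO"           # filled in by hand after looking at the PNG
    classification: str = "TODO"   # OK / BLANK / NULL / ERROR-PAGE / N-A-candidate


def audit_row(recipe: str, run_dir: Path) -> AuditRow:
    """Build one matrix row from a locally copied run directory."""
    results = json.loads((run_dir / "results.json").read_text(encoding="utf-8"))
    name = results.get("screenshot")
    png = run_dir / name if name else None
    size = png.stat().st_size if png is not None and png.is_file() else None
    return AuditRow(recipe, run_dir.name, name, size)


def to_markdown(rows: Iterable[AuditRow]) -> str:
    """Render rows as a Markdown table suitable for BACKLOG-shot.md."""
    lines = [
        "| recipe | run | screenshot | PNG bytes | visual content | class |",
        "|---|---|---|---|---|---|",
    ]
    for r in rows:
        shot = r.screenshot if r.screenshot is not None else "null"
        size = str(r.png_bytes) if r.png_bytes is not None else "-"
        lines.append(
            f"| {r.recipe} | {r.run} | {shot} | {size} | {r.visual} | {r.classification} |"
        )
    return "\n".join(lines)
```

The `visual` and `classification` columns deliberately default to `TODO`: size and nullness can be scripted, but the classification requires actually looking at each PNG.
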
**P2 — Diagnose.** Per defective recipe, find the root cause, not the symptom. Expected
buckets: (a) SPA paint race — needs a smarter default wait; (b) plausible's null — step
not reached or capture raising, find which from run logs; (c) app-specific (redirect to
an external/IdP page, splash screen, etc.) — needs a per-recipe `SCREENSHOT` hook.
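
For bucket (b), the first split is whether the run log contains either of the prints named in §2. The sketch below works on log text that has already been copied locally; the bucket labels and the `diagnose_null` helper are illustrative, and the marker strings are taken from this plan rather than verified against the harness source.

```python
from typing import List, Tuple

# Marker substrings quoted in section 2, mapped to illustrative bucket labels.
MARKERS = {
    "screenshot: capture failed": "capture-raised",
    "produced no file": "no-file-produced",
}


def diagnose_null(log_text: str) -> Tuple[str, List[str]]:
    """Return (bucket, evidence lines) for a run whose screenshot field is null."""
    evidence = [
        line.strip()
        for line in log_text.splitlines()
        if any(marker in line for marker in MARKERS)
    ]
    for marker, bucket in MARKERS.items():
        if any(marker in line for line in evidence):
            return bucket, evidence
    # Neither print appears: the capture step was probably never reached.
    return "no-screenshot-marker", evidence
```

The evidence lines are returned alongside the bucket so they can be pasted into the matrix as the log-line proof that M1 asks for.
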
**P3 — Fix.** Preference order:
1. **Harness default improvement** (fixes whole classes): e.g. after `domcontentloaded`,
wait briefly for network-idle or first rendered content, with a blank-detection retry
(a captured PNG under a few KB → one retry with a stronger wait) — all WITHIN the
existing `NAV_DEADLINE_S=45` budget. Do not balloon run times: the screenshot step's
total worst-case must stay ≤ ~60s.
2. **Per-recipe `SCREENSHOT(page, ctx)` hooks** in `tests/<recipe>/recipe_meta.py` only
where the default genuinely can't work (uniform ctx signature per rcust P3).
3. results.json/card/dashboard plumbing fixes if the defect is downstream of capture.
Unit tests accompany harness changes (`tests/unit/test_screenshot.py`).
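
One possible shape for the blank-detection retry in option 1 is sketched below. It is duck-typed against a Playwright sync `page` (only `page.screenshot(path=...)` and `page.wait_for_load_state("networkidle", timeout=...)` are used), and the function name, the 5,000-byte threshold, and the monotonic-deadline parameter are assumptions, not the harness's actual design.

```python
import time
from pathlib import Path

BLANK_SUSPECT_BYTES = 5_000  # assumed threshold; the blank frames above are ~4.8KB


def screenshot_with_blank_retry(
    page,
    out_path: Path,
    deadline: float,  # absolute time.monotonic() timestamp for the whole step
    min_bytes: int = BLANK_SUSPECT_BYTES,
) -> int:
    """Take a screenshot; if it looks blank, wait for network idle and retry once.

    Returns the size in bytes of the PNG left on disk.
    """
    page.screenshot(path=str(out_path))
    size = out_path.stat().st_size
    if size >= min_bytes:
        return size

    remaining_ms = int((deadline - time.monotonic()) * 1000)
    if remaining_ms <= 0:
        return size  # out of budget: keep the first frame

    try:
        # Stronger wait, bounded by what is left of the budget.
        page.wait_for_load_state("networkidle", timeout=remaining_ms)
    except Exception:
        pass  # a timeout here is fine; still take the second frame
    page.screenshot(path=str(out_path))
    return out_path.stat().st_size
```

Because the function only needs two methods on `page`, a unit test in `tests/unit/test_screenshot.py` could drive it with a small fake object instead of a real browser. Exceptions from `page.screenshot` itself are left to propagate, on the assumption that the caller's best-effort wrapper handles them.
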
**P4 — Prove.** Fresh real-CI run per fixed recipe; verify the new artifact: PNG exists,
visually shows the app (login page fine; blank/error not), card + dashboard render it.
At least 2 of the proof runs via the drone `!testme` path. Healthy-class recipes (class 3
above) need no new run — cite the existing artifact + your visual check of it.
## 4. Gates (Builder claims via `claim(shot): ...` commit; Adversary verdicts in REVIEW-shot.md)
**M1 — Audit + diagnosis complete.** The full matrix exists (every enrolled recipe, no
omissions), every non-OK entry has a root-cause diagnosis with evidence (log lines, PNG
inspection), N/A candidates argued. Adversary independently spot-checks ≥5 recipes'
artifacts (including at least plausible + one 4801B case) and verifies the matrix matches
reality.
**M2 — All screenshots working.** Every enrolled recipe is OK or Adversary-agreed N/A:
fixes merged to main, fresh proof runs done (≥2 via !testme), and the Adversary has
**looked at every final PNG** (Read tool) and confirms it is a real, representative,
credential-free view of the app, and that the dashboard/card shows it. Verdicts, levels,
and run durations unaffected (compare runtimes pre/post; screenshot step ≤ ~60s worst
case). `## DONE` in STATUS-shot.md only after a fresh Adversary M2 PASS.
## 5. Guardrails (binding)
- **Cosmetics never block (R7 is law):** the screenshot path stays best-effort end to
end — it must NEVER affect a verdict, fail a run, or hang past its deadline. Any change
that could flip a verdict on screenshot failure is an automatic Adversary FAIL.
- **Secret-safety (cardinal):** never capture a page displaying generated credentials
(install wizards with admin passwords, secrets pages). Default = landing page. Any new
`SCREENSHOT` hook must keep this guarantee and the Adversary must check it explicitly.
- **No gate weakening anywhere** — tests/assertions in the suite are untouchable except
the screenshot-specific unit tests you extend.
- Changes live in the cc-ci repo only (harness + `tests/<recipe>/`). Never push to the
recipe mirror repos' main and never merge their PRs.
- Real-CI etiquette: ≤2-3 concurrent live deploys; NEVER git-checkout
`~/.abra/recipes/<recipe>` on cc-ci while its build runs; tear down every dev deploy on
every exit path; never re-print secrets/tokens into logs or commits.
- The CI host has no python3 in default PATH for remote one-liners — use shell, or run
python via the harness venv (`cc-ci-run`) on the host, or copy files local.
- Commit author: `autonomic-bot <autonomic-bot@noreply.git.autonomic.zone>`; push to
git.autonomic.zone right after every commit.
## 6. Definition of Done
All enrolled recipes' CI cards show verified-real screenshots (or documented N/A),
M1 + M2 Adversary-PASSed fresh, harness changes unit-tested, no verdict/runtime
regressions, all work merged to cc-ci main and pushed.