plan: phase 'shot' — recipe screenshot audit & repair (queued after rcust)
Audit every enrolled recipe's CI badge/card screenshot, diagnose defects (plausible null-every-run; ~4.8KB blank-frame SPAs: immich/lasuite-meet/cryptpad/flaky n8n), fix via harness default-wait improvement first, per-recipe SCREENSHOT hooks second; M1 audit matrix + M2 visually-verified PNGs on fresh real-CI runs (>=2 !testme). Cosmetics-never-block and secret-safety guardrails binding. Also: temporary hourly-wake instruction to verify the new limit-wait system tonight; journal entry.
@@ -386,3 +386,16 @@ cc-ci-orchestrator-vm, repointed .orchestrator-session-id (old → .bak). Loops
limit-stall window 07:51–08:03 via watchdog kill/reboot/nudge — resilience layer worked as designed.
Open for operator: review/merge immich PR#2 + plausible PR#3 (still green, unmerged); stale
session cc-ci-orchestrator-stale can be killed; recipe-mirrors org still private.

## 2026-06-11 ~01:15 — phase `shot` queued; limit-system night watch
- Operator requested a follow-on phase: audit + repair the per-recipe CI screenshot
  (badge/card) across ALL enrolled recipes. Plan written:
  cc-ci-plan/plan-phase-shot-screenshots.md, queued AFTER rcust in .phases-spec
  (rcust;shot) — watchdog auto-advances on rcust `## DONE`.
- Pre-audit evidence (last ~120 runs): plausible screenshot=null on every run;
  immich/lasuite-meet/cryptpad (+flaky n8n) produce byte-identical ~4.8KB PNGs =
  suspected blank SPA frames; ghost/mattermost/discourse/etc healthy.
- Hourly wakes tonight: TEMPORARY line added to ai-progress-monitor-prompt.txt — verify
  the new limit-wait system (d6e1a70/2e1ab8d) on each wake; remove the line 06-11 daytime.
- Orchestrator renamed cc-ci-orchestrator (was -vm); stale Jun01 squatter killed;
  watchdog bounced twice tonight (limit patch, then hourly-wake-during-limit fallback).
@@ -1 +1 @@
You are the cc-ci orchestrator, woken for your scheduled supervision pass. Read /srv/cc-ci/cc-ci-plan/orchestration.md (the agent map) and do your job per its "The orchestrator's job — keep everyone on track" section — it tells you how to find the CURRENT phase (`python3 cc-ci-plan/launch.py status`) and what to check, nudge, and do. Be decisive but minimal; if everything is healthy and active, just note the state.
You are the cc-ci orchestrator, woken for your scheduled supervision pass. Read /srv/cc-ci/cc-ci-plan/orchestration.md (the agent map) and do your job per its "The orchestrator's job — keep everyone on track" section — it tells you how to find the CURRENT phase (`python3 cc-ci-plan/launch.py status`) and what to check, nudge, and do. Be decisive but minimal; if everything is healthy and active, just note the state. TEMPORARY (operator request 2026-06-11, remove after 2026-06-11 daytime): on each wake tonight also verify the NEW limit-wait system (launch.py limit_tick, commits d6e1a70/2e1ab8d) is working — grep the tail of /srv/cc-ci/.cc-ci-logs/watchdog.log for "limit hit/probe/lifted" lines and check /srv/cc-ci/.cc-ci-logs/.limited-* state files; confirm NO limit-stalled session was kill+rebooted during a limit window, probes fire on schedule, and sessions resume promptly after resets. Report anomalies loudly.
127
cc-ci-plan/plan-phase-shot-screenshots.md
Normal file
@@ -0,0 +1,127 @@
# Phase `shot` — recipe screenshot audit & repair

**Mission:** every enrolled recipe's CI badge/dashboard card shows a real, representative
screenshot of the deployed app. Iterate through ALL enrolled recipes, audit the screenshot
that their latest runs produced, diagnose every defect, fix it, and prove the fix with a
fresh real-CI run whose PNG is **visually verified** (you can Read a PNG — look at it).

State files for this phase (phase-namespaced, per §6.1 of plan.md):
`STATUS-shot.md`, `BACKLOG-shot.md`, `REVIEW-shot.md`, `JOURNAL-shot.md`. DECISIONS.md shared.

---

## 1. System under audit (file map)

- `runner/harness/screenshot.py` (94 lines) — `capture(domain, out_path, recipe_meta=)`:
  Playwright chromium, viewport 1280×800, `NAV_DEADLINE_S = 45`. Default path:
  `goto_with_retry(..., wait_until="domcontentloaded")` then `page.screenshot()`.
  Optional per-recipe `SCREENSHOT(page, ctx)` hook from recipe_meta (delivered since the
  rcust single-loader landed; it was a dead knob before — spec §8 R2).
- `runner/run_recipe_ci.py:~1017-1035` — capture invoked while the app is up, outside the
  deploy try/except; double-wrapped so it can NEVER affect the verdict. `screenshot` field
  in results.json = relative PNG name iff capture succeeded, else null.
- `runner/harness/card.py:183-208` — summary card embeds the PNG; "no screenshot"
  placeholder when null.
- `dashboard/dashboard.py:144-156, 265-275` — grid thumbnail from
  `/runs/<n>/screenshot.png`, `has_screenshot` from results.json; `/badge/<recipe>.svg`.
- Artifacts: `/var/lib/cc-ci-runs/<run>/` on the CI server (ssh alias `cc-ci`;
  NOTE: no python3 in the default PATH on that host (see §5) — use shell/grep/stat for
  remote inspection, or scp the PNGs/JSON locally to look at them).
- Unit tests: `tests/unit/test_screenshot.py`, `test_card.py`, `test_dashboard.py`.
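
The "double-wrapped" capture call in the file map can be pictured with the minimal sketch below. It is an illustration under assumptions, not the code in `run_recipe_ci.py`: the name `best_effort_screenshot` and the injected `capture` callable are hypothetical. Only the `screenshot: capture failed` log prefix and the "PNG name or null" contract come from this plan.

```python
from typing import Callable, Optional


def best_effort_screenshot(capture: Callable[[], Optional[str]]) -> Optional[str]:
    """Return the PNG name if capture succeeded, else None. Never raises.

    `capture` is a hypothetical zero-argument callable that performs the
    screenshot and returns the relative PNG name (or None if no file exists).
    """
    try:
        try:
            name = capture()
        except Exception as exc:
            # First layer: a failing capture is logged and turned into None.
            print(f"screenshot: capture failed: {exc!r}")
            return None
        return name if name else None
    except Exception:
        # Second layer: even a failure while reporting must not escape,
        # so the run verdict can never depend on the screenshot step.
        return None


if __name__ == "__main__":
    def broken() -> Optional[str]:
        raise RuntimeError("browser crashed")

    print(best_effort_screenshot(lambda: "screenshot.png"))
    print(best_effort_screenshot(broken))
```

The return value maps directly onto the `screenshot` field of results.json: a name on success, null otherwise.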

## 2. Evidence already in hand (orchestrator pre-audit, 2026-06-11, last ~120 runs)

Three classes observed in `/var/lib/cc-ci-runs/*/results.json` + PNG sizes:

1. **Never captured — `screenshot: null` on EVERY run:** `plausible` (runs 122→357, all
   null). Root cause unknown — find it (capture failing? step never reached on its run
   shape? exception swallowed by design — check run logs for the
   `screenshot: capture failed` / `produced no file` prints).
2. **Suspected blank frames — implausibly small PNGs, nearly identical in size across
   different apps:** `immich` 4801B (every run), `lasuite-meet` 4801B, `n8n` sometimes
   4801B (other runs 29-30KB — flaky), `cryptpad` 4802B. A 1280×800 PNG at ~4.8KB is
   near-certainly a solid white/blank page: SPA shells where `domcontentloaded` fires
   before the JS app paints. Borderline, verify visually: `lasuite-docs` ~5.9KB,
   `lasuite-drive` ~5.9KB, `keycloak` ~8.7KB (might be a genuine sparse login page).
3. **Healthy (sanity reference):** ghost 444KB, mattermost-lts 242KB, hedgedoc 132KB,
   discourse 67KB, custom-html 35KB, mailu 34KB, matrix-synapse 33KB, uptime-kuma 31KB,
   custom-html-tiny 13KB.
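
The three classes above suggest a cheap size-based triage before the visual pass. The sketch below is only that: the thresholds (5,000 and 10,000 bytes) and the label names are assumptions picked to separate the sizes listed above, and a size alone never replaces looking at the PNG.

```python
from typing import Optional

# Assumed cut-offs, chosen to bracket the sizes observed in the pre-audit.
BLANK_MAX_BYTES = 5_000        # covers the 4801B / 4802B frames
BORDERLINE_MAX_BYTES = 10_000  # covers the ~5.9KB and ~8.7KB cases


def triage_by_size(size_bytes: Optional[int]) -> str:
    """Map a PNG size (None = no screenshot) to a provisional triage label."""
    if size_bytes is None:
        return "NULL"
    if size_bytes <= BLANK_MAX_BYTES:
        return "BLANK-suspect"
    if size_bytes <= BORDERLINE_MAX_BYTES:
        return "BORDERLINE"
    return "OK-candidate"


if __name__ == "__main__":
    samples = {"plausible": None, "immich": 4801, "keycloak": 8700, "ghost": 444_000}
    for recipe, size in samples.items():
        print(recipe, triage_by_size(size))
```

Every label is provisional: a BORDERLINE or OK-candidate PNG can still be an error page, which only the visual check catches.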

Recipe enumeration is YOURS to make authoritative: every `tests/<recipe>/` directory that
contains a `recipe_meta.py`, EXCEPT the harness fixtures (`_generic`, `regression`,
`concurrency`, `custom-html-bkp-bad`, `custom-html-rst-bad`). Expected real set ≈
bluesky-pds, cryptpad, custom-html, custom-html-tiny, discourse, ghost, hedgedoc, immich,
keycloak, lasuite-docs, lasuite-drive, lasuite-meet, mailu, matrix-synapse, mattermost-lts,
mumble, n8n, plausible, uptime-kuma. Some recipes (e.g. mumble — a voice server) may have
no meaningful web UI: a **justified N/A** (documented in the matrix, Adversary-agreed) is
an acceptable outcome for those; the dashboard placeholder is then correct behavior, not a
defect.
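
The enumeration rule above can be expressed as a short sketch, assuming the repo layout described in this plan (`tests/<recipe>/recipe_meta.py`). The function name `enrolled_recipes` is hypothetical; the fixture names are the ones listed above.

```python
from pathlib import Path
from typing import List, Union

# Harness fixtures named in this plan; they are not real recipes.
FIXTURES = {
    "_generic",
    "regression",
    "concurrency",
    "custom-html-bkp-bad",
    "custom-html-rst-bad",
}


def enrolled_recipes(tests_root: Union[str, Path]) -> List[str]:
    """List directories under tests_root that hold a recipe_meta.py, minus fixtures."""
    root = Path(tests_root)
    return sorted(
        meta.parent.name
        for meta in root.glob("*/recipe_meta.py")
        if meta.parent.name not in FIXTURES
    )


if __name__ == "__main__":
    # Run from the repo root; prints one recipe name per line.
    for name in enrolled_recipes("tests"):
        print(name)
```

Comparing this output against the expected set above is a quick way to catch an omitted or unexpected recipe before building the matrix.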

## 3. Work plan

**P1 — Audit matrix.** For every enrolled recipe, record in `BACKLOG-shot.md`: latest
run(s) with artifacts; screenshot null/present; PNG size; **visual content** (Read the
PNG: app UI / login page / blank / error page); `has_screenshot` as the dashboard sees it.
Pull PNGs locally (scp) and actually look at each one. Classify: OK / BLANK / NULL /
ERROR-PAGE / N-A-candidate.
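
The mechanical columns of the matrix (null/present, PNG size) can be filled from a run directory copied to the local machine. The sketch below assumes only what this plan states: results.json has a `screenshot` field holding a relative PNG name or null. The function names, the row layout, and the `TODO` placeholder for the visual column are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def audit_row(run_dir: Union[str, Path]) -> Dict[str, Any]:
    """Collect the mechanical audit fields for one locally copied run directory."""
    run = Path(run_dir)
    results = json.loads((run / "results.json").read_text())
    name = results.get("screenshot")  # relative PNG name, or None
    png = run / name if name else None
    size = png.stat().st_size if png is not None and png.is_file() else None
    return {
        "run": run.name,
        "screenshot": name,
        "png_bytes": size,
        "visual": "TODO",  # filled in by a human/agent after viewing the PNG
    }


def to_markdown_row(recipe: str, row: Dict[str, Any]) -> str:
    """Render one matrix line in Markdown table syntax."""
    shot = row["screenshot"] or "null"
    size = row["png_bytes"] if row["png_bytes"] is not None else "-"
    return f"| {recipe} | {row['run']} | {shot} | {size} | {row['visual']} |"
```

The visual column stays a manual step on purpose: the classification (OK / BLANK / ERROR-PAGE) depends on what the image shows, not on its metadata.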

**P2 — Diagnose.** Per defective recipe, find the root cause, not the symptom. Expected
buckets: (a) SPA paint race — needs a smarter default wait; (b) plausible's null — step
not reached or capture raising, find which from run logs; (c) app-specific (redirect to
an external/IdP page, splash screen, etc.) — needs a per-recipe `SCREENSHOT` hook.

**P3 — Fix.** Preference order:
1. **Harness default improvement** (fixes whole classes): e.g. after `domcontentloaded`,
   wait briefly for network-idle or first rendered content, with a blank-detection retry
   (a captured PNG under a few KB → one retry with a stronger wait) — all WITHIN the
   existing `NAV_DEADLINE_S=45` budget. Do not balloon run times: the screenshot step's
   total worst-case must stay ≤ ~60s.
2. **Per-recipe `SCREENSHOT(page, ctx)` hooks** in `tests/<recipe>/recipe_meta.py` only
   where the default genuinely can't work (uniform ctx signature per rcust P3).
3. results.json/card/dashboard plumbing fixes if the defect is downstream of capture.
Unit tests accompany harness changes (`tests/unit/test_screenshot.py`).
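
One possible shape for the blank-detection retry in option 1 is sketched below. It is deliberately browser-free so it can be unit-tested: `capture_default` and `capture_strong` are hypothetical callables returning PNG bytes, and the 5,000-byte threshold and 60-second budget are assumptions based on the figures in this plan. In a Playwright setting the stronger capture might call `page.wait_for_load_state("networkidle", timeout=...)` before `page.screenshot()`, but that wiring is not shown here.

```python
import time
from typing import Callable

BLANK_MAX_BYTES = 5_000  # assumed "under a few KB" cut-off
STEP_BUDGET_S = 60.0     # assumed worst-case budget for the whole step


def capture_with_blank_retry(
    capture_default: Callable[[], bytes],
    capture_strong: Callable[[float], bytes],
    budget_s: float = STEP_BUDGET_S,
    clock: Callable[[], float] = time.monotonic,
) -> bytes:
    """Capture once; if the PNG looks blank, retry once with a stronger wait.

    capture_strong receives the remaining budget in seconds and is expected
    to respect it. The larger of the two PNGs is returned.
    """
    start = clock()
    png = capture_default()
    if len(png) > BLANK_MAX_BYTES:
        return png  # plausible content, no retry needed
    remaining = budget_s - (clock() - start)
    if remaining <= 0:
        return png  # out of budget: keep the small frame rather than overrun
    try:
        retry = capture_strong(remaining)
    except Exception:
        return png  # best-effort: a failed retry never loses the first PNG
    return retry if len(retry) > len(png) else png
```

Returning the first PNG on any retry failure keeps the step no worse than today's behavior, which matches the best-effort requirement for the screenshot path.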

**P4 — Prove.** Fresh real-CI run per fixed recipe; verify the new artifact: PNG exists,
visually shows the app (login page fine; blank/error not), card + dashboard render it.
At least 2 of the proof runs via the drone `!testme` path. Healthy-class recipes (class 3
above) need no new run — cite the existing artifact + your visual check of it.

## 4. Gates (Builder claims via `claim(shot): ...` commit; Adversary verdicts in REVIEW-shot.md)

**M1 — Audit + diagnosis complete.** The full matrix exists (every enrolled recipe, no
omissions), every non-OK entry has a root-cause diagnosis with evidence (log lines, PNG
inspection), N/A candidates argued. Adversary independently spot-checks ≥5 recipes'
artifacts (including at least plausible + one 4801B case) and verifies the matrix matches
reality.

**M2 — All screenshots working.** Every enrolled recipe is OK or Adversary-agreed N/A:
fixes merged to main, fresh proof runs done (≥2 via !testme), and the Adversary has
**looked at every final PNG** (Read tool) and confirms it is a real, representative,
credential-free view of the app, and that the dashboard/card shows it. Verdicts, levels,
and run durations unaffected (compare runtimes pre/post; screenshot step ≤ ~60s worst
case). `## DONE` in STATUS-shot.md only after a fresh Adversary M2 PASS.

## 5. Guardrails (binding)

- **Cosmetics never block (R7 is law):** the screenshot path stays best-effort end to
  end — it must NEVER affect a verdict, fail a run, or hang past its deadline. Any change
  that could flip a verdict on screenshot failure is an automatic Adversary FAIL.
- **Secret-safety (cardinal):** never capture a page displaying generated credentials
  (install wizards with admin passwords, secrets pages). Default = landing page. Any new
  `SCREENSHOT` hook must keep this guarantee and the Adversary must check it explicitly.
- **No gate weakening anywhere** — tests/assertions in the suite are untouchable except
  the screenshot-specific unit tests you extend.
- Changes live in the cc-ci repo only (harness + tests/<recipe>/). Never push recipe
  mirror repos' main / never merge their PRs.
- Real-CI etiquette: ≤2-3 concurrent live deploys; NEVER git-checkout
  `~/.abra/recipes/<recipe>` on cc-ci while its build runs; tear down every dev deploy on
  every exit path; never re-print secrets/tokens into logs or commits.
- The CI host has no python3 in default PATH for remote one-liners — use shell, or run
  python via the harness venv (`cc-ci-run`) on the host, or copy files local.
- Commit author: `autonomic-bot <autonomic-bot@noreply.git.autonomic.zone>`; push to
  git.autonomic.zone right after every commit.

## 6. Definition of Done

All enrolled recipes' CI cards show verified-real screenshots (or documented N/A),
M1 + M2 Adversary-PASSed fresh, harness changes unit-tested, no verdict/runtime
regressions, all work merged to cc-ci main and pushed.