review(3 U1): PASS — app screenshot cold-verified (R4)

Cold/independent on real cc-ci-run harness:

- 3 screenshot unit tests pass (claim doc said 4; over-count, noted).
- My own live uptime-kuma run produced a valid 1280x800 PNG; eyeballed it: real working UI
  (admin-account setup page, empty fields), NO secret values. results.json
  screenshot="screenshot.png", clean_teardown=true.
- Clean teardown: no orphan uptime-kuma service post-run.
- Graceful degradation (R7): capture vs unresolvable host returns None, no file, no raise
  ("verdict unaffected").
- Wiring R7-safe: capture under if deploy_ok after wait_healthy, before tiers/teardown, outside
  deploy try/except, 45s nav cap; screenshot field set only when file produced.
- Secret-safe by design: landing page only, viewport-only, no wizard autofill; post-login via
  opt-in hook (unused).

R4 cold-verified. No VETO. Builder may proceed to U2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 07:10:05 +00:00
parent d3af7ea80a
commit 74a6993e4b
1 changed file with 68 additions and 3 deletions
--- a/machine-docs/REVIEW-3.md
+++ b/machine-docs/REVIEW-3.md
@@ -11,8 +11,8 @@ JOURNAL-3.md / BACKLOG-3.md `## Build backlog`. I own this file + BACKLOG-3.md `
      status/level badge + summary image, both linking to run/dashboard; re-run updates same comment.
 - [ ] **R3 — Summary card image.** Per-run PNG: recipe+version, level, per-stage/per-test ✔/✘
      breakdown, embedded deployed-app screenshot; stable URL; in comment + dashboard.
- [ ] **R4 — App screenshot.** Runner captures real screenshot of deployed app (Playwright, post-login
-      where needed) for the card.
+- [x] **R4 — App screenshot.** Runner captures real screenshot of deployed app (Playwright, post-login
+      where needed) for the card. **COLD-VERIFIED @U1 07:15Z.**
 - [ ] **R5 — Dashboard polish.** Overview at ci.commoninternet.net resembles ci-apps.yunohost.org:
      recipe grid w/ level badge, latest pass/fail, last version, app screenshot, history link.
 - [ ] **R6 — Badges.** Per-recipe level/status SVG badge endpoint embeddable in READMEs + dashboard.
@@ -23,7 +23,7 @@ JOURNAL-3.md / BACKLOG-3.md `## Build backlog`. I own this file + BACKLOG-3.md `

 ## Milestone gates (each ends with an Adversary gate) — U0..U5
 - [x] U0 — Results schema + level (results.json per-stage/per-test; level correct for L4-pass & L2-cap). **PASS @07:05Z.**
- [ ] U1 — App screenshot (real, post-login, secret-safe).
+- [x] U1 — App screenshot (real, post-login, secret-safe). **PASS @07:15Z.**
 - [ ] U2 — Summary card + badge (HTML→PNG; level/✔✘/screenshot; SVG badge; stable URLs; pass+fail).
 - [ ] U3 — YunoHost-style PR comment (marker+badge+card, linked; updates on re-run; no secrets).
 - [ ] U4 — Dashboard polish (grid mirrors underlying results across several runs).
@@ -158,3 +158,68 @@ may proceed past U0.
  (rendered card/level/screenshot never greener than raw results.json + actual outcomes) at U2–U4.
 - Pre-existing repo-wide lint RED on origin/main (Builder-flagged) is not a Phase-3 DoD item and not
  introduced by U0 — noted, not a finding.
+
+### @2026-05-31T07:15Z — U1 GATE: **PASS** (App screenshot; R4)
+
+**Claim (STATUS-3, `claim(3 U1)` @d7e812e).** The harness captures a real Playwright screenshot of
+the deployed app while it is up (after deploy+readiness, before teardown), writes `screenshot.png` to
+the run artifact dir, is secret-safe by default (landing page, never a credentials page), and is
+best-effort so it never blocks/fails/hangs the run (R7); `results.json` `screenshot` is set to
+`"screenshot.png"` only when a file was produced.
+
+**Verification COLD + INDEPENDENT** (my clone tar'd to a fresh `/tmp/advverify` on cc-ci, run under
+the real `cc-ci-run`; JOURNAL-3 not read before this verdict).
+
+**1. Pure-helper unit tests.** `cc-ci-run -m pytest tests/unit/test_screenshot.py -q` → **3 passed**.
+(STATUS EXPECTED said "4 passed"; the file has exactly **3** test functions. Minor over-count in the
+claim doc — NOT a defect, recorded for honesty.)
+
+**2. Real positive capture — MY OWN live run.** `RECIPE=uptime-kuma STAGES=install,custom
+CCCI_RUN_ID=u1-adv cc-ci-run runner/run_recipe_ci.py` ran to completion (install pass, custom pass,
+exit clean). Artifacts: `/var/lib/cc-ci-runs/u1-adv/{screenshot.png,results.json,junit/}`.
+- I `scp`'d `screenshot.png` to the VM and **EYEBALLED it with the image viewer**: a valid PNG header,
+  **1280×800, 39 773 bytes**, showing uptime-kuma's live **"Create your admin account"** setup page —
+  empty Username / Password / Repeat-Password fields + a Create button. This is **real working app UI**
+  and displays **NO secret values** (a setup form asks the user to *choose* a password; it reveals
+  none). Secret-safe ✔.
+- `results.json`: `screenshot="screenshot.png"`, `level=1` (cap "L2 upgrade … N/A" — correct for an
+  install-only run), `flags={clean_teardown:true, no_secret_leak:true}`, `results={install:pass,
+  custom:pass}`. The screenshot field is set BECAUSE a file was produced. ✔
+
+**3. Clean teardown (live).** Post-run `docker service ls` shows only infra (backups / bridge /
+dashboard / drone / traefik×2) — **no orphan uptime-kuma stack**. ✔
+
+**4. Graceful degradation (R7) — the key cosmetics-never-block invariant.** I drove
+`screenshot.capture("adv-noexist-xyz.ci.commoninternet.net", "/tmp/advx.png")` against an
+unresolvable host: it printed `screenshot: capture failed (non-fatal, verdict unaffected):
+... ERR_NAME_NOT_RESOLVED`, **returned `None`, wrote no file, raised nothing**. A screenshot failure
+cannot fail/hang the run or flip the verdict. ✔
+
+**5. Wiring is R7-safe (code inspection, cold).** `run_recipe_ci.py:968-979` places the capture
+under `if deploy_ok:` AFTER `lifecycle.wait_healthy(...)` and BEFORE any tier mutates state and BEFORE
+the `finally` teardown — so the app is genuinely up and in its cleanest state when shot. It is
+**outside** the deploy `try/except`, so a screenshot issue can never flip `deploy_ok`. `capture()`
+itself wraps everything in `try/except Exception → return None` with a hard `NAV_DEADLINE_S=45`
+cap (can't hang). `screenshot_rel` is `basename(shot) if shot else None`, and the whole
+`build_results`/`write_results` block is itself R7-wrapped. Cosmetics provably cannot change `overall`.
+
+**6. Secret-safety by design.** Default capture is the app landing page (login/setup forms show
+*fields*, not secrets); `full_page=False` (viewport only, no scroll into a secrets panel); the harness
+**never auto-fills an install wizard**; a post-login view is only reachable via an opt-in recipe
+`SCREENSHOT` hook that owns the no-secret-page guarantee — **none used yet**, so no recipe currently
+risks a credential page.
+
+**Cardinal U1 invariant** (screenshot is a faithful live-app capture, never a credentials page, and
+its presence/absence never changes the verdict): **HELD**.
+
+**VERDICT: U1 PASS @2026-05-31T07:15Z.** **R4 (app screenshot) cold-verified.** No VETO. Builder may
+proceed to U2.
+
+**Carry-forward (NOT blocking U1):**
+- The plan's "post-login where the landing page requires it" path (the `SCREENSHOT` hook) is
+  *implemented* but *unexercised on any real recipe* — uptime-kuma's informative landing/setup page
+  doesn't need it. Fine for U1's accept criterion ("working UI, no secrets"); I'll re-scrutinise the
+  hook + secret-safety once a recipe whose landing page is blank/uninformative opts in, and over the
+  served card/dashboard images at U2–U5 (R7 leak authority is mine).
+- STATUS EXPECTED's "4 passed" vs actual 3 unit tests — doc-only over-count; flag to Builder via the
+  honest-reporting rule, no behavioural impact.
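
The R7 invariant checked in items 4 and 5 above (a failed capture returns `None`, leaves no file, raises nothing, and the `screenshot` field is set only when a file exists) could be sketched roughly as follows. This is an illustration under stated assumptions, not the harness's actual `screenshot.capture` code: `best_effort_capture`, `screenshot_field`, `fake_ok`, `fake_fail`, and the injected `shoot` callable are hypothetical names, the real Playwright call is replaced by a stand-in, the cleanup of a partial file is an assumption, and the 45 s navigation cap is not modelled.

```python
import contextlib
import os
from typing import Callable, Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the 8-byte header every PNG file starts with


def best_effort_capture(
    shoot: Callable[[str, str], None], url: str, out_path: str
) -> Optional[str]:
    """Call shoot(url, out_path); return out_path if a file was produced, else None.

    Any exception from shoot is swallowed and reported, so a cosmetic
    failure can never propagate into the run's verdict.
    """
    try:
        shoot(url, out_path)
    except Exception as exc:
        print(f"screenshot: capture failed (non-fatal, verdict unaffected): {exc}")
        # Assumption: drop any partial file so "no file" really holds on failure.
        with contextlib.suppress(OSError):
            os.remove(out_path)
        return None
    return out_path if os.path.isfile(out_path) else None


def screenshot_field(shot: Optional[str]) -> Optional[str]:
    """Value for the results `screenshot` key: a basename, or None if no file."""
    return os.path.basename(shot) if shot else None


def fake_ok(url: str, out_path: str) -> None:
    """Hypothetical stand-in for a successful browser capture."""
    with open(out_path, "wb") as fh:
        fh.write(PNG_SIGNATURE)


def fake_fail(url: str, out_path: str) -> None:
    """Hypothetical stand-in for a capture against an unresolvable host."""
    raise RuntimeError("ERR_NAME_NOT_RESOLVED")
```

Injecting the capture function keeps the sketch runnable without a browser. With a real capture routine passed in as `shoot`, the caller would still only ever see a path or `None`, which is the property the gate relies on.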