# REVIEW-3 — Adversary verdicts for cc-ci Phase 3 (Beautiful YunoHost-style results UX)

SSOT for this phase: `/srv/cc-ci/cc-ci-plan/plan-phase3-results-ux.md`.
This is the Adversary-owned, append-only verdict log for Phase 3. The Builder owns STATUS-3.md / JOURNAL-3.md / BACKLOG-3.md `## Build backlog`. I own this file + BACKLOG-3.md `## Adversary findings`.

## Definition of Done (Phase 3) — R1–R8, each to be Adversary cold-verified within 24h

- [x] **R1 — Level ladder.** Documented ladder (§4.1) maps passed test sets → one integer level per run; a missing lower rung caps the level (YunoHost semantics). **COLD-VERIFIED @U0 07:05Z.**
- [ ] **R2 — Image-forward PR comment.** `!testme` posts/updates a Gitea PR comment: marker (🌻) + status/level badge + summary image, both linking to run/dashboard; re-run updates same comment.
- [ ] **R3 — Summary card image.** Per-run PNG: recipe+version, level, per-stage/per-test ✔/✘ breakdown, embedded deployed-app screenshot; stable URL; in comment + dashboard.
- [x] **R4 — App screenshot.** Runner captures real screenshot of deployed app (Playwright, post-login where needed) for the card. **COLD-VERIFIED @U1 07:15Z.**
- [ ] **R5 — Dashboard polish.** Overview at ci.commoninternet.net resembles ci-apps.yunohost.org: recipe grid w/ level badge, latest pass/fail, last version, app screenshot, history link.
- [ ] **R6 — Badges.** Per-recipe level/status SVG badge endpoint embeddable in READMEs + dashboard.
- [ ] **R7 — Safe & robust.** No secrets in images/comments/badges/screenshots (reuse P1 §4.4 redaction; screenshot must not capture secret values). Image gen never blocks/fails the pipeline: on error → text fallback + recorded failure; verdict unaffected.
- [ ] **R8 — Docs.** docs/ explains ladder, card/screenshot/badge generation, badge embedding.

## Milestone gates (each ends with an Adversary gate) — U0..U5

- [x] U0 — Results schema + level (results.json per-stage/per-test; level correct for L4-pass & L2-cap).
**PASS @07:05Z.**
- [x] U1 — App screenshot (real, post-login, secret-safe). **PASS @07:15Z.**
- [ ] U2 — Summary card + badge (HTML→PNG; level/✔✘/screenshot; SVG badge; stable URLs; pass+fail).
- [ ] U3 — YunoHost-style PR comment (marker+badge+card, linked; updates on re-run; no secrets).
- [ ] U4 — Dashboard polish (grid mirrors underlying results across several runs).
- [ ] U5 — Badges + docs + hardening (leak scan clean; renderer-kill degrades to text; flip DONE).

## Adversary invariants to attack this phase (from §6 guardrails)

1. **Presentation never inflates the verdict** — rendered level/card MUST match raw results.json & actual test outcomes. A card greener than its tests = FAIL.
2. **No secrets in any artifact** — comments, badges, cards, app screenshots (esp. generated admin/app passwords; screenshot must avoid credential pages).
3. **Cosmetics never block the pipeline** — render/screenshot/badge failure degrades to text + warning; never fails or hangs a run; respects P1 timeouts.
4. **No test-weakening to raise a level** — watch for softened tests or mis-mapped rungs inflating displayed quality.

---

## Verdict log (append-only)

### @2026-05-31T05:42Z — Phase-3 Adversary loop live (no gate yet)

Cold orient on first wake into Phase 3. Findings:

- Phase 3 plan read in full (SSOT). DoD = R1–R8; milestones U0–U5; guardrails internalised above.
- **No Phase-3 work exists yet:** no STATUS-3.md / JOURNAL-3.md / BACKLOG-3.md in machine-docs/; no ADVERSARY-INBOX; HEAD = `7123d82 status(2b): ## DONE`. Builder has not started §1/U0.
- **Prerequisite note (not my call, recorded for honesty):** plan-phase3 §0 says "Do not start until Phase 2 STATUS.md shows ## DONE (Adversary-verified)." Phase-2 `## DONE` is **not** yet flipped and REVIEW-2.md carries a **standing VETO** (named upgrade-to-latest checklist satisfied, but full Phase-2 DONE authorization is a separate later step per REVIEW-2 @2026-05-31). Phase 2b IS DONE.
The operator kicked Phase 3 off manually (transition = manual per §Status). Sequencing across phases is an operator call (cf. STATUS-2b note), so I proceed with Phase-3 adversary duties; I am NOT treating the Phase-2 VETO as a Phase-3 blocker, only flagging the dependency.
- Nothing claimed → idle per liveness protocol; watchdog pings me on the first `claim(3...)` commit.

**No verdict. No VETO (Phase-3).** Awaiting Builder's first gate claim.

### @2026-05-31T05:55Z — PRE-CLAIM RECON (NOT a verdict): U0.1 pure level() mapper fuzz-clean

Builder committed `9773e3f feat(3 U0.1): pure level() ladder mapper + unit tests` but has NOT claimed any gate (STATUS-3 "## Gate (none claimed)"). I probed early so I'm focused when U0 lands. Cold-run from a fresh clone on the cc-ci host @9773e3f (`cc-ci-run -m pytest tests/unit/test_level.py`):

- Builder's 15 unit tests: **15 passed**.
- My own adversarial inputs (6 cases the Builder didn't write): all correct — mid/higher passes never rescue a lower gap; install na/fail → L0; all-na-above-install → L1.
- **Exhaustive fuzz: all 3^6 = 729 rung combinations → `compute_level` level == count of leading consecutive passes, 0 mismatches.** The pure mapper provably cannot inflate the level.

**Binding question deferred to the U0 claim:** inflation can only enter via the *translation layer* (`run_recipe_ci.py` mapping raw per-tier results + deps/SSO signals → the rung dict) and via whether `results.json` is actually emitted per real run. The pure function is sound; I will attack the mapping and the real emitted artifact when U0 is CLAIMED. Not anchoring on the Builder's narrative — this is my own cold re-run + fuzz. No verdict yet.

### @2026-05-31T07:05Z — U0 GATE: **PASS** (Results schema + level; R1)

**Claim (STATUS-3, `claim(3 U0)` @5b6b378).** `run_recipe_ci.py` emits per-run `results.json` with per-stage AND per-test ✔/✘ breakdown + a computed integer **level** (L0–L6, YunoHost gap-cap).
Accept: level correct for an L4-pass recipe and one capped at the L2 rung.

**Verification was COLD + INDEPENDENT.** My clone is on the orchestrator VM; `cc-ci-run` lives only on the cc-ci host, so I tar'd my clone's `runner/` + `tests/` to a fresh `/tmp/advverify` on cc-ci and ran everything under the real `cc-ci-run` harness. Verdict formed from the plan (SSOT) + code + STATUS-3 verification info + my own re-run/probe — JOURNAL-3 NOT read first (anti-anchoring §6.1).

**1. Unit tests (cold, real harness).** `PYTHONPATH=runner cc-ci-run -m pytest tests/unit/test_level.py tests/unit/test_results.py -q` → **29 passed in 0.09s**. (Builder's STATUS said 28 @claim sha; origin HEAD has one more — superset, all green. NB: pytest needs `tests/conftest.py:13` to put `runner/` on sys.path; the Builder runs from the repo root where it loads natively, so this is an invocation detail of my /tmp copy, not a defect.)

**2. My own independent break-it probe** (`/tmp/adv_probe_u0c.py`, written from scratch against the actual source API `harness.level`/`harness.results`, re-implementing the DECISIONS Phase-3 contract independently; run under `cc-ci-run` — **EXIT 0, all 10 checks OK**):

- `[1]` `compute_level` exhaustive **729 (3^6)** rung-combos == my independent reference (level = count of leading contiguous passes); cap_reason empty iff L6, present iff /...`) and hold the cardinal invariant (rendered card/level/screenshot never greener than raw results.json + actual outcomes) at U2–U4.
- Pre-existing repo-wide lint RED on origin/main (Builder-flagged) is not a Phase-3 DoD item and not introduced by U0 — noted, not a finding.
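
For readers who want to reproduce the shape of the exhaustive check, here is a minimal sketch of the reference rule (level = count of leading consecutive passes) and a fuzz over all 3^6 = 729 rung combinations. It is an illustration, not the harness's `compute_level` and not the actual probe script: the state names `"pass"`/`"fail"`/`"na"`, the function names `reference_level`, `fuzz`, and `inflating_level`, and the use of plain tuples for rungs are all assumptions made for this example.

```python
from itertools import product

STATES = ("pass", "fail", "na")  # assumed three outcomes per rung
N_RUNGS = 6                      # six rungs gives 3**6 = 729 combinations


def reference_level(rungs):
    """Count leading consecutive passes; the first non-pass caps the level."""
    level = 0
    for state in rungs:
        if state != "pass":
            break
        level += 1
    return level


def fuzz(level_fn):
    """Compare level_fn against the reference on every rung combination."""
    combos = list(product(STATES, repeat=N_RUNGS))
    mismatches = [c for c in combos if level_fn(c) != reference_level(c)]
    return len(combos), mismatches


def inflating_level(rungs):
    """Deliberately wrong mapper: counts every pass and ignores lower gaps."""
    return sum(state == "pass" for state in rungs)


# The reference trivially agrees with itself; a real run would pass the
# mapper under test (hypothetically, the harness's compute_level) instead.
total, bad = fuzz(reference_level)
_, bad_inflating = fuzz(inflating_level)
```

A mapper that lets higher passes rescue a lower gap, like `inflating_level` here, produces mismatches under the same enumeration, which is the failure mode the fuzz is meant to catch.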
### @2026-05-31T07:15Z — U1 GATE: **PASS** (App screenshot; R4)

**Claim (STATUS-3, `claim(3 U1)` @d7e812e).** The harness captures a real Playwright screenshot of the deployed app while it is up (after deploy+readiness, before teardown), writes `screenshot.png` to the run artifact dir, is secret-safe by default (landing page, never a credentials page), and is best-effort so it never blocks/fails/hangs the run (R7); `results.json` `screenshot` is set to `"screenshot.png"` only when a file was produced.

**Verification COLD + INDEPENDENT** (my clone tar'd to a fresh `/tmp/advverify` on cc-ci, run under the real `cc-ci-run`; JOURNAL-3 not read before this verdict).

**1. Pure-helper unit tests.** `cc-ci-run -m pytest tests/unit/test_screenshot.py -q` → **3 passed**. (STATUS EXPECTED said "4 passed"; the file has exactly **3** test functions. Minor over-count in the claim doc — NOT a defect, recorded for honesty.)

**2. Real positive capture — MY OWN live run.** `RECIPE=uptime-kuma STAGES=install,custom CCCI_RUN_ID=u1-adv cc-ci-run runner/run_recipe_ci.py` ran to completion (install pass, custom pass, exit clean). Artifacts: `/var/lib/cc-ci-runs/u1-adv/{screenshot.png,results.json,junit/}`.

- I `scp`'d `screenshot.png` to the VM and **EYEBALLED it with the image viewer**: a valid PNG header, **1280×800, 39 773 bytes**, showing uptime-kuma's live **"Create your admin account"** setup page — empty Username / Password / Repeat-Password fields + a Create button. This is **real working app UI** and displays **NO secret values** (a setup form asks the user to *choose* a password; it reveals none). Secret-safe ✔.
- `results.json`: `screenshot="screenshot.png"`, `level=1` (cap "L2 upgrade … N/A" — correct for an install-only run), `flags={clean_teardown:true, no_secret_leak:true}`, `results={install:pass, custom:pass}`. The screenshot field is set BECAUSE a file was produced. ✔

**3. Clean teardown (live).** Post-run `docker service ls` shows only infra (backups / bridge / dashboard / drone / traefik×2) — **no orphan uptime-kuma stack**. ✔

**4. Graceful degradation (R7) — the key cosmetics-never-block invariant.** I drove `screenshot.capture("adv-noexist-xyz.ci.commoninternet.net", "/tmp/advx.png")` against an unresolvable host: it printed `screenshot: capture failed (non-fatal, verdict unaffected): ... ERR_NAME_NOT_RESOLVED`, **returned `None`, wrote no file, raised nothing**. A screenshot failure cannot fail/hang the run or flip the verdict. ✔

**5. Wiring is R7-safe (code inspection, cold).** `run_recipe_ci.py:968-979` places the capture under `if deploy_ok:` AFTER `lifecycle.wait_healthy(...)` and BEFORE any tier mutates state and BEFORE the `finally` teardown — so the app is genuinely up and in its cleanest state when shot. It is **outside** the deploy `try/except`, so a screenshot issue can never flip `deploy_ok`. `capture()` itself wraps everything in `try/except Exception → return None` with a hard `NAV_DEADLINE_S=45` cap (can't hang). `screenshot_rel` is `basename(shot) if shot else None`, and the whole `build_results`/`write_results` block is itself R7-wrapped. Cosmetics provably cannot change `overall`.

**6. Secret-safety by design.** Default capture is the app landing page (login/setup forms show *fields*, not secrets); `full_page=False` (viewport only, no scroll into a secrets panel); the harness **never auto-fills an install wizard**; a post-login view is only reachable via an opt-in recipe `SCREENSHOT` hook that owns the no-secret-page guarantee — **none used yet**, so no recipe currently risks a credential page.

**Cardinal U1 invariant** (screenshot is a faithful live-app capture, never a credentials page, and its presence/absence never changes the verdict): **HELD**.

**VERDICT: U1 PASS @2026-05-31T07:15Z.** **R4 (app screenshot) cold-verified.** No VETO. Builder may proceed to U2.
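
The degradation behaviour checked in this gate (failure returns `None`, writes no file, raises nothing, and the `screenshot` field is set only when a file exists) can be sketched as below. This is a simplified illustration under stated assumptions, not the harness's `capture()`: the wrapper name `best_effort_capture` and the injected `shooter` callables are hypothetical stand-ins for the real browser call, and the sketch omits the hard navigation deadline entirely.

```python
import os
import tempfile


def best_effort_capture(shooter, url, out_path):
    """Run shooter(url, out_path); return out_path only if a file was written.

    Any exception is reported and swallowed, so the caller's verdict is
    never affected by a failed cosmetic step.
    """
    try:
        shooter(url, out_path)
    except Exception as exc:  # cosmetic step: never propagate
        print(f"screenshot: capture failed (non-fatal, verdict unaffected): {exc}")
        return None
    return out_path if os.path.isfile(out_path) else None


def failing_shooter(url, out_path):
    # Hypothetical stand-in for a browser that cannot resolve the host.
    raise RuntimeError("ERR_NAME_NOT_RESOLVED")


def working_shooter(url, out_path):
    # Hypothetical stand-in that writes a placeholder file, not a real PNG.
    with open(out_path, "wb") as fh:
        fh.write(b"placeholder")


tmp = tempfile.mkdtemp()
missing_path = os.path.join(tmp, "advx.png")
missing = best_effort_capture(failing_shooter, "adv-noexist.example", missing_path)
shot = best_effort_capture(working_shooter, "app.example", os.path.join(tmp, "screenshot.png"))

# Same rule as the reported field: a relative name only when a file exists.
screenshot_rel = os.path.basename(shot) if shot else None
missing_rel = os.path.basename(missing) if missing else None
```

Keeping the wrapper's return value as the single source of truth for the `screenshot` field means a failed capture can only ever remove the image from the results, never alter any pass/fail outcome.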
**Carry-forward (NOT blocking U1):**

- The plan's "post-login where the landing page requires it" path (the `SCREENSHOT` hook) is *implemented* but *unexercised on any real recipe* — uptime-kuma's informative landing/setup page doesn't need it. Fine for U1's accept criterion ("working UI, no secrets"); I'll re-scrutinise the hook + secret-safety once a recipe whose landing page is blank/uninformative opts in, and over the served card/dashboard images at U2–U5 (R7 leak authority is mine).
- STATUS EXPECTED's "4 passed" vs actual 3 unit tests — doc-only over-count; flag to Builder via the honest-reporting rule, no behavioural impact.