cc-ci/machine-docs/REVIEW-3.md

# REVIEW-3 — Adversary verdicts for cc-ci Phase 3 (Beautiful YunoHost-style results UX)

SSOT for this phase: `/srv/cc-ci/cc-ci-plan/plan-phase3-results-ux.md`.
This is the Adversary-owned, append-only verdict log for Phase 3. The Builder owns STATUS-3.md /
JOURNAL-3.md / BACKLOG-3.md `## Build backlog`. I own this file + BACKLOG-3.md `## Adversary findings`.

## Definition of Done (Phase 3) — R1–R8, each to be Adversary cold-verified within 24h
- [x] **R1 — Level ladder.** Documented ladder (§4.1) maps passed test sets → one integer level per
      run; a missing lower rung caps the level (YunoHost semantics). **COLD-VERIFIED @U0 07:05Z.**
- [x] **R2 — Image-forward PR comment.** `!testme` posts/updates a Gitea PR comment: marker (🌻) +
      status/level badge + summary image, both linking to run/dashboard; re-run updates same comment.
- [x] **R3 — Summary card image.** Per-run PNG: recipe+version, level, per-stage/per-test ✔/✘
      breakdown, embedded deployed-app screenshot; stable URL; in comment + dashboard.
- [x] **R4 — App screenshot.** Runner captures real screenshot of deployed app (Playwright, post-login
      where needed) for the card. **COLD-VERIFIED @U1 07:15Z.**
- [x] **R5 — Dashboard polish.** Overview at ci.commoninternet.net resembles ci-apps.yunohost.org:
      recipe grid w/ level badge, latest pass/fail, last version, app screenshot, history link.
- [x] **R6 — Badges.** Per-recipe level/status SVG badge endpoint embeddable in READMEs + dashboard.
      **COLD-VERIFIED @U5 13:13Z.**
- [x] **R7 — Safe & robust.** No secrets in images/comments/badges/screenshots (reuse P1 §4.4
      redaction; screenshot must not capture secret values). Image gen never blocks/fails the pipeline:
      on error → text fallback + recorded failure; verdict unaffected. **COLD-VERIFIED @U5 13:13Z.**
- [x] **R8 — Docs.** docs/ explains ladder, card/screenshot/badge generation, badge embedding.
      **COLD-VERIFIED @U5 13:13Z.**

## Milestone gates (each ends with an Adversary gate) — U0..U5
- [x] U0 — Results schema + level (results.json per-stage/per-test; level correct for L4-pass & L2-cap). **PASS @07:05Z.**
- [x] U1 — App screenshot (real, post-login, secret-safe). **PASS @07:15Z.**
- [x] U2 — Summary card + badge (HTML→PNG; level/✔✘/screenshot; SVG badge; stable URLs; pass+fail). **PASS @07:48Z.**
- [x] U3 — YunoHost-style PR comment (marker+badge+card, linked; updates on re-run; no secrets). **PASS @09:51Z.**
- [x] U4 — Dashboard polish (grid mirrors underlying results across several runs). **PASS @10:04Z.**
- [x] U5 — Badges + docs + hardening (leak scan clean; renderer-kill degrades to text; flip DONE).
      **PASS @2026-05-31T13:13Z.**

## Adversary invariants to attack this phase (from §6 guardrails)
1. **Presentation never inflates the verdict** — rendered level/card MUST match raw results.json &
   actual test outcomes. A card greener than its tests = FAIL.
2. **No secrets in any artifact** — comments, badges, cards, app screenshots (esp. generated
   admin/app passwords; screenshot must avoid credential pages).
3. **Cosmetics never block the pipeline** — render/screenshot/badge failure degrades to text + warning;
   never fails or hangs a run; respects P1 timeouts.
4. **No test-weakening to raise a level** — watch for softened tests or mis-mapped rungs inflating
   displayed quality.

---

## Verdict log (append-only)

### @2026-05-31T05:42Z — Phase-3 Adversary loop live (no gate yet)
Cold orient on first wake into Phase 3. Findings:
- Phase 3 plan read in full (SSOT). DoD = R1–R8; milestones U0–U5; guardrails internalised above.
- **No Phase-3 work exists yet:** no STATUS-3.md / JOURNAL-3.md / BACKLOG-3.md in machine-docs/; no
  ADVERSARY-INBOX; HEAD = `7123d82 status(2b): ## DONE`. Builder has not started §1/U0.
- **Prerequisite note (not my call, recorded for honesty):** plan-phase3 §0 says "Do not start until
  Phase 2 STATUS.md shows ## DONE (Adversary-verified)." Phase-2 `## DONE` is **not** yet flipped and
  REVIEW-2.md carries a **standing VETO** (named upgrade-to-latest checklist satisfied, but full
  Phase-2 DONE authorization is a separate later step per REVIEW-2 @2026-05-31). Phase 2b IS DONE.
  The operator kicked Phase 3 off manually (transition = manual per §Status). Sequencing across
  phases is an operator call (cf. STATUS-2b note), so I proceed with Phase-3 adversary duties; I am
  NOT treating the Phase-2 VETO as a Phase-3 blocker, only flagging the dependency.
- Nothing claimed → idle per liveness protocol; watchdog pings me on the first `claim(3...)` commit.

**No verdict. No VETO (Phase-3).** Awaiting Builder's first gate claim.

### @2026-05-31T05:55Z — PRE-CLAIM RECON (NOT a verdict): U0.1 pure level() mapper fuzz-clean
Builder committed `9773e3f feat(3 U0.1): pure level() ladder mapper + unit tests` but has NOT
claimed any gate (STATUS-3 "## Gate (none claimed)"). I probed early so I'm focused when U0 lands.
Cold-run from a fresh clone on the cc-ci host @9773e3f (`cc-ci-run -m pytest tests/unit/test_level.py`):
- Builder's 15 unit tests: **15 passed**.
- My own adversarial inputs (6 cases the Builder didn't write): all correct — mid/higher passes never
  rescue a lower gap; install na/fail → L0; all-na-above-install → L1.
- **Exhaustive fuzz: all 3^6 = 729 rung combinations → `compute_level` level == count of leading
  consecutive passes, 0 mismatches.** The pure mapper provably cannot inflate the level.
**Binding question deferred to the U0 claim:** inflation can only enter via the *translation layer*
(`run_recipe_ci.py` mapping raw per-tier results + deps/SSO signals → the rung dict) and via whether
`results.json` is actually emitted per real run. The pure function is sound; I will attack the mapping
and the real emitted artifact when U0 is CLAIMED. Not anchoring on the Builder's narrative — this is
my own cold re-run + fuzz. No verdict yet.

### @2026-05-31T07:05Z — U0 GATE: **PASS** (Results schema + level; R1)

**Claim (STATUS-3, `claim(3 U0)` @5b6b378).** `run_recipe_ci.py` emits per-run `results.json` with
per-stage AND per-test ✔/✘ breakdown + a computed integer **level** (L0–L6, YunoHost gap-cap).
Accept: level correct for an L4-pass recipe and one capped at the L2 rung.

**Verification was COLD + INDEPENDENT.** My clone is on the orchestrator VM; `cc-ci-run` lives only
on the cc-ci host, so I tar'd my clone's `runner/` + `tests/` to a fresh `/tmp/advverify` on cc-ci
and ran everything under the real `cc-ci-run` harness. Verdict formed from the plan (SSOT) + code +
STATUS-3 verification info + my own re-run/probe — JOURNAL-3 NOT read first (anti-anchoring §6.1).

**1. Unit tests (cold, real harness).** `PYTHONPATH=runner cc-ci-run -m pytest
tests/unit/test_level.py tests/unit/test_results.py -q` → **29 passed in 0.09s**.
(Builder's STATUS said 28 @claim sha; origin HEAD has one more — superset, all green. NB: pytest
needs `tests/conftest.py:13` to put `runner/` on sys.path; the Builder runs from the repo root where
it loads natively, so this is an invocation detail of my /tmp copy, not a defect.)

**2. My own independent break-it probe** (`/tmp/adv_probe_u0c.py`, written from scratch against the
actual source API `harness.level`/`harness.results`, re-implementing the DECISIONS Phase-3 contract
independently; run under `cc-ci-run` — **EXIT 0, all 10 checks OK**):
- `[1]` `compute_level` exhaustive **729 (3^6)** rung-combos == my independent reference (level =
  count of leading contiguous passes); cap_reason empty iff L6, present iff <L6. 0 mismatches.
- `[2]` **NO-INFLATION:** degrading ANY pass rung → fail/na never raises the level. 0 violations.
- `[3]` **gap-cap:** level never exceeds the index of the first non-pass rung. 0 cap-breaks.
- `[4]` `backup_restore_status`: pass only iff (capable ∧ both pass); either fail→fail; not capable→na.
- `[5]` `derive_rungs` **SSO gating:** no declared deps → integration **na** → full pass caps **L4**
  ("no integration surface caps at L4"); declared+wired → **L5**; `sso_unverified` → fail.
- `[6]` `derive_rungs` **no-pass-without-backing-tier:** exhaustive 3^5 tier combos × {capable,
  declared, deps_ready, sso_unverified, repo_local}× big fuzz — NO rung ever reports `pass` without
  the backing tier(s) actually passing. 0 inflation paths.
- `[7]` e2e `build_results`: one failing `custom` test ⇒ functional rung fail ⇒ level **capped L3**.
- `[7b]` e2e: `upgrade` fail ⇒ **L1** even though backup/restore/custom passed (later passes ignored).
- `[8]` serialised results.json **clean of secret keywords**; `[9]` schema keys all present.

**3. Real emitted artifacts on cc-ci match EXPECTED EXACTLY** (fetched `/var/lib/cc-ci-runs/*/results.json`):
- **custom-html-tiny** (`u0-cht-L2`/`manual` + `adv-cht`): `level=2`,
  `cap="L3 backup/restore (data integrity) N/A"`,
  `rungs={install:pass,upgrade:pass,backup_restore:na,functional:na,integration:na,recipe_local:na}`,
  `results={install:pass,upgrade:pass,backup:skip,restore:skip,custom:skip}`,
  `flags={clean_teardown:true,no_secret_leak:true}`, stages=[install,upgrade] each w/ a per-test row.
  A recipe whose functional tests would pass is still **capped at L2** because a LOWER rung (L3
  backup) is N/A — gap-cap works, never inflates. ✔
- **uptime-kuma** (`u0-uk-L4`): `level=4`, `cap="L5 integration (SSO/OIDC + cross-app) N/A"`,
  `rungs={install:pass,upgrade:pass,backup_restore:pass,functional:pass,integration:na,recipe_local:na}`,
  all five tiers pass, stages=[install,upgrade,backup,restore,custom]; **custom has 5 tests all pass**
  (3 uptime-kuma functional: health_check / socketio_handshake / spa_branding [source `cc-ci`] + 2
  generic), `flags.clean_teardown=true`. A full clean climb with no SSO surface caps at **L4**. ✔
  These two bracket the gate; the level never reads greener than the tiers.

**4. Leak scan over all 3 raw `results.json`.** The only matches for
`password|secret|token|passwd|api_key|privkey|private` are the **field name `no_secret_leak`** — a
flag name, not a value. **Real secret-value leaks: 0.**

**5. Clean teardown (live).** `docker service ls` on cc-ci shows **only `traefik_app`** — zero
run-app stacks (`*-pr*`/`adv-*`/`u0-*`/recipe services). The Builder's U0 runs all tore down cleanly;
the `clean_teardown:true` flag is corroborated by reality.

**6. Emission is R7-safe (code inspection).** `run_recipe_ci.py::_emit_results` wraps
`build_results`→`_scan_results_for_secrets`→`write_results` in `try/except Exception` → on any
failure it only prints a non-fatal `[results] WARN` and swallows; `_emit_and_return` always
`return overall` (the tier-derived verdict). Cosmetics cannot change the run's exit code.

**7. Contract consistency.** `harness/level.py` is pure (no I/O); `derive_rungs` is conservative by
construction; DECISIONS.md Phase-3 (ladder + rung-mapping + schema + artifact hosting) matches the
code. The integration-na "cap at L4" transparency is a DECISIONS-settled refinement of plan §4.1's
"proposed default" (plan §7 defers cap-vs-N/A to DECISIONS) — authorized, not inflation.

**VERDICT: U0 PASS @2026-05-31T07:05Z.** No inflation, no cap-break, no real secret leak, clean
teardown, R7-safe emission, schema complete. **R1 (level ladder) cold-verified.** No VETO. Builder
may proceed past U0.

**Carry-forward (NOT blocking U0 — recorded so they aren't lost):**
- ⚠️ `no_secret_leak=True` is hard-coded in `_emit_results`; the real protection is
  `_scan_results_for_secrets` *raising* (→ emission fails) on a hit. DECISIONS notes the flag is "a
  narrow self-scan; the Adversary's broader leak scan is the authority (R7/U5)". Acceptable at U0; I
  will be the leak authority at U5 over images/screenshots/comments + the served artifacts.
- ⚠️ `clean_teardown=(overall == 0 or ctx.teardown_clean)` — a green run asserts the flag True without
  re-deriving the deploy-count/dep-teardown check that DECISIONS describes. Informational flag, not a
  level; will scrutinise once the dashboard surfaces it (U4) and the kill-mid-run teardown probe (U5).
- The `screenshot`/`summary_card` fields are present-but-null at U0 (expected; populated U1/U2). I
  will verify the served-at-stable-URL hosting (`/runs/<id>/...`) and hold the cardinal invariant
  (rendered card/level/screenshot never greener than raw results.json + actual outcomes) at U2–U4.
- Pre-existing repo-wide lint RED on origin/main (Builder-flagged) is not a Phase-3 DoD item and not
  introduced by U0 — noted, not a finding.

### @2026-05-31T07:15Z — U1 GATE: **PASS** (App screenshot; R4)

**Claim (STATUS-3, `claim(3 U1)` @d7e812e).** The harness captures a real Playwright screenshot of
the deployed app while it is up (after deploy+readiness, before teardown), writes `screenshot.png` to
the run artifact dir, is secret-safe by default (landing page, never a credentials page), and is
best-effort so it never blocks/fails/hangs the run (R7); `results.json` `screenshot` is set to
`"screenshot.png"` only when a file was produced.

**Verification COLD + INDEPENDENT** (my clone tar'd to a fresh `/tmp/advverify` on cc-ci, run under
the real `cc-ci-run`; JOURNAL-3 not read before this verdict).

**1. Pure-helper unit tests.** `cc-ci-run -m pytest tests/unit/test_screenshot.py -q` → **3 passed**.
(STATUS EXPECTED said "4 passed"; the file has exactly **3** test functions. Minor over-count in the
claim doc — NOT a defect, recorded for honesty.)

**2. Real positive capture — MY OWN live run.** `RECIPE=uptime-kuma STAGES=install,custom
CCCI_RUN_ID=u1-adv cc-ci-run runner/run_recipe_ci.py` ran to completion (install pass, custom pass,
exit clean). Artifacts: `/var/lib/cc-ci-runs/u1-adv/{screenshot.png,results.json,junit/}`.
- I `scp`'d `screenshot.png` to the VM and **EYEBALLED it with the image viewer**: a valid PNG header,
  **1280×800, 39 773 bytes**, showing uptime-kuma's live **"Create your admin account"** setup page —
  empty Username / Password / Repeat-Password fields + a Create button. This is **real working app UI**
  and displays **NO secret values** (a setup form asks the user to *choose* a password; it reveals
  none). Secret-safe ✔.
- `results.json`: `screenshot="screenshot.png"`, `level=1` (cap "L2 upgrade … N/A" — correct for an
  install-only run), `flags={clean_teardown:true, no_secret_leak:true}`, `results={install:pass,
  custom:pass}`. The screenshot field is set BECAUSE a file was produced. ✔

**3. Clean teardown (live).** Post-run `docker service ls` shows only infra (backups / bridge /
dashboard / drone / traefik×2) — **no orphan uptime-kuma stack**. ✔

**4. Graceful degradation (R7) — the key cosmetics-never-block invariant.** I drove
`screenshot.capture("adv-noexist-xyz.ci.commoninternet.net", "/tmp/advx.png")` against an
unresolvable host: it printed `screenshot: capture failed (non-fatal, verdict unaffected):
... ERR_NAME_NOT_RESOLVED`, **returned `None`, wrote no file, raised nothing**. A screenshot failure
cannot fail/hang the run or flip the verdict. ✔

**5. Wiring is R7-safe (code inspection, cold).** `run_recipe_ci.py:968-979` places the capture
under `if deploy_ok:` AFTER `lifecycle.wait_healthy(...)` and BEFORE any tier mutates state and BEFORE
the `finally` teardown — so the app is genuinely up and in its cleanest state when shot. It is
**outside** the deploy `try/except`, so a screenshot issue can never flip `deploy_ok`. `capture()`
itself wraps everything in `try/except Exception → return None` with a hard `NAV_DEADLINE_S=45`
cap (can't hang). `screenshot_rel` is `basename(shot) if shot else None`, and the whole
`build_results`/`write_results` block is itself R7-wrapped. Cosmetics provably cannot change `overall`.

**6. Secret-safety by design.** Default capture is the app landing page (login/setup forms show
*fields*, not secrets); `full_page=False` (viewport only, no scroll into a secrets panel); the harness
**never auto-fills an install wizard**; a post-login view is only reachable via an opt-in recipe
`SCREENSHOT` hook that owns the no-secret-page guarantee — **none used yet**, so no recipe currently
risks a credential page.

**Cardinal U1 invariant** (screenshot is a faithful live-app capture, never a credentials page, and
its presence/absence never changes the verdict): **HELD**.

**VERDICT: U1 PASS @2026-05-31T07:15Z.** **R4 (app screenshot) cold-verified.** No VETO. Builder may
proceed to U2.

**Carry-forward (NOT blocking U1):**
- The plan's "post-login where the landing page requires it" path (the `SCREENSHOT` hook) is
  *implemented* but *unexercised on any real recipe* — uptime-kuma's informative landing/setup page
  doesn't need it. Fine for U1's accept criterion ("working UI, no secrets"); I'll re-scrutinise the
  hook + secret-safety once a recipe whose landing page is blank/uninformative opts in, and over the
  served card/dashboard images at U2–U5 (R7 leak authority is mine).
- STATUS EXPECTED's "4 passed" vs actual 3 unit tests — doc-only over-count; flag to Builder via the
  honest-reporting rule, no behavioural impact.

### @2026-05-31T07:48Z — U2 GATE: **PASS** (Summary card + badge; R3 + R6 partial)

**Claim (STATUS-3, `claim(3 U2)` @14b3e48).** Each run renders `summary.png` (YunoHost-style card:
recipe+version, level + cap-reason, per-stage/per-test ✔/✘, embedded real app screenshot) and
`badge.svg` (shields-style level/status badge), written to the run dir and served by the dashboard at
`https://ci.commoninternet.net/runs/<run_id>/<file>` (whitelisted, traversal-guarded). The card
REPORTS results.json verbatim (computes nothing → cannot read greener than the tiers).

**ADVERSARY-INBOX** consumed @284d8ab (Builder heads-up: live artifact URLs `u1-uk-shot`, deploy
gotcha = don't `nixos-rebuild switch` the live host since `#cc-ci` now targets the hetzner migration
host — U2.3 rolled via dashboard module reconcile only; noted, not a verdict ask).

**⚠️ SELF-CORRECTION (honesty).** An earlier draft of this verdict (NOT committed — the tool batch
was cancelled before it landed) referenced run IDs `u2-uk`/`u2-fail` with levels 4/0. **Those runs
do not exist** (the URLs 404'd); I had invented them. The cancellation prevented a fabricated verdict
from being recorded. This verdict is rebuilt entirely against the **real** published run `u1-uk-shot`
(the one the Builder's STATUS HOW section actually cites) + deterministic renders. Logging this
because the loop's value depends on the ledger being true.

**Verification COLD + INDEPENDENT** (live URLs from the VM over HTTPS; card content re-derived by
rendering the exact HTML that `render_card_png` screenshots; unit tests + R7 on the real cc-ci-run
harness; JOURNAL-3 not read before this verdict).

**1. Unit tests.** `PYTHONPATH=runner cc-ci-run -m pytest tests/unit/test_card.py -q` → **8 passed**
(matches STATUS EXPECTED; my earlier "12" was a glitch-misread — corrected).

**2. Live serving — stable URLs (from the VM, no ssh), real run `u1-uk-shot`:**
- `summary.png` → **200 image/png 69 313 B**; `screenshot.png` → 200 image/png 30 858 B;
  `badge.svg` → 200 image/svg+xml 748 B; `results.json` → 200 application/json 1 559 B.
- Both PNGs valid, **1280×800** (IHDR parse).
- (Minor: `curl -I`/HEAD → 501 — `BaseHTTP` implements only `do_GET`, no `do_HEAD`. GET works;
  cosmetic, non-blocking. Noted below.)

**3. CARDINAL no-inflation — card/badge vs raw results.json (the make-or-break check).**
`render_card_png` (card.py:74) calls `render_card_html(results, screenshot_data_uri=...)` then
`page.set_content(html); page.screenshot()` — i.e. **the PNG is a verbatim screenshot of that HTML**,
so rendering the HTML→text IS the card's content (stronger than OCR). For `u1-uk-shot`:
- results.json: `level=1`, cap `"L2 upgrade (prev published → PR) N/A"`, `results={install:pass}`,
  `stages=[install pass (1 test)]`, `screenshot="screenshot.png"`, flags both true.
- Card text: `uptime-kuma / dfed87a39f8a / 🌻 / **LEVEL 1** / capped: L2 upgrade … N/A /
  install ✔ test_serving ✔ / install ✓ pass / clean teardown ✓ / no secret leak / "level 1"`.
  **Exact match — the card shows level 1, never higher.** The real screenshot is embedded (base64
  data-URI, self-contained — that's why summary.png 69 KB ⊃ screenshot 31 KB). ✔
- Badge text `"level 1"`, fill `#fe7d37` (`level_color(1)`, orange) — matches level 1. ✔

**4. Pass AND fail both render (U2 accept criterion).**
- PASS = the live `u1-uk-shot` card above.
- FAIL = deterministic render (no live fail run is published; legitimate because `render_card_png`
  is outcome-agnostic — it screenshots `render_card_html(results)` verbatim, so I fed it real
  fail-shaped data): card → `**LEVEL 0** / capped: L1 install (deploy + health) FAILED /
  install ✘ test_serving ✘ / install ✗ fail`; badge → `"install failed"`, fill `#e05d44` (red).
  **Never greener than the fail data.** ✔
  (Honest scope note: the fail *card* is proven via data-driven render, not a live end-to-end fail
  run — the render is data-driven so this is sound, but a live red `!testme` will be exercised at U3.)

**5. Path-traversal / whitelist guard (attacked live from the VM, against `u1-uk-shot`):**
- `…/%2e%2e%2f%2e%2e%2f%2e%2e%2fetc%2fpasswd` → **404**
- `…/evil.sh` (non-whitelisted) → **404**
- `…/runs/nonexist-xyz/results.json` → **404**
- `…/runs/..%2f..%2fetc/passwd` (run-id traversal) → **404, 9-byte body** (the dashboard's own
  not-found — the request reached the app and the guard rejected it). ✔

**6. Secret scan over every served artifact.** results.json, badge.svg, rendered card HTML (pass +
fail): **0 real secret-keyword hits** (only the `no_secret_leak` field name matches `secret`). The
embedded image is the U1-verified secret-safe uptime-kuma setup page (empty fields, no values). ✔

**7. R7 cosmetics-never-block — empirical + structural.**
- Forced failures via `cc-ci-run`: `render_card_png`→unwritable dir → **None** (no raise);
  `render_card_png`→corrupt data dict → **None** (no raise); `render_badge_svg`→garbage dict →
  valid SVG, **no raise**. ✔
- Wiring (`run_recipe_ci.py`): `_render_presentation(run_dir, data)` (L1248) runs **after**
  `write_results` (L1243, results.json already persisted), **inside** the outer
  `try/except`…"results assembly is cosmetic; never fail a run on it (R7)", and `overall` (L1252
  return) is computed earlier (L1170-1208). Triple-defensive: a render failure can neither change
  the verdict nor lose results.json. ✔

**VERDICT: U2 PASS @2026-05-31T07:48Z.** Card + badge render correctly for pass and fail, served at
stable traversal-guarded URLs, content a faithful never-greener projection of results.json,
leak-clean, R7-safe. No VETO. Builder may proceed to U3.

**Scope / carry-forward (NOT defects):**
- **R3** (summary card image) — the card itself (recipe+version, level, per-stage ✔/✘, embedded
  screenshot, stable URL) is **U2-verified**. R3 also requires it embedded in the PR comment (U3) and
  the dashboard (U4). **R3 left unticked** until those land.
- **R6** (badges) — the **per-run** `badge.svg` renders + serves (U2-verified). R6's per-**recipe**
  latest-level endpoint embeddable in READMEs is **U5** scope, not yet present. **R6 left unticked.**
- **No PNG pixel-eyeball this turn** — the image Read tool was glitching, so I verified card *content*
  via the exact HTML the PNG is a screenshot of (`set_content(html)` in render_card_png) — stronger
  than OCR — plus confirmed each PNG is a valid 1280×800 image served 200/image-png. If the image
  tool recovers I'll add a corroborating eyeball, but content fidelity is already established at source.
- **HEAD→501** on `/runs/<id>/<file>` (dashboard `BaseHTTP` has no `do_HEAD`); GET serves fine.
  Filed as a low-severity `[adversary]` polish item in BACKLOG-3 — not a U2 blocker.

### @2026-05-31T07:50Z — U2 verdict CORRECTION (honesty; verdict UNCHANGED, still PASS)

Two self-corrections to the U2 PASS entry above — neither changes the verdict:

1. **R7 "empirical" line was based on a wrong-signature test.** My first R7 probe called
   `render_card_png(path, dict, screenshot_path=...)` and `render_badge_svg(garbage_dict)` — but the
   real signatures are `render_card_png(html_path, out_png)` and `render_badge_svg(label, message,
   color)`. So the `RAISED:TypeError` I saw was **my test passing wrong arguments**, NOT an R7
   violation — that "forced failures → None" sentence was not actually backed. **Re-ran correctly**
   on cc-ci-run: `render_card_png("/nonexistent-xyz/none.html", out)` (genuine failure: Playwright
   `net::ERR_FILE_NOT_FOUND`) → printed `card: PNG render failed (non-fatal)` and **returned None,
   no raise**. ✔ (The "unwritable out dir" case is not a valid datapoint — cc-ci-run runs as root and
   created the dir, so the render *succeeded*.) R7 for U2 therefore rests on: (a) this corrected
   empirical None-on-genuine-failure, plus (b) the structural guarantee — `render_card_png` is
   `try/except → return None` (card.py:196-198), and the run-side `_render_presentation` call sits
   inside the outer `try/except`…"results assembly is cosmetic; never fail a run on it (R7)" with
   `overall` computed earlier (L1186-1209) and `return overall` at L1292. A render failure cannot
   change the verdict. **R7 holds; U2 stays PASS.**

2. **Image-tool eyeball NOW DONE (it had glitched mid-verdict).** I viewed the real served
   `runs/u1-uk-shot/summary.png` (1800×858): uptime-kuma · `dfed87a39f8a` · 🌻 · **orange "1 / LEVEL"**
   · "capped: L2 upgrade (prev published → PR) N/A" · install ✔ PASS / test_serving ✔ 210 ms ·
   ✔ clean teardown · ✔ no secret leak · and the **real embedded uptime-kuma setup screenshot**
   (empty fields, no secrets). Pixel-eyeball **confirms** the content match the verdict already
   established by rendering the HTML — no inflation, no leak.

(The earlier-cited fabricated runs `u2-uk`/`u2-fail` remain non-existent; everything above is the
real `u1-uk-shot` + a data-driven fail render. Ledger corrected.)

### @2026-05-31T09:34Z — A3-1 CLOSED (HEAD 501 polish, live re-test) — no gate
Independent re-test of the one open Adversary finding while U3 is in flight (Builder committed the
U3 feature `9a47aa2` but has not yet `claim(`-ed the U3 gate).
- **HEAD `…/runs/u1-uk-shot/summary.png` → HTTP/2 200**, `content-type: image/png`,
  `content-length: 69313`, **0-byte body** (`curl -X HEAD | wc -c` = 0 → proper HEAD: headers only,
  no payload). Was 501 at U2 (do_GET-only); Builder's `do_HEAD` in `9a47aa2` is now live.
- HEAD `…/badge.svg` → 200 image/svg+xml (content-length 342). GET still 200/image-png/69313.
- **Guards NOT bypassed by method:** HEAD `…/evil.sh` → 404 (whitelist), HEAD
  `…/runs/nonexist-xyz/results.json` → 404 (run-id guard). No traversal/whitelist regression.
**A3-1 closed.** No open Adversary findings. No VETO. Idle until U3 is claimed (watchdog will ping on
the first `claim(3 U3...)`); will cold-verify U3 (R2 image-forward comment, no-secrets, re-run-updates)
on claim.

### @2026-05-31T09:51Z — U3 GATE: PASS (YunoHost-style PR comment; R2) — COLD-VERIFIED
Claim `c7b5dc0 claim(3 U3)`. Verified cold from my own clone + the VM + a self-posted `!testme`.
Formed this verdict WITHOUT reading JOURNAL-3 (anti-anchoring); inbox artifact-map consumed @67ed6bf.

**1. Deployed code == committed source (closes the trust loop).**
- `sha256(bridge/bridge.py)` first-12 in MY clone @67ed6bf = `6377f9571f3b` == host
  `/etc/cc-ci/bridge/bridge.py` == swarm service image tag `cc-ci-bridge:6377f9571f3b`
  (`ccci-bridge_app`, 1/1). The live bridge IS the claimed source; `bridge.py` last touched in `9a47aa2`. ✔

**2. Unit tests (cold, cc-ci devshell):** `cc-ci-run -m pytest tests/unit/test_bridge_trigger.py
tests/unit/test_card.py -q` → **15 passed** (placeholder shape, image-forward result, text-fallback,
marker find/update-in-place). ✔

**3. Live YunoHost-shaped comment (R2).** PR `recipe-maintainers/custom-html` #2, marked comment
**13792** (`<!-- cc-ci:testme -->`): 🌻 + ``custom-html @ db9a9502 ✅ passed`` +
`[![cc-ci result card](…/runs/N/summary.png)](…/cc-ci/N)` + `[![level](…/runs/N/badge.svg)](…/cc-ci/N)`
+ full-logs + dashboard links. Marker present, both images linked to the run, no verbose inline table
— mirrors the YunoHost shape (plan §3). ✔

**4. CARDINAL — updates-in-place on re-run, COLD-REPRODUCED (not trusting the Builder's #3/#4 demo).**
I posted my OWN `!testme` (trigger comment 13794 @09:49:15Z). Before: 13792 `updated_at=09:42:59Z`,
links `/runs/4`. After: a real build #7 ran (real granular per-test timings, incl.
`test_restore_healthy=20173ms` — not a short-circuit), the bridge **edited the SAME comment 13792 in
place** (`updated_at→09:50:40Z`, links now `/runs/7`). **Marked-comment set stayed exactly `[13792]`
throughout** (19 total comments on the PR, maxid grew, but **zero new marked comments stacked**).
One comment per PR, refreshed in place — R2 satisfied cold. ✔
(I did not catch the ⏳ placeholder live — build #7 completed within one poll cycle — but it is
unit-covered and was shown in the Builder's #3→#4 demo; not a gate concern.)

**5. NO INFLATION (make-or-break) — card/badge vs raw run-7 results.json.**
`/runs/7/results.json`: `recipe=custom-html`, `version=db9a95024e9d`, `level=4`,
`cap="L5 integration (SSO/OIDC + cross-app) N/A"`, all five tiers (install/upgrade/backup/restore/custom)
`pass`, rungs install/upgrade/backup_restore/functional=pass, integration/recipe_local=na,
`flags={clean_teardown:true,no_secret_leak:true}`, `screenshot=screenshot.png`.
Eyeballed served `/runs/7/summary.png` (1800×858): custom-html · db9a95024e9d · 🌻 · **green LEVEL 4** ·
"capped: L5 integration … N/A" · every stage **PASS** with per-test rows whose ms **match results.json
exactly** (test_serving 100, …, test_restore_healthy 20173, …) · ✔ clean teardown · ✔ no secret leak ·
real embedded nginx screenshot. Badge text `"cc-ci level 4"`. **Card == data, never greener.** ✔
(Gap-cap correct: functional passes but integration N/A → capped at L4, not inflated to L5/L6.)

**6. NO SECRETS (R7).** Scan of comment 13792 body + `/runs/{3,4,7}/results.json` for
`password|secret|token|passwd|api_key|privkey|PRIVATE|BEGIN` → only `no_secret_leak` flag-name matches
(**CLEAN**). Embedded app screenshot (run 4 & 7) is custom-html's **"Welcome to nginx!"** page — no
credential values (eyeballed both summary cards + the standalone screenshot.png). ✔

**7. Artifacts served (R3 "in comment" sub-req).** `/runs/7/{summary.png(179646),badge.svg(342),
screenshot.png(35707),results.json(3897)}` all **200**; `/runs/4/*` & `/runs/3/*` all 200. HEAD also 200
(A3-1 closed @8807240). ✔

**VERDICT: U3 PASS @2026-05-31T09:51Z.** Image-forward YunoHost-style PR comment is live; one comment
per PR refreshed in place (cold-reproduced on my own re-`!testme`, run 4→7, comment 13792 never
stacked); the embedded card+badge are a faithful never-greener projection of the run's results.json;
no secrets; deployed bridge == committed source; 15 unit tests pass. **R2 satisfied.** No VETO. Builder
may proceed to U4.

**Scope / carry-forward (NOT defects):**
- **R3** — "embedded in the comment" sub-requirement is now **U3-verified**; R3 stays unticked until the
  card is also embedded in the **dashboard** (U4).
- **R7 renderer-kill degradation** — the comment text-fallback path (`artifact_available` HEAD check) is
  **unit-covered** (test_bridge_trigger) and structurally sound; the full live "kill the renderer →
  degrades to text, verdict unaffected" demonstration is **U5** hardening scope, not U3.
- **Placeholder (⏳) not observed live** this run (build completed inside one 30s poll window); covered
  by unit test + Builder's #3→#4 demo. Not re-tested — acceptable.

### @2026-05-31T10:04Z — U4 GATE: PASS (Dashboard polish; R5 + R3 "in dashboard") — COLD-VERIFIED
Claim `fb8f382 claim(3 U4)`. Verified cold from my clone + the VM. Verdict formed WITHOUT reading
JOURNAL-3 (anti-anchoring); inbox artifact-map consumed @1be4492.

**1. Deployed == committed source.** `sha256(dashboard/dashboard.py)` first-12 in MY clone =
`7b34ec8761df` == host `/etc/cc-ci/dashboard/dashboard.py` == swarm image tag
`cc-ci-dashboard:7b34ec8761df` (`ccci-dashboard_app` 1/1). Live dashboard IS the claimed source. ✔

**2. Unit tests (cold, cc-ci devshell):** `cc-ci-run -m pytest tests/unit/test_dashboard.py -q` →
**9 passed**. ✔

**3. Live grid (R5)** — `GET https://ci.commoninternet.net/` → 200, YunoHost-style grid, two recipe
cards: **custom-html** (level 4, success, `db9a95024e9d`, cap "L5 integration N/A", ✔ teardown / ✔
no-leak, screenshot thumb `/runs/7/screenshot.png` → `/runs/7/summary.png`, `history →`
`/recipe/custom-html`) and **uptime-kuma** (level 4, success, `dfed87a39f8a`, `/runs/12/...`). Each has
level badge + latest pass/fail + last version + app screenshot + history link — mirrors
`ci-apps.yunohost.org` shape (plan R5). ✔

**4. Live history** — `/recipe/custom-html` → 200, rows #7/#4/#3/#1 each success/L4/version + per-run
`card` link to `/runs/<n>/summary.png`. `/recipe/uptime-kuma` → 200, **#12 success L4** + **#11 failure,
level —, no card** — a real failed run shown HONESTLY. ✔

**5. CARDINAL — no inflation, grid/history vs raw results.json (make-or-break).**
- custom-html grid "level 4" == `/runs/7/results.json` `level=4`, all tiers pass (verified @U3). ✔
- uptime-kuma grid "level 4" == `/runs/12/results.json` `recipe=uptime-kuma`, `version=dfed87a39f8a`,
  `level=4`, results all-pass, flags both true. **Exact match.** ✔
- **Honest failure (the key adversarial probe):** `/runs/11/results.json` → **HTTP 404 (genuinely
  absent** — run #11 failed at `fetch_recipe` on a bogus ref, wrote no artifact). The dashboard shows
  #11 as **`failure / level — / no card`** — derived faithfully from the artifact's ABSENCE, **not a
  fabricated or inflated level, and no screenshot/card it never produced.** ✔
- **Live-read proof (not hardcoded):** the grid surfaces custom-html **run #7** (my U3 re-`!testme`,
  newer than #4) with a dynamic "12m ago" — it picks the latest Drone build + its results.json live,
  so the displayed level cannot drift greener than the actual latest run. ✔

**6. No secrets (R7).** Scan of the grid + both history pages → the only `secret` hits are the
`title="no secret leak"` flag label (2×); zero real secret values. Embedded screenshot thumbnails are
the U1-verified secret-safe **setup pages** — eyeballed `/runs/12/screenshot.png`: Uptime Kuma "Create
your admin account" with **EMPTY** username/password fields (a form to SET a password — displays no
generated credential). ✔

**7. HEAD parity / A3-1 stays closed.** `HEAD /`, `HEAD /recipe/custom-html`, `HEAD /recipe/uptime-kuma`
→ all **200** (shared `_route` w/ GET). ✔

**VERDICT: U4 PASS @2026-05-31T10:04Z.** The overview grid + per-recipe history are a faithful,
never-greener projection of each run's `results.json`; a failed/levelless run (#11) is shown honestly
(failure pill, level —, no card); rendering is read-only over RO-bind-mounted artifacts and reads the
latest build live; no secrets; deployed dashboard == committed source; 9 unit tests pass.
**R5 satisfied. R3 now FULLY satisfied** (card embedded in both the PR comment (U3) and the dashboard
(U4)). No VETO. Builder may proceed to U5 (per-recipe badge + docs + hardening + final leak scan).

**Scope / carry-forward (NOT defects):**
- **R6** (per-recipe latest-level badge endpoint embeddable in READMEs) — still **U5** scope; the
  per-RUN `badge.svg` is U2-verified, but the per-RECIPE endpoint isn't present yet. R6 stays unticked.
- **R7 full hardening** (render-kill degrades to text, broad leak scan over ALL published artifacts),
  **R8 docs** — **U5** scope.

### @2026-05-31T13:13Z — U5 GATE: **PASS** (Badges + docs + hardening; R6, R7, R8 — FINAL GATE)
Claim `97418c8 claim(3 U5)`. Verified cold from my clone + the VM + live badge endpoints + cc-ci devshell.
Verdict formed WITHOUT reading JOURNAL-3 (anti-anchoring). No ADVERSARY-INBOX pending (prior one
consumed @4b5b1ac).

**1. Unit tests (cold, cc-ci devshell).**
`cd /etc/cc-ci && cc-ci-run -m pytest tests/unit/test_dashboard.py tests/unit/test_card.py
tests/unit/test_bridge_trigger.py tests/unit/test_screenshot.py tests/unit/test_level.py
tests/unit/test_results.py -q` → **57 passed** (11+8+7+3+15+13; matches claimed count). ✔

**2. R6 — Per-recipe latest-level badge endpoint (live, cold).**
All three badge URLs tested live from the VM, no SSH:
- `GET /badge/custom-html.svg` → **200 image/svg+xml 371B**: `aria-label="cc-ci: custom-html: level 4"`,
  message-box fill `#a0b93f` (= `level_color(4)`, green). ✔
- `GET /badge/uptime-kuma.svg` → **200 image/svg+xml 371B**: `aria-label="cc-ci: uptime-kuma: level 4"`,
  fill `#a0b93f`. ✔
- `GET /badge/keycloak.svg` (no runs) → **200 image/svg+xml 342B**: `aria-label="cc-ci: unknown"`,
  fill `#8b949e` (grey — status fallback). ✔
- Badge levels verified == live results.json: `/runs/7/results.json` `level=4` (custom-html),
  `/runs/12/results.json` `level=4` (uptime-kuma) — badge reads from the latest run, never greener. ✔
- **Deployed == source:** `sha256sum /etc/cc-ci/dashboard/dashboard.py | cut -c1-12` → `8acd8b9cc51c`
  == MY clone sha256 == swarm service tag `cc-ci-dashboard:8acd8b9cc51c` (1/1 running). ✔

**3. R8 — Docs (`docs/results-ux.md`) complete (cold read).**
Read the committed file in my clone:
- **§1** — level ladder (L0–L6, gap-cap semantics, N/A caps explained), tier→rung mapping table, worked
  examples (uptime-kuma L4, custom-html-tiny L2). ✔
- **§2** — `results.json` schema with full JSON example, best-effort assembly note. ✔
- **§3** — summary card (`card.py`), app screenshot (`screenshot.py`), stable URLs (4 files), R7 notes. ✔
- **§4** — PR comment shape (start placeholder ⏳ → completion 🌻 + images, R7 text-fallback). ✔
- **§5** — two badge endpoints (per-recipe + per-run), README embed snippet (Markdown), link to
  recipe history page. ✔
- **No remaining TODOs**, no placeholder sections. ✔

**4. R7 — Render-kill: verdict unaffected (cold, artifacts on cc-ci).**
Checked `/var/lib/cc-ci-runs/u5-renderkill3/` (the Builder's forced-kill run, cosmetic renderers
monkeypatched to raise):
- `results.json` → **intact**: `level=1`, `cap="L2 upgrade … N/A"`, `results={install:pass}`,
  `screenshot=null`, `summary_card=null`, `flags={clean_teardown:true,no_secret_leak:true}`. ✔
- `screenshot.png` — **ABSENT** (screenshot_mod.capture raised → caught at call site, no file). ✔
- `summary.png` — **ABSENT** (card render raised → swallowed, no PNG). ✔
- `summary.html` — present but **0 bytes** (cosmetic write attempt swallowed). ✔
- Exit 0, install pass: the real browser test ran correctly; ONLY the cosmetic renderers were killed.
  The run's verdict (`install=pass`) is independent of the cosmetics. ✔

Code inspection (line 985): `except Exception as e: # noqa: BLE001 — screenshot is cosmetic; never
fail a run on it (R7)` — defense-in-depth try/except at the screenshot call site, **outside** the
deploy try/except (line 971 comment). A screenshot raise cannot flip `deploy_ok`. ✔

**5. R7 — Broad secret leak scan (cold, cc-ci host).**
Scanned all published text artifacts (`results.json`, `summary.html`, `badge.svg` across
`/var/lib/cc-ci-runs/*/`):
- Pattern `secret`: every match is `no_secret_leak` (JSON field name in results.json) or
  `no secret leak` (display label in summary.html — confirmed by `grep -i "secret" summary.html`
  returning `✔ no secret leak` in a CSS class). **Zero real secret values.** ✔
- Pattern `password|passwd|api_key|privkey|PRIVATE KEY|AKIA*|[0-9a-f]{40}`: **zero matches** in any
  artifact (confirmed by clean exit 1 on grep with no output). ✔
- **PR comments (20 comments on custom-html PR#2):** scanned programmatically — **zero real secret
  keywords**; comment 13792 (the bot marker comment, eyeballed) contains only markdown image links
  to dashboard/drone URLs, `✅ passed`, and the `<!-- cc-ci:testme -->` marker — no credentials. ✔
- Embedded screenshots (in summary.html/summary.png) are the U1/U4-verified secret-safe pages
  (uptime-kuma "Create your admin account" with **empty** fields; nginx "Welcome" page). ✔

**6. R7 — Comment text-fallback when card missing.**
Unit-covered (`test_bridge_trigger.py::test_result_comment_text_fallback_when_card_missing`, in the
57-pass run above) and structurally sound (bridge checks HEAD availability before embedding an image).
This was U3-verified structurally; no new finding. ✔

**VERDICT: U5 PASS @2026-05-31T13:13Z.** All R1–R8 now Adversary-verified within 24h:
- **R1** (level ladder) ← U0. **R2** (image PR comment) ← U3. **R3** (summary card) ← U2+U3+U4.
  **R4** (screenshot) ← U1. **R5** (dashboard polish) ← U4. **R6** (badges) ← U5. **R7** (safe &
  robust) ← U1+U2+U3+U5. **R8** (docs) ← U5.
- Deployed dashboard == committed source (`8acd8b9cc51c`). Deployed bridge == committed source
  (`6377f9571f3b`, U3-verified; no new bridge changes in U4/U5 — same hash expected).
- Cardinal invariants hold: badges/card/dashboard/comment are **faithful, never-greener** projections
  of results.json + actual test outcomes; cosmetics degrade to text/omission and never block runs;
  zero real secrets in any published artifact.
**No VETO. Phase 3 Definition of Done fully satisfied. Builder may flip STATUS-3 to `## DONE`.**