REVIEW-3 — Adversary verdicts for cc-ci Phase 3 (Beautiful YunoHost-style results UX)

SSOT for this phase: /srv/cc-ci/cc-ci-plan/plan-phase3-results-ux.md. This is the Adversary-owned, append-only verdict log for Phase 3. The Builder owns STATUS-3.md / JOURNAL-3.md / BACKLOG-3.md ## Build backlog. I own this file + BACKLOG-3.md ## Adversary findings.

Definition of Done (Phase 3) — R1..R8, each to be Adversary cold-verified within 24h

  • R1 — Level ladder. Documented ladder (§4.1) maps passed test sets → one integer level per run; a missing lower rung caps the level (YunoHost semantics).
  • R2 — Image-forward PR comment. !testme posts/updates a Gitea PR comment: marker (🌻) + status/level badge + summary image, both linking to run/dashboard; re-run updates same comment.
  • R3 — Summary card image. Per-run PNG: recipe+version, level, per-stage/per-test ✔/✘ breakdown, embedded deployed-app screenshot; stable URL; in comment + dashboard.
  • R4 — App screenshot. Runner captures real screenshot of deployed app (Playwright, post-login where needed) for the card.
  • R5 — Dashboard polish. Overview at ci.commoninternet.net resembles ci-apps.yunohost.org: recipe grid w/ level badge, latest pass/fail, last version, app screenshot, history link.
  • R6 — Badges. Per-recipe level/status SVG badge endpoint embeddable in READMEs + dashboard.
  • R7 — Safe & robust. No secrets in images/comments/badges/screenshots (reuse P1 §4.4 redaction; screenshot must not capture secret values). Image gen never blocks/fails the pipeline: on error → text fallback + recorded failure; verdict unaffected.
  • R8 — Docs. docs/ explains ladder, card/screenshot/badge generation, badge embedding.
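The R7 requirement (image generation degrades to a text fallback plus a recorded failure, leaving the verdict untouched) could be sketched roughly as below. This is a minimal illustration only, not the Builder's implementation: `CardResult`, `render_card_safely`, and the `recipe`/`level` keys are hypothetical names, and the sketch covers renderer exceptions only. A hanging renderer would additionally need a timeout (for example by running it in a subprocess), which is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CardResult:
    """Outcome of a best-effort card render (hypothetical structure)."""
    png: Optional[bytes]           # rendered image, or None on failure
    text_fallback: Optional[str]   # plain-text summary used when png is None
    warnings: list[str] = field(default_factory=list)


def render_card_safely(render: Callable[[dict], bytes], results: dict) -> CardResult:
    """Run the renderer; never let its failure escape to the pipeline."""
    try:
        return CardResult(png=render(results), text_fallback=None)
    except Exception as exc:  # cosmetic failure: record it, do not re-raise
        # The results dict is only read here, so the verdict is unaffected.
        summary = f"{results.get('recipe', '?')}: level {results.get('level', '?')}"
        return CardResult(
            png=None,
            text_fallback=summary,
            warnings=[f"card render failed: {exc!r}"],
        )
```

The caller would post `text_fallback` in place of the image and append `warnings` to the run record; the pass/fail verdict is computed elsewhere and is never derived from the render outcome.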

Milestone gates (each ends with an Adversary gate) — U0..U5

  • U0 — Results schema + level (results.json per-stage/per-test; level correct for L4-pass & L2-cap).
  • U1 — App screenshot (real, post-login, secret-safe).
  • U2 — Summary card + badge (HTML→PNG; level/✔✘/screenshot; SVG badge; stable URLs; pass+fail).
  • U3 — YunoHost-style PR comment (marker+badge+card, linked; updates on re-run; no secrets).
  • U4 — Dashboard polish (grid mirrors underlying results across several runs).
  • U5 — Badges + docs + hardening (leak scan clean; renderer-kill degrades to text; flip DONE).

Adversary invariants to attack this phase (from §6 guardrails)

  1. Presentation never inflates the verdict — rendered level/card MUST match raw results.json & actual test outcomes. A card greener than its tests = FAIL.
  2. No secrets in any artifact — comments, badges, cards, app screenshots (esp. generated admin/app passwords; screenshot must avoid credential pages).
  3. Cosmetics never block the pipeline — render/screenshot/badge failure degrades to text + warning; never fails or hangs a run; respects P1 timeouts.
  4. No test-weakening to raise a level — watch for softened tests or mis-mapped rungs inflating displayed quality.
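For invariant 2, one possible attack is a verbatim byte scan of every text-bearing artifact for secrets known to the run. The sketch below assumes the caller can supply the generated secret values and the artifact bytes; `find_leaks` and the artifact names are hypothetical. It cannot see secrets rendered as pixels in a PNG screenshot, so screenshots would need a separate check (for example, confirming the capture never visits a credential page).

```python
from typing import Iterable


def find_leaks(artifacts: dict[str, bytes], secrets: Iterable[str]) -> list[tuple[str, int]]:
    """Return (artifact_name, secret_index) for every secret found verbatim.

    Indices (into the non-empty secrets, in order) are reported instead of
    the values themselves, so the leak report stays free of secrets.
    """
    # Drop empty strings: an empty byte string is "contained" in every blob.
    secret_list = [s for s in secrets if s]
    hits = []
    for name, blob in sorted(artifacts.items()):
        for idx, secret in enumerate(secret_list):
            if secret.encode("utf-8") in blob:
                hits.append((name, idx))
    return hits
```

An empty result is necessary but not sufficient for a clean scan: encoded or truncated forms of a secret would slip past a verbatim match.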

Verdict log (append-only)

@2026-05-31T05:42Z — Phase-3 Adversary loop live (no gate yet)

Cold orient on first wake into Phase 3. Findings:

  • Phase 3 plan read in full (SSOT). DoD = R1..R8; milestones U0..U5; guardrails internalised above.
  • No Phase-3 work exists yet: no STATUS-3.md / JOURNAL-3.md / BACKLOG-3.md in machine-docs/; no ADVERSARY-INBOX; HEAD = 7123d82 status(2b): ## DONE. Builder has not started §1/U0.
  • Prerequisite note (not my call, recorded for honesty): plan-phase3 §0 says "Do not start until Phase 2 STATUS.md shows ## DONE (Adversary-verified)." Phase-2 ## DONE is not yet flipped and REVIEW-2.md carries a standing VETO (named upgrade-to-latest checklist satisfied, but full Phase-2 DONE authorization is a separate later step per REVIEW-2 @2026-05-31). Phase 2b IS DONE. The operator kicked Phase 3 off manually (transition = manual per §Status). Sequencing across phases is an operator call (cf. STATUS-2b note), so I proceed with Phase-3 adversary duties; I am NOT treating the Phase-2 VETO as a Phase-3 blocker, only flagging the dependency.
  • Nothing claimed → idle per liveness protocol; watchdog pings me on the first claim(3...) commit.

No verdict. No VETO (Phase-3). Awaiting Builder's first gate claim.

@2026-05-31T05:55Z — PRE-CLAIM RECON (NOT a verdict): U0.1 pure level() mapper fuzz-clean

Builder committed 9773e3f feat(3 U0.1): pure level() ladder mapper + unit tests but has NOT claimed any gate (STATUS-3 "## Gate (none claimed)"). I probed early so I'm focused when U0 lands. Cold-run from a fresh clone on the cc-ci host @9773e3f (cc-ci-run -m pytest tests/unit/test_level.py):

  • Builder's 15 unit tests: 15 passed.
  • My own adversarial inputs (6 cases the Builder didn't write): all correct — mid/higher passes never rescue a lower gap; install na/fail → L0; all-na-above-install → L1.
  • Exhaustive fuzz: all 3^6 = 729 rung combinations → compute_level level == count of leading consecutive passes, 0 mismatches. The pure mapper provably cannot inflate the level.

Binding question deferred to the U0 claim: inflation can only enter via the translation layer (run_recipe_ci.py mapping raw per-tier results + deps/SSO signals → the rung dict) and via whether results.json is actually emitted per real run. The pure function is sound; I will attack the mapping and the real emitted artifact when U0 is CLAIMED.

Not anchoring on the Builder's narrative — this is my own cold re-run + fuzz. No verdict yet.
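The exhaustive check described in this entry can be reproduced in spirit with the sketch below. It is not the Builder's `compute_level` (whose signature is not recorded here): `ladder_level` is a hypothetical stand-in that takes an ordered sequence of rung statuses, lowest rung first, and the fuzz compares it against an independently written oracle over all 3^6 combinations.

```python
from itertools import product, takewhile

STATES = ("pass", "fail", "na")
N_RUNGS = 6  # 3**6 == 729 combinations, matching the fuzz described above


def ladder_level(rungs):
    """Level = number of consecutive passes starting at the lowest rung."""
    level = 0
    for status in rungs:
        if status != "pass":
            break  # a missing lower rung caps the level
        level += 1
    return level


def fuzz_all():
    """Check every rung combination; return (total, mismatches)."""
    total = 0
    mismatches = 0
    for combo in product(STATES, repeat=N_RUNGS):
        total += 1
        # Oracle written separately from ladder_level: count the leading passes.
        oracle = sum(1 for _ in takewhile(lambda s: s == "pass", combo))
        # Cap property: the level never exceeds the index of the first non-pass.
        first_gap = next((i for i, s in enumerate(combo) if s != "pass"), N_RUNGS)
        got = ladder_level(combo)
        if got != oracle or got > first_gap:
            mismatches += 1
    return total, mismatches
```

Under these assumptions, a failing or `na` first rung gives level 0, a lone pass on the first rung gives level 1, and passes above a gap never raise the level. This exercises only the pure mapper, not the translation layer that builds the rung sequence.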