cc-ci Results UX — level ladder, summary card, screenshot & badges (Phase 3, R8)

This doc explains how a cc-ci run is presented: the level a run earns, the summary card + app screenshot rendered for it, the PR comment it posts, and the badges you can embed. It is the R8 reference for Phase 3 (plan-phase3-results-ux.md).

Presentation never changes the verdict. The level and card report the test outcomes; they can only ever understate, never overstate, what the tests actually verified (the cardinal guardrail). The authoritative pass/fail is the run's exit status + the per-tier results; the level is a summary.


1. The level ladder (R1)

Every run earns a single integer level from 0 to 6. The ladder is cumulative, with YunoHost-style gap-caps-the-level semantics: you earn level L only if every rung 1..L was a clean PASS. The first rung that is not a clean PASS (a real FAIL, or genuinely N/A for this recipe) stops the climb, and level_cap_reason records which rung and why.

Level  Rung              Earned when
L0     (none)            install failed / the app never became healthy.
L1     install           deploys and passes health/readiness.
L2     upgrade           previous published version → PR/latest, stays healthy, data intact.
L3     backup/restore    seeded data survives backup → wipe → restore.
L4     functional        the recipe-specific functional tests pass.
L5     integration       SSO/OIDC + cross-app integration tests pass.
L6     recipe-local      the recipe repo's own tests/ (D4) pass and are merged.

N/A caps, fairly. A rung that does not apply to a recipe (only one published version → no upgrade; not backup-capable; no SSO/integration surface; no recipe-local tests) is N/A, which caps the climb at the rung below it with a recorded reason — it is not counted as a failure. This is the only fair reading of "a missing lower rung caps the level": e.g. a recipe with no integration surface caps at L4 by definition, shown as level_cap_reason = "L5 integration … N/A". A stateless app whose functional tests pass but which cannot be backed up is honestly capped at L2 ("L3 backup/restore … N/A") rather than shown as L4 — understating is safe; overstating is forbidden.

Worked examples (real runs):

  • uptime-kuma — install+upgrade+backup+restore+functional all pass, no SSO surface → L4 (cap = "L5 integration (SSO/OIDC + cross-app) N/A").
  • custom-html-tiny — stateless, not backup-capable: install+upgrade pass, backup/restore N/A → L2 (cap = "L3 backup/restore (data integrity) N/A").
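The climb can be sketched in a few lines of Python. This is a minimal illustration, not runner/harness/level.py::compute_level itself. Several details are assumptions: the name sketch_compute_level, the "fail" status string, and the cap-reason wording for every rung except L3 and L5. The two worked examples above serve as the usage check. Only the rungs those examples actually state are filled in, because the climb stops before it reaches the others.

```python
from typing import Dict, Optional, Tuple

# Ladder order: index 0 is rung L1, index 5 is rung L6.
RUNGS = ["install", "upgrade", "backup_restore", "functional", "integration", "recipe_local"]

# Labels used in level_cap_reason. The backup_restore and integration wording
# matches the worked examples; the others (and "FAIL") are assumptions.
LABELS = {
    "install": "install",
    "upgrade": "upgrade",
    "backup_restore": "backup/restore (data integrity)",
    "functional": "functional",
    "integration": "integration (SSO/OIDC + cross-app)",
    "recipe_local": "recipe-local",
}


def sketch_compute_level(rungs: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Hypothetical scorer: level = number of leading clean passes.

    `rungs` maps rung name -> "pass" | "fail" | "na". A missing rung counts
    as a failure, so absent data can never raise the level.
    """
    for index, name in enumerate(RUNGS):
        status = rungs.get(name, "fail")
        if status != "pass":
            word = "N/A" if status == "na" else "FAIL"
            # The first non-pass rung is L(index + 1); the earned level is index.
            return index, f"L{index + 1} {LABELS[name]} {word}"
    return len(RUNGS), None  # every rung passed: L6, nothing capped


uptime_kuma = {"install": "pass", "upgrade": "pass", "backup_restore": "pass",
               "functional": "pass", "integration": "na"}
custom_html_tiny = {"install": "pass", "upgrade": "pass", "backup_restore": "na"}
print(sketch_compute_level(uptime_kuma))       # (4, 'L5 integration (SSO/OIDC + cross-app) N/A')
print(sketch_compute_level(custom_html_tiny))  # (2, 'L3 backup/restore (data integrity) N/A')
```

Treating N/A and FAIL identically for the climb, and differing only in the recorded reason, is what keeps N/A from counting as a failure while still never letting it inflate the level.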

How tiers map to rungs (the translation layer)

run_recipe_ci.py holds the run's per-tier results (install/upgrade/backup/restore/custom) + deps/SSO signals; runner/harness/results.py::derive_rungs maps them to the rung-status dict that runner/harness/level.py::compute_level scores. The mapping (also in DECISIONS.md, Phase 3):

  • install ← install tier (pass/fail).
  • upgrade ← upgrade tier; skip → na (only one published version).
  • backup_restore ← backup AND restore tiers both pass → pass; either fail → fail; not backup-capable → na.
  • functional ← the custom tier minus its SSO tests; a custom failure conservatively fails this rung (we don't split functional-vs-SSO failure → never inflate); no custom tests → na.
  • integration ← applies only if the recipe declares deps; pass iff deps wired and SSO verified and custom didn't fail; recipes with no declared deps → na (the "caps at L4" rule).
  • recipe_local ← the recipe repo's own tests/ (discovery source repo-local) ran and passed; none present → na.
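Here is a minimal sketch of that translation. It rests on three assumptions:

  • the tier statuses are the strings "pass", "fail" and "skip";
  • a tier with no tests is simply absent from the results dict;
  • the deps/SSO signals arrive as plain booleans.

The name sketch_derive_rungs and its keyword arguments are hypothetical and are not the signature of runner/harness/results.py::derive_rungs. The sketch also scores the whole custom tier for the functional rung, since it has no way to set the SSO tests aside.

```python
from typing import Dict, Optional


def _strict(status: Optional[str]) -> str:
    # Only an explicit "pass" passes a rung; anything else fails it (never inflate).
    return "pass" if status == "pass" else "fail"


def sketch_derive_rungs(
    results: Dict[str, str],
    *,
    backup_capable: bool,
    declares_deps: bool,
    deps_wired: bool,
    sso_verified: bool,
    recipe_local: Optional[str],
) -> Dict[str, str]:
    """Hypothetical tier -> rung translation following the mapping above.

    `results` holds per-tier statuses ("pass" / "fail" / "skip"); a tier that
    has no tests is absent. `recipe_local` is None when the recipe repo ships
    no tests/, otherwise "pass" or "fail".
    """
    rungs = {"install": _strict(results.get("install"))}

    # upgrade: skipped (only one published version) is N/A, not a failure.
    upgrade = results.get("upgrade")
    rungs["upgrade"] = "na" if upgrade == "skip" else _strict(upgrade)

    # backup_restore: both tiers must pass; not backup-capable is N/A.
    if not backup_capable:
        rungs["backup_restore"] = "na"
    else:
        both = results.get("backup") == "pass" and results.get("restore") == "pass"
        rungs["backup_restore"] = "pass" if both else "fail"

    # functional: the whole custom tier is scored, so an SSO failure inside it
    # also fails this rung (the conservative choice described above).
    custom = results.get("custom")
    rungs["functional"] = "na" if custom is None else _strict(custom)

    # integration: applies only when the recipe declares deps.
    if not declares_deps:
        rungs["integration"] = "na"
    else:
        ok = deps_wired and sso_verified and custom != "fail"
        rungs["integration"] = "pass" if ok else "fail"

    rungs["recipe_local"] = "na" if recipe_local is None else _strict(recipe_local)
    return rungs


rungs = sketch_derive_rungs(
    {"install": "pass", "upgrade": "pass", "backup": "pass", "restore": "pass", "custom": "pass"},
    backup_capable=True, declares_deps=False, deps_wired=False, sso_verified=False,
    recipe_local=None,
)
print(rungs["integration"], rungs["recipe_local"])  # na na
```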

The pure scorer is exhaustively unit-tested + fuzz-verified over all 729 rung combinations (3^6: pass/fail/na for each of the six rungs), checking that level == the count of leading consecutive passes, with zero inflation.
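The property those tests pin down can be illustrated with an exhaustive loop over every combination. This is a sketch of the idea, not the project's test suite. The scorer here, stand_in_level, is a hypothetical stand-in rather than compute_level. The first assertion rules out inflation and the second rules out stopping early; together they force level to equal the count of leading passes.

```python
from itertools import product, takewhile
from typing import Dict

RUNGS = ("install", "upgrade", "backup_restore", "functional", "integration", "recipe_local")
STATUSES = ("pass", "fail", "na")


def stand_in_level(rungs: Dict[str, str]) -> int:
    # Stand-in scorer under test: length of the leading run of "pass".
    return sum(1 for _ in takewhile(lambda name: rungs[name] == "pass", RUNGS))


def check_every_combination() -> int:
    checked = 0
    for combo in product(STATUSES, repeat=len(RUNGS)):  # 3 ** 6 = 729 combinations
        rungs = dict(zip(RUNGS, combo))
        level = stand_in_level(rungs)
        # Zero inflation: every rung up to the level really passed ...
        assert all(rungs[name] == "pass" for name in RUNGS[:level])
        # ... and the climb stopped at a rung that did not (unless all six passed).
        assert level == len(RUNGS) or rungs[RUNGS[level]] != "pass"
        checked += 1
    return checked


print(check_every_combination())  # 729
```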

Invariant flags (shown, not climbed)

Two Phase-1 gating invariants are surfaced as flags on the card, not as ladder rungs: clean_teardown (the run left no orphaned app/volume/secret and stayed within the deploy budget) and no_secret_leak (no known secret value appears in the published artifact — the Adversary's broader leak scan is the authority).
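For illustration only, a naive form of the no_secret_leak check could look like the sketch below. It assumes two things: the harness knows the secret values it generated, and the published artifact is a directory of files. sketch_no_secret_leak is a hypothetical name. Because it only catches verbatim copies, it is deliberately weaker than the Adversary's scan.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def sketch_no_secret_leak(artifact_dir: Path, known_secrets: Iterable[str]) -> bool:
    """Hypothetical flag check: True if no known secret value occurs verbatim
    in any file under artifact_dir.

    Exact byte matching only: encoded or rendered copies (e.g. a secret drawn
    into a PNG) are not caught, which is why a broader scan stays authoritative.
    """
    needles = [s.encode("utf-8") for s in known_secrets if s]  # skip empty values
    for path in artifact_dir.rglob("*"):
        if path.is_file():
            data = path.read_bytes()
            if any(needle in data for needle in needles):
                return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "results.json").write_text('{"level": 4}')
    print(sketch_no_secret_leak(run_dir, ["hypothetical-db-password"]))  # True
```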


2. results.json (per run)

Each run writes ${CCCI_RUNS_DIR:-/var/lib/cc-ci-runs}/<run_id>/results.json (run_id = the Drone build number, or the run's unique app domain for a hand-run). Schema:

{
  "schema": 1, "run_id": "...", "recipe": "...", "version": "...", "pr": "...", "ref": "...",
  "finished": 0.0,
  "level": 4, "level_cap_reason": "L5 integration (SSO/OIDC + cross-app) N/A",
  "rungs": {"install":"pass","upgrade":"pass","backup_restore":"pass","functional":"pass",
            "integration":"na","recipe_local":"na"},
  "stages": [{"name":"install","status":"pass",
              "tests":[{"name":"test_serving","status":"pass","ms":168,"source":"generic"}]}],
  "results": {"install":"pass","upgrade":"pass","backup":"pass","restore":"pass","custom":"pass"},
  "flags": {"clean_teardown": true, "no_secret_leak": true},
  "screenshot": "screenshot.png", "summary_card": "summary.png"
}

Assembly is best-effort: a failure to build/write results.json is logged but never changes the run's exit code (cosmetics never block the pipeline, R7).
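Here is a minimal sketch of that best-effort write, assuming a plain dict payload. sketch_write_results and the logger name are hypothetical. The sketch does three things:

  • mirrors the shell default ${CCCI_RUNS_DIR:-/var/lib/cc-ci-runs}, so an empty value also falls back;
  • writes through a temporary file plus os.replace, so a reader never sees a half-written results.json;
  • logs and swallows every exception.

```python
import json
import logging
import os
import tempfile
from pathlib import Path
from typing import Optional

log = logging.getLogger("cc-ci.results")  # hypothetical logger name


def sketch_write_results(run_id: str, doc: dict) -> Optional[Path]:
    """Best-effort write of <runs dir>/<run_id>/results.json.

    Returns the written path, or None on any failure. It never raises, so a
    cosmetic failure cannot change the caller's exit code.
    """
    try:
        # `or` mirrors the shell's ":-": unset *or empty* falls back to the default.
        runs_dir = Path(os.environ.get("CCCI_RUNS_DIR") or "/var/lib/cc-ci-runs")
        run_dir = runs_dir / run_id
        run_dir.mkdir(parents=True, exist_ok=True)
        payload = json.dumps({"schema": 1, "run_id": run_id, **doc}, indent=2)
        tmp_path = run_dir / "results.json.tmp"
        tmp_path.write_text(payload)
        target = run_dir / "results.json"
        os.replace(tmp_path, target)  # atomic rename: no half-written file is ever served
        return target
    except Exception:  # cosmetics never block the pipeline (R7)
        log.exception("could not write results.json for run %s", run_id)
        return None


with tempfile.TemporaryDirectory() as tmp:
    os.environ["CCCI_RUNS_DIR"] = tmp
    path = sketch_write_results("42", {"recipe": "uptime-kuma", "level": 4})
    print(path.name)  # results.json
```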


3. Summary card + app screenshot (R3/R4)

App screenshot (runner/harness/screenshot.py). After the app deploys and passes health/readiness, and before any tier mutates state or teardown runs, the harness captures a real Playwright screenshot of the live app and writes screenshot.png to the run dir. It is secret-safe by default: it captures the landing page (login/setup forms show input fields, not secret values), viewport-only (full_page=False, no scrolling into a secrets panel), and the harness never auto-fills an install wizard. A recipe whose landing page is uninformative may opt into a post-login view via an optional SCREENSHOT hook in tests/<recipe>/recipe_meta.py; that hook then owns the no-credential-page guarantee. Capture is best-effort: any error returns None, writes no file, and never blocks the run (R7); results.json.screenshot is set only when a file was actually produced.
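Here is a minimal sketch of the capture step, assuming Playwright for Python and its Chromium build. The name sketch_capture_screenshot, the 1280x800 viewport and the 15 s timeout are illustrative choices, not values taken from runner/harness/screenshot.py. The sketch follows the rules above: it captures the landing page only, viewport-only (full_page=False), with no form filling. Any failure, including Playwright being missing, returns None and leaves no file behind.

```python
import contextlib
from pathlib import Path
from typing import Optional


def sketch_capture_screenshot(url: str, run_dir: Path, timeout_ms: int = 15_000) -> Optional[Path]:
    """Best-effort, viewport-only screenshot of the app's landing page.

    Returns the PNG path, or None if anything fails (including Playwright or
    its browser not being installed). Nothing is typed into the page.
    """
    target = run_dir / "screenshot.png"
    try:
        from playwright.sync_api import sync_playwright  # optional dependency

        with sync_playwright() as p:
            browser = p.chromium.launch()
            try:
                page = browser.new_page(viewport={"width": 1280, "height": 800})
                page.goto(url, wait_until="load", timeout=timeout_ms)
                # full_page=False: only what is visible, no scrolling into other panels.
                page.screenshot(path=str(target), full_page=False)
            finally:
                browser.close()
        return target
    except Exception:
        with contextlib.suppress(OSError):
            target.unlink()  # never leave a partial file behind
        return None
```

Importing Playwright inside the try block is what makes a missing dependency just another "no screenshot" outcome rather than a crash.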

Summary card (runner/harness/card.py). After results.json is written, the harness builds an HTML results card — recipe + version, the level badge, a per-stage/per-test ✔/✘ table with timings, the embedded app screenshot (base64 data-URI so the PNG is self-contained), and the invariant flags — and screenshots that HTML to summary.png via the harness Playwright browser. The card reports results.json verbatim — it computes nothing, so it can never show a run greener than its tests (cardinal guardrail). Rendering is best-effort (returns None on failure → no card, run unaffected).
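The card-building half can be sketched as a pure function over the parsed results.json. The name sketch_card_html and the HTML layout are hypothetical. The point is that it only copies fields, escaping each one, and inlines the screenshot as a base64 data URI. To produce summary.png, the string would then be loaded with Playwright's page.set_content() and captured with page.screenshot(); this sketch leaves that step out.

```python
import base64
import html
from typing import Optional


def _esc(value: object) -> str:
    # Every value copied from results.json is HTML-escaped before use.
    return html.escape(str(value))


def sketch_card_html(results: dict, screenshot_png: Optional[bytes]) -> str:
    """Hypothetical card layout: copies results.json fields, derives nothing."""
    rows = []
    for stage in results.get("stages", []):
        for test in stage.get("tests", []):
            mark = "✔" if test.get("status") == "pass" else "✘"  # only an explicit pass is green
            stage_name, test_name = _esc(stage.get("name", "")), _esc(test.get("name", ""))
            ms = _esc(test.get("ms", "?"))
            rows.append(f"<tr><td>{stage_name}</td><td>{test_name}</td><td>{mark}</td><td>{ms} ms</td></tr>")
    img = ""
    if screenshot_png:
        # Inline the PNG as a base64 data URI so the card is one self-contained page.
        b64 = base64.b64encode(screenshot_png).decode("ascii")
        img = f'<img alt="app screenshot" src="data:image/png;base64,{b64}">'
    flags = " ".join(
        f"{_esc(name)}: {'✔' if ok else '✘'}" for name, ok in results.get("flags", {}).items()
    )
    title = _esc(f"{results.get('recipe', '')} {results.get('version', '')}".strip())
    level = _esc(results.get("level", "?"))
    return (
        f"<html><body><h1>{title}</h1><p>level {level}</p>"
        f"<table>{''.join(rows)}</table>{img}<p>{flags}</p></body></html>"
    )
```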

Stable URLs. The dashboard serves the run artifact dir read-only at:

https://ci.commoninternet.net/runs/<run_id>/summary.png      # the card
https://ci.commoninternet.net/runs/<run_id>/screenshot.png   # the app screenshot
https://ci.commoninternet.net/runs/<run_id>/badge.svg        # the per-run level badge
https://ci.commoninternet.net/runs/<run_id>/results.json     # the raw data

<run_id> is the Drone build number. The route is whitelist + traversal-guarded (filenames from a fixed set; run_id charset-restricted; realpath must stay inside the runs dir) and read-only.
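The three guards can be sketched as follows, assuming the artifacts live at <runs dir>/<run_id>/<file>. The name sketch_resolve_artifact and the run_id pattern are assumptions. The pattern allows letters, digits, dots and hyphens and must start with an alphanumeric, so it covers both build numbers and hand-run app domains. If any guard fails, the request gets a 404.

```python
import os
import re
from typing import Optional

# Fixed set of servable files (the four URLs above).
ARTIFACTS = frozenset({"summary.png", "screenshot.png", "badge.svg", "results.json"})
# Assumed run_id charset: a Drone build number or a hand-run app domain.
RUN_ID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]{0,252}")


def sketch_resolve_artifact(runs_dir: str, run_id: str, filename: str) -> Optional[str]:
    """Return the file to serve read-only, or None (respond 404)."""
    if filename not in ARTIFACTS:          # guard 1: whitelist of names
        return None
    if not RUN_ID_RE.fullmatch(run_id):    # guard 2: charset (rejects "/", "..", "%2e")
        return None
    root = os.path.realpath(runs_dir)
    candidate = os.path.realpath(os.path.join(root, run_id, filename))
    # guard 3: after resolving symlinks, the file must still sit inside the runs dir
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate if os.path.isfile(candidate) else None
```

The charset check already rejects obvious traversal. The realpath check is the backstop for the case the charset cannot see: a symlink inside the runs dir that points elsewhere.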

4. PR comment (R2)

On a !testme run the comment-bridge (bridge/bridge.py) maintains one comment per PR, updated in place (it carries a hidden <!-- cc-ci:testme --> marker so re-!testme finds and refreshes the same comment rather than stacking new ones):

  1. On start — a 🌻 + placeholder: testing <recipe> @ <sha> + a live-logs link, "level pending".
  2. On completion — the same comment is edited to the YunoHost-shaped result: 🌻 + a level badge image + the summary card image, both linking to the run, plus full-logs/dashboard links.
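The find-or-create decision behind "one comment per PR" can be sketched without any forge API. The comment dict shape and the name sketch_plan_comment are assumptions, and the real bridge's calls to the forge are not shown.

```python
from typing import List, Optional, Tuple

MARKER = "<!-- cc-ci:testme -->"


def sketch_plan_comment(comments: List[dict], body: str) -> Tuple[str, Optional[int], str]:
    """Decide whether to edit the existing cc-ci comment or post a new one.

    `comments` is the PR's comment list as {"id": ..., "body": ...} dicts
    (an assumed shape). Returns ("edit", id, text) or ("create", None, text).
    """
    text = f"{MARKER}\n{body}"  # the hidden marker travels with every version
    for comment in comments:
        if MARKER in comment.get("body", ""):
            return "edit", comment["id"], text  # refresh in place, never stack
    return "create", None, text


# First !testme: nothing is marked yet, so the placeholder is created.
print(sketch_plan_comment([], "🌻 testing <recipe> @ <sha>, level pending")[0])  # create
# On completion (or a re-!testme), the marked comment is found and edited.
existing = [{"id": 11, "body": "lgtm"}, {"id": 12, "body": MARKER + "\n🌻 level pending"}]
print(sketch_plan_comment(existing, "🌻 level 4")[:2])  # ('edit', 12)
```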

If the rendered card isn't served (render failed, build didn't finish), the comment falls back to a compact text verdict with the run link (the bridge checks artifact availability with a cheap HEAD request) — R7: a cosmetics failure degrades to text, never a broken image, never affecting the verdict.
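Here is a sketch of the degrade-to-text choice, using only the standard library. sketch_artifact_available and sketch_result_body are hypothetical names, and the comment layout is abbreviated (it omits the full-logs/dashboard links). The probe sends a HEAD request via urllib.request.Request(..., method="HEAD"). Any network or HTTP error counts as "not served".

```python
import http.client
import urllib.request


def sketch_artifact_available(url: str, timeout: float = 5.0) -> bool:
    """Cheap HEAD probe: True only for a 200 response; any error means False."""
    try:
        request = urllib.request.Request(url, method="HEAD")  # ValueError on a malformed URL
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        return False  # URLError and HTTPError are OSError subclasses


def sketch_result_body(run_url: str, verdict: str) -> str:
    """Completion text for the PR comment (layout assumed, links abbreviated)."""
    card_url = f"{run_url}/summary.png"
    if sketch_artifact_available(card_url):
        return (
            f"🌻 [![cc-ci level]({run_url}/badge.svg)]({run_url})\n\n"
            f"[![summary card]({card_url})]({run_url})"
        )
    # Card not served: degrade to text rather than post a broken image.
    return f"🌻 cc-ci: {verdict}. Details: {run_url}"
```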

5. Badges (R6) + how to embed one

Two SVG badge endpoints, both shields-style and coloured by level (level_color):

  • Per-recipe latest level (for a recipe README): https://ci.commoninternet.net/badge/<recipe>.svg renders "cc-ci: <recipe> | level N" for that recipe's most recent run (falling back to a status badge if the recipe has no level yet). It is re-rendered live from the latest results.json.
  • Per-run (pinned to one run, e.g. in the PR comment): https://ci.commoninternet.net/runs/<run_id>/badge.svg.
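Here is a minimal sketch of the per-recipe endpoint's logic, assuming the run records are parsed results.json dicts. The palette stands in for level_color. The function names, badge geometry and fallback text are all assumptions, and a production badge would measure text widths properly. The per-run badge would render the same way from that run's own level.

```python
from typing import List, Optional
from xml.sax.saxutils import escape

# Assumed red-to-green palette indexed by level 0..6; level_color may differ.
PALETTE = ["#e05d44", "#fe7d37", "#dfb317", "#a4a61d", "#97ca00", "#44cc11", "#3a8a00"]


def sketch_level_color(level: int) -> str:
    return PALETTE[max(0, min(level, len(PALETTE) - 1))]  # clamp out-of-range levels


def sketch_latest_level(runs: List[dict], recipe: str) -> Optional[int]:
    """Level of the recipe's most recent run (largest "finished"), or None."""
    mine = [r for r in runs if r.get("recipe") == recipe]
    if not mine:
        return None
    return max(mine, key=lambda r: r.get("finished", 0.0)).get("level")


def sketch_badge_svg(label: str, message: str, color: str) -> str:
    """Minimal shields-style two-part badge; widths are a rough 7px/char guess."""
    lw, mw = 7 * len(label) + 10, 7 * len(message) + 10
    label, message = escape(label), escape(message)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{lw + mw}" height="20">'
        f'<rect width="{lw}" height="20" fill="#555"/>'
        f'<rect x="{lw}" width="{mw}" height="20" fill="{color}"/>'
        '<g fill="#fff" font-family="Verdana,sans-serif" font-size="11">'
        f'<text x="5" y="14">{label}</text><text x="{lw + 5}" y="14">{message}</text>'
        "</g></svg>"
    )


def sketch_recipe_badge(runs: List[dict], recipe: str) -> str:
    label = f"cc-ci: {recipe}"
    level = sketch_latest_level(runs, recipe)
    if level is None:
        return sketch_badge_svg(label, "no level yet", "#9f9f9f")  # status fallback (wording assumed)
    return sketch_badge_svg(label, f"level {level}", sketch_level_color(level))
```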

Embed the per-recipe badge in a recipe README (Markdown), linking to the cc-ci dashboard:

[![cc-ci level](https://ci.commoninternet.net/badge/<recipe>.svg)](https://ci.commoninternet.net/recipe/<recipe>)

The link target …/recipe/<recipe> is that recipe's run-history page (level/version/status per run, with a link to each run's summary card).