Files
cc-ci/docs/results-ux.md

161 lines
9.1 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# cc-ci Results UX — level ladder, summary card, screenshot & badges (Phase 3, R8)
This doc explains how a cc-ci run is presented: the **level** a run earns, the **summary card** +
**app screenshot** rendered for it, the **PR comment** it posts, and the **badges** you can embed.
It is the R8 reference for Phase 3 (`plan-phase3-results-ux.md`).
> Presentation never changes the verdict. The level and card *report* the test outcomes; they can
> only ever understate, never overstate, what the tests actually verified (the cardinal guardrail).
> The authoritative pass/fail is the run's exit status + the per-tier results; the level is a summary.
---
## 1. The level ladder (R1)
Every run earns a single integer **level 06**. The ladder is cumulative with **YunoHost
gap-caps-the-level** semantics: you earn level `L` only if **every rung 1..L was a clean PASS**. The
first rung that is not a clean PASS — a real **FAIL** *or* genuinely **N/A** for this recipe — stops
the climb, and `level_cap_reason` records which rung and why.
| Level | Rung | Earned when |
|------:|------|-------------|
| **L0** | — | install failed / the app never became healthy. |
| **L1** | install | deploys and passes health/readiness. |
| **L2** | upgrade | previous published version → PR/latest, stays healthy, data intact. |
| **L3** | backup/restore | seeded data survives backup → wipe → restore. |
| **L4** | functional | the recipe-specific functional tests pass. |
| **L5** | integration | SSO/OIDC + cross-app integration tests pass. |
| **L6** | recipe-local | the recipe repo's own `tests/` (D4) pass and are merged. |
**N/A caps, fairly.** A rung that does not apply to a recipe (only one published version → no
upgrade; not backup-capable; no SSO/integration surface; no recipe-local tests) is **N/A**, which
caps the climb at the rung below it with a recorded reason — it is *not* counted as a failure. This is
the only fair reading of "a missing lower rung caps the level": e.g. a recipe with **no integration
surface caps at L4 by definition**, shown as `level_cap_reason = "L5 integration … N/A"`. A stateless
app whose functional tests pass but which cannot be backed up is honestly capped at **L2** (`"L3
backup/restore … N/A"`) rather than shown as L4 — understating is safe; overstating is forbidden.
Worked examples (real runs):
- `uptime-kuma` — install+upgrade+backup+restore+functional all pass, no SSO surface → **L4**
(`cap = "L5 integration (SSO/OIDC + cross-app) N/A"`).
- `custom-html-tiny` — stateless, not backup-capable: install+upgrade pass, backup/restore N/A →
**L2** (`cap = "L3 backup/restore (data integrity) N/A"`).
### How tiers map to rungs (the translation layer)
`run_recipe_ci.py` holds the run's per-tier results (`install/upgrade/backup/restore/custom`) +
deps/SSO signals; `runner/harness/results.py::derive_rungs` maps them to the rung-status dict that
`runner/harness/level.py::compute_level` scores. The mapping (also in `DECISIONS.md`, Phase 3):
- **install** ← install tier (pass/fail).
- **upgrade** ← upgrade tier; `skip`**na** (only one published version).
- **backup_restore** ← backup AND restore tiers both pass → pass; either fail → fail; not
backup-capable → **na**.
- **functional** ← the custom tier minus its SSO tests; a custom failure conservatively fails this
rung (we don't split functional-vs-SSO failure → never inflate); no custom tests → **na**.
- **integration** ← applies only if the recipe declares deps; pass iff deps wired and SSO verified and
custom didn't fail; recipes with no declared deps → **na** (the "caps at L4" rule).
- **recipe_local** ← the recipe repo's own `tests/` (discovery source `repo-local`) ran and passed;
none present → **na**.
The pure scorer is exhaustively unit-tested + fuzz-verified (all 729 rung combinations: level ==
count of leading consecutive passes, zero inflation).
### Invariant flags (shown, not climbed)
Two Phase-1 gating invariants are surfaced as flags on the card, not as ladder rungs:
`clean_teardown` (the run left no orphaned app/volume/secret and stayed within the deploy budget) and
`no_secret_leak` (no known secret value appears in the published artifact — the Adversary's broader
leak scan is the authority).
---
## 2. `results.json` (per run)
Each run writes `${CCCI_RUNS_DIR:-/var/lib/cc-ci-runs}/<run_id>/results.json` (`run_id` = the Drone
build number, or the run's unique app domain for a hand-run). Schema:
```json
{
"schema": 1, "run_id": "...", "recipe": "...", "version": "...", "pr": "...", "ref": "...",
"finished": 0.0,
"level": 4, "level_cap_reason": "L5 integration (SSO/OIDC + cross-app) N/A",
"rungs": {"install":"pass","upgrade":"pass","backup_restore":"pass","functional":"pass",
"integration":"na","recipe_local":"na"},
"stages": [{"name":"install","status":"pass",
"tests":[{"name":"test_serving","status":"pass","ms":168,"source":"generic"}]}],
"results": {"install":"pass","upgrade":"pass","backup":"pass","restore":"pass","custom":"pass"},
"flags": {"clean_teardown": true, "no_secret_leak": true},
"screenshot": "screenshot.png", "summary_card": "summary.png"
}
```
Assembly is **best-effort**: a failure to build/write `results.json` is logged but never changes the
run's exit code (cosmetics never block the pipeline, R7).
---
## 3. Summary card + app screenshot (R3/R4)
**App screenshot** (`runner/harness/screenshot.py`). After the app deploys and passes health/readiness
and **before any tier mutates state or teardown runs**, the harness captures a real Playwright
screenshot of the live app and writes `screenshot.png` to the run dir. It is **secret-safe by
default**: it shoots the **landing page** (login/setup forms show input *fields*, not secret values),
viewport-only (`full_page=False`, no scroll into a secrets panel), and the harness never auto-fills an
install wizard. A recipe whose landing page is uninformative may opt into a post-login view via an
optional `SCREENSHOT` hook in `tests/<recipe>/recipe_meta.py` — **that hook owns the no-credential-page
guarantee**. Capture is **best-effort**: any error returns `None`, writes no file, and never blocks the
run (R7); `results.json.screenshot` is set only when a file was actually produced.
**Summary card** (`runner/harness/card.py`). After `results.json` is written, the harness builds an
HTML results card — recipe + version, the level badge, a per-stage/per-test ✔/✘ table with timings,
the embedded app screenshot (base64 data-URI so the PNG is self-contained), and the invariant flags —
and screenshots that HTML to `summary.png` via the harness Playwright browser. The card **reports
`results.json` verbatim — it computes nothing**, so it can never show a run greener than its tests
(cardinal guardrail). Rendering is best-effort (returns `None` on failure → no card, run unaffected).
**Stable URLs.** The dashboard serves the run artifact dir read-only at:
```
https://ci.commoninternet.net/runs/<run_id>/summary.png # the card
https://ci.commoninternet.net/runs/<run_id>/screenshot.png # the app screenshot
https://ci.commoninternet.net/runs/<run_id>/badge.svg # the per-run level badge
https://ci.commoninternet.net/runs/<run_id>/results.json # the raw data
```
`<run_id>` is the Drone build number. The route is whitelist + traversal-guarded (filenames from a
fixed set; `run_id` charset-restricted; realpath must stay inside the runs dir) and read-only.
## 4. PR comment (R2)
On a `!testme` run the comment-bridge (`bridge/bridge.py`) maintains **one comment per PR, updated in
place** (it carries a hidden `<!-- cc-ci:testme -->` marker so re-`!testme` finds and refreshes the
same comment rather than stacking new ones):
1. **On start** — a 🌻 + ⏳ placeholder: `testing <recipe> @ <sha>` + a live-logs link, "level pending".
2. **On completion** — the same comment is edited to the YunoHost-shaped result: 🌻 + a **level badge**
image + the **summary card** image, **both linking to the run**, plus full-logs/dashboard links.
If the rendered card isn't served (render failed, build didn't finish), the comment **falls back to a
compact text verdict** with the run link (the bridge checks artifact availability with a cheap HEAD
request) — R7: a cosmetics failure degrades to text, never a broken image, never affecting the verdict.
## 5. Badges (R6) + how to embed one
Two SVG badge endpoints, both shields-style and coloured by level (`level_color`):
- **Per-recipe latest-level** (for a recipe README): `https://ci.commoninternet.net/badge/<recipe>.svg`
`cc-ci: <recipe> | level N` for that recipe's most recent run (falls back to a status badge if the
recipe has no level yet). Re-rendered live from the latest `results.json`.
- **Per-run** (pinned to one run, e.g. in the PR comment):
`https://ci.commoninternet.net/runs/<run_id>/badge.svg`.
Embed the per-recipe badge in a recipe README (Markdown), linking to the cc-ci dashboard:
```markdown
[![cc-ci level](https://ci.commoninternet.net/badge/<recipe>.svg)](https://ci.commoninternet.net/recipe/<recipe>)
```
The link target `…/recipe/<recipe>` is that recipe's run-history page (level/version/status per run,
with a link to each run's summary card).