cc-ci/docs/results-ux.md
2026-06-11 07:45:18 +00:00


# cc-ci Results UX — level ladder, summary card, screenshot & badges (Phase 3, R8)
This doc explains how a cc-ci run is presented: the **level** a run earns, the **summary card** +
**app screenshot** rendered for it, the **PR comment** it posts, and the **badges** you can embed.
It is the R8 reference for Phase 3 (`plan-phase3-results-ux.md`).
> Presentation never changes the verdict. The level and card *report* the test outcomes; they can
> only ever understate, never overstate, what the tests actually verified (the cardinal guardrail).
> The authoritative pass/fail is the run's exit status + the per-tier results; the level is a summary.
---
## 1. The level ladder (phase lvl5 semantics, operator-decided 2026-06-11)
Every run earns a single integer **level 0–5** over the FIVE essential rungs:
| Level | Rung | Earned when |
|------:|------|-------------|
| **L0** | — | install failed / the app never became healthy. |
| **L1** | install | deploys and passes health/readiness. |
| **L2** | upgrade | previous published version → PR/latest, stays healthy, data intact. |
| **L3** | backup/restore | seeded data survives backup → wipe → restore. |
| **L4** | functional | the recipe-specific functional tests pass. |
| **L5** | lint | `abra recipe lint` passes against the exact ref under test. |
Each rung has one of FOUR statuses, and the level is:
> level = the highest rung that PASSED, where every rung below it is "pass" or an intentional skip
- **pass / fail** — the rung was exercised. A FAIL blocks: no rung above it counts, however green.
- **skip (intentional)** — the rung *genuinely does not apply*, from a declared or structural fact:
not backup-capable (declared), only one published version (no upgrade target), or a declared
`EXPECTED_NA`. Intentional skips are **climbed past** — a stateless recipe with passing
functional tests and a clean lint reaches **L5**, not the old "capped at 2".
- **unver (unverified)** — the rung *should* have run but didn't: infra error, missing tool,
harness exception, prior-stage abort, timeout. **The level cannot rise above an unverified
rung** — it blocks exactly like a fail (we never claim what we didn't check). Anything
unclassifiable defaults to unver (conservative).
There is **no capping concept** (no `cap_reason`, no `capped`): the per-rung table
(✔ / ✘ / intentional-skip / unverified) on the card and in `results.json.rungs` is the sole
carrier of "why isn't this level higher". Worked examples:
- install ✔, upgrade ✘, backup ✔, functional ✔, lint ✔ → **level 1** (fail blocks).
- install ✔, upgrade ✔, backup skip (not capable), functional ✔, lint ✔ → **level 5**.
- install ✔, upgrade ✔, backup unver (harness error), functional ✔, lint ✔ → **level 2**.
- all four ✔, lint unver (abra missing) → **level 4** (an unverified top rung isn't earned).
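
The rule and the worked examples above can be sketched in a few lines. This is a minimal illustration and not the actual `compute_level` in `runner/harness/level.py`; the `LADDER` constant and the choice to treat a missing rung as `unver` are assumptions made here to match the "anything unclassifiable defaults to unver" rule.

```python
from typing import Mapping

# Rung order, lowest first; position in this list + 1 is the level it earns.
LADDER = ["install", "upgrade", "backup_restore", "functional", "lint"]


def compute_level(rungs: Mapping[str, str]) -> int:
    """Highest passed rung with only pass / intentional skip below it."""
    level = 0
    for position, name in enumerate(LADDER, start=1):
        status = rungs.get(name, "unver")  # assumption: missing means unverified
        if status == "pass":
            level = position      # earned: every rung below was pass or skip
        elif status == "skip":
            continue              # intentional skip: climbed past, earns nothing
        else:
            break                 # fail, unver, or unknown: nothing above counts
    return level
```

Because a `skip` neither raises nor blocks the level, a run whose top rungs are all skipped reports the last rung that actually passed.
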
Integration (SSO/OIDC + cross-app) and recipe-local tests are **optional capabilities**, not
rungs — they never affect the level (SSO remains enforced for the run VERDICT).
### How tiers map to rungs (the translation layer)
`run_recipe_ci.py` holds the run's per-tier results (`install/upgrade/backup/restore/custom`) +
structural signals; `runner/harness/results.py::derive_rungs` maps them to the rung-status dict
that `runner/harness/level.py::compute_level` scores. The full intentional-vs-unintentional
classification table for every N/A source is in `machine-docs/DECISIONS.md` (phase lvl5). Summary:
- **install** ← install tier (pass/fail; a non-run is unver — install always applies).
- **upgrade** ← upgrade tier; tier skipped with no upgrade target (single published version,
structural) → skip; declared `EXPECTED_NA` → skip; otherwise unver.
- **backup_restore** ← backup AND restore tiers both pass → pass; either fail → fail; not
backup-capable (structural/declared) → skip; unverified-while-capable → unver.
- **functional** ← the custom tier; a custom failure conservatively fails this rung; no custom
tests is a coverage GAP → unver, unless declared `EXPECTED_NA["functional"]` → skip.
- **lint** ← the lint executor (`runner/harness/lint.py`): `abra recipe lint` on a pristine
scratch clone of the run's recipe tree at the exact tested sha, 60s hard budget, full output in
the run artifact `lint.txt`. pass/fail only — when lint can't run the rung is **unver** (never
a silent pass, never an intentional skip). Lint never changes the run verdict.
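
The mapping above can be read as a handful of small decision functions. The sketch below covers three of the rungs; the function names, the `has_upgrade_target` and `backup_capable` parameters, and modelling `EXPECTED_NA` as a dict of rung name to reason are all hypothetical stand-ins for whatever `derive_rungs` actually receives.

```python
from typing import Mapping, Optional


def upgrade_rung(tier: Optional[str], has_upgrade_target: bool,
                 expected_na: Mapping[str, str]) -> str:
    if tier in ("pass", "fail"):
        return tier               # the tier was exercised
    if not has_upgrade_target:
        return "skip"             # structural: only one published version
    if "upgrade" in expected_na:
        return "skip"             # declared not applicable
    return "unver"                # should have run but did not


def backup_restore_rung(backup: Optional[str], restore: Optional[str],
                        backup_capable: bool) -> str:
    if "fail" in (backup, restore):
        return "fail"             # either tier failing fails the rung
    if backup == "pass" and restore == "pass":
        return "pass"             # both tiers must pass
    if not backup_capable:
        return "skip"             # structural or declared: nothing to back up
    return "unver"                # capable, yet not fully verified


def functional_rung(custom: Optional[str],
                    expected_na: Mapping[str, str]) -> str:
    if custom in ("pass", "fail"):
        return custom
    if "functional" in expected_na:
        return "skip"             # declared not applicable
    return "unver"                # no custom tests is a coverage gap
```

In each function the final branch is `unver`, so any input the earlier branches do not recognise lands on the conservative side.
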
### Invariant flags (shown, not climbed)
Two Phase-1 gating invariants are surfaced as flags on the card, not as ladder rungs:
`clean_teardown` (the run left no orphaned app/volume/secret and stayed within the deploy budget) and
`no_secret_leak` (no known secret value appears in the published artifact — the Adversary's broader
leak scan is the authority).
---
## 2. `results.json` (per run)
Each run writes `${CCCI_RUNS_DIR:-/var/lib/cc-ci-runs}/<run_id>/results.json` (`run_id` = the Drone
build number, or the run's unique app domain for a hand-run). Schema:
```json
{
  "schema": 2, "run_id": "...", "recipe": "...", "version": "...", "pr": "...", "ref": "...",
  "finished": 0.0,
  "level": 5,
  "rungs": {"install":"pass","upgrade":"pass","backup_restore":"skip","functional":"pass",
            "lint":"pass"},
  "lint": {"status":"pass","detail":"","rules_failed":[]},
  "skips": {"intentional": {"backup_restore": "not backup-capable (no backupbot labels / declared)"},
            "unintentional": []},
  "stages": [{"name":"install","status":"pass",
              "tests":[{"name":"test_serving","status":"pass","ms":168,"source":"generic"}]}],
  "results": {"install":"pass","upgrade":"pass","backup":"skip","restore":"skip","custom":"pass"},
  "flags": {"clean_teardown": true, "no_secret_leak": true},
  "screenshot": "screenshot.png", "summary_card": "summary.png"
}
```
`rungs` carries the four-status vocabulary above; `skips.intentional` maps each intentionally
skipped rung to its (declared or structural) reason and `skips.unintentional` lists the
unverified rungs. `lint` carries the L5 rung outcome + failing rule ids; the full
`abra recipe lint` output is served at `/runs/<run_id>/lint.txt`. Pre-lvl5 artifacts
(`"schema": 1`, 4-rung ladder, `level_cap_reason`/`level_cap_rung` present, `"na"` statuses)
are still rendered as-is by the dashboard/card — their stored level is never recomputed.
Assembly is **best-effort**: a failure to build/write `results.json` is logged but never changes the
run's exit code (cosmetics never block the pipeline, R7).
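
A consumer that wants to answer "why isn't this level higher" can read it straight from `rungs`. The following is a sketch of one way to do that, assuming the schema-2 shape shown above; `first_blocker` is a hypothetical helper, not part of the harness. For anything other than `"schema": 2` it deliberately returns nothing, since older artifacts are rendered as stored and not reinterpreted.

```python
import json
from typing import Optional, Tuple

LADDER = ["install", "upgrade", "backup_restore", "functional", "lint"]


def first_blocker(results: dict) -> Optional[Tuple[str, str]]:
    """Return (rung, status) for the lowest rung that stops the climb."""
    if results.get("schema") != 2:
        return None  # pre-lvl5 artifact: show the stored level, do not re-derive
    rungs = results.get("rungs", {})
    for name in LADDER:
        status = rungs.get(name, "unver")
        if status not in ("pass", "skip"):
            return name, status
    return None  # nothing blocks: every rung passed or was intentionally skipped


example = json.loads(
    '{"schema": 2, "level": 2, "rungs": {"install": "pass", "upgrade": "pass",'
    ' "backup_restore": "unver", "functional": "pass", "lint": "pass"}}'
)
print(first_blocker(example))
```
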
---
## 3. Summary card + app screenshot (R3/R4)
**App screenshot** (`runner/harness/screenshot.py`). After the app deploys and passes health/readiness
and **before any tier mutates state or teardown runs**, the harness captures a real Playwright
screenshot of the live app and writes `screenshot.png` to the run dir. It is **secret-safe by
default**: it shoots the **landing page** (login/setup forms show input *fields*, not secret values),
viewport-only (`full_page=False`, no scroll into a secrets panel), and the harness never auto-fills an
install wizard. A recipe whose landing page is uninformative may opt into a post-login view via an
optional `SCREENSHOT` hook in `tests/<recipe>/recipe_meta.py` — **that hook owns the no-credential-page
guarantee**. Capture is **best-effort**: any error returns `None`, writes no file, and never blocks the
run (R7); `results.json.screenshot` is set only when a file was actually produced.
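
The best-effort, viewport-only capture described above might look roughly like this with Playwright's sync API. It is a sketch only: `capture_landing_page`, its parameters, and the default timeout are assumptions, and the real `screenshot.py` may be structured differently.

```python
from pathlib import Path
from typing import Optional


def capture_landing_page(url: str, run_dir: Path,
                         timeout_ms: int = 15000) -> Optional[str]:
    """Shoot the landing page viewport; return the file name, or None on any error."""
    target = run_dir / "screenshot.png"
    try:
        # Imported lazily so a missing Playwright install is just another failure.
        from playwright.sync_api import sync_playwright

        with sync_playwright() as p:
            browser = p.chromium.launch()
            try:
                page = browser.new_page()
                page.goto(url, timeout=timeout_ms)
                # full_page=False keeps the shot to the viewport (no scrolling).
                page.screenshot(path=str(target), full_page=False)
            finally:
                browser.close()
    except Exception:
        try:
            target.unlink(missing_ok=True)  # never leave a partial file behind
        except OSError:
            pass
        return None
    return target.name if target.exists() else None
```

The caller would set `results.json.screenshot` only when the return value is not `None`, which keeps the field in step with what is actually on disk.
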
**Summary card** (`runner/harness/card.py`). After `results.json` is written, the harness builds an
HTML results card — recipe + version, the level badge, a per-stage/per-test ✔/✘ table with timings,
the embedded app screenshot (base64 data-URI so the PNG is self-contained), and the invariant flags —
and screenshots that HTML to `summary.png` via the harness Playwright browser. The card **reports
`results.json` verbatim — it computes nothing**, so it can never show a run greener than its tests
(cardinal guardrail). Rendering is best-effort (returns `None` on failure → no card, run unaffected).
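
To illustrate the "reports verbatim, computes nothing" property and the data-URI embedding, here is a minimal sketch of a card builder. The function names and the HTML layout are hypothetical; the point is that every displayed value is copied from `results.json` and the screenshot travels inside the HTML.

```python
import base64
import html
from typing import Optional


def png_data_uri(png_bytes: bytes) -> str:
    """Inline a PNG so the card HTML needs no external file."""
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")


def render_card_html(results: dict, screenshot_png: Optional[bytes] = None) -> str:
    rows = []
    for stage in results.get("stages", []):
        for test in stage.get("tests", []):
            # Statuses are printed exactly as stored; nothing is re-derived.
            rows.append("<tr><td>{}</td><td>{}</td><td>{}</td><td>{} ms</td></tr>".format(
                html.escape(str(stage["name"])), html.escape(str(test["name"])),
                html.escape(str(test["status"])), int(test["ms"])))
    flags = "".join("<li>{}: {}</li>".format(html.escape(str(k)), html.escape(str(v)))
                    for k, v in results.get("flags", {}).items())
    shot = ('<img alt="app screenshot" src="{}">'.format(png_data_uri(screenshot_png))
            if screenshot_png else "")
    return ("<html><body><h1>{} {}</h1><p>level {}</p>"
            "<table>{}</table><ul>{}</ul>{}</body></html>").format(
        html.escape(str(results.get("recipe", ""))),
        html.escape(str(results.get("version", ""))),
        html.escape(str(results.get("level", ""))),
        "".join(rows), flags, shot)
```
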
**Stable URLs.** The dashboard serves the run artifact dir read-only at:
```
https://ci.commoninternet.net/runs/<run_id>/summary.png # the card
https://ci.commoninternet.net/runs/<run_id>/screenshot.png # the app screenshot
https://ci.commoninternet.net/runs/<run_id>/badge.svg # the per-run level badge
https://ci.commoninternet.net/runs/<run_id>/results.json # the raw data
```
`<run_id>` is the Drone build number. The route is whitelist + traversal-guarded (filenames from a
fixed set; `run_id` charset-restricted; realpath must stay inside the runs dir) and read-only.
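
The three guards named above (filename whitelist, restricted `run_id` charset, realpath containment) can be sketched as one resolver. The exact filename set and the regular expression are assumptions for illustration; the dashboard's real values may differ.

```python
import os
import re
from typing import Optional

# Assumed whitelist, taken from the artifact names mentioned in this doc.
ALLOWED_FILES = {"summary.png", "screenshot.png", "badge.svg", "results.json", "lint.txt"}
# Assumed charset: build numbers and domain-like ids, no slashes.
RUN_ID_RE = re.compile(r"[A-Za-z0-9._-]+")


def resolve_artifact(runs_dir: str, run_id: str, filename: str) -> Optional[str]:
    """Return the real path to serve, or None if any guard rejects the request."""
    if filename not in ALLOWED_FILES:
        return None
    if run_id in (".", "..") or not RUN_ID_RE.fullmatch(run_id):
        return None
    root = os.path.realpath(runs_dir)
    path = os.path.realpath(os.path.join(root, run_id, filename))
    # realpath resolves symlinks, so a link pointing outside the runs dir is caught.
    if os.path.commonpath([root, path]) != root:
        return None
    return path
```

Checking containment after `realpath` matters: a charset check alone would not stop a symlinked run directory from escaping the runs dir.
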
## 4. PR comment (R2)
On a `!testme` run the comment-bridge (`bridge/bridge.py`) maintains **one comment per PR, updated in
place** (it carries a hidden `<!-- cc-ci:testme -->` marker so re-`!testme` finds and refreshes the
same comment rather than stacking new ones):
1. **On start** — a 🌻 + ⏳ placeholder: `testing <recipe> @ <sha>` + a live-logs link, "level pending".
2. **On completion** — the same comment is edited to the YunoHost-shaped result: 🌻 + a **level badge**
image + the **summary card** image, **both linking to the run**, plus full-logs/dashboard links.
If the rendered card isn't served (render failed, build didn't finish), the comment **falls back to a
compact text verdict** with the run link (the bridge checks artifact availability with a cheap HEAD
request) — R7: a cosmetics failure degrades to text, never a broken image, never affecting the verdict.
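
The update-in-place and fallback behaviour can be sketched with three small helpers. This is an illustration and not the code in `bridge/bridge.py`: the helper names, the comment shape (dicts with `id` and `body`), and the exact comment wording are assumptions.

```python
import urllib.request
from typing import Iterable, Optional

MARKER = "<!-- cc-ci:testme -->"


def find_marked_comment(comments: Iterable[dict]) -> Optional[int]:
    """Return the id of the existing cc-ci comment so it can be edited in place."""
    for comment in comments:
        if MARKER in comment.get("body", ""):
            return comment["id"]
    return None  # none yet: the caller creates the comment instead


def artifact_available(url: str, timeout: float = 5.0) -> bool:
    """Cheap HEAD check; any error counts as 'not served'."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except Exception:
        return False


def completion_body(run_url: str, badge_url: str, card_url: str,
                    level: int, verdict: str, card_ok: bool) -> str:
    if card_ok:
        return (f"{MARKER}\n🌻 [![level]({badge_url})]({run_url})\n\n"
                f"[![summary card]({card_url})]({run_url})\n\n"
                f"[full logs]({run_url})")
    # Degrade to text so the comment never shows a broken image.
    return f"{MARKER}\n🌻 cc-ci: {verdict}, level {level}. [run]({run_url})"
```

Both bodies begin with the marker, so a later `!testme` finds the same comment whichever branch produced it.
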
## 5. Badges (R6) + how to embed one
Two SVG badge endpoints, both shields-style and coloured by level (`level_color`):
- **Per-recipe latest-level** (for a recipe README): `https://ci.commoninternet.net/badge/<recipe>.svg`
renders `cc-ci: <recipe> | level N` for that recipe's most recent run (falls back to a status badge if the
recipe has no level yet). Re-rendered live from the latest `results.json`.
- **Per-run** (pinned to one run, e.g. in the PR comment):
`https://ci.commoninternet.net/runs/<run_id>/badge.svg`.
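
For a sense of what a shields-style badge involves, here is a minimal SVG renderer. Everything in it is an assumption: the palette in `LEVEL_COLORS` is invented for illustration (the real `level_color` mapping is not documented here), text widths are estimated crudely, and the "no level yet" wording stands in for the status-badge fallback.

```python
from typing import Optional
from xml.sax.saxutils import escape

# Hypothetical palette, red to green; not the harness's actual colours.
LEVEL_COLORS = {0: "#e05d44", 1: "#fe7d37", 2: "#dfb317",
                3: "#a4a61d", 4: "#97ca00", 5: "#4c1"}
GREY = "#9f9f9f"


def render_badge(label: str, value: str, color: str) -> str:
    # Rough width estimate: about 7 px per character plus padding.
    lw, vw = 10 + 7 * len(label), 10 + 7 * len(value)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{lw + vw}" height="20" role="img">'
        f'<rect width="{lw}" height="20" fill="#555"/>'
        f'<rect x="{lw}" width="{vw}" height="20" fill="{color}"/>'
        '<g fill="#fff" font-family="Verdana,sans-serif" font-size="11" text-anchor="middle">'
        f'<text x="{lw / 2}" y="14">{escape(label)}</text>'
        f'<text x="{lw + vw / 2}" y="14">{escape(value)}</text>'
        '</g></svg>'
    )


def level_badge(recipe: str, level: Optional[int]) -> str:
    if level is None:
        return render_badge(f"cc-ci: {recipe}", "no level yet", GREY)
    return render_badge(f"cc-ci: {recipe}", f"level {level}",
                        LEVEL_COLORS.get(level, GREY))
```
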
Embed the per-recipe badge in a recipe README (Markdown), linking to the cc-ci dashboard:
```markdown
[![cc-ci level](https://ci.commoninternet.net/badge/<recipe>.svg)](https://ci.commoninternet.net/recipe/<recipe>)
```
The link target `…/recipe/<recipe>` is that recipe's run-history page (level/version/status per run,
with a link to each run's summary card).