# cc-ci Results UX — level ladder, summary card, screenshot & badges (Phase 3, R8)

This doc explains how a cc-ci run is presented: the **level** a run earns, the **summary card** +
**app screenshot** rendered for it, the **PR comment** it posts, and the **badges** you can embed.
It is the R8 reference for Phase 3 (`plan-phase3-results-ux.md`).

> Presentation never changes the verdict. The level and card *report* the test outcomes; they can
> only ever understate, never overstate, what the tests actually verified (the cardinal guardrail).
> The authoritative pass/fail is the run's exit status + the per-tier results; the level is a summary.

---

## 1. The level ladder (phase lvl5 semantics, operator-decided 2026-06-11)

Every run earns a single integer **level 0–5** over the FIVE essential rungs:

| Level | Rung | Earned when |
|------:|------|-------------|
| **L0** | — | install failed / the app never became healthy. |
| **L1** | install | deploys and passes health/readiness. |
| **L2** | upgrade | previous published version → PR/latest, stays healthy, data intact. |
| **L3** | backup/restore | seeded data survives backup → wipe → restore. |
| **L4** | functional | the recipe-specific functional tests pass. |
| **L5** | lint | `abra recipe lint` passes against the exact ref under test. |

Each rung has one of FOUR statuses, and the level is:

    level = the highest rung that PASSED, where every rung below it is "pass" or an intentional skip

- **pass / fail** — the rung was exercised. A FAIL blocks: no rung above it counts, however green.
- **skip (intentional)** — the rung *genuinely does not apply*, from a declared or structural fact:
  not backup-capable (declared), only one published version (no upgrade target), or a declared
  `EXPECTED_NA`. Intentional skips are **climbed past** — a stateless recipe with passing
  functional tests and a clean lint reaches **L5**, not the old "capped at 2".
- **unver (unverified)** — the rung *should* have run but didn't: infra error, missing tool,
  harness exception, prior-stage abort, timeout. **The level cannot rise above an unverified
  rung** — it blocks exactly like a fail (we never claim what we didn't check). Anything
  unclassifiable defaults to unver (conservative).

There is **no capping concept** (no `cap_reason`, no `capped`): the per-rung table
(✔ / ✘ / intentional-skip / unverified) on the card and in `results.json.rungs` is the sole
carrier of "why isn't this level higher". Worked examples:

- install ✔, upgrade ✘, backup ✔, functional ✔, lint ✔ → **level 1** (fail blocks).
- install ✔, upgrade ✔, backup skip (not capable), functional ✔, lint ✔ → **level 5**.
- install ✔, upgrade ✔, backup unver (harness error), functional ✔, lint ✔ → **level 2**.
- all four ✔, lint unver (abra missing) → **level 4** (an unverified top rung isn't earned).

Integration (SSO/OIDC + cross-app) and recipe-local tests are **optional capabilities**, not
rungs — they never affect the level (SSO remains enforced for the run VERDICT).
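
The level rule and the four worked examples above can be sketched in a few lines. This is a minimal illustration, not the code in `runner/harness/level.py`: the function name `compute_level_sketch`, the `RUNGS` tuple, and the choice to treat a missing rung as unver are assumptions made for the example.

```python
from typing import Mapping

# Rung order, lowest to highest, as in the ladder table.
RUNGS = ("install", "upgrade", "backup_restore", "functional", "lint")


def compute_level_sketch(rungs: Mapping[str, str]) -> int:
    """Return the highest passed rung reachable without crossing a fail or unver."""
    level = 0
    for position, name in enumerate(RUNGS, start=1):
        # Assumption: a missing or unrecognised status counts as unver (conservative).
        status = rungs.get(name, "unver")
        if status == "pass":
            level = position  # this rung is earned
        elif status == "skip":
            continue  # intentional skip: climbed past, earns nothing by itself
        else:
            break  # fail, unver, or anything unclassifiable blocks the climb
    return level


# Second worked example: a stateless recipe climbs past the backup rung.
stateless = {"install": "pass", "upgrade": "pass", "backup_restore": "skip",
             "functional": "pass", "lint": "pass"}
print(compute_level_sketch(stateless))
```

Because the loop stops at the first blocking rung, green rungs above a fail or an unverified rung never contribute, which is the behaviour the worked examples describe.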

### How tiers map to rungs (the translation layer)

`run_recipe_ci.py` holds the run's per-tier results (`install/upgrade/backup/restore/custom`) +
structural signals; `runner/harness/results.py::derive_rungs` maps them to the rung-status dict
that `runner/harness/level.py::compute_level` scores. The full intentional-vs-unintentional
classification table for every N/A source is in `machine-docs/DECISIONS.md` (phase lvl5). Summary:

- **install** ← install tier (pass/fail; a non-run is unver — install always applies).
- **upgrade** ← upgrade tier; tier skipped with no upgrade target (single published version,
  structural) → skip; declared `EXPECTED_NA` → skip; otherwise unver.
- **backup_restore** ← backup AND restore tiers both pass → pass; either fail → fail; not
  backup-capable (structural/declared) → skip; unverified-while-capable → unver.
- **functional** ← the custom tier; a custom failure conservatively fails this rung; no custom
  tests is a coverage GAP → unver, unless declared `EXPECTED_NA["functional"]` → skip.
- **lint** ← the lint executor (`runner/harness/lint.py`): `abra recipe lint` on a pristine
  scratch clone of the run's recipe tree at the exact tested sha, 60s hard budget, full output in
  the run artifact `lint.txt`. The executor itself reports only pass or fail; when lint can't
  run, the rung is **unver** (never a silent pass, never an intentional skip). Lint never changes
  the run verdict.
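
The mapping summarised above could look roughly like the following. It is a sketch under stated assumptions, not the real `derive_rungs`: the function name, its keyword parameters (`has_upgrade_target`, `backup_capable`, `lint_status`, `expected_na`) and the shape of `expected_na` (rung name mapped to a reason) are hypothetical stand-ins for the harness's structural signals.

```python
from typing import Dict, Mapping, Optional


def derive_rungs_sketch(
    tiers: Mapping[str, str],
    *,
    has_upgrade_target: bool,
    backup_capable: bool,
    lint_status: Optional[str],
    expected_na: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Translate per-tier results plus structural signals into rung statuses."""
    declared = expected_na or {}

    def exercised(tier: str) -> Optional[str]:
        # Only an explicit pass/fail counts as "the tier ran".
        status = tiers.get(tier)
        return status if status in ("pass", "fail") else None

    rungs: Dict[str, str] = {}

    # install always applies, so a non-run is unverified.
    rungs["install"] = exercised("install") or "unver"

    upgrade = exercised("upgrade")
    if upgrade:
        rungs["upgrade"] = upgrade
    elif not has_upgrade_target or "upgrade" in declared:
        rungs["upgrade"] = "skip"  # structural or declared: intentional
    else:
        rungs["upgrade"] = "unver"

    backup, restore = exercised("backup"), exercised("restore")
    if "fail" in (backup, restore):
        rungs["backup_restore"] = "fail"
    elif backup == "pass" and restore == "pass":
        rungs["backup_restore"] = "pass"
    elif not backup_capable:
        rungs["backup_restore"] = "skip"
    else:
        rungs["backup_restore"] = "unver"  # capable, but not verified

    custom = exercised("custom")
    if custom:
        rungs["functional"] = custom
    elif "functional" in declared:
        rungs["functional"] = "skip"
    else:
        rungs["functional"] = "unver"  # coverage gap

    # lint is never an intentional skip: pass, fail, or unverified.
    rungs["lint"] = lint_status if lint_status in ("pass", "fail") else "unver"
    return rungs
```

Every branch that cannot point to an explicit pass, fail, or declared/structural reason falls through to unver, matching the conservative default described for the ladder.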

### Invariant flags (shown, not climbed)

Two Phase-1 gating invariants are surfaced as flags on the card, not as ladder rungs:
`clean_teardown` (the run left no orphaned app/volume/secret and stayed within the deploy budget) and
`no_secret_leak` (no known secret value appears in the published artifact — the Adversary's broader
leak scan is the authority).

---

## 2. `results.json` (per run)

Each run writes `${CCCI_RUNS_DIR:-/var/lib/cc-ci-runs}/<run_id>/results.json` (`run_id` = the Drone
build number, or the run's unique app domain for a hand-run). Schema:

```json
{
  "schema": 2, "run_id": "...", "recipe": "...", "version": "...", "pr": "...", "ref": "...",
  "finished": 0.0,
  "level": 5,
  "rungs": {"install":"pass","upgrade":"pass","backup_restore":"skip","functional":"pass",
            "lint":"pass"},
  "lint": {"status":"pass","detail":"","rules_failed":[]},
  "skips": {"intentional": {"backup_restore": "not backup-capable (no backupbot labels / declared)"},
            "unintentional": []},
  "stages": [{"name":"install","status":"pass",
              "tests":[{"name":"test_serving","status":"pass","ms":168,"source":"generic"}]}],
  "results": {"install":"pass","upgrade":"pass","backup":"skip","restore":"skip","custom":"pass"},
  "flags": {"clean_teardown": true, "no_secret_leak": true},
  "screenshot": "screenshot.png", "summary_card": "summary.png"
}
```

`rungs` carries the four-status vocabulary above; `skips.intentional` maps each intentionally
skipped rung to its (declared or structural) reason and `skips.unintentional` lists the
unverified rungs. `lint` carries the L5 rung outcome + failing rule ids; the full
`abra recipe lint` output is served at `/runs/<run_id>/lint.txt`. Pre-lvl5 artifacts
(`"schema": 1`, 4-rung ladder, `level_cap_reason`/`level_cap_rung` present, `"na"` statuses)
are still rendered as-is by the dashboard/card — their stored level is never recomputed.
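
As an illustration of how a consumer might answer "why isn't this level higher" from a schema-2 artifact, the sketch below walks `rungs` in ladder order and reports the first blocker. The helper name `first_blocker` and the decision to reject other schema versions are assumptions for the example, not part of the harness; only the field names come from the schema above.

```python
import json
from typing import Optional, Tuple

RUNG_ORDER = ("install", "upgrade", "backup_restore", "functional", "lint")


def first_blocker(results: dict) -> Optional[Tuple[str, str]]:
    """Return (rung, status) for the first rung that stops the climb, else None."""
    if results.get("schema") != 2:
        # Pre-lvl5 artifacts use a different vocabulary; this sketch does not read them.
        raise ValueError("only schema 2 artifacts are handled here")
    rungs = results.get("rungs", {})
    for name in RUNG_ORDER:
        status = rungs.get(name, "unver")
        if status not in ("pass", "skip"):
            return name, status
    return None


raw = """{"schema": 2, "level": 2,
          "rungs": {"install": "pass", "upgrade": "pass", "backup_restore": "unver",
                    "functional": "pass", "lint": "pass"},
          "skips": {"intentional": {}, "unintentional": ["backup_restore"]}}"""
artifact = json.loads(raw)
print(first_blocker(artifact))
```

The helper only reads the stored statuses; like the card, it does not recompute `level`.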

Assembly is **best-effort**: a failure to build/write `results.json` is logged but never changes the
run's exit code (cosmetics never block the pipeline, R7).

---

## 3. Summary card + app screenshot (R3/R4)

**App screenshot** (`runner/harness/screenshot.py`). After the app deploys and passes health/readiness
and **before any tier mutates state or teardown runs**, the harness captures a real Playwright
screenshot of the live app and writes `screenshot.png` to the run dir. It is **secret-safe by
default**: it shoots the **landing page** (login/setup forms show input *fields*, not secret values),
viewport-only (`full_page=False`, no scroll into a secrets panel), and the harness never auto-fills an
install wizard. A recipe whose landing page is uninformative may opt into a post-login view via an
optional `SCREENSHOT` hook in `tests/<recipe>/recipe_meta.py` — **that hook owns the no-credential-page
guarantee**. Capture is **best-effort**: any error returns `None`, writes no file, and never blocks the
run (R7); `results.json.screenshot` is set only when a file was actually produced.

**Summary card** (`runner/harness/card.py`). After `results.json` is written, the harness builds an
HTML results card — recipe + version, the level badge, a per-stage/per-test ✔/✘ table with timings,
the embedded app screenshot (as a base64 data-URI, so the card HTML is self-contained), and the
invariant flags — and screenshots that HTML to `summary.png` via the harness Playwright browser. The
card **reports `results.json` verbatim — it computes nothing**, so it can never show a run greener than
its tests (cardinal guardrail). Rendering is best-effort (returns `None` on failure → no card, run
unaffected).

**Stable URLs.** The dashboard serves the run artifact dir read-only at:

```
https://ci.commoninternet.net/runs/<run_id>/summary.png     # the card
https://ci.commoninternet.net/runs/<run_id>/screenshot.png  # the app screenshot
https://ci.commoninternet.net/runs/<run_id>/badge.svg       # the per-run level badge
https://ci.commoninternet.net/runs/<run_id>/results.json    # the raw data
```

`<run_id>` is the Drone build number. The route is whitelist + traversal-guarded (filenames from a
fixed set; `run_id` charset-restricted; realpath must stay inside the runs dir) and read-only.
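
The three guards named above (filename whitelist, `run_id` charset, realpath containment) can be sketched as one resolver. This is an illustration only: the function name `resolve_artifact`, the exact `RUN_ID_RE` pattern, and the contents of `ALLOWED_FILES` are assumptions (the file names are the ones this doc mentions), not the dashboard's actual route code.

```python
import os
import re
from typing import Optional

# Assumed whitelist: the artifact names mentioned in this doc.
ALLOWED_FILES = frozenset(
    {"summary.png", "screenshot.png", "badge.svg", "results.json", "lint.txt"}
)
# Hypothetical charset: starts alphanumeric, then alphanumerics, dot, dash, underscore.
RUN_ID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,127}")


def resolve_artifact(runs_dir: str, run_id: str, filename: str) -> Optional[str]:
    """Return the real path of a servable artifact, or None if any guard rejects it."""
    if filename not in ALLOWED_FILES:
        return None  # whitelist: only known artifact names
    if not RUN_ID_RE.fullmatch(run_id):
        return None  # charset: no slashes, no leading dots
    root = os.path.realpath(runs_dir)
    candidate = os.path.realpath(os.path.join(root, run_id, filename))
    # Containment: after resolving symlinks the path must still sit under the runs dir.
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate
```

Checking the resolved path rather than the requested string means a symlinked run directory that points outside the runs dir is rejected even though its `run_id` passes the charset check.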

## 4. PR comment (R2)

On a `!testme` run the comment-bridge (`bridge/bridge.py`) maintains **one comment per PR, updated in
place** (it carries a hidden `<!-- cc-ci:testme -->` marker so a repeated `!testme` finds and refreshes
the same comment rather than stacking new ones):

1. **On start** — a 🌻 + ⏳ placeholder: `testing <recipe> @ <sha>` + a live-logs link, "level pending".
2. **On completion** — the same comment is edited to the YunoHost-shaped result: 🌻 + a **level badge**
   image + the **summary card** image, **both linking to the run**, plus full-logs/dashboard links.

If the rendered card isn't served (render failed, build didn't finish), the comment **falls back to a
compact text verdict** with the run link (the bridge checks artifact availability with a cheap HEAD
request) — R7: a cosmetics failure degrades to text, never a broken image, never affecting the verdict.

## 5. Badges (R6) + how to embed one

Two SVG badge endpoints, both shields-style and coloured by level (`level_color`):

- **Per-recipe latest-level** (for a recipe README): `https://ci.commoninternet.net/badge/<recipe>.svg`
  → `cc-ci: <recipe> | level N` for that recipe's most recent run (falls back to a status badge if the
  recipe has no level yet). Re-rendered live from the latest `results.json`.
- **Per-run** (pinned to one run, e.g. in the PR comment):
  `https://ci.commoninternet.net/runs/<run_id>/badge.svg`.

Embed the per-recipe badge in a recipe README (Markdown), linking to the cc-ci dashboard:

```markdown
[![cc-ci level](https://ci.commoninternet.net/badge/<recipe>.svg)](https://ci.commoninternet.net/recipe/<recipe>)
```

The link target `…/recipe/<recipe>` is that recipe's run-history page (level/version/status per run,
with a link to each run's summary card).