cc-ci Results UX — level ladder, summary card, screenshot & badges (Phase 3, R8)
This doc explains how a cc-ci run is presented: the level a run earns, the summary card +
app screenshot rendered for it, the PR comment it posts, and the badges you can embed.
It is the R8 reference for Phase 3 (plan-phase3-results-ux.md).
Presentation never changes the verdict. The level and card report the test outcomes; they can only ever understate, never overstate, what the tests actually verified (the cardinal guardrail). The authoritative pass/fail is the run's exit status + the per-tier results; the level is a summary.
1. The level ladder (phase lvl5 semantics, operator-decided 2026-06-11)
Every run earns a single integer level 0–5 over the FIVE essential rungs:
| Level | Rung | Earned when |
|---|---|---|
| L0 | — | install failed / the app never became healthy. |
| L1 | install | deploys and passes health/readiness. |
| L2 | upgrade | previous published version → PR/latest, stays healthy, data intact. |
| L3 | backup/restore | seeded data survives backup → wipe → restore. |
| L4 | functional | the recipe-specific functional tests pass. |
| L5 | lint | abra recipe lint passes against the exact ref under test. |
Each rung has one of FOUR statuses, and the level is:
level = the highest rung that PASSED, where every rung below it is "pass" or an intentional skip
- pass / fail: the rung was exercised. A FAIL blocks: no rung above it counts, however green.
- skip (intentional): the rung genuinely does not apply, from a declared or structural fact:
  not backup-capable (declared), only one published version (no upgrade target), or a declared
  EXPECTED_NA. Intentional skips are climbed past, so a stateless recipe with passing functional
  tests and a clean lint reaches L5, not the old "capped at 2".
- unver (unverified): the rung should have run but didn't (infra error, missing tool, harness
  exception, prior-stage abort, timeout). The level cannot rise above an unverified rung; it blocks
  exactly like a fail (we never claim what we didn't check). Anything unclassifiable defaults to
  unver (conservative).
There is no capping concept (no cap_reason, no capped): the per-rung table
(✔ / ✘ / intentional-skip / unverified) on the card and in results.json.rungs is the sole
carrier of "why isn't this level higher". Worked examples:
- install ✔, upgrade ✘, backup ✔, functional ✔, lint ✔ → level 1 (fail blocks).
- install ✔, upgrade ✔, backup skip (not capable), functional ✔, lint ✔ → level 5.
- install ✔, upgrade ✔, backup unver (harness error), functional ✔, lint ✔ → level 2.
- all four ✔, lint unver (abra missing) → level 4 (an unverified top rung isn't earned).
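The level rule and the worked examples above can be sketched in a few lines of Python. This is a minimal illustration, not the code in runner/harness/level.py: the function body, the RUNGS constant, and the treatment of a missing key as unver are assumptions drawn only from the rule as stated here.

```python
from typing import Dict

# Rung order, lowest first, as in the ladder table.
RUNGS = ["install", "upgrade", "backup_restore", "functional", "lint"]


def compute_level(rungs: Dict[str, str]) -> int:
    """Return the highest passed rung whose lower rungs are all pass or skip."""
    level = 0
    for position, name in enumerate(RUNGS, start=1):
        # Assumption: a missing or unrecognised status counts as "unver".
        status = rungs.get(name, "unver")
        if status == "pass":
            level = position
        elif status == "skip":
            continue  # intentional skip: climbed past, earns nothing itself
        else:
            break  # "fail" or "unver" blocks every rung above it
    return level


print(compute_level({"install": "pass", "upgrade": "pass",
                     "backup_restore": "skip", "functional": "pass",
                     "lint": "pass"}))
```

The loop stops at the first fail or unver, which is why a green rung above a blocked one never counts.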
Integration (SSO/OIDC + cross-app) and recipe-local tests are optional capabilities, not rungs — they never affect the level (SSO remains enforced for the run VERDICT).
How tiers map to rungs (the translation layer)
run_recipe_ci.py holds the run's per-tier results (install/upgrade/backup/restore/custom) +
structural signals; runner/harness/results.py::derive_rungs maps them to the rung-status dict
that runner/harness/level.py::compute_level scores. The full intentional-vs-unintentional
classification table for every N/A source is in machine-docs/DECISIONS.md (phase lvl5). Summary:
- install ← install tier (pass/fail; a non-run is unver — install always applies).
- upgrade ← upgrade tier; tier skipped with no upgrade target (single published version,
  structural) → skip; declared EXPECTED_NA → skip; otherwise unver.
- backup_restore ← backup AND restore tiers both pass → pass; either fail → fail; not
  backup-capable (structural/declared) → skip; unverified-while-capable → unver.
- functional ← the custom tier; a custom failure conservatively fails this rung; no custom
  tests is a coverage GAP → unver, unless declared EXPECTED_NA["functional"] → skip.
- lint ← the lint executor (runner/harness/lint.py): abra recipe lint on a pristine scratch
  clone of the run's recipe tree at the exact tested sha, 60s hard budget, full output in the run
  artifact lint.txt. pass/fail only: when lint can't run, the rung is unver (never a silent pass,
  never an intentional skip). Lint never changes the run verdict.
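As an illustration of the mapping above, two of the rungs can be written as small pure functions. The function names, signatures, and the choice to check for a fail before the not-capable case are hypothetical; the real derive_rungs in runner/harness/results.py may be structured differently.

```python
from typing import Mapping


def backup_restore_rung(backup: str, restore: str, backup_capable: bool) -> str:
    """Fold the backup and restore tier results into one rung status."""
    if "fail" in (backup, restore):
        return "fail"  # either tier failing fails the rung
    if backup == "pass" and restore == "pass":
        return "pass"
    if not backup_capable:
        return "skip"  # structural/declared: the rung does not apply
    return "unver"  # capable, but not verified


def functional_rung(custom: str, expected_na: Mapping[str, str]) -> str:
    """Map the custom tier onto the functional rung."""
    if custom in ("pass", "fail"):
        return custom
    # No custom tests ran: a coverage gap unless the recipe declared it N/A.
    return "skip" if "functional" in expected_na else "unver"
```

Checking fail first is the conservative ordering: a recorded failure is never hidden behind a skip.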
Invariant flags (shown, not climbed)
Two Phase-1 gating invariants are surfaced as flags on the card, not as ladder rungs:
clean_teardown (the run left no orphaned app/volume/secret and stayed within the deploy budget) and
no_secret_leak (no known secret value appears in the published artifact — the Adversary's broader
leak scan is the authority).
2. results.json (per run)
Each run writes ${CCCI_RUNS_DIR:-/var/lib/cc-ci-runs}/<run_id>/results.json (run_id = the Drone
build number, or the run's unique app domain for a hand-run). Schema:
{
"schema": 2, "run_id": "...", "recipe": "...", "version": "...", "pr": "...", "ref": "...",
"finished": 0.0,
"level": 5,
"rungs": {"install":"pass","upgrade":"pass","backup_restore":"skip","functional":"pass",
"lint":"pass"},
"lint": {"status":"pass","detail":"","rules_failed":[]},
"skips": {"intentional": {"backup_restore": "not backup-capable (no backupbot labels / declared)"},
"unintentional": []},
"stages": [{"name":"install","status":"pass",
"tests":[{"name":"test_serving","status":"pass","ms":168,"source":"generic"}]}],
"results": {"install":"pass","upgrade":"pass","backup":"skip","restore":"skip","custom":"pass"},
"flags": {"clean_teardown": true, "no_secret_leak": true},
"screenshot": "screenshot.png", "summary_card": "summary.png"
}
rungs carries the four-status vocabulary above; skips.intentional maps each intentionally
skipped rung to its (declared or structural) reason and skips.unintentional lists the
unverified rungs. lint carries the L5 rung outcome + failing rule ids; the full
abra recipe lint output is served at /runs/<run_id>/lint.txt. Pre-lvl5 artifacts
("schema": 1, 4-rung ladder, level_cap_reason/level_cap_rung present, "na" statuses)
are still rendered as-is by the dashboard/card — their stored level is never recomputed.
Assembly is best-effort: a failure to build/write results.json is logged but never changes the
run's exit code (cosmetics never block the pipeline, R7).
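A consumer of results.json can read the per-rung table directly instead of inferring anything from the level. The sketch below assumes only the schema 2 fields shown above; the describe_rungs helper is hypothetical, and it refuses older artifacts rather than reinterpreting them.

```python
import json
from typing import List

RUNG_ORDER = ["install", "upgrade", "backup_restore", "functional", "lint"]


def describe_rungs(results: dict) -> List[str]:
    """Return one 'rung: status (reason)' line per rung of a schema 2 artifact."""
    if results.get("schema") != 2:
        raise ValueError("only schema 2 artifacts carry the five-rung vocabulary")
    reasons = results.get("skips", {}).get("intentional", {})
    lines = []
    for name in RUNG_ORDER:
        status = results.get("rungs", {}).get(name, "unver")
        note = f" ({reasons[name]})" if status == "skip" and name in reasons else ""
        lines.append(f"{name}: {status}{note}")
    return lines


# Abbreviated copy of the example artifact above.
sample = json.loads("""
{"schema": 2, "level": 5,
 "rungs": {"install": "pass", "upgrade": "pass", "backup_restore": "skip",
           "functional": "pass", "lint": "pass"},
 "skips": {"intentional": {"backup_restore":
           "not backup-capable (no backupbot labels / declared)"},
           "unintentional": []}}
""")
print("\n".join(describe_rungs(sample)))
```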
3. Summary card + app screenshot (R3/R4)
App screenshot (runner/harness/screenshot.py). After the app deploys and passes health/readiness
and before any tier mutates state or teardown runs, the harness captures a real Playwright
screenshot of the live app and writes screenshot.png to the run dir. It is secret-safe by
default: it shoots the landing page (login/setup forms show input fields, not secret values),
viewport-only (full_page=False, no scroll into a secrets panel), and the harness never auto-fills an
install wizard. A recipe whose landing page is uninformative may opt into a post-login view via an
optional SCREENSHOT hook in tests/<recipe>/recipe_meta.py — that hook owns the no-credential-page
guarantee. Capture is best-effort: any error returns None, writes no file, and never blocks the
run (R7); results.json.screenshot is set only when a file was actually produced.
Summary card (runner/harness/card.py). After results.json is written, the harness builds an
HTML results card — recipe + version, the level badge, a per-stage/per-test ✔/✘ table with timings,
the embedded app screenshot (base64 data-URI so the PNG is self-contained), and the invariant flags —
and screenshots that HTML to summary.png via the harness Playwright browser. The card reports
results.json verbatim — it computes nothing, so it can never show a run greener than its tests
(cardinal guardrail). Rendering is best-effort (returns None on failure → no card, run unaffected).
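The self-contained embedding can be illustrated as follows. This is a simplified sketch, not runner/harness/card.py: the helper names and the HTML layout are assumptions, and the table lists rungs only, where the real card shows a per-stage/per-test table.

```python
import base64
from html import escape
from typing import Optional


def png_data_uri(png: bytes) -> str:
    """Encode PNG bytes as a data URI so the card needs no external file."""
    return "data:image/png;base64," + base64.b64encode(png).decode("ascii")


def card_html(results: dict, screenshot_png: Optional[bytes]) -> str:
    """Render results verbatim; nothing here recomputes the level."""
    rows = "".join(
        f"<tr><td>{escape(name)}</td><td>{escape(status)}</td></tr>"
        for name, status in results.get("rungs", {}).items()
    )
    shot = ""
    if screenshot_png:  # screenshot capture is best-effort, so it may be absent
        shot = f'<img alt="app screenshot" src="{png_data_uri(screenshot_png)}">'
    title = escape(f"{results.get('recipe', '')} {results.get('version', '')}")
    level = escape(str(results.get("level", "")))
    return f"<h1>{title}</h1><p>level {level}</p><table>{rows}</table>{shot}"
```

Because the function only formats values it is handed, the rendered card cannot differ from results.json.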
Stable URLs. The dashboard serves the run artifact dir read-only at:
https://ci.commoninternet.net/runs/<run_id>/summary.png # the card
https://ci.commoninternet.net/runs/<run_id>/screenshot.png # the app screenshot
https://ci.commoninternet.net/runs/<run_id>/badge.svg # the per-run level badge
https://ci.commoninternet.net/runs/<run_id>/results.json # the raw data
<run_id> is the Drone build number. The route is whitelist + traversal-guarded (filenames from a
fixed set; run_id charset-restricted; realpath must stay inside the runs dir) and read-only.
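The three guards named above (fixed filename set, restricted run_id charset, realpath containment) could look like this. It is a sketch under assumptions: the resolve_artifact name, the exact charset, and the inclusion of lint.txt in the whitelist are illustrative, not taken from the dashboard code.

```python
import os
import re
from typing import Optional

# Assumed whitelist: the four URLs listed above plus lint.txt from section 2.
ALLOWED_FILES = {"summary.png", "screenshot.png", "badge.svg", "results.json", "lint.txt"}
# Assumed charset: build numbers and app domains, never a path separator.
RUN_ID_RE = re.compile(r"[A-Za-z0-9._-]+")


def resolve_artifact(runs_dir: str, run_id: str, filename: str) -> Optional[str]:
    """Return the on-disk path for an artifact, or None if the request is rejected."""
    if filename not in ALLOWED_FILES:
        return None
    if run_id in (".", "..") or not RUN_ID_RE.fullmatch(run_id):
        return None
    root = os.path.realpath(runs_dir)
    path = os.path.realpath(os.path.join(root, run_id, filename))
    # realpath resolves symlinks, so a link pointing outside the runs dir is caught here.
    if os.path.commonpath([root, path]) != root:
        return None
    return path
```

Returning None for every rejection lets the route answer with a plain 404 without revealing which check failed.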
4. PR comment (R2)
On a !testme run the comment-bridge (bridge/bridge.py) maintains one comment per PR, updated in
place (it carries a hidden <!-- cc-ci:testme --> marker so re-!testme finds and refreshes the
same comment rather than stacking new ones):
- On start: a 🌻 + ⏳ placeholder reading "testing <recipe> @ <sha>", plus a live-logs link and "level pending".
- On completion: the same comment is edited to the YunoHost-shaped result, 🌻 + a level badge image + the summary card image, both linking to the run, plus full-logs/dashboard links.
If the rendered card isn't served (render failed, build didn't finish), the comment falls back to a compact text verdict with the run link (the bridge checks artifact availability with a cheap HEAD request) — R7: a cosmetics failure degrades to text, never a broken image, never affecting the verdict.
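The update-in-place and text-fallback behaviour can be sketched with three small helpers. All function names are hypothetical, and the comment layout and the use of the run URL as the base for badge.svg and summary.png are assumptions; only the marker string and the HEAD check come from the description above.

```python
import urllib.request
from typing import List, Optional

MARKER = "<!-- cc-ci:testme -->"  # hidden marker carried by the bridge comment


def find_bridge_comment(comments: List[dict]) -> Optional[dict]:
    """Return the existing cc-ci comment on the PR, if any."""
    for comment in comments:
        if MARKER in comment.get("body", ""):
            return comment
    return None


def artifact_available(url: str, timeout: float = 5.0) -> bool:
    """Cheap HEAD probe: True only if the artifact answers 200."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):  # URLError, HTTPError and timeouts are OSErrors
        return False


def completion_body(run_url: str, verdict: str, card_ok: bool) -> str:
    """Build the finished comment, degrading to text when the card is missing."""
    if card_ok:
        images = (
            f"[![level]({run_url}/badge.svg)]({run_url}) "
            f"[![summary card]({run_url}/summary.png)]({run_url})"
        )
        return f"{MARKER}\n🌻 {images}"
    return f"{MARKER}\n🌻 {verdict}: {run_url}"
```

A caller would edit the comment returned by find_bridge_comment when one exists and create a new comment otherwise, so repeated !testme runs never stack.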
5. Badges (R6) + how to embed one
Two SVG badge endpoints, both shields-style and coloured by level (level_color):
- Per-recipe latest-level (for a recipe README):
  https://ci.commoninternet.net/badge/<recipe>.svg → "cc-ci: <recipe> | level N" for that recipe's most recent run (falls back to a status badge if the recipe has no level yet). Re-rendered live from the latest results.json.
- Per-run (pinned to one run, e.g. in the PR comment):
  https://ci.commoninternet.net/runs/<run_id>/badge.svg.
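For orientation, a shields-style badge coloured by level can be produced with a few lines of SVG templating. The palette, the geometry, and the badge_svg helper are hypothetical; only the level_color name and the "cc-ci: <recipe> | level N" wording come from the description above, and the status-badge fallback is not covered.

```python
from html import escape

# Hypothetical palette, red (level 0) through green (level 5).
LEVEL_COLORS = ["#e05d44", "#fe7d37", "#dfb317", "#a4a61d", "#97ca00", "#44cc11"]


def level_color(level: int) -> str:
    """Pick the badge colour, clamping out-of-range levels to 0..5."""
    return LEVEL_COLORS[max(0, min(level, 5))]


def badge_svg(recipe: str, level: int) -> str:
    """Render a two-part badge: grey label on the left, coloured level on the right."""
    label, value = f"cc-ci: {recipe}", f"level {level}"
    # Rough width estimate: about 7px per character plus padding.
    lw, vw = 7 * len(label) + 10, 7 * len(value) + 10
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{lw + vw}" height="20">'
        f'<rect width="{lw}" height="20" fill="#555"/>'
        f'<rect x="{lw}" width="{vw}" height="20" fill="{level_color(level)}"/>'
        f'<g fill="#fff" font-family="Verdana,sans-serif" font-size="11">'
        f'<text x="5" y="14">{escape(label)}</text>'
        f'<text x="{lw + 5}" y="14">{escape(value)}</text>'
        f"</g></svg>"
    )
```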
Embed the per-recipe badge in a recipe README (Markdown), linking to the cc-ci dashboard:
[![cc-ci level](https://ci.commoninternet.net/badge/<recipe>.svg)](https://ci.commoninternet.net/recipe/<recipe>)
The link target …/recipe/<recipe> is that recipe's run-history page (level/version/status per run,
with a link to each run's summary card).