R8 doc seeded with the SETTLED, Adversary-fuzzed level ladder + tier->rung translation + results.json schema + invariant flags. Card/screenshot/PR-comment/badge sections stubbed (filled as U1-U5 wire + serve their artifacts). Does not advance past the U0 gate; pure documentation of settled design. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cc-ci Results UX — level ladder, summary card, screenshot & badges (Phase 3, R8)
This doc explains how a cc-ci run is presented: the level a run earns, the summary card +
app screenshot rendered for it, the PR comment it posts, and the badges you can embed.
It is the R8 reference for Phase 3 (plan-phase3-results-ux.md).
Presentation never changes the verdict. The level and card report the test outcomes; they can only ever understate, never overstate, what the tests actually verified (the cardinal guardrail). The authoritative pass/fail is the run's exit status + the per-tier results; the level is a summary.
1. The level ladder (R1)
Every run earns a single integer level 0–6. The ladder is cumulative with YunoHost
gap-caps-the-level semantics: you earn level L only if every rung 1..L was a clean PASS. The
first rung that is not a clean PASS — a real FAIL or genuinely N/A for this recipe — stops
the climb, and level_cap_reason records which rung and why.
| Level | Rung | Earned when |
|---|---|---|
| L0 | — | install failed / the app never became healthy. |
| L1 | install | deploys and passes health/readiness. |
| L2 | upgrade | previous published version → PR/latest, stays healthy, data intact. |
| L3 | backup/restore | seeded data survives backup → wipe → restore. |
| L4 | functional | the recipe-specific functional tests pass. |
| L5 | integration | SSO/OIDC + cross-app integration tests pass. |
| L6 | recipe-local | the recipe repo's own tests/ (D4) pass and are merged. |
N/A caps, fairly. A rung that does not apply to a recipe (only one published version → no
upgrade; not backup-capable; no SSO/integration surface; no recipe-local tests) is N/A, which
caps the climb at the rung below it with a recorded reason — it is not counted as a failure. This is
the only fair reading of "a missing lower rung caps the level": e.g. a recipe with no integration
surface caps at L4 by definition, shown as level_cap_reason = "L5 integration … N/A". A stateless
app whose functional tests pass but which cannot be backed up is honestly capped at L2 ("L3 backup/restore … N/A") rather than shown as L4 — understating is safe; overstating is forbidden.
Worked examples (real runs):
- uptime-kuma: install+upgrade+backup+restore+functional all pass, no SSO surface → L4 (cap = "L5 integration (SSO/OIDC + cross-app) N/A").
- custom-html-tiny: stateless and not backup-capable; install+upgrade pass, backup/restore N/A → L2 (cap = "L3 backup/restore (data integrity) N/A").
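The ladder rule can be sketched as a short pure function. This is a minimal illustration, not the code in runner/harness/level.py: the name `compute_level_sketch`, the `LADDER` labels, and the treatment of a missing rung as N/A are all assumptions, and the real cap-reason strings carry more descriptive text than the short labels used here.

```python
from typing import Dict, Optional, Tuple

# Rung order follows the ladder table; the labels are shortened, hypothetical
# versions of the cap-reason text shown in the worked examples.
LADDER = [
    ("install", "L1 install"),
    ("upgrade", "L2 upgrade"),
    ("backup_restore", "L3 backup/restore"),
    ("functional", "L4 functional"),
    ("integration", "L5 integration"),
    ("recipe_local", "L6 recipe-local"),
]


def compute_level_sketch(rungs: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Return (level, level_cap_reason) for a rung-status dict.

    The level is the number of leading consecutive "pass" rungs; the first
    rung that is not a clean pass stops the climb and is named in the reason.
    """
    level = 0
    for key, label in LADDER:
        status = rungs.get(key, "na")  # assumption: a missing rung counts as N/A
        if status != "pass":
            suffix = "N/A" if status == "na" else "FAIL"
            return level, f"{label} {suffix}"
        level += 1
    return level, None  # all six rungs passed, nothing capped the climb


uptime_kuma = {
    "install": "pass", "upgrade": "pass", "backup_restore": "pass",
    "functional": "pass", "integration": "na", "recipe_local": "na",
}
custom_html_tiny = {
    "install": "pass", "upgrade": "pass", "backup_restore": "na",
    "functional": "pass", "integration": "na", "recipe_local": "na",
}
print(compute_level_sketch(uptime_kuma))
print(compute_level_sketch(custom_html_tiny))
```

Note that custom-html-tiny's passing functional rung does not lift it past L2: a pass above the first gap never counts.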
How tiers map to rungs (the translation layer)
run_recipe_ci.py holds the run's per-tier results (install/upgrade/backup/restore/custom) +
deps/SSO signals; runner/harness/results.py::derive_rungs maps them to the rung-status dict that
runner/harness/level.py::compute_level scores. The mapping (also in DECISIONS.md, Phase 3):
- install ← install tier (pass/fail).
- upgrade ← upgrade tier; skip → na (only one published version).
- backup_restore ← backup AND restore tiers both pass → pass; either fail → fail; not backup-capable → na.
- functional ← the custom tier minus its SSO tests; a custom failure conservatively fails this rung (functional and SSO failures are not distinguished, so the rung is never inflated); no custom tests → na.
- integration ← applies only if the recipe declares deps; pass iff deps wired and SSO verified and custom didn't fail; recipes with no declared deps → na (the "caps at L4" rule).
- recipe_local ← the recipe repo's own tests/ (discovery source repo-local) ran and passed; none present → na.
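The mapping above could look roughly like the following. This is a sketch under stated assumptions rather than the actual `derive_rungs`: the function name, every keyword parameter (`backup_capable`, `declares_deps`, `deps_wired`, `sso_verified`, `recipe_local`), and the choice to fail a backup-capable recipe whose backup or restore tier did not pass are hypothetical.

```python
from typing import Dict, Optional


def derive_rungs_sketch(
    tiers: Dict[str, str],
    *,
    backup_capable: bool,
    declares_deps: bool,
    deps_wired: bool = False,
    sso_verified: bool = False,
    recipe_local: Optional[str] = None,  # None means no repo-local tests found
) -> Dict[str, str]:
    """Translate per-tier results into the six rung statuses (pass/fail/na)."""
    rungs: Dict[str, str] = {}

    rungs["install"] = "pass" if tiers.get("install") == "pass" else "fail"

    # A skipped (or absent) upgrade tier means only one published version.
    upgrade = tiers.get("upgrade", "skip")
    rungs["upgrade"] = "na" if upgrade == "skip" else (
        "pass" if upgrade == "pass" else "fail")

    if not backup_capable:
        rungs["backup_restore"] = "na"
    elif tiers.get("backup") == "pass" and tiers.get("restore") == "pass":
        rungs["backup_restore"] = "pass"
    else:
        # Assumption: anything short of two passes is treated as a failure.
        rungs["backup_restore"] = "fail"

    custom = tiers.get("custom")  # absent means no custom tests
    if custom is None:
        rungs["functional"] = "na"
    else:
        rungs["functional"] = "pass" if custom == "pass" else "fail"

    if not declares_deps:
        rungs["integration"] = "na"  # the "caps at L4" rule
    elif deps_wired and sso_verified and custom != "fail":
        rungs["integration"] = "pass"
    else:
        rungs["integration"] = "fail"

    if recipe_local is None:
        rungs["recipe_local"] = "na"
    else:
        rungs["recipe_local"] = "pass" if recipe_local == "pass" else "fail"

    return rungs


all_pass = {t: "pass" for t in ("install", "upgrade", "backup", "restore", "custom")}
print(derive_rungs_sketch(all_pass, backup_capable=True, declares_deps=False))
```

Every ambiguous input resolves toward fail or na, never toward pass, which keeps the translation on the understating side of the guardrail.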
The pure scorer is exhaustively unit-tested + fuzz-verified over all 3^6 = 729 rung combinations (six rungs, each pass/fail/na): level == count of leading consecutive passes, zero inflation.
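An exhaustive check of that property fits in a few lines. The sketch below is illustrative only: `score` is a stand-in for the real scorer, and `leading_passes` is an independent oracle written a different way so that the two can be compared across every combination.

```python
from itertools import product, takewhile

RUNG_ORDER = ("install", "upgrade", "backup_restore",
              "functional", "integration", "recipe_local")
STATUSES = ("pass", "fail", "na")


def score(rungs):
    """Stand-in scorer: climb until the first rung that is not a pass."""
    level = 0
    for name in RUNG_ORDER:
        if rungs[name] != "pass":
            break
        level += 1
    return level


def leading_passes(statuses):
    """Independent oracle: length of the leading run of "pass" values."""
    return sum(1 for _ in takewhile(lambda s: s == "pass", statuses))


def check_all_combinations():
    """Enumerate every rung combination and assert the ladder invariants."""
    checked = 0
    for combo in product(STATUSES, repeat=len(RUNG_ORDER)):
        level = score(dict(zip(RUNG_ORDER, combo)))
        assert level == leading_passes(combo)
        # Zero inflation: the level never exceeds the number of passing rungs.
        assert level <= sum(s == "pass" for s in combo)
        checked += 1
    return checked


print(check_all_combinations())  # 729 combinations (3 statuses ** 6 rungs)
```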
Invariant flags (shown, not climbed)
Two Phase-1 gating invariants are surfaced as flags on the card, not as ladder rungs:
clean_teardown (the run left no orphaned app/volume/secret and stayed within the deploy budget) and
no_secret_leak (no known secret value appears in the published artifact — the Adversary's broader
leak scan is the authority).
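As a rough illustration of how the two flags could be derived, the sketch below uses entirely hypothetical inputs: a list of orphaned resource names, a generic `used`/`budget` pair standing in for the deploy budget (whose real unit is not specified here), and a plain substring scan for known secret values. It is far narrower than the Adversary's leak scan, which remains the authority.

```python
from typing import Dict, Iterable, Sequence


def compute_flags_sketch(
    orphans: Sequence[str],
    used: float,
    budget: float,
    artifact_text: str,
    known_secrets: Iterable[str],
) -> Dict[str, bool]:
    """Derive the two card flags from hypothetical teardown and scan inputs."""
    # Clean teardown: nothing left behind and the budget was respected.
    clean_teardown = len(orphans) == 0 and used <= budget
    # No leak: no known, non-empty secret value appears in the artifact.
    no_secret_leak = not any(s and s in artifact_text for s in known_secrets)
    return {"clean_teardown": clean_teardown, "no_secret_leak": no_secret_leak}


print(compute_flags_sketch([], 3, 5, "status: ok", ["s3cr3t-value"]))
```

The flags are reported next to the level but never feed into it, so a failed flag does not lower the level by itself.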
2. results.json (per run)
Each run writes ${CCCI_RUNS_DIR:-/var/lib/cc-ci-runs}/<run_id>/results.json (run_id = the Drone
build number, or the run's unique app domain for a hand-run). Schema:
{
  "schema": 1, "run_id": "...", "recipe": "...", "version": "...", "pr": "...", "ref": "...",
  "finished": 0.0,
  "level": 4, "level_cap_reason": "L5 integration (SSO/OIDC + cross-app) N/A",
  "rungs": {"install":"pass","upgrade":"pass","backup_restore":"pass","functional":"pass",
            "integration":"na","recipe_local":"na"},
  "stages": [{"name":"install","status":"pass",
              "tests":[{"name":"test_serving","status":"pass","ms":168,"source":"generic"}]}],
  "results": {"install":"pass","upgrade":"pass","backup":"pass","restore":"pass","custom":"pass"},
  "flags": {"clean_teardown": true, "no_secret_leak": true},
  "screenshot": "screenshot.png", "summary_card": "summary.png"
}
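A consumer of this file may want to confirm that the summary fields agree with the per-rung data before rendering anything. The checker below is a hypothetical sketch (`check_results` is not part of cc-ci) and uses an abbreviated copy of the sample document above.

```python
import json

RUNG_ORDER = ("install", "upgrade", "backup_restore",
              "functional", "integration", "recipe_local")

SAMPLE = """
{
  "schema": 1, "run_id": "...", "recipe": "...",
  "level": 4, "level_cap_reason": "L5 integration (SSO/OIDC + cross-app) N/A",
  "rungs": {"install": "pass", "upgrade": "pass", "backup_restore": "pass",
            "functional": "pass", "integration": "na", "recipe_local": "na"},
  "flags": {"clean_teardown": true, "no_secret_leak": true}
}
"""


def check_results(doc: dict) -> list:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    if doc.get("schema") != 1:
        problems.append("unexpected schema version")

    # Recompute the level from the rungs: count of leading consecutive passes.
    rungs = doc.get("rungs", {})
    expected = 0
    for name in RUNG_ORDER:
        if rungs.get(name) != "pass":
            break
        expected += 1
    if doc.get("level") != expected:
        problems.append(f"level {doc.get('level')} != {expected} leading passes")
    if expected < len(RUNG_ORDER) and not doc.get("level_cap_reason"):
        problems.append("capped run has no level_cap_reason")

    for flag in ("clean_teardown", "no_secret_leak"):
        if not isinstance(doc.get("flags", {}).get(flag), bool):
            problems.append(f"flag {flag} missing or not boolean")
    return problems


print(check_results(json.loads(SAMPLE)))
```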
Assembly is best-effort: a failure to build/write results.json is logged but never changes the
run's exit code (cosmetics never block the pipeline, R7).
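One way to get that best-effort behaviour is to wrap the whole write in a single try/except that logs and returns a boolean instead of raising. The helper names below (`runs_dir`, `write_results_best_effort`) and the write-to-temp-then-rename step are assumptions for illustration, not a description of the actual runner.

```python
import json
import logging
import os
from pathlib import Path

log = logging.getLogger("cc-ci.results")


def runs_dir() -> Path:
    """Resolve the runs directory, defaulting as in the path shown above."""
    return Path(os.environ.get("CCCI_RUNS_DIR", "/var/lib/cc-ci-runs"))


def write_results_best_effort(run_id: str, payload: dict) -> bool:
    """Write <runs_dir>/<run_id>/results.json; never raise to the caller."""
    try:
        out_dir = runs_dir() / run_id
        out_dir.mkdir(parents=True, exist_ok=True)
        # Write to a temporary file first so readers never see a partial file.
        tmp = out_dir / "results.json.tmp"
        tmp.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        tmp.replace(out_dir / "results.json")
        return True
    except Exception:
        # Cosmetics never block the pipeline: log the error and carry on.
        log.exception("could not write results.json for run %s", run_id)
        return False
```

The caller would ignore the return value when computing the exit code, for example `write_results_best_effort(run_id, payload)` followed by an exit status derived only from the tier results.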