docs(3 R8): results-ux.md — level ladder + rung-mapping reference (stable section)
R8 doc seeded with the SETTLED, Adversary-fuzzed level ladder + tier->rung translation + results.json schema + invariant flags. Card/screenshot/PR-comment/badge sections stubbed (filled as U1-U5 wire + serve their artifacts). Does not advance past the U0 gate; pure documentation of settled design. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
116
docs/results-ux.md
Normal file

# cc-ci Results UX: level ladder, summary card, screenshot & badges (Phase 3, R8)

This doc explains how a cc-ci run is presented: the **level** a run earns, the **summary card** and
**app screenshot** rendered for it, the **PR comment** it posts, and the **badges** you can embed.
It is the R8 reference for Phase 3 (`plan-phase3-results-ux.md`).

> Presentation never changes the verdict. The level and card *report* the test outcomes; they can
> only ever understate, never overstate, what the tests actually verified (the cardinal guardrail).
> The authoritative pass/fail is the run's exit status plus the per-tier results; the level is a summary.

---

## 1. The level ladder (R1)

Every run earns a single integer **level 0–6**. The ladder is cumulative, with YunoHost-style
**gap-caps-the-level** semantics: a run earns level `L` only if **every rung 1..L was a clean PASS**.
The first rung that is not a clean PASS (a real **FAIL**, *or* a rung that is genuinely **N/A** for
this recipe) stops the climb, and `level_cap_reason` records which rung and why.

| Level | Rung | Earned when |
|------:|------|-------------|
| **L0** | (none) | install failed, or the app never became healthy. |
| **L1** | install | deploys and passes health/readiness checks. |
| **L2** | upgrade | upgrading from the previous published version to the PR/latest keeps the app healthy and its data intact. |
| **L3** | backup/restore | seeded data survives backup → wipe → restore. |
| **L4** | functional | the recipe-specific functional tests pass. |
| **L5** | integration | SSO/OIDC and cross-app integration tests pass. |
| **L6** | recipe-local | the recipe repo's own `tests/` (D4) pass and are merged. |

**N/A caps, fairly.** A rung that does not apply to a recipe (only one published version, so no
upgrade; not backup-capable; no SSO/integration surface; no recipe-local tests) is **N/A**. An N/A
rung caps the climb at the rung below it with a recorded reason, but it is *not* counted as a failure.
This is the only fair reading of "a missing lower rung caps the level": e.g. a recipe with **no
integration surface caps at L4 by definition**, shown as `level_cap_reason = "L5 integration … N/A"`.
A stateless app whose functional tests pass but which cannot be backed up is honestly capped at
**L2** (`"L3 backup/restore … N/A"`) rather than shown as L4: understating is safe; overstating is
forbidden.

Worked examples (real runs):

- `uptime-kuma`: install, upgrade, backup, restore and functional all pass; no SSO surface → **L4**
  (`cap = "L5 integration (SSO/OIDC + cross-app) N/A"`).
- `custom-html-tiny`: stateless, not backup-capable; install and upgrade pass, backup/restore N/A →
  **L2** (`cap = "L3 backup/restore (data integrity) N/A"`).
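
The scoring rule can be sketched in a few lines of Python. This is a minimal illustration, not the code in `runner/harness/level.py`: the rung keys follow the `rungs` object in `results.json`. The function name `compute_level_sketch`, the "missing rung counts as N/A" default, and the simplified cap-reason wording (which omits the parenthetical descriptions used in the real reasons) are assumptions made for the example.

```python
from __future__ import annotations

# Rung order, bottom to top; keys match the "rungs" object in results.json.
RUNGS = ("install", "upgrade", "backup_restore", "functional", "integration", "recipe_local")
LABELS = {"fail": "FAIL", "na": "N/A"}


def compute_level_sketch(rungs: dict[str, str]) -> tuple[int, str | None]:
    """Hypothetical scorer: return (level, cap_reason).

    The level is the number of leading consecutive "pass" rungs. The first rung
    that is not "pass" (whether "fail" or "na") stops the climb and is named in
    the cap reason. A rung missing from the dict is treated as "na" (assumption).
    """
    for index, name in enumerate(RUNGS):
        status = rungs.get(name, "na")
        if status != "pass":
            label = LABELS.get(status, status.upper())
            return index, f"L{index + 1} {name} {label}"
    return len(RUNGS), None  # every rung passed: L6, nothing capped the climb


# The "rungs" object from the results.json example in section 2.
schema_example = {"install": "pass", "upgrade": "pass", "backup_restore": "pass",
                  "functional": "pass", "integration": "na", "recipe_local": "na"}
# Only the rungs the stateless example states; the rest default to "na".
stateless_app = {"install": "pass", "upgrade": "pass", "backup_restore": "na"}

print(compute_level_sketch(schema_example))  # (4, 'L5 integration N/A')
print(compute_level_sketch(stateless_app))   # (2, 'L3 backup_restore N/A')
```

Because FAIL and N/A take the same exit from the loop, a rung can only ever lower the level. There is no code path that skips a non-passing rung and keeps climbing.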

### How tiers map to rungs (the translation layer)

`run_recipe_ci.py` holds the run's per-tier results (`install`/`upgrade`/`backup`/`restore`/`custom`)
plus the deps/SSO signals; `runner/harness/results.py::derive_rungs` maps them to the rung-status dict
that `runner/harness/level.py::compute_level` scores. The mapping (also recorded in `DECISIONS.md`,
Phase 3):

- **install** ← install tier (pass/fail).
- **upgrade** ← upgrade tier; `skip` → **na** (only one published version).
- **backup_restore** ← pass if the backup AND restore tiers both pass; fail if either fails; **na** if
  the recipe is not backup-capable.
- **functional** ← the custom tier minus its SSO tests. Any custom failure conservatively fails this
  rung (functional and SSO failures are not split apart, so the rung is never inflated); no custom
  tests → **na**.
- **integration** ← applies only if the recipe declares deps; pass iff the deps are wired, SSO is
  verified, and custom did not fail; recipes with no declared deps → **na** (the "caps at L4" rule).
- **recipe_local** ← the recipe repo's own `tests/` (discovery source `repo-local`) ran and passed;
  none present → **na**.
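
The mapping can be read as a small pure function. The sketch below is hypothetical: `RunSignals` and every field name in it are invented for illustration and are not the fields `run_recipe_ci.py` actually carries. The "custom tier minus its SSO tests" rule is simplified to a single `has_custom_tests` flag, and a missing or skipped tier (other than a skipped upgrade) is treated conservatively as a failure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RunSignals:
    """Hypothetical inputs to the translation layer (field names are illustrative)."""
    tiers: dict[str, str] = field(default_factory=dict)  # tier -> "pass" | "fail" | "skip"
    backup_capable: bool = True
    has_custom_tests: bool = True     # functional (non-SSO) custom tests exist
    declares_deps: bool = False
    deps_wired: bool = False
    sso_verified: bool = False
    recipe_local: str | None = None   # None: no repo-local tests; else "pass" | "fail"


def derive_rungs_sketch(s: RunSignals) -> dict[str, str]:
    """Map per-tier results plus deps/SSO signals to rung statuses ("pass" | "fail" | "na")."""
    t = s.tiers
    custom_failed = t.get("custom") == "fail"

    def passed(tier: str) -> str:
        # Anything but an explicit pass (including a missing tier) counts as a failure.
        return "pass" if t.get(tier) == "pass" else "fail"

    rungs = {"install": passed("install")}
    # A skipped upgrade tier (only one published version) is N/A, not a failure.
    rungs["upgrade"] = "na" if t.get("upgrade") == "skip" else passed("upgrade")

    if not s.backup_capable:
        rungs["backup_restore"] = "na"
    else:
        # Both tiers must pass; either one failing fails the rung.
        both = passed("backup") == "pass" and passed("restore") == "pass"
        rungs["backup_restore"] = "pass" if both else "fail"

    # Any custom failure fails functional: the SSO share is not split out, so never inflate.
    if custom_failed:
        rungs["functional"] = "fail"
    elif not s.has_custom_tests:
        rungs["functional"] = "na"
    else:
        rungs["functional"] = passed("custom")

    if not s.declares_deps:
        rungs["integration"] = "na"  # no integration surface: the "caps at L4" rule
    else:
        ok = s.deps_wired and s.sso_verified and not custom_failed
        rungs["integration"] = "pass" if ok else "fail"

    if s.recipe_local is None:
        rungs["recipe_local"] = "na"
    else:
        rungs["recipe_local"] = "pass" if s.recipe_local == "pass" else "fail"
    return rungs


tiers = {"install": "pass", "upgrade": "pass", "backup": "pass", "restore": "pass", "custom": "pass"}
# Matches the "rungs" object in the results.json example (integration and recipe_local are na).
print(derive_rungs_sketch(RunSignals(tiers=tiers)))
```

Keeping the translation separate from the scorer means every conservative choice (for example, a custom failure sinking both functional and integration) lives in one place. The scorer itself stays a trivial count.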

The pure scorer is exhaustively unit-tested and fuzz-verified over all 729 rung combinations (each of
the six rungs is pass, fail or na, and 3⁶ = 729). In every case the level equals the count of leading
consecutive passes, with zero inflation.
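
An exhaustive check of this kind is cheap because the space is tiny. The following is a sketch of one way to write it, not the project's actual test: it cross-checks an illustrative loop-based `score` against an independently written `oracle` (leading passes via `itertools.takewhile`) for every element of `itertools.product`. Both function names are hypothetical.

```python
from __future__ import annotations

from itertools import product, takewhile

STATUSES = ("pass", "fail", "na")
N_RUNGS = 6  # install, upgrade, backup_restore, functional, integration, recipe_local


def score(combo: tuple[str, ...]) -> int:
    """Scorer under test (illustrative): index of the first non-pass rung, else N_RUNGS."""
    for index, status in enumerate(combo):
        if status != "pass":
            return index
    return len(combo)


def oracle(combo: tuple[str, ...]) -> int:
    """Independent reference: count the leading consecutive passes."""
    return len(list(takewhile(lambda status: status == "pass", combo)))


def check_every_combination() -> int:
    checked = 0
    for combo in product(STATUSES, repeat=N_RUNGS):
        level = score(combo)
        assert level == oracle(combo), combo
        # Zero inflation: the level never exceeds the number of rungs that actually passed.
        assert level <= combo.count("pass"), combo
        checked += 1
    return checked


print(check_every_combination())  # 729
```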

### Invariant flags (shown, not climbed)

Two Phase-1 gating invariants are surfaced as flags on the card, not as ladder rungs:
`clean_teardown` (the run left no orphaned app/volume/secret and stayed within the deploy budget) and
`no_secret_leak` (no known secret value appears in the published artifact; the Adversary's broader
leak scan remains the authority).
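
At its simplest, `no_secret_leak` is a substring check over the published artifact. The sketch below is a minimal illustration. It assumes the harness has the set of known secret values at hand and the artifact as text, and the function name is hypothetical. It only catches verbatim copies, not escaped, encoded, or partial ones, which is why the Adversary's broader scan stays authoritative.

```python
from __future__ import annotations


def no_secret_leak_sketch(artifact_text: str, known_secrets: set[str]) -> bool:
    """True if no known secret value appears verbatim in the published artifact text."""
    # Empty values are skipped: "" is a substring of every string and would always match.
    return not any(secret in artifact_text for secret in known_secrets if secret)


published = '{"recipe": "example", "level": 4}'
print(no_secret_leak_sketch(published, {"s3cr3t-admin-pw"}))                      # True
print(no_secret_leak_sketch(published + "s3cr3t-admin-pw", {"s3cr3t-admin-pw"}))  # False
```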

---

## 2. `results.json` (per run)

Each run writes `${CCCI_RUNS_DIR:-/var/lib/cc-ci-runs}/<run_id>/results.json`, where `run_id` is the
Drone build number, or the run's unique app domain for a hand-run. Schema:

```json
{
  "schema": 1, "run_id": "...", "recipe": "...", "version": "...", "pr": "...", "ref": "...",
  "finished": 0.0,
  "level": 4, "level_cap_reason": "L5 integration (SSO/OIDC + cross-app) N/A",
  "rungs": {"install": "pass", "upgrade": "pass", "backup_restore": "pass", "functional": "pass",
            "integration": "na", "recipe_local": "na"},
  "stages": [{"name": "install", "status": "pass",
              "tests": [{"name": "test_serving", "status": "pass", "ms": 168, "source": "generic"}]}],
  "results": {"install": "pass", "upgrade": "pass", "backup": "pass", "restore": "pass", "custom": "pass"},
  "flags": {"clean_teardown": true, "no_secret_leak": true},
  "screenshot": "screenshot.png", "summary_card": "summary.png"
}
```

Assembly is **best-effort**: a failure to build or write `results.json` is logged but never changes
the run's exit code (cosmetics never block the pipeline, R7).
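
A best-effort writer is easiest to reason about when it cannot raise. The sketch below rests on some assumptions: `runs_dir`, `write_results_best_effort`, `finish_run` and the logger name are all hypothetical, and the real harness may structure this differently. It resolves the runs directory with the same "unset or empty means default" rule as the shell `${CCCI_RUNS_DIR:-…}` expansion. It also serializes before touching the filesystem and swallows every error after logging it.

```python
from __future__ import annotations

import json
import logging
import os
from pathlib import Path

log = logging.getLogger("cc-ci.results")  # hypothetical logger name


def runs_dir() -> Path:
    # Mirrors ${CCCI_RUNS_DIR:-/var/lib/cc-ci-runs}: unset OR empty falls back to the default.
    return Path(os.environ.get("CCCI_RUNS_DIR") or "/var/lib/cc-ci-runs")


def write_results_best_effort(run_id: str, results: dict) -> Path | None:
    """Write <runs dir>/<run_id>/results.json; on any error, log it and return None."""
    try:
        # Serialize first, so a non-serializable value never leaves a half-written file.
        payload = json.dumps(results, indent=2)
        run_dir = runs_dir() / run_id
        run_dir.mkdir(parents=True, exist_ok=True)
        target = run_dir / "results.json"
        target.write_text(payload + "\n", encoding="utf-8")
        return target
    except Exception:  # cosmetics never block the pipeline: swallow and log
        log.exception("could not write results.json for run %s", run_id)
        return None


def finish_run(exit_code: int, run_id: str, results: dict) -> int:
    write_results_best_effort(run_id, results)  # return value deliberately ignored
    return exit_code  # unchanged, whether or not the write succeeded
```

The exit code is computed from the test outcomes before the write is attempted and passed through untouched. A broken serializer or a read-only runs directory therefore shows up only in the log, never in the verdict.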

---

## 3. Summary card + app screenshot (R3/R4)

<!-- TODO(U2/U1): finalize once wired. The card renderer (harness/card.py) builds an HTML results
card (recipe+version, level badge, per-stage/per-test ✔/✘ table, embedded app screenshot) and renders
it to PNG via the harness Playwright browser; the screenshot (harness/screenshot.py) is captured from
the live app before teardown, secret-safe (landing page by default; recipes needing a post-login view
opt into a SCREENSHOT hook that avoids credential pages). Document the stable serving URL
(/runs/<run_id>/summary.png) once the dashboard serves the artifact dir. -->

## 4. PR comment (R2)

<!-- TODO(U3): document the YunoHost-shaped comment: 🌻 marker + level/status badge + summary card
image, both linking to the run/dashboard; one comment per PR, updated in place; re-`!testme` refreshes
it; falls back to a text comment if image rendering fails. -->

## 5. Badges (R6) + how to embed one

<!-- TODO(U2/U5): document the per-recipe level/status SVG badge endpoint and the markdown snippet to
embed it in a recipe README. -->