# cc-ci Phase 3 — Beautiful YunoHost-style results (Autonomous Build Plan)
**Status:** QUEUED — starts after Phase 2 (`plan-phase2-recipe-tests.md`) and Phase 2b
(`plan-phase2b-test-performance.md`) reach `## DONE`.
**Transition:** **manual** (operator kicks it off; check in / test between phases).
**Builds on:** Phase 1 (`plan.md` — dashboard `dashboard/`, the `!testme` bridge's PR comment,
the runner, Playwright in the harness) and Phase 2 (the rich per-recipe test taxonomy → meaningful
levels).
**Reference style:** the YunoHost app-CI result comment, e.g.
`https://github.com/YunoHost-Apps/lichenmarkdown_ynh/pull/20#issuecomment-3543928229` — see §3.
**Owner agents:** same Builder + Adversary loops + protocol as Phase 1 (`plan.md` §6/§7).
**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase3-results-ux.md`

---
## 0. Relationship to earlier phases
Phase 1 gave us a *functional* results surface (Drone's per-run logs, a basic overview dashboard, and
a PR comment with run URL + pass/fail — D7). Phase 2 gave us a *rich, layered* test taxonomy per
recipe (install / upgrade / backup-data-integrity / recipe-specific / SSO-integration / recipe-local).
Phase 3 makes the results **beautiful and YunoHost-style**: a computed **level** per run, an
**image-forward PR comment** (status badge + a rendered summary card with an app screenshot), and a
**polished overview dashboard** comparable to `ci-apps.yunohost.org`. It is presentation +
level-scoring on top of existing data — it must **not** change what the tests assert.
Do not start until both Phase 2 and Phase 2b show `## DONE` in `STATUS.md` (Adversary-verified). Same loop protocol.

---
## 1. Mission
Turn cc-ci results into something a maintainer is happy to see on a PR and on a status page:
- a **level** (0 to N) summarizing how far up the quality ladder the recipe got,
- an **image-forward PR comment** like YunoHost's (🌻 + status badge + summary image that includes a
screenshot of the actually-deployed app), linking to the full run,
- a **dashboard** that looks and feels like the YunoHost app list (per-recipe level badges, latest
status, screenshots, history).

---
## 2. Definition of Done (Phase 3 exit condition)
Terminates only when every item holds **and the Adversary has independently re-verified each within
24h** (logged in `REVIEW.md`):
- [ ] **R1 — Level ladder.** A documented level ladder (§4.1) maps which test sets passed → a single
integer **level**, computed per run. Missing a lower rung caps the level (YunoHost semantics).
- [ ] **R2 — Image-forward PR comment.** On a `!testme` run, the bridge posts/updates a Gitea PR
comment in the YunoHost shape: a marker (🌻), a **status/level badge**, and a **summary image**,
both linking to the full run/dashboard. Re-running updates the same comment.
- [ ] **R3 — Summary card image.** Each run renders a PNG summary card showing: recipe + version,
the **level**, a per-stage/per-test **✔/✘ breakdown**, and an embedded **screenshot of the
deployed app**. Served at a stable URL; embedded in the comment and the dashboard.
- [ ] **R4 — App screenshot.** The runner captures a real screenshot of the deployed app (Playwright,
reusing the Phase-1 harness) — post-login where the landing page requires it — for the card.
- [ ] **R5 — Dashboard polish.** The overview at `ci.commoninternet.net` looks/feels like
`ci-apps.yunohost.org`: a table/grid of recipes with **level badge**, latest pass/fail, last
tested version, app screenshot/thumbnail, and a link to history. Regenerated on completion.
- [ ] **R6 — Badges.** A per-recipe **level/status badge** endpoint (SVG) embeddable in recipe
READMEs and the dashboard.
- [ ] **R7 — Safe & robust.** No secrets in images, comments, badges, or screenshots (reuse Phase-1
§4.4 redaction; the screenshot step must not capture secret values — e.g. don't shoot pages
showing generated admin passwords). Image/screenshot generation **never blocks or fails the
pipeline**: on error it falls back to a text comment + records the failure, and the test verdict
is unaffected.
- [ ] **R8 — Docs.** `docs/` explains the level ladder, how the card/screenshot/badge are generated,
and how to embed a badge.
When R1 through R8 hold and are Adversary-verified, write `## DONE` to Phase-3 `STATUS.md`.

---
## 3. Reference: the YunoHost comment style
The linked YunoHost CI comment is deliberately **minimal and visual** (verified by fetching it):
- A header marker (🌻).
- A **shield-style test badge** linking to the CI job (`ci-apps.yunohost.org/ci/job/<id>`).
- A **summary image (PNG)** — a rendered card with the result/level — also linking to the job.
- **No verbose inline table**; the per-test breakdown + level live *inside the rendered image* and on
the dashboard. Users click through for full logs.
Mirror this shape for cc-ci (Gitea renders markdown images in comments): marker + badge + summary
PNG, both linking to the cc-ci run/dashboard. YunoHost also shows a **screenshot of the app** — we do
the same in the card.

---
## 4. Design
### 4.1 The level ladder (proposed default — finalize in `DECISIONS.md`)
A single integer; each rung requires all lower rungs (a gap caps the level, like YunoHost):
- **L0** — install failed / app never became healthy.
- **L1 — Installs:** deploys and passes health/readiness.
- **L2 — Upgrades:** previous published version → PR version, stays healthy, data intact.
- **L3 — Backup/restore:** seeded data survives backup → wipe → restore (real data-integrity, P4).
- **L4 — Functional:** the recipe-specific functional tests pass (Phase-2 parity + ≥2 specific).
- **L5 — Integration:** SSO/OIDC and cross-app integration tests pass (for recipes that have them;
recipes with no integration surface cap at L4 by definition — record this so the level is fair).
- **L6 — Recipe-local:** the recipe repo's own `tests/` (D4) pass and are merged.
(Also surface, as badges/flags rather than levels: clean-teardown ✔, no-secret-leak ✔ — these are
gating invariants from Phase 1, shown but not part of the climb.)
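The capping rule above can be sketched in a few lines. This is a minimal illustration, not the implementation in `run_recipe_ci.py`: the stage keys (`install`, `upgrade`, ...) and the `True`/`False`/`None` encoding are assumptions made for the example, and the final ladder is still to be settled in `DECISIONS.md`.

```python
from typing import Mapping, Optional

# Rungs in climbing order, mirroring the proposed ladder (L1..L6).
# The stage keys are hypothetical names chosen for this sketch.
LADDER = [
    "install",         # L1
    "upgrade",         # L2
    "backup_restore",  # L3
    "functional",      # L4
    "integration",     # L5
    "recipe_local",    # L6
]


def compute_level(stage_passed: Mapping[str, Optional[bool]]) -> int:
    """Return the highest rung reached with every lower rung also passed.

    True = passed, False = failed, None or missing = not applicable / not run.
    Any rung that is not True stops the climb, so a gap caps the level and a
    recipe with no integration surface stops at L4.
    """
    level = 0
    for rung, stage in enumerate(LADDER, start=1):
        if stage_passed.get(stage) is not True:
            break
        level = rung
    return level
```

With this encoding, a run that installs but fails the upgrade stage reports level 1 even if later stages pass, which is the behaviour the U0 acceptance check looks for.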
### 4.2 Data flow
```
run_recipe_ci.py emits a structured results.json per run
  { recipe, version, pr, stages:[{name,status,tests:[{name,status,ms}]}], level, screenshot.png }
  ├─► summary-card renderer: HTML template (recipe, level badge, ✔/✘ table, app screenshot)
  │     → render to PNG (Playwright screenshot of the HTML, reusing the harness browser)
  │     → publish at ci.commoninternet.net/runs/<id>/summary.png (+ badge.svg)
  ├─► bridge updates the Gitea PR comment: 🌻 + [badge] + [summary.png], linking to the run
  └─► dashboard generator: overview grid (per-recipe level badge, screenshot, last status,
        version, history) regenerated on build-completion → ci.commoninternet.net
```
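One possible concrete shape for `results.json`, following the field names in the diagram. The class names, the status vocabulary (`pass`/`fail`/`skip`), and the sample values are assumptions for illustration only; the real schema is whatever U0 settles on.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class CaseResult:
    name: str
    status: str  # assumed vocabulary: "pass" | "fail" | "skip"
    ms: int      # wall-clock duration in milliseconds


@dataclass
class StageResult:
    name: str
    status: str
    tests: List[CaseResult] = field(default_factory=list)


@dataclass
class RunResults:
    recipe: str
    version: str
    pr: int
    stages: List[StageResult]
    level: int
    screenshot: Optional[str] = None  # path of the app screenshot, if captured

    def to_json(self) -> str:
        # sort_keys keeps the output stable so diffs between runs stay readable
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Hypothetical sample run: installs cleanly, fails the upgrade stage.
example = RunResults(
    recipe="example-recipe",
    version="1.2.3",
    pr=42,
    stages=[
        StageResult("install", "pass", [CaseResult("becomes-healthy", "pass", 4200)]),
        StageResult("upgrade", "fail", [CaseResult("data-intact", "fail", 9100)]),
    ],
    level=1,
    screenshot="screenshot.png",
)
```

Calling `example.to_json()` yields the document that the card renderer, the bridge, and the dashboard generator would all read, so none of them needs to re-derive results from logs.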
- **Summary image:** render an HTML results card → PNG via Playwright (already in the harness — no
new heavy dep). Keep a deterministic template; embed the app screenshot.
- **App screenshot:** Playwright navigates the live `<recipe>-pr<n>-<sha>.ci.commoninternet.net`
(logging in via the test user where needed) and screenshots the main view — captured during the
run while the app is up, before teardown.
- **Badges:** generate SVG (shields-style) per run + a per-recipe latest-level badge endpoint.
- **Hosting:** the dashboard service (Phase-1 `dashboard/`) serves `/runs/<id>/...` and `/badge/...`;
Gitea comments embed them by URL.
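A sketch of the HTML-card-to-PNG path described above, under stated assumptions: the card layout, the function names, and the idea of inlining the app screenshot as a base64 data URI are illustrative choices, not the harness's actual template. The Playwright calls used (`sync_playwright`, `chromium.launch`, `new_page`, `set_content`, `screenshot`) are from its Python sync API; the import is kept inside the function so the pure HTML step can be exercised without a browser.

```python
import base64
import html
from pathlib import Path
from typing import Optional

MARKS = {"pass": "✔", "fail": "✘"}


def render_card_html(results: dict, screenshot_png: Optional[bytes] = None) -> str:
    """Build a deterministic HTML card from a results dict (shape as in the diagram)."""
    rows = []
    for stage in results["stages"]:
        mark = MARKS.get(stage["status"], "-")
        # Escape everything that originates from recipe or test names.
        rows.append(f"<tr><td>{html.escape(stage['name'])}</td><td>{mark}</td></tr>")
    shot = ""
    if screenshot_png is not None:
        # Inline the screenshot so the card needs no network or file access.
        data = base64.b64encode(screenshot_png).decode("ascii")
        shot = f'<img alt="app screenshot" width="480" src="data:image/png;base64,{data}">'
    title = html.escape(f"{results['recipe']} {results['version']}")
    return (
        "<!doctype html><html><body style='font-family:sans-serif;width:520px'>"
        f"<h1>{title}</h1><p>Level {int(results['level'])}</p>"
        f"<table>{''.join(rows)}</table>{shot}</body></html>"
    )


def html_to_png(card_html: str, out_path: Path, timeout_ms: int = 15000) -> None:
    """Render the card to a PNG file with Playwright (imported lazily)."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page(viewport={"width": 560, "height": 720})
            page.set_default_timeout(timeout_ms)
            page.set_content(card_html, wait_until="load")
            page.screenshot(path=str(out_path), full_page=True)
        finally:
            browser.close()
```

The app screenshot itself could be captured the same way, with `page.goto(...)` against the live per-PR hostname in place of `set_content`, while the app is still up.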
### 4.3 PR comment (YunoHost-shaped)
On run start: a placeholder comment ("⏳ testing … level pending", link to live logs). On completion:
update the same comment to 🌻 + level/status **badge** + **summary card image**, linking to the run
and the dashboard. One comment per PR, updated in place; re-`!testme` refreshes it.
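One way to get "one comment per PR, updated in place" is to tag the bridge's comment with a hidden marker and look for it before posting. The sketch below covers only the body construction and the lookup. The marker string, the function names, and the `comments` input (assumed to be the list of comment objects, each with `id` and `body`, that the bridge already fetches from Gitea's issue-comment API) are assumptions; whether to create or edit is then decided by whether `find_existing_comment` returns an id.

```python
from typing import Iterable, Optional

# Hidden HTML comment used to recognise the bridge's own comment on a PR.
# The marker text is an assumption made for this sketch.
MARKER = "<!-- cc-ci-result -->"


def placeholder_body(logs_url: str) -> str:
    """Comment posted when the run starts."""
    return f"{MARKER}\n⏳ testing … level pending ([live logs]({logs_url}))"


def result_body(run_url: str, dashboard_url: str, badge_url: str, card_url: str) -> str:
    """Image-forward comment: marker + linked badge + linked summary card."""
    return "\n".join([
        MARKER,
        "🌻",
        f"[![cc-ci level]({badge_url})]({run_url})",
        f"[![cc-ci summary]({card_url})]({dashboard_url})",
    ])


def find_existing_comment(comments: Iterable[dict]) -> Optional[int]:
    """Return the id of the bridge's earlier comment, or None if one must be created."""
    for comment in comments:
        if MARKER in comment.get("body", ""):
            return comment["id"]
    return None
```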
---
## 5. Milestones (each ends with an Adversary gate)
- **U0 — Results schema + level.** `run_recipe_ci.py` emits `results.json` (per-stage/per-test) and
computes the level (§4.1). *Accept:* level is correct for a recipe that passes through L4 and for one
that fails at L2 (capped at L1).
- **U1 — App screenshot.** Harness captures a real screenshot of the deployed app (post-login where
needed), secret-safe. *Accept:* screenshot of a sample recipe shows the working UI, no secrets.
- **U2 — Summary card + badge.** Render the HTML card → PNG (level, ✔/✘ table, screenshot) + SVG
badge, served at stable URLs. *Accept:* card + badge render correctly for pass and fail runs.
- **U3 — YunoHost-style PR comment.** Bridge posts/updates the image-forward comment (marker + badge
+ card, linked). *Accept:* live on a scratch PR — comment shows badge + card + screenshot, updates
on re-run, contains no secrets.
- **U4 — Dashboard polish.** Overview grid with per-recipe level badges, screenshots, status, version,
history — comparable look/feel to `ci-apps.yunohost.org`. *Accept:* matches reality across several
runs; Adversary confirms it mirrors the underlying results.
- **U5 — Badges + docs + hardening.** Embeddable per-recipe badges; docs for the ladder + embedding;
fallback-to-text on render failure; secret-scan over images/screenshots/comments. *Accept:*
Adversary's leak scan over published images/comments finds nothing; killing the renderer degrades
gracefully to text without affecting the verdict; flip Phase-3 `STATUS.md` to `## DONE`.
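The U5 degrade-to-text behaviour could look roughly like the following. It is a sketch under assumptions: the helper names, the logger name, and the use of a worker thread with a timeout are illustrative. A thread cannot be forcibly stopped, so a renderer that truly hangs would be better isolated in a subprocess that can be killed. The test verdict is passed in and never modified by the presentation step.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

log = logging.getLogger("cc_ci.presentation")  # hypothetical logger name


def best_effort(name: str, step: Callable[[], str], timeout_s: float,
                warnings: List[str]) -> Optional[str]:
    """Run a cosmetic step; on error or timeout record a warning and return None."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(step).result(timeout=timeout_s)
    except Exception as exc:  # also covers the timeout raised by result()
        warnings.append(f"{name}: {type(exc).__name__}: {exc}")
        log.warning("presentation step %s failed: %r", name, exc)
        return None
    finally:
        # Do not wait for a stuck worker; see the caveat about threads above.
        pool.shutdown(wait=False)


def final_comment(verdict: str, level: int, card_url: Optional[str], run_url: str) -> str:
    """Image-forward comment when the card exists, plain text otherwise."""
    if card_url is None:
        return f"🌻 cc-ci: {verdict}, level {level} ([full run]({run_url}))"
    return f"🌻 [![cc-ci summary]({card_url})]({run_url})"
```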
---
## 6. Guardrails (inherit Phase 1 §9 + Phase 2 §7.1)
- **Presentation never changes the verdict.** The level and card *report* test outcomes; they must
not let a run look greener than its tests. The Adversary checks the rendered level/card against the
raw `results.json` and the actual test outcomes — a card that overstates the result is a FAIL.
- **No secrets in any artifact** (R7) — comments, badges, summary cards, app screenshots. The
screenshot step must avoid pages that display generated credentials.
- **Never block the pipeline on cosmetics** — image/screenshot/badge generation failures degrade to a
text comment and a recorded warning; they never fail or hang a test run (respect Phase-1 timeouts).
- **Don't weaken tests to raise a level** (carry-over of the cardinal rule) — the Adversary watches
for tests softened or levels mis-mapped to inflate the displayed quality.
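For the text artifacts (comment markdown, badge SVG, `results.json`), a leak check can be as simple as searching for the secret values generated during the run. This is a hypothetical helper, not the Phase-1 §4.4 redaction, and it does not cover PNG screenshots, which still depend on choosing pages that never display credentials. The `min_len` cutoff is an assumption to avoid false positives on trivially short values; it also means very short secrets are not checked.

```python
from typing import Dict, Iterable, List


def find_leaks(artifacts: Dict[str, str], secrets: Iterable[str], min_len: int = 6) -> List[str]:
    """Return the names of text artifacts that contain any known secret verbatim.

    artifacts maps an artifact name to its text content; secrets is the set of
    values generated for this run (for example admin passwords).
    """
    needles = [s for s in secrets if len(s) >= min_len]
    return sorted(
        name for name, text in artifacts.items()
        if any(needle in text for needle in needles)
    )
```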
---
## 7. Open decisions (log in DECISIONS.md)
- Exact level ladder + how recipes without an integration/SSO surface are scored fairly (cap vs N/A).
- Summary-card rendering: HTML→Playwright-PNG (default, reuses the harness) vs a dedicated image lib.
- Where app screenshots are hosted/retained and for how long (retention/cleanup, like run logs).
- Badge implementation: self-rendered SVG vs a shields.io endpoint pattern.
- Whether to also post a compact markdown fallback table beneath the image for accessibility.
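To help weigh the badge decision, here is roughly what the self-rendered SVG option involves. It is a sketch only: the function name, colours, and the width estimate of about 7 px per character are assumptions, and a production badge would measure text properly or follow an existing shields-style template.

```python
import html


def level_badge_svg(level: int, passed: bool, label: str = "cc-ci") -> str:
    """Return a small two-part SVG badge: grey label on the left, coloured level on the right."""
    value = f"level {level}"
    color = "#4c1" if passed else "#e05d44"  # green on pass, red on fail
    # Rough width estimate; real shields measure the rendered font instead.
    lw = 7 * len(label) + 10
    vw = 7 * len(value) + 10
    safe_label = html.escape(label, quote=True)
    safe_value = html.escape(value, quote=True)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{lw + vw}" height="20" '
        f'role="img" aria-label="{safe_label}: {safe_value}">'
        f'<rect width="{lw}" height="20" fill="#555"/>'
        f'<rect x="{lw}" width="{vw}" height="20" fill="{color}"/>'
        f'<g fill="#fff" font-family="Verdana,sans-serif" font-size="11" text-anchor="middle">'
        f'<text x="{lw / 2}" y="14">{safe_label}</text>'
        f'<text x="{lw + vw / 2}" y="14">{safe_value}</text>'
        f'</g></svg>'
    )
```

The output is plain text, so it can be written next to `summary.png` for each run or returned directly by a per-recipe badge endpoint on the dashboard service.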