Add Phase-3 plan: beautiful YunoHost-style results (levels + image comment + dashboard)
Phase 3 (after Phase-2 DONE, manual transition): compute a per-run quality LEVEL, post an image-forward Gitea PR comment in the YunoHost shape (marker + status/level badge + a rendered summary card containing a real app screenshot, linking to the run), and polish the overview dashboard to a ci-apps.yunohost.org look/feel with per-recipe level badges + screenshots. Reuses the Phase-1 dashboard/bridge/Playwright; presentation never changes the verdict; no secrets in any artifact; cosmetics never block the pipeline. Linked from README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -15,7 +15,8 @@ autonomous Claude loops (a Builder and an adversarial Reviewer) running over day

| File | Purpose |
|---|---|
| `plan.md` | The Phase-1 plan (build the CI server). Agents treat it as their single source of truth. |
| `plan-phase2-recipe-tests.md` | **Phase 2** (after Phase-1 `## DONE`): author comprehensive per-recipe tests — port every recipe-maintainer test + ≥2 recipe-specific tests per app. |
| `plan-phase3-results-ux.md` | **Phase 3** (after Phase-2 `## DONE`): beautiful YunoHost-style results — per-run **level**, image-forward PR comment (badge + summary card + app screenshot), polished dashboard. |
| `IDEAS.md` | Deferred/future ideas, parked out of current scope. |
| `brief.md` | The original one-page brief (context only; `plan.md` supersedes it). |
| `kickoff.md` | Launch & supervision guide. |

cc-ci-plan/plan-phase3-results-ux.md · Normal file · 176 lines

@@ -0,0 +1,176 @@
|
# cc-ci Phase 3 — Beautiful YunoHost-style results (Autonomous Build Plan)

**Status:** QUEUED — starts only after Phase 2 (`plan-phase2-recipe-tests.md`) reaches `## DONE`.

**Transition:** **manual** (operator kicks it off; check in / test between phases).

**Builds on:** Phase 1 (`plan.md` — dashboard `dashboard/`, the `!testme` bridge's PR comment,
the runner, Playwright in the harness) and Phase 2 (the rich per-recipe test taxonomy → meaningful
levels).

**Reference style:** the YunoHost app-CI result comment, e.g.
`https://github.com/YunoHost-Apps/lichenmarkdown_ynh/pull/20#issuecomment-3543928229` — see §3.

**Owner agents:** same Builder + Adversary loops + protocol as Phase 1 (`plan.md` §6/§7).

**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase3-results-ux.md`
|
---

## 0. Relationship to earlier phases

Phase 1 gave us a *functional* results surface (Drone's per-run logs, a basic overview dashboard, and
a PR comment with run URL + pass/fail — D7). Phase 2 gave us a *rich, layered* test taxonomy per
recipe (install / upgrade / backup-data-integrity / recipe-specific / SSO-integration / recipe-local).

Phase 3 makes the results **beautiful and YunoHost-style**: a computed **level** per run, an
**image-forward PR comment** (status badge + a rendered summary card with an app screenshot), and a
**polished overview dashboard** comparable to `ci-apps.yunohost.org`. It is presentation +
level-scoring on top of existing data — it must **not** change what the tests assert.

Do not start until Phase 2 `STATUS.md` shows `## DONE` (Adversary-verified). Use the same loop
protocol as Phase 1.
|
---

## 1. Mission

Turn cc-ci results into something a maintainer is happy to see on a PR and on a status page:

- a **level** (0–N) summarizing how far up the quality ladder the recipe got,
- an **image-forward PR comment** like YunoHost's (🌻 + status badge + summary image that includes a
  screenshot of the actually-deployed app), linking to the full run,
- a **dashboard** that looks and feels like the YunoHost app list (per-recipe level badges, latest
  status, screenshots, history).
|
---

## 2. Definition of Done (Phase 3 exit condition)

Phase 3 terminates only when every item holds **and the Adversary has independently re-verified each
within 24h** (logged in `REVIEW.md`):

- [ ] **R1 — Level ladder.** A documented level ladder (§4.1) maps which test sets passed → a single
  integer **level**, computed per run. Missing a lower rung caps the level (YunoHost semantics).
- [ ] **R2 — Image-forward PR comment.** On a `!testme` run, the bridge posts/updates a Gitea PR
  comment in the YunoHost shape: a marker (🌻), a **status/level badge**, and a **summary image**,
  with the badge and the image both linking to the full run/dashboard. Re-running updates the same
  comment.
- [ ] **R3 — Summary card image.** Each run renders a PNG summary card showing: recipe + version,
  the **level**, a per-stage/per-test **✔/✘ breakdown**, and an embedded **screenshot of the
  deployed app**. The card is served at a stable URL and embedded in the comment and the dashboard.
- [ ] **R4 — App screenshot.** The runner captures a real screenshot of the deployed app (Playwright,
  reusing the Phase-1 harness) — post-login where the landing page requires it — for the card.
- [ ] **R5 — Dashboard polish.** The overview at `ci.commoninternet.net` looks/feels like
  `ci-apps.yunohost.org`: a table/grid of recipes with **level badge**, latest pass/fail, last
  tested version, app screenshot/thumbnail, and a link to history. It is regenerated whenever a run
  completes.
- [ ] **R6 — Badges.** A per-recipe **level/status badge** endpoint (SVG) embeddable in recipe
  READMEs and the dashboard.
- [ ] **R7 — Safe & robust.** No secrets in images, comments, badges, or screenshots (reuse Phase-1
  §4.4 redaction; the screenshot step must not capture secret values — e.g. don't shoot pages
  showing generated admin passwords). Image/screenshot generation **never blocks or fails the
  pipeline**: on error it falls back to a text comment and records the failure, and the test
  verdict is unaffected.
- [ ] **R8 — Docs.** `docs/` explains the level ladder, how the card/screenshot/badge are generated,
  and how to embed a badge.

When R1–R8 hold and are Adversary-verified, write `## DONE` to Phase-3 `STATUS.md`.
|
---

## 3. Reference: the YunoHost comment style

The linked YunoHost CI comment is deliberately **minimal and visual** (verified by fetching it):

- A header marker (🌻).
- A **shield-style test badge** linking to the CI job (`ci-apps.yunohost.org/ci/job/<id>`).
- A **summary image (PNG)** — a rendered card with the result/level — also linking to the job.
- **No verbose inline table**; the per-test breakdown + level live *inside the rendered image* and on
  the dashboard. Users click through for full logs.

Mirror this shape for cc-ci (Gitea renders markdown images in comments): marker + badge + summary
PNG, with the badge and the PNG both linking to the cc-ci run/dashboard. YunoHost also shows a
**screenshot of the app** — we do the same in the card.
|
---

## 4. Design

### 4.1 The level ladder (proposed default — finalize in `DECISIONS.md`)

A single integer; each rung requires all lower rungs (a gap caps the level, like YunoHost):

- **L0** — install failed / app never became healthy.
- **L1 — Installs:** deploys and passes health/readiness.
- **L2 — Upgrades:** previous published version → PR version, stays healthy, data intact.
- **L3 — Backup/restore:** seeded data survives backup → wipe → restore (real data-integrity, P4).
- **L4 — Functional:** the recipe-specific functional tests pass (Phase-2 parity + ≥2 specific).
- **L5 — Integration:** SSO/OIDC and cross-app integration tests pass (for recipes that have them;
  recipes with no integration surface cap at L4 by definition, which also puts L6 out of reach, so
  record the reason for the cap to keep the level fair).
- **L6 — Recipe-local:** the recipe repo's own `tests/` (D4) pass and are merged.

(Also surface, as badges/flags rather than levels: clean-teardown ✔, no-secret-leak ✔ — these are
gating invariants from Phase 1, shown but not part of the climb.)
|
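The capping rule can be sketched in Python as a walk up the ladder that stops at the first rung not passed. This assumes each rung corresponds to one entry in the `stages` list of `results.json` (§4.2), with hypothetical stage names and statuses (`pass`, `fail`, `na`, and so on); the real names belong in `DECISIONS.md`. Returning the blocking rung alongside the level gives the card an honest explanation to show, for example `integration:na` for a recipe with no SSO surface that caps at L4 under this default.

```python
from typing import Dict, List, Optional, Tuple

# Rungs L1..L6 of the proposed ladder, in climbing order (hypothetical stage names).
LADDER = ["install", "upgrade", "backup_restore", "functional", "integration", "recipe_local"]


def compute_level(stages: List[Dict]) -> Tuple[int, Optional[str]]:
    """Return (level, blocker) for one run.

    `stages` is the results.json `stages` list: [{"name": ..., "status": ...}, ...].
    A rung counts only if its status is exactly "pass" and every lower rung passed;
    the first other status (fail, skip, na, or a missing stage) caps the level there.
    """
    status = {stage["name"]: stage["status"] for stage in stages}
    level = 0
    for rung in LADDER:
        state = status.get(rung, "missing")
        if state != "pass":
            return level, f"{rung}:{state}"  # later passes cannot lift a capped level
        level += 1
    return level, None
```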
### 4.2 Data flow

```
run_recipe_ci.py emits a structured results.json per run
  { recipe, version, pr, stages:[{name,status,tests:[{name,status,ms}]}], level, screenshot.png }
        │
        ├─► summary-card renderer: HTML template (recipe, level badge, ✔/✘ table, app screenshot)
        │     → render to PNG (Playwright screenshot of the HTML, reusing the harness browser)
        │     → publish at ci.commoninternet.net/runs/<id>/summary.png (+ badge.svg)
        │
        ├─► bridge updates the Gitea PR comment: 🌻 + [badge] + [summary.png], linking to the run
        │
        └─► dashboard generator: overview grid (per-recipe level badge, screenshot, last status,
              version, history) regenerated on build-completion → ci.commoninternet.net
```
|
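The two Playwright steps in the flow above can be sketched as follows, assuming the harness uses Playwright's Python sync API with Chromium. Function names, viewport sizes and the card layout are illustrative assumptions. The app screenshot is inlined into the card as a `data:` URI, so the card page needs no file or network access while it renders, and every dynamic value is HTML-escaped so a recipe or test name cannot inject markup. The optional `login` callback stands in for the recipe's own test-user login and is hypothetical.

```python
import base64
import html
from typing import Callable, List, Optional, Tuple


def png_data_uri(path: str) -> str:
    """Inline a PNG so the card HTML is self-contained."""
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode("ascii")


def card_html(recipe: str, version: str, level: int,
              tests: List[Tuple[str, bool]], screenshot_uri: str) -> str:
    """Deterministic card template: title, level, one ✔/✘ row per test, app screenshot."""
    rows = "".join(
        f"<tr><td>{'✔' if ok else '✘'}</td><td>{html.escape(name)}</td></tr>"
        for name, ok in tests
    )
    return (
        "<!doctype html><html><head><meta charset='utf-8'><style>"
        "body{margin:0;font-family:sans-serif}"
        "#card{display:flex;gap:24px;padding:24px;width:1000px}"
        "#card img{width:480px;border:1px solid #ccc}"
        ".level{font-size:40px;font-weight:bold}"
        "</style></head><body><div id='card'><div>"
        f"<h1>{html.escape(recipe)} {html.escape(version)}</h1>"
        f"<div class='level'>Level {level}</div><table>{rows}</table></div>"
        f"<img alt='app screenshot' src='{html.escape(screenshot_uri, quote=True)}'>"
        "</div></body></html>"
    )


def capture_app_screenshot(url: str, out_path: str,
                           login: Optional[Callable] = None) -> None:
    """Screenshot the live app while it is up, before teardown."""
    from playwright.sync_api import sync_playwright  # imported lazily: only these steps need it
    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page(viewport={"width": 1280, "height": 800})
            page.goto(url, wait_until="networkidle", timeout=30_000)
            if login is not None:
                login(page)  # must end on a view that shows no generated credentials (R7)
            page.screenshot(path=out_path)
        finally:
            browser.close()


def render_card_png(card: str, out_path: str) -> None:
    """Render the card HTML to PNG, clipped to the #card element."""
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page(viewport={"width": 1100, "height": 800})
            page.set_content(card)
            page.locator("#card").screenshot(path=out_path)
        finally:
            browser.close()
```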
- **Summary image:** render an HTML results card → PNG via Playwright (already in the harness — no
  new heavy dep). Keep a deterministic template; embed the app screenshot.
- **App screenshot:** Playwright navigates the live `<recipe>-pr<n>-<sha>.ci.commoninternet.net`
  (logging in via the test user where needed) and screenshots the main view — captured during the
  run while the app is up, before teardown.
- **Badges:** generate SVG (shields-style) per run + a per-recipe latest-level badge endpoint.
- **Hosting:** the dashboard service (Phase-1 `dashboard/`) serves `/runs/<id>/...` and `/badge/...`;
  Gitea comments embed them by URL.
|
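A standard-library sketch of the self-rendered SVG badge, assuming a three-colour scale (red at L0, amber part-way, green at the top rung of the proposed ladder) and a rough fixed width per character; both are illustrative choices, and shields.io itself measures real glyph widths. The same level could instead be served as JSON for a shields.io endpoint badge, whose schema uses `schemaVersion`, `label`, `message` and `color`; `shields_endpoint` shows that shape with shields' named colours.

```python
from typing import Dict, Union
from xml.sax.saxutils import escape

MAX_LEVEL = 6  # top rung of the proposed ladder (§4.1)


def level_color(level: int) -> str:
    if level <= 0:
        return "#e05d44"  # red: never became healthy
    if level >= MAX_LEVEL:
        return "#4c1"     # green: full climb
    return "#dfb317"      # amber: partial climb


def _xml(text: str) -> str:
    """Escape &, <, > and double quotes for SVG text and attribute values."""
    return escape(text, {'"': "&quot;"})


def badge_svg(label: str, message: str, color: str) -> str:
    """Flat two-part badge; widths use a rough 7 px per character estimate."""
    lw, mw = 10 + 7 * len(label), 10 + 7 * len(message)
    label_e, message_e = _xml(label), _xml(message)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{lw + mw}" height="20" '
        f'role="img" aria-label="{label_e}: {message_e}">'
        f'<rect width="{lw}" height="20" fill="#555"/>'
        f'<rect x="{lw}" width="{mw}" height="20" fill="{color}"/>'
        '<g fill="#fff" font-family="Verdana,sans-serif" font-size="11" text-anchor="middle">'
        f'<text x="{lw / 2}" y="14">{label_e}</text>'
        f'<text x="{lw + mw / 2}" y="14">{message_e}</text>'
        "</g></svg>"
    )


def level_badge(level: int) -> str:
    return badge_svg("cc-ci", f"level {level}", level_color(level))


def shields_endpoint(level: int) -> Dict[str, Union[int, str]]:
    """JSON body for a shields.io endpoint badge (the alternative to self-rendering)."""
    name = "red" if level <= 0 else "brightgreen" if level >= MAX_LEVEL else "yellow"
    return {"schemaVersion": 1, "label": "cc-ci", "message": f"level {level}", "color": name}
```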
### 4.3 PR comment (YunoHost-shaped)

On run start, the bridge posts a placeholder comment ("⏳ testing … level pending", with a link to the
live logs). On completion, it updates the same comment to 🌻 + level/status **badge** + **summary card
image**, linking to the run and the dashboard. There is one comment per PR, updated in place; a
re-`!testme` refreshes it.
|
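A sketch of the one-comment-per-PR rule against Gitea's REST API (listing comments with `GET /repos/{owner}/{repo}/issues/{index}/comments`, creating with `POST` to the same path, editing with `PATCH /repos/{owner}/{repo}/issues/comments/{id}`), using only the Python standard library. The hidden HTML-comment marker, the bot-login check and all function names are assumptions for illustration, and the sketch reads only the first page of comments, so a very long PR thread would need pagination.

```python
import json
import urllib.request
from typing import Any, List, Optional, Tuple

MARKER = "<!-- cc-ci:results -->"  # hypothetical hidden marker that identifies our comment


def with_marker(body: str) -> str:
    return body if body.startswith(MARKER) else f"{MARKER}\n{body}"


def placeholder_body(logs_url: str) -> str:
    return with_marker(f"⏳ testing … level pending ([live logs]({logs_url}))")


def plan_upsert(comments: List[dict], bot_login: str) -> Tuple[str, Optional[int]]:
    """PATCH the bot's existing marked comment if there is one, else POST a new one."""
    for c in comments:
        author = (c.get("user") or {}).get("login")
        if author == bot_login and MARKER in (c.get("body") or ""):
            return "PATCH", c["id"]
    return "POST", None


def upsert_comment(api: str, token: str, owner: str, repo: str, pr: int,
                   body: str, bot_login: str) -> None:
    """Keep exactly one cc-ci comment per PR (reads only the first page of comments)."""
    headers = {"Authorization": f"token {token}", "Content-Type": "application/json"}

    def call(method: str, url: str, payload: Optional[dict] = None) -> Any:
        data = json.dumps(payload).encode() if payload is not None else None
        req = urllib.request.Request(url, data=data, headers=headers, method=method)
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)

    issues = f"{api}/repos/{owner}/{repo}/issues"
    method, comment_id = plan_upsert(call("GET", f"{issues}/{pr}/comments"), bot_login)
    if method == "PATCH":
        call("PATCH", f"{issues}/comments/{comment_id}", {"body": with_marker(body)})
    else:
        call("POST", f"{issues}/{pr}/comments", {"body": with_marker(body)})
```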
---

## 5. Milestones (each ends with an Adversary gate)

- **U0 — Results schema + level.** `run_recipe_ci.py` emits `results.json` (per-stage/per-test) and
  computes the level (§4.1). *Accept:* the level is correct both for a recipe that passes through L4
  and for one that fails the L2 rung (capped at L1).
- **U1 — App screenshot.** Harness captures a real screenshot of the deployed app (post-login where
  needed), secret-safe. *Accept:* screenshot of a sample recipe shows the working UI, no secrets.
- **U2 — Summary card + badge.** Render the HTML card → PNG (level, ✔/✘ table, screenshot) + SVG
  badge, served at stable URLs. *Accept:* card + badge render correctly for pass and fail runs.
- **U3 — YunoHost-style PR comment.** Bridge posts/updates the image-forward comment (marker, badge
  and card, all linked). *Accept:* live on a scratch PR — comment shows badge + card + screenshot,
  updates on re-run, contains no secrets.
- **U4 — Dashboard polish.** Overview grid with per-recipe level badges, screenshots, status, version,
  history — comparable look/feel to `ci-apps.yunohost.org`. *Accept:* matches reality across several
  runs; Adversary confirms it mirrors the underlying results.
- **U5 — Badges + docs + hardening.** Embeddable per-recipe badges; docs for the ladder + embedding;
  fallback-to-text on render failure; secret-scan over images/screenshots/comments. *Accept:*
  Adversary's leak scan over published images/comments finds nothing; killing the renderer degrades
  gracefully to text without affecting the verdict; flip Phase-3 `STATUS.md` to `## DONE`.
|
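One way the leak scan could cover the *textual* artifacts (comment bodies, badge SVGs, the card's HTML source before rendering), assuming the run can hand its generated secret values to the scanner. It looks for each secret verbatim plus the URL-encoded and HTML-escaped forms it might take in a link or template. It does not read pixels: catching a password rendered inside a PNG screenshot would need OCR, or better, never navigating to such pages in the first place (R7). The `min_len` cut-off, which avoids false positives on very short values, is an assumption.

```python
import html
from typing import Dict, Iterable, List, Set
from urllib.parse import quote


def secret_forms(secret: str) -> Set[str]:
    """Verbatim, URL-encoded and HTML-escaped variants of one secret."""
    return {secret, quote(secret, safe=""), html.escape(secret, quote=True)}


def leak_scan(artifacts: Dict[str, str], secrets: Iterable[str],
              min_len: int = 8) -> List[str]:
    """Return the sorted names of artifacts that contain any secret form.

    Only artifact names are reported, so the scan output cannot itself leak a secret.
    Secrets shorter than min_len are skipped to limit false positives.
    """
    needles: Set[str] = set()
    for secret in secrets:
        if len(secret) >= min_len:
            needles |= secret_forms(secret)
    return sorted(name for name, text in artifacts.items()
                  if any(n in text for n in needles))
```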
---

## 6. Guardrails (inherit Phase 1 §9 + Phase 2 §7.1)

- **Presentation never changes the verdict.** The level and card *report* test outcomes; they must
  not let a run look greener than its tests. The Adversary checks the rendered level/card against the
  raw `results.json` and the actual test outcomes — a card that overstates the result is a FAIL.
- **No secrets in any artifact** (R7) — comments, badges, summary cards, app screenshots. The
  screenshot step must avoid pages that display generated credentials.
- **Never block the pipeline on cosmetics** — image/screenshot/badge generation failures degrade to a
  text comment and a recorded warning; they never fail or hang a test run (respect Phase-1 timeouts).
- **Don't weaken tests to raise a level** (carry-over of the cardinal rule) — the Adversary watches
  for tests softened or levels mis-mapped to inflate the displayed quality.
|
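A minimal Python sketch of the "never block on cosmetics" guardrail: each cosmetic step runs behind a wrapper that returns a fallback (for example the plain-text comment) on any exception or once a time budget runs out, and records a warning instead of raising. The names are hypothetical. One caveat is explicit in the code: a Python thread cannot be killed, so a hung step is abandoned rather than stopped, and the interpreter still waits for it at exit; for a renderer that might truly hang, running it in a subprocess with a timeout is the more robust variant. In either form the verdict should be decided before, and independently of, these calls.

```python
import concurrent.futures
import logging
from typing import Callable, List, TypeVar

T = TypeVar("T")
log = logging.getLogger("cc-ci.cosmetics")


def run_cosmetic(step: Callable[[], T], fallback: T, timeout_s: float,
                 warnings: List[str]) -> T:
    """Run a render/screenshot/badge step; return `fallback` instead of failing or hanging."""
    name = getattr(step, "__name__", "cosmetic step")
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(step).result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        warnings.append(f"{name}: timed out after {timeout_s}s")
    except Exception as exc:  # any error in a cosmetic step degrades, never propagates
        warnings.append(f"{name}: {exc!r}")
    finally:
        # Return without waiting for a hung worker: it is abandoned, not killed.
        pool.shutdown(wait=False)
    log.warning("degraded to fallback: %s", warnings[-1])
    return fallback
```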
---

## 7. Open decisions (log in `DECISIONS.md`)

- Exact level ladder + how recipes without an integration/SSO surface are scored fairly (cap vs N/A).
- Summary-card rendering: HTML→Playwright-PNG (default, reuses the harness) vs a dedicated image lib.
- Where app screenshots are hosted/retained and for how long (retention/cleanup, like run logs).
- Badge implementation: self-rendered SVG vs a shields.io endpoint pattern.
- Whether to also post a compact markdown fallback table beneath the image for accessibility.