chore(3): bootstrap Phase-3 loop state (STATUS/BACKLOG/JOURNAL-3); seed U0-U5 backlog

Phase 3 = beautiful YunoHost-style results UX (level ladder + image-forward PR comment + summary card w/ app screenshot + polished dashboard + badges). Operator kicked it off manually. Starting U0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 05:43:27 +00:00
parent 2022c3a2bb
commit 805fbba2ad
3 changed files with 122 additions and 0 deletions
--- a/machine-docs/BACKLOG-3.md
+++ b/machine-docs/BACKLOG-3.md
@@ -0,0 +1,53 @@
+# Phase 3 — Beautiful YunoHost-style results — BACKLOG
+
+Single source of truth: `/srv/cc-ci/cc-ci-plan/plan-phase3-results-ux.md`.
+Milestones U0–U5 (plan §5); each ends with an Adversary gate. DoD items R1–R8 (plan §2).
+
+## Build backlog
+
+### U0 — Results schema + level  (R1)
+- [ ] U0.1 — Pure `level()` function: map per-tier results (+ deps/SSO/recipe-local signal) → integer
+      level L0–L6 with gap-caps-level semantics (§4.1). Unit-tested on a pass-through-L4 run and a failing run capped at L2.
+- [ ] U0.2 — Per-tier pytest emits structured per-test results (JUnit XML per tier → parsed) so
+      `results.json` carries a per-stage AND per-test ✔/✘ breakdown.
+- [ ] U0.3 — `run_recipe_ci.py` writes `results.json` per run (recipe, version, pr, ref, stages[],
+      per-test rows, level, level_cap_reason, invariant flags: clean-teardown, no-secret-leak) to a
+      run-scoped artifact dir. Never blocks/fails the test verdict (R7).
+- [ ] U0.4 — Decide & wire the artifact hosting path (run-scoped dir on host + dashboard serves
+      `/runs/<id>/...`). Record in DECISIONS.
+- GATE U0: level correct for a recipe through L4 and one capped at L2.
+
+### U1 — App screenshot  (R4)
+- [ ] U1.1 — Harness captures a real Playwright screenshot of the deployed app while it is up
+      (after logging in when the landing page requires it), secret-safe (never shoot a credentials page).
+- [ ] U1.2 — Screenshot saved to the run artifact dir; degrades gracefully (no screenshot ≠ run fail).
+- GATE U1: screenshot of a sample recipe shows the working UI, no secrets.
+
+### U2 — Summary card + badge  (R3, R6)
+- [ ] U2.1 — HTML results-card template (recipe+version, level badge, per-stage/per-test ✔/✘ table,
+      embedded app screenshot) → render to PNG via Playwright (reuse harness browser).
+- [ ] U2.2 — Per-run + per-recipe SVG level/status badge endpoint.
+- [ ] U2.3 — Card + badge served at stable URLs (`/runs/<id>/summary.png`, `/badge/<recipe>.svg`).
+- GATE U2: card + badge render correctly for a pass run and a fail run.
+
+### U3 — YunoHost-style PR comment  (R2)
+- [ ] U3.1 — Bridge posts a placeholder comment on run start (⏳ + live-logs link).
+- [ ] U3.2 — On completion, update the SAME comment to 🌻 + level/status badge + summary card image,
+      both linking to the run/dashboard. Re-running `!testme` refreshes it. Fall back to text on render failure.
+- GATE U3: live on a scratch PR — comment shows badge + card + screenshot, updates on re-run, no secrets.
+
+### U4 — Dashboard polish  (R5)
+- [ ] U4.1 — Overview grid like `ci-apps.yunohost.org`: per-recipe level badge, latest pass/fail,
+      last-tested version, app screenshot/thumbnail, link to history.
+- [ ] U4.2 — Regenerated on build completion; reads results.json artifacts.
+- GATE U4: matches reality across several runs; mirrors the underlying results.json.
+
+### U5 — Badges + docs + hardening  (R6, R7, R8)
+- [ ] U5.1 — Per-recipe latest-level badge, embeddable and documented for use in a README.
+- [ ] U5.2 — `docs/` explains the level ladder, card/screenshot/badge generation, and how to embed a badge.
+- [ ] U5.3 — Hardening: render failure degrades to text (R7); secret-scan over published
+      images/screenshots/comments finds nothing; killing the renderer doesn't affect the verdict.
+- GATE U5: Adversary leak-scan clean; graceful degradation proven; flip STATUS-3 to `## DONE`.
+
+## Adversary findings
+(Adversary owns this section — Builder does not edit.)
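A minimal sketch of the U0.3 write path follows. It assumes Python 3.9+ and a runner that computes its verdict before it touches artifacts. `write_results` and `finish_run` are hypothetical names, not existing functions in `run_recipe_ci.py`. The sketch writes `results.json` atomically (a temp file plus `os.replace`) and swallows every error, so a broken artifact write cannot change the exit code (R7).

```python
import contextlib
import json
import logging
import os
import tempfile
from pathlib import Path
from typing import Any, Optional

log = logging.getLogger("results")  # hypothetical logger name


def write_results(artifact_dir: Path, results: dict[str, Any]) -> bool:
    """Write results.json atomically; return False on any error instead of raising."""
    tmp: Optional[str] = None
    try:
        artifact_dir.mkdir(parents=True, exist_ok=True)
        # Temp file in the same dir so os.replace is an atomic rename on one filesystem.
        fd, tmp = tempfile.mkstemp(dir=artifact_dir, prefix=".results-", suffix=".json")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(results, fh, indent=2, sort_keys=True)
        os.replace(tmp, artifact_dir / "results.json")
        return True
    except Exception:  # R7: cosmetics never block; log and move on
        log.exception("results.json write failed; verdict unaffected")
        if tmp is not None:
            with contextlib.suppress(OSError):
                os.unlink(tmp)  # drop the partial temp file if the rename never happened
        return False


def finish_run(verdict_ok: bool, artifact_dir: Path, results: dict[str, Any]) -> int:
    """Return the process exit code; it depends only on the verdict, never on the write."""
    write_results(artifact_dir, results)  # result deliberately ignored
    return 0 if verdict_ok else 1
```

The atomic rename is what makes the write safe to serve. A dashboard reading `/runs/<id>/results.json` sees either no file or a complete one, never a half-written file.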
--- a/machine-docs/JOURNAL-3.md
+++ b/machine-docs/JOURNAL-3.md
@@ -0,0 +1,42 @@
+# Phase 3 — Beautiful YunoHost-style results — JOURNAL (Builder-private reasoning)
+
+SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase3-results-ux.md`. WHY lives here; WHAT/HOW/EXPECTED/WHERE → STATUS-3.
+
+## 2026-05-31T05:41Z — Phase-3 bootstrap + orientation
+
+Read plan-phase3-results-ux.md in full (SSOT) + plan.md §6.1/§7/§9. Oriented on the existing
+Phase-1/2 artifacts I'll extend:
+- `runner/run_recipe_ci.py`: orchestrates deploy-once, then per-tier runs (install/upgrade/backup/restore/custom),
+  and produces an in-memory `results` dict `{tier: 'pass'|'fail'|'skip'}` printed to Drone logs. **No
+  results.json, no level, no screenshot today.** It also tracks deploy-count (DG4.1), deps/SSO readiness
+  (`sso_dep_unverified` → F2-11), and teardown errors.
+- `bridge/bridge.py`: posts a text PR comment with the Drone run URL; `watch_and_reflect` edits it to
+  ✅/❌ on completion. No image/badge/level.
+- `dashboard/dashboard.py`: stdlib HTTP service (swarm OCI image, Nix-built) that polls the **Drone API
+  only** and renders a latest-per-recipe table + a basic per-recipe SVG badge (Drone status, not level).
+  Runs as a container with **no host volume mounts** — relevant for artifact hosting (U0.4).
+
+Key Phase-3 mapping insight: the level ladder (§4.1) maps cleanly onto the existing per-tier results:
+- L1 install-tier pass; L2 upgrade pass; L3 backup AND restore pass; L4 custom (functional) pass;
+  L5 SSO/integration (requires_deps tests actually ran + passed — `deps_ready` and not
+  `sso_dep_unverified`); L6 recipe-local tests pass (D4 — discovered repo-local overlay/custom).
+- Gap-caps-level (YunoHost): level = highest rung L such that every rung ≤ L passed. A rung that is
+  genuinely N/A (e.g. backup not BACKUP_CAPABLE, or no SSO/integration surface) must NOT count as a
+  failure; it caps the level with a recorded reason ("L4 — no integration surface" etc.) for fairness (§4.1 L5).
+- Invariants surfaced as flags, not levels: clean-teardown ✔ (no dep_teardown_error / DG4.1 ok),
+  no-secret-leak ✔.
+
+Adversary is live (REVIEW-3 @05:42Z); it flagged the Phase-2-DONE prerequisite but is not treating it as
+a P3 blocker. The operator kicked Phase 3 off manually. Proceeding.
+
+### Plan for U0 (foundation)
+1. Pure `level()` function in a new `runner/harness/level.py` — unit-testable (no I/O), so I can prove
+   "L4-pass" and "L2-cap" semantics cheaply and the Adversary can re-run the unit test cold. This is
+   the load-bearing logic; everything else (card, badge, dashboard) just *renders* what it returns.
+2. Capture per-test detail: run each tier's pytest with `--junitxml` to a run-scoped dir, parse the
+   XML (stdlib `xml.etree`) into per-test rows {name, status, ms}. Aggregate per stage.
+3. `run_recipe_ci.py` assembles `results.json` {recipe, version, pr, ref, run_id, stages[], level,
+   level_cap_reason, flags} and writes it to the artifact dir — wrapped so a failure here NEVER changes
+   the run's exit code (R7: cosmetics never block).
+4. Artifact hosting (U0.4): runner writes to a host dir; dashboard bind-mounts it read-only to serve
+   `/runs/<id>/...`. Decide details + record in DECISIONS.
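The ladder mapping and gap-caps rule in the JOURNAL notes above can be sketched as a pure, I/O-free function, in the spirit of the planned `runner/harness/level.py`. This is a minimal sketch, not the implementation. The tier keys (`integration`, `recipe_local`) and the names `RUNGS`, `Level` and `integration_outcome` are all hypothetical. The sketch also assumes one reading of the N/A rule, which plan §4.1 defines: an N/A rung stops the climb the way a failure does, but it records why instead of blaming the recipe.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical rung table: (level, label, tier keys that must all pass to reach it).
RUNGS: tuple[tuple[int, str, tuple[str, ...]], ...] = (
    (1, "install", ("install",)),
    (2, "upgrade", ("upgrade",)),
    (3, "backup/restore", ("backup", "restore")),
    (4, "custom", ("custom",)),
    (5, "SSO/integration", ("integration",)),
    (6, "recipe-local", ("recipe_local",)),
)


@dataclass(frozen=True)
class Level:
    level: int                 # highest rung reached, 0..6
    cap_reason: Optional[str]  # None only when every rung passed


def integration_outcome(*, has_surface: bool, tests_passed: bool,
                        deps_ready: bool, sso_dep_unverified: bool) -> str:
    """Fold the deps/SSO signals into one L5 outcome: 'na', 'pass' or 'fail'."""
    if not has_surface:
        return "na"
    ok = tests_passed and deps_ready and not sso_dep_unverified
    return "pass" if ok else "fail"


def level(tiers: Mapping[str, str],
          na_reasons: Optional[Mapping[str, str]] = None) -> Level:
    """Gap-caps-level: climb rungs in order, stop at the first one that did not fully pass.

    tiers maps a tier key to 'pass' | 'fail' | 'skip' | 'na'; a missing key counts as 'skip'.
    """
    na_reasons = na_reasons or {}
    reached = 0
    for rung, label, keys in RUNGS:
        outcomes = [tiers.get(k, "skip") for k in keys]
        if all(o == "pass" for o in outcomes):
            reached = rung
            continue
        na_keys = [k for k, o in zip(keys, outcomes) if o == "na"]
        if na_keys and all(o in ("pass", "na") for o in outcomes):
            # Assumed N/A semantics: caps like a failure, but records the reason.
            why = na_reasons.get(na_keys[0], "not applicable")
            return Level(reached, f"L{reached}: {label} N/A ({why})")
        return Level(reached, f"L{reached}: L{rung} {label} did not pass")
    return Level(reached, None)
```

Consider a recipe that passes every tier through `custom` but has no integration surface. Called with the reason `"no integration surface"`, the sketch returns `Level(4, "L4: SSO/integration N/A (no integration surface)")`. A recipe whose backup fails after a passing upgrade lands on level 2. Because the function does no I/O, both gate cases reduce to one-line unit tests.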
--- a/machine-docs/STATUS-3.md
+++ b/machine-docs/STATUS-3.md
@@ -0,0 +1,27 @@
+# Phase 3 — Beautiful YunoHost-style results — STATUS
+
+SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase3-results-ux.md`. DoD = R1–R8. Milestones U0–U5.
+State files (this phase): `machine-docs/{STATUS,BACKLOG,REVIEW,JOURNAL}-3.md`. DECISIONS.md shared.
+
+**WHAT + HOW + EXPECTED + WHERE live here; WHY → JOURNAL-3.md.**
+
+## Phase context
+- Phase 2b is `## DONE` (Adversary-verified, no VETO). Phase 3 was kicked off **manually by the operator**
+  (plan-phase3 transition = manual). Note for honesty: Phase-2 (recipe-tests) `## DONE` is not yet
+  flipped, and REVIEW-2 carries a standing VETO on full Phase-2 DONE authorization. Cross-phase
+  sequencing is an operator call, so Phase 3 proceeds per the operator kickoff. Adversary concurs this
+  is not a Phase-3 blocker (REVIEW-3 @05:42Z).
+
+## Current state
+- Phase-3 loop live. Bootstrapping state files + settling open decisions, then executing **U0**.
+- No gate claimed yet.
+
+## In flight
+- **U0 — Results schema + level (R1).** Building: pure `level()` mapper (L0–L6, gap-caps),
+  per-test structured results, `results.json` per run, artifact hosting path.
+
+## Gate
+(none claimed)
+
+## Blocked
+(none)
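The per-test structured results in flight for U0 (JOURNAL plan step 2) can be sketched with stdlib `xml.etree.ElementTree`. The sketch assumes each tier runs `pytest --junitxml=<run-dir>/<tier>.xml`. That file layout, the `::` name joining and every function name below are hypothetical. The parser reads the `<failure>`, `<error>` and `<skipped>` children that pytest writes under each `<testcase>`, builds `{name, status, ms}` rows, and aggregates them into one stage entry for `results.json`.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Any, TypedDict


class CaseRow(TypedDict):
    name: str
    status: str  # 'pass' | 'fail' | 'skip'
    ms: int


def rows_from_junit(root: ET.Element) -> list[CaseRow]:
    """Turn a pytest --junitxml tree into per-test rows {name, status, ms}."""
    rows: list[CaseRow] = []
    # iter() walks all descendants, so this works whether the root is <testsuites> or <testsuite>.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            status = "fail"
        elif case.find("skipped") is not None:
            status = "skip"
        else:
            status = "pass"
        cls, name = case.get("classname", ""), case.get("name", "")
        seconds = float(case.get("time") or 0)  # missing or empty time counts as 0
        rows.append(CaseRow(name=f"{cls}::{name}" if cls else name,
                            status=status, ms=round(seconds * 1000)))
    return rows


def load_junit(path: Path) -> list[CaseRow]:
    """Parse one tier's JUnit XML file from the run-scoped dir."""
    return rows_from_junit(ET.parse(path).getroot())


def stage_entry(tier: str, rows: list[CaseRow]) -> dict[str, Any]:
    """Aggregate one tier: any failure fails the stage; no passing test at all means skip."""
    if any(r["status"] == "fail" for r in rows):
        status = "fail"
    elif any(r["status"] == "pass" for r in rows):
        status = "pass"
    else:
        status = "skip"
    return {"stage": tier, "status": status, "tests": rows}
```

Treating `<error>` as a failure means that a broken fixture or teardown shows up as ✘ in the per-test table instead of vanishing from it.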