Operator 2026-06-17. /recipe/<name> shows only the latest run for most recipes because history is built from a single page of the latest 100 Drone builds, while 362 runs exist on the host. Source per-recipe history from the local /var/lib/cc-ci-runs artifacts (already bind-mounted read-only) instead — full, durable history. Deploy + verify live on bluesky-pds.
# Phase `dash`: fix incomplete per-recipe run history on the CI dashboard

**Mission (operator-specified 2026-06-17):** the dashboard's per-recipe history page
(`https://ci.commoninternet.net/recipe/<recipe>`, e.g. `/recipe/bluesky-pds`) shows only the latest run
for most recipes. Make it show the **full run history** per recipe.

State files: `STATUS-dash.md`, `BACKLOG-dash.md`, `REVIEW-dash.md`, `JOURNAL-dash.md`. `DECISIONS.md` is shared.

## 1. Root cause (verified 2026-06-17)

`dashboard/dashboard.py` builds the per-recipe history **solely from the Drone API, capped at a single
page of the latest 100 builds**:
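```python
# Excerpt. The only Drone fetch (presumably inside `_custom_recipe_builds`, which is called below):
builds = _drone(f"/api/repos/{CI_REPO}/builds?per_page=100")  # single page, no pagination

def history_for(recipe):
    # Filters the single fetched page, so runs older than the latest 100 builds never appear.
    builds = _custom_recipe_builds()
    return [_build_row(b) for b in builds if (b.get("params") or {}).get("RECIPE") == recipe]
```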
But there are **362 actual runs** on the host (`/var/lib/cc-ci-runs` has 362 run dirs). So ~262 runs are
older than the 100-build window and never fetched. The recent `regall` sweep `!testme`'d each of the 21
recipes once, filling the latest-100 window and pushing each recipe's older runs out of view → most
recipes show exactly one run. (The overview/latest-per-recipe page is unaffected: it only needs the
recent window.)

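
For contrast, here is a minimal sketch of what walking every Drone page would look like, instead of the single `per_page=100` request above. It is not the approach this phase prefers (section 2 sources history from local artifacts); it only illustrates the "no pagination" gap and the shape a fallback could take. `fetch_all_builds` and `fetch_page` are hypothetical names, and the `page` query parameter in the usage comment is an assumption about the Drone API that should be checked against the deployed Drone version.

```python
from typing import Callable, Dict, List


def fetch_all_builds(fetch_page: Callable[[int], List[Dict]], max_pages: int = 50) -> List[Dict]:
    """Collect builds page by page until an empty page comes back.

    `fetch_page(n)` is a hypothetical callable returning the list of builds on
    page n (1-based). `max_pages` bounds the loop so a misbehaving API cannot
    make the dashboard spin forever.
    """
    builds: List[Dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break
        builds.extend(batch)
    return builds


# Hypothetical wiring inside dashboard.py (assumes `_drone` and `CI_REPO` as in the excerpt above,
# and that Drone accepts a `page` parameter alongside `per_page`):
#   fetch_all_builds(lambda p: _drone(f"/api/repos/{CI_REPO}/builds?page={p}&per_page=100"))
```

Even with pagination, this history would still depend on Drone's own retention, which is why the design below reads the run directories instead.
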
## 2. Design: source history from the local run artifacts

The dashboard already **bind-mounts `/var/lib/cc-ci-runs` read-only** (see `nix/modules/dashboard.nix`)
and reads each run's `results.json` (`_results_for`). Build the per-recipe history from THAT, not from
the 100-build Drone slice: it's complete (362 runs), durable (independent of Drone pagination/retention),
and already available.

- **`history_for(recipe)` → enumerate `/var/lib/cc-ci-runs/*/results.json`**, keep those whose recipe
  matches, sort newest-first (by run id / timestamp), and render the existing history table (status,
  level, version, ref, when, link to `/runs/<id>/…`). Apply a sane **display cap** (e.g. the last ~30 per
  recipe) so a long-lived recipe's page stays bounded.
- First **confirm the `results.json` schema** carries what's needed (recipe, version, level/status, ref,
  timestamp); adapt if a field is named differently, or read the run id from the dir name. Skip a run dir
  with no/malformed `results.json` gracefully (don't 500).
- **Keep Drone only where it adds value**, e.g. the live "currently running" status for the most recent
  run (a run mid-flight has no final `results.json` yet). The *historical* list comes from local
  artifacts. Keep the overview + `/badge/<recipe>.svg` working exactly as today.
- **Retention check:** 362 runs implies adequate retention, but confirm nothing (cleanlogs / docker-prune)
  trims `/var/lib/cc-ci-runs` so aggressively that history vanishes; if it does, note it in DECISIONS and
  keep a Drone-pagination fallback. Do not add unbounded growth; if retention needs a cap, record it.

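
The first two bullets can be sketched roughly as follows, using only the standard library. This is an illustration under stated assumptions, not the dashboard's actual code: the `recipe` key in `results.json`, the numeric run-id ordering, and the helper names `_load_results` and `_run_sort_key` are all hypothetical until the schema has been confirmed. The default cap of 30 mirrors the "~30 per recipe" suggestion above.

```python
import json
import os

RUNS_DIR = "/var/lib/cc-ci-runs"   # read-only bind mount named in this document
HISTORY_CAP = 30                   # display cap; the exact value is a judgment call


def _load_results(run_dir):
    """Return the parsed results.json as a dict, or None if missing or malformed."""
    path = os.path.join(run_dir, "results.json")
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError):  # missing file, not a directory, bad JSON, bad encoding
        return None
    return data if isinstance(data, dict) else None


def _run_sort_key(run_id):
    # Assumption: run ids are mostly numeric. Numeric ids sort numerically and
    # rank above non-numeric ones, which fall back to plain string ordering.
    return (1, int(run_id), "") if run_id.isdigit() else (0, 0, run_id)


def history_for(recipe, runs_dir=RUNS_DIR, cap=HISTORY_CAP):
    """List up to `cap` runs for `recipe`, newest first, from local artifacts."""
    try:
        run_ids = os.listdir(runs_dir)
    except OSError:
        return []  # mount missing: render an empty history rather than a 500
    rows = []
    for run_id in run_ids:
        results = _load_results(os.path.join(runs_dir, run_id))
        # "recipe" is an assumed field name; adapt once the schema is confirmed.
        if results is None or results.get("recipe") != recipe:
            continue
        rows.append({**results, "id": run_id})  # the dir name is the authoritative id
    rows.sort(key=lambda r: _run_sort_key(r["id"]), reverse=True)
    return rows[:cap]
```

Taking `runs_dir` and `cap` as parameters keeps the function testable against a temporary directory, which is what the M1 local render test needs. If the confirmed schema carries a timestamp, sorting on that instead of the directory name may be more robust.
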
## 3. Gates

**M1: fix implemented + locally verified.** `history_for` (and any helper) sources per-recipe history
from `/var/lib/cc-ci-runs`, newest-first, display-capped; `results.json` schema confirmed; malformed/empty
run dirs handled without erroring. **Python stdlib only** (the dashboard's standing constraint); the
existing path-traversal guard + the `/recipe/` name validation preserved. Unit/local render test shows a
recipe with many runs now lists them all (up to the cap). Adversary cold-verifies: the rendered history
matches the actual run dirs for that recipe (count + order), no security regression (path traversal, arg
injection), overview + badge routes unchanged.

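
To make the "no security regression" check concrete, the sketch below shows the kind of guards M1 expects to stay in place once a recipe name from the URL is used to match run directories. It is not the dashboard's existing guard: `valid_recipe_name`, `safe_join`, and the name pattern are hypothetical stand-ins, and the real validation rules should be taken from `dashboard.py`.

```python
import os
import re

# Hypothetical pattern: lowercase letters, digits and hyphens, 1 to 64 characters.
_RECIPE_RE = re.compile(r"[a-z0-9][a-z0-9-]{0,63}")


def valid_recipe_name(name):
    """Accept only names that cannot carry path separators, dots or shell metacharacters."""
    return isinstance(name, str) and _RECIPE_RE.fullmatch(name) is not None


def safe_join(base, *parts):
    """Join `parts` under `base`; return None if the resolved path escapes `base`."""
    base_real = os.path.realpath(base)
    target = os.path.realpath(os.path.join(base_real, *parts))
    if os.path.commonpath([base_real, target]) != base_real:
        return None
    return target
```

An adversarial test can then assert that inputs such as `../etc`, an absolute path, or an empty name are rejected before any file under `/var/lib/cc-ci-runs` is opened.
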
**M2: deployed + verified live.** Rebuild/redeploy the dashboard service (the `deploy-dashboard`
reconcile; the content-hash image tag rolls on `dashboard.py` change). Then confirm on the live site:
`/recipe/bluesky-pds` and ≥2 other recipes show their **full** run history (multiple runs, matching the
local run count), with correct status/level/links; the overview and badges still render. Fresh Adversary
PASS on both milestones → `## DONE`.

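
One way to check "count + order" against the live page is to extract the run ids from the rendered links and compare them with the ids found on the host. The sketch below assumes the history table renders links as `href="/runs/<id>/..."` with double-quoted attributes; `run_ids_in_html`, `history_matches`, and `fetch` are hypothetical helper names, and the parsing should be adjusted to the markup the dashboard actually emits.

```python
import re
import urllib.request


def run_ids_in_html(html):
    """Return run ids linked as /runs/<id>/..., in order of first appearance."""
    seen = []
    for run_id in re.findall(r'href="/runs/([^/"]+)/', html):
        if run_id not in seen:
            seen.append(run_id)
    return seen


def history_matches(html, local_ids_newest_first, cap):
    """True if the page lists exactly the newest `cap` local runs, in the same order."""
    return run_ids_in_html(html) == list(local_ids_newest_first)[:cap]


def fetch(url, timeout=10):
    """Fetch a page body as text (stdlib only); raises on HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Example usage against the live site (not executed here):
#   html = fetch("https://ci.commoninternet.net/recipe/bluesky-pds")
#   ok = history_matches(html, local_ids, cap=30)
```

Here `local_ids` would be the run ids for that recipe gathered on the host, newest first, and `cap` the display cap chosen in M1.
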
## 4. Guardrails

- **Read-only dashboard:** it never writes run artifacts; the mount stays read-only.
- **Python stdlib only** (no new deps; it's a stdlib HTTP server packaged into an OCI image).
- **Preserve security + the other routes:** keep the path-traversal guard and the `/recipe/` validation;
  do not regress the overview, `/badge/<recipe>.svg`, `/runs/<id>/<file>`, or the bridge's `/hook` routing
  (traefik priority).
- **Host/deploy change** (redeploying the dashboard service via its reconcile / a nixos-rebuild): loops may
  deploy if clean and **verify host health after** (dashboard service N/N, `ci.commoninternet.net` 200);
  else file for the orchestrator. Commit author `autonomic-bot <autonomic-bot@noreply.git.autonomic.zone>`;
  push every commit.
- Bounded scope: this is a history-page fix, not a dashboard redesign.

## 5. Definition of Done

The per-recipe history page sources its run list from the local `/var/lib/cc-ci-runs` artifacts and shows
the **full** (display-capped) history per recipe, deployed and verified live on `bluesky-pds` + ≥2 other
recipes (multiple runs each, matching the host's run count), with the overview/badges/other routes
unaffected and the dashboard still stdlib-only + read-only. Retention confirmed adequate (or recorded).
M1 + M2 fresh Adversary PASSes in REVIEW-dash.md.