Operator 2026-06-17. /recipe/<name> shows only the latest run for most recipes because history is built from a single page of the latest 100 Drone builds, while 362 runs exist on the host. Source per-recipe history from the local /var/lib/cc-ci-runs artifacts (already bind-mounted read-only) instead — full, durable history. Deploy + verify live on bluesky-pds.
Phase dash — fix incomplete per-recipe run history on the CI dashboard
Mission (operator-specified 2026-06-17): the dashboard's per-recipe history page
(https://ci.commoninternet.net/recipe/<recipe>, e.g. /recipe/bluesky-pds) shows only the latest run
for most recipes. Make it show the full run history per recipe.
State files: STATUS-dash.md, BACKLOG-dash.md, REVIEW-dash.md, JOURNAL-dash.md. DECISIONS.md shared.
1. Root cause (verified 2026-06-17)
dashboard/dashboard.py builds the per-recipe history solely from the Drone API, capped at a single
page of the latest 100 builds:
    builds = _drone(f"/api/repos/{CI_REPO}/builds?per_page=100")  # single page, no pagination

    def history_for(recipe):
        builds = _custom_recipe_builds()
        return [_build_row(b) for b in builds if (b.get("params") or {}).get("RECIPE") == recipe]
But there are 362 actual runs on the host (/var/lib/cc-ci-runs has 362 run dirs). So ~262 runs are
older than the 100-build window and never fetched. The recent regall sweep !testme'd each of the 21
recipes once, filling the latest-100 window and pushing each recipe's older runs out of view → most
recipes show exactly one run. (The overview/latest-per-recipe page is unaffected — it only needs the
recent window.)
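The windowing effect described above can be illustrated with a toy model. This is only a sketch: visible_history and the sample data are hypothetical and are not taken from dashboard.py or from the real run list.

```python
def visible_history(builds, recipe, window=100):
    """Return the run ids for `recipe` that fall inside the newest `window` builds.

    `builds` is a newest-first list of (run_id, recipe) pairs, standing in for
    the single page the Drone API returns.
    """
    return [run_id for run_id, name in builds[:window] if name == recipe]


# Toy data, newest first: recipe "a" has three runs, but two are old.
toy_builds = [(6, "a"), (5, "b"), (4, "c"), (3, "a"), (2, "a"), (1, "b")]

# With a window of 3, only the newest run of "a" is still visible.
windowed = visible_history(toy_builds, "a", window=3)

# With a window covering everything, the full history comes back.
complete = visible_history(toy_builds, "a", window=len(toy_builds))
```

The same thing happens at real scale: a sweep that runs every recipe once pushes each recipe's earlier runs past the fixed window, however many of them exist on disk.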
2. Design — source history from the local run artifacts
The dashboard already bind-mounts /var/lib/cc-ci-runs read-only (see nix/modules/dashboard.nix)
and reads each run's results.json (_results_for). Build the per-recipe history from THAT, not from
the 100-build Drone slice — it's complete (362 runs), durable (independent of Drone pagination/retention),
and already available.
- history_for(recipe) → enumerate /var/lib/cc-ci-runs/*/results.json, keep those whose recipe matches, sort newest-first (by run id / timestamp), and render the existing history table (status, level, version, ref, when, link to /runs/<id>/…). Apply a sane display cap (e.g. the last ~30 per recipe) so a long-lived recipe's page stays bounded.
- First confirm the results.json schema carries what's needed (recipe, version, level/status, ref, timestamp); adapt if a field is named differently, or read the run id from the dir name. Skip a run dir with no/malformed results.json gracefully (don't 500).
- Keep Drone only where it adds value, e.g. the live "currently running" status for the most recent run (a run mid-flight has no final results.json yet). The historical list comes from local artifacts. Keep the overview + /badge/<recipe>.svg working exactly as today.
- Retention check: 362 runs implies adequate retention, but confirm nothing (cleanlogs / docker-prune) trims /var/lib/cc-ci-runs so aggressively that history vanishes; if it does, note it in DECISIONS and keep a Drone-pagination fallback. Do not add unbounded growth: if retention needs a cap, record it.
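A minimal, stdlib-only sketch of the design above, under stated assumptions. The function name local_history, the "recipe" key in results.json, numeric run ids taken from the directory name, and the cap of 30 are all assumptions to be checked against the real schema and code; this is not the dashboard's actual implementation.

```python
import json
from pathlib import Path

RUNS_DIR = Path("/var/lib/cc-ci-runs")  # read-only bind mount named in the plan
HISTORY_CAP = 30  # assumed display cap (the plan suggests roughly 30)


def _run_sort_key(run_id):
    # Assumption: run ids are numeric directory names. Anything else sorts
    # as older than every numeric id, ordered by its text.
    if run_id.isascii() and run_id.isdigit():
        return (1, int(run_id), "")
    return (0, 0, run_id)


def local_history(recipe, runs_dir=RUNS_DIR, cap=HISTORY_CAP):
    """Return up to `cap` result dicts for `recipe`, newest first."""
    try:
        run_dirs = list(runs_dir.iterdir())
    except OSError:
        return []  # mount missing or unreadable: show an empty history, not a 500

    rows = []
    for run_dir in run_dirs:
        try:
            data = json.loads((run_dir / "results.json").read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # no results.json, unreadable file, or malformed JSON
        # "recipe" is an assumed field name; confirm it against the real schema.
        if not isinstance(data, dict) or data.get("recipe") != recipe:
            continue
        rows.append({**data, "id": run_dir.name})  # run id read from the dir name

    rows.sort(key=lambda row: _run_sort_key(row["id"]), reverse=True)
    return rows[:cap]
```

Sorting by the directory name avoids depending on a timestamp field whose name is not yet confirmed; if the schema does carry a timestamp, it could replace the sort key. The function only reads, so the read-only mount is sufficient.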
3. Gates
M1 — fix implemented + locally verified. history_for (and any helper) sources per-recipe history
from /var/lib/cc-ci-runs, newest-first, display-capped; results.json schema confirmed; malformed/empty
run dirs handled without erroring. Python stdlib only (the dashboard's standing constraint); the
existing path-traversal guard + the /recipe/ name validation preserved. Unit/local render test shows a
recipe with many runs now lists them all (up to the cap). Adversary cold-verifies: the rendered history
matches the actual run dirs for that recipe (count + order), no security regression (path traversal, arg
injection), overview + badge routes unchanged.
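The count-and-order part of the cold verification could be expressed as a small check like the one below. It is a sketch only: history_problems is a hypothetical helper, and it assumes run ids are numeric strings and that the display cap is 30.

```python
def history_problems(rendered_ids, disk_ids, cap=30):
    """Compare the rendered run ids (top to bottom) with the run ids on disk.

    Returns a list of human-readable problems; an empty list means the page
    matches the newest-first, display-capped view of the run dirs.
    """
    # Assumption: ids are numeric strings, so newest-first means descending int.
    expected = sorted(disk_ids, key=int, reverse=True)[:cap]
    rendered = list(rendered_ids)

    problems = []
    if len(rendered) != len(expected):
        problems.append(f"count: rendered {len(rendered)}, expected {len(expected)}")
    if rendered != expected:
        problems.append("order or membership differs from the newest-first run dirs")
    return problems
```

Here disk_ids would be the names of the run dirs whose results.json belongs to the recipe under test, gathered independently of the dashboard code so the check stays a cold one.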
M2 — deployed + verified live. Rebuild/redeploy the dashboard service (the deploy-dashboard
reconcile; the content-hash image tag rolls on dashboard.py change). Then confirm on the live site:
/recipe/bluesky-pds and ≥2 other recipes show their full run history (multiple runs, matching the
local run count), with correct status/level/links; the overview and badges still render. Fresh Adversary
PASS on both milestones → ## DONE.
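For the live check, one way to count the runs a history page shows is to pull the run ids out of its links. The sketch below assumes the history table links to site-relative paths of the form /runs/<id>/…, as the design describes; RunLinkParser and run_ids_in_page are hypothetical names, and the page HTML is passed in as text so no particular fetch method is implied.

```python
from html.parser import HTMLParser


class RunLinkParser(HTMLParser):
    """Collect run ids from <a href="/runs/<id>/..."> links, in page order."""

    def __init__(self):
        super().__init__()
        self.run_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parts = href.split("/")  # "/runs/362/x" -> ["", "runs", "362", "x"]
        if len(parts) >= 3 and parts[0] == "" and parts[1] == "runs" and parts[2]:
            if parts[2] not in self.run_ids:  # one entry per run, even with several links
                self.run_ids.append(parts[2])


def run_ids_in_page(html_text):
    parser = RunLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.run_ids
```

The resulting list can then be compared with the run dirs on the host for the same recipe, up to the display cap. Absolute URLs would not match this pattern and would need the host prefix stripped first.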
4. Guardrails
- Read-only dashboard — it never writes run artifacts; the mount stays read-only.
- Python stdlib only (no new deps — it's a stdlib HTTP server packaged into an OCI image).
- Preserve security + the other routes: keep the path-traversal guard and the /recipe/ validation; do not regress the overview, /badge/<recipe>.svg, /runs/<id>/<file>, or the bridge's /hook routing (traefik priority).
- Host/deploy change (redeploying the dashboard service via its reconcile / a nixos-rebuild): loops may deploy if clean and verify host health after (dashboard service N/N, ci.commoninternet.net 200); else file for the orchestrator. Commit author autonomic-bot <autonomic-bot@noreply.git.autonomic.zone>; push every commit.
- Bounded scope: this is a history-page fix, not a dashboard redesign.
5. Definition of Done
The per-recipe history page sources its run list from the local /var/lib/cc-ci-runs artifacts and shows
the full (display-capped) history per recipe — deployed and verified live on bluesky-pds + ≥2 other
recipes (multiple runs each, matching the host's run count), with the overview/badges/other routes
unaffected and the dashboard still stdlib-only + read-only. Retention confirmed adequate (or recorded).
M1 + M2 fresh Adversary PASSes in REVIEW-dash.md.