cc-ci-orchestrator/cc-ci-plan/plan-phase-dash-recipe-history.md
autonomic-bot 9e7d76ca1f plan: queue dash — fix incomplete per-recipe run history on the CI dashboard (opus, after canon)
Operator 2026-06-17. /recipe/<name> shows only the latest run for most recipes
because history is built from a single page of the latest 100 Drone builds,
while 362 runs exist on the host. Source per-recipe history from the local
/var/lib/cc-ci-runs artifacts (already bind-mounted read-only) instead — full,
durable history. Deploy + verify live on bluesky-pds.
2026-06-17 04:29:52 +00:00


Phase dash — fix incomplete per-recipe run history on the CI dashboard

Mission (operator-specified 2026-06-17): the dashboard's per-recipe history page (https://ci.commoninternet.net/recipe/<recipe>, e.g. /recipe/bluesky-pds) shows only the latest run for most recipes. Make it show the full run history per recipe.

State files: STATUS-dash.md, BACKLOG-dash.md, REVIEW-dash.md, JOURNAL-dash.md. DECISIONS.md shared.

1. Root cause (verified 2026-06-17)

dashboard/dashboard.py builds the per-recipe history solely from the Drone API, capped at a single page of the latest 100 builds:

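```python
builds = _drone(f"/api/repos/{CI_REPO}/builds?per_page=100")   # single page, no pagination

def history_for(recipe):
    builds = _custom_recipe_builds()   # draws on that same single 100-build page
    # keep builds whose RECIPE param matches; anything older was never fetched
    return [_build_row(b) for b in builds if (b.get("params") or {}).get("RECIPE") == recipe]
```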

But there are 362 actual runs on the host (/var/lib/cc-ci-runs has 362 run dirs). So ~262 runs are older than the 100-build window and never fetched. The recent regall sweep !testme'd each of the 21 recipes once, filling the latest-100 window and pushing each recipe's older runs out of view → most recipes show exactly one run. (The overview/latest-per-recipe page is unaffected — it only needs the recent window.)
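For contrast, the Drone-pagination fallback that the retention check in section 2 keeps in reserve would keep fetching pages until it gets a short or empty one. This is a minimal sketch. It assumes the builds endpoint accepts a `page` query parameter alongside `per_page`. `iter_drone_builds`, its `fetch` callable (a stand-in for the dashboard's `_drone` helper) and the `max_pages` bound are hypothetical names used for illustration:

```python
from typing import Callable, Iterator


def iter_drone_builds(
    fetch: Callable[[str], list],  # hypothetical stand-in for _drone(): GET path -> decoded JSON list
    repo: str,
    per_page: int = 100,
    max_pages: int = 50,  # hypothetical safety bound so a misbehaving API cannot loop forever
) -> Iterator[dict]:
    """Yield builds newest-first across every page, not just the first one."""
    for page in range(1, max_pages + 1):
        # assumption: the builds endpoint honours `page` as well as `per_page`
        batch = fetch(f"/api/repos/{repo}/builds?page={page}&per_page={per_page}")
        if not batch:
            return  # empty page: nothing older exists
        yield from batch
        if len(batch) < per_page:
            return  # short page: this was the last one
```

With 362 builds and `per_page=100`, this makes four requests (100 + 100 + 100 + 62) instead of one. The result is still limited by whatever Drone itself retains. That is why the plan sources history from the local artifacts instead.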

2. Design — source history from the local run artifacts

The dashboard already bind-mounts /var/lib/cc-ci-runs read-only (see nix/modules/dashboard.nix) and reads each run's results.json (_results_for). Build the per-recipe history from THAT, not from the 100-build Drone slice — it's complete (362 runs), durable (independent of Drone pagination/retention), and already available.
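Before relying on results.json for history, run a key census across every run dir. It confirms which fields are actually present and how many files fail to parse. This is a sketch that assumes each run dir holds a top-level JSON object. `schema_census` and the `WANTED` field names are hypothetical; replace them with whatever the census reveals:

```python
import json
from collections import Counter
from pathlib import Path

RUNS_ROOT = Path("/var/lib/cc-ci-runs")
# Hypothetical field names the history table needs; confirm against real files.
WANTED = ("recipe", "version", "level", "status", "ref", "timestamp")


def schema_census(root: Path = RUNS_ROOT) -> tuple[Counter, int, int]:
    """Count top-level keys across every */results.json.

    Returns (key_counts, parsed_ok, unparseable)."""
    counts: Counter = Counter()
    ok = bad = 0
    for run_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        try:
            data = json.loads((run_dir / "results.json").read_text())
        except (OSError, ValueError):  # missing, unreadable or malformed JSON
            bad += 1
            continue
        if not isinstance(data, dict):  # valid JSON but not an object
            bad += 1
            continue
        ok += 1
        counts.update(data.keys())
    return counts, ok, bad
```

On the host, print `counts[key]` against `ok` for each name in `WANTED`. That shows whether a field such as `level` appears in every run or only in recent ones.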

  • history_for(recipe) → enumerate /var/lib/cc-ci-runs/*/results.json, keep those whose recipe matches, sort newest-first (by run id / timestamp), and render the existing history table (status, level, version, ref, when, link to /runs/<id>/…). Apply a sane display cap (e.g. the last ~30 per recipe) so a long-lived recipe's page stays bounded.
  • First confirm the results.json schema carries what's needed (recipe, version, level/status, ref, timestamp); adapt if a field is named differently, or take the run id from the dir name. Skip a run dir with a missing or malformed results.json gracefully (don't 500).
  • Keep Drone only where it adds value — e.g. the live "currently running" status for the most recent run (a run mid-flight has no final results.json yet). The historical list comes from local artifacts. Keep the overview + /badge/<recipe>.svg working exactly as today.
  • Retention check: 362 runs implies adequate retention, but confirm nothing (cleanlogs / docker-prune) trims /var/lib/cc-ci-runs so aggressively that history vanishes; if it does, note it in DECISIONS and keep a Drone-pagination fallback. Do not add unbounded growth — if retention needs a cap, record it.
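The history steps above can be sketched with the stdlib alone. This is a minimal sketch, not the dashboard's code. It assumes run dirs are named by an increasing numeric run id. It also assumes results.json is an object with `recipe`, `status`, `level`, `version`, `ref` and `timestamp` keys; all of these are hypothetical until the schema is confirmed. `history_for`'s extra parameters, `_run_sort_key` and `HISTORY_CAP` are illustrative:

```python
import json
from pathlib import Path

RUNS_ROOT = Path("/var/lib/cc-ci-runs")  # bind-mounted read-only into the dashboard
HISTORY_CAP = 30  # display cap per recipe (the plan suggests roughly 30)


def _run_sort_key(run_id: str) -> tuple:
    # Assumption: run dirs are named by an increasing numeric id. Under reverse
    # sorting, numeric ids come first (largest first), then any other names.
    return (1, int(run_id), "") if run_id.isdecimal() else (0, 0, run_id)


def history_for(recipe: str, root: Path = RUNS_ROOT, cap: int = HISTORY_CAP) -> list[dict]:
    """Newest-first, capped history rows for one recipe, read from local artifacts."""
    try:
        run_dirs = [p for p in root.iterdir() if p.is_dir()]
    except OSError:
        return []  # mount missing or unreadable: render an empty table, not a 500
    rows = []
    for run_dir in run_dirs:
        try:
            data = json.loads((run_dir / "results.json").read_text())
        except (OSError, ValueError):
            continue  # no results.json yet (run in flight) or malformed: skip it
        if not isinstance(data, dict) or data.get("recipe") != recipe:
            continue
        rows.append({
            "id": run_dir.name,  # run id taken from the directory name
            # hypothetical field names, pending the schema check
            "status": data.get("status"),
            "level": data.get("level"),
            "version": data.get("version"),
            "ref": data.get("ref"),
            "when": data.get("timestamp"),
        })
    rows.sort(key=lambda r: _run_sort_key(r["id"]), reverse=True)  # newest first
    return rows[:cap]
```

The recipe name is only compared against a field value and is never joined into a path, so this function adds no traversal surface. It also only reads files, which keeps the mount read-only.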

3. Gates

M1 — fix implemented + locally verified. history_for (and any helper) sources per-recipe history from /var/lib/cc-ci-runs, newest-first, display-capped; results.json schema confirmed; malformed/empty run dirs handled without erroring. Python stdlib only (the dashboard's standing constraint); the existing path-traversal guard + the /recipe/ name validation preserved. Unit/local render test shows a recipe with many runs now lists them all (up to the cap). Adversary cold-verifies: the rendered history matches the actual run dirs for that recipe (count + order), no security regression (path traversal, arg injection), overview + badge routes unchanged.
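The security part of the cold-verify comes down to two properties worth a regression test. First, `/runs/<id>/<file>` must not resolve outside the runs root. Second, `/recipe/` must accept only plain recipe names. Keep the dashboard's existing guard and validation exactly as they are. The sketch below only illustrates the checks a test can assert. It uses a hypothetical `safe_run_path` and an assumed recipe-name pattern (`RECIPE_RE`) that may differ from the real one:

```python
import os
import re
from pathlib import Path
from typing import Optional

RUNS_ROOT = Path("/var/lib/cc-ci-runs")
# Assumed recipe-name shape (lowercase, digits, hyphens); the real rule may differ.
RECIPE_RE = re.compile(r"[a-z0-9][a-z0-9-]{0,63}")


def valid_recipe(name: str) -> bool:
    """True only for plain recipe names: no slashes, dots, spaces or shell metacharacters."""
    return RECIPE_RE.fullmatch(name) is not None


def safe_run_path(run_id: str, filename: str, root: Path = RUNS_ROOT) -> Optional[Path]:
    """Resolve /runs/<id>/<file> inside root, or return None if it would escape."""
    root_real = os.path.realpath(root)
    # realpath collapses ".." and follows symlinks before the containment check
    candidate = os.path.realpath(os.path.join(root_real, run_id, filename))
    if candidate == root_real or os.path.commonpath([root_real, candidate]) != root_real:
        return None
    return Path(candidate)
```

Feeding the same hostile inputs to the real route handlers should yield an error status rather than a file body.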

M2 — deployed + verified live. Rebuild/redeploy the dashboard service (the deploy-dashboard reconcile; the content-hash image tag rolls on dashboard.py change). Then confirm on the live site: /recipe/bluesky-pds and ≥2 other recipes show their full run history (multiple runs, matching the local run count), with correct status/level/links; the overview and badges still render. Fresh Adversary PASS on both milestones → ## DONE.
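The live check can be scripted with urllib. Count the distinct run links on each /recipe/ page and compare that with the number of matching run dirs on the host, capped at the display limit. This is a sketch under several assumptions: history rows link to `/runs/<id>/...` via an `href` attribute, results.json has a `recipe` key, and the cap is 30. `rendered_run_ids`, `local_run_count`, `fetch` and `check_recipe` are hypothetical helpers:

```python
import json
import re
import urllib.request
from pathlib import Path

BASE = "https://ci.commoninternet.net"
RUNS_ROOT = Path("/var/lib/cc-ci-runs")
RUN_LINK_RE = re.compile(r'href="/runs/([^/"]+)/')  # assumption: rows link to /runs/<id>/<file>


def rendered_run_ids(html: str) -> list[str]:
    """Distinct run ids linked from a history page, in page order."""
    ids: list[str] = []
    for run_id in RUN_LINK_RE.findall(html):
        if run_id not in ids:
            ids.append(run_id)  # a row may link several files of the same run
    return ids


def local_run_count(recipe: str, root: Path = RUNS_ROOT) -> int:
    """Number of run dirs whose results.json names this recipe."""
    count = 0
    for results in root.glob("*/results.json"):
        try:
            data = json.loads(results.read_text())
        except (OSError, ValueError):
            continue  # unreadable or malformed: the history page is meant to skip these too
        if isinstance(data, dict) and data.get("recipe") == recipe:
            count += 1
    return count


def fetch(path: str) -> tuple[int, str]:
    """GET a page from the live dashboard; returns (HTTP status, body)."""
    with urllib.request.urlopen(BASE + path, timeout=10) as resp:
        return resp.status, resp.read().decode("utf-8", "replace")


def check_recipe(recipe: str, cap: int = 30) -> bool:
    """True if the live page lists as many runs as the host holds (up to cap)."""
    status, html = fetch(f"/recipe/{recipe}")
    expected = min(local_run_count(recipe), cap)
    return status == 200 and len(rendered_run_ids(html)) == expected
```

Run it on the host, since `local_run_count` reads the local run dirs. Call `check_recipe("bluesky-pds")` plus at least two other recipes. Also check `fetch("/")` and `fetch("/badge/bluesky-pds.svg")` for a 200. A run that is mid-flight may appear from Drone without a results.json, so an off-by-one on the newest row deserves a look rather than an automatic fail.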

4. Guardrails

  • Read-only dashboard — it never writes run artifacts; the mount stays read-only.
  • Python stdlib only (no new deps — it's a stdlib HTTP server packaged into an OCI image).
  • Preserve security + the other routes: keep the path-traversal guard and the /recipe/ validation; do not regress the overview, /badge/<recipe>.svg, /runs/<id>/<file>, or the bridge's /hook routing (traefik priority).
  • Host/deploy change (redeploying the dashboard service via its reconcile / a nixos-rebuild): loops may deploy if clean and verify host health after (dashboard service N/N, ci.commoninternet.net 200); else file for the orchestrator. Commit author autonomic-bot <autonomic-bot@noreply.git.autonomic.zone>; push every commit.
  • Bounded scope — this is a history-page fix, not a dashboard redesign.

5. Definition of Done

The per-recipe history page sources its run list from the local /var/lib/cc-ci-runs artifacts and shows the full (display-capped) history per recipe — deployed and verified live on bluesky-pds + ≥2 other recipes (multiple runs each, matching the host's run count), with the overview/badges/other routes unaffected and the dashboard still stdlib-only + read-only. Retention confirmed adequate (or recorded). M1 + M2 fresh Adversary PASSes in REVIEW-dash.md.