cc-ci/machine-docs/JOURNAL-dash.md
autonomic-bot 4c0b289881
claim(M2): dashboard redeployed (image 15addbc7bf45 -> 11ac2a1e6c07), live full per-recipe history verified
bluesky-pds 8 rows in exact host ts order (753 556 435 427 423 ab-* m2rr-* m2r-*),
plausible 30 (capped from 33), ghost 24; overview+badges 200; service 1/1.
Deploy via path: flake (git-flake drops secrets/ submodule). Retention: no trim
job on /var/lib/cc-ci-runs (439 dirs / 17 days) — adequate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 16:37:21 +00:00


JOURNAL — phase dash (reasoning; Adversary does not read before verdict)

2026-06-17 — M1 design + implementation

Root cause (confirmed against plan §1 + host): history_for read _custom_recipe_builds(), which fetches a single Drone page …/builds?per_page=100. The recent regall sweep !testme'd all 21 recipes once, filling the latest-100 window, so each recipe's older runs fell outside it → most recipes rendered exactly 1 history row. Host has 432 run dirs (308 parseable results.json).

Why source from local artifacts, not paginate Drone: the plan's chosen design. Local artifacts are complete (308 finished runs vs 100-build Drone window), durable (independent of Drone retention/pagination), already bind-mounted read-only, and already read per-run by _results_for. Pure-local also removes a network dependency + failure mode from the history page. I deliberately did NOT merge in Drone "currently running" live status (plan lists it as an optional "e.g." value-add): it re-introduces the Drone dependency and the overview already shows live status; the DoD asks only that the historical list come from local artifacts. Recorded as a decision.

Status derivation: results.json (schema 2) has no top-level status field. Derived from the per-stage results map: any fail/error → failure; all pass/skip → success; else unknown. A skip alone is not a failure (e.g. custom-html-bkp-bad: backup=fail → failure; level-5 plausible: all pass → success). This matches what the run actually did without inventing a Drone call.
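The derivation rule above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual code: the function name `derive_status`, the stage names, and the assumption that the per-stage map holds plain strings are all hypothetical, and treating an empty map as unknown is a choice made here rather than something the journal states.

```python
from typing import Mapping

# Outcome vocabularies taken from the rule in the journal entry.
FAILING = {"fail", "error"}
PASSING = {"pass", "skip"}


def derive_status(stages: Mapping[str, str]) -> str:
    """Collapse a per-stage results map into a single run status."""
    values = [str(v).lower() for v in stages.values()]
    # Any failing stage makes the whole run a failure.
    if any(v in FAILING for v in values):
        return "failure"
    # Success needs at least one stage, all of them pass or skip.
    # (Empty map -> unknown is an assumption of this sketch.)
    if values and all(v in PASSING for v in values):
        return "success"
    # Anything else (unrecognised value, empty map) is unknown.
    return "unknown"
```

With hypothetical stage names, `derive_status({"deploy": "pass", "backup": "fail"})` gives `"failure"`, while a map containing only `pass` and `skip` values gives `"success"`.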

The sort trap (flagged by Adversary's pre-claim baseline too): run ids are MIXED numeric (753,556) and named (m2r-bluesky-pds,ab-bluesky-pds-oldmain). int(run_id) would crash on named ids; lexical sort would scatter them and misorder 9… vs 7…. The ONLY correct order is by finished timestamp. Sort key = (finished, _numeric_id) reverse — finished is primary, numeric id is a stable tiebreak (named ids get -1, so timestamp always decides their slot). Verified the output matches the Adversary's independently-derived bluesky-pds order byte-for-byte.
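A small sketch of that sort key, under stated assumptions: each run is a dict with `id` and `finished` fields (hypothetical field names), and `finished` is any directly comparable value such as epoch seconds. The timestamps below are invented for illustration and are not the real host values.

```python
import re
from typing import Any, Dict, Tuple


def numeric_id(run_id: str) -> int:
    """Return the id as an int when it is purely numeric, else -1."""
    return int(run_id) if re.fullmatch(r"[0-9]+", run_id) else -1


def sort_key(run: Dict[str, Any]) -> Tuple[Any, int]:
    # finished is primary; the numeric id only breaks exact timestamp ties.
    return (run["finished"], numeric_id(run["id"]))


# Illustrative rows: ids appear in the journal, timestamps are made up.
runs = [
    {"id": "556", "finished": 200},
    {"id": "m2r-bluesky-pds", "finished": 50},
    {"id": "753", "finished": 300},
    {"id": "ab-bluesky-pds-oldmain", "finished": 100},
]

# Newest first; named ids land wherever their timestamp puts them.
ordered = sorted(runs, key=sort_key, reverse=True)
```

Because named ids map to -1 rather than raising, the sort never calls `int()` on a non-numeric id, and a named id only loses to a numeric one when their timestamps are exactly equal.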

Cap: HISTORY_CAP=30 (env-overridable). Sorted newest-first BEFORE slicing, so the cap keeps the 30 newest and drops the oldest — verified plausible (33 runs) keeps the newest 30, drops oldest 3.
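The sort-then-slice order matters, and a rough sketch makes it visible. The environment variable name `HISTORY_CAP`, the helper names, and the row shape are assumptions for illustration; the 33-row input mirrors the plausible case but uses synthetic timestamps.

```python
import os
from typing import Any, Dict, List, Mapping


def history_cap(env: Mapping[str, str] = os.environ, default: int = 30) -> int:
    """Read the cap from the environment, falling back to the default."""
    try:
        cap = int(env.get("HISTORY_CAP", default))
    except ValueError:
        return default
    return cap if cap > 0 else default


def newest_capped(rows: List[Dict[str, Any]], cap: int) -> List[Dict[str, Any]]:
    """Sort newest-first, THEN slice, so the cap drops the oldest rows."""
    ordered = sorted(rows, key=lambda r: r["finished"], reverse=True)
    return ordered[:cap]


# 33 synthetic rows with finished = 1..33 (higher means newer).
rows = [{"id": str(n), "finished": n} for n in range(1, 34)]
kept = newest_capped(rows, 30)
dropped = {r["finished"] for r in rows} - {r["finished"] for r in kept}
```

Here `dropped` contains the three oldest timestamps. Slicing before sorting would instead keep whichever 30 rows happened to come first in scan order.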

Caching: _local_history scans the whole runs dir once per CACHE_TTL (reuses the existing 30s TTL) and groups by recipe, so a busy page doesn't json-load 300+ files per request. _results_for (already traversal-guarded) is reused for each dir read, so the path-traversal guarantee is unchanged.
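The scan-once-per-TTL idea might look like the sketch below. It is not the real `_local_history`: it reads `results.json` directly instead of going through the traversal-guarded `_results_for`, the `recipe` field name is an assumption, and the injectable `now` clock exists only so the demo can step time forward. Run id `754` is hypothetical.

```python
import json
import tempfile
import time
from collections import defaultdict
from pathlib import Path
from typing import Any, Callable, Dict, List

CACHE_TTL = 30.0  # seconds, matching the TTL mentioned in the journal
_cache: Dict[str, Any] = {"at": None, "by_recipe": {}}


def scan_runs(runs_dir: Path) -> Dict[str, List[dict]]:
    """Read every run dir once and group parseable results by recipe."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for run_dir in runs_dir.iterdir():
        try:
            data = json.loads((run_dir / "results.json").read_text())
        except (OSError, ValueError):
            continue  # missing, unreadable, or malformed: skip, never raise
        recipe = data.get("recipe") if isinstance(data, dict) else None
        if not recipe:
            continue  # no recipe recorded: cannot be grouped
        grouped[recipe].append({**data, "id": run_dir.name})
    return dict(grouped)


def local_history(runs_dir: Path, now: Callable[[], float] = time.monotonic):
    """Return the grouped scan, rescanning only after CACHE_TTL has elapsed."""
    t = now()
    if _cache["at"] is None or t - _cache["at"] >= CACHE_TTL:
        _cache["by_recipe"] = scan_runs(runs_dir)
        _cache["at"] = t
    return _cache["by_recipe"]


# Demo with a fake clock and a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "753").mkdir()
    (root / "753" / "results.json").write_text(json.dumps({"recipe": "bluesky-pds"}))
    (root / "broken").mkdir()
    (root / "broken" / "results.json").write_text("{not json")
    (root / "empty").mkdir()

    clock = [0.0]
    first = local_history(root, now=lambda: clock[0])

    (root / "754").mkdir()
    (root / "754" / "results.json").write_text(json.dumps({"recipe": "ghost"}))
    clock[0] = 10.0   # inside the TTL: cached result, new dir not seen yet
    cached = local_history(root, now=lambda: clock[0])
    clock[0] = 31.0   # past the TTL: rescan picks up the new dir
    refreshed = local_history(root, now=lambda: clock[0])
```

The trade-off is that a freshly finished run can take up to one TTL to appear, in exchange for at most one full directory scan per TTL regardless of request volume.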

Retention: 308 parseable runs are present, spanning many days, so retention is adequate; no trimming of /var/lib/cc-ci-runs was observed that would make history disappear. Will confirm during M2 that no cleanlogs/prune job trims it, and record in DECISIONS if a cap is ever needed (none needed now).

Local verification (M1): 13/13 unit tests pass (incl. new local-sourcing test). Full-fixture run against all 308 real results.json + injected malformed/empty/no-recipe dirs: bluesky-pds=8 in exact timestamp order, plausible capped 30 (newest kept), 308 total grouped, edge dirs skipped without raising, security guards (_RUN_ID_RE, _results_for, serve_run_file) all still reject traversal.

2026-06-17 — M2 deploy + live verify

Deploy gotcha (recorded): nixos-rebuild switch --flake /etc/cc-ci#cc-ci FAILED: error: path '…/secrets/secrets.yaml' does not exist. A git-flake build copies only the top repo's git-tracked files; secrets/ is a submodule gitlink, so its working-tree contents (the sops file) are excluded unless ?submodules=1. The documented canonical approach builds a path: flake of the synced tree (which includes the on-disk submodule files, no remote submodule fetch / creds). Did: tar /etc/cc-ci minus .git → /root/ccci-build, then nixos-rebuild switch --flake path:/root/ccci-build#cc-ci. Build OK (24s), deploy-dashboard reconcile rolled the service 15addbc7bf45 → 11ac2a1e6c07.

Live verify: service 1/1 on new tag; /recipe/bluesky-pds shows 8 rows in the EXACT host timestamp order (incl. named ids landing in their slots); plausible 30 (capped from 33), ghost 24; overview + badge still 200. Retention: no module trims /var/lib/cc-ci-runs; 439 dirs over 17 days.