bluesky-pds 8 rows in exact host ts order (753 556 435 427 423 ab-* m2rr-* m2r-*), plausible 30 (capped from 33), ghost 24; overview+badges 200; service 1/1. Deploy via path: flake (git-flake drops secrets/ submodule). Retention: no trim job on /var/lib/cc-ci-runs (439 dirs / 17 days) — adequate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
59 lines
4.3 KiB
Markdown
# JOURNAL — phase `dash` (reasoning; Adversary does not read before verdict)
## 2026-06-17 — M1 design + implementation
**Root cause (confirmed against plan §1 + host):** `history_for` read `_custom_recipe_builds()`,
which fetches a single Drone page `…/builds?per_page=100`. The recent `regall` sweep `!testme`'d all
21 recipes once, filling the latest-100 window, so each recipe's older runs fell outside it → most
recipes rendered exactly 1 history row. Host has 432 run dirs (308 parseable `results.json`).

**Why source from local artifacts, not paginate Drone:** the plan's chosen design. Local artifacts
are complete (308 finished runs vs 100-build Drone window), durable (independent of Drone
retention/pagination), already bind-mounted read-only, and already read per-run by `_results_for`.
Pure-local also removes a network dependency + failure mode from the history page. I deliberately did
NOT merge in Drone "currently running" live status (plan lists it as an optional "e.g." value-add):
it re-introduces the Drone dependency and the overview already shows live status; the DoD asks only
that the *historical* list come from local artifacts. Recorded as a decision.

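
A minimal sketch of the pure-local sourcing described above, not the dashboard's actual code. It assumes each run dir holds a `results.json` whose top level carries a `recipe` key (that key name is an assumption), and `load_local_history` is a hypothetical name standing in for whatever wraps `_results_for`.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_local_history(runs_dir):
    """Group parseable results.json files by recipe, skipping unusable dirs."""
    grouped = defaultdict(list)
    for run_dir in sorted(Path(runs_dir).iterdir()):
        try:
            data = json.loads((run_dir / "results.json").read_text())
        except (OSError, ValueError):
            # Missing, unreadable, empty, or malformed file: skip without raising.
            continue
        # "recipe" is an assumed key name; dirs without it are skipped.
        if not isinstance(data, dict) or not data.get("recipe"):
            continue
        grouped[data["recipe"]].append({**data, "run_id": run_dir.name})
    return dict(grouped)
```

No network call appears anywhere in the scan, which is the point of the decision: the history page can only fail on local I/O.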
**Status derivation:** `results.json` (schema 2) has no top-level status field. Derived from the
per-stage `results` map: any `fail`/`error` → failure; otherwise all `pass`/`skip` → success; else
unknown. A skip alone is not a failure. Examples: custom-html-bkp-bad has backup=fail → failure;
level-5 plausible is all pass → success. This matches what the run actually did without inventing a
Drone call.

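
The rule above can be sketched as a small pure function. `derive_status` is a hypothetical name, and treating an empty `results` map as unknown is an assumption the journal does not state.

```python
def derive_status(stage_results):
    """Map a per-stage results dict (stage name -> outcome string) to one status."""
    outcomes = set(stage_results.values())
    if outcomes & {"fail", "error"}:
        return "failure"  # any failing or erroring stage decides the run
    if outcomes and outcomes <= {"pass", "skip"}:
        return "success"  # skips do not count against the run
    return "unknown"  # unrecognised outcome, or (assumption) no stages at all
```

For example, `derive_status({"backup": "fail"})` gives `"failure"`, while a map containing only `pass` and `skip` values gives `"success"`.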
**The sort trap (flagged by Adversary's pre-claim baseline too):** run ids are MIXED numeric
(`753`,`556`) and named (`m2r-bluesky-pds`,`ab-bluesky-pds-oldmain`). `int(run_id)` would crash on
named ids; lexical sort would scatter them and misorder `9…` vs `7…`. The ONLY correct order is by
`finished` timestamp. Sort key = `(finished, _numeric_id)` reverse — finished is primary, numeric id
is a stable tiebreak (named ids get -1, so timestamp always decides their slot). Verified the output
matches the Adversary's independently-derived bluesky-pds order byte-for-byte.

**Cap:** `HISTORY_CAP=30` (env-overridable). Sorted newest-first BEFORE slicing, so the cap keeps the
30 newest and drops the oldest. Verified on plausible (33 runs): the newest 30 are kept, the oldest 3
dropped.

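
The sort-then-slice ordering can be sketched as follows. Reading the cap from an environment variable also named `HISTORY_CAP` is an assumption, and `cap_history` is a hypothetical helper.

```python
import os

# Assumption: the override uses an environment variable with the same name.
HISTORY_CAP = int(os.environ.get("HISTORY_CAP", "30"))


def cap_history(runs, cap=HISTORY_CAP):
    """Sort newest-first, then slice, so the cap always drops the oldest runs."""
    newest_first = sorted(runs, key=lambda run: run["finished"], reverse=True)
    return newest_first[:cap]
```

Slicing before sorting would instead keep whichever 30 runs the directory scan happened to return first, which is the failure this ordering avoids.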
**Caching:** `_local_history` scans the whole runs dir once per `CACHE_TTL` (reuses the existing 30s
TTL) and groups by recipe, so a busy page doesn't json-load 300+ files per request. `_results_for`
(already traversal-guarded) is reused for each dir read, so the path-traversal guarantee is unchanged.

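
One way such a TTL cache could look, as a sketch only: the `TTLCache` class and its injectable `clock` are illustrative and not taken from the dashboard code, which may store its cache differently.

```python
import time

CACHE_TTL = 30.0  # seconds, matching the 30s TTL mentioned above


class TTLCache:
    """Hold one loader result and refresh it at most once per TTL."""

    def __init__(self, loader, ttl=CACHE_TTL, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means nothing has been loaded yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()  # the expensive full scan of the runs dir
            self._expires_at = now + self._ttl
        return self._value
```

Wrapping the directory scan as `TTLCache(loader=scan_runs_dir)` (where `scan_runs_dir` is a placeholder) means every request inside the window reuses the grouped result instead of re-reading each file.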
**Retention:** 308 parseable runs are present, spanning many days, so retention is adequate; nothing
was observed trimming `/var/lib/cc-ci-runs` in a way that would drop history. Will confirm during M2
that no cleanlogs/prune job trims it, and record in DECISIONS if a cap is ever needed (none needed now).

**Local verification (M1):** 13/13 unit tests pass (incl. new local-sourcing test). Full-fixture run
against all 308 real `results.json` + injected malformed/empty/no-recipe dirs: bluesky-pds=8 in exact
timestamp order, plausible capped 30 (newest kept), 308 total grouped, edge dirs skipped without
raising, security guards (`_RUN_ID_RE`, `_results_for`, `serve_run_file`) all still reject traversal.

## 2026-06-17 — M2 deploy + live verify
**Deploy gotcha (recorded):** `nixos-rebuild switch --flake /etc/cc-ci#cc-ci` FAILED:
`error: path '…/secrets/secrets.yaml' does not exist`. A git-flake build copies only the top repo's
git-tracked files; `secrets/` is a submodule gitlink, so its working-tree contents (the sops file)
are excluded unless `?submodules=1`. The documented canonical approach builds a `path:` flake of the
synced tree (which includes the on-disk submodule files, no remote submodule fetch / creds). Did:
tar `/etc/cc-ci` minus `.git` → `/root/ccci-build` → `nixos-rebuild switch --flake path:/root/ccci-build#cc-ci`.
Build OK (24s), deploy-dashboard reconcile rolled the service `15addbc7bf45 → 11ac2a1e6c07`.

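
The staging step could be scripted roughly like this. It is a sketch that swaps the tar pipe for `shutil.copytree`; `stage_path_flake` is a hypothetical helper, and it only builds the rebuild command rather than running it.

```python
import shutil


def stage_path_flake(src, dst, host="cc-ci"):
    """Copy the synced tree without .git and return the rebuild command (not executed)."""
    # ignore_patterns(".git") skips every entry named .git at any depth, including a
    # submodule's .git file; the submodule's working-tree files are still copied.
    # copytree raises if dst already exists, so stale build dirs must be removed first.
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns(".git"))
    return ["nixos-rebuild", "switch", "--flake", f"path:{dst}#{host}"]
```

With no `.git` directory in the staged copy, the `path:` flake sees the on-disk `secrets/` contents directly, so nothing has to be fetched from the submodule remote.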
**Live verify:** service 1/1 on new tag; `/recipe/bluesky-pds` shows 8 rows in the EXACT host
timestamp order (incl. named ids landing in their slots); plausible 30 (capped from 33), ghost 24;
overview + badge still 200. Retention: no module trims `/var/lib/cc-ci-runs`; 439 dirs over 17 days.