cc-ci-orchestrator/cc-ci-plan/plan-phase-nixenv-shared-runtime-env.md
autonomic-bot bd16123865 plan: queue nixenv — single-source the harness runtime env (timer + Drone runner share deps; root-cause fix for DEFECT-3 drift, opus)
Operator 2026-06-17. The recipe-test runtime env is declared 2-3x today
(harness.nix cc-ci-run runtimeInputs; nightly-sweep.nix duplicate pyEnv + a
divergent list + DEFECT-3 host-PATH patch; host systemPackages git-lfs for the
Drone runner) -> they drift (DEFECT-3 = missing bash/git-lfs in the timer).
Factor ONE shared env referenced by cc-ci-run, the sweep timer, and host
systemPackages; sweep invokes cc-ci-run so env is identical by construction.
Queued last (after settings).
2026-06-17 15:38:45 +00:00


Phase nixenv — single-source the harness runtime env (timer + Drone runner share, no duplication)

Mission (operator-specified 2026-06-17): the nix runtime environment for the nightly/weekly sweep systemd timer and the Drone runner must share one declaration, not duplicate their dependency lists. DEFECT-3 (caught in canon) was exactly this divergence: the timer's runtimeInputs had drifted from what recipe tests actually need (missing bash, then git-lfs), while the Drone runner got those from the host system PATH. Patching the timer's list treated the symptom; this phase removes the root cause by single-sourcing the env so a dependency can never be present for one path and missing for the other again.

State files: STATUS-nixenv.md, BACKLOG-nixenv.md, REVIEW-nixenv.md, JOURNAL-nixenv.md. DECISIONS.md shared.

1. The divergence today (verified 2026-06-17)

Three separate declarations of "what's needed to run a recipe test":

  • nix/modules/harness.nix: pyEnv = python3.withPackages([pytest playwright]); cc-ci-run wrapper with runtimeInputs = [pyEnv abra docker git coreutils util-linux]. The Drone exec pipeline runs cc-ci-run runner/run_recipe_ci.py.
  • nix/modules/drone-runner.nix: runs pipelines with PATH = /run/current-system/sw/bin, so recipe shell-outs (abra/docker/git/git-lfs/openssl/…) come from host environment.systemPackages (nix/hosts/*/configuration.nix, where git-lfs was added as a one-off).
  • nix/modules/nightly-sweep.nix: a duplicate pyEnv + a different runtimeInputs (bash abra docker git curl jq gnused gnugrep gnutar coreutils util-linux procps) + a DEFECT-3 patch that prepends the host PATH to reach git-lfs etc.

So the same env is declared 2-3× with no shared source → drift (DEFECT-3). pyEnv alone is copy-pasted verbatim in two modules.

2. Design — one shared env, referenced everywhere

Define the harness/recipe-test runtime env once (e.g. a nix/modules/harness-env.nix or a let-binding in packages.nix): the pyEnv and the full recipe-test tooling set — abra docker git git-lfs bash util-linux coreutils curl jq gnused gnugrep gnutar openssl procps … (the union that recipe tests + the harness actually shell out to). Then:

  1. cc-ci-run (harness.nix) references the shared set for its runtimeInputs (so the Drone pipeline's cc-ci-run … has the full env directly, not via a fragile host-PATH dependency).
  2. The sweep (nightly-sweep.nix) stops building its own pyEnv/runtimeInputs and instead invokes cc-ci-run (the SAME entrypoint the Drone runner uses) — so the python env + tooling are identical by construction, and the DEFECT-3 host-PATH-prepend patch can be removed/justified.
  3. Host systemPackages (the Drone runner's PATH tooling, both nix/hosts/cc-ci-hetzner/ and nix/hosts/cc-ci/) references the same shared set rather than a hand-maintained list with a one-off git-lfs.

Net: there is exactly one place that lists the harness's runtime dependencies; adding one (the next git-lfs) propagates atomically to the Drone runner, cc-ci-run, and the sweep. The DEFECT-3 class becomes structurally impossible.
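A minimal sketch of the "one definition, three references" shape described above, collapsed into a single NixOS module for brevity. In the real tree the three references would stay in harness.nix, nightly-sweep.nix, and the host configuration.nix files. Several things here are assumptions rather than facts from the repo: that cc-ci-run is built with pkgs.writeShellApplication (its runtimeInputs suggest this, but it is not confirmed), that abra comes from this repo's own package set rather than stock nixpkgs, and that the sweep service unit is named nightly-sweep. The names harnessRuntimeInputs and sweepEntrypoint are hypothetical, and the tool list leaves the trailing "…" of the list above open.

```nix
{ pkgs, ... }:

let
  # Single source, part 1: the Python env (package list as in today's harness.nix).
  pyEnv = pkgs.python3.withPackages (ps: [ ps.pytest ps.playwright ]);

  # Single source, part 2: the union of today's tooling lists.
  # Hypothetical name; extend after enumerating the current lists.
  harnessRuntimeInputs = [ pyEnv ] ++ (with pkgs; [
    abra        # assumption: provided by this repo's overlay, not stock nixpkgs
    docker
    git
    git-lfs
    bash
    util-linux
    coreutils
    curl
    jq
    gnused
    gnugrep
    gnutar
    openssl
    procps
  ]);

  # Reference 1: the cc-ci-run wrapper takes the shared set directly.
  # Assumption: the wrapper simply execs the Python interpreter with its arguments.
  cc-ci-run = pkgs.writeShellApplication {
    name = "cc-ci-run";
    runtimeInputs = harnessRuntimeInputs;
    text = ''
      exec python3 "$@"
    '';
  };

  # Hypothetical placeholder: the real sweep entrypoint is not named in this plan.
  sweepEntrypoint = "runner/sweep.py";
in
{
  # Reference 2: host tooling (the Drone runner's PATH) comes from the same list.
  environment.systemPackages = harnessRuntimeInputs;

  # Reference 3: the sweep calls the same wrapper instead of building its own env.
  systemd.services.nightly-sweep.serviceConfig = {
    Type = "oneshot";
    ExecStart = "${cc-ci-run}/bin/cc-ci-run ${sweepEntrypoint}";
  };
}
```

With this shape, adding a package to harnessRuntimeInputs changes the wrapper, the host PATH, and the sweep in the same rebuild, which is the property the plan is after.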

Keep it a faithful refactor: the resulting env must be a superset-or-equal of every current list (do not drop anything any path relies on — enumerate all three current lists + host git-lfs and prove the shared set covers them). Don't over-abstract; one shared definition, three references.
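One way the coverage proof could be mechanised is a small pure-Nix check over package names, sketched below. It uses only builtins, so it does not need nixpkgs. The name lists are transcribed from section 1 and from the proposed set in section 2; pyEnv is left out because it is shared verbatim, and hostOneOffs holds only git-lfs because the rest of the host systemPackages list is not enumerated in this plan and would have to be added before the check means anything for the hosts. The file name check-superset.nix is hypothetical.

```nix
# check-superset.nix (hypothetical file name)
# Evaluate with: nix-instantiate --eval --strict check-superset.nix
let
  # Current lists, by package name, as recorded in section 1.
  harnessList = [ "abra" "docker" "git" "coreutils" "util-linux" ];
  sweepList = [
    "bash" "abra" "docker" "git" "curl" "jq"
    "gnused" "gnugrep" "gnutar" "coreutils" "util-linux" "procps"
  ];
  # Only the one-off named in this plan; the full host list must be added here.
  hostOneOffs = [ "git-lfs" ];

  # Proposed shared set from section 2 (its trailing "…" is left open).
  shared = [
    "abra" "docker" "git" "git-lfs" "bash" "util-linux" "coreutils"
    "curl" "jq" "gnused" "gnugrep" "gnutar" "openssl" "procps"
  ];

  prior = harnessList ++ sweepList ++ hostOneOffs;

  # Every prior name that the shared set fails to cover.
  missing = builtins.filter (name: !(builtins.elem name shared)) prior;
in
if missing == [ ]
then "shared set covers all prior lists"
else throw "shared set is missing: ${builtins.concatStringsSep " " missing}"
```

Comparing names rather than derivations keeps the check independent of any particular nixpkgs pin; the trade-off is that it cannot catch two different packages that happen to share a name.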

3. Gates

M1 — refactored + builds. One shared env definition; harness.nix/cc-ci-run, nightly-sweep.nix, and both host configuration.nix systemPackages reference it; no duplicate pyEnv; the sweep invokes cc-ci-run (or otherwise provably shares the exact env). nixos-rebuild build succeeds. A test/grep proving no module declares its own harness dep list anymore (single source). Adversary cold-verifies: the shared set is a superset-or-equal of all prior lists (nothing dropped — bash, git-lfs, util-linux, curl, jq, openssl, playwright browsers, etc. all present for both paths); the sweep and the Drone runner resolve the same tooling; adding a hypothetical dep to the shared set would reach all consumers.
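The "test/grep" part of M1 could be expressed as a Nix check derivation, sketched below for the pyEnv half only. It assumes the shared definition lives under nix/modules (for example in a harness-env.nix) so that exactly one file there should mention python3.withPackages; if the definition goes into packages.nix instead, the expected count changes. The file and derivation names are hypothetical, and a similar check would still be needed for stray runtimeInputs lists.

```nix
# single-source-check.nix (hypothetical); build with: nix-build single-source-check.nix
{ pkgs ? import <nixpkgs> { }, modules ? ./nix/modules }:

pkgs.runCommand "harness-env-single-source-check" { inherit modules; } ''
  # Files that still build a Python env themselves. `|| true` keeps an empty
  # result from aborting the build, since grep exits 1 when nothing matches.
  hits=$(grep -rl 'python3.withPackages' "$modules" || true)

  # Count non-empty lines, i.e. the number of matching files.
  count=$(printf '%s\n' "$hits" | grep -c . || true)

  if [ "$count" -ne 1 ]; then
    echo "expected exactly 1 file defining pyEnv, found $count:" >&2
    echo "$hits" >&2
    exit 1
  fi

  # A successful check still has to produce an output path.
  touch "$out"
''
```

Wiring such a derivation into the build would make a reintroduced duplicate fail before deployment rather than surface later as drift.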

M2 — deployed + parity proven live. nixos-rebuild switch deployed (verify host health after: systemctl --failed, reconcile oneshots active, nightly-sweep.timer active, endpoints 200). Prove parity in production: the tooling available to a Drone-run recipe test and to a real timer sweep fire is identical (e.g. git-lfs, bash, script/util-linux resolve in both) — re-run the DEFECT-3 witness (gitea test_lfs_roundtrip) green under BOTH the Drone path and a real timer fire, from the shared env. A canon-style sweep still promotes/SKIPs correctly under the unified env (no regression to canon's result). Fresh Adversary PASS on both milestones → ## DONE.

4. Guardrails

  • High blast radius — do not break the Drone runner or the sweep. The harness runtime env underpins ALL recipe CI; a dropped dependency breaks every test. The shared set must be ≥ the union of today's lists; prove it before switching.
  • Faithful refactor, not a redesign — one shared definition, referenced from the existing modules; no behavior change beyond removing duplication (and the now-redundant DEFECT-3 patch).
  • Host change (nixos-rebuild) — loops may deploy if clean and verify host health after; else file for the orchestrator. Both host configs (cc-ci-hetzner, cc-ci) kept consistent.
  • Never weaken a test. Commit author autonomic-bot <autonomic-bot@noreply.git.autonomic.zone>; push every commit.

5. Definition of Done

The harness/recipe-test runtime env is declared in one place and referenced by cc-ci-run (Drone runner entrypoint), the nightly/weekly sweep timer, and the host systemPackages — no duplicated pyEnv, no divergent runtimeInputs, the DEFECT-3 host-PATH patch removed/subsumed. Proven that the shared set is a superset-or-equal of all prior lists (nothing dropped), deployed via nixos-rebuild with host verified healthy, and parity demonstrated live (the same tooling — incl. git-lfs — resolves for both a Drone recipe test and a real timer sweep fire; the DEFECT-3 witness stays green under both). A future dependency addition reaches all consumers from the single source. M1 + M2 fresh Adversary PASSes in REVIEW-nixenv.md.