cc-ci-orchestrator/cc-ci-plan/plan-phase-nixenv-shared-runtime-env.md
autonomic-bot bd16123865 plan: queue nixenv — single-source the harness runtime env (timer + Drone runner share deps; root-cause fix for DEFECT-3 drift, opus)
Operator 2026-06-17. The recipe-test runtime env is declared 2-3x today
(harness.nix cc-ci-run runtimeInputs; nightly-sweep.nix duplicate pyEnv + a
divergent list + DEFECT-3 host-PATH patch; host systemPackages git-lfs for the
Drone runner) -> they drift (DEFECT-3 = missing bash/git-lfs in the timer).
Factor ONE shared env referenced by cc-ci-run, the sweep timer, and host
systemPackages; sweep invokes cc-ci-run so env is identical by construction.
Queued last (after settings).
2026-06-17 15:38:45 +00:00

# Phase `nixenv` — single-source the harness runtime env (timer + Drone runner share, no duplication)
**Mission (operator-specified 2026-06-17):** the nix runtime environment for the **nightly/weekly sweep
systemd timer** and the **Drone runner** must **share one declaration**, not duplicate their dependency
lists. DEFECT-3 (caught in `canon`) was exactly this divergence: the timer's `runtimeInputs` had drifted
from what recipe tests actually need (missing `bash`, then `git-lfs`), while the Drone runner got those
from the host system PATH. Patching the timer's list treated the symptom; this phase removes the **root
cause** by single-sourcing the env so a dependency can never be present for one path and missing for the
other again.

State files: `STATUS-nixenv.md`, `BACKLOG-nixenv.md`, `REVIEW-nixenv.md`, `JOURNAL-nixenv.md`. `DECISIONS.md` is shared.
## 1. The divergence today (verified 2026-06-17)
Three separate declarations of "what's needed to run a recipe test":
- **`nix/modules/harness.nix`** — `pyEnv = python3.withPackages([pytest playwright])`; `cc-ci-run` wrapper
with `runtimeInputs = [pyEnv abra docker git coreutils util-linux]`. The Drone exec pipeline runs
`cc-ci-run runner/run_recipe_ci.py`.
- **`nix/modules/drone-runner.nix`** — runs pipelines with `PATH = /run/current-system/sw/bin` → recipe
shell-outs (`abra`/`docker`/`git`/**`git-lfs`**/openssl/…) come from **host `environment.systemPackages`**
(`nix/hosts/*/configuration.nix`, where `git-lfs` was added one-off).
- **`nix/modules/nightly-sweep.nix`** — a **duplicate** `pyEnv` + a **different** `runtimeInputs`
(`bash abra docker git curl jq gnused gnugrep gnutar coreutils util-linux procps`) + a DEFECT-3 patch
that prepends the host PATH to reach `git-lfs` etc.

So the same env is declared 2-3× with no shared source → drift (DEFECT-3). `pyEnv` alone is copy-pasted
verbatim in two modules.
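
To make the drift concrete, the three declarations above can be tabulated per tool. This is an illustrative sketch only. The sets are transcribed by hand from this section, and `pyEnv` is treated as a single opaque entry. The host set holds only the tools named here, on the assumption that the real `environment.systemPackages` list is longer (it is elided above). `DECLARATIONS` and `divergence` are hypothetical names.

```python
# Illustrative transcription of the three declarations in section 1.
# pyEnv is one opaque entry; the host set lists only the tools named
# above (assumption: the real systemPackages list is longer).
DECLARATIONS: dict[str, set[str]] = {
    "cc-ci-run": {"pyEnv", "abra", "docker", "git", "coreutils", "util-linux"},
    "nightly-sweep": {
        "pyEnv", "bash", "abra", "docker", "git", "curl", "jq", "gnused",
        "gnugrep", "gnutar", "coreutils", "util-linux", "procps",
    },
    "host systemPackages": {"abra", "docker", "git", "git-lfs", "openssl"},
}


def divergence(decls: dict[str, set[str]]) -> dict[str, list[str]]:
    """For every tool not declared everywhere, list the declarations that do have it."""
    union = set().union(*decls.values())
    result: dict[str, list[str]] = {}
    for tool in sorted(union):
        holders = [name for name, deps in decls.items() if tool in deps]
        if len(holders) < len(decls):
            result[tool] = holders
    return result


if __name__ == "__main__":
    for tool, holders in divergence(DECLARATIONS).items():
        print(f"{tool:<12} declared only in: {', '.join(holders)}")
```

On these transcriptions, only `abra`, `docker` and `git` appear in all three. Every other tool, including the DEFECT-3 tools `bash` and `git-lfs`, is present in some declarations and absent from others.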
## 2. Design — one shared env, referenced everywhere
Define the harness/recipe-test runtime env **once** (e.g. a `nix/modules/harness-env.nix` or a let-binding
in `packages.nix`): the `pyEnv` **and** the full recipe-test tooling set — `abra docker git git-lfs bash
util-linux coreutils curl jq gnused gnugrep gnutar openssl procps …` (the union that recipe tests + the
harness actually shell out to). Then:
1. **`cc-ci-run` (`harness.nix`)** references the shared set for its `runtimeInputs` (so the Drone
pipeline's `cc-ci-run …` has the full env directly, not via a fragile host-PATH dependency).
2. **The sweep (`nightly-sweep.nix`)** stops building its own `pyEnv`/`runtimeInputs` and instead
**invokes `cc-ci-run`** (the SAME entrypoint the Drone runner uses), so the python env + tooling are
identical *by construction* and the DEFECT-3 host-PATH-prepend patch can be removed (or, if something
still relies on it, explicitly justified).
3. **Host `systemPackages`** (the Drone runner's PATH tooling, both `nix/hosts/cc-ci-hetzner/` and
`nix/hosts/cc-ci/`) references the **same shared set** rather than a hand-maintained list with a
one-off `git-lfs`.

Net: there is exactly **one place** that lists the harness's runtime dependencies; adding one (the next
`git-lfs`) propagates atomically to the Drone runner, `cc-ci-run`, and the sweep. The DEFECT-3 class
becomes structurally impossible.

Keep it a faithful refactor: the *resulting* env must be a **superset-or-equal** of every current list (do
not drop anything any path relies on — enumerate all three current lists + host git-lfs and prove the
shared set covers them). Don't over-abstract; one shared definition, three references.
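
The superset-or-equal requirement is mechanical enough to check in code before switching. Below is a minimal sketch. It assumes dependencies are compared by name only, so it does not prove identical store paths or versions. `PRIOR_LISTS` re-transcribes section 1 plus the host tools, and `PROPOSED_SHARED` is a hypothetical candidate built from the union listed above, with its "…" left unfilled. `uncovered` is an illustrative helper, not an existing tool.

```python
"""Name-level check that a candidate shared env drops nothing from the prior lists."""

# Prior declarations, transcribed from section 1 (host list limited to the tools named there).
PRIOR_LISTS: dict[str, frozenset[str]] = {
    "harness.nix cc-ci-run": frozenset(
        {"pyEnv", "abra", "docker", "git", "coreutils", "util-linux"}
    ),
    "nightly-sweep.nix": frozenset(
        {"pyEnv", "bash", "abra", "docker", "git", "curl", "jq", "gnused",
         "gnugrep", "gnutar", "coreutils", "util-linux", "procps"}
    ),
    "host systemPackages (named tools)": frozenset(
        {"abra", "docker", "git", "git-lfs", "openssl"}
    ),
}

# Hypothetical candidate for the single shared env (the section-2 union, "…" not filled in).
PROPOSED_SHARED = frozenset(
    {"pyEnv", "abra", "docker", "git", "git-lfs", "bash", "util-linux", "coreutils",
     "curl", "jq", "gnused", "gnugrep", "gnutar", "openssl", "procps"}
)


def uncovered(shared: frozenset[str], prior: dict[str, frozenset[str]]) -> dict[str, set[str]]:
    """Map each prior list to the deps the shared set would drop; {} means superset-or-equal."""
    gaps: dict[str, set[str]] = {}
    for name, deps in prior.items():
        missing = set(deps - shared)
        if missing:
            gaps[name] = missing
    return gaps


if __name__ == "__main__":
    gaps = uncovered(PROPOSED_SHARED, PRIOR_LISTS)
    assert not gaps, f"shared env would drop: {gaps}"
    print("shared env covers every prior list")
```

A name check like this is necessary but not sufficient. Items that are not plain tool names, such as the playwright browsers the M1 gate mentions, would still need to be added to both sides of the comparison.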
## 3. Gates
**M1 — refactored + builds.** One shared env definition; `harness.nix`/`cc-ci-run`, `nightly-sweep.nix`,
and both host `configuration.nix` systemPackages reference it; no duplicate `pyEnv`; the sweep invokes
`cc-ci-run` (or otherwise provably shares the exact env). `nixos-rebuild build` succeeds. A test/grep
proving **no module declares its own harness dep list** anymore (single source). Adversary cold-verifies:
the shared set is a superset-or-equal of all prior lists (nothing dropped — `bash`, `git-lfs`, `util-linux`,
`curl`, `jq`, `openssl`, playwright browsers, etc. all present for both paths); the sweep and the Drone
runner resolve the **same** tooling; adding a hypothetical dep to the shared set would reach all consumers.
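
For the single-source grep in M1, one possible shape is a pytest-style scan of the `nix/` tree. This is a heuristic sketch. `SHARED_FILE` assumes the shared env lands in `nix/modules/harness-env.nix`, one of the options named in section 2. The three patterns are guesses at what a duplicated declaration looks like and would need tuning to the real modules; for example, a comment mentioning `git-lfs` would also be flagged. Matching runs over whole file text, so a list opened on the line after `runtimeInputs =` is still caught. `FORBIDDEN` and `single_source_violations` are hypothetical names.

```python
import re
from pathlib import Path

# Assumption: the single shared definition lives here (section 2 offers this as one option).
SHARED_FILE = "nix/modules/harness-env.nix"

# Heuristic markers of a module declaring its own harness deps (tune to the real modules).
FORBIDDEN = {
    "own python env": re.compile(r"withPackages"),
    "inline runtimeInputs list": re.compile(r"runtimeInputs\s*=\s*(?:with\s+pkgs\s*;\s*)?\["),
    "one-off git-lfs": re.compile(r"\bgit-lfs\b"),
}


def single_source_violations(repo_root: Path) -> list[str]:
    """Return 'path:line: reason' for each forbidden match outside SHARED_FILE."""
    violations = []
    for path in sorted(repo_root.glob("nix/**/*.nix")):
        rel = path.relative_to(repo_root).as_posix()
        if rel == SHARED_FILE:
            continue
        text = path.read_text()
        for reason, pattern in FORBIDDEN.items():
            for match in pattern.finditer(text):
                # Whole-file matching catches a list opened on a later line.
                lineno = text.count("\n", 0, match.start()) + 1
                violations.append(f"{rel}:{lineno}: {reason}")
    return violations


def test_harness_deps_single_sourced():
    # Assumption: pytest runs from the repository root.
    assert single_source_violations(Path(".")) == []
```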

**M2 — deployed + parity proven live.** `nixos-rebuild switch` deployed (verify host health after:
`systemctl --failed`, reconcile oneshots active, `nightly-sweep.timer` active, endpoints 200). Prove
**parity in production**: the tooling available to a Drone-run recipe test and to a real timer sweep fire is
identical (e.g. `git-lfs`, `bash`, `script`/util-linux resolve in both) — re-run the DEFECT-3 witness
(gitea `test_lfs_roundtrip`) green under BOTH the Drone path and a real timer fire, from the shared env.
A canon-style sweep still promotes/SKIPs correctly under the unified env (no regression to `canon`'s
result). Fresh Adversary PASS on both milestones → `## DONE`.
## 4. Guardrails
- **High blast radius — do not break the Drone runner or the sweep.** The harness runtime env underpins
ALL recipe CI; a dropped dependency breaks every test. The shared set must be ≥ the union of today's
lists; prove it before switching.
- **Faithful refactor, not a redesign** — one shared definition, referenced from the existing modules; no
behavior change beyond removing duplication (and the now-redundant DEFECT-3 patch).
- **Host change (nixos-rebuild)** — loops may deploy if clean and **verify host health after**; else file
for the orchestrator. Both host configs (`cc-ci-hetzner`, `cc-ci`) kept consistent.
- Never weaken a test. Commit author `autonomic-bot <autonomic-bot@noreply.git.autonomic.zone>`; push
every commit.
## 5. Definition of Done
The harness/recipe-test runtime env is declared in **one** place and referenced by `cc-ci-run` (Drone
runner entrypoint), the nightly/weekly sweep timer, and the host `systemPackages` — no duplicated `pyEnv`,
no divergent `runtimeInputs`, the DEFECT-3 host-PATH patch removed/subsumed. Proven that the shared set is
a superset-or-equal of all prior lists (nothing dropped), deployed via `nixos-rebuild` with host verified
healthy, and parity demonstrated live (the same tooling — incl. `git-lfs` — resolves for both a Drone
recipe test and a real timer sweep fire; the DEFECT-3 witness stays green under both). A future dependency
addition reaches all consumers from the single source. M1 + M2 fresh Adversary PASSes in REVIEW-nixenv.md.