Operator 2026-06-17. The recipe-test runtime env is declared 2-3x today (harness.nix cc-ci-run runtimeInputs; nightly-sweep.nix duplicate pyEnv + a divergent list + DEFECT-3 host-PATH patch; host systemPackages git-lfs for the Drone runner) -> they drift (DEFECT-3 = missing bash/git-lfs in the timer). Factor ONE shared env referenced by cc-ci-run, the sweep timer, and host systemPackages; sweep invokes cc-ci-run so env is identical by construction. Queued last (after settings).
# Phase `nixenv`: single-source the harness runtime env (timer + Drone runner share, no duplication)

**Mission (operator-specified 2026-06-17):** the nix runtime environment for the **nightly/weekly sweep
systemd timer** and the **Drone runner** must **share one declaration**, not duplicate their dependency
lists. DEFECT-3 (caught in `canon`) was exactly this divergence: the timer's `runtimeInputs` had drifted
from what recipe tests actually need (missing `bash`, then `git-lfs`), while the Drone runner got those
from the host system PATH. Patching the timer's list treated the symptom; this phase removes the **root
cause** by single-sourcing the env so a dependency can never be present for one path and missing for the
other again.

State files: `STATUS-nixenv.md`, `BACKLOG-nixenv.md`, `REVIEW-nixenv.md`, `JOURNAL-nixenv.md`. DECISIONS.md shared.

## 1. The divergence today (verified 2026-06-17)

Three separate declarations of "what's needed to run a recipe test":

- **`nix/modules/harness.nix`**: `pyEnv = python3.withPackages([pytest playwright])`; `cc-ci-run` wrapper
  with `runtimeInputs = [pyEnv abra docker git coreutils util-linux]`. The Drone exec pipeline runs
  `cc-ci-run runner/run_recipe_ci.py`.
- **`nix/modules/drone-runner.nix`**: runs pipelines with `PATH = /run/current-system/sw/bin` → recipe
  shell-outs (`abra`/`docker`/`git`/**`git-lfs`**/openssl/…) come from **host `environment.systemPackages`**
  (`nix/hosts/*/configuration.nix`, where `git-lfs` was added one-off).
- **`nix/modules/nightly-sweep.nix`**: a **duplicate** `pyEnv` + a **different** `runtimeInputs`
  (`bash abra docker git curl jq gnused gnugrep gnutar coreutils util-linux procps`) + a DEFECT-3 patch
  that prepends the host PATH to reach `git-lfs` etc.

So the same env is declared 2–3× with no shared source → drift (DEFECT-3). `pyEnv` alone is copy-pasted
verbatim in two modules.

## 2. Design: one shared env, referenced everywhere

Define the harness/recipe-test runtime env **once** (e.g. a `nix/modules/harness-env.nix` or a let-binding
in `packages.nix`): the `pyEnv` **and** the full recipe-test tooling set, `abra docker git git-lfs bash
util-linux coreutils curl jq gnused gnugrep gnutar openssl procps …` (the union that recipe tests + the
harness actually shell out to). Then:

1. **`cc-ci-run` (`harness.nix`)** references the shared set for its `runtimeInputs` (so the Drone
   pipeline's `cc-ci-run …` has the full env directly, not via a fragile host-PATH dependency).
2. **The sweep (`nightly-sweep.nix`)** stops building its own `pyEnv`/`runtimeInputs` and instead
   **invokes `cc-ci-run`** (the SAME entrypoint the Drone runner uses), so the python env + tooling are
   identical *by construction*, and the DEFECT-3 host-PATH-prepend patch can be removed/justified.
3. **Host `systemPackages`** (the Drone runner's PATH tooling, both `nix/hosts/cc-ci-hetzner/` and
   `nix/hosts/cc-ci/`) references the **same shared set** rather than a hand-maintained list with a
   one-off `git-lfs`.

Net: there is exactly **one place** that lists the harness's runtime dependencies; adding one (the next
`git-lfs`) propagates atomically to the Drone runner, `cc-ci-run`, and the sweep. The DEFECT-3 class
becomes structurally impossible.

Keep it a faithful refactor: the *resulting* env must be a **superset-or-equal** of every current list (do
not drop anything any path relies on; enumerate all three current lists + host git-lfs and prove the
shared set covers them). Don't over-abstract; one shared definition, three references.

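The superset-or-equal requirement can be checked mechanically. The sketch below is only an illustration: the package names are copied from the lists quoted in sections 1 and 2, `pyEnv` stands in for the Python environment as a single entry, the host list contains only the packages this document names (the real `systemPackages` is longer), and the helper names `dropped` and `drift` are hypothetical.

```python
# Package names as quoted in sections 1 and 2 of this document.
# "pyEnv" stands for python3.withPackages([pytest playwright]).
CURRENT = {
    "harness.nix cc-ci-run": {
        "pyEnv", "abra", "docker", "git", "coreutils", "util-linux",
    },
    "nightly-sweep.nix": {
        "pyEnv", "bash", "abra", "docker", "git", "curl", "jq", "gnused",
        "gnugrep", "gnutar", "coreutils", "util-linux", "procps",
    },
    # Assumption: only the host packages named in this document.
    "host systemPackages": {"abra", "docker", "git", "git-lfs", "openssl"},
}

# The proposed shared set from section 2 (its trailing "…" is left open).
SHARED = {
    "pyEnv", "abra", "docker", "git", "git-lfs", "bash", "util-linux",
    "coreutils", "curl", "jq", "gnused", "gnugrep", "gnutar", "openssl",
    "procps",
}


def dropped(shared, current):
    """Return {list name: packages that list has but the shared set lacks}."""
    return {
        name: sorted(pkgs - shared)
        for name, pkgs in current.items()
        if pkgs - shared
    }


def drift(current):
    """Return {list name: packages another list has but this one lacks}."""
    union = set().union(*current.values())
    return {name: sorted(union - pkgs) for name, pkgs in current.items()}


if __name__ == "__main__":
    # The refactor is faithful only if nothing is dropped.
    assert dropped(SHARED, CURRENT) == {}, dropped(SHARED, CURRENT)
    for name, missing in drift(CURRENT).items():
        print(f"{name}: lacks {', '.join(missing) or 'nothing'}")
```

In practice the three sets would be read from the evaluated Nix config rather than typed by hand; the hand-typed version only shows the shape of the proof.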
## 3. Gates

**M1: refactored + builds.** One shared env definition; `harness.nix`/`cc-ci-run`, `nightly-sweep.nix`,
and both host `configuration.nix` systemPackages reference it; no duplicate `pyEnv`; the sweep invokes
`cc-ci-run` (or otherwise provably shares the exact env). `nixos-rebuild build` succeeds. Add a test/grep
proving **no module declares its own harness dep list** anymore (single source). Adversary cold-verifies:
the shared set is a superset-or-equal of all prior lists (nothing dropped; `bash`, `git-lfs`, `util-linux`,
`curl`, `jq`, `openssl`, playwright browsers, etc. all present for both paths); the sweep and the Drone
runner resolve the **same** tooling; adding a hypothetical dep to the shared set would reach all consumers.

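The M1 "test/grep" could look roughly like the following. It is a sketch under assumptions: the shared definition is taken to live in `harness-env.nix` (one of the two options section 2 floats), and "declares its own dep list" is approximated as a literal `runtimeInputs = [` list or a local `python3.withPackages` call. The names `SHARED_FILE`, `OWN_LIST`, and `offenders` are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the single source of truth is this file under nix/modules/.
SHARED_FILE = "harness-env.nix"

# Assumption: an own dep list appears as a literal list after runtimeInputs,
# or as a locally built Python environment.
OWN_LIST = re.compile(r"runtimeInputs\s*=\s*\[|python3\.withPackages")


def offenders(modules_dir):
    """Return (file name, line number) pairs that declare their own env."""
    hits = []
    for path in sorted(Path(modules_dir).glob("*.nix")):
        if path.name == SHARED_FILE:
            continue  # the shared definition is allowed to list packages
        lines = path.read_text().splitlines()
        for lineno, line in enumerate(lines, start=1):
            if OWN_LIST.search(line):
                hits.append((path.name, lineno))
    return hits


def test_single_source():
    # Run from the repository root, e.g. via pytest.
    assert offenders("nix/modules") == []
```

A module that legitimately wraps an unrelated tool with its own literal `runtimeInputs` would trip this check, so a real version probably needs an explicit allowlist. A reference such as `runtimeInputs = harnessEnv.packages` (binding name hypothetical) passes, because no literal list follows the `=`.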
**M2: deployed + parity proven live.** `nixos-rebuild switch` deployed (verify host health after:
`systemctl --failed`, reconcile oneshots active, `nightly-sweep.timer` active, endpoints 200). Prove
**parity in production**: the tooling available to a Drone-run recipe test and to a real timer sweep fire is
identical (e.g. `git-lfs`, `bash`, `script`/util-linux resolve in both); re-run the DEFECT-3 witness
(gitea `test_lfs_roundtrip`) green under BOTH the Drone path and a real timer fire, from the shared env.
A canon-style sweep still promotes/SKIPs correctly under the unified env (no regression to `canon`'s
result). Fresh Adversary PASS on both milestones → `## DONE`.

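One way to gather parity evidence for M2 is to snapshot what each tool resolves to under both execution paths and compare the snapshots. The sketch below assumes this; the witness list is taken from tools named in this document, and `WITNESS_TOOLS`, `resolve`, and `parity_diff` are hypothetical names. Paths are passed through `os.path.realpath` so that a symlink under `/run/current-system/sw/bin` and a direct store path compare equal when they point at the same file.

```python
import json
import os
import shutil
import sys

# Tools this document names as parity witnesses; extend as needed.
WITNESS_TOOLS = [
    "git-lfs", "bash", "script", "git", "docker", "abra",
    "curl", "jq", "openssl",
]


def resolve(tools, path=None):
    """Map each tool to the real file it resolves to (None if absent).

    With path=None, shutil.which searches the current PATH.
    """
    found = {}
    for tool in tools:
        hit = shutil.which(tool, path=path)
        found[tool] = os.path.realpath(hit) if hit else None
    return found


def parity_diff(drone, timer):
    """Return {tool: (drone value, timer value)} for tools that differ."""
    names = sorted(set(drone) | set(timer))
    return {
        name: (drone.get(name), timer.get(name))
        for name in names
        if drone.get(name) != timer.get(name)
    }


if __name__ == "__main__":
    # Run once inside a Drone step and once inside the timer unit,
    # save both outputs, then compare them with parity_diff().
    json.dump(resolve(WITNESS_TOOLS), sys.stdout, indent=2, sort_keys=True)
```

An empty `parity_diff` result is supporting evidence only; the gate itself is still the DEFECT-3 witness passing under both paths.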
## 4. Guardrails

- **High blast radius: do not break the Drone runner or the sweep.** The harness runtime env underpins
  ALL recipe CI; a dropped dependency breaks every test. The shared set must be ≥ the union of today's
  lists; prove it before switching.
- **Faithful refactor, not a redesign**: one shared definition, referenced from the existing modules; no
  behavior change beyond removing duplication (and the now-redundant DEFECT-3 patch).
- **Host change (nixos-rebuild)**: loops may deploy if clean and **verify host health after**; else file
  for the orchestrator. Both host configs (`cc-ci-hetzner`, `cc-ci`) kept consistent.
- Never weaken a test. Commit author `autonomic-bot <autonomic-bot@noreply.git.autonomic.zone>`; push
  every commit.

## 5. Definition of Done

The harness/recipe-test runtime env is declared in **one** place and referenced by `cc-ci-run` (Drone
runner entrypoint), the nightly/weekly sweep timer, and the host `systemPackages`: no duplicated `pyEnv`,
no divergent `runtimeInputs`, the DEFECT-3 host-PATH patch removed/subsumed. Proven that the shared set is
a superset-or-equal of all prior lists (nothing dropped), deployed via `nixos-rebuild` with host verified
healthy, and parity demonstrated live (the same tooling, incl. `git-lfs`, resolves for both a Drone
recipe test and a real timer sweep fire; the DEFECT-3 witness stays green under both). A future dependency
addition reaches all consumers from the single source. M1 + M2 fresh Adversary PASSes in REVIEW-nixenv.md.