plan: queue nixenv — single-source the harness runtime env (timer + Drone runner share deps; root-cause fix for DEFECT-3 drift, opus)

Operator 2026-06-17. The recipe-test runtime env is declared 2-3x today
(harness.nix cc-ci-run runtimeInputs; nightly-sweep.nix duplicate pyEnv + a
divergent list + DEFECT-3 host-PATH patch; host systemPackages git-lfs for the
Drone runner) -> they drift (DEFECT-3 = missing bash/git-lfs in the timer).
Factor ONE shared env referenced by cc-ci-run, the sweep timer, and host
systemPackages; sweep invokes cc-ci-run so env is identical by construction.
Queued last (after settings).
2026-06-17 15:38:45 +00:00
parent f7825f8494
commit bd16123865
2 changed files with 93 additions and 0 deletions


@@ -166,4 +166,6 @@ phases = [
{ id = "dash", plan = "plan-phase-dash-recipe-history.md", status = "STATUS-dash.md", models = { builder = "claude-opus-4-8", adversary = "claude-opus-4-8" } },
# CI-server settings.toml + SKIP_CANONICALS_FOR_UPGRADE + release-tag-first no-canonical fallback (opus) — see plan-phase-settings-*.md (operator 2026-06-17)
{ id = "settings", plan = "plan-phase-settings-ci-server-config.md", status = "STATUS-settings.md", models = { builder = "claude-opus-4-8", adversary = "claude-opus-4-8" } },
# single-source the harness runtime env so the sweep timer + Drone runner SHARE deps (no duplication) — root-cause fix for DEFECT-3 drift (opus) — see plan-phase-nixenv-*.md (operator 2026-06-17)
{ id = "nixenv", plan = "plan-phase-nixenv-shared-runtime-env.md", status = "STATUS-nixenv.md", models = { builder = "claude-opus-4-8", adversary = "claude-opus-4-8" } },
]


@@ -0,0 +1,91 @@
# Phase `nixenv` — single-source the harness runtime env (timer + Drone runner share, no duplication)
**Mission (operator-specified 2026-06-17):** the nix runtime environment for the **nightly/weekly sweep
systemd timer** and the **Drone runner** must **share one declaration**, not duplicate their dependency
lists. DEFECT-3 (caught in `canon`) was exactly this divergence: the timer's `runtimeInputs` had drifted
from what recipe tests actually need (missing `bash`, then `git-lfs`), while the Drone runner got those
from the host system PATH. Patching the timer's list treated the symptom; this phase removes the **root
cause** by single-sourcing the env so a dependency can never be present for one path and missing for the
other again.
State files: `STATUS-nixenv.md`, `BACKLOG-nixenv.md`, `REVIEW-nixenv.md`, `JOURNAL-nixenv.md`. DECISIONS.md shared.
## 1. The divergence today (verified 2026-06-17)
Three separate declarations of "what's needed to run a recipe test":
- **`nix/modules/harness.nix`** — `pyEnv = python3.withPackages([pytest playwright])`; `cc-ci-run` wrapper
with `runtimeInputs = [pyEnv abra docker git coreutils util-linux]`. The Drone exec pipeline runs
`cc-ci-run runner/run_recipe_ci.py`.
- **`nix/modules/drone-runner.nix`** — runs pipelines with `PATH = /run/current-system/sw/bin` → recipe
shell-outs (`abra`/`docker`/`git`/**`git-lfs`**/openssl/…) come from **host `environment.systemPackages`**
(`nix/hosts/*/configuration.nix`, where `git-lfs` was added one-off).
- **`nix/modules/nightly-sweep.nix`** — a **duplicate** `pyEnv` + a **different** `runtimeInputs`
(`bash abra docker git curl jq gnused gnugrep gnutar coreutils util-linux procps`) + a DEFECT-3 patch
that prepends the host PATH to reach `git-lfs` etc.
So the same env is declared 2-3× with no shared source → drift (DEFECT-3). `pyEnv` alone is copy-pasted
verbatim in two modules.
## 2. Design — one shared env, referenced everywhere
Define the harness/recipe-test runtime env **once** (e.g. a `nix/modules/harness-env.nix` or a let-binding
in `packages.nix`): the `pyEnv` **and** the full recipe-test tooling set — `abra docker git git-lfs bash
util-linux coreutils curl jq gnused gnugrep gnutar openssl procps …` (the union that recipe tests + the
harness actually shell out to). Then:
1. **`cc-ci-run` (`harness.nix`)** references the shared set for its `runtimeInputs` (so the Drone
pipeline's `cc-ci-run …` has the full env directly, not via a fragile host-PATH dependency).
2. **The sweep (`nightly-sweep.nix`)** stops building its own `pyEnv`/`runtimeInputs` and instead
**invokes `cc-ci-run`** (the SAME entrypoint the Drone runner uses) — so the python env + tooling are
identical *by construction*, and the DEFECT-3 host-PATH-prepend patch can be removed/justified.
3. **Host `systemPackages`** (the Drone runner's PATH tooling, both `nix/hosts/cc-ci-hetzner/` and
`nix/hosts/cc-ci/`) references the **same shared set** rather than a hand-maintained list with a
one-off `git-lfs`.
Net: there is exactly **one place** that lists the harness's runtime dependencies; adding one (the next
`git-lfs`) propagates atomically to the Drone runner, `cc-ci-run`, and the sweep. The DEFECT-3 class
becomes structurally impossible.
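
The three references above might look roughly like the following. This is a minimal sketch, not the actual refactor: it folds the shared definition and all three consumers into one NixOS module for readability, it assumes `abra` is reachable as `pkgs.abra` (e.g. via an overlay), it assumes `cc-ci-run` simply runs `python3` on its arguments, and the sweep script path `runner/sweep.py` is hypothetical.

```nix
# Sketch only: one module standing in for harness-env.nix, harness.nix,
# nightly-sweep.nix and the host configuration.nix.
{ pkgs, ... }:
let
  # The single shared definition (would live in harness-env.nix or packages.nix).
  harnessEnv = rec {
    # Same python package set as the pyEnv that is duplicated today.
    pyEnv = pkgs.python3.withPackages (ps: [ ps.pytest ps.playwright ]);

    # Tools that recipe tests and the harness shell out to.
    tools = with pkgs; [
      abra # assumption: provided by an overlay, not verified here
      docker git git-lfs bash util-linux coreutils
      curl jq gnused gnugrep gnutar openssl procps
    ];

    all = [ pyEnv ] ++ tools;
  };

  # Reference 1: cc-ci-run takes its runtimeInputs from the shared set.
  cc-ci-run = pkgs.writeShellApplication {
    name = "cc-ci-run";
    runtimeInputs = harnessEnv.all;
    # Assumption: the wrapper just runs python on whatever it is given.
    text = ''
      exec python3 "$@"
    '';
  };
in
{
  # Reference 3: host PATH tooling for the Drone runner comes from the same set.
  environment.systemPackages = harnessEnv.all ++ [ cc-ci-run ];

  # Reference 2: the sweep calls the same entrypoint as the Drone pipeline
  # instead of building its own pyEnv/runtimeInputs.
  systemd.services.nightly-sweep.serviceConfig.ExecStart =
    "${cc-ci-run}/bin/cc-ci-run runner/sweep.py"; # hypothetical script path
}
```

In this shape, adding a package to `harnessEnv.tools` is the only edit needed for it to reach `cc-ci-run`, the sweep, and the host PATH.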
Keep it a faithful refactor: the *resulting* env must be a **superset-or-equal** of every current list (do
not drop anything any path relies on — enumerate all three current lists + host git-lfs and prove the
shared set covers them). Don't over-abstract; one shared definition, three references.
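
One way to make the superset-or-equal proof mechanical is a small pure Nix expression over package names. The sketch below transcribes the lists from section 1 as strings (with `"pyEnv"` as a placeholder name for the python env) and fails evaluation if the shared set misses anything. It only covers what is spelled out in this plan: the full host `systemPackages` list and the elided tail of the shared set are not reproduced here and would need to be added from the real modules.

```nix
# Evaluate with: nix-instantiate --eval superset-check.nix   (file name is hypothetical)
let
  # harness.nix cc-ci-run runtimeInputs, as listed in section 1.
  harnessInputs = [ "pyEnv" "abra" "docker" "git" "coreutils" "util-linux" ];

  # nightly-sweep.nix pyEnv + runtimeInputs, as listed in section 1.
  sweepInputs = [
    "pyEnv" "bash" "abra" "docker" "git" "curl" "jq"
    "gnused" "gnugrep" "gnutar" "coreutils" "util-linux" "procps"
  ];

  # Host systemPackages entry named in this plan (the rest of that list is not shown here).
  hostExtras = [ "git-lfs" ];

  # Proposed shared set from section 2 (named items only).
  shared = [
    "pyEnv" "abra" "docker" "git" "git-lfs" "bash" "util-linux" "coreutils"
    "curl" "jq" "gnused" "gnugrep" "gnutar" "openssl" "procps"
  ];

  required = harnessInputs ++ sweepInputs ++ hostExtras;

  # Every required name that the shared set does not contain.
  missing = builtins.filter (name: !(builtins.elem name shared)) required;
in
if missing == [ ]
then "ok: shared set covers every prior list"
else throw "shared set is missing: ${builtins.concatStringsSep ", " missing}"
```

With the lists as transcribed, `missing` is empty and the expression evaluates to the `ok` string; dropping, say, `"bash"` from `shared` turns it into an evaluation error that names the dropped package.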
## 3. Gates
**M1 — refactored + builds.** One shared env definition; `harness.nix`/`cc-ci-run`, `nightly-sweep.nix`,
and both host `configuration.nix` systemPackages reference it; no duplicate `pyEnv`; the sweep invokes
`cc-ci-run` (or otherwise provably shares the exact env). `nixos-rebuild build` succeeds. A test/grep
proving **no module declares its own harness dep list** anymore (single source). Adversary cold-verifies:
the shared set is a superset-or-equal of all prior lists (nothing dropped — `bash`, `git-lfs`, `util-linux`,
`curl`, `jq`, `openssl`, playwright browsers, etc. all present for both paths); the sweep and the Drone
runner resolve the **same** tooling; adding a hypothetical dep to the shared set would reach all consumers.
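
The "test/grep" part of M1 could be expressed as a check derivation along these lines. This is a heuristic sketch under stated assumptions: the shared definition is assumed to live in a file named `harness-env.nix`, and any other module that still calls `withPackages` or spells out a literal `runtimeInputs = [ … ]` list is treated as an offender. A module with an unrelated wrapper and its own legitimate `runtimeInputs` list would be flagged too and would need an explicit exemption.

```nix
# Hypothetical check; build with: nix-build single-source-check.nix
{ pkgs ? import <nixpkgs> { }
, modulesDir ? ./nix/modules        # module directory named in section 1
, sharedFile ? "harness-env.nix"    # assumed name of the shared definition
}:
pkgs.runCommand "harness-env-single-source" { } ''
  # Modules, other than the shared one, that still build a python env
  # or declare a literal runtimeInputs list.
  offenders=$(
    grep -rlE 'withPackages|runtimeInputs[[:space:]]*=[[:space:]]*\[' ${modulesDir} \
      | grep -vF "/${sharedFile}" || true
  )

  if [ -n "$offenders" ]; then
    echo "modules still declaring their own harness deps:" >&2
    echo "$offenders" >&2
    exit 1
  fi

  # Nothing found: the build succeeds with an empty marker output.
  touch $out
''
```

A consumer that writes `runtimeInputs = harnessEnv.all;` passes, because the pattern only matches a list literal opening with `[`.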
**M2 — deployed + parity proven live.** `nixos-rebuild switch` deployed (verify host health after:
`systemctl --failed`, reconcile oneshots active, `nightly-sweep.timer` active, endpoints 200). Prove
**parity in production**: the tooling available to a Drone-run recipe test and to a real timer sweep fire is
identical (e.g. `git-lfs`, `bash`, `script`/util-linux resolve in both) — re-run the DEFECT-3 witness
(gitea `test_lfs_roundtrip`) green under BOTH the Drone path and a real timer fire, from the shared env.
A canon-style sweep still promotes/SKIPs correctly under the unified env (no regression to `canon`'s
result). Fresh Adversary PASS on both milestones → `## DONE`.
## 4. Guardrails
- **High blast radius — do not break the Drone runner or the sweep.** The harness runtime env underpins
ALL recipe CI; a dropped dependency breaks every test. The shared set must be ≥ the union of today's
lists; prove it before switching.
- **Faithful refactor, not a redesign** — one shared definition, referenced from the existing modules; no
behavior change beyond removing duplication (and the now-redundant DEFECT-3 patch).
- **Host change (nixos-rebuild)** — loops may deploy if clean and **verify host health after**; else file
for the orchestrator. Both host configs (`cc-ci-hetzner`, `cc-ci`) kept consistent.
- Never weaken a test. Commit author `autonomic-bot <autonomic-bot@noreply.git.autonomic.zone>`; push
every commit.
## 5. Definition of Done
The harness/recipe-test runtime env is declared in **one** place and referenced by `cc-ci-run` (Drone
runner entrypoint), the nightly/weekly sweep timer, and the host `systemPackages` — no duplicated `pyEnv`,
no divergent `runtimeInputs`, the DEFECT-3 host-PATH patch removed/subsumed. Proven that the shared set is
a superset-or-equal of all prior lists (nothing dropped), deployed via `nixos-rebuild` with host verified
healthy, and parity demonstrated live (the same tooling — incl. `git-lfs` — resolves for both a Drone
recipe test and a real timer sweep fire; the DEFECT-3 witness stays green under both). A future dependency
addition reaches all consumers from the single source. M1 + M2 fresh Adversary PASSes in REVIEW-nixenv.md.