From bd16123865d1a40e8358df31eaf8113a36942b92 Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Wed, 17 Jun 2026 15:38:45 +0000
Subject: [PATCH] =?UTF-8?q?plan:=20queue=20nixenv=20=E2=80=94=20single-sou?=
 =?UTF-8?q?rce=20the=20harness=20runtime=20env=20(timer=20+=20Drone=20runn?=
 =?UTF-8?q?er=20share=20deps;=20root-cause=20fix=20for=20DEFECT-3=20drift,?=
 =?UTF-8?q?=20opus)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Operator 2026-06-17. The recipe-test runtime env is declared 2-3x today (harness.nix cc-ci-run runtimeInputs; nightly-sweep.nix duplicate pyEnv + a divergent list + DEFECT-3 host-PATH patch; host systemPackages git-lfs for the Drone runner) -> they drift (DEFECT-3 = missing bash/git-lfs in the timer). Factor ONE shared env referenced by cc-ci-run, the sweep timer, and host systemPackages; sweep invokes cc-ci-run so env is identical by construction. Queued last (after settings).
---
 cc-ci-plan/agents.toml | 2 +
 .../plan-phase-nixenv-shared-runtime-env.md | 91 +++++++++++++++++++
 2 files changed, 93 insertions(+)
 create mode 100644 cc-ci-plan/plan-phase-nixenv-shared-runtime-env.md

diff --git a/cc-ci-plan/agents.toml b/cc-ci-plan/agents.toml
index 9c49251..531bd0d 100644
--- a/cc-ci-plan/agents.toml
+++ b/cc-ci-plan/agents.toml
@@ -166,4 +166,6 @@ phases = [
  { id = "dash", plan = "plan-phase-dash-recipe-history.md", status = "STATUS-dash.md", models = { builder = "claude-opus-4-8", adversary = "claude-opus-4-8" } },
  # CI-server settings.toml + SKIP_CANONICALS_FOR_UPGRADE + release-tag-first no-canonical fallback (opus) — see plan-phase-settings-*.md (operator 2026-06-17)
  { id = "settings", plan = "plan-phase-settings-ci-server-config.md", status = "STATUS-settings.md", models = { builder = "claude-opus-4-8", adversary = "claude-opus-4-8" } },
+ # single-source the harness runtime env so the sweep timer + Drone runner SHARE deps (no duplication) — root-cause fix for DEFECT-3 drift (opus) — see plan-phase-nixenv-*.md (operator 2026-06-17)
+ { id = "nixenv", plan = "plan-phase-nixenv-shared-runtime-env.md", status = "STATUS-nixenv.md", models = { builder = "claude-opus-4-8", adversary = "claude-opus-4-8" } },
 ]
diff --git a/cc-ci-plan/plan-phase-nixenv-shared-runtime-env.md b/cc-ci-plan/plan-phase-nixenv-shared-runtime-env.md
new file mode 100644
index 0000000..4025040
--- /dev/null
+++ b/cc-ci-plan/plan-phase-nixenv-shared-runtime-env.md
@@ -0,0 +1,91 @@
+# Phase `nixenv` — single-source the harness runtime env (timer + Drone runner share, no duplication)
+
+**Mission (operator-specified 2026-06-17):** the nix runtime environment for the **nightly/weekly sweep
+systemd timer** and the **Drone runner** must **share one declaration**, not duplicate their dependency
+lists. DEFECT-3 (caught in `canon`) was exactly this divergence: the timer's `runtimeInputs` had drifted
+from what recipe tests actually need (missing `bash`, then `git-lfs`), while the Drone runner got those
+from the host system PATH. Patching the timer's list treated the symptom; this phase removes the **root
+cause** by single-sourcing the env so a dependency can never be present for one path and missing for the
+other again.
+
+State files: `STATUS-nixenv.md`, `BACKLOG-nixenv.md`, `REVIEW-nixenv.md`, `JOURNAL-nixenv.md`. DECISIONS.md shared.
+
+## 1. The divergence today (verified 2026-06-17)
+
+Three separate declarations of "what's needed to run a recipe test":
+- **`nix/modules/harness.nix`** — `pyEnv = python3.withPackages([pytest playwright])`; `cc-ci-run` wrapper
+ with `runtimeInputs = [pyEnv abra docker git coreutils util-linux]`. The Drone exec pipeline runs
+ `cc-ci-run runner/run_recipe_ci.py`.
+- **`nix/modules/drone-runner.nix`** — runs pipelines with `PATH = /run/current-system/sw/bin` → recipe
+ shell-outs (`abra`/`docker`/`git`/**`git-lfs`**/openssl/…) come from **host `environment.systemPackages`**
+ (`nix/hosts/*/configuration.nix`, where `git-lfs` was added one-off).
+- **`nix/modules/nightly-sweep.nix`** — a **duplicate** `pyEnv` + a **different** `runtimeInputs`
+ (`bash abra docker git curl jq gnused gnugrep gnutar coreutils util-linux procps`) + a DEFECT-3 patch
+ that prepends the host PATH to reach `git-lfs` etc.
+
+So the same env is declared 2–3× with no shared source → drift (DEFECT-3). `pyEnv` alone is copy-pasted
+verbatim in two modules.
+
+## 2. Design — one shared env, referenced everywhere
+
+Define the harness/recipe-test runtime env **once** (e.g. a `nix/modules/harness-env.nix` or a let-binding
+in `packages.nix`): the `pyEnv` **and** the full recipe-test tooling set — `abra docker git git-lfs bash
+util-linux coreutils curl jq gnused gnugrep gnutar openssl procps …` (the union that recipe tests + the
+harness actually shell out to). Then:
+
+1. **`cc-ci-run` (`harness.nix`)** references the shared set for its `runtimeInputs` (so the Drone
+ pipeline's `cc-ci-run …` has the full env directly, not via a fragile host-PATH dependency).
+2. **The sweep (`nightly-sweep.nix`)** stops building its own `pyEnv`/`runtimeInputs` and instead
+ **invokes `cc-ci-run`** (the SAME entrypoint the Drone runner uses) — so the python env + tooling are
+ identical *by construction*, and the DEFECT-3 host-PATH-prepend patch can be removed/justified.
+3. **Host `systemPackages`** (the Drone runner's PATH tooling, both `nix/hosts/cc-ci-hetzner/` and
+ `nix/hosts/cc-ci/`) references the **same shared set** rather than a hand-maintained list with a
+ one-off `git-lfs`.
+
+Net: there is exactly **one place** that lists the harness's runtime dependencies; adding one (the next
+`git-lfs`) propagates atomically to the Drone runner, `cc-ci-run`, and the sweep. The DEFECT-3 class
+becomes structurally impossible.
+
+Keep it a faithful refactor: the *resulting* env must be a **superset-or-equal** of every current list (do
+not drop anything any path relies on — enumerate all three current lists + host git-lfs and prove the
+shared set covers them). Don't over-abstract; one shared definition, three references.
+
+## 3. Gates
+
+**M1 — refactored + builds.** One shared env definition; `harness.nix`/`cc-ci-run`, `nightly-sweep.nix`,
+and both host `configuration.nix` systemPackages reference it; no duplicate `pyEnv`; the sweep invokes
+`cc-ci-run` (or otherwise provably shares the exact env). `nixos-rebuild build` succeeds. A test/grep
+proving **no module declares its own harness dep list** anymore (single source). Adversary cold-verifies:
+the shared set is a superset-or-equal of all prior lists (nothing dropped — `bash`, `git-lfs`, `util-linux`,
+`curl`, `jq`, `openssl`, playwright browsers, etc. all present for both paths); the sweep and the Drone
+runner resolve the **same** tooling; adding a hypothetical dep to the shared set would reach all consumers.
+
+**M2 — deployed + parity proven live.** `nixos-rebuild switch` deployed (verify host health after:
+`systemctl --failed`, reconcile oneshots active, `nightly-sweep.timer` active, endpoints 200). Prove
+**parity in production**: the tooling available to a Drone-run recipe test and to a real timer sweep fire is
+identical (e.g. `git-lfs`, `bash`, `script`/util-linux resolve in both) — re-run the DEFECT-3 witness
+(gitea `test_lfs_roundtrip`) green under BOTH the Drone path and a real timer fire, from the shared env.
+A canon-style sweep still promotes/SKIPs correctly under the unified env (no regression to `canon`'s
+result). Fresh Adversary PASS on both milestones → `## DONE`.
+
+## 4. Guardrails
+
+- **High blast radius — do not break the Drone runner or the sweep.** The harness runtime env underpins
+ ALL recipe CI; a dropped dependency breaks every test. The shared set must be ≥ the union of today's
+ lists; prove it before switching.
+- **Faithful refactor, not a redesign** — one shared definition, referenced from the existing modules; no
+ behavior change beyond removing duplication (and the now-redundant DEFECT-3 patch).
+- **Host change (nixos-rebuild)** — loops may deploy if clean and **verify host health after**; else file
+ for the orchestrator. Both host configs (`cc-ci-hetzner`, `cc-ci`) kept consistent.
+- Never weaken a test. Commit author `autonomic-bot `; push
+ every commit.
+
+## 5. Definition of Done
+
+The harness/recipe-test runtime env is declared in **one** place and referenced by `cc-ci-run` (Drone
+runner entrypoint), the nightly/weekly sweep timer, and the host `systemPackages` — no duplicated `pyEnv`,
+no divergent `runtimeInputs`, the DEFECT-3 host-PATH patch removed/subsumed. Proven that the shared set is
+a superset-or-equal of all prior lists (nothing dropped), deployed via `nixos-rebuild` with host verified
+healthy, and parity demonstrated live (the same tooling — incl. `git-lfs` — resolves for both a Drone
+recipe test and a real timer sweep fire; the DEFECT-3 witness stays green under both). A future dependency
+addition reaches all consumers from the single source. M1 + M2 fresh Adversary PASSes in REVIEW-nixenv.md.
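
Review note: the plan's "superset-or-equal" requirement (sections 2 and 4, and gate M1) can be checked mechanically before any `nixos-rebuild switch`. The sketch below is illustrative only and is not part of the patch. The package labels are transcribed from sections 1 and 2 of the plan; the constant names and the helper `missing_from_shared` are hypothetical, and the shared list in the plan ends in an ellipsis, so the set here should be read as a lower bound rather than the final env.

```python
# Dependency lists transcribed from section 1 of the plan. These are plain
# labels for comparison, not resolved Nix attributes.
HARNESS_RUNTIME_INPUTS = {"pyEnv", "abra", "docker", "git", "coreutils", "util-linux"}
SWEEP_RUNTIME_INPUTS = {
    "pyEnv", "bash", "abra", "docker", "git", "curl", "jq", "gnused",
    "gnugrep", "gnutar", "coreutils", "util-linux", "procps",
}
# The only host-level extra the plan names explicitly (others are elided there).
HOST_ONE_OFFS = {"git-lfs"}

# Proposed shared set from section 2 (the plan's list is open-ended).
SHARED_ENV = {
    "pyEnv", "abra", "docker", "git", "git-lfs", "bash", "util-linux",
    "coreutils", "curl", "jq", "gnused", "gnugrep", "gnutar", "openssl", "procps",
}


def missing_from_shared(shared, *prior_lists):
    """Return every dependency that some prior list has but the shared set lacks."""
    required = set().union(*prior_lists)
    return required - shared


if __name__ == "__main__":
    missing = missing_from_shared(
        SHARED_ENV, HARNESS_RUNTIME_INPUTS, SWEEP_RUNTIME_INPUTS, HOST_ONE_OFFS
    )
    # An empty result means the shared set covers every current list.
    print(sorted(missing))
```

In practice the inputs would be extracted from the evaluated Nix modules rather than typed by hand; the hand-typed sets only show the shape of the comparison the Adversary is asked to cold-verify.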