# JOURNAL — phase `nixenv` (Builder)

## 2026-06-17 — M1: single-source the harness runtime env

### Why this design
The phase plan §2 wants ONE definition of "what's needed to run a recipe test", referenced from
three places, so DEFECT-3 (a dep present for one path, missing for another) becomes structurally
impossible. I put the single source in `nix/modules/packages.nix` because it is the existing
"shared pkgs" overlay module already imported by both host configs, so `pkgs.ccciRuntimeTools`
and `pkgs.cc-ci-run` are reachable from every module/host without a fragile cross-module `let`.

Three overlay defs:
- `ccciPyEnv` (let-bound, internal): `python3.withPackages [pytest playwright]`, the ONLY pyEnv now.
- `ccciRuntimeTools` (overlay attr): the union tool set.
- `cc-ci-run` (overlay attr): `writeShellApplication` with `runtimeInputs = [ccciPyEnv] ++ ccciRuntimeTools`.

Consumers:
- `harness.nix` → `environment.systemPackages = [ pkgs.cc-ci-run ]` (installs the entrypoint).
- `nightly-sweep.nix` → wrapper execs `cc-ci-run` (same binary the Drone pipeline runs), so pyEnv +
  tooling + PLAYWRIGHT env are identical to the Drone path by construction. Dropped: the duplicate
  pyEnv, the parallel `runtimeInputs` tool list, and the DEFECT-3 `export PATH=/run/current-system/sw/bin…`
  prepend; git-lfs/bash/util-linux/openssl now come from cc-ci-run's runtimeInputs.
- both host `configuration.nix` → `systemPackages = pkgs.ccciRuntimeTools ++ [ pkgs.openssh ]`.

### Why the union is a superset (nothing dropped)
- old cc-ci-run: `abra docker git coreutils util-linux` ⊂ set.
- old sweep: `bash abra docker git curl jq gnused gnugrep gnutar coreutils util-linux procps` ⊂ set;
  its host-PATH-derived git-lfs/openssl are now EXPLICIT in the set.
- old host PATH: `curl git jq` (+ git-lfs on hetzner only) ⊂ set; `openssh` kept as host-only add.
- pyEnv (python3+pytest+playwright) + playwright browsers (via PLAYWRIGHT_BROWSERS_PATH) preserved.

Additions vs any single prior list: `git-lfs`, `openssl` (plan §2). The `cc-ci` host GAINS git-lfs,
killing the one-off hetzner-only divergence: both host configs are now byte-identical.
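
The superset claim can be checked mechanically. Below is a minimal sketch, assuming the union is exactly the tools named in this entry (the old lists plus `git-lfs` and `openssl`); the authoritative list is `ccciRuntimeTools` in `packages.nix` and may contain more. The `is_subset` helper and the variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical reconstruction of the union from the lists in this journal entry;
# the authoritative list is ccciRuntimeTools in nix/modules/packages.nix.
UNION="bash abra docker git curl jq gnused gnugrep gnutar coreutils util-linux procps git-lfs openssl"

OLD_RUN="abra docker git coreutils util-linux"
OLD_SWEEP="bash abra docker git curl jq gnused gnugrep gnutar coreutils util-linux procps"
OLD_HOST="curl git jq git-lfs"   # git-lfs was present on hetzner only

# is_subset "<candidate words>" "<superset words>"
# Prints any candidate word missing from the superset; returns 1 if something is missing.
is_subset() {
    missing=""
    for tool in $1; do
        case " $2 " in
            *" $tool "*) ;;                   # present in the superset
            *) missing="$missing $tool" ;;    # would be a dropped tool
        esac
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    return 0
}

for list in "$OLD_RUN" "$OLD_SWEEP" "$OLD_HOST"; do
    is_subset "$list" "$UNION" && echo "ok: $list"
done
```

Any tool dropped from the union would show up as a `missing:` line, which is the regression the superset argument rules out.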

### Why writeShellApplication makes this work
`writeShellApplication` emits `export PATH="<runtimeInputs>:$PATH"` (confirmed on the live wrapper),
so cc-ci-run's full tool set is the PATH *prefix* regardless of caller. Under Drone the inherited
suffix is `/run/current-system/sw/bin:/run/wrappers/bin`; under the sweep it's the systemd-minimal
PATH. The harness tools all resolve from the shared prefix either way, which is the parity the
plan wants. The host `systemPackages` reference is the belt-and-suspenders path for direct
`.drone.yml` shell-outs (`abra --version`, `docker info`) that don't go through cc-ci-run.
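
The prefix behaviour can be reproduced without Nix. This is a minimal sketch, assuming a stand-in wrapper that does what was observed on the live wrapper (`export PATH="<runtimeInputs>:$PATH"`). The temp directories, the stub `git-lfs`, and the `run_wrapper` helper are hypothetical stand-ins, not the real store paths.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Stand-in for a runtimeInputs bin dir that contains git-lfs.
mkdir -p "$work/inputs/bin" "$work/empty"
printf '#!/bin/sh\necho stub-git-lfs\n' > "$work/inputs/bin/git-lfs"
chmod +x "$work/inputs/bin/git-lfs"

# Stand-in for the generated wrapper: prepend the tool dir, then resolve git-lfs.
cat > "$work/wrapper" <<EOF
#!/bin/sh
export PATH="$work/inputs/bin:\$PATH"
command -v git-lfs
EOF
chmod +x "$work/wrapper"

# run_wrapper <caller PATH>: invoke the wrapper with the given inherited PATH only.
run_wrapper() {
    env -i PATH="$1" "$work/wrapper"
}

# A "Drone-like" caller with a rich PATH vs a "systemd-like" caller with a bare one.
rich=$(run_wrapper "/usr/local/bin:/usr/bin:/bin")
minimal=$(run_wrapper "$work/empty")

echo "rich caller    -> $rich"
echo "minimal caller -> $minimal"
```

Both callers resolve `git-lfs` from the wrapper's own prefix, so the inherited suffix only matters for tools the wrapper does not ship.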

### buildEnv collision watch (resolved)
Worry: adding coreutils/util-linux/procps/bash/gnu* to host `systemPackages` could collide with the
NixOS base `requiredPackages`. It did not: base requiredPackages are `lowPrio`, so the normal-prio
additions override cleanly. Both `#cc-ci` and `#cc-ci-hetzner` built with no collision error.

### Note on other modules' tool lists
`backupbot/docker-prune/drone/proxy/warm-keycloak.nix` still list gnused/gnugrep/etc. in their OWN
`runtimeInputs`. Those are independent reconcile-service scripts, never part of the harness/recipe-test
env, never part of the DEFECT-3 divergence. Single-sourcing is scoped to the harness env
(pyEnv + recipe-test tooling consumed by cc-ci-run / sweep / host PATH), which is now packages.nix only.

### Verification (local; the dirty tree needs `?submodules=1` because `secrets/` is a submodule)
- `nixos-rebuild build --flake '.?submodules=1#cc-ci-hetzner'` → built `nixos-system-…dhmpm232…`.
- `nixos-rebuild build --flake '.?submodules=1#cc-ci'` → built OK.
- cc-ci-run store `zxlx9jnylh7la5m48bsqb1wfm5l9r0bd`; PATH carries all 15 tools incl. git-lfs-3.6.1 + openssl-3.3.3.
- sweep wrapper `gh02w1kc…` execs the SAME `zxlx9j…/bin/cc-ci-run`.
- cc-ci host sw/bin now lists git-lfs + openssl (was missing git-lfs pre-refactor).
- `grep -rn withPackages nix/` → 1 hit (packages.nix:17).
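
The final `grep` check could be turned into a repeatable guard. This is a minimal sketch that runs against a throwaway tree rather than the real `nix/` directory. The helper names `count_hits` and `assert_single_source` are hypothetical, and the demo file contents are illustrative lines, not the real module text.

```shell
#!/bin/sh
set -eu

# count_hits <pattern> <dir>: number of matching lines under <dir> (0 if none).
count_hits() {
    grep -rn -- "$1" "$2" | wc -l | tr -d ' '
}

# assert_single_source <dir>: fail unless exactly one withPackages line exists.
assert_single_source() {
    hits=$(count_hits 'withPackages' "$1")
    if [ "$hits" -ne 1 ]; then
        echo "expected 1 withPackages hit under $1, found $hits" >&2
        return 1
    fi
    echo "single pyEnv definition confirmed under $1"
}

# Demo on a throwaway tree; in the repo the call would be: assert_single_source nix/
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/modules"
# Illustrative contents only (not copied from the real modules).
echo 'ccciPyEnv = python3.withPackages (ps: [ ps.pytest ps.playwright ]);' > "$demo/modules/packages.nix"
echo 'environment.systemPackages = [ pkgs.cc-ci-run ];' > "$demo/modules/harness.nix"

assert_single_source "$demo"
```

If a second module reintroduces its own Python env, the count rises above one and the guard fails.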

## 2026-06-17T18:17Z — M2 claim (both live parity witnesses green)

### Drone-path witness (build #871)
Why REF=357926f2 PR=1 SRC=recipe-maintainers/gitea: this is the lfs-plain-gitea capstone ref (the
gtea-phase Build #685 ref). PR #1 is now merged so `compose.lfs.yml` is also on main, but pinning the
PR head guarantees `_lfs_enabled()` is true (`compose.lfs.yml` in checkout + RECIPE=gitea) so the LFS
test RUNS rather than skips. `fetch_recipe` takes the SRC+REF mirror-clone path; EXTRA_ENV adds
`compose.lfs.yml` to install+custom tiers so the deployed gitea has LFS on for the round-trip. Triggered
via the Drone API with the bridge's drone token (kept on-host). Build went green in ~3 min;
`test_lfs_roundtrip` PASSED. The build ran the SAME cc-ci-run store path the timer sweep execs, so the
two witnesses prove parity by both construction (M1) and observation (M2).

### Why the timer fire is the harder witness
The systemd unit PATH is systemd-minimal (coreutils/findutils/gnugrep/gnused/systemd): NO git-lfs,
NO /run/current-system/sw/bin. So a green LFS test there can ONLY come from cc-ci-run's runtimeInputs
prepending git-lfs-3.6.1 to PATH. Confirmed by reading `/proc/<run_recipe_ci pid>/environ` live: PATH
starts with the cc-ci-run tool prefix incl. git-lfs. This is exactly the DEFECT-3 condition the phase
set out to make structurally impossible.

### GREEN-BUT-PROMOTE-FAILED is not mine
I confirmed that the gitea promote-fail (`abra app deploy warm-gitea -o -n` → "already
deployed") is pre-existing: it appears identically in the two pre-deploy sweep fires (14:28Z, 15:56Z,
OLD env) and the promote path (`runner/nightly_sweep.py`) is unchanged by nixenv (last touched canon
f94de22). It's an abra deploy-idempotency limitation on the persistent warm canonical (warm-gitea up
since 08:39Z); it is non-fatal and the known-good is unchanged. The discourse/mattermost-lts reds are
likewise recipe-level and pre-existing (mattermost: postgres restore marker assertion; docker resolved
fine → not a dropped tool). nixenv changes only WHICH tools are on PATH; it dropped nothing (M1 superset
proof), so it cannot have caused an app-level red.