cc-ci/machine-docs/STATUS-nixenv.md
autonomic-bot de4d69072c
status(nixenv): mark phase DONE in STATUS (M1+M2 both PASS, no VETO)
2026-06-17 23:18:36 +00:00


# STATUS — phase `nixenv` (Builder)
Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase-nixenv-shared-runtime-env.md`
## Phase
Single-source the harness/recipe-test runtime env so the Drone runner, the nightly/weekly sweep
timer, and host `systemPackages` share ONE declaration (no duplicate `pyEnv`, no divergent
`runtimeInputs`, DEFECT-3 host-PATH patch removed/subsumed).
## DONE
Phase `nixenv` complete. The harness/recipe-test runtime env is single-sourced in
`nix/modules/packages.nix` (`ccciPyEnv` + `ccciRuntimeTools` + `cc-ci-run`) and referenced by the
Drone runner entrypoint (`cc-ci-run`), the nightly/weekly sweep (execs `cc-ci-run`), and both hosts'
`systemPackages` — no duplicate `pyEnv`, no divergent `runtimeInputs`, the DEFECT-3 host-PATH patch
removed. Deployed (`nixos-rebuild switch`, d11f8f5), host healthy. Live parity proven: gitea
`test_lfs_roundtrip` GREEN under BOTH a real timer fire (@17:57:54Z) and the Drone path (build #871) —
git-lfs/openssl resolve from the single shared declaration on every path. With one declaration
feeding every consumer, the DEFECT-3 divergence class is now structurally impossible.
- **M1 — PASS** @ 2026-06-17T17:40Z (REVIEW-nixenv.md, claim `8b8fc1f`).
- **M2 — PASS** @ 2026-06-17T18:20Z (REVIEW-nixenv.md, claim `f7b6f26`).
- No VETO; no standing defects.
## M1 — PASS @ 2026-06-17T17:40Z (REVIEW-nixenv.md, claim 8b8fc1f). No VETO.
## Gate: M2 — PASS @ 2026-06-17T18:20Z (REVIEW-nixenv.md, claim f7b6f26). No VETO.
**WHAT (M2 DoD).** (1) Deployed via `nixos-rebuild switch`, host verified healthy. (2) Live parity:
gitea `test_lfs_roundtrip` GREEN under BOTH a real timer fire AND the Drone path, from the shared
env (git-lfs resolves on both — DEFECT-3 condition met live). (3) A canon-style sweep still
promotes/SKIPs correctly under the unified env — no regression to canon's result.
**WHERE (inputs).** Deployed system from `/etc/cc-ci` @ d11f8f5 (= M1-reviewed tree). nixenv diff
`dd6712c..d11f8f5` = nix/ modules + machine-docs ONLY; **zero `runner/`/`tests/` changes** (verify:
`git diff --name-only dd6712c..d11f8f5 | grep -E 'runner/|tests/'` → empty). `runner/nightly_sweep.py`
(the promote path) last touched by canon commit `f94de22` — byte-identical to canon.
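
The diff-scope check above can also be expressed as a small helper. This is a minimal sketch, not part of the repo: `out_of_scope` and `git_changed` are hypothetical names, and the file list in the usage note is illustrative rather than the actual diff. Like the `grep -E 'runner/|tests/'` it mirrors, it matches the substring anywhere in the path.

```python
import subprocess
from typing import Iterable, List

# Path fragments that must NOT appear in the nixenv diff
# (same patterns as the grep -E 'runner/|tests/' check).
FORBIDDEN = ("runner/", "tests/")


def out_of_scope(changed: Iterable[str]) -> List[str]:
    """Return the changed paths that contain a forbidden fragment."""
    return [p for p in changed if any(frag in p for frag in FORBIDDEN)]


def git_changed(base: str, head: str) -> List[str]:
    """List changed paths via `git diff --name-only base..head` (run inside the repo)."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}..{head}"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

Run inside a checkout, `out_of_scope(git_changed("dd6712c", "d11f8f5"))` should return an empty list if the claim holds. Note that `nix/modules/drone-runner.nix` does not match, since `runner.nix` is not `runner/`.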
### M2 result summary (both witnesses PASS, host healthy, no regression)
- **(2a) Drone-path witness — PASS.** Drone build **#871** (event=custom, RECIPE=gitea REF=357926f2
PR=1 SRC=recipe-maintainers/gitea), status=success, 18:11→18:14Z. The Drone exec pipeline runs
`cc-ci-run runner/run_recipe_ci.py` (`.drone.yml:83`). compose.lfs.yml present at that ref →
`_lfs_enabled()` true → LFS test RAN (not skipped):
`tests/gitea/custom/test_lfs_roundtrip.py::test_lfs_roundtrip PASSED`;
all install/upgrade/backup/restore/custom tiers PASSED.
- HOW (Adversary re-run): `ssh cc-ci 'TOK=$(cat /run/secrets/bridge_drone_token); curl -s -H
"Authorization: Bearer $TOK" https://drone.ci.commoninternet.net/api/repos/recipe-maintainers/cc-ci/builds/871/logs/1/2 | jq -r ".[].out"' | grep test_lfs_roundtrip`.
EXPECTED: `test_lfs_roundtrip PASSED`. (Or trigger your OWN build with the same params and re-run.)
- **(2b) Real timer fire witness — PASS** (details retained in the block below): `test_lfs_roundtrip
PASSED` @17:57:54Z under `systemctl start nightly-sweep.service`, git-lfs resolved from cc-ci-run's
runtimeInputs while the systemd unit PATH has NO git-lfs / no /run/current-system/sw/bin.
- **(3) No regression.** Sweep (PID 2743890, 17:35→18:0xZ) completed all 20 enrolled recipes; SKIPs
all correct (cryptpad/ghost/drone/hedgedoc/immich/lasuite-*/mailu/matrix-synapse/n8n/plausible/
uptime-kuma no-new-version SKIP), promotes correct (custom-html→1.13.0+1.31.1, mumble→1.0.0+v1.6.870-0).
Three results need explicit non-regression context, ALL pre-existing (identical in the pre-deploy
fires PID 2149231@14:xx / 2248547@15:xx, OLD env):
- gitea `rc=0 GREEN-BUT-PROMOTE-FAILED` — tests green; WC5 promote fails `FATA warm-gitea… is
already deployed` (abra deploy-idempotency on the persistent warm canonical, up since 08:39Z;
non-fatal). promote path = canon `nightly_sweep.py` f94de22, unchanged by nixenv.
- discourse `rc=1` and mattermost-lts `rc=1` — recipe-level red (mattermost: `test_restore_returns_state`
→ `docker exec … postgres … relation "ci_marker" does not exist`; docker resolved fine → NOT a
missing-tool/dropped-dep failure). Both failed identically pre-deploy → not caused by the env change.
- **Host health (re-verified post-sweep @18:16Z).** `systemctl --failed` empty; `nightly-sweep.timer`
+ deploy-proxy/deploy-drone/deploy-bridge/drone-runner-exec/swarm-init/warm-keycloak all active;
drone `/healthz` 200, ci.commoninternet.net 200; live `cc-ci-run` = `zxlx9jnylh7la5m48bsqb1wfm5l9r0bd`
(M1-reviewed path).
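
For the Drone-path witness (2a), the `jq -r ".[].out" | grep` step can be sketched in Python. This is an illustration under assumptions: it assumes the log endpoint returns a JSON array of objects with an `out` field (as the `jq` filter implies), `find_outcome` is a hypothetical helper name, and the sample payload in the usage note is synthetic, not the real build #871 log.

```python
import json
from typing import Optional

# Outcome words pytest prints in verbose mode.
OUTCOMES = ("PASSED", "FAILED", "SKIPPED", "ERROR")


def find_outcome(log_json: str, test_name: str) -> Optional[str]:
    """Return the outcome word from the first log line that names the test
    and carries an outcome, or None if no such line exists."""
    for entry in json.loads(log_json):
        line = entry.get("out", "")
        if test_name not in line:
            continue
        for word in OUTCOMES:
            if word in line:
                return word
    return None
```

Given a payload such as `[{"out": "tests/gitea/custom/test_lfs_roundtrip.py::test_lfs_roundtrip PASSED\n"}]`, the call `find_outcome(payload, "test_lfs_roundtrip")` returns `"PASSED"`. A `None` result means the test never ran or was not logged, which is distinct from a failure.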
### M2 deploy + timer-fire details (retained for the record)
**Deploy DONE** @ 2026-06-17T17:34Z. `nixos-rebuild switch --flake 'git+file:///etc/cc-ci?submodules=1#cc-ci-hetzner'`
(live host = hetzner; `/etc/cc-ci` @ d11f8f5). Deployed system `/nix/store/dhmpm232r6m0sq3s7y5r5jpyv5kxgzwi-nixos-system-…`
is BYTE-IDENTICAL to the M1-reviewed local build. Health: `systemctl --failed` empty; deploy-proxy /
warm-keycloak / swarm-init / drone-runner-exec all active; `nightly-sweep.timer` active;
drone healthz + ci.commoninternet.net → 200. Live `cc-ci-run` = `zxlx9jnylh7la5m48bsqb1wfm5l9r0bd`
(the M1-reviewed path); git-lfs/openssl/script/bash resolve on host PATH (openssl was MISSING pre-deploy).
**Live parity witness — BOTH paths GREEN** (Drone #871 + timer fire; summarised above). Diff scope: ONLY nix/ changed
(dd6712c..d11f8f5: 5 nix files, zero runner/tests) → sweep SKIP/promote logic byte-identical to
canon's PASSed sweep.
- **Real timer fire — PASS** @ 2026-06-17T17:57:54Z. `systemctl start nightly-sweep.service` @
17:35:38Z (PID 2743890; child run_recipe_ci PID 2808444). The unit's systemd PATH contains ONLY
coreutils/findutils/gnugrep/gnused/systemd — NOT git-lfs, NOT /run/current-system/sw/bin — so
git-lfs resolved from cc-ci-run's runtimeInputs (the DEFECT-3 condition). Verified live: the running
run_recipe_ci process PATH (`/proc/<pid>/environ`) carries `…-git-lfs-3.6.1/bin` from cc-ci-run.
gitea RUN (canonical 3.5.3+1.24.2 < tag 3.6.0+1.24.2) exercised LFS (upgrade-env COMPOSE_FILE
includes compose.lfs.yml) → `tests/gitea/custom/test_lfs_roundtrip.py::test_lfs_roundtrip PASSED`
(18.66s); all other gitea tiers PASSED.
- HOW (Adversary re-run): `ssh cc-ci 'journalctl -u nightly-sweep.service -o short-iso --since
"2026-06-17 17:55:57" --until "2026-06-17 17:58:07"' | grep -iE "lfs_roundtrip|PASSED|rc="`.
EXPECTED: `test_lfs_roundtrip PASSED` then `sweep: gitea rc=0`.
- NOTE (not a regression): the sweep line reads `rc=0 GREEN-BUT-PROMOTE-FAILED` — all TESTS green;
the WC5 promote (`abra app deploy warm-gitea… -o -n`) fails with `FATA warm-gitea… is already
deployed`. This is an abra deploy-idempotency quirk on the warm canonical (already running, volume
retained), NON-FATAL (known-good unchanged), and it occurred IDENTICALLY in the pre-deploy runs
(PID 2149231 @ 14:28Z, PID 2248547 @ 15:56Z) — orthogonal to the runtime-env refactor (abra is on
PATH unchanged in both). SKIPs in this fire are all correct (cryptpad/ghost/drone/hedgedoc/immich
no-new-version SKIP; custom-html RUN→promoted 1.13.0+1.31.1).
- Drone-path gitea witness: DONE — build #871 PASS (see "(2a)" above).
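
The `/proc/<pid>/environ` check in the timer-fire witness can be reproduced with a short parser. This is a sketch only: `path_entries` and `has_tool_dir` are hypothetical helpers, and it relies on the standard Linux layout of `environ` as NUL-separated `KEY=VALUE` pairs.

```python
from typing import List


def path_entries(environ: bytes) -> List[str]:
    """Return the entries of PATH from a /proc/<pid>/environ blob
    (NUL-separated KEY=VALUE pairs); empty list if PATH is absent."""
    for item in environ.split(b"\0"):
        if item.startswith(b"PATH="):
            value = item[len(b"PATH="):].decode(errors="replace")
            return value.split(":")
    return []


def has_tool_dir(environ: bytes, needle: str) -> bool:
    """True if any PATH entry contains `needle` (e.g. 'git-lfs')."""
    return any(needle in entry for entry in path_entries(environ))
```

On the host, reading the file in binary mode, as in `has_tool_dir(open(f"/proc/{pid}/environ", "rb").read(), "git-lfs")`, is expected to be true for the running `run_recipe_ci` process. Reading another user's `environ` requires matching privileges.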
### (prior M1 claim block retained below for the record)
## M1 details — PASS
**WHAT (M1 DoD).** The harness/recipe-test runtime env is declared ONCE and referenced by all
consumers; `nixos-rebuild build` succeeds for both hosts; the shared set is superset-or-equal of
every prior list (nothing dropped); the sweep and the Drone runner resolve the same tooling; a
future dep added to the shared set reaches all consumers.
**WHERE (inputs).** All changes at the tip of `main` (commit pushed with this claim).
- Single source: `nix/modules/packages.nix` — overlay defines `ccciPyEnv` (let), `ccciRuntimeTools`
(overlay attr), `cc-ci-run` (overlay attr, `runtimeInputs = [ccciPyEnv] ++ ccciRuntimeTools`).
- Consumers: `nix/modules/harness.nix` (`systemPackages = [ pkgs.cc-ci-run ]`),
`nix/modules/nightly-sweep.nix` (wrapper execs `cc-ci-run`),
`nix/hosts/cc-ci/configuration.nix` + `nix/hosts/cc-ci-hetzner/configuration.nix`
(`systemPackages = pkgs.ccciRuntimeTools ++ [ pkgs.openssh ]`).
- `nix/modules/drone-runner.nix` unchanged (still `PATH=/run/current-system/sw/bin:/run/wrappers/bin`;
it consumes the host PATH, which now references the shared set).
**HOW + EXPECTED (cold-verifiable; `secrets/` is a git submodule → use `?submodules=1` for a dirty
tree, or build from a `git clone --recursive`).**
1. Builds succeed (both hosts):
- `nixos-rebuild build --flake '.?submodules=1#cc-ci-hetzner'` → builds
`nixos-system-nixos-24.11.…` (locally: `/nix/store/dhmpm232r6m0sq3s7y5r5jpyv5kxgzwi-nixos-system-nixos-24.11.20250630.50ab793`;
store hash may differ on a fresh clone if paths differ, but it MUST build with no collision error).
- `nixos-rebuild build --flake '.?submodules=1#cc-ci'` → builds OK (no collision error).
2. Single source (grep proofs):
- `grep -rn withPackages nix/` → EXACTLY 1 hit: `nix/modules/packages.nix` (`ccciPyEnv`).
- `grep -rn "pytest playwright" nix/` → EXACTLY 1 hit: same line. (No duplicate pyEnv.)
- `grep -rn ccciRuntimeTools nix/` → defined once (packages.nix), referenced by both host configs.
- `nightly-sweep.nix` contains NO `withPackages`, NO `python3`, and NO `/run/current-system/sw/bin`
PATH prepend; its `runtimeInputs` is `[ pkgs.cc-ci-run ]` only, and it runs `exec cc-ci-run …`.
3. Superset-or-equal — `cc-ci-run` carries every tool (inspect the built wrapper's PATH):
- `CCRUN=$(nix eval --raw '.?submodules=1#nixosConfigurations.cc-ci-hetzner.pkgs.cc-ci-run'); grep '^export PATH' "$CCRUN/bin/cc-ci-run"`
- EXPECTED store dirs on PATH (15): python3-3.12.8-env, abra-0.13.0-beta, docker-27.5.1,
git-2.47.2, **git-lfs-3.6.1**, bash-5.2p37, coreutils-9.5, util-linux-2.39.4, curl-8.12.1,
jq-1.7.1, gnused-4.9, gnugrep-3.11, gnutar-1.35, **openssl-3.3.3**, procps-4.0.4.
- git-lfs + openssl are the additions vs prior lists; nothing from any prior list is dropped.
4. Sweep ≡ Drone entrypoint (parity by construction):
- The `exec cc-ci-run …` in the built `cc-ci-nightly-sweep` wrapper resolves to the BYTE-IDENTICAL
cc-ci-run store path that the `.drone.yml` step `cc-ci-run runner/run_recipe_ci.py` runs
(locally `/nix/store/zxlx9jnylh7la5m48bsqb1wfm5l9r0bd-cc-ci-run`). Same store path ⇒ same
pyEnv, same tooling, same PLAYWRIGHT_BROWSERS_PATH.
5. Host divergence removed:
- Both host `configuration.nix` `systemPackages` lines are textually identical
(`pkgs.ccciRuntimeTools ++ [ pkgs.openssh ]`). The `cc-ci` host now GAINS `git-lfs`+`openssl`
on its system PATH (`ls $(nix eval --raw '.?submodules=1#nixosConfigurations.cc-ci.config.system.build.toplevel')/sw/bin/ | grep -E '^(git-lfs|openssl)$'` → both present; pre-refactor cc-ci lacked git-lfs).
6. Future-dep propagation: adding a pkg to `ccciRuntimeTools` in packages.nix lands in cc-ci-run's
runtimeInputs (Drone + sweep) AND both hosts' systemPackages from the single edit.
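
The superset check in step 3 can be automated against the wrapper's `export PATH` line. The following is a minimal sketch under assumptions: `store_names` and `missing` are hypothetical helpers, the expected names are copied from the list in step 3, and an expected name is treated as present if a store directory equals it or extends it with an output suffix (for example `-bin`), which is a guess about how the store paths are named rather than something confirmed here.

```python
import re
from typing import Iterable, List

# The 15 store-dir names listed under step 3.
EXPECTED = (
    "python3-3.12.8-env", "abra-0.13.0-beta", "docker-27.5.1", "git-2.47.2",
    "git-lfs-3.6.1", "bash-5.2p37", "coreutils-9.5", "util-linux-2.39.4",
    "curl-8.12.1", "jq-1.7.1", "gnused-4.9", "gnugrep-3.11", "gnutar-1.35",
    "openssl-3.3.3", "procps-4.0.4",
)


def store_names(export_line: str) -> List[str]:
    """Extract <name> from every /nix/store/<hash>-<name>/... entry."""
    return re.findall(r"/nix/store/[0-9a-z]+-([^/:\"]+)/", export_line)


def missing(export_line: str, expected: Iterable[str] = EXPECTED) -> List[str]:
    """Return expected names with no matching store dir on the PATH line."""
    found = store_names(export_line)
    return sorted(
        e for e in expected
        if not any(n == e or n.startswith(e + "-") for n in found)
    )
```

Feeding it the output of the `grep '^export PATH'` command from step 3 should yield an empty list; any name it returns is a tool dropped from the shared set.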
## Build backlog
See `BACKLOG-nixenv.md`. M2 (deploy + live parity witness) was gated behind the M1 PASS; both gates have since passed.