STATUS — phase nixenv (Builder)
Phase plan: /srv/cc-ci/cc-ci-plan/plan-phase-nixenv-shared-runtime-env.md
Phase
Single-source the harness/recipe-test runtime env so the Drone runner, the nightly/weekly sweep
timer, and host systemPackages share ONE declaration (no duplicate pyEnv, no divergent
runtimeInputs, DEFECT-3 host-PATH patch removed/subsumed).
M1 — PASS @ 2026-06-17T17:40Z (REVIEW-nixenv.md, claim 8b8fc1f). No VETO.
Gate: M2 — CLAIMED @ 2026-06-17T18:17Z, awaiting Adversary (claim commit below)
WHAT (M2 DoD). (1) Deployed via nixos-rebuild switch, host verified healthy. (2) Live parity:
gitea test_lfs_roundtrip GREEN under BOTH a real timer fire AND the Drone path, from the shared
env (git-lfs resolves on both — DEFECT-3 condition met live). (3) A canon-style sweep still
promotes/SKIPs correctly under the unified env — no regression to canon's result.
WHERE (inputs). Deployed system from /etc/cc-ci @ d11f8f5 (= M1-reviewed tree). nixenv diff
dd6712c..d11f8f5 = nix/ modules + machine-docs ONLY; zero runner/ or tests/ changes (verify:
git diff --name-only dd6712c..d11f8f5 | grep -E 'runner/|tests/' → empty). runner/nightly_sweep.py
(the promote path) was last touched by canon commit f94de22, so it is byte-identical to canon.
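The scope check in the WHERE note can also be scripted for a repeatable re-run. The following is a minimal sketch, assuming it is run from a clone that contains both commits; the helper names harness_paths and changed_paths are illustrative and not part of the repo.

```python
import re
import subprocess

# Same pattern as the grep in the WHERE note: any path containing runner/ or tests/.
HARNESS_RE = re.compile(r"runner/|tests/")


def harness_paths(changed):
    """Return the changed paths that touch runner/ or tests/ (expected: none)."""
    return [path for path in changed if HARNESS_RE.search(path)]


def changed_paths(base, tip, repo="."):
    """List the files changed between two commits via `git diff --name-only`."""
    result = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only", f"{base}..{tip}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

Calling harness_paths(changed_paths("dd6712c", "d11f8f5")) should return an empty list if the diff scope claim holds.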
M2 result summary (both witnesses PASS, host healthy, no regression)
- (2a) Drone-path witness: PASS. Drone build #871 (event=custom, RECIPE=gitea REF=357926f2
  PR=1 SRC=recipe-maintainers/gitea), status=success, 18:11→18:14Z. The Drone exec pipeline runs
  cc-ci-run runner/run_recipe_ci.py (.drone.yml:83). compose.lfs.yml present at that ref → _lfs_enabled() true → LFS test RAN (not skipped): tests/gitea/custom/test_lfs_roundtrip.py::test_lfs_roundtrip PASSED; all install/upgrade/backup/restore/custom tiers PASSED.
  - HOW (Adversary re-run):
    ssh cc-ci 'TOK=$(cat /run/secrets/bridge_drone_token); curl -s -H "Authorization: Bearer $TOK" https://drone.ci.commoninternet.net/api/repos/recipe-maintainers/cc-ci/builds/871/logs/1/2 | jq -r ".[].out"' | grep test_lfs_roundtrip
    EXPECTED: test_lfs_roundtrip PASSED. (Or trigger your OWN build with the same params and re-run.)
- (2b) Real timer fire witness: PASS (details retained in the block below):
  test_lfs_roundtrip PASSED @17:57:54Z under systemctl start nightly-sweep.service, git-lfs resolved from cc-ci-run's runtimeInputs while the systemd unit PATH has NO git-lfs / no /run/current-system/sw/bin.
- (3) No regression. Sweep (PID 2743890, 17:35→18:0xZ) completed all 20 enrolled recipes; SKIPs
  all correct (cryptpad/ghost/drone/hedgedoc/immich/lasuite-*/mailu/matrix-synapse/n8n/plausible/
  uptime-kuma no-new-version SKIP), promotes correct (custom-html→1.13.0+1.31.1, mumble→1.0.0+v1.6.870-0).
  Three results need explicit non-regression context, ALL pre-existing (identical in the pre-deploy
  fires PID 2149231@14:xx / 2248547@15:xx, OLD env):
  - gitea rc=0 GREEN-BUT-PROMOTE-FAILED: tests green; WC5 promote fails with FATA warm-gitea… is already deployed (abra deploy-idempotency on the persistent warm canonical, up since 08:39Z; non-fatal). promote path = canon nightly_sweep.py f94de22, unchanged by nixenv.
  - discourse rc=1 and mattermost-lts rc=1: recipe-level red (mattermost: test_restore_returns_state → docker exec … postgres … relation "ci_marker" does not exist; docker resolved fine → NOT a missing-tool/dropped-dep failure). Both failed identically pre-deploy → not caused by the env change.
- Host health (re-verified post-sweep @18:16Z).
  systemctl --failed empty; nightly-sweep.timer + deploy-proxy/deploy-drone/deploy-bridge/drone-runner-exec/swarm-init/warm-keycloak all active;
  drone /healthz 200, ci.commoninternet.net 200; live cc-ci-run = zxlx9jnylh7la5m48bsqb1wfm5l9r0bd (M1-reviewed path).
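The non-regression argument in item (3) compares each recipe's result before and after the deploy. A minimal sketch of that comparison follows, assuming the per-recipe results have already been collected into dictionaries; the dictionary shape and the function name regressions are hypothetical, and the sample values are only the three results quoted in the summary above.

```python
def regressions(before, after):
    """Return recipes whose post-deploy result differs from the pre-deploy one.

    Maps recipe -> (pre-deploy result, post-deploy result). An empty dict means
    every post-deploy result was already present under the old env.
    """
    return {
        recipe: (before.get(recipe), result)
        for recipe, result in after.items()
        if before.get(recipe) != result
    }


# The three flagged results from the summary, stated there to be identical pre-deploy.
flagged = {
    "gitea": "rc=0 GREEN-BUT-PROMOTE-FAILED",
    "discourse": "rc=1",
    "mattermost-lts": "rc=1",
}
print(regressions(before=flagged, after=flagged))
```

A recipe that appears only in the post-deploy map is reported with None as its earlier result, so newly enrolled recipes are not silently treated as unchanged.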
M2 deploy + timer-fire details (retained for the record)
Deploy DONE @ 2026-06-17T17:34Z. nixos-rebuild switch --flake 'git+file:///etc/cc-ci?submodules=1#cc-ci-hetzner'
(live host = hetzner; /etc/cc-ci @ d11f8f5). Deployed system /nix/store/dhmpm232r6m0sq3s7y5r5jpyv5kxgzwi-nixos-system-…
is BYTE-IDENTICAL to the M1-reviewed local build. Health: systemctl --failed empty; deploy-proxy /
warm-keycloak / swarm-init / drone-runner-exec all active; nightly-sweep.timer active;
drone healthz + ci.commoninternet.net → 200. Live cc-ci-run = zxlx9jnylh7la5m48bsqb1wfm5l9r0bd
(the M1-reviewed path); git-lfs/openssl/script/bash resolve on host PATH (openssl was MISSING pre-deploy).
Live parity witness — BOTH paths GREEN (Drone #871 + timer fire; summarised above). Diff scope: ONLY nix/ changed (dd6712c..d11f8f5: 5 nix files, zero runner/tests) → sweep SKIP/promote logic byte-identical to canon's PASSed sweep.
- Real timer fire: PASS @ 2026-06-17T17:57:54Z.
  systemctl start nightly-sweep.service @ 17:35:38Z (PID 2743890; child run_recipe_ci PID 2808444). The unit's systemd PATH contains ONLY coreutils/findutils/gnugrep/gnused/systemd (NOT git-lfs, NOT /run/current-system/sw/bin), so git-lfs resolved from cc-ci-run's runtimeInputs (the DEFECT-3 condition). Verified live: the running run_recipe_ci process PATH (/proc/<pid>/environ) carries …-git-lfs-3.6.1/bin from cc-ci-run. gitea RUN (canonical 3.5.3+1.24.2 < tag 3.6.0+1.24.2) exercised LFS (upgrade-env COMPOSE_FILE includes compose.lfs.yml) → tests/gitea/custom/test_lfs_roundtrip.py::test_lfs_roundtrip PASSED (18.66s); all other gitea tiers PASSED.
  - HOW (Adversary re-run):
    ssh cc-ci 'journalctl -u nightly-sweep.service -o short-iso --since "2026-06-17 17:55:57" --until "2026-06-17 17:58:07"' | grep -iE "lfs_roundtrip|PASSED|rc="
    EXPECTED: test_lfs_roundtrip PASSED then sweep: gitea rc=0.
  - NOTE (not a regression): the sweep line reads rc=0 GREEN-BUT-PROMOTE-FAILED: all TESTS green; the WC5 promote (abra app deploy warm-gitea… -o -n) fails with FATA warm-gitea… is already deployed. This is an abra deploy-idempotency quirk on the warm canonical (already running, volume retained), NON-FATAL (known-good unchanged), and it occurred IDENTICALLY in the pre-deploy runs (PID 2149231 @ 14:28Z, PID 2248547 @ 15:56Z), so it is orthogonal to the runtime-env refactor (abra is on PATH unchanged in both). SKIPs in this fire are all correct (cryptpad/ghost/drone/hedgedoc/immich no-new-version SKIP; custom-html RUN→promoted 1.13.0+1.31.1).
- Drone-path gitea witness: DONE, build #871 PASS (see "(2a)" above).
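The live PATH check on the running run_recipe_ci process can be reproduced by parsing /proc/<pid>/environ. The sketch below assumes a Linux host and read access to that file; the helper names are illustrative, and the store-path pattern (hash, tool name, then a version starting with a digit) is an assumption about how the entries look.

```python
import re


def read_environ(pid):
    """Read the raw NUL-separated environment block of a process (Linux only)."""
    with open(f"/proc/{pid}/environ", "rb") as handle:
        return handle.read()


def path_entries(environ):
    """Extract the PATH entries from a NUL-separated environment block."""
    for item in environ.split(b"\0"):
        if item.startswith(b"PATH="):
            return item[len(b"PATH="):].decode().split(":")
    return []


def store_dirs_for(entries, tool):
    """Return PATH entries shaped like /nix/store/<hash>-<tool>-<version>/bin.

    Requiring a digit after the tool name keeps "git" from matching "git-lfs".
    """
    pattern = re.compile(rf"^/nix/store/[^/]*-{re.escape(tool)}-\d[^/]*/bin$")
    return [entry for entry in entries if pattern.match(entry)]
```

For the timer-fire witness, store_dirs_for(path_entries(read_environ(pid)), "git-lfs") should be non-empty while the process is still running.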
(prior M1 claim block retained below for the record)
M1 details — PASS
WHAT (M1 DoD). The harness/recipe-test runtime env is declared ONCE and referenced by all
consumers; nixos-rebuild build succeeds for both hosts; the shared set is superset-or-equal of
every prior list (nothing dropped); the sweep and the Drone runner resolve the same tooling; a
future dep added to the shared set reaches all consumers.
WHERE (inputs). All changes at the tip of main (commit pushed with this claim).
- Single source: nix/modules/packages.nix. The overlay defines ccciPyEnv (let), ccciRuntimeTools (overlay attr), cc-ci-run (overlay attr, runtimeInputs = [ccciPyEnv] ++ ccciRuntimeTools).
- Consumers: nix/modules/harness.nix (systemPackages = [ pkgs.cc-ci-run ]), nix/modules/nightly-sweep.nix (wrapper execs cc-ci-run), nix/hosts/cc-ci/configuration.nix + nix/hosts/cc-ci-hetzner/configuration.nix (systemPackages = pkgs.ccciRuntimeTools ++ [ pkgs.openssh ]).
- nix/modules/drone-runner.nix unchanged (still PATH=/run/current-system/sw/bin:/run/wrappers/bin; it consumes the host PATH, which now references the shared set).
HOW + EXPECTED (cold-verifiable; secrets/ is a git submodule → use ?submodules=1 for a dirty
tree, or build from a git clone --recursive).
- Builds succeed (both hosts):
  - nixos-rebuild build --flake '.?submodules=1#cc-ci-hetzner' → builds nixos-system-nixos-24.11.… (locally: /nix/store/dhmpm232r6m0sq3s7y5r5jpyv5kxgzwi-nixos-system-nixos-24.11.20250630.50ab793; store hash may differ on a fresh clone if paths differ, but it MUST build with no collision error).
  - nixos-rebuild build --flake '.?submodules=1#cc-ci' → builds OK (no collision error).
- Single source (grep proofs):
  - grep -rn withPackages nix/ → EXACTLY 1 hit: nix/modules/packages.nix (ccciPyEnv).
  - grep -rn "pytest playwright" nix/ → EXACTLY 1 hit: same line. (No duplicate pyEnv.)
  - grep -rn ccciRuntimeTools nix/ → defined once (packages.nix), referenced by both host configs.
  - nightly-sweep.nix contains NO withPackages, NO python3, NO /run/current-system/sw/bin PATH prepend, and its runtimeInputs = [ pkgs.cc-ci-run ] only; it runs exec cc-ci-run ….
- Superset-or-equal: cc-ci-run carries every tool (inspect the built wrapper's PATH):
  - CCRUN=$(nix eval --raw '.?submodules=1#nixosConfigurations.cc-ci-hetzner.pkgs.cc-ci-run'); grep '^export PATH' "$CCRUN/bin/cc-ci-run"
  - EXPECTED store dirs on PATH (15): python3-3.12.8-env, abra-0.13.0-beta, docker-27.5.1, git-2.47.2, git-lfs-3.6.1, bash-5.2p37, coreutils-9.5, util-linux-2.39.4, curl-8.12.1, jq-1.7.1, gnused-4.9, gnugrep-3.11, gnutar-1.35, openssl-3.3.3, procps-4.0.4.
  - git-lfs + openssl are the additions vs prior lists; nothing from any prior list is dropped.
- Sweep ≡ Drone entrypoint (parity by construction):
  - The built cc-ci-nightly-sweep wrapper's exec cc-ci-run … resolves the BYTE-IDENTICAL cc-ci-run store path that the .drone.yml cc-ci-run runner/run_recipe_ci.py step runs (locally /nix/store/zxlx9jnylh7la5m48bsqb1wfm5l9r0bd-cc-ci-run). Same store path ⇒ same pyEnv, same tooling, same PLAYWRIGHT_BROWSERS_PATH.
- Host divergence removed:
  - Both host configuration.nix systemPackages lines are textually identical (pkgs.ccciRuntimeTools ++ [ pkgs.openssh ]). The cc-ci host now GAINS git-lfs + openssl on its system PATH (ls $(nix eval --raw '.?submodules=1#nixosConfigurations.cc-ci.config.system.build.toplevel')/sw/bin/ | grep -E '^(git-lfs|openssl)$' → both present; pre-refactor cc-ci lacked git-lfs).
- Future-dep propagation: adding a pkg to ccciRuntimeTools in packages.nix lands in cc-ci-run's runtimeInputs (Drone + sweep) AND both hosts' systemPackages from the single edit.
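The superset-or-equal check above (comparing the wrapper's export PATH line against the 15 expected store dirs) can be automated. This is a minimal sketch, assuming the wrapper line has the form export PATH="<dir>:<dir>:$PATH" with /nix/store/<hash>-<name>/bin entries; the function names are illustrative, and the expected list is copied from the claim.

```python
# Expected store dir names, as listed in the superset-or-equal item.
EXPECTED = [
    "python3-3.12.8-env", "abra-0.13.0-beta", "docker-27.5.1", "git-2.47.2",
    "git-lfs-3.6.1", "bash-5.2p37", "coreutils-9.5", "util-linux-2.39.4",
    "curl-8.12.1", "jq-1.7.1", "gnused-4.9", "gnugrep-3.11", "gnutar-1.35",
    "openssl-3.3.3", "procps-4.0.4",
]


def store_names(export_line):
    """Return the <name> part of every /nix/store/<hash>-<name>/... PATH entry."""
    value = export_line.split("=", 1)[1].strip().strip('"')
    names = []
    for entry in value.split(":"):
        if entry.startswith("/nix/store/"):
            component = entry[len("/nix/store/"):].split("/", 1)[0]
            _hash, _sep, name = component.partition("-")
            names.append(name)
    return names


def missing_tools(export_line, expected=EXPECTED):
    """Return the expected store dir names that are absent from the wrapper PATH."""
    present = set(store_names(export_line))
    return [name for name in expected if name not in present]
```

Feeding it the line printed by the grep '^export PATH' command should yield an empty list from missing_tools if nothing was dropped; a version bump in nixpkgs would show up as a missing name, so the expected list needs updating alongside the pin.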
Build backlog
See BACKLOG-nixenv.md. M2 (deploy + live parity witness) is gated behind the M1 PASS.