cc-ci/machine-docs/REVIEW-nixenv.md


REVIEW — phase nixenv (Adversary)

Phase plan: /srv/cc-ci/cc-ci-plan/plan-phase-nixenv-shared-runtime-env.md (SSOT for verification). Verdicts below; cold-runs only.

Status: M1 PASS @ 17:40Z (8b8fc1f) + M2 PASS @ 18:20Z (f7b6f26). Both milestones fresh Adversary PASS, no VETO → Builder cleared to write ## DONE.


M2 — PASS @ 2026-06-17T18:20Z — claim f7b6f26 (deployed /etc/cc-ci@d11f8f5 = M1-reviewed tree)

Deploy + live parity proven — cold-verified. Verdict from the plan (SSOT), the code, the claim's verification info, and my OWN live re-runs (Drone API, journald, host probes). JOURNAL-nixenv.md NOT read before this verdict (anti-anchoring preserved).

(1) Deploy clean + host healthy (re-verified live post-sweep @18:16–18:18Z).

  • Deployed system dhmpm232r6m0sq3s7y5r5jpyv5kxgzwi-nixos-system-… BYTE-IDENTICAL to my M1 build.
  • systemctl --failed EMPTY; nightly-sweep.timer active+enabled; drone-runner-exec / deploy-proxy / warm-keycloak / swarm-init all active; nightly-sweep.service finished Result=success ExecMainStatus=0. drone /healthz→200, ci.commoninternet.net→200.
  • Live cc-ci-run = zxlx9jnylh7la5m48bsqb1wfm5l9r0bd (M1-reviewed path). git-lfs/openssl/script/bash resolve on host PATH AND inside cc-ci-run (git-lfs→33ikv…-git-lfs-3.6.1, openssl→48p8b…-openssl-3.3.3 from runtimeInputs, NOT host PATH). openssl was MISSING on this host pre-deploy.
  • NO orphan ephemeral test stacks left by the sweep (no gite-/matt-/disc- per-run stacks); only the expected warm canonicals (bluesky-pds, gitea, keycloak) remain — clean teardown.

(2) Live LFS parity — GREEN on BOTH paths (the DEFECT-3 witness).

  • Real timer fire: systemctl start nightly-sweep.service @17:35:38Z; gitea RUN-eligible (canonical 3.5.3 < tag 3.6.0) → tests/gitea/custom/test_lfs_roundtrip.py::test_lfs_roundtrip PASSED @17:57:54Z (+ install/upgrade/backup/restore all PASS). The systemd unit PATH carries NO git-lfs and NO /run/current-system/sw/bin, so git-lfs MUST have resolved from cc-ci-run's runtimeInputs — exactly the old DEFECT-3 condition, now satisfied by the shared env.
  • Drone path: independently inspected build #871 via Drone API (status=success): stage recipe-ci → step ci runs cc-ci-run runner/run_recipe_ci.py (.drone.yml:83). Log shows LFS RAN not skipped: test_lfs_roundtrip PASSED; RUN SUMMARY install/upgrade/backup/restore/custom all pass, level=5 of 5.
  • Both paths exec the SAME zxlx9jn cc-ci-run ⇒ git-lfs resolves identically. DEFECT-3 class structurally eliminated, demonstrated live.
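The RUN-eligibility condition quoted for the timer fire (canonical 3.5.3 < tag 3.6.0) can be illustrated with a minimal sketch. The function names and the plain dotted-integer version format are assumptions for illustration; the sweep's actual comparison logic was not inspected here and may differ.

```python
def parse_version(text):
    """Parse a plain dotted version such as '3.5.3' into a tuple of ints.

    Assumption: versions are dotted integers with no suffixes.
    """
    return tuple(int(part) for part in text.split("."))


def is_run_eligible(canonical, latest_tag):
    """A recipe is RUN-eligible when the newest tag is ahead of the canonical version."""
    return parse_version(canonical) < parse_version(latest_tag)


if __name__ == "__main__":
    print(is_run_eligible("3.5.3", "3.6.0"))
```

Comparing tuples of ints rather than raw strings avoids ordering 3.10.0 before 3.9.0.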

(3) No regression — sweep SKIPs/promotes correct; the 3 non-green results ALL pre-existing.

  • Regression canary: scanned the ENTIRE post-deploy sweep journal for missing-tool signatures (command not found / not found / executable file not found / No such file) → ZERO. Nothing got dropped from the env (consistent with the M1 superset proof). No recipe went GREEN→RED.
  • SKIPs all correct (cryptpad/ghost/drone/hedgedoc/immich/lasuite-*/mailu/matrix-synapse/n8n/plausible/uptime-kuma: no-new-version); promotes correct (custom-html, mumble).
  • gitea GREEN-BUT-PROMOTE-FAILED: tests green; WC5 promote abra app deploy warm-gitea… -o -n fails FATA … is already deployed — abra idempotency on the persistent warm canonical (warm-gitea confirmed still up). canonical.json unchanged (3.5.3, ts 08:39Z). Promote path = nightly_sweep.py @canon f94de22, UNCHANGED by nixenv (diff dd6712c..d11f8f5 is nix/+machine-docs only, zero runner/tests) → behaviour identical to canon by construction.
  • discourse rc=1 / mattermost-lts rc=1: recipe-level reds, env-independent. discourse: test_head_runs_official_image_not_bitnamilegacy + test_sidekiq_service_dropped_by_head (HEAD-image/service assertions); mattermost: test_restore_returns_state → docker exec … postgres … relation "ci_marker" does not exist (docker RESOLVED and ran, so this is a restore-data failure, not a missing tool). Corroborated pre-existing: the SAME reds occur in BOTH OLD-env pre-deploy fires today (PID 2149231@14:xx, PID 2248547@15:xx): mattermost byte-identical postgres error; discourse red in all fires (never green). Not caused by the env change.
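The regression canary in (3) amounts to a signature scan over the sweep journal. A minimal sketch follows, assuming the journal has already been exported to a list of text lines. The four signatures are the ones listed above; the function name and the case-insensitive matching are assumptions, not the exact scan that was run.

```python
import re

# Signatures from the canary description; matching strategy is an assumption.
MISSING_TOOL_SIGNATURES = (
    "command not found",
    "executable file not found",
    "No such file",
    "not found",
)

_PATTERN = re.compile(
    "|".join(re.escape(sig) for sig in MISSING_TOOL_SIGNATURES),
    re.IGNORECASE,
)


def find_missing_tool_lines(journal_lines):
    """Return (line_number, text) for each journal line that matches a signature."""
    hits = []
    for number, line in enumerate(journal_lines, start=1):
        if _PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Note that the mattermost error text (relation "ci_marker" does not exist) matches none of these signatures, which is consistent with classifying it as a data failure rather than a missing tool.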

No defects, no VETO. M2 DoD fully met live. The harness runtime env is single-sourced and proven identical across the Drone runner, the timer sweep, and host systemPackages, with git-lfs/openssl now guaranteed from one declaration — the DEFECT-3 divergence class is structurally impossible.

M1 + M2 fresh Adversary PASS → DONE is cleared. (Consulted JOURNAL-nixenv.md? No — verdict stands on plan + code + my own live re-runs.)


M1 — PASS @ 2026-06-17T17:40Z — claim 8b8fc1f

Single-source harness runtime env — cold-verified, all 6 DoD items. Verdict formed from the phase plan (SSOT), the code, and my OWN cold builds/evals — JOURNAL-nixenv.md NOT consulted (anti-anchoring preserved).

  1. Builds succeed, both hosts (no collision). nix build .?submodules=1#…cc-ci-hetzner…toplevel → EXIT 0; …#…cc-ci…toplevel → EXIT 0. (A transient SQLite eval-cache "busy" from running both in parallel was reported as "error (ignored)", not a build failure.)
  2. Single source (greps). withPackages → 1 hit (packages.nix:17 ccciPyEnv); pytest playwright → 1 hit (same line); ccciRuntimeTools defined once (packages.nix:45), referenced by cc-ci-run (:68) + both host configs. nightly-sweep.nix has NO withPackages, NO python3, NO /run/current-system/sw/bin PATH prepend — runtimeInputs = [ pkgs.cc-ci-run ] and exec cc-ci-run …. The DEFECT-3 host-PATH patch is GONE.
  3. Superset-or-equal — inspected the BUILT wrapper PATH. cc-ci-run store zxlx9jnylh7la5m48bsqb1wfm5l9r0bd export PATH carries all 15 store dirs: python3-3.12.8-env, abra-0.13.0-beta, docker-27.5.1, git-2.47.2, git-lfs-3.6.1, bash-5.2p37, coreutils-9.5, util-linux-2.39.4, curl-8.12.1, jq-1.7.1, gnused-4.9, gnugrep-3.11, gnutar-1.35, openssl-3.3.3, procps-4.0.4 — and ends :$PATH (PREPEND, inherited PATH retained → nothing from any path lost). Covers the full union of all 3 prior lists; git-lfs+openssl are the only additions. Nothing dropped.
  4. Sweep ≡ Drone entrypoint (parity by construction). Built cc-ci-nightly-sweep references the BYTE-IDENTICAL zxlx9jnylh7la5m48bsqb1wfm5l9r0bd-cc-ci-run; both hosts' pkgs.cc-ci-run resolve that SAME store path; .drone.yml:83 runs cc-ci-run runner/run_recipe_ci.py (host systemPackages wrapper = same path). Same store path ⇒ identical pyEnv + tooling + PLAYWRIGHT_BROWSERS_PATH on Drone path AND timer sweep.
  5. Host divergence removed. Both configuration.nix systemPackages lines are textually identical (pkgs.ccciRuntimeTools ++ [ pkgs.openssh ]). The pre-refactor cc-ci-vs-hetzner git-lfs one-off divergence (my prep flag #1) is ELIMINATED: built cc-ci toplevel sw/bin now contains git-lfs, openssl, script (util-linux) — tools it previously lacked. openssh correctly kept host-only (ssh client, not a recipe tool); it remains on both hosts so the Drone path's inherited PATH is unchanged for it.
  6. Future-dep propagation (by construction). ccciRuntimeTools is the lone definition; it feeds cc-ci-run.runtimeInputs (→ Drone path via .drone.yml, → sweep via exec cc-ci-run) AND both hosts' systemPackages (→ Drone runner host PATH). One edit to that list reaches every consumer. Proven structurally via the reference graph; no working-tree mutation needed.
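The wrapper-PATH inspection in item 3 can be sketched as a small parser: split the exported PATH, pull the package name out of each store entry, confirm the value ends in $PATH, and report any expected tool with no matching entry. This is an illustration only. The helper names, the regex, and the placeholder hash are assumptions; the real store hashes are not reproduced.

```python
import re

# /nix/store/<32-char hash>-<name>/bin (or /sbin); <name> keeps its version suffix.
STORE_BIN = re.compile(r"^/nix/store/[0-9a-z]{32}-(?P<name>.+?)/s?bin$")


def parse_wrapper_path(path_value):
    """Split an exported PATH value into store package names and a prepend flag."""
    entries = path_value.split(":")
    names = [m.group("name") for m in map(STORE_BIN.match, entries) if m]
    keeps_inherited = bool(entries) and entries[-1] == "$PATH"
    return names, keeps_inherited


def missing_tools(names, tools):
    """Return the tools that have no '<tool>-<version>' entry among the names."""
    missing = []
    for tool in tools:
        pattern = re.compile(rf"^{re.escape(tool)}-\d")
        if not any(pattern.match(name) for name in names):
            missing.append(tool)
    return missing


# Placeholder hash (hypothetical); package names are a subset of those in item 3.
FAKE_HASH = "0" * 32
PACKAGES = ["python3-3.12.8-env", "git-2.47.2", "git-lfs-3.6.1", "openssl-3.3.3"]
EXAMPLE_PATH = ":".join(f"/nix/store/{FAKE_HASH}-{p}/bin" for p in PACKAGES) + ":$PATH"
```

Requiring a digit after the tool name keeps git from being satisfied by a git-lfs entry, so each tool must appear under its own store path.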

No defects, no VETO. Faithful refactor — one shared definition, three references, DEFECT-3 class structurally eliminated. M2 (deploy via nixos-rebuild switch + live parity witness: gitea LFS roundtrip green under BOTH Drone path and a real timer fire) remains to be claimed/verified.


(prior) Cold-prep notes


Cold-prep — enumeration of the CURRENT (pre-refactor) declarations @ HEAD dd6712c

The M1 superset-or-equal proof must show the new shared set ⊇ the union of all of these. Captured from the code (SSOT), independent of any Builder narrative:

(A) nix/modules/harness.nix, cc-ci-run (Drone entrypoint) runtimeInputs: pyEnv abra docker git coreutils util-linux

  • pyEnv = python3.withPackages [ pytest playwright ]
  • env: PLAYWRIGHT_BROWSERS_PATH=${playwright-driver.browsers}, PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1

(B) nix/modules/nightly-sweep.nix — sweep runtimeInputs: bash abra docker git curl jq gnused gnugrep gnutar coreutils util-linux procps

  • DUPLICATE pyEnv = python3.withPackages [ pytest playwright ]
  • same PLAYWRIGHT env
  • DEFECT-3 patch: export PATH="/run/current-system/sw/bin:/run/wrappers/bin:$PATH" (host-PATH prepend)

(C) Drone runner path — nix/modules/drone-runner.nix: PATH = mkForce "/run/current-system/sw/bin:/run/wrappers/bin" → recipe shell-outs resolve from host environment.systemPackages, NOT a runtimeInputs list.

(D) Host systemPackages (feeds C):

  • nix/hosts/cc-ci/configuration.nix: curl git jq openssh (NO git-lfs)
  • nix/hosts/cc-ci-hetzner/configuration.nix: curl git git-lfs jq openssh

UNION the shared set must cover (≥):

python3+pytest+playwright (pyEnv) · playwright browsers · abra docker git git-lfs coreutils util-linux bash curl jq gnused gnugrep gnutar procps openssh

Plan §2 also names openssl as a recipe shell-out → expect it present too.
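The superset-or-equal requirement can be expressed as a set difference over the declarations enumerated in (A), (B) and (D). The sketch below copies those tool lists from this section; the function names are hypothetical, and the playwright browsers (an env var, not a tool) are left out.

```python
# Tool lists copied from (A), (B) and (D); "pyEnv" stands for python3+pytest+playwright.
HARNESS = {"pyEnv", "abra", "docker", "git", "coreutils", "util-linux"}
SWEEP = {
    "pyEnv", "bash", "abra", "docker", "git", "curl", "jq", "gnused",
    "gnugrep", "gnutar", "coreutils", "util-linux", "procps",
}
HOST_CC_CI = {"curl", "git", "jq", "openssh"}
HOST_HETZNER = {"curl", "git", "git-lfs", "jq", "openssh"}


def required_union(*declarations):
    """Union of every prior declaration: the set the shared env must cover."""
    union = set()
    for declaration in declarations:
        union |= declaration
    return union


def missing_from(shared, *declarations):
    """Sorted tools present in some prior declaration but absent from `shared`."""
    return sorted(required_union(*declarations) - set(shared))


PRIOR = (HARNESS, SWEEP, HOST_CC_CI, HOST_HETZNER)
```

An empty result from missing_from means nothing was dropped. Any name it returns is a blast-radius break of the kind flagged in suspicion 2.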

Pre-noted suspicions to break on M1/M2 (cold, not yet verdicts):

  1. Host divergence: cc-ci config lacks git-lfs but hetzner has it. Which config is the LIVE ssh cc-ci server running, and does git-lfs actually resolve there today? If the shared set is applied to both host configs, cc-ci should GAIN git-lfs. Verify both configs end identical.
  2. Nothing dropped: any token in the union missing from the shared set = blast-radius break.
  3. Sweep parity by construction: plan wants sweep to invoke cc-ci-run (same entrypoint) — if it instead keeps a parallel list, "single source" is not actually achieved; grep must prove no module declares its own harness dep list.
  4. DEFECT-3 patch removal: the host-PATH prepend should be gone/subsumed; if removed, git-lfs etc. must now come from the shared runtimeInputs, else the sweep regresses.
  5. Live witness: gitea test_lfs_roundtrip must stay GREEN under BOTH Drone path and a real timer fire from the unified env.