# REVIEW — phase `nixenv` (Adversary)
Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase-nixenv-shared-runtime-env.md`
SSOT for verification. Verdicts below; cold-runs only.
Status: **M1 PASS** @ 17:40Z (`8b8fc1f`) + **M2 PASS** @ 18:20Z (`f7b6f26`). Both milestones fresh
Adversary PASS, no VETO → Builder cleared to write `## DONE`.
---
## M2 — PASS @ 2026-06-17T18:20Z — claim `f7b6f26` (deployed `/etc/cc-ci`@d11f8f5 = M1-reviewed tree)
**Deploy + live parity proven — cold-verified.** Verdict from the plan (SSOT), the code, the claim's
verification info, and my OWN live re-runs (Drone API, journald, host probes). JOURNAL-nixenv.md NOT
read before this verdict (anti-anchoring preserved).
**(1) Deploy clean + host healthy (re-verified live post-sweep @18:16-18:18Z).**
- Deployed system `dhmpm232r6m0sq3s7y5r5jpyv5kxgzwi-nixos-system-…` BYTE-IDENTICAL to my M1 build.
- `systemctl --failed` EMPTY; `nightly-sweep.timer` active+enabled; drone-runner-exec / deploy-proxy /
warm-keycloak / swarm-init all active; `nightly-sweep.service` finished Result=success
ExecMainStatus=0. drone `/healthz`→200, `ci.commoninternet.net`→200.
- Live `cc-ci-run` = `zxlx9jnylh7la5m48bsqb1wfm5l9r0bd` (M1-reviewed path). git-lfs/openssl/script/bash
resolve on host PATH AND inside cc-ci-run (git-lfs→`33ikv…-git-lfs-3.6.1`, openssl→`48p8b…-openssl-3.3.3`
from runtimeInputs, NOT host PATH). openssl was MISSING on this host pre-deploy.
- NO orphan ephemeral test stacks left by the sweep (no `gite-/matt-/disc-` per-run stacks); only the
expected warm canonicals (bluesky-pds, gitea, keycloak) remain — clean teardown.
**(2) Live LFS parity — GREEN on BOTH paths (the DEFECT-3 witness).**
- **Real timer fire:** `systemctl start nightly-sweep.service` @17:35:38Z; gitea RUN-eligible
(canonical 3.5.3 < tag 3.6.0) `tests/gitea/custom/test_lfs_roundtrip.py::test_lfs_roundtrip
PASSED` @17:57:54Z (+ install/upgrade/backup/restore all PASS). The systemd unit PATH carries NO
git-lfs and NO /run/current-system/sw/bin, so git-lfs MUST have resolved from cc-ci-run's
runtimeInputs: exactly the old DEFECT-3 condition, now satisfied by the shared env.
- **Drone path:** independently inspected build **#871** via Drone API (status=success): stage
recipe-ci step `ci` runs `cc-ci-run runner/run_recipe_ci.py` (`.drone.yml:83`). Log shows LFS
RAN not skipped: `test_lfs_roundtrip PASSED`; RUN SUMMARY install/upgrade/backup/restore/custom all
pass, level=5 of 5.
- Both paths exec the SAME `zxlx9jn` cc-ci-run, so git-lfs resolves identically. DEFECT-3 class
structurally eliminated, demonstrated live.
**(3) No regression: sweep SKIPs/promotes correct; the 3 non-green results are ALL pre-existing.**
- **Regression canary:** scanned the ENTIRE post-deploy sweep journal for missing-tool signatures
(`command not found` / `not found` / `executable file not found` / `No such file`): **ZERO** hits.
Nothing got dropped from the env (consistent with the M1 superset proof). No recipe went GREEN→RED.
- SKIPs all correct (cryptpad/ghost/drone/hedgedoc/immich/lasuite-*/mailu/matrix-synapse/n8n/
plausible/uptime-kuma: no-new-version); promotes correct (custom-html, mumble).
- **gitea GREEN-BUT-PROMOTE-FAILED**: tests green; WC5 promote `abra app deploy warm-gitea… -o -n`
fails with `FATA … is already deployed`: abra idempotency on the persistent warm canonical (warm-gitea
confirmed still up). canonical.json unchanged (3.5.3, ts 08:39Z). Promote path = `nightly_sweep.py`
@canon f94de22, UNCHANGED by nixenv (diff dd6712c..d11f8f5 is nix/+machine-docs only, zero
runner/tests), so behaviour is identical to canon by construction.
- **discourse rc=1 / mattermost-lts rc=1**: recipe-level reds, env-independent.
discourse: `test_head_runs_official_image_not_bitnamilegacy` + `test_sidekiq_service_dropped_by_head`
(HEAD-image/service assertions); mattermost: `test_restore_returns_state` → `docker exec … postgres …
relation "ci_marker" does not exist` (docker RESOLVED and ran: a restore-data failure, not a
missing tool). **Corroborated pre-existing:** the SAME reds occur in BOTH OLD-env pre-deploy fires
today (PID 2149231@14:xx, PID 2248547@15:xx): mattermost byte-identical postgres error; discourse
red in all fires (never green). Not caused by the env change.
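The regression-canary scan above can be sketched as a small substring check over journal text. This is a minimal illustration, not the command the Adversary actually ran: the function name and the sample log lines are hypothetical, and it assumes the journal has already been dumped to a string (for example from `journalctl -u nightly-sweep.service`). Only the four signatures come from the review.

```python
# Signatures named in the review's canary scan. "not found" is listed last
# because it also matches the two more specific "... not found" signatures.
MISSING_TOOL_SIGNATURES = (
    "command not found",
    "executable file not found",
    "No such file",
    "not found",
)


def find_missing_tool_lines(journal_text):
    """Return (line_number, signature, line) for each line matching a signature."""
    hits = []
    for lineno, line in enumerate(journal_text.splitlines(), start=1):
        for signature in MISSING_TOOL_SIGNATURES:
            if signature in line:
                hits.append((lineno, signature, line.strip()))
                break  # report only the first (most specific) signature per line
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines, NOT taken from the real sweep journal.
    sample = "\n".join([
        "test_lfs_roundtrip PASSED",
        "bash: git-lfs: command not found",
        "RUN SUMMARY level=5 of 5",
    ])
    for lineno, signature, line in find_missing_tool_lines(sample):
        print(f"{lineno}: [{signature}] {line}")
```

An empty result corresponds to the ZERO reported above. The match is deliberately broad, so any hit would still need a manual read to separate a missing tool from an unrelated "not found" message.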
**No defects, no VETO.** M2 DoD fully met live. The harness runtime env is single-sourced and proven
identical across the Drone runner, the timer sweep, and host systemPackages, with git-lfs/openssl now
guaranteed from one declaration; the DEFECT-3 divergence class is structurally impossible.
**M1 + M2 fresh Adversary PASS → DONE is cleared.** (Consulted JOURNAL-nixenv.md? No; the verdict stands
on plan + code + my own live re-runs.)
---
## M1 — PASS @ 2026-06-17T17:40Z — claim `8b8fc1f`
**Single-source harness runtime env — cold-verified, all 6 DoD items.** Verdict formed from the
phase plan (SSOT), the code, and my OWN cold builds/evals; JOURNAL-nixenv.md NOT consulted
(anti-anchoring preserved).
1. **Builds succeed, both hosts (no collision).** `nix build .?submodules=1#…cc-ci-hetzner…toplevel`
EXIT 0; `…#…cc-ci…toplevel` EXIT 0. (A transient SQLite eval-cache "busy" from running both
in parallel was `error (ignored)`, not a build failure.)
2. **Single source (greps).** `withPackages` 1 hit (`packages.nix:17` `ccciPyEnv`); `pytest
playwright` → 1 hit (same line); `ccciRuntimeTools` defined once (`packages.nix:45`), referenced
by `cc-ci-run` (`:68`) + both host configs. `nightly-sweep.nix` has NO `withPackages`, NO
`python3`, NO `/run/current-system/sw/bin` PATH prepend — `runtimeInputs = [ pkgs.cc-ci-run ]`
and `exec cc-ci-run `. The DEFECT-3 host-PATH patch is GONE.
3. **Superset-or-equal — inspected the BUILT wrapper PATH.** `cc-ci-run` store
`zxlx9jnylh7la5m48bsqb1wfm5l9r0bd` `export PATH` carries all 15 store dirs:
python3-3.12.8-env, abra-0.13.0-beta, docker-27.5.1, git-2.47.2, **git-lfs-3.6.1**, bash-5.2p37,
coreutils-9.5, util-linux-2.39.4, curl-8.12.1, jq-1.7.1, gnused-4.9, gnugrep-3.11, gnutar-1.35,
**openssl-3.3.3**, procps-4.0.4 — and ends `:$PATH` (PREPEND, inherited PATH retained → nothing
from any path lost). Covers the full union of all 3 prior lists; `git-lfs`+`openssl` are the only
additions. Nothing dropped.
4. **Sweep ≡ Drone entrypoint (parity by construction).** Built `cc-ci-nightly-sweep` references the
BYTE-IDENTICAL `zxlx9jnylh7la5m48bsqb1wfm5l9r0bd-cc-ci-run`; both hosts'
`pkgs.cc-ci-run` resolve that SAME store path; `.drone.yml:83` runs `cc-ci-run
runner/run_recipe_ci.py` (host systemPackages wrapper = same path). Same store path ⇒ identical
pyEnv + tooling + PLAYWRIGHT_BROWSERS_PATH on Drone path AND timer sweep.
5. **Host divergence removed.** Both `configuration.nix` systemPackages lines are textually identical
(`pkgs.ccciRuntimeTools ++ [ pkgs.openssh ]`). The pre-refactor `cc-ci`-vs-`hetzner` `git-lfs`
one-off divergence (my prep flag #1) is ELIMINATED: built `cc-ci` toplevel `sw/bin` now contains
`git-lfs`, `openssl`, `script` (util-linux) — tools it previously lacked. `openssh` correctly kept
host-only (ssh client, not a recipe tool); it remains on both hosts so the Drone path's inherited
PATH is unchanged for it.
6. **Future-dep propagation (by construction).** `ccciRuntimeTools` is the lone definition; it feeds
`cc-ci-run.runtimeInputs` (→ Drone path via `.drone.yml`, → sweep via `exec cc-ci-run`) AND both
hosts' `systemPackages` (→ Drone runner host PATH). One edit to that list reaches every consumer.
Proven structurally via the reference graph; no working-tree mutation needed.
**No defects, no VETO.** Faithful refactor — one shared definition, three references, DEFECT-3 class
structurally eliminated. M2 (deploy via `nixos-rebuild switch` + live parity witness: gitea LFS
roundtrip green under BOTH Drone path and a real timer fire) remains to be claimed/verified.
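The wrapper-PATH inspection in item 3 can be approximated by parsing the `export PATH=...` line of the built wrapper and comparing package names against the expected tool list. The sketch below is an assumption-laden illustration: the regex relies on the usual `/nix/store/<hash>-<name>-<version>` naming (version starting at the first `-<digit>`), the function name is hypothetical, and the hashes in the demo are placeholders rather than real store paths. The expected names are the 15 listed in item 3.

```python
import re

# Tool names the review lists for the built cc-ci-run wrapper PATH (M1 item 3).
EXPECTED = {
    "python3", "abra", "docker", "git", "git-lfs", "bash", "coreutils",
    "util-linux", "curl", "jq", "gnused", "gnugrep", "gnutar", "openssl", "procps",
}

# /nix/store/<32-char hash>-<name>-<version>[/subdir]
# Assumption: the version starts at the first "-" followed by a digit.
STORE_ENTRY = re.compile(
    r"^/nix/store/[0-9a-z]{32}-(?P<name>[^/]+?)-\d[^/]*(?:/.*)?$"
)


def path_package_names(export_line):
    """Parse an `export PATH=...` line.

    Returns (package_names, keeps_inherited_path), where the flag is True
    when the value ends in `:$PATH` (a prepend, not a replacement).
    """
    value = export_line.split("=", 1)[1].strip().strip('"').strip("'")
    entries = value.split(":")
    keeps_inherited = entries[-1] == "$PATH"
    names = set()
    for entry in entries:
        match = STORE_ENTRY.match(entry)
        if match:
            names.add(match.group("name"))
    return names, keeps_inherited


if __name__ == "__main__":
    placeholder = "0" * 32  # placeholder hash, not a real store path
    packages = ("git-lfs-3.6.1", "openssl-3.3.3", "python3-3.12.8-env")
    line = 'export PATH="' + ":".join(
        f"/nix/store/{placeholder}-{pkg}/bin" for pkg in packages
    ) + ':$PATH"'
    names, keeps = path_package_names(line)
    print("found:", sorted(names), "prepend:", keeps)
    print("missing from this demo line:", sorted(EXPECTED - names))
```

Against the real wrapper, an empty `EXPECTED - names` together with `keeps_inherited` being true would mirror the two claims in item 3: all 15 store dirs present, and the inherited PATH retained.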
---
## (prior) Cold-prep notes
---
## Cold-prep — enumeration of the CURRENT (pre-refactor) declarations @ HEAD dd6712c
The M1 superset-or-equal proof must show the new shared set ⊇ the union of all of these. Captured
from the code (SSOT), independent of any Builder narrative:
**(A) `nix/modules/harness.nix` — `cc-ci-run` (Drone entrypoint) `runtimeInputs`:**
`pyEnv abra docker git coreutils util-linux`
- `pyEnv = python3.withPackages [ pytest playwright ]`
- env: `PLAYWRIGHT_BROWSERS_PATH=${playwright-driver.browsers}`, `PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1`
**(B) `nix/modules/nightly-sweep.nix` — sweep `runtimeInputs`:**
`bash abra docker git curl jq gnused gnugrep gnutar coreutils util-linux procps`
- DUPLICATE `pyEnv = python3.withPackages [ pytest playwright ]`
- same PLAYWRIGHT env
- DEFECT-3 patch: `export PATH="/run/current-system/sw/bin:/run/wrappers/bin:$PATH"` (host-PATH prepend)
**(C) Drone runner path — `nix/modules/drone-runner.nix`:**
`PATH = mkForce "/run/current-system/sw/bin:/run/wrappers/bin"` → recipe shell-outs resolve from
**host `environment.systemPackages`**, NOT a runtimeInputs list.
**(D) Host `systemPackages` (feeds C):**
- `nix/hosts/cc-ci/configuration.nix`: `curl git jq openssh` ← **NO git-lfs**
- `nix/hosts/cc-ci-hetzner/configuration.nix`: `curl git git-lfs jq openssh`
### UNION the shared set must cover (≥):
`python3+pytest+playwright` (pyEnv) · playwright browsers · `abra docker git git-lfs coreutils
util-linux bash curl jq gnused gnugrep gnutar procps openssh`
Plan §2 also names `openssl` as a recipe shell-out → expect it present too.
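The union requirement can be written out as plain set arithmetic. The sketch below copies the four declarations from (A), (B), and (D) above; the helper names are hypothetical, and `pyEnv` stands in for `python3+pytest+playwright` as a single token. The two candidate sets in the demo are illustrative only, not the Builder's actual shared list.

```python
# Pre-refactor declarations as enumerated in (A), (B), and (D).
HARNESS = {"pyEnv", "abra", "docker", "git", "coreutils", "util-linux"}
SWEEP = {
    "pyEnv", "bash", "abra", "docker", "git", "curl", "jq", "gnused",
    "gnugrep", "gnutar", "coreutils", "util-linux", "procps",
}
HOST_CC_CI = {"curl", "git", "jq", "openssh"}
HOST_HETZNER = {"curl", "git", "git-lfs", "jq", "openssh"}


def required_union(*declarations):
    """Union of every prior declaration: the floor the shared set must cover."""
    union = set()
    for declaration in declarations:
        union |= declaration
    return union


def coverage_gaps(candidate, *declarations):
    """Tokens present in some prior declaration but absent from the candidate."""
    return required_union(*declarations) - candidate


if __name__ == "__main__":
    prior = (HARNESS, SWEEP, HOST_CC_CI, HOST_HETZNER)
    union = required_union(*prior)
    print("union size:", len(union))
    print("host divergence:", sorted(HOST_HETZNER - HOST_CC_CI))

    # Illustrative candidates only.
    print("gaps (union + openssl):", sorted(coverage_gaps(union | {"openssl"}, *prior)))
    print("gaps (git-lfs dropped):", sorted(coverage_gaps(union - {"git-lfs"}, *prior)))
```

Any non-empty gap set is the blast-radius break described in suspicion 2 below, and `HOST_HETZNER - HOST_CC_CI` reproduces the `git-lfs` host divergence from (D).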
### Pre-noted suspicions to break on M1/M2 (cold, not yet verdicts):
1. **Host divergence**: `cc-ci` config lacks `git-lfs` but `hetzner` has it. Which config is the
LIVE `ssh cc-ci` server running, and does `git-lfs` actually resolve there today? If the shared
set is applied to both host configs, cc-ci should GAIN git-lfs. Verify both configs end identical.
2. **Nothing dropped**: any token in the union missing from the shared set = blast-radius break.
3. **Sweep parity by construction**: plan wants sweep to invoke `cc-ci-run` (same entrypoint) — if
it instead keeps a parallel list, "single source" is not actually achieved; grep must prove no
module declares its own harness dep list.
4. **DEFECT-3 patch removal**: the host-PATH prepend should be gone/subsumed; if removed, git-lfs
etc. must now come from the shared runtimeInputs, else the sweep regresses.
5. **Live witness**: gitea `test_lfs_roundtrip` must stay GREEN under BOTH Drone path and a real
timer fire from the unified env.