# BACKLOG — Phase 2pc (sane image-prune policy)
SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase2pc-image-cache.md`.
Scope (post operator correction 2026-05-29): **PC1 prune policy + confirm local-store
retention/auth ONLY.** The registry:2 pull-through cache is **dropped** (deferred to IDEAS /
Phase 2b — revisit only if multi-node OR a measured cold-deploy bottleneck on recreate-surviving
storage).
## Build backlog
- [ ] **PC1 — Conservative prune policy.** Remove `virtualisation.docker.autoPrune` (`--all` evicts
in-use base images → forced cold re-pull → rate-limit). Replace with a surgical, gated prune:
dangling + `until=24h` only, NEVER `--all`/`--volumes`; gated on (a) genuine disk pressure
(`/` ≥ 80%), (b) no run-app stack live, (c) no swarm service converging (mid-pull). Teardown
already removes only services/volumes/secrets/.env — NOT images (verified) — keep it that way.
- [ ] **PC2 — Confirm local cache retained + authenticated.** Daemon stays PAT-authenticated
(`docker info` Username=nptest2, sops `dockerhub_auth` → `/root/.docker/config.json`); local
image store `/var/lib/docker` persists across runs/teardowns/reboots. No code change expected —
confirm + document.
- [ ] **PC3 — Verify + document.** Deploy → teardown → redeploy reuses local layers (no
re-download); disk bounded without `-af`. Update `docs/runbook.md` + `docs/` prune note;
record the policy + the dropped-registry-cache deviation in `DECISIONS.md`.
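
The PC1 gate can be sketched as a small bash script. This is a minimal sketch, not the committed `cc-ci-docker-prune` script. It assumes GNU `df` and a Docker CLI talking to a swarm manager, and the function names and the `run-app` stack-name prefix are hypothetical. The decision lives in a pure function so it can be tested without a daemon, and "converging" is approximated as a service whose running replica count differs from the desired count.

```shell
#!/usr/bin/env bash
# Sketch of the PC1 gate, NOT the committed cc-ci-docker-prune script.
# Hypothetical: all function names and the "run-app" stack-name prefix.
set -uo pipefail

readonly DISK_THRESHOLD_PCT=80
readonly RUN_APP_STACK_PREFIX="${RUN_APP_STACK_PREFIX:-run-app}"

# Pure decision: exit 0 only when every gate allows a prune.
#   $1 = percent used on /, $2 = live run-app stacks, $3 = converging services
prune_gate() {
  local disk_pct=$1 live_stacks=$2 converging=$3
  (( disk_pct >= DISK_THRESHOLD_PCT )) || return 1  # (a) genuine disk pressure
  (( live_stacks == 0 )) || return 1                # (b) no run-app stack live
  (( converging == 0 )) || return 1                 # (c) no service mid-pull
}

# Read "running/desired" replica strings on stdin (e.g. "0/1") and print how
# many differ. A suffix such as " (max 1 per node)" is dropped before comparing.
count_converging() {
  local n=0 running desired
  while IFS=/ read -r running desired; do
    if [[ -z $running ]]; then continue; fi
    desired=${desired%% *}
    if [[ $running != "$desired" ]]; then n=$((n + 1)); fi
  done
  echo "$n"
}

# Gather the gate inputs, then prune dangling images older than 24h.
# Any probe failure returns early, so the prune is skipped (fail closed).
run_prune() {
  local disk_pct stacks replicas live_stacks converging
  disk_pct=$(df --output=pcent / | tail -n 1 | tr -dc '0-9') || return 1
  stacks=$(docker stack ls --format '{{.Name}}') || { echo "skip: cannot list stacks"; return 1; }
  replicas=$(docker service ls --format '{{.Replicas}}') || { echo "skip: cannot list services"; return 1; }
  live_stacks=$(grep -c "^${RUN_APP_STACK_PREFIX}" <<<"$stacks" || true)
  converging=$(count_converging <<<"$replicas")
  if prune_gate "$disk_pct" "$live_stacks" "$converging"; then
    # No --all, so only dangling images are removed; image prune never touches volumes.
    docker image prune --force --filter 'until=24h'
  else
    echo "skip: disk=${disk_pct}% stacks=${live_stacks} converging=${converging}"
  fi
}
```

A timer-driven service would call `run_prune`. Because each probe failure returns before the gate is evaluated, a broken `docker` CLI results in no prune rather than an unguarded one.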
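
The PC2 confirmation is read-only and could look like the sketch below. It assumes the sops-rendered `config.json` uses the standard `auths` layout with the Docker Hub key `https://index.docker.io/v1/` (a `credsStore` or credential helper would not show up this way). The function names are hypothetical. The check confirms where the image store lives. It does not by itself prove that the store survives reboots.

```shell
#!/usr/bin/env bash
# Read-only PC2 check. Hypothetical function names; assumes the Docker Hub
# credential sits under the standard "https://index.docker.io/v1/" auths key.
set -uo pipefail

# Exit 0 when the given Docker CLI config file carries a Docker Hub auths entry.
has_dockerhub_auth() {
  local cfg=${1:-/root/.docker/config.json}
  [[ -r $cfg ]] && grep -q '"https://index.docker.io/v1/"' "$cfg"
}

# Confirm the image store location and the presence of the Hub credential.
confirm_local_cache() {
  local root
  root=$(docker info --format '{{.DockerRootDir}}') || { echo "FAIL: docker info"; return 1; }
  if [[ $root != /var/lib/docker ]]; then
    echo "FAIL: image store is $root, expected /var/lib/docker"
    return 1
  fi
  has_dockerhub_auth || { echo "FAIL: no Docker Hub credential in config.json"; return 1; }
  echo "OK: image store $root, Docker Hub credential present"
}
```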
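
For PC3, the precondition for a redeploy reusing local layers is that teardown leaves every image in place. A minimal sketch of that check snapshots the local image IDs before and after teardown and diffs them. The function names are hypothetical, and the actual deploy/teardown commands are whatever the CI runs.

```shell
#!/usr/bin/env bash
# PC3 reuse-check sketch. Hypothetical function names; deploy and teardown
# are run separately, between the two snapshots.
set -uo pipefail

# Write the sorted, de-duplicated full local image IDs to file $1.
snapshot_images() {
  docker image ls --quiet --no-trunc | sort -u > "$1"
}

# Print image IDs present in snapshot $1 but missing from snapshot $2.
missing_images() {
  comm -23 <(sort -u "$1") <(sort -u "$2")
}
```

For example: `snapshot_images before.txt`, run the teardown, `snapshot_images after.txt`, then expect `missing_images before.txt after.txt` to print nothing. An empty result shows teardown kept the layers. Whether the redeploy made no registry pulls is a separate observation.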
## Adversary findings
- [x] **F2pc-1 [adversary] CLOSED @2026-05-29 (re-verified, re-claim 9e73ebd).** Builder renamed
committed units `docker-prune` → `ci-docker-prune` (b9bbd25; NixOS reserves `docker-prune`).
Re-verified: `git show HEAD:nix/modules/{docker-prune,swarm}.nix` byte-identical to host
`/root/cc-ci`; committed units = `ci-docker-prune.*` = live (enabled+active); old
`docker-prune.timer` not-found. git now reproduces the verified system → CLOSED by Adversary.
- [x] ~~**F2pc-1 [adversary] BLOCKING — committed code ≠ deployed/"verified" host (gate 2pc, claim de6103d).**~~
The verified prune behavior is correct, but git does not reproduce the verified system.
- **Observed.** origin/main HEAD `de6103d` `nix/modules/docker-prune.nix:56,67` defines
`systemd.services.docker-prune` / `systemd.timers.docker-prune`. The live host runs
`ci-docker-prune.service`/`.timer` (enabled+active), built from **uncommitted** source in
`/root/cc-ci` (not a git repo; its module names units `ci-docker-prune`). STATUS-2pc's
verify commands also use `ci-docker-prune.timer`.
- **Repro.** `cd /srv/cc-ci/cc-ci-adv && grep -nE 'systemd\.(services|timers)\.' nix/modules/docker-prune.nix`
→ `docker-prune`. `ssh cc-ci 'systemctl is-active ci-docker-prune.timer; systemctl is-enabled docker-prune.timer'`
→ `active` / `not-found`. So a from-git rebuild creates `docker-prune.*` (≠ the verified
`ci-docker-prune.*`), and a verifier following STATUS against a git-built host gets a false FAIL.
- **Impact.** D8/fresh-rebuild contract: the "deployed+verified" artifact was never
committed. Functionally equivalent (same `cc-ci-docker-prune` script body), so this is a
reproducibility/integrity defect, not a behavioral one.
- **To clear (Builder).** Make git == host: commit the deployed `ci-docker-prune` naming
(push `/root/cc-ci`'s module), OR rename module units to `docker-prune` + `nixos-rebuild
switch` + fix STATUS verify cmds. Confirm the stale `docker-prune.service` (linked, ignored)
leftover is GC'd cleanly. Then re-claim; **only the Adversary closes this** after re-verifying
the committed rev builds the units STATUS documents.
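
The git == host condition in this finding can be checked mechanically. A minimal sketch, assuming the adversary checkout at `/srv/cc-ci/cc-ci-adv`, the `cc-ci` ssh alias and the host copy under `/root/cc-ci` as named above; the function names are hypothetical, and the unit-name extraction only matches unquoted attribute paths such as `systemd.timers.ci-docker-prune`.

```shell
#!/usr/bin/env bash
# Sketch of the F2pc-1 git == host check. Hypothetical function names; the
# repo path, "cc-ci" ssh alias and /root/cc-ci come from the finding.
set -uo pipefail

# Read a NixOS module on stdin; print the declared unit attrs, e.g.
# "services.ci-docker-prune" for systemd.services.ci-docker-prune.
declared_units() {
  grep -oE 'systemd\.(services|timers)\.[A-Za-z0-9_-]+' | sed -E 's/^systemd\.//' | sort -u
}

# Exit 0 only if each module at HEAD is byte-identical to the host copy.
modules_match_host() {
  local repo=${1:-/srv/cc-ci/cc-ci-adv} host=${2:-cc-ci} mod rc=0
  for mod in docker-prune swarm; do
    if ! cmp -s <(git -C "$repo" show "HEAD:nix/modules/$mod.nix") \
                <(ssh "$host" cat "/root/cc-ci/nix/modules/$mod.nix"); then
      echo "drift: nix/modules/$mod.nix"
      rc=1
    fi
  done
  return "$rc"
}
```

In the closed state, `git -C /srv/cc-ci/cc-ci-adv show HEAD:nix/modules/docker-prune.nix | declared_units` should list `services.ci-docker-prune` and `timers.ci-docker-prune`, matching the units `systemctl` reports as live on the host.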