cc-ci/machine-docs/BACKLOG-2pc.md


BACKLOG — Phase 2pc (sane image-prune policy)

SSOT: /srv/cc-ci/cc-ci-plan/plan-phase2pc-image-cache.md. Scope (after the operator correction of 2026-05-29): the PC1 prune policy plus confirming local-store retention/auth, ONLY. The registry:2 pull-through cache is dropped (deferred to IDEAS / Phase 2b; revisit only if the setup goes multi-node OR a cold-deploy bottleneck is measured on recreate-surviving storage).

Build backlog

  • PC1: Conservative prune policy. Remove virtualisation.docker.autoPrune (its --all evicts in-use base images → forced cold re-pull → Docker Hub rate limit). Replace it with a surgical, gated prune: dangling images older than 24h only (until=24h), NEVER --all/--volumes, and run only when (a) there is genuine disk pressure (/ ≥ 80%), (b) no run-app stack is live, and (c) no swarm service is converging (mid-pull). Teardown already removes only services/volumes/secrets/.env, NOT images (verified); keep it that way.
  • PC2: Confirm the local cache is retained and authenticated. The daemon stays PAT-authenticated (docker info Username=nptest2; sops dockerhub_auth → /root/.docker/config.json), and the local image store /var/lib/docker persists across runs, teardowns, and reboots. No code change expected: confirm + document.
  • PC3: Verify + document. Deploy → teardown → redeploy reuses local layers (no re-download), and disk usage stays bounded without -af. Update docs/runbook.md + docs/ prune note; record the policy and the dropped-registry-cache deviation in DECISIONS.md.
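A minimal bash sketch of the PC1 gate, offered as an illustration and not as the committed cc-ci-docker-prune script. The 80% threshold, the dangling-only until=24h filter, and the ban on --all/--volumes come from PC1. Two details are assumptions: run-app stacks are recognised by a hypothetical "run-app" name prefix, and a swarm service counts as converging when its running replica count differs from the desired count. The file only defines functions: should_prune is the pure three-gate decision, count_converging parses docker service ls replica strings, and run_prune gathers live inputs (GNU df, docker) before pruning.

```shell
#!/usr/bin/env bash
# Sketch of the PC1 gated prune (not the committed cc-ci-docker-prune script).
# Assumptions: run-app stacks share a hypothetical "run-app" name prefix, and a
# service whose running replicas differ from desired is treated as converging.

PRUNE_THRESHOLD_PCT=80     # gate (a): prune only at or above this / usage
RUN_APP_PREFIX="run-app"   # gate (b): hypothetical stack-name prefix

# Pure decision: returns 0 only when all three PC1 gates pass.
# $1 = usage percent of /, $2 = live run-app stacks, $3 = converging services.
should_prune() {
  local usage=$1 live=$2 converging=$3
  (( usage >= PRUNE_THRESHOLD_PCT )) && (( live == 0 )) && (( converging == 0 ))
}

# Reads `docker service ls --format '{{.Replicas}}'` lines such as "1/1",
# "0/1" or "2/3 (max 1 per node)" on stdin; prints how many are not converged.
count_converging() {
  local line running desired n=0
  while IFS= read -r line; do
    if [[ -z $line ]]; then
      continue
    fi
    line=${line%% *}        # drop any "(max N per node)" suffix
    running=${line%%/*}
    desired=${line##*/}
    if [[ $running != "$desired" ]]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Impure wrapper (needs GNU df and a swarm-manager docker CLI): gathers the
# gate inputs, then removes dangling images older than 24h. No --all/--volumes.
run_prune() {
  local usage live converging
  usage=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')
  live=$(docker stack ls --format '{{.Name}}' | grep -c "^${RUN_APP_PREFIX}" || true)
  converging=$(docker service ls --format '{{.Replicas}}' | count_converging)
  if should_prune "$usage" "$live" "$converging"; then
    docker image prune --force --filter until=24h
  else
    echo "prune skipped: usage=${usage}% live=${live} converging=${converging}" >&2
  fi
}
```

A timer-driven service could source this file and call run_prune. Keeping should_prune free of docker calls means the gating logic can be exercised without a daemon, and the default (non --all) image prune is what keeps in-use base images in the local store.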

Adversary findings

  • F2pc-1 [adversary] BLOCKING: committed code ≠ deployed/"verified" host (gate 2pc, claim de6103d). The verified prune behavior is correct, but git does not reproduce the verified system.
      ◦ Observed. origin/main HEAD de6103d nix/modules/docker-prune.nix:56,67 defines systemd.services.docker-prune / systemd.timers.docker-prune. The live host runs ci-docker-prune.service/.timer (enabled+active), built from uncommitted source in /root/cc-ci (not a git repo; its module names the units ci-docker-prune). STATUS-2pc's verify commands also use ci-docker-prune.timer.
      ◦ Repro. cd /srv/cc-ci/cc-ci-adv && grep -nE 'systemd\.(services|timers)\.' nix/modules/docker-prune.nix → docker-prune. ssh cc-ci 'systemctl is-active ci-docker-prune.timer; systemctl is-enabled docker-prune.timer' → active / not-found. So a from-git rebuild creates docker-prune.* (≠ the verified ci-docker-prune.*), and a verifier following STATUS against a git-built host gets a false FAIL.
      ◦ Impact. D8/fresh-rebuild contract: the "deployed+verified" artifact was never committed. The two are functionally equivalent (same cc-ci-docker-prune script body), so this is a reproducibility/integrity defect, not a behavioral one.
      ◦ To clear (Builder). Make git == host: either commit the deployed ci-docker-prune naming (push /root/cc-ci's module), OR rename the module units to docker-prune, run nixos-rebuild switch, and fix the STATUS verify commands. Confirm that the stale docker-prune.service leftover (linked, ignored) is garbage-collected cleanly. Then re-claim; only the Adversary closes this, after re-verifying that the committed rev builds the units STATUS documents.
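The naming check behind F2pc-1 can also be sketched in bash. This is a hypothetical helper, not part of the repo: declared_units lists the names that appear as systemd.services.<name> or systemd.timers.<name> in a module's text (assuming the flat, unquoted attribute form that the Repro grep matches), and module_declares fails if any unit name that STATUS verifies is absent from the committed module.

```shell
#!/usr/bin/env bash
# Hypothetical F2pc-1 drift check: does a committed Nix module declare the
# unit names that the STATUS verify commands use? Assumes the flat, unquoted
# "systemd.services.<name>" / "systemd.timers.<name>" attribute form.

# Prints each unit name declared in the Nix text on stdin, once.
declared_units() {
  grep -oE 'systemd\.(services|timers)\.[A-Za-z0-9_-]+' \
    | sed -E 's/^systemd\.(services|timers)\.//' \
    | sort -u
}

# Returns 0 if module file $1 declares every unit name in $2..$n;
# otherwise reports the first missing name on stderr and returns 1.
module_declares() {
  local module=$1
  shift
  local declared name
  declared=$(declared_units < "$module")
  for name in "$@"; do
    if ! grep -qxF -- "$name" <<< "$declared"; then
      echo "not declared in ${module}: ${name}" >&2
      return 1
    fi
  done
  return 0
}
```

For example, module_declares nix/modules/docker-prune.nix ci-docker-prune, run in a checkout of de6103d, would be expected to fail, surfacing the same mismatch the Repro finds by hand before anyone rebuilds. The host side still needs the systemctl checks.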