# BACKLOG — Phase 2pc (sane image-prune policy)

SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase2pc-image-cache.md`.
Scope (post operator correction 2026-05-29): **PC1 prune policy + confirm local-store
retention/auth ONLY.** The registry:2 pull-through cache is **dropped** (deferred to IDEAS /
Phase 2b — revisit only if multi-node OR a measured cold-deploy bottleneck on recreate-surviving
storage).

## Build backlog

- [ ] **PC1 — Conservative prune policy.** Remove `virtualisation.docker.autoPrune` (`--all` evicts
  in-use base images → forced cold re-pull → rate-limit). Replace with a surgical, gated prune:
  dangling + `until=24h` only, NEVER `--all`/`--volumes`; gated on (a) genuine disk pressure
  (`/` ≥ 80%), (b) no run-app stack live, (c) no swarm service converging (mid-pull). Teardown
  already removes only services/volumes/secrets/.env — NOT images (verified) — keep it that way.
- [ ] **PC2 — Confirm local cache retained + authenticated.** Daemon stays PAT-authenticated
  (`docker info` Username=nptest2, sops `dockerhub_auth` → `/root/.docker/config.json`); local
  image store `/var/lib/docker` persists across runs/teardowns/reboots. No code change expected —
  confirm + document.
- [ ] **PC3 — Verify + document.** Deploy → teardown → redeploy reuses local layers (no
  re-download); disk bounded without `-af`. Update `docs/runbook.md` + `docs/` prune note;
  record the policy + the dropped-registry-cache deviation in `DECISIONS.md`.
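
The three PC1 gates can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not the committed `cc-ci-docker-prune` script: the function names, the `PRUNE_THRESHOLD_PCT` variable, and the `run-app` container-name filter are hypothetical, and treating a `running/desired` mismatch in `docker service ls` as "converging" is a guess at how gate (c) might be detected (replicated jobs report replicas differently and would need separate handling).

```shell
#!/bin/sh
# Hypothetical sketch of the PC1 gated prune. Function names, the threshold
# variable, and the "run-app" name filter are assumptions, not the real script.
set -eu

PRUNE_THRESHOLD_PCT=80   # gate (a): / must be at or above this percentage

# Percent of / in use as bare digits (for example "83"), from POSIX `df -P`.
root_usage_pct() {
  df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Gate (b) input: number of running containers whose name matches "run-app".
live_run_app_count() {
  docker ps --quiet --filter 'name=run-app' | wc -l | tr -d ' '
}

# Gate (c) input: reads `docker service ls --format '{{.Replicas}}'` lines
# ("running/desired") on stdin and counts services where the two differ.
count_unconverged() {
  awk '{ split($1, r, "/"); if (r[1] != r[2]) n++ } END { print n + 0 }'
}

# Pure decision: prints "prune" only when all three gates pass,
# otherwise prints the first gate that blocked the prune.
prune_decision() {
  usage=$1; live=$2; converging=$3
  if [ "$usage" -lt "$PRUNE_THRESHOLD_PCT" ]; then echo "skip: no disk pressure"
  elif [ "$live" -gt 0 ]; then echo "skip: run-app stack live"
  elif [ "$converging" -gt 0 ]; then echo "skip: service converging"
  else echo "prune"
  fi
}

main() {
  converging=$(docker service ls --format '{{.Replicas}}' | count_unconverged)
  decision=$(prune_decision "$(root_usage_pct)" "$(live_run_app_count)" "$converging")
  echo "$decision"
  [ "$decision" = "prune" ] || return 0
  # Dangling images older than 24h only; never --all, never --volumes.
  docker image prune --force --filter 'until=24h'
}

# Run only when asked explicitly, so the functions can be tested in isolation.
if [ "${1:-}" = "--run" ]; then main; fi
```

Keeping `prune_decision` free of Docker calls means the gate logic can be exercised with plain numbers, for example `prune_decision 85 0 0`, before the script is wired to a timer.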

## Adversary findings

- [x] **F2pc-1 [adversary] CLOSED @2026-05-29 (re-verified, re-claim 9e73ebd).** Builder renamed
  committed units `docker-prune`→`ci-docker-prune` (b9bbd25; NixOS reserves `docker-prune`).
  Re-verified: `git show HEAD:nix/modules/{docker-prune,swarm}.nix` byte-identical to host
  `/root/cc-ci`; committed units = `ci-docker-prune.*` = live (enabled+active); old
  `docker-prune.timer` not-found. git now reproduces the verified system → CLOSED by Adversary.
- [x] ~~**F2pc-1 [adversary] BLOCKING — committed code ≠ deployed/"verified" host (gate 2pc, claim de6103d).**~~
  The verified prune behavior is correct, but git does not reproduce the verified system.
  - **Observed.** origin/main HEAD `de6103d` `nix/modules/docker-prune.nix:56,67` defines
    `systemd.services.docker-prune` / `systemd.timers.docker-prune`. The live host runs
    `ci-docker-prune.service`/`.timer` (enabled+active), built from **uncommitted** source in
    `/root/cc-ci` (not a git repo; its module names units `ci-docker-prune`). STATUS-2pc's
    verify commands also use `ci-docker-prune.timer`.
  - **Repro.** `cd /srv/cc-ci/cc-ci-adv && grep -nE 'systemd\.(services|timers)\.' nix/modules/docker-prune.nix`
    → `docker-prune`. `ssh cc-ci 'systemctl is-active ci-docker-prune.timer; systemctl is-enabled docker-prune.timer'`
    → `active` / `not-found`. So a from-git rebuild creates `docker-prune.*` (≠ verified
    `ci-docker-prune.*`); a verifier following STATUS against a git-built host gets a false FAIL.
  - **Impact.** D8/fresh-rebuild contract: the "deployed+verified" artifact was never
    committed. Functionally equivalent (same `cc-ci-docker-prune` script body), so this is a
    reproducibility/integrity defect, not a behavioral one.
  - **To clear (Builder).** Make git == host: commit the deployed `ci-docker-prune` naming
    (push `/root/cc-ci`'s module), OR rename module units to `docker-prune` + `nixos-rebuild switch` +
    fix STATUS verify cmds. Confirm the stale `docker-prune.service` (linked,ignored)
    leftover GCs cleanly. Then re-claim; **only the Adversary closes this** after re-verifying
    the committed rev builds the units STATUS documents.
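
The Repro step above could be turned into a repeatable drift check along these lines. This is a hedged sketch only: the function names are hypothetical, the module path and expected unit name are taken from the finding, and the name extraction only handles the dotted `systemd.services.<name>` / `systemd.timers.<name>` form (quoted or nested attribute sets would need more work).

```shell
#!/bin/sh
# Hypothetical drift check: do the unit names in the committed module match
# the unit the host is expected to run? Function names are assumptions.
set -eu

# Reads a Nix module on stdin and prints the distinct unit names it defines
# under systemd.services.* / systemd.timers.*, one per line.
module_unit_names() {
  grep -oE 'systemd\.(services|timers)\.[A-Za-z0-9_-]+' \
    | sed -E 's/^systemd\.(services|timers)\.//' \
    | sort -u
}

# Prints "match" when the module defines exactly the expected unit name,
# otherwise a one-line drift report.
compare_units() {
  expected=$1; names=$2
  if [ "$names" = "$expected" ]; then echo "match"
  else echo "drift: module defines '$names', expected '$expected'"
  fi
}

main() {
  module=nix/modules/docker-prune.nix
  names=$(module_unit_names < "$module")
  compare_units ci-docker-prune "$names"
  # On the host itself, the live unit state can be read with systemctl.
  systemctl is-active ci-docker-prune.timer || true
  systemctl is-enabled docker-prune.timer || true
}

# Run only when asked explicitly, so the functions can be tested in isolation.
if [ "${1:-}" = "--run" ]; then main; fi
```

Run with `--run` from a checkout of the committed revision, the first line reports whether git and the documented unit name agree; the `systemctl` lines are only meaningful when executed on the host.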