# STATUS: Phase 2pc (sane image-prune policy)

**SSOT:** `/srv/cc-ci/cc-ci-plan/plan-phase2pc-image-cache.md`

**Scope (operator correction 2026-05-29):** PC1 conservative prune + PC2/PC3 confirm+verify local-store retention/auth. **Registry pull-through cache DROPPED** (deferred → `cc-ci-plan/IDEAS.md` + DECISIONS Phase-2pc; no registry code was written).

## DONE

Phase 2pc complete. **Adversary PASS @2026-05-29** for PC1+PC2+PC3 (REVIEW-2pc.md, `review(2pc)` commit `486d162`, gate re-claim `9e73ebd`); **F2pc-1 CLOSED**; no standing VETO. git==host (`ci-docker-prune`, reproducible from a fresh clone). Watchdog auto-returns to Phase 2.

## Gate 2pc: PASSED (was RE-CLAIMED; F2pc-1 resolved)

All of PC1/PC2/PC3 are implemented, deployed to cc-ci, and Builder-verified on the real host. WHAT / HOW / EXPECTED / WHERE follow.

**F2pc-1 (committed code ≠ deployed host): RESOLVED.** The Adversary cold-verified the *behavior* GREEN but FAILed the gate because it verified the **stale claim commit `de6103d`**. In that commit, `docker-prune.nix` still named the units `docker-prune`, while the host runs `ci-docker-prune`. The rename was already committed in **`b9bbd25`**, which landed before the verdict. That is exactly the fix the Adversary endorsed ("commit the deployed ci-docker-prune naming"). **The current pushed HEAD now has git == host == `ci-docker-prune`:**

```sh
# committed git defines the SAME units that STATUS documents and the host runs:
grep -nE 'systemd\.(services|timers)\.' nix/modules/docker-prune.nix
# EXPECT: ci-docker-prune (services+timers), introduced by b9bbd25
git log --oneline -1 -- nix/modules/docker-prune.nix
# EXPECT: b9bbd25 rename commit
ssh cc-ci 'systemctl is-active ci-docker-prune.timer'
# EXPECT: active (matches a from-git rebuild)
```

The NixOS-builtin `docker-prune.service` is `inactive`/`linked`, and `docker-prune.timer` is `not-found`. The NixOS docker module defines that unit whenever Docker is enabled. With autoPrune off it has **no timer and no `wantedBy`**, so it **never runs**. It is not a leftover of this change, and a fresh from-git rebuild produces the identical inert unit. The unit name comes literally from the attribute name in `docker-prune.nix`, so a from-git build yields `ci-docker-prune.*`.

(Claim discipline now followed: working tree committed + pushed + `git status` clean before this claim.)

---

### PC1: Conservative prune policy

**WHAT.** Removed the daily `docker system prune --all` and replaced it with a surgical, triple-gated prune that keeps Docker's local image store (the cache) warm.

- **WHERE.** `nix/modules/docker-prune.nix` (NEW, unit `ci-docker-prune` service+timer); `nix/modules/swarm.nix` (`virtualisation.docker.autoPrune` block removed, left OFF = default); `nix/hosts/cc-ci/configuration.nix` (imports `docker-prune.nix`). Deployed via `nixos-rebuild switch --flake path:/root/cc-ci#cc-ci`.
- The prune **no-ops unless ALL** of these hold:
  1. `/` usage is ≥ 80%.
  2. No run-app stack is live (`<=4char>-<6hex>_ci_commoninternet_net_*`).
  3. No swarm service is converging (unmet replicas).

  When it does run, it executes `docker {container,image,builder} prune -f --filter until=24h`: **dangling+old only, never `--all`, never `--volumes`.**
- Teardown is unchanged. `runner/harness/lifecycle.py::teardown_app` removes services/volumes/secrets/.env and **no images** (`grep -rn 'rmi\|image rm\|image prune' runner/ tests/conftest.py` = empty).
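The three gates above can be sketched in shell as follows. This is a hypothetical reconstruction, not the script that `docker-prune.nix` actually installs (that script is not reproduced in this STATUS). The following are all assumptions:

- the function names;
- the `--run` switch;
- reading `<=4char>` as 1 to 4 lowercase alphanumerics.

Each gate reads plain text on stdin, so the decision logic can be exercised without a Docker daemon.

```sh
# Hypothetical sketch of the PC1 prune gates; NOT the actual docker-prune.nix script.
set -u

THRESHOLD=80   # prune only when / is at least this full (percent)

# Gate 1: stdin is `df --output=pcent /` output ("Use%" header, then e.g. " 83%").
# Succeeds when usage >= THRESHOLD; empty or unreadable input counts as 0%.
disk_over_threshold() {
  local pct
  pct=$(tail -n 1 | tr -dc '0-9')
  [ "${pct:-0}" -ge "$THRESHOLD" ]
}

# Gate 2: stdin is one swarm service name per line.
# Succeeds when any run-app stack service is present.
# ASSUMPTION: "<=4char" means 1-4 lowercase alphanumerics.
runapp_stack_live() {
  grep -E '^[a-z0-9]{1,4}-[0-9a-f]{6}_ci_commoninternet_net_' >/dev/null
}

# Gate 3: stdin is one `{{.Replicas}}` value per line ("1/1", "0/1 (max 1 per node)").
# Succeeds when any service has fewer running than desired replicas.
swarm_converging() {
  local running desired rest
  while IFS='/ ' read -r running desired rest; do
    [ -n "$desired" ] || continue
    if [ "$running" -lt "$desired" ]; then
      return 0
    fi
  done
  return 1
}

# Prune only if ALL hold: disk >= threshold, no run-app stack, nothing converging.
# Fails closed: if df or the swarm cannot be queried, nothing is pruned.
should_prune() {
  local names replicas
  df --output=pcent / | disk_over_threshold || return 1
  names=$(docker service ls --format '{{.Name}}') || return 1
  replicas=$(docker service ls --format '{{.Replicas}}') || return 1
  if printf '%s\n' "$names" | runapp_stack_live; then return 1; fi
  if printf '%s\n' "$replicas" | swarm_converging; then return 1; fi
  return 0
}

# Only touches Docker when invoked with --run (e.g. from a systemd oneshot).
if [ "${1:-}" = "--run" ]; then
  if should_prune; then
    docker container prune -f --filter until=24h
    docker image prune -f --filter until=24h     # dangling only: no --all
    docker builder prune -f --filter until=24h
  else
    echo "keeping local image cache, nothing to do"
  fi
fi
```

The design choice worth keeping is failing closed. If `df` or `docker service ls` cannot be read, `should_prune` returns non-zero and nothing is removed, which matches the conservative intent of PC1.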
**HOW to verify (cold, Adversary's own checks):**

```sh
ssh cc-ci 'systemctl is-enabled docker-prune.timer'   # EXPECT: not-found (autoPrune gone)
ssh cc-ci 'systemctl is-enabled ci-docker-prune.timer; systemctl is-active ci-docker-prune.timer'
ssh cc-ci 'systemctl list-timers ci-docker-prune.timer --no-pager'
# EXPECT: enabled/active, NEXT daily 00:00
ssh cc-ci 'systemctl start ci-docker-prune.service; \
  journalctl -u ci-docker-prune.service -n 3 --no-pager'
# EXPECT (disk<80%): "keeping local image cache, nothing to do"
ssh cc-ci 'docker images -q | wc -l'   # EXPECT: unchanged before==after the manual run
# source-read the gates + flags (no --all, no --volumes):
grep -nE "until=24h|--all|--volumes|prune" nix/modules/docker-prune.nix
grep -n "autoPrune" nix/modules/swarm.nix   # EXPECT: only a comment, no enable=true
```

**Active-path evidence.** The gate only reaches the prune at ≥80% disk, so the Builder ran the exact prune command directly. `docker image prune -f --filter until=24h` reclaimed **2.341 GB**:

- Images went 23→17 and dangling images 10→4. The 4 kept are <24h old, which proves the age gate.
- Disk went 31%→27%.
- **Every tagged/in-use image survived** (keycloak/mariadb/nginx/redis).

Disk stays bounded without `-af`.

**EXPECTED:**

- The old timer is `not-found`.
- `ci-docker-prune.timer` is enabled+active (daily).
- A manual run below 80% prints the no-op line and removes nothing.
- The module's prune flags are `--filter until=24h` only (never `--all`/`--volumes`).
- swarm.nix has no live autoPrune.

### PC2: Local cache retained + authenticated (confirm)

**WHAT.** The daemon stays PAT-authenticated, and the `/var/lib/docker` local image store persists across runs/teardowns/reboots. No code change was needed (sops `dockerhub_auth` → `/root/.docker/config.json` in `nix/modules/secrets.nix`, unchanged).
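For reference, Docker's client `config.json` uses a standard credential format:

- an `auths` map keyed by registry URL (Docker Hub is `https://index.docker.io/v1/`);
- an `auth` value for each registry, which is base64 of `user:token`.

The sketch below assumes the sops template in `nix/modules/secrets.nix` renders that format (the template itself is not shown here). `make_auth`, `render_docker_config` and `config_user` are hypothetical helpers. They build a test file and read back only the username, never the token.

```sh
# Hypothetical helpers illustrating Docker's config.json credential format.
set -u

# base64("user:token") with no line wrapping, as stored in auths.<registry>.auth
make_auth() {
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Minimal config.json for Docker Hub (a test fixture; the real file comes from sops).
render_docker_config() {
  printf '{"auths":{"https://index.docker.io/v1/":{"auth":"%s"}}}\n' "$(make_auth "$1" "$2")"
}

# Read a config.json on stdin (compact or pretty-printed) and print only the
# username from the first "auth" entry.
config_user() {
  sed -n 's/.*"auth": *"\([^"]*\)".*/\1/p' | head -n 1 | base64 -d | cut -d: -f1
}
```

`docker info` remains the primary check. This sketch only makes explicit what the rendered file is expected to contain, assuming it uses an inline `auth` field rather than a credential helper.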
**HOW / EXPECTED:**

```sh
ssh cc-ci 'docker info 2>/dev/null | grep Username'   # EXPECT: Username: nptest2
ssh cc-ci 'ls -l /root/.docker/config.json'           # EXPECT: -> /run/secrets/rendered/docker-config.json (0600)
ssh cc-ci 'docker images | wc -l'                     # EXPECT: many recipe images retained (was 21 leaf images)
```

### PC3: Deploy → teardown → redeploy reuses local layers (no re-download)

**WHAT.** A previously pulled image is retained through teardown, and a redeploy reuses its local layers; only an authenticated manifest check remains. The Builder proved this with a real swarm deploy/teardown/redeploy of `redis:7-alpine`. The image comes from docker.io through the authenticated daemon, the same pull path abra/swarm use.

**HOW (Adversary, reproducible):**

```sh
ssh cc-ci 'bash -s' <<'PROOF'
# COLD: drop any local copy, then time a full pull
IMG=redis:7-alpine; docker rmi -f "$IMG" >/dev/null 2>&1 || true
t0=$(date +%s%N); docker pull "$IMG" 2>&1 | grep -E "Pull complete|Downloaded|Already exists|up to date"; t1=$(date +%s%N)
echo COLD_MS=$(((t1-t0)/1000000))
# deploy + tear down a swarm service on the image; the image must survive
docker service create --name pc3 --replicas 1 "$IMG" sleep 120 >/dev/null 2>&1; docker service ls --filter name=pc3 --format '{{.Replicas}}'
docker service rm pc3 >/dev/null 2>&1
echo retained: $(docker images "$IMG" --format '{{.ID}}')
# WARM: re-pull; expect a manifest check only ("Image is up to date"), then clean up
t2=$(date +%s%N); docker pull "$IMG" 2>&1 | grep -E "Pull complete|Downloaded|Already exists|up to date"; t3=$(date +%s%N)
echo WARM_MS=$(((t3-t2)/1000000)); docker rmi -f "$IMG" >/dev/null 2>&1
PROOF
```

**EXPECTED:**

- The COLD pull shows per-layer "Pull complete" lines (a download). The Builder saw 6 layers, COLD_MS≈5303.
- After `service rm`, the image ID is still listed (retained).
- The WARM pull shows `Image is up to date` (no layer download), WARM_MS≈674 (≈8× faster, manifest-only).

This confirms that the local store is the cache, that it survives teardown, and that a redeploy needs no Docker Hub layer download.

Optional fuller proof: run a real recipe cycle twice with `RECIPE=custom-html-tiny PR=0 STAGES=install cc-ci-run runner/run_recipe_ci.py`. The 2nd deploy shows no image-layer download.
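The COLD/WARM judgement in the proof can also be made mechanically. The helpers below are hypothetical and not part of the repo:

- `pull_kind` classifies `docker pull` output using the per-layer `Pull complete` lines and the standard status lines `Status: Downloaded newer image for <image>` and `Status: Image is up to date for <image>`.
- `speedup` turns the two timings into a rounded integer ratio.

```sh
# Hypothetical helpers for reading the PC3 proof output; not part of the repo.
set -u

# stdin: `docker pull` output. Prints "download" if any layer was fetched,
# "cached" if Docker reported the image up to date, else "unknown".
pull_kind() {
  local out
  out=$(cat)
  if printf '%s\n' "$out" | grep -E 'Pull complete|Downloaded newer image' >/dev/null; then
    echo download
  elif printf '%s\n' "$out" | grep -F 'Image is up to date' >/dev/null; then
    echo cached
  else
    echo unknown
  fi
}

# Integer cold/warm ratio, rounded to nearest: speedup COLD_MS WARM_MS
speedup() {
  echo $(( ($1 + $2 / 2) / $2 ))
}
```

With the Builder's numbers, `speedup 5303 674` prints `8`, consistent with the ≈8× figure above. On the host, `docker pull redis:7-alpine 2>&1 | pull_kind` would print `cached` on the warm run.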
---

## DoD checklist (Builder view; the Adversary owns the verdict in REVIEW-2pc.md)

- [x] **PC1**: autoPrune `--all` removed; surgical gated `ci-docker-prune` deployed; teardown keeps images.
- [x] **PC2**: daemon PAT-authenticated (nptest2); local store retained across rebuild.
- [x] **PC3**: deploy→teardown→redeploy reuses local layers (no re-download), measured; disk bounded (31%) without `-af`. Documented (runbook/warm/DECISIONS/IDEAS).

## Not blocked

No standing blockers.