claim(2pc): PC1 conservative prune deployed+verified; PC2/PC3 local-store cache confirmed

ci-docker-prune (gated surgical prune) live on cc-ci: old autoPrune --all gone, new timer
enabled (daily), no-ops below 80% disk keeping the local image cache, never --all/--volumes.
Daemon stays PAT-authenticated (nptest2); /var/lib/docker retained across rebuild. PC3 proof:
redis:7-alpine deploy->teardown(service rm, image retained)->redeploy = "Image is up to date",
no layer re-download (cold 5303ms -> warm 674ms). Docs: runbook "Image cache & prune policy",
warm.md, DECISIONS Phase-2pc, IDEAS (registry pull-through cache deferred + revisit trigger).
Gate 2pc CLAIMED, awaiting Adversary cold-verify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:42:36 +01:00
parent 16d177e73a
commit de6103d41d
5 changed files with 185 additions and 22 deletions

@@ -724,3 +724,29 @@ Standing policy for all Phase-2 (and later) recipe OIDC/SSO testing:
Consequences: DEFERRED #9 (authentik enrollment) re-entry trigger narrowed to "a recipe requires
authentik"; F2-7 (authentik backend) is not a DONE blocker. plan-sso-dep-testing.md §6 updated by the
orchestrator to match.
## Phase 2pc — image-prune policy; local store IS the cache; registry pull-through DROPPED (2026-05-29) — SETTLED
Decision (PC1): removed `virtualisation.docker.autoPrune` (it ran `docker system prune --force --all
--filter until=24h` daily). The `--all` evicts every image not used by a *running* container —
between runs no test apps run, so it wiped the cached recipe base images → cold re-pull → Docker-Hub
rate-limit churn (JOURNAL-2 507/542/690-693). Replaced with `nix/modules/docker-prune.nix`: the
`ci-docker-prune` daily timer + oneshot, a **surgical triple-gated** prune that no-ops unless ALL of
(1) `/` ≥ 80% full, (2) no run-app stack live, (3) no swarm service converging; when it does run it
prunes only **dangling images + stopped containers + dangling build cache, `until=24h`**: never
`--all` (keeps tagged base/in-use images), never `--volumes` (keeps the warm canonical data). Teardown
(`lifecycle.teardown_app`) already removes only services/volumes/secrets/.env, never images; kept as is.
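
The gating logic described above can be sketched in shell as follows. This is a minimal illustration, not the contents of `nix/modules/docker-prune.nix`: the function names, the `--run` flag, and in particular the probes for gates (2) and (3) are assumptions. Counting every swarm stack as a "run-app stack" is a stand-in; the real module presumably filters to test-app stacks only.

```shell
#!/usr/bin/env bash
# Sketch of a triple-gated conservative prune (hypothetical helper names).

DISK_THRESHOLD_PCT=80   # from the decision text: no-op below 80% of /

# Gate 1: root filesystem usage as an integer percentage (GNU df).
root_usage_pct() {
  df --output=pcent / | tail -n 1 | tr -dc '0-9'
}

# Gate 2 (assumed probe): number of deployed swarm stacks.
live_stack_count() {
  docker stack ls --format '{{.Name}}' | wc -l | tr -d ' '
}

# Gate 3 helper: reads "running/desired" replica strings on stdin and
# prints how many services have running != desired.
count_unconverged() {
  awk '{ split($1, r, "/"); if (r[1] != r[2]) n++ } END { print n + 0 }'
}

# Pure decision: succeeds only when all three gates allow a prune.
should_prune() {  # args: usage_pct live_stacks unconverged_services
  [ "$1" -ge "$DISK_THRESHOLD_PCT" ] && [ "$2" -eq 0 ] && [ "$3" -eq 0 ]
}

# Dangling-only prune: no --all and no --volumes anywhere.
run_prune() {
  docker container prune --force --filter until=24h &&
  docker image prune --force --filter until=24h &&
  docker builder prune --force --filter until=24h
}

main() {
  local usage stacks unconverged
  usage=$(root_usage_pct)
  stacks=$(live_stack_count)
  unconverged=$(docker service ls --format '{{.Replicas}}' | count_unconverged)
  if should_prune "$usage" "$stacks" "$unconverged"; then
    run_prune
  else
    echo "prune skipped: disk=${usage}% stacks=${stacks} unconverged=${unconverged}"
  fi
}

# Touch Docker only when asked explicitly (hypothetical flag).
if [ "${1:-}" = "--run" ]; then main; fi
```

Keeping `should_prune` free of Docker calls makes the gate combination easy to check in isolation, for example `should_prune 85 0 0` succeeds while `should_prune 95 1 0` does not.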
Why: on this **single host Docker's own local image store IS the cache** — a pulled image stays and
redeploys reuse local layers with no re-download (proven: redis:7-alpine cold pull 5303ms w/ 6 layer
downloads → after `service rm` teardown the image is retained → warm redeploy "Image is up to date"
674ms, no bytes transferred); the PAT-authenticated daemon (200 pulls/6h) keeps the residual
warm-deploy manifest check free of rate-limit pressure. So *keeping* the store recovers ~all the
benefit a cache would give.
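
A cold/warm comparison like the one cited above could be reproduced with a sketch along these lines. The probe service name `cache-probe`, the helper names, and the use of `docker service create` as the "deploy" step are assumptions, not the harness that produced the 5303ms and 674ms figures; the first timing is only a true cold pull if the image is absent from the local store beforehand.

```shell
#!/usr/bin/env bash
# Sketch: time a deploy, tear it down, confirm the image stays, redeploy.

# Wall-clock milliseconds (relies on GNU date's %N).
now_ms() { echo $(( $(date +%s%N) / 1000000 )); }

# Run a command quietly and print its elapsed time in milliseconds.
time_ms() {
  local start end
  start=$(now_ms)
  "$@" >/dev/null 2>&1 || true
  end=$(now_ms)
  echo $(( end - start ))
}

# Integer percentage of time saved by the warm run relative to the cold run.
pct_saved() {  # args: cold_ms warm_ms
  if [ "$1" -le 0 ]; then echo 0; return; fi
  echo $(( ($1 - $2) * 100 / $1 ))
}

# Not invoked automatically; call probe_cache on a swarm manager.
probe_cache() {
  local image=${1:-redis:7-alpine} svc=cache-probe cold warm
  cold=$(time_ms docker service create --name "$svc" "$image")
  docker service rm "$svc" >/dev/null
  # Teardown removed the service only; the image should still be local.
  docker image inspect "$image" >/dev/null && echo "image retained after teardown"
  warm=$(time_ms docker service create --name "$svc" "$image")
  docker service rm "$svc" >/dev/null
  echo "cold=${cold}ms warm=${warm}ms saved=$(pct_saved "$cold" "$warm")%"
}
```

Applied to the figures recorded in this entry, `pct_saved 5303 674` prints `87`, i.e. the warm redeploy took roughly 87% less time than the cold one.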
Decision (registry pull-through cache): **DROPPED here, deferred to IDEAS / Phase 2b** (operator
scope correction 2026-05-29, mid-phase). A `registry:2` pull-through cache's distinctive wins —
multi-node fan-out, surviving prune/VM-rebuild on *separate* storage, cache-miss authentication —
**don't apply** to a single authenticated non-pruning host (one node; co-located cache lost on a
recreate anyway; daemon already authenticated). It would add a registry service + daemon-mirror
config + cache GC for marginal gain. **Revisit ONLY if** (a) cc-ci goes multi-node, OR (b) Phase-2b
measurement shows cold-deploy pull time is a real bottleneck AND the cache can live on
recreate-surviving storage (Incus volume / host b1 path, not the VM's ephemeral disk). No registry
code was written (caught during orientation) — nothing to revert.