claim(2pc): PC1 conservative prune deployed+verified; PC2/PC3 local-store cache confirmed
ci-docker-prune (gated surgical prune) live on cc-ci: old autoPrune --all gone, new timer enabled (daily), no-ops below 80% disk keeping the local image cache, never --all/--volumes. Daemon stays PAT-authenticated (nptest2); /var/lib/docker retained across rebuild.

PC3 proof: redis:7-alpine deploy->teardown(service rm, image retained)->redeploy = "Image is up to date", no layer re-download (cold 5303ms -> warm 674ms).

Docs: runbook "Image cache & prune policy", warm.md, DECISIONS Phase-2pc, IDEAS (registry pull-through cache deferred + revisit trigger). Gate 2pc CLAIMED, awaiting Adversary cold-verify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -724,3 +724,29 @@ Standing policy for all Phase-2 (and later) recipe OIDC/SSO testing:
Consequences: DEFERRED #9 (authentik enrollment) re-entry trigger narrowed to "a recipe requires
authentik"; F2-7 (authentik backend) is not a DONE blocker. plan-sso-dep-testing.md §6 updated by the
orchestrator to match.

## Phase 2pc — image-prune policy; local store IS the cache; registry pull-through DROPPED (2026-05-29) — SETTLED
Decision (PC1): removed `virtualisation.docker.autoPrune` (it ran `docker system prune --force --all
--filter until=24h` daily). The `--all` evicts every image not used by a *running* container —
between runs no test apps run, so it wiped the cached recipe base images → cold re-pull → Docker-Hub
rate-limit churn (JOURNAL-2 507/542/690-693). Replaced with `nix/modules/docker-prune.nix`: the
`ci-docker-prune` daily timer + oneshot, a **surgical triple-gated** prune that no-ops unless ALL of
(1) `/` ≥ 80%, (2) no run-app stack live, (3) no swarm service converging; and when it does run, it prunes
only **dangling images + stopped containers + dangling build cache, `until=24h`** — never `--all`
(keeps tagged base/in-use images), never `--volumes` (warm canonical data). Teardown
(`lifecycle.teardown_app`) already removes only services/volumes/secrets/.env, never images — kept.
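
The triple gate described above can be sketched as a pure decision function. This is a minimal illustration, not the contents of `nix/modules/docker-prune.nix`: the names `HostState`, `should_prune`, `planned_commands`, and `root_used_pct` are hypothetical, and the command list is one plausible way to express "dangling images + stopped containers + dangling build cache, `until=24h`" with the stock Docker CLI (the module's actual commands are not shown in this entry).

```python
import shutil
from dataclasses import dataclass

# From the text: the prune no-ops while / is below 80% used.
DISK_THRESHOLD_PCT = 80.0

# ASSUMPTION: an illustrative command set, not the module's real one.
# Without --all, `image prune` and `builder prune` only touch dangling
# entries; `container prune` only removes stopped containers.
PRUNE_COMMANDS = [
    ["docker", "container", "prune", "--force", "--filter", "until=24h"],
    ["docker", "image", "prune", "--force", "--filter", "until=24h"],
    ["docker", "builder", "prune", "--force", "--filter", "until=24h"],
]


@dataclass(frozen=True)
class HostState:
    """Hypothetical snapshot of the three gate inputs."""
    root_used_pct: float
    run_app_stack_live: bool
    swarm_service_converging: bool


def root_used_pct(path: str = "/") -> float:
    """Approximate used percentage of a filesystem (may differ slightly from df)."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def should_prune(state: HostState) -> bool:
    """All three gates must pass; any single failure means no-op."""
    return (
        state.root_used_pct >= DISK_THRESHOLD_PCT
        and not state.run_app_stack_live
        and not state.swarm_service_converging
    )


def planned_commands(state: HostState) -> list:
    """Return the commands to run, or an empty list when the gate holds."""
    return list(PRUNE_COMMANDS) if should_prune(state) else []
```

Keeping the gate separate from command execution means the no-op path can be checked without a Docker daemon, and the absence of `--all` and `--volumes` can be asserted directly against the command list.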
Why: on this **single host Docker's own local image store IS the cache** — a pulled image stays and
redeploys reuse local layers with no re-download (proven: redis:7-alpine cold pull 5303ms w/ 6 layer
downloads → after `service rm` teardown the image is retained → warm redeploy "Image is up to date"
674ms, no layer bytes); the PAT-authenticated daemon (200 pulls/6h) makes the residual warm-deploy manifest check
free of rate-limit pressure. So *keeping* the store recovers ~all the benefit a cache would give.

Decision (registry pull-through cache): **DROPPED here, deferred to IDEAS / Phase 2b** (operator
scope correction 2026-05-29, mid-phase). A `registry:2` pull-through cache's distinctive wins —
multi-node fan-out, surviving prune/VM-rebuild on *separate* storage, cache-miss authentication —
**don't apply** to a single authenticated non-pruning host (one node; co-located cache lost on a
recreate anyway; daemon already authenticated). It would add a registry service + daemon-mirror
config + cache GC for marginal gain. **Revisit ONLY if** (a) cc-ci goes multi-node, OR (b) Phase-2b
measurement shows cold-deploy pull time is a real bottleneck AND the cache can live on
recreate-surviving storage (Incus volume / host b1 path, not the VM's ephemeral disk). No registry
code was written (caught during orientation) — nothing to revert.