cc-ci/machine-docs/STATUS-2pc.md
autonomic-bot de6103d41d claim(2pc): PC1 conservative prune deployed+verified; PC2/PC3 local-store cache confirmed
ci-docker-prune (gated surgical prune) live on cc-ci: old autoPrune --all gone, new timer
enabled (daily), no-ops below 80% disk keeping the local image cache, never --all/--volumes.
Daemon stays PAT-authenticated (nptest2); /var/lib/docker retained across rebuild. PC3 proof:
redis:7-alpine deploy->teardown(service rm, image retained)->redeploy = "Image is up to date",
no layer re-download (cold 5303ms -> warm 674ms). Docs: runbook "Image cache & prune policy",
warm.md, DECISIONS Phase-2pc, IDEAS (registry pull-through cache deferred + revisit trigger).
Gate 2pc CLAIMED, awaiting Adversary cold-verify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:42:36 +01:00


# STATUS — Phase 2pc (sane image-prune policy)
**SSOT:** `/srv/cc-ci/cc-ci-plan/plan-phase2pc-image-cache.md`

**Scope (operator correction 2026-05-29):** PC1 conservative prune + PC2/PC3 confirm+verify
local-store retention/auth. **Registry pull-through cache DROPPED** (deferred →
`cc-ci-plan/IDEAS.md` + DECISIONS Phase-2pc; no registry code was written).
## Gate: 2pc — CLAIMED, awaiting Adversary
All of PC1/PC2/PC3 are implemented, deployed to cc-ci, and Builder-verified on the real host. The
commit SHA for this claim is the `claim(2pc)` HEAD. WHAT / HOW / EXPECTED / WHERE for each item follow.

---
### PC1 — Conservative prune policy
**WHAT.** Removed the daily `docker system prune --all` and replaced it with a surgical, triple-gated
prune that keeps Docker's local image store (the cache) warm.
- **WHERE.** `nix/modules/docker-prune.nix` (NEW, unit `ci-docker-prune` service+timer);
`nix/modules/swarm.nix` (`virtualisation.docker.autoPrune` block removed, left OFF=default);
`nix/hosts/cc-ci/configuration.nix` (imports `docker-prune.nix`). Deployed via
`nixos-rebuild switch --flake path:/root/cc-ci#cc-ci`.
- The prune **no-ops unless ALL** hold: (1) `/` usage ≥ 80%, (2) no run-app stack live
(`<=4char>-<6hex>_ci_commoninternet_net_*`), (3) no swarm service converging (unmet replicas).
When it runs: `docker {container,image,builder} prune -f --filter until=24h` — **dangling+old only,
never `--all`, never `--volumes`.**
- Teardown unchanged: `runner/harness/lifecycle.py::teardown_app` removes services, volumes, secrets,
and `.env`, and **no images** (`grep -rn 'rmi\|image rm\|image prune' runner/ tests/conftest.py` = empty).
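
The three gates above amount to a single decision followed by three filtered prunes. The following is a minimal sketch of that logic, not the contents of `nix/modules/docker-prune.nix`. The function names, the `df`-based disk check, and the reading of `<=4char>` as one to four lowercase alphanumerics are assumptions made for illustration. The sketch only defines functions; a caller would invoke the hypothetical `run_gated_prune` from the service unit.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the triple gate; names and details are assumptions.
THRESHOLD=80  # percent of / usage at which pruning is allowed

# Gate 1: root filesystem usage as a bare integer (GNU df assumed).
disk_pct() { df --output=pcent / | tail -n 1 | tr -dc '0-9'; }

# Gate 2: does a service name look like a live run-app stack?
# Assumption: "<=4char>" is read as 1-4 lowercase alphanumerics.
is_run_app() { [[ $1 =~ ^[a-z0-9]{1,4}-[0-9a-f]{6}_ci_commoninternet_net_ ]]; }

# Gate 3: is a "running/desired" replica string (e.g. "0/1") still unmet?
is_converging() { local r=${1%% *}; [[ ${r%/*} != "${r#*/}" ]]; }

# Pure decision: disk pct, live run-app services, converging services.
should_prune() { (( $1 >= THRESHOLD && $2 == 0 && $3 == 0 )); }

run_gated_prune() {
  local live=0 conv=0 name replicas kind
  while read -r name replicas; do
    [[ -z $name ]] && continue
    is_run_app "$name" && live=$((live + 1))
    is_converging "$replicas" && conv=$((conv + 1))
  done < <(docker service ls --format '{{.Name}} {{.Replicas}}')

  if should_prune "$(disk_pct)" "$live" "$conv"; then
    # Dangling and older than 24h only; never --all, never --volumes.
    for kind in container image builder; do
      docker "$kind" prune -f --filter until=24h
    done
  else
    echo "keeping local image cache, nothing to do"
  fi
}
```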
**HOW to verify (cold, Adversary's own checks):**
```sh
ssh cc-ci 'systemctl is-enabled docker-prune.timer' # EXPECT: not-found (autoPrune gone)
ssh cc-ci 'systemctl is-enabled ci-docker-prune.timer; systemctl is-active ci-docker-prune.timer' # EXPECT: enabled, active
ssh cc-ci 'systemctl list-timers ci-docker-prune.timer --no-pager' # EXPECT: NEXT daily 00:00
ssh cc-ci 'docker images -q | wc -l' # record the count BEFORE the manual run
ssh cc-ci 'systemctl start ci-docker-prune.service; \
journalctl -u ci-docker-prune.service -n 3 --no-pager' # EXPECT (disk<80%): "keeping local image cache, nothing to do"
ssh cc-ci 'docker images -q | wc -l' # EXPECT: same count as before the manual run
# source-read the gates + flags (no --all, no --volumes):
grep -nE "until=24h|--all|--volumes|prune" nix/modules/docker-prune.nix
grep -n "autoPrune" nix/modules/swarm.nix # EXPECT: only a comment, no enable=true
```
**EXPECTED:** old timer not-found; `ci-docker-prune.timer` enabled+active (daily); manual run below
80% prints the no-op line and removes nothing; module flags are `--filter until=24h` only (never
`--all`/`--volumes`); swarm.nix has no live autoPrune.
### PC2 — Local cache retained + authenticated (confirm)
**WHAT.** The daemon stays PAT-authenticated and the `/var/lib/docker` local image store persists
across runs, teardowns, and reboots. No code change was needed: sops `dockerhub_auth` →
`/root/.docker/config.json` in `nix/modules/secrets.nix` is unchanged.

**HOW / EXPECTED:**
```sh
ssh cc-ci 'docker info 2>/dev/null | grep Username' # EXPECT: Username: nptest2
ssh cc-ci 'ls -l /root/.docker/config.json' # EXPECT: -> /run/secrets/rendered/docker-config.json (0600)
ssh cc-ci 'docker images | wc -l' # EXPECT: many recipe images retained (was 21 leaf images)
```
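The username check above can also be scripted so that it fails loudly instead of relying on a visual read. This is a minimal sketch under assumptions: `hub_user` and `check_auth` are hypothetical helper names, and the parser assumes `docker info` prints a `Username:` line when the daemon is logged in, as in the expected output above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; not part of the cc-ci repository.

# Print the username from `docker info` text on stdin; exit 1 if none is present.
hub_user() {
  awk -F': *' '/^[[:space:]]*Username:/ { print $2; found = 1 } END { exit !found }'
}

# Compare the daemon's username with the expected account.
# Usage (on cc-ci): check_auth nptest2
check_auth() {
  local want=$1 got
  if ! got=$(docker info 2>/dev/null | hub_user); then
    echo "FAIL: daemon not authenticated"
    return 1
  fi
  if [[ $got == "$want" ]]; then
    echo "OK: Username $got"
  else
    echo "FAIL: Username $got, expected $want"
    return 1
  fi
}
```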
### PC3 — Deploy → teardown → redeploy reuses local layers (no re-download)
**WHAT.** A previously pulled image is retained through teardown, and a redeploy reuses local layers;
only an authenticated manifest check remains. Builder-proven with a real swarm deploy/teardown/
redeploy on `redis:7-alpine` (docker.io through the authenticated daemon, the same pull path
abra/swarm use).

**HOW (Adversary, reproducible):**
```sh
ssh cc-ci 'bash -s' <<'PROOF'
IMG=redis:7-alpine; docker rmi -f "$IMG" >/dev/null 2>&1 || true
t0=$(date +%s%N); docker pull "$IMG" 2>&1 | grep -E "Pull complete|Downloaded|Already exists|up to date"; t1=$(date +%s%N)
echo COLD_MS=$(((t1-t0)/1000000))
docker service create --name pc3 --replicas 1 "$IMG" sleep 120 >/dev/null 2>&1; docker service ls --filter name=pc3 --format '{{.Replicas}}'
docker service rm pc3 >/dev/null 2>&1
echo retained: $(docker images redis:7-alpine --format '{{.ID}}')
t2=$(date +%s%N); docker pull "$IMG" 2>&1 | grep -E "Pull complete|Downloaded|Already exists|up to date"; t3=$(date +%s%N)
echo WARM_MS=$(((t3-t2)/1000000)); docker rmi -f "$IMG" >/dev/null 2>&1
PROOF
```
**EXPECTED:** COLD pull shows layer "Pull complete" lines (download) — Builder saw 6 layers,
COLD_MS≈5303; after `service rm` the image ID is still listed (retained); WARM pull shows
`Image is up to date` (no layer download), WARM_MS≈674 (≈8× faster, manifest-only). Confirms the
local store is the cache, survives teardown, and a redeploy needs no Docker-Hub layer download.
Optional fuller proof: a real recipe cycle
`RECIPE=custom-html-tiny PR=0 STAGES=install cc-ci-run runner/run_recipe_ci.py` run twice — the 2nd
deploy shows no image-layer download.
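
The EXPECTED criteria above can be checked mechanically from the PROOF script's output. The sketch below is an illustration under assumptions: `check_pc3` is a hypothetical helper that reads that output on stdin, treats everything after the `retained:` line as the warm pull, and fails if the image ID is missing, a layer download appears, or `up to date` is absent. With the Builder's figures the ratio is 5303 / 674 ≈ 7.9, consistent with the ≈8× quoted above. It could be fed by piping the `ssh cc-ci 'bash -s'` output into it.

```bash
#!/usr/bin/env bash
# Hypothetical checker for the PROOF output; not part of the cc-ci repository.
check_pc3() {
  awk '
    /^retained:/ { warm = 1; if (NF < 2) bad = "image not retained after service rm" }
    /^COLD_MS=/  { split($0, a, "="); cold = a[2] }
    /^WARM_MS=/  { split($0, a, "="); warmms = a[2] }
    # Any layer download after the retained line means the cache was not used.
    warm && /Pull complete|Downloaded/ { bad = "warm pull downloaded layers" }
    warm && /up to date/               { fresh = 1 }
    END {
      if (!bad && !fresh) bad = "warm pull did not report up to date"
      if (!bad && (cold == "" || warmms + 0 == 0)) bad = "missing COLD_MS/WARM_MS"
      if (bad) { print "FAIL: " bad; exit 1 }
      printf "OK: cold=%dms warm=%dms speedup=%.1fx\n", cold, warmms, cold / warmms
    }'
}
```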

---
## DoD checklist (Builder view — Adversary owns the verdict in REVIEW-2pc.md)
- [x] **PC1** — autoPrune `--all` removed; surgical gated `ci-docker-prune` deployed; teardown keeps images.
- [x] **PC2** — daemon PAT-authenticated (nptest2); local store retained across rebuild.
- [x] **PC3** — deploy→teardown→redeploy reuses local layers (no re-download), measured; disk bounded
(31%) without `-af`. Documented (runbook/warm/DECISIONS/IDEAS).
## Not blocked. No standing blockers.