
autonomic-bot 61ab3ecb3a plan: per-test image pre-pull sub-plan (warm images before deploy + upgrade; cheap on warm cache)

Resolve a recipe's images (docker compose config --images) and docker pull them (skip-if-present for
pinned tags) at the start of the recipe sequence + before the upgrade-new-version deploy, then the
normal abra deploy. Separates pull from converge (clear pull failures vs murky convergence timeouts),
speeds convergence (fits abra-native window). No layer re-download on warm cache; nightly all-recipes
run warms everything. Complements (not replaces) the recipe healthcheck for slow-init convergence.
Near-term Phase-2 harness unit; real abra deploy unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-05-29 14:55:21 +01:00



Sub-plan — pre-pull a recipe's images at the start of its test sequence

Status: QUEUED. A small harness addition to runner/run_recipe_ci.py (+ harness), to be picked up as a near-term Phase-2 harness unit (not a phase-pause). Auto-applies in the nightly all-recipes run.
Owner: Builder + Adversary loops.
This file: /srv/cc-ci/cc-ci-plan/plan-prepull-images.md

What

At the start of a recipe's test sequence — before the first abra app deploy (and before the upgrade tier's new-version deploy) — resolve the recipe's images and docker pull them so they are in the local store before the deploy runs.

# At the top of the per-recipe run, after `abra app new` + checkout + .env set:
# resolve the image list (config --images resolves $VERSION-style interpolation).
imgs=$(docker compose --env-file <app .env> -f <COMPOSE_FILE> config --images)
for img in $imgs; do
    # skip-if-present (safe for pinned tags); a failed pull stops the run before deploy
    docker image inspect "$img" >/dev/null 2>&1 || docker pull "$img" || exit 1
done
# then the normal:  abra app deploy …    (unchanged: real abra)
# repeat the pre-pull for the UPGRADE target image set before `abra app upgrade`.
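
The loop above could be expressed inside the Python harness roughly as follows. This is a minimal sketch, not the harness's actual code: `prepull_images`, `PrePullError`, and `docker_run` are hypothetical names, and the command runner is injectable so the skip-if-present logic can be exercised without a Docker daemon.

```python
import subprocess
from typing import Callable, Iterable, List, Sequence

# Hypothetical names throughout. A Runner takes an argv and returns the exit code.
Runner = Callable[[Sequence[str]], int]


class PrePullError(RuntimeError):
    """Raised when an image cannot be pulled, before any deploy is attempted."""


def docker_run(argv: Sequence[str]) -> int:
    """Default runner: execute the command, discard its output, return the exit code."""
    result = subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode


def prepull_images(images: Iterable[str], run: Runner = docker_run) -> List[str]:
    """Pull every image that is not already in the local store.

    Returns the images that actually needed a pull.
    """
    pulled: List[str] = []
    for image in images:
        # `docker image inspect` exits 0 when the image is present locally.
        if run(["docker", "image", "inspect", image]) == 0:
            continue
        if run(["docker", "pull", image]) != 0:
            raise PrePullError(f"docker pull failed for {image}")
        pulled.append(image)
    return pulled
```

Returning the list of images that needed a pull gives the harness something to log, and an empty list on a second run is one way to show the warm-cache property. A real implementation would probably keep the pull's stderr instead of discarding it, so the failure report carries the registry's error text.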

Why

Separate "pull" from "converge." A rate limit or bad tag then fails fast and clearly as a pull error, and a slow pull shows up as time spent in the pull step, instead of surfacing later as a murky not converged deploy timeout (the F2-12-class confusion).
Faster, more reliable convergence. Images already local → swarm starts services immediately → the deploy fits abra's native convergence window better (supports "prefer abra convergence" — can reduce the -c/READY_PROBE workaround for pull-bound cases).
Nightly: the all-recipes nightly run pre-pulls implicitly, warming the cache for everything.

Cheap on a warm cache (the key property)

docker pull image:tag does not re-download cached layers: it does a cheap manifest check, reports Already exists for each layer already in the local store (or Image is up to date when nothing has changed), and downloads only missing/changed layers. With coop-cloud's pinned (immutable) tags, the skip-if-present check (docker image inspect) is purely local, so it uses zero network when the image is already cached. Per-test pre-pull is therefore near-free after the first/nightly pull; it only does real work on a cold image or an upgrade to a genuinely new version.
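
One way to check the warm-cache property from captured `docker pull` output is to count per-layer statuses. The sketch below assumes the plain (non-TTY) text output, where layer lines look like `<layer id>: <status>`; exact wording can differ between Docker versions and image stores, and the function names are hypothetical.

```python
from collections import Counter


def summarize_pull_output(output: str) -> Counter:
    """Count the pull statuses this sketch recognises in `docker pull` text output."""
    counts: Counter = Counter()
    for line in output.splitlines():
        # Layer lines are "<id>: <status>"; split once on the first ": ".
        _, sep, status = line.partition(": ")
        status = status.strip()
        if sep and status in ("Already exists", "Pull complete"):
            counts[status] += 1
        elif line.startswith("Status: Image is up to date"):
            counts["up to date"] += 1
    return counts


def downloaded_layers(output: str) -> int:
    """Number of layers that were actually fetched during the pull."""
    return summarize_pull_output(output)["Pull complete"]
```

A second run that reports zero downloaded layers (or that never calls `docker pull` because the image was already present) would count as evidence of a cheap warm-cache pass.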

Honest caveats

Removes pull time/variance from convergence, NOT app-init time. Slow-starting apps (collabora's heavy init, F2-12) still need the recipe healthcheck/start_period (plan-lasuite-drive-recipe-pr.md). Pre-pull is complementary, not a replacement.
Resolve via docker compose config --images (handles $VERSION-style interpolation) using the same COMPOSE_FILE set abra uses (read from the app .env) — a naive grep image: misses interpolated tags and multi-compose recipes.
Not an abra bypass. docker pull only warms the local store; the deploy is still real abra app deploy/upgrade. Consistent with the "real abra commands" guardrail.
Don't weaken anything — a failed pre-pull is a real (clearer) test failure, reported as such.
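
The image-resolution caveat can be sketched as two small helpers: one builds the `docker compose ... config --images` command line, the other cleans up its output. This assumes COMPOSE_FILE is a colon-separated list of compose files (the Docker Compose convention); the helper names and the example file names are hypothetical.

```python
from typing import List


def compose_images_argv(env_file: str, compose_file: str) -> List[str]:
    """Build the argv for `docker compose ... config --images`.

    `compose_file` is the colon-separated COMPOSE_FILE value (assumed format),
    e.g. "compose.yml:compose.smtp.yml"; each entry becomes its own -f flag.
    """
    argv = ["docker", "compose", "--env-file", env_file]
    for path in compose_file.split(":"):
        if path:  # ignore empty segments such as a trailing colon
            argv += ["-f", path]
    return argv + ["config", "--images"]


def parse_images(stdout: str) -> List[str]:
    """Return unique, non-empty image references in first-seen order."""
    lines = (line.strip() for line in stdout.splitlines())
    return list(dict.fromkeys(line for line in lines if line))
```

De-duplicating matters for multi-compose recipes, where several services may share one image and only one inspect/pull per image is needed.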

Definition of done (Adversary cold-verifies)

Pre-pull step runs at the start of the recipe sequence (before the deploy) and before the upgrade tier's new-version deploy; images resolved correctly (incl. interpolation, multi-compose).
Cheap on warm cache: a 2nd run's pre-pull does no layer re-download (the image is skipped as present, or the pull reports only Already exists / Image is up to date); shown with evidence from the run output.
Clear failure mode: a pull failure (e.g. a deliberately-bad tag) is reported as a pull error before deploy, not as a deploy/convergence timeout.
No test weakened; deploy path unchanged (real abra). Bounded — this one step only.
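
The "clear failure mode" criterion amounts to reporting which stage failed first and never starting the deploy after a failed pull. A minimal sketch, assuming each stage is a callable returning True on success; the stage names and `first_failed_stage` are hypothetical, not part of the existing harness.

```python
from typing import Callable, List, Optional, Tuple

# (stage name, step returning True on success)
Stage = Tuple[str, Callable[[], bool]]


def first_failed_stage(stages: List[Stage]) -> Optional[str]:
    """Run stages in order and stop at the first failure.

    Returns the failing stage's name, or None if every stage succeeded.
    Stages after a failure are never run.
    """
    for name, step in stages:
        if not step():
            return name
    return None
```

With stages ordered as pre-pull, deploy, upgrade pre-pull, upgrade, a deliberately bad tag would be reported under the pre-pull stage and the deploy step would not run at all, which is the behaviour the Adversary is asked to verify.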

Scope note

Per-test pre-pull only (at the recipe sequence start + upgrade). NOT a boot-time "pull all enrolled recipes" sweep — the local cache already accumulates across runs and the nightly all-recipes run warms everything, so per-test is the simple, targeted form.
