# Phase `prevb`: dynamic upgrade base + per-recipe `previous/` config

**Mission (operator-specified 2026-06-16):** fix how cc-ci handles version-specific needs in the **upgrade tier**. Today a single per-recipe overlay (`compose.ccci.yml`) conflates two different things, *environmental* tweaks (cc-ci node is slow/memory-tight) and *version-specific repairs* (an old base's image reference rotted), and applies BOTH to EVERY deploy, including the PR head. That silently overrides the head and masks the real change. Proven live on **discourse PR #4** (`recipe-maintainers/discourse#4`, `discourse-official-image → main`): the overlay re-pins `app.image` to `bitnamilegacy/discourse:3.3.1` and re-adds the dropped `sidekiq` service, so `!testme` deploys the OLD image instead of the PR's new official `discourse/discourse:3.5.3`, and the migration is never tested.

Replace that model with two changes, then prove them on discourse PR #4:

1. **Dynamic upgrade base** (no hardcoded `UPGRADE_BASE_VERSION`): the base the head upgrades from is resolved at run time as **last-green (warm canonical)** → fallback **target-branch (`main`) tip** → else **skip the upgrade tier** (recorded reason).
2. **Optional per-recipe `previous/` folder** holding the **minimal** config needed to deploy the *previous* (last-green) version, **applied only to the base deploy and ignored for the head**.

State files: `STATUS-prevb.md`, `BACKLOG-prevb.md`, `REVIEW-prevb.md`, `JOURNAL-prevb.md`. DECISIONS.md shared.

## 1. Root cause (read first)

`tests//compose.ccci.yml` + `EXTRA_ENV.COMPOSE_FILE = "compose.yml:compose.ccci.yml"` is applied to every deploy in the recipe-under-test flow. For discourse it carries BOTH:

- **environmental** (`start_period: 20m` grace, `order: stop-first`): depends on the cc-ci node, must apply to all deploys incl.
  head; and
- **version-specific repair** (`app`/`sidekiq` → `bitnamilegacy/discourse:3.3.1`): depends on the old 0.7.0 base whose published `bitnami/discourse:3.3.1` 404s; must apply ONLY to that base.

Fusing them + applying to all deploys is the bug: the version-specific half leaks onto the head (scalar `image:` last-file-wins override; additive service merge re-adds dropped `sidekiq`).

## 2. Design

**Decompose the overlay into two layers.** The harness applies them to different deploys:

- **Environmental overlay (all deploys, incl. head).** Node-reality tweaks the recipe itself doesn't encode (e.g. rollout `order`). Keep it MINIMAL and shrink over time (a well-formed recipe head ships its own grace; PR #4 already has `start_period: 20m`). It must contain **no version-specific image pins or service add/drop**.
- **`tests//previous/` (base deploy ONLY, ignored for head).** The minimal bundle needed to bring up the **previous (last-green) version** when it can't deploy as-published, e.g. an image relocation (`bitnami/* → bitnamilegacy/*`), or an era-specific service/step/env. Mirror the recipe-under-test layout but scoped to "deploy the previous version" (typically just a `compose.previous.yml`; add an `install_steps.sh`/`ops.py`/env override only if that version genuinely needs it). **Keep it as small and simple as possible; add one only where necessary.**

**Dynamic base resolution (replace static `UPGRADE_BASE_VERSION`):**

1. **Primary: last-green (warm canonical).** Upgrade from the last version cc-ci recorded green for this recipe (prefer the warm-canonical snapshot where one exists; it's already data-warm, giving a realistic data-survival signal and avoiding a from-scratch old-version deploy).
2. **Fallback: target-branch (`main`) tip** when there is no last-green (e.g. a recipe with no recorded green predecessor yet). This is the real predecessor the PR merges on top of.
3. **Else skip** the upgrade tier with a recorded reason (new recipe / no predecessor). Structural skip, declared (`EXPECTED_NA`), not a silent pass.

**`previous/` is for the current previous version, and is removable when stale.** To stop a stale folder silently overriding a non-matching base, `previous/` **declares the version it targets** (simplest: a one-line marker, or the `coop-cloud.*.version` label in its `compose.previous.yml`). The harness applies it **only when the resolved base version matches**; on mismatch it **skips it and flags it stale ("previous/ targets X, base is Y; remove it")**. After a recipe upgrade PR merges (new last-green), the now-stale `previous/` should be removed; keep it to roughly one version's worth, never an accumulating pile.

## 3. Discourse as the first real case

- **main today is `bitnamilegacy/discourse:3.3.1`** (deployable: bitnamilegacy exists). So with a dynamic base, the base = last-green (≈ main) **deploys cleanly with NO `previous/` needed**: the rotted-base treadmill evaporates because we no longer resurrect the frozen 0.7.0 tag. (Confirm main's image; if the last-green base genuinely still needs a repair to deploy, add a **minimal** `previous/`, but expect not.)
- Move discourse's environmental tweaks (rollout `order`, any grace the head lacks) into the environmental overlay; **delete the `bitnamilegacy` image pins and the `sidekiq` block from the all-deploys overlay**; **remove `UPGRADE_BASE_VERSION`**.
- **PR #4 head now deploys UNMODIFIED** → the chaos redeploy runs the real `discourse/discourse:3.5.3` with no `sidekiq`, so the upgrade tier finally exercises the actual official-image migration (last-green bitnamilegacy → official head) the PR claims to support.

## 4. Gates

**M1: implemented + green locally.** Harness: dynamic base resolution (last-green → `main` → skip); `previous/` discovery + base-only application + version-guard/stale-flag; environmental overlay separated from version-specific config; `UPGRADE_BASE_VERSION` removed. Discourse migrated. Unit tests for the new harness surface (base resolution, `previous/` match/skip, overlay layering). Discourse upgrade tier green locally with **proof the head runs the real head image**: assert the deployed `app` image is `discourse/discourse:3.5.3` (NOT bitnamilegacy) and that no `sidekiq` service exists post-deploy. Adversary cold-verifies from a clean checkout: the overlay no longer touches the head; a deliberately-broken head still fails the upgrade tier (teeth: base resolution didn't paper over it); base falls back to `main` correctly when last-green is absent; `previous/` is ignored for the head; **no test weakened**.

**M2: proven in real CI + a representative spot-check.** discourse PR #4 `!testme` **GREEN**, with evidence the head genuinely ran `discourse/discourse:3.5.3` (not the old bitnami image) and the migration was exercised. Spot-check ≥3 other recipes with upgrade tiers (e.g. one warm-canonical recipe, one with a published predecessor, one that previously relied on a `.ccci` overlay: keycloak/cryptpad/ghost) to confirm dynamic base works generally and nothing obvious broke. (FULL all-recipe regression is the **next phase `regall`**; do not attempt it here, just don't ship something obviously broken.) Levels / records reconciled. Fresh Adversary PASS on both milestones → `## DONE`.

## 5. Guardrails (binding)

- **Make the test FAITHFUL, never weaker.** The goal is that the head runs the head's real image; never resolve the base or apply `previous/` in a way that hides a genuinely broken head. A broken upgrade must still go RED.
- **`previous/` minimal + non-accumulating.** Only what's strictly needed to deploy the previous base; version-guarded; removable when stale. No `previous/` at all if the last-green base deploys clean.
- **Don't regress other recipes.** Dynamic base must work for recipes with/without warm canonicals and with/without published predecessors. (The `regall` phase is the exhaustive proof; here, don't break the spot-check set.)
- **Recipe mirrors are PR-only.** We VERIFY discourse PR #4 (run the harness / post `!testme`); we do NOT merge it (operator's call). A recipe defect found → PR comment, not a test weakened.
- Commit author `autonomic-bot `; push every commit; abra over a pseudo-TTY. Host changes are coordinated (loops self-rebuilding the host is acceptable if clean; verify host health after, but this phase likely needs none).

## 6. Definition of Done

Dynamic upgrade-base resolution (last-green → `main` → skip) and the optional minimal `previous/` folder shipped and unit-tested; the environmental vs version-specific layers cleanly separated; discourse migrated off the static base + leaky overlay; **discourse PR #4 verified GREEN in real CI with the head genuinely running the official `discourse/discourse:3.5.3` image** (the migration actually tested), and a representative recipe spot-check still green; nothing merged on the mirror; M1 + M2 fresh Adversary PASSes in REVIEW-prevb.md. (Exhaustive all-recipe regression handed to phase `regall`.)
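
For orientation, the base-resolution order and the `previous/` version guard from section 2 could look roughly like the Python sketch below. It is a minimal illustration under assumptions, not the harness code. The names `BaseResolution`, `resolve_upgrade_base`, and `compose_files` are hypothetical. The sketch also assumes the environmental overlay keeps the filename `compose.ccci.yml` and that `compose.previous.yml` lives directly under `previous/`. The version strings in the usage lines are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical names throughout: this is a sketch, not the real cc-ci harness API.


@dataclass(frozen=True)
class BaseResolution:
    source: str              # "last-green", "main", or "skip"
    version: Optional[str]   # resolved base version; None when the tier is skipped
    reason: str              # always recorded, so a skip is never a silent pass


def resolve_upgrade_base(last_green: Optional[str], main_tip: Optional[str]) -> BaseResolution:
    """Resolve the upgrade base: last-green, else target-branch tip, else skip."""
    if last_green:
        return BaseResolution("last-green", last_green, "last recorded green version")
    if main_tip:
        return BaseResolution("main", main_tip, "no last-green; using target-branch tip")
    return BaseResolution("skip", None, "EXPECTED_NA: new recipe / no predecessor")


def compose_files(role: str, base_version: Optional[str],
                  previous_target: Optional[str]) -> tuple[list[str], Optional[str]]:
    """Return (compose file list, stale warning) for one deploy.

    role is "base" or "head"; previous_target is the version that previous/
    declares, or None when the recipe ships no previous/ folder.
    """
    # Environmental overlay applies to every deploy (filename is an assumption).
    files = ["compose.yml", "compose.ccci.yml"]
    if role != "base" or previous_target is None:
        return files, None  # the head never sees previous/
    if previous_target == base_version:
        return files + ["previous/compose.previous.yml"], None
    # Version guard: a stale previous/ is skipped and flagged, never applied.
    warning = f"previous/ targets {previous_target}, base is {base_version}; remove it"
    return files, warning


# Usage with placeholder versions:
base = resolve_upgrade_base(last_green=None, main_tip="0.8.0")
print(base.source, base.reason)
print(compose_files("head", base.version, "0.8.0"))
print(compose_files("base", base.version, "0.7.0"))
```

Returning the stale warning as data rather than raising keeps the mismatch visible in the run record without failing a base deploy that is otherwise valid. Whether the real harness should treat a stale `previous/` as a hard failure is a separate decision this sketch does not settle.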