# Policy + cleanup: cc-ci compose overlays (when they're justified) & upgrade-tier from-version

**Status:** POLICY (codifies `plan.md §9`) + a small set of follow-ups.

**Owner:** Builder + Adversary.

**This file:** `/srv/cc-ci/cc-ci-plan/plan-ccci-compose-overlay-policy.md`

**Supersedes** the earlier `plan-prefer-env-over-compose-overlay.md`. Its premise (parameterize `start_period` via an env var) is **wrong: abra does not support an env value for `start_period`**.

---

## 0. Policy (operator, 2026-05-30)

A cc-ci-authored compose overlay (the single `compose.ccci.yml`, layered via `COMPOSE_FILE`) risks **drift** from the recipe users actually run. So **avoid it where possible and justify each use**. It is, however, a **legitimate, uniform fallback pattern**, not forbidden:

- **Prefer an upstream recipe PR** in most cases. That covers a real robustness fix, or exposing a knob the recipe should expose, and it is usually where the fix belongs.
- **A ccci overlay is the right tool when the value can't be supplied any other way.** The main example is a healthcheck **`start_period`**, which **abra cannot take from an env var**. The ghost/discourse `start_period` bumps therefore **stay as overlays**, because an env PR is impossible for that field.
- **Uniform pattern (acceptable fallback):** a single, fixed-name **`compose.ccci.yml`** per recipe. Do NOT use per-purpose suffixes; one file holds all cc-ci-side deploy tweaks for that recipe. The file is:
  - provided into the checkout by `install_steps.sh`;
  - wired in by `recipe_meta` `COMPOSE_FILE` (`compose.yml:compose.ccci.yml`);
  - kept as an untracked file so it survives the upgrade's `git checkout -f` (`CHAOS_BASE_DEPLOY=True`; `assert_upgraded` strips the `+U` marker; see DECISIONS 2026-05-30).
- **Each overlay must** be **minimal + single-purpose**, **document WHY** in its header (the exact abra/upstream limitation that forces it), and be **Adversary-confirmed** not to weaken a test or mask a recipe defect. Where the fix also belongs upstream (e.g. a `start_period` too tight for any slow host), **file the upstream PR too**. The overlay is the cc-ci-side fallback, not a reason to skip the PR.

## 1. Upgrade tier: always test the upgrade to LATEST

Don't drop the upgrade test because the *from* (older) version is awkward.

- **Always perform the upgrade to the latest version and run the full assertions on the latest.**
- If the older from-version can't be fully deployed/tested (its image tag was removed from the registry, or it predates an overlay/feature), you do **NOT** need that older version's **custom tests** to run. Deploy it minimally (a justified overlay is fine) or upgrade from the nearest deployable prior version. Skip only the from-version's custom assertions, and **record** that you did.
- Skipping a from-version's custom tests is acceptable as long as it is honest and recorded. Skipping the upgrade to latest is not OK.

## 2. Disposition of the current overlays

- [ ] **RENAME the overlay files to the uniform `compose.ccci.yml`.** Rename `tests/ghost/compose.ccci-health.yml` and `tests/discourse/compose.ccci-health.yml` → `compose.ccci.yml`. Update each recipe's `install_steps.sh` (the `cp` target) and its `recipe_meta` `COMPOSE_FILE` (`compose.yml:compose.ccci-health.yml` → `compose.yml:compose.ccci.yml`). Use one fixed filename per recipe going forward.
- [ ] **ghost `compose.ccci.yml` (start_period 900s): KEEP, justified.** abra can't env-parameterize `start_period`. The fresh-DB migration needs the larger grace period, or swarm kills it → deadlock. Confirm the header documents this. Consider an upstream PR raising ghost's `start_period` (it's a real slow-host fragility), but the overlay stays regardless.
- [ ] **discourse `compose.ccci.yml`: KEEP, justified (both parts).**
  - (a) `start_period 1200s` (same reason as ghost).
  - (b) The `bitnami/discourse:3.3.1 → bitnamilegacy/discourse:3.3.1` re-pin makes the from-version (0.7.0, whose `bitnami/discourse` tag now returns 404 on Docker Hub) **deployable, so the upgrade-to-latest test can run**. It is a namespace-only change (identical discourse version), applied to base+head. This is the §1 case: keep the upgrade-to-latest test; the 0.7.0 custom tests need not run. Document it. If a deployable prior version exists that doesn't need the re-pin, prefer upgrading from that.
- [ ] **mumble `compose.host-ports.yml` (cc-ci copy for the old base): DROP it.** Deploying mumble 0.2.0 does NOT need host-ports, because that overlay only *publishes* 64738 for on-host tests. Per §1:
  - deploy 0.2.0 without it and **skip 0.2.0's voice/on-host custom tests**;
  - upgrade to the latest version (which ships `compose.host-ports.yml` natively) and run the voice tests on the latest.

  Remove the cc-ci copy and its `install_steps`/`COMPOSE_FILE` wiring for the old base. The current version's native overlay is untouched.

## 3. Definition of Done (Adversary cold-verifies)

- [ ] Every surviving cc-ci overlay is minimal, documents its justification (the abra/upstream limitation) in its header, and is Adversary-confirmed not to weaken a test or mask a defect.
- [ ] The mumble old-base cc-ci host-ports copy is removed. Mumble still **upgrades to latest** and runs its voice tests **on the latest** (0.2.0's voice tests skipped + recorded).
- [ ] ghost + discourse still pass full suites, and discourse still tests the upgrade to latest.
- [ ] Any upstream PR opened (e.g. ghost/discourse `start_period`) follows the recipe-PR rule (cc-ci-green via `!testme` before operator merge). The overlay remains as the cc-ci fallback.
- [ ] No upgrade-to-latest test was dropped to avoid an awkward from-version.

## 4. Guardrails

- **Correctness first.** Never weaken, skip, or soften a check to make a deploy or upgrade pass. An overlay tunes deploy/infra only (its header must say how); the real assertions stand.
- **Justify + document every overlay**, and prefer the upstream PR where the fix belongs.
- **Real abra path** throughout.
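Two of the overlay rules above can be checked mechanically before the Adversary review: a header that states WHY, and changes limited to deploy/infra. The sketch below is hypothetical. `check_overlay`, the `# WHY:` header convention, and the key allow-lists are illustrative assumptions and are not part of abra or the existing cc-ci harness. The function takes the overlay's raw text plus its already-parsed mapping (assumed to come from a YAML loader such as PyYAML, which is not shown) and returns a list of problems. It supplements the Adversary's judgement and does not replace it.

```python
from dataclasses import dataclass, field

# Hypothetical allow-lists: the deploy/infra-only keys this sketch accepts in a
# compose.ccci.yml. Illustrative assumptions, not an abra or cc-ci API.
ALLOWED_TOP_LEVEL = {"version", "services"}
ALLOWED_SERVICE_KEYS = {"image", "healthcheck"}
ALLOWED_HEALTHCHECK_KEYS = {"start_period"}


@dataclass
class OverlayReport:
    problems: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.problems


def header_lines(text: str) -> list[str]:
    """Return the text of the leading '#' comment lines (blank lines are skipped)."""
    header = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if not stripped.startswith("#"):
            break  # the first YAML content line ends the header
        header.append(stripped.lstrip("#").strip())
    return header


def check_overlay(text: str, parsed: dict) -> OverlayReport:
    """Flag an overlay that lacks a WHY header or touches non-allow-listed keys."""
    report = OverlayReport()

    # Rule 1 (assumed convention): the header must contain a 'WHY:' line.
    if not any(line.upper().startswith("WHY:") for line in header_lines(text)):
        report.problems.append("header has no 'WHY:' justification line")

    # Rule 2: deploy/infra tweaks only, so any unexpected key is reported.
    for key in parsed:
        if key not in ALLOWED_TOP_LEVEL:
            report.problems.append(f"unexpected top-level key: {key}")
    for name, service in (parsed.get("services") or {}).items():
        service = service or {}  # tolerate an empty service mapping
        for key in service:
            if key not in ALLOWED_SERVICE_KEYS:
                report.problems.append(f"{name}: unexpected key: {key}")
        for key in service.get("healthcheck") or {}:
            if key not in ALLOWED_HEALTHCHECK_KEYS:
                report.problems.append(f"{name}: unexpected healthcheck key: {key}")
    return report


if __name__ == "__main__":
    overlay_text = """\
# compose.ccci.yml (cc-ci overlay for a hypothetical 'app' service)
# WHY: abra cannot take healthcheck start_period from an env var.
version: "3.8"
services:
  app:
    healthcheck:
      start_period: 900s
"""
    overlay = {"version": "3.8",
               "services": {"app": {"healthcheck": {"start_period": "900s"}}}}
    print(check_overlay(overlay_text, overlay).ok)  # True
```

The allow-list covers only the two kinds of tweak this plan keeps: a healthcheck `start_period` and the discourse `image` re-pin. Under this design, an overlay that quietly adds something like an `environment:` override fails loudly. Widening the list then becomes a deliberate, reviewable change rather than a silent one.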