# Policy + cleanup: cc-ci compose overlays (when they're justified) & upgrade-tier from-version

**Status:** POLICY (codifies `plan.md §9`) + a small set of follow-ups. **Owner:** Builder + Adversary.

**This file:** `/srv/cc-ci/cc-ci-plan/plan-ccci-compose-overlay-policy.md`

**Supersedes** the earlier `plan-prefer-env-over-compose-overlay.md`. Its premise (parameterize
`start_period` via an env var) is **wrong: abra does not support an env value for `start_period`**.

---

## 0. Policy (operator, 2026-05-30)
A cc-ci-authored compose overlay (the single `compose.ccci.yml`, layered via `COMPOSE_FILE`) risks
**drift** from the recipe that users actually run, so **avoid it where possible and justify each use**.
It is nevertheless a **legitimate, uniform fallback pattern**, not forbidden:

- **Prefer an upstream recipe PR** in most cases: a real robustness fix, or exposing a knob the recipe
  should expose. That is usually where a fix belongs.
- **A ccci overlay is the right tool when the value can't be supplied any other way**, notably a
  healthcheck **`start_period`**, which **abra cannot take from an env var**. The ghost/discourse
  `start_period` bumps therefore **stay as overlays** (an env PR is impossible for that field).
- **Uniform pattern (acceptable fallback):** a single, fixed-name **`compose.ccci.yml`** per recipe
  (NOT per-purpose suffixes; one file holds all cc-ci-side deploy tweaks for that recipe). It is
  provided into the checkout by `install_steps.sh`, wired by `recipe_meta` `COMPOSE_FILE`
  (`compose.yml:compose.ccci.yml`), and kept as an untracked file so it survives the upgrade's
  `git checkout -f` (`CHAOS_BASE_DEPLOY=True`; `assert_upgraded` strips the `+U` marker; see
  DECISIONS 2026-05-30).
- **Each overlay must:** be **minimal + single-purpose**, **document WHY** in its header (the exact
  abra/upstream limitation that forces it), and be **Adversary-confirmed** not to weaken a test or mask
  a recipe defect. Where the fix also belongs upstream (e.g. a `start_period` too tight for any slow
  host), **file the upstream PR too**: the overlay is the cc-ci-side fallback, not a reason to skip it.
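
The `COMPOSE_FILE` wiring in the uniform pattern is a colon-separated list of compose files. Later files override earlier ones, which is the same convention as Docker Compose's own `COMPOSE_FILE`. The following is a minimal Python sketch of how cc-ci tooling could normalize a recipe's value so that it ends with exactly one `compose.ccci.yml` and drops a legacy per-purpose entry such as `compose.ccci-health.yml`. The function name and the `compose.ccci-*.yml` matching rule are hypothetical; this is not existing cc-ci code.

```python
OVERLAY = "compose.ccci.yml"
LEGACY_PREFIX = "compose.ccci-"  # hypothetical: matches per-purpose names like compose.ccci-health.yml


def with_ccci_overlay(compose_file: str) -> str:
    """Return a COMPOSE_FILE value that ends with exactly one compose.ccci.yml.

    Existing entries keep their order; any legacy per-purpose cc-ci overlay
    (compose.ccci-*.yml) is dropped because the uniform file replaces it.
    """
    entries = [e for e in compose_file.split(":") if e]
    kept = [
        e
        for e in entries
        if e != OVERLAY and not (e.startswith(LEGACY_PREFIX) and e.endswith(".yml"))
    ]
    # The cc-ci overlay goes last so its values override the recipe's own files.
    return ":".join(kept + [OVERLAY])


print(with_ccci_overlay("compose.yml:compose.ccci-health.yml"))  # → compose.yml:compose.ccci.yml
```

Placing the overlay last is the choice that matters, because the overlay must win over the recipe's own files. A single fixed name also keeps the rename to one line per recipe.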

## 1. Upgrade tier: always test the upgrade to LATEST

Don't drop the upgrade test because the *from* (older) version is awkward.

- **Always perform the upgrade to the latest version and run the full assertions on the latest.**
- If the older from-version can't be fully deployed/tested (its image tag was removed from the
  registry, or it predates an overlay/feature), its **custom tests** do **NOT** need to run.
  Deploy it minimally (a justified overlay is fine) or upgrade from the nearest deployable prior
  version; skip only the from-version's custom assertions, and **record** that.
- Skipping a from-version's custom tests = honest, if recorded. Skipping upgrade-to-latest = not OK.
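
The §1 rule can be expressed as a small decision function. The following is a minimal sketch that assumes the from-version has already been classified into one of three hypothetical states. `FromStatus`, `UpgradePlan`, and `plan_upgrade` are illustrative names, not cc-ci code. For every input, the upgrade to latest keeps its full assertions. Only the from-version's custom tests can be skipped, and a skip always carries a recorded reason.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FromStatus(Enum):
    FULL = "deploys and its custom tests can run"
    MINIMAL = "deploys (possibly via a justified overlay) but cannot be fully tested"
    UNDEPLOYABLE = "cannot be deployed at all"


@dataclass(frozen=True)
class UpgradePlan:
    base_version: str              # version deployed before the upgrade
    target_version: str            # always the latest version
    run_base_custom_tests: bool    # the from-version's own custom assertions
    skip_record: Optional[str]     # recorded reason whenever those tests are skipped
    # Deliberately not set by plan_upgrade: upgrade-to-latest always runs in full.
    run_full_assertions_on_target: bool = True


def plan_upgrade(from_version: str, latest: str, status: FromStatus) -> UpgradePlan:
    """Hypothetical encoding of the upgrade-tier rule."""
    if status is FromStatus.UNDEPLOYABLE:
        # Never drop the upgrade test: the caller must re-plan from the
        # nearest deployable prior version instead.
        raise ValueError(f"{from_version} cannot be deployed; re-plan from the nearest deployable prior")
    if status is FromStatus.FULL:
        return UpgradePlan(from_version, latest, run_base_custom_tests=True, skip_record=None)
    # MINIMAL: deploy it, skip only its custom tests, and record why.
    return UpgradePlan(
        from_version,
        latest,
        run_base_custom_tests=False,
        skip_record=f"{from_version} custom tests skipped: {status.value}",
    )


# "latest" is a placeholder for the recipe's current version (e.g. mumble upgrading from 0.2.0).
plan = plan_upgrade("0.2.0", "latest", FromStatus.MINIMAL)
print(plan.skip_record)
```

Raising on an undeployable base is deliberate. It forces an explicit fallback to a prior version, so the upgrade test is never silently dropped.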

## 2. Disposition of the current overlays

- [ ] **RENAME the overlay files to the uniform `compose.ccci.yml`.** Rename `tests/ghost/compose.ccci-health.yml`
  and `tests/discourse/compose.ccci-health.yml` → `compose.ccci.yml`; update each recipe's
  `install_steps.sh` (the `cp` target) and `recipe_meta` `COMPOSE_FILE`
  (`compose.yml:compose.ccci-health.yml` → `compose.yml:compose.ccci.yml`). One fixed filename per
  recipe going forward.
- [ ] **ghost `compose.ccci.yml` (start_period 900s): KEEP, justified.** abra can't take `start_period`
  from an env var; the fresh-DB migration needs the larger grace period, otherwise swarm kills the
  container mid-migration → deadlock. Confirm the header documents this; consider an upstream PR
  raising ghost's `start_period` (it's a real slow-host fragility), but the overlay stays regardless.
- [ ] **discourse `compose.ccci.yml`: KEEP, justified (both parts).** (a) `start_period 1200s`
  (same reason as ghost). (b) The `bitnami/discourse:3.3.1 → bitnamilegacy/discourse:3.3.1` re-pin
  makes the from-version (0.7.0, whose `bitnami/discourse` tag now returns 404 on Docker Hub)
  **deployable, so the upgrade-to-latest test can run**. The re-pin is namespace-only (identical
  discourse version) and is applied to base+head. This is the §1 case: keep the upgrade-to-latest test;
  the 0.7.0 custom tests need not run. Document it; if a prior version exists that deploys without
  the re-pin, prefer upgrading from that.
- [ ] **mumble `compose.host-ports.yml` (cc-ci copy for the old base): DROP it.** Deploying mumble
  0.2.0 does NOT need host-ports (that overlay only *publishes* port 64738 for on-host tests). Per §1:
  deploy 0.2.0 without it, **skip 0.2.0's voice/on-host custom tests**, then upgrade to the latest
  version (which ships `compose.host-ports.yml` natively) and run the voice tests on the latest.
  Remove the cc-ci copy and its `install_steps`/`COMPOSE_FILE` wiring for the old base; the current
  version's native overlay is untouched.
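
A simplified model of Docker's healthcheck rules shows why the ghost and discourse `start_period` bumps prevent the kill:

- The first probe runs one `interval` after start.
- Probe failures during `start_period` do not count toward `retries`.
- After `retries` counted consecutive failures, the container is marked unhealthy, and swarm then replaces it.

This sketch assumes the probe fails until the migration finishes and passes afterwards, and it ignores probe duration and timeouts. `survives_startup` is a hypothetical helper. The `interval`/`retries`/migration numbers are illustrative, not the recipes' real settings.

```python
def survives_startup(migration_s: float, start_period_s: float, interval_s: float, retries: int) -> bool:
    """Simplified model of a Docker healthcheck during a long first-boot migration.

    Assumes: the first probe runs interval_s after start and then every interval_s;
    a probe fails until migration_s has elapsed and passes afterwards;
    probe duration and timeouts are ignored.
    Returns True if a probe passes before the container is marked unhealthy.
    """
    if interval_s <= 0 or retries < 1:
        raise ValueError("interval_s must be > 0 and retries >= 1")
    failures = 0
    t = interval_s
    while True:
        if t >= migration_s:
            return True            # first passing probe: the container is healthy
        if t >= start_period_s:
            failures += 1          # failures only count once start_period has elapsed
            if failures >= retries:
                return False       # unhealthy: swarm replaces the task mid-migration
        t += interval_s


# Illustrative numbers only: a 600 s migration probed every 30 s with 10 retries.
print(survives_startup(600, start_period_s=120, interval_s=30, retries=10))  # → False
print(survives_startup(600, start_period_s=900, interval_s=30, retries=10))  # → True
```

In this model, a container whose migration outlasts roughly `start_period` plus `retries` × `interval` is replaced before it ever reports healthy. The grace period therefore has to cover the slowest first boot cc-ci will see.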

## 3. Definition of Done (Adversary cold-verifies)

- [ ] Every surviving cc-ci overlay is minimal, documents its justification (the abra/upstream
  limitation) in its header, and is Adversary-confirmed not to weaken a test or mask a defect.
- [ ] The mumble old-base cc-ci host-ports copy is removed; mumble still **upgrades to latest** and runs
  its voice tests **on the latest** (0.2.0's voice tests skipped + recorded).
- [ ] ghost + discourse still pass full suites; discourse still tests the upgrade to latest.
- [ ] Any upstream PR opened (e.g. ghost/discourse `start_period`) follows the recipe-PR rule
  (cc-ci-green via `!testme` before operator merge); the overlay remains as the cc-ci fallback.
- [ ] No upgrade-to-latest test was dropped to avoid an awkward from-version.
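
Part of the first DoD item can be checked mechanically before the Adversary's manual review. The following is a minimal sketch that makes three assumptions:

- The overlay has already been parsed into a dict, for example with PyYAML's `yaml.safe_load` (not shown).
- The raw text is available for the header check.
- The allow-lists reflect only the two overlay uses justified in §2: a healthcheck `start_period` bump and a namespace-only image re-pin.

The allow-lists and function names are hypothetical. The check only confirms that a header comment exists; whether the stated justification is correct stays with the Adversary.

```python
from typing import Any

# Hypothetical allow-lists mirroring the overlay uses justified in §2.
ALLOWED_TOP_LEVEL = {"version", "services"}
ALLOWED_SERVICE_KEYS = {"healthcheck", "image"}
ALLOWED_HEALTHCHECK_KEYS = {"start_period"}


def split_image(ref: str) -> tuple[str, str, str]:
    """Split 'namespace/name:tag' into (namespace, name, tag); digests are out of scope."""
    namespace, _, rest = ref.rpartition("/")
    name, _, tag = rest.partition(":")
    return namespace, name, tag or "latest"


def overlay_problems(raw_text: str, overlay: dict[str, Any], base_images: dict[str, str]) -> list[str]:
    """Return policy violations for one compose.ccci.yml; an empty list means it passes."""
    problems: list[str] = []

    # The header is the run of comment lines before the first YAML content line.
    header = []
    for line in raw_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            header.append(stripped)
        elif stripped:
            break
    if not header:
        problems.append("no header comment documenting WHY the overlay is needed")

    extra_top = set(overlay) - ALLOWED_TOP_LEVEL
    if extra_top:
        problems.append(f"unexpected top-level keys: {sorted(extra_top)}")

    for svc, spec in overlay.get("services", {}).items():
        extra = set(spec) - ALLOWED_SERVICE_KEYS
        if extra:
            problems.append(f"{svc}: unexpected keys {sorted(extra)}")
        hc_extra = set(spec.get("healthcheck", {})) - ALLOWED_HEALTHCHECK_KEYS
        if hc_extra:
            problems.append(f"{svc}: healthcheck overrides more than start_period: {sorted(hc_extra)}")
        if "image" in spec:
            base = base_images.get(svc)
            if base is None:
                problems.append(f"{svc}: image override for a service with no base image")
            elif split_image(base)[1:] != split_image(spec["image"])[1:]:
                # Only the namespace may change (e.g. bitnami/ -> bitnamilegacy/).
                problems.append(f"{svc}: re-pin changes more than the namespace ({base} -> {spec['image']})")
    return problems


# "app" is a placeholder service name; image and start_period follow the discourse overlay in §2.
overlay = {"services": {"app": {"image": "bitnamilegacy/discourse:3.3.1",
                                "healthcheck": {"start_period": "1200s"}}}}
raw = "# WHY: abra cannot take start_period from an env var; bitnami tag now 404s\nservices:\n"
print(overlay_problems(raw, overlay, {"app": "bitnami/discourse:3.3.1"}))  # → []
```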

## 4. Guardrails

- **Correctness first:** never weaken/skip/soften a check to make a deploy or upgrade pass; an
  overlay tunes deploy/infra only (its header must say how), and the real assertions stand.
- **Justify + document every overlay**; prefer the upstream PR where the fix belongs.
- **Real abra path** throughout.