Operator: use a single uniform filename `compose.ccci.yml` per recipe (one file
holding all cc-ci-side deploy tweaks) rather than per-purpose suffixes like
compose.ccci-health.yml. Updated §9 + plan-ccci-compose-overlay-policy.md; added
a DoD item to rename tests/{ghost,discourse}/compose.ccci-health.yml ->
compose.ccci.yml and update their install_steps.sh cp target + recipe_meta
COMPOSE_FILE.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Policy + cleanup — cc-ci compose overlays (when they're justified) & upgrade-tier from-version
Status: POLICY (codifies plan.md §9) + a small set of follow-ups. Owner: Builder + Adversary.
This file: /srv/cc-ci/cc-ci-plan/plan-ccci-compose-overlay-policy.md
Supersedes the earlier plan-prefer-env-over-compose-overlay.md (its premise — parameterize
start_period via an env var — is wrong: abra does not support an env value for start_period).
0. Policy (operator, 2026-05-30)
A cc-ci-authored compose overlay (the single compose.ccci.yml, layered via COMPOSE_FILE) risks
drift from the recipe users run — so avoid where possible and justify each use. But it is a
legitimate, uniform fallback pattern, not forbidden:
- Prefer an upstream recipe PR in most cases — a real robustness fix, or exposing a knob the recipe should expose. That's where a fix usually belongs.
- A ccci overlay is the right tool when the value can't be supplied any other way, notably a healthcheck start_period, which abra cannot take from an env var. The ghost/discourse start_period bumps therefore stay as overlays (an env PR is impossible for that field).
- Uniform pattern (acceptable fallback): a single, fixed-name compose.ccci.yml per recipe (NOT per-purpose suffixes; one file holds all cc-ci-side deploy tweaks for that recipe), provided into the checkout by install_steps.sh, wired by recipe_meta COMPOSE_FILE (compose.yml:compose.ccci.yml), and kept as an untracked file so it survives the upgrade git checkout -f (CHAOS_BASE_DEPLOY=True; assert_upgraded strips the +U marker; see DECISIONS 2026-05-30).
- Each overlay must: be minimal + single-purpose, document WHY in its header (the exact abra/upstream limitation that forces it), and be Adversary-confirmed to not weaken a test or mask a recipe defect. Where the fix also belongs upstream (e.g. a start_period too tight for any slow host), file the upstream PR too: the overlay is the cc-ci-side fallback, not a reason to skip it.
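The uniform pattern above can be sketched roughly as follows. This is a minimal illustration, not the actual install_steps.sh: the function names `provide_ccci_overlay` and `compose_file_value` and the directory arguments are hypothetical, and only the fixed filename and the `compose.yml:compose.ccci.yml` value come from the policy text.

```shell
#!/bin/sh
# Sketch only: function names and directory layout are illustrative assumptions.

# Copy the cc-ci overlay into the recipe checkout under the fixed name.
# $1 = overlay kept on the cc-ci side (e.g. tests/ghost/compose.ccci.yml)
# $2 = recipe checkout directory
provide_ccci_overlay() {
    overlay_src="$1"
    checkout_dir="$2"
    if [ ! -f "$overlay_src" ]; then
        echo "missing overlay: $overlay_src" >&2
        return 1
    fi
    # The copy is never added to git in the checkout, so it stays untracked.
    cp "$overlay_src" "$checkout_dir/compose.ccci.yml"
}

# Print the COMPOSE_FILE value for a checkout:
# the base file alone, or base plus overlay when the overlay is present.
compose_file_value() {
    checkout_dir="$1"
    if [ -f "$checkout_dir/compose.ccci.yml" ]; then
        echo "compose.yml:compose.ccci.yml"
    else
        echo "compose.yml"
    fi
}
```

With a checkout path in a (hypothetical) `$CHECKOUT` variable, `provide_ccci_overlay tests/ghost/compose.ccci.yml "$CHECKOUT"` followed by `compose_file_value "$CHECKOUT"` would print `compose.yml:compose.ccci.yml`. Because the destination name is fixed, no per-purpose variant has to be tracked anywhere else.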
1. Upgrade tier: always test the upgrade to LATEST
Don't drop the upgrade test because the from (older) version is awkward.
- Always perform the upgrade to the latest version and run the full assertions on the latest.
- If the older from-version can't be fully deployed/tested (image tag removed from the registry, or it predates an overlay/feature), you do NOT need that older version's custom tests to run. Deploy it minimally (a justified overlay is fine) or upgrade from the nearest deployable prior; skip only the from-version's custom assertions, and record that.
- Skipping a from-version's custom tests = honest, recorded. Skipping upgrade-to-latest = not OK.
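The rule in this section can be written down as a small decision sketch. The function name `plan_upgrade_tier` and its yes/no argument are hypothetical; it only prints the intended steps and does not call abra.

```shell
#!/bin/sh
# Sketch only: prints the upgrade-tier plan, it does not deploy anything.
# $1 = from-version (e.g. 0.2.0)
# $2 = "yes" if the from-version can be fully deployed and tested, else "no"
plan_upgrade_tier() {
    from_version="$1"
    from_fully_testable="$2"
    if [ "$from_fully_testable" = "yes" ]; then
        echo "deploy $from_version; run its custom tests"
    else
        # The only thing that may be skipped, and it must be recorded.
        echo "deploy $from_version minimally; skip its custom tests (recorded)"
    fi
    # Unconditional: the upgrade to latest is never dropped.
    echo "upgrade to latest; run full assertions on latest"
}
```

The final line is emitted on both branches, which mirrors the policy: the from-version's custom tests are negotiable, the upgrade to latest is not.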
2. Disposition of the current overlays
- RENAME the overlay files to the uniform compose.ccci.yml. tests/ghost/compose.ccci-health.yml and tests/discourse/compose.ccci-health.yml → compose.ccci.yml; update each recipe's install_steps.sh (the cp target) and recipe_meta COMPOSE_FILE (compose.yml:compose.ccci-health.yml → compose.yml:compose.ccci.yml). One fixed filename per recipe going forward.
- ghost compose.ccci.yml (start_period 900s): KEEP, justified. abra can't env-param start_period; the fresh-DB migration needs the larger grace or swarm kills it → deadlock. Confirm the header documents this; consider an upstream PR raising ghost's start_period (it's a real slow-host fragility), but the overlay stays regardless.
- discourse compose.ccci.yml: KEEP, justified (both parts). (a) start_period 1200s (same reason as ghost). (b) The bitnami/discourse:3.3.1 → bitnamilegacy/discourse:3.3.1 re-pin makes the from-version (0.7.0, whose bitnami/discourse tag Docker Hub now 404s) deployable so the upgrade-to-latest test can run: namespace-only, identical discourse version, applied to base+head. This is the §1 case: keep the upgrade-to-latest test; the 0.7.0 custom tests need not run. Document it; if a deployable prior without the re-pin exists, prefer upgrading from that.
- mumble compose.host-ports.yml (cc-ci copy for the old base): DROP it. Deploying mumble 0.2.0 does NOT need host-ports (that overlay only publishes 64738 for on-host tests). Per §1: deploy 0.2.0 without it, skip 0.2.0's voice/on-host custom tests, then upgrade to the latest version (which ships compose.host-ports.yml natively) and run the voice tests on the latest. Remove the cc-ci copy + its install_steps/COMPOSE_FILE wiring for the old base; the current version's native overlay is untouched.
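The rename item for ghost and discourse could be carried out along these lines. This is a sketch under assumptions: `rename_ccci_overlay` is a hypothetical helper, and the files that mention the old name are passed in explicitly because the exact on-disk name of the recipe_meta file is not stated here.

```shell
#!/bin/sh
# Sketch only: helper name and argument convention are illustrative assumptions.
# $1      = recipe test directory (e.g. tests/ghost)
# $2...   = files that reference the old overlay name (install_steps.sh, recipe_meta file)
rename_ccci_overlay() {
    recipe_dir="$1"
    shift
    old="$recipe_dir/compose.ccci-health.yml"
    new="$recipe_dir/compose.ccci.yml"
    if [ -f "$old" ]; then
        if [ -e "$new" ]; then
            echo "refusing to overwrite $new" >&2
            return 1
        fi
        mv "$old" "$new"
    fi
    for f in "$@"; do
        [ -f "$f" ] || continue
        # Rewrite via a temp file, then write back in place so the
        # original file mode (e.g. an executable install_steps.sh) is kept.
        sed 's/compose\.ccci-health\.yml/compose.ccci.yml/g' "$f" > "$f.tmp"
        cat "$f.tmp" > "$f"
        rm -f "$f.tmp"
    done
}
```

Inside the cc-ci repository, `git mv` would be the more natural way to move the tracked overlay; plain `mv` is used here only to keep the sketch independent of git. The substitution covers both the cp target and the `compose.yml:compose.ccci-health.yml` value, since both contain the old filename.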
3. Definition of Done (Adversary cold-verifies)
- Every surviving cc-ci overlay is minimal, header-documents its justification (the abra/upstream limitation), and is Adversary-confirmed to not weaken a test or mask a defect.
- The mumble old-base cc-ci host-ports copy is removed; mumble still upgrades to latest and runs its voice tests on the latest (0.2.0's voice tests skipped + recorded).
- ghost + discourse still pass full suites; discourse still tests the upgrade to latest.
- Any upstream PR opened (e.g. ghost/discourse start_period) follows the recipe-PR rule (cc-ci-green via !testme before operator merge); the overlay remains as the cc-ci fallback.
- No upgrade-to-latest test was dropped to avoid an awkward from-version.
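The "header-documents its justification" item lends itself to a mechanical pre-check before the Adversary's review. The sketch below assumes a convention that is not defined in this plan: that the leading comment block of each overlay contains a line starting with `# WHY:`. The function name `overlay_header_ok` is likewise hypothetical.

```shell
#!/bin/sh
# Sketch only: the "# WHY:" header convention is an assumption, not an existing rule.
# Succeeds if the overlay's leading comment block contains a "# WHY:" line.
# $1 = path to an overlay file (e.g. tests/ghost/compose.ccci.yml)
overlay_header_ok() {
    # awk prints only the comment lines at the very top of the file and stops
    # at the first non-comment line; grep then looks for the WHY marker there.
    awk '/^#/ { print; next } { exit }' "$1" | grep -q '^# *WHY:'
}
```

A check like this only confirms that a justification is present in the header. Whether the stated limitation is real, and whether the overlay weakens a test, still needs the Adversary's cold verification.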
4. Guardrails
- Correctness first — never weaken/skip/soften a check to make a deploy or upgrade pass; an overlay tunes deploy/infra only (its header must say how), the real assertions stand.
- Justify + document every overlay; prefer the upstream PR where the fix belongs.
- Real abra path throughout.