Operator (2026-05-30): a cc-ci-authored compose overlay risks silent drift from the recipe users actually run; avoid it wherever possible.
- plan.md §9 guardrail: when a recipe needs a cc-ci-env-tuned value (e.g. a longer healthcheck start_period for the slow single node), the preferred fix is an UPSTREAM recipe PR exposing it as an env var (e.g. APP_START_PERIOD) with the current value as the default in env.sample. CI sets the env, no new compose. For making the upgrade tier work from an older base version, prefer DECLARING that version not-testable under this CI env over crafting a custom compose. Overlay = last resort, Adversary-confirmed non-drifting + paired with the env PR.
- plan-prefer-env-over-compose-overlay.md: migrates the existing debt. ghost/discourse compose.ccci-health.yml start_period -> APP_START_PERIOD recipe PRs (default=current), then drop the overlays; discourse image re-pin + mumble old-base host-ports copy -> declare those old versions untestable instead of forking compose.
No test weakened; untestable-version is an honest outcome.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sub-plan — prefer upstream env-parameterization over cc-ci compose overlays
Status: QUEUED — a policy + a set of recipe-PR / harness follow-ups. Picks up as near-term Phase-2
units (no phase-pause). Owner: Builder + Adversary loops. This file:
/srv/cc-ci/cc-ci-plan/plan-prefer-env-over-compose-overlay.md
Codifies: the plan.md §9 guardrail "Don't fork the recipe's compose — parameterize upstream,
tune via env."
0. Principle (operator, 2026-05-30)
A cc-ci-authored compose file/overlay must be avoided wherever possible — every extra
compose.*.yml we layer via COMPOSE_FILE is a private fork of the deployment that can drift from
the recipe users actually run, so we'd stop testing what ships. Two strictly-preferred alternatives:
- Need a value tuned for cc-ci's env (e.g. a longer healthcheck start_period) → open an upstream recipe PR that exposes it as an env var, e.g. APP_START_PERIOD, with the current value as the default in env.sample. cc-ci then sets that env var in the app .env (via recipe_meta EXTRA_ENV); no new compose. Bonus: real operators on slow hosts get the same knob.
- Need a custom compose only to make the UPGRADE tier work from an older base version (a since-removed image tag, or an overlay the old version predates) → prefer declaring that older version not-testable under this CI env (record it + skip/scope that crossover) over authoring a custom compose for it.
A cc-ci compose overlay is a last resort only when neither works; it must be Adversary-confirmed non-drifting and paired with the upstream-env PR that will obsolete it.
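As a rough illustration of the first alternative, the Python sketch below models how one unmodified compose line can yield both the recipe default and the cc-ci-tuned value purely from the environment. It is not how the deployment tooling performs substitution; `string.Template` merely stands in for it. The names `COMPOSE_LINE`, `ENV_SAMPLE`, `parse_env`, and `render` are hypothetical, and the `1m` / `900s` values are the ghost figures listed in section 1.

```python
from string import Template

# Hypothetical recipe fragment after the upstream PR: the healthcheck grace
# period is read from the environment instead of being hard-coded.
COMPOSE_LINE = Template("start_period: ${APP_START_PERIOD}")

# Hypothetical env.sample content: the default equals the recipe's current value.
ENV_SAMPLE = "APP_START_PERIOD=1m\n"


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def render(extra_env=None):
    """Resolve the compose line from env.sample defaults plus optional overrides."""
    env = parse_env(ENV_SAMPLE)
    env.update(extra_env or {})  # overrides win over env.sample defaults
    return COMPOSE_LINE.substitute(env)


print(render())                              # start_period: 1m
print(render({"APP_START_PERIOD": "900s"}))  # start_period: 900s
```

The point of the design is that both outputs come from the same recipe text: an operator on defaults and cc-ci differ only in one env value, so there is no second compose file that could drift.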
1. The current debt to migrate (3 overlays)
- ghost compose.ccci-health.yml (app start_period: 900s): the fresh-DB MySQL migration (~6–9 min) exceeds the recipe's start_period: 1m → swarm kills it mid-migration → migrations_lock deadlock.
- discourse compose.ccci-health.yml (app start_period: 1200s + image re-pin bitnami/discourse:3.3.1 → bitnamilegacy/discourse:3.3.1 on app+sidekiq): the Rails cold boot of 15–25 min exceeds the recipe's start_period: 5m; the image re-pin makes the old base version deployable after Docker Hub dropped the bitnami/discourse tags.
- mumble compose.host-ports.yml: a copy of the upstream host-ports overlay, provided only so the older base version (0.2.0+) that predates it can resolve COMPOSE_FILE. (The current version ships it natively; that part is fine and stays.)
2. Definition of Done (Adversary cold-verifies)
- E1 (ghost): start_period env PR. Recipe PR to ghost exposing the app healthcheck start_period as an env var (e.g. APP_START_PERIOD), default = the current recipe value in env.sample (no behavior change for existing users). Verified green on cc-ci (the recipe-PR dogfood). Then cc-ci sets APP_START_PERIOD via recipe_meta EXTRA_ENV and removes tests/ghost/compose.ccci-health.yml + its COMPOSE_FILE/install_steps wiring; the full ghost suite is still green (install migration completes, no deadlock).
- E2 (discourse): start_period env PR. Same pattern for discourse's app start_period; remove the start_period half of compose.ccci-health.yml once CI tunes it by env.
- E3 (discourse): old-base image re-pin → declare untestable instead. Do NOT keep the bitnami → bitnamilegacy re-pin in a cc-ci compose. Either (a) the recipe PR re-pins the image upstream (if that's the genuine recipe fix), OR (b) declare the old base version not-testable under cc-ci (its image is gone from the registry) and scope the upgrade crossover accordingly, recorded in DECISIONS.md. Remove the re-pin from any cc-ci compose.
- E4 (mumble): old-base host-ports → declare untestable instead. Drop the cc-ci copy of compose.host-ports.yml; for the old base that predates the upstream overlay, declare that version not-testable under cc-ci's on-host port requirement rather than shipping a copy. The current version (ships the overlay natively) tests normally.
- E5: No cc-ci compose overlays remain (tests/**/compose.*.yml files that cc-ci authored/copied are gone), OR any that genuinely cannot be replaced is Adversary-justified + paired with a filed upstream-env PR. The guardrail (plan.md §9) holds going forward.
- E6: No test weakened. Every affected recipe's full suite still passes with real assertions + real healthcheck gating; the only change is how the value is supplied (env, not a forked compose) or that an un-runnable old crossover is honestly skipped. Adversary cold-verified.
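The E5 condition lends itself to a mechanical check. The following is a minimal sketch, assuming overlays live under a `tests/` directory and match the `tests/**/compose.*.yml` pattern named above; the function name `find_overlays` and the `allowed` allow-list (for overlays that are Adversary-justified and paired with a filed upstream-env PR) are hypothetical, not part of the existing harness.

```python
from pathlib import Path


def find_overlays(tests_root, allowed=frozenset()):
    """Return compose overlays under tests_root that are not allow-listed.

    Paths are reported relative to tests_root, in sorted order, so the
    result is stable across runs and easy to compare against `allowed`.
    """
    root = Path(tests_root)
    leftovers = []
    # "**" also matches zero directories, so files directly under
    # tests_root are included alongside per-recipe subdirectories.
    for path in sorted(root.glob("**/compose.*.yml")):
        rel = path.relative_to(root).as_posix()
        if rel not in allowed:
            leftovers.append(rel)
    return leftovers


def overlay_exit_code(tests_root, allowed=frozenset()):
    """Return 0 when no unjustified overlay remains, 1 otherwise."""
    return 1 if find_overlays(tests_root, allowed) else 0
```

A guard built this way could call `overlay_exit_code("tests")` and fail the run on a non-zero result. Keeping the allow-list explicit means every surviving overlay has to be named, which matches the "last resort" framing of the guardrail.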
3. Notes / guardrails
- Env PRs follow the standard recipe-PR rule: "working" only when cc-ci verifies the full suite green (/recipe-upgrade's !testme-on-PR path), operator-merged. Mirrors the recipe-robustness PR pattern (lasuite-drive collabora, plausible Q4.7b, immich).
- Declaring a version untestable is a first-class, honest outcome: record which version + why (registry tag gone / predates a required overlay) in DECISIONS.md; it is NOT a test weakening.
- Until an env PR is merged upstream, cc-ci may need the recipe-PR branch (via SRC+REF) to test green. That's fine (it's the recipe under test), unlike a private cc-ci compose fork.