cc-ci-orchestrator/cc-ci-plan/plan-prefer-env-over-compose-overlay.md
autonomic-bot 7a1f7f75aa Policy: prefer upstream env-parameterization over cc-ci compose overlays
Operator (2026-05-30): a cc-ci-authored compose overlay risks silent drift from
the recipe users actually run — avoid it wherever possible.

- plan.md §9 guardrail: when a recipe needs a cc-ci-env-tuned value (e.g. a longer
  healthcheck start_period for the slow single node), the preferred fix is an
  UPSTREAM recipe PR exposing it as an env var (e.g. APP_START_PERIOD) with the
  current value as the default in env.sample — CI sets the env, no new compose.
  For making the upgrade tier work from an older base version, prefer DECLARING
  that version not-testable under this CI env over crafting a custom compose.
  Overlay = last resort, Adversary-confirmed non-drifting + paired with the env PR.
- plan-prefer-env-over-compose-overlay.md: migrates the existing debt —
  ghost/discourse compose.ccci-health.yml start_period -> APP_START_PERIOD recipe
  PRs (default=current) then drop the overlays; discourse image re-pin + mumble
  old-base host-ports copy -> declare those old versions untestable instead of
  forking compose. No test weakened; untestable-version is an honest outcome.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 15:17:42 +01:00

Sub-plan — prefer upstream env-parameterization over cc-ci compose overlays

Status: QUEUED (a policy + a set of recipe-PR / harness follow-ups). Picks up as near-term Phase-2 units (no phase-pause).
Owner: Builder + Adversary loops.
This file: /srv/cc-ci/cc-ci-plan/plan-prefer-env-over-compose-overlay.md
Codifies: the plan.md §9 guardrail "Don't fork the recipe's compose — parameterize upstream, tune via env."


0. Principle (operator, 2026-05-30)

A cc-ci-authored compose file/overlay must be avoided wherever possible — every extra compose.*.yml we layer via COMPOSE_FILE is a private fork of the deployment that can drift from the recipe users actually run, so we'd stop testing what ships. Two strictly-preferred alternatives:

  1. Need a value tuned for cc-ci's env (e.g. a longer healthcheck start_period) → open an upstream recipe PR that exposes it as an env var (current value as the default in env.sample), e.g. APP_START_PERIOD. cc-ci then sets that env in the app .env (via recipe_meta EXTRA_ENV) — no new compose. Bonus: real operators on slow hosts get the same knob.
  2. Need a custom compose only to make the UPGRADE tier work from an older base version (a since-removed image tag, or an overlay the old version predates) → prefer declaring that older version not-testable under this CI env (record it + skip/scope that crossover) over authoring a custom compose for it.

A cc-ci compose overlay is acceptable only as a last resort, when neither alternative works; it must then be Adversary-confirmed non-drifting and paired with the upstream-env PR that will obsolete it.
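A minimal sketch of why alternative 1 leaves existing users unaffected, assuming the upstream recipe writes the healthcheck value with Docker Compose's `${VAR:-default}` interpolation. The `interpolate` helper is a hypothetical, simplified re-implementation of that syntax (no nesting, no `$$` escapes), not Compose itself. The recipe line is illustrative: APP_START_PERIOD comes from this plan, and `1m` is the ghost recipe's current start_period.

```python
import re

# Matches ${VAR} and ${VAR:-default}: a small subset of Docker Compose interpolation.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def interpolate(text: str, env: dict) -> str:
    """Hypothetical helper: substitute ${VAR} / ${VAR:-default} from env."""

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        if value:  # ':-' falls back to the default when the var is unset OR empty
            return value
        return default if default is not None else ""

    return _VAR.sub(substitute, text)


# Illustrative recipe line after an env PR: today's value stays the default.
RECIPE_LINE = "start_period: ${APP_START_PERIOD:-1m}"

# An operator who never sets the variable keeps the current behavior.
print(interpolate(RECIPE_LINE, {}))  # prints: start_period: 1m
# cc-ci sets it in the app .env (hypothetically via recipe_meta EXTRA_ENV); no extra compose file.
print(interpolate(RECIPE_LINE, {"APP_START_PERIOD": "900s"}))  # prints: start_period: 900s
```

Because the default lives in the recipe itself, the compose file cc-ci deploys is the same file operators deploy; only the app .env differs.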

1. The current debt to migrate (3 overlays)

  • ghost compose.ccci-health.yml (app start_period: 900s): the fresh-DB MySQL migration (~6–9 min) exceeds the recipe's start_period: 1m → swarm kills it mid-migration → migrations_lock deadlock.
  • discourse compose.ccci-health.yml (app start_period: 1200s + image re-pin bitnami/discourse:3.3.1 → bitnamilegacy/discourse:3.3.1 on app+sidekiq): the Rails cold boot (15–25 min) exceeds the recipe's start_period: 5m; the image re-pin exists to make the old base version deployable after Docker Hub dropped the bitnami/discourse tags.
  • mumble compose.host-ports.yml — a copy of the upstream host-ports overlay, provided only so the older base version (0.2.0+) that predates it can resolve COMPOSE_FILE. (The current version ships it natively — that part is fine and stays.)
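To see why a short start_period kills a slow first boot, here is a rough model of Docker's healthcheck timing. It assumes the Docker defaults for interval (30s) and retries (3), since the recipes' own values are not given here. Probe failures inside start_period do not count toward retries; after it, `retries` consecutive failures mark the container unhealthy and swarm replaces the task. The model ignores `timeout` and probe alignment, and the 8-minute boot is a hypothetical figure, not a measurement.

```python
import re

# Go-style duration strings as used in compose healthchecks, e.g. "1m", "900s", "1m30s".
_PART = re.compile(r"(\d+(?:\.\d+)?)(h|ms|us|m|s)")
_FULL = re.compile(r"(?:\d+(?:\.\d+)?(?:h|ms|us|m|s))+")
_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001, "us": 0.000001}


def parse_duration(text: str) -> float:
    """Convert a compose duration string to seconds."""
    if not _FULL.fullmatch(text):
        raise ValueError(f"not a duration: {text!r}")
    return sum(float(n) * _SECONDS[unit] for n, unit in _PART.findall(text))


def grace_window(start_period: str, interval: str = "30s", retries: int = 3) -> float:
    """Rough seconds before a never-healthy container is marked unhealthy."""
    return parse_duration(start_period) + retries * parse_duration(interval)


def survives_boot(boot_seconds: float, start_period: str) -> bool:
    """True if a boot of this length finishes before the container is declared unhealthy."""
    return boot_seconds <= grace_window(start_period)


boot = 8 * 60  # hypothetical slow first boot on the single CI node
print(survives_boot(boot, "1m"))    # False: killed mid-boot
print(survives_boot(boot, "900s"))  # True
```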

2. Definition of Done (Adversary cold-verifies)

  • E1 — ghost: start_period env PR. Recipe PR to ghost exposing the app healthcheck start_period as an env var (e.g. APP_START_PERIOD), default = the current recipe value in env.sample (no behavior change for existing users). Verified green on cc-ci (the recipe-PR dogfood). Then cc-ci sets APP_START_PERIOD via recipe_meta EXTRA_ENV and removes tests/ghost/compose.ccci-health.yml + its COMPOSE_FILE/install_steps wiring — full ghost suite still green (install migration completes, no deadlock).
  • E2 — discourse: start_period env PR. Same pattern for discourse's app start_period; remove the start_period half of compose.ccci-health.yml once CI tunes it by env.
  • E3 — discourse old-base image re-pin → declare untestable instead. Do NOT keep the bitnami→bitnamilegacy re-pin in a cc-ci compose. Either (a) the recipe PR re-pins the image upstream (if that's the genuine recipe fix), OR (b) declare the old base version not-testable under cc-ci (its image is gone from the registry) and scope the upgrade crossover accordingly — recorded in DECISIONS.md. Remove the re-pin from any cc-ci compose.
  • E4 — mumble old-base host-ports → declare untestable instead. Drop the cc-ci copy of compose.host-ports.yml; for the old base that predates the upstream overlay, declare that version not-testable under cc-ci's on-host port requirement rather than shipping a copy. The current version (ships the overlay natively) tests normally.
  • E5 — No cc-ci compose overlays remain (tests/**/compose.*.yml that cc-ci authored/copied are gone), OR any that genuinely cannot be replaced is Adversary-justified + paired with a filed upstream-env PR. The guardrail (plan.md §9) holds going forward.
  • E6 — No test weakened. Every affected recipe's full suite still passes with real assertions + real healthcheck gating; the only change is how the value is supplied (env, not a forked compose) or that an un-runnable old crossover is honestly skipped — Adversary cold-verified.
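The E5 check can be approximated mechanically. A minimal sketch, assuming cc-ci's per-recipe test assets live under `tests/` as the paths above suggest; `find_overlays` and the `justified` allowlist (paths the Adversary has accepted, each paired with a filed env PR) are hypothetical names, not existing cc-ci tooling.

```python
from pathlib import Path


def find_overlays(tests_root: Path, justified: set) -> list:
    """List tests/**/compose.*.yml files that are not on the Adversary-justified allowlist."""
    leftovers = []
    for path in sorted(tests_root.glob("**/compose.*.yml")):
        rel = path.relative_to(tests_root).as_posix()
        if rel not in justified:
            leftovers.append(rel)
    return leftovers


# E5 holds when nothing is printed (or every remaining file is allowlisted).
for rel in find_overlays(Path("tests"), justified=set()):
    print(f"cc-ci compose overlay still present: tests/{rel}")
```

Keeping the allowlist explicit means a justified overlay is a visible, reviewable exception rather than a file that silently survives.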

3. Notes / guardrails

  • Env PRs follow the standard recipe-PR rule: "working" only when cc-ci verifies the full suite green (/recipe-upgrade's !testme-on-PR path), operator-merged. Mirrors the recipe-robustness PR pattern (lasuite-drive collabora, plausible Q4.7b, immich).
  • Declaring a version untestable is a first-class, honest outcome — record which version + why (registry tag gone / predates a required overlay) in DECISIONS.md; it is NOT a test weakening.
  • Until an env PR is merged upstream, cc-ci may need the recipe-PR branch (via SRC+REF) to test green — that's fine (it's the recipe under test), unlike a private cc-ci compose fork.
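Declaring a version untestable should leave a visible record rather than a silent gap. A minimal sketch of scoping the upgrade tier, assuming crossovers are (base, target) pairs; the `Untestable` record, `scope_crossovers`, the `decision_line` wording and the version strings are hypothetical placeholders, and the real DECISIONS.md format may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Untestable:
    recipe: str
    version: str
    reason: str  # e.g. "registry tag gone" or "predates a required overlay"


def scope_crossovers(recipe, base_versions, target, untestable):
    """Split upgrade crossovers into runnable pairs and explicitly skipped ones."""
    skip = {u.version: u.reason for u in untestable if u.recipe == recipe}
    runnable = [(base, target) for base in base_versions if base not in skip]
    skipped = [(base, target, skip[base]) for base in base_versions if base in skip]
    return runnable, skipped


def decision_line(u: Untestable) -> str:
    """Hypothetical DECISIONS.md entry: which version, and why."""
    return f"- {u.recipe} {u.version}: not testable under cc-ci ({u.reason})"


# Placeholder versions, not real recipe releases.
record = Untestable("discourse", "1.0.0", "registry tag gone")
runnable, skipped = scope_crossovers("discourse", ["1.0.0", "2.0.0"], "3.0.0", [record])
print(runnable)  # [('2.0.0', '3.0.0')]
print(skipped)   # [('1.0.0', '3.0.0', 'registry tag gone')]
print(decision_line(record))
```

The skipped list carries its reason with it, so a report can show exactly which crossover was scoped out and why instead of just running fewer tests.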