cc-ci-orchestrator/cc-ci-plan/plan-prefer-env-over-compose-overlay.md
autonomic-bot 7a1f7f75aa Policy: prefer upstream env-parameterization over cc-ci compose overlays
Operator (2026-05-30): a cc-ci-authored compose overlay risks silent drift from
the recipe users actually run — avoid it wherever possible.

- plan.md §9 guardrail: when a recipe needs a cc-ci-env-tuned value (e.g. a longer
  healthcheck start_period for the slow single node), the preferred fix is an
  UPSTREAM recipe PR exposing it as an env var (e.g. APP_START_PERIOD) with the
  current value as the default in env.sample — CI sets the env, no new compose.
  For making the upgrade tier work from an older base version, prefer DECLARING
  that version not-testable under this CI env over crafting a custom compose.
  Overlay = last resort, Adversary-confirmed non-drifting + paired with the env PR.
- plan-prefer-env-over-compose-overlay.md: migrates the existing debt —
  ghost/discourse compose.ccci-health.yml start_period -> APP_START_PERIOD recipe
  PRs (default=current) then drop the overlays; discourse image re-pin + mumble
  old-base host-ports copy -> declare those old versions untestable instead of
  forking compose. No test weakened; untestable-version is an honest outcome.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 15:17:42 +01:00


# Sub-plan — prefer upstream env-parameterization over cc-ci compose overlays
**Status:** QUEUED — a policy + a set of recipe-PR / harness follow-ups. Picks up as near-term Phase-2
units (no phase-pause). **Owner:** Builder + Adversary loops. **This file:**
`/srv/cc-ci/cc-ci-plan/plan-prefer-env-over-compose-overlay.md`
**Codifies:** the `plan.md §9` guardrail "Don't fork the recipe's compose — parameterize upstream,
tune via env."

---

## 0. Principle (operator, 2026-05-30)
**A cc-ci-authored compose file/overlay must be avoided wherever possible** — every extra
`compose.*.yml` we layer via `COMPOSE_FILE` is a private fork of the deployment that can **drift from
the recipe users actually run**, so we'd stop testing what ships. Two strictly-preferred alternatives:
1. **Need a value tuned for cc-ci's env (e.g. a longer healthcheck `start_period`)** → open an
**upstream recipe PR** that exposes it as an **env var** (current value as the default in
`env.sample`), e.g. `APP_START_PERIOD`. cc-ci then sets that env in the app `.env` (via
`recipe_meta` `EXTRA_ENV`) — **no new compose**. Bonus: real operators on slow hosts get the same
knob.
2. **Need a custom compose only to make the UPGRADE tier work from an older base version** (a
since-removed image tag, or an overlay the old version predates) → **prefer declaring that older
version not-testable under this CI env** (record it + skip/scope that crossover) over authoring a
custom compose for it.

A cc-ci compose overlay is a **last resort** only when neither works; it must be Adversary-confirmed
non-drifting and paired with the upstream-env PR that will obsolete it.
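
Alternative 1 can be sketched as a compose fragment. This is a hypothetical illustration only: the service name, the healthcheck command, the timings, and the `1m` fallback are assumptions rather than values copied from any real recipe. It also assumes the deployment tooling interpolates `${VAR:-default}` references from the app `.env`.

```yaml
# compose.yml (upstream recipe): hypothetical fragment, not taken from a real recipe
services:
  app:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost"]  # placeholder check
      interval: 30s
      timeout: 10s
      retries: 10
      # Previously a hard-coded literal such as `start_period: 1m`.
      # The fallback keeps behavior unchanged when the variable is unset.
      start_period: ${APP_START_PERIOD:-1m}

# env.sample (upstream recipe) would document the knob with the same default:
#   APP_START_PERIOD=1m
#
# cc-ci's app .env would then override it, with no extra compose file:
#   APP_START_PERIOD=900s
```

Because the default equals the current value, existing deployments behave exactly as before; only environments that set the variable see a longer grace period.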
## 1. The current debt to migrate (3 overlays)
- **ghost `compose.ccci-health.yml`** (app `start_period: 900s`): the fresh-DB MySQL migration
(~6–9 min) exceeds the recipe's `start_period: 1m` → swarm kills it mid-migration → `migrations_lock`
deadlock.
- **discourse `compose.ccci-health.yml`** (app `start_period: 1200s` + image re-pin
`bitnami/discourse:3.3.1` → `bitnamilegacy/discourse:3.3.1` on app+sidekiq): Rails cold boot (15–25 min)
exceeds the recipe's `start_period: 5m`; the image re-pin makes the **old base version**
deployable after Docker Hub dropped the `bitnami/discourse` tags.
- **mumble `compose.host-ports.yml`** — a copy of the *upstream* host-ports overlay, provided only so
the **older base version (0.2.0+)** that predates it can resolve `COMPOSE_FILE`. (The current version
ships it natively — that part is fine and stays.)
## 2. Definition of Done (Adversary cold-verifies)
- [ ] **E1 — ghost: `start_period` env PR.** Recipe PR to ghost exposing the app healthcheck
`start_period` as an env var (e.g. `APP_START_PERIOD`), **default = the current recipe value** in
`env.sample` (no behavior change for existing users). Verified green on cc-ci (the recipe-PR
dogfood). Then cc-ci sets `APP_START_PERIOD` via `recipe_meta` `EXTRA_ENV` and **removes
`tests/ghost/compose.ccci-health.yml` + its `COMPOSE_FILE`/`install_steps` wiring** — full ghost
suite still green (install migration completes, no deadlock).
- [ ] **E2 — discourse: `start_period` env PR.** Same pattern for discourse's app `start_period`;
remove the `start_period` half of `compose.ccci-health.yml` once CI tunes it by env.
- [ ] **E3 — discourse old-base image re-pin → declare untestable instead.** Do NOT keep the
`bitnami→bitnamilegacy` re-pin in a cc-ci compose. Either (a) the recipe PR re-pins the image
upstream (if that's the genuine recipe fix), OR (b) **declare the old base version not-testable
under cc-ci** (its image is gone from the registry) and scope the upgrade crossover accordingly
— recorded in `DECISIONS.md`. Remove the re-pin from any cc-ci compose.
- [ ] **E4 — mumble old-base host-ports → declare untestable instead.** Drop the cc-ci copy of
`compose.host-ports.yml`; for the old base that predates the upstream overlay, **declare that
version not-testable under cc-ci's on-host port requirement** rather than shipping a copy. The
current version (ships the overlay natively) tests normally.
- [ ] **E5 — No cc-ci compose overlays remain** (`tests/**/compose.*.yml` that cc-ci authored/copied
are gone), OR any that genuinely cannot be replaced is Adversary-justified + paired with a filed
upstream-env PR. The guardrail (`plan.md §9`) holds going forward.
- [ ] **E6 — No test weakened.** Every affected recipe's full suite still passes with real assertions
+ real healthcheck gating; the only change is *how* the value is supplied (env, not a forked
compose) or that an un-runnable old crossover is honestly skipped — Adversary cold-verified.
## 3. Notes / guardrails
- Env PRs follow the standard recipe-PR rule: "working" only when cc-ci verifies the full suite green
(`/recipe-upgrade`'s `!testme`-on-PR path), operator-merged. Mirrors the recipe-robustness PR pattern
(lasuite-drive collabora, plausible Q4.7b, immich).
- **Declaring a version untestable is a first-class, honest outcome** — record which version + why
(registry tag gone / predates a required overlay) in `DECISIONS.md`; it is NOT a test weakening.
- Until an env PR is merged upstream, cc-ci may need the recipe-PR *branch* (via `SRC`+`REF`) to test
green — that's fine (it's the recipe under test), unlike a private cc-ci compose fork.