Policy: prefer upstream env-parameterization over cc-ci compose overlays

Operator (2026-05-30): a cc-ci-authored compose overlay risks silent drift from the recipe users
actually run, so avoid it wherever possible.

- plan.md §9 guardrail: when a recipe needs a cc-ci-env-tuned value (e.g. a longer healthcheck
  start_period for the slow single node), the preferred fix is an UPSTREAM recipe PR exposing it
  as an env var (e.g. APP_START_PERIOD) with the current value as the default in env.sample; CI
  sets the env, no new compose. For making the upgrade tier work from an older base version,
  prefer DECLARING that version not-testable under this CI env over crafting a custom compose.
  Overlay = last resort, Adversary-confirmed non-drifting + paired with the env PR.
- plan-prefer-env-over-compose-overlay.md: migrates the existing debt: ghost/discourse
  compose.ccci-health.yml start_period -> APP_START_PERIOD recipe PRs (default=current), then
  drop the overlays; discourse image re-pin + mumble old-base host-ports copy -> declare those old
  versions untestable instead of forking compose. No test weakened; untestable-version is an
  honest outcome.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
76
cc-ci-plan/plan-prefer-env-over-compose-overlay.md
Normal file
@@ -0,0 +1,76 @@
# Sub-plan: prefer upstream env-parameterization over cc-ci compose overlays

**Status:** QUEUED: a policy + a set of recipe-PR / harness follow-ups. Picks up as near-term Phase-2
units (no phase-pause). **Owner:** Builder + Adversary loops. **This file:**
`/srv/cc-ci/cc-ci-plan/plan-prefer-env-over-compose-overlay.md`
**Codifies:** the `plan.md §9` guardrail "Don't fork the recipe's compose — parameterize upstream,
tune via env."

---

## 0. Principle (operator, 2026-05-30)

**A cc-ci-authored compose file/overlay must be avoided wherever possible**: every extra
`compose.*.yml` we layer via `COMPOSE_FILE` is a private fork of the deployment that can **drift from
the recipe users actually run**, so we'd stop testing what ships. Two strictly-preferred alternatives:

1. **Need a value tuned for cc-ci's env (e.g. a longer healthcheck `start_period`)** → open an
**upstream recipe PR** that exposes it as an **env var** (current value as the default in
`env.sample`), e.g. `APP_START_PERIOD`. cc-ci then sets that env in the app `.env` (via
`recipe_meta` `EXTRA_ENV`), so **no new compose** is needed. Bonus: real operators on slow hosts get
the same knob.
2. **Need a custom compose only to make the UPGRADE tier work from an older base version** (a
since-removed image tag, or an overlay the old version predates) → **prefer declaring that older
version not-testable under this CI env** (record it + skip/scope that crossover) over authoring a
custom compose for it.

A cc-ci compose overlay is a **last resort** only when neither works; it must be Adversary-confirmed
non-drifting and paired with the upstream-env PR that will obsolete it.
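A minimal sketch of the env-parameterization in item 1, assuming the upstream PR references the variable in the recipe's `compose.yml` as `${APP_START_PERIOD:-1m}`. The inline `:-1m` fallback is an assumption layered on top of the `env.sample` default, so that `.env` files created before the PR keep today's value. `interpolate` mimics only the `${NAME}` and `${NAME:-default}` forms of Compose interpolation, and `app_env` is a hypothetical stand-in for however cc-ci applies `recipe_meta` `EXTRA_ENV`; neither is Compose's or cc-ci's actual code.

```python
import re

# Hypothetical fragment of the recipe's compose.yml after the upstream env PR.
COMPOSE_SNIPPET = """\
services:
  app:
    healthcheck:
      start_period: ${APP_START_PERIOD:-1m}
"""

# Simplified subset of Compose interpolation: ${NAME} and ${NAME:-default} only.
_VAR = re.compile(r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}")


def interpolate(text: str, env: dict[str, str]) -> str:
    """Replace ${NAME} / ${NAME:-default} references using values from env."""

    def repl(match: re.Match) -> str:
        value = env.get(match.group("name"), "")
        # ":-" falls back to the default when the variable is unset OR empty.
        if value == "" and match.group("default") is not None:
            return match.group("default")
        return value  # an unset ${NAME} without a default becomes ""

    return _VAR.sub(repl, text)


def app_env(dot_env: dict[str, str], extra_env: dict[str, str]) -> dict[str, str]:
    """Hypothetical merge: the app's .env values, then cc-ci overrides (EXTRA_ENV-style) on top."""
    merged = dict(dot_env)
    merged.update(extra_env)
    return merged


if __name__ == "__main__":
    existing_user_env = {"DOMAIN": "ghost.example.com"}  # a .env created before the env PR
    ci_env = app_env(existing_user_env, {"APP_START_PERIOD": "900s"})
    print(interpolate("${APP_START_PERIOD:-1m}", existing_user_env))  # 1m
    print(interpolate("${APP_START_PERIOD:-1m}", ci_env))  # 900s
```

Because the default lives in the upstream recipe, a user who never sets the variable deploys exactly what ships, while cc-ci overrides a single value without layering a second compose file.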
## 1. The current debt to migrate (3 overlays)

- **ghost `compose.ccci-health.yml`** (app `start_period: 900s`): the fresh-DB MySQL migration
(~6–9 min) exceeds the recipe's `start_period: 1m` → swarm kills it mid-migration → `migrations_lock`
deadlock.
- **discourse `compose.ccci-health.yml`** (app `start_period: 1200s` + image re-pin
`bitnami/discourse:3.3.1`→`bitnamilegacy/discourse:3.3.1` on app+sidekiq): Rails cold boot (15–25 min)
exceeds the recipe's `start_period: 5m`; the image re-pin exists to make the **old base version**
deployable after Docker Hub dropped the `bitnami/discourse` tags.
- **mumble `compose.host-ports.yml`**: a copy of the *upstream* host-ports overlay, provided only so
the **older base version (0.2.0+)** that predates it can resolve `COMPOSE_FILE`. (The current version
ships it natively; that part is fine and stays.)
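A rough grace-window calculation shows why the recipe defaults fail on cc-ci's single node. Docker does not count probe failures during `start_period`; after it, `retries` consecutive failures (one per `interval`) mark the task unhealthy, and Swarm then replaces it (the mid-migration kill described for ghost). The sketch assumes Docker's defaults (`interval: 30s`, `retries: 3`) because this plan does not state the recipes' own values, ignores probe run time and timeouts, and parses only single-unit durations. `Healthcheck` and its methods are illustrative, not cc-ci tooling.

```python
from dataclasses import dataclass

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(duration: str) -> int:
    """Parse a single-unit duration such as '900s', '5m' or '1h' into seconds."""
    return int(duration[:-1]) * _UNIT_SECONDS[duration[-1]]


@dataclass
class Healthcheck:
    start_period: str
    interval: str = "30s"  # Docker's default; assumed, not the recipe's real value
    retries: int = 3  # Docker's default; assumed, not the recipe's real value

    def approx_grace_seconds(self) -> int:
        # Failures inside start_period are not counted; after it, `retries`
        # consecutive failed probes (one per interval) make the task unhealthy.
        # Probe run time and timeouts are ignored, so this is only approximate.
        return to_seconds(self.start_period) + self.retries * to_seconds(self.interval)

    def survives(self, worst_case_boot: str) -> bool:
        """True if a boot this slow should finish before the task is marked unhealthy."""
        return self.approx_grace_seconds() >= to_seconds(worst_case_boot)


if __name__ == "__main__":
    ghost_worst_case = "9m"  # upper end of the observed fresh-DB MySQL migration
    for period in ("1m", "900s"):  # recipe value vs. cc-ci overlay value
        hc = Healthcheck(period)
        print(period, hc.approx_grace_seconds(), hc.survives(ghost_worst_case))
    # prints "1m 150 False", then "900s 990 True"
```

Under these assumptions the recipe's `1m` leaves roughly 150 s against a migration of up to 9 min, while cc-ci's `900s` leaves roughly 990 s. The same check applies to discourse once its actual `interval`/`retries` are plugged in.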
## 2. Definition of Done (Adversary cold-verifies)

- [ ] **E1 (ghost): `start_period` env PR.** Recipe PR to ghost exposing the app healthcheck
`start_period` as an env var (e.g. `APP_START_PERIOD`), **default = the current recipe value** in
`env.sample` (no behavior change for existing users). Verified green on cc-ci (the recipe-PR
dogfood). Then cc-ci sets `APP_START_PERIOD` via `recipe_meta` `EXTRA_ENV` and **removes
`tests/ghost/compose.ccci-health.yml` + its `COMPOSE_FILE`/`install_steps` wiring**; the full ghost
suite must still be green (install migration completes, no deadlock).
- [ ] **E2 (discourse): `start_period` env PR.** Same pattern for discourse's app `start_period`;
remove the `start_period` half of `compose.ccci-health.yml` once CI tunes it by env.
- [ ] **E3 (discourse): old-base image re-pin → declare untestable instead.** Do NOT keep the
`bitnami→bitnamilegacy` re-pin in a cc-ci compose. Either (a) the recipe PR re-pins the image
upstream (if that's the genuine recipe fix), OR (b) **declare the old base version not-testable
under cc-ci** (its image is gone from the registry) and scope the upgrade crossover accordingly,
recording the decision in `DECISIONS.md`. Remove the re-pin from any cc-ci compose.
- [ ] **E4 (mumble): old-base host-ports → declare untestable instead.** Drop the cc-ci copy of
`compose.host-ports.yml`; for the old base that predates the upstream overlay, **declare that
version not-testable under cc-ci's on-host port requirement** rather than shipping a copy. The
current version (ships the overlay natively) tests normally.
- [ ] **E5: No cc-ci compose overlays remain** (the `tests/**/compose.*.yml` files that cc-ci
authored or copied are gone), OR any that genuinely cannot be replaced is Adversary-justified + paired
with a filed upstream-env PR. The guardrail (`plan.md §9`) holds going forward.
- [ ] **E6: No test weakened.** Every affected recipe's full suite still passes with real assertions
and real healthcheck gating; the only change is *how* the value is supplied (env, not a forked
compose) or that an un-runnable old crossover is honestly skipped. Adversary cold-verified.
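E5 can also be checked mechanically rather than by eye. A minimal sketch, assuming cc-ci's overlays live under `tests/<recipe>/` (as the ghost path in E1 suggests) and that upstream-shipped overlays never appear there: it globs `tests/**/compose.*.yml` and reports every match that is missing from an Adversary-approved allowlist mapping the overlay to its filed upstream-env PR. `JUSTIFIED_OVERLAYS`, `find_ccci_overlays`, `unjustified_overlays` and `check` are hypothetical names, not existing cc-ci code.

```python
from pathlib import Path

# Hypothetical allowlist: overlay path -> filed upstream-env PR that will obsolete it.
# An empty dict is the E5 goal state.
JUSTIFIED_OVERLAYS: dict[str, str] = {}


def find_ccci_overlays(repo_root: Path) -> list[str]:
    """All compose overlays under tests/, as sorted repo-relative POSIX paths."""
    return sorted(
        path.relative_to(repo_root).as_posix()
        for path in repo_root.glob("tests/**/compose.*.yml")
        if path.is_file()
    )


def unjustified_overlays(repo_root: Path, justified: dict[str, str]) -> list[str]:
    """Overlays that were neither removed nor paired with a filed upstream-env PR."""
    return [path for path in find_ccci_overlays(repo_root) if not justified.get(path)]


def check(repo_root: Path, justified: dict[str, str]) -> int:
    """Print each offending overlay and return a process exit code (0 = guardrail holds)."""
    offenders = unjustified_overlays(repo_root, justified)
    for path in offenders:
        print(f"cc-ci compose overlay without an upstream-env PR: {path}")
    return 1 if offenders else 0
```

A CI step could pass `check(Path("."), JUSTIFIED_OVERLAYS)` to `sys.exit`. Keeping the allowlist in code means any surviving exception stays visible in review next to its PR link.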
## 3. Notes / guardrails
- Env PRs follow the standard recipe-PR rule: "working" only when cc-ci verifies the full suite green
(`/recipe-upgrade`'s `!testme`-on-PR path), operator-merged. Mirrors the recipe-robustness PR pattern
(lasuite-drive collabora, plausible Q4.7b, immich).
- **Declaring a version untestable is a first-class, honest outcome**: record which version + why
(registry tag gone / predates a required overlay) in `DECISIONS.md`; it is NOT a test weakening.
- Until an env PR is merged upstream, cc-ci may need the recipe-PR *branch* (via `SRC`+`REF`) to test
green. That's fine (it's the recipe under test), unlike a private cc-ci compose fork.
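Declaring a version untestable can be kept honest in the harness as well as in `DECISIONS.md`. A minimal sketch, assuming upgrade crossovers are enumerated as (recipe, base, target) triples: each declared-untestable version carries its reason, matching crossovers are split out rather than silently dropped, and every skip yields a line suitable for `STATUS.md`. The version strings `old-base`/`current` and all names here are hypothetical placeholders, not real recipe versions or cc-ci APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Untestable:
    recipe: str
    version: str
    reason: str  # e.g. "registry tag gone" or "predates a required overlay"


@dataclass(frozen=True)
class Crossover:
    recipe: str
    base: str
    target: str


def scope_crossovers(
    crossovers: list[Crossover], untestable: list[Untestable]
) -> tuple[list[Crossover], list[tuple[Crossover, str]]]:
    """Split crossovers into runnable ones and skipped ones (each with its recorded reason)."""
    reasons = {(u.recipe, u.version): u.reason for u in untestable}
    runnable: list[Crossover] = []
    skipped: list[tuple[Crossover, str]] = []
    for crossover in crossovers:
        reason = reasons.get((crossover.recipe, crossover.base))
        if reason is None:
            runnable.append(crossover)
        else:
            skipped.append((crossover, reason))
    return runnable, skipped


def skip_report(skipped: list[tuple[Crossover, str]]) -> list[str]:
    """One human-readable line per skipped crossover, so the skip is never silent."""
    return [
        f"SKIPPED {c.recipe} {c.base} -> {c.target}: base untestable under cc-ci ({reason})"
        for c, reason in skipped
    ]


if __name__ == "__main__":
    declared = [Untestable("mumble", "old-base", "predates upstream compose.host-ports.yml")]
    planned = [Crossover("mumble", "old-base", "current"), Crossover("ghost", "old-base", "current")]
    runnable, skipped = scope_crossovers(planned, declared)
    print([c.recipe for c in runnable])  # ['ghost']
    print("\n".join(skip_report(skipped)))
```

Keying on the base version alone mirrors E3/E4, where only the old starting point is unrunnable; crossovers from the current version still test normally.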
@@ -830,5 +830,20 @@ Each default stands until the Adversary or reality forces a change; record the c
a real app-level check — that **RAISES on actual non-readiness**, never a no-op that masks a failed
deploy. **Prove it has teeth** (a negative test that fails on stuck convergence, e.g. F2-12's
P7-negative). The Adversary treats a custom probe as a potential test-weakening until cold-verified.
- **Don't fork the recipe's compose — parameterize upstream, tune via env.** A cc-ci-authored compose
file/overlay (an extra `compose.*.yml` layered via `COMPOSE_FILE`) is **avoided wherever possible**:
it risks **silent drift** from the recipe actually shipped, so you'd no longer be testing what users
get. When a recipe needs a value tuned for cc-ci's environment (e.g. a longer healthcheck
`start_period` for the slower single node), the **preferred fix is an upstream recipe PR** that
exposes it as an **env var** (e.g. `APP_START_PERIOD`) with the **current value as the default in
`env.sample`**; CI then just sets that env in the app `.env`, with no new compose. The env knob also
helps real operators on slow hosts. **Old-version testability:** if making the **upgrade tier** work
from an older base version would need a custom compose (a since-removed image tag, or an overlay the
old version predates), **prefer DECLARING that older version not-testable under this CI env** (note
it + skip that crossover) over authoring a custom compose for it. A cc-ci compose overlay is a
**last resort** only when neither path is possible; it must be Adversary-confirmed non-drifting and
paired with the upstream-env PR that will obsolete it. (The existing ghost/discourse
`compose.ccci-health.yml` start_period overlays, discourse's image re-pin and mumble's copied
old-base `compose.host-ports.yml` are exactly this debt; migrate per
`plan-prefer-env-over-compose-overlay.md`.)
- **Honest reporting.** If a stage is skipped or a check failed, say so in `STATUS.md`/`JOURNAL.md`
with the output. The loop's value depends entirely on the ledgers being true.