overlay policy (content): §9 guardrail rewrite + plan-ccci-compose-overlay-policy.md
The prior commit only captured the file deletion (git add aborted on the already-removed pathspec). This adds the actual content: the reworked §9 guardrail (justified ccci overlays OK; abra can't env start_period; always test upgrade-to-latest, from-version custom tests skippable) and the new policy doc.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
73
cc-ci-plan/plan-ccci-compose-overlay-policy.md
Normal file
@@ -0,0 +1,73 @@
# Policy + cleanup: cc-ci compose overlays (when they're justified) & upgrade-tier from-version

**Status:** POLICY (codifies `plan.md §9`) + a small set of follow-ups. **Owner:** Builder + Adversary.
**This file:** `/srv/cc-ci/cc-ci-plan/plan-ccci-compose-overlay-policy.md`
**Supersedes** the earlier `plan-prefer-env-over-compose-overlay.md` (its premise, parameterizing
`start_period` via an env var, is **wrong: abra does not support an env value for `start_period`**).

---

## 0. Policy (operator, 2026-05-30)

A cc-ci-authored compose overlay (`compose.ccci-*.yml` layered via `COMPOSE_FILE`) risks **drift** from
the recipe users run, so **avoid it where possible and justify each use**. But it is a **legitimate,
uniform fallback pattern**, not forbidden:

- **Prefer an upstream recipe PR** in most cases: a real robustness fix, or exposing a knob the recipe
should expose. That's where a fix usually belongs.
- **A ccci overlay is the right tool when the value can't be supplied any other way**, notably a
healthcheck **`start_period`**, which **abra cannot take from an env var**. The ghost/discourse
`start_period` bumps therefore **stay as overlays** (an env PR is impossible for that field).
- **Uniform pattern (acceptable fallback):** an optional `compose.ccci-<purpose>.yml` per recipe,
placed into the checkout by `install_steps.sh`, wired by `recipe_meta` `COMPOSE_FILE`, kept as an
untracked file so it survives the upgrade `git checkout -f` (`CHAOS_BASE_DEPLOY=True`; `assert_upgraded`
strips the `+U` marker; see DECISIONS 2026-05-30).
- **Each overlay must:** be **minimal + single-purpose**, **document WHY** in its header (the exact
abra/upstream limitation that forces it), and be **Adversary-confirmed** to not weaken a test or mask
a recipe defect. Where the fix also belongs upstream (e.g. a `start_period` too tight for any slow
host), **file the upstream PR too**: the overlay is the cc-ci-side fallback, not a reason to skip it.

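As an illustration of the uniform pattern above, the following shell sketch writes a minimal, single-purpose overlay with a WHY header and builds the matching `COMPOSE_FILE` value. It is not the actual `install_steps.sh`: the temp directory, the base file name `compose.yml`, the `app` service name, the `version` key, and the header wording are all assumptions made for the example. Only the overlay file name and the 900s value are taken from the ghost entry in §2.

```shell
# Sketch only: paths, service name and header wording are hypothetical.
checkout="$(mktemp -d)"   # stand-in for the recipe checkout directory
overlay="$checkout/compose.ccci-health.yml"

# The header records WHY the overlay exists; the body changes one field only.
cat > "$overlay" <<'EOF'
# cc-ci overlay: healthcheck start_period only.
# WHY: abra cannot take start_period from an env var, and the recipe default
#      is too short for a fresh-DB migration on the slower cc-ci node.
# SCOPE: deploy/infra tuning only; no test assertion is changed.
version: "3.8"   # assumed to match the recipe's own compose file
services:
  app:           # assumed service name
    healthcheck:
      start_period: 900s
EOF

# Layer the overlay after the recipe's own file (colon-separated list).
COMPOSE_FILE="compose.yml:$(basename "$overlay")"
echo "COMPOSE_FILE=$COMPOSE_FILE"
```

The overlay stays untracked in the checkout, which is what lets it survive the upgrade `git checkout -f`; how `recipe_meta` consumes the `COMPOSE_FILE` value is not shown here.
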
## 1. Upgrade tier: always test the upgrade to LATEST

Don't drop the upgrade test because the *from* (older) version is awkward.
- **Always perform the upgrade to the latest version and run the full assertions on the latest.**
- If the older from-version can't be fully deployed/tested (image tag removed from the registry, or it
predates an overlay/feature), you do **NOT** need that older version's **custom tests** to run.
Deploy it minimally (a justified overlay is fine) or upgrade from the nearest deployable prior; skip
only the from-version's custom assertions, and **record** that.
- Skipping a from-version's custom tests = honest, recorded. Skipping upgrade-to-latest = not OK.

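The rule above can be read as a small decision procedure in which only the from-version's custom tests are conditional. A minimal sketch, assuming a hypothetical `plan_upgrade_tier` helper that takes `yes` or `no` for "the from-version is fully testable"; it is not part of the cc-ci harness:

```shell
# plan_upgrade_tier FROM_TESTABLE
# Prints the steps of one upgrade-tier run. FROM_TESTABLE is "yes" or "no".
plan_upgrade_tier() {
  from_testable="$1"
  if [ "$from_testable" = "yes" ]; then
    echo "from-version custom tests: run"
  else
    # An honest skip: it must be recorded, never silent.
    echo "from-version custom tests: skip (record the reason)"
  fi
  # These two steps do not depend on the from-version at all.
  echo "upgrade to latest: run"
  echo "full assertions on latest: run"
}

plan_upgrade_tier no
```

Whatever the argument, the last two lines of output are the same, which is the point of the policy: an awkward from-version changes what is skipped on the old side, never whether the upgrade to latest is tested.
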
## 2. Disposition of the current overlays

- [ ] **ghost `compose.ccci-health.yml` (start_period 900s): KEEP, justified.** abra can't env-param
`start_period`; the fresh-DB migration needs the larger grace period or swarm kills it → deadlock.
Confirm the header documents this; consider an upstream PR raising ghost's `start_period` (it's a
real slow-host fragility), but the overlay stays regardless.
- [ ] **discourse `compose.ccci-health.yml`: KEEP, justified (both parts).** (a) `start_period 1200s`
(same reason as ghost). (b) The `bitnami/discourse:3.3.1 → bitnamilegacy/discourse:3.3.1` re-pin
makes the from-version (0.7.0, whose `bitnami/discourse` tag Docker Hub now 404s) **deployable so
the upgrade-to-latest test can run**. The re-pin is namespace-only (identical discourse version) and
applied to base+head. This is the §1 case: keep the upgrade-to-latest test; the 0.7.0 custom tests
need not run. Document it; if a deployable prior without the re-pin exists, prefer upgrading from that.
- [ ] **mumble `compose.host-ports.yml` (cc-ci copy for the old base): DROP it.** Deploying mumble
0.2.0 does NOT need host-ports (that overlay only *publishes* 64738 for on-host tests). Per §1:
deploy 0.2.0 without it, **skip 0.2.0's voice/on-host custom tests**, then upgrade to the latest
version (which ships `compose.host-ports.yml` natively) and run the voice tests on the latest.
Remove the cc-ci copy + its `install_steps`/`COMPOSE_FILE` wiring for the old base; the current
version's native overlay is untouched.

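The "namespace-only" property claimed for the discourse re-pin can be checked mechanically. A minimal sketch, assuming plain `namespace/name:tag` references without digests; the function name is hypothetical and nothing like it is known to exist in cc-ci:

```shell
# same_image_modulo_namespace OLD NEW
# Succeeds when OLD and NEW share the same final "name:tag" component,
# i.e. they differ at most in the registry/namespace prefix.
same_image_modulo_namespace() {
  # ${1##*/} drops everything up to and including the last "/".
  [ "${1##*/}" = "${2##*/}" ]
}

if same_image_modulo_namespace "bitnami/discourse:3.3.1" "bitnamilegacy/discourse:3.3.1"; then
  echo "re-pin is namespace-only"
else
  echo "re-pin changes image name or tag"
fi
```

A check like this only compares the reference strings; it says nothing about whether the two registries serve identical image contents, so it supports the Adversary's review rather than replacing it.
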
## 3. Definition of Done (Adversary cold-verifies)
- [ ] Every surviving cc-ci overlay is minimal, header-documents its justification (the abra/upstream
limitation), and is Adversary-confirmed to not weaken a test or mask a defect.
- [ ] The mumble old-base cc-ci host-ports copy is removed; mumble still **upgrades to latest** and runs
its voice tests **on the latest** (0.2.0's voice tests skipped + recorded).
- [ ] ghost + discourse still pass full suites; discourse still tests the upgrade to latest.
- [ ] Any upstream PR opened (e.g. ghost/discourse `start_period`) follows the recipe-PR rule
(cc-ci-green via `!testme` before operator merge); the overlay remains as the cc-ci fallback.
- [ ] No upgrade-to-latest test was dropped to avoid an awkward from-version.

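The first checklist item (every overlay documents its justification in the header) lends itself to a quick mechanical pre-check before the Adversary's review. A minimal sketch, assuming a hypothetical convention that the leading comment block of each overlay contains a `WHY:` line; both the marker and the function name are illustrative, not existing cc-ci rules:

```shell
# check_overlay_header FILE
# Succeeds only if the leading comment block of FILE contains "WHY:".
check_overlay_header() {
  awk '
    /^#/ { if ($0 ~ /WHY:/) found = 1; next }  # still inside the header
    { exit }                                   # first non-comment line ends it
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Two throwaway overlays: one documented, one bare.
tmp="$(mktemp -d)"
printf '# WHY: abra cannot take start_period from an env var\nservices: {}\n' \
  > "$tmp/compose.ccci-health.yml"
printf 'services: {}\n' > "$tmp/compose.ccci-bare.yml"

for f in "$tmp"/compose.ccci-*.yml; do
  if check_overlay_header "$f"; then
    echo "OK       $(basename "$f")"
  else
    echo "MISSING  $(basename "$f")"
  fi
done
```

This only proves that a justification is present, not that it is a valid one; whether the stated limitation is real and the overlay leaves the assertions intact remains a judgment for the Adversary.
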
## 4. Guardrails
- **Correctness first**: never weaken/skip/soften a check to make a deploy or upgrade pass; an
overlay tunes deploy/infra only (its header must say how), the real assertions stand.
- **Justify + document every overlay**; prefer the upstream PR where the fix belongs.
- **Real abra path** throughout.
@@ -830,20 +830,25 @@ Each default stands until the Adversary or reality forces a change; record the c
a real app-level check — that **RAISES on actual non-readiness**, never a no-op that masks a failed
deploy. **Prove it has teeth** (a negative test that fails on stuck convergence, e.g. F2-12's
P7-negative). The Adversary treats a custom probe as a potential test-weakening until cold-verified.
-- **Don't fork the recipe's compose: parameterize upstream, tune via env.** A cc-ci-authored compose
-file/overlay (an extra `compose.*.yml` layered via `COMPOSE_FILE`) is **avoided wherever possible**:
-it risks **silent drift** from the recipe actually shipped, so you'd no longer be testing what users
-get. When a recipe needs a value tuned for cc-ci's environment (e.g. a longer healthcheck
-`start_period` for the slower single node), the **preferred fix is an upstream recipe PR** that
-exposes it as an **env var** (e.g. `APP_START_PERIOD`) with the **current value as the default in
-`env.sample`**; then CI just sets that env in the app `.env`, no new compose. The env knob also
-helps real operators on slow hosts. **Old-version testability:** if making the **upgrade tier** work
-from an older base version would need a custom compose (a since-removed image tag, or an overlay the
-old version predates), **prefer DECLARING that older version not-testable under this CI env** (note
-it + skip that crossover) over authoring a custom compose for it. A cc-ci compose overlay is a
-**last resort** only when neither path is possible: Adversary-confirmed non-drifting and paired with
-the upstream-env PR that will obsolete it. (The existing ghost/discourse `compose.ccci-health.yml`
-start_period overlays + discourse's image re-pin are exactly this debt; migrate per
-`plan-prefer-env-over-compose-overlay.md`.)
+- **Custom cc-ci compose overlays: avoid where possible, justify each, prefer upstream.** A
+cc-ci-authored compose overlay (an extra `compose.*.yml` layered via `COMPOSE_FILE`) risks **drift**
+from the recipe users actually run, so **avoid it where possible and justify each use**. In most
+cases the cleaner fix is an **upstream recipe PR**: either a genuine robustness fix, or exposing a
+knob the recipe should expose. **But a uniform, optional `compose.ccci-*.yml` overlay file per
+recipe is an acceptable fallback**, especially for a value abra/compose can't take from an env var.
+**Known limitation (builder, 2026-05-30): abra does NOT support an env value for a healthcheck
+`start_period`.** So the ghost/discourse `start_period` bumps legitimately **need** the overlay (an
+env-var PR is not possible for that field); these overlays **stay**, justified. When you do use an
+overlay: keep it **minimal + single-purpose**, **document WHY in the file header** (the exact abra/
+upstream limitation that forces it), have the **Adversary confirm it doesn't weaken a test or mask a
+recipe defect**, and **file the upstream PR where the fix genuinely belongs** (e.g. if a recipe's
+`start_period` is too tight for any slow host, propose raising it upstream too).
+- **Upgrade tier: always test the upgrade to the LATEST version.** Don't drop the upgrade test just
+because the *from* (older) version is awkward. If an older from-version can't be fully deployed/tested
+(its image tag was pulled from the registry, or it predates an overlay/feature), you do **NOT** need
+that older version's **custom tests** to run: deploy it minimally (a justified overlay is fine) or
+pick the nearest deployable prior, then **upgrade to latest and run the full assertions on the
+latest**. Skipping a from-version's custom tests is an honest, recorded outcome; skipping the
+upgrade-to-latest is not. (See `plan-ccci-compose-overlay-policy.md` for the per-recipe disposition.)
- **Honest reporting.** If a stage is skipped or a check failed, say so in `STATUS.md`/`JOURNAL.md`
with the output. The loop's value depends entirely on the ledgers being true.