---
version: "3.8"
# cc-ci overlay (Phase 2 Q4.6). Minimal and single-purpose: make the UPGRADE-tier BASE deploy (the
# previous published discourse version) deployable so upgrade-to-latest can run.
#
# WHY THIS OVERLAY EXISTS (plan-ccci-compose-overlay-policy.md §1 "minimal justified fallback" +
# the §1 mandate that upgrade-to-latest must ALWAYS run): the harness base-deploys the from-version
# (UPGRADE_BASE_VERSION = 0.7.0+3.3.1), then runs `deploy --chaos` to the recipe-PR head. Two
# blockers on that published base, both resolved here, NEITHER weakening any test:
#   1. RE-PIN: every published discourse tag pins `bitnami/discourse:3.3.1` (and 0.6.3 → 3.1.2),
#      but Docker Hub REMOVED the bitnami/discourse namespace (404). The recipe-PR
#      (recipe-maintainers/discourse#1) re-pins app+sidekiq to `bitnamilegacy/discourse:3.3.1` (the
#      legit upstream relocation of the identical image). This overlay applies the SAME
#      namespace-only re-pin to the BASE 0.7.0 (identical version 3.3.1, identical image content)
#      so the from-version pulls: exactly the policy-blessed "minimal bitnami→bitnamilegacy re-pin
#      overlay on the 0.7.0 from-version".
#   2. GRACE: discourse's Rails cold first boot (DB migrate + asset precompile) takes 15-25min on
#      cc-ci, exceeding the published 5m start_period → swarm kills the still-booting app.
#      start_period CANNOT be an env var (abra validates the literal 'duration' BEFORE substitution
#      → FATA; Adversary-reproduced, REVIEW-2 4b862f6), so we widen it to a literal 20m on the
#      BASE. The PR head already ships 20m, so this overlay is idempotent on the head (it persists
#      untracked across the checkout).
# Changes 1 and 2 are namespace/grace-only: identical image content, and a passing healthcheck
# still marks the task healthy immediately → NO assertion is weakened and no defect is masked.
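#
# Illustrative sketch of the published 0.7.0 fields that the re-pin and grace changes override.
# The shape is assumed from the values quoted above, not copied from the recipe, and the env var
# name in the rejected alternative is hypothetical:
#   app:
#     image: bitnami/discourse:3.3.1     # namespace removed from Docker Hub, pull returns 404
#     healthcheck:
#       start_period: 5m                 # shorter than the 15-25min cold first boot
#       # start_period: ${START_PERIOD}  # rejected: abra validates the literal duration first
#   sidekiq:
#     image: bitnami/discourse:3.3.1     # same removed namespace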
#
# NOTE (prepull): the published recipe ships `sidekiq.depends_on: [discourse]` but the main service is
# named `app` (`discourse` is undefined), so `abra app config --images` returns invalid-compose (rc=15)
# and the harness prepull is SKIPPED. This overlay does NOT try to override depends_on: compose
# normalizes short-form depends_on to a map and map-merge is additive, so an override can't REMOVE the
# bad `discourse` key. Instead the 2.4GB `bitnamilegacy/discourse:3.3.1` image is kept warm in the node
# image cache, so the inline pull during deploy is a no-op and convergence isn't pull-bound. (swarm
# ignores depends_on, so the dangling ref has zero runtime effect: a recipe lint nit, not a defect.)
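#
# Illustrative sketch of the depends_on merge described in the prepull note (not part of this
# overlay). The `[app]` override is hypothetical, and the normalized long form is assumed to carry
# `condition: service_started`, the Compose default for short-form entries:
#   published:   sidekiq.depends_on: [discourse]
#   normalized:  sidekiq.depends_on: { discourse: { condition: service_started } }
#   override:    sidekiq.depends_on: [app]
#   merged:      sidekiq.depends_on: { discourse: {...}, app: {...} }  # `discourse` key survives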
#
#   3. UPGRADE ROLLOUT (dstamp 2026-06-11, direct-evidence attribution in JOURNAL-dstamp): the
#      published app service sets `deploy.update_config: { failure_action: rollback, order:
#      start-first }`. On the upgrade chaos redeploy (base 0.7.0 → PR head), start-first runs the
#      OLD and NEW precompile/Rails-heavy discourse tasks CO-RESIDENT (~2x memory); under host
#      memory pressure the NEW task intermittently OOMs/fails swarm's update monitor →
#      `failure_action: rollback` reverts the app service to its PREVIOUS spec, INCLUDING the
#      `coop-cloud.<stack>.chaos-version` label (head → base). Because start-first keeps the OLD
#      task serving, wait_healthy still passes, and HC1 then reads the reverted BASE commit
#      (eb96de9+U) and misreports it as 'the re-checkout failed': the dstamp drift, reproduced solo
#      (runs dstamp-repro1/4) with `.Spec.chaos-version=7ae7b0f7+U` (head applied) flipping to
#      `.PreviousSpec=eb96de94+U` after the rollback. FIX: `order: stop-first` so the NEW task
#      boots with the full host memory (no 2x co-residency) and genuinely becomes healthy → no
#      spurious rollback. This is a CI deploy-rollout tweak only: the upgrade still really deploys
#      + asserts the PR-head code under test, and `failure_action: rollback` is LEFT intact, so a
#      genuinely broken head still rolls back and is caught (lifecycle.assert_upgrade_converged):
#      NO test is weakened.
#      Trade-off: brief real downtime during the CI upgrade (covered by DEPLOY_TIMEOUT 3600).
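#
# Illustrative sketch of the effective app rollout config once this overlay is merged over the
# published recipe (assuming the published fields are exactly those quoted in item 3):
#   deploy:
#     update_config:
#       failure_action: rollback   # inherited from the published recipe, left untouched
#       order: stop-first          # overridden by this overlay (published value: start-first)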
services:
  app:
    image: bitnamilegacy/discourse:3.3.1
    healthcheck:
      start_period: 20m
    deploy:
      update_config:
        order: stop-first
  sidekiq:
    image: bitnamilegacy/discourse:3.3.1