Files
cc-ci/tests/discourse/compose.ccci.yml
autonomic-bot bb2e3c6b2c
All checks were successful
continuous-integration/drone/push Build is passing
feat(prevb): dynamic upgrade base (last-green→main→skip) + per-recipe previous/ overlay; migrate discourse off static base + leaky overlay
- resolve_upgrade_base: BasePlan(kind=version|ref|skip); last-green (warm canonical) primary,
  main-tip fallback, declared skip else. UPGRADE_BASE_VERSION retained as optional override.
- deploy_app: base_ref path (chaos-deploy a main-tip/last-green commit) + apply_previous wiring.
- lifecycle: previous/ surface (has_previous, previous_target_version, previous_status decision,
  provide/remove overlay, compose_file add/remove, recipe_branch_commit, stack_service_names).
- generic.perform_upgrade: strip previous/ overlay + COMPOSE_FILE entry before head redeploy.
- discourse: compose.ccci.yml now environmental-only (order: stop-first); removed bitnamilegacy
  pins + sidekiq + UPGRADE_BASE_VERSION; test_upgrade.py asserts head image == official 3.5.3 + no sidekiq.
- unit tests: resolve_upgrade_base matrix + previous/ apply/skip/stale + COMPOSE_FILE layering.
2026-06-17 00:15:06 +00:00

30 lines
1.9 KiB
YAML

---
version: "3.8"
# cc-ci ENVIRONMENTAL overlay (phase prevb) — applies to ALL deploys (base + PR head). Node-reality
# tweaks ONLY; NO version-specific image pins or service add/drop. (Version-specific repairs, when a
# previous base can't deploy as-published, live in tests/<recipe>/previous/ — base-only. discourse
# needs none: the dynamic base = last-green / main tip = bitnamilegacy/discourse:3.5.0 deploys clean.)
#
# Fusing version-specific config into this all-deploys overlay was the prevb bug: the old overlay
# re-pinned app+sidekiq to bitnamilegacy/discourse:3.3.1 and re-added the sidekiq service on EVERY
# deploy, so the PR head (official discourse/discourse:3.5.3, sidekiq dropped) was silently reverted
# to the old image and its migration never tested. Removed entirely; only the environmental tweak below
# remains.
#
# WHY (environmental — depends on the cc-ci node, must apply to the head too): the upgrade chaos
# redeploy (base bitnamilegacy:3.5.0 → PR-head official 3.5.3) runs the OLD and NEW Rails-heavy
# (DB-migrate + asset-precompile) tasks. The published recipe sets
# app.deploy.update_config.order: start-first, which keeps both co-resident (~2x memory); under the
# single CI node's memory pressure the NEW task intermittently OOMs swarm's update monitor →
# failure_action: rollback reverts the app spec (incl. the coop-cloud.<stack>.chaos-version label)
# while the old task keeps serving, and the upgrade is misreported (dstamp finding, JOURNAL-dstamp).
# FIX: order: stop-first so the NEW task boots with the full host memory (no 2x co-residency) and
# genuinely becomes healthy. failure_action: rollback is LEFT intact → a genuinely broken head still
# rolls back and is caught (lifecycle.assert_upgrade_converged) — NO test/assertion is weakened.
# Trade-off: brief real downtime during the CI upgrade (covered by DEPLOY_TIMEOUT 3600).
services:
app:
deploy:
update_config:
order: stop-first