---
version: "3.8"

# cc-ci ENVIRONMENTAL overlay (phase prevb): applies to ALL deploys (base + PR head). Node-reality
# tweaks ONLY; NO version-specific image pins or service add/drop. (Version-specific repairs, when a
# previous base can't deploy as-published, live in tests//previous/, base-only. discourse
# needs none: the dynamic base = last-green / main tip = bitnamilegacy/discourse:3.5.0 deploys clean.)
#
# Fusing version-specific config into this all-deploys overlay was the prevb bug: the old overlay
# re-pinned app+sidekiq to bitnamilegacy/discourse:3.3.1 and re-added the sidekiq service on EVERY
# deploy, so the PR head (official discourse/discourse:3.5.3, sidekiq dropped) was silently reverted
# to the old image and its migration never tested. Removed entirely; only the environmental tweak below
# remains.
#
# WHY (environmental: depends on the cc-ci node, must apply to the head too): the upgrade chaos
# redeploy (base bitnamilegacy:3.5.0 → PR-head official 3.5.3) runs the OLD and NEW Rails-heavy
# (DB-migrate + asset-precompile) tasks. The published recipe sets
# app.deploy.update_config.order: start-first, which keeps both co-resident (~2x memory); under the
# single CI node's memory pressure the NEW task intermittently OOMs swarm's update monitor →
# failure_action: rollback reverts the app spec (incl. the coop-cloud..chaos-version label)
# while the old task keeps serving, and the upgrade is misreported (dstamp finding, JOURNAL-dstamp).
# FIX: order: stop-first so the NEW task boots with the full host memory (no 2x co-residency) and
# genuinely becomes healthy. failure_action: rollback is LEFT intact → a genuinely broken head still
# rolls back and is caught (lifecycle.assert_upgrade_converged); NO test/assertion is weakened.
# Trade-off: brief real downtime during the CI upgrade (covered by DEPLOY_TIMEOUT 3600).

services:
  app:
    deploy:
      update_config:
        order: stop-first