---
# cc-ci overlay (Phase 2 F2-14b). Minimal: widen the `app` and `db` healthcheck start_period and
# add a DB-ready wait to `app`, so the UPGRADE-tier BASE deploy (a previously published Ghost
# version) can converge.
#
# WHY THIS OVERLAY EXISTS (plan-ccci-compose-overlay-policy.md §1 "minimal justified fallback"):
# upgrade-to-latest must always run (policy §1), so the harness base-deploys the previous published
# version (e.g. 1.1.1+6-alpine) and then runs `deploy --chaos` to the recipe-PR head. Ghost's
# fresh-DB first boot runs a full schema migration that takes ~6-9 min on cc-ci (round-trip-bound,
# NOT CPU-bound). The published base versions ship `start_period: 1m` (+10×30s ≈ 6 min grace) on
# the app healthcheck. That is too tight: swarm kills the still-migrating task, leaving a held
# `migrations_lock` → every later task deadlocks (MigrationsAreLockedError) → the base never
# converges → upgrade-to-latest can't run.
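#
# Grace arithmetic (a rough sketch; assumes the interval: 30s / retries: 10 quoted above and
# ignores per-probe timeouts):
#   base:    1m start_period  + 10 × 30s = 1m + 5m  ≈ 6m before swarm can mark the task unhealthy
#   overlay: 15m start_period + 10 × 30s = 15m + 5m ≈ 20m, well above the ~6-9 min migration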
#
# The recipe-PR (recipe-maintainers/ghost#1) fixes this for the HEAD by bumping start_period to a
# literal 15m IN THE RECIPE. The BASE, however, is a *published* version that predates the PR, so
# it still carries 1m. start_period CANNOT be an env var: abra validates the literal compose
# 'duration' BEFORE substitution → FATA (Adversary-reproduced, REVIEW-2 4b862f6). This cc-ci
# overlay therefore applies the same 15m grace to the base ONLY to make the from-version
# deployable. That is exactly the policy-blessed "minimal overlay on the from-version so
# upgrade-to-latest can run". It is grace-only: a passing check still marks the task healthy
# immediately, so NO test/assertion is weakened and fast hosts are unaffected. It is idempotent on
# the head, which already ships 15m. It merges deeply onto the base healthcheck
# (test/interval/timeout/retries are preserved; only start_period is overridden).
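#
# Illustrative effective app healthcheck on the base after the merge. test/timeout are whatever
# the base ships (not reproduced here); interval/retries are the 30s/10 quoted above:
#   healthcheck:
#     test: <unchanged from base>
#     interval: 30s
#     timeout: <unchanged from base>
#     retries: 10
#     start_period: 15m   # the only key this overlay sets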
#
# The `db` (mysql:8.0) healthcheck gets the same grace. On the loaded cc-ci host, a FRESH mysql
# data-dir init (InnoDB + system tables + root-password apply) takes ~6-10 min, at or beyond the
# recipe's 1m db start_period (+10×30s ≈ 6 min). Swarm then kills mysql MID-INIT (exit 137
# "unhealthy container"), leaving a half-written data dir whose InnoDB redo logs are corrupt
# ("Cannot create redo log files because data files are corrupt") → every restart fails →
# permanent deadlock. Widening the db start_period to 15m lets the slow first-boot init finish
# before the healthcheck can fail it. This bites BOTH base and head (the published recipe ships
# db start_period 1m everywhere), so the overlay applies to both (it persists, untracked, across
# the head checkout). It is a recipe-PR candidate too. Grace-only; masks no defect; weakens no test.
#
# The app also needs a DB-ready wait during the base→head crossover. On current Ghost heads, the
# upgrade concurrently replaces mysql 8.0 with mysql 8.4. Swarm starts the new app task before the
# replacement db service is accepting connections, so Ghost exits immediately with
# `ENOTFOUND`/`ECONNREFUSED` against `${STACK_NAME}_db`, and swarm pauses the rolling update BEFORE
# any delayed retry can help. Wrapping the app command in a tiny TCP-ready wait (connect to 3306 on
# the db host, sleeping 2s between failed attempts) preserves the real upgrade/assertion while
# removing this transient service-ordering race.
services:
  app:
    # DB-ready wait: try TCP 3306 on the db host (database__connection__host from the container
    # env; `$$` is compose's escape for a literal `$`) with a 1s connect timeout, sleeping 2s
    # between failed tries, then exec /abra-entrypoint.sh with Ghost's start command.
    entrypoint:
      - sh
      - -ec
      - |
        host="$$database__connection__host"
        until node -e 'const net=require("net"); const host=process.argv[1]; const socket=net.connect({host, port: 3306}, () => { socket.end(); process.exit(0); }); socket.on("error", () => process.exit(1)); setTimeout(() => process.exit(1), 1000);' "$$host"; do
          sleep 2
        done
        exec /abra-entrypoint.sh node current/index.js
    healthcheck:
      start_period: 15m
  db:
    healthcheck:
      start_period: 15m