cc-ci/tests/ghost/compose.ccci.yml
autonomic-bot d44f799de9
fix(cfold): wait for ghost db in entrypoint
2026-06-13 03:58:59 +00:00

---
# cc-ci overlay (Phase 2 F2-14b) — minimal, single-purpose: widen the `app` healthcheck
# start_period so the UPGRADE-tier BASE deploy (a previous published ghost version) can converge.
#
# WHY THIS OVERLAY EXISTS (plan-ccci-compose-overlay-policy.md §1 "minimal justified fallback"):
# upgrade-to-latest must always run (policy §1) → the harness base-deploys the previous published
# version (e.g. 1.1.1+6-alpine), then `deploy --chaos` to the recipe-PR head. Ghost's fresh-DB first
# boot runs a full schema migration that is ~6-9 min on cc-ci (round-trip-bound, NOT CPU-bound). The
# published base versions ship `start_period: 1m` (+10×30s ≈ 6 min grace) on the app healthcheck —
# too tight: swarm kills the still-migrating task, leaving a held `migrations_lock` → every later
# task deadlocks (MigrationsAreLockedError) → the base never converges → upgrade-to-latest can't run.
#
# The recipe-PR (recipe-maintainers/ghost#1) fixes this for the HEAD by bumping start_period to a
# literal 15m IN THE RECIPE. But the BASE is a *published* version that predates the PR, so it still
# carries 1m. start_period CANNOT be an env var (abra validates the literal compose 'duration' BEFORE
# substitution → FATA; Adversary-reproduced, REVIEW-2 4b862f6), so this cc-ci overlay applies the same
# 15m grace to the base ONLY to make the from-version deployable — exactly the policy-blessed
# "minimal overlay on the from-version so upgrade-to-latest can run". It is grace-only: a healthy
# check still marks healthy immediately, so NO test/assertion is weakened and fast hosts are
# unaffected. It is idempotent on the head (head already ships 15m). Merges deeply onto the base
# healthcheck (test/interval/timeout/retries preserved; only start_period overridden).
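#
# Illustrative sketch (not copied from the recipe) of the effective `app` healthcheck after the deep
# merge, assuming the interval and retries quoted above; `test` and `timeout` are whatever the base
# recipe ships and are deliberately left elided here:
#
#   healthcheck:
#     test: ...             # unchanged, from the base recipe
#     interval: 30s         # unchanged
#     timeout: ...          # unchanged
#     retries: 10           # unchanged
#     start_period: 15m     # overridden by this overlay (published base ships 1m)
#
# Under those assumptions the window before swarm can kill a never-healthy task grows from roughly
# 1m + 10×30s = 6 min to roughly 15m + 10×30s = 20 min, which covers the ~6-9 min migration.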
#
# The `db` (mysql:8.0) healthcheck gets the same grace: on the loaded cc-ci host a FRESH mysql data
# dir init (InnoDB + system tables + root-password apply) takes ~6-10 min, far exceeding the recipe's
# 1m db start_period (+10×30s ≈ 6 min) — swarm kills mysql MID-INIT (exit 137 "unhealthy container"),
# leaving a half-written data dir whose InnoDB redo logs are corrupt ("Cannot create redo log files
# because data files are corrupt") → every restart fails → permanent deadlock. Widening the db
# start_period to 15m lets the slow first-boot init finish before the healthcheck can fail it. This
# bites BOTH base and head (the published recipe ships db start_period 1m everywhere), so the overlay
# applies on both (persists untracked across the head checkout) — a recipe-PR candidate too.
# Grace-only; masks no defect; weakens no test.
#
# The app also needs a DB-ready wait during the base→head crossover. On current Ghost heads the
# upgrade concurrently replaces mysql 8.0 with mysql 8.4; swarm starts the new app task before the
# replacement db service is accepting connections, so Ghost exits immediately with
# `ENOTFOUND`/`ECONNREFUSED` against `${STACK_NAME}_db` and swarm pauses the rolling update BEFORE any
# delayed retry can help. Wrapping the app command with a tiny TCP-ready wait preserves the real
# upgrade/assertion while removing this transient service-ordering race.
services:
  app:
    entrypoint:
      - sh
      - -ec
      - |
        host="$$database__connection__host"
        until node -e 'const net=require("net"); const host=process.argv[1]; const socket=net.connect({host, port: 3306}, () => { socket.end(); process.exit(0); }); socket.on("error", () => process.exit(1)); setTimeout(() => process.exit(1), 1000);' "$$host"; do
          sleep 2
        done
        exec /abra-entrypoint.sh node current/index.js
    healthcheck:
      start_period: 15m
  db:
    healthcheck:
      start_period: 15m