cc-ci/tests/ghost/compose.ccci-health.yml
autonomic-bot 13da216f8d fix(2): ghost healthcheck start_period overlay — fixes fresh-migration lock deadlock
Root cause: Ghost's fresh-DB first boot runs a ~6-9min schema migration (round-trip-bound, not CPU-bound);
with the recipe healthcheck's start_period: 1m (~6min grace including retries), swarm kills the still-migrating
task, leaving a stale migrations_lock, so every later task deadlocks (MigrationsAreLockedError). Reproduced with both 2 and 4 vCPUs.
Fix (cc-ci deploy overlay, NOT a recipe/test change): compose.ccci-health.yml raises app healthcheck
start_period to 900s, wired via recipe_meta COMPOSE_FILE + install_steps.sh (+ CHAOS_BASE_DEPLOY for
the untracked overlay). No assertion weakened. Budget 1200s = migration + convergence. Only the
install tier needs it (upgrade redeploys on the populated DB → fast boot).
2026-05-30 05:23:47 +01:00
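
The grace-window arithmetic in the message can be checked with a small Python sketch. It assumes Docker's health-check semantics (probe failures during `start_period` are not counted; after it, `retries` consecutive failures spaced `interval` apart mark the task unhealthy) and the recipe's `interval: 30s` / `retries: 10` implied by the "10×30s retries" note in the overlay file. `grace_seconds` and `MIGRATION_WORST_CASE_S` are hypothetical names, and the result ignores probe timeouts and scheduling jitter, so treat it as an estimate only.

```python
def grace_seconds(start_period: int, interval: int, retries: int) -> int:
    """Estimate the seconds before swarm marks a never-healthy task unhealthy.

    Failures during start_period are ignored; after it, `retries` consecutive
    failed probes spaced `interval` seconds apart are needed. Probe timeouts
    and scheduling jitter are ignored, so this is only an approximation.
    """
    return start_period + retries * interval


# Upper end of the observed ~6-9 min fresh-DB migration on cc-ci.
MIGRATION_WORST_CASE_S = 9 * 60

# Only start_period differs between the recipe default and the overlay.
old = grace_seconds(start_period=60, interval=30, retries=10)
new = grace_seconds(start_period=900, interval=30, retries=10)

print(old, old >= MIGRATION_WORST_CASE_S)  # 360 False
print(new, new >= MIGRATION_WORST_CASE_S)  # 1200 True
```

With only start_period raised, the estimated window grows from about 6 min to about 20 min, comfortably above the 9-minute upper end of the observed migration.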

# cc-ci deploy overlay (NOT a recipe change) — raises ONLY the app healthcheck start_period.
#
# Ghost's first-boot runs a full schema migration (dozens of CREATE TABLEs, each a separate MySQL
# round-trip → ~6-9min on cc-ci) against the fresh `ghost` DB. The upstream recipe healthcheck uses
# `start_period: 1m` (+ 10×30s retries ≈ 6min grace); on cc-ci the migration regularly exceeds that,
# so swarm marks the still-migrating task unhealthy and KILLS it mid-migration — which leaves a stale
# `migrations_lock` row, and every later task then refuses to boot (`MigrationsAreLockedError`
# deadlock). This is round-trip-bound, so more vCPU does not close the gap.
#
# Raising start_period (probe failures are not counted during it; a passing probe still marks the
# task healthy immediately) lets the fresh migration finish and release the lock, after which Ghost
# serves and the unchanged check passes. This is deploy/infra tuning, not a test change: no assertion
# is weakened, and the app's real healthcheck still gates readiness. Applied via recipe_meta
# COMPOSE_FILE; only the install tier's fresh migration needs it (the upgrade redeploy boots on the
# already-populated DB, so startup is fast).
services:
  app:
    healthcheck:
      start_period: 900s
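
Because Compose-style overlays merge mappings key by key, this file changes only `start_period` and inherits the rest of the recipe's healthcheck. Below is a minimal Python sketch of that merge under a simplified model: nested mappings merge recursively, and any other overlay value replaces the base value (Compose's special handling of certain sequence keys is ignored). The base dict is hypothetical. `interval` and `retries` follow the "10×30s" note above, and the `test` command is a placeholder because the recipe's real probe is not shown here.

```python
import copy
from typing import Any


def merge(base: dict[str, Any], overlay: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `base` with `overlay` merged in (simplified Compose rules)."""
    result = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are mappings: merge them key by key.
            result[key] = merge(result[key], value)
        else:
            # Scalars and lists from the overlay replace the base value outright.
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical recipe healthcheck; the test command is a placeholder.
recipe = {
    "services": {
        "app": {
            "healthcheck": {
                "test": ["CMD", "true"],
                "interval": "30s",
                "retries": 10,
                "start_period": "1m",
            }
        }
    }
}
# The overlay from this file, as a dict.
overlay = {"services": {"app": {"healthcheck": {"start_period": "900s"}}}}

effective = merge(recipe, overlay)
hc = effective["services"]["app"]["healthcheck"]
print(hc["start_period"], hc["interval"], hc["retries"])  # 900s 30s 10
```

Copying rather than mutating keeps the base recipe untouched, which mirrors the overlay's intent: the recipe itself is never edited, only the deployed configuration.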