fix(2): ghost healthcheck start_period overlay — fixes fresh-migration lock deadlock
Root cause: Ghost's fresh-DB first boot runs a ~6-9min schema migration (round-trip-bound, not CPU); the recipe healthcheck start_period:1m (~6min grace) kills the still-migrating task, leaving a stale migrations_lock → every later task deadlocks (MigrationsAreLockedError). Hit on both 2- and 4-vCPU. Fix (cc-ci deploy overlay, NOT a recipe/test change): compose.ccci-health.yml raises app healthcheck start_period to 900s, wired via recipe_meta COMPOSE_FILE + install_steps.sh (+ CHAOS_BASE_DEPLOY for the untracked overlay). No assertion weakened. Budget 1200s = migration + convergence. Only the install tier needs it (upgrade redeploys on the populated DB → fast boot).
This commit is contained in:
@ -8,14 +8,21 @@
|
||||
# mysqldump pre-hook; P4 (ops.py + test_{backup,restore,upgrade}.py) seeds a `ci_marker` row there.
|
||||
HEALTH_PATH = "/" # Ghost serves a themed site HTML at root (200)
|
||||
HEALTH_OK = (200,)
|
||||
DEPLOY_TIMEOUT = 900 # subprocess timeout for `abra app deploy`
|
||||
DEPLOY_TIMEOUT = 1200 # subprocess timeout for `abra app deploy`
|
||||
HTTP_TIMEOUT = 900
|
||||
|
||||
# Ghost's first-boot does a full schema migration (dozens of tables) against a fresh MySQL `ghost`
|
||||
# DB. The migration must finish within the recipe healthcheck grace (start_period 1m + 10×30s ≈ 6min)
|
||||
# — otherwise swarm kills the still-migrating task, which leaves a stale `migrations_lock` row and
|
||||
# every later task then refuses to boot (`MigrationsAreLockedError` deadlock). On the cc-ci node with
|
||||
# 4 dedicated vCPU the migration completes well inside that grace and the app converges in a few
|
||||
# minutes, so 900s is an ample-but-bounded budget (fails fast if the deadlock ever recurs, rather
|
||||
# than a long blackout). See DECISIONS (ghost MySQL cold-boot).
|
||||
EXTRA_ENV = {"TIMEOUT": "900"}
|
||||
# Ghost's fresh-DB first boot runs a full schema migration (dozens of CREATE TABLEs, each a separate
|
||||
# MySQL round-trip → ~6-9min on cc-ci, round-trip-bound so more vCPU doesn't help). The upstream
|
||||
# recipe healthcheck (`start_period: 1m` + 10×30s ≈ 6min grace) is too tight: swarm kills the still-
|
||||
# migrating task, leaving a stale `migrations_lock` → every later task deadlocks
|
||||
# (`MigrationsAreLockedError`). cc-ci provides a DEPLOY overlay `compose.ccci-health.yml` (raises the
|
||||
# app healthcheck start_period to 900s; failures ignored during it, a PASS still marks healthy at
|
||||
# once) via COMPOSE_FILE + install_steps.sh, so the fresh migration finishes + releases the lock.
|
||||
# This is infra/deploy tuning — NO test/assertion is weakened. CHAOS_BASE_DEPLOY lets the pinned base
|
||||
# deploy proceed with the untracked overlay present. TIMEOUT 1200s = migration (≤9min) + convergence,
|
||||
# bounded so a genuine failure still fails (not a long blackout). See DECISIONS (ghost MySQL cold-boot).
|
||||
CHAOS_BASE_DEPLOY = True
|
||||
EXTRA_ENV = {
|
||||
"TIMEOUT": "1200",
|
||||
"COMPOSE_FILE": "compose.yml:compose.ccci-health.yml",
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user