From 3ca45c73080d9ca056b9f4908bf1af2d4bb7cc9d Mon Sep 17 00:00:00 2001
From: autonomic-bot <maxf.account@proton.me>
Date: Sat, 30 May 2026 17:58:12 +0100
Subject: [PATCH] =?UTF-8?q?fix(2):=20ghost=20F2-14b=20=E2=80=94=20add=20db?=
 =?UTF-8?q?=20start=5Fperiod=20grace=20to=20base=20overlay?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Run #2 base deploy: a fresh mysql:8.0 init on the loaded cc-ci host (load ~8) took >6 min
(InnoDB ~90s + system tables + root-password apply, starved by the app crash-loop churn). That
exceeded the recipe's 1m db start_period plus its retry window (10×30s), so swarm killed mysql
mid-init (exit 137, unhealthy) → corrupt InnoDB redo logs → permanent deadlock (same signature
as run #1's stale volume). Widen the db healthcheck start_period to 15m (matches app) so the slow
first boot finishes before the healthcheck can fail it. Grace-only, masks no defect. The issue
affects both base and head (the published recipe ships db start_period 1m everywhere), so the
overlay covers both. Tore down the corrupt volume.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 tests/ghost/compose.ccci.yml | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/tests/ghost/compose.ccci.yml b/tests/ghost/compose.ccci.yml
index 2ca333b..9995168 100644
--- a/tests/ghost/compose.ccci.yml
+++ b/tests/ghost/compose.ccci.yml
@@ -19,7 +19,20 @@
 # check still marks healthy immediately, so NO test/assertion is weakened and fast hosts are
 # unaffected. It is idempotent on the head (head already ships 15m). Merges deeply onto the base
 # healthcheck (test/interval/timeout/retries preserved; only start_period overridden).
+#
+# The `db` (mysql:8.0) healthcheck gets the same grace: on the loaded cc-ci host, a FRESH mysql data
+# dir init (InnoDB + system tables + root-password apply) takes ~6-10 min, exceeding the recipe's 1m
+# db start_period plus 10×30s of retries (≈ 6 min in total). Swarm then kills mysql MID-INIT (exit
+# 137, "unhealthy container"), leaving a half-written data dir whose InnoDB redo logs are corrupt
+# ("Cannot create redo log files because data files are corrupt") → every restart fails → permanent
+# deadlock. Widening the db start_period to 15m lets the slow first-boot init finish before the
+# healthcheck can fail it. This affects BOTH base and head (the published recipe ships db
+# start_period 1m everywhere), so the overlay applies on both (it persists untracked across the
+# head checkout); it is also a recipe-PR candidate. Grace-only; masks no defect; weakens no test.
 services:
   app:
     healthcheck:
       start_period: 15m
+  db:
+    healthcheck:
+      start_period: 15m