feat(2): ghost F2-14b — upgrade-to-latest base-grace overlay (compose.ccci.yml)

Course correction (REVIEW-2 bdef282) mandates upgrade-to-latest; harness base-deploys
prev published version 1.1.1+6-alpine which predates the recipe-PR 15m start_period bump
(ships 1m) → would deadlock on the ~6-9min fresh-DB migration (swarm kill mid-migration →
held migrations_lock). Policy-blessed minimal base overlay: compose.ccci.yml re-applies the
15m app-healthcheck start_period grace to the BASE so the from-version is deployable;
install_steps.sh provides it; CHAOS_BASE_DEPLOY skips clean-tree on the untracked overlay;
persists across head checkout (idempotent — PR head ships 15m). Grace-only, no test weakened.
Prior corrupt mysql vol (stale, interrupted init) torn down. Next: full run incl upgrade.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 17:49:05 +01:00
parent 7c3d20a270
commit 7feeadd0ec
3 changed files with 70 additions and 9 deletions


@@ -0,0 +1,25 @@
---
# cc-ci overlay (Phase 2 F2-14b) — minimal, single-purpose: widen the `app` healthcheck
# start_period so the UPGRADE-tier BASE deploy (a previous published ghost version) can converge.
#
# WHY THIS OVERLAY EXISTS (plan-ccci-compose-overlay-policy.md §1 "minimal justified fallback"):
# upgrade-to-latest must always run (policy §1) → the harness base-deploys the previous published
# version (e.g. 1.1.1+6-alpine), then `deploy --chaos` to the recipe-PR head. Ghost's fresh-DB first
# boot runs a full schema migration that is ~6-9 min on cc-ci (round-trip-bound, NOT CPU-bound). The
# published base versions ship `start_period: 1m` (+10×30s ≈ 6 min grace) on the app healthcheck —
# too tight: swarm kills the still-migrating task, leaving a held `migrations_lock` → every later
# task deadlocks (MigrationsAreLockedError) → the base never converges → upgrade-to-latest can't run.
#
# The recipe-PR (recipe-maintainers/ghost#1) fixes this for the HEAD by bumping start_period to a
# literal 15m IN THE RECIPE. But the BASE is a *published* version that predates the PR, so it still
# carries 1m. start_period CANNOT be an env var (abra validates the literal compose 'duration' BEFORE
# substitution → FATA; Adversary-reproduced, REVIEW-2 4b862f6), so this cc-ci overlay applies the same
# 15m grace to the base ONLY to make the from-version deployable — exactly the policy-blessed
# "minimal overlay on the from-version so upgrade-to-latest can run". It is grace-only: a healthy
# check still marks healthy immediately, so NO test/assertion is weakened and fast hosts are
# unaffected. It is idempotent on the head (head already ships 15m). Merges deeply onto the base
# healthcheck (test/interval/timeout/retries preserved; only start_period overridden).
services:
  app:
    healthcheck:
      start_period: 15m
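
The deep-merge claim in the overlay header, and the "1m (+10×30s ≈ 6 min grace)" arithmetic, can be sketched in a few lines of Python. This is a simplified model and not the merge code that compose or abra actually runs. The `interval`, `retries` and `start_period` values follow the figures quoted in the header; the `test` and `timeout` entries are placeholders, and `deep_merge`, `parse_duration` and `grace_seconds` are hypothetical helper names.

```python
from copy import deepcopy


def deep_merge(base: dict, overlay: dict) -> dict:
    """Recursively merge overlay onto base; overlay scalars win, sibling keys survive."""
    merged = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def parse_duration(text: str) -> int:
    """Convert a single-unit duration such as '30s' or '15m' to seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(text[:-1]) * units[text[-1]]


def grace_seconds(healthcheck: dict) -> int:
    """Rough window before a task is marked unhealthy: start_period + retries * interval."""
    start = parse_duration(healthcheck["start_period"])
    interval = parse_duration(healthcheck["interval"])
    return start + healthcheck["retries"] * interval


# Hypothetical stand-in for the published base's app healthcheck.
base = {"services": {"app": {"healthcheck": {
    "test": ["CMD", "placeholder-check"],  # placeholder, not the recipe's real test
    "interval": "30s",
    "timeout": "10s",                      # placeholder value
    "retries": 10,
    "start_period": "1m",
}}}}
overlay = {"services": {"app": {"healthcheck": {"start_period": "15m"}}}}

merged = deep_merge(base, overlay)
before = grace_seconds(base["services"]["app"]["healthcheck"])
after = grace_seconds(merged["services"]["app"]["healthcheck"])
print(before, after)  # 360 1200
```

Under these assumptions the base tolerates roughly 6 minutes, which is shorter than the upper end of the 6-9 minute migration, while the merged healthcheck tolerates roughly 20 minutes and leaves `test`, `interval`, `timeout` and `retries` untouched.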

tests/ghost/install_steps.sh Executable file

@@ -0,0 +1,26 @@
#!/usr/bin/env bash
# ghost — INSTALL-TIME hook (Phase 2 F2-14b). Runs during the install tier AFTER `abra app new` +
# EXTRA_ENV + `abra app secret generate` and BEFORE the single `abra app deploy`
# (lifecycle.py::_run_install_steps), with CCCI_RECIPE / CCCI_APP_DOMAIN in env.
#
# Purpose: provide the cc-ci start_period-grace overlay (compose.ccci.yml) to the recipe checkout so
# the UPGRADE-tier BASE deploy (a previous published version whose app healthcheck still ships the
# too-tight 1m start_period) can survive ghost's ~6-9min fresh-DB migration and converge. See
# compose.ccci.yml's header for the full rationale. The overlay is referenced by recipe_meta
# COMPOSE_FILE; copying it here (it is a cc-ci file, not part of the recipe) makes it resolvable.
# It persists across the later `git checkout <head>` (untracked) so the head deploy also merges it
# (idempotent — the PR head already ships 15m). CHAOS_BASE_DEPLOY=True is set so abra's pinned-deploy
# clean-tree check doesn't FATA on the untracked overlay.
set -euo pipefail
: "${CCCI_RECIPE:?missing CCCI_RECIPE}"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
RECIPE_DIR="${HOME}/.abra/recipes/${CCCI_RECIPE}"
if [ ! -d "$RECIPE_DIR" ]; then
  echo " ghost install_steps: recipe dir $RECIPE_DIR missing — cannot provide compose.ccci.yml" >&2
  exit 1
fi
cp "$SCRIPT_DIR/compose.ccci.yml" "$RECIPE_DIR/compose.ccci.yml"
echo " ghost install_steps: provided compose.ccci.yml (app start_period grace) to recipe checkout (${CCCI_RECIPE})"
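
Why the copy step matters can be illustrated with a small Python sketch: every entry of a colon-separated `COMPOSE_FILE` has to exist in the recipe checkout before it can be resolved. This is an illustration under stated assumptions, not how abra resolves compose files internally. `missing_compose_files` is a hypothetical helper and the temporary directory stands in for `~/.abra/recipes/<recipe>`.

```python
import tempfile
from pathlib import Path


def missing_compose_files(compose_file: str, recipe_dir: Path) -> list[str]:
    """Return the COMPOSE_FILE entries that do not exist under recipe_dir."""
    names = [name for name in compose_file.split(":") if name]
    return [name for name in names if not (recipe_dir / name).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    recipe_dir = Path(tmp)  # hypothetical stand-in for the recipe checkout
    (recipe_dir / "compose.yml").write_text("services: {}\n")
    compose_file = "compose.yml:compose.ccci.yml"

    # Before the hook runs, the overlay is not part of the recipe checkout.
    before_copy = missing_compose_files(compose_file, recipe_dir)

    # Simulate the `cp` performed by install_steps.sh.
    (recipe_dir / "compose.ccci.yml").write_text(
        "services:\n  app:\n    healthcheck:\n      start_period: 15m\n"
    )
    after_copy = missing_compose_files(compose_file, recipe_dir)

print(before_copy, after_copy)  # ['compose.ccci.yml'] []
```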


@@ -19,16 +19,26 @@ HTTP_TIMEOUT = 900
 #
 # FIXED IN THE RECIPE-PR (recipe-maintainers/ghost#1, branch ci/mysql-backup): the app-service
 # healthcheck `start_period` is bumped to a literal 15m in the recipe itself — the real recipe
-# everyone runs, NOT a cc-ci compose fork. This is the plan §9 / plan-prefer-env-over-compose-overlay.md
-# anti-drift path: start_period CANNOT be expressed as an env var (abra validates the literal compose
-# 'duration' format BEFORE env substitution — `${VAR}` / `"${VAR:-1m}"` → FATA 'Does not match format
-# duration'; reproduced by the Adversary, REVIEW-2 4b862f6), so a literal recipe-PR bump is the only
-# §9-compliant way to widen it. Precedent: discourse + lasuite-drive collabora start_period recipe-PRs.
+# everyone runs, NOT a cc-ci compose fork. This is the plan §9 / plan-ccci-compose-overlay-policy.md
+# "prefer upstream PR" path: start_period CANNOT be expressed as an env var (abra validates the literal
+# compose 'duration' format BEFORE env substitution — `${VAR}` / `"${VAR:-1m}"` → FATA 'Does not match
+# format duration'; reproduced by the Adversary, REVIEW-2 4b862f6), so a literal recipe-PR bump is the
+# only §9-compliant way to widen it for the HEAD. Precedent: discourse + lasuite-drive collabora PRs.
 # start_period only widens the startup grace window (a healthy check still marks healthy at once → fast
-# hosts unaffected); NO test/assertion is weakened. With the bump in the recipe, the former cc-ci
-# DEPLOY overlay (`compose.ccci-health.yml` + `install_steps.sh` + COMPOSE_FILE + CHAOS_BASE_DEPLOY)
-# is DELETED. TIMEOUT 1200s = migration (≤9min) + convergence, bounded so a genuine failure still
-# fails (not a long blackout). See DECISIONS (ghost MySQL cold-boot / start_period recipe-PR).
+# hosts unaffected); NO test/assertion is weakened.
+#
+# UPGRADE-tier BASE grace (compose.ccci.yml): upgrade-to-latest must ALWAYS run
+# (plan-ccci-compose-overlay-policy.md §1), so the harness base-deploys the previous PUBLISHED version
+# (1.1.1+6-alpine) — which predates the PR and still ships the too-tight 1m start_period → it would
+# deadlock on the same migration kill. compose.ccci.yml re-applies the 15m grace to the BASE so the
+# from-version is deployable; install_steps.sh provides it to the checkout; CHAOS_BASE_DEPLOY skips the
+# clean-tree gate on that untracked overlay. It persists across the head checkout (idempotent — the PR
+# head already ships 15m). This is the policy-blessed "minimal overlay on the from-version so
+# upgrade-to-latest can run" — grace-only, masks no defect, weakens no test.
+# TIMEOUT 1200s = migration (≤9min) + convergence, bounded so a genuine failure still fails (not a
+# long blackout). See DECISIONS (ghost MySQL cold-boot / start_period recipe-PR + base-grace overlay).
+CHAOS_BASE_DEPLOY = True
 EXTRA_ENV = {
     "TIMEOUT": "1200",
+    "COMPOSE_FILE": "compose.yml:compose.ccci.yml",
 }
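
The ordering problem described in the comment (the literal is validated as a duration before any env substitution happens) can be illustrated with a toy model. This is a sketch of the failure mode only, not abra's validation code: the `DURATION` pattern is a simplification, `START_PERIOD` is a hypothetical variable name, and Python's `string.Template` stands in for compose-style `${VAR}` substitution.

```python
import re
from string import Template

# Simplified duration pattern for illustration only; not the schema abra actually uses.
DURATION = re.compile(r"(\d+(ns|us|ms|s|m|h))+")


def is_duration(value: str) -> bool:
    return DURATION.fullmatch(value) is not None


def validate_then_substitute(raw: str, env: dict[str, str]) -> str:
    """Order described in the comment: check the literal first, substitute afterwards."""
    if not is_duration(raw):
        raise ValueError(f"{raw!r} does not match format 'duration'")
    return Template(raw).safe_substitute(env)


def substitute_then_validate(raw: str, env: dict[str, str]) -> str:
    """The opposite order, shown only for contrast."""
    value = Template(raw).safe_substitute(env)
    if not is_duration(value):
        raise ValueError(f"{value!r} does not match format 'duration'")
    return value


env = {"START_PERIOD": "15m"}  # hypothetical variable name

literal_ok = validate_then_substitute("15m", env)
try:
    validate_then_substitute("${START_PERIOD}", env)
    placeholder_rejected = False
except ValueError:
    placeholder_rejected = True
other_order = substitute_then_validate("${START_PERIOD}", env)

print(literal_ok, placeholder_rejected, other_order)  # 15m True 15m
```

With validation first, only a literal such as `15m` passes, which is consistent with the comment's conclusion that the value has to be written literally in a compose file (the recipe itself or the overlay) rather than injected through `EXTRA_ENV`.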