From 7feeadd0ec84ab8c4e3c2dd484c08b4730232eb7 Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Sat, 30 May 2026 17:49:05 +0100
Subject: [PATCH] =?UTF-8?q?feat(2):=20ghost=20F2-14b=20=E2=80=94=20upgrade?=
 =?UTF-8?q?-to-latest=20base-grace=20overlay=20(compose.ccci.yml)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Course correction (REVIEW-2 bdef282) mandates upgrade-to-latest; harness
base-deploys prev published version 1.1.1+6-alpine which predates the
recipe-PR 15m start_period bump (ships 1m) → would deadlock on the
~6-9min fresh-DB migration (swarm kill mid-migration → held
migrations_lock). Policy-blessed minimal base overlay: compose.ccci.yml
re-applies the 15m app-healthcheck start_period grace to the BASE so the
from-version is deployable; install_steps.sh provides it;
CHAOS_BASE_DEPLOY skips clean-tree on the untracked overlay; persists
across head checkout (idempotent — PR head ships 15m). Grace-only, no
test weakened. Prior corrupt mysql vol (stale, interrupted init) torn
down. Next: full run incl upgrade.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 tests/ghost/compose.ccci.yml | 25 +++++++++++++++++++++++++
 tests/ghost/install_steps.sh | 26 ++++++++++++++++++++++++++
 tests/ghost/recipe_meta.py   | 28 +++++++++++++++++++---------
 3 files changed, 70 insertions(+), 9 deletions(-)
 create mode 100644 tests/ghost/compose.ccci.yml
 create mode 100755 tests/ghost/install_steps.sh

diff --git a/tests/ghost/compose.ccci.yml b/tests/ghost/compose.ccci.yml
new file mode 100644
index 0000000..2ca333b
--- /dev/null
+++ b/tests/ghost/compose.ccci.yml
@@ -0,0 +1,25 @@
+---
+# cc-ci overlay (Phase 2 F2-14b) — minimal, single-purpose: widen the `app` healthcheck
+# start_period so the UPGRADE-tier BASE deploy (a previous published ghost version) can converge.
+#
+# WHY THIS OVERLAY EXISTS (plan-ccci-compose-overlay-policy.md §1 "minimal justified fallback"):
+# upgrade-to-latest must always run (policy §1) → the harness base-deploys the previous published
+# version (e.g. 1.1.1+6-alpine), then `deploy --chaos` to the recipe-PR head. Ghost's fresh-DB first
+# boot runs a full schema migration that is ~6-9 min on cc-ci (round-trip-bound, NOT CPU-bound). The
+# published base versions ship `start_period: 1m` (+10×30s ≈ 6 min grace) on the app healthcheck —
+# too tight: swarm kills the still-migrating task, leaving a held `migrations_lock` → every later
+# task deadlocks (MigrationsAreLockedError) → the base never converges → upgrade-to-latest can't run.
+#
+# The recipe-PR (recipe-maintainers/ghost#1) fixes this for the HEAD by bumping start_period to a
+# literal 15m IN THE RECIPE. But the BASE is a *published* version that predates the PR, so it still
+# carries 1m. start_period CANNOT be an env var (abra validates the literal compose 'duration' BEFORE
+# substitution → FATA; Adversary-reproduced, REVIEW-2 4b862f6), so this cc-ci overlay applies the same
+# 15m grace to the base ONLY to make the from-version deployable — exactly the policy-blessed
+# "minimal overlay on the from-version so upgrade-to-latest can run". It is grace-only: a healthy
+# check still marks healthy immediately, so NO test/assertion is weakened and fast hosts are
+# unaffected. It is idempotent on the head (head already ships 15m). Merges deeply onto the base
+# healthcheck (test/interval/timeout/retries preserved; only start_period overridden).
+services:
+  app:
+    healthcheck:
+      start_period: 15m
diff --git a/tests/ghost/install_steps.sh b/tests/ghost/install_steps.sh
new file mode 100755
index 0000000..ef10674
--- /dev/null
+++ b/tests/ghost/install_steps.sh
@@ -0,0 +1,26 @@
+#!/usr/bin/env bash
+# ghost — INSTALL-TIME hook (Phase 2 F2-14b). Runs during the install tier AFTER `abra app new` +
+# EXTRA_ENV + `abra app secret generate` and BEFORE the single `abra app deploy`
+# (lifecycle.py::_run_install_steps), with CCCI_RECIPE / CCCI_APP_DOMAIN in env.
+#
+# Purpose: provide the cc-ci start_period-grace overlay (compose.ccci.yml) to the recipe checkout so
+# the UPGRADE-tier BASE deploy (a previous published version whose app healthcheck still ships the
+# too-tight 1m start_period) can survive ghost's ~6-9min fresh-DB migration and converge. See
+# compose.ccci.yml's header for the full rationale. The overlay is referenced by recipe_meta
+# COMPOSE_FILE; copying it here (it is a cc-ci file, not part of the recipe) makes it resolvable.
+# It persists across the later `git checkout ` (untracked) so the head deploy also merges it
+# (idempotent — the PR head already ships 15m). CHAOS_BASE_DEPLOY=True is set so abra's pinned-deploy
+# clean-tree check doesn't FATA on the untracked overlay.
+set -euo pipefail
+
+: "${CCCI_RECIPE:?missing CCCI_RECIPE}"
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+RECIPE_DIR="${HOME}/.abra/recipes/${CCCI_RECIPE}"
+
+if [ ! -d "$RECIPE_DIR" ]; then
+  echo " ghost install_steps: recipe dir $RECIPE_DIR missing — cannot provide compose.ccci.yml" >&2
+  exit 1
+fi
+
+cp "$SCRIPT_DIR/compose.ccci.yml" "$RECIPE_DIR/compose.ccci.yml"
+echo " ghost install_steps: provided compose.ccci.yml (app start_period grace) to recipe checkout (${CCCI_RECIPE})"
diff --git a/tests/ghost/recipe_meta.py b/tests/ghost/recipe_meta.py
index e55e339..2052930 100644
--- a/tests/ghost/recipe_meta.py
+++ b/tests/ghost/recipe_meta.py
@@ -19,16 +19,26 @@ HTTP_TIMEOUT = 900
 #
 # FIXED IN THE RECIPE-PR (recipe-maintainers/ghost#1, branch ci/mysql-backup): the app-service
 # healthcheck `start_period` is bumped to a literal 15m in the recipe itself — the real recipe
-# everyone runs, NOT a cc-ci compose fork. This is the plan §9 / plan-prefer-env-over-compose-overlay.md
-# anti-drift path: start_period CANNOT be expressed as an env var (abra validates the literal compose
-# 'duration' format BEFORE env substitution — `${VAR}` / `"${VAR:-1m}"` → FATA 'Does not match format
-# duration'; reproduced by the Adversary, REVIEW-2 4b862f6), so a literal recipe-PR bump is the only
-# §9-compliant way to widen it. Precedent: discourse + lasuite-drive collabora start_period recipe-PRs.
+# everyone runs, NOT a cc-ci compose fork. This is the plan §9 / plan-ccci-compose-overlay-policy.md
+# "prefer upstream PR" path: start_period CANNOT be expressed as an env var (abra validates the literal
+# compose 'duration' format BEFORE env substitution — `${VAR}` / `"${VAR:-1m}"` → FATA 'Does not match
+# format duration'; reproduced by the Adversary, REVIEW-2 4b862f6), so a literal recipe-PR bump is the
+# only §9-compliant way to widen it for the HEAD. Precedent: discourse + lasuite-drive collabora PRs.
 # start_period only widens the startup grace window (a healthy check still marks healthy at once → fast
-# hosts unaffected); NO test/assertion is weakened. With the bump in the recipe, the former cc-ci
-# DEPLOY overlay (`compose.ccci-health.yml` + `install_steps.sh` + COMPOSE_FILE + CHAOS_BASE_DEPLOY)
-# is DELETED. TIMEOUT 1200s = migration (≤9min) + convergence, bounded so a genuine failure still
-# fails (not a long blackout). See DECISIONS (ghost MySQL cold-boot / start_period recipe-PR).
+# hosts unaffected); NO test/assertion is weakened.
+#
+# UPGRADE-tier BASE grace (compose.ccci.yml): upgrade-to-latest must ALWAYS run
+# (plan-ccci-compose-overlay-policy.md §1), so the harness base-deploys the previous PUBLISHED version
+# (1.1.1+6-alpine) — which predates the PR and still ships the too-tight 1m start_period → it would
+# deadlock on the same migration kill. compose.ccci.yml re-applies the 15m grace to the BASE so the
+# from-version is deployable; install_steps.sh provides it to the checkout; CHAOS_BASE_DEPLOY skips the
+# clean-tree gate on that untracked overlay. It persists across the head checkout (idempotent — the PR
+# head already ships 15m). This is the policy-blessed "minimal overlay on the from-version so
+# upgrade-to-latest can run" — grace-only, masks no defect, weakens no test.
+# TIMEOUT 1200s = migration (≤9min) + convergence, bounded so a genuine failure still fails (not a
+# long blackout). See DECISIONS (ghost MySQL cold-boot / start_period recipe-PR + base-grace overlay).
+CHAOS_BASE_DEPLOY = True
 EXTRA_ENV = {
     "TIMEOUT": "1200",
+    "COMPOSE_FILE": "compose.yml:compose.ccci.yml",
 }
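
The grace arithmetic quoted in the compose.ccci.yml header (1m + 10×30s ≈ 6 min, against a ~6-9 min migration) and the "only start_period overridden" merge claim can be checked with a small Python sketch. It is an illustration under stated assumptions, not abra's or Docker's merge code: `deep_merge`, `parse_duration` and `grace_seconds` are hypothetical helpers, the `test` and `timeout` values are placeholders, and the grace window is approximated as start_period plus retries × interval (failed checks during start_period do not count toward retries).

```python
"""Sketch: overlay merge plus healthcheck grace arithmetic for the base deploy."""
from copy import deepcopy


def parse_duration(text: str) -> int:
    """Convert a simple duration such as '30s' or '15m' to seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(text[:-1]) * units[text[-1]]


def deep_merge(base: dict, overlay: dict) -> dict:
    """Return a copy of base with overlay applied; nested mappings merge key by key."""
    merged = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def grace_seconds(healthcheck: dict) -> int:
    """Approximate seconds before a never-healthy task is marked unhealthy."""
    start = parse_duration(healthcheck["start_period"])
    return start + healthcheck["retries"] * parse_duration(healthcheck["interval"])


# start_period, interval and retries follow the figures quoted in the overlay
# header; `test` and `timeout` are placeholder assumptions, not the recipe's values.
BASE = {"services": {"app": {"healthcheck": {
    "test": ["CMD", "true"],
    "interval": "30s",
    "timeout": "10s",
    "retries": 10,
    "start_period": "1m",
}}}}

# Mirrors compose.ccci.yml: a single key under services.app.healthcheck.
OVERLAY = {"services": {"app": {"healthcheck": {"start_period": "15m"}}}}

MIGRATION_WORST_CASE = 9 * 60  # upper end of the ~6-9 min fresh-DB migration

merged = deep_merge(BASE, OVERLAY)
base_hc = BASE["services"]["app"]["healthcheck"]
merged_hc = merged["services"]["app"]["healthcheck"]

for label, hc in (("base", base_hc), ("base + overlay", merged_hc)):
    grace = grace_seconds(hc)
    verdict = "survives" if grace >= MIGRATION_WORST_CASE else "killed mid-migration"
    print(f"{label}: grace {grace}s vs migration {MIGRATION_WORST_CASE}s -> {verdict}")
```

Under these assumptions the base healthcheck allows 360s, short of the 540s worst case, while the merged healthcheck allows 1200s and keeps the base's interval, timeout, retries and test untouched.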