# cc-ci/tests/ghost/recipe_meta.py

# Per-recipe harness config for ghost (Phase 2 Q4.4 — Node.js publishing platform).
# Ghost serves an HTML site at `/`; admin UI at `/ghost/`. The first GET to /ghost/ redirects
# to the setup wizard (302). Ghost exposes a JSON Content API at /ghost/api/content/ which
# requires an API key; the Admin API at /ghost/api/admin/ requires a session/token (see
# functional/_ghost.py — version-negotiated, no /v3/ path).
# State lives in a **MySQL** `ghost` DB (compose `db` service, mysql:8.0) + the `ghost_content`
# volume (themes/images) — NOT sqlite. The `db` service is backupbot-labelled with a logical
# mysqldump pre-hook; P4 (ops.py + test_{backup,restore,upgrade}.py) seeds a `ci_marker` row there.
HEALTH_PATH = "/"  # Ghost serves a themed site HTML at root (200)
HEALTH_OK = (200,)
DEPLOY_TIMEOUT = 2400  # subprocess timeout for `abra app deploy` (cold-boot wall-time, see below)
HTTP_TIMEOUT = 900

# Ghost's fresh-DB first boot runs a full schema migration (dozens of CREATE TABLEs, each a separate
# MySQL round-trip; ~6-9min on cc-ci, and round-trip-bound, so more vCPU doesn't help). The published
# recipe healthcheck used `start_period: 1m` (+10×30s ≈ 6min total grace), which is too tight on cc-ci:
# swarm kills the still-migrating task, leaving a stale `migrations_lock`, so every later task
# deadlocks (`MigrationsAreLockedError`).
#
# FIXED IN THE RECIPE-PR (recipe-maintainers/ghost#1, branch ci/mysql-backup): the app-service
# healthcheck `start_period` is bumped to a literal 15m in the recipe itself — the real recipe
# everyone runs, NOT a cc-ci compose fork. This is the plan §9 / plan-ccci-compose-overlay-policy.md
# "prefer upstream PR" path. start_period CANNOT be expressed as an env var: abra validates the literal
# compose 'duration' format BEFORE env substitution, so `${VAR}` / `"${VAR:-1m}"` fail with FATA 'Does
# not match format duration' (reproduced by the Adversary, REVIEW-2 4b862f6). A literal recipe-PR bump
# is therefore the only §9-compliant way to widen it for the HEAD. Precedent: discourse +
# lasuite-drive collabora PRs.
# start_period only widens the startup grace window (a healthy check still marks healthy at once → fast
# hosts unaffected); NO test/assertion is weakened.
#
# UPGRADE-tier BASE grace (compose.ccci.yml): upgrade-to-latest must ALWAYS run
# (plan-ccci-compose-overlay-policy.md §1), so the harness base-deploys the previous PUBLISHED version
# (1.1.1+6-alpine) — which predates the PR and still ships the too-tight 1m start_period → it would
# deadlock on the same migration kill. compose.ccci.yml re-applies the 15m grace to the BASE so the
# from-version is deployable; install_steps.sh provides it to the checkout; CHAOS_BASE_DEPLOY skips the
# clean-tree gate on that untracked overlay. It persists across the head checkout (idempotent — the PR
# head already ships 15m). This is the policy-blessed "minimal overlay on the from-version so
# upgrade-to-latest can run" — grace-only, masks no defect, weakens no test.
#
# TIMEOUT/DEPLOY_TIMEOUT 2400s: the BASE cold boot's wall-time is mysql fresh-dir init (~6min, during
# which the app crash-loops harmlessly on `ECONNREFUSED 3306` until mysql accepts connections; no
# migration progress is lost because migration hasn't started) PLUS the ~9-15min schema migration
# (round-trip-bound, slower under host load). 1200s was too tight (full4 was killed at the near-final
# `email_recipients` tables while still 0/1); 2400s gives headroom while still bounding a genuine hang
# (matches discourse).
CHAOS_BASE_DEPLOY = True
EXTRA_ENV = {
    "TIMEOUT": "2400",
    "COMPOSE_FILE": "compose.yml:compose.ccci.yml",
}


def BACKUP_VERIFY(domain):
    """Post-backup integrity check (F2-14b). The recipe's backupbot db pre-hook dumps the ghost MySQL
    DB to `/var/lib/mysql/backup.sql.gz` (restic then captures that path). On the loaded single CI node
    the db container intermittently CYCLES mid-dump (observed: full5/6/7 RED, full8 green; pure race,
    NOT OOM, NOT healthcheck: db hc retries=10), so the dump is truncated or never written and restic
    snapshots an empty mysql path. A later restore then reimports nothing and the seeded ci_marker is
    lost (P4 RED). This check verifies that the dump completed: backup.sql.gz exists, is a VALID gzip,
    and has a non-zero file size. Returning False makes the harness re-run the whole backup with a
    re-stabilised db (run_recipe_ci _perform_op). It is a READ-ONLY probe: it weakens no assertion; it
    only retries a flaky CAPTURE."""
    # NB: recipe_meta.py is exec()'d into a bare namespace (no __file__), so we CANNOT compute a path
    # here. The harness already runs with runner/ on sys.path and `harness` imported, so import directly.
    from harness import lifecycle

    try:
        out = lifecycle.exec_in_app(
            domain,
            [
                "sh",
                "-c",
                "gzip -t /var/lib/mysql/backup.sql.gz && wc -c < /var/lib/mysql/backup.sql.gz",
            ],
            service="db",
            timeout=60,
        ).strip()
    except Exception:  # noqa: BLE001 — exec fails if the db is mid-cycle: treat as not-yet-captured
        return False
    return out.isdigit() and int(out) > 0