feat(2): Q4.4 ghost + DEPLOY_TIMEOUT plumb-through for heavy recipes

Harness change (small, surgical):
- runner/harness/lifecycle.deploy_app gains a deploy_timeout param (default 900s); passes
  through to abra.deploy(timeout=...). For heavy recipes (ghost, matrix-synapse, lasuite-meet),
  the orchestrator + dep resolver now read recipe_meta.DEPLOY_TIMEOUT and pass it so the Python
  subprocess wrapping abra deploy doesn't SIGKILL it before the recipe's INTERNAL TIMEOUT
  (via EXTRA_ENV) finishes swarm convergence.
- runner/run_recipe_ci.py + runner/harness/deps.py: thread recipe_meta.DEPLOY_TIMEOUT into
  the per-recipe deploy_app call.

Q4.4 ghost enrollment:
- recipe_meta.py: HEALTH_PATH=/, DEPLOY_TIMEOUT=1200 (subprocess), EXTRA_ENV={TIMEOUT: 1200}
  (recipe internal). Ghost cold-start with theme + DB migration runs ~12-15min on cc-ci.
- functional/test_health_check.py: GET / returns 200 (themed site).
- functional/test_content_api.py: GET /ghost/api/content/settings/ returns 200 (settings JSON)
  or 401/403 (Ghost error envelope) — distinguishes ghost-server up + JSON API working from
  static fallback.
- functional/test_admin_redirect.py: GET /ghost/ returns 200 or 302 + Ghost branding;
  proves admin route is wired through nginx proxy.
- PARITY.md: recipe-maintainer corpus has no ghost tests/, Phase-2 health_check is the
  parity baseline; create-a-post deeper test deferred (DEFERRED.md, --extra-tests linked).

Cold-verifiable (log /root/ccci-q44-ghost-r3.log):
  RECIPE=ghost STAGES=install,custom cc-ci-run runner/run_recipe_ci.py
  install + 3 functional tests PASS, deploy-count=1. 28/28 unit tests still PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-28 17:23:40 +01:00
parent 44e88f3750
commit 1bd7c7a1d3
8 changed files with 197 additions and 5 deletions

View File

@ -88,9 +88,13 @@ def deploy_deps(
# NB: each dep_app gets a fresh deploy_count entry only on `_record_deploy` which fires
# inside `lifecycle.deploy_app`. For Phase 2 the deploy-count guard (DG4.1) counts the
# parent + its deps as distinct install events — by design, since each is a separate app.
lifecycle.deploy_app(dep, domain, secrets=True)
# Use dep's own recipe_meta if provided
dm = meta_for.get(dep, {})
lifecycle.deploy_app(
dep,
domain,
secrets=True,
deploy_timeout=int(dm.get("DEPLOY_TIMEOUT", 900)),
)
try:
lifecycle.wait_healthy(
domain,

View File

@ -128,10 +128,16 @@ def deploy_app(
version: str | None = None,
secrets: bool = True,
install_steps_hook: tuple[str, str] | None = None,
deploy_timeout: int = 900,
) -> None:
"""Create + configure + deploy an app. Forces LETS_ENCRYPT_ENV='' so traefik serves the
wildcard cert via the file provider and NEVER attempts ACME (adversary finding A1). Applies any
per-recipe EXTRA_ENV (recipe_meta.py) and the custom install-steps hook (Phase 1d) before deploy."""
per-recipe EXTRA_ENV (recipe_meta.py) and the custom install-steps hook (Phase 1d) before deploy.
`deploy_timeout` is the subprocess timeout for `abra app deploy`. Caller (orchestrator) passes
`recipe_meta.DEPLOY_TIMEOUT` so heavy recipes (ghost, matrix-synapse, lasuite-meet) can extend
past the 900s default. abra's INTERNAL TIMEOUT (recipe's TIMEOUT env, default 300s) is set via
EXTRA_ENV; this is the Python subprocess wrapper's timeout so abra doesn't get SIGKILLed mid-deploy."""
_record_deploy()
abra.app_config_remove(domain) # clear any stale .env from a prior crashed run
abra.app_new(recipe, domain, version=version, secrets=secrets)
@ -153,7 +159,7 @@ def deploy_app(
abra.secret_generate(domain)
if install_steps_hook:
_run_install_steps(install_steps_hook, recipe, domain)
abra.deploy(domain, chaos=(version is None))
abra.deploy(domain, chaos=(version is None), timeout=deploy_timeout)
def _stack_name(domain: str) -> str:

View File

@ -379,7 +379,12 @@ def main() -> int:
else:
try:
lifecycle.deploy_app(
recipe, domain, version=base, secrets=True, install_steps_hook=hook
recipe,
domain,
version=base,
secrets=True,
install_steps_hook=hook,
deploy_timeout=int(meta.get("DEPLOY_TIMEOUT", 900)),
)
lifecycle.wait_healthy(
domain,