Files
cc-ci/runner/harness/deps.py
autonomic-bot 1bd7c7a1d3 feat(2): Q4.4 ghost + DEPLOY_TIMEOUT plumb-through for heavy recipes
Harness change (small, surgical):
- runner/harness/lifecycle.deploy_app gains a deploy_timeout param (default 900s); passes
  through to abra.deploy(timeout=...). For heavy recipes (ghost, matrix-synapse, lasuite-meet),
  the orchestrator + dep resolver now read recipe_meta.DEPLOY_TIMEOUT and pass it so the Python
  subprocess wrapping abra deploy doesn't SIGKILL it before the recipe's INTERNAL TIMEOUT
  (via EXTRA_ENV) finishes swarm convergence.
- runner/run_recipe_ci.py + runner/harness/deps.py: thread recipe_meta.DEPLOY_TIMEOUT into
  the per-recipe deploy_app call.

Q4.4 ghost enrollment:
- recipe_meta.py: HEALTH_PATH=/, DEPLOY_TIMEOUT=1200 (subprocess), EXTRA_ENV={TIMEOUT: 1200}
  (recipe internal). Ghost cold-start with theme + DB migration runs ~12-15min on cc-ci.
- functional/test_health_check.py: GET / returns 200 (themed site).
- functional/test_content_api.py: GET /ghost/api/content/settings/ returns 200 (settings JSON)
  or 401/403 (Ghost error envelope) — distinguishes ghost-server up + JSON API working from
  static fallback.
- functional/test_admin_redirect.py: GET /ghost/ returns 200 or 302 + Ghost branding;
  proves admin route is wired through nginx proxy.
- PARITY.md: recipe-maintainer corpus has no ghost tests/, Phase-2 health_check is the
  parity baseline; create-a-post deeper test deferred (DEFERRED.md, --extra-tests linked).

Cold-verifiable (log /root/ccci-q44-ghost-r3.log):
  RECIPE=ghost STAGES=install,custom cc-ci-run runner/run_recipe_ci.py
  install + 3 functional tests PASS, deploy-count=1. 28/28 unit tests still PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 17:23:40 +01:00

156 lines
7.0 KiB
Python

"""Dependency-resolver harness primitive (Phase 2 §4.2 / Q2.3).
A Phase-2 recipe may declare a set of OTHER recipes it requires to run its tests (e.g.
lasuite-docs requires keycloak as its SSO provider). The orchestrator reads the deps list,
deploys each one BEFORE the recipe-under-test, persists their per-run identity to a JSON file
the recipe's tests can read, and tears them down at the end of the run.
Per Phase-2 DECISIONS:
- Deps are declared on the cc-ci side in `tests/<recipe>/recipe_meta.py` as
`DEPS = ["keycloak", ...]` (a list of recipe names). This keeps the cc-ci surface authoritative
per plan §1.4 (cc-ci is self-contained at runtime).
- Each dep is deployed at a unique per-run domain `<dep[:4]>-<6hex>` (the same naming scheme as
the recipe under test, but the 6hex is derived from `recipe + pr + ref + dep_name` so two deps
of the same kind by different recipes never collide on a host).
- Dep deploys are SEQUENTIAL, never concurrent (per plan §4.2 — heavy deps + recipe under test
must share the single node's MAX_TESTS budget without exceeding it).
- Each dep is undeployed in the orchestrator's `finally`, in **reverse** order so a recipe-under-
test can depend on multiple deps with a dependency chain (a → b → c teardown is c → b → a).
Run state:
- `$CCCI_DEPS_FILE` — JSON file written by the orchestrator after each dep deploys; each entry is
`{"recipe": "<dep-recipe>", "domain": "<dep-domain>", "version": null}`. Tests access via the
`deps_apps` pytest fixture defined in `tests/conftest.py`.
"""
from __future__ import annotations
import contextlib
import json
import os
from typing import Iterable
from . import lifecycle, naming
def declared_deps(recipe: str) -> list[str]:
"""Read `DEPS` from `tests/<recipe>/recipe_meta.py` — a list of recipe names this recipe needs
deployed alongside it. Returns [] if none."""
path = os.path.join(
os.path.dirname(__file__), "..", "..", "tests", recipe, "recipe_meta.py"
)
if not os.path.exists(path):
return []
ns: dict = {}
with open(path) as fh:
exec(compile(fh.read(), path, "exec"), ns) # noqa: S102 (trusted, in-repo)
deps = ns.get("DEPS") or []
return [str(d) for d in deps if d]
def dep_domain(parent_recipe: str, pr: str, ref: str | None, dep_recipe: str) -> str:
"""Per-run domain for a dep app. Distinct from the parent's domain so two recipes' deps don't
collide. The 6hex is derived from (parent_recipe, pr, ref, dep_recipe) — stable per run, but
different for every (parent, dep) pair so deps belonging to different parents don't collide
on the same node."""
# naming.app_domain hashes (recipe, pr, ref). Bake parent_recipe + dep_recipe into the ref so
# the hash distinguishes (parent_A,dep_X) from (parent_B,dep_X). The recipe arg drives the
# <recipe[:4]> prefix — passing dep_recipe keeps the visible prefix correct (`keyc-...`).
synthetic_ref = f"{parent_recipe}|{ref or ''}|dep|{dep_recipe}"
return naming.app_domain(dep_recipe, pr, synthetic_ref)
def write_run_state(deps_state: list[dict]) -> None:
"""Write the deps state file ($CCCI_DEPS_FILE) so dependent tests can find their dep apps via
the `deps_apps` fixture. No-op if the env var isn't set."""
path = os.environ.get("CCCI_DEPS_FILE")
if not path:
return
with open(path, "w") as f:
json.dump(deps_state, f)
def deploy_deps(
parent_recipe: str,
pr: str,
ref: str | None,
deps: Iterable[str],
meta_for: dict[str, dict] | None = None,
) -> list[dict]:
"""Deploy each declared dep, sequentially, at its per-run domain. Returns the list of state
dicts (one per dep). `meta_for` maps dep_recipe -> meta (HEALTH_PATH/HEALTH_OK/timeouts) so the
readiness wait uses per-dep config; missing dep meta falls back to (/, 200/301/302, 600s)."""
meta_for = meta_for or {}
state: list[dict] = []
for dep in deps:
domain = dep_domain(parent_recipe, pr, ref, dep)
print(f" dep: deploying {dep} -> {domain}", flush=True)
# NB: each dep_app gets a fresh deploy_count entry only on `_record_deploy` which fires
# inside `lifecycle.deploy_app`. For Phase 2 the deploy-count guard (DG4.1) counts the
# parent + its deps as distinct install events — by design, since each is a separate app.
dm = meta_for.get(dep, {})
lifecycle.deploy_app(
dep,
domain,
secrets=True,
deploy_timeout=int(dm.get("DEPLOY_TIMEOUT", 900)),
)
try:
lifecycle.wait_healthy(
domain,
ok_codes=tuple(dm.get("HEALTH_OK", (200, 301, 302))),
path=dm.get("HEALTH_PATH", "/"),
deploy_timeout=int(dm.get("DEPLOY_TIMEOUT", 600)),
http_timeout=int(dm.get("HTTP_TIMEOUT", 600)),
)
except Exception:
# If a dep fails to converge, abort the whole resolve — let the caller teardown
print(f" dep: {dep} ({domain}) failed readiness; tearing down", flush=True)
with contextlib.suppress(Exception):
lifecycle.teardown_app(domain, verify=False)
raise
state.append({"recipe": dep, "domain": domain})
print(f" dep: {dep} ready @ {domain}", flush=True)
write_run_state(state)
return state
def teardown_deps(state: list[dict]) -> None:
"""Undeploy each dep in reverse order. **VERIFY=True (F2-5 fix)**: per plan §9 teardown is
sacred — a dep that leaks containers/volumes/secrets corrupts the next run that uses the same
deterministic dep domain.
Failures are LOGGED LOUDLY (not silently suppressed) so a leak is visible in the run output;
we continue to teardown other deps so one failure doesn't strand the rest; after all attempts
we **raise** if any dep failed to fully teardown — the orchestrator's outer `finally` then
decides whether the leak is a run-failure (it should be, mirroring lifecycle.teardown_app's
own raise-on-residual behaviour at `verify=True`).
"""
errors: list[str] = []
for entry in reversed(state):
domain = entry.get("domain")
if not domain:
continue
recipe = entry.get("recipe", "?")
print(f" dep: tearing down {recipe} @ {domain}", flush=True)
try:
lifecycle.teardown_app(domain, verify=True)
except Exception as e: # noqa: BLE001 — every failure must be visible, but we want to try the rest first
msg = f"dep {recipe} @ {domain} teardown failed: {e}"
print(f" !! {msg}", flush=True)
errors.append(msg)
if errors:
raise lifecycle.TeardownError("dep teardown failures: " + " ; ".join(errors))
def load_run_state() -> list[dict]:
"""Read the current run's deps state (used by the `deps_apps` fixture). Returns [] if unset."""
path = os.environ.get("CCCI_DEPS_FILE")
if not path or not os.path.exists(path):
return []
try:
with open(path) as f:
return json.load(f) or []
except (OSError, ValueError):
return []