# The cc-ci test architecture: generic suite + additive recipe overlays (Phase 1d + 1e)

Every recipe gets a **generic lifecycle test suite for free**: the floor under every run, always on by default. Recipe-specific tests *layer additively* on top: when a recipe ships an overlay for an op, the **generic still runs alongside it** (the floor is never silently lost). So `!testme` is meaningful on **any** recipe immediately (zero config), and adding recipe-specific coverage is a thin overlay that adds rather than subtracts.

## Architectural invariant: generic-first, custom-additive (read this first)

This is the load-bearing principle of the whole test architecture. If you're maintaining cc-ci a year from now, this is the one rule that should still hold.

- **Generic tests are simple and easily runnable.** They are recipe-agnostic, depend only on the recipe being deployable (install / upgrade / backup / restore against the recipe alone), and ship as the floor for every recipe. No SSO provider, no external deps, no per-recipe state scaffolding; just "does this recipe deploy, and does its lifecycle work?"
- **Generic must not depend on custom.** A custom test or a custom-tests setup (e.g. SSO/OIDC dep provisioning) **can never be a precondition for the generic tier to pass.** Concretely: the orchestrator runs all generic tiers (install → upgrade → backup → restore) against the recipe **alone, with no deps deployed**, and only afterwards runs the `setup_custom_tests` step (deps + post-deps wiring). A failure there is **isolated** to the custom tier (tests tagged `@pytest.mark.requires_deps` skip with reason `"deps-not-ready"`; the generic tier reports normally). See `cc-ci-plan/plan-sso-dep-testing.md` for the SSO-dep specifics.
- **Custom tests are the thoroughness layer, and they cost more to maintain.** They're more thorough (authenticated APIs, multi-app flows, version-specific browser selectors, helper scripts, state-management) and *therefore* take more maintenance: an SSO provider's admin API changes, a recipe's app-launch URL contract shifts between versions, a Socket.IO primitive needs to track upstream. These are real ongoing costs that the generic tier deliberately doesn't carry.
- **A future maintainer can choose to focus on the generic tier alone** and still get meaningful signal: every enrolled recipe gets *some* CI coverage from the generic floor, and the custom-additive layer can be scaled down or paused without breaking that floor. The choice of *how much* per-recipe depth to maintain is open to whoever owns cc-ci later; generic-only is a valid permanent operating mode.

If anything in this codebase ever asks you to make generic depend on custom (or to put a custom precondition before a generic tier), that's the signal it's drifted off the invariant: push back and restore the separation.

## The model: tiers against one shared deployment

A run is a sequence of **tiers**. The orchestrator (`runner/run_recipe_ci.py`) deploys the app **once** and runs each tier against that single live deployment, then tears it down **once** in a `finally`. The orchestrator **owns** each mutating op (upgrade/backup/restore) and runs it **exactly once**; the assertion files (generic and overlay) evaluate the *post-op* state and never perform the op themselves. Asserted every run: **`deploy-count = 1`** (one `abra app new`).
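
The deploy-once, op-once, teardown-in-`finally` shape described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in `runner/run_recipe_ci.py`: `run_tiers`, its parameters, and the callables passed to it are hypothetical names, and plain Python callables stand in for the real assertion files.

```python
from typing import Callable, Dict, List, Optional

# Lifecycle tiers in the order the orchestrator walks them.
TIERS = ["install", "upgrade", "backup", "restore"]


def run_tiers(
    deploy: Callable[[], None],
    teardown: Callable[[], None],
    ops: Dict[str, Optional[Callable[[], None]]],
    assertions: Dict[str, List[Callable[[], bool]]],
) -> Dict[str, str]:
    """Deploy once, run each tier's op once, evaluate its assertions, always tear down.

    All names here are hypothetical; they only illustrate the control flow.
    """
    results: Dict[str, str] = {}
    deploy()  # the single deployment of the run
    try:
        for tier in TIERS:
            op = ops.get(tier)
            if op is not None:
                op()  # orchestrator-owned mutating op, performed exactly once
            checks = assertions.get(tier, [])
            if not checks:
                results[tier] = "skip"
                continue
            # Evaluate every check (no short-circuit) so generic and overlay both run.
            outcomes = [check() for check in checks]
            results[tier] = "pass" if all(outcomes) else "fail"
    finally:
        teardown()  # runs even if an op or an assertion raises
    return results


if __name__ == "__main__":
    calls: List[str] = []
    report = run_tiers(
        deploy=lambda: calls.append("deploy"),
        teardown=lambda: calls.append("teardown"),
        ops={"upgrade": lambda: calls.append("op:upgrade")},
        assertions={"install": [lambda: True], "upgrade": [lambda: True]},
    )
    print(calls)
    print(report)
```

The point of the sketch is the ownership split: the assertion callables receive no way to deploy or tear down, so the deploy count cannot drift above one from inside a tier.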
```
deploy ONCE   (base version: the previous published version when an upgrade tier will run and one
               exists, so upgrade is a real previous→PR-head; else the target / current PR head)
  → INSTALL   [optional pre_install seed]    then generic + overlay assertions (no op)
  → UPGRADE   [optional pre_upgrade seed]    then abra app deploy --chaos to PR-head (op once)
                                             then generic + overlay assertions
  → BACKUP    [optional pre_backup seed]     then abra app backup create (op once)
                                             then generic + overlay assertions   (backup-capable only)
  → RESTORE   [optional pre_restore mutate]  then abra app restore (op once)
                                             then generic + overlay assertions   (backup-capable only)
  → CUSTOM    any non-lifecycle test_*.py (only if defined)
teardown ONCE (in finally)
```

Each assertion file is its own `pytest` invocation, so the run reports **per-operation** pass / fail / skip (`install / upgrade / backup / restore / custom`). The shared live domain is passed in `CCCI_APP_DOMAIN` and exposed by the `live_app` fixture; **all assertion tiers are assertion-only and never deploy or tear down** (that is the orchestrator's job). Op results an assertion needs (pre-upgrade identity, the produced backup `snapshot_id`) pass op→assertion via a run-scoped JSON state file at `$CCCI_OP_STATE_FILE`, read by `generic.op_state()`.

## The generic default (recipe-agnostic, the floor; Phase 1e HC3)

Lives in the shared harness (`runner/harness/generic.py` + `tests/_generic/test_.py`), so there is no per-recipe copy-paste:

- **install** (`generic.assert_serving`): services converged (the app's *own* replicas are N/N) **and** a real HTTP(S) response in `HEALTH_OK` (which excludes 404, so a Traefik unmatched-router fallback fails) **and** the body isn't Traefik's default 404 page. A bounded poll (no bare `sleep`) so a state-mutating op settles, while a persistent failure still fails within the timeout.
  A CA-verified TLS handshake also runs as an **infra cert sanity check** (catches a lapsed/mis-rotated wildcard); it does **not** distinguish app-vs-fallback (Traefik serves the wildcard zone-wide). That distinction is the job of the converged + non-404 check.
- **upgrade** (`generic.assert_upgraded`): assert serving after the orchestrator's chaos upgrade (HC1: `abra app deploy --chaos` of the PR-head checkout) and that the deployment is genuinely the code under test: when the intended PR-head commit is known, the deployed `coop-cloud..chaos-version` label **must match** it, which is direct, non-vacuous proof. (A stale prev-checkout chaos redeploy would stamp prev's commit, not the PR-head, and fail here.) When `head_ref` is unknown, it falls back to a move check (version/image/chaos changed vs pre-upgrade).
- **backup** (`generic.assert_backup_artifact`): assert a snapshot artifact was produced (the `snapshot_id` captured by the orchestrator from `abra app backup create`). Honest limit: the generic verifies the *mechanism*, not app-specific data integrity (that's an overlay, below).
- **restore** (`generic.assert_restore_healthy`): assert the app is healthy + serving after the orchestrator's restore op (`assert_serving` polls so the post-restore reconverge settles).

**Backup-capability** is auto-detected: a recipe is backup-capable iff a `compose*.yml` carries a truthy `backupbot.backup` label (override with `BACKUP_CAPABLE` in `recipe_meta.py`). For non-backup-capable recipes the backup/restore tiers are a clean **N/A skip**, not a failure.

## Recipe overlays: additive (the generic floor is always on by default)

Convention: a recipe-specific tier is a file named exactly `test_install.py` / `test_upgrade.py` / `test_backup.py` / `test_restore.py`. **When present it runs ALONGSIDE the generic for that op** (both evaluate the shared post-op state); when absent, only the generic runs. Overlays are **assertion-only**: they never perform the op (the orchestrator owns it).
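
As a rough illustration of this convention, the sketch below picks the assertion files for one op: the generic floor, plus at most one overlay. It is not the logic in `runner/harness/discovery.py`. The function name `assertion_files`, its parameters, and the directory layout it assumes (a per-recipe directory under `tests/` in cc-ci, a `tests/` directory in the recipe repo) are assumptions for the example, and the generic opt-out is deliberately left out.

```python
from pathlib import Path
from typing import List, Optional

LIFECYCLE_OPS = ("install", "upgrade", "backup", "restore")


def assertion_files(
    op: str,
    recipe: str,
    cc_ci_root: Path,
    recipe_repo: Optional[Path] = None,
    repo_local_approved: bool = False,
) -> List[Path]:
    """Return the generic file for `op` plus at most one overlay file.

    Hypothetical helper: the path layout below is assumed for illustration.
    """
    if op not in LIFECYCLE_OPS:
        raise ValueError(f"not a lifecycle op: {op!r}")
    name = f"test_{op}.py"

    # The generic floor always comes first (opt-out handling omitted here).
    files = [cc_ci_root / "tests" / "_generic" / name]

    # Overlay candidates in precedence order: repo-local (only when the
    # recipe is approved), then the cc-ci curated overlay.
    candidates: List[Path] = []
    if recipe_repo is not None and repo_local_approved:
        candidates.append(recipe_repo / "tests" / name)
    candidates.append(cc_ci_root / "tests" / recipe / name)

    for candidate in candidates:
        if candidate.is_file():
            files.append(candidate)  # only ONE overlay source wins
            break
    return files
```

With no overlay on disk the function returns just the generic file, which matches the zero-config case: the floor runs on its own.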
Overlay sources, in precedence order:

```
repo-local  /tests/test_.py           (upstream-authoritative; gated by HC2 allowlist)
  > cc-ci   tests//test_.py           (CI-curated overlay)
+ generic   tests/_generic/test_.py   (the floor; runs alongside by default)
```

Only ONE overlay source wins for a given op (repo-local > cc-ci); the generic floor runs **in addition** unless explicitly opted out.

**Custom (non-lifecycle) `test_*.py`**: any other `test_*.py` (e.g. `test_sso.py`) is **opt-in and additive**: it has no generic equivalent and runs only when present, discovered from both locations (repo-local gated by the HC2 allowlist).

### Pre-op seed hooks (per-recipe `ops.py`)

A data-continuity overlay needs to seed state **before** the op (write a marker, create a DB row, etc.). Since the orchestrator owns the op, overlays place their seed in an optional per-recipe `tests//ops.py`:

```python
# tests//ops.py
from harness import lifecycle


def pre_upgrade(domain, meta):
    # seed a marker before the harness performs the upgrade
    lifecycle.exec_in_app(domain, ["sh", "-c", "echo upgrade-survives > /path/marker"])


def pre_backup(domain, meta):
    # establish a known "original" state before the backup op captures it
    lifecycle.exec_in_app(domain, ["sh", "-c", "echo original > /path/marker"])


def pre_restore(domain, meta):
    # diverge from the backed-up state so a successful restore is observable
    lifecycle.exec_in_app(domain, ["sh", "-c", "echo mutated > /path/marker"])
```

The orchestrator imports `ops.py` in-process (with the recipe dir on `sys.path`, so it can import sibling helpers like `kc_admin.py`) and calls `pre_(domain, meta)` immediately before performing the op. Then `test_.py` asserts the post-op state. See `tests/custom-html/` (volume marker), `tests/keycloak/` (admin-API/realm), `tests/matrix-synapse/`, `tests/lasuite-docs/` (psql in the `db` service) for worked examples.

### Opting out of the generic floor

The generic runs additively by default. To skip it (e.g.
when an overlay's recipe-specific check fully replaces the generic's mechanism check) set, in increasing specificity:

- **env `CCCI_SKIP_GENERIC=1`**: skip generic for ALL ops (run-wide).
- **env `CCCI_SKIP_GENERIC_=1`** (e.g. `CCCI_SKIP_GENERIC_UPGRADE=1`): skip generic for that one op.
- **declarative in `recipe_meta.py`**: `SKIP_GENERIC = ["upgrade"]` (per-op) or `SKIP_GENERIC = ["all"]`. Opting out is per-recipe and visible in git, not a hidden global.

Truthy = `1`/`true`/`yes`/`on`.

## Repo-local trust gate (HC2): default-deny

PR-author-controlled code (a recipe repo's own `tests/test_*.py`, `install_steps.sh`, `ops.py`) runs on the CI host with `/run/secrets/*` present, which is an untrusted-code risk. By default the harness runs **only cc-ci-authored** overlays/hooks (`tests//...`) + the generic. Repo-local code is **discovered-but-not-executed** unless its recipe appears in **`tests/repo-local-approved.txt`** (a checked-in, git-auditable allowlist: one recipe name per line; `#` comments + blank lines ignored; a lone `*` is NOT a wildcard). To approve a recipe, a cc-ci maintainer reviews its repo-local tests and adds the recipe name in a cc-ci PR (override the allowlist location with `CCCI_REPO_LOCAL_APPROVED_FILE`, used by tests + cold demonstrations).

The gate is centralized in `runner/harness/discovery.py` (`repo_local_approved` / `_gated`) so every discovery function (`resolve_overlay_op`, `custom_tests`, `install_steps`, `pre_op_hook`) honors it identically; unit tests (`tests/unit/test_discovery.py`) pin the behavior (approved-vs-not for every kind of code).

## Custom install-steps hook (and the graceful-generic rule)

Some recipes need setup the generic flow won't do (pre-seed content, set an env/secret, run a one-off command). Provide a shell hook: `tests//install_steps.sh` (cc-ci) or repo-local `tests/install_steps.sh` (repo-local wins, gated by the HC2 allowlist).
The orchestrator runs it during the install tier **after `abra app new` + env defaults, before `abra app deploy`**, with env:

- `CCCI_APP_DOMAIN`: the run's app domain
- `CCCI_RECIPE`: the recipe name
- `CCCI_APP_ENV`: path to the app's `.env` (for `abra`-side edits)

**Graceful-generic rule:** a recipe with **no** hook still attempts the generic install. A recipe that genuinely needs a step will **fail the generic install, and that's the correct, reported outcome** (per-op `install: fail`); the fix is to add the step, not to special-case the harness. Worked example: `tests/custom-html-tiny/install_steps.sh` seeds an `index.html` into the static server's content volume. Without it the generic install fails with a 404; with it, the install passes.

## The HC1 upgrade path: chaos to the PR-head code under test

Concretely, the upgrade tier:

1. base deployment is the **previous published version** (a clean pinned-tag deploy).
2. orchestrator captures `head_ref` (preferring `$REF`, the PR head sha; falls back to the recipe checkout HEAD for non-PR `!testme`).
3. on the upgrade tier: re-checkout the recipe to `head_ref` (the prev-tag base deploy reset the working tree), capture the pre-upgrade identity, then **`abra app deploy --chaos`** redeploys the running app at that checkout, in place, NOT as a new install.
4. `assert_upgraded` (generic) asserts serving + that the deployed `coop-cloud..chaos-version` matches `head_ref`, proving the PR-head code was deployed.

Reconciliation with the deploy-once guard: `abra.deploy` (chaos) is called directly, not through `deploy_app`, so `_record_deploy()` does not fire; `deploy-count` counts only `abra app new` installs and stays 1.

## How to add a recipe overlay (zero → some coverage)

1. The recipe is already testable with **zero config**: enrol it (poll list + mirror) and the generic floor runs (`docs/enroll-recipe.md`).
2. To add recipe-specific coverage, drop `tests//test_.py` (copy an existing one, e.g.
   `tests/custom-html/test_upgrade.py`). Assert the POST-op state, reading app state through `lifecycle.exec_in_app` (volume/DB) for data checks, not HTTP. Generic + your overlay both run.
3. If the overlay needs to seed PRE-op state (data-continuity markers, the backup→restore divergence), drop `tests//ops.py` with `pre_upgrade/pre_backup/pre_restore(domain, meta)`.
4. If the recipe needs install-time setup, add `tests//install_steps.sh`.
5. Set per-recipe knobs (health path, timeouts, opt-out) in `recipe_meta.py`.
6. **Never weaken or skip an assertion to make a run pass**: a red tier is information.

Per-recipe config (`tests//recipe_meta.py`, all optional):

```python
HEALTH_PATH = "/realms/master"   # path that returns a healthy status (default "/")
HEALTH_OK = (200,)               # acceptable status codes (default 200/301/302)
DEPLOY_TIMEOUT = 600             # seconds for services to converge (default 600)
HTTP_TIMEOUT = 600               # seconds for the app to answer (default 300)
BACKUP_CAPABLE = True            # override backup-capability auto-detection (default: scan compose)
EXTRA_ENV = {"KEY": "value"}     # or EXTRA_ENV(domain) -> dict; extra .env keys set at deploy
SKIP_GENERIC = ["upgrade"]       # per-recipe declarative opt-out from generic ops ("all" = every op)
```

The harness self-tests for discovery / precedence / the HC2 allowlist live in `tests/unit/` (run: `cc-ci-run -m pytest tests/unit`); they are never picked up as overlays/custom tests.
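
To make the `SKIP_GENERIC` knob and the two environment switches from "Opting out of the generic floor" concrete, here is a minimal sketch of how the three sources might combine into one decision per op. The function name `skip_generic`, its signature, and the case-insensitive matching of truthy values are assumptions for illustration; the harness may resolve this differently.

```python
import os
from typing import Iterable, Mapping, Optional

# Truthy spellings listed in the opt-out section; treating them
# case-insensitively is an assumption of this sketch.
TRUTHY = {"1", "true", "yes", "on"}


def _truthy(value: Optional[str]) -> bool:
    """True when an env value is one of the recognised truthy spellings."""
    return value is not None and value.strip().lower() in TRUTHY


def skip_generic(
    op: str,
    skip_generic_meta: Iterable[str] = (),
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Decide whether the generic tier for `op` is opted out (hypothetical helper)."""
    env = os.environ if env is None else env

    # Run-wide switch: skip generic for every op.
    if _truthy(env.get("CCCI_SKIP_GENERIC")):
        return True

    # Per-op switch, e.g. CCCI_SKIP_GENERIC_UPGRADE for the upgrade op.
    if _truthy(env.get(f"CCCI_SKIP_GENERIC_{op.upper()}")):
        return True

    # Declarative opt-out from recipe_meta.py: a list of ops, or "all".
    declared = set(skip_generic_meta)
    return "all" in declared or op in declared
```

For example, `skip_generic("upgrade", ["upgrade"], env={})` reports an opt-out for the upgrade tier only, while the other tiers keep their generic floor.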