cc-ci/docs/testing.md
autonomic-bot da558ca946
docs: P6 — rewrite customization docs to the restructured end state (rcust)
recipe-customization.md: review spec -> reference. Single registry-backed loader + validation
rules + HookCtx convention (§4); generated key table kept byte-identical (sync test); §5 end-state
shape (op_state/deps fixtures, ctx ops.py, placement rule, first-class compose.ccci.yml, no
setup_custom_tests.sh); §7 manifest block + dev-only CCCI_SKIP_GENERIC*; §8 rewritten as
restructure outcomes (R1/R2/R3/R5/R6/R7/R8 resolved + how, R4 mitigated by manifest, R9
rejected-by-decision); §9 index updated to the new symbols.

testing.md: install-time deps isolation replaces the setup_custom_tests step in the invariant
(generic still never depends on custom — failure isolation via requires_deps/F2-11); ops.py
example to pre_<op>(ctx); placement rule; generic opt-out now documented LOCAL-DEV-ONLY env with
CI !! warning (declarative SKIP_GENERIC gone); partial key list points at the generated table.

enroll-recipe.md: tree + worked examples updated (lasuite-docs install-time OIDC wiring +
install_steps.sh; mumble post-F2-14c shape — UPGRADE_EXTRA_ENV native overlay, private _
constants, no CHAOS_BASE_DEPLOY); deps fixture (entry.domain) replaces deps_apps; ctx hook
signatures; compose.ccci.yml first-class bullet; key list points at the generated table.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 19:07:41 +00:00


# The cc-ci test architecture — generic suite + additive recipe overlays (Phase 1d + 1e)
Every recipe gets a **generic lifecycle test suite for free** — the floor under every run, always
on by default. Recipe-specific tests *layer additively* on top: when a recipe ships an overlay for an
op, the **generic still runs alongside it** (the floor is never silently lost). So `!testme` is
meaningful on **any** recipe immediately (zero config), and adding recipe-specific coverage is a thin
overlay that adds to the floor rather than subtracting from it.
## Architectural invariant — generic-first, custom-additive (read this first)
This is the load-bearing principle of the whole test architecture. If you're maintaining cc-ci a
year from now, this is the one rule that should still hold.
- **Generic tests are simple and easily runnable.** They are recipe-agnostic, depend only on the
recipe being deployable (install / upgrade / backup / restore against the recipe alone), and
ship as the floor for every recipe. No SSO provider, no external deps, no per-recipe state
scaffolding — just "does this recipe deploy and lifecycle work?"
- **Generic must not depend on custom.** A custom test or a custom-tests setup (e.g. SSO/OIDC dep
provisioning) **can never be a precondition for the generic tier to pass.** Concretely: deps are
provisioned BEFORE the single deploy (so `install_steps.sh` can wire OIDC env into that one
deploy), but a dep-provisioning failure is **isolated** to the custom tier — the recipe still
deploys alone, every generic tier (install → upgrade → backup → restore) runs normally, and
tests tagged `@pytest.mark.requires_deps` skip with reason `"deps-not-ready"` (a counted,
reported skip — F2-11). A deps failure can never fail or block a generic tier. See
`cc-ci-plan/plan-sso-dep-testing.md` for the SSO-dep specifics.
- **Custom tests are the thoroughness layer — and they cost more to maintain.** They're more
thorough (authenticated APIs, multi-app flows, version-specific browser selectors, helper
scripts, state-management) and *therefore* take more maintenance: an SSO provider's admin API
changes, a recipe's app-launch URL contract shifts between versions, a Socket.IO primitive
needs to track upstream — these are real ongoing costs that the generic tier deliberately
doesn't carry.
- **A future maintainer can choose to focus on the generic tier alone** and still get meaningful
signal: every enrolled recipe gets *some* CI coverage from the generic floor, and the
custom-additive layer can be scaled down or paused without breaking that floor. The choice of
*how much* per-recipe depth to maintain is open to whoever owns cc-ci later — generic-only is
a valid permanent operating mode.
If anything in this codebase ever asks you to make generic depend on custom (or to put a custom
precondition before a generic tier), that's the signal it's drifted off the invariant — push back
and restore the separation.
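The isolation rule can be illustrated with a minimal sketch. This is not the harness's code: `PlannedTest` and `plan_outcomes` are hypothetical names, and the `requires_deps` field only stands in for the `@pytest.mark.requires_deps` marker. It shows the intended shape: a deps failure turns dep-tagged tests into reported skips and touches nothing else.

```python
from dataclasses import dataclass

SKIP_REASON = "deps-not-ready"  # the skip reason named in the invariant above


@dataclass(frozen=True)
class PlannedTest:
    name: str
    tier: str                    # "install", "upgrade", "backup", "restore" or "custom"
    requires_deps: bool = False  # stands in for @pytest.mark.requires_deps


def plan_outcomes(tests, deps_ready):
    """Map each test name to "run" or ("skip", reason).

    Only tests that declare requires_deps are affected by a deps failure;
    everything else, including every generic tier, runs regardless.
    """
    plan = {}
    for test in tests:
        if test.requires_deps and not deps_ready:
            plan[test.name] = ("skip", SKIP_REASON)
        else:
            plan[test.name] = "run"
    return plan
```

With `deps_ready=False`, a generic install test still maps to `"run"` while an SSO test tagged `requires_deps` maps to `("skip", "deps-not-ready")`.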
## The model: tiers against one shared deployment
A run is a sequence of **tiers**. The orchestrator (`runner/run_recipe_ci.py`) deploys the app
**once** and runs each tier against that single live deployment, then tears it down **once** in a
`finally`. The orchestrator **owns** each mutating op (upgrade/backup/restore) and runs it **exactly
once**; the assertion files (generic and overlay) evaluate the *post-op* state and never perform the
op themselves. Asserted every run: **`deploy-count = 1`** (one `abra app new`).
```
deploy ONCE   (base version: the previous published version when an upgrade tier will run and one
               exists, so upgrade is a real previous→PR-head; else the target / current PR head)
  → INSTALL   [optional pre_install seed]    then generic + overlay assertions  (no op)
  → UPGRADE   [optional pre_upgrade seed]    then abra app deploy --chaos to PR-head  (op once)
              then generic + overlay assertions
  → BACKUP    [optional pre_backup seed]     then abra app backup create  (op once)
              then generic + overlay assertions  (backup-capable only)
  → RESTORE   [optional pre_restore mutate]  then abra app restore  (op once)
              then generic + overlay assertions  (backup-capable only)
  → CUSTOM    any non-lifecycle test_*.py  (only if defined)
teardown ONCE (in finally)
```
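The deploy-once / teardown-once shape of the diagram can be sketched as follows. `run_tiers` and its parameters are hypothetical, not the signature of `runner/run_recipe_ci.py`. The sketch also assumes that a failing tier is recorded and later tiers still run, which the text here does not state.

```python
def run_tiers(deploy, teardown, tiers):
    """Deploy once, run every tier against that deployment, tear down once.

    `tiers` is a list of (name, callable) pairs; each callable returns
    "pass", "fail" or "skip". A tier that raises is recorded as "fail".
    """
    results = {}
    deploy()  # the single deploy of the run
    try:
        for name, tier in tiers:
            try:
                results[name] = tier()
            except Exception:
                results[name] = "fail"  # assumption: a red tier is reported, not fatal
    finally:
        teardown()  # runs exactly once, even if something unexpected escapes
    return results
```

Putting the teardown in `finally` is what keeps a crashed tier from leaking a live deployment on the CI host.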
Each assertion file is its own `pytest` invocation, so the run reports **per-operation** pass / fail
/ skip (`install / upgrade / backup / restore / custom`). The shared live domain is passed in
`CCCI_APP_DOMAIN` and exposed by the `live_app` fixture; **all assertion tiers are assertion-only and
never deploy or tear down** (that is the orchestrator's job). Op results an assertion needs
(pre-upgrade identity, the produced backup `snapshot_id`) pass op→assertion via a run-scoped JSON
state file at `$CCCI_OP_STATE_FILE`, read by `generic.op_state()`.
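As an illustration of the op→assertion handoff, here is a minimal sketch of a run-scoped JSON state file. `record_op_state` and `read_op_state` are hypothetical helpers, not the implementation of `generic.op_state()`; the flat-dict layout is an assumption.

```python
import json
import os


def read_op_state(path=None):
    """Assertion side: return the recorded state, or {} if nothing was recorded yet."""
    path = path or os.environ.get("CCCI_OP_STATE_FILE")
    if not path or not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def record_op_state(path, **values):
    """Orchestrator side: merge op results (e.g. snapshot_id) into the state file."""
    state = read_op_state(path)
    state.update(values)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
```

Under these assumptions the orchestrator would call `record_op_state(path, snapshot_id=...)` right after the backup op, and the backup assertion would read it back without ever running the op itself.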
## The generic default (recipe-agnostic, the floor — Phase 1e HC3)
Lives in the shared harness — `runner/harness/generic.py` + `tests/_generic/test_<op>.py` — so there
is no per-recipe copy-paste:
- **install** (`generic.assert_serving`) — services converged (the app's *own* replicas are N/N) **and**
a real HTTP(S) response in `HEALTH_OK` (which excludes 404, so a Traefik unmatched-router fallback
fails) **and** the body isn't Traefik's default 404 page. A bounded poll (no bare `sleep`) so a
state-mutating op settles, while a persistent failure still fails within the timeout. A CA-verified
TLS handshake also runs as an **infra cert sanity check** (catches a lapsed/mis-rotated wildcard);
it does **not** distinguish app-vs-fallback (Traefik serves the wildcard zone-wide) — that's the
converged + non-404 check.
- **upgrade** (`generic.assert_upgraded`) — assert serving after the orchestrator's chaos upgrade
(HC1: `abra app deploy --chaos` of the PR-head checkout) and that the deployment is genuinely the
code under test: when the intended PR-head commit is known, the deployed
`coop-cloud.<stack>.chaos-version` label **must match** it — direct, non-vacuous proof. (A stale
prev-checkout chaos redeploy would stamp prev's commit, not the PR-head, and fail here.) When
`head_ref` is unknown, the check falls back to a move check (version/image/chaos changed vs pre-upgrade).
- **backup** (`generic.assert_backup_artifact`) — assert a snapshot artifact was produced (the
`snapshot_id` captured by the orchestrator from `abra app backup create`). Honest limit: the
generic verifies the *mechanism*, not app-specific data integrity (that's an overlay, below).
- **restore** (`generic.assert_restore_healthy`) — assert the app is healthy + serving after the
orchestrator's restore op (`assert_serving` polls so the post-restore reconverge settles).
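The "bounded poll (no bare `sleep`)" idea used by `assert_serving` can be sketched like this. `poll_until` is a hypothetical helper, and the injectable `clock` and `sleep` parameters are only there to make the sketch testable.

```python
import time


def poll_until(check, timeout, interval=2.0, clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns True or `timeout` seconds elapse.

    Returns True on success and False once the deadline has passed, so a
    persistent failure fails within the timeout instead of hanging.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The check always runs at least once, and a transient failure right after a state-mutating op gets retried until the deadline rather than failing the tier immediately.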
**Backup-capability** is auto-detected: a recipe is backup-capable iff a `compose*.yml` carries a
truthy `backupbot.backup` label (override with `BACKUP_CAPABLE` in `recipe_meta.py`). For
non-backup-capable recipes the backup/restore tiers are a clean **N/A skip** — not a failure.
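A rough sketch of the auto-detection, under stated assumptions: it scans `compose*.yml` line by line with a regex instead of parsing YAML, it treats `1`/`true`/`yes`/`on` as truthy (the text only says "truthy"), and it does not resolve `${VAR:-default}` substitutions. `is_backup_capable` is a hypothetical name; the real detection in the harness may differ.

```python
import glob
import os
import re

TRUTHY = {"1", "true", "yes", "on"}  # assumption: same set as the env switches
# Matches both label forms: `backupbot.backup: "true"` and `- "backupbot.backup=true"`.
LABEL_RE = re.compile(r"""backupbot\.backup\s*[:=]\s*["']?([^"'\s#]+)""")


def is_backup_capable(recipe_dir, override=None):
    """Return the BACKUP_CAPABLE override if given, else scan compose*.yml."""
    if override is not None:
        return bool(override)
    for path in sorted(glob.glob(os.path.join(recipe_dir, "compose*.yml"))):
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                match = LABEL_RE.search(line)
                if match and match.group(1).lower() in TRUTHY:
                    return True
    return False
```

Sub-labels such as `backupbot.backup.pre-hook` do not match, because the pattern requires `:` or `=` directly after `backupbot.backup`.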
## Recipe overlays — additive (the generic floor is always on by default)
Convention: a recipe-specific tier is a file named exactly `test_install.py` / `test_upgrade.py` /
`test_backup.py` / `test_restore.py`. **When present it runs ALONGSIDE the generic for that op**
(both evaluate the shared post-op state); when absent, only the generic runs. Overlays are
**assertion-only** — they never perform the op (the orchestrator owns it).
Overlay sources, in precedence order:
```
repo-local <recipe-repo>/tests/test_<op>.py (upstream-authoritative; gated by HC2 allowlist)
> cc-ci tests/<recipe>/test_<op>.py (CI-curated overlay)
+ generic tests/_generic/test_<op>.py (the floor; runs alongside by default)
```
Only ONE overlay source wins for a given op (repo-local > cc-ci); the generic floor runs **in
addition** unless explicitly opted out.
**Custom (non-lifecycle) tests** — e.g. `functional/test_sso.py` — are **opt-in and additive**:
they have no generic equivalent and run only when present, discovered from both locations
(repo-local gated by the HC2 allowlist). Placement rule: custom tests live ONLY under
`functional/` or `playwright/`; a top-level `test_*.py` is a lifecycle overlay and nothing else
(top-level non-lifecycle files are not discovered).
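The precedence and placement rules above can be sketched as two small functions. Both names are hypothetical and are not the API of `runner/harness/discovery.py`; running the generic file before the overlay is an assumption about ordering.

```python
import os

LIFECYCLE_OPS = ("install", "upgrade", "backup", "restore")


def resolve_op_files(op, repo_local_dir, ccci_dir, generic_dir, approved, skip_generic=False):
    """Return the assertion files for one op: the generic floor plus at most one overlay."""
    name = f"test_{op}.py"
    files = []
    if not skip_generic:
        files.append(os.path.join(generic_dir, name))  # the floor, on by default
    repo_local = os.path.join(repo_local_dir, name)
    ccci = os.path.join(ccci_dir, name)
    if approved and os.path.isfile(repo_local):
        files.append(repo_local)  # repo-local wins, but only when allowlisted
    elif os.path.isfile(ccci):
        files.append(ccci)        # otherwise the cc-ci curated overlay
    return files


def classify(relpath):
    """Classify a path relative to a recipe's tests directory."""
    parts = relpath.replace("\\", "/").split("/")
    base = parts[-1]
    if len(parts) == 1:
        overlays = {f"test_{op}.py" for op in LIFECYCLE_OPS}
        return "overlay" if base in overlays else "ignored"
    if parts[0] in ("functional", "playwright") and base.startswith("test_") and base.endswith(".py"):
        return "custom"
    return "ignored"
```

So `classify("functional/test_sso.py")` yields `"custom"`, while a top-level `test_sso.py` is `"ignored"` because it is neither a lifecycle overlay nor under an allowed directory.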
### Pre-op seed hooks (per-recipe `ops.py`)
A data-continuity overlay needs to seed state **before** the op (write a marker, create a DB row,
etc.). Since the orchestrator owns the op, overlays place their seed in an optional per-recipe
`tests/<recipe>/ops.py`:
```python
# tests/<recipe>/ops.py
from harness import lifecycle

def pre_upgrade(ctx):
    # seed a marker before the harness performs the upgrade
    lifecycle.exec_in_app(ctx.domain, ["sh", "-c", "echo upgrade-survives > /path/marker"])

def pre_backup(ctx):
    # establish a known "original" state before the backup op captures it
    lifecycle.exec_in_app(ctx.domain, ["sh", "-c", "echo original > /path/marker"])

def pre_restore(ctx):
    # diverge from the backed-up state so a successful restore is observable
    lifecycle.exec_in_app(ctx.domain, ["sh", "-c", "echo mutated > /path/marker"])
```
The orchestrator imports `ops.py` in-process (with the recipe dir on `sys.path`, so it can import
sibling helpers like `kc_admin.py`) and calls `pre_<op>(ctx)` immediately before performing the
op — `ctx` is the uniform `HookCtx` every recipe hook receives (`.domain`, `.base_url`, `.meta`,
`.deps`, `.op`; see `docs/recipe-customization.md` §4.1). Then `test_<op>.py` asserts the post-op
state. See `tests/custom-html/` (volume marker),
`tests/keycloak/` (admin-API/realm), `tests/matrix-synapse/`, `tests/lasuite-docs/` (psql in the `db`
service) for worked examples.
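A minimal sketch of the in-process hook call described above. The `HookCtx` field names come from the text, but their types and defaults here are assumptions, and `call_pre_op_hook` is a hypothetical helper rather than the orchestrator's actual loader.

```python
import importlib.util
import os
import sys
from dataclasses import dataclass, field


@dataclass
class HookCtx:
    # Field names follow the text; types and defaults are assumptions.
    domain: str
    base_url: str
    op: str
    meta: dict = field(default_factory=dict)
    deps: dict = field(default_factory=dict)


def call_pre_op_hook(recipe_dir, ctx):
    """Import <recipe_dir>/ops.py and call pre_<ctx.op>(ctx) if it is defined.

    Returns True when a hook ran, False when there is no ops.py or no such hook.
    """
    ops_path = os.path.join(recipe_dir, "ops.py")
    if not os.path.isfile(ops_path):
        return False
    if recipe_dir not in sys.path:
        sys.path.insert(0, recipe_dir)  # lets ops.py import sibling helpers
    spec = importlib.util.spec_from_file_location("recipe_ops", ops_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    hook = getattr(module, f"pre_{ctx.op}", None)
    if hook is None:
        return False
    hook(ctx)
    return True
```

Hooks are optional per op: a recipe that only defines `pre_upgrade` simply gets no seed step before backup or restore.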
### Opting out of the generic floor (LOCAL-DEV-ONLY)
The generic runs additively by default and there is **no declarative opt-out** — no recipe can
ship without the floor. For local iteration only (e.g. re-running one tier while developing an
overlay), two env escape hatches exist:
- **env `CCCI_SKIP_GENERIC=1`** — skip generic for ALL ops (run-wide).
- **env `CCCI_SKIP_GENERIC_<OP>=1`** — e.g. `CCCI_SKIP_GENERIC_UPGRADE=1` — skip generic for that one op.
Truthy = `1`/`true`/`yes`/`on`. If either is active in a CI (drone) run, the run prints a loud
`!!` warning and the customization manifest records it (`docs/recipe-customization.md` §7).
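The two switches combine as a simple "either one is truthy" check. A sketch, with `skip_generic` as a hypothetical helper name and case-insensitive matching as an assumption:

```python
import os

TRUTHY = {"1", "true", "yes", "on"}


def _truthy(value):
    # Assumption: surrounding whitespace and letter case are ignored.
    return value is not None and value.strip().lower() in TRUTHY


def skip_generic(op, env=None):
    """True if the generic tier for `op` should be skipped (local dev only)."""
    env = os.environ if env is None else env
    run_wide = _truthy(env.get("CCCI_SKIP_GENERIC"))
    per_op = _truthy(env.get(f"CCCI_SKIP_GENERIC_{op.upper()}"))
    return run_wide or per_op
```

For example, with only `CCCI_SKIP_GENERIC_UPGRADE=1` set, `skip_generic("upgrade")` is true and `skip_generic("backup")` is false.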
## Repo-local trust gate (HC2) — default-deny
PR-author-controlled code (a recipe repo's own `tests/test_*.py`, `install_steps.sh`, `ops.py`) runs
on the CI host with `/run/secrets/*` present — an untrusted-code risk. By default the harness runs
**only cc-ci-authored** overlays/hooks (`tests/<recipe>/...`) + the generic. Repo-local code is
**discovered-but-not-executed** unless its recipe appears in **`tests/repo-local-approved.txt`** (a
checked-in, git-auditable allowlist — one recipe name per line; `#` comments + blank lines ignored;
a lone `*` is NOT a wildcard). To approve a recipe a cc-ci maintainer reviews its repo-local tests
and adds the recipe name in a cc-ci PR (override the allowlist location with
`CCCI_REPO_LOCAL_APPROVED_FILE` — used by tests + cold demonstrations).
The gate is centralized in `runner/harness/discovery.py` (`repo_local_approved` /
`_gated`) so every discovery function (`resolve_overlay_op`, `custom_tests`, `install_steps`,
`pre_op_hook`) honors it identically; unit tests (`tests/unit/test_discovery.py`) pin the behavior
(approved-vs-not for every kind of code).
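The allowlist format is simple enough to sketch. `parse_allowlist` and `is_approved` are hypothetical names (the real gate is `repo_local_approved` / `_gated`), and the sketch assumes `#` comments occupy a whole line.

```python
def parse_allowlist(text):
    """Return the set of approved recipe names from the allowlist file content."""
    approved = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        approved.add(line)
    return approved


def is_approved(recipe, approved):
    # Exact membership only: a "*" entry is just a name that no recipe has,
    # so it never acts as a wildcard. Default-deny for everything else.
    return recipe in approved
```

Because approval is exact set membership, there is no pattern syntax to get wrong: a recipe is either named in the file or its repo-local code does not run.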
## Custom install-steps hook (and the graceful-generic rule)
Some recipes need setup the generic flow won't do (pre-seed content, set an env/secret, run a one-off
command). Provide a shell hook — `tests/<recipe>/install_steps.sh` (cc-ci) or repo-local
`tests/install_steps.sh` (repo-local wins, gated by the HC2 allowlist). The orchestrator runs it
during the install tier **after `abra app new` + env defaults, before `abra app deploy`**, with env:
- `CCCI_APP_DOMAIN` — the run's app domain
- `CCCI_RECIPE` — the recipe name
- `CCCI_APP_ENV` — path to the app's `.env` (for `abra`-side edits)
**Graceful-generic rule:** a recipe with **no** hook still attempts the generic install. A recipe
that genuinely needs a step will **fail the generic install — and that's the correct, reported
outcome** (per-op `install: fail`); the fix is to add the step, not to special-case the harness.
Worked example: `tests/custom-html-tiny/install_steps.sh` seeds an `index.html` into the static
server's content volume: without the hook the generic install fails with a 404; with it, the install passes.
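How the orchestrator might invoke the hook can be sketched as below. The three environment variables come from the list above; `run_install_steps` is a hypothetical helper, and running the script through `sh` (rather than executing it directly or via another shell) is an assumption.

```python
import os
import subprocess


def run_install_steps(hook_path, domain, recipe, app_env_path):
    """Run the install-steps hook with the documented variables.

    Returns False when no hook exists, so the generic install is still attempted.
    """
    if not os.path.isfile(hook_path):
        return False
    env = dict(os.environ)
    env.update({
        "CCCI_APP_DOMAIN": domain,
        "CCCI_RECIPE": recipe,
        "CCCI_APP_ENV": app_env_path,
    })
    # check=True: a non-zero exit raises CalledProcessError instead of being ignored.
    subprocess.run(["sh", hook_path], env=env, check=True)
    return True
```

A missing hook is not an error in this sketch, which mirrors the graceful-generic rule: the install proceeds and its result is what gets reported.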
## The HC1 upgrade path — chaos to the PR-head code under test
Concretely, the upgrade tier:
1. base deployment is the **previous published version** (a clean pinned-tag deploy).
2. orchestrator captures `head_ref` (preferring `$REF` — the PR head sha; falls back to the recipe
checkout HEAD for non-PR `!testme`).
3. on the upgrade tier: re-checkout the recipe to `head_ref` (the prev-tag base deploy reset the
working tree), capture the pre-upgrade identity, then **`abra app deploy --chaos`** redeploys the
running app at that checkout — in place, NOT a new install.
4. `assert_upgraded` (generic) asserts serving + that the deployed
`coop-cloud.<stack>.chaos-version` matches `head_ref` — proving the PR-head code was deployed.
Reconciliation with the deploy-once guard: `abra.deploy` (chaos) is called directly, not through
`deploy_app`, so `_record_deploy()` does not fire — `deploy-count` counts only `abra app new`
installs and stays 1.
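The identity check in step 4 can be sketched as a pure function over the deployed labels. `check_upgraded` is a hypothetical name, not `generic.assert_upgraded`. The prefix comparison assumes the label may hold an abbreviated sha, and the `pre`/`post` dicts with `version`/`image`/`chaos` keys are an assumed shape for the fallback move check.

```python
def check_upgraded(labels, stack, head_ref=None, pre=None, post=None):
    """Return (ok, reason) for the upgrade identity check."""
    if head_ref:
        deployed = labels.get(f"coop-cloud.{stack}.chaos-version", "")
        # Assumption: either side may be an abbreviated sha, so compare by prefix.
        ok = bool(deployed) and (head_ref.startswith(deployed) or deployed.startswith(head_ref))
        return ok, f"chaos-version={deployed!r} vs head_ref={head_ref!r}"
    # head_ref unknown: fall back to "did anything move since pre-upgrade?"
    pre, post = pre or {}, post or {}
    moved = [key for key in ("version", "image", "chaos") if pre.get(key) != post.get(key)]
    return bool(moved), f"moved: {moved}"
```

A stale redeploy of the previous checkout would stamp the previous commit into the label, so the first branch returns `False` in exactly the case the text calls out.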
## How to add a recipe overlay (zero → some coverage)
1. The recipe is already testable with **zero config** — enrol it (poll list + mirror) and the
generic floor runs (`docs/enroll-recipe.md`).
2. To add recipe-specific coverage, drop `tests/<recipe>/test_<op>.py` (copy an existing one, e.g.
`tests/custom-html/test_upgrade.py`). Assert the POST-op state — reading app state through
`lifecycle.exec_in_app` (volume/DB) for data checks, not HTTP. Generic + your overlay both run.
3. If the overlay needs to seed PRE-op state (data-continuity markers, the backup→restore
divergence), drop `tests/<recipe>/ops.py` with `pre_upgrade/pre_backup/pre_restore(ctx)`.
4. If the recipe needs install-time setup, add `tests/<recipe>/install_steps.sh`.
5. Set per-recipe knobs (health path, timeouts) in `recipe_meta.py`.
6. **Never weaken or skip an assertion to make a run pass** — a red tier is information.
Per-recipe config (`tests/<recipe>/recipe_meta.py`, all optional — the COMPLETE key reference is
the generated table in `docs/recipe-customization.md` §4; unknown keys are hard errors, private
constants are underscore-prefixed):
```python
HEALTH_PATH = "/realms/master" # path that returns a healthy status (default "/")
HEALTH_OK = (200,) # acceptable status codes (default 200/301/302)
DEPLOY_TIMEOUT = 600 # seconds for services to converge (default 600)
HTTP_TIMEOUT = 600 # seconds for the app to answer (default 300)
BACKUP_CAPABLE = True # override backup-capability auto-detection (default: scan compose)
EXTRA_ENV = {"KEY": "value"} # or EXTRA_ENV(ctx) -> dict; extra .env keys set at deploy
```
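The "unknown keys are hard errors, private constants are underscore-prefixed" rule can be sketched as a small validator. `validate_meta` is a hypothetical name, and `KNOWN_KEYS` is deliberately partial: it lists only the keys shown in the example above, not the generated table.

```python
# Partial on purpose: only the keys from the example above.
KNOWN_KEYS = {
    "HEALTH_PATH", "HEALTH_OK", "DEPLOY_TIMEOUT",
    "HTTP_TIMEOUT", "BACKUP_CAPABLE", "EXTRA_ENV",
}


def validate_meta(namespace):
    """Return the public config from a recipe_meta namespace, rejecting unknown keys."""
    config = {}
    for key, value in namespace.items():
        if key.startswith("_"):
            continue  # private constants (and dunders) are not config keys
        if key not in KNOWN_KEYS:
            raise ValueError(f"unknown recipe_meta key: {key}")
        config[key] = value
    return config
```

Failing hard on an unknown key means a typo such as `HELTH_PATH` surfaces immediately instead of silently falling back to the default.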
The harness self-tests for discovery / precedence / the HC2 allowlist live in `tests/unit/` (run:
`cc-ci-run -m pytest tests/unit`); they are never picked up as overlays/custom tests.