Per operator-2026-05-28 SSO-dep plan (plan-sso-dep-testing.md). Substantial orchestrator
restructuring:

NEW LIFECYCLE ORDER:
  1. Recipe deploy ALONE (no deps).
  2. install / upgrade / backup / restore — recipe-only generic tiers.
  3. setup_custom_tests step (NEW):
     a. Deploy each declared dep + provision realm/client/test-user via harness.sso.
     b. Write $CCCI_DEPS_FILE in dict shape {dep_recipe: {domain, realm, client_id, client_secret,
        admin_user, admin_password, discovery_url, token_url, ...}}.
     c. Run tests/<recipe>/setup_custom_tests.sh hook (jq-readable; wires OIDC env via abra
        secret insert + .env edits + in-place 'abra app deploy --force --chaos').
  4. CUSTOM tier with deps-ready flag; @pytest.mark.requires_deps tests skip with
     'deps-not-ready: <reason>' when setup_custom_tests fails. NON-deps custom tests still run
     normally — FAILURE ISOLATION (a DoD item per plan).
  5. Teardown: recipe first, deps in reverse declaration order.

Harness changes:
  - runner/run_recipe_ci.py: deps deploy moves from BEFORE recipe deploy to AFTER restore tier.
    Adds _enrich_deps_with_sso() + _run_setup_custom_tests_hook(). DG4.1 generalised to
    'one abra app new per app' (recipe + each dep); in-place redeploys (--force) don't count.
  - runner/harness/deps.py: write_run_state + load_run_state accept dict OR list shape;
    deps_as_dict() coerces either to a recipe→entry map.
  - runner/harness/sso.py: admin_password_inside() public re-export.
  - tests/conftest.py: deps_creds fixture (full creds dict); deps_apps fixture flattens to
    recipe→domain string. pytest_collection_modifyitems hook skips
    @pytest.mark.requires_deps tests when CCCI_DEPS_READY=0.
    pytest_configure registers the marker.

Recipe content:
  - tests/lasuite-docs/setup_custom_tests.sh: NEW hook reads $CCCI_DEPS_FILE via jq;
    inserts oidc_rpcs secret at BUMPED version (v1→v2) since abra app new -S generates v1 first
    and Swarm forbids overwriting; updates SECRET_OIDC_RPCS_VERSION in .env; writes 9 OIDC env
    vars (REALM/DISCOVERY/AUTH/TOKEN/USERINFO/LOGOUT/JWKS/CLIENT_ID/SCOPES); ensures trailing
    newline on .env so writes don't concatenate (caught a 'TIMEOUT=900OIDC_REALM=...' bug);
    triggers in-place 'abra app deploy --force --chaos --no-input'.
  - tests/lasuite-docs/functional/test_oidc_with_keycloak.py: refactored to consume deps_creds
    fixture (no longer calls setup_keycloak_realm itself — the orchestrator does it in
    setup_custom_tests). Marked @pytest.mark.requires_deps.

Cold-verifiable on cc-ci (log /root/ccci-refactor-lasuite-r5.log):
  RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py
  install: PASS, custom: 3 PASS incl. test_oidc_password_grant_against_dep_keycloak.
  deploy-count = 2 (expect 2) — DG4.1 generalised holds.
  Smoke regression: RECIPE=custom-html STAGES=install,custom → 5 PASS, deploy-count=1.

Closes DEFERRED.md #5 (lasuite-docs OIDC parity ports via this plan).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
237 lines
15 KiB
Markdown
# The cc-ci test architecture — generic suite + additive recipe overlays (Phase 1d + 1e)

Every recipe gets a **generic lifecycle test suite for free** — the floor under every run, always
on by default. Recipe-specific tests *layer additively* on top: when a recipe ships an overlay for an
op, the **generic still runs alongside it** (the floor is never silently lost). So `!testme` is
meaningful on **any** recipe immediately (zero config), and adding recipe-specific coverage is a thin
overlay that adds, it doesn't subtract.

## Architectural invariant — generic-first, custom-additive (read this first)

This is the load-bearing principle of the whole test architecture. If you're maintaining cc-ci a
year from now, this is the one rule that should still hold.

- **Generic tests are simple and easily runnable.** They are recipe-agnostic, depend only on the
  recipe being deployable (install / upgrade / backup / restore against the recipe alone), and
  ship as the floor for every recipe. No SSO provider, no external deps, no per-recipe state
  scaffolding — just "does this recipe deploy and lifecycle work?"
- **Generic must not depend on custom.** A custom test or a custom-tests setup (e.g. SSO/OIDC dep
  provisioning) **can never be a precondition for the generic tier to pass.** Concretely: the
  orchestrator runs all generic tiers (install → upgrade → backup → restore) against the recipe
  **alone, with no deps deployed**, then runs the `setup_custom_tests` step (deps + post-deps
  wiring) only after — and a failure there is **isolated** to the custom tier (tests tagged
  `@pytest.mark.requires_deps` skip with reason `"deps-not-ready"`; generic tier reports
  normally). See `cc-ci-plan/plan-sso-dep-testing.md` for the SSO-dep specifics.
- **Custom tests are the thoroughness layer — and they cost more to maintain.** They're more
  thorough (authenticated APIs, multi-app flows, version-specific browser selectors, helper
  scripts, state-management) and *therefore* take more maintenance: an SSO provider's admin API
  changes, a recipe's app-launch URL contract shifts between versions, a Socket.IO primitive
  needs to track upstream — these are real ongoing costs that the generic tier deliberately
  doesn't carry.
- **A future maintainer can choose to focus on the generic tier alone** and still get meaningful
  signal: every enrolled recipe gets *some* CI coverage from the generic floor, and the
  custom-additive layer can be scaled down or paused without breaking that floor. The choice of
  *how much* per-recipe depth to maintain is open to whoever owns cc-ci later — generic-only is
  a valid permanent operating mode.

If anything in this codebase ever asks you to make generic depend on custom (or to put a custom
precondition before a generic tier), that's the signal it's drifted off the invariant — push back
and restore the separation.

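The ordering and failure isolation described in this section can be sketched in a few lines. This is a minimal illustration, not the orchestrator's code: `run_lifecycle`, `custom_outcome`, `RunReport` and the two callables are hypothetical names introduced here, and the real `runner/run_recipe_ci.py` may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

GENERIC_TIERS = ["install", "upgrade", "backup", "restore"]


@dataclass
class RunReport:
    tiers: Dict[str, str] = field(default_factory=dict)  # tier name -> "pass" / "fail"
    deps_ready: bool = False
    deps_reason: str = ""


def run_lifecycle(
    run_tier: Callable[[str], bool],
    setup_custom_tests: Callable[[], None],
) -> RunReport:
    """Hypothetical sketch: generic tiers first, custom setup strictly afterwards."""
    report = RunReport()
    # Generic tiers run against the recipe alone; no deps exist yet,
    # so nothing custom can influence their outcome.
    for tier in GENERIC_TIERS:
        report.tiers[tier] = "pass" if run_tier(tier) else "fail"
    # Only now are deps deployed and wired. A failure is recorded rather
    # than raised, so it cannot change the generic results above.
    try:
        setup_custom_tests()
        report.deps_ready = True
    except Exception as exc:
        report.deps_reason = f"deps-not-ready: {exc}"
    return report


def custom_outcome(report: RunReport, requires_deps: bool) -> str:
    """Decide what happens to one custom test given the setup result."""
    if requires_deps and not report.deps_ready:
        return f"skip ({report.deps_reason})"
    return "run"
```

With a failing setup step, the generic tiers still report their own results, deps-dependent custom tests skip with the recorded reason, and custom tests that need no deps still run.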
## The model: tiers against one shared deployment

A run is a sequence of **tiers**. The orchestrator (`runner/run_recipe_ci.py`) deploys the app
**once** and runs each tier against that single live deployment, then tears it down **once** in a
`finally`. The orchestrator **owns** each mutating op (upgrade/backup/restore) and runs it **exactly
once**; the assertion files (generic and overlay) evaluate the *post-op* state and never perform the
op themselves. Asserted every run: **`deploy-count = 1`** (one `abra app new`).

```
deploy ONCE   (base version: the previous published version when an upgrade tier will run and one
               exists — so upgrade is a real previous→PR-head; else the target / current PR head)
  → INSTALL   [optional pre_install seed]    then generic + overlay assertions   (no op)
  → UPGRADE   [optional pre_upgrade seed]    then abra app deploy --chaos to PR-head   (op once)
                                             then generic + overlay assertions
  → BACKUP    [optional pre_backup seed]     then abra app backup create   (op once)
                                             then generic + overlay assertions   (backup-capable only)
  → RESTORE   [optional pre_restore mutate]  then abra app restore   (op once)
                                             then generic + overlay assertions   (backup-capable only)
  → CUSTOM    any non-lifecycle test_*.py    (only if defined)
teardown ONCE (in finally)
```

Each assertion file is its own `pytest` invocation, so the run reports **per-operation** pass / fail
/ skip (`install / upgrade / backup / restore / custom`). The shared live domain is passed in
`CCCI_APP_DOMAIN` and exposed by the `live_app` fixture; **all assertion tiers are assertion-only and
never deploy or tear down** (that is the orchestrator's job). Op results an assertion needs
(pre-upgrade identity, the produced backup `snapshot_id`) pass op→assertion via a run-scoped JSON
state file at `$CCCI_OP_STATE_FILE`, read by `generic.op_state()`.

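As an illustration of the op→assertion handoff, here is a minimal sketch of a run-scoped JSON state file. Only the `CCCI_OP_STATE_FILE` variable comes from the text above; `write_op_state` and `read_op_state` are hypothetical helpers and are not claimed to match `generic.op_state()`.

```python
import json
import os
from pathlib import Path
from typing import Any, Mapping


def read_op_state(env: Mapping[str, str] = os.environ) -> dict:
    """Return the recorded op results, or {} when nothing was written yet."""
    path = Path(env["CCCI_OP_STATE_FILE"])
    text = path.read_text() if path.exists() else ""
    return json.loads(text) if text.strip() else {}


def write_op_state(key: str, value: Any, env: Mapping[str, str] = os.environ) -> None:
    """Merge one op result into the state file (orchestrator side)."""
    state = read_op_state(env)
    state[key] = value
    Path(env["CCCI_OP_STATE_FILE"]).write_text(json.dumps(state))
```

In this sketch the orchestrator would call `write_op_state("snapshot_id", ...)` right after the backup op, and the assertion process, started later as a separate `pytest` invocation, would read it back with `read_op_state()`.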
## The generic default (recipe-agnostic, the floor — Phase 1e HC3)

Lives in the shared harness — `runner/harness/generic.py` + `tests/_generic/test_<op>.py` — so there
is no per-recipe copy-paste:

- **install** (`generic.assert_serving`) — services converged (the app's *own* replicas are N/N) **and**
  a real HTTP(S) response in `HEALTH_OK` (which excludes 404, so a Traefik unmatched-router fallback
  fails) **and** the body isn't Traefik's default 404 page. A bounded poll (no bare `sleep`) so a
  state-mutating op settles, while a persistent failure still fails within the timeout. A CA-verified
  TLS handshake also runs as an **infra cert sanity check** (catches a lapsed/mis-rotated wildcard);
  it does **not** distinguish app-vs-fallback (Traefik serves the wildcard zone-wide) — that's the
  converged + non-404 check.
- **upgrade** (`generic.assert_upgraded`) — assert serving after the orchestrator's chaos upgrade
  (HC1: `abra app deploy --chaos` of the PR-head checkout) and that the deployment is genuinely the
  code under test: when the intended PR-head commit is known, the deployed
  `coop-cloud.<stack>.chaos-version` label **must match** it — direct, non-vacuous proof. (A stale
  prev-checkout chaos redeploy would stamp prev's commit, not the PR-head, and fail here.) When
  `head_ref` is unknown, falls back to a move check (version/image/chaos changed vs pre-upgrade).
- **backup** (`generic.assert_backup_artifact`) — assert a snapshot artifact was produced (the
  `snapshot_id` captured by the orchestrator from `abra app backup create`). Honest limit: the
  generic verifies the *mechanism*, not app-specific data integrity (that's an overlay, below).
- **restore** (`generic.assert_restore_healthy`) — assert the app is healthy + serving after the
  orchestrator's restore op (`assert_serving` polls so the post-restore reconverge settles).

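The bounded poll behind the install check can be sketched as follows. This is an assumption-laden illustration, not the body of `generic.assert_serving`: `poll_until_serving` and its `probe` callable (returning an HTTP status or `None`) are hypothetical, and the default codes and timeout simply reuse the `HEALTH_OK` and `HTTP_TIMEOUT` defaults documented for `recipe_meta.py`.

```python
import time
from typing import Callable, Iterable, Optional


def poll_until_serving(
    probe: Callable[[], Optional[int]],
    ok_codes: Iterable[int] = (200, 301, 302),
    timeout: float = 300.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Probe repeatedly until a healthy status appears or the deadline passes."""
    ok = set(ok_codes)
    deadline = clock() + timeout
    while True:
        status = probe()  # None stands for "no response at all"
        if status in ok:
            return status
        if clock() >= deadline:
            # A persistent failure (e.g. a steady 404) still fails, just bounded.
            raise AssertionError(f"not serving within {timeout}s (last status: {status})")
        sleep(interval)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting; a transient failure right after an op is tolerated, while a failure that never clears raises once the deadline is reached.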
**Backup-capability** is auto-detected: a recipe is backup-capable iff a `compose*.yml` carries a
truthy `backupbot.backup` label (override with `BACKUP_CAPABLE` in `recipe_meta.py`). For
non-backup-capable recipes the backup/restore tiers are a clean **N/A skip** — not a failure.

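One way such a detection could work is a plain text scan of the compose files. The sketch below is a guess at the mechanism, not the harness code: `is_backup_capable` is a hypothetical name, the set of truthy strings is an assumption, and treating `${VAR:-default}` by its default value is an assumption about how recipes write the label.

```python
import re
from pathlib import Path
from typing import Optional

# Matches `backupbot.backup=true` (list form) and `backupbot.backup: "true"` (map form),
# but not sibling labels such as `backupbot.backup.pre-hook`.
_LABEL = re.compile(r"""backupbot\.backup["']?\s*[=:]\s*["']?([^"'\s#]+)""")
_ENV_DEFAULT = re.compile(r"\$\{[^}:]+:-([^}]*)\}")
_TRUTHY = {"1", "true", "yes", "on"}  # assumed definition of "truthy"


def is_backup_capable(recipe_dir, override: Optional[bool] = None) -> bool:
    """Return the override if given, else scan compose*.yml for a truthy label."""
    if override is not None:
        return bool(override)
    for compose in sorted(Path(recipe_dir).glob("compose*.yml")):
        for match in _LABEL.finditer(compose.read_text()):
            value = match.group(1)
            env_default = _ENV_DEFAULT.fullmatch(value)
            if env_default:
                # `${ENABLE_BACKUPS:-true}` is judged by its default value.
                value = env_default.group(1)
            if value.lower() in _TRUTHY:
                return True
    return False
```

A regex scan avoids a YAML dependency at the cost of precision; a real implementation may well parse the compose files properly instead.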
## Recipe overlays — additive (the generic floor is always on by default)

Convention: a recipe-specific tier is a file named exactly `test_install.py` / `test_upgrade.py` /
`test_backup.py` / `test_restore.py`. **When present it runs ALONGSIDE the generic for that op**
(both evaluate the shared post-op state); when absent, only the generic runs. Overlays are
**assertion-only** — they never perform the op (the orchestrator owns it).

Overlay sources, in precedence order:

```
  repo-local   <recipe-repo>/tests/test_<op>.py    (upstream-authoritative; gated by HC2 allowlist)
> cc-ci        tests/<recipe>/test_<op>.py         (CI-curated overlay)
+ generic      tests/_generic/test_<op>.py         (the floor; runs alongside by default)
```

Only ONE overlay source wins for a given op (repo-local > cc-ci); the generic floor runs **in
addition** unless explicitly opted out.

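The precedence rule can be expressed as a small resolver. This is a sketch under stated assumptions: `resolve_tier_files` and its parameters are hypothetical and are not the signature of the harness's `resolve_overlay_op`; only the three path layouts are taken from the table above.

```python
from pathlib import Path
from typing import List, Optional


def resolve_tier_files(
    op: str,
    recipe: str,
    cc_ci_root,
    recipe_repo=None,
    repo_local_approved: bool = False,
    skip_generic: bool = False,
) -> List[Path]:
    """Return the assertion files to run for one op: generic floor plus at most one overlay."""
    name = f"test_{op}.py"
    candidates: List[Path] = []
    # Repo-local is only a candidate when the recipe passed the HC2 allowlist.
    if recipe_repo is not None and repo_local_approved:
        candidates.append(Path(recipe_repo) / "tests" / name)
    candidates.append(Path(cc_ci_root) / "tests" / recipe / name)
    # First existing candidate wins: repo-local > cc-ci.
    overlay: Optional[Path] = next((p for p in candidates if p.is_file()), None)

    files: List[Path] = []
    if not skip_generic:
        files.append(Path(cc_ci_root) / "tests" / "_generic" / name)
    if overlay is not None:
        files.append(overlay)
    return files
```

The result never contains two overlays for the same op, and the generic file drops out only when the caller passes `skip_generic=True`.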
**Custom (non-lifecycle) `test_*.py`** — any other `test_*.py` (e.g. `test_sso.py`) is **opt-in and
additive**: it has no generic equivalent and runs only when present, discovered from both locations
(repo-local gated by the HC2 allowlist).

### Pre-op seed hooks (per-recipe `ops.py`)

A data-continuity overlay needs to seed state **before** the op (write a marker, create a DB row,
etc.). Since the orchestrator owns the op, overlays place their seed in an optional per-recipe
`tests/<recipe>/ops.py`:

```python
# tests/<recipe>/ops.py
from harness import lifecycle


def pre_upgrade(domain, meta):
    # seed a marker before the harness performs the upgrade
    lifecycle.exec_in_app(domain, ["sh", "-c", "echo upgrade-survives > /path/marker"])


def pre_backup(domain, meta):
    # establish a known "original" state before the backup op captures it
    lifecycle.exec_in_app(domain, ["sh", "-c", "echo original > /path/marker"])


def pre_restore(domain, meta):
    # diverge from the backed-up state so a successful restore is observable
    lifecycle.exec_in_app(domain, ["sh", "-c", "echo mutated > /path/marker"])
```

The orchestrator imports `ops.py` in-process (with the recipe dir on `sys.path`, so it can import
sibling helpers like `kc_admin.py`) and calls `pre_<op>(domain, meta)` immediately before performing
the op. Then `test_<op>.py` asserts the post-op state. See `tests/custom-html/` (volume marker),
`tests/keycloak/` (admin-API/realm), `tests/matrix-synapse/`, `tests/lasuite-docs/` (psql in the `db`
service) for worked examples.

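The in-process import described above might look roughly like this. It is a sketch only: `run_pre_op_hook` is a hypothetical name and the real orchestrator may load and cache the module differently.

```python
import importlib.util
import sys
from pathlib import Path


def run_pre_op_hook(recipe_dir, op: str, domain: str, meta: dict) -> bool:
    """Load <recipe_dir>/ops.py and call pre_<op>(domain, meta) if it is defined.

    Returns True when a hook ran, False when there was nothing to run.
    """
    recipe_dir = Path(recipe_dir)
    ops_path = recipe_dir / "ops.py"
    if not ops_path.is_file():
        return False
    # Put the recipe dir on sys.path so ops.py can import sibling helpers.
    sys.path.insert(0, str(recipe_dir))
    try:
        module_name = "_ops_" + recipe_dir.name.replace("-", "_")
        spec = importlib.util.spec_from_file_location(module_name, ops_path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        hook = getattr(module, f"pre_{op}", None)
        if hook is None:
            return False
        hook(domain, meta)
        return True
    finally:
        sys.path.remove(str(recipe_dir))
```

A missing `ops.py`, or an `ops.py` without the hook for the current op, is treated as "nothing to seed" rather than as an error, which matches the hook being optional.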
### Opting out of the generic floor

The generic runs additively by default. To skip it (e.g. when an overlay's recipe-specific check
fully replaces the generic's mechanism check) set, in increasing specificity:

- **env `CCCI_SKIP_GENERIC=1`** — skip generic for ALL ops (run-wide).
- **env `CCCI_SKIP_GENERIC_<OP>=1`** — e.g. `CCCI_SKIP_GENERIC_UPGRADE=1` — skip generic for that one op.
- **declarative in `recipe_meta.py`** — `SKIP_GENERIC = ["upgrade"]` (per-op) or `SKIP_GENERIC = ["all"]`.

Opting out is per-recipe and visible in git — not a hidden global. Truthy = `1`/`true`/`yes`/`on`.

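The three opt-out switches combine into a single yes/no decision per op. The sketch below shows one plausible way to evaluate them; `skip_generic` is a hypothetical helper, and treating the truthy strings case-insensitively is an assumption.

```python
import os
from typing import Iterable, Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}


def _truthy(value: Optional[str]) -> bool:
    return value is not None and value.strip().lower() in TRUTHY


def skip_generic(
    op: str,
    env: Mapping[str, str] = os.environ,
    declared: Iterable[str] = (),
) -> bool:
    """True when the generic assertions for `op` should be skipped."""
    if _truthy(env.get("CCCI_SKIP_GENERIC")):  # run-wide
        return True
    if _truthy(env.get(f"CCCI_SKIP_GENERIC_{op.upper()}")):  # one op
        return True
    declared_ops = {d.lower() for d in declared}  # SKIP_GENERIC from recipe_meta.py
    return "all" in declared_ops or op.lower() in declared_ops
```

For example, `skip_generic("upgrade", {}, ["upgrade"])` is true while `skip_generic("backup", {}, ["upgrade"])` is false, so a per-op opt-out leaves the floor in place for every other op.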
## Repo-local trust gate (HC2) — default-deny

PR-author-controlled code (a recipe repo's own `tests/test_*.py`, `install_steps.sh`, `ops.py`) runs
on the CI host with `/run/secrets/*` present — an untrusted-code risk. By default the harness runs
**only cc-ci-authored** overlays/hooks (`tests/<recipe>/...`) + the generic. Repo-local code is
**discovered-but-not-executed** unless its recipe appears in **`tests/repo-local-approved.txt`** (a
checked-in, git-auditable allowlist — one recipe name per line; `#` comments + blank lines ignored;
a lone `*` is NOT a wildcard). To approve a recipe a cc-ci maintainer reviews its repo-local tests
and adds the recipe name in a cc-ci PR (override the allowlist location with
`CCCI_REPO_LOCAL_APPROVED_FILE` — used by tests + cold demonstrations).

The gate is centralized in `runner/harness/discovery.py` (`repo_local_approved` /
`_gated`) so every discovery function (`resolve_overlay_op`, `custom_tests`, `install_steps`,
`pre_op_hook`) honors it identically; unit tests (`tests/unit/test_discovery.py`) pin the behavior
(approved-vs-not for every kind of code).

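The allowlist format is simple enough to sketch a reader for it. The helpers below are hypothetical (`load_approved`, `is_approved`) and do not reproduce `discovery.py`; in particular, handling only whole-line `#` comments is an assumption, since the text does not say whether trailing comments are allowed.

```python
from pathlib import Path
from typing import Set


def load_approved(path) -> Set[str]:
    """Parse the allowlist: one recipe name per line, `#` lines and blanks ignored."""
    allowlist = Path(path)
    if not allowlist.is_file():
        return set()  # default-deny: no file means nothing is approved
    approved: Set[str] = set()
    for raw in allowlist.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        approved.add(line)
    return approved


def is_approved(recipe: str, approved: Set[str]) -> bool:
    """Exact-name membership only; a `*` entry is never expanded as a wildcard."""
    return recipe in approved
```

Because the check is plain set membership, an entry of `*` could only ever match a recipe literally named `*`, which keeps the gate default-deny even if someone adds that line by mistake.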
## Custom install-steps hook (and the graceful-generic rule)

Some recipes need setup the generic flow won't do (pre-seed content, set an env/secret, run a one-off
command). Provide a shell hook — `tests/<recipe>/install_steps.sh` (cc-ci) or repo-local
`tests/install_steps.sh` (repo-local wins, gated by the HC2 allowlist). The orchestrator runs it
during the install tier **after `abra app new` + env defaults, before `abra app deploy`**, with env:

- `CCCI_APP_DOMAIN` — the run's app domain
- `CCCI_RECIPE` — the recipe name
- `CCCI_APP_ENV` — path to the app's `.env` (for `abra`-side edits)

**Graceful-generic rule:** a recipe with **no** hook still attempts the generic install. A recipe
that genuinely needs a step will **fail the generic install — and that's the correct, reported
outcome** (per-op `install: fail`); the fix is to add the step, not to special-case the harness.
Worked example: `tests/custom-html-tiny/install_steps.sh` seeds an `index.html` into the static
server's content volume — without it the generic install fails 404, with it it passes.

## The HC1 upgrade path — chaos to the PR-head code under test

Concretely, the upgrade tier:

1. base deployment is the **previous published version** (a clean pinned-tag deploy).
2. orchestrator captures `head_ref` (preferring `$REF` — the PR head sha; falls back to the recipe
   checkout HEAD for non-PR `!testme`).
3. on the upgrade tier: re-checkout the recipe to `head_ref` (the prev-tag base deploy reset the
   working tree), capture the pre-upgrade identity, then **`abra app deploy --chaos`** redeploys the
   running app at that checkout — in place, NOT a new install.
4. `assert_upgraded` (generic) asserts serving + that the deployed
   `coop-cloud.<stack>.chaos-version` matches `head_ref` — proving the PR-head code was deployed.

Reconciliation with the deploy-once guard: `abra.deploy` (chaos) is called directly, not through
`deploy_app`, so `_record_deploy()` does not fire — `deploy-count` counts only `abra app new`
installs and stays 1.

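The two-mode check in step 4 (exact proof when `head_ref` is known, move check otherwise) can be sketched as a pure function. Everything here is illustrative: `check_upgraded` and the `version`/`image`/`chaos` dictionary keys are hypothetical, and comparing the label as a prefix of the full sha is an assumption about how the chaos-version label abbreviates commits.

```python
from typing import Mapping, Optional

MIN_HASH_LEN = 7  # assumed minimum length for an abbreviated commit hash


def check_upgraded(
    head_ref: Optional[str],
    before: Mapping[str, str],
    after: Mapping[str, str],
) -> str:
    """Return which proof was used; raise AssertionError when neither holds."""
    if head_ref:
        deployed = after.get("chaos", "")
        # Direct proof: the deployed chaos label must identify the PR-head commit.
        if len(deployed) < MIN_HASH_LEN or not head_ref.startswith(deployed):
            raise AssertionError(
                f"deployed chaos-version {deployed!r} does not match head_ref {head_ref!r}"
            )
        return "head-match"
    # Fallback: without a known head_ref, require that something observable moved.
    if all(before.get(key) == after.get(key) for key in ("version", "image", "chaos")):
        raise AssertionError("upgrade changed nothing (version/image/chaos identical)")
    return "moved"
```

The direct match is the stronger proof: a stale redeploy of the previous checkout would change the chaos label and so satisfy the move check, but it would not match `head_ref`.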
## How to add a recipe overlay (zero → some coverage)

1. The recipe is already testable with **zero config** — enrol it (poll list + mirror) and the
   generic floor runs (`docs/enroll-recipe.md`).
2. To add recipe-specific coverage, drop `tests/<recipe>/test_<op>.py` (copy an existing one, e.g.
   `tests/custom-html/test_upgrade.py`). Assert the POST-op state — reading app state through
   `lifecycle.exec_in_app` (volume/DB) for data checks, not HTTP. Generic + your overlay both run.
3. If the overlay needs to seed PRE-op state (data-continuity markers, the backup→restore
   divergence), drop `tests/<recipe>/ops.py` with `pre_upgrade/pre_backup/pre_restore(domain, meta)`.
4. If the recipe needs install-time setup, add `tests/<recipe>/install_steps.sh`.
5. Set per-recipe knobs (health path, timeouts, opt-out) in `recipe_meta.py`.
6. **Never weaken or skip an assertion to make a run pass** — a red tier is information.

Per-recipe config (`tests/<recipe>/recipe_meta.py`, all optional):

```python
HEALTH_PATH = "/realms/master"   # path that returns a healthy status (default "/")
HEALTH_OK = (200,)               # acceptable status codes (default 200/301/302)
DEPLOY_TIMEOUT = 600             # seconds for services to converge (default 600)
HTTP_TIMEOUT = 600               # seconds for the app to answer (default 300)
BACKUP_CAPABLE = True            # override backup-capability auto-detection (default: scan compose)
EXTRA_ENV = {"KEY": "value"}     # or EXTRA_ENV(domain) -> dict; extra .env keys set at deploy
SKIP_GENERIC = ["upgrade"]       # per-recipe declarative opt-out from generic ops ("all" = every op)
```

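Since every knob is optional, a loader has to merge whatever the recipe declares over the documented defaults. The sketch below shows one way to do that; `load_recipe_meta` is a hypothetical helper, and representing "auto-detect" as `None` for `BACKUP_CAPABLE` is an assumption.

```python
import copy
import importlib.util
from pathlib import Path

# Defaults as stated in the comments of the config example above.
DEFAULTS = {
    "HEALTH_PATH": "/",
    "HEALTH_OK": (200, 301, 302),
    "DEPLOY_TIMEOUT": 600,
    "HTTP_TIMEOUT": 300,
    "BACKUP_CAPABLE": None,  # None = fall back to compose auto-detection (assumed encoding)
    "EXTRA_ENV": {},
    "SKIP_GENERIC": [],
}


def load_recipe_meta(recipe_dir, domain: str) -> dict:
    """Merge an optional recipe_meta.py over the defaults and resolve EXTRA_ENV."""
    meta = copy.deepcopy(DEFAULTS)
    path = Path(recipe_dir) / "recipe_meta.py"
    if path.is_file():
        spec = importlib.util.spec_from_file_location("_recipe_meta_sketch", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        for key in DEFAULTS:
            if hasattr(module, key):
                meta[key] = getattr(module, key)
    # EXTRA_ENV may be a dict or a callable taking the domain.
    if callable(meta["EXTRA_ENV"]):
        meta["EXTRA_ENV"] = meta["EXTRA_ENV"](domain)
    return meta
```

A recipe with no `recipe_meta.py` at all gets the defaults unchanged, which is what keeps the zero-config path in step 1 working.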
The harness self-tests for discovery / precedence / the HC2 allowlist live in `tests/unit/` (run:
`cc-ci-run -m pytest tests/unit`); they are never picked up as overlays/custom tests.