cc-ci/docs/testing.md
autonomic-bot 41ede13042 feat(2): refactor — SSO-dep plan refinement (deps AFTER generic + setup_custom_tests + failure isolation)
Per operator-2026-05-28 SSO-dep plan (plan-sso-dep-testing.md). Substantial orchestrator
restructuring:

NEW LIFECYCLE ORDER:
  1. Recipe deploy ALONE (no deps).
  2. install / upgrade / backup / restore — recipe-only generic tiers.
  3. setup_custom_tests step (NEW):
     a. Deploy each declared dep + provision realm/client/test-user via harness.sso.
     b. Write $CCCI_DEPS_FILE in dict shape {dep_recipe: {domain, realm, client_id, client_secret,
        admin_user, admin_password, discovery_url, token_url, ...}}.
     c. Run tests/<recipe>/setup_custom_tests.sh hook (jq-readable; wires OIDC env via abra
        secret insert + .env edits + in-place 'abra app deploy --force --chaos').
  4. CUSTOM tier with deps-ready flag; @pytest.mark.requires_deps tests skip with
     'deps-not-ready: <reason>' when setup_custom_tests fails. NON-deps custom tests still run
     normally — FAILURE ISOLATION (a DoD item per plan).
  5. Teardown: recipe first, deps in reverse declaration order.

Harness changes:
- runner/run_recipe_ci.py: deps deploy moves from BEFORE recipe deploy to AFTER restore tier.
  Adds _enrich_deps_with_sso() + _run_setup_custom_tests_hook(). DG4.1 generalised to
  'one abra app new per app' (recipe + each dep); in-place redeploys (--force) don't count.
- runner/harness/deps.py: write_run_state + load_run_state accept dict OR list shape;
  deps_as_dict() coerces either to a recipe→entry map.
- runner/harness/sso.py: admin_password_inside() public re-export.
- tests/conftest.py: deps_creds fixture (full creds dict); deps_apps fixture flattens to
  recipe→domain string. pytest_collection_modifyitems hook skips
  @pytest.mark.requires_deps tests when CCCI_DEPS_READY=0.
  pytest_configure registers the marker.

Recipe content:
- tests/lasuite-docs/setup_custom_tests.sh: NEW hook reads $CCCI_DEPS_FILE via jq;
  inserts oidc_rpcs secret at BUMPED version (v1→v2) since abra app new -S generates v1 first
  and Swarm forbids overwriting; updates SECRET_OIDC_RPCS_VERSION in .env; writes 9 OIDC env
  vars (REALM/DISCOVERY/AUTH/TOKEN/USERINFO/LOGOUT/JWKS/CLIENT_ID/SCOPES); ensures trailing
  newline on .env so writes don't concatenate (caught a 'TIMEOUT=900OIDC_REALM=...' bug);
  triggers in-place 'abra app deploy --force --chaos --no-input'.
- tests/lasuite-docs/functional/test_oidc_with_keycloak.py: refactored to consume deps_creds
  fixture (no longer calls setup_keycloak_realm itself — the orchestrator does it in
  setup_custom_tests). Marked @pytest.mark.requires_deps.

Cold-verifiable on cc-ci (log /root/ccci-refactor-lasuite-r5.log):
  RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py
  install: PASS, custom: 3 PASS incl. test_oidc_password_grant_against_dep_keycloak.
  deploy-count = 2 (expect 2) — DG4.1 generalised holds.
  Smoke regression: RECIPE=custom-html STAGES=install,custom → 5 PASS, deploy-count=1.

Closes DEFERRED.md #5 (lasuite-docs OIDC parity ports via this plan).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 19:11:42 +01:00


The cc-ci test architecture — generic suite + additive recipe overlays (Phase 1d + 1e)

Every recipe gets a generic lifecycle test suite for free: the floor under every run, always on by default. Recipe-specific tests layer additively on top: when a recipe ships an overlay for an op, the generic still runs alongside it (the floor is never silently lost). So !testme is meaningful on any recipe immediately (zero config), and adding recipe-specific coverage is a thin overlay that adds to the floor rather than subtracting from it.

Architectural invariant — generic-first, custom-additive (read this first)

This is the load-bearing principle of the whole test architecture. If you're maintaining cc-ci a year from now, this is the one rule that should still hold.

  • Generic tests are simple and easily runnable. They are recipe-agnostic, depend only on the recipe being deployable (install / upgrade / backup / restore against the recipe alone), and ship as the floor for every recipe. No SSO provider, no external deps, no per-recipe state scaffolding — just "does this recipe deploy and lifecycle work?"
  • Generic must not depend on custom. A custom test or a custom-tests setup (e.g. SSO/OIDC dep provisioning) can never be a precondition for the generic tier to pass. Concretely: the orchestrator runs all generic tiers (install → upgrade → backup → restore) against the recipe alone, with no deps deployed, then runs the setup_custom_tests step (deps + post-deps wiring) only after — and a failure there is isolated to the custom tier (tests tagged @pytest.mark.requires_deps skip with reason "deps-not-ready"; generic tier reports normally). See cc-ci-plan/plan-sso-dep-testing.md for the SSO-dep specifics.
  • Custom tests are the thoroughness layer — and they cost more to maintain. They're more thorough (authenticated APIs, multi-app flows, version-specific browser selectors, helper scripts, state-management) and therefore take more maintenance: an SSO provider's admin API changes, a recipe's app-launch URL contract shifts between versions, a Socket.IO primitive needs to track upstream — these are real ongoing costs that the generic tier deliberately doesn't carry.
  • A future maintainer can choose to focus on the generic tier alone and still get meaningful signal: every enrolled recipe gets some CI coverage from the generic floor, and the custom-additive layer can be scaled down or paused without breaking that floor. The choice of how much per-recipe depth to maintain is open to whoever owns cc-ci later — generic-only is a valid permanent operating mode.

If anything in this codebase ever asks you to make generic depend on custom (or to put a custom precondition before a generic tier), that's the signal it's drifted off the invariant — push back and restore the separation.
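The failure isolation described above could be wired into a conftest.py roughly as follows. This is a minimal sketch, not the project's actual conftest: the requires_deps marker and the CCCI_DEPS_READY=0 convention come from the text, while deps_skip_reason, the CCCI_DEPS_REASON variable, and treating an unset flag as "ready" are assumptions made for illustration.

```python
import os


def deps_skip_reason(environ=None):
    """Return a skip reason when deps are not ready, else None.

    Assumption: an unset CCCI_DEPS_READY means "ready"; only "0" triggers skips.
    CCCI_DEPS_REASON is a hypothetical variable carrying the failure detail.
    """
    environ = os.environ if environ is None else environ
    if environ.get("CCCI_DEPS_READY", "1") == "0":
        detail = environ.get("CCCI_DEPS_REASON", "setup_custom_tests failed")
        return f"deps-not-ready: {detail}"
    return None


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line(
        "markers", "requires_deps: test needs the deps provisioned by setup_custom_tests"
    )


def pytest_collection_modifyitems(config, items):
    import pytest  # imported lazily so the helper above stays importable on its own

    reason = deps_skip_reason()
    if reason is None:
        return
    skip = pytest.mark.skip(reason=reason)
    for item in items:
        # Only dep-dependent tests are skipped; other custom tests still run.
        if "requires_deps" in item.keywords:
            item.add_marker(skip)
```

Because the skip is decided per test at collection time, a failed setup_custom_tests step never touches the generic tier's results.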

The model: tiers against one shared deployment

A run is a sequence of tiers. The orchestrator (runner/run_recipe_ci.py) deploys the app once and runs each tier against that single live deployment, then tears it down once in a finally. The orchestrator owns each mutating op (upgrade/backup/restore) and runs it exactly once; the assertion files (generic and overlay) evaluate the post-op state and never perform the op themselves. Asserted every run: deploy-count = 1 (one abra app new).

deploy ONCE  (base version: the previous published version when an upgrade tier will run and one
              exists — so upgrade is a real previous→PR-head; else the target / current PR head)
  → INSTALL    [optional pre_install seed]   then  generic + overlay assertions   (no op)
  → UPGRADE    [optional pre_upgrade seed]   then  abra app deploy --chaos to PR-head      (op once)
                                             then  generic + overlay assertions
  → BACKUP     [optional pre_backup seed]    then  abra app backup create                  (op once)
                                             then  generic + overlay assertions    (backup-capable only)
  → RESTORE    [optional pre_restore mutate] then  abra app restore                        (op once)
                                             then  generic + overlay assertions    (backup-capable only)
  → CUSTOM     any non-lifecycle test_*.py                                          (only if defined)
teardown ONCE (in finally)
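The shape of that sequence can be sketched in a few lines of Python. This is an illustration only, not the code in runner/run_recipe_ci.py: run_tiers and its callables are hypothetical names, and continuing to later tiers after one fails is an assumption.

```python
def run_tiers(deploy, tiers, teardown):
    """Deploy once, run each (name, callable) tier in order, tear down once.

    A tier callable signals failure by raising AssertionError.
    Returns a per-operation result map such as {"install": "pass"}.
    """
    results = {}
    try:
        deploy()  # the single deployment every tier shares
        for name, run in tiers:
            try:
                run()
                results[name] = "pass"
            except AssertionError as exc:
                # Assumption: a red tier is recorded and later tiers still run.
                results[name] = f"fail: {exc}"
    finally:
        teardown()  # always runs, even if deploy or a tier blew up
    return results
```

The try/finally is the important part: teardown happens exactly once regardless of which tier fails.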

Each assertion file is its own pytest invocation, so the run reports per-operation pass / fail / skip (install / upgrade / backup / restore / custom). The shared live domain is passed in CCCI_APP_DOMAIN and exposed by the live_app fixture; all assertion tiers are assertion-only and never deploy or tear down (that is the orchestrator's job). Op results an assertion needs (pre-upgrade identity, the produced backup snapshot_id) pass op→assertion via a run-scoped JSON state file at $CCCI_OP_STATE_FILE, read by generic.op_state().
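The op→assertion hand-off could look roughly like the sketch below. It assumes a flat JSON object in the state file; record_op_state is a hypothetical writer name, and the real generic.op_state() may differ in signature and error handling.

```python
import json
import os
from pathlib import Path


def record_op_state(path, **values):
    """Orchestrator side: merge op results (e.g. snapshot_id) into the state file."""
    path = Path(path)
    state = json.loads(path.read_text()) if path.exists() else {}
    state.update(values)
    path.write_text(json.dumps(state, indent=2))
    return state


def op_state(path=None):
    """Assertion side: read the run-scoped state; empty dict when nothing was recorded."""
    path = path or os.environ.get("CCCI_OP_STATE_FILE")
    if not path or not Path(path).exists():
        return {}
    return json.loads(Path(path).read_text())
```

A backup assertion would then read op_state().get("snapshot_id") instead of running the backup itself.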

The generic default (recipe-agnostic, the floor — Phase 1e HC3)

Lives in the shared harness — runner/harness/generic.py + tests/_generic/test_<op>.py — so there is no per-recipe copy-paste:

  • install (generic.assert_serving) — services converged (the app's own replicas are N/N) and a real HTTP(S) response in HEALTH_OK (which excludes 404, so a Traefik unmatched-router fallback fails) and the body isn't Traefik's default 404 page. A bounded poll (no bare sleep) so a state-mutating op settles, while a persistent failure still fails within the timeout. A CA-verified TLS handshake also runs as an infra cert sanity check (catches a lapsed/mis-rotated wildcard); it does not distinguish app-vs-fallback (Traefik serves the wildcard zone-wide) — that's the converged + non-404 check.
  • upgrade (generic.assert_upgraded) — assert serving after the orchestrator's chaos upgrade (HC1: abra app deploy --chaos of the PR-head checkout) and that the deployment is genuinely the code under test: when the intended PR-head commit is known, the deployed coop-cloud.<stack>.chaos-version label must match it — direct, non-vacuous proof. (A stale prev-checkout chaos redeploy would stamp prev's commit, not the PR-head, and fail here.) When head_ref is unknown, falls back to a move check (version/image/chaos changed vs pre-upgrade).
  • backup (generic.assert_backup_artifact) — assert a snapshot artifact was produced (the snapshot_id captured by the orchestrator from abra app backup create). Honest limit: the generic verifies the mechanism, not app-specific data integrity (that's an overlay, below).
  • restore (generic.assert_restore_healthy) — assert the app is healthy + serving after the orchestrator's restore op (assert_serving polls so the post-restore reconverge settles).

Backup-capability is auto-detected: a recipe is backup-capable iff a compose*.yml carries a truthy backupbot.backup label (override with BACKUP_CAPABLE in recipe_meta.py). For non-backup-capable recipes the backup/restore tiers are a clean N/A skip — not a failure.
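A rough sketch of that auto-detection, under stated assumptions: it scans compose*.yml as plain text rather than parsing YAML, treats 1/true/yes/on as truthy, and does not resolve ${VAR:-default} interpolation. is_backup_capable is a hypothetical name; the harness may well do this differently.

```python
import re
from pathlib import Path

# Matches both label styles: "backupbot.backup=true" and backupbot.backup: "true".
# It does not match sub-keys such as backupbot.backup.path.
_LABEL = re.compile(r"""backupbot\.backup\s*[=:]\s*["']?([^"'\s#]+)""")
_TRUTHY = {"1", "true", "yes", "on"}  # assumption: same truthy set as the env opt-outs


def is_backup_capable(recipe_dir, override=None):
    """Return the BACKUP_CAPABLE override if given, else scan the compose files."""
    if override is not None:
        return bool(override)
    for compose in sorted(Path(recipe_dir).glob("compose*.yml")):
        for match in _LABEL.finditer(compose.read_text()):
            if match.group(1).lower() in _TRUTHY:
                return True
    return False
```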

Recipe overlays — additive (the generic floor is always on by default)

Convention: a recipe-specific tier is a file named exactly test_install.py / test_upgrade.py / test_backup.py / test_restore.py. When present it runs ALONGSIDE the generic for that op (both evaluate the shared post-op state); when absent, only the generic runs. Overlays are assertion-only — they never perform the op (the orchestrator owns it).

Overlay sources, in precedence order:

repo-local  <recipe-repo>/tests/test_<op>.py     (upstream-authoritative; gated by HC2 allowlist)
  >  cc-ci  tests/<recipe>/test_<op>.py           (CI-curated overlay)
  +  generic tests/_generic/test_<op>.py          (the floor; runs alongside by default)

Only ONE overlay source wins for a given op (repo-local > cc-ci); the generic floor runs in addition unless explicitly opted out.
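The precedence rule can be illustrated with a small resolver. This is a sketch, not runner/harness/discovery.py: resolve_op_tests and its parameters are hypothetical, and two behaviors are assumptions: that an unapproved repo-local overlay falls back to the cc-ci overlay, and that the generic file is listed first.

```python
from pathlib import Path


def resolve_op_tests(op, recipe, repo_dir, ccci_tests_dir, approved, skip_generic=False):
    """Return the test files to run for one op: generic floor plus at most one overlay."""
    repo_local = Path(repo_dir) / "tests" / f"test_{op}.py"
    curated = Path(ccci_tests_dir) / recipe / f"test_{op}.py"
    generic = Path(ccci_tests_dir) / "_generic" / f"test_{op}.py"

    files = []
    if not skip_generic and generic.is_file():
        files.append(generic)  # the floor, on unless explicitly opted out
    if repo_local.is_file() and recipe in approved:
        files.append(repo_local)  # repo-local wins, but only when allowlisted
    elif curated.is_file():
        files.append(curated)  # otherwise the CI-curated overlay, if any
    return files
```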

Custom (non-lifecycle) test_*.py — any other test_*.py (e.g. test_sso.py) is opt-in and additive: it has no generic equivalent and runs only when present, discovered from both locations (repo-local gated by the HC2 allowlist).

Pre-op seed hooks (per-recipe ops.py)

A data-continuity overlay needs to seed state before the op (write a marker, create a DB row, etc.). Since the orchestrator owns the op, overlays place their seed in an optional per-recipe tests/<recipe>/ops.py:

# tests/<recipe>/ops.py
from harness import lifecycle

def pre_upgrade(domain, meta):
    # seed a marker before the harness performs the upgrade
    lifecycle.exec_in_app(domain, ["sh", "-c", "echo upgrade-survives > /path/marker"])

def pre_backup(domain, meta):
    # establish a known "original" state before the backup op captures it
    lifecycle.exec_in_app(domain, ["sh", "-c", "echo original > /path/marker"])

def pre_restore(domain, meta):
    # diverge from the backed-up state so a successful restore is observable
    lifecycle.exec_in_app(domain, ["sh", "-c", "echo mutated > /path/marker"])

The orchestrator imports ops.py in-process (with the recipe dir on sys.path, so it can import sibling helpers like kc_admin.py) and calls pre_<op>(domain, meta) immediately before performing the op. Then test_<op>.py asserts the post-op state. See tests/custom-html/ (volume marker), tests/keycloak/ (admin-API/realm), tests/matrix-synapse/, tests/lasuite-docs/ (psql in the db service) for worked examples.
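The in-process import could be done along these lines. This is a minimal sketch using importlib, not the orchestrator's actual code: run_pre_op_hook is a hypothetical name, and returning a boolean to signal "a hook ran" is an assumption.

```python
import importlib.util
import sys
from pathlib import Path


def run_pre_op_hook(recipe_dir, op, domain, meta):
    """Import <recipe_dir>/ops.py and call pre_<op>(domain, meta) if it exists."""
    recipe_dir = Path(recipe_dir)
    ops_path = recipe_dir / "ops.py"
    if not ops_path.is_file():
        return False  # no ops.py: nothing to seed
    # Put the recipe dir on sys.path so ops.py can import sibling helpers.
    sys.path.insert(0, str(recipe_dir))
    try:
        spec = importlib.util.spec_from_file_location(f"_ccci_ops_{recipe_dir.name}", ops_path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        hook = getattr(module, f"pre_{op}", None)
        if hook is None:
            return False  # ops.py exists but defines no hook for this op
        hook(domain, meta)
        return True
    finally:
        sys.path.remove(str(recipe_dir))
```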

Opting out of the generic floor

The generic runs additively by default. To skip it (e.g. when an overlay's recipe-specific check fully replaces the generic's mechanism check) set, in increasing specificity:

  • env CCCI_SKIP_GENERIC=1 — skip generic for ALL ops (run-wide).
  • env CCCI_SKIP_GENERIC_<OP>=1 — e.g. CCCI_SKIP_GENERIC_UPGRADE=1 — skip generic for that one op.
  • declarative in recipe_meta.py: SKIP_GENERIC = ["upgrade"] (per-op) or SKIP_GENERIC = ["all"].

The declarative opt-out is per-recipe and visible in git, not a hidden global. For the env variables, truthy means 1/true/yes/on.
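Taken together, the three opt-out levels reduce to one decision per op. The sketch below is illustrative: generic_skipped is a hypothetical helper, and case-insensitive matching of the truthy values is an assumption.

```python
import os

_TRUTHY = {"1", "true", "yes", "on"}


def _truthy(value):
    return (value or "").strip().lower() in _TRUTHY


def generic_skipped(op, environ=None, meta_skip=()):
    """Decide whether the generic test for `op` is skipped.

    meta_skip is the recipe's SKIP_GENERIC list from recipe_meta.py.
    """
    environ = os.environ if environ is None else environ
    if _truthy(environ.get("CCCI_SKIP_GENERIC")):
        return True  # run-wide opt-out
    if _truthy(environ.get(f"CCCI_SKIP_GENERIC_{op.upper()}")):
        return True  # per-op env opt-out
    return op in meta_skip or "all" in meta_skip  # declarative per-recipe opt-out
```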

Repo-local trust gate (HC2) — default-deny

PR-author-controlled code (a recipe repo's own tests/test_*.py, install_steps.sh, ops.py) runs on the CI host with /run/secrets/* present — an untrusted-code risk. By default the harness runs only cc-ci-authored overlays/hooks (tests/<recipe>/...) + the generic. Repo-local code is discovered-but-not-executed unless its recipe appears in tests/repo-local-approved.txt (a checked-in, git-auditable allowlist — one recipe name per line; # comments + blank lines ignored; a lone * is NOT a wildcard). To approve a recipe a cc-ci maintainer reviews its repo-local tests and adds the recipe name in a cc-ci PR (override the allowlist location with CCCI_REPO_LOCAL_APPROVED_FILE — used by tests + cold demonstrations).

The gate is centralized in runner/harness/discovery.py (repo_local_approved / _gated) so every discovery function (resolve_overlay_op, custom_tests, install_steps, pre_op_hook) honors it identically; unit tests (tests/unit/test_discovery.py) pin the behavior (approved-vs-not for every kind of code).
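A minimal sketch of the allowlist parsing rules described above (one name per line, comments and blanks ignored, a lone * not a wildcard). load_approved is a hypothetical name and this two-argument repo_local_approved is a simplified stand-in; treating a missing file as "nothing approved" and supporting only whole-line comments are assumptions.

```python
import os
from pathlib import Path


def load_approved(path=None):
    """Return the set of recipe names approved to run repo-local code."""
    path = path or os.environ.get(
        "CCCI_REPO_LOCAL_APPROVED_FILE", "tests/repo-local-approved.txt"
    )
    file = Path(path)
    if not file.is_file():
        return set()  # default-deny: no allowlist means nothing is approved
    approved = set()
    for raw in file.read_text().splitlines():
        line = raw.strip()
        # Skip blanks and comments; a lone "*" is deliberately NOT a wildcard.
        if not line or line.startswith("#") or line == "*":
            continue
        approved.add(line)
    return approved


def repo_local_approved(recipe, approved):
    return recipe in approved
```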

Custom install-steps hook (and the graceful-generic rule)

Some recipes need setup the generic flow won't do (pre-seed content, set an env/secret, run a one-off command). Provide a shell hook — tests/<recipe>/install_steps.sh (cc-ci) or repo-local tests/install_steps.sh (repo-local wins, gated by the HC2 allowlist). The orchestrator runs it during the install tier after abra app new + env defaults, before abra app deploy, with env:

  • CCCI_APP_DOMAIN — the run's app domain
  • CCCI_RECIPE — the recipe name
  • CCCI_APP_ENV — path to the app's .env (for abra-side edits)

Graceful-generic rule: a recipe with no hook still attempts the generic install. A recipe that genuinely needs a step will fail the generic install — and that's the correct, reported outcome (per-op install: fail); the fix is to add the step, not to special-case the harness. Worked example: tests/custom-html-tiny/install_steps.sh seeds an index.html into the static server's content volume — without it the generic install fails 404, with it it passes.
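Invoking the hook with those three variables might look like the sketch below. It is not the orchestrator's code: run_install_steps is a hypothetical name, and running the script through sh (rather than bash or its shebang) is an assumption.

```python
import os
import subprocess
from pathlib import Path


def run_install_steps(hook, domain, recipe, app_env):
    """Run the install-steps hook if present; return whether it ran."""
    hook = Path(hook)
    if not hook.is_file():
        return False  # graceful-generic: no hook, the generic install is still attempted
    env = dict(
        os.environ,
        CCCI_APP_DOMAIN=domain,
        CCCI_RECIPE=recipe,
        CCCI_APP_ENV=str(app_env),
    )
    # check=True turns a failing hook into an exception the install tier can report.
    subprocess.run(["sh", str(hook)], env=env, check=True)
    return True
```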

The HC1 upgrade path — chaos to the PR-head code under test

Concretely, the upgrade tier:

  1. base deployment is the previous published version (a clean pinned-tag deploy).
  2. orchestrator captures head_ref (preferring $REF — the PR head sha; falls back to the recipe checkout HEAD for non-PR !testme).
  3. on the upgrade tier: re-checkout the recipe to head_ref (the prev-tag base deploy reset the working tree), capture the pre-upgrade identity, then abra app deploy --chaos redeploys the running app at that checkout — in place, NOT a new install.
  4. assert_upgraded (generic) asserts serving + that the deployed coop-cloud.<stack>.chaos-version matches head_ref — proving the PR-head code was deployed.

Reconciliation with the deploy-once guard: abra.deploy (chaos) is called directly, not through deploy_app, so _record_deploy() does not fire — deploy-count counts only abra app new installs and stays 1.
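The two-mode check in step 4 can be sketched as a pure function. This is an illustration, not generic.assert_upgraded itself: the dict shape for the deployed and pre-upgrade identity is hypothetical, and comparing the label and head_ref by prefix (to tolerate short versus full SHAs) is an assumption about the label format.

```python
def assert_upgraded(deployed, head_ref=None, before=None):
    """Check that the running deployment is the code under test.

    deployed / before: dicts with optional "version", "image" and
    "chaos_version" keys (hypothetical shape for this sketch).
    """
    if head_ref:
        # Direct proof: the chaos-version label must identify the PR-head commit.
        label = deployed.get("chaos_version") or ""
        matches = bool(label) and (head_ref.startswith(label) or label.startswith(head_ref))
        assert matches, f"chaos-version {label!r} does not match PR head {head_ref!r}"
        return
    # Fallback when head_ref is unknown: something must have moved.
    before = before or {}
    moved = any(
        deployed.get(key) != before.get(key)
        for key in ("version", "image", "chaos_version")
    )
    assert moved, "upgrade did not change version, image or chaos-version"
```

A stale redeploy of the previous checkout would carry the previous commit in the label and fail the first branch.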

How to add a recipe overlay (zero → some coverage)

  1. The recipe is already testable with zero config — enrol it (poll list + mirror) and the generic floor runs (docs/enroll-recipe.md).
  2. To add recipe-specific coverage, drop tests/<recipe>/test_<op>.py (copy an existing one, e.g. tests/custom-html/test_upgrade.py). Assert the POST-op state — reading app state through lifecycle.exec_in_app (volume/DB) for data checks, not HTTP. Generic + your overlay both run.
  3. If the overlay needs to seed PRE-op state (data-continuity markers, the backup→restore divergence), drop tests/<recipe>/ops.py with pre_upgrade/pre_backup/pre_restore(domain, meta).
  4. If the recipe needs install-time setup, add tests/<recipe>/install_steps.sh.
  5. Set per-recipe knobs (health path, timeouts, opt-out) in recipe_meta.py.
  6. Never weaken or skip an assertion to make a run pass — a red tier is information.

Per-recipe config (tests/<recipe>/recipe_meta.py, all optional):

HEALTH_PATH = "/realms/master"   # path that returns a healthy status (default "/")
HEALTH_OK = (200,)               # acceptable status codes (default 200/301/302)
DEPLOY_TIMEOUT = 600             # seconds for services to converge (default 600)
HTTP_TIMEOUT = 600               # seconds for the app to answer (default 300)
BACKUP_CAPABLE = True            # override backup-capability auto-detection (default: scan compose)
EXTRA_ENV = {"KEY": "value"}     # or EXTRA_ENV(domain) -> dict; extra .env keys set at deploy
SKIP_GENERIC = ["upgrade"]       # per-recipe declarative opt-out from generic ops ("all" = every op)
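Since EXTRA_ENV may be either a dict or a callable taking the domain, the consumer has to normalize it. A minimal sketch of that normalization follows; resolve_extra_env is a hypothetical helper, and coercing keys and values to strings is an assumption.

```python
def resolve_extra_env(extra_env, domain):
    """Normalize EXTRA_ENV (None, dict, or callable(domain) -> dict) to a str->str dict."""
    if extra_env is None:
        return {}
    values = extra_env(domain) if callable(extra_env) else extra_env
    return {str(key): str(value) for key, value in dict(values).items()}
```

For example, a recipe could declare EXTRA_ENV as a function that derives a public URL from the run's domain.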

The harness self-tests for discovery / precedence / the HC2 allowlist live in tests/unit/ (run: cc-ci-run -m pytest tests/unit); they are never picked up as overlays/custom tests.