# cc-ci Phase 1d — Generic test suite + layered recipe overlays (Autonomous Build Plan)

**Status:** QUEUED — runs **after Phase 1b** (`plan-phase1b-review-lint.md`) and **before Phase 2** (`plan-phase2-recipe-tests.md`). It is the **test-architecture foundation** Phase 2 builds on, so it must precede it.

**Transition:** **manual** (operator kicks it off at the post-1b check-in).

**Builds on:** the post-1b codebase (the runner/harness, `.drone.yml`, the comment-bridge, the proof recipes, the `nix/` + `machine-docs/` layout from 1b RL5/RL6).

**Owner agents:** same Builder + Adversary loops (`plan.md` §6/§7); Adversary cold-verifies.

**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase1d-generic-test-suite.md`

**Phase order:** 1c → 1b → **1d** → 2 → 2b → 3.

---

## 0. Why this phase

Today a recipe only gets tested if someone has authored tests for it. That doesn't scale to ~18+ recipes and gives nothing for a brand-new recipe. The operator's model (2026-05-27): **every recipe gets a generic lifecycle test suite for free**, and recipe-specific tests *layer on top* rather than being the only thing that runs. This makes `!testme` meaningful on **any** recipe immediately, turns Phase 2 into "add overlays where they add value" instead of "write everything from scratch," and gives Phase 3 a natural basis for a YunoHost-style **level** (how many tiers pass).

Core principle: **the generic is the default for each lifecycle op; a recipe's own `test_.py` may override *or* extend that default.** The exact additive-vs-override mechanism is the **Builder's design call** (operator, 2026-05-27: override — e.g. a present `test_install.py` *replaces* the generic install assertions — is a perfectly good option; additive is fine too; could even be per-recipe opt-in).
What's **fixed** and non-negotiable: (1) when a recipe defines **no** test for an op, **the generic runs** (so any recipe is testable with zero config); (2) the harness owns a **single shared deployment** that generic + recipe assertions reuse — **no redeploy** (§2.2); (3) custom (non-lifecycle) tests are opt-in.

> **SUPERSEDED (2026-05-28) by Phase 1e HC3** (`plan-phase1e-harness-corrections.md`): the override
> default is replaced — the **generic now runs by default *alongside* an overlay (additive)**,
> skipped only via an explicit opt-out. Also: repo-local PR code is gated to approved recipes (HC2),
> and the upgrade tier targets the PR head (HC1). Read 1e for the current behavior.

---

## 1. Definition of Done (Phase 1d exit condition)

Terminates when every item holds **and the Adversary has independently cold-verified** (logged in `machine-docs/REVIEW-1d.md`):

- [ ] **DG1 — Generic INSTALL test.** A recipe-agnostic install test exists that, given only a recipe name and **zero** recipe-specific config, does `abra app new` (sane defaults / auto secrets) → `deploy` → polls to converged (all services running/healthy, no bare `sleep`) → asserts the app is **actually serving** (real HTTP(S) response on its `.ci.commoninternet.net` domain through Traefik — **not** a Traefik 404/default cert, not health-only). Demonstrated **green** on ≥1 simple recipe (e.g. `custom-html`) that has **no** cc-ci or repo-local tests.
- [ ] **DG2 — Generic UPGRADE test.** Deploy the **previous published version** (the last release *before* the code under test), then **upgrade to the code under test (PR head) via `abra app deploy --chaos`** (chaos = the current checkout) — i.e. previous-release → PR-head, not previous → newest-published-tag. Assert services reconverge and the app still serves.
  **OPERATOR CORRECTION (2026-05-28):** the current 1d impl upgrades to the newest *published tag* and (because deploying the prev tag re-checks-out the recipe) never deploys the PR head — so a recipe PR's actual changes aren't exercised by the upgrade path. Fix: after deploying prev, **restore the PR-head checkout** (re-checkout the PR ref / re-snapshot) and `deploy --chaos` to it as the upgrade target. The "deployment actually moved" assertion (`do_upgrade`) still applies, but adapt it for prev→PR-head (a PR may not bump the version label — also accept an image/config change, or assert the running config now matches the PR head). For a non-PR `!testme`, "current checkout" = the catalogue current, so upgrade tests prev→current. (Data-continuity assertions remain recipe overlays — see §2.1.)
- [ ] **DG3 — Generic BACKUP + RESTORE tests.** For backup-capable recipes (declare backupbot / `abra app backup` support): run backup → assert a snapshot artifact is produced; then restore → assert restore completes and the app is healthy after. For recipes that declare **no** backup config, backup/restore are cleanly **N/A (skipped)** — *not* a failure.
- [ ] **DG4 — Layering (override-or-extend; generic is the default).** The harness composes a recipe's run from the generic default per op, **overridden or extended** by the recipe's `test_install.py` / `test_upgrade.py` / `test_backup.py` / `test_restore.py` when present (in cc-ci's tests dir **or** repo-local in the recipe's `tests/`) — the additive-vs-override mechanism is the Builder's design choice, recorded in `machine-docs/DECISIONS.md`. **Invariant: if a recipe defines no test for an op, the generic runs.** Arbitrary **custom** `test_*.py` run **only if defined**. Discovery + cc-ci-vs-repo-local precedence is implemented and settled in `machine-docs/DECISIONS.md`.
- [ ] **DG4.1 — Overlays reuse the deployment (no redeploy).** Overlay + custom assertions run against the **same live deployment** the generic tier brought up — **one deploy per run, one teardown** (§2.2), lifecycle ops mutating it in place. Verified: adding an overlay to a recipe causes **no** extra `abra app new/deploy/undeploy` beyond the shared run (assert via deploy-count / harness logs). Re-provisioning only where an op semantically demands it, and then explicitly.
- [ ] **DG5 — Custom install-steps hook (with the "generic-anyway" rule).** The harness supports **defined** per-recipe extra install steps (cc-ci or repo-local) that run before/around the generic install. A recipe with **no** customization still **attempts the generic suite**. Proven both ways: (a) a recipe needing a custom step **fails the generic install as expected** when the step is absent; (b) the **same** recipe **passes** once the custom step is added — demonstrating the hook + the graceful-generic-failure are both real.
- [ ] **DG6 — `!testme` end-to-end on an unconfigured recipe.** A `!testme` on a recipe with no cc-ci/repo-local customization runs the full generic suite through the real pipeline (bridge → Drone → deploy → assert → undeploy → report) and reports **per-operation** pass/fail/skip (install/upgrade/backup/restore).
- [ ] **DG7 — Real, DRY, clean.** No softened/`skip`/`xfail`/can't-fail assertions; generic logic lives in the **shared harness** (M6.5 — no per-recipe copy-paste); every run **undeploys** (teardown in `finally`), respects `MAX_TESTS`.
- [ ] **DG8 — Documented + cold-verified.** `docs/` explains the generic suite, the overlay convention (file names + locations + precedence), and the custom-install-steps hook + how to add a recipe overlay. The Adversary re-runs the acceptance checks from a cold start within 24h.

When DG1–DG8 hold and are confirmed, write `## DONE` to `machine-docs/STATUS-1d.md`.

---

## 2. The layered test model (the core design)

For a given recipe, the harness assembles and runs **tiers**, each = `generic [+ overlays]`:
(`gen` = generic default; `→` = "overridden/extended by, if the recipe defines it" — mechanism is the Builder's call, §2.2):

```
INSTALL  = gen_install  → test_install.py  (else gen)   ← always runs
UPGRADE  = gen_upgrade  → test_upgrade.py  (else gen)   ← always runs
BACKUP   = gen_backup   → test_backup.py   (else gen)   ← if backup-capable, else N/A
RESTORE  = gen_restore  → test_restore.py  (else gen)   ← if backup-capable, else N/A
CUSTOM   = recipe test_*.py (anything beyond the four)  ← ONLY if defined
```

### 2.1 Generic baseline suite (recipe-agnostic)

- **install** — `abra app new --domain .ci.commoninternet.net` with non-interactive defaults + auto-generated secrets; `abra app deploy`; poll to converged; assert a real HTTP(S) response from the app over its domain (status + that it's the app, not Traefik's fallback).
- **upgrade** — deploy a prior/pinned version, `abra app upgrade` to target, assert reconverge + still serving.
- **backup / restore** — only for recipes declaring backup config; verify the **mechanism** (backup produces an artifact; restore completes + app healthy). **Honest limit:** generic backup/restore can't assert app-specific *data integrity* without recipe knowledge — that's a recipe overlay (`test_backup.py`/`test_restore.py` seed a marker + assert it survives). State this in docs.

### 2.2 Layering — override or extend (Builder's call), always **reuse the deployment**

A recipe-defined `test_install.py` (etc.) either **overrides** the generic assertions for that op or **extends** them — the Builder picks the mechanism (a present `test_.py` replacing the generic is a fine, simple model; or additive; or per-recipe opt-in). The invariant either way: **if a recipe defines nothing for an op, the generic runs.** This guarantees every recipe is testable with zero config, while letting a recipe with a poor generic fit supply its own.
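The composition rule above can be sketched as a small planning function. This is only an illustration under stated assumptions: `plan_tiers`, `Mode`, and the `generic_<op>` labels are hypothetical names, not the harness's actual API, and the choice between the two modes remains the decision recorded in `machine-docs/DECISIONS.md`.

```python
from enum import Enum

# The four lifecycle ops that always have a generic default.
LIFECYCLE_OPS = ("install", "upgrade", "backup", "restore")


class Mode(Enum):
    """Hypothetical switch for the override-vs-extend decision."""
    OVERRIDE = "override"  # a recipe test replaces the generic for that op
    EXTEND = "extend"      # a recipe test runs in addition to the generic


def plan_tiers(recipe_tests, backup_capable, mode):
    """Return, per tier, the ordered list of test sources to run.

    recipe_tests: file names the recipe defines (e.g. {"test_install.py"}).
    An empty list for backup/restore means N/A (skipped, not failed).
    """
    recipe_tests = set(recipe_tests)
    plan = {}
    for op in LIFECYCLE_OPS:
        if op in ("backup", "restore") and not backup_capable:
            plan[op] = []  # N/A for recipes without backup config
            continue
        overlay = f"test_{op}.py"
        if overlay not in recipe_tests:
            # Invariant: no recipe test for an op => the generic runs.
            plan[op] = [f"generic_{op}"]
        elif mode is Mode.OVERRIDE:
            plan[op] = [overlay]
        else:
            plan[op] = [f"generic_{op}", overlay]
    lifecycle_files = {f"test_{op}.py" for op in LIFECYCLE_OPS}
    # Custom tests are opt-in: they appear only if the recipe defines them.
    plan["custom"] = sorted(
        name for name in recipe_tests
        if name.startswith("test_") and name.endswith(".py")
        and name not in lifecycle_files
    )
    return plan
```

For a recipe that defines nothing, every applicable tier resolves to its generic default, which is the zero-config guarantee; only the branch taken when an overlay exists differs between the two modes.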
**Reuse the deployment — do NOT redeploy per test (operator requirement, 2026-05-27).** Overlay assertions run against the **same live deployment** the generic tier already brought up — no extra `abra app new`/`deploy`/`undeploy` per overlay. The target shape: **one deploy per run**, then the lifecycle ops mutate that single deployment in sequence and *all* assertions (generic + overlays + custom) run against it, then **one teardown** at the end:

```
deploy ONCE
  → INSTALL assertions: generic_install + test_install.py   (same live app)
  → UPGRADE in place (abra app upgrade)
      assertions: generic_upgrade + test_upgrade.py         (same app, upgraded)
  → BACKUP  (if capable) → generic_backup  + test_backup.py
  → RESTORE (if capable) → generic_restore + test_restore.py
  → CUSTOM test_*.py                                        (same live app)
teardown ONCE (in finally)
```

So a seed/marker written by `test_backup.py` is the same data `test_restore.py` checks; an overlay's extra HTTP assertion hits the app the generic install already deployed. Tiers that intentionally change state (UPGRADE, RESTORE) do so on the shared deployment in order. The only time a tier re-provisions is when an op semantically requires it (e.g. a from-scratch restore-into-blank variant) — and that must be explicit, not the default. This is also the main Phase-2b speed win.

### 2.3 Custom tests

`test_*.py` that aren't one of the four lifecycle names run **only when present** for that recipe — no generic equivalent, purely opt-in (e.g. `test_sso.py`, `test_federation.py`).

### 2.4 Custom install steps (and the graceful-generic rule)

Some recipes need extra setup the generic flow won't do (pre-seed a DB, set an env/secret, run a one-off command). The harness exposes a **defined** per-recipe install-steps hook (cc-ci or repo-local). Rules:

- If a recipe **declares** custom install steps → run them as part of the install tier.
- If a recipe has **no** customization defined anywhere → **still attempt the generic suite.** Recipes that genuinely need special steps will **fail the generic install — and that's acceptable and expected**; the failure is reported (per-op), and the fix is to add the custom step (Phase 2 work), not to special-case the harness.

### 2.5 Discovery + precedence

- **Locations:** cc-ci's test dir (e.g. `tests//`) and the recipe repo's `tests/` (repo-local). The harness discovers overlays + custom-install-steps from both.
- **Precedence (OPEN — settle in DECISIONS):** proposed default — **both layer**; repo-local is the upstream-authoritative source and always runs, cc-ci's overlay runs in addition (and may pin/extend for the CI env). Define the rule for same-named collisions explicitly.

---

## 3. Milestones (bounded)

- **G0 — Generic install.** Implement generic_install in the shared harness; green on `custom-html` with no recipe config. *Accept:* DG1.
- **G1 — Generic upgrade + backup/restore.** Add generic_upgrade; add generic_backup/restore gated on backup-capability (clean N/A otherwise). *Accept:* DG2, DG3.
- **G2 — Layering + discovery.** Implement the generic+overlay composition and cc-ci/repo-local discovery + precedence; prove an overlay runs on top of generic. *Accept:* DG4.
- **G3 — Custom install-steps hook + graceful-generic.** Implement the hook; demonstrate the fail-without / pass-with proof on one recipe needing a step. *Accept:* DG5.
- **G4 — `!testme` integration + per-op reporting + docs + cold verify.** *Accept:* DG6, DG7, DG8; then flip `machine-docs/STATUS-1d.md` to `## DONE`.

---

## 4. Guardrails

- **Generic is the default; recipe tests override or extend it** (Builder's mechanism) — but the generic **always** runs when a recipe defines no test for that op. Never let a recipe end up with *no* assertion for an op it should be tested on.
- **Generic failure ≠ harness bug** — a recipe needing custom steps failing the generic install is a correct, reported outcome; fix by adding the step (Phase 2), don't weaken/special-case the generic.
- **Deploy once, reuse it** — overlays run against the generic tier's live deployment; no per-test/per-overlay redeploy. One deploy + one teardown per run; re-provision only when an op truly needs it, explicitly. (Correctness *and* the main perf lever.)
- **DRY** — generic logic in the shared harness, not per-recipe (M6.5); overlays are thin.
- **No weakened tests** — real assertions on real app state; teardown always; honor `MAX_TESTS`.
- **Bounded** — build the architecture + prove it on a couple of recipes; the full per-recipe overlay authoring is Phase 2, not here.

---

## 5. Impact on later phases (reshapes the plan set)

- **Phase 2 (`plan-phase2-recipe-tests.md`)** changes from "author every test from scratch" to: every enrolled recipe gets the generic suite for free; Phase 2 = **author the additive overlays** (port recipe-maintainer tests as `test_*.py` overlays) **+ define custom install steps** where a recipe fails generically. Update Phase 2 to reference 1d as its foundation.
- **Phase 3 (`plan-phase3-results-ux.md`)** — the YunoHost-style **level** maps cleanly onto the tiers: e.g. installs (generic) → +upgrade → +backup/restore → +custom assertions. The level is derived from which tiers pass.
- **Phase 2b (perf)** — the generic suite is the common hot path, so it's the prime target for the image-cache / readiness / dedup optimizations.

---

## 6. Open decisions (log in machine-docs/DECISIONS.md)

- **Override vs extend (Builder's call).** Does a present `test_.py` **replace** the generic assertions for that op, **add** to them, or is it **per-recipe opt-in**? Operator leans: override is a good, simple model. Builder decides + documents — keeping the invariant "no recipe test for an op ⇒ generic runs" and the single-shared-deployment rule.
- **cc-ci vs repo-local precedence** for overlays + install-steps (§2.5) and same-name collision rule.
- Exact **overlay file convention**: fixed names (`test_install.py`…) + discovery dir layout (`tests//` in cc-ci; `tests/` in the recipe repo).
- How **custom install steps** are declared: a shell hook (`install_steps.sh`), a pytest fixture, or a declarative field — pick the simplest that the harness can run uniformly.
- **Backup-capability detection**: how the harness decides a recipe is backup-capable (backupbot labels present / `abra app backup` exit) to choose run-vs-N/A for DG3.
- Whether generic **upgrade** should always go previous→latest, or test the specific version-bump under `!testme` (PR-driven).
- Per-op result vocabulary (`pass`/`fail`/`skip(N/A)`/`error`) feeding the Phase-3 level.
- **Deployment-sharing scope**: confirm one-deploy-per-run for the whole lifecycle (install→upgrade→backup→restore→custom on a single deployment) vs per-tier deployments; and how a failed earlier tier (e.g. install) affects later tiers sharing that deployment (fail-fast vs continue-and-report).
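
To make the last two open decisions concrete, here is a minimal sketch of a run loop that deploys once, records a per-op result from the proposed vocabulary, and tears down in `finally`. Everything named here (`run_lifecycle`, `Result`, the `fail_fast` flag, the injected `deploy`/`teardown` callables) is hypothetical and stands in for whatever the harness actually provides; mapping `AssertionError` to `fail` and any other exception to `error` is likewise an assumption, not a settled rule.

```python
from enum import Enum


class Result(Enum):
    """Per-op result vocabulary as proposed in the open decisions."""
    PASS = "pass"
    FAIL = "fail"
    SKIP = "skip(N/A)"
    ERROR = "error"


def run_lifecycle(deploy, teardown, tiers, fail_fast=False):
    """Run all tiers against ONE shared deployment.

    deploy / teardown: zero-argument callables (assumed to wrap abra).
    tiers: ordered (name, check) pairs; check is a zero-argument callable,
           or None when the tier is N/A for this recipe.
    Returns {tier name: Result} for every tier that was attempted.
    """
    results = {}
    try:
        deploy()  # one deploy for the whole run
        for name, check in tiers:
            if check is None:
                results[name] = Result.SKIP
                continue
            try:
                check()
                results[name] = Result.PASS
            except AssertionError:
                results[name] = Result.FAIL   # a real assertion did not hold
            except Exception:
                results[name] = Result.ERROR  # the tier itself broke
            if fail_fast and results[name] is not Result.PASS:
                break  # later tiers are not attempted on a bad deployment
    finally:
        teardown()  # one teardown, even if deploy or a tier raised
    return results
```

With `fail_fast=False` the loop is the continue-and-report variant: a failed install still lets later tiers run and report against the same deployment. With `fail_fast=True` the tiers after the first non-pass are simply absent from the result, which a reporter would need to render explicitly.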