cc-ci-orchestrator/cc-ci-plan/plan-phase1d-generic-test-suite.md

# cc-ci Phase 1d — Generic test suite + layered recipe overlays (Autonomous Build Plan)

**Status:** QUEUED — runs **after Phase 1b** (`plan-phase1b-review-lint.md`) and **before Phase 2**
(`plan-phase2-recipe-tests.md`). It is the **test-architecture foundation** Phase 2 builds on, so it
must precede it.
**Transition:** **manual** (operator kicks it off at the post-1b check-in).
**Builds on:** the post-1b codebase (the runner/harness, `.drone.yml`, the comment-bridge, the proof
recipes, the `nix/` + `machine-docs/` layout from 1b RL5/RL6).
**Owner agents:** same Builder + Adversary loops (`plan.md` §6/§7); Adversary cold-verifies.
**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase1d-generic-test-suite.md`
**Phase order:** 1c → 1b → **1d** → 2 → 2b → 3.

---

## 0. Why this phase

Today a recipe only gets tested if someone has authored tests for it. That doesn't scale to ~18+
recipes and gives nothing for a brand-new recipe. The operator's model (2026-05-27): **every recipe
gets a generic lifecycle test suite for free**, and recipe-specific tests *layer on top* rather than
being the only thing that runs. This makes `!testme` meaningful on **any** recipe immediately, turns
Phase 2 into "add overlays where they add value" instead of "write everything from scratch," and
gives Phase 3 a natural basis for a YunoHost-style **level** (how many tiers pass).

Core principle: **the generic is the default for each lifecycle op; a recipe's own `test_<op>.py`
may override *or* extend that default.** The exact additive-vs-override mechanism is the **Builder's
design call** (operator, 2026-05-27: override — e.g. a present `test_install.py` *replaces* the
generic install assertions — is a perfectly good option; additive is fine too; could even be
per-recipe opt-in). What's **fixed** and non-negotiable: (1) when a recipe defines **no** test for an
op, **the generic runs** (so any recipe is testable with zero config); (2) the harness owns a
**single shared deployment** that generic + recipe assertions reuse — **no redeploy** (§2.2); (3)
custom (non-lifecycle) tests are opt-in.

> **SUPERSEDED (2026-05-28) by Phase 1e HC3** (`plan-phase1e-harness-corrections.md`): the override
> default is replaced — the **generic now runs by default *alongside* an overlay (additive)**,
> skipped only via an explicit opt-out. Also: repo-local PR code is gated to approved recipes (HC2),
> and the upgrade tier targets the PR head (HC1). Read 1e for the current behavior.

---

## 1. Definition of Done (Phase 1d exit condition)

Terminates when every item holds **and the Adversary has independently cold-verified** (logged in
`machine-docs/REVIEW-1d.md`):

- [ ] **DG1 — Generic INSTALL test.** A recipe-agnostic install test exists that, given only a
      recipe name and **zero** recipe-specific config, does `abra app new` (sane defaults / auto
      secrets) → `deploy` → polls until converged (all services running/healthy, no bare `sleep`) →
      asserts the app is **actually serving** (real HTTP(S) response on its
      `<run>.ci.commoninternet.net` domain through Traefik — **not** a Traefik 404/default cert, not
      health-only). Demonstrated **green** on ≥1 simple recipe (e.g. `custom-html`) that has **no**
      cc-ci or repo-local tests.
- [ ] **DG2 — Generic UPGRADE test.** Deploy the **previous published version** (the last release
      *before* the code under test), then **upgrade to the code under test (PR head) via
      `abra app deploy --chaos`** (chaos = the current checkout) — i.e. previous-release → PR-head,
      not previous → newest-published-tag. Assert services reconverge and the app still serves.
      **OPERATOR CORRECTION (2026-05-28):** the current 1d impl upgrades to the newest *published tag*
      and (because deploying the prev tag re-checks-out the recipe) never deploys the PR head — so a
      recipe PR's actual changes aren't exercised by the upgrade path. Fix: after deploying prev,
      **restore the PR-head checkout** (re-checkout the PR ref / re-snapshot) and `deploy --chaos` to
      it as the upgrade target. The "deployment actually moved" assertion (`do_upgrade`) still
      applies, but adapt it for prev→PR-head (a PR may not bump the version label — also accept an
      image/config change, or assert the running config now matches the PR head). For a non-PR
      `!testme`, "current checkout" = the catalogue current, so upgrade tests prev→current. (Data-
      continuity assertions remain recipe overlays — see §2.1.)
- [ ] **DG3 — Generic BACKUP + RESTORE tests.** For backup-capable recipes (declare backupbot /
      `abra app backup` support): run backup → assert a snapshot artifact is produced; then restore →
      assert restore completes and the app is healthy after. For recipes that declare **no** backup
      config, backup/restore are cleanly **N/A (skipped)** — *not* a failure.
- [ ] **DG4 — Layering (override-or-extend; generic is the default).** The harness composes a
      recipe's run from the generic default per op, **overridden or extended** by the recipe's
      `test_install.py` / `test_upgrade.py` / `test_backup.py` / `test_restore.py` when present (in
      cc-ci's tests dir **or** repo-local in the recipe's `tests/`) — the additive-vs-override
      mechanism is the Builder's design choice, recorded in `machine-docs/DECISIONS.md`. **Invariant:
      if a recipe defines no test for an op, the generic runs.** Arbitrary **custom** `test_*.py` run
      **only if defined**. Discovery + cc-ci-vs-repo-local precedence is
      implemented and settled in `machine-docs/DECISIONS.md`.
- [ ] **DG4.1 — Overlays reuse the deployment (no redeploy).** Overlay + custom assertions run
      against the **same live deployment** the generic tier brought up — **one deploy per run, one
      teardown** (§2.2), lifecycle ops mutating it in place. Verified: adding an overlay to a recipe
      causes **no** extra `abra app new/deploy/undeploy` beyond the shared run (assert via
      deploy-count / harness logs). Re-provisioning only where an op semantically demands it, and
      then explicitly.
- [ ] **DG5 — Custom install-steps hook (with the "generic-anyway" rule).** The harness supports
      **defined** per-recipe extra install steps (cc-ci or repo-local) that run before/around the
      generic install. A recipe with **no** customization still **attempts the generic suite**.
      Proven both ways: (a) a recipe needing a custom step **fails the generic install as expected**
      when the step is absent; (b) the **same** recipe **passes** once the custom step is added —
      demonstrating the hook + the graceful-generic-failure are both real.
- [ ] **DG6 — `!testme` end-to-end on an unconfigured recipe.** A `!testme` on a recipe with no
      cc-ci/repo-local customization runs the full generic suite through the real pipeline
      (bridge → Drone → deploy → assert → undeploy → report) and reports **per-operation**
      pass/fail/skip (install/upgrade/backup/restore).
- [ ] **DG7 — Real, DRY, clean.** No softened/`skip`/`xfail`/can't-fail assertions; generic logic
      lives in the **shared harness** (M6.5 — no per-recipe copy-paste); every run **undeploys**
      (teardown in `finally`) and respects `MAX_TESTS`.
- [ ] **DG8 — Documented + cold-verified.** `docs/` explains the generic suite, the overlay
      convention (file names + locations + precedence), and the custom-install-steps hook + how to
      add a recipe overlay. The Adversary re-runs the acceptance checks from a cold start within 24h.

When DG1–DG8 hold and are confirmed, write `## DONE` to `machine-docs/STATUS-1d.md`.
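
DG2's adapted "deployment actually moved" check could look like the sketch below. It is not the existing `do_upgrade`: `DeploySnapshot`, its fields, and `upgrade_reached_target` are hypothetical names, and how the harness captures a snapshot (version label, running images, a digest of the rendered config) is left open. It encodes one reading of the correction. The running config must match the PR head, and *something* must have changed (version label, an image, or the config), so a PR that doesn't bump the version label can still pass.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DeploySnapshot:
    """What the harness observed about the running app at one moment (hypothetical shape)."""
    version_label: Optional[str]  # recipe version label on the stack, if any
    images: Dict[str, str]        # service name -> image reference actually running
    config_digest: str            # digest of the rendered compose/env actually deployed


def upgrade_reached_target(before: DeploySnapshot, after: DeploySnapshot,
                           target_digest: str) -> bool:
    """True if the upgrade landed on the code under test and the deployment really moved."""
    # Must end on the PR head (or the catalogue current for a non-PR !testme).
    if after.config_digest != target_digest:
        return False
    # A PR may not bump the version label, so an image or config change also counts as movement.
    return (before.version_label != after.version_label
            or before.images != after.images
            or before.config_digest != after.config_digest)
```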

---

## 2. The layered test model (the core design)

For a given recipe, the harness assembles and runs **tiers**, each = `generic [+ overlays]`:

(`gen` = generic default; `→` = "overridden/extended by, if the recipe defines it" — mechanism is
the Builder's call, §2.2):
```
INSTALL   = gen_install    → test_install.py    (else gen)    ← always runs
UPGRADE   = gen_upgrade    → test_upgrade.py    (else gen)    ← always runs
BACKUP    = gen_backup     → test_backup.py     (else gen)    ← if backup-capable, else N/A
RESTORE   = gen_restore    → test_restore.py    (else gen)    ← if backup-capable, else N/A
CUSTOM    = recipe test_*.py   (anything beyond the four)     ← ONLY if defined
```
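
The per-op rule in the table is small enough to state as code. This is a minimal sketch, assuming the harness has already discovered which lifecycle overlay files exist and which custom tests are defined, and knows whether the recipe is backup-capable. `plan_tiers` and its return values are hypothetical. Whether an overlay overrides or extends the generic is deliberately left undecided here (§2.2).

```python
from __future__ import annotations

from typing import Dict, Iterable, Set

LIFECYCLE_OPS = ("install", "upgrade", "backup", "restore")
BACKUP_OPS = {"backup", "restore"}


def plan_tiers(overlay_files: Set[str], custom_files: Iterable[str],
               backup_capable: bool) -> Dict[str, str]:
    """Map each tier to what runs: 'generic', the overlay file name, 'n/a', or 'custom'."""
    plan: Dict[str, str] = {}
    for op in LIFECYCLE_OPS:
        overlay = f"test_{op}.py"
        if op in BACKUP_OPS and not backup_capable:
            plan[op] = "n/a"          # clean skip, not a failure (DG3)
        elif overlay in overlay_files:
            plan[op] = overlay        # generic overridden/extended by the recipe's file
        else:
            plan[op] = "generic"      # invariant: nothing defined -> generic runs
    for name in custom_files:
        plan[name] = "custom"         # custom tests run only because they exist
    return plan
```

For a recipe with nothing defined and no backup config, `plan_tiers(set(), [], False)` returns `{"install": "generic", "upgrade": "generic", "backup": "n/a", "restore": "n/a"}`, which is the "testable with zero config" case.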

### 2.1 Generic baseline suite (recipe-agnostic)
- **install** — `abra app new <recipe> --domain <run>.ci.commoninternet.net` with non-interactive
  defaults + auto-generated secrets; `abra app deploy`; poll until converged; assert a real HTTP(S)
  response from the app over its domain (status + that it's the app, not Traefik's fallback).
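- **upgrade** — deploy the previous published version, then upgrade it in place to the code under
  test (the PR head via `abra app deploy --chaos`, or the catalogue current for a non-PR `!testme`;
  see DG2); assert reconverge + still serving.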
- **backup / restore** — only for recipes declaring backup config; verify the **mechanism** (backup
  produces an artifact; restore completes + app healthy). **Honest limit:** generic backup/restore
  can't assert app-specific *data integrity* without recipe knowledge — that's a recipe overlay
  (`test_backup.py`/`test_restore.py` seed a marker + assert it survives). State this in docs.
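
The install assertion ("a real response from the app, not Traefik's fallback", reached by polling rather than a bare `sleep`) can be sketched as a bounded poll. Assumptions: the fetcher is injected and hypothetical (a `urllib`-based one would need to turn `HTTPError` into a `(status, body)` pair, since `urllib` raises on 4xx/5xx); any status below 400 counts as the app answering, because many apps redirect to a login page; and Traefik's no-router response is recognised by the body `404 page not found`, which is Go's `http.NotFound` text. Ruling out Traefik's default TLS certificate would be a separate check.

```python
from __future__ import annotations

import time
from typing import Callable, Tuple

# Returns (HTTP status, body text) for a URL; raises OSError while the app is unreachable.
Fetcher = Callable[[str], Tuple[int, str]]

# Body Traefik sends when no router matches the Host (Go's http.NotFound text).
TRAEFIK_FALLBACK_BODY = "404 page not found"


def is_app_response(status: int, body: str) -> bool:
    """Reject Traefik's catch-all 404; accept success or redirect statuses from the app."""
    if status == 404 and body.strip() == TRAEFIK_FALLBACK_BODY:
        return False
    return 200 <= status < 400


def wait_until_serving(url: str, fetch: Fetcher, timeout_s: float = 300.0,
                       interval_s: float = 5.0,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the app itself answers, or give up at the deadline."""
    deadline = clock() + timeout_s
    while True:
        try:
            status, body = fetch(url)
            if is_app_response(status, body):
                return True
        except OSError:
            pass  # connection refused / TLS not ready yet: keep polling
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The sleep here only paces a condition-checked poll with a hard deadline; it never stands in for the readiness check itself.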

### 2.2 Layering — override or extend (Builder's call), always **reuse the deployment**
A recipe-defined `test_install.py` (etc.) either **overrides** the generic assertions for that op or
**extends** them — the Builder picks the mechanism (a present `test_<op>.py` replacing the generic is
a fine, simple model; or additive; or per-recipe opt-in). The invariant either way: **if a recipe
defines nothing for an op, the generic runs.** This guarantees every recipe is testable with zero
config, while letting a recipe with a poor generic fit supply its own.
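
The override-or-extend choice, with the invariant built in, fits in one function. This is a minimal sketch assuming assertions are plain callables that raise `AssertionError` on failure. `LayerMode` and `checks_for_op` are hypothetical names, and `mode` could equally be a harness-wide setting or a per-recipe opt-in. Treating an empty overlay like a missing one keeps an op from ever running with zero assertions.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable, List, Optional

Check = Callable[[], None]  # one assertion; raises AssertionError on failure


class LayerMode(Enum):
    OVERRIDE = "override"  # a present test_<op>.py replaces the generic checks
    EXTEND = "extend"      # generic checks run first, then the recipe's


def checks_for_op(generic: List[Check], recipe: Optional[List[Check]],
                  mode: LayerMode) -> List[Check]:
    """Compose the assertions for one lifecycle op."""
    if not recipe:
        return list(generic)  # invariant: recipe defines nothing for this op -> generic runs
    if mode is LayerMode.OVERRIDE:
        return list(recipe)
    return list(generic) + list(recipe)
```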

**Reuse the deployment — do NOT redeploy per test (operator requirement, 2026-05-27).** Overlay
assertions run against the **same live deployment** the generic tier already brought up — no extra
`abra app new`/`deploy`/`undeploy` per overlay. The target shape: **one deploy per run**, then the
lifecycle ops mutate that single deployment in sequence and *all* assertions (generic + overlays +
custom) run against it, then **one teardown** at the end:

```
deploy ONCE
  → INSTALL  assertions: generic_install  + test_install.py        (same live app)
  → UPGRADE  in place (abra app deploy --chaos to the PR head, DG2)
       assertions: generic_upgrade + test_upgrade.py               (same app, upgraded)
  → BACKUP   (if capable) → generic_backup + test_backup.py
  → RESTORE  (if capable) → generic_restore + test_restore.py
  → CUSTOM   test_*.py                                             (same live app)
teardown ONCE (in finally)
```

So a seed/marker written by `test_backup.py` is the same data `test_restore.py` checks; an overlay's
extra HTTP assertion hits the app the generic install already deployed. Tiers that intentionally
change state (UPGRADE, RESTORE) do so on the shared deployment in order. The only time a tier
re-provisions is when an op semantically requires it (e.g. a from-scratch restore-into-blank variant)
— and that must be explicit, not the default. This is also the main Phase-2b speed win.
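
A minimal sketch of the "deploy once, teardown once" shape follows, assuming the harness runs `abra` through an injected `run` callable (so tests can fake it) and records provisioning verbs for DG4.1's deploy-count check. `shared_deployment`, `DeployLog`, and `Runner` are hypothetical. The `abra app new/deploy/undeploy` invocations mirror the commands named in this plan; any extra flags a real run needs are omitted.

```python
from __future__ import annotations

import contextlib
from dataclasses import dataclass, field
from typing import Callable, Iterator, List

Runner = Callable[[List[str]], None]  # e.g. lambda argv: subprocess.run(argv, check=True)


@dataclass
class DeployLog:
    """Provisioning verbs issued by the shared deployment, so DG4.1 can assert on counts."""
    verbs: List[str] = field(default_factory=list)

    def count(self, verb: str) -> int:
        return self.verbs.count(verb)


@contextlib.contextmanager
def shared_deployment(recipe: str, domain: str, run: Runner,
                      log: DeployLog) -> Iterator[str]:
    """Create and deploy the app once, hand its domain to every tier, undeploy once."""
    log.verbs.append("new")
    run(["abra", "app", "new", recipe, "--domain", domain])
    try:
        log.verbs.append("deploy")
        run(["abra", "app", "deploy", domain])
        yield domain  # install, upgrade, backup, restore and custom tiers all use this app
    finally:
        log.verbs.append("undeploy")  # teardown always runs, even if a tier raised
        run(["abra", "app", "undeploy", domain])
```

A run would open one `with shared_deployment(...) as d:` block and call each tier with `d` in order; afterwards `log.count("deploy") == 1` is the kind of deploy-count assertion DG4.1 asks for. The upgrade tier's own in-place deploy is an expected lifecycle step and is not recorded by this log.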

### 2.3 Custom tests
`test_*.py` that aren't one of the four lifecycle names run **only when present** for that recipe —
no generic equivalent, purely opt-in (e.g. `test_sso.py`, `test_federation.py`).
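
Telling custom tests apart from lifecycle overlays is a filename check. Below is a minimal sketch with a hypothetical `classify_tests`, assuming one directory per location (cc-ci's `tests/<recipe>/` or the recipe repo's `tests/`) and the four fixed lifecycle names. A missing directory simply yields no overlays and no custom tests.

```python
from __future__ import annotations

from pathlib import Path
from typing import Dict, List, Tuple

LIFECYCLE_FILES = {"test_install.py", "test_upgrade.py", "test_backup.py", "test_restore.py"}


def classify_tests(test_dir: Path) -> Tuple[Dict[str, Path], List[Path]]:
    """Split test_*.py files into lifecycle overlays (by name) and opt-in custom tests."""
    overlays: Dict[str, Path] = {}
    custom: List[Path] = []
    if not test_dir.is_dir():
        return overlays, custom  # no tests dir: generic-only run
    for path in sorted(test_dir.glob("test_*.py")):
        if path.name in LIFECYCLE_FILES:
            overlays[path.name] = path
        else:
            custom.append(path)  # e.g. test_sso.py, test_federation.py
    return overlays, custom
```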

### 2.4 Custom install steps (and the graceful-generic rule)
Some recipes need extra setup the generic flow won't do (pre-seed a DB, set an env/secret, run a
one-off command). The harness exposes a **defined** per-recipe install-steps hook (cc-ci or
repo-local). Rules:
- If a recipe **declares** custom install steps → run them as part of the install tier.
- If a recipe has **no** customization defined anywhere → **still attempt the generic suite.**
  Recipes that genuinely need special steps will **fail the generic install — and that's acceptable
  and expected**; the failure is reported (per-op), and the fix is to add the custom step (Phase 2
  work), not to special-case the harness.
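
One way to make the hook concrete assumes two hook points: after `abra app new`, where an env var or secret can still be set before the first deploy, and after `deploy`, for seeding or one-off commands. `InstallHooks`, `run_install_tier`, and the shell-script hook form are hypothetical (§6 leaves the declaration format open). With or without hooks, the same generic steps run. A recipe that needs a step but lacks it therefore fails at `assert_serving` and is reported, as the rule above intends.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional

Step = Callable[[str], None]  # takes the app domain; raises on failure


@dataclass
class InstallHooks:
    after_new: Optional[Path] = None     # e.g. set an env var or secret before first deploy
    after_deploy: Optional[Path] = None  # e.g. seed a DB or run a one-off command


def run_shell_hook(hook: Path, domain: str) -> None:
    """Default hook runner: a non-zero exit fails the install tier."""
    subprocess.run(["sh", str(hook), domain], check=True)


def run_install_tier(domain: str, new: Step, deploy: Step, assert_serving: Step,
                     hooks: InstallHooks,
                     run_hook: Callable[[Path, str], None] = run_shell_hook) -> None:
    new(domain)
    if hooks.after_new is not None:
        run_hook(hooks.after_new, domain)
    deploy(domain)
    if hooks.after_deploy is not None:
        run_hook(hooks.after_deploy, domain)
    # Always attempted: with no hooks this is exactly the generic install.
    assert_serving(domain)
```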

### 2.5 Discovery + precedence
- **Locations:** cc-ci's test dir (e.g. `tests/<recipe>/`) and the recipe repo's `tests/`
  (repo-local). The harness discovers overlays + custom-install-steps from both.
- **Precedence (OPEN — settle in DECISIONS):** proposed default — **both layer**; repo-local is the
  upstream-authoritative source and always runs, cc-ci's overlay runs in addition (and may pin/extend
  for the CI env). Define the rule for same-named collisions explicitly.
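
The proposed "both layer" default can be sketched as a merge that keeps every source. For the still-open collision rule, this sketch assumes a same-named file from both locations runs twice, repo-local first, each tagged with its origin; `merge_overlays` is hypothetical. If both copies are collected by pytest in one session, duplicate basenames in directories without `__init__.py` can trigger pytest's "import file mismatch" error under the default import mode; `--import-mode=importlib` avoids that.

```python
from __future__ import annotations

from pathlib import Path
from typing import Dict, List, Tuple

Source = Tuple[str, Path]  # (origin, file path)


def merge_overlays(repo_local: Dict[str, Path],
                   ccci: Dict[str, Path]) -> Dict[str, List[Source]]:
    """Both locations layer: repo-local always runs; cc-ci's file runs in addition."""
    merged: Dict[str, List[Source]] = {}
    for name in sorted(set(repo_local) | set(ccci)):
        sources: List[Source] = []
        if name in repo_local:
            sources.append(("repo-local", repo_local[name]))  # upstream-authoritative, first
        if name in ccci:
            sources.append(("cc-ci", ccci[name]))  # CI-specific pinning/extension, second
        merged[name] = sources
    return merged
```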

---

## 3. Milestones (bounded)

- **G0 — Generic install.** Implement generic_install in the shared harness; green on `custom-html`
  with no recipe config. *Accept:* DG1.
- **G1 — Generic upgrade + backup/restore.** Add generic_upgrade; add generic_backup/restore gated
  on backup-capability (clean N/A otherwise). *Accept:* DG2, DG3.
- **G2 — Layering + discovery.** Implement the generic+overlay composition and cc-ci/repo-local
  discovery + precedence; prove an overlay runs on top of generic. *Accept:* DG4.
- **G3 — Custom install-steps hook + graceful-generic.** Implement the hook; demonstrate the
  fail-without / pass-with proof on one recipe needing a step. *Accept:* DG5.
- **G4 — `!testme` integration + per-op reporting + docs + cold verify.** *Accept:* DG6, DG7, DG8;
  then flip `machine-docs/STATUS-1d.md` to `## DONE`.

---

## 4. Guardrails
- **Generic is the default; recipe tests override or extend it** (Builder's mechanism) — but the
  generic **always** runs when a recipe defines no test for that op. Never let a recipe end up with
  *no* assertion for an op it should be tested on.
- **Generic failure ≠ harness bug** — a recipe needing custom steps failing the generic install is a
  correct, reported outcome; fix by adding the step (Phase 2), don't weaken/special-case the generic.
- **Deploy once, reuse it** — overlays run against the generic tier's live deployment; no
  per-test/per-overlay redeploy. One deploy + one teardown per run; re-provision only when an op
  truly needs it, explicitly. (Correctness *and* the main perf lever.)
- **DRY** — generic logic in the shared harness, not per-recipe (M6.5); overlays are thin.
- **No weakened tests** — real assertions on real app state; teardown always; honor `MAX_TESTS`.
- **Bounded** — build the architecture + prove it on a couple of recipes; the full per-recipe overlay
  authoring is Phase 2, not here.

---

## 5. Impact on later phases (reshapes the plan set)
- **Phase 2 (`plan-phase2-recipe-tests.md`)** changes from "author every test from scratch" to:
  every enrolled recipe gets the generic suite for free; Phase 2 = **author the additive overlays**
  (port recipe-maintainer tests as `test_*.py` overlays) **+ define custom install steps** where a
  recipe fails generically. Update Phase 2 to reference 1d as its foundation.
- **Phase 3 (`plan-phase3-results-ux.md`)** — the YunoHost-style **level** maps cleanly onto the
  tiers: e.g. installs (generic) → +upgrade → +backup/restore → +custom assertions. The level is
  derived from which tiers pass.
- **Phase 2b (perf)** — the generic suite is the common hot path, so it's the prime target for the
  image-cache / readiness / dedup optimizations.
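
Deriving the Phase 3 level from per-op results could look like the sketch below, using the per-op vocabulary listed in §6. The assumptions are hypothetical: tiers are cumulative in the order given above, a tier counts only if every op in it is `pass`, and `skip(N/A)` stops the climb rather than being stepped over. Treating N/A as neutral instead would be an equally valid Phase 3 choice.

```python
from enum import Enum
from typing import Dict, Sequence, Tuple


class Result(Enum):
    PASS = "pass"
    FAIL = "fail"
    SKIP_NA = "skip(N/A)"
    ERROR = "error"


# Cumulative tiers, in the order the level is earned.
LEVEL_TIERS: Sequence[Tuple[str, ...]] = (
    ("install",),
    ("upgrade",),
    ("backup", "restore"),
    ("custom",),
)


def level(results: Dict[str, Result]) -> int:
    """Count consecutive fully-passing tiers, starting from install."""
    earned = 0
    for tier in LEVEL_TIERS:
        if all(results.get(op) is Result.PASS for op in tier):
            earned += 1
        else:
            break  # fail, error, N/A or a missing result stops the climb
    return earned
```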

---

## 6. Open decisions (log in machine-docs/DECISIONS.md)
- **Override vs extend (Builder's call).** Does a present `test_<op>.py` **replace** the generic
  assertions for that op, **add** to them, or is it **per-recipe opt-in**? Operator leans: override
  is a good, simple model. Builder decides + documents — keeping the invariant "no recipe test for an
  op ⇒ generic runs" and the single-shared-deployment rule.
- **cc-ci vs repo-local precedence** for overlays + install-steps (§2.5) and same-name collision rule.
- Exact **overlay file convention**: fixed names (`test_install.py`…) + discovery dir layout
  (`tests/<recipe>/` in cc-ci; `tests/` in the recipe repo).
- How **custom install steps** are declared: a shell hook (`install_steps.sh`), a pytest fixture, or
  a declarative field — pick the simplest that the harness can run uniformly.
- **Backup-capability detection**: how the harness decides a recipe is backup-capable (backupbot
  labels present / `abra app backup` exit) to choose run-vs-N/A for DG3.
- Whether generic **upgrade** should always go previous→latest, or test the specific
  version-bump under `!testme` (PR-driven). *Resolved by the DG2 operator correction (2026-05-28)
  and 1e HC1: previous release → PR head.*
- Per-op result vocabulary (`pass`/`fail`/`skip(N/A)`/`error`) feeding the Phase-3 level.
- **Deployment-sharing scope**: confirm one-deploy-per-run for the whole lifecycle (install→upgrade→
  backup→restore→custom on a single deployment) vs per-tier deployments; and how a failed earlier
  tier (e.g. install) affects later tiers sharing that deployment (fail-fast vs continue-and-report).
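
For backup-capability detection, one label-based approach is sketched below. Assumptions: recipes opt in to backupbot with a `backupbot.backup` service label set to `true` under `deploy.labels`; the compose file has already been parsed into a dict with variables interpolated (e.g. from `docker compose config` output); and `is_backup_capable` is a hypothetical name. The Compose format allows labels as either a mapping or a list of `key=value` strings, so both forms are handled.

```python
from __future__ import annotations

from typing import Any, Dict, Mapping

BACKUP_LABEL = "backupbot.backup"  # assumption: the label a recipe sets to opt in to backupbot


def _labels(service: Mapping[str, Any]) -> Dict[str, str]:
    """Normalise a service's deploy.labels (mapping or list form) to a str->str dict."""
    raw = (service.get("deploy") or {}).get("labels") or {}
    if isinstance(raw, Mapping):
        return {str(k): str(v) for k, v in raw.items()}
    out: Dict[str, str] = {}
    for item in raw:  # list form: ["key=value", ...]
        key, _, value = str(item).partition("=")
        out[key] = value
    return out


def is_backup_capable(compose: Mapping[str, Any]) -> bool:
    """True if any service opts in to backups; otherwise DG3 reports backup/restore as N/A."""
    for service in (compose.get("services") or {}).values():
        if _labels(service).get(BACKUP_LABEL, "").strip().lower() == "true":
            return True
    return False
```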