cc-ci-orchestrator/cc-ci-plan/plan-phase1d-generic-test-suite.md
autonomic-bot 36a6c9872a orchestrator: reboot-resilience + session auto-resume + full session plan/tooling
Reboot survival for the Pi orchestrator host:
- systemd unit cc-ci-plan/systemd/cc-ci-loops.service (installed + enabled): on boot
  records the reboot, starts loops+watchdog (RESUME_PHASE=1), and resumes the
  orchestrator session.
- reboot-log.sh: boot_id-gated reboot record -> REBOOTS.md (manual restarts don't count).
- launch-orchestrator.sh: injects an AGENTS.md startup nudge so an auto-resumed
  orchestrator announces itself (PushNotification) + reports reboots.
- AGENTS.md: on-startup notify routine documented.

Plans/tooling accumulated this session:
- plan-phase1d (generic suite), 1e (harness corrections), phase4 (final review),
  sso-dep-testing, orchestrator-migration (parked), test-e2e-testme-acceptance.
- launch.sh: 1d/1e/2/2b/3/4 phase sequence, machine-docs-aware state resolution,
  limit-stall re-nudge, INBOX side-channel detection.
- plan.md §6.1/§7: artifact-layer isolation, INBOX, 5-min long-run polling, DEFERRED.
- prompts: isolation discipline + INBOX + pacing.
- .gitignore: harden (.sops/, cc-ci-secrets/, .claude/, *.tmp.*).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 20:28:10 +01:00


# cc-ci Phase 1d — Generic test suite + layered recipe overlays (Autonomous Build Plan)
**Status:** QUEUED — runs **after Phase 1b** (`plan-phase1b-review-lint.md`) and **before Phase 2**
(`plan-phase2-recipe-tests.md`). It is the **test-architecture foundation** Phase 2 builds on, so it
must precede it.
**Transition:** **manual** (operator kicks it off at the post-1b check-in).
**Builds on:** the post-1b codebase (the runner/harness, `.drone.yml`, the comment-bridge, the proof
recipes, the `nix/` + `machine-docs/` layout from 1b RL5/RL6).
**Owner agents:** same Builder + Adversary loops (`plan.md` §6/§7); Adversary cold-verifies.
**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase1d-generic-test-suite.md`
**Phase order:** 1c → 1b → **1d** → 2 → 2b → 3.
---
## 0. Why this phase
Today a recipe only gets tested if someone has authored tests for it. That doesn't scale to ~18+
recipes and gives nothing for a brand-new recipe. The operator's model (2026-05-27): **every recipe
gets a generic lifecycle test suite for free**, and recipe-specific tests *layer on top* rather than
being the only thing that runs. This makes `!testme` meaningful on **any** recipe immediately, turns
Phase 2 into "add overlays where they add value" instead of "write everything from scratch," and
gives Phase 3 a natural basis for a YunoHost-style **level** (how many tiers pass).
Core principle: **the generic is the default for each lifecycle op; a recipe's own `test_<op>.py`
may override *or* extend that default.** The exact additive-vs-override mechanism is the **Builder's
design call** (operator, 2026-05-27: override — e.g. a present `test_install.py` *replaces* the
generic install assertions — is a perfectly good option; additive is fine too; could even be
per-recipe opt-in). What's **fixed** and non-negotiable: (1) when a recipe defines **no** test for an
op, **the generic runs** (so any recipe is testable with zero config); (2) the harness owns a
**single shared deployment** that generic + recipe assertions reuse — **no redeploy** (§2.2); (3)
custom (non-lifecycle) tests are opt-in.
> **SUPERSEDED (2026-05-28) by Phase 1e HC3** (`plan-phase1e-harness-corrections.md`): the override
> default is replaced — the **generic now runs by default *alongside* an overlay (additive)**,
> skipped only via an explicit opt-out. Also: repo-local PR code is gated to approved recipes (HC2),
> and the upgrade tier targets the PR head (HC1). Read 1e for the current behavior.
---
## 1. Definition of Done (Phase 1d exit condition)
Terminates when every item holds **and the Adversary has independently cold-verified** (logged in
`machine-docs/REVIEW-1d.md`):
- [ ] **DG1 — Generic INSTALL test.** A recipe-agnostic install test exists that, given only a
recipe name and **zero** recipe-specific config, does `abra app new` (sane defaults / auto
secrets) → `deploy` → polls to converged (all services running/healthy, no bare `sleep`) →
asserts the app is **actually serving** (real HTTP(S) response on its
`<run>.ci.commoninternet.net` domain through Traefik — **not** a Traefik 404/default cert, not
health-only). Demonstrated **green** on ≥1 simple recipe (e.g. `custom-html`) that has **no**
cc-ci or repo-local tests.
- [ ] **DG2 — Generic UPGRADE test.** Deploy the **previous published version** (the last release
*before* the code under test), then **upgrade to the code under test (PR head) via
`abra app deploy --chaos`** (chaos = the current checkout) — i.e. previous-release → PR-head,
not previous → newest-published-tag. Assert services reconverge and the app still serves.
**OPERATOR CORRECTION (2026-05-28):** the current 1d impl upgrades to the newest *published tag*
and (because deploying the prev tag re-checks-out the recipe) never deploys the PR head — so a
recipe PR's actual changes aren't exercised by the upgrade path. Fix: after deploying prev,
**restore the PR-head checkout** (re-checkout the PR ref / re-snapshot) and `deploy --chaos` to
it as the upgrade target. The "deployment actually moved" assertion (`do_upgrade`) still
applies, but adapt it for prev→PR-head (a PR may not bump the version label — also accept an
image/config change, or assert the running config now matches the PR head). For a non-PR
`!testme`, "current checkout" = the catalogue current, so upgrade tests prev→current. (Data-
continuity assertions remain recipe overlays — see §2.1.)
- [ ] **DG3 — Generic BACKUP + RESTORE tests.** For backup-capable recipes (declare backupbot /
`abra app backup` support): run backup → assert a snapshot artifact is produced; then restore →
assert restore completes and the app is healthy after. For recipes that declare **no** backup
config, backup/restore are cleanly **N/A (skipped)**, *not* a failure.
- [ ] **DG4 — Layering (override-or-extend; generic is the default).** The harness composes a
recipe's run from the generic default per op, **overridden or extended** by the recipe's
`test_install.py` / `test_upgrade.py` / `test_backup.py` / `test_restore.py` when present (in
cc-ci's tests dir **or** repo-local in the recipe's `tests/`) — the additive-vs-override
mechanism is the Builder's design choice, recorded in `machine-docs/DECISIONS.md`. **Invariant:
if a recipe defines no test for an op, the generic runs.** Arbitrary **custom** `test_*.py` run
**only if defined**. Discovery + cc-ci-vs-repo-local precedence is
implemented and settled in `machine-docs/DECISIONS.md`.
- [ ] **DG4.1 — Overlays reuse the deployment (no redeploy).** Overlay + custom assertions run
against the **same live deployment** the generic tier brought up — **one deploy per run, one
teardown** (§2.2), lifecycle ops mutating it in place. Verified: adding an overlay to a recipe
causes **no** extra `abra app new/deploy/undeploy` beyond the shared run (assert via
deploy-count / harness logs). Re-provisioning only where an op semantically demands it, and
then explicitly.
- [ ] **DG5 — Custom install-steps hook (with the "generic-anyway" rule).** The harness supports
**defined** per-recipe extra install steps (cc-ci or repo-local) that run before/around the
generic install. A recipe with **no** customization still **attempts the generic suite**.
Proven both ways: (a) a recipe needing a custom step **fails the generic install as expected**
when the step is absent; (b) the **same** recipe **passes** once the custom step is added —
demonstrating the hook + the graceful-generic-failure are both real.
- [ ] **DG6 — `!testme` end-to-end on an unconfigured recipe.** A `!testme` on a recipe with no
cc-ci/repo-local customization runs the full generic suite through the real pipeline
(bridge → Drone → deploy → assert → undeploy → report) and reports **per-operation**
pass/fail/skip (install/upgrade/backup/restore).
- [ ] **DG7 — Real, DRY, clean.** No softened/`skip`/`xfail`/can't-fail assertions; generic logic
lives in the **shared harness** (M6.5 — no per-recipe copy-paste); every run **undeploys**
(teardown in `finally`), respects `MAX_TESTS`.
- [ ] **DG8 — Documented + cold-verified.** `docs/` explains the generic suite, the overlay
convention (file names + locations + precedence), and the custom-install-steps hook + how to
add a recipe overlay. The Adversary re-runs the acceptance checks from a cold start within 24h.
When DG1 through DG8 hold and are confirmed, write `## DONE` to `machine-docs/STATUS-1d.md`.
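
The per-operation reporting in DG6, together with the N/A rule in DG3, can be sketched as a small summary function. This is a minimal illustration, not the harness's actual reporter: the names `OPS`, `OUTCOMES` and `summarize` are hypothetical, and the three-value vocabulary is taken from DG6 (the final vocabulary is still an open decision in §6).

```python
OPS = ("install", "upgrade", "backup", "restore")
OUTCOMES = {"pass", "fail", "skip"}


def summarize(results):
    """Render one line per lifecycle op plus an overall verdict.

    results: dict op -> "pass" | "fail" | "skip"; every op must be present so
    that a tier which silently never ran cannot be mistaken for a skip.
    "skip" (N/A, e.g. no backup config) does not count as a failure.
    """
    missing = [op for op in OPS if op not in results]
    if missing:
        raise KeyError(f"no result recorded for: {', '.join(missing)}")
    bad = {op: r for op, r in results.items() if r not in OUTCOMES}
    if bad:
        raise ValueError(f"unknown outcome(s): {bad}")
    overall = "fail" if any(results[op] == "fail" for op in OPS) else "pass"
    lines = [f"{op}: {results[op]}" for op in OPS]
    return "\n".join(lines + [f"overall: {overall}"])
```

Requiring an explicit entry for every op is a design choice in this sketch: it keeps "skipped because N/A" distinguishable from "never ran".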
---
## 2. The layered test model (the core design)
For a given recipe, the harness assembles and runs **tiers**, each = `generic [+ overlays]`. In the
table, `gen` is the generic default and `→` means "overridden/extended by, if the recipe defines it"
(the mechanism is the Builder's call, §2.2):
```
INSTALL = gen_install → test_install.py (else gen)     ← always runs
UPGRADE = gen_upgrade → test_upgrade.py (else gen)     ← always runs
BACKUP  = gen_backup  → test_backup.py  (else gen)     ← if backup-capable, else N/A
RESTORE = gen_restore → test_restore.py (else gen)     ← if backup-capable, else N/A
CUSTOM  = recipe test_*.py (anything beyond the four)  ← ONLY if defined
```
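
The tier table can be read as a small resolution rule. The following is a minimal sketch under stated assumptions: `resolve_tier`, `resolve_custom`, the `mode` switch and the `generic_<op>` labels are hypothetical names, and the choice between override and extend is the Builder's call (Phase 1e HC3 later makes additive the default).

```python
LIFECYCLE_OPS = ("install", "upgrade", "backup", "restore")
BACKUP_OPS = {"backup", "restore"}


def resolve_tier(op, recipe_tests, backup_capable, mode="extend"):
    """Return the ordered list of test names to run for one lifecycle op.

    recipe_tests: set of file names the recipe defines, e.g. {"test_install.py"}.
    mode: "extend" runs generic + overlay; "override" runs the overlay alone.
    An empty list means the op is N/A (skipped), not failed.
    """
    if op not in LIFECYCLE_OPS:
        raise ValueError(f"unknown lifecycle op: {op}")
    if op in BACKUP_OPS and not backup_capable:
        return []  # N/A for recipes with no backup config
    generic = f"generic_{op}"
    overlay = f"test_{op}.py"
    if overlay not in recipe_tests:
        return [generic]  # invariant: no recipe test => the generic runs
    if mode == "override":
        return [overlay]
    return [generic, overlay]


def resolve_custom(recipe_tests):
    """Custom tests: any test_*.py beyond the four lifecycle names, only if defined."""
    lifecycle_files = {f"test_{op}.py" for op in LIFECYCLE_OPS}
    return sorted(
        name
        for name in recipe_tests
        if name.startswith("test_") and name.endswith(".py") and name not in lifecycle_files
    )
```

Whatever mode is chosen, the branch that returns `[generic]` is the non-negotiable part: a recipe with no test for an op still gets the generic one.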
### 2.1 Generic baseline suite (recipe-agnostic)
- **install** — `abra app new <recipe> --domain <run>.ci.commoninternet.net` with non-interactive
defaults + auto-generated secrets; `abra app deploy`; poll to converged; assert a real HTTP(S)
response from the app over its domain (status + that it's the app, not Traefik's fallback).
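- **upgrade** — deploy the previous published version, upgrade to the code under test, assert
reconverge + still serving. Per DG2 (operator correction, 2026-05-28) the upgrade target is the PR
head via `abra app deploy --chaos`, not `abra app upgrade` to the newest published tag.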
- **backup / restore** — only for recipes declaring backup config; verify the **mechanism** (backup
produces an artifact; restore completes + app healthy). **Honest limit:** generic backup/restore
can't assert app-specific *data integrity* without recipe knowledge — that's a recipe overlay
(`test_backup.py`/`test_restore.py` seed a marker + assert it survives). State this in docs.
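
The "poll to converged" and "actually serving" checks in the install tier can be sketched as two helpers. This is an illustration under assumptions, not the harness code: `poll_until` and `is_app_serving` are hypothetical names, and the fallback detection assumes that an unmatched Traefik route answers with a 404 whose body is the plain text `404 page not found`.

```python
import time


def poll_until(check, timeout_s, interval_s=2.0, clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns a truthy value or the deadline passes.

    Returns the truthy value, or raises TimeoutError. The wait is bounded by a
    deadline and re-checks the condition, rather than being one fixed sleep.
    """
    deadline = clock() + timeout_s
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s")
        sleep(interval_s)


def is_app_serving(status, body):
    """Heuristic: did the app answer, rather than the proxy's fallback?

    Assumption: a 5xx means the app is not serving, and a 404 with the body
    "404 page not found" is the proxy's fallback rather than the app.
    """
    if status >= 500:
        return False
    if status == 404 and body.strip() == "404 page not found":
        return False
    return True
```

In a real run, `check` would wrap a service-state query or an HTTP request against the `<run>.ci.commoninternet.net` domain. The injectable `clock` and `sleep` are there only so the helper can be tested without waiting.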
### 2.2 Layering — override or extend (Builder's call), always **reuse the deployment**
A recipe-defined `test_install.py` (etc.) either **overrides** the generic assertions for that op or
**extends** them — the Builder picks the mechanism (a present `test_<op>.py` replacing the generic is
a fine, simple model; or additive; or per-recipe opt-in). The invariant either way: **if a recipe
defines nothing for an op, the generic runs.** This guarantees every recipe is testable with zero
config, while letting a recipe with a poor generic fit supply its own.
**Reuse the deployment — do NOT redeploy per test (operator requirement, 2026-05-27).** Overlay
assertions run against the **same live deployment** the generic tier already brought up — no extra
`abra app new`/`deploy`/`undeploy` per overlay. The target shape: **one deploy per run**, then the
lifecycle ops mutate that single deployment in sequence and *all* assertions (generic + overlays +
custom) run against it, then **one teardown** at the end:
```
deploy ONCE
  → INSTALL  assertions: generic_install + test_install.py    (same live app)
  → UPGRADE  in place (to the code under test, per DG2)
             assertions: generic_upgrade + test_upgrade.py    (same app, upgraded)
  → BACKUP   (if capable) → generic_backup + test_backup.py
  → RESTORE  (if capable) → generic_restore + test_restore.py
  → CUSTOM   test_*.py                                        (same live app)
teardown ONCE (in finally)
```
So a seed/marker written by `test_backup.py` is the same data `test_restore.py` checks; an overlay's
extra HTTP assertion hits the app the generic install already deployed. Tiers that intentionally
change state (UPGRADE, RESTORE) do so on the shared deployment in order. The only time a tier
re-provisions is when an op semantically requires it (e.g. a from-scratch restore-into-blank variant)
— and that must be explicit, not the default. This is also the main Phase-2b speed win.
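
The one-deploy, one-teardown shape can be sketched with a stand-in deployment that only counts calls, which is also roughly how the deploy-count check in DG4.1 could be asserted. All names here (`FakeDeployment`, `run_lifecycle`) are hypothetical, and the continue-and-report handling of a failed tier is just one of the options left open in §6.

```python
class FakeDeployment:
    """Stand-in for the real abra-backed deployment; it only counts calls."""

    def __init__(self):
        self.deploys = 0
        self.teardowns = 0

    def deploy(self):
        self.deploys += 1

    def teardown(self):
        self.teardowns += 1


def run_lifecycle(deployment, tiers):
    """Deploy once, run every tier against that deployment, tear down once.

    tiers: ordered list of (name, [callables taking the deployment]).
    Returns {tier name: "pass" | "fail"}. A failing assertion marks its tier
    as failed and later tiers still run against the same deployment.
    """
    results = {}
    try:
        deployment.deploy()
        for name, assertions in tiers:
            try:
                for assertion in assertions:
                    assertion(deployment)
                results[name] = "pass"
            except AssertionError:
                results[name] = "fail"
    finally:
        # Runs on every exit path, including a crash in deploy() or in a tier.
        deployment.teardown()
    return results
```

Because generic and overlay assertions are just entries in the same tier list, adding an overlay lengthens the list without adding a deploy or teardown call.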
### 2.3 Custom tests
`test_*.py` that aren't one of the four lifecycle names run **only when present** for that recipe —
no generic equivalent, purely opt-in (e.g. `test_sso.py`, `test_federation.py`).
### 2.4 Custom install steps (and the graceful-generic rule)
Some recipes need extra setup the generic flow won't do (pre-seed a DB, set an env/secret, run a
one-off command). The harness exposes a **defined** per-recipe install-steps hook (cc-ci or
repo-local). Rules:
- If a recipe **declares** custom install steps → run them as part of the install tier.
- If a recipe has **no** customization defined anywhere → **still attempt the generic suite.**
Recipes that genuinely need special steps will **fail the generic install — and that's acceptable
and expected**; the failure is reported (per-op), and the fix is to add the custom step (Phase 2
work), not to special-case the harness.
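
The hook and the graceful-generic rule can be sketched as follows. This is an assumption-laden illustration: how install steps are actually declared is an open decision (§6), so the `install_hooks` mapping and the `run_install_tier` name are hypothetical, and the hook is simply run before the generic install.

```python
def run_install_tier(recipe, generic_install, install_hooks):
    """Run declared custom install steps (if any), then the generic install.

    install_hooks: dict mapping recipe name -> callable performing extra setup.
    A recipe absent from the mapping still gets the plain generic install.
    Returns ("pass", None) or ("fail", reason); a failure is a reported
    per-op result, not an exception that aborts the run.
    """
    hook = install_hooks.get(recipe)
    try:
        if hook is not None:
            hook()
        generic_install(recipe)
    except Exception as exc:
        return ("fail", str(exc))
    return ("pass", None)
```

The DG5 proof maps onto two calls for the same recipe: one with an empty `install_hooks` (expected `fail`), one with the recipe's step registered (expected `pass`). The generic install itself is never special-cased.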
### 2.5 Discovery + precedence
- **Locations:** cc-ci's test dir (e.g. `tests/<recipe>/`) and the recipe repo's `tests/`
(repo-local). The harness discovers overlays + custom-install-steps from both.
- **Precedence (OPEN — settle in DECISIONS):** proposed default — **both layer**; repo-local is the
upstream-authoritative source and always runs, cc-ci's overlay runs in addition (and may pin/extend
for the CI env). Define the rule for same-named collisions explicitly.
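
Discovery across the two locations, with the proposed "both layer" default, could look roughly like this. It is a sketch only: `discover` and `LIFECYCLE_FILES` are hypothetical names, the directory layout follows the examples above, and same-named files are kept from both sources because the collision rule has not been settled.

```python
from pathlib import Path

LIFECYCLE_FILES = {"test_install.py", "test_upgrade.py", "test_backup.py", "test_restore.py"}


def discover(recipe, cc_ci_root, recipe_repo):
    """Collect overlay and custom test files from both locations.

    Looks in <recipe_repo>/tests/ and <cc_ci_root>/tests/<recipe>/.
    Returns {"overlays": [...], "custom": [...]} as (source, filename) pairs,
    repo-local first, then cc-ci, so both layers run.
    """
    sources = [
        ("repo-local", Path(recipe_repo) / "tests"),
        ("cc-ci", Path(cc_ci_root) / "tests" / recipe),
    ]
    found = {"overlays": [], "custom": []}
    for source, directory in sources:
        if not directory.is_dir():
            continue  # a missing location simply contributes nothing
        for path in sorted(directory.glob("test_*.py")):
            kind = "overlays" if path.name in LIFECYCLE_FILES else "custom"
            found[kind].append((source, path.name))
    return found
```

A recipe with neither directory yields two empty lists, which is the zero-config case where only the generic suite runs.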
---
## 3. Milestones (bounded)
- **G0 — Generic install.** Implement generic_install in the shared harness; green on `custom-html`
with no recipe config. *Accept:* DG1.
- **G1 — Generic upgrade + backup/restore.** Add generic_upgrade; add generic_backup/restore gated
on backup-capability (clean N/A otherwise). *Accept:* DG2, DG3.
- **G2 — Layering + discovery.** Implement the generic+overlay composition and cc-ci/repo-local
discovery + precedence; prove an overlay runs on top of generic. *Accept:* DG4.
- **G3 — Custom install-steps hook + graceful-generic.** Implement the hook; demonstrate the
fail-without / pass-with proof on one recipe needing a step. *Accept:* DG5.
- **G4 — `!testme` integration + per-op reporting + docs + cold verify.** *Accept:* DG6, DG7, DG8;
then flip `machine-docs/STATUS-1d.md` to `## DONE`.
---
## 4. Guardrails
- **Generic is the default; recipe tests override or extend it** (Builder's mechanism) — but the
generic **always** runs when a recipe defines no test for that op. Never let a recipe end up with
*no* assertion for an op it should be tested on.
- **Generic failure ≠ harness bug** — a recipe needing custom steps failing the generic install is a
correct, reported outcome; fix by adding the step (Phase 2), don't weaken/special-case the generic.
- **Deploy once, reuse it** — overlays run against the generic tier's live deployment; no
per-test/per-overlay redeploy. One deploy + one teardown per run; re-provision only when an op
truly needs it, explicitly. (Correctness *and* the main perf lever.)
- **DRY** — generic logic in the shared harness, not per-recipe (M6.5); overlays are thin.
- **No weakened tests** — real assertions on real app state; teardown always; honor `MAX_TESTS`.
- **Bounded** — build the architecture + prove it on a couple of recipes; the full per-recipe overlay
authoring is Phase 2, not here.
---
## 5. Impact on later phases (reshapes the plan set)
- **Phase 2 (`plan-phase2-recipe-tests.md`)** changes from "author every test from scratch" to:
every enrolled recipe gets the generic suite for free; Phase 2 = **author the additive overlays**
(port recipe-maintainer tests as `test_*.py` overlays) **+ define custom install steps** where a
recipe fails generically. Update Phase 2 to reference 1d as its foundation.
- **Phase 3 (`plan-phase3-results-ux.md`)** — the YunoHost-style **level** maps cleanly onto the
tiers: e.g. installs (generic) → +upgrade → +backup/restore → +custom assertions. The level is
derived from which tiers pass.
- **Phase 2b (perf)** — the generic suite is the common hot path, so it's the prime target for the
image-cache / readiness / dedup optimizations.
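
As an illustration of how a level might be derived from tier results, here is a minimal sketch. The tier names, the `level` function and the stop-at-first-non-pass rule are all assumptions; the actual scheme, including how a skipped (N/A) tier counts, belongs to Phase 3.

```python
TIER_ORDER = ("install", "upgrade", "backup_restore", "custom")


def level(tier_results):
    """Count consecutive passing tiers from the start of TIER_ORDER.

    tier_results: dict tier -> "pass" | "fail" | "skip".
    Assumption: the level stops at the first tier that is not "pass".
    """
    reached = 0
    for tier in TIER_ORDER:
        if tier_results.get(tier) != "pass":
            break
        reached += 1
    return reached
```

Under this rule a recipe with no backup config tops out at level 2, which may or may not be the desired behavior and is worth deciding explicitly.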
---
## 6. Open decisions (log in machine-docs/DECISIONS.md)
- **Override vs extend (Builder's call).** Does a present `test_<op>.py` **replace** the generic
assertions for that op, **add** to them, or is it **per-recipe opt-in**? Operator leans: override
is a good, simple model. Builder decides + documents, keeping the invariant "no recipe test for an
op ⇒ generic runs" and the single-shared-deployment rule. (Since superseded by Phase 1e HC3: the
generic runs alongside an overlay by default, skipped only via an explicit opt-out.)
- **cc-ci vs repo-local precedence** for overlays + install-steps (§2.5) and same-name collision rule.
- Exact **overlay file convention**: fixed names (`test_install.py`…) + discovery dir layout
(`tests/<recipe>/` in cc-ci; `tests/` in the recipe repo).
- How **custom install steps** are declared: a shell hook (`install_steps.sh`), a pytest fixture, or
a declarative field — pick the simplest that the harness can run uniformly.
- **Backup-capability detection**: how the harness decides a recipe is backup-capable (backupbot
labels present / `abra app backup` exit) to choose run-vs-N/A for DG3.
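- Whether generic **upgrade** should always go previous→latest, or test the specific
version-bump under `!testme` (PR-driven). Settled by the DG2 operator correction (2026-05-28):
previous release → PR head; for a non-PR `!testme`, previous → catalogue current.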
- Per-op result vocabulary (`pass`/`fail`/`skip(N/A)`/`error`) feeding the Phase-3 level.
- **Deployment-sharing scope**: confirm one-deploy-per-run for the whole lifecycle (install→upgrade→
backup→restore→custom on a single deployment) vs per-tier deployments; and how a failed earlier
tier (e.g. install) affects later tiers sharing that deployment (fail-fast vs continue-and-report).