# cc-ci Phase 1d — Generic test suite + layered recipe overlays (Autonomous Build Plan)

**Status:** QUEUED — runs **after Phase 1b** (`plan-phase1b-review-lint.md`) and **before Phase 2**
(`plan-phase2-recipe-tests.md`). It is the **test-architecture foundation** Phase 2 builds on, so it
must precede it.
**Transition:** **manual** (operator kicks it off at the post-1b check-in).
**Builds on:** the post-1b codebase (the runner/harness, `.drone.yml`, the comment-bridge, the proof
recipes, the `nix/` + `machine-docs/` layout from 1b RL5/RL6).
**Owner agents:** same Builder + Adversary loops (`plan.md` §6/§7); Adversary cold-verifies.
**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase1d-generic-test-suite.md`
**Phase order:** 1c → 1b → **1d** → 2 → 2b → 3.

---

## 0. Why this phase

Today a recipe only gets tested if someone has authored tests for it. That doesn't scale to ~18+
recipes and gives nothing for a brand-new recipe. The operator's model (2026-05-27): **every recipe
gets a generic lifecycle test suite for free**, and recipe-specific tests *layer on top* rather than
being the only thing that runs. This makes `!testme` meaningful on **any** recipe immediately, turns
Phase 2 into "add overlays where they add value" instead of "write everything from scratch," and
gives Phase 3 a natural basis for a YunoHost-style **level** (how many tiers pass).

Core principle: **the generic is the default for each lifecycle op; a recipe's own `test_<op>.py`
may override *or* extend that default.** The exact additive-vs-override mechanism is the **Builder's
design call** (operator, 2026-05-27: override — e.g. a present `test_install.py` *replaces* the
generic install assertions — is a perfectly good option; additive is fine too; could even be
per-recipe opt-in). What's **fixed** and non-negotiable: (1) when a recipe defines **no** test for an
op, **the generic runs** (so any recipe is testable with zero config); (2) the harness owns a
**single shared deployment** that generic + recipe assertions reuse — **no redeploy** (§2.2); (3)
custom (non-lifecycle) tests are opt-in.

> **SUPERSEDED (2026-05-28) by Phase 1e HC3** (`plan-phase1e-harness-corrections.md`): the override
> default is replaced — the **generic now runs by default *alongside* an overlay (additive)**,
> skipped only via an explicit opt-out. Also: repo-local PR code is gated to approved recipes (HC2),
> and the upgrade tier targets the PR head (HC1). Read 1e for the current behavior.

---

## 1. Definition of Done (Phase 1d exit condition)

Phase 1d terminates when every item holds **and the Adversary has independently cold-verified**
(logged in `machine-docs/REVIEW-1d.md`):

- [ ] **DG1 — Generic INSTALL test.** A recipe-agnostic install test exists that, given only a
      recipe name and **zero** recipe-specific config, does `abra app new` (sane defaults / auto
      secrets) → `deploy` → polls to converged (all services running/healthy, no bare `sleep`) →
      asserts the app is **actually serving** (real HTTP(S) response on its
      `<run>.ci.commoninternet.net` domain through Traefik — **not** a Traefik 404/default cert, not
      health-only). Demonstrated **green** on ≥1 simple recipe (e.g. `custom-html`) that has **no**
      cc-ci or repo-local tests.
- [ ] **DG2 — Generic UPGRADE test.** Deploy the **previous published version** (the last release
      *before* the code under test), then **upgrade to the code under test (PR head) via
      `abra app deploy --chaos`** (chaos = the current checkout) — i.e. previous-release → PR-head,
      not previous → newest-published-tag. Assert services reconverge and the app still serves.
      **OPERATOR CORRECTION (2026-05-28):** the current 1d impl upgrades to the newest *published tag*
      and (because deploying the prev tag re-checks-out the recipe) never deploys the PR head — so a
      recipe PR's actual changes aren't exercised by the upgrade path. Fix: after deploying prev,
      **restore the PR-head checkout** (re-checkout the PR ref / re-snapshot) and `deploy --chaos` to
      it as the upgrade target. The "deployment actually moved" assertion (`do_upgrade`) still
      applies, but adapt it for prev→PR-head (a PR may not bump the version label — also accept an
      image/config change, or assert the running config now matches the PR head). For a non-PR
      `!testme`, "current checkout" = the catalogue current, so upgrade tests prev→current.
      (Data-continuity assertions remain recipe overlays — see §2.1.)
- [ ] **DG3 — Generic BACKUP + RESTORE tests.** For backup-capable recipes (those that declare
      backupbot / `abra app backup` support): run backup → assert a snapshot artifact is produced;
      then restore → assert restore completes and the app is healthy after. For recipes that declare
      **no** backup config, backup/restore are cleanly **N/A (skipped)** — *not* a failure.
- [ ] **DG4 — Layering (override-or-extend; generic is the default).** The harness composes a
      recipe's run from the generic default per op, **overridden or extended** by the recipe's
      `test_install.py` / `test_upgrade.py` / `test_backup.py` / `test_restore.py` when present (in
      cc-ci's tests dir **or** repo-local in the recipe's `tests/`) — the additive-vs-override
      mechanism is the Builder's design choice, recorded in `machine-docs/DECISIONS.md`. **Invariant:
      if a recipe defines no test for an op, the generic runs.** Arbitrary **custom** `test_*.py` run
      **only if defined**. Discovery + cc-ci-vs-repo-local precedence is
      implemented and settled in `machine-docs/DECISIONS.md`.
- [ ] **DG4.1 — Overlays reuse the deployment (no redeploy).** Overlay + custom assertions run
      against the **same live deployment** the generic tier brought up — **one deploy per run, one
      teardown** (§2.2), lifecycle ops mutating it in place. Verified: adding an overlay to a recipe
      causes **no** extra `abra app new/deploy/undeploy` beyond the shared run (assert via
      deploy-count / harness logs). Re-provision only where an op semantically demands it, and
      then explicitly.
- [ ] **DG5 — Custom install-steps hook (with the "generic-anyway" rule).** The harness supports
      **defined** per-recipe extra install steps (cc-ci or repo-local) that run before/around the
      generic install. A recipe with **no** customization still **attempts the generic suite**.
      Proven both ways: (a) a recipe needing a custom step **fails the generic install as expected**
      when the step is absent; (b) the **same** recipe **passes** once the custom step is added —
      demonstrating the hook + the graceful-generic-failure are both real.
- [ ] **DG6 — `!testme` end-to-end on an unconfigured recipe.** A `!testme` on a recipe with no
      cc-ci/repo-local customization runs the full generic suite through the real pipeline
      (bridge → Drone → deploy → assert → undeploy → report) and reports **per-operation**
      pass/fail/skip (install/upgrade/backup/restore).
- [ ] **DG7 — Real, DRY, clean.** No softened/`skip`/`xfail`/can't-fail assertions; generic logic
      lives in the **shared harness** (M6.5 — no per-recipe copy-paste); every run **undeploys**
      (teardown in `finally`), respects `MAX_TESTS`.
- [ ] **DG8 — Documented + cold-verified.** `docs/` explains the generic suite, the overlay
      convention (file names + locations + precedence), and the custom-install-steps hook + how to
      add a recipe overlay. The Adversary re-runs the acceptance checks from a cold start within 24h.

When DG1–DG8 hold and are confirmed, write `## DONE` to `machine-docs/STATUS-1d.md`.

---

## 2. The layered test model (the core design)

For a given recipe, the harness assembles and runs **tiers**, each = `generic [+ overlays]`.
In the table below, `gen` = generic default and `→` = "overridden/extended by, if the recipe
defines it" (the mechanism is the Builder's call, §2.2):

```
INSTALL = gen_install → test_install.py (else gen)    ← always runs
UPGRADE = gen_upgrade → test_upgrade.py (else gen)    ← always runs
BACKUP  = gen_backup  → test_backup.py  (else gen)    ← if backup-capable, else N/A
RESTORE = gen_restore → test_restore.py (else gen)    ← if backup-capable, else N/A
CUSTOM  = recipe test_*.py (anything beyond the four) ← ONLY if defined
```

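
The tier table can be sketched as a small planning function. This is only an illustration under stated assumptions: `plan_tiers`, `Tier`, the `mode` switch, and the `generic_<op>` names are hypothetical and not part of the harness. Both the override and the extend variants are shown because the plan leaves that choice to the Builder.

```python
from dataclasses import dataclass

LIFECYCLE_OPS = ("install", "upgrade", "backup", "restore")


@dataclass(frozen=True)
class Tier:
    op: str
    status: str              # "run" or "n/a"
    tests: tuple[str, ...]   # names of the assertion sets that run for this op


def plan_tiers(recipe_tests: set[str], backup_capable: bool,
               mode: str = "override") -> list[Tier]:
    """Compose the tiers for one recipe (hypothetical helper).

    mode="override": a present test_<op>.py replaces the generic assertions.
    mode="extend":   the generic runs first, then test_<op>.py.
    Either way, an op with no recipe test falls back to the generic.
    """
    if mode not in ("override", "extend"):
        raise ValueError(f"unknown mode: {mode}")
    tiers = []
    for op in LIFECYCLE_OPS:
        if op in ("backup", "restore") and not backup_capable:
            tiers.append(Tier(op, "n/a", ()))   # clean skip, not a failure
            continue
        generic, overlay = f"generic_{op}", f"test_{op}.py"
        if overlay not in recipe_tests:
            tests = (generic,)                  # invariant: generic runs
        elif mode == "override":
            tests = (overlay,)
        else:
            tests = (generic, overlay)
        tiers.append(Tier(op, "run", tests))
    # Anything beyond the four lifecycle names is a custom test: only if defined.
    lifecycle_files = {f"test_{op}.py" for op in LIFECYCLE_OPS}
    custom = tuple(sorted(recipe_tests - lifecycle_files))
    if custom:
        tiers.append(Tier("custom", "run", custom))
    return tiers
```

With an empty `recipe_tests` set and `backup_capable=False`, the plan contains only generic install and upgrade tiers plus two `n/a` entries, which is the zero-config case the invariant protects.
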
### 2.1 Generic baseline suite (recipe-agnostic)
- **install** — `abra app new <recipe> --domain <run>.ci.commoninternet.net` with non-interactive
  defaults + auto-generated secrets; `abra app deploy`; poll to converged; assert a real HTTP(S)
  response from the app over its domain (status + that it's the app, not Traefik's fallback).
- **upgrade** — deploy a prior/pinned version, `abra app upgrade` to target (DG2's operator
  correction makes the target the PR head, deployed via `abra app deploy --chaos`), assert
  reconverge + still serving.
- **backup / restore** — only for recipes declaring backup config; verify the **mechanism** (backup
  produces an artifact; restore completes + app healthy). **Honest limit:** generic backup/restore
  can't assert app-specific *data integrity* without recipe knowledge — that's a recipe overlay
  (`test_backup.py`/`test_restore.py` seed a marker + assert it survives). State this in docs.

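
The "poll to converged, no bare `sleep`" and "actually serving" checks from the install bullet could look roughly like the sketch below. The helper names `wait_until` and `is_app_response` are hypothetical, and treating any 2xx/3xx as "serving" is an assumption; a recipe whose root URL answers 401 or 403 would need a different rule. The default-certificate check is left out.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float, interval_s: float = 2.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `check` until it returns True or the deadline passes.

    Unlike a bare sleep, this returns as soon as the condition holds and
    reports a timeout instead of silently continuing.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def is_app_response(status: int, body: str) -> bool:
    """Heuristic: did the app answer, rather than the proxy's fallback?

    Traefik answers unmatched routes with a plain "404 page not found".
    Assumption: 2xx/3xx from the app's domain counts as serving.
    """
    if status == 404 and body.strip() == "404 page not found":
        return False
    return 200 <= status < 400
```

Passing `clock` and `sleep` as parameters keeps the helper testable without real waiting; the harness would call it with the defaults.
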
### 2.2 Layering — override or extend (Builder's call), always **reuse the deployment**
A recipe-defined `test_install.py` (etc.) either **overrides** the generic assertions for that op or
**extends** them — the Builder picks the mechanism (a present `test_<op>.py` replacing the generic is
a fine, simple model; or additive; or per-recipe opt-in). The invariant either way: **if a recipe
defines nothing for an op, the generic runs.** This guarantees every recipe is testable with zero
config, while letting a recipe with a poor generic fit supply its own.

**Reuse the deployment — do NOT redeploy per test (operator requirement, 2026-05-27).** Overlay
assertions run against the **same live deployment** the generic tier already brought up — no extra
`abra app new`/`deploy`/`undeploy` per overlay. The target shape: **one deploy per run**, then the
lifecycle ops mutate that single deployment in sequence and *all* assertions (generic + overlays +
custom) run against it, then **one teardown** at the end:

```
deploy ONCE
  → INSTALL  assertions: generic_install + test_install.py   (same live app)
  → UPGRADE  in place (abra app upgrade)
             assertions: generic_upgrade + test_upgrade.py   (same app, upgraded)
  → BACKUP   (if capable)  → generic_backup  + test_backup.py
  → RESTORE  (if capable)  → generic_restore + test_restore.py
  → CUSTOM   test_*.py     (same live app)
teardown ONCE (in finally)
```

So a seed/marker written by `test_backup.py` is the same data `test_restore.py` checks; an overlay's
extra HTTP assertion hits the app the generic install already deployed. Tiers that intentionally
change state (UPGRADE, RESTORE) do so on the shared deployment in order. The only time a tier
re-provisions is when an op semantically requires it (e.g. a from-scratch restore-into-blank variant)
— and that must be explicit, not the default. This is also the main Phase-2b speed win.

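
A minimal sketch of the deploy-once shape, assuming the harness exposes the deployment as plain callables. `run_lifecycle` and its arguments are hypothetical. The sketch continues after a failed tier and records it; whether to fail fast instead is still an open decision (§6), so that behavior is an assumption here, as is the idea that `undeploy` is safe to call after a partial deploy.

```python
from typing import Callable, Iterable

TierFn = Callable[[], None]


def run_lifecycle(deploy: TierFn, undeploy: TierFn,
                  tiers: Iterable[tuple[str, TierFn]]) -> dict[str, str]:
    """Run every tier against one shared deployment, then tear down once."""
    results: dict[str, str] = {}
    try:
        deploy()                      # the single deployment for the whole run
        for op, run_tier in tiers:
            try:
                run_tier()            # generic + overlay assertions for this op
                results[op] = "pass"
            except AssertionError:
                results[op] = "fail"  # a real assertion failed
            except Exception:
                results[op] = "error" # the tier itself broke
    finally:
        undeploy()                    # one teardown, even if deploy or a tier raised
    return results
```

Counting calls to `deploy` in a test double is one way to back the DG4.1 check that adding an overlay causes no extra deploys.
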
### 2.3 Custom tests
`test_*.py` that aren't one of the four lifecycle names run **only when present** for that recipe —
no generic equivalent, purely opt-in (e.g. `test_sso.py`, `test_federation.py`).

### 2.4 Custom install steps (and the graceful-generic rule)
Some recipes need extra setup the generic flow won't do (pre-seed a DB, set an env/secret, run a
one-off command). The harness exposes a **defined** per-recipe install-steps hook (cc-ci or
repo-local). Rules:
- If a recipe **declares** custom install steps → run them as part of the install tier.
- If a recipe has **no** customization defined anywhere → **still attempt the generic suite.**
  Recipes that genuinely need special steps will **fail the generic install — and that's acceptable
  and expected**; the failure is reported (per-op), and the fix is to add the custom step (Phase 2
  work), not to special-case the harness.

### 2.5 Discovery + precedence
- **Locations:** cc-ci's test dir (e.g. `tests/<recipe>/`) and the recipe repo's `tests/`
  (repo-local). The harness discovers overlays + custom-install-steps from both.
- **Precedence (OPEN — settle in DECISIONS):** proposed default — **both layer**; repo-local is the
  upstream-authoritative source and always runs, cc-ci's overlay runs in addition (and may pin/extend
  for the CI env). Define the rule for same-named collisions explicitly.

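
Discovery from both locations might be sketched as follows, using the proposed "both layer" default with repo-local listed first. The `discover` function and its argument names are hypothetical, and because the same-name collision rule is still open, the sketch simply keeps both files rather than choosing a winner.

```python
from pathlib import Path

LIFECYCLE_FILES = {"test_install.py", "test_upgrade.py",
                   "test_backup.py", "test_restore.py"}


def discover(recipe: str, cc_ci_root: Path, recipe_repo: Path) -> dict[str, list[Path]]:
    """Collect overlay and custom test files from both locations.

    Order within each list: repo-local (<recipe repo>/tests/) first,
    then cc-ci (<cc-ci>/tests/<recipe>/). Missing directories are skipped.
    """
    found: dict[str, list[Path]] = {"overlays": [], "custom": []}
    for tests_dir in (recipe_repo / "tests", cc_ci_root / "tests" / recipe):
        if not tests_dir.is_dir():
            continue
        for path in sorted(tests_dir.glob("test_*.py")):
            kind = "overlays" if path.name in LIFECYCLE_FILES else "custom"
            found[kind].append(path)
    return found
```
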

---

## 3. Milestones (bounded)

- **G0 — Generic install.** Implement generic_install in the shared harness; green on `custom-html`
  with no recipe config. *Accept:* DG1.
- **G1 — Generic upgrade + backup/restore.** Add generic_upgrade; add generic_backup/restore gated
  on backup-capability (clean N/A otherwise). *Accept:* DG2, DG3.
- **G2 — Layering + discovery.** Implement the generic+overlay composition and cc-ci/repo-local
  discovery + precedence; prove an overlay runs on top of generic. *Accept:* DG4.
- **G3 — Custom install-steps hook + graceful-generic.** Implement the hook; demonstrate the
  fail-without / pass-with proof on one recipe needing a step. *Accept:* DG5.
- **G4 — `!testme` integration + per-op reporting + docs + cold verify.** *Accept:* DG6, DG7, DG8;
  then flip `machine-docs/STATUS-1d.md` to `## DONE`.

---

## 4. Guardrails
- **Generic is the default; recipe tests override or extend it** (Builder's mechanism) — but the
  generic **always** runs when a recipe defines no test for that op. Never let a recipe end up with
  *no* assertion for an op it should be tested on.
- **Generic failure ≠ harness bug** — a recipe needing custom steps failing the generic install is a
  correct, reported outcome; fix by adding the step (Phase 2), don't weaken/special-case the generic.
- **Deploy once, reuse it** — overlays run against the generic tier's live deployment; no
  per-test/per-overlay redeploy. One deploy + one teardown per run; re-provision only when an op
  truly needs it, explicitly. (Correctness *and* the main perf lever.)
- **DRY** — generic logic in the shared harness, not per-recipe (M6.5); overlays are thin.
- **No weakened tests** — real assertions on real app state; teardown always; honor `MAX_TESTS`.
- **Bounded** — build the architecture + prove it on a couple of recipes; the full per-recipe overlay
  authoring is Phase 2, not here.

---

## 5. Impact on later phases (reshapes the plan set)
- **Phase 2 (`plan-phase2-recipe-tests.md`)** changes from "author every test from scratch" to:
  every enrolled recipe gets the generic suite for free; Phase 2 = **author the additive overlays**
  (port recipe-maintainer tests as `test_*.py` overlays) **+ define custom install steps** where a
  recipe fails generically. Update Phase 2 to reference 1d as its foundation.
- **Phase 3 (`plan-phase3-results-ux.md`)** — the YunoHost-style **level** maps cleanly onto the
  tiers: e.g. installs (generic) → +upgrade → +backup/restore → +custom assertions. The level is
  derived from which tiers pass.
- **Phase 2b (perf)** — the generic suite is the common hot path, so it's the prime target for the
  image-cache / readiness / dedup optimizations.

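
One way the Phase 3 level could be derived from per-op results is a cumulative count along the tier ladder named above. This is a sketch only: `level_from_results` is a hypothetical name, the result strings follow the vocabulary proposed in §6, and the rule that a skipped (N/A) tier stops the count is an assumption that Phase 3 may well decide differently.

```python
# Ladder from the Phase 3 bullet: install → +upgrade → +backup/restore → +custom.
LEVEL_TIERS = (("install",), ("upgrade",), ("backup", "restore"), ("custom",))


def level_from_results(results: dict[str, str]) -> int:
    """Count consecutive passing tiers, starting from install.

    Assumption: anything other than "pass" (fail, error, skip, missing)
    ends the ladder at that tier.
    """
    level = 0
    for ops in LEVEL_TIERS:
        if not all(results.get(op) == "pass" for op in ops):
            break
        level += 1
    return level
```
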

---

## 6. Open decisions (log in machine-docs/DECISIONS.md)
- **Override vs extend (Builder's call).** Does a present `test_<op>.py` **replace** the generic
  assertions for that op, **add** to them, or is it **per-recipe opt-in**? Operator leans: override
  is a good, simple model. Builder decides + documents — keeping the invariant "no recipe test for an
  op ⇒ generic runs" and the single-shared-deployment rule.
- **cc-ci vs repo-local precedence** for overlays + install-steps (§2.5) and same-name collision rule.
- Exact **overlay file convention**: fixed names (`test_install.py`…) + discovery dir layout
  (`tests/<recipe>/` in cc-ci; `tests/` in the recipe repo).
- How **custom install steps** are declared: a shell hook (`install_steps.sh`), a pytest fixture, or
  a declarative field — pick the simplest that the harness can run uniformly.
- **Backup-capability detection**: how the harness decides a recipe is backup-capable (backupbot
  labels present / `abra app backup` exit) to choose run-vs-N/A for DG3.
- Whether generic **upgrade** should always go previous→latest, or test the specific
  version-bump under `!testme` (PR-driven).
- Per-op result vocabulary (`pass`/`fail`/`skip(N/A)`/`error`) feeding the Phase-3 level.
- **Deployment-sharing scope**: confirm one-deploy-per-run for the whole lifecycle
  (install→upgrade→backup→restore→custom on a single deployment) vs per-tier deployments; and how
  a failed earlier tier (e.g. install) affects later tiers sharing that deployment (fail-fast vs
  continue-and-report).

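
For the backup-capability item, the "labels present" option could be as small as the sketch below. It assumes the harness has already parsed each service's deploy labels into a dict, and it assumes the opt-in label is `backupbot.backup` set to `true`; both the input shape and the label name are assumptions to verify against the backupbot version in use. `is_backup_capable` is a hypothetical name.

```python
def is_backup_capable(service_labels: dict[str, dict[str, str]]) -> bool:
    """Return True if any service opts in to backups via its deploy labels.

    `service_labels` maps service name -> {label: value}.
    Assumed opt-in label: backupbot.backup=true.
    """
    return any(
        labels.get("backupbot.backup", "").strip().lower() == "true"
        for labels in service_labels.values()
    )
```

A `False` result would map DG3's backup and restore tiers to N/A rather than to a failure.
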