cc-ci-orchestrator/cc-ci-plan/plan-phase2-recipe-tests.md
autonomic-bot 36a6c9872a orchestrator: reboot-resilience + session auto-resume + full session plan/tooling
Reboot survival for the Pi orchestrator host:
- systemd unit cc-ci-plan/systemd/cc-ci-loops.service (installed + enabled): on boot
  records the reboot, starts loops+watchdog (RESUME_PHASE=1), and resumes the
  orchestrator session.
- reboot-log.sh: boot_id-gated reboot record -> REBOOTS.md (manual restarts don't count).
- launch-orchestrator.sh: injects an AGENTS.md startup nudge so an auto-resumed
  orchestrator announces itself (PushNotification) + reports reboots.
- AGENTS.md: on-startup notify routine documented.

Plans/tooling accumulated this session:
- plan-phase1d (generic suite), 1e (harness corrections), phase4 (final review),
  sso-dep-testing, orchestrator-migration (parked), test-e2e-testme-acceptance.
- launch.sh: 1d/1e/2/2b/3/4 phase sequence, machine-docs-aware state resolution,
  limit-stall re-nudge, INBOX side-channel detection.
- plan.md §6.1/§7: artifact-layer isolation, INBOX, 5-min long-run polling, DEFERRED.
- prompts: isolation discipline + INBOX + pacing.
- .gitignore: harden (.sops/, cc-ci-secrets/, .claude/, *.tmp.*).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 20:28:10 +01:00


# cc-ci Phase 2 — Comprehensive per-recipe test authoring (Autonomous Build Plan)
**Status:** QUEUED — starts after Phase 1b (`plan-phase1b-review-lint.md`) **and Phase 1d**
(`plan-phase1d-generic-test-suite.md`) reach `## DONE`.
**Builds on:** the Phase-1 cc-ci CI server (`plan.md`) **and Phase 1d's generic test suite** — every
recipe already gets generic install/upgrade/backup/restore for free, reusing one shared deployment.
So this phase is **not** "write every test from scratch": it's **authoring the recipe overlays**
(`test_<op>.py` that override/extend the generic, per 1d's model) + **defining custom install steps**
for recipes that fail the generic form, porting the recipe-maintainer corpus as overlays. This phase
adds **test content**, not infra.
**Reference corpus:** `references/recipe-maintainer/` → `/srv/recipe-maintainer/` (the existing,
human-maintained recipe tests — the canonical source to port from).
**Owner agents:** same Builder + Adversary loops + coordination protocol as Phase 1 (`plan.md` §6/§7).
**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`
---
## 0c. OIDC / SSO-dep recipes — follow the SSO-dep plan (NOT operator-pending)
Recipes that authenticate via OIDC (lasuite-docs, cryptpad, lasuite-drive, lasuite-meet, future)
do **not** need operator input to wire OIDC. The canonical pattern lives at
**`plan-sso-dep-testing.md`** — declared `DEPS = ["keycloak"]` in `recipe_meta.py`, orchestrator
co-deploys the dep, recipe `install_steps.sh` reads `$CCCI_DEPS_FILE` and writes the OIDC env vars
+ injects the client secret via abra. Authenticated tests use `harness.sso.oidc_password_grant` or
Playwright on the dep's login page. **`DEFERRED.md` entries that cite "operator input needed for
OIDC" are mis-categorised** — re-open and execute them per this plan.
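
To make the authenticated-test step concrete, here is a minimal sketch of the token request that a password-grant helper would issue. It is not the body of `harness.sso.oidc_password_grant`, whose signature is not shown in this plan. The function name, its parameters, and the Keycloak token path (`/realms/<realm>/protocol/openid-connect/token`, as used by recent Keycloak releases without the legacy `/auth` prefix) are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_password_grant_request(base_url, realm, client_id, client_secret,
                                 username, password):
    """Build (but do not send) an OAuth2 password-grant token request.

    Hypothetical helper: the path layout assumes a recent Keycloak release.
    """
    token_url = (
        f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    )
    # The password grant is a form-encoded POST to the token endpoint.
    form = urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "client_secret": client_secret,
        "username": username,
        "password": password,
        "scope": "openid",
    }).encode()
    return Request(
        token_url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; a test would then assert on the JSON response's `access_token` rather than on the HTTP status alone.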
## 0b. Auto-mirror missing recipes (NOT a blocker — autonomous loop work)
A recipe that is not on `git.autonomic.zone/recipe-maintainers/<recipe>` is **not** an operator-pending
blocker. The bot is admin on `recipe-maintainers` (see memory `cc-ci-gitea-recipes`) and can create
private mirror repos. **For any recipe you want to enroll/test that isn't mirrored yet, mirror it
yourself** before enrolling, based on the **`recipe-create-pr` skill** —
`/srv/recipe-maintainer/.opencode/skills/recipe-create-pr/SKILL.md` (which references
`/srv/recipe-maintainer/.claude/commands/recipe-create-pr.md` for the full procedure).
The flow (adapt the skill's command for the new-mirror case):
1. Create the **private** mirror repo on `git.autonomic.zone/recipe-maintainers/<recipe>` (Gitea API
POST `/orgs/recipe-maintainers/repos`, bot creds from `.testenv`/§1.5).
2. Mirror the upstream `git.coopcloud.tech/coop-cloud/<recipe>` (clone --mirror → push, including
tags) so the mirror's `main` is upstream-synced and tags carry over.
3. Then proceed with normal enrollment + the lifecycle suite (1d generic + your overlays from this
phase).
Treat this as standard loop work — don't sit idle waiting on the operator for missing recipes.
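
Steps 1 and 2 of the flow above can be sketched as follows. This is an illustration under stated assumptions, not the `recipe-create-pr` procedure itself: the function names are hypothetical, the token is a placeholder for the bot credentials, and the two pushes (`--all`, then `--tags`) are one way to carry branches and tags over without pushing other refs that a mirror clone may contain.

```python
import json
from urllib.request import Request


def build_create_mirror_request(gitea_url, org, recipe, token):
    """Build (but do not send) the Gitea API call that creates a private repo."""
    body = json.dumps({"name": recipe, "private": True}).encode()
    return Request(
        f"{gitea_url.rstrip('/')}/api/v1/orgs/{org}/repos",
        data=body,
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def mirror_push_commands(recipe, upstream_base, mirror_base):
    """Return the git argv lists for clone --mirror followed by the pushes."""
    workdir = f"{recipe}.git"
    mirror_url = f"{mirror_base}/{recipe}.git"
    return [
        ["git", "clone", "--mirror", f"{upstream_base}/{recipe}.git", workdir],
        # Push branches and tags separately so only those refs are sent.
        ["git", "-C", workdir, "push", "--all", mirror_url],
        ["git", "-C", workdir, "push", "--tags", mirror_url],
    ]
```

Each argv list can be handed to `subprocess.run(cmd, check=True)` in order; the request object is sent with `urllib.request.urlopen`.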
## 0a. Prerequisite: Phase 1e harness corrections (must be DONE first)
The 1d/1b operator review produced three shared-harness corrections, now their own phase
**`plan-phase1e-harness-corrections.md`** (runs **before** this phase). Do **not** author overlays
until 1e is `## DONE`: it changes the foundation every overlay sits on — (HC1) upgrade goes
prev-release → **PR head** via `deploy --chaos`; (HC2) **repo-local PR code runs only for approved
recipes** (default cc-ci-overlays + generic only); (HC3) the **generic runs by default alongside an
overlay**, skipped only via an explicit opt-out. See that plan for detail.
## 0. Relationship to Phase 1 (read first)
Phase 1 built the **machine**: the Drone pipeline, the `!testme` trigger (polling-primary), the
`coop-cloud/traefik` proxy, the shared harness (`runner/run_recipe_ci.py` + `tests/conftest.py` +
`runner/harness/`), the three stages (**install / upgrade / backup-restore**), guaranteed teardown,
the `MAX_TESTS` concurrency cap, and the results dashboard — proven on ~six recipes (D10).
Phase 2 **fills the machine with good tests for every Co-op Cloud app we maintain.** It does **not**
re-architect the runner. It reuses Phase-1's harness, stages, trigger, resource caps, and teardown.
Everything here is `tests/<recipe>/...` content + small, shared harness *additions* (ported from
recipe-maintainer's helpers). When reality forces a harness change, record it in `DECISIONS.md`.
Do not start Phase 2 until Phase-1 `STATUS.md` shows `## DONE` (Adversary-verified). The same loop
protocol, single-writer file ownership, and gate handshake (`plan.md` §6.1/§7) apply here.
---
## 1. Mission
For **every maintained Co-op Cloud recipe**, the cc-ci repo's `tests/<recipe>/` tree must contain a
genuine end-to-end test suite that, run by `!testme` on a real PR, proves the app actually works —
**at parity with what `recipe-maintainer` already tests, plus recipe-specific functional depth.**
Concretely, for each recipe:
1. **Port every existing recipe-maintainer test** for that recipe (a comparable cc-ci test for each
`recipe-info/<recipe>/tests/*.py`).
2. **Add ≥2 new recipe-specific functional tests** that exercise something characteristic of that
app (not just "it returns 200") — see §4.3.
3. All of it runs inside Phase-1's three stages and passes green via `!testme`.
---
## 2. Definition of Done (Phase 2 exit condition)
The loop terminates only when every item holds **and the Adversary has independently re-verified
each within 24h** (logged in `REVIEW.md`):
- [ ] **P1 — Coverage.** Every recipe in the **Phase-2 recipe set (§5)** has a `tests/<recipe>/`
suite enrolled and a full green `!testme` run (install + upgrade + backup-restore).
- [ ] **P2 — Parity port.** For each recipe, **every** test under
`recipe-info/<recipe>/tests/*.py` has a comparable cc-ci test (same thing verified), adapted to
the cc-ci harness. A mapping table (recipe-maintainer test → cc-ci test) is recorded in
`tests/<recipe>/PARITY.md`. Any test deliberately not ported is a documented `DECISIONS.md`
finding with the reason (e.g. obsolete, replaced by a better check), never a silent omission.
- [ ] **P3 — Recipe-specific depth.** Each recipe has **≥2 new functional tests** beyond parity that
confirm characteristic behavior (§4.3), with real assertions on app state/responses — not
health-only.
- [ ] **P4 — Backup data-integrity is real.** The backup-restore stage for each recipe **seeds
identifiable data, mutates/wipes, restores, and asserts the seeded data survived** (recipe-aware,
not just "service is up") — ported from recipe-maintainer's backup approach.
- [ ] **P5 — Dependencies handled.** Recipes needing other apps (SSO providers, DBs) declare deps
(ported from `recipe.toml` `requires`/`test_requires`); the harness deploys deps within the run
(respecting `MAX_TESTS`/node budget) and SSO setup runs automatically (§4.2).
- [ ] **P6 — Browser flows where they matter (D3).** Recipes whose core function is a UI flow have a
Playwright test of that flow (login, create-an-object, etc.), not just API checks.
- [ ] **P7 — No weakened tests, no corners cut (§7.1).** Every assertion is real and checks app
state; nothing is `skip`/`xfail`'d, mocked, or reduced to a health-only stand-in to go green.
The bar: anything meaningful is testable with effort (OIDC/SSO, federation, media, WOPI, WebRTC
connectivity, data survival all included). Any "untestable" claim is the rare exception — a true
environment-level blocker only, with the maximal subset still implemented and **Adversary
sign-off** (§8); "needs a browser / SSO / another app" is not a valid excuse.
- [ ] **P8 — Docs.** `docs/enroll-recipe.md` updated with the per-recipe test contract (§4.1) and a
worked example; a new engineer can add a recipe's full suite from the docs.
When all P1–P8 hold and are Adversary-verified, write `## DONE` to Phase-2 `STATUS.md`.
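
The P2 requirement lends itself to a mechanical pre-check before the Adversary reads the tests. The sketch below assumes `PARITY.md` holds a Markdown table whose rows mention each source test by filename; that layout and the function name are assumptions, since the plan does not fix a format. It only confirms that a row exists, not that the ported test verifies the same thing.

```python
import re


def missing_parity_rows(source_tests, parity_md):
    """Return source test filenames that no PARITY.md table row mentions."""
    rows = [
        line for line in parity_md.splitlines()
        if line.lstrip().startswith("|")
    ]
    missing = []
    for name in source_tests:
        # Match the filename as a whole path component, so "check.py"
        # does not count as covered by a row naming "health_check.py".
        pattern = re.compile(rf"(?<![\w.-]){re.escape(name)}(?![\w-])")
        if not any(pattern.search(row) for row in rows):
            missing.append(name)
    return missing
```

Feeding it the filenames found under `recipe-info/<recipe>/tests/` and the text of `tests/<recipe>/PARITY.md` yields the tests that still need either a port or a `DECISIONS.md` entry.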
---
## 3. Reference corpus (port from here)
`references/recipe-maintainer/` (`/srv/recipe-maintainer/`) is the source of truth for *what* to
test and *how* recipe-maintainer already does it. Key paths:
- **Per-recipe tests:** `recipe-info/<recipe>/tests/*.py` — the scripts to port (health_check,
oidc_login/oidc_integration, and recipe-specific ones like `meeting_flow.py`, `webrtc-media.py`,
`wopi_configured.py`, `goat_account.py`, `upload_conversion.py`).
- **Per-recipe metadata:** `recipe-info/<recipe>/recipe.toml` (deps + `[sso]` provider/setup_script),
`test.md` (target URLs, what each test checks, manual steps, network/health specifics), `setup.md`,
`upstream.md`, `setup_<provider>_integration.py` (SSO realm/client/test-user setup).
- **Shared utilities to port/adapt:** `utils/tests/helpers.py`: `http_get`/`http_post`,
`retry_http_get`, `assert_converges`, `wait_for_http`, `load_toml_credentials`, `abra(... tty_wrap)`,
`fresh_app`, `deploy_and_wait`. These map onto the Phase-1 harness fixtures.
- **Orchestration logic to mirror as stages:** `.claude/commands/recipe-test{,-all,-new,-update,-backup}.md`
(new-install / upgrade / backup-restore semantics — already Phase-1 stages; port the *assertions*).
- **Operational gotchas to bake in:** `learnings.md` — TTY-wrap (`script -qefc "abra …" /dev/null`)
for backup/restore/volume/secret/run/logs/lint; always `--chaos`; `backup-bot-two` must be present
for backup tests; prefer `docker service logs` over `abra app logs` (hangs non-interactively);
ghost-container volume-removal cleanup. Most are already in Phase-1 §4.3 — re-verify on the
installed abra.
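
The TTY-wrap gotcha in the list above can be captured in one small helper. This is a sketch of the idea, not recipe-maintainer's `abra(... tty_wrap)` implementation, which is not reproduced in this plan; the function name and argument shape are assumptions. It only builds the argv, so it can be checked without abra installed.

```python
import shlex


def tty_wrap(abra_args):
    """Wrap an abra invocation in `script -qefc "<cmd>" /dev/null`.

    `script` allocates a pseudo-terminal for the inner command, which is
    what the abra subcommands that expect a TTY need.
    """
    # shlex.join quotes each argument, so values with spaces survive
    # being passed as a single command string to `script -c`.
    inner = shlex.join(["abra", *abra_args])
    return ["script", "-qefc", inner, "/dev/null"]
```

For example, `tty_wrap(["app", "ls"])` returns `["script", "-qefc", "abra app ls", "/dev/null"]`, ready for `subprocess.run`.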
**Adaptation note (important):** recipe-maintainer tests run against a *persistent* instance
(`cctest.autonomic.zone`) using `context_reset.py` to free memory. cc-ci runs **ephemeral per-PR
deploys**. So port the **assertions and setup logic**, but drive lifecycle through Phase-1's harness
(per-run isolated app `<recipe>-pr<n>-<sha>`, guaranteed teardown) — not the persistent-instance model.
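
The ephemeral model in the adaptation note amounts to a per-run name plus a teardown that runs whatever happens. A minimal sketch, assuming hypothetical `deploy` and `teardown` callables in place of the Phase-1 harness functions and an 8-character short SHA (the plan gives only the `<recipe>-pr<n>-<sha>` pattern):

```python
from contextlib import contextmanager


def run_app_name(recipe, pr_number, commit_sha, sha_len=8):
    """Build the per-run app name `<recipe>-pr<n>-<sha>` (sha_len is assumed)."""
    return f"{recipe}-pr{pr_number}-{commit_sha[:sha_len].lower()}"


@contextmanager
def ephemeral_app(recipe, pr_number, commit_sha, deploy, teardown):
    """Deploy an isolated app for one run and always tear it down."""
    name = run_app_name(recipe, pr_number, commit_sha)
    try:
        # deploy sits inside the try so a half-finished deploy is cleaned up too.
        deploy(name)
        yield name
    finally:
        teardown(name)
```

Ported assertions then run inside `with ephemeral_app(...) as name:` instead of targeting a long-lived instance.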
---
## 4. The per-recipe test contract
### 4.1 Required structure (per recipe)
```
tests/<recipe>/
├── recipe.toml                 # ported deps + [sso] (from recipe-maintainer's recipe.toml)
├── PARITY.md                   # mapping: recipe-maintainer test -> cc-ci test (for P2)
├── test_install.py             # Phase-1 install stage hook + recipe health/readiness
├── test_upgrade.py             # Phase-1 upgrade stage hook (prev published -> PR version)
├── test_backup.py              # Phase-1 backup stage: seed -> backup -> wipe -> restore -> assert data (P4)
├── functional/                 # ported parity tests + NEW recipe-specific tests (P2, P3)
│   ├── health_check.py         # ported
│   ├── <ported-tests>.py       # one per recipe-maintainer test
│   └── <recipe>_<behavior>.py  # NEW, >=2 recipe-specific functional tests
└── playwright/                 # browser flows where the app's core UX is a UI (P6)
```
### 4.2 Shared harness additions (port from `utils/tests/helpers.py`)
Add to `runner/harness/` (reused across recipes), so per-recipe tests stay small:
- HTTP convergence helpers (`retry_http_get`, `assert_converges`, `wait_for_http`).
- **SSO setup + OIDC-flow helpers**: deploy the provider (keycloak/authentik), run the recipe's
`setup_<provider>_integration.py` (realm/client/test-user), persist credentials per-run, and a
reusable "full OIDC login → token → protected API call" assertion.
- **Dependency resolver**: read `tests/<recipe>/recipe.toml` `requires`/`test_requires`, deploy deps
before the recipe under test, tear them down with it. Mind the `MAX_TESTS`/node budget — a recipe +
its SSO provider is ≥2 live apps; sequence heavy ones.
- **Backup data-integrity helper**: seed a recipe-defined marker, snapshot, wipe volumes, restore,
re-read the marker (P4).
- TTY-wrapped abra wrappers + robust readiness waits (no bare `sleep`).
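
A convergence helper of the kind listed above replaces bare `sleep` with a bounded poll. The sketch below shares its name with recipe-maintainer's `assert_converges`, but the signature and behavior shown here are assumptions, not a copy of that helper. The injectable `clock` and `sleep` parameters exist only so the helper can be exercised quickly in tests.

```python
import time


def assert_converges(check, timeout=60.0, interval=2.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `check` until it returns a truthy value or `timeout` elapses.

    Exceptions raised by `check` count as "not ready yet"; the last one is
    reported if the deadline passes.
    """
    deadline = clock() + timeout
    while True:
        try:
            result = check()
            if result:
                return result
            last_error = AssertionError(f"check returned {result!r}")
        except Exception as exc:
            last_error = exc
        if clock() >= deadline:
            raise AssertionError(
                f"did not converge within {timeout}s: {last_error}"
            ) from last_error
        sleep(interval)
```

A readiness wait then reads as `assert_converges(lambda: fetch_status() == 200, timeout=300)`, where `fetch_status` stands for whatever HTTP helper the harness provides.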
### 4.3 Recipe-specific functional tests (P3) — what "confirms how it works" means
Beyond health + parity, each recipe gets **≥2** tests exercising its characteristic behavior. Derive
them from the recipe's `test.md` and what the app *is for*. Representative targets:
- **keycloak** — create a realm+client via admin API; obtain a token (password & client-credentials
grants); validate JWT claims.
- **matrix-synapse** — register two users (admin API); one sends a room message, the other reads it;
media upload→download; `/_matrix/federation/v1/version` reachable.
- **immich** — upload an asset via API, list it back, confirm a thumbnail/derivative is generated.
- **lasuite-docs** — create a doc, edit via the API, confirm persistence; WOPI discovery (Collabora/
OnlyOffice) XML valid (port `wopi_configured`).
- **lasuite-drive** — upload a file to a workspace, list/download it; MinIO bucket present.
- **lasuite-meet** — create a room, two users get LiveKit tokens & join (port `meeting_flow`);
WebRTC connectivity probe (port `webrtc-media`).
- **cryptpad** — create a pad and confirm it persists (note client-side-encryption: page is
JS-rendered, so use Playwright, not bare curl — see recipe-maintainer's note).
- **bluesky-pds** — create a test account (goat CLI), create a post via atproto, fetch it back,
delete the account (port `goat_account`, extend with a post round-trip).
- **mumble** — connect a client/CLI and confirm channel presence (beyond TCP health).
- **n8n** — create a workflow via API, execute it, assert the result.
- **ghost / mattermost-lts / discourse / plausible / uptime-kuma / mailu** — create the app's primary
object (post / message / topic / tracked event / monitor / mailbox) and read it back.
- **custom-html / hedgedoc / gitea** — serve/persist content: write content, fetch it back.
(For any recipe, "≥2 specific tests" = at minimum: create-an-object + read-it-back, and one more that
touches a distinctive feature — SSO, federation, media, WOPI, WebRTC, etc.)
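
As an example of an assertion that goes beyond "it returns 200", the keycloak target's "validate JWT claims" step could start from a sketch like this. It decodes the payload only and does **not** verify the signature, so a real test would add verification against the realm's published keys. The function names are hypothetical, and checking `azp` against the client id is an assumption about the token contents.

```python
import base64
import json


def decode_jwt_claims(token):
    """Decode a JWT's payload segment WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def assert_claims(claims, issuer, client_id, now):
    """Assert the claims a freshly issued access token should carry."""
    assert claims["iss"] == issuer, f"unexpected issuer {claims['iss']!r}"
    assert claims.get("azp") == client_id, "token was issued to another client"
    assert claims["exp"] > now, "token is already expired"
```

In a test, `now` would be `time.time()` and `issuer` the realm URL created earlier in the same run.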
---
## 5. Phase-2 recipe set + enrollment
**Target set** (recipe-maintainer's maintained list — `maintained-recipes`): cryptpad, lasuite-drive,
lasuite-docs, lasuite-meet, matrix-synapse, keycloak, bluesky-pds, mumble, immich, mailu, custom-html,
mattermost-lts, n8n, ghost, drone, plausible, uptime-kuma, discourse (+ authentik as an SSO provider).
Several already live on the mirror (`recipe-maintainers/<recipe>`); the rest are brought over via the
**Phase-1 recipe mirror+PR flow** (`abra recipe fetch` → recipe-create-pr procedure;
see Phase-1 `plan.md` §4.1). Enroll **easy → hard**, dependency-tiered like recipe-maintainer
(`recipe.toml` tiers): SSO providers (keycloak, authentik) and DBs first, then their dependents.
Order guidance: prove the Phase-2 pattern on a simple recipe (custom-html) and a DB recipe (n8n),
then keycloak/authentik (providers), then the SSO-dependent suite (lasuite-*, immich, cryptpad), then
the heavy/standalone ones (matrix-synapse, mumble, bluesky-pds, ghost, discourse, mailu, mattermost-lts,
plausible, uptime-kuma, drone). Respect `MAX_TESTS` and run heavy deploys sequentially.
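
The dependency-tiered ordering can be derived rather than maintained by hand. The sketch below assumes each `recipe.toml` has already been parsed into a dict (for example with `tomllib.load` on Python 3.11+) and that `requires`/`test_requires` are top-level lists of recipe names; both the key placement and the function names are assumptions.

```python
from graphlib import TopologicalSorter


def declared_deps(recipe_meta):
    """Merge `requires` and `test_requires`, dropping duplicates in order."""
    combined = [
        *recipe_meta.get("requires", []),
        *recipe_meta.get("test_requires", []),
    ]
    return list(dict.fromkeys(combined))


def enrollment_order(recipes):
    """Order recipes so every dependency comes before its dependents.

    `recipes` maps a recipe name to its parsed recipe.toml dict.
    """
    graph = {name: declared_deps(meta) for name, meta in recipes.items()}
    # TopologicalSorter treats each value as the node's predecessors,
    # so static_order() yields providers first.
    return list(TopologicalSorter(graph).static_order())
```

A circular declaration raises `graphlib.CycleError`, which surfaces a metadata mistake before any deploy is attempted. The same ordering can drive the dependency resolver's deploy sequence within a run.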
---
## 6. Milestones (each ends with an Adversary gate)
- **Q0 — Harness additions.** Port `helpers.py` capabilities into `runner/harness/` (HTTP/convergence,
OIDC-flow, dependency resolver, backup data-integrity, TTY abra). *Accept:* a reference recipe
(custom-html) uses them for a full parity+specific suite, green via `!testme`.
- **Q1 — Pattern proof (2 recipes).** custom-html (simple) + n8n (single-DB): full parity port +
≥2 specific tests + real backup data-integrity. *Accept:* both green; `PARITY.md` complete.
- **Q2 — SSO providers.** keycloak + authentik: parity + specific tests; the reusable SSO-setup/
OIDC-flow harness works end-to-end. *Accept:* a dependent recipe can deploy a provider and run an
OIDC login test in one run.
- **Q3 — SSO-dependent suite.** lasuite-docs, lasuite-drive, lasuite-meet, cryptpad, immich: deps
auto-deployed, SSO setup automated, parity + specific (WOPI, meeting/WebRTC, media, pad). *Accept:*
each green with deps, within node budget.
- **Q4 — Remaining recipes.** matrix-synapse, mumble, bluesky-pds, ghost, mattermost-lts, discourse,
plausible, uptime-kuma, mailu, drone. *Accept:* each green, parity + specific.
- **Q5 — Completeness + docs.** Every recipe in §5 covered; `PARITY.md` for all; `enroll-recipe.md`
documents the test contract with a worked example. *Accept:* Adversary re-runs a sampled subset
cold and confirms parity tables + specific tests are real (not health-only, not skipped); flip
Phase-2 `STATUS.md` to `## DONE`.
---
## 7. Loop protocol & guardrails (inherit from Phase 1)
Same as `plan.md` §6/§6.1/§7/§9. Phase-2-specific emphases:
- **Never weaken a test to go green** (P7) — the single most important rule; the Adversary watches
for health-only stand-ins, `skip`/`xfail`, or assertions that don't actually check app state.
- **Real data-integrity for backups** (P4) — "service is up after restore" is *not* sufficient; the
seeded data must be proven to survive.
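
The backup rule above reduces to a fixed sequence of steps with an assertion after each one that matters. The sketch below runs against an in-memory stand-in so that it executes anywhere; the `seed`/`read`/`backup`/`wipe`/`restore` method names are hypothetical, and a real recipe would back them with its own API calls and the TTY-wrapped abra commands.

```python
import uuid


def backup_roundtrip(app):
    """Seed a unique marker, back up, wipe, restore, and prove it survived."""
    marker = f"ccci-marker-{uuid.uuid4().hex}"
    app.seed(marker)
    assert app.read() == marker, "seed did not take"
    app.backup()
    app.wipe()
    # Without this check a no-op wipe would make the restore look successful.
    assert app.read() != marker, "wipe left the marker in place"
    app.restore()
    assert app.read() == marker, "seeded marker did not survive restore"
    return marker


class FakeApp:
    """In-memory stand-in so the sketch runs without a deployment."""

    def __init__(self):
        self.data = None
        self.snapshot = None

    def seed(self, marker):
        self.data = marker

    def read(self):
        return self.data

    def backup(self):
        self.snapshot = self.data

    def wipe(self):
        self.data = None

    def restore(self):
        self.data = self.snapshot
```

The assertion after `wipe()` is the part most often skipped: it is what distinguishes a restore that worked from data that was never removed.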
### 7.1 Adversary mandate (Phase 2) — no skipped tests, no corners cut
The default assumption is that **everything meaningful about an app is testable with enough effort —
the job is to write a *good* test, not to declare it impossible.** OIDC/SSO login, token issuance and
JWT validation, federation endpoints, media upload/download, WOPI discovery, WebRTC ICE/connectivity,
backup data survival — these are all testable end-to-end and **must** be tested, not stubbed. The
Adversary actively enforces this and **reads the test bodies, not just pass/fail**:
- **Reject** any test that is `skip`/`xfail`/commented-out, mocked, mutated to a `health_check`
stand-in, asserts nothing material (e.g. only `status==200`), or is hard-coded to pass.
- **Reject "we couldn't test X"** unless it is a genuine *environment-level* limitation (e.g. the
test host cannot receive inbound UDP, so the full lasuite-meet *media relay* path can't complete) —
and even then demand the **maximal testable subset** (e.g. signaling, token issuance, ICE candidate
gathering) plus a `DECISIONS.md` justification with the specific technical blocker. "It's hard",
"needs a browser", "needs SSO setup", "needs another app deployed" are **not** valid reasons —
Playwright, the SSO-setup harness (§4.2), and the dependency resolver exist precisely to remove
those excuses.
- **Verify parity for real (P2):** for each `PARITY.md` row, confirm the cc-ci test checks the *same
thing* the recipe-maintainer original did — not a hollow rename.
- **Re-run cold and inspect:** the Adversary re-runs a sampled recipe's suite from a clean state and
reads the diffs/assertions; a green run with empty assertions is a FAIL and a `[adversary]` finding.
- **Respect Phase-1 resource caps** — deps multiply live apps per run; keep within `MAX_TESTS`/node
budget; sequence heavy recipes; teardown (incl. deps) is guaranteed.
- **Tests are recipe-versioned** — they run against the PR's recipe version; don't hardcode values
that break across upstream versions (read versions/endpoints dynamically, as helpers.py does).
- **Cite the source** — each ported test notes its `recipe-info/<recipe>/tests/<file>` origin in
`PARITY.md` so parity is auditable.
---
## 8. Open decisions (log in DECISIONS.md)
- Whether to **vendor** the ported helpers into `runner/harness/` vs import recipe-maintainer's
`helpers.py` directly. (Default: vendor/adapt — cc-ci must be self-contained and not depend on the
recipe-maintainer workspace at runtime.)
- Per-recipe **secret/SSO credential** persistence within a run (reuse Phase-1 §4.4-B run-scoped
store; SSO test users/clients are class-B, generated per run, destroyed at teardown).
- How many recipe-specific tests beyond the **≥2** floor per recipe (scale with the app's surface;
don't gold-plate trivial recipes).
- A test deemed "impossible" is the **rare exception**, not a convenient out (§7.1). It is only
acceptable for a true environment-level blocker (e.g. no inbound UDP for lasuite-meet's *media
relay*), requires the **maximal testable subset** still implemented, a specific technical reason in
`DECISIONS.md`, **and Adversary sign-off**. SSO/OIDC, browser flows, multi-app dependencies, and
data-integrity are explicitly **not** exceptions — they are testable and required.