Add Phase-2 plan: comprehensive per-recipe test authoring (after Phase-1 DONE)

Phase 2 fills the CI machine with good tests for every maintained Co-op Cloud app, using references/recipe-maintainer as the corpus: port a comparable cc-ci test for EACH existing recipe-maintainer test (parity, tracked in PARITY.md) + >=2 new recipe-specific functional tests per recipe, plus real backup data-integrity and SSO dependency handling. Reuses the Phase-1 harness/stages/trigger/resource-caps; adds test content + small shared-harness ports from helpers.py. Linked from the package README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 03:41:28 +01:00
parent 667c7cd5a0
commit 07faa6007f
2 changed files with 232 additions and 1 deletions
--- a/cc-ci-plan/README.md
+++ b/cc-ci-plan/README.md
@@ -14,7 +14,9 @@ autonomous Claude loops (a Builder and an adversarial Reviewer) running over day

 | File | Purpose |
 |---|---|
-| `plan.md` | The plan. Agents treat it as their single source of truth. |
+| `plan.md` | The Phase-1 plan (build the CI server). Agents treat it as their single source of truth. |
+| `plan-phase2-recipe-tests.md` | **Phase 2** (starts after Phase-1 `## DONE`): author comprehensive per-recipe tests — port every recipe-maintainer test + ≥2 recipe-specific tests per app. |
+| `IDEAS.md` | Deferred/future ideas, parked out of current scope. |
 | `brief.md` | The original one-page brief (context only; `plan.md` supersedes it). |
 | `kickoff.md` | Launch & supervision guide. |
 | `launch.sh` | Starts both loops + a watchdog; restarts dead loops; stops on `## DONE`. |
--- a/cc-ci-plan/plan-phase2-recipe-tests.md
+++ b/cc-ci-plan/plan-phase2-recipe-tests.md
@@ -0,0 +1,229 @@
+# cc-ci Phase 2 — Comprehensive per-recipe test authoring (Autonomous Build Plan)
+
+**Status:** QUEUED — starts only after Phase 1 (`plan.md`) reaches `## DONE`.
+**Builds on:** the Phase-1 cc-ci CI server (`plan.md`). This phase adds **test content**, not infra.
+**Reference corpus:** `references/recipe-maintainer/` → `/srv/recipe-maintainer/` (the existing,
+human-maintained recipe tests — the canonical source to port from).
+**Owner agents:** same Builder + Adversary loops + coordination protocol as Phase 1 (`plan.md` §6/§7).
+**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`
+
+---
+
+## 0. Relationship to Phase 1 (read first)
+
+Phase 1 built the **machine**: the Drone pipeline, the `!testme` trigger (polling-primary), the
+`coop-cloud/traefik` proxy, the shared harness (`runner/run_recipe_ci.py` + `tests/conftest.py` +
+`runner/harness/`), the three stages (**install / upgrade / backup-restore**), guaranteed teardown,
+the `MAX_TESTS` concurrency cap, and the results dashboard — proven on ~six recipes (D10).
+
+Phase 2 **fills the machine with good tests for every Co-op Cloud app we maintain.** It does **not**
+re-architect the runner. It reuses Phase-1's harness, stages, trigger, resource caps, and teardown.
+Everything here is `tests/<recipe>/...` content + small, shared harness *additions* (ported from
+recipe-maintainer's helpers). When reality forces a harness change, record it in `DECISIONS.md`.
+
+Do not start Phase 2 until Phase-1 `STATUS.md` shows `## DONE` (Adversary-verified). The same loop
+protocol, single-writer file ownership, and gate handshake (`plan.md` §6.1/§7) apply here.
+
+---
+
+## 1. Mission
+
+For **every maintained Co-op Cloud recipe**, the cc-ci repo's `tests/<recipe>/` tree must contain a
+genuine end-to-end test suite that, run by `!testme` on a real PR, proves the app actually works —
+**at parity with what `recipe-maintainer` already tests, plus recipe-specific functional depth.**
+
+Concretely, for each recipe:
+1. **Port every existing recipe-maintainer test** for that recipe (a comparable cc-ci test for each
+   `recipe-info/<recipe>/tests/*.py`).
+2. **Add ≥2 new recipe-specific functional tests** that exercise something characteristic of that
+   app (not just "it returns 200") — see §4.3.
+3. All of it runs inside Phase-1's three stages and passes green via `!testme`.
+
+---
+
+## 2. Definition of Done (Phase 2 exit condition)
+
+The loop terminates only when every item holds **and the Adversary has independently re-verified
+each within 24h** (logged in `REVIEW.md`):
+
+- [ ] **P1 — Coverage.** Every recipe in the **Phase-2 recipe set (§5)** has a `tests/<recipe>/`
+      suite enrolled and a full green `!testme` run (install + upgrade + backup-restore).
+- [ ] **P2 — Parity port.** For each recipe, **every** test under
+      `recipe-info/<recipe>/tests/*.py` has a comparable cc-ci test (same thing verified), adapted to
+      the cc-ci harness. A mapping table (recipe-maintainer test → cc-ci test) is recorded in
+      `tests/<recipe>/PARITY.md`. Any test deliberately not ported is a documented `DECISIONS.md`
+      finding with the reason (e.g. obsolete, replaced by a better check), never a silent omission.
+- [ ] **P3 — Recipe-specific depth.** Each recipe has **≥2 new functional tests** beyond parity that
+      confirm characteristic behavior (§4.3), with real assertions on app state/responses — not
+      health-only.
+- [ ] **P4 — Backup data-integrity is real.** The backup-restore stage for each recipe **seeds
+      identifiable data, backs it up, mutates/wipes, restores, and asserts the seeded data survived** (recipe-aware,
+      not just "service is up") — ported from recipe-maintainer's backup approach.
+- [ ] **P5 — Dependencies handled.** Recipes needing other apps (SSO providers, DBs) declare deps
+      (ported from `recipe.toml` `requires`/`test_requires`); the harness deploys deps within the run
+      (respecting `MAX_TESTS`/node budget) and SSO setup runs automatically (§4.2).
+- [ ] **P6 — Browser flows where they matter (D3).** Recipes whose core function is a UI flow have a
+      Playwright test of that flow (login, create-an-object, etc.), not just API checks.
+- [ ] **P7 — No weakened tests.** Every assertion is real; nothing is `skip`/`xfail`'d to go green.
+      Genuinely-untestable aspects are documented findings (`DECISIONS.md`), not silent skips.
+- [ ] **P8 — Docs.** `docs/enroll-recipe.md` updated with the per-recipe test contract (§4.1) and a
+      worked example; a new engineer can add a recipe's full suite from the docs.
+
+When all P1–P8 hold and are Adversary-verified, write `## DONE` to Phase-2 `STATUS.md`.
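The P4 sequence (seed, back up, wipe, restore, assert) can be sketched as a small recipe-agnostic check. This is only an illustration under assumptions: `check_backup_roundtrip`, its five callables, and `FakeApp` are hypothetical names, not part of the Phase-1 harness or of recipe-maintainer. In a real suite the callables would wrap the recipe's own seeding logic and the TTY-wrapped abra backup/restore calls.

```python
import uuid


def check_backup_roundtrip(seed, backup, wipe, restore, read):
    """Prove that seeded data survives a backup/restore cycle.

    All five arguments are hypothetical recipe-supplied callables:
    seed(marker) writes the marker into the app, read() returns the
    current marker (or None), and the others drive the lifecycle.
    """
    marker = f"cc-ci-marker-{uuid.uuid4().hex}"
    seed(marker)
    assert read() == marker, "seed did not take effect"
    backup()
    wipe()
    # Guard against a vacuous pass: the wipe must really remove the data.
    assert read() != marker, "wipe left the marker in place"
    restore()
    assert read() == marker, "seeded data did not survive restore"
    return marker


class FakeApp:
    """In-memory stand-in used only to exercise the flow above."""

    def __init__(self):
        self.data = None
        self.snapshot = None

    def seed(self, marker):
        self.data = marker

    def backup(self):
        self.snapshot = self.data

    def wipe(self):
        self.data = None

    def restore(self):
        self.data = self.snapshot

    def read(self):
        return self.data


app = FakeApp()
check_backup_roundtrip(app.seed, app.backup, app.wipe, app.restore, app.read)
```

The intermediate assertion after `wipe()` is the design point: without it, a restore that does nothing would still pass against data that was never removed.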
+
+---
+
+## 3. Reference corpus (port from here)
+
+`references/recipe-maintainer/` (`/srv/recipe-maintainer/`) is the source of truth for *what* to
+test and *how* recipe-maintainer already does it. Key paths:
+
+- **Per-recipe tests:** `recipe-info/<recipe>/tests/*.py` — the scripts to port (health_check,
+  oidc_login/oidc_integration, and recipe-specific ones like `meeting_flow.py`, `webrtc-media.py`,
+  `wopi_configured.py`, `goat_account.py`, `upload_conversion.py`).
+- **Per-recipe metadata:** `recipe-info/<recipe>/recipe.toml` (deps + `[sso]` provider/setup_script),
+  `test.md` (target URLs, what each test checks, manual steps, network/health specifics), `setup.md`,
+  `upstream.md`, `setup_<provider>_integration.py` (SSO realm/client/test-user setup).
+- **Shared utilities to port/adapt:** `utils/tests/helpers.py` — `http_get`/`http_post`,
+  `retry_http_get`, `assert_converges`, `wait_for_http`, `load_toml_credentials`, `abra(... tty_wrap)`,
+  `fresh_app`, `deploy_and_wait`. These map onto the Phase-1 harness fixtures.
+- **Orchestration logic to mirror as stages:** `.claude/commands/recipe-test{,-all,-new,-update,-backup}.md`
+  (new-install / upgrade / backup-restore semantics — already Phase-1 stages; port the *assertions*).
+- **Operational gotchas to bake in:** `learnings.md` — TTY-wrap (`script -qefc "abra …" /dev/null`)
+  for backup/restore/volume/secret/run/logs/lint; always `--chaos`; `backup-bot-two` must be present
+  for backup tests; prefer `docker service logs` over `abra app logs` (hangs non-interactively);
+  ghost-container volume-removal cleanup. Most are already in Phase-1 §4.3 — re-verify on the
+  installed abra.
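A minimal sketch of the TTY-wrap gotcha above, assuming the harness builds an argv list for `subprocess`. The `script -qefc "abra …" /dev/null` shape is taken from `learnings.md` as quoted here; the helper name `tty_wrap` and the example abra arguments are illustrative and have not been checked against the installed abra.

```python
import shlex


def tty_wrap(abra_args):
    """Build an argv that runs `abra <args>` under script(1) for a pseudo-TTY.

    shlex.join quotes each argument, so values containing spaces survive
    the extra shell layer that `script -c` introduces.
    """
    inner = shlex.join(["abra", *abra_args])
    return ["script", "-qefc", inner, "/dev/null"]


# Illustrative arguments only; substitute the real subcommand and app domain.
print(tty_wrap(["app", "restore", "example.test"]))
# ['script', '-qefc', 'abra app restore example.test', '/dev/null']
```

The resulting list can be passed to `subprocess.run(..., check=True)` unchanged, which avoids a second round of manual quoting.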
+
+**Adaptation note (important):** recipe-maintainer tests run against a *persistent* instance
+(`cctest.autonomic.zone`) using `context_reset.py` to free memory. cc-ci runs **ephemeral per-PR
+deploys**. So port the **assertions and setup logic**, but drive lifecycle through Phase-1's harness
+(per-run isolated app `<recipe>-pr<n>-<sha>`, guaranteed teardown) — not the persistent-instance model.
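For illustration, the per-run app name pattern `<recipe>-pr<n>-<sha>` could be produced by a helper like the one below. The function name, the hex validation, and the choice to abbreviate the SHA to 7 characters are assumptions; the Phase-1 harness may format it differently.

```python
import re


def run_app_name(recipe, pr_number, sha, sha_len=7):
    """Return an isolated per-run app name of the form <recipe>-pr<n>-<sha>.

    sha_len is an assumed abbreviation length, not a documented value.
    """
    if not re.fullmatch(r"[0-9a-f]{7,40}", sha):
        raise ValueError(f"not a hex commit SHA: {sha!r}")
    if int(pr_number) < 1:
        raise ValueError("PR number must be positive")
    return f"{recipe}-pr{int(pr_number)}-{sha[:sha_len]}"


print(run_app_name("custom-html", 12, "07faa6007f"))
```

Deriving the name from the PR number and commit keeps concurrent runs of the same recipe from colliding, which is what makes guaranteed teardown safe to apply per run.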
+
+---
+
+## 4. The per-recipe test contract
+
+### 4.1 Required structure (per recipe)
+```
+tests/<recipe>/
+├── recipe.toml          # ported deps + [sso] (from recipe-maintainer's recipe.toml)
+├── PARITY.md            # mapping: recipe-maintainer test -> cc-ci test (for P2)
+├── test_install.py      # Phase-1 install stage hook + recipe health/readiness
+├── test_upgrade.py      # Phase-1 upgrade stage hook (prev published -> PR version)
+├── test_backup.py       # Phase-1 backup stage: seed -> backup -> wipe -> restore -> assert data (P4)
+├── functional/          # ported parity tests + NEW recipe-specific tests (P2, P3)
+│   ├── health_check.py          # ported
+│   ├── <ported-tests>.py        # one per recipe-maintainer test
+│   └── <recipe>_<behavior>.py   # NEW, >=2 recipe-specific functional tests
+└── playwright/          # browser flows where the app's core UX is a UI (P6)
+```
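Since `PARITY.md` is what makes P2 auditable, a small checker can flag recipe-maintainer tests that have no mapped cc-ci test. The sketch below assumes `PARITY.md` holds a two-column markdown table (source file, cc-ci test); that layout and the name `unmapped_tests` are assumptions, since the plan does not fix a format.

```python
def unmapped_tests(source_tests, parity_md):
    """Return source test filenames with no non-empty mapping row in PARITY.md.

    Assumes rows like: | `health_check.py` | `functional/health_check.py` |
    """
    mapped = set()
    for line in parity_md.splitlines():
        cells = [c.strip().strip("`") for c in line.strip().strip("|").split("|")]
        # Header and separator rows are skipped because column 1 is not a .py file.
        if len(cells) >= 2 and cells[0].endswith(".py") and cells[1]:
            mapped.add(cells[0])
    return sorted(set(source_tests) - mapped)


parity = """\
| recipe-maintainer test | cc-ci test |
|---|---|
| `health_check.py` | `functional/health_check.py` |
| `meeting_flow.py` | |
"""
print(unmapped_tests(["health_check.py", "meeting_flow.py", "webrtc-media.py"], parity))
```

A row with an empty second column counts as unmapped, so a placeholder row cannot be used to satisfy the check.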
+
+### 4.2 Shared harness additions (port from `utils/tests/helpers.py`)
+Add to `runner/harness/` (reused across recipes), so per-recipe tests stay small:
+- HTTP convergence helpers (`retry_http_get`, `assert_converges`, `wait_for_http`).
+- **SSO setup + OIDC-flow helpers**: deploy the provider (keycloak/authentik), run the recipe's
+  `setup_<provider>_integration.py` (realm/client/test-user), persist credentials per-run, and a
+  reusable "full OIDC login → token → protected API call" assertion.
+- **Dependency resolver**: read `tests/<recipe>/recipe.toml` `requires`/`test_requires`, deploy deps
+  before the recipe under test, tear them down with it. Mind the `MAX_TESTS`/node budget — a recipe +
+  its SSO provider is ≥2 live apps; sequence heavy ones.
+- **Backup data-integrity helper**: seed a recipe-defined marker, snapshot, wipe volumes, restore,
+  re-read the marker (P4).
+- TTY-wrapped abra wrappers + robust readiness waits (no bare `sleep`).
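The "no bare `sleep`" rule can be illustrated with a polling helper that retries a probe until a deadline. This is a sketch, not the recipe-maintainer implementation: it borrows the name `assert_converges` from `helpers.py`, but the signature, defaults, and the `http_ok` probe are assumptions.

```python
import time
import urllib.request


def assert_converges(probe, timeout=60.0, interval=2.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns truthy, or fail after `timeout` seconds.

    Exceptions raised by the probe count as "not ready yet" and the last
    one is reported if the deadline passes. clock/sleep are injectable so
    the helper can be unit-tested without real waiting.
    """
    deadline = clock() + timeout
    last_error = None
    while True:
        try:
            if probe():
                return
        except Exception as exc:  # e.g. connection refused while starting
            last_error = exc
        if clock() >= deadline:
            raise AssertionError(
                f"did not converge within {timeout}s (last error: {last_error!r})"
            )
        sleep(interval)


def http_ok(url):
    """Example probe: True when the URL answers with HTTP 200."""
    def probe():
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    return probe
```

A test would call something like `assert_converges(http_ok(base_url), timeout=300)`, where `base_url` is whatever fixture the harness exposes for the deployed app.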
+
+### 4.3 Recipe-specific functional tests (P3) — what "confirms how it works" means
+Beyond health + parity, each recipe gets **≥2** tests exercising its characteristic behavior. Derive
+them from the recipe's `test.md` and what the app *is for*. Representative targets:
+
+- **keycloak** — create a realm+client via admin API; obtain a token (password & client-credentials
+  grants); validate JWT claims.
+- **matrix-synapse** — register two users (admin API); one sends a room message, the other reads it;
+  media upload→download; `/_matrix/federation/v1/version` reachable.
+- **immich** — upload an asset via API, list it back, confirm a thumbnail/derivative is generated.
+- **lasuite-docs** — create a doc, edit via the API, confirm persistence; WOPI discovery (Collabora/
+  OnlyOffice) XML valid (port `wopi_configured`).
+- **lasuite-drive** — upload a file to a workspace, list/download it; MinIO bucket present.
+- **lasuite-meet** — create a room, two users get LiveKit tokens & join (port `meeting_flow`);
+  WebRTC connectivity probe (port `webrtc-media`).
+- **cryptpad** — create a pad and confirm it persists (note client-side-encryption: page is
+  JS-rendered, so use Playwright, not bare curl — see recipe-maintainer's note).
+- **bluesky-pds** — create a test account (goat CLI), create a post via atproto, fetch it back,
+  delete the account (port `goat_account`, extend with a post round-trip).
+- **mumble** — connect a client/CLI and confirm channel presence (beyond TCP health).
+- **n8n** — create a workflow via API, execute it, assert the result.
+- **ghost / mattermost-lts / discourse / plausible / uptime-kuma / mailu** — create the app's primary
+  object (post / message / topic / tracked event / monitor / mailbox) and read it back.
+- **custom-html / hedgedoc / gitea** — serve/persist content: write content, fetch it back.
+
+(For any recipe, "≥2 specific tests" = at minimum: create-an-object + read-it-back, and one more that
+touches a distinctive feature — SSO, federation, media, WOPI, WebRTC, etc.)
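As one example of an assertion that goes beyond "it returns 200", the keycloak "validate JWT claims" target could start from a claims decoder like the sketch below. It only decodes the payload and does not verify the signature, so it is not a substitute for checking the token against the realm's published keys. The function names and the specific claims checked are assumptions.

```python
import base64
import json
import time


def decode_jwt_claims(token):
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def assert_basic_claims(claims, expected_issuer, now=None):
    """Check issuer and expiry; a real test would add client/audience checks."""
    now = time.time() if now is None else now
    assert claims.get("iss") == expected_issuer, claims.get("iss")
    assert claims.get("exp", 0) > now, "token already expired"


def _b64(obj):
    """Test-only helper to build an unsigned token segment."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical issuer URL, used only to exercise the decoder.
issuer = "https://sso.example.test/realms/ci"
token = ".".join([_b64({"alg": "none"}), _b64({"iss": issuer, "exp": 2000}), "sig"])
assert_basic_claims(decode_jwt_claims(token), issuer, now=1000)
```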
+
+---
+
+## 5. Phase-2 recipe set + enrollment
+
+**Target set** (recipe-maintainer's maintained list — `maintained-recipes`): cryptpad, lasuite-drive,
+lasuite-docs, lasuite-meet, matrix-synapse, keycloak, bluesky-pds, mumble, immich, mailu, custom-html,
+mattermost-lts, n8n, ghost, drone, plausible, uptime-kuma, discourse (+ authentik as an SSO provider).
+
+Several already live on the mirror (`recipe-maintainers/<recipe>`); the rest are brought over via the
+**Phase-1 recipe mirror+PR flow** (`abra recipe fetch` → recipe-create-pr procedure;
+see Phase-1 `plan.md` §4.1). Enroll **easy → hard**, dependency-tiered like recipe-maintainer
+(`recipe.toml` tiers): SSO providers (keycloak, authentik) and DBs first, then their dependents.
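Dependency-tiered enrollment amounts to a topological ordering of the `requires` graph. The sketch below uses the standard-library `graphlib` for that; the dependency map is a made-up example (which provider each recipe actually needs comes from its `recipe.toml`, not from this snippet), and `deploy_tiers` is a hypothetical name.

```python
from graphlib import TopologicalSorter


def deploy_tiers(requires):
    """Group recipes into tiers so every recipe comes after its dependencies.

    `requires` maps recipe -> iterable of recipes it depends on.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(requires)
    sorter.prepare()
    tiers = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        tiers.append(ready)
        sorter.done(*ready)
    return tiers


# Illustrative graph only; not read from any real recipe.toml.
example = {
    "keycloak": [],
    "custom-html": [],
    "lasuite-docs": ["keycloak"],
    "immich": ["keycloak"],
}
print(deploy_tiers(example))
# [['custom-html', 'keycloak'], ['immich', 'lasuite-docs']]
```

Each tier lists recipes whose dependencies are already enrolled; within a tier the harness would still deploy sequentially where `MAX_TESTS` or the node budget requires it.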
+
+Order guidance: prove the Phase-2 pattern on a simple recipe (custom-html) and a DB recipe (n8n),
+then keycloak/authentik (providers), then the SSO-dependent suite (lasuite-*, immich, cryptpad), then
+the heavy/standalone ones (matrix-synapse, mumble, bluesky-pds, ghost, discourse, mailu,
+mattermost-lts, plausible, uptime-kuma, drone). Respect `MAX_TESTS` and run heavy deploys sequentially.
+
+---
+
+## 6. Milestones (each ends with an Adversary gate)
+
+- **Q0 — Harness additions.** Port `helpers.py` capabilities into `runner/harness/` (HTTP/convergence,
+  OIDC-flow, dependency resolver, backup data-integrity, TTY abra). *Accept:* a reference recipe
+  (custom-html) uses them for a full parity+specific suite, green via `!testme`.
+- **Q1 — Pattern proof (2 recipes).** custom-html (simple) + n8n (single-DB): full parity port +
+  ≥2 specific tests + real backup data-integrity. *Accept:* both green; `PARITY.md` complete.
+- **Q2 — SSO providers.** keycloak + authentik: parity + specific tests; the reusable SSO-setup/
+  OIDC-flow harness works end-to-end. *Accept:* a dependent recipe can deploy a provider and run an
+  OIDC login test in one run.
+- **Q3 — SSO-dependent suite.** lasuite-docs, lasuite-drive, lasuite-meet, cryptpad, immich: deps
+  auto-deployed, SSO setup automated, parity + specific (WOPI, meeting/WebRTC, media, pad). *Accept:*
+  each green with deps, within node budget.
+- **Q4 — Remaining recipes.** matrix-synapse, mumble, bluesky-pds, ghost, mattermost-lts, discourse,
+  plausible, uptime-kuma, mailu, drone. *Accept:* each green, parity + specific.
+- **Q5 — Completeness + docs.** Every recipe in §5 covered; `PARITY.md` for all; `enroll-recipe.md`
+  documents the test contract with a worked example. *Accept:* Adversary re-runs a sampled subset
+  cold and confirms parity tables + specific tests are real (not health-only, not skipped); flip
+  Phase-2 `STATUS.md` to `## DONE`.
+
+---
+
+## 7. Loop protocol & guardrails (inherit from Phase 1)
+
+Same as `plan.md` §6/§6.1/§7/§9. Phase-2-specific emphases:
+- **Never weaken a test to go green** (P7) — the single most important rule; the Adversary watches
+  for health-only stand-ins, `skip`/`xfail`, or assertions that don't actually check app state.
+- **Real data-integrity for backups** (P4) — "service is up after restore" is *not* sufficient; the
+  seeded data must be proven to survive.
+- **Respect Phase-1 resource caps** — deps multiply live apps per run; keep within `MAX_TESTS`/node
+  budget; sequence heavy recipes; teardown (incl. deps) is guaranteed.
+- **Tests are recipe-versioned** — they run against the PR's recipe version; don't hardcode values
+  that break across upstream versions (read versions/endpoints dynamically, as helpers.py does).
+- **Cite the source** — each ported test notes its `recipe-info/<recipe>/tests/<file>` origin in
+  `PARITY.md` so parity is auditable.
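One way the Adversary could watch for weakened tests is a plain text scan for skip/xfail markers. The sketch below is a heuristic under assumptions: it only matches common pytest and unittest spellings, it cannot detect assertions that are present but meaningless, and the name `find_weakened` is hypothetical.

```python
import re

# Matches pytest.mark.skip / skipif / xfail, pytest.skip(), pytest.xfail(),
# and unittest.skip* decorators. Purely textual, so comments match too.
WEAKENING = re.compile(
    r"pytest\.(?:mark\.)?(?:skip|skipif|xfail)\b|unittest\.skip"
)


def find_weakened(files):
    """Return (path, line number, line) for each skip/xfail marker found.

    `files` maps a path to that file's source text.
    """
    hits = []
    for path, text in sorted(files.items()):
        for lineno, line in enumerate(text.splitlines(), 1):
            if WEAKENING.search(line):
                hits.append((path, lineno, line.strip()))
    return hits


sample = {
    "tests/n8n/functional/workflow.py": (
        "import pytest\n"
        "\n"
        "@pytest.mark.xfail(reason='flaky')\n"
        "def test_execute():\n"
        "    assert True\n"
    ),
    "tests/n8n/functional/health_check.py": "def test_ok():\n    assert 1 == 1\n",
}
print(find_weakened(sample))
```

Any hit should correspond to a documented `DECISIONS.md` finding; an empty result is necessary for P7 but not sufficient on its own.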
+
+---
+
+## 8. Open decisions (log in DECISIONS.md)
+- Whether to **vendor** the ported helpers into `runner/harness/` vs import recipe-maintainer's
+  `helpers.py` directly. (Default: vendor/adapt — cc-ci must be self-contained and not depend on the
+  recipe-maintainer workspace at runtime.)
+- Per-recipe **secret/SSO credential** persistence within a run (reuse Phase-1 §4.4-B run-scoped
+  store; SSO test users/clients are class-B, generated per run, destroyed at teardown).
+- How many recipe-specific tests beyond the **≥2** floor per recipe (scale with the app's surface;
+  don't gold-plate trivial recipes).
+- Recipes that genuinely can't be CI'd in this environment (e.g. ones needing inbound UDP/TURN like
+  lasuite-meet's media path) → document the tested subset + why (mirror Phase-1 D10's honesty rule).