# cc-ci Phase 2 — Comprehensive per-recipe test authoring (Autonomous Build Plan)

**Status:** QUEUED — starts after Phase 1b (`plan-phase1b-review-lint.md`) **and Phase 1d**
(`plan-phase1d-generic-test-suite.md`) reach `## DONE`.
**Builds on:** the Phase-1 cc-ci CI server (`plan.md`) **and Phase 1d's generic test suite** — every
recipe already gets generic install/upgrade/backup/restore for free, reusing one shared deployment.
So this phase is **not** "write every test from scratch": it's **authoring the recipe overlays**
(`test_<op>.py` that override/extend the generic, per 1d's model) + **defining custom install steps**
for recipes that fail the generic form, porting the recipe-maintainer corpus as overlays. This phase
adds **test content**, not infra.
**Reference corpus:** `references/recipe-maintainer/` → `/srv/recipe-maintainer/` (the existing,
human-maintained recipe tests — the canonical source to port from).
**Owner agents:** same Builder + Adversary loops + coordination protocol as Phase 1 (`plan.md` §6/§7).
**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`


---

## 0c. OIDC / SSO-dep recipes — follow the SSO-dep plan (NOT operator-pending)

Recipes that authenticate via OIDC (lasuite-docs, cryptpad, lasuite-drive, lasuite-meet, future)
do **not** need operator input to wire OIDC. The canonical pattern lives at
**`plan-sso-dep-testing.md`** — declared `DEPS = ["keycloak"]` in `recipe_meta.py`, orchestrator
co-deploys the dep, recipe `install_steps.sh` reads `$CCCI_DEPS_FILE` and writes the OIDC env vars
+ injects the client secret via abra. Authenticated tests use `harness.sso.oidc_password_grant` or
Playwright on the dep's login page. **`DEFERRED.md` entries that cite "operator input needed for
OIDC" are mis-categorised** — re-open and execute them per this plan.

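
To make the password-grant path concrete, here is a minimal sketch of the HTTP request a helper like `harness.sso.oidc_password_grant` might build. The real helper's signature is not shown in this plan, so the function name `build_password_grant_request`, the realm `ci`, the client `docs`, and the host are all hypothetical. The token path assumes a Keycloak-style endpoint (`/realms/<realm>/protocol/openid-connect/token`). The request is only constructed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_password_grant_request(base_url, realm, client_id, client_secret,
                                 username, password):
    """Build (but do not send) an OIDC resource-owner password grant request.

    Assumes a Keycloak-style token endpoint; all values come from the caller,
    nothing is read from the real harness or from $CCCI_DEPS_FILE.
    """
    token_url = (f"{base_url.rstrip('/')}/realms/{realm}"
                 "/protocol/openid-connect/token")
    form = urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "client_secret": client_secret,
        "username": username,
        "password": password,
        "scope": "openid",
    }).encode()
    return Request(
        token_url, data=form, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"})


# Hypothetical values for illustration only.
req = build_password_grant_request(
    "https://keycloak.example.test/", "ci", "docs", "s3cret", "tester", "pw")
```

A test would send this request, parse the JSON response, and then call a protected endpoint of the recipe with the returned access token, asserting on the response body rather than only on the status code.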
## 0b. Auto-mirror missing recipes (NOT a blocker — autonomous loop work)

A recipe that is not yet on `git.autonomic.zone/recipe-maintainers/<recipe>` is **not** an
operator-pending blocker. The bot is admin on `recipe-maintainers` (see memory `cc-ci-gitea-recipes`)
and can create private mirror repos. **For any recipe you want to enroll/test that isn't mirrored
yet, mirror it yourself** before enrolling, based on the **`recipe-create-pr` skill** —
`/srv/recipe-maintainer/.opencode/skills/recipe-create-pr/SKILL.md` (which references
`/srv/recipe-maintainer/.claude/commands/recipe-create-pr.md` for the full procedure).

The flow (adapt the skill's command for the new-mirror case):
1. Create the **private** mirror repo on `git.autonomic.zone/recipe-maintainers/<recipe>` (Gitea API
   POST `/orgs/recipe-maintainers/repos`, bot creds from `.testenv`/§1.5).
2. Mirror the upstream `git.coopcloud.tech/coop-cloud/<recipe>` (clone --mirror → push, including
   tags) so the mirror's `main` is upstream-synced and tags carry over.
3. Then proceed with normal enrollment + the lifecycle suite (1d generic + your overlays from this
   phase).

Treat this as standard loop work — don't sit idle waiting on the operator for missing recipes.

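
Steps 1 and 2 of the flow can be sketched as follows. This is an illustration under assumptions, not the `recipe-create-pr` skill's actual procedure: the `/api/v1` prefix and the `Authorization: token ...` header are the usual Gitea API conventions, while the function names, the token value, and the working directory are hypothetical. Nothing is sent or executed here; the functions only build the request and the git argument lists, and push credentials are assumed to be configured separately.

```python
import json
from urllib.request import Request

GITEA = "https://git.autonomic.zone"
UPSTREAM = "https://git.coopcloud.tech/coop-cloud"
ORG = "recipe-maintainers"


def create_repo_request(recipe, token):
    """Step 1: request that creates a private repo in the org (not sent)."""
    body = json.dumps({"name": recipe, "private": True}).encode()
    return Request(
        f"{GITEA}/api/v1/orgs/{ORG}/repos", data=body, method="POST",
        headers={"Authorization": f"token {token}",
                 "Content-Type": "application/json"})


def mirror_commands(recipe, workdir):
    """Step 2: git argv lists that copy all upstream refs, tags included."""
    clone_dir = f"{workdir}/{recipe}.git"
    return [
        ["git", "clone", "--mirror", f"{UPSTREAM}/{recipe}.git", clone_dir],
        ["git", "-C", clone_dir, "push", "--mirror",
         f"{GITEA}/{ORG}/{recipe}.git"],
    ]


# Hypothetical inputs for illustration only.
req = create_repo_request("n8n", "dummy-token")
cmds = mirror_commands("n8n", "/tmp/mirror-work")
```

Each argv list can be passed to `subprocess.run(..., check=True)`. Note that `push --mirror` overwrites every ref on the target, which is only appropriate for a freshly created, empty mirror.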
## 0a. Prerequisite: Phase 1e harness corrections (must be DONE first)

The 1d/1b operator review produced three shared-harness corrections, now their own phase
**`plan-phase1e-harness-corrections.md`** (runs **before** this phase). Do **not** author overlays
until 1e is `## DONE`: it changes the foundation every overlay sits on — (HC1) upgrade goes
prev-release → **PR head** via `deploy --chaos`; (HC2) **repo-local PR code runs only for approved
recipes** (default cc-ci-overlays + generic only); (HC3) the **generic runs by default alongside an
overlay**, skipped only via an explicit opt-out. See that plan for detail.

## 0. Relationship to Phase 1 (read first)

Phase 1 built the **machine**: the Drone pipeline, the `!testme` trigger (polling-primary), the
`coop-cloud/traefik` proxy, the shared harness (`runner/run_recipe_ci.py` + `tests/conftest.py` +
`runner/harness/`), the three stages (**install / upgrade / backup-restore**), guaranteed teardown,
the `MAX_TESTS` concurrency cap, and the results dashboard — proven on ~six recipes (D10).

Phase 2 **fills the machine with good tests for every Co-op Cloud app we maintain.** It does **not**
re-architect the runner. It reuses Phase-1's harness, stages, trigger, resource caps, and teardown.
Everything here is `tests/<recipe>/...` content + small, shared harness *additions* (ported from
recipe-maintainer's helpers). When reality forces a harness change, record it in `DECISIONS.md`.

Do not start Phase 2 until Phase-1 `STATUS.md` shows `## DONE` (Adversary-verified). The same loop
protocol, single-writer file ownership, and gate handshake (`plan.md` §6.1/§7) apply here.


---

## 1. Mission

For **every maintained Co-op Cloud recipe**, the cc-ci repo's `tests/<recipe>/` tree must contain a
genuine end-to-end test suite that, run by `!testme` on a real PR, proves the app actually works —
**at parity with what `recipe-maintainer` already tests, plus recipe-specific functional depth.**

Concretely, for each recipe:
1. **Port every existing recipe-maintainer test** for that recipe (a comparable cc-ci test for each
   `recipe-info/<recipe>/tests/*.py`).
2. **Add ≥2 new recipe-specific functional tests** that exercise something characteristic of that
   app (not just "it returns 200") — see §4.3.
3. All of it runs inside Phase-1's three stages and passes green via `!testme`.


---

## 2. Definition of Done (Phase 2 exit condition)

The loop terminates only when every item holds **and the Adversary has independently re-verified
each within 24h** (logged in `REVIEW.md`):

- [ ] **P1 — Coverage.** Every recipe in the **Phase-2 recipe set (§5)** has a `tests/<recipe>/`
  suite enrolled and a full green `!testme` run (install + upgrade + backup-restore).
- [ ] **P2 — Parity port.** For each recipe, **every** test under
  `recipe-info/<recipe>/tests/*.py` has a comparable cc-ci test (same thing verified), adapted to
  the cc-ci harness. A mapping table (recipe-maintainer test → cc-ci test) is recorded in
  `tests/<recipe>/PARITY.md`. Any test deliberately not ported is a documented `DECISIONS.md`
  finding with the reason (e.g. obsolete, replaced by a better check), never a silent omission.
- [ ] **P3 — Recipe-specific depth.** Each recipe has **≥2 new functional tests** beyond parity that
  confirm characteristic behavior (§4.3), with real assertions on app state/responses — not
  health-only.
- [ ] **P4 — Backup data-integrity is real.** The backup-restore stage for each recipe **seeds
  identifiable data, mutates/wipes, restores, and asserts the seeded data survived** (recipe-aware,
  not just "service is up") — ported from recipe-maintainer's backup approach.
- [ ] **P5 — Dependencies handled.** Recipes needing other apps (SSO providers, DBs) declare deps
  (ported from `recipe.toml` `requires`/`test_requires`); the harness deploys deps within the run
  (respecting `MAX_TESTS`/node budget) and SSO setup runs automatically (§4.2).
- [ ] **P6 — Browser flows where they matter (D3).** Recipes whose core function is a UI flow have a
  Playwright test of that flow (login, create-an-object, etc.), not just API checks.
- [ ] **P7 — No weakened tests, no corners cut (§7.1).** Every assertion is real and checks app
  state; nothing is `skip`/`xfail`'d, mocked, or reduced to a health-only stand-in to go green.
  The bar: anything meaningful is testable with effort (OIDC/SSO, federation, media, WOPI, WebRTC
  connectivity, data survival all included). Any "untestable" claim is the rare exception — a true
  environment-level blocker only, with the maximal subset still implemented and **Adversary
  sign-off** (§8); "needs a browser / SSO / another app" is not a valid excuse.
- [ ] **P8 — Docs.** `docs/enroll-recipe.md` updated with the per-recipe test contract (§4.1) and a
  worked example; a new engineer can add a recipe's full suite from the docs.

When all P1–P8 hold and are Adversary-verified, write `## DONE` to Phase-2 `STATUS.md`.

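
The P2 mapping table lends itself to a mechanical completeness check. The sketch below assumes a hypothetical `PARITY.md` layout, a two-column Markdown table whose header starts with "recipe-maintainer"; the plan does not fix the format, and the helper names `parse_parity_table` and `unported` are invented for illustration. It only flags source tests that have no row or an empty cc-ci column. Deciding whether a flagged test is a documented `DECISIONS.md` omission remains a manual step.

```python
def parse_parity_table(markdown):
    """Return {source_test: cc_ci_test} from a two-column Markdown table."""
    mapping = {}
    for line in markdown.splitlines():
        cells = [c.strip().strip("`")
                 for c in line.strip().strip("|").split("|")]
        if len(cells) != 2 or not cells[0] or set(cells[0]) <= set("-: "):
            continue  # blank line, prose, or the |---|---| separator row
        if cells[0].lower().startswith("recipe-maintainer"):
            continue  # header row
        mapping[cells[0]] = cells[1]
    return mapping


def unported(source_tests, mapping):
    """Source tests with no row, or a row whose cc-ci column is empty."""
    return sorted(t for t in source_tests if not mapping.get(t))


# Hypothetical table content for illustration only.
PARITY = """\
| recipe-maintainer test | cc-ci test |
|---|---|
| `health_check.py` | `functional/health_check.py` |
| `oidc_login.py` | |
"""

missing = unported(
    ["health_check.py", "oidc_login.py", "wopi_configured.py"],
    parse_parity_table(PARITY))
```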

---

## 3. Reference corpus (port from here)

`references/recipe-maintainer/` (`/srv/recipe-maintainer/`) is the source of truth for *what* to
test and *how* recipe-maintainer already does it. Key paths:

- **Per-recipe tests:** `recipe-info/<recipe>/tests/*.py` — the scripts to port (health_check,
  oidc_login/oidc_integration, and recipe-specific ones like `meeting_flow.py`, `webrtc-media.py`,
  `wopi_configured.py`, `goat_account.py`, `upload_conversion.py`).
- **Per-recipe metadata:** `recipe-info/<recipe>/recipe.toml` (deps + `[sso]` provider/setup_script),
  `test.md` (target URLs, what each test checks, manual steps, network/health specifics), `setup.md`,
  `upstream.md`, `setup_<provider>_integration.py` (SSO realm/client/test-user setup).
- **Shared utilities to port/adapt:** `utils/tests/helpers.py` — `http_get`/`http_post`,
  `retry_http_get`, `assert_converges`, `wait_for_http`, `load_toml_credentials`, `abra(... tty_wrap)`,
  `fresh_app`, `deploy_and_wait`. These map onto the Phase-1 harness fixtures.
- **Orchestration logic to mirror as stages:** `.claude/commands/recipe-test{,-all,-new,-update,-backup}.md`
  (new-install / upgrade / backup-restore semantics — already Phase-1 stages; port the *assertions*).
- **Operational gotchas to bake in:** `learnings.md` — TTY-wrap (`script -qefc "abra …" /dev/null`)
  for backup/restore/volume/secret/run/logs/lint; always `--chaos`; `backup-bot-two` must be present
  for backup tests; prefer `docker service logs` over `abra app logs` (hangs non-interactively);
  ghost-container volume-removal cleanup. Most are already in Phase-1 §4.3 — re-verify on the
  installed abra.

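
The TTY-wrap gotcha above can be captured in a small helper. This is a sketch, not recipe-maintainer's actual `abra(... tty_wrap)` implementation: the function name `tty_wrap` and the example domain are hypothetical, and the `abra app backup create` argument list is illustrative and should be checked against the installed abra. The wrapper only builds the argument list for `script -qefc "<cmd>" /dev/null`; it does not run anything.

```python
import shlex


def tty_wrap(argv):
    """Wrap argv in `script -qefc "<cmd>" /dev/null` so the command sees a TTY."""
    # shlex.join quotes each argument, so the inner command survives the
    # extra shell layer that `script -c` introduces.
    return ["script", "-qefc", shlex.join(argv), "/dev/null"]


# Illustrative abra invocation; verify the subcommand on the installed abra.
cmd = tty_wrap(["abra", "app", "backup", "create", "n8n.example.test"])
# The resulting list can be passed to subprocess.run(cmd, check=True).
```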

**Adaptation note (important):** recipe-maintainer tests run against a *persistent* instance
(`cctest.autonomic.zone`) using `context_reset.py` to free memory. cc-ci runs **ephemeral per-PR
deploys**. So port the **assertions and setup logic**, but drive lifecycle through Phase-1's harness
(per-run isolated app `<recipe>-pr<n>-<sha>`, guaranteed teardown) — not the persistent-instance model.


---

## 4. The per-recipe test contract

### 4.1 Required structure (per recipe)
```
tests/<recipe>/
├── recipe.toml              # ported deps + [sso] (from recipe-maintainer's recipe.toml)
├── PARITY.md                # mapping: recipe-maintainer test -> cc-ci test (for P2)
├── test_install.py          # Phase-1 install stage hook + recipe health/readiness
├── test_upgrade.py          # Phase-1 upgrade stage hook (prev published -> PR version)
├── test_backup.py           # Phase-1 backup stage: seed -> backup -> wipe -> restore -> assert data (P4)
├── functional/              # ported parity tests + NEW recipe-specific tests (P2, P3)
│   ├── health_check.py      # ported
│   ├── <ported-tests>.py    # one per recipe-maintainer test
│   └── <recipe>_<behavior>.py   # NEW, >=2 recipe-specific functional tests
└── playwright/              # browser flows where the app's core UX is a UI (P6)
```

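
A layout check along these lines could back the contract. It is a sketch under assumptions: which entries are strictly mandatory is not spelled out here, so the `REQUIRED_FILES` and `REQUIRED_DIRS` lists (and the helper name `missing_contract_entries`) are hypothetical, and `playwright/` is left out because it only applies to UI-centred recipes.

```python
import tempfile
from pathlib import Path

# Assumed mandatory entries, taken from the tree above.
REQUIRED_FILES = ["recipe.toml", "PARITY.md", "test_install.py",
                  "test_upgrade.py", "test_backup.py"]
REQUIRED_DIRS = ["functional"]


def missing_contract_entries(recipe_dir):
    """Return required files/dirs that are absent under tests/<recipe>/."""
    root = Path(recipe_dir)
    missing = [f for f in REQUIRED_FILES if not (root / f).is_file()]
    missing += [d + "/" for d in REQUIRED_DIRS if not (root / d).is_dir()]
    return missing


# Demonstration on a throwaway directory that contains only recipe.toml.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "recipe.toml").write_text("")
    missing = missing_contract_entries(tmp)
```

Run against a real `tests/<recipe>/` directory, an empty result means the skeleton is in place; it says nothing about whether the tests inside are meaningful.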
### 4.2 Shared harness additions (port from `utils/tests/helpers.py`)
Add to `runner/harness/` (reused across recipes), so per-recipe tests stay small:
- HTTP convergence helpers (`retry_http_get`, `assert_converges`, `wait_for_http`).
- **SSO setup + OIDC-flow helpers**: deploy the provider (keycloak/authentik), run the recipe's
  `setup_<provider>_integration.py` (realm/client/test-user), persist credentials per-run, and a
  reusable "full OIDC login → token → protected API call" assertion.
- **Dependency resolver**: read `tests/<recipe>/recipe.toml` `requires`/`test_requires`, deploy deps
  before the recipe under test, tear them down with it. Mind the `MAX_TESTS`/node budget — a recipe +
  its SSO provider is ≥2 live apps; sequence heavy ones.
- **Backup data-integrity helper**: seed a recipe-defined marker, snapshot, wipe volumes, restore,
  re-read the marker (P4).
- TTY-wrapped abra wrappers + robust readiness waits (no bare `sleep`).

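
As an illustration of a readiness wait that replaces a bare `sleep`, here is a minimal convergence helper. It borrows the name `assert_converges` from recipe-maintainer's `helpers.py`, but the signature and behavior shown are assumptions, not a copy of that helper. The clock and sleep functions are injectable so the polling loop can be exercised without real delays.

```python
import time


def assert_converges(check, timeout=60.0, interval=2.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns truthy; raise AssertionError on timeout."""
    deadline = clock() + timeout
    last_error = None
    while True:
        try:
            result = check()
            if result:
                return result
        except Exception as exc:  # an error means "not ready yet"
            last_error = exc
        if clock() >= deadline:
            raise AssertionError(
                f"did not converge within {timeout}s: {last_error!r}")
        sleep(interval)


# Demonstration with a fake check that becomes ready on the third poll.
attempts = []


def ready_on_third_call():
    attempts.append(1)
    return len(attempts) >= 3


assert_converges(ready_on_third_call, timeout=5, interval=0,
                 sleep=lambda _seconds: None)
```

In a real test, `check` would wrap an HTTP request or an app-state query, so the wait ends as soon as the app is ready and fails loudly, with the last error, when it never is.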
### 4.3 Recipe-specific functional tests (P3) — what "confirms how it works" means
Beyond health + parity, each recipe gets **≥2** tests exercising its characteristic behavior. Derive
them from the recipe's `test.md` and what the app *is for*. Representative targets:

- **keycloak** — create a realm+client via admin API; obtain a token (password & client-credentials
  grants); validate JWT claims.
- **matrix-synapse** — register two users (admin API); one sends a room message, the other reads it;
  media upload→download; `/_matrix/federation/v1/version` reachable.
- **immich** — upload an asset via API, list it back, confirm a thumbnail/derivative is generated.
- **lasuite-docs** — create a doc, edit via the API, confirm persistence; WOPI discovery (Collabora/
  OnlyOffice) XML valid (port `wopi_configured`).
- **lasuite-drive** — upload a file to a workspace, list/download it; MinIO bucket present.
- **lasuite-meet** — create a room, two users get LiveKit tokens & join (port `meeting_flow`);
  WebRTC connectivity probe (port `webrtc-media`).
- **cryptpad** — create a pad and confirm it persists (note client-side-encryption: page is
  JS-rendered, so use Playwright, not bare curl — see recipe-maintainer's note).
- **bluesky-pds** — create a test account (goat CLI), create a post via atproto, fetch it back,
  delete the account (port `goat_account`, extend with a post round-trip).
- **mumble** — connect a client/CLI and confirm channel presence (beyond TCP health).
- **n8n** — create a workflow via API, execute it, assert the result.
- **ghost / mattermost-lts / discourse / plausible / uptime-kuma / mailu** — create the app's primary
  object (post / message / topic / tracked event / monitor / mailbox) and read it back.
- **custom-html / hedgedoc / gitea** — serve/persist content: write content, fetch it back.

(For any recipe, "≥2 specific tests" = at minimum: create-an-object + read-it-back, and one more that
touches a distinctive feature — SSO, federation, media, WOPI, WebRTC, etc.)

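
For the keycloak target, "validate JWT claims" could start from a sketch like this one. It decodes the token payload with the standard library only and does **not** verify the signature; a real test should additionally verify the token against the provider's published keys. The helper names, the issuer URL, and the choice of claims to check (`iss`, `azp`, `iat`, `exp`) are assumptions for illustration, and the token below is hand-built rather than issued by a real provider.

```python
import base64
import json


def jwt_claims(token):
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def assert_expected_claims(claims, issuer, client_id):
    """Material assertions on token content, not just 'a token came back'."""
    assert claims["iss"] == issuer
    assert claims.get("azp") == client_id
    assert claims["exp"] > claims["iat"]


def _b64(obj):
    """Encode a dict as an unpadded base64url JSON segment (test fixture)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hand-built, unsigned token used only to exercise the helpers.
ISSUER = "https://keycloak.example.test/realms/ci"
fake_token = ".".join([
    _b64({"alg": "none"}),
    _b64({"iss": ISSUER, "azp": "docs", "iat": 1, "exp": 2}),
    "signature-placeholder",
])
claims = jwt_claims(fake_token)
assert_expected_claims(claims, ISSUER, "docs")
```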

---

## 5. Phase-2 recipe set + enrollment

**Target set** (recipe-maintainer's maintained list — `maintained-recipes`): cryptpad, lasuite-drive,
lasuite-docs, lasuite-meet, matrix-synapse, keycloak, bluesky-pds, mumble, immich, mailu, custom-html,
mattermost-lts, n8n, ghost, drone, plausible, uptime-kuma, discourse (+ authentik as an SSO provider).

Several already live on the mirror (`recipe-maintainers/<recipe>`); the rest are brought over via the
**Phase-1 recipe mirror+PR flow** (`abra recipe fetch` → recipe-create-pr procedure;
see Phase-1 `plan.md` §4.1). Enroll **easy → hard**, dependency-tiered like recipe-maintainer
(`recipe.toml` tiers): SSO providers (keycloak, authentik) and DBs first, then their dependents.

Order guidance: prove the Phase-2 pattern on a simple recipe (custom-html) and a DB recipe (n8n),
then keycloak/authentik (providers), then the SSO-dependent suite (lasuite-*, immich, cryptpad), then
the heavy/standalone ones (matrix-synapse, mumble, bluesky-pds, ghost, discourse, mailu,
mattermost-lts, plausible, uptime-kuma, drone). Respect `MAX_TESTS` and run heavy deploys sequentially.

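
Dependency-tiered ordering can be derived mechanically from declared deps. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the `REQUIRES` map is a hypothetical excerpt written inline, whereas in practice it would be read from each recipe's declared dependencies. Recipes in the same tier have no ordering constraint between them, so they are simply sorted by name.

```python
from graphlib import TopologicalSorter

# Hypothetical excerpt: recipe -> recipes it requires.
REQUIRES = {
    "custom-html": [],
    "n8n": [],
    "keycloak": [],
    "lasuite-docs": ["keycloak"],
    "cryptpad": ["keycloak"],
}


def enrollment_tiers(requires):
    """Group recipes into tiers; every tier depends only on earlier tiers."""
    sorter = TopologicalSorter(requires)
    sorter.prepare()  # raises graphlib.CycleError on circular deps
    tiers = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        tiers.append(ready)
        sorter.done(*ready)
    return tiers


tiers = enrollment_tiers(REQUIRES)
```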

---

## 6. Milestones (each ends with an Adversary gate)

- **Q0 — Harness additions.** Port `helpers.py` capabilities into `runner/harness/` (HTTP/convergence,
  OIDC-flow, dependency resolver, backup data-integrity, TTY abra). *Accept:* a reference recipe
  (custom-html) uses them for a full parity+specific suite, green via `!testme`.
- **Q1 — Pattern proof (2 recipes).** custom-html (simple) + n8n (single-DB): full parity port +
  ≥2 specific tests + real backup data-integrity. *Accept:* both green; `PARITY.md` complete.
- **Q2 — SSO providers.** keycloak + authentik: parity + specific tests; the reusable SSO-setup/
  OIDC-flow harness works end-to-end. *Accept:* a dependent recipe can deploy a provider and run an
  OIDC login test in one run.
- **Q3 — SSO-dependent suite.** lasuite-docs, lasuite-drive, lasuite-meet, cryptpad, immich: deps
  auto-deployed, SSO setup automated, parity + specific (WOPI, meeting/WebRTC, media, pad). *Accept:*
  each green with deps, within node budget.
- **Q4 — Remaining recipes.** matrix-synapse, mumble, bluesky-pds, ghost, mattermost-lts, discourse,
  plausible, uptime-kuma, mailu, drone. *Accept:* each green, parity + specific.
- **Q5 — Completeness + docs.** Every recipe in §5 covered; `PARITY.md` for all; `enroll-recipe.md`
  documents the test contract with a worked example. *Accept:* Adversary re-runs a sampled subset
  cold and confirms parity tables + specific tests are real (not health-only, not skipped); flip
  Phase-2 `STATUS.md` to `## DONE`.


---

## 7. Loop protocol & guardrails (inherit from Phase 1)

Same as `plan.md` §6/§6.1/§7/§9. Phase-2-specific emphases:
- **Never weaken a test to go green** (P7) — the single most important rule; the Adversary watches
  for health-only stand-ins, `skip`/`xfail`, or assertions that don't actually check app state.
- **Real data-integrity for backups** (P4) — "service is up after restore" is *not* sufficient; the
  seeded data must be proven to survive.

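
The difference between "service is up" and real data integrity is easiest to see in code. The sketch below runs the seed → backup → wipe → restore → assert sequence against `FakeApp`, an in-memory stand-in invented for this example; it is not the harness API, and a real helper would drive the deployed app and its backup tooling instead. The unique marker ensures a stale value from an earlier run cannot satisfy the final assertion.

```python
import uuid


class FakeApp:
    """In-memory stand-in for a deployed recipe; NOT the real harness API."""

    def __init__(self):
        self.data = {}
        self.snapshot = None

    def write(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)

    def backup(self):
        self.snapshot = dict(self.data)

    def wipe(self):
        self.data = {}

    def restore(self):
        self.data = dict(self.snapshot)

    def is_up(self):
        return True


def check_backup_roundtrip(app):
    """Seed a unique marker and prove it survives backup, wipe, and restore."""
    marker = f"ccci-marker-{uuid.uuid4().hex}"
    app.write("marker", marker)
    app.backup()
    app.wipe()
    assert app.read("marker") is None, "wipe did not remove the seeded data"
    app.restore()
    assert app.is_up()  # necessary, but not sufficient on its own
    assert app.read("marker") == marker, "seeded data did not survive restore"
    return marker


marker = check_backup_roundtrip(FakeApp())
```

An app whose restore silently does nothing still passes the `is_up()` check; only the marker assertion catches it, which is the point of P4.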
### 7.1 Adversary mandate (Phase 2) — no skipped tests, no corners cut
The default assumption is that **everything meaningful about an app is testable with enough effort —
the job is to write a *good* test, not to declare it impossible.** OIDC/SSO login, token issuance and
JWT validation, federation endpoints, media upload/download, WOPI discovery, WebRTC ICE/connectivity,
backup data survival — these are all testable end-to-end and **must** be tested, not stubbed. The
Adversary actively enforces this and **reads the test bodies, not just pass/fail**:
- **Reject** any test that is `skip`/`xfail`/commented-out, mocked, mutated to a `health_check`
  stand-in, asserts nothing material (e.g. only `status==200`), or is hard-coded to pass.
- **Reject "we couldn't test X"** unless it is a genuine *environment-level* limitation (e.g. the
  test host cannot receive inbound UDP, so the full lasuite-meet *media relay* path can't complete) —
  and even then demand the **maximal testable subset** (e.g. signaling, token issuance, ICE candidate
  gathering) plus a `DECISIONS.md` justification with the specific technical blocker. "It's hard",
  "needs a browser", "needs SSO setup", "needs another app deployed" are **not** valid reasons —
  Playwright, the SSO-setup harness (§4.2), and the dependency resolver exist precisely to remove
  those excuses.
- **Verify parity for real (P2):** for each `PARITY.md` row, confirm the cc-ci test checks the *same
  thing* the recipe-maintainer original did — not a hollow rename.
- **Re-run cold and inspect:** the Adversary re-runs a sampled recipe's suite from a clean state and
  reads the diffs/assertions; a green run with empty assertions is a FAIL and a `[adversary]` finding.
- **Respect Phase-1 resource caps** — deps multiply live apps per run; keep within `MAX_TESTS`/node
  budget; sequence heavy recipes; teardown (incl. deps) is guaranteed.
- **Tests are recipe-versioned** — they run against the PR's recipe version; don't hardcode values
  that break across upstream versions (read versions/endpoints dynamically, as helpers.py does).
- **Cite the source** — each ported test notes its `recipe-info/<recipe>/tests/<file>` origin in
  `PARITY.md` so parity is auditable.


---

## 8. Open decisions (log in DECISIONS.md)
- Whether to **vendor** the ported helpers into `runner/harness/` vs import recipe-maintainer's
  `helpers.py` directly. (Default: vendor/adapt — cc-ci must be self-contained and not depend on the
  recipe-maintainer workspace at runtime.)
- Per-recipe **secret/SSO credential** persistence within a run (reuse Phase-1 §4.4-B run-scoped
  store; SSO test users/clients are class-B, generated per run, destroyed at teardown).
- How many recipe-specific tests beyond the **≥2** floor per recipe (scale with the app's surface;
  don't gold-plate trivial recipes).
- A test deemed "impossible" is the **rare exception**, not a convenient out (§7.1). It is only
  acceptable for a true environment-level blocker (e.g. no inbound UDP for lasuite-meet's *media
  relay*), requires the **maximal testable subset** still implemented, a specific technical reason in
  `DECISIONS.md`, **and Adversary sign-off**. SSO/OIDC, browser flows, multi-app dependencies, and
  data-integrity are explicitly **not** exceptions — they are testable and required.