cc-ci Phase 2 — Comprehensive per-recipe test authoring (Autonomous Build Plan)
Status: QUEUED. Starts after Phase 1b (plan-phase1b-review-lint.md), Phase 1d
(plan-phase1d-generic-test-suite.md) and Phase 1e (plan-phase1e-harness-corrections.md, see §0a) reach ## DONE.
Builds on: the Phase-1 cc-ci CI server (plan.md) and Phase 1d's generic test suite, under which every
recipe already gets generic install/upgrade/backup/restore for free, reusing one shared deployment.
So this phase is not "write every test from scratch". It means authoring the recipe overlays
(test_<op>.py files that override or extend the generic, per 1d's model), defining custom install steps
for recipes where the generic install fails, and porting the recipe-maintainer corpus as overlays. This phase
adds test content, not infra.
Reference corpus: references/recipe-maintainer/ → /srv/recipe-maintainer/ (the existing,
human-maintained recipe tests — the canonical source to port from).
Owner agents: same Builder + Adversary loops + coordination protocol as Phase 1 (plan.md §6/§7).
This file's path: /srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md
0c. OIDC / SSO-dep recipes — follow the SSO-dep plan (NOT operator-pending)
Recipes that authenticate via OIDC (lasuite-docs, cryptpad, lasuite-drive, lasuite-meet, and future ones)
do not need operator input to wire OIDC. The canonical pattern lives in
plan-sso-dep-testing.md: the recipe declares DEPS = ["keycloak"] in recipe_meta.py, the orchestrator
co-deploys the dep, and the recipe's install_steps.sh reads $CCCI_DEPS_FILE, writes the OIDC env vars
and injects the client secret via abra. Authenticated tests use
harness.sso.oidc_password_grant or Playwright on the dep's login page. DEFERRED.md entries that cite "operator input needed for OIDC" are mis-categorised: re-open them and execute them per this plan.
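As a rough illustration of the password-grant path, the sketch below builds (without sending) an OAuth2 resource-owner password request against a Keycloak realm. It assumes Keycloak's per-realm token endpoint <base>/realms/<realm>/protocol/openid-connect/token (older WildFly-based releases prefixed the path with /auth). The function name and every argument value are hypothetical; this is not the actual harness.sso API.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_password_grant_request(base_url: str, realm: str, client_id: str,
                                 client_secret: str, username: str,
                                 password: str) -> Request:
    """Build (but do not send) an OAuth2 resource-owner password grant request.

    Hypothetical helper; cc-ci's real harness.sso API may differ.
    """
    # Keycloak serves one OIDC token endpoint per realm at this path.
    token_url = (f"{base_url.rstrip('/')}/realms/{realm}"
                 "/protocol/openid-connect/token")
    form = {
        "grant_type": "password",
        "client_id": client_id,
        "client_secret": client_secret,
        "username": username,
        "password": password,
        "scope": "openid",  # request an ID token alongside the access token
    }
    return Request(
        token_url,
        data=urlencode(form).encode(),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending it with urllib.request.urlopen(req) should return a JSON body containing an access_token, provided the client allows this grant (Keycloak labels it "Direct Access Grants"), which the SSO-dep plan's client setup would presumably need to enable.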
0b. Auto-mirror missing recipes (NOT a blocker — autonomous loop work)
A recipe that is not yet on git.autonomic.zone/recipe-maintainers/<recipe> is not an operator-pending
blocker. The bot is admin on recipe-maintainers (see memory cc-ci-gitea-recipes) and can create
private mirror repos. For any recipe you want to enroll or test that isn't mirrored yet, mirror it
yourself before enrolling, following the recipe-create-pr skill:
/srv/recipe-maintainer/.opencode/skills/recipe-create-pr/SKILL.md (which references
/srv/recipe-maintainer/.claude/commands/recipe-create-pr.md for the full procedure).
The flow (adapt the skill's command for the new-mirror case):
- Create the private mirror repo on git.autonomic.zone/recipe-maintainers/<recipe> (Gitea API POST /orgs/recipe-maintainers/repos, bot creds from .testenv/§1.5).
- Mirror the upstream git.coopcloud.tech/coop-cloud/<recipe> (clone --mirror → push, including tags) so the mirror's main is upstream-synced and tags carry over.
- Then proceed with normal enrollment + the lifecycle suite (1d generic + your overlays from this phase).
Treat this as standard loop work — don't sit idle waiting on the operator for missing recipes.
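A minimal sketch of the two mirroring steps, assuming the bot credential is a Gitea API token sent as "Authorization: token <token>" and that git push authentication is handled by a credential helper. POST /api/v1/orgs/{org}/repos is Gitea's create-repository-in-organization endpoint; the helper names and description text are hypothetical. It pushes branches and tags explicitly rather than using push --mirror, so host-specific refs picked up by clone --mirror are not pushed.

```python
import json
from urllib.request import Request

GITEA = "https://git.autonomic.zone"
UPSTREAM = "https://git.coopcloud.tech/coop-cloud"
ORG = "recipe-maintainers"


def create_mirror_repo_request(recipe: str, token: str) -> Request:
    """Build the Gitea API call that creates a private repo in the org."""
    body = {
        "name": recipe,
        "private": True,
        "description": f"Mirror of {UPSTREAM}/{recipe}",  # hypothetical text
    }
    return Request(
        f"{GITEA}/api/v1/orgs/{ORG}/repos",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            # Assumption: the bot credential is a Gitea API token.
            "Authorization": f"token {token}",
        },
        method="POST",
    )


def mirror_commands(recipe: str, workdir: str) -> list[list[str]]:
    """git commands that copy upstream branches and tags into the new repo."""
    src = f"{UPSTREAM}/{recipe}.git"
    dst = f"{GITEA}/{ORG}/{recipe}.git"
    bare = f"{workdir}/{recipe}.git"
    return [
        ["git", "clone", "--mirror", src, bare],
        # All branches first (so the mirror's main matches upstream), then all tags.
        ["git", "-C", bare, "push", "--all", dst],
        ["git", "-C", bare, "push", "--tags", dst],
    ]
```

Each command list can then be run with subprocess.run(cmd, check=True) once the create request has returned 201 Created.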
0a. Prerequisite: Phase 1e harness corrections (must be DONE first)
The 1d/1b operator review produced three shared-harness corrections, now split out as their own phase,
plan-phase1e-harness-corrections.md (which runs before this phase). Do not author overlays
until 1e is ## DONE, because it changes the foundation every overlay sits on: (HC1) upgrade goes
prev-release → PR head via deploy --chaos; (HC2) repo-local PR code runs only for approved
recipes (default: cc-ci-overlays + generic only); (HC3) the generic runs by default alongside an
overlay and is skipped only via an explicit opt-out. See that plan for detail.
0. Relationship to Phase 1 (read first)
Phase 1 built the machine: the Drone pipeline, the !testme trigger (polling-primary), the
coop-cloud/traefik proxy, the shared harness (runner/run_recipe_ci.py + tests/conftest.py +
runner/harness/), the three stages (install / upgrade / backup-restore), guaranteed teardown,
the MAX_TESTS concurrency cap, and the results dashboard — proven on ~six recipes (D10).
Phase 2 fills the machine with good tests for every Co-op Cloud app we maintain. It does not
re-architect the runner. It reuses Phase-1's harness, stages, trigger, resource caps, and teardown.
Everything here is tests/<recipe>/... content + small, shared harness additions (ported from
recipe-maintainer's helpers). When reality forces a harness change, record it in DECISIONS.md.
Do not start Phase 2 until Phase-1 STATUS.md shows ## DONE (Adversary-verified). The same loop
protocol, single-writer file ownership, and gate handshake (plan.md §6.1/§7) apply here.
1. Mission
For every maintained Co-op Cloud recipe, the cc-ci repo's tests/<recipe>/ tree must contain a
genuine end-to-end test suite that, run by !testme on a real PR, proves the app actually works —
at parity with what recipe-maintainer already tests, plus recipe-specific functional depth.
Concretely, for each recipe:
- Port every existing recipe-maintainer test for that recipe (a comparable cc-ci test for each recipe-info/<recipe>/tests/*.py).
- Add ≥2 new recipe-specific functional tests that exercise something characteristic of that app (not just "it returns 200"); see §4.3.
- All of it runs inside Phase-1's three stages and passes green via !testme.
2. Definition of Done (Phase 2 exit condition)
The loop terminates only when every item holds and the Adversary has independently re-verified
each within 24h (logged in REVIEW.md):
- P1 — Coverage. Every recipe in the Phase-2 recipe set (§5) has a tests/<recipe>/ suite enrolled and a full green !testme run (install + upgrade + backup-restore).
- P2 — Parity port. For each recipe, every test under recipe-info/<recipe>/tests/*.py has a comparable cc-ci test (same thing verified), adapted to the cc-ci harness. A mapping table (recipe-maintainer test → cc-ci test) is recorded in tests/<recipe>/PARITY.md. Any test deliberately not ported is a documented DECISIONS.md finding with the reason (e.g. obsolete, replaced by a better check), never a silent omission.
- P3 — Recipe-specific depth. Each recipe has ≥2 new functional tests beyond parity that confirm characteristic behavior (§4.3), with real assertions on app state/responses, not health-only.
- P4 — Backup data-integrity is real. The backup-restore stage for each recipe seeds identifiable data, mutates/wipes, restores, and asserts the seeded data survived (recipe-aware, not just "service is up") — ported from recipe-maintainer's backup approach.
- P5 — Dependencies handled. Recipes needing other apps (SSO providers, DBs) declare deps (ported from recipe.toml requires/test_requires); the harness deploys deps within the run (respecting the MAX_TESTS/node budget) and SSO setup runs automatically (§4.2).
- P6 — Browser flows where they matter (D3). Recipes whose core function is a UI flow have a Playwright test of that flow (login, create-an-object, etc.), not just API checks.
- P7 — No weakened tests, no corners cut (§7.1). Every assertion is real and checks app state; nothing is skip/xfail'd, mocked, or reduced to a health-only stand-in to go green. The bar: anything meaningful is testable with effort (OIDC/SSO, federation, media, WOPI, WebRTC connectivity, data survival all included). Any "untestable" claim is the rare exception: a true environment-level blocker only, with the maximal subset still implemented and Adversary sign-off (§8); "needs a browser / SSO / another app" is not a valid excuse.
- P8 — Docs. docs/enroll-recipe.md is updated with the per-recipe test contract (§4.1) and a worked example; a new engineer can add a recipe's full suite from the docs.
When all P1–P8 hold and are Adversary-verified, write ## DONE to Phase-2 STATUS.md.
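Part of the P2 check can be mechanised. The sketch assumes PARITY.md is a two-column Markdown table whose first cell is the recipe-maintainer file name (the plan does not fix the format) and that the deliberately unported tests are supplied by hand from DECISIONS.md; both function names are hypothetical. It only finds missing or empty rows; judging whether a mapped test really checks the same thing stays with the Adversary.

```python
def parity_rows(parity_md: str) -> dict[str, str]:
    """Map recipe-maintainer test file -> cc-ci test, from a Markdown table.

    Assumes rows like: | health_check.py | functional/health_check.py |
    """
    rows = {}
    for line in parity_md.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skips the header row, the |---| separator and any prose lines.
        if len(cells) < 2 or not cells[0].endswith(".py"):
            continue
        rows[cells[0]] = cells[1]
    return rows


def unmapped_tests(original_names: list[str], parity_md: str,
                   not_ported: set[str]) -> list[str]:
    """Originals with no (or an empty) PARITY.md row and no DECISIONS.md entry."""
    mapped = parity_rows(parity_md)
    return [name for name in sorted(original_names)
            if not mapped.get(name) and name not in not_ported]
```

The original names could come from, for example, [p.name for p in Path("/srv/recipe-maintainer/recipe-info/n8n/tests").glob("*.py")].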
3. Reference corpus (port from here)
references/recipe-maintainer/ (/srv/recipe-maintainer/) is the source of truth for what to
test and how recipe-maintainer already does it. Key paths:
- Per-recipe tests: recipe-info/<recipe>/tests/*.py, the scripts to port (health_check, oidc_login/oidc_integration, and recipe-specific ones like meeting_flow.py, webrtc-media.py, wopi_configured.py, goat_account.py, upload_conversion.py).
- Per-recipe metadata: recipe-info/<recipe>/recipe.toml (deps + [sso] provider/setup_script), test.md (target URLs, what each test checks, manual steps, network/health specifics), setup.md, upstream.md, setup_<provider>_integration.py (SSO realm/client/test-user setup).
- Shared utilities to port/adapt: utils/tests/helpers.py (http_get/http_post, retry_http_get, assert_converges, wait_for_http, load_toml_credentials, abra(... tty_wrap), fresh_app, deploy_and_wait). These map onto the Phase-1 harness fixtures.
- Orchestration logic to mirror as stages: .claude/commands/recipe-test{,-all,-new,-update,-backup}.md (new-install / upgrade / backup-restore semantics; these are already Phase-1 stages, so port the assertions).
- Operational gotchas to bake in (learnings.md): TTY-wrap (script -qefc "abra …" /dev/null) for backup/restore/volume/secret/run/logs/lint; always --chaos; backup-bot-two must be present for backup tests; prefer docker service logs over abra app logs (which hangs non-interactively); ghost-container volume-removal cleanup. Most are already in Phase-1 §4.3; re-verify them on the installed abra.
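The TTY-wrap gotcha can be captured once in the harness. A minimal sketch, assuming util-linux script(1) is installed on the runner (-q quiet, -e propagate the child's exit code, -f flush output, -c run a command); the function names are hypothetical. shlex.join quotes each argument, so values containing spaces survive being folded into script's single command string.

```python
import shlex
import subprocess


def tty_wrapped(abra_args: list[str]) -> list[str]:
    """Equivalent of: script -qefc "abra <args>" /dev/null"""
    inner = shlex.join(["abra", *abra_args])  # quote each argument safely
    return ["script", "-qefc", inner, "/dev/null"]


def run_abra_tty(abra_args: list[str],
                 timeout: float = 600.0) -> subprocess.CompletedProcess:
    """Run abra under a pseudo-terminal; raise if abra exits non-zero."""
    # script -e passes abra's exit status through, so check=True sees failures.
    return subprocess.run(tty_wrapped(abra_args), capture_output=True,
                          text=True, timeout=timeout, check=True)
```

Because the output passes through a pseudo-terminal, captured stdout will typically carry \r\n line endings, so parse it with that in mind.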
Adaptation note (important): recipe-maintainer tests run against a persistent instance
(cctest.autonomic.zone) using context_reset.py to free memory. cc-ci runs ephemeral per-PR
deploys. So port the assertions and setup logic, but drive lifecycle through Phase-1's harness
(per-run isolated app <recipe>-pr<n>-<sha>, guaranteed teardown) — not the persistent-instance model.
4. The per-recipe test contract
4.1 Required structure (per recipe)
tests/<recipe>/
├── recipe.toml # ported deps + [sso] (from recipe-maintainer's recipe.toml)
├── PARITY.md # mapping: recipe-maintainer test -> cc-ci test (for P2)
├── test_install.py # Phase-1 install stage hook + recipe health/readiness
├── test_upgrade.py # Phase-1 upgrade stage hook (prev published -> PR version)
├── test_backup.py # Phase-1 backup stage: seed -> backup -> wipe -> restore -> assert data (P4)
├── functional/ # ported parity tests + NEW recipe-specific tests (P2, P3)
│ ├── health_check.py # ported
│ ├── <ported-tests>.py # one per recipe-maintainer test
│ └── <recipe>_<behavior>.py # NEW, >=2 recipe-specific functional tests
└── playwright/ # browser flows where the app's core UX is a UI (P6)
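A quick structural check for the layout above, as a sketch: it assumes the file names exactly as listed and treats playwright/ as optional, since P6 only applies to UI-centric recipes. The function name is hypothetical, and passing it says nothing about whether the tests themselves are real (§7.1).

```python
from pathlib import Path

REQUIRED_FILES = ("recipe.toml", "PARITY.md", "test_install.py",
                  "test_upgrade.py", "test_backup.py")


def missing_contract_parts(recipe_dir: Path) -> list[str]:
    """List the parts of the per-recipe layout that recipe_dir lacks."""
    missing = [f for f in REQUIRED_FILES if not (recipe_dir / f).is_file()]
    functional = recipe_dir / "functional"
    if not (functional / "health_check.py").is_file():
        missing.append("functional/health_check.py")
    # P3 floor: at least two new <recipe>_<behavior>.py tests.
    specific = (list(functional.glob(f"{recipe_dir.name}_*.py"))
                if functional.is_dir() else [])
    if len(specific) < 2:
        missing.append(f"functional/{recipe_dir.name}_<behavior>.py "
                       f"(need >=2, found {len(specific)})")
    return missing
```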
4.2 Shared harness additions (port from utils/tests/helpers.py)
Add to runner/harness/ (reused across recipes), so per-recipe tests stay small:
- HTTP convergence helpers (retry_http_get, assert_converges, wait_for_http).
- SSO setup + OIDC-flow helpers: deploy the provider (keycloak/authentik), run the recipe's setup_<provider>_integration.py (realm/client/test-user), persist credentials per-run, and provide a reusable "full OIDC login → token → protected API call" assertion.
- Dependency resolver: read tests/<recipe>/recipe.toml requires/test_requires, deploy deps before the recipe under test, and tear them down with it. Mind the MAX_TESTS/node budget: a recipe + its SSO provider is ≥2 live apps, so sequence heavy ones.
- Backup data-integrity helper: seed a recipe-defined marker, snapshot, wipe volumes, restore, re-read the marker (P4).
- TTY-wrapped abra wrappers + robust readiness waits (no bare sleep).
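A readiness wait in the spirit of assert_converges, sketched under the assumption that the probe is any zero-argument callable returning True once ready; the signature is hypothetical and may not match recipe-maintainer's helpers.py. It polls until a deadline and fails with the last error seen, instead of sleeping for a fixed time.

```python
import time
from typing import Callable, Optional


def assert_converges(probe: Callable[[], bool], timeout: float = 120.0,
                     interval: float = 2.0, what: str = "condition") -> None:
    """Poll probe() until it returns True; fail once the deadline passes."""
    deadline = time.monotonic() + timeout
    last_error: Optional[BaseException] = None
    while time.monotonic() < deadline:
        try:
            if probe():
                return
        except Exception as exc:  # e.g. connection refused while the app boots
            last_error = exc
        time.sleep(interval)
    raise AssertionError(
        f"{what} did not converge within {timeout}s (last error: {last_error!r})")
```

wait_for_http could then be a thin wrapper whose probe calls urllib.request.urlopen(url, timeout=5) and checks for a 200 status.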
4.3 Recipe-specific functional tests (P3) — what "confirms how it works" means
Beyond health + parity, each recipe gets ≥2 tests exercising its characteristic behavior. Derive
them from the recipe's test.md and what the app is for. Representative targets:
- keycloak — create a realm+client via admin API; obtain a token (password & client-credentials grants); validate JWT claims.
- matrix-synapse — register two users (admin API); one sends a room message, the other reads it; media upload→download; /_matrix/federation/v1/version reachable.
- immich — upload an asset via API, list it back, confirm a thumbnail/derivative is generated.
- lasuite-docs — create a doc, edit via the API, confirm persistence; WOPI discovery (Collabora/OnlyOffice) XML valid (port wopi_configured).
- lasuite-drive — upload a file to a workspace, list/download it; MinIO bucket present.
- lasuite-meet — create a room, two users get LiveKit tokens & join (port meeting_flow); WebRTC connectivity probe (port webrtc-media).
- cryptpad — create a pad and confirm it persists (cryptpad uses client-side encryption and the page is JS-rendered, so use Playwright, not bare curl; see recipe-maintainer's note).
- bluesky-pds — create a test account (goat CLI), create a post via atproto, fetch it back, delete the account (port goat_account, extend with a post round-trip).
- mumble — connect a client/CLI and confirm channel presence (beyond TCP health).
- n8n — create a workflow via API, execute it, assert the result.
- ghost / mattermost-lts / discourse / plausible / uptime-kuma / mailu — create the app's primary object (post / message / topic / tracked event / monitor / mailbox) and read it back.
- custom-html / hedgedoc / gitea — serve/persist content: write content, fetch it back.
(For any recipe, "≥2 specific tests" = at minimum: create-an-object + read-it-back, and one more that touches a distinctive feature — SSO, federation, media, WOPI, WebRTC, etc.)
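For the keycloak item (validate JWT claims), a minimal sketch of decoding a token and checking its claims. It deliberately does not verify the signature (a complete test would also fetch the realm's JWKS and verify it). It assumes Keycloak's issuer format <base>/realms/<realm> and checks the azp (authorized party) claim against the client ID, because the aud claim on Keycloak access tokens depends on the configured client scopes and mappers. Function names are hypothetical.

```python
import base64
import json
import time


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload. Does NOT verify the signature."""
    payload = token.split(".")[1]
    # JWTs use unpadded base64url; restore padding before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def check_access_token(token: str, issuer: str, client_id: str) -> dict:
    """Assert the claims a Keycloak-issued access token should carry."""
    claims = jwt_claims(token)
    assert claims["iss"] == issuer, f"unexpected issuer {claims['iss']!r}"
    assert claims.get("azp") == client_id, f"azp is {claims.get('azp')!r}"
    assert claims["exp"] > time.time(), "token is already expired"
    return claims
```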
5. Phase-2 recipe set + enrollment
Target set (recipe-maintainer's maintained list — maintained-recipes): cryptpad, lasuite-drive,
lasuite-docs, lasuite-meet, matrix-synapse, keycloak, bluesky-pds, mumble, immich, mailu, custom-html,
mattermost-lts, n8n, ghost, drone, plausible, uptime-kuma, discourse (+ authentik as an SSO provider).
Several already live on the mirror (recipe-maintainers/<recipe>); the rest are brought over via the
Phase-1 recipe mirror+PR flow (abra recipe fetch → recipe-create-pr procedure;
see Phase-1 plan.md §4.1). Enroll easy → hard, dependency-tiered like recipe-maintainer
(recipe.toml tiers): SSO providers (keycloak, authentik) and DBs first, then their dependents.
Order guidance: prove the Phase-2 pattern on a simple recipe (custom-html) and a DB recipe (n8n),
then keycloak/authentik (providers), then the SSO-dependent suite (lasuite-*, immich, cryptpad), then
the heavy/standalone ones (matrix-synapse, mumble, bluesky-pds, ghost, discourse, mailu, mattermost-lts,
plausible, uptime-kuma, drone). Respect MAX_TESTS and run heavy deploys sequentially.
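Dependency tiering reduces to a topological sort. The sketch assumes each recipe's requires and test_requires have already been parsed from recipe.toml into a plain dict; graphlib (Python 3.9+) orders dependencies before dependents and raises CycleError on a cycle. Function names and the recipe names in the example are hypothetical.

```python
from graphlib import TopologicalSorter


def deploy_order(target: str, deps: dict[str, list[str]]) -> list[str]:
    """Order target and its transitive deps so every dep deploys first.

    deps maps recipe -> requires + test_requires (parsed from recipe.toml).
    """
    graph: dict[str, set[str]] = {}
    stack = [target]
    while stack:  # collect only the deps reachable from target
        recipe = stack.pop()
        if recipe in graph:
            continue
        graph[recipe] = set(deps.get(recipe, []))
        stack.extend(graph[recipe])
    # static_order() yields each node's predecessors (its deps) before it.
    return list(TopologicalSorter(graph).static_order())


def teardown_order(target: str, deps: dict[str, list[str]]) -> list[str]:
    """Tear down dependents before the deps they rely on."""
    return list(reversed(deploy_order(target, deps)))
```

The length of deploy_order(...) is also the number of live apps the run needs, which is the figure to compare against MAX_TESTS before scheduling it.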
6. Milestones (each ends with an Adversary gate)
- Q0 — Harness additions. Port helpers.py capabilities into runner/harness/ (HTTP/convergence, OIDC-flow, dependency resolver, backup data-integrity, TTY abra). Accept: a reference recipe (custom-html) uses them for a full parity+specific suite, green via !testme.
- Q1 — Pattern proof (2 recipes). custom-html (simple) + n8n (single-DB): full parity port + ≥2 specific tests + real backup data-integrity. Accept: both green; PARITY.md complete.
- Q2 — SSO providers. keycloak + authentik: parity + specific tests; the reusable SSO-setup/OIDC-flow harness works end-to-end. Accept: a dependent recipe can deploy a provider and run an OIDC login test in one run.
- Q3 — SSO-dependent suite. lasuite-docs, lasuite-drive, lasuite-meet, cryptpad, immich: deps auto-deployed, SSO setup automated, parity + specific (WOPI, meeting/WebRTC, media, pad). Accept: each green with deps, within node budget.
- Q4 — Remaining recipes. matrix-synapse, mumble, bluesky-pds, ghost, mattermost-lts, discourse, plausible, uptime-kuma, mailu, drone. Accept: each green, parity + specific.
- Q5 — Completeness + docs. Every recipe in §5 covered; PARITY.md for all; enroll-recipe.md documents the test contract with a worked example. Accept: Adversary re-runs a sampled subset cold and confirms parity tables + specific tests are real (not health-only, not skipped); flip Phase-2 STATUS.md to ## DONE.
7. Loop protocol & guardrails (inherit from Phase 1)
Same as plan.md §6/§6.1/§7/§9. Phase-2-specific emphases:
- Never weaken a test to go green (P7). This is the single most important rule; the Adversary watches for health-only stand-ins, skip/xfail, or assertions that don't actually check app state.
- Real data-integrity for backups (P4): "service is up after restore" is not sufficient; the seeded data must be proven to survive.
7.1 Adversary mandate (Phase 2) — no skipped tests, no corners cut
The default assumption is that everything meaningful about an app is testable with enough effort — the job is to write a good test, not to declare it impossible. OIDC/SSO login, token issuance and JWT validation, federation endpoints, media upload/download, WOPI discovery, WebRTC ICE/connectivity, backup data survival — these are all testable end-to-end and must be tested, not stubbed. The Adversary actively enforces this and reads the test bodies, not just pass/fail:
- Reject any test that is skip/xfail/commented-out, mocked, mutated into a health_check stand-in, asserts nothing material (e.g. only status==200), or is hard-coded to pass.
- Reject "we couldn't test X" unless it is a genuine environment-level limitation (e.g. the test host cannot receive inbound UDP, so the full lasuite-meet media relay path can't complete), and even then demand the maximal testable subset (e.g. signaling, token issuance, ICE candidate gathering) plus a DECISIONS.md justification with the specific technical blocker. "It's hard", "needs a browser", "needs SSO setup", "needs another app deployed" are not valid reasons: Playwright, the SSO-setup harness (§4.2), and the dependency resolver exist precisely to remove those excuses.
- Verify parity for real (P2): for each PARITY.md row, confirm the cc-ci test checks the same thing the recipe-maintainer original did, not a hollow rename.
- Re-run cold and inspect: the Adversary re-runs a sampled recipe's suite from a clean state and reads the diffs/assertions; a green run with empty assertions is a FAIL and an [adversary] finding.
- Respect Phase-1 resource caps: deps multiply live apps per run, so keep within the MAX_TESTS/node budget and sequence heavy recipes; teardown (incl. deps) is guaranteed.
- Tests are recipe-versioned: they run against the PR's recipe version, so don't hardcode values that break across upstream versions (read versions/endpoints dynamically, as helpers.py does).
- Cite the source: each ported test notes its recipe-info/<recipe>/tests/<file> origin in PARITY.md so parity is auditable.
8. Open decisions (log in DECISIONS.md)
- Whether to vendor the ported helpers into runner/harness/ vs import recipe-maintainer's helpers.py directly. (Default: vendor/adapt; cc-ci must be self-contained and not depend on the recipe-maintainer workspace at runtime.)
- Per-recipe secret/SSO credential persistence within a run (reuse Phase-1 §4.4-B run-scoped store; SSO test users/clients are class-B, generated per run, destroyed at teardown).
- How many recipe-specific tests beyond the ≥2 floor per recipe (scale with the app's surface; don't gold-plate trivial recipes).
- A test deemed "impossible" is the rare exception, not a convenient out (§7.1). It is only
acceptable for a true environment-level blocker (e.g. no inbound UDP for lasuite-meet's media
relay), requires the maximal testable subset still implemented, a specific technical reason in
DECISIONS.md, and Adversary sign-off. SSO/OIDC, browser flows, multi-app dependencies, and data-integrity are explicitly not exceptions — they are testable and required.