architecture.md: components, the !testme flow, network/TLS, resource safety, enrollment. runbook.md: where to look, common failure modes (timeout/rate-limit/auth/skip/health/data), orphan cleanup, re-trigger, cancel. Completes the D9 doc set (README+install+enroll+secrets+arch+runbook). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Architecture
cc-ci turns a !testme PR comment into a real end-to-end deploy + test of a Co-op Cloud recipe and
reports the result back. Everything on the cc-ci host is declared in this repo's NixOS flake.
## Components
| Component | Where | Role |
|---|---|---|
| comment-bridge | `bridge/bridge.py`, `modules/bridge.nix` (swarm service, `ci.commoninternet.net/hook`) | Polls enrolled repos for `!testme` (primary, read-only), plus an optional admin webhook; authorizes the commenter (org membership); triggers a parameterized Drone build; posts/edits the PR comment with the run link and the final pass/fail. |
| Drone server | `modules/drone.nix`: coop-cloud drone recipe via abra (`drone.ci.commoninternet.net`, Gitea SSO) | CI engine. Holds the recipe-ci (custom-event) and self-test (push) pipelines (`.drone.yml`). |
| Drone exec runner | `modules/drone-runner.nix`: host systemd service | Runs pipeline steps on the host so they can drive abra/Docker. `DRONE_RUNNER_CAPACITY=1` (MAX_TESTS) caps concurrent builds; the rest queue natively. |
| harness | `runner/run_recipe_ci.py`, `runner/harness/`, `tests/` | Orchestrates each run: fetch recipe at the PR head → install → upgrade → backup/restore → recipe-local (D4) → guaranteed teardown. pytest + Playwright via the Nix `cc-ci-run` env. |
| swarm + traefik | `modules/swarm.nix`, `modules/proxy.nix`: coop-cloud traefik recipe via abra | Single-node Docker Swarm + `proxy` overlay network; traefik terminates TLS with the pre-issued wildcard cert (file provider, no ACME). The real deploy target for recipes under test. |
| backup-bot-two | `modules/backupbot.nix` | restic-based volume/DB backups; `abra app backup`/`restore` drive it. |
| dashboard | `dashboard/dashboard.py`, `modules/dashboard.nix` (`ci.commoninternet.net`) | YunoHost-CI-like overview: latest run per recipe, status badges and run links; `/badge/<recipe>.svg`. |
| secrets | `modules/secrets.nix`, `secrets/secrets.yaml` (sops-nix) | Infra secrets, decrypted at activation using the host SSH key as the age identity. See secrets.md. |
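The comment-bridge's two trigger gates, plus the HMAC check it applies to optional webhook deliveries, can be sketched in a few lines of Python. This is a minimal illustration, not the code in `bridge/bridge.py`. The function names are hypothetical, the org-membership lookup is reduced to a plain set, surrounding whitespace is assumed to be ignored when matching `!testme`, and the signature format assumes a Gitea-style hex HMAC-SHA256 of the raw request body.

```python
import hashlib
import hmac

TRIGGER = "!testme"


def is_trigger(comment_body: str) -> bool:
    # Exact match; assumption: surrounding whitespace is tolerated,
    # but "!testme please" or a quoted "> !testme" does not fire.
    return comment_body.strip() == TRIGGER


def should_trigger(comment_body: str, commenter: str, org_members: set[str]) -> bool:
    # Both gates must pass: the literal trigger AND org membership
    # (hypothetical: membership is passed in as a pre-fetched set).
    return is_trigger(comment_body) and commenter in org_members


def verify_webhook(raw_body: bytes, signature_hex: str, secret: bytes) -> bool:
    # Assumes a Gitea-style signature: hex HMAC-SHA256 over the raw body.
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature_hex)
```

Keeping the match exact means a maintainer discussing the trigger in a comment does not accidentally start a deploy.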
All swarm infra (traefik, drone, bridge, dashboard, backupbot) is brought up by idempotent-reconcile
systemd oneshots that converge on every activation and boot (no run-once sentinels). A from-scratch
install is therefore `git clone` + `nixos-rebuild switch` + the operator preconditions (install.md).
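The idempotent-reconcile idea can be illustrated with a small planning function: compare what should be running with what is running, and act only on the difference. This is a hypothetical Python sketch. The real oneshots are systemd units declared in the Nix modules, and the stack names and version strings below are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    verb: str  # "deploy" or "skip"
    stack: str


def reconcile(desired: dict[str, str], running: dict[str, str]) -> list[Action]:
    """Plan actions that converge running stacks onto the desired versions.

    desired/running map stack name -> version (hypothetical model).
    Re-planning after the plan has been applied yields only "skip"
    actions, which is what makes it safe to run on every boot.
    """
    actions = []
    for stack, version in sorted(desired.items()):
        if running.get(stack) == version:
            actions.append(Action("skip", stack))
        else:
            # Missing or at the wrong version: (re)deploy it.
            actions.append(Action("deploy", stack))
    return actions
```

Because an already-converged host plans only `skip` actions, the same unit can run on every activation without a run-once sentinel.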
## The `!testme` flow

```
PR comment "!testme"
  │ (poll ≤30s, read-only; or optional admin webhook → /hook, HMAC-verified)
  ▼ comment-bridge: exact-match "!testme"? · commenter ∈ recipe-maintainers org? · resolve PR head
  ▼ Drone API: create build (event=custom, params RECIPE/REF/PR/SRC)
  ▼ recipe-ci pipeline (exec runner, on host): cc-ci-run runner/run_recipe_ci.py
  │   fetch recipe@PR-head (mirror clone + upstream version tags) → install → upgrade → backup
  │   → recipe-local (D4) → ALWAYS teardown (undeploy+volumes+secrets, verified)
  ▼ bridge watcher polls the build → edits the PR comment to ✅ passed / ❌ <status>
  ▼ dashboard reflects latest-per-recipe status + badges
```
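The bridge's side of this flow (start a parameterized build, report its outcome) and the dashboard's latest-per-recipe view might look roughly like the sketch below. It assumes Drone's build-create endpoint, `POST /api/repos/{owner}/{repo}/builds`, which passes extra query parameters to the pipeline as build parameters. The owner/repo values, the `branch` choice, the exact comment wording and the run-record fields (`recipe`, `number`) are all hypothetical. The request URL is only built here, not sent.

```python
from urllib.parse import quote, urlencode

DRONE = "https://drone.ci.commoninternet.net"


def build_create_url(owner: str, repo: str, branch: str, params: dict[str, str]) -> str:
    # Assumed shape of Drone's build-create call: extra query parameters
    # (RECIPE/REF/PR/SRC) reach the pipeline as build parameters.
    query = urlencode({"branch": branch, **params})
    return f"{DRONE}/api/repos/{quote(owner)}/{quote(repo)}/builds?{query}"


def comment_text(status: str, link: str) -> str:
    # Drone reports "success" for a passing build; any other final status
    # (failure, error, killed, ...) is surfaced verbatim.
    if status == "success":
        return f"✅ passed ({link})"
    return f"❌ {status} ({link})"


def latest_per_recipe(runs: list[dict]) -> dict[str, dict]:
    # Keep only the highest-numbered (newest) run for each recipe.
    latest: dict[str, dict] = {}
    for run in runs:
        current = latest.get(run["recipe"])
        if current is None or run["number"] > current["number"]:
            latest[run["recipe"]] = run
    return latest
```

A caller would `POST` that URL with an `Authorization: Bearer <token>` header, then poll the build until it reaches a final status before editing the comment with `comment_text`.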
## Network & TLS (see install.md §domain)

`*.ci.commoninternet.net` (and the bare `ci.commoninternet.net`) resolve to an operator gateway that
passes TLS through to cc-ci, routing by SNI. cc-ci's traefik terminates TLS with the pre-issued wildcard
cert at `/var/lib/ci-certs/live/` (no ACME, no DNS token on the box). Each run gets a unique short
subdomain, `<recipe[:4]>-<6hex>.ci.commoninternet.net`, covered by the wildcard, so concurrent or serial
runs never collide; the subdomain is torn down at run end.
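A per-run subdomain in the `<recipe[:4]>-<6hex>` shape can be generated as below. This is an illustrative sketch with a hypothetical function name. It uses Python's `secrets.token_hex(3)`, which yields 6 hex characters. Keeping the result to a single DNS label matters: a wildcard certificate for `*.ci.commoninternet.net` matches exactly one label, so `cryp-1a2b3c.ci.commoninternet.net` is covered but a deeper name would not be.

```python
import secrets

BASE_DOMAIN = "ci.commoninternet.net"


def run_domain(recipe: str) -> str:
    # <recipe[:4]>-<6hex>: 4-char recipe prefix + 3 random bytes as hex.
    # One DNS label, so the *.ci.commoninternet.net wildcard cert covers it.
    return f"{recipe[:4]}-{secrets.token_hex(3)}.{BASE_DOMAIN}"
```

Six hex characters give about 16.7 million names per recipe prefix, which is ample for serial runs at capacity 1.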
## Resource safety (§4.2/§4.3)

- `MAX_TESTS=1` (runner capacity) → at most one test app is live; Drone queues the rest.
- Per-build timeout of 60m (Drone repo timeout) → a hung build is killed, freeing the slot.
- Guaranteed teardown (`try`/`finally`), plus a run-start janitor that reaps orphaned apps following the
  per-run naming scheme. The janitor is the backstop for a SIGKILL'd build, where `finally` never runs.
  The recipe-ci pipeline sets `CCCI_JANITOR_MAX_AGE=0`, which is safe at capacity 1 because no other run
  can be live when a new one starts.
- Heavy recipes pull many images; keep registry credentials configured and disk space adequate (see
  runbook.md).
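The two teardown layers can be sketched as follows: `try`/`finally` covers exceptions inside a run, and an age-based orphan sweep at run start covers builds that were killed outright. The function names are hypothetical, the input is assumed to be already limited to run apps, and treating `CCCI_JANITOR_MAX_AGE` as seconds is an assumption.

```python
import time
from typing import Callable, Optional


def run_with_teardown(stages: list[Callable[[], None]], teardown: Callable[[], None]) -> None:
    """Run stages in order; teardown always runs, even if a stage raises."""
    try:
        for stage in stages:
            stage()
    finally:
        # Not reached if the process is SIGKILLed; the janitor covers that case.
        teardown()


def find_orphans(apps: dict[str, float], max_age_s: float, now: Optional[float] = None) -> list[str]:
    """Return run apps at least max_age_s old (apps: name -> creation epoch seconds)."""
    now = time.time() if now is None else now
    # max_age_s == 0 selects every leftover app. That is only safe because
    # capacity 1 means no other run can be live when this one starts.
    return sorted(name for name, created in apps.items() if now - created >= max_age_s)
```

With a higher capacity, a non-zero max age would be needed so one run's janitor does not reap another run's live app.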
## Enrolling a recipe (D5, see enroll-recipe.md)

Add `tests/<recipe>/` (`recipe_meta.py` + `test_install.py`, `test_upgrade.py`, `test_backup.py`) and add
the repo to the bridge's `POLL_REPOS`. Per-recipe quirks go in `recipe_meta.py` (`HEALTH_PATH`, timeouts,
`EXTRA_ENV` for e.g. cryptpad's `SANDBOX_DOMAIN` or lasuite's `TIMEOUT`); no shared-harness edits are needed.
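One way a harness can handle per-recipe quirks without shared-harness edits is to overlay whatever `recipe_meta.py` defines on a set of defaults. The sketch below is hypothetical: the default values are invented, only `HEALTH_PATH` and `EXTRA_ENV` come from this page, and `SimpleNamespace` stands in for an imported `recipe_meta` module.

```python
from types import SimpleNamespace


def harness_defaults() -> dict:
    # Hypothetical defaults; the real harness's names and values may differ.
    # Built fresh on each call so no caller can mutate a shared EXTRA_ENV dict.
    return {"HEALTH_PATH": "/", "EXTRA_ENV": {}}


def load_meta(meta: object) -> dict:
    """Overlay the settings a recipe_meta module defines on the harness defaults."""
    settings = harness_defaults()
    for key in settings:
        if hasattr(meta, key):
            settings[key] = getattr(meta, key)
    return settings


# Stand-in for tests/<recipe>/recipe_meta.py of a hypothetical recipe.
example_meta = SimpleNamespace(
    HEALTH_PATH="/healthz",
    EXTRA_ENV={"SOME_VAR": "value"},
)
```

A recipe that needs nothing special can define none of these names and still gets the defaults.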