# Architecture

cc-ci turns a `!testme` PR comment into a real end-to-end deploy + test of a Co-op Cloud recipe and
reports the result back. Everything on the `cc-ci` host is declared in this repo's NixOS flake.
## Components

| Component | Where | Role |
|---|---|---|
| **comment-bridge** | `bridge/bridge.py`, `modules/bridge.nix` (swarm svc, `ci.commoninternet.net/hook`) | Polls enrolled repos for `!testme` (primary, read-only) + optional admin webhook; authorizes the commenter (org membership); triggers a parameterized Drone build; posts/edits the PR comment with the run link + final pass/fail. |
| **Drone server** | `modules/drone.nix`: coop-cloud `drone` recipe via abra (`drone.ci.commoninternet.net`, Gitea SSO) | CI engine. Holds the `recipe-ci` (custom-event) and `self-test` (push) pipelines (`.drone.yml`). |
| **Drone exec runner** | `modules/drone-runner.nix`: host systemd service | Runs pipeline steps **on the host** so they can drive `abra`/Docker. `DRONE_RUNNER_CAPACITY=1` (MAX_TESTS) caps concurrent builds; the rest queue natively. |
| **harness** | `runner/run_recipe_ci.py` + `runner/harness/` + `tests/` | Orchestrates each run: fetch recipe at the PR head → install → upgrade → backup/restore → recipe-local (D4) → guaranteed teardown. pytest + Playwright via the Nix `cc-ci-run` env. |
| **swarm + traefik** | `modules/swarm.nix`, `modules/proxy.nix`: coop-cloud `traefik` recipe via abra | Single-node Docker Swarm + `proxy` overlay; traefik terminates TLS with the pre-issued wildcard cert (file provider, **no ACME**). The real deploy target for recipes under test. |
| **backup-bot-two** | `modules/backupbot.nix` | restic-based volume/DB backups, driven by `abra app backup` / `abra app restore`. |
| **dashboard** | `dashboard/dashboard.py`, `modules/dashboard.nix` (`ci.commoninternet.net`) | YunoHost-CI-like overview: latest run per recipe + status badges + run links; `/badge/<recipe>.svg`. |
| **secrets** | `modules/secrets.nix` + `secrets/secrets.yaml` (sops-nix) | Infra secrets, decrypted at activation using the host SSH key as the age identity. See `secrets.md`. |

All swarm infra (traefik, drone, bridge, dashboard, backupbot) is brought up by **idempotent-reconcile
systemd oneshots** that converge on every activation/boot (no run-once sentinels), so a from-scratch
install is `git clone` + `nixos-rebuild switch` + the operator preconditions (`install.md`).
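The reconcile idea can be illustrated with a pure planning function: compare the desired stacks with what is observed running and act only on the difference, so running it on every boot is harmless. This is a minimal sketch under assumptions; `Plan` and `reconcile` are hypothetical names, and the real oneshots live in the `modules/*.nix` units and drive `abra`, which this sketch does not call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    deploy: tuple[str, ...]  # desired but not running: bring these up
    keep: tuple[str, ...]    # already converged: leave untouched


def reconcile(desired: set[str], running: set[str]) -> Plan:
    """Plan the actions that converge `running` onto `desired`.

    Idempotent by construction: there is no sentinel recording that a step
    "already ran"; the plan is derived from observed state every time.
    """
    return Plan(
        deploy=tuple(sorted(desired - running)),
        keep=tuple(sorted(desired & running)),
    )


# Hypothetical state observed at boot: only traefik is up so far.
desired = {"traefik", "drone", "bridge", "dashboard", "backupbot"}
first = reconcile(desired, running={"traefik"})
print(first.deploy)   # ('backupbot', 'bridge', 'dashboard', 'drone')

# Once those deploys have succeeded, the next activation plans nothing.
second = reconcile(desired, running=desired)
print(second.deploy)  # ()
```

Deriving the plan from live state rather than a "done" marker is what lets a half-finished activation simply be re-run.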
## The `!testme` flow

```
PR comment "!testme"
│ (poll ≤30s, read-only; or optional admin webhook → /hook, HMAC-verified)
▼ comment-bridge: exact-match "!testme"? · commenter ∈ recipe-maintainers org? · resolve PR head
▼ Drone API: create build (event=custom, params RECIPE/REF/PR/SRC)
▼ recipe-ci pipeline (exec runner, on host): cc-ci-run runner/run_recipe_ci.py
│ fetch recipe@PR-head (mirror clone + upstream version tags) → install → upgrade → backup
│ → recipe-local (D4) → ALWAYS teardown (undeploy+volumes+secrets, verified)
▼ bridge watcher polls the build → edits the PR comment to ✅ passed / ❌ <status>
▼ dashboard reflects latest-per-recipe status + badges
```
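The bridge's gatekeeping steps can be sketched as small predicates plus a URL builder. This is a hedged sketch, not `bridge/bridge.py`: treating surrounding whitespace as insignificant, HMAC-SHA256 as the webhook signature scheme, and the Drone build-create endpoint shape (`POST /api/repos/<owner>/<repo>/builds` with extra query parameters passed to the pipeline) are all assumptions to check against the real bridge and your Drone version. The repo slug, branch, and parameter values are hypothetical placeholders.

```python
import hashlib
import hmac
from urllib.parse import urlencode

TRIGGER = "!testme"


def is_trigger(body: str) -> bool:
    # Exact match: "!testme please" or a quoted reply does not count.
    # Ignoring surrounding whitespace is an assumption of this sketch.
    return body.strip() == TRIGGER


def authorized(commenter: str, maintainers: set[str]) -> bool:
    # Only members of the recipe-maintainers org may start a run.
    return commenter in maintainers


def webhook_signature_ok(secret: bytes, raw_body: bytes, signature_hex: str) -> bool:
    # HMAC-SHA256 over the raw request body, compared in constant time.
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)


def build_create_url(drone_server: str, ci_repo: str, branch: str,
                     params: dict[str, str]) -> str:
    # Assumed shape of Drone's build-create endpoint: extra query parameters
    # are handed to the custom-event pipeline as build parameters.
    query = urlencode({"branch": branch, **params})
    return f"{drone_server}/api/repos/{ci_repo}/builds?{query}"


if is_trigger("!testme") and authorized("alice", {"alice"}):  # hypothetical user
    print(build_create_url(
        "https://drone.ci.commoninternet.net",
        "owner/cc-ci",  # hypothetical Drone repo slug
        "main",         # hypothetical branch
        {"RECIPE": "example", "REF": "PR_HEAD_SHA", "PR": "PR_NUMBER", "SRC": "SRC"},  # placeholders
    ))
```

The sketch only builds the URL; an actual request would also need the Drone API token in an `Authorization: Bearer` header.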
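The watcher's last step (poll the build, then turn its status into the final PR-comment text) might look like the sketch below. The set of terminal statuses is an assumed subset of Drone's build statuses, the 30 s poll interval is a hypothetical default, and the exact comment wording is illustrative; `final_comment` and `watch` are hypothetical names.

```python
import time
from typing import Callable, Optional

# Assumed subset of Drone's terminal build statuses; anything else
# (e.g. "pending", "running") means "keep polling".
TERMINAL = {"success", "failure", "error", "killed"}


def final_comment(status: str, build_url: str) -> Optional[str]:
    """PR-comment text for a finished build, or None while it is in flight."""
    if status not in TERMINAL:
        return None
    if status == "success":
        return f"✅ passed: {build_url}"
    return f"❌ {status}: {build_url}"


def watch(fetch_status: Callable[[], str], build_url: str,
          poll_s: float = 30.0,
          sleep: Callable[[float], None] = time.sleep) -> str:
    # Poll until the build reaches a terminal state, then return the text
    # the bridge would write into its PR comment.
    while True:
        text = final_comment(fetch_status(), build_url)
        if text is not None:
            return text
        sleep(poll_s)


# Usage with a stubbed status source (no network, no real sleeping).
statuses = iter(["pending", "running", "failure"])
print(watch(lambda: next(statuses), "https://drone.example/build/1",
            sleep=lambda s: None))
```

Injecting `fetch_status` and `sleep` keeps the loop testable; a production watcher would also cap the total wait in case Drone reports a status outside the assumed set.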
## Network & TLS (see install.md §domain)

`*.ci.commoninternet.net` (and bare `ci.commoninternet.net`) resolve to an operator **gateway** that
forwards TLS unterminated (**TLS passthrough**, routed by SNI) to cc-ci. cc-ci's traefik terminates TLS with the **pre-issued wildcard
cert** at `/var/lib/ci-certs/live/` (no ACME, no DNS token on the box). Each run gets a unique short
subdomain `<recipe[:4]>-<6hex>.ci.commoninternet.net` (covered by the wildcard) so concurrent or back-to-back
runs never collide; it is torn down at run end.
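The per-run subdomain scheme can be sketched in a few lines. This is illustrative, not the harness code: `run_domain` and `covered_by_wildcard` are hypothetical names, and the pattern assumes recipe names are lowercase letters, digits, and hyphens.

```python
import re
import secrets

CI_DOMAIN = "ci.commoninternet.net"

# <recipe[:4]>-<6hex> as a single DNS label directly under CI_DOMAIN.
RUN_HOST = re.compile(r"^[a-z0-9-]{1,4}-[0-9a-f]{6}\." + re.escape(CI_DOMAIN) + r"$")


def run_domain(recipe: str) -> str:
    # token_hex(3): 3 random bytes -> 6 hex chars, i.e. 16**6 (about 16.8M)
    # possible suffixes per recipe prefix.
    return f"{recipe[:4]}-{secrets.token_hex(3)}.{CI_DOMAIN}"


def covered_by_wildcard(host: str) -> bool:
    # A cert for *.ci.commoninternet.net covers exactly one extra label:
    # "x.ci.commoninternet.net" matches, "a.b.ci.commoninternet.net" does not.
    label, _, rest = host.partition(".")
    return bool(label) and rest == CI_DOMAIN


host = run_domain("cryptpad")  # shape: cryp-<6hex>.ci.commoninternet.net
print(host, covered_by_wildcard(host))
```

Keeping the run host a single label is what lets one pre-issued wildcard cert serve every run without per-run issuance.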
## Resource safety (§4.2/§4.3)

- **MAX_TESTS=1** (runner capacity) → at most one test app live; Drone queues the rest.
- **Per-build timeout 60m** (Drone repo timeout) → a hung build is killed, freeing the slot.
- **Guaranteed teardown** (`try/finally`) + a **run-start janitor** that reaps orphaned `*-`-scheme
  apps (backstop for a SIGKILL'd build). The recipe-ci pipeline sets `CCCI_JANITOR_MAX_AGE=0`, which is
  safe at capacity=1 because no other build can own a live app when a new run starts.
- Heavy recipes pull many images; keep registry credentials configured + adequate disk (see `runbook.md`).
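Teardown and the janitor cover different failure modes, which the sketch below tries to make concrete. It is a minimal sketch under assumptions: `App`, `find_orphans`, `janitor`, and `run_with_teardown` are hypothetical names, and `RUN_APP` is a hypothetical stand-in for the harness's naming-scheme check, keyed on the per-run subdomain pattern. `teardown` is assumed to tolerate partially deployed apps.

```python
import re
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

# Hypothetical stand-in for the harness's "is this a CI run app?" check.
RUN_APP = re.compile(r"^[a-z0-9-]{1,4}-[0-9a-f]{6}\.ci\.commoninternet\.net$")


@dataclass(frozen=True)
class App:
    domain: str
    created_at: float  # epoch seconds


def find_orphans(apps: Iterable[App], max_age_s: float, now: float) -> List[App]:
    # max_age_s=0 selects every run app regardless of age; only safe because
    # capacity=1 means no other build owns a live app at run start.
    return [a for a in apps
            if RUN_APP.match(a.domain) and now - a.created_at >= max_age_s]


def janitor(list_apps: Callable[[], Iterable[App]],
            teardown: Callable[[str], None],
            max_age_s: float = 0.0, now: Optional[float] = None) -> None:
    # Run-start backstop: reap apps left by a SIGKILL'd build, whose
    # try/finally never got the chance to run.
    now = time.time() if now is None else now
    for app in find_orphans(list_apps(), max_age_s, now):
        teardown(app.domain)


def run_with_teardown(domain: str, deploy: Callable[[str], None],
                      test: Callable[[str], None],
                      teardown: Callable[[str], None]) -> None:
    try:
        deploy(domain)  # inside the try, so a half-finished deploy is torn down too
        test(domain)
    finally:
        # Runs on pass, on failure and on exceptions; never on SIGKILL.
        teardown(domain)
```

Infra apps such as `drone.ci.commoninternet.net` do not match the run pattern, so a zero max age still never touches them.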
## Enrolling a recipe (D5, see enroll-recipe.md)

Add `tests/<recipe>/` (`recipe_meta.py` + `test_install.py`, `test_upgrade.py`, `test_backup.py`) and add the
recipe's repo to the bridge's `POLL_REPOS`. Per-recipe quirks go in `recipe_meta.py` (HEALTH_PATH/timeouts, `EXTRA_ENV` for e.g.
cryptpad's SANDBOX_DOMAIN or lasuite's TIMEOUT); enrolling requires **no shared-harness edits**.
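One way the "no shared-harness edits" rule can hold is for the harness to read optional attributes from each `recipe_meta.py` and fall back to its own defaults. The sketch below assumes that design; `DEFAULTS`, `effective_settings`, the `"/"` default, and the `SANDBOX_DOMAIN` value are hypothetical placeholders, not the real harness names or cryptpad settings.

```python
from types import SimpleNamespace

# Hypothetical shared-harness defaults; the real ones live in runner/harness/.
DEFAULTS = {"HEALTH_PATH": "/", "EXTRA_ENV": {}}


def effective_settings(meta) -> dict:
    # Take each optional override from the recipe_meta module if present,
    # otherwise the harness default, so a recipe never needs harness changes.
    settings = {name: getattr(meta, name, default) for name, default in DEFAULTS.items()}
    # Copy so neither the module's dict nor the shared default is mutated later.
    settings["EXTRA_ENV"] = dict(settings["EXTRA_ENV"])
    return settings


# Stand-in for tests/cryptpad/recipe_meta.py (placeholder value).
cryptpad_meta = SimpleNamespace(EXTRA_ENV={"SANDBOX_DOMAIN": "<placeholder>"})
print(effective_settings(cryptpad_meta))
```

In the harness, `meta` would be the imported `tests/<recipe>/recipe_meta.py` module; `SimpleNamespace` only stands in for it here.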