# Architecture

cc-ci turns a `!testme` PR comment into a real end-to-end deploy + test of a Co-op Cloud recipe and reports the result back. Everything on the `cc-ci` host is declared in this repo's NixOS flake.

## Repo layout

All Nix code lives under **`nix/`**: `nix/hosts/cc-ci/` (the machine config) and `nix/modules/` (the service modules). `flake.nix` / `flake.lock` stay at the **repo root** as the entry point, so the build reference is unchanged (`nixos-rebuild switch --flake '…#cc-ci'`). Application source sits at the root (`bridge/`, `dashboard/`, `runner/`, `tests/`); encrypted secrets are the `secrets/` submodule.

## Components

| Component | Where | Role |
|---|---|---|
| **comment-bridge** | `bridge/bridge.py`, `nix/modules/bridge.nix` (swarm svc, `ci.commoninternet.net/hook`) | Polls enrolled repos for `!testme` (primary, read-only) + optional admin webhook; authorizes the commenter (org membership); triggers a parameterized Drone build; posts/edits the PR comment with the run link + final pass/fail. |
| **Drone server** | `nix/modules/drone.nix`: coop-cloud `drone` recipe via abra (`drone.ci.commoninternet.net`, Gitea SSO) | CI engine. Holds the `recipe-ci` (custom-event) and `self-test` (push) pipelines (`.drone.yml`). |
| **Drone exec runner** | `nix/modules/drone-runner.nix` (host systemd service) | Runs pipeline steps **on the host** so they can drive `abra`/Docker. `DRONE_RUNNER_CAPACITY=1` (MAX_TESTS) caps concurrent builds; the rest queue natively. |
| **harness** | `runner/run_recipe_ci.py` + `runner/harness/` + `tests/` | Orchestrates per run: fetch recipe at the PR head → install → upgrade → backup/restore → recipe-local (D4) → guaranteed teardown. pytest + Playwright via the Nix `cc-ci-run` env. |
| **swarm + traefik** | `nix/modules/swarm.nix`, `nix/modules/proxy.nix`: coop-cloud `traefik` recipe via abra | Single-node Docker Swarm + `proxy` overlay; traefik terminates TLS with the wildcard cert (**sops-decrypted from git** to `/var/lib/ci-certs/live`, file provider, **no ACME**). The real deploy target for recipes-under-test. |
| **backup-bot-two** | `nix/modules/backupbot.nix` | restic-based volume/DB backups; `abra app backup/restore` drive it. |
| **dashboard** | `dashboard/dashboard.py`, `nix/modules/dashboard.nix` (`ci.commoninternet.net`) | YunoHost-CI-like overview: latest run per recipe + status badges + run links; `/badge/.svg`. |
| **secrets** | `nix/modules/secrets.nix` + `secrets/` = **`cc-ci-secrets` submodule** (sops-nix) | **Phase-1c secrets model:** ALL secrets incl. the **wildcard TLS cert+key are sops-encrypted in git** in the private `cc-ci-secrets` repo, mounted as a **git submodule** at `secrets/` (the base `cc-ci` repo holds **no** secret material). Decrypted at activation by the **bootstrap age key** at `/var/lib/sops-nix/key.txt` (`sops.age.keyFile`): either cc-ci's host-derived age identity, or the **off-box recovery key on a fresh/cloned host** whose SSH key isn't a recipient; the host SSH key is also offered (`sops.age.sshKeyPaths`). The cert is decrypted to `/var/lib/ci-certs/live/` (no out-of-band file drop). This **one** age key is the only secret not in git. See `secrets.md`. |

All swarm infra (traefik, drone, bridge, dashboard, backupbot) is brought up by **idempotent-reconcile systemd oneshots** that converge on every activation/boot (no run-once sentinels), **serialized** (proxy→drone→bridge→dashboard→backupbot) so a single switch converges on a blank host. A from-scratch install is therefore `git clone --recursive` + provision the one bootstrap age key + `nixos-rebuild switch` + the external DNS/gateway (`install.md`).
**Phase-1c verified this from-scratch install on a real throwaway VM (D8): blank host + the two repos + the age key → a fully-converged cc-ci that serves a real `!testme` run end-to-end over the public domain.**

## The `!testme` flow

```
PR comment "!testme"
   │  (poll ≤30s, read-only; or optional admin webhook → /hook, HMAC-verified)
   ▼
comment-bridge: exact-match "!testme"? · commenter ∈ recipe-maintainers org? · resolve PR head
   ▼
Drone API: create build (event=custom, params RECIPE/REF/PR/SRC)
   ▼
recipe-ci pipeline (exec runner, on host): cc-ci-run runner/run_recipe_ci.py
   │  fetch recipe@PR-head (mirror clone + upstream version tags) → install → upgrade → backup
   │  → recipe-local (D4) → ALWAYS teardown (undeploy+volumes+secrets, verified)
   ▼
bridge watcher polls the build → edits the PR comment to ✅ passed / ❌
   ▼
dashboard reflects latest-per-recipe status + badges
```

## Network & TLS (see install.md §domain)

`*.ci.commoninternet.net` (and bare `ci.commoninternet.net`) resolve to an operator **gateway** that **TLS-passthroughs** by SNI to cc-ci. cc-ci's traefik terminates TLS with the **wildcard cert sops-decrypted from git** (`cc-ci-secrets`) to `/var/lib/ci-certs/live/` (no ACME, no DNS token on the box; operator re-issues + re-commits to rotate). Each run gets a unique short subdomain `-<6hex>.ci.commoninternet.net` (covered by the wildcard) so concurrent/serial runs never collide; it's torn down at run end.

## Resource safety (§4.2/§4.3)

- **MAX_TESTS=1** (runner capacity) → at most one test app live; Drone queues the rest.
- **Per-build timeout 60m** (Drone repo timeout) → a hung build is killed, freeing the slot.
- **Guaranteed teardown** (`try/finally`) + a **run-start janitor** that reaps orphaned `*-`-scheme apps (backstop for a SIGKILL'd build). `CCCI_JANITOR_MAX_AGE=0` in the recipe-ci pipeline (safe at capacity=1).
- Heavy recipes pull many images; keep registry creds configured + adequate disk (see `runbook.md`).
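
The guaranteed-teardown and janitor bullets above can be sketched in Python as follows. This is a simplified model, not the harness code: `new_run_name`, `janitor`, `run_with_teardown`, and the in-memory `apps` registry are hypothetical, the run name only imitates a 6-hex-suffix scheme, and treating a maximum age of 0 as "reap every leftover" is an assumed reading of `CCCI_JANITOR_MAX_AGE=0`.

```python
import secrets
import time
from typing import Callable, Dict, List, Optional


def new_run_name(prefix: str) -> str:
    """Hypothetical naming: a prefix plus a random 6-hex-digit suffix."""
    return f"{prefix}-{secrets.token_hex(3)}"  # 3 random bytes -> 6 hex chars


def janitor(
    live_apps: Dict[str, float], max_age: float, now: Optional[float] = None
) -> List[str]:
    """Remove apps at least max_age seconds old; return the reaped names."""
    now = time.time() if now is None else now
    stale = [name for name, started in live_apps.items() if now - started >= max_age]
    for name in stale:
        del live_apps[name]
    return stale


def run_with_teardown(
    prefix: str, live_apps: Dict[str, float], test: Callable[[str], None]
) -> str:
    """Reap leftovers, run one test app, and always tear it down."""
    janitor(live_apps, max_age=0)  # assumed safe only because capacity is 1
    name = new_run_name(prefix)
    live_apps[name] = time.time()
    try:
        test(name)
    finally:
        live_apps.pop(name, None)  # runs on pass, failure, or exception
    return name


def _failing_test(app_name: str) -> None:
    raise RuntimeError("simulated test failure")


# Usage: an orphan from a killed build plus a run that fails.
apps: Dict[str, float] = {"orphan-0a1b2c": 0.0}
try:
    run_with_teardown("demo", apps, _failing_test)
except RuntimeError:
    pass  # the failure still propagates; the registry is clean regardless
```

The `finally` clause covers ordinary failures, while the janitor exists for the case `finally` cannot cover: a process killed before its teardown could run.
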
## Enrolling a recipe (D5, see enroll-recipe.md)

Add `tests//` (`recipe_meta.py` + `test_install/upgrade/backup.py`) and add the repo to the bridge `POLL_REPOS`. Per-recipe quirks go in `recipe_meta.py` (`HEALTH_PATH`/timeouts, `EXTRA_ENV` for e.g. cryptpad's `SANDBOX_DOMAIN` or lasuite's `TIMEOUT`), with **no shared-harness edits**.
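
For orientation, a per-recipe `recipe_meta.py` might look roughly like the sketch below. Only the names `HEALTH_PATH`, `EXTRA_ENV`, and `SANDBOX_DOMAIN` come from the text; the timeout name, every value, and the `effective_env` helper are assumptions made for illustration, and the real harness may read these settings differently.

```python
from typing import Dict

# Hypothetical per-recipe settings; all values are placeholders.
HEALTH_PATH = "/"        # path polled to decide the app is up
DEPLOY_TIMEOUT_S = 600   # assumed name for a per-recipe timeout
EXTRA_ENV: Dict[str, str] = {
    "SANDBOX_DOMAIN": "sandbox.example.invalid",  # placeholder value
}


def effective_env(base: Dict[str, str], extra: Dict[str, str]) -> Dict[str, str]:
    """Overlay recipe-specific variables on a base env without mutating either."""
    merged = dict(base)
    merged.update(extra)
    return merged


# Usage: the recipe's extras are layered over whatever the harness provides.
env = effective_env({"DOMAIN": "app.example.invalid"}, EXTRA_ENV)
```

Keeping such quirks as plain module-level data is one way a shared harness can stay untouched when a new recipe is enrolled.
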