cc-ci/docs/architecture.md
autonomic-bot 433ec9de30 refactor(1b): RL5 — consolidate Nix code under nix/ (modules->nix/modules, hosts->nix/hosts)
flake.nix/flake.lock STAY at root so the build ref #cc-ci is unchanged; only flake's internal
configuration.nix path updated. Root-relative refs inside moved modules re-based ../X -> ../../X
(secrets/bridge/dashboard); configuration.nix's ../../modules imports unchanged (both dirs under nix/).
Living docs (README, architecture/install/secrets/enroll) + .drone.yml comment updated to nix/...;
append-only history logs left as-is. DECISIONS.md records RL5 + the deferred-coordinated RL6.

Verified on cc-ci: nixos-rebuild build 'path:#cc-ci' -> toplevel 8i3jcad9 (BYTE-IDENTICAL to the
pre-move build — store derivations are content-addressed on file contents, module .nix not in the
runtime closure); scripts/lint.sh -> lint: PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 21:19:09 +01:00

Architecture

cc-ci turns a !testme PR comment into a real end-to-end deploy + test of a Co-op Cloud recipe and reports the result back. Everything on the cc-ci host is declared in this repo's NixOS flake.

Repo layout

All Nix code lives under nix/: nix/hosts/cc-ci/ (the machine config) and nix/modules/ (the service modules). flake.nix / flake.lock stay at the repo root as the entry point, so the build reference is unchanged (nixos-rebuild switch --flake '…#cc-ci'). Application source sits at the root (bridge/, dashboard/, runner/, tests/); encrypted secrets live in the secrets/ submodule.

Components

| Component | Where | Role |
| --- | --- | --- |
| comment-bridge | bridge/bridge.py, nix/modules/bridge.nix (swarm svc, ci.commoninternet.net/hook) | Polls enrolled repos for !testme (primary, read-only) + optional admin webhook; authorizes the commenter (org membership); triggers a parameterized Drone build; posts/edits the PR comment with the run link + final pass/fail. |
| Drone server | nix/modules/drone.nix: coop-cloud drone recipe via abra (drone.ci.commoninternet.net, Gitea SSO) | CI engine. Holds the recipe-ci (custom-event) and self-test (push) pipelines (.drone.yml). |
| Drone exec runner | nix/modules/drone-runner.nix: host systemd service | Runs pipeline steps on the host so they can drive abra/Docker. DRONE_RUNNER_CAPACITY=1 (MAX_TESTS) caps concurrent builds; the rest queue natively. |
| harness | runner/run_recipe_ci.py + runner/harness/ + tests/ | Orchestrates per run: fetch recipe at the PR head → install → upgrade → backup/restore → recipe-local (D4) → guaranteed teardown. pytest + Playwright via the Nix cc-ci-run env. |
| swarm + traefik | nix/modules/swarm.nix, nix/modules/proxy.nix: coop-cloud traefik recipe via abra | Single-node Docker Swarm + proxy overlay; traefik terminates TLS with the wildcard cert (sops-decrypted from git to /var/lib/ci-certs/live, file provider, no ACME). The real deploy target for recipes-under-test. |
| backup-bot-two | nix/modules/backupbot.nix | restic-based volume/DB backups; abra app backup/restore drive it. |
| dashboard | dashboard/dashboard.py, nix/modules/dashboard.nix (ci.commoninternet.net) | YunoHost-CI-like overview: latest run per recipe + status badges + run links; /badge/<recipe>.svg. |
| secrets | nix/modules/secrets.nix + secrets/ = cc-ci-secrets submodule (sops-nix) | Phase-1c secrets model: ALL secrets incl. the wildcard TLS cert+key are sops-encrypted in git in the private cc-ci-secrets repo, mounted as a git submodule at secrets/ (the base cc-ci repo holds no secret material). Decrypted at activation by the bootstrap age key at /var/lib/sops-nix/key.txt (sops.age.keyFile): cc-ci's host-derived age identity, or the off-box recovery key on a fresh/cloned host whose SSH key isn't a recipient; the host SSH key is also offered (sops.age.sshKeyPaths). The cert is decrypted to /var/lib/ci-certs/live/ (no out-of-band file drop). This one age key is the only secret not in git. See secrets.md. |

All swarm infra (traefik, drone, bridge, dashboard, backupbot) is brought up by idempotent-reconcile systemd oneshots that converge on every activation/boot (no run-once sentinels). They are serialized (proxy→drone→bridge→dashboard→backupbot) so that a single switch converges on a blank host. A from-scratch install is therefore git clone --recursive + provisioning the one bootstrap age key + nixos-rebuild switch + the external DNS/gateway (install.md). Phase-1c verified this on a real throwaway VM (D8): blank host + the two repos + the age key → a fully-converged cc-ci that serves a real !testme run end-to-end over the public domain.
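
The idempotent-reconcile idea can be illustrated with a minimal Python sketch. It is not the actual systemd oneshot logic: the Unit type, the reconcile function, and the in-memory deployed set are hypothetical stand-ins for "probe the real state, act only on drift, in a fixed order".

```python
from typing import Callable, NamedTuple


class Unit(NamedTuple):
    name: str
    is_converged: Callable[[], bool]  # probe: already in the desired state?
    converge: Callable[[], None]      # action: bring it to the desired state


def reconcile(units: list[Unit]) -> list[str]:
    """Run units in the given order; act only where the probe reports drift."""
    changed = []
    for unit in units:
        if not unit.is_converged():
            unit.converge()
            changed.append(unit.name)
    return changed


# Serialization order taken from the text above.
ORDER = ["proxy", "drone", "bridge", "dashboard", "backupbot"]

# Hypothetical in-memory stand-in for "what is deployed on the host".
deployed: set[str] = set()


def make_unit(name: str) -> Unit:
    return Unit(name, lambda: name in deployed, lambda: deployed.add(name))


units = [make_unit(n) for n in ORDER]
first = reconcile(units)   # blank host: every unit converges, in order
second = reconcile(units)  # already converged: nothing to do
print(first)
print(second)
```

Because each step probes live state instead of checking a run-once sentinel, running the sequence again after a partial failure simply picks up whatever is still missing.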

The !testme flow

PR comment "!testme"
  │  (poll ≤30s, read-only; or optional admin webhook → /hook, HMAC-verified)
  ▼ comment-bridge: exact-match "!testme"? · commenter ∈ recipe-maintainers org? · resolve PR head
  ▼ Drone API: create build (event=custom, params RECIPE/REF/PR/SRC)
  ▼ recipe-ci pipeline (exec runner, on host): cc-ci-run runner/run_recipe_ci.py
  │    fetch recipe@PR-head (mirror clone + upstream version tags) → install → upgrade → backup/restore
  │    → recipe-local (D4) → ALWAYS teardown (undeploy+volumes+secrets, verified)
  ▼ bridge watcher polls the build → edits the PR comment to ✅ passed / ❌ <status>
  ▼ dashboard reflects latest-per-recipe status + badges
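
The bridge's gating steps in the flow above can be sketched as follows. This is an illustration, not the code in bridge/bridge.py: the function names are hypothetical, the signature is assumed to be a hex-encoded HMAC-SHA256 over the raw request body, and whether surrounding whitespace is stripped before the exact match is also an assumption.

```python
import hashlib
import hmac

TRIGGER = "!testme"


def signature_ok(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time check of a hex HMAC-SHA256 signature (assumed scheme)."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)


def should_trigger(comment_body: str, commenter: str, maintainers: set[str]) -> bool:
    """Exact-match the trigger and require org membership of the commenter."""
    return comment_body.strip() == TRIGGER and commenter in maintainers


def build_params(recipe: str, ref: str, pr: int, src: str) -> dict[str, str]:
    """Custom-build parameters named in the flow (RECIPE/REF/PR/SRC)."""
    return {"RECIPE": recipe, "REF": ref, "PR": str(pr), "SRC": src}
```

A comment such as "please !testme" fails the exact match, and a correct "!testme" from someone outside the maintainers set is ignored as well, so only the combination of both reaches the Drone API call.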

Network & TLS (see install.md §domain)

*.ci.commoninternet.net (and bare ci.commoninternet.net) resolve to an operator gateway that TLS-passthroughs by SNI to cc-ci. cc-ci's traefik terminates TLS with the wildcard cert sops-decrypted from git (cc-ci-secrets) to /var/lib/ci-certs/live/ (no ACME, no DNS token on the box; operator re-issues + re-commits to rotate). Each run gets a unique short subdomain <recipe[:4]>-<6hex>.ci.commoninternet.net (covered by the wildcard) so concurrent/serial runs never collide; it's torn down at run end.
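
A per-run subdomain of the form <recipe[:4]>-<6hex> could be generated along these lines. The run_domain helper and the use of random hex are assumptions for illustration; the harness may derive the suffix differently.

```python
import secrets

BASE_DOMAIN = "ci.commoninternet.net"


def run_domain(recipe: str) -> str:
    """First four characters of the recipe name plus six random hex digits."""
    suffix = secrets.token_hex(3)  # 3 random bytes -> 6 hex characters
    return f"{recipe[:4]}-{suffix}.{BASE_DOMAIN}"


print(run_domain("cryptpad"))  # e.g. cryp-<6hex>.ci.commoninternet.net
```

The label stays a single DNS level, so the existing wildcard certificate for *.ci.commoninternet.net covers it without any per-run issuance.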

Resource safety (§4.2/§4.3)

  • MAX_TESTS=1 (runner capacity) → at most one test app live; Drone queues the rest.
  • Per-build timeout 60m (Drone repo timeout) → a hung build is killed, freeing the slot.
  • Guaranteed teardown (try/finally) + a run-start janitor that reaps orphaned *--scheme apps (backstop for a SIGKILL'd build). CCCI_JANITOR_MAX_AGE=0 in the recipe-ci pipeline (safe at capacity=1).
  • Heavy recipes pull many images; keep registry creds configured + adequate disk (see runbook.md).
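
The teardown guarantee and the janitor backstop can be sketched like this. Both functions are hypothetical simplifications: the real harness drives abra, and how it discovers app ages is not shown in this document.

```python
from typing import Callable


def run_with_teardown(phases: list[Callable[[], None]],
                      teardown: Callable[[], None]) -> bool:
    """Run phases in order; teardown runs whether they pass or fail."""
    try:
        for phase in phases:
            phase()
        return True
    except Exception:
        return False
    finally:
        teardown()  # not reached on SIGKILL, hence the janitor below


def orphans(apps: dict[str, float], max_age: float, now: float) -> list[str]:
    """Names of apps whose age (now - created) is at least max_age seconds."""
    return sorted(name for name, created in apps.items() if now - created >= max_age)


log: list[str] = []


def failing_upgrade() -> None:
    raise RuntimeError("upgrade failed")


ok = run_with_teardown(
    [lambda: log.append("install"), failing_upgrade, lambda: log.append("backup")],
    lambda: log.append("teardown"),
)
print(ok, log)
```

With max_age set to 0, as in the recipe-ci pipeline, every leftover app counts as an orphan at run start. That is safe only because capacity is 1, so no other run can own a live app at that moment.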

Enrolling a recipe (D5, see enroll-recipe.md)

Add tests/<recipe>/ (recipe_meta.py + test_install/upgrade/backup.py) and add the repo to the bridge's POLL_REPOS. Per-recipe quirks go in recipe_meta.py (HEALTH_PATH/timeouts, EXTRA_ENV for e.g. cryptpad's SANDBOX_DOMAIN or lasuite's TIMEOUT), so enrolling a recipe requires no shared-harness edits.
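
As a rough illustration of why per-recipe quirks need no shared-harness edits, a harness can read optional attributes from recipe_meta.py and fall back to defaults. The load_meta helper, the default values, and the example domain below are hypothetical; only the HEALTH_PATH, EXTRA_ENV and SANDBOX_DOMAIN names come from the text above.

```python
from types import SimpleNamespace

# Hypothetical defaults applied when a recipe_meta.py omits a setting.
DEFAULTS = {"HEALTH_PATH": "/", "EXTRA_ENV": {}}


def load_meta(module) -> dict:
    """Read known settings from a recipe_meta module, defaulting the rest."""
    return {key: getattr(module, key, default) for key, default in DEFAULTS.items()}


# Stand-in for an imported tests/cryptpad/recipe_meta.py; the value is illustrative.
cryptpad_meta = SimpleNamespace(
    EXTRA_ENV={"SANDBOX_DOMAIN": "sandbox.example.invalid"},
)

meta = load_meta(cryptpad_meta)
print(meta)
```

A recipe that needs nothing special can ship an empty recipe_meta.py and still get the defaults, while a recipe with quirks overrides only the keys it cares about.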