refactor(1b): RL5 — consolidate Nix code under nix/ (modules->nix/modules, hosts->nix/hosts)

flake.nix/flake.lock STAY at root so the build ref #cc-ci is unchanged; only flake's internal
configuration.nix path updated. Root-relative refs inside moved modules re-based ../X -> ../../X
(secrets/bridge/dashboard); configuration.nix's ../../modules imports unchanged (both dirs under nix/).
Living docs (README, architecture/install/secrets/enroll) + .drone.yml comment updated to nix/...;
append-only history logs left as-is. DECISIONS.md records RL5 + the deferred-coordinated RL6.

Verified on cc-ci: nixos-rebuild build 'path:#cc-ci' -> toplevel 8i3jcad9 (BYTE-IDENTICAL to the
pre-move build — store derivations are content-addressed on file contents, module .nix not in the
runtime closure); scripts/lint.sh -> lint: PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-27 21:19:09 +01:00
parent 5a811e4ae4
commit 433ec9de30
21 changed files with 63 additions and 29 deletions

View File

@ -3,18 +3,26 @@
cc-ci turns a `!testme` PR comment into a real end-to-end deploy + test of a Co-op Cloud recipe and
reports the result back. Everything on the `cc-ci` host is declared in this repo's NixOS flake.
## Repo layout
All Nix code lives under **`nix/`** — `nix/hosts/cc-ci/` (the machine config) and `nix/modules/`
(the service modules). `flake.nix` / `flake.lock` stay at the **repo root** as the entry point, so
the build reference is unchanged (`nixos-rebuild switch --flake '…#cc-ci'`). Application source sits
at the root (`bridge/`, `dashboard/`, `runner/`, `tests/`); encrypted secrets are the `secrets/`
submodule.
## Components
| Component | Where | Role |
|---|---|---|
| **comment-bridge** | `bridge/bridge.py`, `modules/bridge.nix` (swarm svc, `ci.commoninternet.net/hook`) | Polls enrolled repos for `!testme` (primary, read-only) + optional admin webhook; authorizes the commenter (org membership); triggers a parameterized Drone build; posts/edits the PR comment with the run link + final pass/fail. |
| **Drone server** | `modules/drone.nix` — coop-cloud `drone` recipe via abra (`drone.ci.commoninternet.net`, Gitea SSO) | CI engine. Holds the `recipe-ci` (custom-event) and `self-test` (push) pipelines (`.drone.yml`). |
| **Drone exec runner** | `modules/drone-runner.nix` — host systemd service | Runs pipeline steps **on the host** so they can drive `abra`/Docker. `DRONE_RUNNER_CAPACITY=1` (MAX_TESTS) caps concurrent builds; the rest queue natively. |
| **comment-bridge** | `bridge/bridge.py`, `nix/modules/bridge.nix` (swarm svc, `ci.commoninternet.net/hook`) | Polls enrolled repos for `!testme` (primary, read-only) + optional admin webhook; authorizes the commenter (org membership); triggers a parameterized Drone build; posts/edits the PR comment with the run link + final pass/fail. |
| **Drone server** | `nix/modules/drone.nix` — coop-cloud `drone` recipe via abra (`drone.ci.commoninternet.net`, Gitea SSO) | CI engine. Holds the `recipe-ci` (custom-event) and `self-test` (push) pipelines (`.drone.yml`). |
| **Drone exec runner** | `nix/modules/drone-runner.nix` — host systemd service | Runs pipeline steps **on the host** so they can drive `abra`/Docker. `DRONE_RUNNER_CAPACITY=1` (MAX_TESTS) caps concurrent builds; the rest queue natively. |
| **harness** | `runner/run_recipe_ci.py` + `runner/harness/` + `tests/` | Orchestrates per run: fetch recipe at the PR head → install → upgrade → backup/restore → recipe-local (D4) → guaranteed teardown. pytest + Playwright via the Nix `cc-ci-run` env. |
| **swarm + traefik** | `modules/swarm.nix`, `modules/proxy.nix` — coop-cloud `traefik` recipe via abra | Single-node Docker Swarm + `proxy` overlay; traefik terminates TLS with the wildcard cert (**sops-decrypted from git** to `/var/lib/ci-certs/live`, file provider, **no ACME**). The real deploy target for recipes-under-test. |
| **backup-bot-two** | `modules/backupbot.nix` | restic-based volume/DB backups; `abra app backup/restore` drive it. |
| **dashboard** | `dashboard/dashboard.py`, `modules/dashboard.nix` (`ci.commoninternet.net`) | YunoHost-CI-like overview: latest run per recipe + status badges + run links; `/badge/<recipe>.svg`. |
| **secrets** | `modules/secrets.nix` + `secrets/` = **`cc-ci-secrets` submodule** (sops-nix) | **Phase-1c secrets model:** ALL secrets incl. the **wildcard TLS cert+key are sops-encrypted in git** in the private `cc-ci-secrets` repo, mounted as a **git submodule** at `secrets/` (the base `cc-ci` repo holds **no** secret material). Decrypted at activation by the **bootstrap age key** at `/var/lib/sops-nix/key.txt` (`sops.age.keyFile`) — cc-ci's host-derived age identity, or the **off-box recovery key on a fresh/cloned host** whose SSH key isn't a recipient; the host SSH key is also offered (`sops.age.sshKeyPaths`). The cert is decrypted to `/var/lib/ci-certs/live/` (no out-of-band file drop). This **one** age key is the only secret not in git. See `secrets.md`. |
| **swarm + traefik** | `nix/modules/swarm.nix`, `nix/modules/proxy.nix` — coop-cloud `traefik` recipe via abra | Single-node Docker Swarm + `proxy` overlay; traefik terminates TLS with the wildcard cert (**sops-decrypted from git** to `/var/lib/ci-certs/live`, file provider, **no ACME**). The real deploy target for recipes-under-test. |
| **backup-bot-two** | `nix/modules/backupbot.nix` | restic-based volume/DB backups; `abra app backup/restore` drive it. |
| **dashboard** | `dashboard/dashboard.py`, `nix/modules/dashboard.nix` (`ci.commoninternet.net`) | YunoHost-CI-like overview: latest run per recipe + status badges + run links; `/badge/<recipe>.svg`. |
| **secrets** | `nix/modules/secrets.nix` + `secrets/` = **`cc-ci-secrets` submodule** (sops-nix) | **Phase-1c secrets model:** ALL secrets incl. the **wildcard TLS cert+key are sops-encrypted in git** in the private `cc-ci-secrets` repo, mounted as a **git submodule** at `secrets/` (the base `cc-ci` repo holds **no** secret material). Decrypted at activation by the **bootstrap age key** at `/var/lib/sops-nix/key.txt` (`sops.age.keyFile`) — cc-ci's host-derived age identity, or the **off-box recovery key on a fresh/cloned host** whose SSH key isn't a recipient; the host SSH key is also offered (`sops.age.sshKeyPaths`). The cert is decrypted to `/var/lib/ci-certs/live/` (no out-of-band file drop). This **one** age key is the only secret not in git. See `secrets.md`. |
All swarm infra (traefik, drone, bridge, dashboard, backupbot) is brought up by **idempotent-reconcile
systemd oneshots** that converge on every activation/boot (no run-once sentinels), **serialized**

View File

@ -44,7 +44,7 @@ env `CCCI_BASE_URL` (e.g. `https://<app>.ci.commoninternet.net/`) and `CCCI_APP_
## 4. Add the repo to the bridge poll list
The trigger is **polling** (primary): add the repo's full name to the comment-bridge `POLL_REPOS`
csv (`modules/bridge.nix`) and `nixos-rebuild switch`. The bridge then polls that repo's open PRs
csv (`nix/modules/bridge.nix`) and `nixos-rebuild switch`. The bridge then polls that repo's open PRs
every 30s and fires a run on a new `!testme` comment from an authorized org member. This needs only
**read + comment** access — no webhook, no repo-admin.

View File

@ -29,17 +29,17 @@ switch` → fully converged cc-ci, 0 failed units — see DECISIONS.md Phase-1c
**External infra (operator-owned, not on the box — class-A1):**
- DNS: `*.ci.commoninternet.net` (+ bare) → the **gateway**, which TLS-passthroughs (SNI) to cc-ci.
- Firewall path: gateway reaches cc-ci on tcp/80+443 (opened by `modules/swarm.nix`).
- Firewall path: gateway reaches cc-ci on tcp/80+443 (opened by `nix/modules/swarm.nix`).
- The wildcard cert is **renewed out-of-band** by the operator, who then re-encrypts it into
`cc-ci-secrets` (sops) and rebuilds — the Gandi DNS token never touches the box; **never ACME here.**
## 1. Apply the NixOS flake (this is the whole install)
The flake (`flake.nix`, `hosts/cc-ci/`, `modules/`) declares: base host, sops-nix (decrypts via the
The flake (`flake.nix`, `nix/hosts/cc-ci/`, `nix/modules/`) declares: base host, sops-nix (decrypts via the
host SSH key), Docker + single-node Swarm + the `proxy` overlay + firewall 80/443
(`modules/swarm.nix`), abra (`modules/abra.nix` / `packages.nix`), the **traefik reconcile oneshot**
(`modules/proxy.nix`), the **Drone server reconcile oneshot** (`modules/drone.nix`), and the
**Drone exec runner** (`modules/drone-runner.nix`).
(`nix/modules/swarm.nix`), abra (`nix/modules/abra.nix` / `packages.nix`), the **traefik reconcile oneshot**
(`nix/modules/proxy.nix`), the **Drone server reconcile oneshot** (`nix/modules/drone.nix`), and the
**Drone exec runner** (`nix/modules/drone-runner.nix`).
```sh
# 1. Clone base + the private secrets submodule (bot/deploy creds for cc-ci-secrets).

View File

@ -65,7 +65,7 @@ All sops-encrypted in `secrets/secrets.yaml`, decrypted to `/run/secrets/<name>`
| `drone_rpc_secret` | Drone server ↔ exec runner RPC | `openssl rand -hex 32` |
| `drone_gitea_client_secret` | Drone↔Gitea OAuth app | from the Gitea OAuth app creation |
| `bridge_webhook_hmac` | comment-bridge webhook HMAC | `openssl rand -hex 32` |
| `bridge_drone_token` | bridge + dashboard → Drone API | hex token; **injected as the bot's Drone machine token** via `DRONE_USER_CREATE=…,token:$(cat /run/secrets/bridge_drone_token)` (modules/drone.nix) so it's reproducible on a fresh Drone DB (else the bridge gets 401 on a clean-room rebuild) |
| `bridge_drone_token` | bridge + dashboard → Drone API | hex token; **injected as the bot's Drone machine token** via `DRONE_USER_CREATE=…,token:$(cat /run/secrets/bridge_drone_token)` (nix/modules/drone.nix) so it's reproducible on a fresh Drone DB (else the bridge gets 401 on a clean-room rebuild) |
| `bridge_gitea_token` | bridge → Gitea API (poll/comment) | minted Gitea token (bot) |
| `restic_password` | backup-bot-two restic repo | **abra-generated** (`abra app secret generate`, kept stable across reconciles) |
@ -76,7 +76,7 @@ All sops-encrypted in `secrets/secrets.yaml`, decrypted to `/run/secrets/<name>`
`cc-ci-secrets`, then bump the base repo's submodule pointer (`git add secrets && commit`).
3. For swarm-secret-backed values, **bump the consuming app's secret version** so the reconcile
re-creates the swarm secret (docker swarm secrets are immutable): e.g. drone `RPC_SECRET_VERSION`
v1→v2 (modules/drone.nix), bridge `cc_ci_bridge_*_v<n>` (modules/bridge.nix). Update both ends
v1→v2 (nix/modules/drone.nix), bridge `cc_ci_bridge_*_v<n>` (nix/modules/bridge.nix). Update both ends
(server + runner share `drone_rpc_secret`).
4. `git commit` + push, sync to host, `nixos-rebuild switch` → reconcile re-inserts + redeploys.
5. Verify: the consuming service is healthy and re-auth works (e.g. a fresh build triggers).