M2: Drone server + exec runner up; infra as idempotent-reconcile oneshots

Convert proxy+drone bring-up to writeShellApplication systemd oneshots that reconcile every activation (orchestrator steer). pkgs.abra overlay. Runner connected via RPC (polling, capacity=2). install.md = clone + nixos-rebuild switch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -2,8 +2,12 @@

 > WORK IN PROGRESS — grows with each milestone; the full from-scratch rebuild is verified at M9 (D8).

-cc-ci is declared as a NixOS flake (this repo) plus a reproducible proxy-deploy step. Target:
-a NixOS 24.11 host reachable as `cc-ci` over SSH (root), with the operator preconditions in place.
+cc-ci is declared **entirely** as a NixOS flake (this repo). Bringing up the box is just
+**clone + `nixos-rebuild switch`** + the operator preconditions — no manual post-steps. The proxy
+(traefik) and Drone server are deployed by **idempotent-reconcile systemd oneshots** (`modules/
+proxy.nix`, `modules/drone.nix`) that converge the swarm to the desired state on every activation
+and boot (and self-heal drift), mirroring `swarm-init`. Target: a NixOS 24.11 host reachable as
+`cc-ci` over SSH (root).

 ## Operator preconditions (class-A1, see DECISIONS.md / docs/baseline.md)

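Note on the intro changed in the hunk above: it describes oneshots that converge the swarm to the desired state on every activation, so rerunning them must be harmless. The following is a minimal sketch of that compare-then-apply pattern, not the contents of `modules/proxy.nix` or `modules/drone.nix`, which this diff does not show. `STATE_FILE`, `current_state`, `apply_state`, and `reconcile` are hypothetical names, and a temp file stands in for the swarm.

```sh
#!/bin/sh
set -eu

# ASSUMPTION: a plain file stands in for "the deployed state of the swarm".
STATE_FILE="${STATE_FILE:-$(mktemp)}"

# Hypothetical helpers; a real unit would query and change the swarm instead.
current_state() { cat "$STATE_FILE"; }
apply_state() { printf '%s\n' "$1" > "$STATE_FILE"; }

# Compare desired against current and act only when they differ,
# so running this on every activation is safe.
reconcile() {
  desired="$1"
  if [ "$(current_state)" = "$desired" ]; then
    echo "in sync: nothing to do"
  else
    apply_state "$desired"
    echo "converged to: $desired"
  fi
}

reconcile "traefik=wildcard"   # first run changes state
reconcile "traefik=wildcard"   # second run is a no-op
```

The same function also covers drift: if the state is changed out of band, the next run detects the mismatch and reapplies the desired value.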
@@ -12,43 +16,40 @@ a NixOS 24.11 host reachable as `cc-ci` over SSH (root), with the operator preco
 - DNS: `*.ci.commoninternet.net` (+ bare) → the **gateway**, which TLS-passthroughs (SNI) to cc-ci.
 - Firewall path: gateway reaches cc-ci on tcp/80+443 (opened by `modules/swarm.nix`).

-## 1. Apply the NixOS flake
+## 1. Apply the NixOS flake (this is the whole install)

 The flake (`flake.nix`, `hosts/cc-ci/`, `modules/`) declares: base host, sops-nix (decrypts via the
-host SSH key), Docker + single-node Swarm + the `proxy` overlay (`modules/swarm.nix`), and abra
-(`modules/abra.nix`).
+host SSH key), Docker + single-node Swarm + the `proxy` overlay + firewall 80/443
+(`modules/swarm.nix`), abra (`modules/abra.nix` / `packages.nix`), the **traefik reconcile oneshot**
+(`modules/proxy.nix`), the **Drone server reconcile oneshot** (`modules/drone.nix`), and the
+**Drone exec runner** (`modules/drone-runner.nix`).

 ```sh
 # materialise the repo on the host (the build runs on cc-ci itself — see DECISIONS.md deploy mech)
 # e.g. git clone <repo> /root/cc-ci (or sync it)
 nixos-rebuild switch --flake /root/cc-ci#cc-ci
-# verify
-systemctl is-system-running # -> running
-docker info --format '{{.Swarm.LocalNodeState}}' # -> active
-docker network ls | grep proxy # -> proxy ... overlay swarm
 ```

+On activation, the reconcile oneshots (`deploy-proxy`, `deploy-drone`) run automatically and converge
+the swarm. Verify:
+
+```sh
+systemctl is-system-running # -> running
+docker info --format '{{.Swarm.LocalNodeState}}' # -> active
+docker service ls # traefik (app+socket-proxy) + drone, all 1/1
+systemctl is-active deploy-proxy deploy-drone drone-runner-exec # -> active x3
+# wildcard cert served end-to-end via the gateway:
+curl -ksv --resolve probe.ci.commoninternet.net:443:<gateway-ip> https://probe.ci.commoninternet.net/ \
+2>&1 | grep -E 'subject:|HTTP/' # -> CN=*.ci.commoninternet.net, HTTP 404 (no app router yet)
+curl -ks --resolve drone.ci.commoninternet.net:443:<gateway-ip> \
+-o /dev/null -w '%{http_code}\n' https://drone.ci.commoninternet.net/healthz # -> 200
+```
+
 > Tip: when driving the switch over an SSH session that rides Tailscale, run it as a detached unit so
 > it survives a momentary drop, and **use the absolute flake path** (systemd units run with cwd `/`):
 > `systemd-run --unit=ccci-sw --property=Type=oneshot nixos-rebuild switch --flake /root/cc-ci#cc-ci`

-## 2. Deploy the reverse proxy (coop-cloud traefik, wildcard/file-provider, no ACME)
+## 2. (later milestones) comment-bridge, dashboard, recipe enrollment

-```sh
-bash /root/cc-ci/scripts/deploy-proxy.sh
-```
-
-This idempotently deploys the canonical Co-op Cloud `traefik` recipe via abra in wildcard mode,
-serving the pre-issued cert as the `ssl_cert`/`ssl_key` swarm secrets, with `LETS_ENCRYPT_ENV` empty
-so no ACME ever runs (see DECISIONS.md "Proxy: real coop-cloud/traefik via abra"). Verify:
-
-```sh
-docker service ls | grep traefik # app + socket-proxy, 1/1
-# wildcard cert served end-to-end via the gateway:
-curl -ksv --resolve probe.ci.commoninternet.net:443:<gateway-ip> https://probe.ci.commoninternet.net/ \
-2>&1 | grep -E 'subject:|HTTP/' # -> CN=*.ci.commoninternet.net, HTTP 404 (no app router yet)
-```
-
-## 3. (later milestones) Drone, comment-bridge, dashboard, recipe enrollment
-
-See `docs/enroll-recipe.md` (D5), `docs/secrets.md` (D6), `docs/runbook.md`. Added as those land.
+See `docs/enroll-recipe.md` (D5), `docs/secrets.md` (D6), `docs/runbook.md`. Each new piece of infra
+is added as another idempotent reconcile oneshot, so this install stays a single `nixos-rebuild`.
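Note on the verification block added in this hunk: it expects every swarm service to report 1/1. Below is a hedged sketch of turning that visual check into a pass/fail helper. `all_converged` is a hypothetical name, the service names in the usage comment are made up, and the sketch assumes `docker service ls --format '{{.Name}} {{.Replicas}}'` prints one `NAME RUNNING/DESIRED` pair per line.

```sh
#!/bin/sh
set -eu

# Reads "NAME REPLICAS" lines on stdin (for example "drone_app 1/1") and
# succeeds only if every service reports running == desired replicas.
all_converged() {
  rc=0
  while read -r name replicas rest; do
    [ -n "$name" ] || continue            # skip blank lines
    case "$replicas" in
      */*) ;;                             # looks like RUNNING/DESIRED
      *) echo "unparsable: $name ($replicas)" >&2; rc=1; continue ;;
    esac
    if [ "${replicas%/*}" != "${replicas#*/}" ]; then
      echo "not converged: $name ($replicas)" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Assumed usage on the host (not run here):
#   docker service ls --format '{{.Name}} {{.Replicas}}' | all_converged
```

Reading from stdin keeps the helper independent of Docker, so it can be exercised with canned input before being pointed at a live swarm.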