cc-ci/docs/install.md
autonomic-bot 992d87cfcd refactor(1b): RL6 — move Builder protocol files into machine-docs/ (README stays root)
git mv STATUS*/BACKLOG*/JOURNAL*/DECISIONS.md -> machine-docs/. README.md kept at root (operator
decision). Updated in-repo refs: README (status line + lint section + Loop-state section) and
docs/install.md -> machine-docs/...

Safe to move now: launch.sh already has resolve_state() (prefers machine-docs/ else root) used by
every STATUS/REVIEW read, and the running watchdog (pid 133191) was restarted AFTER that update, so
it is location-agnostic. scripts/lint.sh -> lint: PASS post-move. Adversary moves its own REVIEW*.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 22:35:30 +01:00

# Installing cc-ci from scratch
> The full from-scratch rebuild is **verified** (Phase-1c / D8): a blank NixOS Incus VM, given the two
> repos + the single bootstrap age key, becomes a fully-converged cc-ci via one `nixos-rebuild switch`.

cc-ci is declared **entirely** as a NixOS flake: the base config lives in this repo (`cc-ci`), and
**all secrets (incl. the wildcard TLS cert) are sops-encrypted in a private companion repo,
`cc-ci-secrets`, mounted as a git submodule at `secrets/`**. Bringing up the box is: **clone
`--recursive` + provision the one bootstrap age key + `nixos-rebuild switch`**, plus the external
DNS/gateway. The only manual post-step is the one-time Drone OAuth grant (section 2). The proxy
(traefik), Drone, comment-bridge, dashboard and backupbot are deployed by **idempotent-reconcile
systemd oneshots** that converge the swarm on every activation/boot (and self-heal drift), mirroring
`swarm-init`; they are **serialized** (proxy→drone→bridge→dashboard→backupbot) so a single switch
converges on a blank host.

Target: a NixOS 24.11 host reachable over SSH (root).

*(Verified on a throwaway Incus VM: blank host + the two repos + the age key → one `nixos-rebuild
switch` → fully converged cc-ci, 0 failed units; see `machine-docs/DECISIONS.md`, Phase-1c / D8.)*
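
The idempotent-reconcile pattern described above can be sketched in plain shell. This is an illustration only, not the repo's actual units: `reconcile`, `reconcile_chain`, and the check/apply commands passed to them are hypothetical names chosen for this example. Each step applies a change only when its check fails, and the chain stops at the first step that does not converge, which is the serialization property.

```sh
# Sketch of an idempotent reconcile step (hypothetical helper, not from the repo).
# reconcile NAME CHECK_CMD APPLY_CMD
#   - runs CHECK_CMD; if it succeeds, the state is already converged and nothing is changed
#   - otherwise runs APPLY_CMD, then re-runs CHECK_CMD to confirm convergence
reconcile() {
    name=$1 check=$2 apply=$3
    if sh -c "$check" >/dev/null 2>&1; then
        echo "$name: converged"
        return 0
    fi
    echo "$name: drift detected, applying"
    sh -c "$apply" || return 1
    sh -c "$check" >/dev/null 2>&1
}

# Serialized chain: each step runs only after the previous one has converged.
# Arguments come in triples: NAME CHECK_CMD APPLY_CMD ...
reconcile_chain() {
    while [ "$#" -ge 3 ]; do
        reconcile "$1" "$2" "$3" || { echo "$1: failed, stopping chain" >&2; return 1; }
        shift 3
    done
}
```

Running the same chain twice changes nothing the second time, which is what makes it safe to run on every activation and boot.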
## Preconditions
**The one out-of-band secret (provision before the first rebuild):**

- The **bootstrap age key** at `/var/lib/sops-nix/key.txt` (mode 0600). It must be a sops recipient
  of `cc-ci-secrets/secrets.yaml`. Two cases:
  - **Canonical cc-ci:** its SSH host key is already a recipient, so decryption also works via
    `age.sshKeyPaths`; the key file holds the host-derived age identity
    (`ssh-to-age -private-key -i /etc/ssh/ssh_host_ed25519_key`).
  - **A fresh/cloned host** (different SSH host key, not a recipient): provision the **off-box
    recovery age key** (`age1cmk26…`'s private half) there; it decrypts every secret incl. the cert.

Everything else (cert, Drone OAuth/RPC, webhook HMAC) is sops-encrypted **in git**; nothing else
is provisioned out-of-band.
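
Before the first rebuild it can be worth sanity-checking the key file. The sketch below is a hypothetical helper (`check_age_key` is not part of the repo); it assumes GNU `stat` and relies only on the fact that age identity lines begin with `AGE-SECRET-KEY-`. It does not prove the key is a recipient of `secrets.yaml`; for that, `age-keygen -y <keyfile>` prints the public recipient, which can be compared against the sops recipients by hand.

```sh
# Sketch: sanity-check a bootstrap age key file (hypothetical helper).
# check_age_key [KEY_FILE]   (defaults to the path named in Preconditions)
check_age_key() {
    key=${1:-/var/lib/sops-nix/key.txt}
    [ -f "$key" ] || { echo "missing: $key" >&2; return 1; }
    # stat -c is GNU coreutils syntax.
    mode=$(stat -c '%a' "$key")
    [ "$mode" = "600" ] || { echo "mode is $mode, expected 600" >&2; return 1; }
    # An age identity file contains at least one line starting with AGE-SECRET-KEY-.
    grep -q '^AGE-SECRET-KEY-' "$key" || { echo "no age identity in $key" >&2; return 1; }
    echo "ok: $key"
}
```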

**External infra (operator-owned, not on the box; class-A1):**

- DNS: `*.ci.commoninternet.net` (plus the bare `ci.commoninternet.net`) → the **gateway**, which
  passes TLS through to cc-ci, routing on SNI.
- Firewall path: the gateway reaches cc-ci on tcp/80+443 (opened by `nix/modules/swarm.nix`).
- The wildcard cert is **renewed out-of-band** by the operator, who then re-encrypts it into
  `cc-ci-secrets` (sops) and rebuilds. The Gandi DNS token never touches the box; **never run ACME here.**
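
Because the wildcard cert is renewed out-of-band, nothing on the box will renew it automatically, so a periodic expiry check may be useful. The following is a minimal sketch, assuming `openssl` is installed on the host; `cert_expires_within` is a hypothetical helper name, and the path in the usage comment is the decrypted cert location this document gives later.

```sh
# Sketch: detect a certificate that is close to expiry (hypothetical helper).
# cert_expires_within CERT_FILE DAYS
#   exit 0 -> the cert expires (or has already expired) within DAYS days
#   exit 1 -> the cert is still valid DAYS days from now
#   exit 2 -> CERT_FILE does not exist
cert_expires_within() {
    cert=$1 days=$2
    [ -f "$cert" ] || { echo "no such cert: $cert" >&2; return 2; }
    # `openssl x509 -checkend N` exits 0 if the cert is still valid N seconds from now.
    # An unparseable file also makes openssl fail, which is reported as "expiring".
    if openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null; then
        return 1
    fi
    return 0
}

# Usage (path as given in this document):
#   cert_expires_within /var/lib/ci-certs/live/fullchain.pem 30 && echo "renew soon"
```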
## 1. Apply the NixOS flake (this is the whole install)
The flake (`flake.nix`, `nix/hosts/cc-ci/`, `nix/modules/`) declares: the base host, sops-nix
(decrypts with the bootstrap age key; on the canonical host that identity is derived from the SSH
host key), Docker + single-node Swarm + the `proxy` overlay + firewall 80/443
(`nix/modules/swarm.nix`), abra (`nix/modules/abra.nix` / `packages.nix`), the **traefik reconcile
oneshot** (`nix/modules/proxy.nix`), the **Drone server reconcile oneshot**
(`nix/modules/drone.nix`), and the **Drone exec runner** (`nix/modules/drone-runner.nix`).
```sh
# 1. Clone the base repo plus the private secrets submodule, which provides
#    secrets/secrets.yaml (sops). The clone needs a bot/deploy credential that can read
#    recipe-maintainers/cc-ci-secrets, e.g. a per-command header (never persisted);
#    the credential itself is not shown here.
git clone --recursive https://git.autonomic.zone/recipe-maintainers/cc-ci.git /root/cc-ci
# (if cloned non-recursively: git -C /root/cc-ci submodule update --init)

# 2. Provision the bootstrap age key (see Preconditions), the ONE out-of-band secret:
install -m700 -d /var/lib/sops-nix
install -m600 /path/to/bootstrap-age-key /var/lib/sops-nix/key.txt

# 3. One nixos-rebuild switch. NOTE: ?submodules=1 so the git flake includes secrets/.
nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'
```
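The three steps above fail late if the submodule was not fetched or the key is missing, so a quick preflight can save a rebuild cycle. A minimal sketch, assuming the paths used in this document; `preflight` is a hypothetical helper, not a script shipped in the repo.

```sh
# Sketch: preflight checks before `nixos-rebuild switch` (hypothetical helper).
# preflight [REPO_DIR] [KEY_FILE]
preflight() {
    repo=${1:-/root/cc-ci}
    key=${2:-/var/lib/sops-nix/key.txt}
    rc=0
    # flake.nix must exist at the root of the clone.
    [ -f "$repo/flake.nix" ] || { echo "no flake.nix in $repo" >&2; rc=1; }
    # An uninitialised submodule leaves secrets/ empty, so secrets.yaml is missing.
    [ -f "$repo/secrets/secrets.yaml" ] || {
        echo "secrets submodule not populated; try: git -C $repo submodule update --init" >&2
        rc=1
    }
    # The bootstrap age key must be in place before activation.
    [ -f "$key" ] || { echo "bootstrap age key missing: $key" >&2; rc=1; }
    return "$rc"
}
```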
On activation sops-nix decrypts every secret (incl. the wildcard cert → `/var/lib/ci-certs/live/`),
then the serialized reconcile oneshots converge the swarm. Verify:
```sh
systemctl is-system-running   # -> running (0 failed units)
docker service ls             # traefik app+socket-proxy, drone, bridge, dashboard, backups: all 1/1
# The cert is sops-decrypted FROM GIT to the path traefik serves:
sha256sum /var/lib/ci-certs/live/fullchain.pem   # symlink -> /run/secrets/wildcard_cert
# TLS is served from the git cert; verify locally on the host (SNI probe.ci.commoninternet.net):
curl -s --resolve probe.ci.commoninternet.net:443:127.0.0.1 \
     -o /dev/null -w 'ssl_verify=%{ssl_verify_result}\n' https://probe.ci.commoninternet.net/   # -> ssl_verify=0
# (the served leaf fingerprint == the cert in cc-ci-secrets)
```
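Checking "all 1/1" by eye gets tedious when scripting the verification. The sketch below filters the replica column instead; `unconverged` is a hypothetical helper, and it assumes input lines of the form `NAME CURRENT/DESIRED`, which `docker service ls --format '{{.Name}} {{.Replicas}}'` produces.

```sh
# Sketch: list swarm services whose replicas have not converged (hypothetical helper).
# Reads "NAME CURRENT/DESIRED" lines on stdin, for example:
#   docker service ls --format '{{.Name}} {{.Replicas}}' | unconverged
# Prints each lagging service; exits non-zero if any service lags or the input is empty.
unconverged() {
    awk '{
        split($2, r, "/")
        if (r[1] != r[2]) { print $1 " " $2; bad = 1 }
    } END { exit ((bad || NR == 0) ? 1 : 0) }'
}
```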
> Tip: when driving the switch over an SSH session that rides Tailscale, run it as a detached unit so
> it survives the tailscale restart during activation, and use the absolute flake ref:
> `systemd-run --no-block --unit=ccci-sw --property=Type=oneshot nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'`
> *(On the canonical cc-ci the build source is synced from the admin's clone via `tar | ssh` and built
> as a `path:` flake — no submodule fetch needed there; the `?submodules=1` form is for a git clone.)*
## 2. One-time: link Drone ↔ Gitea (OAuth grant)
The only manual post-rebuild step. Drone needs the bot's Gitea OAuth token (granted by an
interactive login) before it can sync/clone repos; this can't be Nix-declared without putting the
bot password on the box. The token then persists in Drone's `data` volume.
```sh
GITEA_USERNAME=autonomic-bot GITEA_PASSWORD=… bash scripts/bootstrap-drone-oauth.sh
# -> "drone login ok (admin=true)" / "repo recipe-maintainers/cc-ci active=true"
```
Verify a build runs green: push any commit to the cc-ci repo and watch
`https://drone.ci.commoninternet.net` (or the API) — the push webhook (set on activation) triggers
the `.drone.yml` self-test on the exec runner.
## 3. (later milestones) comment-bridge, dashboard, recipe enrollment
See `docs/enroll-recipe.md` (D5), `docs/secrets.md` (D6), `docs/runbook.md`. Each new piece of infra
is added as another idempotent reconcile oneshot, so this install stays a single `nixos-rebuild`.