cc-ci/docs/install.md
autonomic-bot 70f108d2fa
1c/W4 DONE: genuine throwaway-VM live rebuild (single switch, 0 failed, byte-identical, TLS leaf==git cert); Gate W4 CLAIMED + install.md updated
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 18:37:02 +01:00

Installing cc-ci from scratch

The full from-scratch rebuild is verified (Phase-1c / D8): a blank NixOS Incus VM, given the two repos + the single bootstrap age key, becomes a fully-converged cc-ci via one nixos-rebuild switch.

cc-ci is declared entirely as a NixOS flake: the base config lives in this repo (cc-ci), and all secrets (incl. the wildcard TLS cert) are sops-encrypted in a private companion repo, cc-ci-secrets, mounted as a git submodule at secrets/. Bringing up the box is: clone --recursive + provision the one bootstrap age key + nixos-rebuild switch + the external DNS/gateway, with no manual post-steps.

The proxy (traefik), Drone, comment-bridge, dashboard and backupbot are deployed by idempotent-reconcile systemd oneshots that converge the swarm on every activation/boot (and self-heal drift), mirroring swarm-init. They are serialized (proxy→drone→bridge→dashboard→backupbot) so that a single switch converges on a blank host.

Target: a NixOS 24.11 host reachable over SSH (root). (Verified on a throwaway Incus VM: blank host + the two repos + the age key → one nixos-rebuild switch → fully converged cc-ci, 0 failed units; see DECISIONS.md Phase-1c / D8.)
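
As an illustration of the idempotent-reconcile idea described above, here is a minimal shell sketch. It is not the code in modules/*.nix: the function name reconcile and the check/deploy command strings are hypothetical, and the real units are ordered by systemd rather than by a shell chain. The sketch assumes each step can be probed by a cheap check command that succeeds when the step is already converged.

```bash
# Hypothetical sketch of an idempotent reconcile step (not the project's actual unit).
# reconcile NAME CHECK_CMD DEPLOY_CMD
#   CHECK_CMD  succeeds (exit 0) when the desired state is already present.
#   DEPLOY_CMD is run only when the check reports drift.
reconcile() {
  local name=$1 check=$2 deploy=$3
  if sh -c "$check" >/dev/null 2>&1; then
    echo "$name: converged"
  else
    echo "$name: reconciling"
    # A failed deploy returns non-zero so a later step in a chain does not start.
    sh -c "$deploy" || return 1
  fi
}
```

Chaining calls with && gives the serialized behaviour: reconcile proxy "$check_proxy" "$deploy_proxy" && reconcile drone "$check_drone" "$deploy_drone", where the four variables are placeholders for whatever probe and deploy commands a step needs. Running the chain twice is safe because a converged step is a no-op.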

Preconditions

The one out-of-band secret (provision before the first rebuild):

  • The bootstrap age key at /var/lib/sops-nix/key.txt (mode 0600). It must be a sops recipient of cc-ci-secrets/secrets.yaml. Two cases:
    • Canonical cc-ci: its SSH host key is already a recipient — also works via age.sshKeyPaths; the keyFile holds the host-derived age identity (ssh-to-age -private-key -i /etc/ssh/ssh_host_ed25519_key).
    • A fresh/cloned host (different SSH host key, not a recipient): provision the off-box recovery age key (age1cmk26…'s private half) there — it decrypts every secret incl. the cert. Everything else (cert, Drone OAuth/RPC, webhook HMAC) is sops-encrypted in git — nothing else is provisioned out-of-band.
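
Before the first rebuild it can help to confirm the key file is in place with the expected permissions. The following is a small sketch, assuming GNU stat (coreutils) is available; the function name check_bootstrap_key is made up for this example, and it only checks presence and mode, not whether the key is actually a sops recipient.

```bash
# Hypothetical helper: succeed only if the key file exists, is non-empty, and has mode 600.
# check_bootstrap_key [PATH]   (PATH defaults to the location given in Preconditions)
check_bootstrap_key() {
  local key=${1:-/var/lib/sops-nix/key.txt} mode
  [ -s "$key" ] || { echo "missing or empty: $key" >&2; return 1; }
  mode=$(stat -c '%a' "$key") || return 1      # GNU stat: octal permission bits
  [ "$mode" = "600" ] || { echo "bad mode $mode on $key (want 600)" >&2; return 1; }
  echo "ok: $key"
}
```

To compare the key against the recipients configured for secrets.yaml, age-keygen -y /var/lib/sops-nix/key.txt prints the public recipient that corresponds to the identity file.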

External infra (operator-owned, not on the box — class-A1):

  • DNS: *.ci.commoninternet.net (+ bare) → the gateway, which TLS-passthroughs (SNI) to cc-ci.
  • Firewall path: gateway reaches cc-ci on tcp/80+443 (opened by modules/swarm.nix).
  • The wildcard cert is renewed out-of-band by the operator, who then re-encrypts it into cc-ci-secrets (sops) and rebuilds — the Gandi DNS token never touches the box; never ACME here.

1. Apply the NixOS flake (this is the whole install)

The flake (flake.nix, hosts/cc-ci/, modules/) declares: base host, sops-nix (decrypts via the host SSH key), Docker + single-node Swarm + the proxy overlay + firewall 80/443 (modules/swarm.nix), abra (modules/abra.nix / packages.nix), the traefik reconcile oneshot (modules/proxy.nix), the Drone server reconcile oneshot (modules/drone.nix), and the Drone exec runner (modules/drone-runner.nix).

# 1. Clone base + the private secrets submodule (bot/deploy creds for cc-ci-secrets).
#    The submodule provides secrets/secrets.yaml (sops). Use a credential that can read
#    recipe-maintainers/cc-ci-secrets, e.g. a per-command header (never persisted):
git clone --recursive https://git.autonomic.zone/recipe-maintainers/cc-ci.git /root/cc-ci
#    (if cloned non-recursively: git -C /root/cc-ci submodule update --init)

# 2. Provision the bootstrap age key (see Preconditions) — the ONE out-of-band secret:
install -m700 -d /var/lib/sops-nix
install -m600 /path/to/bootstrap-age-key /var/lib/sops-nix/key.txt

# 3. One nixos-rebuild switch. NOTE: ?submodules=1 so the git flake includes secrets/.
nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'

On activation sops-nix decrypts every secret (incl. the wildcard cert → /var/lib/ci-certs/live/), then the serialized reconcile oneshots converge the swarm. Verify:

systemctl is-system-running                          # -> running (0 failed units)
docker service ls                                    # traefik app+socket-proxy, drone, bridge, dashboard, backups — all 1/1
# cert is sops-decrypted FROM GIT to the path traefik serves:
sha256sum /var/lib/ci-certs/live/fullchain.pem       # symlink -> /run/secrets/wildcard_cert
# TLS served from the git cert, verified locally on the host (SNI ci.commoninternet.net):
curl -s --resolve probe.ci.commoninternet.net:443:127.0.0.1 \
  -o /dev/null -w 'ssl_verify=%{ssl_verify_result}\n' https://probe.ci.commoninternet.net/   # -> 0
# (the served leaf fingerprint == the cert in cc-ci-secrets)
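
The "all 1/1" check on docker service ls can be scripted rather than read by eye. This is a minimal sketch: the function name unconverged_services is made up for this example, and it assumes every service is expected to run exactly one replica, as in the listing above.

```bash
# Hypothetical helper: read "NAME REPLICAS" lines on stdin and print the names of
# services whose replica count is not 1/1. Exits non-zero if any were found.
unconverged_services() {
  awk 'NF && $2 != "1/1" { print $1; bad = 1 } END { exit bad }'
}
```

On the host it would be fed from docker service ls --format '{{.Name}} {{.Replicas}}' | unconverged_services. Empty output with exit status 0 means every service is at 1/1.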

Tip: when driving the switch over an SSH session that rides Tailscale, run it as a detached unit so it survives the tailscale restart during activation, and use the absolute flake ref:

systemd-run --no-block --unit=ccci-sw --property=Type=oneshot nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'

(On the canonical cc-ci the build source is synced from the admin's clone via tar | ssh and built as a path: flake, so no submodule fetch is needed there; the ?submodules=1 form is for a git clone.)

The only manual post-rebuild step: Drone needs the bot's Gitea OAuth token (granted by an interactive login) before it can sync/clone repos. This can't be Nix-declared without putting the bot password on the box. Once granted, the token persists in Drone's data volume.

GITEA_USERNAME=autonomic-bot GITEA_PASSWORD=… bash scripts/bootstrap-drone-oauth.sh
# -> "drone login ok (admin=true)" / "repo recipe-maintainers/cc-ci active=true"

Verify a build runs green: push any commit to the cc-ci repo and watch https://drone.ci.commoninternet.net (or the API) — the push webhook (set on activation) triggers the .drone.yml self-test on the exec runner.

3. (later milestones) comment-bridge, dashboard, recipe enrollment

See docs/enroll-recipe.md (D5), docs/secrets.md (D6), docs/runbook.md. Each new piece of infra is added as another idempotent reconcile oneshot, so this install stays a single nixos-rebuild.