Installing cc-ci from scratch
The full from-scratch rebuild is verified (Phase-1c / D8): a blank NixOS Incus VM, given the two repos + the single bootstrap age key, becomes a fully-converged cc-ci via one
nixos-rebuild switch.
cc-ci is declared entirely as a NixOS flake — base config in this repo (cc-ci) and all
secrets (incl. the wildcard TLS cert) sops-encrypted in a private companion repo cc-ci-secrets,
mounted as a git submodule at secrets/. Bringing up the box is: clone --recursive + provision
the one bootstrap age key + nixos-rebuild switch + the external DNS/gateway — no manual
post-steps. The proxy (traefik), Drone, comment-bridge, dashboard and backupbot are deployed by
idempotent-reconcile systemd oneshots that converge the swarm on every activation/boot (and
self-heal drift), mirroring swarm-init; they are serialized (proxy→drone→bridge→dashboard→
backupbot) so a single switch converges on a blank host.
Target: a NixOS 24.11 host reachable over SSH (root).
(Verified on a throwaway Incus VM: blank host + the two repos + the age key → one nixos-rebuild switch → fully converged cc-ci, 0 failed units — see machine-docs/DECISIONS.md Phase-1c / D8.)
Preconditions
The one out-of-band secret (provision before the first rebuild):
- The bootstrap age key at /var/lib/sops-nix/key.txt (mode 0600). It must be a sops recipient of cc-ci-secrets/secrets.yaml. Two cases:
  - Canonical cc-ci: its SSH host key is already a recipient (also works via age.sshKeyPaths); the keyFile holds the host-derived age identity (ssh-to-age -private-key -i /etc/ssh/ssh_host_ed25519_key).
  - A fresh/cloned host (different SSH host key, not a recipient): provision the off-box recovery age key (age1cmk26…'s private half) there; it decrypts every secret incl. the cert.
  Everything else (cert, Drone OAuth/RPC, webhook HMAC) is sops-encrypted in git; nothing else is provisioned out-of-band.
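The key precondition above can be checked before the first rebuild. The following is a minimal sketch, not part of cc-ci: check_age_key is a hypothetical helper name, and it assumes GNU stat and a key file in the standard age identity format (a line starting with AGE-SECRET-KEY-). It does not prove the key is a sops recipient of secrets.yaml; only a real decrypt does that.

```shell
#!/bin/sh
# Hypothetical preflight helper (not shipped with cc-ci).
# Checks that the bootstrap age key exists, has mode 0600, and looks like
# an age identity. Assumes GNU stat (as on NixOS).
check_age_key() {
    key_path=${1:-/var/lib/sops-nix/key.txt}

    if [ ! -f "$key_path" ]; then
        echo "missing: $key_path" >&2
        return 1
    fi

    mode=$(stat -c '%a' "$key_path")
    if [ "$mode" != "600" ]; then
        echo "bad mode $mode on $key_path (want 600)" >&2
        return 1
    fi

    # age identities are stored as lines beginning with AGE-SECRET-KEY-
    if ! grep -q '^AGE-SECRET-KEY-' "$key_path"; then
        echo "no age identity found in $key_path" >&2
        return 1
    fi

    echo "ok: $key_path"
}
```

Calling check_age_key with no argument checks the default path from the Preconditions; a non-zero exit means the rebuild would most likely fail at secret decryption.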
External infra (operator-owned, not on the box — class-A1):
- DNS: *.ci.commoninternet.net (+ bare) → the gateway, which TLS-passthroughs (SNI) to cc-ci.
- Firewall path: gateway reaches cc-ci on tcp/80+443 (opened by nix/modules/swarm.nix).
- The wildcard cert is renewed out-of-band by the operator, who then re-encrypts it into cc-ci-secrets (sops) and rebuilds; the Gandi DNS token never touches the box; never ACME here.
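Because the wildcard cert is renewed by hand rather than by ACME, an expiry check can help the operator notice when a renewal is due. This is only a sketch under assumptions: cert_valid_for is a hypothetical helper, the 14-day default threshold is arbitrary, and it relies on the openssl CLI being available wherever it runs.

```shell
#!/bin/sh
# Hypothetical helper (not part of cc-ci): succeed only if the certificate
# at $1 is still valid $2 days from now (default 14, an arbitrary choice).
cert_valid_for() {
    cert_path=$1
    days=${2:-14}
    # openssl x509 -checkend N exits 0 if the cert does NOT expire within
    # the next N seconds, and non-zero otherwise.
    openssl x509 -checkend "$((days * 86400))" -noout -in "$cert_path" >/dev/null
}
```

For example, on the box after a rebuild: cert_valid_for /var/lib/ci-certs/live/fullchain.pem 14 || echo "wildcard cert expires within 14 days". Only the first certificate in the file (the leaf) is inspected.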
1. Apply the NixOS flake (this is the whole install)
The flake (flake.nix, nix/hosts/cc-ci/, nix/modules/) declares: base host, sops-nix (decrypts via the
host SSH key), Docker + single-node Swarm + the proxy overlay + firewall 80/443
(nix/modules/swarm.nix), abra (nix/modules/abra.nix / packages.nix), the traefik reconcile oneshot
(nix/modules/proxy.nix), the Drone server reconcile oneshot (nix/modules/drone.nix), and the
Drone exec runner (nix/modules/drone-runner.nix).
# 1. Clone base + the private secrets submodule (bot/deploy creds for cc-ci-secrets).
# The submodule provides secrets/secrets.yaml (sops). Use a credential that can read
# recipe-maintainers/cc-ci-secrets, e.g. a per-command header (never persisted):
git clone --recursive https://git.autonomic.zone/recipe-maintainers/cc-ci.git /root/cc-ci
# (if cloned non-recursively: git -C /root/cc-ci submodule update --init)
# 2. Provision the bootstrap age key (see Preconditions) — the ONE out-of-band secret:
install -m700 -d /var/lib/sops-nix
install -m600 /path/to/bootstrap-age-key /var/lib/sops-nix/key.txt
# 3. One nixos-rebuild switch. NOTE: ?submodules=1 so the git flake includes secrets/.
nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'
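A non-recursive clone leaves secrets/ empty, and the rebuild then fails for lack of secrets/secrets.yaml. The sketch below checks the checkout before step 3; check_checkout is a hypothetical helper, and it assumes flake.nix sits at the repository root, as the flake reference above implies.

```shell
#!/bin/sh
# Hypothetical preflight helper (not part of cc-ci): confirm the clone
# contains the flake and that the secrets submodule is populated.
check_checkout() {
    repo=${1:-/root/cc-ci}

    if [ ! -f "$repo/flake.nix" ]; then
        echo "no flake.nix under $repo" >&2
        return 1
    fi

    if [ ! -f "$repo/secrets/secrets.yaml" ]; then
        echo "secrets/ submodule not populated; try: git -C $repo submodule update --init" >&2
        return 1
    fi

    echo "ok: $repo"
}
```

Running check_checkout /root/cc-ci before nixos-rebuild switch catches the missing-submodule case early.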
On activation sops-nix decrypts every secret (incl. the wildcard cert → /var/lib/ci-certs/live/),
then the serialized reconcile oneshots converge the swarm. Verify:
systemctl is-system-running # -> running (0 failed units)
docker service ls # traefik app+socket-proxy, drone, bridge, dashboard, backups — all 1/1
# cert is sops-decrypted FROM GIT to the path traefik serves:
sha256sum /var/lib/ci-certs/live/fullchain.pem # symlink -> /run/secrets/wildcard_cert
# TLS served from the git cert, verified locally on the host (SNI ci.commoninternet.net):
curl -s --resolve probe.ci.commoninternet.net:443:127.0.0.1 \
-o /dev/null -w 'ssl_verify=%{ssl_verify_result}\n' https://probe.ci.commoninternet.net/ # -> 0
# (the served leaf fingerprint == the cert in cc-ci-secrets)
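The "all 1/1" check on docker service ls can be scripted instead of read by eye. This is a minimal sketch: unconverged_services is a hypothetical helper that reads "NAME REPLICAS" lines on stdin and prints only the services whose running count differs from the desired count. The service names in the usage example are placeholders, not the real stack names.

```shell
#!/bin/sh
# Hypothetical helper (not part of cc-ci): read "NAME REPLICAS" lines and
# print every service whose running/desired replica counts differ.
unconverged_services() {
    # _rest absorbs trailing text such as "(max 1 per node)"
    while read -r name replicas _rest; do
        [ -n "$name" ] || continue
        running=${replicas%%/*}   # part before the slash
        desired=${replicas##*/}   # part after the slash
        if [ "$running" != "$desired" ]; then
            echo "$name $replicas"
        fi
    done
    return 0
}
```

On the host it would be fed from Docker's own formatter: docker service ls --format '{{.Name}} {{.Replicas}}' | unconverged_services. Empty output means every service has reached its desired replica count.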
Tip: when driving the switch over an SSH session that rides Tailscale, run it as a detached unit so it survives the tailscale restart during activation, and use the absolute flake ref:
systemd-run --no-block --unit=ccci-sw --property=Type=oneshot nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'
(On the canonical cc-ci the build source is synced from the admin's clone via tar | ssh and built as a path: flake, so no submodule fetch is needed there; the ?submodules=1 form is for a git clone.)
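Since the detached unit returns immediately, the outcome has to be checked after reconnecting. The sketch below polls the unit state; wait_for_unit and the SYSTEMCTL override variable are hypothetical names, not part of cc-ci. It assumes a oneshot unit reports "activating" while the rebuild runs and "failed" if it exits non-zero. A unit that never existed also reports "inactive", so treat that result as success only if the unit was seen running earlier.

```shell
#!/bin/sh
# Hypothetical helper (not part of cc-ci): poll a transient unit until it
# leaves the running states. SYSTEMCTL may point at a substitute command.
wait_for_unit() {
    unit=${1:-ccci-sw}
    interval=${2:-5}
    while :; do
        # is-active exits non-zero for inactive/failed units, hence || true
        state=$(${SYSTEMCTL:-systemctl} is-active "$unit" 2>/dev/null) || true
        case $state in
            activating|active)
                sleep "$interval"
                ;;
            failed)
                echo "$unit failed" >&2
                return 1
                ;;
            *)
                echo "$unit finished ($state)"
                return 0
                ;;
        esac
    done
}
```

After reconnecting, wait_for_unit ccci-sw blocks until the switch is done; journalctl -u ccci-sw then shows the rebuild output.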
2. One-time: link Drone ↔ Gitea (OAuth grant)
The only manual post-rebuild step. Drone needs the bot's Gitea OAuth token (granted by an
interactive login) before it can sync/clone repos; this can't be Nix-declared without putting the
bot password on the box. The token then persists in Drone's data volume.
GITEA_USERNAME=autonomic-bot GITEA_PASSWORD=… bash scripts/bootstrap-drone-oauth.sh
# -> "drone login ok (admin=true)" / "repo recipe-maintainers/cc-ci active=true"
Verify a build runs green: push any commit to the cc-ci repo and watch
https://drone.ci.commoninternet.net (or the API) — the push webhook (set on activation) triggers
the .drone.yml self-test on the exec runner.
3. (later milestones) comment-bridge, dashboard, recipe enrollment
See docs/enroll-recipe.md (D5), docs/secrets.md (D6), docs/runbook.md. Each new piece of infra
is added as another idempotent reconcile oneshot, so this install stays a single nixos-rebuild.