# Installing cc-ci from scratch

> The full from-scratch rebuild is **verified** (Phase-1c / D8): a blank NixOS Incus VM, given the two
> repos + the single bootstrap age key, becomes a fully-converged cc-ci via one `nixos-rebuild switch`.

cc-ci is declared **entirely** as a NixOS flake: the base config lives in this repo (`cc-ci`), and **all
secrets (incl. the wildcard TLS cert) are sops-encrypted in a private companion repo `cc-ci-secrets`,
mounted as a git submodule at `secrets/`**. Bringing up the box is: **clone `--recursive` + provision
the one bootstrap age key + `nixos-rebuild switch`**, plus the external DNS/gateway. The only manual
post-step is the one-time Drone ↔ Gitea OAuth grant (section 2). The proxy (traefik), Drone,
comment-bridge, dashboard and backupbot are deployed by **idempotent-reconcile systemd oneshots**
that converge the swarm on every activation/boot (and self-heal drift), mirroring `swarm-init`; they
are **serialized** (proxy→drone→bridge→dashboard→backupbot) so a single switch converges on a blank
host.

Target: a NixOS 24.11 host reachable over SSH (root).

*(Verified on a throwaway Incus VM: blank host + the two repos + the age key → one `nixos-rebuild
switch` → fully converged cc-ci, 0 failed units; see machine-docs/DECISIONS.md Phase-1c / D8.)*
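
The "idempotent, serialized reconcile" idea described above can be illustrated with a small shell sketch. This is not the project's implementation (the real reconcilers are systemd oneshots declared under `nix/modules/`); `STATE_DIR`, `reconcile` and `converge` are hypothetical names, and a marker file stands in for "the service is deployed in the swarm".

```sh
#!/bin/sh
# Sketch only: marker files stand in for deployed swarm services.
# STATE_DIR, reconcile and converge are hypothetical names.
STATE_DIR="${STATE_DIR:-$(mktemp -d)}"

# Idempotent step: act only when the desired state is missing.
reconcile() {
    name="$1"
    if [ -e "$STATE_DIR/$name" ]; then
        echo "$name: already converged"
    else
        touch "$STATE_DIR/$name" || return 1
        echo "$name: deployed"
    fi
}

# Serialized chain: a step runs only after the previous one succeeded.
converge() {
    for name in proxy drone bridge dashboard backupbot; do
        reconcile "$name" || { echo "$name: failed, stopping" >&2; return 1; }
    done
}
```

Calling `converge` on an empty state deploys the five steps in order; calling it again changes nothing and reports each step as already converged, which is the property that lets the same units run safely on every activation and boot.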

## Preconditions

**The one out-of-band secret (provision before the first rebuild):**
- The **bootstrap age key** at `/var/lib/sops-nix/key.txt` (mode 0600). It must be a sops recipient
  of `cc-ci-secrets/secrets.yaml`. Two cases:
  - **Canonical cc-ci:** its SSH host key is already a recipient (this also works via
    `age.sshKeyPaths`); the keyFile holds the host-derived age identity
    (`ssh-to-age -private-key -i /etc/ssh/ssh_host_ed25519_key`).
  - **A fresh/cloned host** (different SSH host key, not a recipient): provision the **off-box
    recovery age key** (`age1cmk26…`'s private half) there; it decrypts every secret incl. the cert.

  Everything else (cert, Drone OAuth/RPC, webhook HMAC) is sops-encrypted **in git**; nothing else
  is provisioned out-of-band.

**External infra (operator-owned, not on the box; class-A1):**
- DNS: `*.ci.commoninternet.net` (+ bare) → the **gateway**, which TLS-passthroughs (SNI) to cc-ci.
- Firewall path: gateway reaches cc-ci on tcp/80+443 (opened by `nix/modules/swarm.nix`).
- The wildcard cert is **renewed out-of-band** by the operator, who then re-encrypts it into
  `cc-ci-secrets` (sops) and rebuilds. The Gandi DNS token never touches the box; **never ACME here.**
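
Before the first rebuild it can help to confirm that the bootstrap age key is where sops-nix expects it and has the mode stated above. The helper below is a minimal sketch; `check_bootstrap_key` is a hypothetical name, and it only checks presence and permissions, not that the key is actually a recipient of `secrets.yaml`.

```sh
# check_bootstrap_key [PATH]: succeed only if the key file exists with mode 0600.
# Hypothetical helper; the default path is the one given in Preconditions.
check_bootstrap_key() {
    key="${1:-/var/lib/sops-nix/key.txt}"
    if [ ! -f "$key" ]; then
        echo "missing: $key"
        return 1
    fi
    # The first ten characters of `ls -ld` are the file type and permission bits.
    perms=$(ls -ld "$key" | cut -c1-10)
    if [ "$perms" != "-rw-------" ]; then
        echo "bad mode on $key: $perms (want -rw-------)"
        return 1
    fi
    echo "ok: $key"
}
```

Run as `check_bootstrap_key` on the target host, or pass another path to check a staged copy. Comparing the output of `age-keygen -y` on the key file with the recipients recorded in `secrets.yaml` would be one way to cover the recipient requirement as well.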

## 1. Apply the NixOS flake (this is the whole install)

The flake (`flake.nix`, `nix/hosts/cc-ci/`, `nix/modules/`) declares: base host, sops-nix (decrypts
with the bootstrap age key from Preconditions; on the canonical host that identity is derived from
the host SSH key), Docker + single-node Swarm + the `proxy` overlay + firewall 80/443
(`nix/modules/swarm.nix`), abra (`nix/modules/abra.nix` / `packages.nix`), the **traefik reconcile
oneshot** (`nix/modules/proxy.nix`), the **Drone server reconcile oneshot**
(`nix/modules/drone.nix`), and the **Drone exec runner** (`nix/modules/drone-runner.nix`).

```sh
# 1. Clone base + the private secrets submodule (bot/deploy creds for cc-ci-secrets).
#    The submodule provides secrets/secrets.yaml (sops). Use a credential that can read
#    recipe-maintainers/cc-ci-secrets, e.g. a per-command header (never persisted):
git clone --recursive https://git.autonomic.zone/recipe-maintainers/cc-ci.git /root/cc-ci
#    (if cloned non-recursively: git -C /root/cc-ci submodule update --init)

# 2. Provision the bootstrap age key (see Preconditions), the ONE out-of-band secret:
install -m700 -d /var/lib/sops-nix
install -m600 /path/to/bootstrap-age-key /var/lib/sops-nix/key.txt

# 3. One nixos-rebuild switch. NOTE: ?submodules=1 so the git flake includes secrets/.
nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'
```
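
A non-recursive clone leaves `secrets/` empty, and the rebuild then has nothing to decrypt. As a hedged sketch, a small guard can refuse to produce the flake reference until the submodule is populated; `flake_ref` is a hypothetical helper name, and it assumes the repo path is absolute.

```sh
# flake_ref REPO: print the git+file flake reference used in step 3, but only
# if the secrets submodule is populated. REPO must be an absolute path.
# Hypothetical helper, not part of the repo.
flake_ref() {
    repo="$1"
    if [ ! -f "$repo/secrets/secrets.yaml" ]; then
        echo "secrets submodule not populated; run: git -C $repo submodule update --init" >&2
        return 1
    fi
    printf 'git+file://%s?submodules=1#cc-ci\n' "$repo"
}
```

With that in place, step 3 could be written as `ref=$(flake_ref /root/cc-ci) && nixos-rebuild switch --flake "$ref"`, so a missing submodule stops the command before the rebuild starts.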

On activation, sops-nix decrypts every secret (incl. the wildcard cert → `/var/lib/ci-certs/live/`),
then the serialized reconcile oneshots converge the swarm. Verify:

```sh
systemctl is-system-running    # -> running (0 failed units)
docker service ls              # traefik app+socket-proxy, drone, bridge, dashboard, backups: all 1/1
# cert is sops-decrypted FROM GIT to the path traefik serves:
sha256sum /var/lib/ci-certs/live/fullchain.pem    # symlink -> /run/secrets/wildcard_cert
# TLS served from the git cert, verified locally on the host (SNI probe.ci.commoninternet.net,
# covered by the *.ci.commoninternet.net wildcard):
curl -s --resolve probe.ci.commoninternet.net:443:127.0.0.1 \
  -o /dev/null -w 'ssl_verify=%{ssl_verify_result}\n' https://probe.ci.commoninternet.net/   # -> ssl_verify=0
# (the served leaf fingerprint == the cert in cc-ci-secrets)
```
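
Two of the checks above are easy to script: "all services are 1/1" and "the served leaf fingerprint equals the cert on disk". The helpers below are a sketch under assumptions: the function names are hypothetical, `unconverged` expects the `NAME REPLICAS` output of `docker service ls --format '{{.Name}} {{.Replicas}}'`, and the fingerprint helpers assume `openssl` is installed on the host.

```sh
# unconverged: read "NAME REPLICAS" lines on stdin and print the name of every
# service whose running count differs from its desired count (e.g. 0/1).
unconverged() {
    awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# cert_fp FILE: SHA-256 fingerprint of the first (leaf) certificate in FILE.
cert_fp() {
    openssl x509 -in "$1" -noout -fingerprint -sha256
}

# served_fp HOST: fingerprint of the leaf served on 127.0.0.1:443 for SNI name HOST.
served_fp() {
    openssl s_client -connect 127.0.0.1:443 -servername "$1" </dev/null 2>/dev/null |
        openssl x509 -noout -fingerprint -sha256
}
```

On the host, `docker service ls --format '{{.Name}} {{.Replicas}}' | unconverged` should print nothing once the swarm has converged, and `[ "$(cert_fp /var/lib/ci-certs/live/fullchain.pem)" = "$(served_fp probe.ci.commoninternet.net)" ]` succeeds when traefik is serving the cert from git.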

> Tip: when driving the switch over an SSH session that rides Tailscale, run it as a detached unit so
> it survives the Tailscale restart during activation, and use the absolute flake ref:
> `systemd-run --no-block --unit=ccci-sw --property=Type=oneshot nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'`
> *(On the canonical cc-ci the build source is synced from the admin's clone via `tar | ssh` and built
> as a `path:` flake, so no submodule fetch is needed there; the `?submodules=1` form is for a git clone.)*

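
Because the detached unit returns immediately, something has to wait for it after the SSH session reconnects. The loop below is a minimal sketch of that; `wait_for_switch` is a hypothetical helper, and it assumes the unit was started exactly as in the tip (a transient oneshot, which reports `activating` while it runs, `inactive` after success and `failed` after an error).

```sh
# wait_for_switch [UNIT] [INTERVAL]: poll until the transient unit is no longer running.
# Hypothetical helper; ccci-sw is the unit name used in the tip above.
wait_for_switch() {
    unit="${1:-ccci-sw}"
    interval="${2:-5}"
    while :; do
        # is-active prints the state and exits non-zero unless it is "active".
        state=$(systemctl is-active "$unit" 2>/dev/null)
        case "$state" in
            activating|active|deactivating|reloading) sleep "$interval" ;;
            inactive) echo "$unit: finished"; return 0 ;;
            failed) echo "$unit: failed"; return 1 ;;
            *) echo "$unit: unexpected state '$state'"; return 2 ;;
        esac
    done
}
```

After `wait_for_switch` returns, `journalctl -u ccci-sw` shows the rebuild output, which is the place to look when the unit ended in `failed`.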
## 2. One-time: link Drone ↔ Gitea (OAuth grant)

This is the only manual post-rebuild step. Drone needs the bot's Gitea OAuth token (granted by an
interactive login) before it can sync/clone repos; this can't be Nix-declared without putting the
bot password on the box. The token then persists in Drone's `data` volume.

```sh
GITEA_USERNAME=autonomic-bot GITEA_PASSWORD=… bash scripts/bootstrap-drone-oauth.sh
# -> "drone login ok (admin=true)" / "repo recipe-maintainers/cc-ci active=true"
```

Verify a build runs green: push any commit to the cc-ci repo and watch
`https://drone.ci.commoninternet.net` (or the API). The push webhook (set on activation) triggers
the `.drone.yml` self-test on the exec runner.
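
For the "(or the API)" route, a rough sketch of reading the newest build status from the command line follows. Several things here are assumptions rather than facts from this repo: `DRONE_TOKEN` is a personal token created in the Drone UI, the build list is returned newest first as compact JSON, and the helper names are hypothetical. The text matching is deliberately crude; a JSON tool such as `jq` would be more robust.

```sh
# Assumption: DRONE_TOKEN holds a personal Drone API token (not provisioned by this repo).
DRONE_SERVER="${DRONE_SERVER:-https://drone.ci.commoninternet.net}"

# fetch_builds OWNER/REPO: print the JSON build list for a repo.
fetch_builds() {
    curl -fsS -H "Authorization: Bearer $DRONE_TOKEN" \
        "$DRONE_SERVER/api/repos/$1/builds"
}

# latest_status: read a build list on stdin and print the first "status" value,
# which is the newest build if the list is ordered newest first (assumed).
latest_status() {
    grep -o '"status": *"[^"]*"' | head -n 1 | cut -d '"' -f 4
}
```

Usage would look like `fetch_builds recipe-maintainers/cc-ci | latest_status`, repeated until the status stops reporting a pending or running build.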

## 3. (later milestones) comment-bridge, dashboard, recipe enrollment

See `docs/enroll-recipe.md` (D5), `docs/secrets.md` (D6), `docs/runbook.md`. Each new piece of infra
is added as another idempotent reconcile oneshot, so this install stays a single `nixos-rebuild`.