# Plan: migrate the cc-ci SERVER from the Incus VM to Hetzner (provision → benchmark → cutover → retire)

**Status:** PROPOSED. Move the **cc-ci CI server** off the Incus VM `cc-nix-test` (b1: 2015 i5-6400T + **spinning HDD**, CPU pressure ~55%, getting very slow) onto a **Hetzner `cpx32`** (4 vCPU / 8 GB / 160 GB **NVMe**, x86, ~€16.49/mo). Everything (Builder, Adversary, the `!testme` pipeline) then targets the fast new server.

**This file:** `/srv/cc-ci/cc-ci-plan/plan-migrate-cc-ci-to-hetzner.md`.

**Key enabler (verified 2026-05-31):** the bootstrap age key is **already on this VM** at `/srv/cc-ci/.sops/master-age.txt` and the `cc-ci-secrets` submodule is populated, so the new server can be **fully provisioned end-to-end with NO operator secret-blocker** (the D8 flow decrypts the TLS cert + all secrets). The Pi is not needed.

**Architecture reminder:** the Builder/Adversary **loops run on this orchestrator VM** and reach the CI server via `ssh cc-ci`; the **`!testme` pipeline (Gitea webhook → bridge → Drone → harness) runs ON the cc-ci server**, and `*.ci.commoninternet.net` + the dashboard are served from it. "Switch everything to the new server" = make the Hetzner box the cc-ci, then repoint `ssh cc-ci`, the webhook/DNS, and the dashboard at it. The loops' code/clones don't move; only their target does.

---

## Phase 1: Provision the new Hetzner cc-ci, fully converged (assistant)

Per **`plan-cc-ci-hetzner-terraform.md`** (the provisioning detail): `terraform/` in the cc-ci repo → `hcloud` `cpx32` from `ubuntu-24.04` → **pinned nixos-infect** → bare NixOS → add the **`cc-ci-hetzner` flake host** (the nixos-infect-generated DO/Hetzner hardware config + the shared `nix/modules/*`) → run the **D8 flow**: clone `--recursive`, place `/srv/cc-ci/.sops/master-age.txt` at `/var/lib/sops-nix/key.txt`, `nixos-rebuild switch --flake .#cc-ci-hetzner`. The server joins the tailnet (`TS_AUTH_KEY`).
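The D8 flow named above can be sketched as an ordered command list that only prints by default. This is a minimal sketch, not the project's actual tooling: it assumes the commands run on the new server, that the age key has already been copied there to a local path (`key_src`), and that `repo_url` and `checkout` are placeholders for values this plan does not state. The helper names `d8_commands` and `run` are hypothetical.

```python
import shlex
import subprocess

SOPS_KEY_DEST = "/var/lib/sops-nix/key.txt"  # destination path taken from the plan


def d8_commands(repo_url, checkout, key_src, flake_host="cc-ci-hetzner"):
    """Return (argv, cwd) pairs for the three D8 steps, in order.

    repo_url, checkout and key_src are assumptions supplied by the caller.
    """
    return [
        # 1. Clone the cc-ci repo with its submodules (cc-ci-secrets included).
        (["git", "clone", "--recursive", repo_url, checkout], None),
        # 2. Place the age key where sops-nix is expected to read it, owner-only.
        (["install", "-D", "-m", "0600", key_src, SOPS_KEY_DEST], None),
        # 3. Converge the host from the flake, run from inside the checkout.
        (["nixos-rebuild", "switch", "--flake", f".#{flake_host}"], checkout),
    ]


def run(commands, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for argv, cwd in commands:
        prefix = f"(in {cwd}) " if cwd else ""
        print(prefix + shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)
```

Keeping the steps as data makes the order reviewable before anything touches the server; `run(d8_commands(...))` prints the three commands, and `dry_run=False` executes them, stopping at the first failure.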
- **Accept:** 0 failed units; traefik/drone/bridge/dashboard/backupbot up; the box is on the tailnet and ssh-able; terraform is idempotent (`plan` clean). This is a **real** server we keep (not the throwaway the terraform-plan first described), so do **not** `terraform destroy` once it converges.
- Done in **parallel**: the old Incus cc-ci keeps serving the loops until Phase 3.

## Phase 2: Benchmark old vs new on two recipes (a short report)

Pick **two representative recipes**: one light (e.g. `n8n` or `custom-html`) and one heavy/slow (e.g. `ghost` or `discourse`, the HDD-bound timeout cases). Run the **same full harness** (cold, install+upgrade+backup+restore+custom) on **both servers**:

- old: `ssh cc-ci-incus` (the current `cc-nix-test`); new: `ssh cc-ci-hetzner`.
- Capture **per-tier + total wall-clock** from the `RUN SUMMARY` for each recipe on each host.

Write a short comparison report → **`docs/perf/hetzner-vs-incus.md`** in the cc-ci repo (table: recipe × tier × old-time × new-time × speedup). This tests the expected ~2–4× speedup empirically (more on the I/O-bound phases). *(Run under identical conditions: same recipe versions, cold cache on both sides.)*

## Phase 3: Cutover, point everything at the new server (orchestrated; pick a quiet moment)

1. **Quiesce briefly:** ensure no live `!testme`/deploy is mid-run on the old server.
2. **Repoint the loops' `ssh cc-ci`** → the Hetzner box's tailnet IP: under `Host cc-ci` in `/home/loops/.ssh/config` (and root's), set `HostName` to the new IP. The loops keep working from this VM; only their target changes. (Keep a `Host cc-ci-incus` alias for the old box during the overlap.)
3. **DNS / webhook / gateway:** point `ci.commoninternet.net` + the `*.ci` wildcard **A record at the Hetzner public IP** (drop the TLS-passthrough gateway; Traefik on the Hetzner server terminates TLS directly, and the sops wildcard cert works as-is). Re-point the Gitea `issue_comment` webhook → the new server so `!testme` triggers there.
   **DNS is operator-owned (`commoninternet.net`)**: this is the one operator step.
4. **Verify end-to-end on the new server:** a real PR `!testme` runs green through the new bridge→Drone→harness; the dashboard + `*.ci.commoninternet.net` load; the loops' `ssh cc-ci` deploys land on Hetzner. Re-run the relevant D-gates, cold-verified by the Adversary.
5. Make `cc-ci-hetzner` the **canonical** `nixosConfigurations.cc-ci` in the flake (retire the Incus `hardware.nix` once the old box is gone).

## Phase 4: Retire the old Incus cc-nix-test

Once Hetzner is the verified live cc-ci: **stop** the Incus VM via the b1 Incus API (mTLS certs are on this VM under `incus-terraform-nix-vm-creator/terraform-secrets/`) with `PUT .../instances/cc-nix-test/state {"action":"stop"}`. Keep it as a **cold standby for a few days**, then delete (frees b1). Update the memory/docs ([[cc-ci-setup]]) to point cc-ci at Hetzner.

## Who does what

- **Assistant:** Phase 1 (the terraform + full convergence) and the Phase-2 benchmark runs.
- **Orchestrator (me) + operator:** Phase 3 cutover (I do the ssh-repoint + the Incus stop via the API; **operator does the DNS change** + the go/no-go) and Phase 4.

## Guardrails

- **Parallel bring-up:** never break the running Incus cc-ci until Hetzner is verified green; the cutover is the only switch moment, at a quiet point.
- **No secrets in git:** `HCLOUD_TOKEN`, TS key, the age key (`.sops/`), tfstate all gitignored (`.gitignore` hardened for `*age*.txt`/`.sops/`); never echo/commit them.
- **x86 `cpx32`**, pin the hcloud provider + nixos-infect rev (nixpkgs already pinned).
- **Reproducible-from-scratch holds** (the D8 guarantee): the Hetzner cc-ci comes from `terraform apply` + one `nixos-rebuild switch`, with no hand steps beyond the operator DNS + age key.

## Definition of Done

- Hetzner `cpx32` cc-ci fully converged (0 failed units) via terraform + the D8 flake flow.
- `docs/perf/hetzner-vs-incus.md` shows the two-recipe old-vs-new comparison (real numbers).
- The loops, `!testme` pipeline, dashboard, and `*.ci.commoninternet.net` all run on Hetzner; a PR `!testme` is green end-to-end there; D-gates re-verified.
- The Incus `cc-nix-test` is stopped (cold standby → deletion); the flake's canonical `cc-ci` host is Hetzner; docs/memory updated.
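
The Phase 2 comparison table (recipe × tier × old-time × new-time × speedup) could be generated along these lines. This is a minimal sketch: it assumes the per-tier wall-clock seconds have already been read off each `RUN SUMMARY` by hand or by a separate parser, since the summary format is not specified here. The names `Timing`, `speedup`, and `render_table` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    recipe: str
    tier: str      # e.g. "install", "upgrade", "backup", "restore", "custom", "total"
    old_s: float   # wall-clock seconds on the old Incus host
    new_s: float   # wall-clock seconds on the new Hetzner host


def speedup(old_s, new_s):
    """Old time divided by new time; values above 1 mean the new host is faster."""
    if new_s <= 0:
        raise ValueError("new_s must be positive")
    return old_s / new_s


def render_table(timings):
    """Render the measurements as a Markdown table for the report file."""
    lines = [
        "| recipe | tier | old (s) | new (s) | speedup |",
        "| --- | --- | ---: | ---: | ---: |",
    ]
    for t in timings:
        ratio = speedup(t.old_s, t.new_s)
        lines.append(
            f"| {t.recipe} | {t.tier} | {t.old_s:.0f} | {t.new_s:.0f} | {ratio:.2f}x |"
        )
    return "\n".join(lines)
```

For instance, a placeholder row such as `Timing("example-recipe", "install", 120.0, 40.0)` (illustrative numbers, not a measurement) renders with a speedup of `3.00x`. Only real harness timings belong in `docs/perf/hetzner-vs-incus.md`.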