Plan/record — migrate the ORCHESTRATOR off the Incus VM onto a Hetzner cloud server
Status: COMPLETE (2026-05-31). The orchestrator (Builder/Adversary loops + watchdog + this
supervising session) now runs on a dedicated Hetzner cloud server, declared by the
cc-ci-orchestrator-hetzner flake host. Kept as a historical record.
Why: the previous orchestrator host was the Incus VM cc-ci-orchestrator on b1
(100.116.55.106, 2 GB / 2 vCPU, see plan-orchestrator-migration — the earlier Pi→Incus move).
A dedicated Hetzner box gives dedicated vCPU + NVMe and decouples the orchestrator from b1's hardware.
This is the orchestrator analogue of the cc-ci server move in plan-migrate-cc-ci-to-hetzner
(that one moves the CI server; this one moves the orchestrator that drives the loops).
Note on naming: this migration was carried out directly via `terraform/` + the `cc-ci-orchestrator-hetzner` flake host. It is distinct from both `plan-migrate-cc-ci-to-hetzner.md` (the cc-ci CI server → Hetzner `cpx32`) and `plan-orchestrator-migration.md` (Pi → Incus VM). All three are separate moves; only this file records the orchestrator → Hetzner step.
The new host (facts)
| Property | Value |
|---|---|
| Provider / type | Hetzner Cloud `cpx22`: AMD, 2 vCPU / 4 GB RAM, dedicated vCPU, NVMe |
| Location | `nbg1` (`cpx11`/`cpx21` are retired there, hence `cpx22`) |
| Hetzner server ID | 134487234 |
| Public IPv4 | 168.119.126.100 (IPv6 disabled) |
| Tailnet | `cc-ci-orchestrator-1` @ 100.84.190.30 (`taila4a0bf.ts.net`); joins via `/etc/ts-auth-key` |
| OS | `debian-12` image → nixos-infect → NixOS, converged by the flake |
| Flake host | `nixosConfigurations.cc-ci-orchestrator-hetzner` (`flake.nix` → `nix/hosts/cc-ci-orchestrator-hetzner/{configuration,hardware}.nix`) |
| Workspace | `/srv/cc-ci-orch` (this repo); `/srv/cc-ci` is a symlink to it. Loop clones: `/srv/cc-ci/cc-ci`, `/srv/cc-ci/cc-ci-adv` |
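The tailnet join via a key file can be expressed declaratively. A minimal sketch, assuming the host uses the stock NixOS `services.tailscale` module (recent releases provide `authKeyFile`); the real `configuration.nix` may wire this differently:

```nix
{ ... }:
{
  services.tailscale = {
    enable = true;
    # Key is staged by hand at provisioning time and never committed to git.
    # When authKeyFile is set, the NixOS module adds a oneshot unit that runs
    # the initial `tailscale up` with this key.
    authKeyFile = "/etc/ts-auth-key";
    # No "--ssh" in extraUpFlags: plain sshd is used for logins.
  };
}
```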
The root login keys (root `authorizedKeys`) and a 4 GB disk swap file (4 GB of RAM is tight for three or more
concurrent claude sessions) are declared in `configuration.nix`.
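A sketch of what those two declarations can look like. The key string and the swap file path are placeholders, not the values in the real `configuration.nix`:

```nix
{ ... }:
{
  services.openssh.enable = true;

  # Placeholder: the real root login keys are listed in configuration.nix.
  users.users.root.openssh.authorizedKeys.keys = [
    "ssh-ed25519 AAAAC3Nza...placeholder operator@laptop"
  ];

  # 4 GB swap file on disk; `size` is in MiB. NixOS creates the file
  # on activation if it does not exist yet. The path is an assumption.
  swapDevices = [
    { device = "/var/lib/swapfile"; size = 4 * 1024; }
  ];
}
```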
How it was provisioned (reproducible)
The whole box is reproducible from terraform/ + one nixos-rebuild:
- `terraform apply` (`terraform/main.tf`): `hcloud_server` `cpx22` from `debian-12` in `nbg1`; `user_data = user-data.sh` runs nixos-infect on first boot (Debian → NixOS, reboot).
- Stage 2 (`terraform/README.md`): SSH in, capture the nixos-infect hardware config (→ `nix/hosts/cc-ci-orchestrator-hetzner/hardware.nix`), then converge:

  ```sh
  # on the server, from the repo root (/srv/cc-ci-orch)
  nixos-rebuild switch --flake .#cc-ci-orchestrator-hetzner
  ```

- Stage credentials (not in git, placed once): `/etc/ts-auth-key` (tailnet join), the loops' `~/.ssh/cc-ci-root-ed25519` + `.testenv`, and the sops master age key. `claude auth login` (device code) is the one interactive step, needed so the loops can run `--remote-control`.
- Stage the workspace: clone this repo to `/srv/cc-ci-orch` (symlink `/srv/cc-ci` to it), the Builder / Adversary clones, `cc-ci-secrets`, `references/`; copy `.cc-ci-logs/.phase-idx` (resume point).
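The `.#cc-ci-orchestrator-hetzner` target in the rebuild command refers to a flake output. A minimal sketch of that output, assuming both host files are listed as modules (the real `configuration.nix` may import `hardware.nix` itself) and using an illustrative nixpkgs branch; the actual pin lives in `flake.lock`:

```nix
{
  # Illustrative branch; the real revision is pinned in flake.lock.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";

  outputs = { self, nixpkgs, ... }: {
    nixosConfigurations.cc-ci-orchestrator-hetzner = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux"; # cpx22 is an AMD x86_64 instance
      modules = [
        ./nix/hosts/cc-ci-orchestrator-hetzner/configuration.nix
        # Generated by nixos-infect on the server, then committed.
        ./nix/hosts/cc-ci-orchestrator-hetzner/hardware.nix
      ];
    };
  };
}
```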
Commit trail: 0103f36 (terraform + flake host, initial cpx11) → 17951b8 (fix → cpx22,
add lock) → c44b967 (real cpx22 hardware config from nixos-infect, server 134487234). Plus the
close-out commit (root keys, drop tailscale --ssh, enable the loops service, this doc).
Reboot-resilience (the point of running on a managed host)
configuration.nix declares systemd.services.cc-ci-loops — a oneshot that runs
launch.sh start with RESUME_PHASE=1 after network-online/tailscaled, bringing the loops +
watchdog back on boot. It was authored disabled ("defined but NOT enabled until workspace is
staged") with wantedBy commented out. Close-out (2026-05-31): the workspace is staged and the
loops are running, so wantedBy = [ "multi-user.target" ] was uncommented and nixos-rebuild switch
re-run → systemctl is-enabled cc-ci-loops.service = enabled. A reboot is now a non-event:
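systemd resumes the saved phase. `reboot-log.sh`, wired in as the unit's ExecStartPre, appends one line
per boot to `REBOOTS.md`; it is gated on the kernel boot_id, so restarting the service within the same boot does not add a duplicate entry.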
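The unit described above can be sketched as follows. This is a hedged reconstruction, not the declaration in `configuration.nix`: the script locations in the repo root, the `path` tool list, and `RemainAfterExit` are assumptions.

```nix
{ pkgs, ... }:
{
  systemd.services.cc-ci-loops = {
    description = "cc-ci Builder/Adversary loops + watchdog";

    # The close-out change: without wantedBy the unit exists but nothing starts it at boot.
    wantedBy = [ "multi-user.target" ];

    # Start only once the network and tailscaled are up.
    wants = [ "network-online.target" ];
    after = [ "network-online.target" "tailscaled.service" ];

    # Tell launch.sh to resume from the saved phase instead of starting over.
    environment.RESUME_PHASE = "1";

    # Hypothetical tool list; launch.sh's real dependencies may differ.
    path = with pkgs; [ bash git ];

    serviceConfig = {
      Type = "oneshot";
      # Keep the unit active after launch.sh exits, so processes it leaves
      # running in the unit's cgroup are not cleaned up when it returns.
      RemainAfterExit = true;
      WorkingDirectory = "/srv/cc-ci-orch";
      # Hypothetical script locations (repo root).
      ExecStartPre = "${pkgs.bash}/bin/bash /srv/cc-ci-orch/reboot-log.sh";
      ExecStart = "${pkgs.bash}/bin/bash /srv/cc-ci-orch/launch.sh start";
    };
  };
}
```

With `Type = "oneshot"`, systemd waits for `launch.sh start` to return before the boot proceeds past the unit, which suits a launcher that spawns the long-running sessions and exits.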
Caveat seen at first boot on this host: the loops were initially started by hand during staging (not by the service), and the unit was not yet enabled, so nothing ran it at boot. The first boot therefore did NOT log to
`REBOOTS.md`, and `systemctl` showed the service as `linked`, not enabled. Enabling `wantedBy` (above) is what wires the automatic path.
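A minimal sketch of boot_id gating, written as a hypothetical stand-in for `reboot-log.sh` (whose actual contents and line format are not reproduced here). It relies on the kernel exposing a fresh random ID per boot at `/proc/sys/kernel/random/boot_id`, and is meant to replace, not add to, a path-based ExecStartPre:

```nix
{ pkgs, ... }:
let
  # Hypothetical stand-in for reboot-log.sh; the real script's format may differ.
  rebootLogSketch = pkgs.writeShellScript "reboot-log-sketch" ''
    set -euo pipefail
    log=/srv/cc-ci-orch/REBOOTS.md
    # The kernel generates a new random boot_id on every boot.
    boot_id=$(cat /proc/sys/kernel/random/boot_id)
    host=$(cat /proc/sys/kernel/hostname)
    touch "$log"
    # Gate on boot_id: restarting the unit within the same boot adds nothing.
    if ! grep -qF "$boot_id" "$log"; then
      printf -- '- %s reboot on %s (boot_id %s)\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$host" "$boot_id" >> "$log"
    fi
  '';
in
{
  # Used instead of (not in addition to) the path-based ExecStartPre.
  systemd.services.cc-ci-loops.serviceConfig.ExecStartPre = "${rebootLogSketch}";
}
```

Gating on boot_id rather than on uptime or a timestamp makes the check exact: two runs log separately only if they happened in different boots.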
Status of the migration
- ✅ Hetzner `cpx22` provisioned + converged from the flake (terraform + nixos-infect + one rebuild).
- ✅ On the tailnet (`cc-ci-orchestrator-1`) and ssh-able on the public IP.
- ✅ Loops + Adversary + watchdog running; phase sequence auto-advancing (watchdog on per-phase `## DONE`).
- ✅ `cc-ci-loops.service` enabled → reboot-resilient.
- ◻︎ Old Incus orchestrator VM (`100.116.55.106`): keep as cold standby a few days, then delete.
- ◻︎ Rotate the tailnet name once the old `cc-ci-orchestrator` peer is gone (this box is `…-1`).