Root-cause fix for the 2026-07-03 run stalling: the cc-ci host disk filled to 100% (ENOSPC) mid-run (Wave 6, lasuite-drive), the agent stopped to reclaim space, and nothing resumed it. The log-idle/429 watchdog only covers opencode-go usage-limit stalls, not an environmental wedge.

- launch-upgrader.py: step-0 prereclaim_cc_ci() prunes STALE cc-ci docker images (unused AND older than a week, so this week's likely-reused images stay) before each weekly run. Best-effort; env-tunable (UPGRADER_PRERECLAIM*).
- launch-supervisor.py (new): hourly glm-5.2 orchestrator wake-up. Cheap deterministic gate: no-ops (zero tokens) when the run is complete or progressing; only when a run stalled/died before completing does it launch a short-lived glm-5.2 agent to diagnose + drive it to a clean DONE. Progress is judged by live run-proc + log mtime (session_busy() is claude-tuned and misreads a headless opencode run as idle).
- configuration.nix: cc-ci-upgrade-supervisor service + hourly timer (:07).
- upgrade-all SKILL §0: note the stale-image reclaim for manual runs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WxbpH3DquKzoSTSwGvGuET
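
The gate described in the commit message above (no-op when the run is complete or progressing, wake an agent only when it stalled or died) can be sketched roughly as follows. This is an illustrative sketch, not the code in launch-supervisor.py: the names RunState, should_wake_agent, log_mtime and the one-hour idle_limit_s default are all assumptions made for the example.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunState:
    """Hypothetical snapshot of a run; field names are assumptions."""
    completed: bool              # the run reached a clean DONE
    proc_alive: bool             # the run process is still alive
    log_mtime: Optional[float]   # last log write (epoch seconds), None if no log


def log_mtime(path: str) -> Optional[float]:
    """Return the log file's mtime, or None when the file does not exist."""
    try:
        return os.stat(path).st_mtime
    except FileNotFoundError:
        return None


def should_wake_agent(state: RunState, now: float, idle_limit_s: float = 3600.0) -> bool:
    """Deterministic gate: True only when the run stalled or died before completing."""
    if state.completed:
        return False  # nothing to do, zero tokens spent
    progressing = (
        state.proc_alive
        and state.log_mtime is not None
        and now - state.log_mtime < idle_limit_s
    )
    # A live process with a recently written log counts as progress.
    return not progressing
```

Keeping the decision in a pure function like this makes the hourly check cheap and easy to test without a live run; only a True result would justify launching the short-lived agent.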
cc-ci-orchestrator
⚠️ HISTORICAL. This README describes the retired Incus VM (100.116.55.106). The orchestrator now runs on Hetzner; the live host config is nix/hosts/cc-ci-orchestrator-hetzner/configuration.nix. See cc-ci-plan/plan-orchestrator-hetzner-migration.md for the current setup. Kept for history.
NixOS config for the cc-ci-orchestrator Incus VM (b1, project terraform-ci, tailnet
100.116.55.106) — the reboot-resilient host for the cc-ci Builder/Adversary loops + watchdog +
orchestrator session, moved off the unstable 905 MiB Pi.
See cc-ci-plan/plan-orchestrator-migration.md for the full migration.
Files
configuration.nix: the VM's NixOS config (channel-based, nixos-24.11). Deployed to /etc/nixos/configuration.nix on the VM. Provides: nix-ld (so the standalone Claude Code Bun binary runs), tmux/git/python/jq + tools, a 4 GB swapfile, direct ssh to cc-ci (the VM is a tailnet peer, so no SOCKS proxy is needed, unlike the Pi), an idempotent claude-install oneshot, and the cc-ci-loops supervisor service (defined; enabled in Phase D once the workspace is staged).
Deploy (until this is wired to a flake/auto-pull)
# copy configuration.nix to the VM, then:
ssh cc-ci-orchestrator 'nixos-rebuild switch' # or run detached: see below
Over the (currently flaky) Pi→VM link, run the rebuild detached on the VM so an ssh/proxy drop
doesn't abort it, e.g. systemd-run --unit=orch-rebuild --collect nixos-rebuild switch then poll
journalctl -u orch-rebuild.
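
As a rough sketch of the detached-rebuild approach above, the two ssh invocations can be assembled in Python. The helper names detached_rebuild_argv and journal_poll_argv, and the choice of tailing 20 journal lines, are assumptions for illustration; the host alias and unit name come from the text above.

```python
import shlex
from typing import List


def detached_rebuild_argv(host: str = "cc-ci-orchestrator",
                          unit: str = "orch-rebuild") -> List[str]:
    """Build an ssh command that starts the rebuild as a transient systemd unit.

    The rebuild then runs under systemd on the VM, so a dropped ssh
    connection does not abort it.
    """
    remote = ["systemd-run", f"--unit={unit}", "--collect",
              "nixos-rebuild", "switch"]
    return ["ssh", host, shlex.join(remote)]


def journal_poll_argv(host: str = "cc-ci-orchestrator",
                      unit: str = "orch-rebuild",
                      lines: int = 20) -> List[str]:
    """Build an ssh command that prints the last journal lines of the unit."""
    remote = ["journalctl", "-u", unit, "--no-pager", "-n", str(lines)]
    return ["ssh", host, shlex.join(remote)]
```

Either list can be passed to subprocess.run(..., check=True); the poll command is safe to repeat after a dropped connection because it only reads the journal.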
Status
- Phase A: VM created (2 GB / 2 vCPU / 30 GB), on tailnet, ssh-able. ✅
- Phase B: this config (DRAFT) — nix-ld/claude validation pending on the VM.
- Operator step pending (Phase C): claude auth login on the VM (device-code; can't be scripted).
- Secrets to stage (Phase C, out-of-band): /srv/cc-ci/.testenv, ~/.ssh/cc-ci-root-ed25519, Incus mTLS certs, the sops master age key.