The orchestrator Pi is retired (2026-05-31). All agents now run on the cc-ci-orchestrator VM (NixOS, loops user, /srv/cc-ci). The VM is a direct tailnet peer to cc-ci — no SOCKS proxy, no userspace tailscaled, no ProxyCommand.

Updated across all affected files:

AGENTS.md
- Remove Pi from reboot description; migration complete (not "parked")
- cc-ci access: direct ssh, not via proxy

kickoff.md
- Prerequisites: direct tailnet peer, not proxy
- Host deps: NixOS (not apt)
- Fallback/Incus: b1 reachable directly, no --proxy curl flag

plan.md §1 + §1.5
- §1 bootstrap: direct SSH, check tailscale status (not restart proxy)
- §1.5 intro: "VM" not "sandbox host"; no proxy
- Credentials table: remove TS_AUTH_KEY row; update cc-ci SSH row
- Replace "Tailscale connection (proxy)" subsection with direct-peer description

plan-orchestrator-migration.md
- Mark COMPLETE (2026-05-31); historical record only

plan-phase1c-full-reproducibility.md
- Incus access: direct, not via SOCKS proxy

prompts/builder.md + prompts/adversary.md
- cc-ci access language only: direct ssh, no proxy restart instructions
- adversary: `*.ci.commoninternet.net` via plain curl, no proxy flag

REBOOTS.md
- Retitle for VM; note Pi retired; Pi entries marked historical

systemd/cc-ci-loops.service
- User/Group/HOME/PATH: notplants → loops
- Remove cc-ci-tailscaled.service dependency (no proxy on VM)
- Add note about nix/configuration.nix as the authoritative VM declaration

test-e2e-testme-acceptance.md
- tailscale status: no --socket flag
- ssh to throwaway: no ProxyCommand

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

# cc-ci — Kickoff & Launch

Everything needed to start the autonomous cc-ci build loop. The substance lives in `plan.md`;
this file explains how to launch and supervise the two agents.

## Folder contents

```
cc-ci-plan/
├── plan.md       # THE plan — single source of truth (read this in full)
├── brief.md      # original one-page brief (context only; superseded by plan.md)
├── kickoff.md    # this file — how to launch & supervise
├── launch.sh     # starts both loops + watchdog, stops on ## DONE
└── prompts/
    ├── builder.md    # Builder loop prompt (fed to claude by launch.sh)
    └── adversary.md  # Adversary loop prompt
```

> Note: `/srv/cc-ci/cc-ci-plan/` (this folder) is the **planning + launch material**. The actual
> CI project — NixOS config, runner, tests — lives in a **separate git repo** the Builder creates
> at `git.autonomic.zone/recipe-maintainers/cc-ci`, cloned to `/srv/cc-ci/cc-ci` (Builder) and
> `/srv/cc-ci/cc-ci-adv` (Adversary). Don't confuse the two.

## Model: two independent loops (plan §6 / §6.1)

- **Builder** — builds the CI server; owns code + `STATUS.md`/`JOURNAL.md`/`DECISIONS.md` + the
  `## Build backlog` section of `BACKLOG.md`.
- **Adversary** — independently disbelieves and re-verifies; owns `REVIEW.md` +
  `## Adversary findings`. Holds veto over `## DONE`.

They run as two separate processes and coordinate **only** through the git repo. Single-writer file
ownership keeps concurrent pushes merge-clean.

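
The single-writer split can be pictured as a lookup that a pre-push check might consult. This is an illustration only: the `owner_of` helper is hypothetical (it is not part of `launch.sh` or the plan), and it assumes every path not listed is Builder-owned code.

```bash
# Hypothetical helper: map a repo path to the loop allowed to write it.
# The file names come from the ownership list; the function is an assumption.
owner_of() {
  case "$1" in
    REVIEW.md)  echo adversary ;;
    BACKLOG.md) echo shared ;;   # split by section: each loop edits only its own
    *)          echo builder ;;  # code, STATUS.md, JOURNAL.md, DECISIONS.md
  esac
}
```

For example, `owner_of REVIEW.md` prints `adversary`, while `owner_of BACKLOG.md` prints `shared` because that file is divided by section rather than owned whole.
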
## Three layers of "looping" and why you want all three

| Concern | Mechanism | Who provides it |
|---|---|---|
| **Iteration** — keep doing one unit of work, then wake again | `/loop` self-paced (ScheduleWakeup), per plan §7 pacing | each agent, in-session |
| **Resilience** — restart a loop whose process/sandbox died; stop all on `## DONE` | `launch.sh` watchdog (tmux + git poll) | this script |
| **Handoff signalling** — wake the *waiting* loop the moment its counterpart hands off | watchdog `handoff_check` (~30 s): Builder writes a `CLAIMED` gate → ping Adversary to verify; Adversary updates `REVIEW.md` → ping Builder to proceed | this script |

`/loop` alone is bound to its process: if the sandbox restarts, that loop is gone until something
relaunches it. The watchdog is that something. It also closes the **double-idle gap**: instead of a
pending gate/verdict sitting until the other loop's next scheduled wake, the watchdog pings the
waiting loop within ~30 s — so the Adversary can idle freely when nothing's pending (no busy-polling
or pointless re-verifying) yet still start verifying right after the Builder parks at a gate. Use all three.

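
The resilience and handoff mechanisms can be sketched with two small tmux helpers. This is a minimal sketch under stated assumptions, not the contents of `launch.sh`: the names `ensure_session` and `ping_session` are hypothetical, and using `tmux send-keys` as the ping is a guess at one workable approach.

```bash
# Sketch only: helper names are hypothetical, not taken from launch.sh.

# Start a detached tmux session running CMD unless one named NAME already exists.
ensure_session() {
  local name=$1 cmd=$2
  if tmux has-session -t "$name" 2>/dev/null; then
    return 0            # still alive: nothing to do
  fi
  tmux new-session -d -s "$name" "$cmd"
}

# Type a one-line message into a running session and press Enter.
ping_session() {
  local name=$1 message=$2
  tmux send-keys -t "$name" "$message" Enter
}
```

A watchdog tick could call `ensure_session` for each loop and, when it sees a new gate or verdict in the repo, call `ping_session` on the waiting loop's session.
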
## Launch

```bash
cd /srv/cc-ci/cc-ci-plan

# Optional but recommended once the repo exists, so the watchdog can detect ## DONE:
export CC_CI_REPO=https://git.autonomic.zone/recipe-maintainers/cc-ci.git

./launch.sh start            # starts cc-ci-builder + cc-ci-adv + cc-ci-watchdog (tmux sessions)
./launch.sh status           # session + DONE state
./launch.sh logs builder     # tail a loop; also: logs adversary | logs watchdog
tmux attach -t cc-ci-builder # watch a loop live locally (detach: Ctrl-b d)
./launch.sh stop             # stop everything
```

`launch.sh` is idempotent: re-running `start` won't duplicate a live session. Each agent runs as an
**interactive** `claude` in tmux, with the kickoff prompt passed as a positional arg, *not* piped
(piping forces print mode and breaks `/loop`). With `REMOTE_CONTROL=1` (default) each agent is
launched with `--remote-control`, so you can **watch and steer both loops from [claude.ai/code](https://claude.ai/code)**
(or the Claude mobile app), not just via `tmux attach`. The box must be logged into the claude.ai
account (check with `claude auth status`); set `REMOTE_CONTROL=0` to skip the remote surface.

The watchdog (default every 300s) restarts any dead session. Note that a network outage longer than
~10 min will exit the `claude` process, after which the watchdog brings it back as a fresh
remote-control session. When `STATUS.md` shows `## DONE`, the watchdog kills the loops and exits.

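
The stop condition can be pictured as a check on a checked-out copy of `STATUS.md`. This is a minimal sketch, assuming the marker is a heading line reading exactly `## DONE`. The `status_is_done` name is hypothetical, and how `launch.sh` actually fetches and inspects the file is not shown in this document.

```bash
# Sketch only: succeeds when FILE has a line that is exactly "## DONE"
# (trailing whitespace tolerated). The exact marker format is an assumption.
status_is_done() {
  grep -Eq '^## DONE[[:space:]]*$' "$1"
}
```

A polling loop could run `git -C "$clone" pull --quiet` and then `status_is_done "$clone/STATUS.md"`, where `$clone` is a placeholder for a local checkout of the repo named in `CC_CI_REPO`.
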
Prerequisites the sessions inherit from your shell: SSH (root) to `cc-ci` directly (the orchestrator
VM is a direct tailnet peer — no proxy; §1.5), Gitea bot creds, and `git.autonomic.zone` access.
Plus **preconfigured** operator inputs the loop depends on (plan §4.0/§4.4): the wildcard
`*.ci.commoninternet.net` DNS record pointing at a gateway that TLS-passthroughs to cc-ci, and the
**pre-issued wildcard cert** at `/var/lib/ci-certs/live/` on cc-ci. The operator owns the DNS
record + gateway + cert issuance/renewal; the agent builds Traefik (file provider → that cert) +
routing on cc-ci and does **no ACME**. If any prerequisite is absent, the Builder parks at
`STATUS.md ## Blocked` (plan §1/§9) rather than improvise.

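
Most of these prerequisites can be probed before launching. The following is a sketch under assumptions rather than anything shipped with `launch.sh`: `check` and `preflight` are hypothetical helpers, `cc-ci` is assumed to resolve as an SSH host name, a non-empty cert directory is taken as evidence that the cert was delivered, and `probe` is an arbitrary label used to exercise the wildcard record. Gitea bot credentials are not checked here.

```bash
# Sketch only: check/preflight are hypothetical helpers, not part of launch.sh.

# Run a command quietly and report whether the prerequisite it probes is present.
check() {
  local label=$1; shift
  if "$@" >/dev/null 2>&1; then
    echo "ok: $label"
  else
    echo "MISSING: $label"
    return 1
  fi
}

preflight() {
  local failed=0
  # Assumes `cc-ci` resolves as an SSH host name on the tailnet.
  check 'root SSH to cc-ci' \
    ssh -o BatchMode=yes -o ConnectTimeout=5 root@cc-ci true || failed=1
  # Assumes a non-empty directory means the pre-issued cert is in place.
  check 'wildcard cert on cc-ci' \
    ssh -o BatchMode=yes -o ConnectTimeout=5 root@cc-ci \
      'test -n "$(ls -A /var/lib/ci-certs/live/ 2>/dev/null)"' || failed=1
  check 'git.autonomic.zone reachable' \
    curl -fsS -o /dev/null --max-time 10 https://git.autonomic.zone/ || failed=1
  # "probe" is an arbitrary label; any name should match the wildcard record.
  check 'wildcard DNS record' \
    getent hosts probe.ci.commoninternet.net || failed=1
  return "$failed"
}
```

Running `preflight` before `./launch.sh start` prints one line per prerequisite and returns non-zero if any is missing.
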
> Host deps: `launch.sh` needs **tmux** (and `claude`) — both are installed on the VM (NixOS).
> The script's `*_DIR` defaults now point at `/srv/cc-ci/...` (Builder clone `/srv/cc-ci/cc-ci`,
> Adversary `/srv/cc-ci/cc-ci-adv`); override the `*_DIR` env vars only if your layout differs.

## Optional: a cloud-side `/schedule` watchdog

`launch.sh`'s watchdog is itself a local process — if the *whole host* goes down it stops too. For
belt-and-suspenders durability, also create a `/schedule` routine (a remote agent that fires on a
cron and re-orients from the repo). From inside a Claude session:

```
/schedule every 2 hours: read /srv/cc-ci/cc-ci-plan/plan.md §7 and the cc-ci repo STATUS.md; if the
Builder/Adversary loops are not making progress (or launch.sh is not running), restart them via
/srv/cc-ci/cc-ci-plan/launch.sh start; stop when STATUS.md says ## DONE.
```

This complements the local watchdog: scheduled runs are fresh, independent agents, so they survive
process/context death that would take the in-session `/loop` and the local watchdog with it.

## Fallback: restart/recreate the cc-ci VM (orchestrator only)

**This is primarily an escape hatch for *you*, the supervising orchestrator.** The loops normally
reconfigure cc-ci only from inside (via Nix); power-cycling or recreating the VM shouldn't be their
default move, though it's not forbidden if one gets genuinely stuck. Reach for this when cc-ci itself
is wedged at a level that can't be fixed from inside (won't boot, disk full, swarm/Docker corrupted,
unreachable over the tailnet even after checking `tailscale status`): use the Incus skill to
power-cycle or rebuild the VM, then re-bootstrap.

`cc-nix-test` (the cc-ci server, tailnet `100.90.116.4`) is a **NixOS Incus VM** on host **b1**
(`100.117.251.31:8443`, Incus project `terraform-ci`). Skill + Terraform live at
`/srv/incus-terraform-nix-vm-creator/` (`skills/incus-terraform/SKILL.md`); read that for full usage.

- **Access:** b1 (`100.117.251.31`) is reachable directly from the orchestrator VM (same tailnet).
  Use the mTLS certs at `terraform-secrets/`; no proxy is needed. Quick check:
  ```bash
  CRT=/srv/incus-terraform-nix-vm-creator/terraform-secrets/terraform.crt
  KEY=/srv/incus-terraform-nix-vm-creator/terraform-secrets/terraform.key
  # Quote the URL so the shell never treats "?" as a glob character.
  curl --cert "$CRT" --key "$KEY" -k -s \
    "https://100.117.251.31:8443/1.0/instances/cc-nix-test/state?project=terraform-ci"
  ```
- **Soft restart (keeps the disk, preferred):** `PUT .../1.0/instances/cc-nix-test/state?project=terraform-ci`
  with `{"action":"restart"}` (or `"stop"` / `"start"`). The Incus REST API changes instance state
  with `PUT` on this endpoint; `GET` only reads it.
- **Full recreate (last resort):** the Terraform module in `/srv/incus-terraform-nix-vm-creator/projects/`
  (`terraform apply` with
  `-var incus_remote_address=100.117.251.31 -var incus_project=terraform-ci -var ts_auth_key=$TSKEY`).
  ⚠ **Recreating wipes the VM disk**: you must then re-apply the cc-ci preconditions (the pre-issued
  TLS cert into `/var/lib/ci-certs/live/` and the `cc-ci-root-ed25519` pubkey into root's
  `authorized_keys`; see the access notes), and the loops re-run §1 Bootstrap. Prefer a soft
  restart; only recreate if the VM is truly unrecoverable.

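
The soft-restart call can be wrapped in a small helper. This is a sketch under assumptions: `state_url` and `set_state` are hypothetical names, the host, project, and cert paths are the ones quoted in this section, and the request uses `PUT` because that is the method the Incus REST API documents for changing instance state.

```bash
# Sketch only: helper names are hypothetical; values are copied from this section.
INCUS_HOST=100.117.251.31:8443
INCUS_PROJECT=terraform-ci
SECRETS=/srv/incus-terraform-nix-vm-creator/terraform-secrets

# Build the state endpoint URL for one instance.
state_url() {
  printf 'https://%s/1.0/instances/%s/state?project=%s\n' \
    "$INCUS_HOST" "$1" "$INCUS_PROJECT"
}

# Ask Incus to apply ACTION (restart, stop, or start) to INSTANCE.
set_state() {
  local instance=$1 action=$2
  curl --cert "$SECRETS/terraform.crt" --key "$SECRETS/terraform.key" -k -s \
    -X PUT -H 'Content-Type: application/json' \
    -d "{\"action\":\"$action\"}" \
    "$(state_url "$instance")"
}
```

With these defined, `set_state cc-nix-test restart` issues the soft restart, and running `curl` against `state_url cc-nix-test` with the same cert options reads the state back afterwards.
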
(Project cap: keep total RAM across `terraform-ci` instances under 10 GB — check before recreating.)

## Manual launch (no script)

If you'd rather not use `launch.sh`, start each agent interactively yourself (same result, no
supervision/restart), passing the prompt as a positional argument so the session stays interactive
and remote-controllable:

```bash
claude --remote-control 'cc-ci-builder' --dangerously-skip-permissions "$(cat prompts/builder.md)"
claude --remote-control 'cc-ci-adv' --dangerously-skip-permissions "$(cat prompts/adversary.md)"
```

Do **not** pipe the prompt (`cat prompts/builder.md | claude …`) — that forces print/headless mode,
which breaks `/loop` and remote control.