Reboot survival for the Pi orchestrator host:
- systemd unit cc-ci-plan/systemd/cc-ci-loops.service (installed + enabled): on boot records the reboot, starts loops+watchdog (RESUME_PHASE=1), and resumes the orchestrator session.
- reboot-log.sh: boot_id-gated reboot record -> REBOOTS.md (manual restarts don't count).
- launch-orchestrator.sh: injects an AGENTS.md startup nudge so an auto-resumed orchestrator announces itself (PushNotification) + reports reboots.
- AGENTS.md: on-startup notify routine documented.

Plans/tooling accumulated this session:
- plan-phase1d (generic suite), 1e (harness corrections), phase4 (final review), sso-dep-testing, orchestrator-migration (parked), test-e2e-testme-acceptance.
- launch.sh: 1d/1e/2/2b/3/4 phase sequence, machine-docs-aware state resolution, limit-stall re-nudge, INBOX side-channel detection.
- plan.md §6.1/§7: artifact-layer isolation, INBOX, 5-min long-run polling, DEFERRED.
- prompts: isolation discipline + INBOX + pacing.
- .gitignore: harden (.sops/, cc-ci-secrets/, .claude/, *.tmp.*).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
84 lines
5.1 KiB
Markdown
# cc-ci-autonomous-orchestrator — AGENTS.md

This folder is the **orchestrator** workspace for building the **cc-ci** Co-op Cloud recipe CI
server. It holds the plan, the launch/supervision tooling, and the two loop prompts. The actual CI
project (NixOS config, test runner, recipe tests) lives in a **separate** repo the loops create at
`git.autonomic.zone/recipe-maintainers/cc-ci` — do not confuse the two.

## Three roles (don't conflate them)

1. **Orchestrator** — *this* session/role. Supervises: checks in on the two loops, reads their
   logs/STATUS, makes changes to the plan/prompts, restarts loops, and owns the VM-level fallback.
   It is **separate** from the loops and is the only role that should power-cycle/recreate the VM.
2. **Builder loop** — builds the CI server (`cc-ci-plan/prompts/builder.md`).
3. **Adversary loop** — independently disbelieves/verifies (`cc-ci-plan/prompts/adversary.md`).

The two loops coordinate **only** through the cc-ci git repo (see `plan.md` §6.1). The orchestrator
watches from outside.

## On startup: announce yourself + report reboots

**Every time you (the orchestrator) start or resume, send a `PushNotification`** that you are online —
the operator wants to know the supervising session is back (especially after a reboot, which kills
this session along with the Pi). Include the current phase and the reboot count. Steps on startup:
1. Read `cc-ci-plan/REBOOTS.md` (count the entries under `## Reboots`) and run
   `cc-ci-plan/launch.sh status` (current phase + whether the loops/watchdog are running).
2. `PushNotification` (proactive), e.g.: *"cc-ci orchestrator online — phase 2, loops+watchdog
   running; N reboots logged (last <date>)."*
3. If a reboot happened while you were away (a new line in REBOOTS.md since you last looked, or the
   loops are down), check that `cc-ci-loops.service` brought the loops back; if not, relaunch with
   `RESUME_PHASE=1 cc-ci-plan/launch.sh start`.

Reboot resilience is handled by **`cc-ci-loops.service`** (system unit): on boot it logs the reboot
to `REBOOTS.md` (boot_id-gated) and runs `launch.sh start` with `RESUME_PHASE=1`, so the loops +
watchdog auto-resume the saved phase. The orchestrator session itself is NOT auto-started — the
operator reconnects to it (that's why the startup notification matters). The fuller "move the
orchestrator onto its own VM" plan is parked at `cc-ci-plan/plan-orchestrator-migration.md`.

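
The boot_id gate mentioned above can be sketched roughly as follows. This is an illustration only, not the contents of the real `reboot-log.sh`: the function name `log_reboot_if_new`, the separate state file, and the log line format are all assumptions. The one fixed point is that Linux exposes a fresh UUID per boot at `/proc/sys/kernel/random/boot_id`, so comparing it with the last recorded value separates a real reboot from a manual restart of the unit.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a boot_id-gated reboot record (names and format are assumed).

# log_reboot_if_new BOOT_ID STATE_FILE LOG_FILE
# Appends one line to LOG_FILE only when BOOT_ID differs from the id saved in STATE_FILE.
log_reboot_if_new() {
  boot_id=$1
  state_file=$2
  log_file=$3

  last_id=""
  if [ -f "$state_file" ]; then
    last_id=$(cat "$state_file")
  fi

  # Same boot id means the service was restarted by hand: record nothing.
  if [ "$boot_id" = "$last_id" ]; then
    return 0
  fi

  # New boot id: append a timestamped entry, then remember the id.
  printf '%s\n' "- $(date -u +%Y-%m-%dT%H:%M:%SZ) boot_id=$boot_id" >> "$log_file"
  printf '%s\n' "$boot_id" > "$state_file"
}

# Possible call at boot (the state-file path is a placeholder):
#   log_reboot_if_new "$(cat /proc/sys/kernel/random/boot_id)" \
#     /path/to/last-boot-id cc-ci-plan/REBOOTS.md
```

Because the id changes only when the kernel boots, calling the function repeatedly within one boot leaves the log untouched, which matches the "manual restarts don't count" behaviour.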
## Keep the orchestrator open, under remote-control

Run this session as a long-lived **interactive** session with `--remote-control` so the operator can
check in on the loops and steer/restart things from **claude.ai/code** (or the Claude mobile app)
without being at the terminal.

- **Already in the session?** Just run `/remote-control` — it attaches claude.ai/code to the live
  conversation (no exit, no resume needed).
- **Starting fresh:** `claude --remote-control 'autonomous-orchestrator' --dangerously-skip-permissions`
- **Resuming this orchestrator later (history preserved):**
  ```bash
  claude --resume autonomous-orchestrator --remote-control "autonomous-orchestrator" --dangerously-skip-permissions
  ```
  Note that the two name arguments mean different things: `--resume <name|id>` restores *this
  conversation* (the name set via `-n/--name`, shown in the `/resume` picker); the
  `--remote-control [name]` value is only the web display label and resumes nothing. The
  conversation persists on disk across exits; remote control itself only stays "connected" while
  the local process is alive (resume + re-enable to get it back after a full exit).

Use it to: tail loop logs (`cc-ci-plan/launch.sh logs builder|adversary|watchdog`), inspect
`STATUS.md`/`REVIEW.md` in the cc-ci repo, edit the plan or prompts, restart a stuck loop, or
power-cycle/recreate the cc-ci VM (see `cc-ci-plan/kickoff.md` → "Fallback: restart/recreate the
cc-ci VM"). The orchestrator is the human's steering wheel; the loops are the engine.

## Launch & supervise the loops

- **Source of truth for the loops:** `cc-ci-plan/plan.md` (mission, Definition of Done, §1.5
  credential map, §6 two-agent protocol, §7 loop discipline).
- **Launch/supervision guide:** `cc-ci-plan/kickoff.md`.
- `cc-ci-plan/launch.sh start` → both loops (interactive `--remote-control` in tmux) + a watchdog.
  tmux is installed; `launch.sh` defaults now point at `/srv/cc-ci/...`.

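
The "relaunch only if the loops are down" check from the startup routine might be wrapped as below, using the `launch.sh` subcommands named in this file. Treat it as a sketch under one loud assumption: it supposes that `launch.sh status` exits non-zero when the loops or watchdog are not running, which this document does not confirm. The function names and the `LAUNCH` variable are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper; LAUNCH defaults to the path used throughout this document.
LAUNCH="${LAUNCH:-cc-ci-plan/launch.sh}"

# loops_running: ASSUMPTION, `launch.sh status` exits 0 only when loops + watchdog are up.
loops_running() {
  "$LAUNCH" status >/dev/null 2>&1
}

# ensure_loops: relaunch from the saved phase only when the status check fails.
ensure_loops() {
  if loops_running; then
    echo "loops already running"
  else
    echo "loops down, relaunching from saved phase"
    RESUME_PHASE=1 "$LAUNCH" start
  fi
}
```

If `status` turns out to report state only in its text output, `loops_running` would need to inspect that output instead of the exit code.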
## Access & credentials (pointers only — values are gitignored)

- `.testenv` (**NOT committed**): Tailscale auth key + Gitea bot creds. Load with
  `set -a; . .testenv; set +a` (never echo the values).
- **cc-ci:** `ssh cc-ci` (root) tunnels through the persistent userspace-tailscaled SOCKS proxy on
  `127.0.0.1:1055` (`cc-ci-tailscaled.service`). If down: `sudo systemctl restart cc-ci-tailscaled`.
- **Incus/VM fallback:** mTLS certs at `/srv/incus-terraform-nix-vm-creator/terraform-secrets/`;
  b1 is on the same tailnet (reach via the same proxy). See kickoff "Fallback".
- **Full credential map + how to use each:** `plan.md` §1.5.

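
The `set -a; . .testenv; set +a` idiom works because `set -a` marks every variable assigned afterwards for export, so child processes such as `ssh` or `git` see the values without anything being printed. A minimal sketch of that idiom plus a presence check that reports variable names only, never values, is shown here. The function names and the example variable names in the comment are hypothetical; the real keys in `.testenv` are not listed in this file.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around the env-loading idiom described above.

# load_env_file FILE: export every variable assigned in FILE, printing nothing.
load_env_file() {
  [ -r "$1" ] || { echo "missing env file: $1" >&2; return 1; }
  set -a   # auto-export every variable assigned from here on
  . "$1"
  set +a   # back to normal assignment semantics
}

# require_vars NAME...: fail if any named variable is unset or empty.
# Only the variable NAME is reported; the value is never echoed.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "unset or empty: $name" >&2
      missing=1
    fi
  done
  unset value
  return "$missing"
}

# Possible use (variable names are placeholders, not the real keys):
#   load_env_file ./.testenv && require_vars TS_AUTHKEY GITEA_TOKEN
```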
## Hard rule

Never commit secret values. `.testenv`, `*.tfstate`, `*.key`/`*.pem`, and the loop runtime/clone
dirs are gitignored. Reference secret *locations*, never their contents (`plan.md` §9).
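
As a second line of defence behind `.gitignore`, the rule above could be backed by a small path check. The following is a sketch under assumptions: the function names are hypothetical, and it covers only the file patterns named in this section; the loop runtime/clone directories are left out because their names are not given here.

```bash
#!/usr/bin/env bash
# Hypothetical guard for the file patterns listed in the hard rule.

# is_secret_path FILE: succeed when the file name matches a secret pattern.
is_secret_path() {
  case "${1##*/}" in
    .testenv|*.tfstate|*.key|*.pem) return 0 ;;
    *) return 1 ;;
  esac
}

# check_paths: read newline-separated paths on stdin, report offenders on
# stderr, and return non-zero if any path looks like a secret.
check_paths() {
  rc=0
  while IFS= read -r file; do
    if is_secret_path "$file"; then
      echo "refusing secret-looking file: $file" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Possible use as the body of a pre-commit hook:
#   git diff --cached --name-only --diff-filter=ACM | check_paths
```

The check looks at file names only, so it never opens or prints the contents of a candidate file.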