cc-ci-orchestrator/AGENTS.md
autonomic-bot 36a6c9872a orchestrator: reboot-resilience + session auto-resume + full session plan/tooling
Reboot survival for the Pi orchestrator host:
- systemd unit cc-ci-plan/systemd/cc-ci-loops.service (installed + enabled): on boot
  records the reboot, starts loops+watchdog (RESUME_PHASE=1), and resumes the
  orchestrator session.
- reboot-log.sh: boot_id-gated reboot record -> REBOOTS.md (manual restarts don't count).
- launch-orchestrator.sh: injects an AGENTS.md startup nudge so an auto-resumed
  orchestrator announces itself (PushNotification) + reports reboots.
- AGENTS.md: on-startup notify routine documented.

Plans/tooling accumulated this session:
- plan-phase1d (generic suite), 1e (harness corrections), phase4 (final review),
  sso-dep-testing, orchestrator-migration (parked), test-e2e-testme-acceptance.
- launch.sh: 1d/1e/2/2b/3/4 phase sequence, machine-docs-aware state resolution,
  limit-stall re-nudge, INBOX side-channel detection.
- plan.md §6.1/§7: artifact-layer isolation, INBOX, 5-min long-run polling, DEFERRED.
- prompts: isolation discipline + INBOX + pacing.
- .gitignore: harden (.sops/, cc-ci-secrets/, .claude/, *.tmp.*).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 20:28:10 +01:00


cc-ci-autonomous-orchestrator — AGENTS.md

This folder is the orchestrator workspace for building the cc-ci Co-op Cloud recipe CI server. It holds the plan, the launch/supervision tooling, and the two loop prompts. The actual CI project (NixOS config, test runner, recipe tests) lives in a separate repo the loops create at git.autonomic.zone/recipe-maintainers/cc-ci — do not confuse the two.

Three roles (don't conflate them)

  1. Orchestrator (this session/role). Supervises the two loops: checks in on them, reads their logs/STATUS, makes changes to the plan/prompts, restarts loops, and owns the VM-level fallback. It is separate from the loops and is the only role that should power-cycle or recreate the VM.
  2. Builder loop — builds the CI server (cc-ci-plan/prompts/builder.md).
  3. Adversary loop — independently disbelieves/verifies (cc-ci-plan/prompts/adversary.md).

The two loops coordinate only through the cc-ci git repo (see plan.md §6.1). The orchestrator watches from outside.

On startup: announce yourself + report reboots

Every time you (the orchestrator) start or resume, send a PushNotification that you are online — the operator wants to know the supervising session is back (especially after a reboot, which kills this session along with the Pi). Include the current phase and the reboot count. Steps on startup:

  1. Read cc-ci-plan/REBOOTS.md (count the entries under its ## Reboots heading) and run cc-ci-plan/launch.sh status (current phase, and whether the loops/watchdog are running).
  2. PushNotification (proactive), e.g.: "cc-ci orchestrator online — phase 2, loops+watchdog running; N reboots logged (last )."
  3. If a reboot happened while you were away (a new line in REBOOTS.md since you last looked, or the loops are down), check that cc-ci-loops.service brought the loops back; if not, relaunch with RESUME_PHASE=1 cc-ci-plan/launch.sh start.
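Steps 1 and 2 can be sketched in Python as below. This is a minimal sketch that assumes (hypothetically) REBOOTS.md holds one "- " bullet per reboot under its ## Reboots heading, and that the phase and loop state have already been read from launch.sh status. The function names are illustrative, not part of the existing tooling; the code only builds the message text, and sending it is still the PushNotification call.

```python
from typing import Optional, Tuple


def count_reboots(reboots_md: str) -> Tuple[int, Optional[str]]:
    """Count bullet entries under the '## Reboots' heading.

    Assumes (hypothetically) one '- ' bullet line per reboot. Returns the
    count and the text of the last entry, or None if there are none.
    """
    entries = []
    in_section = False
    for line in reboots_md.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Entering '## Reboots' turns collection on; any other
            # level-2 heading ends the section.
            in_section = stripped[3:].strip().lower() == "reboots"
            continue
        if in_section and stripped.startswith("- "):
            entries.append(stripped[2:].strip())
    return len(entries), (entries[-1] if entries else None)


def startup_message(phase: str, loops_running: bool, reboots_md: str) -> str:
    """Compose the proactive 'orchestrator online' notification text."""
    count, last = count_reboots(reboots_md)
    loops = "loops+watchdog running" if loops_running else "loops DOWN, relaunch needed"
    suffix = f" (last {last})" if last else ""
    return f"cc-ci orchestrator online, phase {phase}, {loops}; {count} reboots logged{suffix}."
```

If loops_running is false, step 3 still applies: confirm cc-ci-loops.service ran, otherwise relaunch with RESUME_PHASE=1 cc-ci-plan/launch.sh start.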

Reboot resilience is handled by cc-ci-loops.service (system unit): on boot it logs the reboot to REBOOTS.md (boot_id-gated) and runs launch.sh start with RESUME_PHASE=1, so the loops + watchdog auto-resume the saved phase. The orchestrator session itself is NOT auto-started — the operator reconnects to it (that's why the startup notification matters). The fuller "move the orchestrator onto its own VM" plan is parked at cc-ci-plan/plan-orchestrator-migration.md.
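The boot_id gate can be sketched as follows. On Linux, /proc/sys/kernel/random/boot_id holds a fresh random UUID for each boot, so comparing it with the last recorded value separates a real reboot from a restart of the service (or of launch.sh) within the same boot. The state-file location and the entry format are assumptions for illustration; reboot-log.sh may record them differently.

```python
from datetime import datetime, timezone
from pathlib import Path

# Per-boot random UUID exposed by the Linux kernel.
BOOT_ID_PATH = Path("/proc/sys/kernel/random/boot_id")


def log_reboot_if_new(reboots_md: Path, state_file: Path,
                      boot_id_path: Path = BOOT_ID_PATH) -> bool:
    """Append one reboot entry per kernel boot; return True if one was written.

    state_file (hypothetical) remembers the last boot_id seen, so running
    this again within the same boot writes nothing.
    """
    boot_id = boot_id_path.read_text().strip()
    last = state_file.read_text().strip() if state_file.exists() else ""
    if boot_id == last:
        return False  # same boot: a manual restart, not a reboot
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%SZ")
    if not reboots_md.exists():
        reboots_md.write_text("## Reboots\n")
    with reboots_md.open("a") as fh:
        fh.write(f"- {stamp} boot_id={boot_id}\n")
    state_file.write_text(boot_id + "\n")
    return True
```

A missing state file counts as a new boot, so the first run after installation writes one entry.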

Keep the orchestrator open, under remote-control

Run this session as a long-lived interactive session with --remote-control so the operator can check in on the loops and steer/restart things from claude.ai/code (or the Claude mobile app) without being at the terminal.

  • Already in the session? Just run /remote-control — it attaches claude.ai/code to the live conversation (no exit, no resume needed).
  • Starting fresh: claude --remote-control 'autonomous-orchestrator' --dangerously-skip-permissions
  • Resuming this orchestrator later (history preserved):
    claude --resume autonomous-orchestrator --remote-control "autonomous-orchestrator" --dangerously-skip-permissions
    
    Note the two names are different: --resume <name|id> restores this conversation (the name set via -n/--name, shown in the /resume picker); the --remote-control [name] value is only the web display label and resumes nothing. The conversation persists on disk across exits; remote control itself only stays "connected" while the local process is alive (resume + re-enable to get it back after a full exit).
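To keep the two names apart, the command line can be built with them as separate parameters. This sketch uses only the flags shown above; the helper itself is hypothetical.

```python
import shlex
from typing import List, Optional


def orchestrator_argv(label: str, resume: Optional[str] = None) -> List[str]:
    """Build the claude command line for the orchestrator session.

    resume: session name or id for --resume (this restores the conversation).
    label:  value for --remote-control (only the claude.ai/code display label).
    """
    argv = ["claude"]
    if resume is not None:
        argv += ["--resume", resume]
    argv += ["--remote-control", label, "--dangerously-skip-permissions"]
    return argv


if __name__ == "__main__":
    # Prints the resume command from the list above.
    print(shlex.join(orchestrator_argv("autonomous-orchestrator",
                                       resume="autonomous-orchestrator")))
```

Only the resume argument selects which conversation comes back; changing label changes nothing but the web display name.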

Use it to: tail loop logs (cc-ci-plan/launch.sh logs builder|adversary|watchdog), inspect STATUS.md/REVIEW.md in the cc-ci repo, edit the plan or prompts, restart a stuck loop, or power-cycle/recreate the cc-ci VM (see cc-ci-plan/kickoff.md → "Fallback: restart/recreate the cc-ci VM"). The orchestrator is the human's steering wheel; the loops are the engine.

Launch & supervise the loops

  • Source of truth for the loops: cc-ci-plan/plan.md (mission, Definition of Done, §1.5 credential map, §6 two-agent protocol, §7 loop discipline).
  • Launch/supervision guide: cc-ci-plan/kickoff.md.
  • cc-ci-plan/launch.sh start → both loops (interactive --remote-control in tmux) + a watchdog. tmux is installed; launch.sh defaults now point at /srv/cc-ci/....
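A watchdog-style liveness check, sketched under stated assumptions: the tmux session names below are placeholders (check what launch.sh actually creates), and relaunching just reuses the documented RESUME_PHASE=1 cc-ci-plan/launch.sh start. tmux has-session exits 0 when the target session exists, and the = prefix on the target requests an exact name match instead of a prefix match.

```python
import os
import subprocess
from typing import Callable, Dict, Iterable, List

# Hypothetical tmux session names; the real ones are whatever launch.sh uses.
SESSIONS = ("builder", "adversary", "watchdog")


def tmux_has_session(name: str) -> bool:
    """True if tmux reports a session with exactly this name."""
    result = subprocess.run(["tmux", "has-session", "-t", "=" + name],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def missing_sessions(names: Iterable[str] = SESSIONS,
                     has_session: Callable[[str], bool] = tmux_has_session) -> List[str]:
    """Names of expected sessions that are not currently running."""
    return [n for n in names if not has_session(n)]


def relaunch(launch_sh: str = "cc-ci-plan/launch.sh") -> None:
    """Equivalent of: RESUME_PHASE=1 cc-ci-plan/launch.sh start"""
    env: Dict[str, str] = dict(os.environ, RESUME_PHASE="1")
    subprocess.run([launch_sh, "start"], env=env, check=True)
```

Passing a fake has_session lets the logic be exercised without tmux. Whether launch.sh start is safe to rerun while some loops are still alive is an open question, so confirm that before calling relaunch() automatically.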

Access & credentials (pointers only — values are gitignored)

  • .testenv (NOT committed): Tailscale auth key + Gitea bot creds. Load with set -a; . .testenv; set +a (never echo the values).
  • cc-ci: ssh cc-ci (root) tunnels through the persistent userspace-tailscaled SOCKS proxy on 127.0.0.1:1055 (cc-ci-tailscaled.service). If down: sudo systemctl restart cc-ci-tailscaled.
  • Incus/VM fallback: mTLS certs at /srv/incus-terraform-nix-vm-creator/terraform-secrets/; b1 is on the same tailnet (reach via the same proxy). See kickoff "Fallback".
  • Full credential map + how to use each: plan.md §1.5.
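For helper scripts written in Python, a rough equivalent of set -a; . .testenv; set +a is sketched below. It assumes .testenv contains only plain KEY=VALUE lines (optionally prefixed with export, with simple quoting). Unlike sourcing, it performs no shell expansion. It returns only the variable names, so nothing that logs its result can leak a value.

```python
import os
from pathlib import Path
from typing import Dict, List


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, optionally prefixed with 'export '.

    Blank lines, '#' comments and lines without '=' are skipped; one layer
    of matching single or double quotes is stripped. No shell expansion.
    """
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_testenv(path: str = ".testenv") -> List[str]:
    """Export the file's variables into os.environ; return only the key names."""
    values = parse_env_file(Path(path).read_text())
    os.environ.update(values)
    return sorted(values)  # safe to print: names only, never values
```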

Hard rule

Never commit secret values. .testenv, *.tfstate, *.key/*.pem, and the loop runtime/clone dirs are gitignored. Reference secret locations, never their contents (plan.md §9).
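A pre-commit style guard for this rule can be sketched as below, assuming that matching on file and directory names is enough. The patterns mirror the list above plus the .sops/, cc-ci-secrets/ and .claude/ entries from the .gitignore hardening. They are not read from .gitignore itself, so they have to be kept in sync by hand.

```python
import fnmatch
import subprocess
from pathlib import PurePosixPath
from typing import Iterable, List

# Hand-maintained copies of the secret patterns; not parsed from .gitignore.
SECRET_PATTERNS = (".testenv", "*.tfstate", "*.key", "*.pem")
SECRET_DIRS = (".sops", "cc-ci-secrets", ".claude")


def secret_paths(paths: Iterable[str]) -> List[str]:
    """Return the paths that look like secrets by file name or parent directory."""
    flagged = []
    for p in paths:
        parts = PurePosixPath(p).parts
        name = parts[-1] if parts else ""
        by_name = any(fnmatch.fnmatch(name, pat) for pat in SECRET_PATTERNS)
        by_dir = any(part in SECRET_DIRS for part in parts[:-1])
        if by_name or by_dir:
            flagged.append(p)
    return flagged


def staged_paths() -> List[str]:
    """Paths currently staged in the git index."""
    out = subprocess.run(["git", "diff", "--cached", "--name-only"],
                         capture_output=True, text=True, check=True).stdout
    return [line for line in out.splitlines() if line]
```

A hook could, for example, exit non-zero whenever secret_paths(staged_paths()) is non-empty. This complements .gitignore rather than replacing it: it catches files that were force-added (git add -f) past the ignore rules.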