agent-orchestrator/README.md
autonomic-bot 289ef07df4 feat: agent-orchestrator v0.1.0 — generic multi-agent harness
Extracted and generalized from a project-specific agent launch engine. No project
specifics remain in code: paths, the loop kickoff preamble, handoff conventions, and the
on-complete hook are all config/template driven; session_prefix + log_dir are required.

- agents.py: driver + watchdog (data-driven backends via prompt_delivery arg|ping|exec;
  required session_prefix/log_dir; project-rooted path resolution; configurable kickoff
  template, handoff patterns, on_complete task; tmux-safe; selftest + init verbs)
- agent-log.py: config-driven claude transcript renderer
- agents.example.toml: self-contained 2-agent example (dependency-free demo backend)
- prompts/: generic builder/adversary/kickoff templates
- smoke.sh: isolated up+down sandbox proof that cleans up after itself
- flake.nix/.lock: devShell (python311 + tmux + git)
- README.md: schema + verbs + AI-PO usage + nix

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 18:39:00 +00:00


# agent-orchestrator
A generic, reusable harness for running and supervising a fleet of AI-agent sessions in **tmux**.
One driver script + one declarative config (`agents.toml`) describe every agent — a Builder /
Adversary loop pair, a persistent supervisor, a one-shot task — and a **watchdog** keeps them
alive, healed, paced, and coordinated. The watchdog reads the same config every tick, so there is
never any env-vs-file drift.

Nothing about any particular project lives in this repo. Paths, the loop **kickoff preamble**, the
**handoff conventions**, and the **on-complete** hook are all supplied by the project's config and
prompt files. A project consumes this repo as a pinned **git submodule** (`engine/`) and keeps its
own config, prompts, state, and tmux namespace — total isolation between projects.
```
agents.py            the driver + watchdog (pure Python stdlib; needs python >= 3.11 for tomllib)
agent-log.py         render claude JSONL transcripts into clean, greppable logs
agents.example.toml  a self-contained 2-agent example project
prompts/             generic role + kickoff templates (builder / adversary / kickoff)
smoke.sh             bring the example up + tear it down in an isolated sandbox, then clean up
flake.nix/.lock      a Nix devShell with the runtime deps (python311, tmux, git)
```
---
## Quick start
```bash
nix develop # python311 + tmux + git on PATH (see "Nix" below)
python3 agents.py selftest # regression-test the activity detector (no config)
python3 agents.py status --config agents.example.toml # one table: every agent + the phase
./smoke.sh # prove up/down works end-to-end, isolated + clean
python3 agents.py init myproject # scaffold a starter agents.toml + prompts/
```
`up` is **use-or-create**: an already-running session is left alone, never double-started.
```bash
python3 agents.py --config agents.toml up # start all enabled agents + services + watchdog
python3 agents.py --config agents.toml up builder # start just one agent (by name)
python3 agents.py --config agents.toml down # stop everything
python3 agents.py --config agents.toml logs builder # tail one session's log
python3 agents.py --config agents.toml phase show # where the loop phase machine is
```
`--config` defaults to `./agents.toml`, falling back to one next to `agents.py`.

---
## The config: `agents.toml`
Five section types: `[watchdog]`, `[backend.<name>]`, `[defaults]`, `[[agent]]` / `[[service]]`,
and `[loop]`. See `agents.example.toml` for a complete, runnable example.
### `[watchdog]` — global supervisor cadence
```toml
[watchdog]
signal_interval = 30 # seconds between light checks (handoff / stall / limit)
heavy_interval = 300 # seconds between heal + phase-advance checks
limit_probe_fallback = 300 # re-probe cadence for a usage-limited agent when reset time is unparsable
limit_reset_slack = 45 # seconds to wait past a parsed reset before probing
stall_grace = 180 # seconds of slack past a WAITING-UNTIL marker before a stall reboot
```
### `[defaults]` — inherited by every agent
```toml
[defaults]
session_prefix = "myproj-" # REQUIRED: tmux namespace for this project. No implicit default.
log_dir = ".ao-state" # REQUIRED: logs + state/. Relative paths resolve against the config dir.
backend = "claude"
model = "claude-sonnet-4-6"
dir = "." # default working dir for agents (relative → project dir)
watch = "heal" # none | heal | heal+stall
project_dir = "." # OPTIONAL: project root for resolving prompts/paths (default: config's dir)
```
`session_prefix` and `log_dir` are **required** — the harness has no project-specific fallbacks.
Every relative path (`log_dir`, an agent's `dir`, `handoff.repo`, prompt/template files) resolves
against `project_dir`, which defaults to the directory holding the config file. When the config
lives in a sandbox but the prompts live elsewhere (as `smoke.sh` does), set `project_dir`
explicitly.
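The resolution rules above can be sketched in a few lines. This is a minimal illustration, not the code in `agents.py`: it assumes the config has already been parsed into a dict (for example with `tomllib.load`), and the helper names `resolve_defaults` and `resolve_path` are hypothetical.

```python
from pathlib import Path

# The two keys the README marks as REQUIRED under [defaults].
REQUIRED_DEFAULTS = ("session_prefix", "log_dir")


def resolve_path(project_dir, value):
    """Absolute paths pass through; relative ones are anchored at project_dir."""
    path = Path(value)
    return path if path.is_absolute() else (Path(project_dir) / path).resolve()


def resolve_defaults(cfg, config_path):
    """Validate [defaults] and resolve project_dir and log_dir to absolute paths."""
    defaults = cfg.get("defaults", {})
    missing = [key for key in REQUIRED_DEFAULTS if not defaults.get(key)]
    if missing:
        raise ValueError("[defaults] is missing required keys: " + ", ".join(missing))

    config_dir = Path(config_path).resolve().parent
    # project_dir defaults to the config's own directory. Resolving a relative
    # project_dir against that directory is an assumption of this sketch.
    project_dir = (config_dir / defaults.get("project_dir", ".")).resolve()
    return {
        "session_prefix": defaults["session_prefix"],
        "project_dir": project_dir,
        "log_dir": resolve_path(project_dir, defaults["log_dir"]),
    }
```

With a config at `/srv/myproj/agents.toml` and `log_dir = ".ao-state"`, this sketch would place logs under `/srv/myproj/.ao-state`; setting `project_dir` moves every relative path with it.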
### `[backend.<name>]` — backends declared as data
A backend is fully described by config — no code change to add one. The one field that selects
behavior is `prompt_delivery`:
| `prompt_delivery` | how the kickoff reaches the agent | example |
|---|---|---|
| `"arg"` | passed as a CLI argument (claude-style) | `claude … "$(cat kickoff)"` |
| `"ping"` | typed in after a TUI connects (opencode-style) | attach, wait, send-keys |
| `"exec"` | a plain command; the prompt is written to a file | generic / demo |
```toml
[backend.claude]
bin = "claude"
flags = "--dangerously-skip-permissions"
remote_control = true # add a --remote-control <session> flag
supports_resume = true # honor an agent's resume=true
prompt_delivery = "arg"
process_name = "claude" # the pane process a healthy session runs (backend-mismatch healing)
submit_key = "Enter" # key to submit a typed message
stall_idle = 300 # seconds idle before a heal+stall agent is rebooted
active_re = "esc to interrupt|Running tool|· \\d+" # pane shows the agent is WORKING
limit_re = "usage limit|limit reached|reached your .*limit" # usage/rate-limit banner
fatal_re = "redacted_thinking|cannot be modified" # unrecoverable session state → kill + restart
[backend.opencode] # a TUI backend
bin = "opencode"
attach = "{bin} attach {server} --dir {dir}"
server = "http://127.0.0.1:4096"
prompt_delivery = "ping"
process_name = "opencode"
footer_ui = true # a static footer lingers after a turn → only the bottom = activity
log_grace = 180 # within this many seconds of a log write, treat as active
connect_delay = 12 # seconds to wait for the TUI before typing
submit_key = "C-m"
model_env = true # pass the model via OPENCODE_CONFIG_CONTENT
preamble = "set -a; . ./.env; set +a" # shell run before launch (e.g. load creds)
active_re = "esc interrupt|thinking|running tool|preparing patch"
limit_re = "usage limit|limit reached"
[backend.demo] # a dependency-free backend for testing the harness mechanics
bin = "echo '[demo] {session} up'; exec sleep 1000000"
prompt_delivery = "exec" # {kickoff}=prompt file, {session}=session name, {model}=model
```
For an `"arg"` backend the flag *templates* are configurable (so you can point at a non-claude
CLI): `resume_flag` (default `--resume '{id}'`), `model_flag` (default `--model '{model}'`),
`remote_control_flag` (default `--remote-control '{session}'`). A backend that sets `process_name`
participates in backend-mismatch healing; one that doesn't (e.g. `demo`) never does.
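As an illustration of how the flag templates could combine into a launch command, here is a hedged sketch. The function name `build_arg_command`, the order of the flags, and the use of `shlex.quote` for the prompt are assumptions; only the template keys and their defaults come from the text above.

```python
import shlex

# Defaults as documented for "arg" backends.
DEFAULT_FLAG_TEMPLATES = {
    "resume_flag": "--resume '{id}'",
    "model_flag": "--model '{model}'",
    "remote_control_flag": "--remote-control '{session}'",
}


def build_arg_command(backend, *, session, model, kickoff_text, resume_id=None):
    """Assemble a shell command line for a prompt_delivery = "arg" backend."""

    def template(name):
        # A backend block may override any template; otherwise use the default.
        return backend.get(name, DEFAULT_FLAG_TEMPLATES[name])

    parts = [backend["bin"]]
    if backend.get("flags"):
        parts.append(backend["flags"])
    if model:
        parts.append(template("model_flag").format(model=model))
    if backend.get("remote_control"):
        parts.append(template("remote_control_flag").format(session=session))
    if resume_id and backend.get("supports_resume"):
        parts.append(template("resume_flag").format(id=resume_id))
    # The kickoff prompt travels as the final CLI argument, shell-quoted.
    parts.append(shlex.quote(kickoff_text))
    return " ".join(parts)
```

A non-claude CLI would only need different templates, for example `model_flag = "-m {model}"`, with no change to the builder itself.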
### `[[agent]]` — one block per agent
```toml
[[agent]]
name = "builder" # tmux session defaults to <session_prefix><name>; override with session=
kind = "loop" # loop | persistent | task
backend = "claude" # overrides defaults.backend
model = "claude-opus-4-8" # overrides defaults.model
dir = "." # working dir (relative → project dir)
role = "builder" # loop agents only: role prompt = <roles_dir>/<role>.md
resume = true # (arg backends with supports_resume) --resume <state/<name>.id>
watch = "heal+stall" # none | heal | heal+stall
enabled = true # false = not started by a bare `up`, not supervised
wake = { interval = 3600, prompt_file = "prompts/supervise.md" } # periodic nudge
prompt = """inline startup text""" # persistent/task agents; OR prompt_file = "path.md"
log_signature = "PROJECT PHASE" # optional: disambiguate agents that share a dir (agent-log.py)
```
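The `name`, `session`, and `enabled` fields interact in a way that is easy to sketch. The helper names below are hypothetical, and treating a named `up` as starting an agent even when `enabled = false` is an assumption inferred from the comment on `enabled`.

```python
def session_name(agent, defaults):
    """tmux session: an explicit session= wins, else <session_prefix><name>."""
    return agent.get("session") or defaults["session_prefix"] + agent["name"]


def agents_to_start(cfg, names=()):
    """Pick the agents a given `up` invocation would start."""
    agents = cfg.get("agent", [])
    if names:
        # Named `up`: start exactly the requested agents (assumed to ignore `enabled`).
        wanted = set(names)
        return [agent for agent in agents if agent["name"] in wanted]
    # Bare `up`: every agent that is not explicitly disabled.
    return [agent for agent in agents if agent.get("enabled", True)]
```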
| kind | prompt source | typical `watch` |
|---|---|---|
| `loop` | auto-built: kickoff template + `<roles_dir>/<role>.md` | `heal+stall` |
| `persistent` | `prompt` / `prompt_file` (+ optional `resume`, `wake`) | `heal` |
| `task` | `prompt` / `prompt_file` (runs once, then idles) | `none`, `enabled=false` |

**`watch` policy:**
| value | behavior |
|---|---|
| `none` | ignored by the watchdog entirely |
| `heal` | restart if the session is dead, FATAL-wedged, or running the wrong backend; pause all healing while inside a usage-limit window; **never** reboot just for being idle |
| `heal+stall` | everything in `heal`, **plus** reboot if idle past `stall_idle` — respecting any `WAITING-UNTIL: <ISO-8601>` self-wake marker the agent prints as its last line |
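The policy table can be read as a small decision function. The sketch below is one interpretation, not the watchdog's actual code: `Observation`, `decide`, and `parse_waiting_until` are hypothetical names, and the rule that a `WAITING-UNTIL` marker defers a stall reboot until `stall_grace` seconds past the marker is an assumption based on the `[watchdog]` comments.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Observation:
    """What one tick learned about a session (all fields are illustrative)."""
    alive: bool = True
    fatal: bool = False
    wrong_backend: bool = False
    limited: bool = False
    idle_seconds: float = 0.0
    waiting_until: Optional[datetime] = None


def parse_waiting_until(last_line):
    """Parse 'WAITING-UNTIL: <ISO-8601>'; return None if the line is anything else."""
    prefix = "WAITING-UNTIL:"
    if not last_line.startswith(prefix):
        return None
    try:
        when = datetime.fromisoformat(last_line[len(prefix):].strip())
    except ValueError:
        return None
    # Assume UTC when the agent printed no offset.
    return when if when.tzinfo else when.replace(tzinfo=timezone.utc)


def decide(watch, obs, now, *, stall_idle=300, stall_grace=180):
    """Return 'ignore', 'wait', 'restart', or 'ok' for one agent."""
    if watch == "none":
        return "ignore"
    if obs.limited:
        return "wait"  # healing is paused inside a usage-limit window
    if not obs.alive or obs.fatal or obs.wrong_backend:
        return "restart"
    if watch != "heal+stall" or obs.idle_seconds <= stall_idle:
        return "ok"  # plain `heal` never reboots for idleness
    if obs.waiting_until is not None:
        if now <= obs.waiting_until + timedelta(seconds=stall_grace):
            return "ok"  # the agent announced a self-wake; give it slack
    return "restart"
```

Keeping the decision pure (inputs in, verdict out) makes each row of the table testable without a live tmux session.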
### `[[service]]` — non-AI helper processes
```toml
[[service]]
name = "cleanlogs"
command = "python3 agent-log.py follow-all"
```
Started by a bare `up`, killed by `down`. Just a supervised command in a tmux session.
### `[loop]` — the phase state machine (governs `kind="loop"` agents)
```toml
[loop]
state_file = "phase-idx" # under <log_dir>/state/
resume_phase = true # keep the phase index across restarts (don't reset to 0)
auto_advance = true # advance when the current phase's status file says done_marker
done_marker = "## DONE"
kickoff_template = "prompts/kickoff.md" # project preamble; slots {phase_id}/{plan}/{status}/{role}
roles_dir = "prompts" # role prompt = <roles_dir>/<role>.md
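# handoff uses dotted keys: a TOML 1.0 inline table (what tomllib parses) cannot span lines
handoff.repo = "."
handoff.claim_pings = "adversary"
handoff.review_pings = "builder"
handoff.inboxes = ["ADVERSARY-INBOX.md", "BUILDER-INBOX.md"]
handoff.claim_pattern = "^claim"
handoff.review_pattern = "^review"
handoff.state_subdir = "machine-docs"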
on_complete = { trigger_file = ".run-on-complete", run = "reporter" } # run task agent on completion
phases = [
{ id = "p1", plan = "plans/p1.md", status = "STATUS-p1.md" },
{ id = "p2", plan = "plans/p2.md", status = "STATUS-p2.md", models = { builder = "claude-opus-4-8" } },
]
```
- **Kickoff template.** A loop agent's prompt is `kickoff_template` (with `{phase_id}`, `{plan}`,
`{status}`, `{role}` substituted from the current phase) followed by `<roles_dir>/<role>.md`.
Both are project files; this repo ships generic starters in `prompts/`. There is no built-in
preamble text.
- **Per-phase model override.** A phase's `models = { builder = "...", adversary = "..." }`
overrides those agents' model for just that phase (matched on the agent's `role`).
- **Auto-advance.** Each heavy tick, if the current phase's `status` file (looked up in
`handoff.repo`'s `state_subdir/` then its root) contains a real `done_marker` — not a "Not
yet…" placeholder — the watchdog stops the loops, bumps the phase index, and restarts them on
the next phase. After the last phase it writes a `SEQUENCE-COMPLETE` marker under `log_dir` and
stops the loops (idempotent — no churn). Appending a phase later clears the stale marker and
resumes. On completion, an optional `on_complete.run` task agent fires if its `trigger_file`
exists under `log_dir`.
- **Handoff signalling.** The watchdog watches `handoff.repo`'s `origin/main` for commits whose
subject matches `claim_pattern` / `review_pattern`, and watches the two `inboxes` files. When a
claim lands it pings the `claim_pings` agent; a review pings `review_pings`; an inbox change
pings the relevant side. This is how the Builder and Adversary coordinate purely through git.
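Two of the mechanisms above, kickoff rendering and the auto-advance check, can be sketched as plain functions. Everything here is illustrative: the function names are hypothetical, "a real `done_marker`" is assumed to mean a line consisting of the marker alone, and keeping the index on the last phase once the sequence completes is also an assumption.

```python
from pathlib import Path


def render_kickoff(template_text, role_text, phase, role):
    """Fill the four documented slots, then append the role prompt."""
    slots = {
        "phase_id": phase["id"],
        "plan": phase["plan"],
        "status": phase["status"],
        "role": role,
    }
    # Plain replacement, so other braces in a template are left untouched.
    for name, value in slots.items():
        template_text = template_text.replace("{" + name + "}", value)
    return template_text.rstrip("\n") + "\n\n" + role_text


def find_status_file(repo, state_subdir, status_name):
    """Look in <repo>/<state_subdir>/ first, then in the repo root."""
    for candidate in (Path(repo) / state_subdir / status_name, Path(repo) / status_name):
        if candidate.is_file():
            return candidate
    return None


def phase_is_done(status_path, done_marker="## DONE"):
    """True only if some line is exactly the marker (assumed placeholder rule)."""
    if status_path is None:
        return False
    lines = Path(status_path).read_text(encoding="utf-8").splitlines()
    return any(line.strip() == done_marker for line in lines)


def next_phase(phase_index, phase_count):
    """Return (new_index, sequence_complete) after the current phase finishes."""
    if phase_index + 1 < phase_count:
        return phase_index + 1, False
    return phase_index, True
```

Under these assumptions a status file containing `## DONE: Not yet, still building` does not trigger an advance, while a bare `## DONE` line does.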
---
## Config vs state
- **Config** = `agents.toml` — declarative, version-controlled, the only source of truth.
- **State** = `<log_dir>/state/` — machine-written runtime only: `phase-idx` (current phase),
`<name>.id` (resume id), `limited-<session>.json` (active usage-limit window),
`kickoff-<session>.txt` (the exact prompt last sent). Git-ignore your `log_dir`.
- **Env** = a one-off override for a *single* invocation only: `AGENT_MODEL_<name>=…` /
`AGENT_BACKEND_<name>=…`. The persisted watchdog ignores env and re-reads the file every tick —
deliberately, so env-vs-file drift can never silently revert a backend.
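The split between file and env can be pictured with a small lookup helper. This is a sketch only: `effective_setting` and its `honor_env` switch are hypothetical, and using the agent name verbatim in the variable name is an assumption based on the `AGENT_MODEL_<name>` pattern above.

```python
import os


def effective_setting(agent, defaults, field, *, honor_env, environ=None):
    """Resolve an agent's `model` or `backend`.

    honor_env=True models a one-off CLI invocation; the watchdog would pass
    honor_env=False so that only the config file decides.
    """
    environ = os.environ if environ is None else environ
    if honor_env:
        override = environ.get(f"AGENT_{field.upper()}_{agent['name']}")
        if override:
            return override
    # The agent's own field wins over [defaults].
    return agent.get(field, defaults.get(field))
```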
---
## The driver: verbs
The recommended verb set. The list is a baseline rather than a closed contract: an AI
project-orchestrator can rely on these being present, and a harness is free to add more:
```
agents.py up [name…]               start enabled agents (+ services + watchdog); use-or-create
agents.py down [name…]             stop agents/services/watchdog (all, or named)
agents.py status                   table of every agent: kind, backend, model, watch, state, phase
agents.py watchdog                 the supervisor loop (what the <prefix>watchdog session runs)
agents.py logs <name>              tail that session's log
agents.py phase [show|next|set N]  inspect / move the loop phase index
agents.py selftest                 regression-test the backend activity detector (needs no config)
agents.py init [dir]               scaffold a starter agents.toml + prompts/ in a project dir
  --config PATH                    use a specific config (default: ./agents.toml, else the one next to agents.py)
```
### The watchdog tick
`agents.py watchdog` runs as the `<prefix>watchdog` tmux session and **re-reads the config every
tick**. Each loop:
- **signal tick** (`signal_interval`): handoff pings; for each watched agent the usage-limit check,
and for `heal+stall` agents the stall check; fire any due `wake`.
- **heavy tick** (`heavy_interval`): advance the loop phase if the current one is done; otherwise
heal each watched agent per its `watch` policy. When the sequence is complete the finished loops
stay stopped, but persistent agents stay supervised.

**Usage-limit handling:** when an agent prints a limit banner, the watchdog parses the reset time,
arms a quiet window (never rebooting a limited agent), and at the end sends one probe to resume it
— re-arming if the banner re-prints.
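The timing of the quiet window follows from two `[watchdog]` settings, `limit_reset_slack` and `limit_probe_fallback`. A minimal sketch, assuming the banner has already been parsed into a `datetime` (or `None` when unparsable); the function names are hypothetical and banner parsing itself is out of scope here.

```python
from datetime import timedelta


def next_probe_time(now, reset_time, *, limit_reset_slack=45, limit_probe_fallback=300):
    """When to send the single resume probe to a usage-limited agent."""
    if reset_time is None:
        # Reset time could not be parsed: re-probe on the fallback cadence.
        return now + timedelta(seconds=limit_probe_fallback)
    # Wait a little past the advertised reset before probing.
    return reset_time + timedelta(seconds=limit_reset_slack)


def limit_action(now, probe_at):
    """Inside the window the agent is left alone; at or after probe_at, probe once."""
    return "probe" if now >= probe_at else "quiet"
```

If the banner re-prints after the probe, the same calculation would simply run again with the new reset time, which matches the re-arming behavior described above.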
---
## Driving the harness from an AI project-orchestrator
This harness is designed to be driven by an AI "project-orchestrator" (PO) that creates and runs
many projects, each pinning its own copy of this engine. The contract is intentionally **not
rigid** — the PO reads these docs and works out how to drive a project. What it can rely on:
1. **One config, one driver.** Everything the PO needs to know about a project's agents is in that
project's `agents.toml`; everything it can *do* is a verb above. To inspect, `status`. To start
or stop, `up` / `down`. To move the phase, `phase`.
2. **Isolation by `session_prefix`.** Two projects never collide as long as their `session_prefix`
values differ. The PO assigns each project a unique prefix at creation.
3. **State is on disk, not in the PO.** Phase index, resume ids and limit windows live under the
project's `log_dir`. The PO can restart a project (or the whole host) and the watchdog resumes
from there.
4. **Knowledge is one-directional.** A project repo contains nothing about the PO or the fleet —
it can be run by hand and would have no idea a PO exists. The PO's fleet registry is the only
record of which projects exist and at what engine ref. This repo never reaches "up" toward a PO.
5. **Submodule pin = the engine version.** A project pins this repo at a tag (e.g. `v0.1.0`) as a
submodule under `engine/`. Bumping is per-project and opt-in (`git submodule update --remote`);
one project's bump can't break another.

A minimal project layout the PO scaffolds:
```
my-project/        # its own repo; knows nothing about the PO
  agents.toml      # harness config (this schema)
  engine/          # this repo as a pinned submodule
  prompts/         # role prompts + kickoff template
  machine-docs/    # the loop pair's coordination files (STATUS/REVIEW/inboxes)
  .ao-state/       # runtime state + logs (gitignored)
  .env             # project creds (never in git)
```
Run it by hand with `engine/agents.py up --config agents.toml`.

---
## Nix
A `flake.nix` provides a reproducible devShell with the runtime deps (`python311` for stdlib
`tomllib`, plus `tmux` and `git`):
```bash
nix develop # enter the shell
nix develop -c python3 agents.py selftest # or run one command in it
nix flake check # evaluate + build the devShell
```
The agent CLIs themselves (`claude`, `opencode`) are **external, non-Nix tools** — install them
per their own docs and make sure they are on `PATH` before launching live agents. The devShell
documents this in its banner.

---
## Adding things
- **Add an agent** — add an `[[agent]]` block; `agents.py up <name>`. No code change.
- **Add a backend** — add a `[backend.<name>]` block (`bin`, `prompt_delivery`, the regexes);
point an agent at it with `backend = "<name>"`.
- **Add / append a phase** — add an entry to `[loop].phases`; the watchdog advances into it
automatically (clearing a stale `SEQUENCE-COMPLETE` if the sequence had finished).
- **Change a model or backend** — edit the field (or a phase's `models = {}`), then
`agents.py down <name> && agents.py up <name>`. The watchdog re-reads the file; it won't fight you.