autonomic-bot 289ef07df4 feat: agent-orchestrator v0.1.0 — generic multi-agent harness
Extracted and generalized from a project-specific agent launch engine. No project
specifics remain in code: paths, the loop kickoff preamble, handoff conventions, and the
on-complete hook are all config/template driven; session_prefix + log_dir are required.

- agents.py: driver + watchdog (data-driven backends via prompt_delivery arg|ping|exec;
  required session_prefix/log_dir; project-rooted path resolution; configurable kickoff
  template, handoff patterns, on_complete task; tmux-safe; selftest + init verbs)
- agent-log.py: config-driven claude transcript renderer
- agents.example.toml: self-contained 2-agent example (dependency-free demo backend)
- prompts/: generic builder/adversary/kickoff templates
- smoke.sh: isolated up+down sandbox proof that cleans up after itself
- flake.nix/.lock: devShell (python311 + tmux + git)
- README.md: schema + verbs + AI-PO usage + nix

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 18:39:00 +00:00

agent-orchestrator

A generic, reusable harness for running and supervising a fleet of AI-agent sessions in tmux. One driver script + one declarative config (agents.toml) describe every agent — a Builder / Adversary loop pair, a persistent supervisor, a one-shot task — and a watchdog keeps them alive, healed, paced, and coordinated. The watchdog reads the same config every tick, so there is never any env-vs-file drift.

Nothing about any particular project lives in this repo. Paths, the loop kickoff preamble, the handoff conventions, and the on-complete hook are all supplied by the project's config and prompt files. A project consumes this repo as a pinned git submodule (engine/) and keeps its own config, prompts, state, and tmux namespace — total isolation between projects.

agents.py            the driver + watchdog (pure Python stdlib; needs python >= 3.11 for tomllib)
agent-log.py         render claude JSONL transcripts into clean, greppable logs
agents.example.toml  a self-contained 2-agent example project
prompts/             generic role + kickoff templates (builder / adversary / kickoff)
smoke.sh             bring the example up + tear it down in an isolated sandbox, then clean up
flake.nix/.lock      a Nix devShell with the runtime deps (python311, tmux, git)

Quick start

nix develop                                   # python311 + tmux + git on PATH (see "Nix" below)

python3 agents.py selftest                    # regression-test the activity detector (no config)
python3 agents.py status --config agents.example.toml   # one table: every agent + the phase
./smoke.sh                                    # prove up/down works end-to-end, isolated + clean

python3 agents.py init myproject              # scaffold a starter agents.toml + prompts/

up is use-or-create: an already-running session is left alone, never double-started.

python3 agents.py --config agents.toml up           # start all enabled agents + services + watchdog
python3 agents.py --config agents.toml up builder    # start just one agent (by name)
python3 agents.py --config agents.toml down          # stop everything
python3 agents.py --config agents.toml logs builder  # tail one session's log
python3 agents.py --config agents.toml phase show    # where the loop phase machine is

--config defaults to ./agents.toml, falling back to one next to agents.py.
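The lookup order above can be sketched in a few lines. This is only an illustration of the documented fallback, not the code in agents.py; the function name `resolve_config` and its parameters are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def resolve_config(explicit: str | None, cwd: Path, script_dir: Path) -> Path:
    """Pick a config file: --config wins, then ./agents.toml, then the copy beside the driver."""
    if explicit is not None:
        return Path(explicit)
    local = cwd / "agents.toml"
    if local.is_file():
        return local
    # Fallback: the agents.toml sitting next to agents.py (script_dir is assumed to be that dir).
    return script_dir / "agents.toml"
```

Passing `cwd` and `script_dir` in explicitly keeps the sketch testable without changing the process working directory.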


The config: agents.toml

Five section types: [watchdog], [backend.<name>], [defaults], [[agent]] / [[service]], and [loop]. See agents.example.toml for a complete, runnable example.

[watchdog] — global supervisor cadence

[watchdog]
signal_interval      = 30    # seconds between light checks (handoff / stall / limit)
heavy_interval       = 300   # seconds between heal + phase-advance checks
limit_probe_fallback = 300   # re-probe cadence for a usage-limited agent when reset time is unparsable
limit_reset_slack    = 45    # seconds to wait past a parsed reset before probing
stall_grace          = 180   # seconds of slack past a WAITING-UNTIL marker before a stall reboot

[defaults] — inherited by every agent

[defaults]
session_prefix = "myproj-"   # REQUIRED: tmux namespace for this project. No implicit default.
log_dir        = ".ao-state" # REQUIRED: logs + state/. Relative paths resolve against project_dir.
backend        = "claude"
model          = "claude-sonnet-4-6"
dir            = "."         # default working dir for agents (relative → project dir)
watch          = "heal"      # none | heal | heal+stall
project_dir    = "."         # OPTIONAL: project root for resolving prompts/paths (default: config's dir)

session_prefix and log_dir are required — the harness has no project-specific fallbacks. Every relative path (log_dir, an agent's dir, handoff.repo, prompt/template files) resolves against project_dir, which defaults to the directory holding the config file. When the config lives in a sandbox but the prompts live elsewhere (as smoke.sh does), set project_dir explicitly.

[backend.<name>] — backends declared as data

A backend is fully described by config — no code change to add one. The one field that selects behavior is prompt_delivery:

prompt_delivery   how the kickoff reaches the agent                   example
"arg"             passed as a CLI argument (claude-style)             claude … "$(cat kickoff)"
"ping"            typed in after a TUI connects (opencode-style)      attach, wait, send-keys
"exec"            a plain command; the prompt is written to a file    generic / demo

[backend.claude]
bin             = "claude"
flags           = "--dangerously-skip-permissions"
remote_control  = true          # add a --remote-control <session> flag
supports_resume = true          # honor an agent's resume=true
prompt_delivery = "arg"
process_name    = "claude"      # the pane process a healthy session runs (backend-mismatch healing)
submit_key      = "Enter"       # key to submit a typed message
stall_idle      = 300           # seconds idle before a heal+stall agent is rebooted
active_re = "esc to interrupt|Running tool|· \\d+"   # pane shows the agent is WORKING
limit_re  = "usage limit|limit reached|reached your .*limit"   # usage/rate-limit banner
fatal_re  = "redacted_thinking|cannot be modified"  # unrecoverable session state → kill + restart

[backend.opencode]              # a TUI backend
bin             = "opencode"
attach          = "{bin} attach {server} --dir {dir}"
server          = "http://127.0.0.1:4096"
prompt_delivery = "ping"
process_name    = "opencode"
footer_ui       = true          # a static footer lingers after a turn → only the bottom = activity
log_grace       = 180           # within this many seconds of a log write, treat as active
connect_delay   = 12            # seconds to wait for the TUI before typing
submit_key      = "C-m"
model_env       = true          # pass the model via OPENCODE_CONFIG_CONTENT
preamble        = "set -a; . ./.env; set +a"   # shell run before launch (e.g. load creds)
active_re = "esc interrupt|thinking|running tool|preparing patch"
limit_re  = "usage limit|limit reached"

[backend.demo]                  # a dependency-free backend for testing the harness mechanics
bin             = "echo '[demo] {session} up'; exec sleep 1000000"
prompt_delivery = "exec"        # {kickoff}=prompt file, {session}=session name, {model}=model

For an "arg" backend the flag templates are configurable (so you can point at a non-claude CLI): resume_flag (default --resume '{id}'), model_flag (default --model '{model}'), remote_control_flag (default --remote-control '{session}'). A backend that sets process_name participates in backend-mismatch healing; one that doesn't (e.g. demo) never does.
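To make the flag templates concrete, here is one way an "arg" launch line could be assembled from a backend block. The function `build_arg_command` and the order in which flags are emitted are assumptions made for illustration; only the template names and their defaults come from the paragraph above.

```python
from __future__ import annotations

import shlex

# Defaults as documented above; a backend block may override any of them.
DEFAULT_FLAGS = {
    "resume_flag": "--resume '{id}'",
    "model_flag": "--model '{model}'",
    "remote_control_flag": "--remote-control '{session}'",
}


def build_arg_command(backend: dict, *, session: str, model: str,
                      kickoff: str, resume_id: str | None = None) -> str:
    """Assemble a shell command line for a prompt_delivery = "arg" backend (sketch)."""
    templates = {**DEFAULT_FLAGS,
                 **{k: v for k, v in backend.items() if k in DEFAULT_FLAGS}}
    parts = [backend["bin"]]
    if backend.get("flags"):
        parts.append(backend["flags"])
    parts.append(templates["model_flag"].format(model=model))
    if backend.get("remote_control"):
        parts.append(templates["remote_control_flag"].format(session=session))
    # Resume only when the backend supports it and an id was saved earlier.
    if resume_id and backend.get("supports_resume"):
        parts.append(templates["resume_flag"].format(id=resume_id))
    # The kickoff text travels as the final CLI argument, shell-quoted.
    parts.append(shlex.quote(kickoff))
    return " ".join(parts)
```

Pointing at a non-claude CLI then only needs different templates in the backend block, for example model_flag = "-m {model}".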

[[agent]] — one block per agent

[[agent]]
name    = "builder"            # tmux session defaults to <session_prefix><name>; override with session=
kind    = "loop"              # loop | persistent | task
backend = "claude"            # overrides defaults.backend
model   = "claude-opus-4-8"   # overrides defaults.model
dir     = "."                 # working dir (relative → project dir)
role    = "builder"           # loop agents only: role prompt = <roles_dir>/<role>.md
resume  = true                # (arg backends with supports_resume) --resume <state/<name>.id>
watch   = "heal+stall"        # none | heal | heal+stall
enabled = true                # false = not started by a bare `up`, not supervised
wake    = { interval = 3600, prompt_file = "prompts/supervise.md" }   # periodic nudge
prompt  = """inline startup text"""          # persistent/task agents; OR prompt_file = "path.md"
log_signature = "PROJECT PHASE"              # optional: disambiguate agents that share a dir (agent-log.py)

kind         prompt source                                             typical watch
loop         auto-built: kickoff template + prompts/<role>.md          heal+stall
persistent   prompt / prompt_file (+ optional resume, wake)            heal
task         prompt (runs once, then idles)                            none, enabled=false
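The session-name default and the prompt source per kind can be read as two small lookups. The helpers below are hypothetical names used only to illustrate the table; they return labels and file paths, not prompt text.

```python
from __future__ import annotations


def session_name(defaults: dict, agent: dict) -> str:
    """tmux session defaults to <session_prefix><name>; an explicit session= wins."""
    return agent.get("session") or defaults["session_prefix"] + agent["name"]


def prompt_source(agent: dict, loop: dict) -> list[str]:
    """Return where an agent's startup prompt comes from, following the kind table."""
    if agent["kind"] == "loop":
        # Loop agents: kickoff template followed by the role prompt.
        return [loop["kickoff_template"], f"{loop['roles_dir']}/{agent['role']}.md"]
    if "prompt_file" in agent:
        return [agent["prompt_file"]]
    # Persistent/task agents with inline text carry the prompt in the config itself.
    return ["<inline prompt>"]
```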

watch policy:

value        behavior
none         ignored by the watchdog entirely
heal         restart if the session is dead, FATAL-wedged, or running the wrong backend; pause all healing while inside a usage-limit window; never reboot just for being idle
heal+stall   everything in heal, plus reboot if idle past stall_idle, respecting any WAITING-UNTIL: <ISO-8601> self-wake marker the agent prints as its last line
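As a sketch of how the three policies differ, the decision could be written as one pure function. Everything here is illustrative: `heal_action`, its return labels, and the treatment of a timezone-less marker as UTC are assumptions, while stall_idle, stall_grace and the WAITING-UNTIL marker come from the config described above.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta, timezone

# Assumed marker shape, taken from the table: "WAITING-UNTIL: <ISO-8601>" on the last line.
WAITING_RE = re.compile(r"WAITING-UNTIL:\s*(\S+)\s*$")


def waiting_until(last_line: str) -> datetime | None:
    """Parse the self-wake marker; return None when it is absent or unparsable."""
    match = WAITING_RE.search(last_line)
    if not match:
        return None
    try:
        stamp = datetime.fromisoformat(match.group(1))
    except ValueError:
        return None
    # Assumption: a timestamp without an offset is treated as UTC.
    return stamp if stamp.tzinfo else stamp.replace(tzinfo=timezone.utc)


def heal_action(watch: str, *, alive: bool, fatal: bool, wrong_backend: bool,
                limited: bool, idle_seconds: float, stall_idle: float,
                last_line: str, now: datetime, stall_grace: float = 180) -> str:
    """Return "ignore", "wait", "restart", "reboot" or "ok" for one agent (now must be tz-aware)."""
    if watch == "none":
        return "ignore"
    if limited:
        return "wait"  # all healing pauses inside a usage-limit window
    if not alive or fatal or wrong_backend:
        return "restart"
    if watch == "heal+stall" and idle_seconds > stall_idle:
        until = waiting_until(last_line)
        # An unexpired marker (plus stall_grace of slack) excuses the idleness.
        if until is not None and now <= until + timedelta(seconds=stall_grace):
            return "ok"
        return "reboot"
    return "ok"  # plain "heal" never reboots just for being idle
```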

[[service]] — non-AI helper processes

[[service]]
name    = "cleanlogs"
command = "python3 agent-log.py follow-all"

Started by a bare up, killed by down. Just a supervised command in a tmux session.

[loop] — the phase state machine (governs kind="loop" agents)

[loop]
state_file       = "phase-idx"          # under <log_dir>/state/
resume_phase     = true                 # keep the phase index across restarts (don't reset to 0)
auto_advance     = true                 # advance when the current phase's status file says done_marker
done_marker      = "## DONE"
kickoff_template = "prompts/kickoff.md" # project preamble; slots {phase_id}/{plan}/{status}/{role}
roles_dir        = "prompts"            # role prompt = <roles_dir>/<role>.md
# dotted keys: a TOML 1.0 inline table cannot span lines
handoff.repo           = "."
handoff.claim_pings    = "adversary"
handoff.review_pings   = "builder"
handoff.inboxes        = ["ADVERSARY-INBOX.md", "BUILDER-INBOX.md"]
handoff.claim_pattern  = "^claim"
handoff.review_pattern = "^review"
handoff.state_subdir   = "machine-docs"
on_complete = { trigger_file = ".run-on-complete", run = "reporter" }   # run task agent on completion
phases = [
  { id = "p1", plan = "plans/p1.md", status = "STATUS-p1.md" },
  { id = "p2", plan = "plans/p2.md", status = "STATUS-p2.md", models = { builder = "claude-opus-4-8" } },
]
  • Kickoff template. A loop agent's prompt is kickoff_template (with {phase_id}, {plan}, {status}, {role} substituted from the current phase) followed by <roles_dir>/<role>.md. Both are project files; this repo ships generic starters in prompts/. There is no built-in preamble text.
  • Per-phase model override. A phase's models = { builder = "...", adversary = "..." } overrides those agents' model for just that phase (matched on the agent's role).
  • Auto-advance. Each heavy tick, if the current phase's status file (looked up in handoff.repo's state_subdir/ then its root) contains a real done_marker — not a "Not yet…" placeholder — the watchdog stops the loops, bumps the phase index, and restarts them on the next phase. After the last phase it writes a SEQUENCE-COMPLETE marker under log_dir and stops the loops (idempotent — no churn). Appending a phase later clears the stale marker and resumes. On completion, an optional on_complete.run task agent fires if its trigger_file exists under log_dir.
  • Handoff signalling. The watchdog watches handoff.repo's origin/main for commits whose subject matches claim_pattern / review_pattern, and watches the two inbox files. When a claim lands it pings the claim_pings agent; a review pings review_pings; an inbox change pings the relevant side. This is how the Builder and Adversary coordinate purely through git.
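The four mechanisms above reduce to a handful of small functions. The sketch below is a guess at their shape, not the code in agents.py: all function names are hypothetical, and treating a "real" done_marker as a line that equals the marker exactly is an assumption about how placeholders are told apart.

```python
from __future__ import annotations

import re

SLOT_RE = re.compile(r"\{(phase_id|plan|status|role)\}")


def render_kickoff(template: str, phase: dict, role: str) -> str:
    """Fill the four documented slots in one pass; any other braces are left alone."""
    slots = {"phase_id": phase["id"], "plan": phase["plan"],
             "status": phase["status"], "role": role}
    return SLOT_RE.sub(lambda m: slots[m.group(1)], template)


def model_for(agent: dict, phase: dict, default_model: str) -> str:
    """Per-phase override (matched on role) beats the agent's model, which beats defaults."""
    override = phase.get("models", {}).get(agent.get("role"))
    return override or agent.get("model") or default_model


def phase_is_done(status_text: str, done_marker: str = "## DONE") -> bool:
    """Assumption: the marker counts only when it stands alone on a line."""
    return any(line.strip() == done_marker for line in status_text.splitlines())


def advance(idx: int, phases: list) -> tuple[int, bool]:
    """Bump the phase index; the flag says whether the sequence is now complete."""
    nxt = idx + 1
    return nxt, nxt >= len(phases)


def ping_target(subject: str, handoff: dict) -> str | None:
    """Map a commit subject to the agent that should be pinged, if any."""
    if re.search(handoff["claim_pattern"], subject):
        return handoff["claim_pings"]
    if re.search(handoff["review_pattern"], subject):
        return handoff["review_pings"]
    return None
```

With the example config, a commit subject starting with "claim" pings the adversary and one starting with "review" pings the builder; any other subject is ignored.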

Config vs state

  • Config = agents.toml — declarative, version-controlled, the only source of truth.
  • State = <log_dir>/state/ — machine-written runtime only: phase-idx (current phase), <name>.id (resume id), limited-<session>.json (active usage-limit window), kickoff-<session>.txt (the exact prompt last sent). Git-ignore your log_dir.
  • Env = a one-off override for a single invocation only: AGENT_MODEL_<name>=… / AGENT_BACKEND_<name>=…. The persisted watchdog ignores env and re-reads the file every tick — deliberately, so env-vs-file drift can never silently revert a backend.
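The env rule can be pictured as a single branch. This is a hypothetical helper, and it assumes the agent name is used verbatim in the variable name; the point is only that the watchdog path never consults the environment.

```python
def effective_model(agent: dict, defaults: dict, env: dict, *, is_watchdog: bool) -> str:
    """Resolve an agent's model for one invocation (illustrative, not the driver's code)."""
    configured = agent.get("model", defaults.get("model"))
    if is_watchdog:
        # The persisted watchdog trusts only the file it re-reads every tick.
        return configured
    # A one-off invocation may override via AGENT_MODEL_<name>.
    return env.get(f"AGENT_MODEL_{agent['name']}", configured)
```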

The driver: verbs

The recommended verb set. It is a convention rather than a rigid contract: an AI project-orchestrator can rely on these verbs being present, and a harness is free to add more:

agents.py up [name…]               start enabled agents (+ services + watchdog); use-or-create
agents.py down [name…]             stop agents/services/watchdog (all, or named)
agents.py status                   table of every agent: kind, backend, model, watch, state, phase
agents.py watchdog                 the supervisor loop (what the <prefix>watchdog session runs)
agents.py logs <name>              tail that session's log
agents.py phase [show|next|set N]  inspect / move the loop phase index
agents.py selftest                 regression-test the backend activity detector (needs no config)
agents.py init [dir]               scaffold a starter agents.toml + prompts/ in a project dir
  --config PATH                    use a specific config (default: ./agents.toml)

The watchdog tick

agents.py watchdog runs as the <prefix>watchdog tmux session and re-reads the config every tick. Each loop:

  • signal tick (signal_interval): send handoff pings; run the usage-limit check for each watched agent and the stall check for heal+stall agents; fire any due wake.
  • heavy tick (heavy_interval): advance the loop phase if the current one is done; otherwise heal each watched agent per its watch policy. When the sequence is complete the finished loops stay stopped, but persistent agents stay supervised.
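The two cadences can be sketched as one loop that re-reads the config on every pass. This is a simplified model, not the watchdog's code: `watchdog_loop`, the callback names, the poll period, and running the heavy tick before the signal tick are all assumptions.

```python
import time


def watchdog_loop(load_config, on_signal, on_heavy, *, clock=time.monotonic,
                  sleep=time.sleep, poll=1.0, max_loops=None):
    """Run signal and heavy ticks at their own intervals, re-reading config each pass."""
    last_signal = last_heavy = float("-inf")  # both ticks fire on the first pass
    loops = 0
    while max_loops is None or loops < max_loops:
        cfg = load_config()  # re-read the file, so edits take effect without a restart
        wd = cfg["watchdog"]
        now = clock()
        if now - last_heavy >= wd["heavy_interval"]:
            on_heavy(cfg)    # phase advance, otherwise heal per watch policy
            last_heavy = now
        if now - last_signal >= wd["signal_interval"]:
            on_signal(cfg)   # handoff pings, limit + stall checks, due wakes
            last_signal = now
        sleep(poll)
        loops += 1
```

Injecting `clock` and `sleep` lets the cadence be exercised with a fake clock instead of waiting in real time.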

Usage-limit handling: when an agent prints a limit banner, the watchdog parses the reset time, arms a quiet window (never rebooting a limited agent), and at the end sends one probe to resume it — re-arming if the banner re-prints.
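The probe timing follows from two [watchdog] settings. A minimal sketch, assuming the reset time has already been parsed into a datetime (or None when it could not be parsed); the function names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def next_probe_at(now: datetime, reset_at: datetime | None, *,
                  limit_reset_slack: int = 45, limit_probe_fallback: int = 300) -> datetime:
    """When to send the next resume probe to a usage-limited agent."""
    if reset_at is None:
        # Unparsable banner: fall back to a fixed re-probe cadence.
        return now + timedelta(seconds=limit_probe_fallback)
    # Parsed reset time: wait a little past it before probing.
    return reset_at + timedelta(seconds=limit_reset_slack)


def should_probe(now: datetime, probe_at: datetime) -> bool:
    """Inside the quiet window the agent is left alone; at its end, probe once."""
    return now >= probe_at
```

If the banner re-prints after a probe, calling `next_probe_at` again with the newly parsed reset time re-arms the window.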


Driving the harness from an AI project-orchestrator

This harness is designed to be driven by an AI "project-orchestrator" (PO) that creates and runs many projects, each pinning its own copy of this engine. The contract is intentionally not rigid — the PO reads these docs and works out how to drive a project. What it can rely on:

  1. One config, one driver. Everything the PO needs to know about a project's agents is in that project's agents.toml; everything it can do is a verb above. To inspect, status. To start or stop, up / down. To move the phase, phase.
  2. Isolation by session_prefix. Two projects never collide as long as their session_prefix values differ. The PO assigns each project a unique prefix at creation.
  3. State is on disk, not in the PO. Phase index, resume ids and limit windows live under the project's log_dir. The PO can restart a project (or the whole host) and the watchdog resumes from there.
  4. Knowledge is one-directional. A project repo contains nothing about the PO or the fleet — it can be run by hand and would have no idea a PO exists. The PO's fleet registry is the only record of which projects exist and at what engine ref. This repo never reaches "up" toward a PO.
  5. Submodule pin = the engine version. A project pins this repo at a tag (e.g. v0.1.0) as a submodule under engine/. Bumping is per-project and opt-in (git submodule update --remote); one project's bump can't break another.

A minimal project layout the PO scaffolds:

my-project/                 # its own repo; knows nothing about the PO
  agents.toml               # harness config (this schema)
  engine/                   # this repo as a pinned submodule
  prompts/                  # role prompts + kickoff template
  machine-docs/             # the loop pair's coordination files (STATUS/REVIEW/inboxes)
  .ao-state/                # runtime state + logs (gitignored)
  .env                      # project creds (never in git)

Run it by hand with engine/agents.py up --config agents.toml.
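A PO written in Python could wrap the driver like this. It is a sketch under the layout shown above (engine/agents.py and agents.toml at the project root); `driver_argv` and `run_verb` are hypothetical names, and placing --config before the verb follows the quick-start examples.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def driver_argv(project: Path, verb: str, *args: str) -> list[str]:
    """Build the argv for one driver verb in one project."""
    return [sys.executable, str(project / "engine" / "agents.py"),
            "--config", str(project / "agents.toml"), verb, *args]


def run_verb(project: Path, verb: str, *args: str, runner=subprocess.run):
    """Run a verb from the project root and capture its output."""
    return runner(driver_argv(project, verb, *args), cwd=project,
                  capture_output=True, text=True, check=False)
```

For example, `run_verb(Path("my-project"), "status").stdout` would hold the status table, and `run_verb(Path("my-project"), "up", "builder")` would start a single agent.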


Nix

A flake.nix provides a reproducible devShell with the runtime deps (python311 for stdlib tomllib, plus tmux and git):

nix develop                                   # enter the shell
nix develop -c python3 agents.py selftest      # or run one command in it
nix flake check                                # evaluate + build the devShell

The agent CLIs themselves (claude, opencode) are external, non-Nix tools — install them per their own docs and make sure they are on PATH before launching live agents. The devShell documents this in its banner.


Adding things

  • Add an agent — add an [[agent]] block; agents.py up <name>. No code change.
  • Add a backend — add a [backend.<name>] block (bin, prompt_delivery, the regexes); point an agent at it with backend = "<name>".
  • Add / append a phase — add an entry to [loop].phases; the watchdog advances into it automatically (clearing a stale SEQUENCE-COMPLETE if the sequence had finished).
  • Change a model or backend — edit the field (or a phase's models = {}), then agents.py down <name> && agents.py up <name>. The watchdog re-reads the file; it won't fight you.