Extracted and generalized from a project-specific agent launch engine. No project specifics remain in code: paths, the loop kickoff preamble, handoff conventions, and the on-complete hook are all config/template driven; session_prefix + log_dir are required.

- agents.py: driver + watchdog (data-driven backends via prompt_delivery arg|ping|exec; required session_prefix/log_dir; project-rooted path resolution; configurable kickoff template, handoff patterns, on_complete task; tmux-safe; selftest + init verbs)
- agent-log.py: config-driven claude transcript renderer
- agents.example.toml: self-contained 2-agent example (dependency-free demo backend)
- prompts/: generic builder/adversary/kickoff templates
- smoke.sh: isolated up+down sandbox proof that cleans up after itself
- flake.nix/.lock: devShell (python311 + tmux + git)
- README.md: schema + verbs + AI-PO usage + nix

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

agent-orchestrator
A generic, reusable harness for running and supervising a fleet of AI-agent sessions in tmux.
One driver script + one declarative config (agents.toml) describe every agent — a Builder /
Adversary loop pair, a persistent supervisor, a one-shot task — and a watchdog keeps them
alive, healed, paced, and coordinated. The watchdog reads the same config every tick, so there is
never any env-vs-file drift.
Nothing about any particular project lives in this repo. Paths, the loop kickoff preamble, the
handoff conventions, and the on-complete hook are all supplied by the project's config and
prompt files. A project consumes this repo as a pinned git submodule (engine/) and keeps its
own config, prompts, state, and tmux namespace — total isolation between projects.
agents.py            the driver + watchdog (pure Python stdlib; needs python >= 3.11 for tomllib)
agent-log.py         render claude JSONL transcripts into clean, greppable logs
agents.example.toml  a self-contained 2-agent example project
prompts/             generic role + kickoff templates (builder / adversary / kickoff)
smoke.sh             bring the example up + tear it down in an isolated sandbox, then clean up
flake.nix/.lock      a Nix devShell with the runtime deps (python311, tmux, git)
Quick start
nix develop # python311 + tmux + git on PATH (see "Nix" below)
python3 agents.py selftest # regression-test the activity detector (no config)
python3 agents.py status --config agents.example.toml # one table: every agent + the phase
./smoke.sh # prove up/down works end-to-end, isolated + clean
python3 agents.py init myproject # scaffold a starter agents.toml + prompts/
up is use-or-create: an already-running session is left alone, never double-started.
python3 agents.py --config agents.toml up # start all enabled agents + services + watchdog
python3 agents.py --config agents.toml up builder # start just one agent (by name)
python3 agents.py --config agents.toml down # stop everything
python3 agents.py --config agents.toml logs builder # tail one session's log
python3 agents.py --config agents.toml phase show # where the loop phase machine is
--config defaults to ./agents.toml, falling back to one next to agents.py.
The config: agents.toml
Five section types: [watchdog], [backend.<name>], [defaults], [[agent]] / [[service]],
and [loop]. See agents.example.toml for a complete, runnable example.
[watchdog] — global supervisor cadence
[watchdog]
signal_interval = 30 # seconds between light checks (handoff / stall / limit)
heavy_interval = 300 # seconds between heal + phase-advance checks
limit_probe_fallback = 300 # re-probe cadence for a usage-limited agent when reset time is unparsable
limit_reset_slack = 45 # seconds to wait past a parsed reset before probing
stall_grace = 180 # seconds of slack past a WAITING-UNTIL marker before a stall reboot
[defaults] — inherited by every agent
[defaults]
session_prefix = "myproj-" # REQUIRED: tmux namespace for this project. No implicit default.
log_dir = ".ao-state" # REQUIRED: logs + state/. Relative paths resolve against project_dir (default: the config's dir).
backend = "claude"
model = "claude-sonnet-4-6"
dir = "." # default working dir for agents (relative → project dir)
watch = "heal" # none | heal | heal+stall
project_dir = "." # OPTIONAL: project root for resolving prompts/paths (default: config's dir)
session_prefix and log_dir are required — the harness has no project-specific fallbacks.
Every relative path (log_dir, an agent's dir, handoff.repo, prompt/template files) resolves
against project_dir, which defaults to the directory holding the config file. When the config
lives in a sandbox but the prompts live elsewhere (as smoke.sh does), set project_dir
explicitly.
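A minimal sketch of that resolution rule, assuming a relative project_dir is itself interpreted relative to the config file's directory (the text does not say so explicitly). `project_root` and `resolve` are hypothetical helper names, not functions from agents.py.

```python
from pathlib import Path


def project_root(config_path: Path, defaults: dict) -> Path:
    """Return project_dir, defaulting to the directory that holds the config file."""
    config_dir = config_path.resolve().parent
    # Assumption: a relative project_dir is anchored at the config's directory.
    return (config_dir / defaults.get("project_dir", ".")).resolve()


def resolve(path: str, root: Path) -> Path:
    """Resolve log_dir, an agent's dir, handoff.repo, or a prompt file against the root."""
    p = Path(path)
    return p if p.is_absolute() else (root / p).resolve()
```

With a config at sandbox/agents.toml and project_dir = "../proj", `resolve(".ao-state", root)` would land in proj/.ao-state rather than in the sandbox.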
[backend.<name>] — backends declared as data
A backend is fully described by config — no code change to add one. The one field that selects
behavior is prompt_delivery:
| prompt_delivery | how the kickoff reaches the agent | example |
|---|---|---|
| "arg" | passed as a CLI argument (claude-style) | claude … "$(cat kickoff)" |
| "ping" | typed in after a TUI connects (opencode-style) | attach, wait, send-keys |
| "exec" | a plain command; the prompt is written to a file | generic / demo |
[backend.claude]
bin = "claude"
flags = "--dangerously-skip-permissions"
remote_control = true # add a --remote-control <session> flag
supports_resume = true # honor an agent's resume=true
prompt_delivery = "arg"
process_name = "claude" # the pane process a healthy session runs (backend-mismatch healing)
submit_key = "Enter" # key to submit a typed message
stall_idle = 300 # seconds idle before a heal+stall agent is rebooted
active_re = "esc to interrupt|Running tool|· \\d+" # pane shows the agent is WORKING
limit_re = "usage limit|limit reached|reached your .*limit" # usage/rate-limit banner
fatal_re = "redacted_thinking|cannot be modified" # unrecoverable session state → kill + restart
[backend.opencode] # a TUI backend
bin = "opencode"
attach = "{bin} attach {server} --dir {dir}"
server = "http://127.0.0.1:4096"
prompt_delivery = "ping"
process_name = "opencode"
footer_ui = true # a static footer lingers after a turn → only the bottom = activity
log_grace = 180 # within this many seconds of a log write, treat as active
connect_delay = 12 # seconds to wait for the TUI before typing
submit_key = "C-m"
model_env = true # pass the model via OPENCODE_CONFIG_CONTENT
preamble = "set -a; . ./.env; set +a" # shell run before launch (e.g. load creds)
active_re = "esc interrupt|thinking|running tool|preparing patch"
limit_re = "usage limit|limit reached"
[backend.demo] # a dependency-free backend for testing the harness mechanics
bin = "echo '[demo] {session} up'; exec sleep 1000000"
prompt_delivery = "exec" # {kickoff}=prompt file, {session}=session name, {model}=model
For an "arg" backend the flag templates are configurable (so you can point at a non-claude
CLI): resume_flag (default --resume '{id}'), model_flag (default --model '{model}'),
remote_control_flag (default --remote-control '{session}'). A backend that sets process_name
participates in backend-mismatch healing; one that doesn't (e.g. demo) never does.
[[agent]] — one block per agent
[[agent]]
name = "builder" # tmux session defaults to <session_prefix><name>; override with session=
kind = "loop" # loop | persistent | task
backend = "claude" # overrides defaults.backend
model = "claude-opus-4-8" # overrides defaults.model
dir = "." # working dir (relative → project dir)
role = "builder" # loop agents only: role prompt = <roles_dir>/<role>.md
resume = true # (arg backends with supports_resume) --resume <state/<name>.id>
watch = "heal+stall" # none | heal | heal+stall
enabled = true # false = not started by a bare `up`, not supervised
wake = { interval = 3600, prompt_file = "prompts/supervise.md" } # periodic nudge
prompt = """inline startup text""" # persistent/task agents; OR prompt_file = "path.md"
log_signature = "PROJECT PHASE" # optional: disambiguate agents that share a dir (agent-log.py)
| kind | prompt source | typical watch |
|---|---|---|
| loop | auto-built: kickoff template + prompts/<role>.md | heal+stall |
| persistent | prompt / prompt_file (+ optional resume, wake) | heal |
| task | prompt (runs once, then idles) | none, enabled=false |
watch policy:
| value | behavior |
|---|---|
| none | ignored by the watchdog entirely |
| heal | restart if the session is dead, FATAL-wedged, or running the wrong backend; pause all healing while inside a usage-limit window; never reboot just for being idle |
| heal+stall | everything in heal, plus reboot if idle past stall_idle, respecting any WAITING-UNTIL: <ISO-8601> self-wake marker the agent prints as its last line |
[[service]] — non-AI helper processes
[[service]]
name = "cleanlogs"
command = "python3 agent-log.py follow-all"
Started by a bare up, killed by down. Just a supervised command in a tmux session.
[loop] — the phase state machine (governs kind="loop" agents)
[loop]
state_file = "phase-idx" # under <log_dir>/state/
resume_phase = true # keep the phase index across restarts (don't reset to 0)
auto_advance = true # advance when the current phase's status file says done_marker
done_marker = "## DONE"
kickoff_template = "prompts/kickoff.md" # project preamble; slots {phase_id}/{plan}/{status}/{role}
roles_dir = "prompts" # role prompt = <roles_dir>/<role>.md
handoff = { repo = ".", claim_pings = "adversary", review_pings = "builder",
inboxes = ["ADVERSARY-INBOX.md", "BUILDER-INBOX.md"],
claim_pattern = "^claim", review_pattern = "^review", state_subdir = "machine-docs" }
on_complete = { trigger_file = ".run-on-complete", run = "reporter" } # run task agent on completion
phases = [
{ id = "p1", plan = "plans/p1.md", status = "STATUS-p1.md" },
{ id = "p2", plan = "plans/p2.md", status = "STATUS-p2.md", models = { builder = "claude-opus-4-8" } },
]
- Kickoff template. A loop agent's prompt is kickoff_template (with {phase_id}, {plan}, {status}, {role} substituted from the current phase) followed by <roles_dir>/<role>.md. Both are project files; this repo ships generic starters in prompts/. There is no built-in preamble text.
- Per-phase model override. A phase's models = { builder = "...", adversary = "..." } overrides those agents' model for just that phase (matched on the agent's role).
- Auto-advance. Each heavy tick, if the current phase's status file (looked up in handoff.repo's state_subdir/, then its root) contains a real done_marker, not a "Not yet…" placeholder, the watchdog stops the loops, bumps the phase index, and restarts them on the next phase. After the last phase it writes a SEQUENCE-COMPLETE marker under log_dir and stops the loops (idempotent, no churn). Appending a phase later clears the stale marker and resumes. On completion, an optional on_complete.run task agent fires if its trigger_file exists under log_dir.
- Handoff signalling. The watchdog watches handoff.repo's origin/main for commits whose subject matches claim_pattern / review_pattern, and watches the two inboxes files. When a claim lands it pings the claim_pings agent; a review pings review_pings; an inbox change pings the relevant side. This is how the Builder and Adversary coordinate purely through git.
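As an illustration of the first two points in the list above, the sketch below assembles a loop agent's prompt and picks its model for a phase. `build_kickoff` and `model_for` are hypothetical names; plain string replacement is an assumed substitution method, chosen so that any other braces in a project's template are left untouched.

```python
SLOTS = ("phase_id", "plan", "status", "role")


def build_kickoff(template: str, role_prompt: str, phase: dict, role: str) -> str:
    """Fill the four documented slots, then append the role prompt."""
    values = {"phase_id": phase["id"], "plan": phase["plan"],
              "status": phase["status"], "role": role}
    text = template
    for slot in SLOTS:
        text = text.replace("{" + slot + "}", values[slot])
    return text.rstrip() + "\n\n" + role_prompt


def model_for(agent: dict, phase: dict, default_model: str) -> str:
    """A phase's models table wins, matched on the agent's role."""
    override = phase.get("models", {}).get(agent.get("role"))
    return override or agent.get("model", default_model)
```

For the p2 phase in the example config, an agent with role = "builder" would get claude-opus-4-8 for that phase only, while the adversary keeps its configured or default model.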
Config vs state
- Config = agents.toml: declarative, version-controlled, the only source of truth.
- State = <log_dir>/state/, machine-written runtime only: phase-idx (current phase), <name>.id (resume id), limited-<session>.json (active usage-limit window), kickoff-<session>.txt (the exact prompt last sent). Git-ignore your log_dir.
- Env = a one-off override for a single invocation only: AGENT_MODEL_<name>=… / AGENT_BACKEND_<name>=…. The persisted watchdog ignores env and re-reads the file every tick, deliberately, so env-vs-file drift can never silently revert a backend.
The driver: verbs
The recommended (not required) verb set — an AI project-orchestrator can rely on these being present, but a harness is free to add more:
agents.py up [name…]               start enabled agents (+ services + watchdog); use-or-create
agents.py down [name…]             stop agents/services/watchdog (all, or named)
agents.py status                   table of every agent: kind, backend, model, watch, state, phase
agents.py watchdog                 the supervisor loop (what the <prefix>watchdog session runs)
agents.py logs <name>              tail that session's log
agents.py phase [show|next|set N]  inspect / move the loop phase index
agents.py selftest                 regression-test the backend activity detector (needs no config)
agents.py init [dir]               scaffold a starter agents.toml + prompts/ in a project dir
  --config PATH                    use a specific config (default: ./agents.toml)
The watchdog tick
agents.py watchdog runs as the <prefix>watchdog tmux session and re-reads the config every
tick. Each loop:
- signal tick (signal_interval): handoff pings; for each watched agent the usage-limit check, and for heal+stall agents the stall check; fire any due wake.
- heavy tick (heavy_interval): advance the loop phase if the current one is done; otherwise heal each watched agent per its watch policy. When the sequence is complete the finished loops stay stopped, but persistent agents stay supervised.
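The two cadences above can be sketched as a pair of last-run timestamps checked on every pass. This is an assumed scheduling shape for illustration, not the watchdog's real loop; `due` and `tick` are hypothetical names, and `cfg` stands for the [watchdog] table as re-read from agents.toml.

```python
def due(last_run: float, interval: float, now: float) -> bool:
    """True once at least `interval` seconds have passed since `last_run`."""
    return now - last_run >= interval


def tick(state: dict, cfg: dict, now: float) -> list[str]:
    """Return which passes run at `now`, updating the last-run stamps in `state`."""
    ran = []
    # A missing stamp counts as "never ran", so both passes fire on the first call.
    if due(state.get("signal", float("-inf")), cfg["signal_interval"], now):
        state["signal"] = now
        ran.append("signal")
    if due(state.get("heavy", float("-inf")), cfg["heavy_interval"], now):
        state["heavy"] = now
        ran.append("heavy")
    return ran
```

Because `cfg` is passed in on every call, a change to signal_interval or heavy_interval in the file takes effect on the next pass without restarting the watchdog session.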
Usage-limit handling: when an agent prints a limit banner, the watchdog parses the reset time, arms a quiet window (never rebooting a limited agent), and at the end sends one probe to resume it — re-arming if the banner re-prints.
Driving the harness from an AI project-orchestrator
This harness is designed to be driven by an AI "project-orchestrator" (PO) that creates and runs many projects, each pinning its own copy of this engine. The contract is intentionally not rigid — the PO reads these docs and works out how to drive a project. What it can rely on:
- One config, one driver. Everything the PO needs to know about a project's agents is in that project's agents.toml; everything it can do is a verb above. To inspect, status. To start or stop, up / down. To move the phase, phase.
- Isolation by session_prefix. Two projects never collide as long as their session_prefix values differ. The PO assigns each project a unique prefix at creation.
- State is on disk, not in the PO. Phase index, resume ids and limit windows live under the project's log_dir. The PO can restart a project (or the whole host) and the watchdog resumes from there.
- Knowledge is one-directional. A project repo contains nothing about the PO or the fleet: it can be run by hand and would have no idea a PO exists. The PO's fleet registry is the only record of which projects exist and at what engine ref. This repo never reaches "up" toward a PO.
- Submodule pin = the engine version. A project pins this repo at a tag (e.g. v0.1.0) as a submodule under engine/. Bumping is per-project and opt-in (git submodule update --remote); one project's bump can't break another.
A minimal project layout the PO scaffolds:
my-project/ # its own repo; knows nothing about the PO
agents.toml # harness config (this schema)
engine/ # this repo as a pinned submodule
prompts/ # role prompts + kickoff template
machine-docs/ # the loop pair's coordination files (STATUS/REVIEW/inboxes)
.ao-state/ # runtime state + logs (gitignored)
.env # project creds (never in git)
Run it by hand with engine/agents.py up --config agents.toml.
Nix
A flake.nix provides a reproducible devShell with the runtime deps (python311 for stdlib
tomllib, plus tmux and git):
nix develop # enter the shell
nix develop -c python3 agents.py selftest # or run one command in it
nix flake check # evaluate + build the devShell
The agent CLIs themselves (claude, opencode) are external, non-Nix tools — install them
per their own docs and make sure they are on PATH before launching live agents. The devShell
documents this in its banner.
Adding things
- Add an agent: add an [[agent]] block; agents.py up <name>. No code change.
- Add a backend: add a [backend.<name>] block (bin, prompt_delivery, the regexes); point an agent at it with backend = "<name>".
- Add / append a phase: add an entry to [loop].phases; the watchdog advances into it automatically (clearing a stale SEQUENCE-COMPLETE if the sequence had finished).
- Change a model or backend: edit the field (or a phase's models = {}), then agents.py down <name> && agents.py up <name>. The watchdog re-reads the file; it won't fight you.