docs(readme): add Examples section (Builder/Adversary variants, snakepit) + benchmark note

2026-06-16 02:35:40 +00:00
parent 90375f004e
commit 781db071dd


@@ -16,6 +16,7 @@ agents.py the driver + watchdog (pure Python stdlib; needs python >=
agent-log.py render claude JSONL transcripts into clean, greppable logs
agents.example.toml a self-contained 2-agent example project
prompts/ generic role + kickoff templates (builder / adversary / kickoff)
examples/ runnable example projects — the Builder/Adversary variant family, snakepit, …
smoke.sh bring the example up + tear it down in an isolated sandbox, then clean up
tests/ the test suite — unit tests + isolated live backend smokes + a runner
flake.nix/.lock a Nix devShell with the runtime deps (python311, tmux, git)
@@ -49,6 +50,42 @@ python3 agents.py --config agents.toml phase show # where the loop phase mach
---
## Examples
`examples/` holds runnable example projects — copy one, point `agents.py` at its `agents.toml`, and
go. The headline set is a family of **Builder/Adversary** variants that build the *same* task, each
differing in one dimension, which makes them useful both as templates and as a study of the pattern:
- **`builder-adversary`** — the canonical loop pair: a Builder that builds and an Adversary that
cold-verifies every claim, coordinating only through git (`claim(`/`review(` commits + the watchdog
handoff). **Start here.**
- **`builder-adversary-min`** — the same pattern with the prompts compressed to minimal tokens.
- **`builder-adversary-stateless`** — `builder-adversary` + **context hygiene** (compact at each
checkpoint, read diffs not trees, lean loads) to minimise carried/reloaded context.
- **`builder-adversary-lean`** — context hygiene + **per-gate** review (one claim/verdict per gate).
- **`builder-adversary-deferred`** — the Adversary verifies **once**, after the whole build, in a
final comprehensive `review` phase (vs per-phase / per-gate).
- **`builder-solo`** — a single Builder that self-certifies, with **no Adversary** (the control).
- **`snakepit`** — a different topology entirely: a pool of identical worker "snakes" pulling tasks
from a shared filesystem queue, plus cleanup specialists. (`examples/IDEAS.md` sketches more.)
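The shared filesystem queue behind `snakepit` can be pictured with a minimal sketch. This is not the example's actual code: the directory layout, the file naming, and the helper `claim_next_task` are all hypothetical. It assumes a worker claims a task by renaming its file into a separate directory, which is atomic on a single POSIX filesystem, so two workers cannot both win the same task.

```python
import os
from pathlib import Path
from typing import Optional


def claim_next_task(queue_dir: Path, claimed_dir: Path, worker: str) -> Optional[Path]:
    """Claim the first pending task (by file name) for `worker`.

    Hypothetical layout: pending tasks are plain files in `queue_dir`;
    a claimed task is moved to `claimed_dir` with the worker name as prefix.
    Returns the claimed path, or None if the queue is empty.
    """
    claimed_dir.mkdir(parents=True, exist_ok=True)
    for task in sorted(queue_dir.iterdir()):
        if not task.is_file():
            continue  # skip subdirectories, or files that just vanished
        target = claimed_dir / f"{worker}.{task.name}"
        try:
            # rename is atomic when both paths are on the same filesystem
            os.rename(task, target)
        except FileNotFoundError:
            continue  # another worker claimed this task first; try the next
        return target
    return None
```

A worker would call `claim_next_task(...)` in a loop and stop (or sleep) when it returns `None`. The real example's queue conventions are described in its own `README.md`.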
Each example has its own `README.md`. Run one by hand:
```bash
cd examples/builder-adversary
python3 ../../agents.py status --config agents.toml # read-only
python3 ../../agents.py up --config agents.toml # needs `claude` on PATH
```
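To check every example in one pass rather than one at a time, the read-only `status` call can be wrapped in a small script. This is a sketch under assumptions: it treats any directory under `examples/` that contains an `agents.toml` as a runnable example, and the helper names (`find_examples`, `status_command`, `status_all`) are made up for illustration. It uses only the `status --config agents.toml` invocation shown above and records whatever exit code that returns.

```python
import subprocess
import sys
from pathlib import Path


def find_examples(examples_root: Path) -> list[Path]:
    """Return the directories directly under examples_root that hold an agents.toml."""
    return sorted(p.parent for p in examples_root.glob("*/agents.toml"))


def status_command(driver: Path) -> list[str]:
    """Build the read-only status call, meant to be run from inside an example dir."""
    return [sys.executable, str(driver), "status", "--config", "agents.toml"]


def status_all(repo_root: Path) -> dict[str, int]:
    """Run `status` in each example and map the example name to its exit code."""
    driver = (repo_root / "agents.py").resolve()
    results: dict[str, int] = {}
    for example in find_examples(repo_root / "examples"):
        proc = subprocess.run(status_command(driver), cwd=example)
        results[example.name] = proc.returncode
    return results
```

Calling `status_all(Path("."))` from the repository root runs the check for each example in turn. `sys.executable` stands in for `python3` so the same interpreter is used throughout.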
**Benchmark.** The separate
[`agent-orchestrator-benchmark`](https://git.autonomic.zone/recipe-maintainers/agent-orchestrator-benchmark)
repo runs these Builder/Adversary variants head-to-head (N=5, real `agents.py up` runs) to measure
what drives token cost. Short version: an independent adversary costs **~4.7×** a solo builder, but
the review *cadence* (per-gate / per-phase / deferred) is **nearly token-neutral**, and **context
hygiene** is the one clean **~22%** win. See that repo's `FINDINGS.md`.
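To see how the two headline ratios might combine, here is a small arithmetic sketch. It rests on an assumption the note above does not state: that the ~22% saving applies multiplicatively to the Builder/Adversary cost. The result is an illustration only, not a figure from the benchmark, and `relative_cost` is a hypothetical helper.

```python
def relative_cost(multiplier: float, saving: float) -> float:
    """Cost relative to a baseline of 1.0 after applying a fractional saving."""
    return multiplier * (1.0 - saving)


# Baseline: a solo builder, normalised to 1.0 (not a measured token count).
adversary = relative_cost(4.7, 0.0)       # ~4.7x a solo builder
with_hygiene = relative_cost(4.7, 0.22)   # assumed: 22% off the adversary cost

print(f"adversary:    {adversary:.2f}x solo")
print(f"with hygiene: {with_hygiene:.2f}x solo")
```

Under that assumption the hygiene variant still costs several times a solo builder. The measured per-variant numbers are in the benchmark repo's `FINDINGS.md`.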
---
## The config: `agents.toml`
Five section types: `[watchdog]`, `[backend.<name>]`, `[defaults]`, `[[agent]]` / `[[service]]`,