diff --git a/README.md b/README.md
index 9316de4..82a9b13 100644
--- a/README.md
+++ b/README.md
@@ -16,6 +16,7 @@ agents.py             the driver + watchdog (pure Python stdlib; needs python >=
 agent-log.py          render claude JSONL transcripts into clean, greppable logs
 agents.example.toml   a self-contained 2-agent example project
 prompts/              generic role + kickoff templates (builder / adversary / kickoff)
+examples/             runnable example projects — the Builder/Adversary variant family, snakepit, …
 smoke.sh              bring the example up + tear it down in an isolated sandbox, then clean up
 tests/                the test suite — unit tests + isolated live backend smokes + a runner
 flake.nix/.lock       a Nix devShell with the runtime deps (python311, tmux, git)
@@ -49,6 +50,42 @@ python3 agents.py --config agents.toml phase show # where the loop phase mach
 
 ---
 
+## Examples
+
+`examples/` holds runnable example projects — copy one, point `agents.py` at its `agents.toml`, and
+go. The headline set is a family of **Builder/Adversary** variants that build the *same* task but each
+differ in one dimension — useful both as templates and as a study of the pattern:
+
+- **`builder-adversary`** — the canonical loop pair: a Builder that builds and an Adversary that
+  cold-verifies every claim, coordinating only through git (`claim(`/`review(` commits + the watchdog
+  handoff). **Start here.**
+- **`builder-adversary-min`** — the same pattern with the prompts compressed to minimal tokens.
+- **`builder-adversary-stateless`** — `builder-adversary` + **context hygiene** (compact at each
+  checkpoint, read diffs not trees, lean loads) to minimise carried/reloaded context.
+- **`builder-adversary-lean`** — context hygiene + **per-gate** review (one claim/verdict per gate).
+- **`builder-adversary-deferred`** — the Adversary verifies **once**, after the whole build, in a
+  final comprehensive `review` phase (vs per-phase / per-gate).
+- **`builder-solo`** — a single Builder that self-certifies, with **no Adversary** (the control).
+- **`snakepit`** — a different topology entirely: a pool of identical worker "snakes" pulling tasks
+  from a shared filesystem queue, plus cleanup specialists. (`examples/IDEAS.md` sketches more.)
+
+Each example has its own `README.md`. Run one by hand:
+
+```bash
+cd examples/builder-adversary
+python3 ../../agents.py status --config agents.toml   # read-only
+python3 ../../agents.py up --config agents.toml       # needs `claude` on PATH
+```
+
+**Benchmark.** The separate
+[`agent-orchestrator-benchmark`](https://git.autonomic.zone/recipe-maintainers/agent-orchestrator-benchmark)
+repo runs these Builder/Adversary variants head-to-head (N=5, real `agents.py up` runs) to measure
+what drives token cost. Short version: an independent adversary costs **~4.7×** a solo builder, but
+the review *cadence* (per-gate / per-phase / deferred) is **nearly token-neutral**, and **context
+hygiene** is the one clean **~−22%** win. See that repo's `FINDINGS.md`.
+
+---
+
 ## The config: `agents.toml`
 
 Five section types: `[watchdog]`, `[backend.]`, `[defaults]`, `[[agent]]` / `[[service]]`,