docs(examples): add builder-adversary-min — minimal-prompt variant

Same topology/behaviour as builder-adversary (loop pair, phase machine,
claim()/review() handoff, machine-docs coordination, cold verification) but the
role + kickoff prompts are compressed to minimal tokens, keeping every
load-bearing rule. Config and plans are unchanged. The separate
agent-orchestrator-benchmark repo runs a head-to-head token comparison.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
2026-06-14 20:18:33 +00:00
parent 11843f41a4
commit 737ef81066
8 changed files with 220 additions and 0 deletions

View File

@ -0,0 +1,26 @@
# Builder/Adversary example — minimal-prompt variant
Same as [`../builder-adversary`](../builder-adversary/) in every way that matters — Builder +
Adversary loop pair, phase machine, `claim(`/`review(` git handoff, `machine-docs/` coordination,
cold verification — but the **role + kickoff prompts are compressed to minimal tokens**, keeping
every load-bearing rule (the commit-prefix handoff, the `machine-docs/` file rule, the
`WHAT+HOW+EXPECTED+WHERE=STATUS / WHY=JOURNAL` anti-anchoring contract, and the `WAITING-UNTIL`
liveness protocol).
Why: the prompts are sent to the agents on every kickoff, so trimming them trims tokens. Config and
plans are unchanged from the original (they aren't part of the prompt). See the original's README for
the full explanation of the pattern, how to run it, and the work-repo isolation model — the commands
are identical, just `--config` this directory's `agents.toml`.
```bash
python3 ../../agents.py status --config agents.toml
python3 ../../agents.py up --config agents.toml # needs `claude` on PATH
```
## How small?
`prompts/builder.md` and `prompts/adversary.md` here are roughly **half to a third** the size of the
originals, with the same rules stated tersely. The separate **`agent-orchestrator-benchmark`** repo
runs a head-to-head: the same task built independently by this variant and the original (both on
Sonnet), with token counts for each — confirming the minimal prompts still get the job done and
quantifying the savings.

View File

@ -0,0 +1,91 @@
# examples/builder-adversary-min — minimal-prompt variant of ../builder-adversary.
#
# Same topology and behaviour as builder-adversary (Builder + Adversary loop pair, phase machine,
# claim()/review() git handoff, machine-docs/ coordination). The ONLY difference is that the role +
# kickoff prompts in prompts/ are compressed to minimal tokens while keeping every load-bearing rule.
# Config/comments are unchanged — they aren't sent to the agents, so they don't affect token cost.
#
# python3 ../../agents.py status --config agents.toml
# python3 ../../agents.py up --config agents.toml # needs `claude` on PATH
[watchdog]
signal_interval = 30
heavy_interval = 300
limit_probe_fallback = 300
limit_reset_slack = 45
stall_grace = 180
[defaults]
session_prefix = "bamin-" # REQUIRED — sessions: bamin-builder, bamin-adv, …
log_dir = ".ao-state"
backend = "claude" # set to "demo" for a dependency-free mechanics-only run
model = "claude-sonnet-4-6"
watch = "heal"
[backend.claude]
bin = "claude"
flags = "--dangerously-skip-permissions"
remote_control = true
supports_resume = true
prompt_delivery = "arg"
process_name = "claude"
submit_key = "Enter"
stall_idle = 300
active_re = "esc to interrupt|Running tool|⠇|⠙|· \\d+"
limit_re = "spend limit|usage limit|limit reached|reached your .*limit|out of (credits|tokens)"
fatal_re = "redacted_thinking|blocks cannot be modified|cannot be modified"
[backend.demo]
bin = "echo '[demo] {session} up (kickoff: {kickoff})'; exec sleep 1000000"
prompt_delivery = "exec"
[[agent]]
name = "builder" # tmux session: bamin-builder
kind = "loop"
role = "builder"
dir = "./work"
watch = "heal+stall"
[[agent]]
name = "adversary"
session = "bamin-adv"
kind = "loop"
role = "adversary"
dir = "./work-adv"
watch = "heal+stall"
[[agent]]
name = "orchestrator" # tmux session: bamin-orchestrator
kind = "persistent"
model = "claude-opus-4-8"
resume = true
watch = "heal"
prompt = "You supervise this Builder/Adversary project. On startup: read machine-docs/ for the current phase's STATUS/REVIEW, confirm both loops + the watchdog are up, report the phase and any open findings/VETO. Then stay available; intervene only if the pair is stuck."
[[agent]]
name = "reporter" # tmux session: bamin-reporter
kind = "task"
model = "claude-opus-4-8"
watch = "none"
enabled = false
prompt = "The phase sequence is complete. Read machine-docs/ across all phases, write a short machine-docs/REPORT.md (what was built, each gate's final verdict, deferred items), then go idle."
[[service]]
name = "cleanlogs"
command = "python3 ../../agent-log.py follow-all"
dir = "."
[loop]
state_file = "phase-idx"
resume_phase = true
auto_advance = true
done_marker = "## DONE"
kickoff_template = "prompts/kickoff.md"
roles_dir = "prompts"
handoff = { repo = "./work", claim_pings = "adversary", review_pings = "builder", inboxes = ["ADVERSARY-INBOX.md", "BUILDER-INBOX.md"], claim_pattern = "^claim", review_pattern = "^review", state_subdir = "machine-docs" }
on_complete = { trigger_file = ".run-report-on-complete", run = "reporter" }
phases = [
{ id = "wc", plan = "plans/wc.md", status = "STATUS-wc.md" },
{ id = "json", plan = "plans/json.md", status = "STATUS-json.md", models = { builder = "claude-opus-4-8" } },
]

View File

@ -0,0 +1,2 @@
# Coordination / loop-state files live here at runtime (phase-namespaced STATUS / REVIEW / BACKLOG /
# JOURNAL, plus the ADVERSARY-INBOX.md / BUILDER-INBOX.md side-channels). The loop pair populates it.

View File

@ -0,0 +1,32 @@
# Phase `json` — machine-readable output
**Mission.** Extend the `wc.py` from the previous phase with a `--json` mode, without regressing any
`wc`-phase behaviour. Single source of truth for this phase.
(The phase config gives the Builder `claude-opus-4-8` for this phase — an example of a per-phase
model override; the Adversary stays on the default model.)
## Definition of Done
- **D1 — json output.** `python wc.py --json FILE` prints a single JSON object:
`{"lines": N, "words": N, "chars": N, "file": "FILE"}` (valid JSON, parseable by `json.loads`).
With stdin (no FILE), `"file"` is `null`.
- **D2 — composes with flags.** `--json` honours `-l/-w/-c`: only the requested counts appear as keys
(plus `file`). E.g. `wc.py --json -l FILE``{"lines": N, "file": "FILE"}`.
- **D3 — no regression.** Every `wc`-phase gate (D1D4 there) still passes unchanged.
- **D4 — tests green.** `test_wc.py` is extended for the JSON cases and `pytest -q` is all-green.
## How the Adversary verifies (cold)
```bash
pytest -q # D4 + D3 regression
printf 'a b c\nd e\n' > /tmp/f.txt
python wc.py --json /tmp/f.txt | python -c 'import sys,json; d=json.load(sys.stdin); \
assert d=={"lines":2,"words":5,"chars":10,"file":"/tmp/f.txt"}, d; print("ok")' # D1
python wc.py --json -l /tmp/f.txt # D2: expect {"lines": 2, "file": "/tmp/f.txt"}
```
The Builder restates the exact commands, expected JSON, and commit sha in
`machine-docs/STATUS-json.md`. When every DoD item has a fresh PASS in `machine-docs/REVIEW-json.md`
and there is no `## VETO`, the Builder writes `## DONE` to `STATUS-json.md` — this is the last phase,
so the watchdog then fires the one-shot `reporter` (see `agents.toml` `[loop].on_complete`).

View File

@ -0,0 +1,43 @@
# Phase `wc` — a word-count CLI
**Mission.** Build a small, dependency-free `wc` clone in Python: a script `wc.py` in the work repo
that counts lines, words, and characters, plus a `pytest` suite. This is the single source of truth
for the phase — the Builder builds to the Definition of Done below; the Adversary cold-verifies it.
This task is deliberately tiny and fully local (no network, no services) so the example exercises the
loop-pair *protocol* — claim → cold-verify → PASS/FAIL handshake — not infrastructure.
## Definition of Done
Each Dn is an independent gate. The Builder claims it (`claim(Dn): …`); the Adversary records a fresh
PASS in `machine-docs/REVIEW-wc.md` after re-running the check from its own clone.
- **D1 — default output.** `python wc.py FILE` prints exactly `<lines> <words> <chars> <FILE>`
(counts whitespace-separated words, `\n`-terminated lines, and bytes for `chars`), matching GNU
`wc` on ASCII input.
- **D2 — flags.** `-l`, `-w`, `-c` restrict the output to that single count (e.g. `wc.py -l FILE`
prints `<lines> <FILE>`). Flags may combine; output order is lines, words, chars.
- **D3 — stdin.** With no FILE argument, `wc.py` reads stdin and prints the counts with no filename.
- **D4 — tests green.** A `test_wc.py` runs under `pytest -q` with **0 failures**, covering: an empty
file (`0 0 0`), a multi-line fixture, the no-trailing-newline case, and each flag.
## How the Adversary verifies (cold)
From a fresh clone of the work repo:
```bash
pytest -q # D4: must be all-green
printf 'a b c\nd e\n' > /tmp/f.txt
python wc.py /tmp/f.txt # D1: expect "2 5 10 /tmp/f.txt"
python wc.py -l /tmp/f.txt # D2: expect "2 /tmp/f.txt"
printf 'a b c\nd e\n' | python wc.py # D3: expect "2 5 10"
```
Expected outputs are above — the Builder must restate them (and the exact commands, plus the commit
sha) in `machine-docs/STATUS-wc.md` so the Adversary can re-run without reading the Builder's
reasoning. Any mismatch is a FAIL with repro steps in `machine-docs/REVIEW-wc.md`.
## Out of scope (defer to a later phase or DEFERRED.md)
Multibyte/`-m` char counting, `--files0-from`, multiple-file totals, locale handling. JSON output is
the next phase (`plans/json.md`).

View File

@ -0,0 +1,9 @@
You are the **Adversary**, one of two independent loops: **DISBELIEVE the Builder**. Coordinate ONLY through git. The phase plan is the SSOT for what to verify.
Loop: run `/loop` (no interval). Verify a CLAIMED gate promptly (the watchdog pings you when the Builder claims one); idle otherwise. Cap waits at 10 min; before going idle your LAST line MUST be exactly `WAITING-UNTIL: <ISO-8601 UTC>`. Compact at ~80%.
Verify cold from your OWN clone: re-run the plan's DoD check yourself and try to break it (edge cases, bad input) — don't trust the Builder's word. From STATUS take only what you need to re-run (command, expected result, shas); ignore its reasoning and don't read JOURNAL until after your verdict (it anchors you). Judge from the plan, the code, and your own run.
Git: `pull --rebase`, commit, push; never `--force`. Prefix verdicts `review(<id>): PASS|FAIL …` — pings the Builder. Write only REVIEW.md (+ your findings). Record "<id>: PASS @<ts>" + evidence, or FAIL + repro steps. You hold veto: write "## VETO <reason>".
Begin: read the plan, then enter the loop (clone the work repo into your dir if it exists yet).

View File

@ -0,0 +1,11 @@
You are the **Builder**, one of two independent loops; coordinate ONLY through git. Read the phase plan (the SSOT) and build to its DoD.
Loop: run `/loop` (no interval), one unit of work per wake. Cap every wait at 10 min; before going idle your LAST output line MUST be exactly `WAITING-UNTIL: <ISO-8601 UTC>` (≤10 min out) or the watchdog reboots you. Compact at ~80% context.
Git: `pull --rebase`, smallest change, commit, push; never `--force`. Prefix a gate claim `claim(<id>): …` — the watchdog pings the Adversary on it; use `feat/fix/status/…` otherwise. Before you claim, the tree MUST be clean (committed AND pushed): the Adversary cold-verifies from a fresh clone.
STATUS (in machine-docs/) must give the Adversary: WHAT is claimed (gate id + DoD items), HOW to verify (exact command), the EXPECTED result, WHERE (commit shas/paths). Reasoning goes in JOURNAL, NOT STATUS — the Adversary won't read JOURNAL before judging. Write only your files (code, STATUS, JOURNAL, build backlog); REVIEW is the Adversary's.
Done: write "## DONE" only when REVIEW shows a fresh PASS for every DoD item and there's no "## VETO". Never weaken/skip/delete a test; verify for real, no "should work".
Begin: read the plan, then enter the loop.

View File

@ -0,0 +1,6 @@
*** PHASE {phase_id} ***
Plan (this phase's single source of truth): {plan} — read it fully now; it defines the mission and the Definition of Done (DoD).
Loop state goes under machine-docs/ (create if missing), phase-namespaced: {status}, REVIEW-{phase_id}.md, JOURNAL-{phase_id}.md, BACKLOG-{phase_id}.md. Never at the repo root.
Done = the Builder writes "## DONE" to machine-docs/{status} ONLY after every DoD item has a fresh Adversary PASS in machine-docs/REVIEW-{phase_id}.md.
=== role ===