docs(examples): add builder-adversary-min — minimal-prompt variant

Same topology/behaviour as builder-adversary (loop pair, phase machine, claim()/review() handoff, machine-docs coordination, cold verification) but the role + kickoff prompts are compressed to minimal tokens, keeping every load-bearing rule. Config and plans are unchanged. The separate agent-orchestrator-benchmark repo runs a head-to-head token comparison. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 20:18:33 +00:00
parent 11843f41a4
commit 737ef81066
8 changed files with 220 additions and 0 deletions
--- a/examples/builder-adversary-min/README.md
+++ b/examples/builder-adversary-min/README.md
@ -0,0 +1,26 @@
+# Builder/Adversary example — minimal-prompt variant
+
+Same as [`../builder-adversary`](../builder-adversary/) in every way that matters — Builder +
+Adversary loop pair, phase machine, `claim(`/`review(` git handoff, `machine-docs/` coordination,
+cold verification — but the **role + kickoff prompts are compressed to minimal tokens**, keeping
+every load-bearing rule (the commit-prefix handoff, the `machine-docs/` file rule, the
+`WHAT+HOW+EXPECTED+WHERE=STATUS / WHY=JOURNAL` anti-anchoring contract, and the `WAITING-UNTIL`
+liveness protocol).
+
+Why: the prompts are sent to the agents on every kickoff, so trimming them trims tokens. Config and
+plans are unchanged from the original (they aren't part of the prompt). See the original's README for
+the full explanation of the pattern, how to run it, and the work-repo isolation model — the commands
+are identical, just `--config` this directory's `agents.toml`.
+
+```bash
+python3 ../../agents.py status --config agents.toml
+python3 ../../agents.py up     --config agents.toml      # needs `claude` on PATH
+```
+
+## How small?
+
+`prompts/builder.md` and `prompts/adversary.md` here are roughly **half to a third** the size of the
+originals, with the same rules stated tersely. The separate **`agent-orchestrator-benchmark`** repo
+runs a head-to-head: the same task built independently by this variant and the original (both on
+Sonnet), with token counts for each — confirming the minimal prompts still get the job done and
+quantifying the savings.
--- a/examples/builder-adversary-min/agents.toml
+++ b/examples/builder-adversary-min/agents.toml
@ -0,0 +1,91 @@
+# examples/builder-adversary-min — minimal-prompt variant of ../builder-adversary.
+#
+# Same topology and behaviour as builder-adversary (Builder + Adversary loop pair, phase machine,
+# claim()/review() git handoff, machine-docs/ coordination). The ONLY difference is that the role +
+# kickoff prompts in prompts/ are compressed to minimal tokens while keeping every load-bearing rule.
+# Config/comments are unchanged — they aren't sent to the agents, so they don't affect token cost.
+#
+#   python3 ../../agents.py status --config agents.toml
+#   python3 ../../agents.py up     --config agents.toml      # needs `claude` on PATH
+
+[watchdog]
+signal_interval      = 30
+heavy_interval       = 300
+limit_probe_fallback = 300
+limit_reset_slack    = 45
+stall_grace          = 180
+
+[defaults]
+session_prefix = "bamin-"        # REQUIRED — sessions: bamin-builder, bamin-adv, …
+log_dir        = ".ao-state"
+backend        = "claude"        # set to "demo" for a dependency-free mechanics-only run
+model          = "claude-sonnet-4-6"
+watch          = "heal"
+
+[backend.claude]
+bin             = "claude"
+flags           = "--dangerously-skip-permissions"
+remote_control  = true
+supports_resume = true
+prompt_delivery = "arg"
+process_name    = "claude"
+submit_key      = "Enter"
+stall_idle      = 300
+active_re = "esc to interrupt|Running tool|⠇|⠙|· \\d+"
+limit_re  = "spend limit|usage limit|limit reached|reached your .*limit|out of (credits|tokens)"
+fatal_re  = "redacted_thinking|blocks cannot be modified|cannot be modified"
+
+[backend.demo]
+bin             = "echo '[demo] {session} up (kickoff: {kickoff})'; exec sleep 1000000"
+prompt_delivery = "exec"
+
+[[agent]]
+name  = "builder"                # tmux session: bamin-builder
+kind  = "loop"
+role  = "builder"
+dir   = "./work"
+watch = "heal+stall"
+
+[[agent]]
+name    = "adversary"
+session = "bamin-adv"
+kind    = "loop"
+role    = "adversary"
+dir     = "./work-adv"
+watch   = "heal+stall"
+
+[[agent]]
+name    = "orchestrator"         # tmux session: bamin-orchestrator
+kind    = "persistent"
+model   = "claude-opus-4-8"
+resume  = true
+watch   = "heal"
+prompt  = "You supervise this Builder/Adversary project. On startup: read machine-docs/ for the current phase's STATUS/REVIEW, confirm both loops + the watchdog are up, report the phase and any open findings/VETO. Then stay available; intervene only if the pair is stuck."
+
+[[agent]]
+name    = "reporter"             # tmux session: bamin-reporter
+kind    = "task"
+model   = "claude-opus-4-8"
+watch   = "none"
+enabled = false
+prompt  = "The phase sequence is complete. Read machine-docs/ across all phases, write a short machine-docs/REPORT.md (what was built, each gate's final verdict, deferred items), then go idle."
+
+[[service]]
+name    = "cleanlogs"
+command = "python3 ../../agent-log.py follow-all"
+dir     = "."
+
+[loop]
+state_file       = "phase-idx"
+resume_phase     = true
+auto_advance     = true
+done_marker      = "## DONE"
+kickoff_template = "prompts/kickoff.md"
+roles_dir        = "prompts"
+handoff = { repo = "./work", claim_pings = "adversary", review_pings = "builder", inboxes = ["ADVERSARY-INBOX.md", "BUILDER-INBOX.md"], claim_pattern = "^claim", review_pattern = "^review", state_subdir = "machine-docs" }
+on_complete = { trigger_file = ".run-report-on-complete", run = "reporter" }
+
+phases = [
+  { id = "wc",   plan = "plans/wc.md",   status = "STATUS-wc.md" },
+  { id = "json", plan = "plans/json.md", status = "STATUS-json.md", models = { builder = "claude-opus-4-8" } },
+]
--- a/examples/builder-adversary-min/machine-docs/.gitkeep
+++ b/examples/builder-adversary-min/machine-docs/.gitkeep
@ -0,0 +1,2 @@
+# Coordination / loop-state files live here at runtime (phase-namespaced STATUS / REVIEW / BACKLOG /
+# JOURNAL, plus the ADVERSARY-INBOX.md / BUILDER-INBOX.md side-channels). The loop pair populates it.
--- a/examples/builder-adversary-min/plans/json.md
+++ b/examples/builder-adversary-min/plans/json.md
@ -0,0 +1,32 @@
+# Phase `json` — machine-readable output
+
+**Mission.** Extend the `wc.py` from the previous phase with a `--json` mode, without regressing any
+`wc`-phase behaviour. Single source of truth for this phase.
+
+(The phase config gives the Builder `claude-opus-4-8` for this phase — an example of a per-phase
+model override; the Adversary stays on the default model.)
+
+## Definition of Done
+
+- **D1 — json output.** `python wc.py --json FILE` prints a single JSON object:
+  `{"lines": N, "words": N, "chars": N, "file": "FILE"}` (valid JSON, parseable by `json.loads`).
+  With stdin (no FILE), `"file"` is `null`.
+- **D2 — composes with flags.** `--json` honours `-l/-w/-c`: only the requested counts appear as keys
+  (plus `file`). E.g. `wc.py --json -l FILE` → `{"lines": N, "file": "FILE"}`.
+- **D3 — no regression.** Every `wc`-phase gate (D1–D4 there) still passes unchanged.
+- **D4 — tests green.** `test_wc.py` is extended for the JSON cases and `pytest -q` is all-green.
+
+## How the Adversary verifies (cold)
+
+```bash
+pytest -q                                              # D4 + D3 regression
+printf 'a b c\nd e\n' > /tmp/f.txt
+python wc.py --json /tmp/f.txt | python -c 'import sys,json; d=json.load(sys.stdin); \
+  assert d=={"lines":2,"words":5,"chars":10,"file":"/tmp/f.txt"}, d; print("ok")'   # D1
+python wc.py --json -l /tmp/f.txt                      # D2: expect {"lines": 2, "file": "/tmp/f.txt"}
+```
+
+The Builder restates the exact commands, expected JSON, and commit sha in
+`machine-docs/STATUS-json.md`. When every DoD item has a fresh PASS in `machine-docs/REVIEW-json.md`
+and there is no `## VETO`, the Builder writes `## DONE` to `STATUS-json.md` — this is the last phase,
+so the watchdog then fires the one-shot `reporter` (see `agents.toml` `[loop].on_complete`).
--- a/examples/builder-adversary-min/plans/wc.md
+++ b/examples/builder-adversary-min/plans/wc.md
@ -0,0 +1,43 @@
+# Phase `wc` — a word-count CLI
+
+**Mission.** Build a small, dependency-free `wc` clone in Python: a script `wc.py` in the work repo
+that counts lines, words, and characters, plus a `pytest` suite. This is the single source of truth
+for the phase — the Builder builds to the Definition of Done below; the Adversary cold-verifies it.
+
+This task is deliberately tiny and fully local (no network, no services) so the example exercises the
+loop-pair *protocol* — claim → cold-verify → PASS/FAIL handshake — not infrastructure.
+
+## Definition of Done
+
+Each Dn is an independent gate. The Builder claims it (`claim(Dn): …`); the Adversary records a fresh
+PASS in `machine-docs/REVIEW-wc.md` after re-running the check from its own clone.
+
+- **D1 — default output.** `python wc.py FILE` prints exactly `<lines> <words> <chars> <FILE>`
+  (counts whitespace-separated words, `\n`-terminated lines, and bytes for `chars`), matching GNU
+  `wc` on ASCII input.
+- **D2 — flags.** `-l`, `-w`, `-c` restrict the output to that single count (e.g. `wc.py -l FILE`
+  prints `<lines> <FILE>`). Flags may combine; output order is lines, words, chars.
+- **D3 — stdin.** With no FILE argument, `wc.py` reads stdin and prints the counts with no filename.
+- **D4 — tests green.** A `test_wc.py` runs under `pytest -q` with **0 failures**, covering: an empty
+  file (`0 0 0`), a multi-line fixture, the no-trailing-newline case, and each flag.
+
+## How the Adversary verifies (cold)
+
+From a fresh clone of the work repo:
+
+```bash
+pytest -q                                  # D4: must be all-green
+printf 'a b c\nd e\n' > /tmp/f.txt
+python wc.py /tmp/f.txt                     # D1: expect "2 5 10 /tmp/f.txt"
+python wc.py -l /tmp/f.txt                  # D2: expect "2 /tmp/f.txt"
+printf 'a b c\nd e\n' | python wc.py        # D3: expect "2 5 10"
+```
+
+Expected outputs are above — the Builder must restate them (and the exact commands, plus the commit
+sha) in `machine-docs/STATUS-wc.md` so the Adversary can re-run without reading the Builder's
+reasoning. Any mismatch is a FAIL with repro steps in `machine-docs/REVIEW-wc.md`.
+
+## Out of scope (defer to a later phase or DEFERRED.md)
+
+Multibyte/`-m` char counting, `--files0-from`, multiple-file totals, locale handling. JSON output is
+the next phase (`plans/json.md`).
--- a/examples/builder-adversary-min/prompts/adversary.md
+++ b/examples/builder-adversary-min/prompts/adversary.md
@ -0,0 +1,9 @@
+You are the **Adversary**, one of two independent loops: **DISBELIEVE the Builder**. Coordinate ONLY through git. The phase plan is the SSOT for what to verify.
+
+Loop: run `/loop` (no interval). Verify a CLAIMED gate promptly (the watchdog pings you when the Builder claims one); idle otherwise. Cap waits at 10 min; before going idle your LAST line MUST be exactly `WAITING-UNTIL: <ISO-8601 UTC>`. Compact at ~80%.
+
+Verify cold from your OWN clone: re-run the plan's DoD check yourself and try to break it (edge cases, bad input) — don't trust the Builder's word. From STATUS take only what you need to re-run (command, expected result, shas); ignore its reasoning and don't read JOURNAL until after your verdict (it anchors you). Judge from the plan, the code, and your own run.
+
+Git: `pull --rebase`, commit, push; never `--force`. Prefix verdicts `review(<id>): PASS|FAIL …` — pings the Builder. Write only REVIEW.md (+ your findings). Record "<id>: PASS @<ts>" + evidence, or FAIL + repro steps. You hold veto: write "## VETO <reason>".
+
+Begin: read the plan, then enter the loop (clone the work repo into your dir if it exists yet).
--- a/examples/builder-adversary-min/prompts/builder.md
+++ b/examples/builder-adversary-min/prompts/builder.md
@ -0,0 +1,11 @@
+You are the **Builder**, one of two independent loops; coordinate ONLY through git. Read the phase plan (the SSOT) and build to its DoD.
+
+Loop: run `/loop` (no interval), one unit of work per wake. Cap every wait at 10 min; before going idle your LAST output line MUST be exactly `WAITING-UNTIL: <ISO-8601 UTC>` (≤10 min out) or the watchdog reboots you. Compact at ~80% context.
+
+Git: `pull --rebase`, smallest change, commit, push; never `--force`. Prefix a gate claim `claim(<id>): …` — the watchdog pings the Adversary on it; use `feat/fix/status/…` otherwise. Before you claim, the tree MUST be clean (committed AND pushed): the Adversary cold-verifies from a fresh clone.
+
+STATUS (in machine-docs/) must give the Adversary: WHAT is claimed (gate id + DoD items), HOW to verify (exact command), the EXPECTED result, WHERE (commit shas/paths). Reasoning goes in JOURNAL, NOT STATUS — the Adversary won't read JOURNAL before judging. Write only your files (code, STATUS, JOURNAL, build backlog); REVIEW is the Adversary's.
+
+Done: write "## DONE" only when REVIEW shows a fresh PASS for every DoD item and there's no "## VETO". Never weaken/skip/delete a test; verify for real, no "should work".
+
+Begin: read the plan, then enter the loop.
--- a/examples/builder-adversary-min/prompts/kickoff.md
+++ b/examples/builder-adversary-min/prompts/kickoff.md
@ -0,0 +1,6 @@
+*** PHASE {phase_id} ***
+Plan (this phase's single source of truth): {plan} — read it fully now; it defines the mission and the Definition of Done (DoD).
+Loop state goes under machine-docs/ (create if missing), phase-namespaced: {status}, REVIEW-{phase_id}.md, JOURNAL-{phase_id}.md, BACKLOG-{phase_id}.md. Never at the repo root.
+Done = the Builder writes "## DONE" to machine-docs/{status} ONLY after every DoD item has a fresh Adversary PASS in machine-docs/REVIEW-{phase_id}.md.
+
+=== role ===