feat: agent-orchestrator-benchmark — prompt token comparison harness

A standalone repo (engine vendored as a submodule at the examples commit) that
runs a head-to-head between the builder-adversary and builder-adversary-min
example variants: same task, independent headless runs, both on Sonnet, with
token counts. Includes the roman-numeral test problem and run-bench.sh.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
2026-06-14 20:20:05 +00:00
commit 27df2c7b55
6 changed files with 257 additions and 0 deletions

4
.gitignore vendored Normal file
View File

@ -0,0 +1,4 @@
# runtime
.ao-state/
__pycache__/
*.pyc