recipe-maintainers/agent-orchestrator-benchmark


mfowler 27df2c7b55 feat: agent-orchestrator-benchmark — prompt token comparison harness

A standalone repo (engine vendored as a submodule at the examples commit) that
runs a head-to-head between the builder-adversary and builder-adversary-min
example variants: same task, independent headless runs, both on Sonnet, with
token counts. Includes the roman-numeral test problem and run-bench.sh.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-14 20:20:05 +00:00


Phase `roman` — integer → Roman numeral

Mission. In the work repo, implement roman.py plus a test_roman.py unittest suite. Pure stdlib, no dependencies. This file is the single source of truth for the phase.

Definition of Done

D1 — to_roman(n). Returns the Roman-numeral string for an int 1 ≤ n ≤ 3999 (e.g. to_roman(1994) == "MCMXCIV").
D2 — validation. to_roman raises ValueError for n < 1, n > 3999, or a non-int.
D3 — CLI. python roman.py 1994 prints MCMXCIV (and exits 0); a bad argument exits non-zero.
D4 — tests green. test_roman.py (stdlib unittest) passes under python -m unittest, with 0 failures, covering at least: 1→I, 4→IV, 9→IX, 40→XL, 90→XC, 400→CD, 900→CM, 1994→MCMXCIV, 3888→MMMDCCCLXXXVIII, 3999→MMMCMXCIX, and ValueError on 0, 4000, and "x".
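One way D1 through D3 could be met is sketched below. It is a minimal illustration, not the benchmark's reference solution. The table name `_NUMERALS`, the helper `main`, the exit code 2, and the decision to reject `bool` are all assumptions; the spec only requires a non-zero exit and a ValueError for a non-int.

```python
import sys

# Value/symbol pairs in descending order, including the six subtractive forms.
_NUMERALS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n):
    """Return the Roman numeral for an int in 1..3999 (D1); raise ValueError otherwise (D2)."""
    # Assumption: bool is rejected even though it subclasses int; the spec only says "non-int".
    if not isinstance(n, int) or isinstance(n, bool):
        raise ValueError(f"expected an int, got {type(n).__name__}")
    if not 1 <= n <= 3999:
        raise ValueError(f"{n} is outside 1..3999")
    parts = []
    for value, symbol in _NUMERALS:
        # Greedy step: emit the symbol as many times as its value fits, keep the remainder.
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def main(argv):
    """Hypothetical CLI helper for D3: print the numeral and return 0, or return 2 on a bad argument."""
    if len(argv) != 1:
        print("usage: python roman.py N", file=sys.stderr)
        return 2
    try:
        # int() raises ValueError for text such as "x"; to_roman raises it for out-of-range values.
        print(to_roman(int(argv[0])))
    except ValueError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 2
    return 0
```

In roman.py, calling `sys.exit(main(sys.argv[1:]))` under the usual `__main__` guard would connect this helper to the command line. The greedy walk over a descending table keeps the subtractive forms (IV, IX, XL, XC, CD, CM) as ordinary entries, so no special-case branching is needed.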

How to verify (cold)

From a fresh clone of the work repo:

python -m unittest -q                 # D4: must report OK (0 failures)
python roman.py 1994                  # D3: expect MCMXCIV
python roman.py 3888                  # expect MMMDCCCLXXXVIII

Expected outputs are shown in the comments above. The Builder restates the exact commands, the expected outputs, and the commit SHA in machine-docs/STATUS-roman.md; the Adversary re-runs them from its own clone and records roman: PASS or roman: FAIL in machine-docs/REVIEW-roman.md.
