feat: agent-orchestrator-benchmark — prompt token comparison harness
A standalone repo (engine vendored as a submodule at the examples commit) that runs a head-to-head between the builder-adversary and builder-adversary-min example variants: same task, independent headless runs, both on Sonnet, with token counts. Includes the roman-numeral test problem and run-bench.sh.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
28
plans/roman.md
Normal file
@@ -0,0 +1,28 @@
# Phase `roman` — integer → Roman numeral

**Mission.** In the work repo, implement `roman.py` plus a `test_roman.py` unittest suite. Pure
stdlib, no dependencies. This file is the single source of truth for the phase.

## Definition of Done

- **D1 — `to_roman(n)`.** Returns the Roman-numeral string for an int `1 ≤ n ≤ 3999`
  (e.g. `to_roman(1994) == "MCMXCIV"`).
- **D2 — validation.** `to_roman` raises `ValueError` for `n < 1`, `n > 3999`, or a non-int.
- **D3 — CLI.** `python roman.py 1994` prints `MCMXCIV` (and exits 0); a bad argument exits non-zero.
- **D4 — tests green.** `test_roman.py` (stdlib `unittest`) passes under `python -m unittest`, with
  **0 failures**, covering at least: `1→I, 4→IV, 9→IX, 40→XL, 90→XC, 400→CD, 900→CM,
  1994→MCMXCIV, 3888→MMMDCCCLXXXVIII, 3999→MMMCMXCIX`, and `ValueError` on `0`, `4000`, and `"x"`.
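
One way D1 through D3 could be met is sketched below. This is an illustration only, not the Builder's actual output: the greedy table lookup, the `_NUMERALS` and `main` names, the specific non-zero exit codes, and the choice to reject `bool` are all assumptions that the plan does not prescribe.

```python
import sys

# Value/symbol pairs in descending order, including the six subtractive forms.
_NUMERALS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n):
    """Return the Roman numeral for an int in 1..3999 (D1); raise ValueError otherwise (D2)."""
    # Assumption: bool is rejected even though it subclasses int; the plan is silent on this.
    if not isinstance(n, int) or isinstance(n, bool):
        raise ValueError(f"expected an int, got {n!r}")
    if not 1 <= n <= 3999:
        raise ValueError(f"out of range 1..3999: {n}")
    parts = []
    for value, symbol in _NUMERALS:
        # Take as many copies of this symbol as fit, then continue with the remainder.
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def main(argv):
    """CLI body (D3): print the numeral and return 0, or return non-zero on a bad argument."""
    if len(argv) != 1:
        print("usage: python roman.py N", file=sys.stderr)
        return 2
    try:
        # int() raises ValueError for input such as "x", which is handled like a range error.
        print(to_roman(int(argv[0])))
    except ValueError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0


# In roman.py the entry point would presumably be wired up as:
#     if __name__ == "__main__":
#         sys.exit(main(sys.argv[1:]))
```

Returning the exit status from `main` instead of calling `sys.exit` inside it keeps the CLI path callable from `test_roman.py` without spawning a subprocess.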

## How to verify (cold)

From a fresh clone of the work repo:

```bash
python -m unittest -q        # D4: must report OK (0 failures)
python roman.py 1994         # D3: expect MCMXCIV
python roman.py 3888         # expect MMMDCCCLXXXVIII
```

Expected outputs are above. The Builder restates the exact commands + expected outputs + commit sha
in `machine-docs/STATUS-roman.md`; the Adversary re-runs them from its own clone and records
`roman: PASS`/`FAIL` in `machine-docs/REVIEW-roman.md`.