agent-orchestrator-benchmark/plans/roman.md
mfowler 27df2c7b55 feat: agent-orchestrator-benchmark — prompt token comparison harness
A standalone repo (engine vendored as a submodule at the examples commit) that
runs a head-to-head between the builder-adversary and builder-adversary-min
example variants: same task, independent headless runs, both on Sonnet, with
token counts. Includes the roman-numeral test problem and run-bench.sh.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 20:20:05 +00:00


# Phase `roman` — integer → Roman numeral
**Mission.** In the work repo, implement `roman.py` plus a `test_roman.py` unittest suite. Pure
stdlib, no dependencies. This file is the single source of truth for the phase.
## Definition of Done
- **D1 — `to_roman(n)`.** Returns the Roman-numeral string for an int `1 ≤ n ≤ 3999`
(e.g. `to_roman(1994) == "MCMXCIV"`).
- **D2 — validation.** `to_roman` raises `ValueError` for `n < 1`, `n > 3999`, or a non-int.
- **D3 — CLI.** `python roman.py 1994` prints `MCMXCIV` (and exits 0); a bad argument exits non-zero.
- **D4 — tests green.** `test_roman.py` (stdlib `unittest`) passes under `python -m unittest`, with
**0 failures**, covering at least: `1→I, 4→IV, 9→IX, 40→XL, 90→XC, 400→CD, 900→CM,
1994→MCMXCIV, 3888→MMMDCCCLXXXVIII, 3999→MMMCMXCIX`, and `ValueError` on `0`, `4000`, and `"x"`.
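For orientation, one common way to meet D1 through D3 is a greedy table lookup. Walk the value/symbol pairs from largest to smallest, including the six subtractive pairs, and emit each symbol as many times as it fits. The sketch below is an illustrative assumption, not a required design. The names `_NUMERALS` and `main`, the choice to also reject `bool` (a subclass of `int`), and exit code 2 for a bad argument are all choices the spec leaves open.

```python
"""roman.py sketch: integer -> Roman numeral, stdlib only."""
import sys

# Value/symbol pairs in descending order. Including the six subtractive forms
# (CM, CD, XC, XL, IX, IV) means a single greedy pass produces canonical output.
_NUMERALS = (
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
)


def to_roman(n):
    """Return the Roman numeral for an int 1 <= n <= 3999 (D1); raise ValueError otherwise (D2)."""
    # Assumption: bool is rejected too, even though isinstance(True, int) is True.
    if not isinstance(n, int) or isinstance(n, bool):
        raise ValueError(f"expected an int, got {n!r}")
    if not 1 <= n <= 3999:
        raise ValueError(f"{n} is outside 1..3999")
    parts = []
    for value, symbol in _NUMERALS:
        count, n = divmod(n, value)  # how many times this symbol fits, and what is left
        parts.append(symbol * count)
    return "".join(parts)


def main(argv):
    """CLI (D3): print the numeral and return 0; on a bad argument return 2 (assumed code)."""
    if len(argv) != 1:
        print("usage: python roman.py N", file=sys.stderr)
        return 2
    try:
        # int("x") and an out-of-range value both surface as ValueError.
        print(to_roman(int(argv[0])))
    except ValueError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 2
    return 0

# roman.py would end with: if __name__ == "__main__": sys.exit(main(sys.argv[1:]))
```

In the real `roman.py`, that final guard is what makes `python roman.py 1994` print `MCMXCIV` and exit 0. Keeping `main` separate from the guard lets `test_roman.py` exercise the CLI paths directly, for example by checking that `main(["x"])` returns a non-zero code.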
## How to verify (cold)
From a fresh clone of the work repo:
```bash
python -m unittest -q # D4: must report OK (0 failures)
python roman.py 1994 # D3: expect MCMXCIV
python roman.py 3888 # D3: expect MMMDCCCLXXXVIII
```
The expected outputs are given in the comments above. The Builder restates the exact commands, their
expected outputs, and the commit SHA in `machine-docs/STATUS-roman.md`; the Adversary re-runs them from
its own clone and records `roman: PASS` or `roman: FAIL` in `machine-docs/REVIEW-roman.md`.
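The Adversary's cold re-run can be scripted. The sketch below is a hypothetical helper: `CHECKS`, `run_in_clone`, and `verdict` are illustrative names and are not part of the harness. It runs the three verification commands from the clone's root and turns the results into the line recorded in the review file. The command runner is passed in as a parameter, so the PASS/FAIL logic can be checked without a real clone.

```python
import subprocess
import sys
from typing import Callable, List, Optional, Sequence, Tuple

# (argv, expected stdout) pairs mirroring "How to verify (cold)". None means only
# the exit status is checked: `python -m unittest` exits non-zero when any test
# fails, and it writes its "OK" summary to stderr, not stdout.
CHECKS: List[Tuple[List[str], Optional[str]]] = [
    ([sys.executable, "-m", "unittest", "-q"], None),             # D4
    ([sys.executable, "roman.py", "1994"], "MCMXCIV"),            # D3
    ([sys.executable, "roman.py", "3888"], "MMMDCCCLXXXVIII"),
]

Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run_in_clone(argv: Sequence[str]) -> Tuple[int, str]:
    """Run one command in the current directory (assumed: a fresh clone); return (exit code, stdout)."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    return proc.returncode, proc.stdout


def verdict(checks: Sequence[Tuple[List[str], Optional[str]]] = CHECKS,
            run: Runner = run_in_clone) -> str:
    """Return 'roman: PASS' only if every check exits 0 and prints the expected stdout."""
    for argv, expected in checks:
        code, out = run(argv)
        # strip() tolerates the trailing newline that print() adds.
        if code != 0 or (expected is not None and out.strip() != expected):
            return "roman: FAIL"
    return "roman: PASS"
```

From the root of a fresh clone, `print(verdict())` prints `roman: PASS` only when all three checks succeed. Any single failure short-circuits to `roman: FAIL`, which matches the all-or-nothing reading of the Definition of Done.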