agent-orchestrator-benchmark/plans/roman.md

# Phase `roman` — integer → Roman numeral

**Mission.** In the work repo, implement `roman.py` plus a `test_roman.py` unittest suite. Pure
stdlib, no dependencies. This file is the single source of truth for the phase.

## Definition of Done

- **D1 — `to_roman(n)`.** Returns the Roman-numeral string for an int `1 ≤ n ≤ 3999`
  (e.g. `to_roman(1994) == "MCMXCIV"`).
- **D2 — validation.** `to_roman` raises `ValueError` for `n < 1`, `n > 3999`, or a non-int.
- **D3 — CLI.** `python roman.py 1994` prints `MCMXCIV` (and exits 0); a bad argument exits non-zero.
- **D4 — tests green.** `test_roman.py` (stdlib `unittest`) passes under `python -m unittest`, with
  **0 failures**, covering at least: `1→I, 4→IV, 9→IX, 40→XL, 90→XC, 400→CD, 900→CM,
  1994→MCMXCIV, 3888→MMMDCCCLXXXVIII, 3999→MMMCMXCIX`, and `ValueError` on `0`, `4000`, and `"x"`.
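
A minimal sketch of one way `roman.py` could satisfy D1 to D3, not a prescribed implementation. The names `_NUMERALS` and `main`, the exit code `2`, the error messages, and the decision to reject `bool` (which subclasses `int` in Python) are assumptions that this plan does not specify.

```python
import sys

# Value/symbol pairs in descending order, including the six subtractive forms.
_NUMERALS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n):
    """Return the Roman numeral for an int n with 1 <= n <= 3999."""
    # Assumption: bool is rejected even though it is a subclass of int.
    if isinstance(n, bool) or not isinstance(n, int):
        raise ValueError(f"expected an int, got {type(n).__name__}")
    if not 1 <= n <= 3999:
        raise ValueError(f"out of range 1..3999: {n}")
    parts = []
    for value, symbol in _NUMERALS:
        # Greedy step: take as many copies of the largest remaining value as fit.
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def main(argv):
    """Hypothetical CLI helper: print the numeral and return an exit code."""
    if len(argv) != 1:
        print("usage: python roman.py N", file=sys.stderr)
        return 2
    try:
        # int() raises ValueError on non-numeric text, the same type to_roman raises.
        print(to_roman(int(argv[0])))
    except ValueError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 2
    return 0
```

To act as the D3 command-line entry point, the file would also need the usual `if __name__ == "__main__":` guard calling `sys.exit(main(sys.argv[1:]))`; it is left out here so the block can be imported or pasted into a session without exiting the interpreter.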

## How to verify (cold)

From a fresh clone of the work repo:

```bash
python -m unittest -q                 # D4: must report OK (0 failures)
python roman.py 1994                  # D3: expect MCMXCIV
python roman.py 3888                  # D3: expect MMMDCCCLXXXVIII
```

The expected outputs are given in the comments above. The Builder restates the exact commands,
expected outputs, and commit SHA in `machine-docs/STATUS-roman.md`; the Adversary re-runs them from
its own clone and records `roman: PASS`/`FAIL` in `machine-docs/REVIEW-roman.md`.