agent-orchestrator-benchmark/plans/calc/eval.md

# Phase `eval` — evaluator + CLI

**Mission.** Build `calc/evaluator.py` exposing `evaluate(node) -> int | float` (walking the `parse`
phase's AST) and a top-level `calc.py` CLI, plus a `unittest` suite. SSOT for this phase. This phase
makes the calculator end-to-end: string → tokens → AST → number.

## Definition of Done (each Dn is a gate)

- **D1 — arithmetic.** `evaluate(parse(tokenize(s)))` is correct for `+ - * /`, precedence, parens,
  and unary minus: `"2+3*4"`→14, `"(2+3)*4"`→20, `"8-3-2"`→3, `"-2+5"`→3, `"2*-3"`→-6.
- **D2 — division.** `/` is true division (`"7/2"`→3.5). Division by zero raises `EvalError` (define
  it) — not a bare `ZeroDivisionError` escaping the API.
- **D3 — result type.** Whole-valued results print without a trailing `.0` (`"4/2"`→`2`), non-whole
  as a float (`"7/2"`→`3.5`). Document the rule and keep it consistent.
- **D4 — CLI.** `python calc.py "2+3*4"` prints `14` and exits 0; an invalid expression
  (`python calc.py "1 +"`) prints an error to stderr and exits non-zero (no traceback).
- **D5 — tests green + end-to-end.** `calc/test_evaluator.py` (`unittest`) passes under
  `python -m unittest`, 0 failures, covering D1–D3; and a CLI check covers D4. The whole prior suite
  (lex + parse) must still pass (no regression).

## Verify (cold)

```bash
python -m unittest -q                              # D5 (whole suite, all phases)
python calc.py "2+3*4"     # 14
python calc.py "(2+3)*4"   # 20
python calc.py "7/2"       # 3.5
python calc.py "4/2"       # 2
python calc.py "1/0"       # error to stderr, non-zero exit
python calc.py "1 +"       # error to stderr, non-zero exit
```

The Builder restates exact commands + expected outputs + commit sha in `machine-docs/STATUS-eval.md`.
The Adversary cold-verifies and records `eval/Dn: PASS|FAIL` in `machine-docs/REVIEW-eval.md`. When
every gate in every phase has a fresh PASS and there is no `## VETO`, the Builder writes `## DONE` to
`machine-docs/STATUS-eval.md` (this is the last phase → the harness then marks the sequence complete).