

REVIEW — Phase eval

Adversary: verified cold at commit 7e18a9b.

Gate verdicts

eval/D1: PASS @2026-06-15T03:58Z

$ python calc.py "2+3*4"   → 14   ✓
$ python calc.py "(2+3)*4" → 20   ✓
$ python calc.py "8-3-2"   → 3    ✓
$ python calc.py "-2+5"    → 3    ✓
$ python calc.py "2*-3"    → -6   ✓

Also probed: 2+3*4+1→15, 100/10/2→5, --5→5 (double unary). All correct.
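The probes above pin down three parser properties: `*` and `/` bind tighter than `+` and `-`, binary operators are left-associative (`8-3-2` is `(8-3)-2`, `100/10/2` is `(100/10)/2`), and unary minus nests (`--5`). A minimal recursive-descent sketch that satisfies the same probes is shown below. It is not the `calc.py` under review; `evaluate_sketch` and its inner helpers are hypothetical names, and the sketch omits error handling and result coercion.

```python
import re


def evaluate_sketch(src):
    """Evaluate + - * / with parentheses and unary minus (illustration only)."""
    # Hypothetical tokenizer: numbers, operators, parentheses.
    # Anything else (including whitespace) is silently skipped.
    tokens = re.findall(r"\d+(?:\.\d+)?|[-+*/()]", src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        # Lowest precedence. The loop folds each new operand into the
        # running value, which is what makes + and - left-associative.
        value = term()
        while peek() in ("+", "-"):
            if take() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():
        # Higher precedence than expr(), same left-to-right folding.
        value = unary()
        while peek() in ("*", "/"):
            if take() == "*":
                value *= unary()
            else:
                value /= unary()  # no zero guard in this sketch
        return value

    def unary():
        # Recursing here lets "--5" parse as -(-(5)).
        if peek() == "-":
            take()
            return -unary()
        return atom()

    def atom():
        tok = take()
        if tok == "(":
            value = expr()
            take()  # consume the closing ")"
            return value
        return float(tok) if "." in tok else int(tok)

    return expr()
```

Because the sketch does not coerce results, `evaluate_sketch("100/10/2")` returns the float `5.0`, which compares equal to `5`.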

eval/D2: PASS @2026-06-15T03:58Z

$ python calc.py "7/2"  → 3.5   ✓  (true division)
$ python calc.py "1/0"  → stderr: "error: division by zero", exit 1  ✓

Cold API check: evaluate(parse(tokenize('1/0'))) raises EvalError, not ZeroDivisionError. Division by zero is caught before Python's operator is invoked (right == 0 guard).
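The guard described above can be sketched as follows. Only the `EvalError` name, the `right == 0` check, and the message text come from this review; the `divide` helper and the class body are assumptions made for illustration, and the real evaluator may be structured differently.

```python
class EvalError(Exception):
    """Stand-in for the calculator's evaluation error type (assumed shape)."""


def divide(left, right):
    # Check the divisor first so Python's "/" is never called with zero
    # and ZeroDivisionError cannot escape to the caller.
    if right == 0:
        raise EvalError("division by zero")
    return left / right  # true division: 7 / 2 gives 3.5
```

Since `0.0 == 0` is true in Python, the same guard also covers a float zero divisor.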

eval/D3: PASS @2026-06-15T03:58Z

$ python calc.py "4/2"  → 2     ✓  (int, no .0)
$ python calc.py "7/2"  → 3.5   ✓  (float)
$ python calc.py "9/3"  → 3     ✓
$ python calc.py "1/3"  → 0.333...  ✓
$ python calc.py "-6/2" → -3    ✓  (negative whole)
$ python calc.py "-7/2" → -3.5  ✓

_coerce() rule: if isinstance(val, float) and val == int(val), return int(val); otherwise return val as-is. Consistent with the documented rule.
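Written out as code, the rule quoted above would look roughly like this. The function is named `coerce` here to make clear it is a transcription of the rule as stated in this review, not a copy of the builder's `_coerce()`.

```python
def coerce(val):
    # A float with no fractional part is returned as an int,
    # so 4 / 2 prints as 2 rather than 2.0.
    if isinstance(val, float) and val == int(val):
        return int(val)
    # Ints and non-whole floats pass through unchanged.
    return val
```

For example, `coerce(4 / 2)` gives the int `2`, while `coerce(7 / 2)` stays `3.5`.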

eval/D4: PASS @2026-06-15T03:58Z

$ python calc.py "2+3*4"  → stdout: 14, exit 0  ✓
$ python calc.py "1 +"    → stderr: "error: unexpected end of input", exit 1  ✓
$ python calc.py "1/0"    → stderr: "error: division by zero", exit 1  ✓
$ python calc.py ""       → stderr: "error: unexpected end of input", exit 1  ✓

No traceback on error (1-line output only). Errors routed to stderr only (stdout empty on error).
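One way to get this behavior (result on stdout with exit 0, a single `error: ...` line on stderr with exit 1, no traceback) is sketched below. `CalcError`, `run`, and the injected `evaluate` callable are hypothetical names; the review does not show how `calc.py` structures its entry point.

```python
import sys


class CalcError(Exception):
    """Hypothetical base class for lexer, parser, and evaluator errors."""


def run(argv, evaluate, stdout=None, stderr=None):
    """Return the process exit code; print the result or a one-line error."""
    stdout = sys.stdout if stdout is None else stdout
    stderr = sys.stderr if stderr is None else stderr
    source = argv[0] if argv else ""
    try:
        result = evaluate(source)
    except CalcError as exc:
        # One line on stderr, nothing on stdout, no traceback.
        print(f"error: {exc}", file=stderr)
        return 1
    print(result, file=stdout)
    return 0
```

A script entry point would then call something like `sys.exit(run(sys.argv[1:], evaluate))`, with `evaluate` being whatever function ties the tokenizer, parser, and evaluator together.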

eval/D5: PASS @2026-06-15T03:58Z

$ python -m unittest -q
Ran 45 tests in 0.001s
OK

45 tests: 14 lex + 20 parse + 11 evaluator. 0 failures. No regression in prior phases. test_evaluator.py covers D1-D3 (arithmetic, division, result types, EvalError for div-by-zero). The CLI behavior (D4) is not covered by the unit tests; it is exercised via the cold verify commands above.
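The manual D4 commands could also be scripted. The sketch below assumes only the contract recorded in this review (exit code 1, empty stdout, one `error: ...` line on stderr); `run_cli` and `check_error_contract` are hypothetical helper names, not part of the builder's test suite.

```python
import subprocess
import sys


def run_cli(args):
    """Run a command and return (exit code, stdout, stderr) as text."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


def check_error_contract(args, expected_message):
    """True if the command fails with exactly one 'error: ...' stderr line."""
    code, out, err = run_cli(args)
    return (
        code == 1
        and out == ""
        and err.splitlines() == [f"error: {expected_message}"]
    )
```

Assuming `calc.py` is in the working directory, a check such as `check_error_contract([sys.executable, "calc.py", "1/0"], "division by zero")` would mirror the third D4 command.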

Adversary findings

(none — no defects found; no veto)

Summary

All five DoD gates (D1-D5) PASS. No veto. Builder may write "## DONE" to STATUS-eval.md.