REVIEW — phase eval (Adversary)

Gates

eval/D1: PASS @2026-06-15T06:07Z

Cold-verified all 5 plan cases:

  • "2+3*4" → 14 ✓
  • "(2+3)*4" → 20 ✓
  • "8-3-2" → 3 ✓
  • "-2+5" → 3 ✓
  • "2*-3" → -6 ✓

python -m unittest calc.test_evaluator.TestD1Arithmetic -v: 5/5 ok.

Break-it probes: 3+4+5→12, 10-2*3→4, -(3+4)→-7, 2*3+4*5→26, -(-5)→5 — all correct.
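
For readers without the evaluator at hand, the precedence and associativity behaviour that the D1 cases and probes exercise can be reproduced with a small recursive-descent sketch. This is not the code under review: `tokenize`, `evaluate`, and the grammar below are assumptions made for illustration, only the `EvalError` name comes from this review, and division is left out because it belongs to D2.

```python
import re


class EvalError(Exception):
    """Evaluation failure (name from the review; this body is an assumption)."""


# Hypothetical token pattern: unsigned integers, + - *, and parentheses.
_TOKEN = re.compile(r"\s*(\d+|[-+*()])")


def tokenize(text):
    """Split the input into tokens, rejecting anything the pattern misses."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise EvalError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    """Evaluate +, -, *, unary minus, and parentheses with usual precedence."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        # Lowest precedence: left-associative + and -.
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():
        # Binds tighter than + and -: left-associative *.
        value = unary()
        while peek() == "*":
            advance()
            value = value * unary()
        return value

    def unary():
        # Unary minus may nest, e.g. -(-5), and may follow an operator, e.g. 2*-3.
        if peek() == "-":
            advance()
            return -unary()
        return primary()

    def primary():
        token = advance()
        if token is None:
            raise EvalError("unexpected end of input")
        if token == "(":
            value = expr()
            if advance() != ")":
                raise EvalError("expected ')'")
            return value
        if token.isdigit():
            return int(token)
        raise EvalError(f"unexpected token {token!r}")

    result = expr()
    if peek() is not None:
        raise EvalError(f"unexpected token {peek()!r}")
    return result
```

Under these assumptions, the loop in `expr` is what makes "8-3-2" group as (8-3)-2, and calling `unary` from `term` is what lets "2*-3" parse without parentheses.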


eval/D2: PASS @2026-06-15T06:07Z

Cold-verified:

  • "7/2" → 3.5 (true division) ✓
  • "1/0" raises EvalError("division by zero"), NOT bare ZeroDivisionError

python -m unittest calc.test_evaluator.TestD2Division -v: 3/3 ok.

Break-it probes: 0/0 raises EvalError ✓, 1/(2-2) raises EvalError ✓.
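
One way to get the behaviour checked here, sketched under assumptions: the helper name `divide` is hypothetical, and the real evaluator may test the divisor up front rather than catching the exception. The point is only that the built-in `ZeroDivisionError` is translated into `EvalError` before it can reach the caller.

```python
class EvalError(Exception):
    """Evaluation failure (name from the review; this body is an assumption)."""


def divide(lhs, rhs):
    """True division that reports a zero divisor as EvalError."""
    try:
        return lhs / rhs
    except ZeroDivisionError:
        # "from None" hides the original exception so callers see only EvalError.
        raise EvalError("division by zero") from None
```

With this helper, `divide(7, 2)` yields a float, and both `divide(0, 0)` and `divide(1, 2 - 2)` raise `EvalError`, matching the probes above.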


eval/D3: PASS @2026-06-15T06:07Z

Cold-verified:

  • format_result(2.0) → "2" (no .0) ✓
  • format_result(3.5) → "3.5"
  • calc.py "4/2" prints 2
  • calc.py "7/2" prints 3.5

python -m unittest calc.test_evaluator.TestD3ResultType -v: 5/5 ok.

Break-it probes: integers (14, 0, -6), non-whole floats (3.5, -3.5), whole floats (2.0, 100.0) — all formatted correctly.
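
The formatting rule verified here can be stated in a few lines. The sketch below is an assumption about how `format_result` might be written, not the reviewed implementation: whole-valued floats are printed as integers, and everything else uses Python's default string form.

```python
def format_result(value):
    """Render whole-valued floats without a trailing .0 (illustrative sketch)."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)
```

Since true division always returns a float in Python, this is the step that would make "4/2" print as 2 rather than 2.0.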


eval/D4: PASS @2026-06-15T06:08Z

Cold-verified exact plan spot-checks:

  • calc.py "2+3*4" → stdout 14, exit 0 ✓
  • calc.py "(2+3)*4" → stdout 20, exit 0 ✓
  • calc.py "7/2" → stdout 3.5, exit 0 ✓
  • calc.py "4/2" → stdout 2, exit 0 ✓
  • calc.py "1/0" → stderr error: division by zero, exit 1 ✓
  • calc.py "1 +" → stderr error: ..., exit 1 ✓

python -m unittest calc.test_evaluator.TestD4CLI -v: 6/6 ok.

Break-it probes: no traceback on error ✓, error goes to stderr not stdout ✓, no-args exits 1 ✓.
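
The CLI contract probed above (result on stdout with exit 0, `error: ...` on stderr with exit 1, no traceback) could be met by an entry point shaped roughly like the following. The function name `run`, the injected `evaluate` callable, and the no-arguments message are all hypothetical; the review only establishes that running with no arguments exits 1.

```python
import sys


class EvalError(Exception):
    """Evaluation failure (name from the review; this body is an assumption)."""


def run(args, evaluate, out=None, err=None):
    """Return the process exit code for one invocation (illustrative sketch).

    `evaluate` is any callable that returns a value or raises EvalError.
    """
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    if len(args) != 1:
        # Assumed wording; the review does not quote the no-args message.
        print("error: expected exactly one expression", file=err)
        return 1
    try:
        value = evaluate(args[0])
    except EvalError as exc:
        # Catching EvalError here is what keeps tracebacks off the terminal.
        print(f"error: {exc}", file=err)
        return 1
    print(value, file=out)
    return 0
```

A script would then end with something like `sys.exit(run(sys.argv[1:], evaluate))`. Passing the streams in as parameters keeps the stdout/stderr split testable without spawning a subprocess.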


eval/D5: PASS @2026-06-15T06:08Z

Cold-verified:

  • python -m unittest -q: 50 tests in 0.210s — OK
  • All 6 plan verification commands produce correct output / exit codes ✓
  • No regression in lex or parse suites (19 lex + 12 parse all still green) ✓
  • test_evaluator.py covers D1 (5 tests) + D2 (3 tests) + D3 (5 tests) + D4 (6 tests) = 19 evaluator tests ✓