agent-orchestrator-benchmark/calculators/builder-adversary-min/run-02/machine-docs/REVIEW-eval.md

REVIEW — phase eval

Cold-verified at commit 6e385c92a1bb145bc97183dfed8016a33f86f3ca (pulled via b136909).

D1: PASS @2026-06-15T00:11Z

All five arithmetic cases correct:

  • 2+3*4 → 14 ✓
  • (2+3)*4 → 20 ✓
  • 8-3-2 → 3 ✓
  • -2+5 → 3 ✓
  • 2*-3 → -6 ✓

Adversarial extras also pass: (-3)*(-3)=9, 10-4-3=3, left-assoc chain 2+3+4+5=14.
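
The expected values above can be cross-checked independently of the calculator. The sketch below is not part of the project: it uses Python's own `ast` module as a reference oracle, on the assumption that the calculator's precedence and associativity rules for `+`, `-`, `*`, `/`, unary minus, and parentheses match Python's. The name `reference_eval` is hypothetical.

```python
import ast
import operator

# Operators that the D1 cases rely on; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def reference_eval(expr):
    """Evaluate a plain arithmetic expression via Python's grammar (oracle only)."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(ast.parse(expr, mode="eval"))


# The five D1 cases plus the adversarial extras listed in this review.
CASES = {
    "2+3*4": 14,
    "(2+3)*4": 20,
    "8-3-2": 3,
    "-2+5": 3,
    "2*-3": -6,
    "(-3)*(-3)": 9,
    "10-4-3": 3,
    "2+3+4+5": 14,
}

for expr, expected in CASES.items():
    assert reference_eval(expr) == expected, expr
```

Walking the tree with an explicit operator whitelist avoids calling `eval` on the input, so the oracle stays limited to the arithmetic subset under review.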

D2: PASS @2026-06-15T00:11Z

  • 7/2 → 3.5 (true division, returns float) ✓
  • 1/0 raises EvalError("division by zero") — not bare ZeroDivisionError
  • 0/1 does not crash; returns 0
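
The division behaviour checked in D2 can be illustrated with a minimal sketch. This is not the project's evaluator: `EvalError` here is a stand-in class with an assumed shape, and `divide` is a hypothetical helper. It shows one way to surface a domain error instead of a bare `ZeroDivisionError`.

```python
class EvalError(Exception):
    """Stand-in for the evaluator's error type (assumed, not the project's class)."""


def divide(left, right):
    """True division that reports division by zero as an EvalError."""
    try:
        return left / right
    except ZeroDivisionError:
        # `from None` hides the built-in exception from the traceback chain,
        # so callers only ever see the domain-level error.
        raise EvalError("division by zero") from None
```

With this helper, `divide(7, 2)` gives `3.5`, `divide(0, 1)` gives a zero value without raising, and `divide(1, 0)` raises `EvalError("division by zero")`.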

D3: PASS @2026-06-15T00:11Z

  • format_result(evaluate(parse(tokenize("4/2")))) → '2'
  • format_result(evaluate(parse(tokenize("7/2")))) → '3.5'
  • 0/1 → '0' (not '0.0') ✓
  • Rule documented in evaluator.format_result docstring ✓
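
The formatting rule exercised in D3 can be sketched as follows. The real `evaluator.format_result` was not reproduced in this review, so the body below is an assumption about one implementation that satisfies the three checks: floats with an integral value print without a trailing `.0`, and everything else prints as-is.

```python
def format_result(value):
    """Render a numeric result, dropping '.0' from integral floats (assumed rule)."""
    if isinstance(value, float) and value.is_integer():
        # 2.0 -> '2', 0.0 -> '0'
        return str(int(value))
    return str(value)
```

Under this sketch, `format_result(4 / 2)` returns `'2'`, `format_result(7 / 2)` returns `'3.5'`, and `format_result(0 / 1)` returns `'0'`.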

D4: PASS @2026-06-15T00:11Z

python calc.py "2+3*4"    → 14,  exit 0 ✓
python calc.py "(2+3)*4"  → 20,  exit 0 ✓
python calc.py "7/2"      → 3.5, exit 0 ✓
python calc.py "4/2"      → 2,   exit 0 ✓
python calc.py "1/0"      → "error: division by zero" on stderr, exit 1 ✓
python calc.py "1 +"      → "error: unexpected token EOF(None)" on stderr, exit 1 ✓

Extra adversarial CLI checks: "", "2+", "*3" all print clean error lines to stderr, exit 1, no traceback ✓
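
The CLI checks above could be automated along these lines. This is a hedged sketch rather than the harness used for this review: `run_cli` and `is_clean_error` are hypothetical helpers, and the definition of a "clean error line" (exit code 1, a single `error:` line on stderr, no traceback) is inferred from the results listed under D4.

```python
import subprocess
import sys


def run_cli(cmd, expr):
    """Run `cmd` with `expr` as its only argument; return (exit code, stdout, stderr)."""
    proc = subprocess.run([*cmd, expr], capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip(), proc.stderr.strip()


def is_clean_error(returncode, stdout, stderr):
    """True when the process failed with exactly one 'error:' line and no traceback."""
    lines = stderr.splitlines()
    return (
        returncode == 1
        and len(lines) == 1
        and lines[0].startswith("error:")
        and "Traceback" not in stderr
    )
```

A possible invocation, assuming `calc.py` sits in the working directory: `run_cli([sys.executable, "calc.py"], "2+3*4")` for a success case, and `is_clean_error(*run_cli([sys.executable, "calc.py"], "1/0"))` for an error case.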

D5: PASS @2026-06-15T00:11Z

python -m unittest -q
Ran 44 tests in 0.049s
OK

0 failures, no regressions in lex/parse suites ✓