agent-orchestrator-benchmark/calculators/builder-adversary-min/run-01/machine-docs/REVIEW-eval.md

Adversary REVIEW — phase eval

Gates

  • D1: PASS @2026-06-15T02:05Z
  • D2: PASS @2026-06-15T02:05Z
  • D3: PASS @2026-06-15T02:05Z
  • D4: PASS @2026-06-15T02:05Z
  • D5: PASS @2026-06-15T02:05Z

Evidence

D5 — full suite (49 tests, 0 failures)

Ran 49 tests in 0.218s
OK

D1 — arithmetic / precedence / parens / unary minus

python calc.py "2+3*4"   → 14   ✓
python calc.py "(2+3)*4" → 20   ✓
python calc.py "8-3-2"   → 3    ✓
python calc.py "-2+5"    → 3    ✓
python calc.py "2*-3"    → -6   ✓

D2 — true division + EvalError on zero

python calc.py "7/2"   → 3.5              (exit 0)  ✓
python calc.py "1/0"   → Error: Division by zero (stderr, exit 1, no traceback)  ✓
python calc.py "0/0"   → Error: Division by zero (stderr, exit 1)  ✓

No bare ZeroDivisionError escapes; EvalError wraps it correctly.
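
The wrapping behaviour checked in D2 can be sketched roughly as follows. This is an illustration only: the review names `EvalError` but does not show its definition, and the `divide` helper is a hypothetical name, not necessarily how calc.py is structured.

```python
class EvalError(Exception):
    """Stand-in for the project's EvalError (definition assumed, not taken from calc.py)."""


def divide(left, right):
    # Hypothetical helper: true division that never lets ZeroDivisionError escape.
    try:
        return left / right
    except ZeroDivisionError as exc:
        # Chain the original exception so the root cause stays inspectable.
        raise EvalError("Division by zero") from exc
```

Under these assumptions both `divide(1, 0)` and `divide(0, 0)` raise `EvalError("Division by zero")`, which matches the two zero-divisor cases above.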

D3 — result type

python calc.py "4/2"   → 2    (int, no trailing .0)  ✓
python calc.py "7/2"   → 3.5  (float)                ✓
python calc.py "100/10" → 10  (whole-valued)         ✓
python calc.py "1/3"   → 0.3333333333333333           ✓

_coerce() correctly converts whole-valued floats to int.

D4 — CLI

python calc.py "2+3*4"   → stdout: 14, exit 0               ✓
python calc.py "1 +"     → stderr: Error: Unexpected token EOF(None), exit 1, no traceback  ✓
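
The CLI contract exercised in D4 (result on stdout with exit 0, a one-line `Error: ...` message on stderr with exit 1, no traceback) could be implemented along these lines. `CalcError`, `run`, and the `evaluate` parameter are hypothetical names introduced for illustration; the review does not show how calc.py organises its error types or entry point.

```python
import sys


class CalcError(Exception):
    """Hypothetical common base for lex, parse, and eval errors."""


def run(expression, evaluate, stdout=None, stderr=None):
    # `evaluate` is any callable mapping an expression string to a result.
    stdout = stdout if stdout is not None else sys.stdout
    stderr = stderr if stderr is not None else sys.stderr
    try:
        result = evaluate(expression)
    except CalcError as exc:
        # Report known errors as a single line instead of a traceback.
        print(f"Error: {exc}", file=stderr)
        return 1
    print(result, file=stdout)
    return 0
```

A script entry point would then pass the return value to `sys.exit()`, so the process exit code matches the 0 and 1 values observed above.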

Adversarial extras (all correct)

  • (-2)*(-3) → 6 ✓
  • 100/10 → 10 (no .0) ✓
  • 1/3 → 0.333... ✓
  • 0/0 → EvalError, exit 1 ✓

No regressions in lex or parse phases (49 tests = prior 39 + 10 new eval tests).