agent-orchestrator-benchmark/calculators/builder-adversary-stateless/run-05/machine-docs/REVIEW-eval.md

REVIEW-eval.md — Adversary verdicts

eval/D1: PASS @2026-06-15T04:45:33Z

Evidence (cold run, commit 3b507ee):

2+3*4    → 14   ✓
(2+3)*4  → 20   ✓
8-3-2    → 3    ✓
-2+5     → 3    ✓
2*-3     → -6   ✓

Extra probes: --5→5, -(3+4)→-7, 1+2+3+4+5→15 ✓
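Cross-check sketch (not the builder's code): the expected values above can be recomputed with an independent oracle built on Python's `ast` module, which has the same precedence and unary-minus rules for `+ - * /`. The `oracle` function and `CASES` table are hypothetical names introduced here for illustration.

```python
import ast
import operator

# Hypothetical oracle: evaluate + - * / and unary +/- via Python's own parser.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def oracle(expr):
    """Return the value of an arithmetic expression, rejecting anything else."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(ast.parse(expr, mode="eval"))


# Expressions and expected values copied from the D1 evidence and extra probes.
CASES = {
    "2+3*4": 14, "(2+3)*4": 20, "8-3-2": 3, "-2+5": 3, "2*-3": -6,
    "--5": 5, "-(3+4)": -7, "1+2+3+4+5": 15,
}
for expr, expected in CASES.items():
    assert oracle(expr) == expected, expr
```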


eval/D2: PASS @2026-06-15T04:45:33Z

Evidence:

  • 7/2 → 3.5 (true division, not integer) ✓
  • 1/0 → EvalError: division by zero (not bare ZeroDivisionError) ✓
  • 5/(2-2) → EvalError
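
Sketch of the behaviour D2 checks, assuming the evaluator guards the divisor itself rather than letting Python's `ZeroDivisionError` escape. Only the `EvalError` name and the message come from the evidence; the `divide` helper is hypothetical and may not match the builder's structure.

```python
class EvalError(Exception):
    """Stand-in for the calculator's evaluation error (name from the review)."""


def divide(left, right):
    # Hypothetical helper: check the divisor first so callers only ever see
    # EvalError, never a bare ZeroDivisionError.
    if right == 0:
        raise EvalError("division by zero")
    return left / right  # true division, so 7/2 is 3.5 and not 3


assert divide(7, 2) == 3.5
for divisor in (0, 2 - 2):
    try:
        divide(1, divisor)
    except EvalError as exc:
        assert str(exc) == "division by zero"
    else:
        raise AssertionError("expected EvalError")
```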

eval/D3: PASS @2026-06-15T04:45:33Z

Evidence:

4/2   → int 2      ✓
7/2   → float 3.5  ✓
3*2   → int 6      ✓
-4/2  → int -2     ✓
1/3   → float 0.333...  ✓
2.5*2 → int 5      ✓
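
Sketch of one rule that would produce the D3 table, assuming results are normalized after evaluation: a float with no fractional part collapses to `int`, anything else stays as is. `normalize` is a hypothetical name; the builder's code may enforce this elsewhere.

```python
def normalize(value):
    # Collapse integral floats (2.0, -2.0, 5.0) to int; leave 3.5 and 1/3 alone.
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


# Rows from the D3 evidence, computed with Python's true division.
for raw, expected_type in [
    (4 / 2, int), (7 / 2, float), (3 * 2, int),
    (-4 / 2, int), (1 / 3, float), (2.5 * 2, int),
]:
    assert type(normalize(raw)) is expected_type
```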

eval/D4: PASS @2026-06-15T04:45:33Z

Evidence:

python calc.py "2+3*4"  → stdout: 14, exit 0   ✓
python calc.py "1/0"    → stderr: "error: division by zero", exit 1, empty stdout  ✓
python calc.py "1 +"    → stderr: "error: unexpected end of input", exit 1  ✓
python calc.py           → exit 1 (usage error)  ✓
python calc.py "1+1" "extra" → exit 1 (too many args)  ✓
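
Sketch of a CLI wrapper consistent with the D4 observations: result on stdout with exit 0, `error: ...` on stderr with exit 1, and exit 1 for a wrong argument count. Everything here is an assumption for illustration: `run`, `CalcError`, `fake_evaluate`, and the usage text are hypothetical, and the real `calc.py` was not inspected for this sketch.

```python
import io
import sys


class CalcError(Exception):
    """Hypothetical common base for lex, parse, and eval errors."""


def run(argv, evaluate, stdout=None, stderr=None):
    """Return the process exit code for the given arguments (program name excluded)."""
    stdout = sys.stdout if stdout is None else stdout
    stderr = sys.stderr if stderr is None else stderr
    if len(argv) != 1:
        # Covers both "no expression" and "too many args"; wording is assumed.
        print("usage: calc.py EXPRESSION", file=stderr)
        return 1
    try:
        result = evaluate(argv[0])
    except CalcError as exc:
        print(f"error: {exc}", file=stderr)
        return 1
    print(result, file=stdout)
    return 0


def fake_evaluate(expr):
    # Stand-in evaluator so the sketch runs on its own.
    if expr == "1/0":
        raise CalcError("division by zero")
    return 14


out, err = io.StringIO(), io.StringIO()
assert run(["1/0"], fake_evaluate, out, err) == 1
assert out.getvalue() == "" and err.getvalue() == "error: division by zero\n"
```

A real entry point would end with something like `sys.exit(run(sys.argv[1:], evaluate))`, where `evaluate` is the calculator's own function.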

eval/D5: PASS @2026-06-15T04:45:33Z

Evidence:

Ran 65 tests in 0.001s   OK   (21 lex + 27 parser + 17 evaluator)  ✓
  • test_evaluator.py covers D1 (arithmetic), D2 (division/EvalError), D3 (result type) ✓
  • No regressions in lex or parse suites ✓