recipe-maintainers/agent-orchestrator-benchmark

mfowler bb85aa9f11 artifacts: add calculators/ — the 30 built calculators (5/variant) + machine-docs + git logs

2026-06-16 15:39:42 +00:00

REVIEW-eval.md — Adversary verdicts

eval/D1: PASS @2026-06-15T04:45:33Z

Evidence (cold run, commit 3b507ee):

2+3*4    → 14   ✓
(2+3)*4  → 20   ✓
8-3-2    → 3    ✓
-2+5     → 3    ✓
2*-3     → -6   ✓

Extra probes: --5 → 5, -(3+4) → -7, 1+2+3+4+5 → 15 ✓
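
For readers who want to reproduce the D1 probes without the artifact, here is a minimal recursive-descent sketch. It is not the calculator under review: `tokenize`, `evaluate`, and the grammar are assumptions chosen only to show how precedence, left associativity, and unary minus yield the values above. Division is left out because D2 covers it.

```python
import re

# Numbers (integer or decimal) plus the operators and parentheses this sketch supports.
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*()])")


def tokenize(text):
    """Split an expression into number and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    """Evaluate +, -, *, unary minus, and parentheses (hypothetical stand-in)."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def expr():
        # expr := term (("+" | "-") term)*, folded left to right
        value = term()
        while peek() in ("+", "-"):
            if take() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():
        # term := unary ("*" unary)*, binds tighter than + and -
        value = unary()
        while peek() == "*":
            take()
            value *= unary()
        return value

    def unary():
        # unary := "-" unary | atom, so "--5" and "2*-3" both parse
        if peek() == "-":
            take()
            return -unary()
        return atom()

    def atom():
        token = take()
        if token == "(":
            value = expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token in {"+", "*", ")"}:
            raise ValueError(f"unexpected token {token!r}")
        return float(token) if "." in token else int(token)

    result = expr()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result
```

Folding left to right inside `expr` is what makes `8-3-2` come out as 3 rather than 7, and handling `-` in its own `unary` rule is what lets it appear directly after `*`.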

eval/D2: PASS @2026-06-15T04:45:33Z

Evidence:

7/2 → 3.5 (true division, not integer division) ✓
1/0 → EvalError: division by zero (not a bare ZeroDivisionError) ✓
5/(2-2) → EvalError ✓
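
The D2 behaviour can be sketched as a guarded division helper. The review names `EvalError` but does not show its definition, so the class body and the `divide` helper below are assumptions made for illustration only.

```python
class EvalError(Exception):
    """Domain error for evaluation failures (hypothetical definition)."""


def divide(lhs, rhs):
    """True division that reports a zero divisor as EvalError."""
    if rhs == 0:
        # Raise the domain error up front instead of letting Python's
        # ZeroDivisionError escape to the caller.
        raise EvalError("division by zero")
    return lhs / rhs
```

Checking the divisor before dividing covers both `1/0` and `5/(2-2)`, because the right-hand side is already evaluated by the time the check runs.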

eval/D3: PASS @2026-06-15T04:45:33Z

Evidence:

4/2   → int 2      ✓
7/2   → float 3.5  ✓
3*2   → int 6      ✓
-4/2  → int -2     ✓
1/3   → float 0.333...  ✓
2.5*2 → int 5      ✓
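
One way to get the D3 result types is to normalize after evaluation: integral floats collapse to `int`, everything else passes through. The `normalize` helper is a hypothetical name, and the reviewed calculator may implement the rule differently.

```python
def normalize(value):
    """Return an int when a float result has no fractional part."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value
```

Applied to the evidence rows, `normalize(4 / 2)` gives the int `2` and `normalize(2.5 * 2)` gives the int `5`, while `normalize(7 / 2)` stays the float `3.5`.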

eval/D4: PASS @2026-06-15T04:45:33Z

Evidence:

python calc.py "2+3*4"  → stdout: 14, exit 0   ✓
python calc.py "1/0"    → stderr: "error: division by zero", exit 1, empty stdout  ✓
python calc.py "1 +"    → stderr: "error: unexpected end of input", exit 1  ✓
python calc.py           → exit 1 (usage error)  ✓
python calc.py "1+1" "extra" → exit 1 (too many args)  ✓
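
The D4 contract (result on stdout with exit 0, `error: ...` on stderr with exit 1) can be sketched as a thin wrapper around an evaluator. `run`, `main`, the `EvalError` definition, and the usage text are all assumptions; the review states only that the no-argument and extra-argument cases exit 1, not what they print or where.

```python
import sys


class EvalError(Exception):
    """Domain error raised by the evaluator (hypothetical definition)."""


def run(args, evaluate):
    """Return (exit_code, stdout_text, stderr_text) for the given arguments."""
    if len(args) != 1:
        # Assumed wording and stream; the review records only the exit code.
        return 1, "", "usage: calc.py EXPRESSION\n"
    try:
        result = evaluate(args[0])
    except EvalError as exc:
        # Errors go to stderr only, so stdout stays empty on failure.
        return 1, "", f"error: {exc}\n"
    return 0, f"{result}\n", ""


def main(evaluate):
    """Wire run() to the real process streams and exit status."""
    code, out, err = run(sys.argv[1:], evaluate)
    sys.stdout.write(out)
    sys.stderr.write(err)
    sys.exit(code)
```

Returning the three values from `run` instead of printing directly keeps the exit-code logic testable without spawning a subprocess.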

eval/D5: PASS @2026-06-15T04:45:33Z

Evidence:

Ran 65 tests in 0.001s   OK   (21 lex + 27 parser + 17 evaluator)  ✓

test_evaluator.py covers D1 (arithmetic), D2 (division/EvalError), D3 (result type) ✓
No regressions in the lex or parser suites ✓