# REVIEW-eval — Adversary Verdicts

Phase: eval
Plan SSOT: /home/loops/project-orchestrator/projects/agent-orchestrator-benchmark/plans/calc/eval.md

## Gates

- D1 — arithmetic: PASS @2026-06-15T01:12:53Z
- D2 — division / EvalError: PASS @2026-06-15T01:12:53Z
- D3 — result type (no trailing .0): PASS @2026-06-15T01:12:53Z
- D4 — CLI: PASS @2026-06-15T01:12:53Z
- D5 — tests green + end-to-end: PASS @2026-06-15T01:12:53Z

## Verdicts

### D1 — arithmetic: PASS @2026-06-15T01:12:53Z

Cold-verified from the work-adv clone (HEAD after pull: 070dc92).

Evidence (all outputs match expected):
- `python calc.py "2+3*4"` → `14` exit 0 ✓
- `python calc.py "(2+3)*4"` → `20` exit 0 ✓
- `python calc.py "8-3-2"` → `3` exit 0 ✓
- `python calc.py "-2+5"` → `3` exit 0 ✓
- `python calc.py "2*-3"` → `-6` exit 0 ✓
- `python calc.py "--5"` → `5` exit 0 ✓ (double unary)
- `python calc.py "3-3"` → `0` exit 0 ✓
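The D1 cases above can be replayed mechanically. This is a minimal sketch, not part of the project. It assumes `calc.py` is in the working directory and prints only the result on stdout. `replay` and `D1_CASES` are hypothetical helper names.

```python
import subprocess
import sys

# Expected (stdout, exit code) per D1 expression, copied from the evidence above.
D1_CASES = {
    "2+3*4": ("14", 0),
    "(2+3)*4": ("20", 0),
    "8-3-2": ("3", 0),
    "-2+5": ("3", 0),
    "2*-3": ("-6", 0),
    "--5": ("5", 0),
    "3-3": ("0", 0),
}


def replay(cases, command=(sys.executable, "calc.py")):
    """Run each expression through `command`; return {expr: (stdout, code)} for mismatches.

    The default command is an assumed invocation: `python calc.py EXPR` from the repo root.
    """
    failures = {}
    for expr, (want_out, want_code) in cases.items():
        # Passing a list avoids shell quoting, so "--5" reaches the program verbatim.
        proc = subprocess.run([*command, expr], capture_output=True, text=True)
        got = (proc.stdout.strip(), proc.returncode)
        if got != (want_out, want_code):
            failures[expr] = got
    return failures
```

Calling `replay(D1_CASES)` from the repository root should return an empty dict when every case matches.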

### D2 — division / EvalError: PASS @2026-06-15T01:12:53Z

Evidence:
- `python calc.py "7/2"` → `3.5` exit 0 ✓ (true division)
- `1/0` raises `EvalError("division by zero")`, NOT bare `ZeroDivisionError` ✓
- `5/(3-3)` also raises `EvalError` ✓
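The D2 requirement is that division by zero surfaces as `EvalError`, never as a bare `ZeroDivisionError`. One way to meet it is to check the divisor before dividing. This minimal sketch assumes `EvalError` subclasses `Exception`. The `divide` helper is hypothetical, not the actual `evaluator.py` code.

```python
class EvalError(Exception):
    """Evaluation failure reported to callers (base class assumed)."""


def divide(left, right):
    """True division that reports a zero divisor as EvalError."""
    if right == 0:  # also matches 0.0
        # Raise the domain error instead of letting ZeroDivisionError escape.
        raise EvalError("division by zero")
    return left / right
```

In this sketch, `5/(3-3)` hits the same check because the parenthesised divisor is evaluated to 0 before `divide` runs.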

### D3 — result type: PASS @2026-06-15T01:12:53Z

Evidence (types confirmed via Python `isinstance` check):
- `4/2` → `int(2)` (not `float(2.0)`) ✓
- `7/2` → `float(3.5)` ✓
- `2+3*4` → `int(14)` ✓
- `0.0/1` → `int(0)` (whole-float coercion works for zero) ✓
- `python calc.py "1.5+1.5"` → `3` exit 0 (3.0 coerced to int) ✓
- Coercion rule documented in the `evaluator.py` docstring ✓
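The whole-float coercion checked above can be expressed as a single post-evaluation step. This minimal sketch assumes the rule is "any float with no fractional part becomes `int`", as the evidence suggests. `normalize` is a hypothetical name, and the `evaluator.py` docstring remains the authoritative statement of the rule.

```python
def normalize(value):
    """Collapse whole-valued floats to int so results print without a trailing .0.

    Assumed rule: a float with no fractional part (including 0.0) becomes int;
    other floats and all ints pass through unchanged.
    """
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


# normalize(4 / 2)     -> 2   (int)
# normalize(7 / 2)     -> 3.5 (float)
# normalize(0.0 / 1)   -> 0   (int)
```

`float.is_integer()` returns `False` for `inf` and `nan`, so those values are never passed to `int()`.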

### D4 — CLI: PASS @2026-06-15T01:12:53Z

Evidence:
- `python calc.py "2+3*4"` → stdout `14`, exit 0 ✓
- `python calc.py "1 +"` → stderr error, exit 1, no "Traceback" ✓
- `python calc.py "1/0"` → stderr error, exit 1, no "Traceback" ✓
- `python calc.py` (no args) → stderr usage msg, exit 1 ✓
- Error output confirmed on stderr (checked with stdout discarded; exit code still 1) ✓
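The D4 behavior follows a common entry-point pattern:

- The result goes to stdout with exit code 0.
- An error produces a one-line message on stderr, exit code 1, and no traceback.
- A missing argument produces a usage message.

This is a minimal sketch, not the project's `calc.py`. The `main` function, its `evaluate` callable, and the `errors` tuple are assumptions.

```python
import sys


def main(argv, evaluate, errors):
    """Hypothetical CLI entry point; returns the process exit code.

    `evaluate` is an assumed callable taking the expression string.
    `errors` is a tuple of exception classes treated as user-facing errors.
    """
    if len(argv) != 2:
        print("usage: calc.py EXPRESSION", file=sys.stderr)
        return 1
    try:
        result = evaluate(argv[1])
    except errors as exc:
        # One-line message on stderr; nothing on stdout and no traceback.
        print(f"error: {exc}", file=sys.stderr)
        return 1
    print(result)
    return 0
```

In a script this would be wired up as `sys.exit(main(sys.argv, evaluate, errors))`, with `evaluate` and `errors` bound to the project's own evaluator and exception classes. Catching only those classes, rather than `Exception`, keeps genuine bugs visible as tracebacks.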

### D5 — tests green + end-to-end: PASS @2026-06-15T01:12:53Z

Evidence:
- `python -m unittest -q` → `Ran 68 tests in ...s` / `OK` ✓
- Breakdown: 18 lex + 26 parse + 24 eval = 68 total ✓
- Prior 44 tests (lex + parse) still pass — no regression ✓
- `python -m unittest calc.test_lexer calc.test_parser -q` → 44 tests OK ✓
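The per-module breakdown can be recomputed with the standard `unittest.TestLoader` instead of being read off the summary line. This is a minimal sketch, and `count_tests` is a hypothetical helper.

```python
import unittest


def count_tests(module_names):
    """Return {dotted module name: number of test cases} using the stdlib loader."""
    loader = unittest.TestLoader()
    return {
        name: loader.loadTestsFromName(name).countTestCases()
        for name in module_names
    }
```

For example, `count_tests(["calc.test_lexer", "calc.test_parser"])` should report 18 and 26 if the breakdown above holds. The loader does not raise for a module it cannot import. It reports that module as a single failing test, so an unexpected count of 1 is worth a second look.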

## Adversary findings

None. No defects found. No VETO.