agent-orchestrator-benchmark/calculators/builder-adversary/run-01/machine-docs/REVIEW-eval.md

REVIEW — eval phase (Adversary)

Adversary cold-verification log. Each verdict is recorded here.

Status summary

  • D1 arithmetic: PASS
  • D2 division + EvalError: PASS
  • D3 result type: PASS
  • D4 CLI: PASS
  • D5 tests green + end-to-end: PASS

Verdicts

D1 — arithmetic: PASS @2026-06-15T00:27:02Z

Cold run, commit 165c7cc.

Commands run and results:

python calc.py "2+3*4"   → 14  ✓
python calc.py "(2+3)*4" → 20  ✓
python calc.py "8-3-2"   → 3   ✓
python calc.py "-2+5"    → 3   ✓
python calc.py "2*-3"    → -6  ✓
python calc.py "--5"     → 5   ✓ (double unary)
python calc.py "10-3-2"  → 5   ✓ (left-associativity)
python calc.py "-(2+3)"  → -5  ✓ (unary on paren)
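The cases above exercise precedence, left-associativity, and unary minus. As a rough illustration of why those results follow from a recursive-descent design, here is a minimal sketch. It is not the code under review: `sketch_eval` and its helpers are hypothetical names, it handles only integers with +, -, *, and parentheses, and it does no error handling.

```python
import re

def sketch_eval(text):
    """Evaluate +, -, * with parentheses and unary minus (illustrative only)."""
    tokens = re.findall(r"\d+|[-+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        # Left-associative: each operand is folded into the running value,
        # so "10-3-2" is computed as (10-3)-2.
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():
        # "*" binds tighter than "+" and "-" because it is parsed one level deeper.
        value = unary()
        while peek() == "*":
            advance()
            value *= unary()
        return value

    def unary():
        # Unary minus recurses, so "--5" and "-(2+3)" are both accepted.
        if peek() == "-":
            advance()
            return -unary()
        return atom()

    def atom():
        tok = advance()
        if tok == "(":
            value = expr()
            advance()  # consume the closing ")"
            return value
        return int(tok)

    return expr()
```

Under these assumptions, the sketch agrees with the eight results recorded for D1.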

D2 — division + EvalError: PASS @2026-06-15T00:27:02Z

python calc.py "7/2"        → 3.5  ✓ (true division)
python calc.py "1/0"        → "Error: Division by zero" on stderr, exit 1  ✓
python calc.py "5/(3-3)"    → "Error: Division by zero" on stderr, exit 1  ✓

Verified EvalError is raised (not bare ZeroDivisionError) via:

  • calc/evaluator.py:37-38 explicitly checks right == 0 and raises EvalError
  • calc.py:24 catches EvalError only; a bare ZeroDivisionError would escape that handler as a traceback, and none was raised in the runs above

EvalError is a proper subclass of Exception: confirmed True.
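The check described above can be sketched as follows. This is an assumed shape, not a copy of calc/evaluator.py: the `EvalError` class here is a stand-in defined locally, and `divide` is a hypothetical helper name.

```python
class EvalError(Exception):
    """Stand-in for the project's EvalError (assumed to subclass Exception)."""


def divide(left, right):
    # Test the divisor explicitly so the caller sees EvalError,
    # never Python's built-in ZeroDivisionError.
    if right == 0:
        raise EvalError("Division by zero")
    return left / right
```

Because the guard runs before the `/` operator, a zero divisor produced by a sub-expression such as `3-3` is caught the same way as a literal `0`.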

D3 — result type: PASS @2026-06-15T00:27:02Z

evaluate(parse(tokenize("4/2"))) → int 2     ✓
evaluate(parse(tokenize("7/2"))) → float 3.5 ✓
evaluate(parse(tokenize("2+3"))) → int 5     ✓ (integer arithmetic stays int)
evaluate(parse(tokenize("-6/2"))) → int -3   ✓ (negative whole result is int)
evaluate(parse(tokenize("1000*1000"))) → int 1000000  ✓

The rule is documented in the calc/evaluator.py docstring. Printing the evaluated result of 4/2 outputs 2 (not 2.0). ✓
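One way to get the observed types is to collapse whole-valued floats to int after true division. The sketch below assumes that rule; the actual wording of the docstring is not reproduced in this log, and `normalize` is a hypothetical name.

```python
def normalize(value):
    # A float with no fractional part (e.g. 2.0 from 4/2) becomes an int;
    # anything else, including ints, is returned unchanged.
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value
```

With this helper, `normalize(4 / 2)` yields an int while `normalize(7 / 2)` stays a float, matching the D3 observations.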

D4 — CLI: PASS @2026-06-15T00:27:02Z

python calc.py "2+3*4"   → prints 14, exit 0   ✓
python calc.py "1 +"     → "Error: Unexpected end of input" on stderr, exit 1, no traceback  ✓
python calc.py "1/0"     → "Error: Division by zero" on stderr, exit 1, no traceback  ✓
python calc.py ""        → "Error: Empty input" on stderr, exit 1  ✓
python calc.py           → usage message on stderr, exit 1  ✓
python calc.py "((1+2)"  → error on stderr, exit 1, no traceback  ✓

No tracebacks in any error path. ✓
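The behaviour checked in D4 (message on stderr, exit 1, no traceback) is typically achieved by catching the project's own error type at the entry point. The following is a sketch under assumptions: `CalcError`, `run`, and the usage text are hypothetical, and the evaluator is passed in as a callable so the example stays self-contained.

```python
import sys


class CalcError(Exception):
    """Hypothetical common base for tokenizer, parser, and evaluator errors."""


def run(argv, evaluate, out=None, err=None):
    """Return the process exit code; print results to out and errors to err."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    if len(argv) != 2:
        # The exact usage wording is an assumption.
        print("Usage: python calc.py EXPRESSION", file=err)
        return 1
    try:
        result = evaluate(argv[1])
    except CalcError as exc:
        # Only the message is printed, so no traceback reaches the user.
        print(f"Error: {exc}", file=err)
        return 1
    print(result, file=out)
    return 0
```

A script would then end with something like `sys.exit(run(sys.argv, my_evaluate))`, where `my_evaluate` is whatever function turns an expression string into a value.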

D5 — tests green + end-to-end: PASS @2026-06-15T00:27:02Z

python -m unittest -q  → Ran 62 tests in 0.001s  OK

The 62 tests cover the lex, parse, and evaluator suites: 0 failures, 0 errors, no regressions. ✓

Adversary findings

(none — all gates pass, no defects found)

Break-it probes run

  • Traceback check on all error paths: no traceback in any case ✓
  • No-args invocation: graceful usage error ✓
  • Empty string input: graceful error ✓
  • Double unary minus --5 → 5 ✓
  • Left-associativity 10-3-2 → 5 ✓
  • Unary in division -8/2 → -4 ✓
  • Negative whole result type -6/2 → int -3 ✓
  • Large numbers 1000*1000 → int 1000000 ✓
  • Division by zero via expression 5/(3-3) → EvalError ✓
  • Unclosed paren ((1+2) → parse error, no crash ✓
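Probes like these can be repeated mechanically with a small subprocess harness. The sketch below is one possible shape, not the tooling actually used for this review: `probe` is a hypothetical helper, and it relies only on the standard `subprocess.run` API.

```python
import subprocess
import sys


def probe(args):
    """Run a command and summarize exit code, output, and traceback presence."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return {
        "exit": proc.returncode,
        "stdout": proc.stdout.strip(),
        "stderr": proc.stderr.strip(),
        # Python tracebacks always begin with this header line.
        "clean": "Traceback (most recent call last)" not in proc.stderr,
    }
```

For example, `probe([sys.executable, "calc.py", "1/0"])` would be expected to report exit 1 with `clean` set to True if the CLI behaves as recorded in D4.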

Initialized: 2026-06-15T00:24:45Z
Verdicts filed: 2026-06-15T00:27:02Z