REVIEW — eval phase

Protocol: DEFERRED. Comprehensive verification runs once the eval phase build is complete, covering ALL prior DoD items (lex + parse + eval) in one cold pass.

Verdicts

No verdicts written yet — awaiting eval phase build completion.

Early Probes

Baseline (pre-eval): 44 tests pass (20 lex + 24 parse), so the prior phases show no regressions going into eval.

AST shapes verified for D1 eval cases:

  • 2+3*4 → BinOp('+', Num(2), BinOp('*', Num(3), Num(4))) ✓ (evaluates to 14)
  • (2+3)*4 → BinOp('*', BinOp('+', Num(2), Num(3)), Num(4)) ✓ (evaluates to 20)
  • 8-3-2 → BinOp('-', BinOp('-', Num(8), Num(3)), Num(2)) ✓ (evaluates to 3)
  • -2+5 → BinOp('+', Unary('-', Num(2)), Num(5)) ✓ (evaluates to 3)
  • 2*-3 → BinOp('*', Num(2), Unary('-', Num(3))) ✓ (evaluates to -6)
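
The expected values above can be cross-checked with a small tree-walking evaluator. This is a minimal sketch, not the build under review: the node names Num, BinOp, and Unary come from the AST shapes listed, but the field names (value, op, left, right, operand) and the evaluate function are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union
import operator


# Hypothetical node definitions; the real AST classes may use other field names.
@dataclass(frozen=True)
class Num:
    value: float


@dataclass(frozen=True)
class Unary:
    op: str
    operand: "Node"


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, Unary, BinOp]

# Map each binary operator symbol to the Python function that implements it.
_BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(node: Node) -> float:
    """Recursively evaluate an AST node to a number."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Unary):
        if node.op != "-":
            raise ValueError(f"unsupported unary operator: {node.op!r}")
        return -evaluate(node.operand)
    if isinstance(node, BinOp):
        if node.op not in _BINARY_OPS:
            raise ValueError(f"unsupported binary operator: {node.op!r}")
        return _BINARY_OPS[node.op](evaluate(node.left), evaluate(node.right))
    raise TypeError(f"unknown node type: {type(node).__name__}")


# The five D1 shapes from the list above.
assert evaluate(BinOp("+", Num(2), BinOp("*", Num(3), Num(4)))) == 14
assert evaluate(BinOp("*", BinOp("+", Num(2), Num(3)), Num(4))) == 20
assert evaluate(BinOp("-", BinOp("-", Num(8), Num(3)), Num(2))) == 3
assert evaluate(BinOp("+", Unary("-", Num(2)), Num(5))) == 3
assert evaluate(BinOp("*", Num(2), Unary("-", Num(3)))) == -6
```

Because the sketch recurses on the tree, precedence and associativity are decided entirely by the parser's AST shape; the evaluator adds no ordering rules of its own.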

Probe targets to hit once eval is built:

  • 1/0 → EvalError (not bare ZeroDivisionError), stderr, non-zero exit
  • 4/2 → prints 2 (not 2.0) — D3 whole-value rule
  • 7/2 → prints 3.5 — D3 non-whole rule
  • 1 + → error to stderr, exit non-zero, no Python traceback
  • --5 → what does the evaluator do with double unary? (parser accepts it: Unary('-', Unary('-', Num(5))))
  • Float literals: 2.5*2 → 5.0 or 5?
  • Whitespace-only: " " → ParseError from parser (Empty input after lex strips spaces)
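
The division and formatting probes above can be expressed as executable checks. The sketch below shows one way the targeted behavior could look, assuming a hypothetical EvalError class and hypothetical helpers divide, format_result, and run; none of these names are confirmed by the build. In particular, applying the whole-value rule to every float (so that 2.5*2 prints 5) is only one possible reading of D3, and the probe exists to find out which reading the implementation takes.

```python
import io
import sys


class EvalError(Exception):
    """Hypothetical evaluator error type; the real class may differ."""


def divide(left, right):
    """Divide, converting ZeroDivisionError into EvalError."""
    try:
        return left / right
    except ZeroDivisionError as exc:
        raise EvalError("division by zero") from exc


def format_result(value):
    """Print whole-valued floats without a trailing .0 (assumed D3 reading)."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


def run(compute, out=sys.stdout, err=sys.stderr):
    """Run a computation; return 0 on success, 1 on EvalError.

    Errors go to `err` as a single line, so no traceback reaches the user.
    """
    try:
        value = compute()
    except EvalError as exc:
        print(f"error: {exc}", file=err)
        return 1
    print(format_result(value), file=out)
    return 0


# Probe-style checks against the sketch.
assert format_result(divide(4, 2)) == "2"
assert format_result(divide(7, 2)) == "3.5"
assert format_result(2.5 * 2) == "5"  # assumed reading of D3, not confirmed

out, err = io.StringIO(), io.StringIO()
assert run(lambda: divide(1, 0), out=out, err=err) == 1
assert out.getvalue() == ""
assert "Traceback" not in err.getvalue()
```

Wrapping ZeroDivisionError at the division site keeps the top-level handler narrow: it catches only EvalError, so an unexpected exception would still surface as a traceback and fail the "no Python traceback" probe rather than being silently masked.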