agent-orchestrator-benchmark/calculators/builder-adversary-deferred/run-05/machine-docs/REVIEW-eval.md

REVIEW — eval phase (Adversary)

Per REVIEW CADENCE — DEFERRED: comprehensive verification happens ONCE, after the full build in the review phase, not per gate. Early probes logged here are informational.

Status

Waiting for Builder to produce calc/evaluator.py, calc.py, and calc/test_evaluator.py. No claims seen yet.

Verdicts

(none yet — deferred until Builder marks eval complete and review phase begins)

Early probes

Prior phases confirmed (parse, lex)

  • Builder has completed parse phase (marked DONE)
  • Parser produces: Num, BinOp, Unary dataclass nodes
  • Lexer produces Token stream consumed by parser
  • 35 tests reported green by Builder (unverified cold — cold-verify deferred to final review)
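As a reference point for the eval phase, the node shapes the evaluator will consume might look like the sketch below. The field names (`value`, `op`, `operand`, `left`, `right`) are assumptions made for illustration; only the class names Num, BinOp, and Unary come from the Builder's report, and the actual dataclasses may differ.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical field names: the Builder's real dataclasses may differ.
@dataclass(frozen=True)
class Num:
    value: float


@dataclass(frozen=True)
class Unary:
    op: str
    operand: "Node"


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, Unary, BinOp]

# Tree expected for "2+3*4" if the parser handles precedence:
# the multiplication sits deeper in the tree than the addition.
precedence_tree = BinOp("+", Num(2), BinOp("*", Num(3), Num(4)))

# Tree expected for "8-3-2" under left-associativity: (8-3)-2.
left_assoc_tree = BinOp("-", BinOp("-", Num(8), Num(3)), Num(2))

# Tree expected for "2*-3": unary minus nested under the binary operator.
unary_tree = BinOp("*", Num(2), Unary("-", Num(3)))
```

If the real trees have these shapes, precedence and associativity are already settled by the parser, and the evaluator only needs a straightforward recursive walk.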

DoD checklist for cold-verification (when Builder claims complete)

  • D1: evaluate(parse(tokenize(s))) correct for +,-,*,/, precedence, parens, unary minus
    • "2+3*4" → 14 (precedence: * before +)
    • "(2+3)*4" → 20 (parens override precedence)
    • "8-3-2" → 3 (left-associativity)
    • "-2+5" → 3 (unary minus)
    • "2*-3" → -6 (unary minus after binary op)
  • D2: "7/2" → 3.5 (true division); "1/0" → EvalError (not ZeroDivisionError leaking)
  • D3: output formatting: "4/2" → "2" (integral results print without a trailing .0); "7/2" → "3.5" (non-integral results print as floats)
  • D4: CLI
    • python calc.py "2+3*4" → prints 14, exits 0
    • python calc.py "1 +" → error to stderr, non-zero exit
  • D5: python -m unittest -q → 0 failures; prior lex+parse tests still pass
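The D1 to D4 items above could be driven from a small table-based probe, sketched below. This is a sketch under assumptions, not the Builder's or the benchmark's harness: `calc` is a hypothetical callable that wraps evaluate(parse(tokenize(s))) plus output formatting and returns the printed string, and `run_cases` and `check_cli` are helper names invented here.

```python
import subprocess
import sys
from typing import Callable, List, Optional, Sequence, Tuple

# (expression, expected printed output); None means "must raise an error".
CASES: List[Tuple[str, Optional[str]]] = [
    ("2+3*4", "14"),     # D1: precedence
    ("(2+3)*4", "20"),   # D1: parens
    ("8-3-2", "3"),      # D1: left-associativity
    ("-2+5", "3"),       # D1: unary minus
    ("2*-3", "-6"),      # D1: unary minus after binary op
    ("7/2", "3.5"),      # D2/D3: true division, float output
    ("4/2", "2"),        # D3: no trailing .0
    ("1/0", None),       # D2: error, but not a raw ZeroDivisionError
]


def run_cases(calc: Callable[[str], str]) -> List[str]:
    """Return human-readable failures; an empty list means every case passed."""
    failures = []
    for expr, expected in CASES:
        try:
            got = calc(expr)
        except ZeroDivisionError:
            # D2 requires the evaluator's own error type, not the built-in one.
            failures.append(f"{expr}: ZeroDivisionError leaked")
            continue
        except Exception as exc:
            if expected is not None:
                failures.append(f"{expr}: unexpected {type(exc).__name__}")
            continue
        if expected is None:
            failures.append(f"{expr}: expected an error, got {got!r}")
        elif got != expected:
            failures.append(f"{expr}: expected {expected!r}, got {got!r}")
    return failures


def check_cli(argv: Sequence[str], want_stdout: Optional[str]) -> bool:
    """D4 probe: want_stdout=None means 'expect stderr output and non-zero exit'."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    if want_stdout is None:
        return proc.returncode != 0 and proc.stderr.strip() != ""
    return proc.returncode == 0 and proc.stdout.strip() == want_stdout
```

In the review phase, a call such as `check_cli([sys.executable, "calc.py", "2+3*4"], "14")` would cover the first D4 line, assuming calc.py sits in the working directory. D5 remains a direct run of the unittest command.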