STATUS-eval — Builder

DONE

Current state

Gate: D1, D2, D3, D4, D5 — all PASS (Adversary verified @2026-06-15T00:54Z)


Claims

D1 — arithmetic

What: evaluate(parse(tokenize(s))) correct for + - * /, precedence, parens, unary minus.

How to verify:

python -m unittest calc.test_evaluator.TestArithmetic -q

Expected: 0 failures. Specific cases:

  • "2+3*4" → 14
  • "(2+3)*4" → 20
  • "8-3-2" → 3
  • "-2+5" → 3
  • "2*-3" → -6

Where: calc/evaluator.py + calc/test_evaluator.py
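
For orientation, here is a minimal sketch of how precedence, left associativity, parentheses, and unary minus can fall out of a recursive-descent grammar. It is not the code in calc/evaluator.py: the names `tokenize_sketch` and `evaluate_tokens` are hypothetical, the sketch merges parsing and evaluation into one pass, and it raises plain ValueError instead of the project's error types.

```python
import re

# Numbers (integer or decimal) and the single-character operators/parens.
TOKEN_RE = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")


def tokenize_sketch(text):
    """Split an expression into number and operator tokens (hypothetical helper)."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate_tokens(tokens):
    """Parse and evaluate in one pass; one function per precedence level."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():   # expr := term (('+' | '-') term)*   -> left-associative
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():   # term := unary (('*' | '/') unary)* -> binds tighter than +/-
        value = unary()
        while peek() in ("*", "/"):
            if advance() == "*":
                value *= unary()
            else:
                value /= unary()
        return value

    def unary():  # unary := '-' unary | atom
        if peek() == "-":
            advance()
            return -unary()
        return atom()

    def atom():   # atom := NUMBER | '(' expr ')'
        token = advance()
        if token == "(":
            value = expr()
            if advance() != ")":
                raise ValueError("expected ')'")
            return value
        if token is None or token in {"+", "-", "*", "/", ")"}:
            raise ValueError(f"unexpected token {token!r}")
        return float(token) if "." in token else int(token)

    result = expr()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result
```

Because `term` is called from inside `expr`, multiplication and division are grouped before addition and subtraction, and the `while` loops make "8-3-2" evaluate as (8-3)-2.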


D2 — division

What: / is true division; EvalError raised on division by zero, not bare ZeroDivisionError.

How to verify:

python -m unittest calc.test_evaluator.TestDivision -q

Expected: 0 failures. Includes:

  • "7/2" → 3.5
  • "1/0" → EvalError
  • "5/(3-3)" → EvalError
  • No ZeroDivisionError escaping the API

Where: calc/evaluator.py:18-21
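
A minimal sketch of the pattern this claim describes, assuming EvalError is an ordinary Exception subclass. The `safe_divide` helper and the error message are hypothetical; the actual lines in calc/evaluator.py may be structured differently.

```python
class EvalError(Exception):
    """Stand-in for the project's EvalError (assumed to subclass Exception)."""


def safe_divide(left, right):
    """True division that reports a zero divisor as EvalError."""
    try:
        return left / right
    except ZeroDivisionError as exc:
        # Re-raise as the API's own error type so callers never see the
        # bare built-in exception; `from exc` keeps the original as __cause__.
        raise EvalError("division by zero") from exc
```

With this shape, a caller only has to catch EvalError to handle both "1/0" and "5/(3-3)".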


D3 — result type

What: Whole-valued results print without .0; non-whole as float.

How to verify:

python -m unittest calc.test_evaluator.TestResultType -q
python calc.py "4/2"   # should print: 2
python calc.py "7/2"   # should print: 3.5

Expected: 0 failures; 4/2 → 2 (no dot), 7/2 → 3.5.

Rule: The _fmt() function in calc.py checks value.is_integer() on floats; if true, casts to int for display.

Where: calc.py:_fmt(), calc/test_evaluator.py:TestResultType
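
The rule above can be sketched as follows. `fmt_sketch` is a hypothetical name used to avoid implying this is the exact body of `_fmt()`; it assumes the evaluator returns plain int or float values.

```python
def fmt_sketch(value):
    """Format a result for display: whole-valued floats drop the trailing .0."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)
```

Here `fmt_sketch(4 / 2)` gives "2" and `fmt_sketch(7 / 2)` gives "3.5"; integers pass through unchanged.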


D4 — CLI

What: python calc.py "2+3*4" prints 14 and exits 0; invalid expression prints error to stderr and exits non-zero (no traceback).

How to verify:

python calc.py "2+3*4"     # stdout: 14, exit 0
python calc.py "(2+3)*4"   # stdout: 20, exit 0
python calc.py "7/2"       # stdout: 3.5, exit 0
python calc.py "4/2"       # stdout: 2, exit 0
python calc.py "1/0"       # stderr: error: ..., exit 1
python calc.py "1 +"       # stderr: error: ..., exit 1

Expected: Exact outputs as above. All errors caught by (LexError, ParseError, EvalError) — no Python traceback.

Where: calc.py
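
One way the error handling described here might look, as a sketch only. The three exception classes are stand-ins for the project's own, and `run_cli` and `fake_compute` are hypothetical names; the real calc.py may parse arguments and word its messages differently.

```python
import io
import sys


class LexError(Exception):
    """Stand-in for the lexer's error type."""


class ParseError(Exception):
    """Stand-in for the parser's error type."""


class EvalError(Exception):
    """Stand-in for the evaluator's error type."""


def run_cli(expression, compute, out=None, err=None):
    """Print the result to out and return 0, or print an error to err and return 1."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    try:
        result = compute(expression)
    except (LexError, ParseError, EvalError) as exc:
        # Only the message is printed, so no traceback reaches the user.
        print(f"error: {exc}", file=err)
        return 1
    print(result, file=out)
    return 0


def fake_compute(expression):
    """Hypothetical stand-in for the tokenize/parse/evaluate pipeline."""
    if expression == "1/0":
        raise EvalError("division by zero")
    if expression == "1 +":
        raise ParseError("unexpected end of input")
    return 14


# Demo: capture both streams instead of writing to the terminal.
out, err = io.StringIO(), io.StringIO()
exit_code = run_cli("1/0", fake_compute, out, err)
```

In a real entry point the return value would be handed to `sys.exit()` so the shell sees the exit status.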


D5 — tests green + end-to-end

What: Full unittest suite (lex + parse + eval) passes with 0 failures; CLI checks cover D4.

How to verify:

python -m unittest -q

Expected:

Ran 68 tests in ~0.2s

OK

All 68 tests pass: 24 lex + 22 parse + 22 eval (including 6 CLI subprocess tests).

Commit sha: (see latest commit after push)

Where: calc/test_lexer.py, calc/test_parser.py, calc/test_evaluator.py
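
A CLI subprocess check of the kind counted above could be sketched like this. The `run_command` helper is hypothetical, and the inline `-c` scripts stand in for calc.py so that the sketch runs on its own; the project's actual tests may be written differently.

```python
import subprocess
import sys


def run_command(args):
    """Run a command and return (stdout, stderr, exit code) with whitespace trimmed."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.stdout.strip(), proc.stderr.strip(), proc.returncode


# Success path: inline script stands in for `python calc.py "2+3*4"`.
ok_stdout, ok_stderr, ok_code = run_command(
    [sys.executable, "-c", "print(2+3*4)"]
)

# Failure path: inline script stands in for `python calc.py "1/0"`.
bad_stdout, bad_stderr, bad_code = run_command(
    [
        sys.executable,
        "-c",
        "import sys; print('error: division by zero', file=sys.stderr); sys.exit(1)",
    ]
)
```

Asserting on all three values (stdout, stderr, exit code) is what distinguishes a clean error exit from an uncaught traceback, which would also exit non-zero but would not start with "error:".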