
STATUS — Phase eval (Builder)

DONE

All gates D1-D5 Adversary-verified PASS @ 2026-06-15T06:43Z. Phase eval complete.

Current State

Gates D1-D5 implemented, claimed, and Adversary-verified. Phase complete.

Implementation commit: 0fc263d


Gate D1 — arithmetic — CLAIMED, awaiting Adversary

WHAT: evaluate(parse(tokenize(s))) is correct for + - * /, precedence, parens, and unary minus.

HOW:

python -c "
from calc.lexer import tokenize
from calc.parser import parse
from calc.evaluator import evaluate
cases = [('2+3*4', 14), ('(2+3)*4', 20), ('8-3-2', 3), ('-2+5', 3), ('2*-3', -6)]
for expr, expected in cases:
    result = evaluate(parse(tokenize(expr)))
    status = 'OK' if result == expected else f'FAIL (got {result!r})'
    print(f'{status}: {expr!r} -> {result}')
"

EXPECTED:

OK: '2+3*4' -> 14
OK: '(2+3)*4' -> 20
OK: '8-3-2' -> 3
OK: '-2+5' -> 3
OK: '2*-3' -> -6

WHERE: calc/evaluator.py — commit 0fc263d
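
The behaviour this gate checks (precedence, parentheses, left associativity, unary minus) can be illustrated with a minimal recursive-descent sketch. It is not the code in calc/evaluator.py: eval_sketch and its helpers are hypothetical names, it evaluates while parsing instead of building an AST, it accepts integer literals only, and it omits error handling.

```python
import re


def eval_sketch(text):
    """Hypothetical one-pass evaluator; NOT the project's tokenize/parse/evaluate."""
    # Integer literals and single-character operators; anything else is ignored.
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        # Lowest precedence: + and -, folded left to right.
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():
        # Higher precedence: * and /, folded left to right.
        value = unary()
        while peek() in ("*", "/"):
            op = advance()
            rhs = unary()
            value = value * rhs if op == "*" else value / rhs
        return value

    def unary():
        # Unary minus binds tighter than any binary operator.
        if peek() == "-":
            advance()
            return -unary()
        return atom()

    def atom():
        tok = advance()
        if tok == "(":
            value = expr()
            advance()  # consume the closing ")"
            return value
        return int(tok)

    return expr()
```

Because term() is called from expr(), multiplication is grouped before addition, and the while loops give left associativity, which is why '8-3-2' evaluates to 3 and not 7.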


Gate D2 — division — CLAIMED, awaiting Adversary

WHAT: / is true division ("7/2" → 3.5). Division by zero raises EvalError, not a bare ZeroDivisionError.

HOW:

python -c "
from calc.lexer import tokenize
from calc.parser import parse
from calc.evaluator import evaluate, EvalError

print(evaluate(parse(tokenize('7/2'))))

try:
    evaluate(parse(tokenize('1/0')))
    print('FAIL: no exception')
except EvalError as e:
    print(f'OK EvalError: {e}')
except ZeroDivisionError:
    print('FAIL: bare ZeroDivisionError escaped')
"

EXPECTED:

3.5
OK EvalError: division by zero

WHERE: calc/evaluator.py evaluate() BinOp '/' branch — commit 0fc263d
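
One way to get this behaviour is to catch ZeroDivisionError at the division site and re-raise it as the domain error. The sketch below is an assumption about the shape of that branch, not a copy of it: the EvalError class here is a local stand-in for calc.evaluator.EvalError, and divide is a hypothetical helper name.

```python
class EvalError(Exception):
    """Local stand-in for calc.evaluator.EvalError (assumed to be a plain Exception subclass)."""


def divide(left, right):
    """Hypothetical helper: true division that never leaks ZeroDivisionError."""
    try:
        return left / right
    except ZeroDivisionError:
        # "from None" hides the original exception so callers only see EvalError.
        raise EvalError("division by zero") from None
```

Raising with a fixed message keeps the text identical for int and float operands, since Python's own ZeroDivisionError messages differ between the two.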


Gate D3 — result type — CLAIMED, awaiting Adversary

WHAT: Whole-valued results return int (no trailing .0); non-whole results return float.

Rule: after division, if result.is_integer(), return int(result). Integer arithmetic stays int natively.

HOW:

python -c "
from calc.lexer import tokenize
from calc.parser import parse
from calc.evaluator import evaluate

r1 = evaluate(parse(tokenize('4/2')))
r2 = evaluate(parse(tokenize('7/2')))
r3 = evaluate(parse(tokenize('2+3*4')))
print(repr(r1), type(r1).__name__)
print(repr(r2), type(r2).__name__)
print(repr(r3), type(r3).__name__)
"

EXPECTED:

2 int
3.5 float
14 int

WHERE: calc/evaluator.py evaluate() BinOp '/' branch — commit 0fc263d
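
The rule stated for this gate can be sketched as a small normalization step applied to a division result. normalize is a hypothetical name; the real branch may inline this check.

```python
def normalize(value):
    """Hypothetical helper: collapse whole-valued floats to int, leave everything else alone."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value
```

For example, normalize(4 / 2) gives the int 2, while normalize(7 / 2) stays the float 3.5. Values that are already int pass through unchanged, which matches "integer arithmetic stays int natively".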


Gate D4 — CLI — CLAIMED, awaiting Adversary

WHAT: python calc.py "2+3*4" prints 14 and exits 0; invalid input prints an error message to stderr and exits non-zero (no traceback).

HOW:

python calc.py "2+3*4"; echo "exit:$?"
python calc.py "(2+3)*4"; echo "exit:$?"
python calc.py "7/2"; echo "exit:$?"
python calc.py "4/2"; echo "exit:$?"
python calc.py "1/0" 2>&1; echo "exit:$?"
python calc.py "1 +" 2>&1; echo "exit:$?"

EXPECTED:

14
exit:0
20
exit:0
3.5
exit:0
2
exit:0
error: division by zero
exit:1
error: unexpected end of expression
exit:1

WHERE: calc.py — commit 0fc263d
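
The exit-code contract above can be sketched as a function that returns the exit status instead of calling sys.exit directly, which makes it testable without a subprocess. Everything named here is an assumption for illustration: CalcError is a hypothetical common base for the lexer, parser, and evaluator errors, evaluate_text is a hypothetical callable from string to number, and the usage message with exit code 2 is not specified by the gate.

```python
import io
import sys


class CalcError(Exception):
    """Hypothetical base class standing in for the calculator's error types."""


def run(argv, evaluate_text, out=None, err=None):
    """Return the process exit code for a single-expression invocation."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    if len(argv) != 2:
        # Assumption: the gate does not say what happens with a wrong argument count.
        print("usage: calc.py EXPRESSION", file=err)
        return 2
    try:
        result = evaluate_text(argv[1])
    except CalcError as exc:
        # One-line message on stderr, no traceback.
        print(f"error: {exc}", file=err)
        return 1
    print(result, file=out)
    return 0


def fake_evaluate(text):
    """Stand-in evaluator used only to exercise run() in this sketch."""
    if text == "1/0":
        raise CalcError("division by zero")
    return 14


out, err = io.StringIO(), io.StringIO()
ok_code = run(["calc.py", "2+3*4"], fake_evaluate, out, err)
bad_code = run(["calc.py", "1/0"], fake_evaluate, out, err)
```

A real entry point would then be a single line such as sys.exit(run(sys.argv, ...)) with the project's own evaluation pipeline passed in.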


Gate D5 — tests green + end-to-end — CLAIMED, awaiting Adversary

WHAT: Full unittest suite passes with 0 failures: 15 lexer + 20 parser + 11 evaluator = 46 tests.

HOW:

python -m unittest -q

EXPECTED:

Ran 46 tests in 0.00Xs

OK

Test coverage in calc/test_evaluator.py:

  • TestArithmetic (5 tests): D1 — 2+3*4, (2+3)*4, 8-3-2, -2+5, 2*-3
  • TestDivision (3 tests): D2 — 7/2, 1/0, 5/(2-2)
  • TestResultType (3 tests): D3 — 4/2→int, 7/2→float, 2+3*4→int

WHERE: calc/test_evaluator.py, calc/evaluator.py — commit 0fc263d
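
The "N tests, 0 failures" claim can also be checked programmatically instead of by reading the runner output. The sketch below shows the idea on a tiny stand-in test case; TestSketch and run_and_summarize are hypothetical names, and the real suite would be loaded from the project's test modules.

```python
import io
import unittest


class TestSketch(unittest.TestCase):
    """Stand-in test case; the project's own tests live in calc/test_*.py."""

    def test_true_division(self):
        self.assertEqual(7 / 2, 3.5)

    def test_integer_arithmetic_stays_int(self):
        self.assertIsInstance(2 + 3 * 4, int)


def run_and_summarize(case_class):
    """Run one TestCase class and return (tests run, all passed)."""
    suite = unittest.TestLoader().loadTestsFromTestCase(case_class)
    # Send runner output to a buffer so only the summary tuple is reported.
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    result = runner.run(suite)
    return result.testsRun, result.wasSuccessful()
```

Comparing the returned count against the expected total catches the case where tests silently stop being collected while the run still reports OK.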