agent-orchestrator-benchmark/calculators/builder-adversary-deferred/run-01/machine-docs/STATUS-eval.md


STATUS — Phase eval

DONE

All D1–D5 gates Adversary-verified PASS @2026-06-16T00:18Z (see REVIEW-eval.md). No VETO.

Gate: ALL CLAIMED; Adversary comprehensive verification since completed (see Adversary Verdict below).


D1 — arithmetic

WHAT: evaluate(parse(tokenize(s))) returns the correct value for +, -, *, /, operator precedence, parentheses, and unary minus.
HOW:

python -c "from calc.lexer import tokenize; from calc.parser import parse; from calc.evaluator import evaluate; print(evaluate(parse(tokenize('2+3*4'))))"
python -c "from calc.lexer import tokenize; from calc.parser import parse; from calc.evaluator import evaluate; print(evaluate(parse(tokenize('(2+3)*4'))))"
python -c "from calc.lexer import tokenize; from calc.parser import parse; from calc.evaluator import evaluate; print(evaluate(parse(tokenize('8-3-2'))))"
python -c "from calc.lexer import tokenize; from calc.parser import parse; from calc.evaluator import evaluate; print(evaluate(parse(tokenize('-2+5'))))"
python -c "from calc.lexer import tokenize; from calc.parser import parse; from calc.evaluator import evaluate; print(evaluate(parse(tokenize('2*-3'))))"

EXPECTED: 14, 20, 3, 3, -6 (one value per command, in order)
WHERE: calc/evaluator.py, calc/test_evaluator.py::TestArithmetic
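A small independent evaluator can serve as a cold cross-check of the D1 EXPECTED values. The sketch below is hypothetical and is not calc's implementation. The names `oracle` and `_Parser` and the grammar in the comments are assumptions. The grammar makes `*` and `/` bind tighter than `+` and `-`, folds binary operators left, and applies unary minus to the operand that follows it.

```python
import re

# Hypothetical reference oracle (NOT calc's code). Assumed grammar:
#   expr    := term (('+' | '-') term)*
#   term    := unary (('*' | '/') unary)*
#   unary   := '-' unary | primary
#   primary := NUMBER | '(' expr ')'


class _Parser:
    def __init__(self, text: str) -> None:
        # Intended for well-formed inputs only: characters outside the
        # token pattern (letters, etc.) are silently skipped.
        self.toks: list[str] = re.findall(r"\d+(?:\.\d+)?|[-+*/()]", text)
        self.i = 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of input")
        self.i += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):  # loop, not recursion => left-assoc
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.unary()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.unary()
            value = value * rhs if op == "*" else value / rhs  # true division
        return value

    def unary(self):
        if self.peek() == "-":
            self.take()
            return -self.unary()
        return self.primary()

    def primary(self):
        tok = self.take()
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok[0].isdigit():
            return float(tok) if "." in tok else int(tok)
        raise ValueError(f"unexpected token {tok!r}")


def oracle(text: str):
    """Evaluate text with the reference grammar; raise ValueError if malformed."""
    p = _Parser(text)
    value = p.expr()
    if p.peek() is not None:
        raise ValueError(f"trailing token {p.peek()!r}")
    return value
```

Because `expr` and `term` fold in a `while` loop, `8-3-2` evaluates as `(8-3)-2 = 3`. A right-recursive rule would give `8-(3-2) = 7`, which is the associativity bug this gate guards against.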


D2 — division

WHAT: / is true division; division by zero raises EvalError instead of letting a bare ZeroDivisionError escape.
HOW:

python -c "from calc.lexer import tokenize; from calc.parser import parse; from calc.evaluator import evaluate; print(evaluate(parse(tokenize('7/2'))))"
python -c "from calc.lexer import tokenize; from calc.parser import parse; from calc.evaluator import evaluate, EvalError
try:
    evaluate(parse(tokenize('1/0')))
except EvalError as e:
    print('EvalError:', e)
except ZeroDivisionError:
    print('FAIL: bare ZeroDivisionError escaped')
"

EXPECTED: 3.5, then EvalError: division by zero
WHERE: calc/evaluator.py, calc/test_evaluator.py::TestDivision
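One way to satisfy D2 is to test the divisor before dividing, so Python never raises ZeroDivisionError at all. This is a sketch only. `divide` is a hypothetical helper, and the `EvalError` class here is a stand-in for calc.evaluator.EvalError, whose definition is not shown in this file.

```python
class EvalError(Exception):
    """Stand-in for calc.evaluator.EvalError (assumed to subclass Exception)."""


def divide(left, right):
    # Check the divisor explicitly; this also catches 0.0 and -0.0.
    if right == 0:
        raise EvalError("division by zero")
    # '/' is true division in Python 3, so 7 / 2 == 3.5.
    return left / right
```

An equivalent alternative is to wrap `left / right` in `try`/`except ZeroDivisionError` and `raise EvalError("division by zero") from None`. The `from None` suppresses the chained ZeroDivisionError context.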


D3 — result type

WHAT: Whole-valued floats display without a trailing .0; non-whole floats display normally. Rule (fmt_result(v) in calc/evaluator.py): if isinstance(v, float) and v.is_integer(), return str(int(v)); otherwise return str(v).
HOW:

python calc.py "4/2"
python calc.py "7/2"

EXPECTED: 2, 3.5
WHERE: calc/evaluator.py::fmt_result, calc/test_evaluator.py::TestResultType, calc.py
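The D3 rule translates directly into Python. This sketch restates the rule as written above. It is not copied from calc/evaluator.py, and the actual fmt_result may differ in details.

```python
def fmt_result(v) -> str:
    # Whole-valued floats (2.0, -6.0) drop the trailing ".0".
    if isinstance(v, float) and v.is_integer():
        return str(int(v))
    # Ints and non-whole floats fall through to Python's default str().
    return str(v)
```

`float('inf').is_integer()` and `float('nan').is_integer()` both return False. As a result, `int(v)` is never called on a non-finite value and cannot raise. Under this rule, very large whole floats such as `1e20` render as full integers (`100000000000000000000`) rather than `1e+20`.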


D4 — CLI

WHAT: python calc.py "2+3*4" prints 14 and exits 0; on any error, calc.py prints a message to stderr and exits non-zero (1) with no traceback.
HOW:

python calc.py "2+3*4"       # stdout: 14, exit 0
python calc.py "(2+3)*4"     # stdout: 20, exit 0
python calc.py "7/2"         # stdout: 3.5, exit 0
python calc.py "4/2"         # stdout: 2, exit 0
python calc.py "1/0"; echo "exit:$?"    # stderr: error, exit 1
python calc.py "1 +"; echo "exit:$?"   # stderr: error, exit 1

EXPECTED: 14, 20, 3.5, 2 on stdout; then, for each of the two failing inputs, an error message on stderr and exit:1
WHERE: calc.py
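The D4 contract can be sketched as a `main` function that catches calc's own errors. On success it prints the result to stdout and returns exit code 0. On error it prints a one-line message to stderr and returns 1, with no traceback. This file does not confirm any of the following, so each is an assumption:

- a common base class named `CalcError` for lexer, parser, and evaluator errors;
- a `compute` callable that returns the already-formatted result string;
- exit code 2 for a missing argument.

```python
import sys
from typing import Callable, Optional, Sequence, TextIO


class CalcError(Exception):
    """Hypothetical common base for lexer/parser/evaluator errors."""


def main(
    argv: Sequence[str],
    compute: Callable[[str], str],
    out: Optional[TextIO] = None,
    err: Optional[TextIO] = None,
) -> int:
    # Resolve streams at call time so redirected sys.stdout/stderr are honored.
    if out is None:
        out = sys.stdout
    if err is None:
        err = sys.stderr
    if len(argv) != 1:
        print("usage: calc.py EXPRESSION", file=err)
        return 2  # assumed usage-error code
    try:
        result = compute(argv[0])
    except CalcError as e:
        # Handled exception: one message line on stderr, no traceback.
        print(f"error: {e}", file=err)
        return 1
    print(result, file=out)
    return 0
```

In calc.py this would be wired up roughly as `sys.exit(main(sys.argv[1:], lambda s: fmt_result(evaluate(parse(tokenize(s))))))`. Catching only calc's error types, rather than bare `Exception`, keeps genuine bugs visible as tracebacks instead of masking them as user errors.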


D5 — tests green + end-to-end

WHAT: The full unittest suite (lexer + parser + evaluator) passes with 0 failures.
HOW:

python -m unittest -q

EXPECTED: Ran 64 tests in X.XXXs, followed by OK (unittest prints this summary to stderr)
WHERE: calc/test_lexer.py, calc/test_parser.py, calc/test_evaluator.py
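Reading `Ran 64 tests` by eye is easy to get wrong, so the D5 check can also be run programmatically. The sketch asserts both success and the exact test count, using only the standard unittest API. `run_and_check` is a hypothetical helper name.

```python
import sys
import unittest


def run_and_check(suite: unittest.TestSuite, expected_count: int) -> bool:
    # verbosity=0 roughly mirrors `python -m unittest -q`; output goes to stderr.
    result = unittest.TextTestRunner(stream=sys.stderr, verbosity=0).run(suite)
    # Pass only if nothing failed or errored AND the suite size is exact.
    return result.wasSuccessful() and result.testsRun == expected_count
```

Run it from the repository root as `run_and_check(unittest.defaultTestLoader.discover("."), 64)`. This uses the same default discovery (`test*.py` under the current directory) as `python -m unittest`.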


Verification commands (for Adversary cold-verify)

python -m unittest -q
python calc.py "2+3*4"
python calc.py "(2+3)*4"
python calc.py "7/2"
python calc.py "4/2"
python calc.py "1/0"; echo "exit:$?"
python calc.py "1 +"; echo "exit:$?"
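The verification commands can be scripted so the Adversary compares exit codes and stdout mechanically instead of reading `echo "exit:$?"` output. This is a hypothetical harness, and the `Check` and `run_checks` names are illustrative. It runs each case with the current interpreter via subprocess and also flags any traceback on stderr.

```python
import subprocess
import sys
from typing import NamedTuple, Optional


class Check(NamedTuple):
    args: list[str]          # arguments passed after the Python interpreter
    stdout: Optional[str]    # expected stdout (stripped), or None to skip
    exit_code: int           # expected process return code


def run_checks(checks: list[Check]) -> list[str]:
    """Run each check; return human-readable failure messages (empty = all pass)."""
    failures = []
    for c in checks:
        proc = subprocess.run(
            [sys.executable, *c.args], capture_output=True, text=True
        )
        if proc.returncode != c.exit_code:
            failures.append(f"{c.args}: exit {proc.returncode}, expected {c.exit_code}")
        if c.stdout is not None and proc.stdout.strip() != c.stdout:
            failures.append(f"{c.args}: stdout {proc.stdout.strip()!r}, expected {c.stdout!r}")
        if "Traceback" in proc.stderr:
            failures.append(f"{c.args}: traceback on stderr")
    return failures
```

For calc.py, the D4 cases would include `Check(["calc.py", "2+3*4"], "14", 0)` and `Check(["calc.py", "1/0"], None, 1)`. An empty list from `run_checks` means every case matched.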

Adversary Verdict @2026-06-16T00:18Z

COMPREHENSIVE PASS: all D1–D5 gates verified cold.

Cold-verified from the work-adv clone (commit 21be8f5). Full verdicts are in REVIEW-eval.md. Builder may now write ## DONE to this file.