STATUS — eval phase

DONE

All gates D1-D5 Adversary-verified PASS @2026-06-15T06:19:58Z. No veto. Sequence complete.

Current state

All gates D1-D5 implemented, claimed, and Adversary-verified.

Gate D1 — CLAIMED, awaiting Adversary

WHAT: evaluate(parse(tokenize(s))) correct for +, -, *, /, precedence, parens, unary minus.

HOW: Run from repo root:

python -m unittest calc.test_evaluator.TestArithmetic -v

EXPECTED: 8 tests pass, 0 failures.

WHERE: calc/evaluator.py, calc/test_evaluator.py @ HEAD

Spot-check direct values:

python calc.py "2+3*4"   # 14
python calc.py "(2+3)*4" # 20
python calc.py "8-3-2"   # 3
python calc.py "-2+5"    # 3
python calc.py "2*-3"    # -6
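For readers who want to see why the spot-check values come out as listed, here is a minimal sketch of a recursive-descent evaluator covering the same operators. It is not the code in calc/evaluator.py: it evaluates in a single pass rather than through separate tokenize, parse, and evaluate stages, and every name in it (`tokenize`, `Parser`, `evaluate_text`) as well as the use of `ValueError` is an assumption made for illustration.

```python
import re

# One token: a number (optionally with a fraction) or a single operator/paren.
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")


def tokenize(text):
    """Split text into number and operator tokens (hypothetical helper)."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Evaluates while parsing; one method per precedence level."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # Lowest precedence: + and -, folded left to right.
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # Higher precedence: * and /, folded left to right.
        value = self.unary()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.unary()
            value = value * rhs if op == "*" else value / rhs
        return value

    def unary(self):
        # Unary minus binds tighter than any binary operator.
        if self.peek() == "-":
            self.take()
            return -self.unary()
        return self.primary()

    def primary(self):
        token = self.take()
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return value
        if token is None or token in ("+", "-", "*", "/", ")"):
            raise ValueError(f"unexpected token {token!r}")
        return float(token) if "." in token else int(token)


def evaluate_text(text):
    """Evaluate an arithmetic expression string (illustrative only)."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise ValueError("trailing input")
    return value
```

Because `expr` and `term` fold left to right, `8-3-2` groups as `(8-3)-2`, and because `unary` sits below `term`, `2*-3` parses without parentheses. The sketch does not normalize whole-valued floats or wrap division by zero; those are the subject of gates D2 and D3.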

Gate D2 — CLAIMED, awaiting Adversary

WHAT: / is true division; division by zero raises EvalError, not bare ZeroDivisionError.

HOW:

python -m unittest calc.test_evaluator.TestDivision -v
python calc.py "7/2"   # 3.5
python calc.py "1/0"   # error to stderr, exit 1

EXPECTED: 3 tests pass; 7/2 → 3.5; 1/0 → stderr + exit 1.

WHERE: calc/evaluator.py lines 30-34, calc/test_evaluator.py class TestDivision.
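The D2 behavior can be sketched as follows. This is an illustration under assumptions, not the contents of calc/evaluator.py: the `EvalError` class here is a local stand-in for the project's exception, and `divide` is a hypothetical helper name.

```python
class EvalError(Exception):
    """Local stand-in for the project's EvalError (real definition not shown)."""


def divide(left, right):
    """True division that reports division by zero as EvalError (hypothetical helper)."""
    try:
        return left / right
    except ZeroDivisionError as exc:
        # Translate the built-in error so callers only ever see EvalError.
        raise EvalError("division by zero") from exc
```

Chaining with `from exc` keeps the original `ZeroDivisionError` available as `__cause__` for debugging, while a caller that catches `EvalError` never has to know about it.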


Gate D3 — CLAIMED, awaiting Adversary

WHAT: Whole-valued results → int (no .0); fractional → float.

HOW:

python -m unittest calc.test_evaluator.TestResultType -v
python calc.py "4/2"  # 2
python calc.py "7/2"  # 3.5

EXPECTED: 4 tests pass; 4/2 → 2; 7/2 → 3.5.

WHERE: _normalize() in calc/evaluator.py lines 14-17; class TestResultType in calc/test_evaluator.py.
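One way the D3 rule could be written is shown below. The body of the real `_normalize()` is not reproduced in this file, so this is a guess at its shape; `normalize` is a hypothetical name chosen to avoid implying it is the project's function.

```python
def normalize(value):
    """Return an int for whole-valued floats; leave everything else unchanged."""
    # float.is_integer() is False for inf and nan, so those pass through as floats.
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


print(normalize(4 / 2))  # 2
print(normalize(7 / 2))  # 3.5
```

Checking `isinstance(value, float)` first means values that are already `int` are returned as they are.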


Gate D4 — CLAIMED, awaiting Adversary

WHAT: python calc.py "2+3*4" prints 14, exits 0; invalid expr prints to stderr and exits nonzero.

HOW:

python calc.py "2+3*4"   # stdout: 14, exit 0
python calc.py "1 +"     # stderr: error message, exit 1
python calc.py "1/0"     # stderr: error message, exit 1
python -m unittest calc.test_evaluator.TestCLI -v

EXPECTED: 6 CLI tests pass; outputs as stated above.

WHERE: calc.py (repo root).
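The D4 contract (result on stdout with exit 0, message on stderr with exit 1) can be sketched as a small entry-point function. Everything named here is an assumption: `CalcError` is a hypothetical common base for lexer, parser, and evaluator errors, `run` and `stub_evaluate` are illustrative names, and the exit code 2 for a wrong argument count is this sketch's choice rather than something the gate specifies.

```python
import io
import sys


class CalcError(Exception):
    """Hypothetical base class for all calculator errors."""


def run(argv, evaluate, out=None, err=None):
    """Evaluate argv[1] and return a process exit code (illustrative only)."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    if len(argv) != 2:
        # Assumption: usage errors get their own code; the gate does not say.
        print("usage: calc.py EXPR", file=err)
        return 2
    try:
        result = evaluate(argv[1])
    except CalcError as exc:
        print(f"error: {exc}", file=err)
        return 1
    print(result, file=out)
    return 0


def stub_evaluate(expr):
    """Stand-in evaluator for the demo only; it knows a single input."""
    if expr == "2+3*4":
        return 14
    raise CalcError(f"cannot evaluate {expr!r}")


# Demo with in-memory streams instead of the real stdout/stderr.
out, err = io.StringIO(), io.StringIO()
code = run(["calc.py", "2+3*4"], stub_evaluate, out, err)
print(code, repr(out.getvalue()))  # 0 '14\n'
```

Returning the exit code instead of calling `sys.exit` inside `run` keeps the function testable; a real script would end with something like `sys.exit(run(sys.argv, evaluate))`.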


Gate D5 — CLAIMED, awaiting Adversary

WHAT: Full suite (lex + parse + eval) passes with 0 failures; prior phases not regressed.

HOW:

python -m unittest -q

EXPECTED: 69 tests in ~0.1s, OK, 0 failures.

WHERE: calc/test_lexer.py, calc/test_parser.py, calc/test_evaluator.py.