STATUS — phase eval

Role

Builder (Adversary monitors)

Phase

eval — evaluator + CLI

Gates

  • D1: CLAIMED, awaiting Adversary
  • D2: CLAIMED, awaiting Adversary
  • D3: CLAIMED, awaiting Adversary
  • D4: CLAIMED, awaiting Adversary
  • D5: CLAIMED, awaiting Adversary

Gate D1 — arithmetic CLAIMED

WHAT: evaluate(parse(tokenize(s))) is correct for + - * /, precedence, parens, unary minus.

HOW:

python -m unittest calc.test_evaluator.TestD1Arithmetic -v

EXPECTED:

test_add_mul_precedence ... ok   # "2+3*4" -> 14
test_parens ... ok               # "(2+3)*4" -> 20
test_left_assoc_sub ... ok       # "8-3-2" -> 3
test_unary_minus ... ok          # "-2+5" -> 3
test_mul_unary ... ok            # "2*-3" -> -6

5 tests, 0 failures, exit 0.

WHERE: calc/evaluator.py + calc/test_evaluator.py, commit 3e0b844
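
The five D1 cases can be reproduced without the repository using a small recursive-descent evaluator. This is only a sketch under assumptions: `TOKEN_RE`, `tokenize`, and `calc` here are hypothetical stand-ins, not the code at commit 3e0b844, and the real pipeline builds a parse tree before evaluating it, while this version evaluates as it parses.

```python
import re

# Hypothetical stand-in for the calc package, not the code at commit 3e0b844.
TOKEN_RE = re.compile(r"\s*(\d+\.?\d*|[-+*/()])")


def tokenize(s):
    """Split an expression into number, operator, and parenthesis tokens."""
    s = s.rstrip()
    tokens, pos = [], 0
    while pos < len(s):
        match = TOKEN_RE.match(s, pos)
        if match is None:
            raise ValueError(f"unexpected character near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def calc(s):
    """Evaluate s by recursive descent; each grammar level is one precedence tier."""
    tokens = tokenize(s)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():  # expr := term (('+' | '-') term)*; the loop gives left associativity
        value = term()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():  # term := unary (('*' | '/') unary)*; binds tighter than expr
        value = unary()
        while peek() in ("*", "/"):
            op = peek()
            advance()
            rhs = unary()
            value = value * rhs if op == "*" else value / rhs
        return value

    def unary():  # unary := '-' unary | atom; binds tighter than '*' and '/'
        if peek() == "-":
            advance()
            return -unary()
        return atom()

    def atom():  # atom := NUMBER | '(' expr ')'
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of input")
        advance()
        if tok == "(":
            value = expr()
            if peek() != ")":
                raise ValueError("expected ')'")
            advance()
            return value
        if tok in {"+", "*", "/", ")"}:
            raise ValueError(f"unexpected token {tok!r}")
        return float(tok) if "." in tok else int(tok)

    result = expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```

With this sketch, `calc("2+3*4")` gives 14 and `calc("8-3-2")` gives 3, matching the comments in the EXPECTED list above.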


Gate D2 — division CLAIMED

WHAT: / is true division ("7/2"→3.5). Division by zero raises EvalError — not bare ZeroDivisionError.

HOW:

python -m unittest calc.test_evaluator.TestD2Division -v

EXPECTED:

test_div_by_zero_not_bare ... ok
test_div_by_zero_raises_eval_error ... ok
test_true_division ... ok

3 tests, 0 failures, exit 0.

WHERE: calc/evaluator.py (EvalError defined, division guard at lines 27-29), commit 3e0b844
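
One plausible shape for the division guard is sketched below. `EvalError` is the name used in this gate; the helper `safe_divide` and the error message are assumptions for illustration and may not match the guard in calc/evaluator.py.

```python
class EvalError(Exception):
    """Domain error for evaluation failures (name taken from gate D2)."""


def safe_divide(lhs, rhs):
    # Hypothetical helper: check the divisor first so that a bare
    # ZeroDivisionError never escapes to the caller.
    if rhs == 0:
        raise EvalError("division by zero")
    return lhs / rhs  # true division, so 7 / 2 yields 3.5
```

Deriving `EvalError` from `Exception` rather than from `ZeroDivisionError` is what lets a test tell the two apart: an `except ZeroDivisionError` clause will not catch it.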


Gate D3 — result type CLAIMED

WHAT: Whole-valued floats print without .0 ("4/2"→2); non-whole print as float ("7/2"→3.5). Rule is in format_result() in calc/evaluator.py.

HOW:

python -m unittest calc.test_evaluator.TestD3ResultType -v
# And manually:
python calc.py "4/2"   # expect: 2
python calc.py "7/2"   # expect: 3.5

EXPECTED:

test_calc_4_div_2_whole ... ok
test_calc_7_div_2_nonwhole ... ok
test_format_int ... ok
test_format_nonwhole_float ... ok
test_format_whole_float ... ok

5 tests, 0 failures, exit 0. CLI: 4/2→2, 7/2→3.5.

WHERE: calc/evaluator.py format_result() + calc.py line using it + calc/test_evaluator.py TestD3ResultType, commit 3e0b844
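
The formatting rule can be expressed in a few lines. The function name `format_result` comes from this gate, but the body below is an assumed implementation, not a copy of the one in calc/evaluator.py.

```python
def format_result(value):
    """Render a result as text, dropping the trailing .0 from whole floats."""
    # Assumed rule: only floats with no fractional part are converted to int.
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)
```

Under this sketch, `format_result(4 / 2)` returns `"2"` and `format_result(7 / 2)` returns `"3.5"`, while an int such as 14 passes through unchanged.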


Gate D4 — CLI CLAIMED

WHAT: python calc.py "2+3*4" prints 14, exits 0. python calc.py "1 +" prints error to stderr, exits non-zero (no traceback).

HOW:

python -m unittest calc.test_evaluator.TestD4CLI -v
# Manual spot-checks:
python calc.py "2+3*4"   # stdout: 14, exit 0
python calc.py "1 +"     # stderr: error:..., exit 1, stdout empty
python calc.py "1/0"     # stderr: error:..., exit 1

EXPECTED:

test_div_by_zero_exits_nonzero ... ok
test_float_result ... ok
test_invalid_exits_nonzero ... ok
test_parens_cli ... ok
test_simple_expr ... ok
test_whole_float_no_dot ... ok

6 tests, 0 failures, exit 0.

WHERE: calc.py (repo root) + calc/test_evaluator.py TestD4CLI, commit 3e0b844
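
The CLI contract (result on stdout with exit 0, error on stderr with a non-zero exit and no traceback) might be implemented along these lines. Everything here is a hypothetical sketch: `run_cli`, its injected `evaluate` callable, the usage message, and exit code 2 for a wrong argument count are assumptions, not the contents of calc.py.

```python
import sys


def run_cli(argv, evaluate, stdout=None, stderr=None):
    """Run one expression and return the exit code instead of raising."""
    stdout = stdout if stdout is not None else sys.stdout
    stderr = stderr if stderr is not None else sys.stderr
    if len(argv) != 1:
        # Assumption: a wrong argument count is reported as a usage error.
        print("usage: calc.py EXPRESSION", file=stderr)
        return 2
    try:
        result = evaluate(argv[0])
    except Exception as exc:
        # Assumption: real code would likely catch only its own error types.
        # Printing the message here keeps tracebacks away from the user.
        print(f"error: {exc}", file=stderr)
        return 1
    print(result, file=stdout)
    return 0
```

Passing the streams and the evaluator as parameters keeps the function testable without spawning a subprocess; a real entry point would call something like `sys.exit(run_cli(sys.argv[1:], evaluate))`.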


Gate D5 — tests green + end-to-end CLAIMED

WHAT: calc/test_evaluator.py passes, covering D1 through D4; the whole prior suite (lex+parse) still passes; no regression.

HOW:

python -m unittest -q
# Full end-to-end plan verify:
python calc.py "2+3*4"   # 14
python calc.py "(2+3)*4" # 20
python calc.py "7/2"     # 3.5
python calc.py "4/2"     # 2
python calc.py "1/0"     # error to stderr, exit 1
python calc.py "1 +"     # error to stderr, exit 1

EXPECTED:

Ran 50 tests in ~0.2s

OK

Exit 0. 50 tests total: 19 lex + 12 parse + 19 evaluator (5 D1 + 3 D2 + 5 D3 + 6 D4).

WHERE: calc/test_evaluator.py, calc/test_lexer.py, calc/test_parser.py, commit 3e0b844