recipe-maintainers/agent-orchestrator-benchmark

mfowler bb85aa9f11 artifacts: add calculators/ — the 30 built calculators (5/variant) + machine-docs + git logs

2026-06-16 15:39:42 +00:00

STATUS — phase eval

Role

Builder (Adversary monitors)

Phase

eval — evaluator + CLI

Gates

D1: CLAIMED, awaiting Adversary
D2: CLAIMED, awaiting Adversary
D3: CLAIMED, awaiting Adversary
D4: CLAIMED, awaiting Adversary
D5: CLAIMED, awaiting Adversary

Gate D1 — arithmetic CLAIMED

WHAT: evaluate(parse(tokenize(s))) is correct for + - * /, precedence, parens, unary minus.

HOW:

python -m unittest calc.test_evaluator.TestD1Arithmetic -v

EXPECTED:

test_add_mul_precedence ... ok   # "2+3*4" -> 14
test_left_assoc_sub ... ok       # "8-3-2" -> 3
test_mul_unary ... ok            # "2*-3" -> -6
test_parens ... ok               # "(2+3)*4" -> 20
test_unary_minus ... ok          # "-2+5" -> 3

5 tests, 0 failures, exit 0.

WHERE: calc/evaluator.py + calc/test_evaluator.py, commit 3e0b844
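
For readers without the repo at hand, the pipeline named in WHAT can be sketched as a small recursive-descent evaluator. This is an illustrative sketch only, not the code at commit 3e0b844: the token shapes, the tuple-based tree, and the use of ValueError for syntax errors are all assumptions.

```python
import operator

# Assumed token shape: numbers become int/float, operators stay one-char strings.
def tokenize(text):
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(text) and (text[j].isdigit() or text[j] == "."):
                j += 1
            lexeme = text[i:j]
            tokens.append(float(lexeme) if "." in lexeme else int(lexeme))
            i = j
        elif ch in "+-*/()":
            tokens.append(ch)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens

# Assumed tree shape: a number, ("neg", child), or (op, left, right).
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():  # lowest precedence: + and -, left-associative
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():  # binds tighter than expr: * and /
        node = unary()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, unary())
        return node

    def unary():  # unary minus binds tighter than * and /
        if peek() == "-":
            advance()
            return ("neg", unary())
        return primary()

    def primary():
        tok = advance()
        if isinstance(tok, (int, float)):
            return tok
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise ValueError("expected ')'")
            return node
        raise ValueError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def evaluate(node):
    if isinstance(node, (int, float)):
        return node
    if node[0] == "neg":
        return -evaluate(node[1])
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))
```

Precedence falls out of the call chain: expr calls term, which calls unary, so "2+3*4" groups as 2+(3*4) and "2*-3" parses without parentheses.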

Gate D2 — division CLAIMED

WHAT: / is true division ("7/2"→3.5). Division by zero raises EvalError — not bare ZeroDivisionError.

HOW:

python -m unittest calc.test_evaluator.TestD2Division -v

EXPECTED:

test_div_by_zero_not_bare ... ok
test_div_by_zero_raises_eval_error ... ok
test_true_division ... ok

3 tests, 0 failures, exit 0.

WHERE: calc/evaluator.py (EvalError defined, division guard at lines 27-29), commit 3e0b844
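
The guard described in WHAT might look roughly like the sketch below. The EvalError class here is a stand-in for the one in calc/evaluator.py, and the divide() helper and its message text are hypothetical names used only for illustration.

```python
class EvalError(Exception):
    """Stand-in for the EvalError defined in calc/evaluator.py (assumed shape)."""


def divide(left, right):
    """True division that reports a zero divisor as EvalError."""
    # Check the divisor up front so a bare ZeroDivisionError never escapes.
    if right == 0:
        raise EvalError("division by zero")
    return left / right
```

Deriving EvalError from Exception rather than ZeroDivisionError matters here: a test in the style of test_div_by_zero_not_bare can then assert that the raised exception is not a ZeroDivisionError instance.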

Gate D3 — result type CLAIMED

WHAT: Whole-valued floats print without .0 ("4/2"→2); non-whole print as float ("7/2"→3.5). Rule is in format_result() in calc/evaluator.py.

HOW:

python -m unittest calc.test_evaluator.TestD3ResultType -v
# And manually:
python calc.py "4/2"   # expect: 2
python calc.py "7/2"   # expect: 3.5

EXPECTED:

test_calc_4_div_2_whole ... ok
test_calc_7_div_2_nonwhole ... ok
test_format_int ... ok
test_format_nonwhole_float ... ok
test_format_whole_float ... ok

5 tests, 0 failures, exit 0. CLI: 4/2 → 2, 7/2 → 3.5.

WHERE: calc/evaluator.py format_result() + its call site in calc.py + calc/test_evaluator.py TestD3ResultType, commit 3e0b844
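
One way to express the formatting rule from WHAT is shown below. It is a minimal sketch that assumes results are plain int or float values; the actual format_result() in calc/evaluator.py may differ.

```python
def format_result(value):
    """Render a result; whole-valued floats are printed without the trailing .0."""
    # float.is_integer() is True for 2.0 and False for 3.5, inf, and nan.
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)
```

With this rule, format_result(4 / 2) gives "2" and format_result(7 / 2) gives "3.5", matching the two CLI spot-checks listed under HOW.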

Gate D4 — CLI CLAIMED

WHAT: python calc.py "2+3*4" prints 14, exits 0. python calc.py "1 +" prints error to stderr, exits non-zero (no traceback).

HOW:

python -m unittest calc.test_evaluator.TestD4CLI -v
# Manual spot-checks:
python calc.py "2+3*4"   # stdout: 14, exit 0
python calc.py "1 +"     # stderr: error:..., exit 1, stdout empty
python calc.py "1/0"     # stderr: error:..., exit 1

EXPECTED:

test_div_by_zero_exits_nonzero ... ok
test_float_result ... ok
test_invalid_exits_nonzero ... ok
test_parens_cli ... ok
test_simple_expr ... ok
test_whole_float_no_dot ... ok

6 tests, 0 failures, exit 0.

WHERE: calc.py (repo root) + calc/test_evaluator.py TestD4CLI, commit 3e0b844
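
The CLI contract in WHAT (result on stdout with exit 0, a one-line error on stderr with a non-zero exit and no traceback) can be sketched as follows. The run() function, the injected evaluate_text callable, the usage message, and exit code 2 for bad usage are assumptions made for illustration; only the "error:" prefix and exit code 1 come from the spot-checks above.

```python
import sys


class EvalError(Exception):
    """Stand-in for the evaluator's error type (assumed shape)."""


def run(argv, evaluate_text, stdout=None, stderr=None):
    """Evaluate argv[1] and return a process exit code.

    evaluate_text is a hypothetical callable mapping an expression string to
    printable output; it is injected so the sketch stays self-contained.
    """
    stdout = sys.stdout if stdout is None else stdout
    stderr = sys.stderr if stderr is None else stderr
    if len(argv) != 2:
        print("usage: calc.py EXPRESSION", file=stderr)
        return 2  # assumed code for bad usage
    try:
        result = evaluate_text(argv[1])
    except (EvalError, ValueError) as exc:
        # Catching expected errors here keeps tracebacks off the terminal.
        print(f"error: {exc}", file=stderr)
        return 1
    print(result, file=stdout)
    return 0
```

A script entry point would then end with something like sys.exit(run(sys.argv, evaluate_text)). Returning the exit code instead of calling sys.exit() inside run() keeps the function easy to unit-test.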

Gate D5 — tests green + end-to-end CLAIMED

WHAT: calc/test_evaluator.py passes, covering D1–D4; the whole prior suite (lex + parse) still passes, with no regressions.

HOW:

python -m unittest -q
# Full end-to-end plan verify:
python calc.py "2+3*4"   # 14
python calc.py "(2+3)*4" # 20
python calc.py "7/2"     # 3.5
python calc.py "4/2"     # 2
python calc.py "1/0"     # error to stderr, exit 1
python calc.py "1 +"     # error to stderr, exit 1

EXPECTED:

Ran 50 tests in ~0.2s

OK

Exit 0. 50 tests total: 19 lex + 12 parse + 19 evaluator (5 D1 + 3 D2 + 5 D3 + 6 D4).

WHERE: calc/test_evaluator.py, calc/test_lexer.py, calc/test_parser.py, commit 3e0b844
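
The manual end-to-end checks under HOW could be automated with a small driver like the one below. It is a sketch, not part of the repo: run_case(), verify_all(), and the CASES table are hypothetical names, and the table simply transcribes the six commands and expectations listed above.

```python
import subprocess
import sys

# (expression, expected stdout or None when an error is expected, expected exit code)
CASES = [
    ("2+3*4", "14", 0),
    ("(2+3)*4", "20", 0),
    ("7/2", "3.5", 0),
    ("4/2", "2", 0),
    ("1/0", None, 1),
    ("1 +", None, 1),
]


def run_case(command, expected_stdout, expected_exit):
    """Run one command; return a list of mismatches (empty when it passes)."""
    proc = subprocess.run(command, capture_output=True, text=True)
    problems = []
    if proc.returncode != expected_exit:
        problems.append(f"exit {proc.returncode}, expected {expected_exit}")
    if expected_stdout is None:
        # Error cases: only require that something was written to stderr.
        if not proc.stderr.strip():
            problems.append("expected an error message on stderr")
    elif proc.stdout.strip() != expected_stdout:
        problems.append(
            f"stdout {proc.stdout.strip()!r}, expected {expected_stdout!r}"
        )
    return problems


def verify_all(script="calc.py"):
    """Run every case against the given script; map failing expressions to problems."""
    failures = {}
    for expression, expected_stdout, expected_exit in CASES:
        command = [sys.executable, script, expression]
        problems = run_case(command, expected_stdout, expected_exit)
        if problems:
            failures[expression] = problems
    return failures
```

Calling verify_all() from the repo root should return an empty dict if every spot-check matches; any non-empty result names the expression and what differed.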
