recipe-maintainers/agent-orchestrator-benchmark

mfowler bb85aa9f11 artifacts: add calculators/ — the 30 built calculators (5/variant) + machine-docs + git logs

2026-06-16 15:39:42 +00:00

STATUS-eval — Builder

DONE

Current state

Gate: D1, D2, D3, D4, D5 — all PASS (Adversary verified @2026-06-15T00:54Z)

Claims

D1 — arithmetic

What: evaluate(parse(tokenize(s))) returns the correct result for +, -, *, /, including operator precedence, parentheses, and unary minus.

How to verify:

python -m unittest calc.test_evaluator.TestArithmetic -q

Expected: 0 failures. Specific cases:

"2+3*4" → 14
"(2+3)*4" → 20
"8-3-2" → 3
"-2+5" → 3
"2*-3" → -6

Where: calc/evaluator.py + calc/test_evaluator.py
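
For readers without the repository at hand, the D1 cases can be reproduced with a small recursive-descent sketch. This is not the code in calc/evaluator.py: the names `sketch_tokenize` and `sketch_eval` are hypothetical, parsing and evaluation are collapsed into one pass, and `ValueError` stands in for the project's own error classes.

```python
import re

# One token is a number or a single operator/paren, with optional leading spaces.
TOKEN_RE = re.compile(r"\s*(\d+\.?\d*|[-+*/()])")
NON_ATOMS = {"+", "-", "*", "/", ")"}


def sketch_tokenize(s):
    """Split an expression into number and operator tokens (hypothetical lexer)."""
    s = s.rstrip()
    tokens, pos = [], 0
    while pos < len(s):
        m = TOKEN_RE.match(s, pos)
        if not m:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def sketch_eval(s):
    """Evaluate s with the grammar expr -> term -> unary -> atom."""
    tokens = sketch_tokenize(s)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        # Lowest precedence: + and -, folded left to right (so 8-3-2 is 3).
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():
        # Higher precedence: * and /, also left-associative.
        value = unary()
        while peek() in ("*", "/"):
            op = advance()
            rhs = unary()
            value = value * rhs if op == "*" else value / rhs
        return value

    def unary():
        # Unary minus binds tighter than * and /, so 2*-3 parses.
        if peek() == "-":
            advance()
            return -unary()
        return atom()

    def atom():
        tok = peek()
        if tok is None or tok in NON_ATOMS:
            raise ValueError("expected a number or '('")
        advance()
        if tok == "(":
            value = expr()
            if peek() != ")":
                raise ValueError("expected ')'")
            advance()
            return value
        return float(tok) if "." in tok else int(tok)

    result = expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result
```

Under these assumptions, `sketch_eval("2+3*4")` gives 14 and `sketch_eval("8-3-2")` gives 3, matching the cases listed for D1.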

D2 — division

What: / is true division; EvalError raised on division by zero, not bare ZeroDivisionError.

How to verify:

python -m unittest calc.test_evaluator.TestDivision -q

Expected: 0 failures. Includes:

"7/2" → 3.5
"1/0" → EvalError
"5/(3-3)" → EvalError
No ZeroDivisionError escaping the API

Where: calc/evaluator.py:18-21
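
The D2 behaviour can be illustrated with a minimal sketch of the wrapping pattern. The `EvalError` class below is a stand-in and `safe_divide` is a hypothetical helper; the actual lines at calc/evaluator.py:18-21 may be written differently.

```python
class EvalError(Exception):
    """Stand-in for the project's EvalError; the real class may differ."""


def safe_divide(left, right):
    """True division that reports a zero divisor as EvalError."""
    try:
        return left / right
    except ZeroDivisionError as exc:
        # Re-raise as the API's own error so callers never see ZeroDivisionError.
        raise EvalError("division by zero") from exc
```

Chaining with `from exc` keeps the original exception available for debugging while callers only need to catch `EvalError`.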

D3 — result type

What: Whole-valued results print without a trailing .0; non-whole results print as floats.

How to verify:

python -m unittest calc.test_evaluator.TestResultType -q
python calc.py "4/2"   # should print: 2
python calc.py "7/2"   # should print: 3.5

Expected: 0 failures; 4/2 → 2 (no dot), 7/2 → 3.5.

Rule: _fmt() in calc.py calls value.is_integer() on float results and, if it returns true, casts the value to int for display.

Where: calc.py:_fmt(), calc/test_evaluator.py:TestResultType
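
The formatting rule can be sketched as follows. `fmt_sketch` is a hypothetical name, and the real `_fmt()` in calc.py may handle more cases than this.

```python
def fmt_sketch(value):
    """Render whole-valued floats without a trailing .0."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    # ints and non-whole floats use their normal string form.
    return str(value)
```

With this sketch, `fmt_sketch(4 / 2)` returns `"2"` and `fmt_sketch(7 / 2)` returns `"3.5"`.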

D4 — CLI

What: python calc.py "2+3*4" prints 14 and exits 0; an invalid expression prints an error to stderr and exits non-zero (no traceback).

How to verify:

python calc.py "2+3*4"     # stdout: 14, exit 0
python calc.py "(2+3)*4"   # stdout: 20, exit 0
python calc.py "7/2"       # stdout: 3.5, exit 0
python calc.py "4/2"       # stdout: 2, exit 0
python calc.py "1/0"       # stderr: error: ..., exit 1
python calc.py "1 +"       # stderr: error: ..., exit 1

Expected: Exact outputs as above. All errors are caught as (LexError, ParseError, EvalError), so no Python traceback is printed.

Where: calc.py

D5 — tests green + end-to-end

What: Full unittest suite (lex + parse + eval) passes with 0 failures; CLI checks cover D4.

How to verify:

python -m unittest -q

Expected:

Ran 68 tests in ~0.2s

OK

All 68 tests pass: 24 lex + 22 parse + 22 eval (including 6 CLI subprocess tests).

Commit sha: (see latest commit after push)

Where: calc/test_lexer.py, calc/test_parser.py, calc/test_evaluator.py
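
A CLI subprocess check of the kind counted above might look like the sketch below. `run_command` is a hypothetical helper, and the repository's own subprocess tests in calc/test_evaluator.py may be structured differently.

```python
import subprocess
import sys


def run_command(cmd):
    """Run cmd and return (exit_code, stdout, stderr) with whitespace stripped."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip(), proc.stderr.strip()


def looks_like_clean_error(exit_code, stderr):
    """True if the run failed with a message but without a Python traceback."""
    return exit_code != 0 and stderr != "" and "Traceback" not in stderr
```

Run from the repository root, `run_command([sys.executable, "calc.py", "2+3*4"])` would be expected to return `(0, "14", "")` if the D4 claims hold.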
