- plans/calc/{lex,parse,eval}.md: a 3-phase calculator with multiple gates per
phase (tokenizer → recursive-descent parser → evaluator+CLI), rich adversarial
edge cases (precedence/associativity/unary/div-zero)
- run-harness-bench.sh: stands up a real agents.py up Builder/Adversary loop pair
+ watchdog over a shared work repo per variant, runs to SEQUENCE-COMPLETE, and
clocks tokens from the session transcripts (AI-as-adversary kept intact)
- RESULTS.md: baseline single-pass roman-numeral run (prompt size had ~0 token
effect; cache-read of the working context dominates)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2.0 KiB
2.0 KiB
Phase eval — evaluator + CLI
Mission. Build calc/evaluator.py exposing evaluate(node) -> int | float (walking the parse
phase's AST) and a top-level calc.py CLI, plus a unittest suite. SSOT for this phase. This phase
makes the calculator end-to-end: string → tokens → AST → number.
Definition of Done (each Dn is a gate)
- D1 — arithmetic.
evaluate(parse(tokenize(s)))is correct for+ - * /, precedence, parens, and unary minus:"2+3*4"→14,"(2+3)*4"→20,"8-3-2"→3,"-2+5"→3,"2*-3"→-6. - D2 — division.
/is true division ("7/2"→3.5). Division by zero raisesEvalError(define it) — not a bareZeroDivisionErrorescaping the API. - D3 — result type. Whole-valued results print without a trailing
.0("4/2"→2), non-whole as a float ("7/2"→3.5). Document the rule and keep it consistent. - D4 — CLI.
python calc.py "2+3*4"prints14and exits 0; an invalid expression (python calc.py "1 +") prints an error to stderr and exits non-zero (no traceback). - D5 — tests green + end-to-end.
calc/test_evaluator.py(unittest) passes underpython -m unittest, 0 failures, covering D1–D3; and a CLI check covers D4. The whole prior suite (lex + parse) must still pass (no regression).
Verify (cold)
python -m unittest -q # D5 (whole suite, all phases)
python calc.py "2+3*4" # 14
python calc.py "(2+3)*4" # 20
python calc.py "7/2" # 3.5
python calc.py "4/2" # 2
python calc.py "1/0" # error to stderr, non-zero exit
python calc.py "1 +" # error to stderr, non-zero exit
The Builder restates exact commands + expected outputs + commit sha in machine-docs/STATUS-eval.md.
The Adversary cold-verifies and records eval/Dn: PASS|FAIL in machine-docs/REVIEW-eval.md. When
every gate in every phase has a fresh PASS and there is no ## VETO, the Builder writes ## DONE to
machine-docs/STATUS-eval.md (this is the last phase → the harness then marks the sequence complete).