- plans/calc/{lex,parse,eval}.md: a 3-phase calculator with multiple gates per
phase (tokenizer → recursive-descent parser → evaluator+CLI), rich adversarial
edge cases (precedence/associativity/unary/div-zero)
- run-harness-bench.sh: stands up a real `agents.py up` Builder/Adversary loop pair
+ watchdog over a shared work repo per variant, runs to SEQUENCE-COMPLETE, and
clocks tokens from the session transcripts (AI-as-adversary kept intact)
- RESULTS.md: baseline single-pass roman-numeral run (prompt size had ~0 token
effect; cache-read of the working context dominates)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# Phase `eval` — evaluator + CLI

**Mission.** Build `calc/evaluator.py` exposing `evaluate(node) -> int | float` (walking the `parse`
phase's AST) and a top-level `calc.py` CLI, plus a `unittest` suite. SSOT for this phase. This phase
makes the calculator end-to-end: string → tokens → AST → number.

## Definition of Done (each Dn is a gate)

- **D1 — arithmetic.** `evaluate(parse(tokenize(s)))` is correct for `+ - * /`, precedence, parens,
and unary minus: `"2+3*4"`→14, `"(2+3)*4"`→20, `"8-3-2"`→3, `"-2+5"`→3, `"2*-3"`→-6.
- **D2 — division.** `/` is true division (`"7/2"`→3.5). Division by zero raises `EvalError` (define
it) — not a bare `ZeroDivisionError` escaping the API.
- **D3 — result type.** Whole-valued results print without a trailing `.0` (`"4/2"`→`2`), non-whole
as a float (`"7/2"`→`3.5`). Document the rule and keep it consistent.
- **D4 — CLI.** `python calc.py "2+3*4"` prints `14` and exits 0; an invalid expression
(`python calc.py "1 +"`) prints an error to stderr and exits non-zero (no traceback).
- **D5 — tests green + end-to-end.** `calc/test_evaluator.py` (`unittest`) passes under
`python -m unittest`, 0 failures, covering D1–D3; and a CLI check covers D4. The whole prior suite
(lex + parse) must still pass (no regression).

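One way the D1–D3 gates could fit together is sketched below. The node classes `Num`, `Unary`, and `BinOp` and the helper `format_result` are hypothetical stand-ins (the real AST shapes come from the `parse` phase), so treat this as an illustration of the rules, not the required implementation.

```python
from dataclasses import dataclass
from typing import Union


class EvalError(Exception):
    """Raised for evaluation failures such as division by zero (D2)."""


# Hypothetical AST node shapes; the real ones are defined by the `parse` phase.
@dataclass
class Num:
    value: Union[int, float]


@dataclass
class Unary:
    op: str
    operand: "Node"


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, Unary, BinOp]


def evaluate(node: Node) -> Union[int, float]:
    """Walk the tree bottom-up and return the numeric result."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Unary):
        if node.op == "-":
            return -evaluate(node.operand)
        raise EvalError(f"unknown unary operator {node.op!r}")
    if isinstance(node, BinOp):
        left, right = evaluate(node.left), evaluate(node.right)
        if node.op == "+":
            return left + right
        if node.op == "-":
            return left - right
        if node.op == "*":
            return left * right
        if node.op == "/":
            if right == 0:
                # D2: never let a bare ZeroDivisionError escape the API.
                raise EvalError("division by zero")
            return left / right  # true division, so 7/2 is 3.5
        raise EvalError(f"unknown binary operator {node.op!r}")
    raise EvalError(f"unknown node {node!r}")


def format_result(value: Union[int, float]) -> str:
    """One possible D3 rule: whole-valued floats print without '.0'."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


tree = BinOp("/", Num(7), Num(2))
print(format_result(evaluate(tree)))  # 3.5
```

In this sketch `evaluate` always returns the raw number and only the printing path calls `format_result`, which keeps the D3 rule in a single place.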
## Verify (cold)

```bash
python -m unittest -q        # D5 (whole suite, all phases)
python calc.py "2+3*4"       # 14
python calc.py "(2+3)*4"     # 20
python calc.py "7/2"         # 3.5
python calc.py "4/2"         # 2
python calc.py "1/0"         # error to stderr, non-zero exit
python calc.py "1 +"         # error to stderr, non-zero exit
```

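The CLI behaviour these commands exercise (result on stdout with exit 0; a one-line error on stderr with a non-zero exit and no traceback) could be wired up roughly as follows. `CalcError`, `run_cli`, and `fake_compute` are hypothetical names for illustration; the actual phases may define separate error types for lexing, parsing, and evaluation.

```python
import sys
from typing import Callable, Sequence


class CalcError(Exception):
    """Hypothetical common base for lexer, parser, and evaluator errors."""


def run_cli(argv: Sequence[str], compute: Callable[[str], str]) -> int:
    """Print the result to stdout, or a one-line error to stderr.

    Returns the exit code instead of calling sys.exit(), which keeps the
    function easy to call from a unittest.
    """
    if len(argv) != 2:
        print("usage: calc.py EXPRESSION", file=sys.stderr)
        return 2
    try:
        print(compute(argv[1]))
    except CalcError as exc:
        # Expected failures become a message, never a traceback.
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0


def fake_compute(expr: str) -> str:
    """Stand-in for tokenize -> parse -> evaluate -> format (illustration only)."""
    table = {"2+3*4": "14", "4/2": "2"}
    if expr not in table:
        raise CalcError(f"cannot evaluate {expr!r}")
    return table[expr]


# In calc.py the entry point would be something like:
#     sys.exit(run_cli(sys.argv, real_compute))
```

Returning the exit code from `run_cli` is a design choice that lets the D4 check run in-process; a subprocess-based check of `python calc.py ...` would work as well.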
The Builder restates exact commands + expected outputs + commit sha in `machine-docs/STATUS-eval.md`.
The Adversary cold-verifies and records `eval/Dn: PASS|FAIL` in `machine-docs/REVIEW-eval.md`. When
every gate in every phase has a fresh PASS and there is no `## VETO`, the Builder writes `## DONE` to
`machine-docs/STATUS-eval.md` (this is the last phase → the harness then marks the sequence complete).
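The completion rule above can be checked mechanically. The sketch below assumes review records appear as `phase/Dn: PASS` or `phase/Dn: FAIL` somewhere in the review text, that a later record for a gate supersedes an earlier one, and that a veto is a line starting with `## VETO`. It does not model what makes a PASS "fresh", and `gates_clear` is a hypothetical helper, not part of the harness.

```python
import re
from typing import Dict, Iterable

# Assumed record shape: "<phase>/D<n>: PASS" or "<phase>/D<n>: FAIL".
RECORD = re.compile(r"\b(\w+/D\d+):\s*(PASS|FAIL)\b")
VETO = re.compile(r"^## VETO\b", re.MULTILINE)


def gates_clear(review_text: str, required: Iterable[str]) -> bool:
    """Return True if every required gate's latest verdict is PASS and no veto exists."""
    if VETO.search(review_text):
        return False
    latest: Dict[str, str] = {}
    for gate, verdict in RECORD.findall(review_text):
        latest[gate] = verdict  # later records override earlier ones
    # A gate with no record at all counts as not passed.
    return all(latest.get(gate) == "PASS" for gate in required)


review = "eval/D1: PASS\neval/D2: FAIL\neval/D2: PASS\n"
print(gates_clear(review, ["eval/D1", "eval/D2"]))  # True
```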