- plans/calc/{lex,parse,eval}.md: a 3-phase calculator with multiple gates per
phase (tokenizer → recursive-descent parser → evaluator+CLI), rich adversarial
edge cases (precedence/associativity/unary/div-zero)
- run-harness-bench.sh: stands up a real agents.py up Builder/Adversary loop pair
+ watchdog over a shared work repo per variant, runs to SEQUENCE-COMPLETE, and
clocks tokens from the session transcripts (AI-as-adversary kept intact)
- RESULTS.md: baseline single-pass roman-numeral run (prompt size had ~0 token
effect; cache-read of the working context dominates)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# Phase `lex` — tokenizer

**Mission.** Start a Python arithmetic calculator. In this phase build the **lexer**: `calc/lexer.py`
exposing `tokenize(src: str) -> list[Token]`, plus a `unittest` suite. Pure stdlib. This file is the
single source of truth for the phase. (Later phases add the parser and evaluator — design the Token
type so they can consume it.)

A `Token` has at least a `kind` and a `value`. Kinds: `NUMBER`, `PLUS`, `MINUS`, `STAR`, `SLASH`,
`LPAREN`, `RPAREN`, and `EOF` as the final token.
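One possible shape for the `Token` type, as a minimal sketch only. The spec requires just `kind` and `value`; the string-valued kinds, the `KINDS` tuple, the optional `pos` field, and the `NUMBER(42)`-style `__repr__` are all assumptions made here for illustration, not requirements of this phase.

```python
from dataclasses import dataclass
from typing import Optional, Union

# The eight kinds named in this phase, kept as plain strings
# (an assumption; an Enum would work equally well).
KINDS = ("NUMBER", "PLUS", "MINUS", "STAR", "SLASH", "LPAREN", "RPAREN", "EOF")


@dataclass(frozen=True)
class Token:
    kind: str
    value: Union[int, float, str, None] = None
    # Hypothetical extra field: source offset, handy for parser error messages.
    pos: Optional[int] = None

    def __post_init__(self) -> None:
        # Reject kinds outside the list above so typos fail early.
        if self.kind not in KINDS:
            raise ValueError(f"unknown token kind: {self.kind!r}")

    def __repr__(self) -> str:
        # Mirrors the notation used in the gates: NUMBER(42), EOF.
        return self.kind if self.value is None else f"{self.kind}({self.value!r})"


print([Token("NUMBER", 42), Token("EOF")])  # prints [NUMBER(42), EOF]
```

Making the dataclass frozen keeps tokens hashable and comparable by value, which tends to make assertions in the parser and evaluator phases shorter.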

## Definition of Done (each Dn is a gate: Builder claims it, Adversary cold-verifies)

- **D1 — numbers.** Integers (`42`) and floats (`3.14`, `.5`, `10.`) tokenize to one `NUMBER` token
  whose value is the numeric value (int or float). `tokenize("42")` → `[NUMBER(42), EOF]`.
- **D2 — operators & parens.** `+ - * / ( )` each tokenize to the right kind; `tokenize("1+2*3")`
  yields `NUMBER PLUS NUMBER STAR NUMBER EOF`.
- **D3 — whitespace & errors.** Spaces/tabs between tokens are skipped; an invalid character (e.g.
  `@`, `$`, a letter) raises `LexError` (define it in the module) with the offending character and
  its position in the message.
- **D4 — tests green.** `calc/test_lexer.py` (`unittest`) passes under `python -m unittest`, 0
  failures, covering D1–D3 including: `" 12 + 3 "`, `"3.5*(1-2)"`, and that `"1 @ 2"` raises
  `LexError`.
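The gates D1 to D3 can be met by a single left-to-right scan. The following is a rough sketch under stated assumptions, not the required implementation: `Token` is a bare `NamedTuple` here to keep the block self-contained, operator tokens carry their lexeme as `value`, positions are 0-based offsets, a lone `.` is treated as an invalid character, and the error message wording is made up.

```python
from typing import List, NamedTuple, Union


class LexError(Exception):
    """Raised when the source contains a character the lexer does not accept."""


class Token(NamedTuple):
    kind: str
    value: Union[int, float, str, None]


# Single-character tokens from D2.
_SINGLE = {"+": "PLUS", "-": "MINUS", "*": "STAR", "/": "SLASH",
           "(": "LPAREN", ")": "RPAREN"}
_DIGITS = "0123456789"  # ASCII only, so int()/float() always accept the lexeme


def tokenize(src: str) -> List[Token]:
    tokens: List[Token] = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch in " \t":                        # D3: skip spaces and tabs
            i += 1
        elif ch in _SINGLE:                    # D2: operators and parens
            tokens.append(Token(_SINGLE[ch], ch))
            i += 1
        elif ch in _DIGITS or ch == ".":       # D1: 42, 3.14, .5, 10.
            start = i
            while i < n and src[i] in _DIGITS:
                i += 1
            is_float = i < n and src[i] == "."
            if is_float:
                i += 1
                while i < n and src[i] in _DIGITS:
                    i += 1
            text = src[start:i]
            if text == ".":                    # assumption: a bare dot is an error
                raise LexError(f"invalid character '.' at position {start}")
            tokens.append(Token("NUMBER", float(text) if is_float else int(text)))
        else:                                  # D3: anything else is an error
            raise LexError(f"invalid character {ch!r} at position {i}")
    tokens.append(Token("EOF", None))          # EOF is always the final token
    return tokens


print([(t.kind, t.value) for t in tokenize("3.5*(1-2)")])
```

Inputs the gates do not cover, such as `1.2.3` or a newline, are left to whatever this scan happens to do; the phase spec would need to say more before an Adversary could hold a Builder to a specific behavior there.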

## Verify (cold)

```bash
python -m unittest -q                 # D4
python -c "from calc.lexer import tokenize; print([(t.kind,t.value) for t in tokenize('3.5*(1-2)')])"
python -c "from calc.lexer import tokenize; tokenize('1 @ 2')"   # must raise LexError
```

The Builder restates the exact commands + expected token lists + commit sha in
`machine-docs/STATUS-lex.md`; the Adversary re-runs from its own clone and records
`lex/Dn: PASS|FAIL` in `machine-docs/REVIEW-lex.md`.