agent-orchestrator-benchmark/plans/calc/lex.md

# Phase `lex` — tokenizer

**Mission.** Start a Python arithmetic calculator. In this phase build the **lexer**: `calc/lexer.py`
exposing `tokenize(src: str) -> list[Token]`, plus a `unittest` suite. Pure stdlib. This file is the
single source of truth for the phase. (Later phases add the parser and evaluator — design the Token
type so they can consume it.)

A `Token` has at least a `kind` and a `value`. Kinds: `NUMBER`, `PLUS`, `MINUS`, `STAR`, `SLASH`,
`LPAREN`, `RPAREN`, and `EOF` as the final token.
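
One possible shape for the `Token` type is sketched below. Only `kind` and `value` come from this plan; representing kinds as plain strings, the `pos` field, the `KINDS` tuple, and the defaults are assumptions made for illustration, not requirements.

```python
from dataclasses import dataclass
from typing import Union

# The kind names listed in this plan. Using plain strings (rather than an Enum)
# is an assumption of this sketch.
KINDS = ("NUMBER", "PLUS", "MINUS", "STAR", "SLASH", "LPAREN", "RPAREN", "EOF")


@dataclass(frozen=True)
class Token:
    kind: str                                   # one of KINDS
    value: Union[int, float, str, None] = None  # numeric value for NUMBER tokens
    pos: int = 0                                # hypothetical: offset in the source string

    def __post_init__(self) -> None:
        # Reject unknown kinds early so later phases can rely on the set above.
        if self.kind not in KINDS:
            raise ValueError(f"unknown token kind: {self.kind!r}")


tok = Token("NUMBER", 42)
print(tok.kind, tok.value)
```

A frozen dataclass gives value equality for free, which keeps test assertions short and lets the parser and evaluator treat tokens as immutable.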

## Definition of Done (each Dn is a gate: Builder claims it, Adversary cold-verifies)

- **D1 — numbers.** Integers (`42`) and floats (`3.14`, `.5`, `10.`) each tokenize to a single
  `NUMBER` token whose value is the parsed number: an `int` for integer literals, a `float` for
  literals with a decimal point. `tokenize("42")` → `[NUMBER(42), EOF]`.
- **D2 — operators & parens.** `+ - * / ( )` each tokenize to the right kind; `tokenize("1+2*3")`
  yields `NUMBER PLUS NUMBER STAR NUMBER EOF`.
- **D3 — whitespace & errors.** Spaces/tabs between tokens are skipped; an invalid character (e.g.
  `@`, `$`, a letter) raises `LexError` (define it in the module) with the offending character and
  its position in the message.
- **D4 — tests green.** `calc/test_lexer.py` (`unittest`) passes under `python -m unittest`, 0
  failures, covering D1–D3 including: `"  12  +  3 "`, `"3.5*(1-2)"`, and that `"1 @ 2"` raises
  `LexError`.
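
A minimal sketch of a lexer that would satisfy D1–D3, under stated assumptions: the `Token` below is a stand-in so the block runs on its own, operator tokens carry their source character as `value`, the `EOF` token has value `None`, positions are 0-based, and the error message wording is illustrative. None of these details are fixed by this plan.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Token:
    kind: str
    value: Union[int, float, str, None] = None
    pos: int = 0  # assumption: 0-based offset in the source string


class LexError(Exception):
    """Raised when the source contains a character the lexer cannot tokenize."""


# Accepts 3.14, 10., .5 and 42; a lone "." matches nothing and falls through to LexError.
_NUMBER = re.compile(r"\d+\.\d*|\.\d+|\d+")
_SINGLE = {"+": "PLUS", "-": "MINUS", "*": "STAR", "/": "SLASH",
           "(": "LPAREN", ")": "RPAREN"}


def tokenize(src: str) -> List[Token]:
    tokens: List[Token] = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch in " \t":                      # D3: skip spaces and tabs
            i += 1
            continue
        if ch in _SINGLE:                    # D2: single-character operators and parens
            tokens.append(Token(_SINGLE[ch], ch, i))
            i += 1
            continue
        match = _NUMBER.match(src, i)
        if match:                            # D1: int unless the literal has a decimal point
            text = match.group()
            value = float(text) if "." in text else int(text)
            tokens.append(Token("NUMBER", value, i))
            i = match.end()
            continue
        # D3: report the offending character and where it was found
        raise LexError(f"invalid character {ch!r} at position {i}")
    tokens.append(Token("EOF", None, len(src)))
    return tokens


print([(t.kind, t.value) for t in tokenize("3.5*(1-2)")])
```

The sketch leaves malformed numbers such as `1.2.3` to the parser: it lexes them as two adjacent `NUMBER` tokens rather than raising, which is one reasonable reading of D1 but not the only one.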

## Verify (cold)

```bash
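python -m unittest -q                              # D4: run from the repo root; expect 0 failures
# D1–D2: kinds should be NUMBER STAR LPAREN NUMBER MINUS NUMBER RPAREN EOF
python -c "from calc.lexer import tokenize; print([(t.kind,t.value) for t in tokenize('3.5*(1-2)')])"
python -c "from calc.lexer import tokenize; tokenize('1 @ 2')"   # D3: must raise LexError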
```

The Builder restates the exact commands, the expected token lists, and the commit SHA in
`machine-docs/STATUS-lex.md`; the Adversary re-runs them from its own clone and records
`lex/Dn: PASS|FAIL` for each gate in `machine-docs/REVIEW-lex.md`.