# Phase `lex`: tokenizer

**Mission.** Start a Python arithmetic calculator. In this phase build the **lexer**: `calc/lexer.py` exposing `tokenize(src: str) -> list[Token]`, plus a `unittest` suite. Pure stdlib. This file is the single source of truth for the phase. (Later phases add the parser and evaluator; design the `Token` type so they can consume it.)

A `Token` has at least a `kind` and a `value`. Kinds: `NUMBER`, `PLUS`, `MINUS`, `STAR`, `SLASH`, `LPAREN`, `RPAREN`, and `EOF` as the final token.

## Definition of Done (each Dn is a gate: Builder claims it, Adversary cold-verifies)

- **D1: numbers.** Integers (`42`) and floats (`3.14`, `.5`, `10.`) tokenize to one `NUMBER` token whose value is the numeric value (int or float). `tokenize("42")` → `[NUMBER(42), EOF]`.
- **D2: operators & parens.** `+ - * / ( )` each tokenize to the right kind; `tokenize("1+2*3")` yields `NUMBER PLUS NUMBER STAR NUMBER EOF`.
- **D3: whitespace & errors.** Spaces/tabs between tokens are skipped; an invalid character (e.g. `@`, `$`, a letter) raises `LexError` (define it in the module) with the offending character and its position in the message.
- **D4: tests green.** `calc/test_lexer.py` (`unittest`) passes under `python -m unittest`, 0 failures, covering D1–D3 including: `" 12 + 3 "`, `"3.5*(1-2)"`, and that `"1 @ 2"` raises `LexError`.

## Verify (cold)

```bash
python -m unittest -q   # D4
python -c "from calc.lexer import tokenize; print([(t.kind,t.value) for t in tokenize('3.5*(1-2)')])"
python -c "from calc.lexer import tokenize; tokenize('1 @ 2')"   # must raise LexError
```

The Builder restates the exact commands + expected token lists + commit sha in `machine-docs/STATUS-lex.md`; the Adversary re-runs from its own clone and records `lex/Dn: PASS|FAIL` in `machine-docs/REVIEW-lex.md`.
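As a starting point for D1 through D3, here is a minimal sketch of what `calc/lexer.py` might look like. Several details are assumptions, not requirements of this spec: the `pos` field on `Token`, storing the lexeme (e.g. `"+"`) as the `value` of operator and paren tokens, `None` as the `value` of `EOF`, the exact wording of the error message, and rejecting a lone `.` as an invalid character.

```python
from dataclasses import dataclass
from typing import Union


class LexError(Exception):
    """Raised when the source contains a character the lexer does not accept."""


@dataclass(frozen=True)
class Token:
    kind: str
    value: Union[int, float, str, None] = None
    pos: int = 0  # assumption: source offset, not required by the spec


# Single-character tokens from the kinds list.
_SINGLE = {
    "+": "PLUS", "-": "MINUS", "*": "STAR", "/": "SLASH",
    "(": "LPAREN", ")": "RPAREN",
}
_DIGITS = "0123456789"  # ASCII only; str.isdigit() would also accept e.g. superscripts


def tokenize(src: str) -> list[Token]:
    tokens: list[Token] = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch == " " or ch == "\t":
            i += 1  # D3: skip spaces and tabs
        elif ch in _SINGLE:
            # Assumption: operator/paren tokens carry their lexeme as value.
            tokens.append(Token(_SINGLE[ch], ch, i))
            i += 1
        elif ch in _DIGITS or ch == ".":
            start = i
            while i < n and src[i] in _DIGITS:
                i += 1
            is_float = i < n and src[i] == "."
            if is_float:
                i += 1  # consume the decimal point, then any fraction digits
                while i < n and src[i] in _DIGITS:
                    i += 1
            text = src[start:i]
            if text == ".":
                # Assumption: a dot with no digits on either side is an error.
                raise LexError(f"invalid character '.' at position {start}")
            # D1: "42" -> int, "3.14" / ".5" / "10." -> float
            value = float(text) if is_float else int(text)
            tokens.append(Token("NUMBER", value, start))
        else:
            # D3: message names the offending character and its position
            raise LexError(f"invalid character {ch!r} at position {i}")
    tokens.append(Token("EOF", None, n))
    return tokens
```

With this sketch, `tokenize("1+2*3")` produces the kinds `NUMBER PLUS NUMBER STAR NUMBER EOF`, and `tokenize("1 @ 2")` raises `LexError` mentioning `'@'` and position `2`. Input such as `1.2.3` is not rejected here (it lexes as two `NUMBER` tokens); whether that should be an error is left open by D1. For `python -m unittest` to discover `calc/test_lexer.py` from the repository root, `calc/` most likely needs an `__init__.py`.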