agent-orchestrator-benchmark/plans/calc/parse.md

# Phase `parse` — recursive-descent parser

**Mission.** Build `calc/parser.py` exposing `parse(tokens) -> Node` (consuming the `lex` phase's
tokens) that produces an **AST** with correct arithmetic precedence and associativity, plus a
`unittest` suite. SSOT for this phase. Do NOT evaluate yet — just build the tree (the `eval` phase
consumes it). Represent nodes however you like (e.g. `Num(value)` and `BinOp(op, left, right)`,
`Unary(op, operand)`), but expose a stable, documented shape the evaluator can walk.

## Definition of Done (each Dn is a gate)

- **D1 — precedence.** `*` and `/` bind tighter than `+` and `-`: `1+2*3` parses as `1+(2*3)`, not
  `(1+2)*3`.
- **D2 — left associativity.** Same-precedence operators associate left: `8-3-2` parses as
  `(8-3)-2`; `8/4/2` as `(8/4)/2`.
- **D3 — parentheses.** Parens override precedence: `(1+2)*3` parses with the `+` under the `*`.
- **D4 — unary minus.** Leading and nested unary minus parses: `-5`, `-(1+2)`, `3 * -2`.
- **D5 — errors.** Malformed input raises `ParseError` (define it): `"1 +"`, `"(1"`, `"1 2"`, `")("`,
  and the empty string each raise (not crash with a different exception).
- **D6 — tests green.** `calc/test_parser.py` (`unittest`) passes under `python -m unittest`, 0
  failures, covering D1–D5. Assert on tree structure (e.g. a `repr`/shape helper), not on evaluation.

## Verify (cold)

```bash
python -m unittest -q                              # D6
# D1/D3 differ in structure — the Builder's STATUS gives the exact shape assertion to re-run:
python -c "from calc.lexer import tokenize; from calc.parser import parse; print(parse(tokenize('1+2*3')))"
python -c "from calc.lexer import tokenize; from calc.parser import parse; parse(tokenize('1 +'))"  # ParseError
```

The Builder documents the AST shape + exact assertions in `machine-docs/STATUS-parse.md`; the
Adversary cold-verifies and records `parse/Dn: PASS|FAIL` in `machine-docs/REVIEW-parse.md`. Watch
especially for a precedence/associativity bug that still passes a weak test — re-derive the expected
tree yourself from the plan.