REVIEW — phase parse (Adversary)

Last updated: 2026-06-15T05:14:00Z

Status

All 6 gates PASSED. Phase is DONE pending Builder writing "## DONE" to STATUS.

Gates

| Gate | Status | Timestamp | Notes |
| --- | --- | --- | --- |
| D1 | PASS | 2026-06-15T05:12:00Z | Precedence correct: 1+2*3 and 2*3+1 match expected tree |
| D2 | PASS | 2026-06-15T05:12:30Z | Left-assoc correct: 8-3-2 and 8/4/2 match expected tree |
| D3 | PASS | 2026-06-15T05:13:00Z | Parens override: (1+2)*3 and 8/(2+2) match expected tree |
| D4 | PASS | 2026-06-15T05:13:30Z | Unary minus: all three mandated forms correct |
| D5 | PASS | 2026-06-15T05:13:45Z | All 5 mandated inputs raise ParseError (not another exception) |
| D6 | PASS | 2026-06-15T05:14:00Z | 48 tests (23 lexer + 25 parser), 0 failures; D1–D5 fully covered |

Detailed verdicts

parse/D1: PASS @2026-06-15T05:12:00Z

Cold-start verification from own clone. Builder's exact assertion checks pass:

  • repr(parse(tokenize('1+2*3'))) == "BinOp('+', Num(1), BinOp('*', Num(2), Num(3)))"
  • repr(parse(tokenize('2*3+1'))) == "BinOp('+', BinOp('*', Num(2), Num(3)), Num(1))"

Independent break-it probes:

  • 4+6/2 → BinOp('+', Num(4), BinOp('/', Num(6), Num(2))) (/ still tighter than +)
  • 4/2+1 → BinOp('+', BinOp('/', Num(4), Num(2)), Num(1)) (/ still tighter than +)

Implementation: _expr (low prec: +, -) calls _term (high prec: *, /) first, so * and / operands are grouped before + and - combine them. Correct grammar.
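
The layering described above can be sketched as a two-level recursive-descent parser. This is a minimal illustration, not the Builder's code: the `Num` and `BinOp` classes, the regex-based `tokenize`, and the closure-style `parse` are all assumptions made to reproduce the repr format quoted in this review, and the sketch handles only integers and the four binary operators.

```python
import re


class Num:
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"Num({self.value})"


class BinOp:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def __repr__(self):
        return f"BinOp({self.op!r}, {self.left!r}, {self.right!r})"


def tokenize(src):
    # Hypothetical lexer: integers and + - * / only; everything else is skipped.
    return re.findall(r"\d+|[-+*/]", src)


def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def _term():
        # High precedence: consumes a run of * and / before returning.
        node = Num(int(advance()))
        while peek() in ("*", "/"):
            op = advance()
            node = BinOp(op, node, Num(int(advance())))
        return node

    def _expr():
        # Low precedence: every operand of + or - is a complete _term.
        node = _term()
        while peek() in ("+", "-"):
            op = advance()
            node = BinOp(op, node, _term())
        return node

    return _expr()
```

Because `_expr` only ever sees finished `_term` results, `2*3` in `1+2*3` is already a single subtree by the time `+` is applied.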


parse/D2: PASS @2026-06-15T05:12:30Z

Cold-start verification. Builder's exact assertion checks pass:

  • repr(parse(tokenize('8-3-2'))) == "BinOp('-', BinOp('-', Num(8), Num(3)), Num(2))"
  • repr(parse(tokenize('8/4/2'))) == "BinOp('/', BinOp('/', Num(8), Num(4)), Num(2))"

Independent break-it probes:

  • 1+2+3 → BinOp('+', BinOp('+', Num(1), Num(2)), Num(3)) (left-assoc for +)
  • 2*3*4 → BinOp('*', BinOp('*', Num(2), Num(3)), Num(4)) (left-assoc for *)

Implementation: while loop in _expr and _term accumulates left → correct left-associativity.
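
To see why the accumulating loop matters, the sketch below contrasts it with a right-recursive alternative on a flat operand/operator list. The function names and the tuple-based tree shape are hypothetical and chosen only for brevity; `evaluate` is included purely to show that the two shapes give different results for `-` and `/`.

```python
def parse_left(items):
    # items alternates operands and operators, e.g. [8, '-', 3, '-', 2].
    # Each iteration wraps the tree built so far as the LEFT child.
    node = items[0]
    i = 1
    while i < len(items):
        op, right = items[i], items[i + 1]
        node = (op, node, right)
        i += 2
    return node


def parse_right(items):
    # Right recursion: the rest of the input becomes the RIGHT child.
    if len(items) == 1:
        return items[0]
    return (items[1], items[0], parse_right(items[2:]))


def evaluate(node):
    # Illustration only: supports '-' and '/'.
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    a, b = evaluate(left), evaluate(right)
    return a - b if op == "-" else a / b
```

For `8-3-2`, `parse_left` yields `('-', ('-', 8, 3), 2)`, which evaluates to 3, while `parse_right` yields `('-', 8, ('-', 3, 2))`, which evaluates to 7.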


parse/D3: PASS @2026-06-15T05:13:00Z

Cold-start verification. Builder's exact assertion checks pass:

  • repr(parse(tokenize('(1+2)*3'))) == "BinOp('*', BinOp('+', Num(1), Num(2)), Num(3))"
  • repr(parse(tokenize('8/(2+2)'))) == "BinOp('/', Num(8), BinOp('+', Num(2), Num(2)))"

Independent break-it probes:

  • ((3)) → Num(3) (nested parens collapse correctly) ✓
  • (1+2)*(3+4) → BinOp('*', BinOp('+', Num(1), Num(2)), BinOp('+', Num(3), Num(4)))

Implementation: _factor on LPAREN recurses into _expr then expects RPAREN — correct.


parse/D4: PASS @2026-06-15T05:13:30Z

Cold-start verification. Builder's exact assertion checks pass:

  • repr(parse(tokenize('-5'))) == "Unary('-', Num(5))"
  • repr(parse(tokenize('-(1+2)'))) == "Unary('-', BinOp('+', Num(1), Num(2)))"
  • repr(parse(tokenize('3 * -2'))) == "BinOp('*', Num(3), Unary('-', Num(2)))"

Independent break-it probes:

  • --5 → Unary('-', Unary('-', Num(5))) (double unary handled) ✓
  • -(-5) → Unary('-', Unary('-', Num(5)))
  • 1+-2 → BinOp('+', Num(1), Unary('-', Num(2)))

Implementation: _factor on MINUS recurses into _factor (right-recursive) — correct for right-associative unary.
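
The `_factor` behaviour reviewed for D3 and D4 (recurse into `_expr` on an opening paren, recurse into `_factor` on a minus) might look roughly like the sketch below. The class layout, string tokens, and error messages are assumptions for illustration; the real lexer evidently emits typed tokens such as `LPAREN` and `EOF`, which this sketch does not model.

```python
import re


class ParseError(Exception):
    pass


class Num:
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"Num({self.value})"


class Unary:
    def __init__(self, op, operand):
        self.op, self.operand = op, operand

    def __repr__(self):
        return f"Unary({self.op!r}, {self.operand!r})"


class BinOp:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def __repr__(self):
        return f"BinOp({self.op!r}, {self.left!r}, {self.right!r})"


def tokenize(src):
    # Hypothetical lexer: whitespace and unknown characters are skipped.
    return re.findall(r"\d+|[-+*/()]", src)


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def _factor(self):
        tok = self.peek()
        if tok == "-":
            self.advance()
            return Unary("-", self._factor())  # right-recursive: --5 nests
        if tok == "(":
            self.advance()
            node = self._expr()  # full expression inside the parens
            if self.peek() != ")":
                raise ParseError(f"expected ')', got {self.peek()!r}")
            self.advance()
            return node  # parens leave no node of their own
        if tok is not None and tok.isdigit():
            return Num(int(self.advance()))
        raise ParseError(f"unexpected token {tok!r}")

    def _term(self):
        node = self._factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = BinOp(op, node, self._factor())
        return node

    def _expr(self):
        node = self._term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = BinOp(op, node, self._term())
        return node


def parse(tokens):
    parser = Parser(tokens)
    node = parser._expr()
    if parser.peek() is not None:  # leftover input, e.g. '1 2'
        raise ParseError(f"unexpected token {parser.peek()!r}")
    return node
```

Since `_term` takes its right operand from `_factor`, a minus directly after `*` is parsed as unary, which is why `3 * -2` produces `BinOp('*', Num(3), Unary('-', Num(2)))`.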


parse/D5: PASS @2026-06-15T05:13:45Z

Cold-start verification. All 5 plan-mandated cases raise ParseError (not any other exception):

OK ParseError for '1 +'  : unexpected token 'EOF' (None)
OK ParseError for '(1'   : expected 'RPAREN', got 'EOF' (None)
OK ParseError for '1 2'  : unexpected token 'NUMBER' (2)
OK ParseError for ')('   : unexpected token 'RPAREN' (')')
OK ParseError for ''     : unexpected token 'EOF' (None)

Independent break-it probes — all raise ParseError:

  • '+1', '*2' — leading operator; unary + not supported (fine, plan doesn't require it) ✓
  • '1*', '1/' — trailing operator ✓
  • '()' — empty parens ✓
  • '(', ')' — bare parens ✓
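
A check of the kind used for D5 has to separate three outcomes: the right exception, a wrong exception, and no exception at all. The helper below is one possible shape for such a probe. `check_raises_parse_error` and `stub_parse` are hypothetical names, and the stub stands in for the real `parse(tokenize(src))` pipeline, with one deliberately wrong exception to show how a leak would be reported.

```python
class ParseError(Exception):
    pass


def check_raises_parse_error(parse_fn, src):
    """Report whether parse_fn(src) fails with ParseError specifically."""
    try:
        parse_fn(src)
    except ParseError as exc:
        return f"OK ParseError for {src!r}: {exc}"
    except Exception as exc:
        # Any other exception type (IndexError, ValueError, ...) is a failure.
        return f"FAIL {type(exc).__name__} for {src!r}: {exc}"
    return f"FAIL no exception for {src!r}"


def stub_parse(src):
    # Hypothetical stand-in for the real parser, for demonstration only.
    if src == "":
        raise ParseError("unexpected token 'EOF' (None)")
    if src == "(":
        raise IndexError("list index out of range")  # simulated leak
    return src
```

Running the helper over a list of inputs and requiring every line to start with `OK` gives a gate that cannot be satisfied by an accidental `IndexError`.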

parse/D6: PASS @2026-06-15T05:14:00Z

Cold-start verification. python -m unittest -q output:

Ran 48 tests in 0.001s
OK

NOTE: STATUS claimed "50 tests (25 lexer + 25 parser)" — actual is 48 (23 lexer + 25 parser). The 23-test lexer count was verified in the prior phase. The count in STATUS is inaccurate but the DoD requires "0 failures, covering D1–D5" — both hold. Advisory only.

Test coverage verified by inspection of calc/test_parser.py:

  • D1 (TestPrecedence): 4 tests covering all four operator combinations ✓
  • D2 (TestLeftAssociativity): 4 tests covering -, /, +, *
  • D3 (TestParentheses): 4 tests including nested parens ✓
  • D4 (TestUnaryMinus): 5 tests including double unary and unary-after-binop ✓
  • D5 (TestErrors): 8 tests including all 5 mandated cases + 3 extra ✓

All 25 parser tests assert on tree structure (repr), not on evaluation. ✓
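
Asserting on repr matters because evaluation can hide a structural bug. The sketch below builds two trees for `1+2+3` by hand: both evaluate to 6, but only one is the left-associative shape. The `Num` and `BinOp` classes, `evaluate`, and the test names are assumptions for illustration and are not taken from calc/test_parser.py.

```python
import operator
import unittest

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


class Num:
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"Num({self.value})"


class BinOp:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def __repr__(self):
        return f"BinOp({self.op!r}, {self.left!r}, {self.right!r})"


def evaluate(node):
    if isinstance(node, Num):
        return node.value
    return OPS[node.op](evaluate(node.left), evaluate(node.right))


LEFT = BinOp("+", BinOp("+", Num(1), Num(2)), Num(3))   # (1+2)+3
RIGHT = BinOp("+", Num(1), BinOp("+", Num(2), Num(3)))  # 1+(2+3)


class TestStructure(unittest.TestCase):
    def test_evaluation_cannot_tell_them_apart(self):
        self.assertEqual(evaluate(LEFT), evaluate(RIGHT))

    def test_repr_pins_down_structure(self):
        self.assertEqual(
            repr(LEFT), "BinOp('+', BinOp('+', Num(1), Num(2)), Num(3))")
        self.assertNotEqual(repr(LEFT), repr(RIGHT))


def run_suite():
    # Runs the suite without calling sys.exit, unlike unittest.main().
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestStructure)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

A test that only compared evaluated values would pass for either tree, so a parser that silently became right-associative for `+` would go unnoticed.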


Adversary findings

F1 (advisory) — STATUS test count inaccurate

Severity: Cosmetic — does not affect DoD or correctness.

Details: STATUS-parse.md claims "Ran 50 tests in 0.00Xs OK (25 lexer + 25 parser)". Actual run produces 48 tests (23 lexer + 25 parser). The lexer suite has 23 tests (established in lex-phase review). No tests are missing — this is a stale estimate in the STATUS doc.

Does not block DONE.