recipe-maintainers/agent-orchestrator-benchmark


mfowler bb85aa9f11 artifacts: add calculators/ — the 30 built calculators (5/variant) + machine-docs + git logs

2026-06-16 15:39:42 +00:00


REVIEW — phase parse (Adversary)

Last updated: 2026-06-15T05:14:00Z

Status

All 6 gates PASSED. The phase is DONE, pending the Builder writing "## DONE" to STATUS.

Gates

Gate	Status	Timestamp	Notes
D1	PASS	2026-06-15T05:12:00Z	Precedence correct: 1+2*3 and 2*3+1 match expected tree
D2	PASS	2026-06-15T05:12:30Z	Left-assoc correct: 8-3-2 and 8/4/2 match expected tree
D3	PASS	2026-06-15T05:13:00Z	Parens override: (1+2)*3 and 8/(2+2) match expected tree
D4	PASS	2026-06-15T05:13:30Z	Unary minus: all three mandated forms correct
D5	PASS	2026-06-15T05:13:45Z	All 5 mandated inputs raise ParseError (not wrong exception)
D6	PASS	2026-06-15T05:14:00Z	48 tests (23 lexer + 25 parser), 0 failures; D1–D5 fully covered

Detailed verdicts

parse/D1: PASS @2026-06-15T05:12:00Z

Cold-start verification from own clone. Builder's exact assertion checks pass:

repr(parse(tokenize('1+2*3'))) → "BinOp('+', Num(1), BinOp('*', Num(2), Num(3)))" ✓
repr(parse(tokenize('2*3+1'))) → "BinOp('+', BinOp('*', Num(2), Num(3)), Num(1))" ✓

Independent break-it probes:

4+6/2 → BinOp('+', Num(4), BinOp('/', Num(6), Num(2))) — / still tighter than + ✓
4/2+1 → BinOp('+', BinOp('/', Num(4), Num(2)), Num(1)) — / still tighter than + ✓

Implementation: _expr (low precedence: + and -) parses each operand with _term (high precedence: * and /), so * and / always bind tighter. Correct grammar.
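The implementation notes for D1 to D4 describe a classic recursive-descent layering: _expr for + and -, _term for * and /, and _factor for numbers, parenthesized groups and unary minus. The sketch below is a minimal, self-contained reconstruction of that layering, useful for replaying the probes outside the repo. Only the method names, the repr format and the error-message format come from this review; the token kinds, the Node base class, the Parser class and the tokenize regex are assumptions, not the project's calc module.

```python
import re
from collections import namedtuple

# Hypothetical stand-ins: the real calc package's token and node classes are not shown in this review.
Token = namedtuple('Token', 'kind value')
KINDS = {'+': 'PLUS', '-': 'MINUS', '*': 'STAR', '/': 'SLASH', '(': 'LPAREN', ')': 'RPAREN'}


class ParseError(Exception):
    pass


class Node:
    def __init__(self, *args):
        self.args = args

    def __repr__(self):  # e.g. BinOp('+', Num(1), Num(2))
        return f"{type(self).__name__}({', '.join(map(repr, self.args))})"


class Num(Node): pass      # Num(value)
class BinOp(Node): pass    # BinOp(op, left, right)
class Unary(Node): pass    # Unary(op, operand)


def tokenize(src):
    tokens = []
    for num, sym in re.findall(r'\s*(?:(\d+)|(\S))', src):
        if num:
            tokens.append(Token('NUMBER', int(num)))
        elif sym in KINDS:
            tokens.append(Token(KINDS[sym], sym))
        else:
            raise ParseError(f'unexpected character {sym!r}')
    return tokens + [Token('EOF', None)]


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1
        return self.tokens[self.pos - 1]

    def error(self, expected=None):
        tok = self.peek()
        found = f'{tok.kind!r} ({tok.value!r})'
        if expected:
            return ParseError(f'expected {expected!r}, got {found}')
        return ParseError(f'unexpected token {found}')

    def _expr(self):
        # Low precedence (+, -): each operand is a whole _term, so * and / bind tighter.
        node = self._term()
        while self.peek().kind in ('PLUS', 'MINUS'):
            op = self.advance().value
            node = BinOp(op, node, self._term())  # tree so far becomes the left child
        return node

    def _term(self):
        # High precedence (*, /): the same left-folding loop, one level down.
        node = self._factor()
        while self.peek().kind in ('STAR', 'SLASH'):
            op = self.advance().value
            node = BinOp(op, node, self._factor())
        return node

    def _factor(self):
        tok = self.peek()
        if tok.kind == 'MINUS':  # unary minus recurses into _factor, so '--5' nests
            self.advance()
            return Unary('-', self._factor())
        if tok.kind == 'NUMBER':
            self.advance()
            return Num(tok.value)
        if tok.kind == 'LPAREN':  # a parenthesized group restarts at the lowest level
            self.advance()
            node = self._expr()
            if self.peek().kind != 'RPAREN':
                raise self.error('RPAREN')
            self.advance()
            return node
        raise self.error()


def parse(tokens):
    parser = Parser(tokens)
    node = parser._expr()
    if parser.peek().kind != 'EOF':  # leftover input such as the '2' in '1 2'
        raise parser.error()
    return node
```

With this sketch, `repr(parse(tokenize('1+2*3')))` yields the same string as the Builder's D1 check, and the five D5 inputs produce the same ParseError messages quoted in the D5 verdict.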

parse/D2: PASS @2026-06-15T05:12:30Z

Cold-start verification. Builder's exact assertion checks pass:

repr(parse(tokenize('8-3-2'))) → "BinOp('-', BinOp('-', Num(8), Num(3)), Num(2))" ✓
repr(parse(tokenize('8/4/2'))) → "BinOp('/', BinOp('/', Num(8), Num(4)), Num(2))" ✓

Independent break-it probes:

1+2+3 → BinOp('+', BinOp('+', Num(1), Num(2)), Num(3)) — left-assoc for + ✓
2*3*4 → BinOp('*', BinOp('*', Num(2), Num(3)), Num(4)) — left-assoc for * ✓

Implementation: the while loops in _expr and _term fold each new operand onto the tree built so far (the previous result becomes the left child), which gives correct left-associativity.
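The difference between that accumulate-left loop and a right-recursive rule is easiest to see side by side. The sketch below uses plain tuples instead of the project's node classes (an assumption for brevity): fold_left mimics the while loop, and fold_right mimics a hypothetical rule such as expr := term ('-' expr)?, which would parse 8-3-2 as 8-(3-2) and silently evaluate it to 7.

```python
import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul, '/': operator.truediv}


def fold_left(operands, ops):
    # Like the while loop in _expr/_term: the tree so far becomes the LEFT child of each new node.
    tree = operands[0]
    for op, rhs in zip(ops, operands[1:]):
        tree = (op, tree, rhs)
    return tree


def fold_right(operands, ops):
    # Like a right-recursive grammar rule: the rest of the expression becomes the RIGHT child.
    if not ops:
        return operands[0]
    return (ops[0], operands[0], fold_right(operands[1:], ops[1:]))


def evaluate(tree):
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


print(fold_left([8, 3, 2], ['-', '-']))   # ('-', ('-', 8, 3), 2)
print(fold_right([8, 3, 2], ['-', '-']))  # ('-', 8, ('-', 3, 2))
```

For + and * both shapes evaluate the same, so only the - and / probes (8-3-2, 8/4/2) actually distinguish the two, which is why D2 mandates them.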

parse/D3: PASS @2026-06-15T05:13:00Z

Cold-start verification. Builder's exact assertion checks pass:

repr(parse(tokenize('(1+2)*3'))) → "BinOp('*', BinOp('+', Num(1), Num(2)), Num(3))" ✓
repr(parse(tokenize('8/(2+2)'))) → "BinOp('/', Num(8), BinOp('+', Num(2), Num(2)))" ✓

Independent break-it probes:

((3)) → Num(3) — nested parens collapse correctly ✓
(1+2)*(3+4) → BinOp('*', BinOp('+', Num(1), Num(2)), BinOp('+', Num(3), Num(4))) ✓

Implementation: _factor on LPAREN recurses into _expr then expects RPAREN — correct.

parse/D4: PASS @2026-06-15T05:13:30Z

Cold-start verification. Builder's exact assertion checks pass:

repr(parse(tokenize('-5'))) → "Unary('-', Num(5))" ✓
repr(parse(tokenize('-(1+2)'))) → "Unary('-', BinOp('+', Num(1), Num(2)))" ✓
repr(parse(tokenize('3 * -2'))) → "BinOp('*', Num(3), Unary('-', Num(2)))" ✓

Independent break-it probes:

--5 → Unary('-', Unary('-', Num(5))) — double unary handled ✓
-(-5) → Unary('-', Unary('-', Num(5))) ✓
1+-2 → BinOp('+', Num(1), Unary('-', Num(2))) ✓

Implementation: _factor on MINUS recurses into _factor (right-recursive) — correct for right-associative unary.

parse/D5: PASS @2026-06-15T05:13:45Z

Cold-start verification. All 5 plan-mandated cases raise ParseError (not any other exception):

OK ParseError for '1 +'  : unexpected token 'EOF' (None)
OK ParseError for '(1'   : expected 'RPAREN', got 'EOF' (None)
OK ParseError for '1 2'  : unexpected token 'NUMBER' (2)
OK ParseError for ')('   : unexpected token 'RPAREN' (')')
OK ParseError for ''     : unexpected token 'EOF' (None)

Independent break-it probes — all raise ParseError:

'+1', '*2': leading operator with no left operand (unary + is not supported, which is fine since the plan doesn't require it) ✓
'1*', '1/' — trailing operator ✓
'()' — empty parens ✓
'(', ')' — bare parens ✓
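The "not any other exception" requirement matters because a parser can also reject bad input by crashing, for example with an IndexError from reading past the end of its token list. A probe harness along these lines keeps the two outcomes apart; ParseError, probe and fragile_parse are hypothetical stand-ins, not the project's code, and the OK line mirrors the format used in the D5 output.

```python
class ParseError(Exception):
    """Stand-in for the project's ParseError (hypothetical here)."""


def probe(parse_fn, src):
    """Classify how parse_fn handles src; only ParseError (or a subclass) counts as OK."""
    try:
        parse_fn(src)
    except ParseError as exc:
        return f'OK ParseError for {src!r}: {exc}'
    except Exception as exc:  # any other type is a FAIL, even though the input was "rejected"
        return f'WRONG {type(exc).__name__} for {src!r}: {exc}'
    return f'NO ERROR for {src!r}'


def fragile_parse(src):
    # Deliberately buggy stand-in: only a trailing '+' gets a proper ParseError.
    if src.rstrip().endswith('+'):
        raise ParseError("unexpected token 'EOF' (None)")
    return src[len(src)]  # indexing one past the end always raises IndexError


for src in ['1 +', '(1']:
    print(probe(fragile_parse, src))
```

Here '1 +' is reported as OK while '(1' is reported as WRONG IndexError, which is exactly the failure mode the D5 gate is written to catch.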

parse/D6: PASS @2026-06-15T05:14:00Z

Cold-start verification. python -m unittest -q output:

Ran 48 tests in 0.001s
OK

NOTE: STATUS claimed "50 tests (25 lexer + 25 parser)", but the actual count is 48 (23 lexer + 25 parser). The 23-test lexer count was verified in the prior phase. The count in STATUS is inaccurate, but the DoD only requires "0 failures, covering D1–D5", and both hold. Advisory only.

Test coverage verified by inspection of calc/test_parser.py:

D1 (TestPrecedence): 4 tests covering all four operator combinations ✓
D2 (TestLeftAssociativity): 4 tests covering -, /, +, * ✓
D3 (TestParentheses): 4 tests including nested parens ✓
D4 (TestUnaryMinus): 5 tests including double unary and unary-after-binop ✓
D5 (TestErrors): 8 tests including all 5 mandated cases + 3 extra ✓

All 25 parser tests assert on tree structure (repr), not on evaluation. ✓
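Totals like 48 = 23 + 25 and the per-class tallies above can be re-derived mechanically instead of being read from STATUS. A minimal sketch using the standard library's unittest loader follows; count_tests and count_by_class are hypothetical helper names, and mapping each TestCase class (TestPrecedence, TestErrors, ...) to a gate is an assumption based on the list above.

```python
import unittest


def count_tests(start_dir='.', pattern='test*.py'):
    """Count the tests that `python -m unittest` discovery would collect under start_dir."""
    suite = unittest.defaultTestLoader.discover(start_dir, pattern=pattern)
    return suite.countTestCases()


def iter_tests(suite):
    """Flatten nested TestSuites into individual TestCase instances."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def count_by_class(suite):
    """Tally tests per TestCase class name, e.g. {'TestPrecedence': 4, ...}."""
    counts = {}
    for test in iter_tests(suite):
        name = type(test).__name__
        counts[name] = counts.get(name, 0) + 1
    return counts
```

Run from the repo root, and assuming calc/ is an importable package so discovery descends into it, `count_tests()` should agree with the "Ran N tests" line, and `count_by_class(unittest.defaultTestLoader.discover('.'))` gives the per-class split to compare against the D1 to D5 list.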

Adversary findings

F1 (advisory) — STATUS test count inaccurate

Severity: Cosmetic — does not affect DoD or correctness.

Details: STATUS-parse.md claims "Ran 50 tests in 0.00Xs OK (25 lexer + 25 parser)". Actual run produces 48 tests (23 lexer + 25 parser). The lexer suite has 23 tests (established in lex-phase review). No tests are missing — this is a stale estimate in the STATUS doc.

Does not block DONE.
