recipe-maintainers/agent-orchestrator-benchmark

mfowler bb85aa9f11 artifacts: add calculators/ — the 30 built calculators (5/variant) + machine-docs + git logs

2026-06-16 15:39:42 +00:00

REVIEW — Phase parse (Adversary)

Status

All gates D1–D6 verified PASS @2026-06-15T06:32:30Z.

Verdicts

parse/D1: PASS @2026-06-15T06:32:30Z

Cold re-run of Builder's claimed commands:

BinOp('+', Num(1), BinOp('*', Num(2), Num(3)))   # 1+2*3 — * binds tighter ✓
BinOp('+', BinOp('*', Num(2), Num(3)), Num(4))    # 2*3+4 — * evaluated first ✓

Independent break-it probes:

1+2*3+4 → BinOp('+', BinOp('+', Num(1), BinOp('*', Num(2), Num(3))), Num(4)) ✓
4*5-6/2 → BinOp('-', BinOp('*', Num(4), Num(5)), BinOp('/', Num(6), Num(2))) ✓

Derivation confirmed: _expr loops over + and - consuming _term results; _term loops over * and / consuming _unary results. The two-level precedence grammar is structurally correct.
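For reference, the two-level shape described above can be sketched as follows. This is a minimal illustration, not the Builder's code: only the names Num, BinOp, _expr and _term come from this review. The tokenizer, the field names, the Parser class layout and the _number helper (standing in for _unary, with no unary minus or parentheses) are assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: object
    right: object


def tokenize(text):
    # Hypothetical tokenizer: integers and the four operators only.
    return re.findall(r"\d+|[-+*/]", text)


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def _peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _next(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def _expr(self):
        # Lower-precedence level: operands are whole terms.
        node = self._term()
        while self._peek() in ("+", "-"):
            op = self._next()
            node = BinOp(op, node, self._term())
        return node

    def _term(self):
        # Higher-precedence level: * and / are grouped before _expr sees them.
        node = self._number()
        while self._peek() in ("*", "/"):
            op = self._next()
            node = BinOp(op, node, self._number())
        return node

    def _number(self):
        # Simplification: assumes the next token is an integer literal.
        return Num(int(self._next()))


def parse(text):
    return Parser(text)._expr()
```

Because _expr only ever sees completed _term results, a * or / can never end up above a + or - unless parentheses intervene.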

parse/D2: PASS @2026-06-15T06:32:30Z

Cold re-run:

BinOp('-', BinOp('-', Num(8), Num(3)), Num(2))    # 8-3-2 — left-assoc ✓
BinOp('/', BinOp('/', Num(8), Num(4)), Num(2))    # 8/4/2 — left-assoc ✓

Independent break-it probe:

1-2+3 → BinOp('+', BinOp('-', Num(1), Num(2)), Num(3)) ✓ (left-assoc across mixed +/-)

While-loop accumulation pattern in _expr and _term correctly produces left-leaning trees.
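The accumulation pattern can be isolated from the parser to show why it yields left-leaning trees. In this sketch fold_left and fold_right are hypothetical helper names, and the dataclass field names are assumed; fold_right is included only as the contrasting right-recursive shape.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: object
    right: object


def fold_left(first, rest):
    # Mirrors the while-loop: the tree built so far becomes the LEFT child.
    node = first
    for op, operand in rest:
        node = BinOp(op, node, operand)
    return node


def fold_right(first, rest):
    # Contrast: recursing on the remainder puts it in the RIGHT child.
    if not rest:
        return first
    op, operand = rest[0]
    return BinOp(op, first, fold_right(operand, rest[1:]))


pairs = [("-", Num(3)), ("-", Num(2))]
left_tree = fold_left(Num(8), pairs)    # shape of (8-3)-2
right_tree = fold_right(Num(8), pairs)  # shape of 8-(3-2)
```

For 8-3-2 the two shapes differ in meaning: (8-3)-2 is 3, while 8-(3-2) is 7, which is why D2 checks the tree shape.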

parse/D3: PASS @2026-06-15T06:32:30Z

Cold re-run:

BinOp('*', BinOp('+', Num(1), Num(2)), Num(3))   # (1+2)*3 — + under * ✓
BinOp('+', Num(2), Num(3))                         # ((2+3)) — parens stripped ✓

Independent break-it probe:

3*(1+2)*4 → BinOp('*', BinOp('*', Num(3), BinOp('+', Num(1), Num(2))), Num(4)) ✓

_primary() correctly enters _expr() recursively inside parens and consumes the closing RPAREN.
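A reduced sketch of that recursion, assuming a grammar with only +, integers and parentheses. The names _primary, _expr, Num, BinOp and ParseError appear in this review; the tokenizer, the Parser layout, the error wording and the trailing-token check are illustrative assumptions.

```python
import re
from dataclasses import dataclass


class ParseError(Exception):
    pass


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: object
    right: object


class Parser:
    def __init__(self, text):
        # Hypothetical tokenizer; characters it does not match are ignored.
        self.tokens = re.findall(r"\d+|[+()]", text)
        self.pos = 0

    def _peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _expr(self):
        node = self._primary()
        while self._peek() == "+":
            self.pos += 1
            node = BinOp("+", node, self._primary())
        return node

    def _primary(self):
        tok = self._peek()
        if tok == "(":
            self.pos += 1
            node = self._expr()          # restart at the lowest precedence level
            if self._peek() != ")":
                raise ParseError("expected ')'")
            self.pos += 1                # consume the closing paren
            return node                  # no wrapper node: parens leave no trace
        if tok is not None and tok.isdigit():
            self.pos += 1
            return Num(int(tok))
        raise ParseError(f"unexpected token {tok!r}")


def parse(text):
    parser = Parser(text)
    node = parser._expr()
    if parser._peek() is not None:
        raise ParseError(f"unexpected token {parser._peek()!r}")
    return node
```

Returning the inner node directly is what makes ((2+3)) parse to the same tree as 2+3.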

parse/D4: PASS @2026-06-15T06:32:30Z

Cold re-run:

Unary('-', Num(5))                                  # -5 ✓
Unary('-', BinOp('+', Num(1), Num(2)))             # -(1+2) ✓
BinOp('*', Num(3), Unary('-', Num(2)))             # 3 * -2 ✓
Unary('-', Unary('-', Num(5)))                      # --5 ✓

Independent break-it probes:

1 - -2 → BinOp('-', Num(1), Unary('-', Num(2))) ✓
-1 + -2 → BinOp('+', Unary('-', Num(1)), Unary('-', Num(2))) ✓
-(-(3)) → Unary('-', Unary('-', Num(3))) ✓

_unary() is right-recursive (calls itself), so multiple leading negations stack correctly.
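The right recursion can be sketched on its own. Here parse_unary is a hypothetical free function over a pre-split token list, not the Builder's method, and the field names on Unary and Num are assumed.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Unary:
    op: str
    operand: object


def parse_unary(tokens, pos=0):
    """Return (node, next_pos) for a run of leading '-' followed by an integer."""
    if pos < len(tokens) and tokens[pos] == "-":
        # Recurse on the rest first, then wrap: the outermost '-' ends up on top.
        operand, pos = parse_unary(tokens, pos + 1)
        return Unary("-", operand), pos
    # Simplification: assumes an integer token is present at this position.
    return Num(int(tokens[pos])), pos + 1


node, end = parse_unary(["-", "-", "5"])
```

Each recursive call handles exactly one minus sign, so any number of leading negations nests without a loop.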

parse/D5: PASS @2026-06-15T06:32:30Z

Cold re-run — all 5 plan-required error cases:

OK ParseError: '1 +' -> unexpected end of expression
OK ParseError: '(1'  -> expected ')' but got 'EOF' ('')
OK ParseError: '1 2' -> unexpected token 'NUMBER' (2)
OK ParseError: ')('  -> unexpected token 'RPAREN' (')')
OK ParseError: ''    -> empty expression

All raise ParseError (not bare ValueError, IndexError, etc.) ✓
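One common way to get this property is to never index the token list directly and to route every mismatch through a single error type. The sketch below illustrates that pattern under assumptions: Cursor, peek, expect and the EOF sentinel are hypothetical names, and the message text is illustrative rather than the Builder's exact wording.

```python
class ParseError(Exception):
    """Single error type for malformed input (name taken from this review)."""


# Assumed sentinel token returned once input is exhausted.
EOF = ("EOF", "")


class Cursor:
    def __init__(self, tokens):
        self.tokens = list(tokens)   # each token is a (kind, text) pair
        self.pos = 0

    def peek(self):
        # Return the sentinel instead of indexing past the end,
        # so running out of input cannot surface as IndexError.
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return EOF

    def expect(self, kind):
        tok_kind, text = self.peek()
        if tok_kind != kind:
            raise ParseError(f"expected {kind} but got {tok_kind!r} ({text!r})")
        self.pos += 1
        return text
```

With input '(1', the third expect call sees the EOF sentinel and raises ParseError rather than failing on a list index.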

Independent extra probes:

'*' → ParseError: unexpected token 'STAR' ('*') ✓
'((1)' → ParseError: expected ')' but got 'EOF' ('') ✓
'1++2' → ParseError: unexpected token 'PLUS' ('+') ✓
'+5' → ParseError: unexpected token 'PLUS' ('+') ✓

parse/D6: PASS @2026-06-15T06:32:30Z

Cold python -m unittest -q run:

Ran 35 tests in 0.001s

OK

Test file inspection confirms:

All 20 parser tests use assertEqual on dataclass instances (e.g. BinOp('+', Num(1), BinOp('*', Num(2), Num(3)))), not on evaluated numeric results ✓
Coverage: D1 (4 tests), D2 (4 tests), D3 (3 tests), D4 (4 tests), D5 (5 tests) ✓
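Why the structural assertion matters can be shown with a small sketch. It assumes Num and BinOp are plain dataclasses, so == compares field by field; evaluate and the test class are hypothetical and are not taken from the Builder's test file.

```python
import operator
import unittest
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: object
    right: object


OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def evaluate(node):
    # Hypothetical evaluator, used only to contrast value vs. structure checks.
    if isinstance(node, Num):
        return node.value
    return OPS[node.op](evaluate(node.left), evaluate(node.right))


LEFT = BinOp("+", BinOp("+", Num(1), Num(2)), Num(3))    # (1+2)+3
RIGHT = BinOp("+", Num(1), BinOp("+", Num(2), Num(3)))   # 1+(2+3)


class StructuralAssertionDemo(unittest.TestCase):
    def test_value_check_cannot_tell_trees_apart(self):
        # Both shapes evaluate to the same number.
        self.assertEqual(evaluate(LEFT), evaluate(RIGHT))

    def test_structural_check_can(self):
        # Dataclass equality compares the tree shape itself.
        self.assertNotEqual(LEFT, RIGHT)
        self.assertEqual(LEFT, BinOp("+", BinOp("+", Num(1), Num(2)), Num(3)))
```

A test that compared only evaluated results would accept either associativity for 1+2+3; comparing dataclass instances does not.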

Adversary findings

No findings. All DoD gates pass with no defects detected.
