STATUS — Phase review

DONE

All D1-D4 gates Adversary-verified PASS @2026-06-16T00:21Z (REVIEW-review.md). No VETO.

Gate: D1-D3 CLAIMED — Adversary comprehensive PASS received

The full calculator accumulation (lex + parse + eval + CLI) is complete and self-certified. The Adversary should cold-verify D1-D3 from a fresh clone and record findings in REVIEW-review.md.


D1 — Full cold re-verify

WHAT: From a fresh clone, re-run all DoD items from lex, parse, and eval phases.

HOW:

# Clone fresh and run from work dir
python -m unittest -q

# Lexer DoD: tokenize produces correct token lists
python -c "from calc.lexer import tokenize; print(tokenize('2+3*4'))"
python -c "from calc.lexer import tokenize; print(tokenize('-2'))"
python -c "from calc.lexer import tokenize; print(tokenize('3.14'))"

# Parser DoD: AST shape is correct
python -c "from calc.lexer import tokenize; from calc.parser import parse; ast = parse(tokenize('1+2*3')); print(ast)"

# Evaluator DoD: arithmetic + division + result type
python -c "from calc.lexer import tokenize; from calc.parser import parse; from calc.evaluator import evaluate; print(evaluate(parse(tokenize('2+3*4'))))"
python -c "from calc.lexer import tokenize; from calc.parser import parse; from calc.evaluator import evaluate; print(evaluate(parse(tokenize('(2+3)*4'))))"
python -c "from calc.lexer import tokenize; from calc.parser import parse; from calc.evaluator import evaluate; print(evaluate(parse(tokenize('8-3-2'))))"
python -c "from calc.lexer import tokenize; from calc.parser import parse; from calc.evaluator import evaluate; print(evaluate(parse(tokenize('-2+5'))))"
python -c "from calc.lexer import tokenize; from calc.parser import parse; from calc.evaluator import evaluate; print(evaluate(parse(tokenize('2*-3'))))"
python -c "from calc.lexer import tokenize; from calc.parser import parse; from calc.evaluator import evaluate; print(evaluate(parse(tokenize('7/2'))))"

# CLI DoD
python calc.py "2+3*4"
python calc.py "(2+3)*4"
python calc.py "7/2"
python calc.py "4/2"
python calc.py "1/0"; echo "exit:$?"
python calc.py "1 +"; echo "exit:$?"

EXPECTED:

  • python -m unittest -q → Ran 64 tests in X.XXXs\nOK
  • Tokenizer outputs correct token lists
  • AST shape is BinOp(+, Num(1), BinOp(*, Num(2), Num(3)))
  • 2+3*4 → 14, (2+3)*4 → 20, 8-3-2 → 3, -2+5 → 3, 2*-3 → -6, 7/2 → 3.5
  • CLI: 14, 20, 3.5, 2, then error: division by zero + exit:1, error: unexpected token 'EOF' + exit:1

WHERE: calc/lexer.py, calc/parser.py, calc/evaluator.py, calc.py
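
The evaluator one-liners above can also be collected into a single table-driven check. This is a minimal sketch, not part of the repository: `CASES` and `check_cases` are hypothetical names, and the expected values are copied from the EXPECTED list. It compares by value only, so it does not cover the result-type part of the evaluator DoD.

```python
# Hypothetical helper: table-driven version of the evaluator DoD checks.
# Expected values are copied from the EXPECTED list of this gate.
CASES = {
    "2+3*4": 14,
    "(2+3)*4": 20,
    "8-3-2": 3,
    "-2+5": 3,
    "2*-3": -6,
    "7/2": 3.5,
}


def check_cases(evaluate_expr, cases=CASES):
    """Return a list of (expr, expected, actual) for every mismatch.

    `evaluate_expr` is any callable mapping an expression string to a number.
    An exception is recorded as the actual value instead of propagating.
    """
    failures = []
    for expr, expected in cases.items():
        try:
            actual = evaluate_expr(expr)
        except Exception as exc:  # a raised error counts as a failed case
            actual = exc
        if isinstance(actual, Exception) or actual != expected:
            failures.append((expr, expected, actual))
    return failures


# Usage against the calculator (assumes the work dir is on sys.path and the
# module layout shown in the HOW commands):
#
#   from calc.lexer import tokenize
#   from calc.parser import parse
#   from calc.evaluator import evaluate
#   failures = check_cases(lambda s: evaluate(parse(tokenize(s))))
#   assert not failures, failures
```

An empty list means every case matched; each failure tuple carries enough detail to paste into REVIEW-review.md.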


D2 — Full suite green

WHAT: python -m unittest passes, 0 failures, 64 tests.

HOW:

python -m unittest -q

EXPECTED: Ran 64 tests in X.XXXs\nOK

WHERE: calc/test_lexer.py, calc/test_parser.py, calc/test_evaluator.py


D3 — Cross-feature break-it

WHAT: Specific cross-feature interactions verified.

HOW:

# Nested unary + parens
python calc.py "-(-(1+2))"

# Precedence chain
python calc.py "2+3*4-5/5"

# Error propagation: lexer→evaluator
python calc.py "1 @ 2"; echo "exit:$?"
python calc.py "1/0"; echo "exit:$?"
python calc.py "(1+"; echo "exit:$?"

# Whitespace + floats + parens
python calc.py "  2.5 + ( 3.5 * 2 ) "
python calc.py "(  1 + 2 ) * ( 3 + 4 )"

# CLI exit codes
python calc.py "2+3*4"; echo "exit:$?"
python calc.py "bad input @#"; echo "exit:$?"

EXPECTED:

  • -(-(1+2)) → 3
  • 2+3*4-5/5 → 13
  • 1 @ 2 → stderr error: unexpected character '@', exit:1
  • 1/0 → stderr error: division by zero, exit:1
  • (1+ → stderr error: unexpected token 'EOF', exit:1
  • 2.5 + (3.5 * 2) → 9.5
  • (1+2)*(3+4) → 21
  • valid input → exit:0; invalid input → exit:1

WHERE: calc/lexer.py, calc/parser.py, calc/evaluator.py, calc.py
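
The exit-code and stderr expectations above can be asserted with `subprocess` instead of reading `echo "exit:$?"` output by eye. This is a sketch under stated assumptions: `run_cli`, `check_ok`, and `check_error` are hypothetical helpers, and the usage lines assume `calc.py` is in the current directory as in the HOW commands.

```python
import subprocess
import sys


def run_cli(argv):
    """Run a command; return (exit_code, stdout, stderr) with whitespace stripped."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip(), proc.stderr.strip()


def check_ok(argv, expected_stdout):
    """True if the command exits 0 and prints exactly `expected_stdout`."""
    code, out, _err = run_cli(argv)
    return code == 0 and out == expected_stdout


def check_error(argv, message):
    """True if the command exits 1 and its stderr contains `message`."""
    code, _out, err = run_cli(argv)
    return code == 1 and message in err


# Usage against the CLI from this gate (run from the work dir):
#
#   py = sys.executable
#   assert check_ok([py, "calc.py", "-(-(1+2))"], "3")
#   assert check_error([py, "calc.py", "1/0"], "error: division by zero")
#   assert check_error([py, "calc.py", "1 @ 2"], "error: unexpected character '@'")
```

Passing the arguments as a list avoids shell quoting problems with inputs such as `"  2.5 + ( 3.5 * 2 ) "`.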


Builder self-verification @2026-06-16

All cross-feature tests above run locally and produce the expected outputs. See JOURNAL-review.md for exact output transcript.