
Phase review — comprehensive deferred verification

No new features. The Builder has self-certified the lex, parse, and eval phases and has assembled the complete calculator across them. The Adversary now performs its one comprehensive cold verification of the entire build; this is the only Adversary gate in the run.

Definition of Done

  • D1: full cold re-verify. From a FRESH clone, the Adversary re-runs every DoD item from every prior phase (tokenizer, parser AST shape, evaluator + CLI) and confirms that each one passes.
  • D2: full suite green. python -m unittest passes with 0 failures.
  • D3: cross-feature break-it. Hunt for interactions across lex→parse→eval that a per-gate view misses: nested unary minus with parens (-(-(1+2)) → 3), precedence chains (2+3*4-5/5 → 13), error propagation from lexer to evaluator ("1 @ 2" at lex time, "(1+" at parse time, and "1/0" at eval time must each fail with a clean error), whitespace, floats, and parens combined, and CLI exit codes for valid versus invalid input. File every defect found.
  • D4: findings cleared. Every finding is fixed by the Builder and re-verified as PASS; no ## VETO remains standing.
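The D3 cases translate naturally into a table-driven unittest module that runs under the same python -m unittest command as D2. The sketch below assumes nothing about the Builder's real module layout: calc_eval, CalcError, the regex lexer, and the recursive-descent Parser are hypothetical stand-ins bundled only so the file runs on its own, and the stand-in evaluates while it parses instead of building an AST. In the real repo the test class would import the Builder's lexer, parser, and evaluator instead.

```python
import re
import unittest


class CalcError(Exception):
    """Hypothetical single error type shared by the lexer, parser, and evaluator."""


# Lexer pattern: optional whitespace, then a number (int or float) or one operator/paren.
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d*)?|\.\d+)|([-+*/()]))")


def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:  # lex error, e.g. the '@' in "1 @ 2"
            raise CalcError(f"unexpected character near offset {pos}")
        num, op = m.groups()
        tokens.append(("num", float(num)) if num else ("op", op))
        pos = m.end()
    return tokens


class Parser:
    """Recursive descent that evaluates while parsing (a stand-in; no separate AST)."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):  # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op, rhs = self.take()[1], self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):  # term := unary (('*' | '/') unary)*
        value = self.unary()
        while self.peek() in (("op", "*"), ("op", "/")):
            op, rhs = self.take()[1], self.unary()
            if op == "/" and rhs == 0:  # eval error surfaces as the same CalcError
                raise CalcError("division by zero")
            value = value * rhs if op == "*" else value / rhs
        return value

    def unary(self):  # unary := '-' unary | primary
        if self.peek() == ("op", "-"):
            self.take()
            return -self.unary()
        return self.primary()

    def primary(self):  # primary := NUMBER | '(' expr ')'
        kind, val = self.take()
        if kind == "num":
            return val
        if (kind, val) == ("op", "("):
            value = self.expr()
            if self.take() != ("op", ")"):  # parse error, e.g. "(1+2" never closes
                raise CalcError("expected ')'")
            return value
        # parse error, e.g. "(1+" ends mid-expression
        raise CalcError(f"unexpected token {val!r}")


def calc_eval(src):
    parser = Parser(tokenize(src))
    value = parser.expr()
    if parser.pos != len(parser.tokens):  # leftover tokens, e.g. "1 2"
        raise CalcError("trailing input")
    return value


class CrossFeatureTest(unittest.TestCase):
    def test_values(self):
        cases = {"-(-(1+2))": 3, "2+3*4-5/5": 13, " ( 1.5 + 2.25 ) * 2 ": 7.5}
        for src, want in cases.items():
            with self.subTest(src=src):
                self.assertAlmostEqual(calc_eval(src), want)

    def test_errors_from_each_stage(self):
        for src in ("1 @ 2", "(1+", "1/0"):
            with self.subTest(src=src), self.assertRaises(CalcError):
                calc_eval(src)
```

Saved as a file such as test_cross_feature.py (a hypothetical name), it runs with python -m unittest test_cross_feature. The CLI exit-code check from D3 is left out because the plan does not state which codes a valid or invalid run should return.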

How it works

The Adversary records its comprehensive verdict in machine-docs/REVIEW-review.md: either review(all): PASS, or a list of findings, each with a repro. The Builder fixes every finding, and writes ## DONE to machine-docs/STATUS-review.md only after the Adversary has issued the comprehensive PASS.
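The hand-off rule above is mechanical enough to guard in a script. What follows is a minimal sketch of a hypothetical Builder-side helper; write_done_if_passed and has_line are illustrative names, not part of the benchmark. It appends ## DONE to the status file only when the review file contains a review(all): PASS line and no ## VETO heading appears in either file. Those line formats are assumptions: the plan names the markers but not their exact layout.

```python
from pathlib import Path

# Paths come from the phase plan; the line formats below are assumptions.
REVIEW = Path("machine-docs/REVIEW-review.md")
STATUS = Path("machine-docs/STATUS-review.md")
PASS_LINE = "review(all): PASS"
VETO = "## VETO"
DONE = "## DONE"


def has_line(text: str, prefix: str) -> bool:
    """True if any line, after stripping whitespace, starts with prefix."""
    return any(line.strip().startswith(prefix) for line in text.splitlines())


def write_done_if_passed(review_path: Path = REVIEW, status_path: Path = STATUS) -> bool:
    """Append '## DONE' to the status file only after a comprehensive PASS.

    Assumption: a '## VETO' heading anywhere in either file counts as standing.
    Returns True if the status file is (now) marked DONE, else False.
    """
    if not review_path.exists():
        return False  # no verdict recorded yet
    review = review_path.read_text(encoding="utf-8")
    status = status_path.read_text(encoding="utf-8") if status_path.exists() else ""
    if not has_line(review, PASS_LINE) or has_line(review, VETO) or has_line(status, VETO):
        return False  # findings or a veto still stand
    if not has_line(status, DONE):  # idempotent: write DONE at most once
        status_path.parent.mkdir(parents=True, exist_ok=True)
        with status_path.open("a", encoding="utf-8") as fh:
            fh.write(f"\n{DONE}\n")
    return True
```

Appending rather than overwriting preserves whatever the status file already records, and the DONE check makes repeated runs harmless.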