agent-orchestrator/examples/builder-adversary-deferred
mfowler 90375f004e docs(examples): add builder-adversary-deferred — verify after a long segment
Coarsest review cadence: the Builder self-certifies the build phases and the
Adversary does ONE comprehensive cold-verification of the whole accumulated build
in a final `review` phase (vs orig per-phase, lean per-gate). Full original
prompts + a DEFERRED REVIEW CADENCE override, so it isolates verification cadence.
Cheapest coordination; the trade-off is the independent check arrives late (late
rework risk + self-certification drift on build phases). README spells it out.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 00:02:44 +00:00

Builder/Adversary example — deferred review (verify after a long segment)

The coarsest point on the review-cadence spectrum. Same pattern and same full original prompts as ../builder-adversary; the only difference is when the Adversary verifies:

| variant | the Adversary verifies… | handshakes (calculator task) |
|---|---|---|
| builder-adversary-lean | per gate | ~12 claim/verify round-trips |
| builder-adversary (orig) | per phase | ~3 |
| builder-adversary-deferred | once, after the whole build | 1 |
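The handshake counts in the comparison above follow from a simple rule: one round-trip per gate, per phase, or per run. A minimal counting sketch, assuming a hypothetical split of three phases with four gates each (the real gate counts live in the plans and are not reproduced here; `Phase` and `handshakes` are illustrative names, not part of agents.py):

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    gates: int  # number of DoD items in the phase (illustrative value)


def handshakes(phases, cadence):
    """Count claim/verify round-trips for a given review cadence."""
    if cadence == "per-gate":
        return sum(p.gates for p in phases)  # one round-trip per gate
    if cadence == "per-phase":
        return len(phases)                   # one round-trip per phase
    if cadence == "deferred":
        return 1                             # one comprehensive pass at the end
    raise ValueError(f"unknown cadence: {cadence}")


# Hypothetical split, chosen only to match the approximate counts in the table.
phases = [Phase("wc", 4), Phase("json", 4), Phase("review", 4)]

for cadence in ("per-gate", "per-phase", "deferred"):
    print(cadence, handshakes(phases, cadence))
```

With these assumed gate counts, the three cadences yield 12, 3, and 1 round-trips respectively.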

How it works

The Builder self-certifies the build phases (wc, then json): it builds to each phase's DoD (definition of done), runs its own tests until green, writes ## DONE, and advances without waiting for the Adversary. The Adversary stays out of the build. Only in the final review phase does it do one comprehensive cold-verification of the entire accumulated calculator (plans/review.md): it re-runs every DoD item from every phase from a fresh clone, adds cross-feature break-it probes, files all findings at once, re-verifies after fixes, and then issues PASS. That single pass is the only adversary gate in the run.
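The control flow described above can be sketched as a small simulation. This is an illustration only, not the orchestrator's code: `run_deferred`, `builder_tests`, and `adversary_review` are hypothetical names, and the two callbacks stand in for the real agents.

```python
def run_deferred(build_phases, builder_tests, adversary_review, max_rounds=5):
    """Builder self-certifies each build phase; the Adversary verifies once at the end."""
    log = []
    for phase in build_phases:
        # The Builder loops on its own tests and never waits for the Adversary.
        while not builder_tests(phase):
            log.append(f"{phase}: builder fixes")
        log.append(f"{phase}: DONE (self-certified)")

    # The only adversary gate: review everything, file all findings, re-verify.
    for _ in range(max_rounds):
        findings = adversary_review(build_phases)
        if not findings:
            log.append("review: PASS")
            return log
        log.append(f"review: {len(findings)} finding(s) filed, builder fixes")
    log.append("review: unresolved")
    return log


# Stub agents: the Builder's tests are green at once; the Adversary files one
# finding on its first pass and none on the re-verification pass.
findings_per_round = iter([["--json output breaks with another flag"], []])
log = run_deferred(
    ["wc", "json"],
    builder_tests=lambda phase: True,
    adversary_review=lambda phases: next(findings_per_round),
)
print("\n".join(log))
```

The log shows both build phases closing on the Builder's word alone, followed by a single review that files its findings and passes on re-verification.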

The trade-off

  • Cheapest coordination. One handshake instead of roughly 3 to 12: no per-gate or per-phase round-trips, and the Builder isn't interrupted mid-build. (The benchmark showed coordination round-trips are a real token cost; deferring to one pass minimises them.)
  • But the independent check arrives late. Two risks the per-gate/per-phase cadences guard against:
    • Late discovery / rework. If the Builder built phase 2 on a wrong assumption from phase 1, an early adversary would have caught it at gate 1; here it surfaces only at the end, after more work was piled on the flaw — potentially a larger, costlier fix.
    • Self-certification drift. The build phases are self-certified, so a bug the Builder rubber-stamps survives until the final review. The comprehensive pass is the only safety net, so it must be thorough.
  • Better at cross-feature bugs. Because it verifies the whole system at once, it's positioned to catch interactions (e.g. --json × every flag) that a per-gate view, looking at one item at a time, can miss.
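The "--json × every flag" idea can be made concrete by enumerating the combinations a comprehensive pass might probe. A minimal sketch, assuming a hypothetical flag set (`-l`, `-w`, `-c` are placeholders; the example's real flags are defined in its plans, not here):

```python
from itertools import combinations


def cross_feature_probes(flags, pivot="--json"):
    """Yield argument lists pairing `pivot` with every non-empty combination of the other flags."""
    others = [f for f in flags if f != pivot]
    for size in range(1, len(others) + 1):
        for combo in combinations(others, size):
            yield [pivot, *combo]


# Hypothetical flag set for illustration only.
flags = ["--json", "-l", "-w", "-c"]
probes = list(cross_feature_probes(flags))

for args in probes:
    print(" ".join(args))
```

Three non-pivot flags give seven combinations. A per-gate view would typically exercise each flag alone, which is why interactions can slip past it.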

So the deferred cadence trades early, incremental assurance for minimal coordination plus one holistic pass. It suits work where features are independent and cheap to fix late; it's risky where early decisions constrain later ones.
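The late-rework risk can be illustrated with a toy model of which phases must be revisited once a flaw is found. It is a deliberate simplification: it assumes a flaw is caught at the first adversary gate after it is introduced, and that dependence between phases is all-or-nothing. `phases_to_rework` is a hypothetical helper, not part of the example.

```python
def phases_to_rework(phases, flawed_phase, cadence, dependent=True):
    """Return the phases that need revisiting once a flaw in `flawed_phase` surfaces."""
    start = phases.index(flawed_phase)
    if cadence == "per-phase":
        # Caught at the end of the phase that introduced it.
        return phases[start:start + 1]
    if cadence == "deferred":
        # Caught only in the final review. If later phases build on the flawed
        # one, they are reworked too; independent phases are left alone.
        return phases[start:] if dependent else phases[start:start + 1]
    raise ValueError(f"unknown cadence: {cadence}")


phases = ["wc", "json"]
print(phases_to_rework(phases, "wc", "per-phase"))
print(phases_to_rework(phases, "wc", "deferred"))
print(phases_to_rework(phases, "wc", "deferred", dependent=False))
```

Under these assumptions the deferred cadence costs extra rework only when later phases depend on the flawed one, which matches the guidance above about independent features.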

python3 ../../agents.py status --config agents.toml
python3 ../../agents.py up     --config agents.toml      # needs `claude` on PATH

Prompt base: the full original builder-adversary prompts plus a DEFERRED REVIEW CADENCE override, so comparing this to builder-adversary or builder-adversary-lean isolates only the verification cadence.