A single Builder that builds AND self-verifies (same DoD rigor), with NO independent Adversary and no claim/review handoff. The control for measuring what the AI adversary costs (its tokens, ~half of a loop-pair run) and buys (independent cold verification vs self-certification).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Builder-solo example — no Adversary (self-verification baseline)
A single Builder agent, same task spec as ../builder-adversary, but
with no Adversary: the Builder builds and verifies its own work, then self-certifies `## DONE`.
No `claim(`/`review(` handoff: there's nothing to hand off to.
This is the control for the AI-as-adversary design. Comparing it against builder-adversary on
the same task answers two things:
- Cost: how much of a run's tokens is the independent Adversary? (In the loop-pair runs the Adversary is ~45–53% of the total — this variant removes that.)
- Quality: does an independent cold verifier catch things a self-checking builder misses? Self-certification has an obvious failure mode: the same agent that wrote the bug decides whether it's a bug. This variant measures what you give up by dropping the second pair of eyes.
The Builder's role prompt keeps the same verification rigor (run every DoD check, try to break it, paste observed output, no self-rubber-stamping) — the only thing removed is the independent adversary. So the comparison is "independent verification vs self-verification," not "verification vs none."
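The cost side of the comparison comes down to two ratios. The following is a minimal sketch, not the benchmark's actual code: the function names `adversary_share` and `solo_savings` are hypothetical, and the token counts in the example are made-up placeholders rather than measured results.

```python
def adversary_share(builder_tokens: int, adversary_tokens: int) -> float:
    """Fraction of a loop-pair run's tokens spent by the Adversary."""
    total = builder_tokens + adversary_tokens
    if total <= 0:
        raise ValueError("total token count must be positive")
    return adversary_tokens / total


def solo_savings(pair_total: int, solo_total: int) -> float:
    """Fraction of the loop-pair total that the builder-solo run avoids."""
    if pair_total <= 0:
        raise ValueError("pair_total must be positive")
    return (pair_total - solo_total) / pair_total


if __name__ == "__main__":
    # Placeholder numbers for illustration only (not benchmark results).
    pair_builder, pair_adversary = 520_000, 480_000
    solo_total = 600_000  # a solo Builder also pays for its own verification

    share = adversary_share(pair_builder, pair_adversary)
    saved = solo_savings(pair_builder + pair_adversary, solo_total)
    print(f"Adversary share of loop-pair run: {share:.0%}")
    print(f"Tokens saved by builder-solo:     {saved:.0%}")
```

The two ratios are kept separate because removing the Adversary does not necessarily save its full share: the solo Builder still runs every DoD check itself, so its own token use may be higher than the Builder's in a loop-pair run.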
python3 ../../agents.py status --config agents.toml
python3 ../../agents.py up --config agents.toml # needs `claude` on PATH
The agent-orchestrator-benchmark repo runs this head-to-head with the other variants on the same
multi-phase task and reports token counts and the efficiency ratios.