mfowler a0f7652e9e docs(examples): add builder-solo — single builder, no adversary (control)
A single Builder that builds AND self-verifies (same DoD rigor), with NO
independent Adversary and no claim/review handoff. The control for measuring
what the AI adversary costs (its tokens, ~half of a loop-pair run) and buys
(independent cold verification vs self-certification).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 02:34:50 +00:00

# Builder-solo example — no Adversary (self-verification baseline)
A single **Builder** agent, same task spec as [`../builder-adversary`](../builder-adversary/), but
with **no Adversary**: the Builder builds *and* verifies its own work, then self-certifies `## DONE`.
There is no `claim(`/`review(` handoff, because there is nothing to hand off to.

This is the **control** for the AI-as-adversary design. Comparing it against `builder-adversary` on
the same task answers two things:

- **Cost:** how much of a run's tokens is the independent Adversary? (In the loop-pair runs the
  Adversary is ~45-53% of the total; this variant removes that.)
- **Quality:** does an independent cold verifier catch things a self-checking builder misses?
  Self-certification has an obvious failure mode: the same agent that wrote the bug decides whether
  it's a bug. This variant measures what you give up by dropping the second pair of eyes.
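
The cost question above comes down to simple arithmetic on per-agent token counts. The sketch below is illustrative only: the function names and the token figures are hypothetical, not taken from `agents.py` or from any measured run.

```python
def adversary_share(builder_tokens: int, adversary_tokens: int) -> float:
    """Fraction of a loop-pair run's total tokens spent by the Adversary."""
    total = builder_tokens + adversary_tokens
    if total <= 0:
        raise ValueError("total tokens must be positive")
    return adversary_tokens / total


def relative_cost(solo_tokens: int, pair_tokens: int) -> float:
    """Builder-solo token total as a fraction of the loop-pair token total."""
    if pair_tokens <= 0:
        raise ValueError("pair_tokens must be positive")
    return solo_tokens / pair_tokens


# Made-up numbers for illustration; these are NOT benchmark results.
pair_builder, pair_adversary = 520_000, 480_000
solo_total = 600_000

share = adversary_share(pair_builder, pair_adversary)
ratio = relative_cost(solo_total, pair_builder + pair_adversary)
print(f"Adversary share of the pair run: {share:.0%}")
print(f"Solo run cost relative to the pair run: {ratio:.0%}")
```

Note that a solo Builder still spends tokens on self-verification, so its total need not equal the pair run's Builder share alone; that is why the sketch compares whole-run totals.
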

The Builder's role prompt keeps the same verification *rigor* (run every DoD check, try to break it,
paste observed output, no self-rubber-stamping); the only thing removed is the **independent**
adversary. So the comparison is "independent verification vs self-verification," not "verification vs
none."

```bash
python3 ../../agents.py status --config agents.toml
python3 ../../agents.py up --config agents.toml # needs `claude` on PATH
```
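
Since `up` needs `claude` on PATH, a small preflight check can catch a missing binary before launching. This is a minimal sketch using the standard library's `shutil.which`; the helper `missing_tools` is a hypothetical name and is not part of `agents.py`.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tool names that cannot be resolved on PATH."""
    return [name for name in tools if which(name) is None]


# Hypothetical preflight: only `claude` is named as a requirement in this README.
missing = missing_tools(["claude"])
if missing:
    print("not found on PATH: " + ", ".join(missing))
else:
    print("all required tools found")
```

The `which` parameter is injectable only so the check can be exercised without depending on the local environment.
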
The `agent-orchestrator-benchmark` repo runs this variant head-to-head with the other variants on
the same multi-phase task and reports token counts and the efficiency ratios.