docs(examples): add builder-solo — single builder, no adversary (control)
A single Builder that builds AND self-verifies (same DoD rigor), with NO independent Adversary and no claim/review handoff. The control for measuring what the AI adversary costs (its tokens, ~half of a loop-pair run) and buys (independent cold verification vs self-certification).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
27
examples/builder-solo/README.md
Normal file
@@ -0,0 +1,27 @@
# Builder-solo example: no Adversary (self-verification baseline)

A single **Builder** agent, same task spec as [`../builder-adversary`](../builder-adversary/), but
with **no Adversary**: the Builder builds *and* verifies its own work, then self-certifies `## DONE`.
No `claim(`/`review(` handoff: there's nothing to hand off to.

This is the **control** for the AI-as-adversary design. Comparing it against `builder-adversary` on
the same task answers two things:

- **Cost:** how much of a run's tokens is the independent Adversary? (In the loop-pair runs the
  Adversary is ~45–53% of the total; this variant removes that.)
- **Quality:** does an independent cold verifier catch things a self-checking builder misses?
  Self-certification has an obvious failure mode: the same agent that wrote the bug decides whether
  it's a bug. This variant measures what you give up by dropping the second pair of eyes.

The Builder's role prompt keeps the same verification *rigor* (run every DoD check, try to break it,
paste observed output, no self-rubber-stamping); the only thing removed is the **independent**
adversary. So the comparison is "independent verification vs self-verification," not "verification vs
none."
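The cost side of this comparison is plain arithmetic over per-agent token counts. Here is a minimal sketch in Python, assuming you already have token totals for each agent in a run. The function names and all numbers are hypothetical placeholders for illustration; they are not benchmark output and not part of `agents.py`.

```python
def adversary_share(builder_tokens: int, adversary_tokens: int) -> float:
    """Fraction of a loop-pair run's total tokens spent by the Adversary."""
    total = builder_tokens + adversary_tokens
    if total <= 0:
        raise ValueError("run has no tokens recorded")
    return adversary_tokens / total


def solo_vs_pair_ratio(solo_tokens: int, pair_total_tokens: int) -> float:
    """Solo run's tokens expressed as a fraction of the loop-pair run's total."""
    if pair_total_tokens <= 0:
        raise ValueError("loop-pair run has no tokens recorded")
    return solo_tokens / pair_total_tokens


# Placeholder token counts (assumed values, NOT measured results).
pair_builder, pair_adversary = 520_000, 480_000
solo_builder = 560_000

share = adversary_share(pair_builder, pair_adversary)
ratio = solo_vs_pair_ratio(solo_builder, pair_builder + pair_adversary)
print(f"Adversary share of loop-pair run: {share:.0%}")
print(f"Solo tokens relative to loop-pair total: {ratio:.0%}")
```

A solo Builder also does the verification work itself, so its token count may exceed the Builder half of a pair. The ratio against the pair total is therefore likely the more informative figure than the Adversary share alone.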
```bash
python3 ../../agents.py status --config agents.toml
python3 ../../agents.py up --config agents.toml   # needs `claude` on PATH
```
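The `up` command needs `claude` on PATH, so a quick preflight check can save a failed launch. Here is a small sketch using the standard library's `shutil.which`. The helper name `missing_tools` is hypothetical and is not part of `agents.py`.

```python
import shutil


def missing_tools(names):
    """Return the executable names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # `claude` is the only requirement the README states; add others as needed.
    missing = missing_tools(["claude"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```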
The `agent-orchestrator-benchmark` repo runs this head-to-head with the other variants on the same
multi-phase task and reports token counts and the efficiency ratios.