agent-orchestrator/examples/builder-adversary-lean
mfowler c6c7ce8640 change: base stateless + lean on the FULL original prompts (not minimal)
So that "stateless vs builder-adversary" and "lean vs stateless" isolate context
hygiene / review granularity WITHOUT the confound of the minimal prompts' reduced
testing pressure (which we found cuts ~25% of test methods). stateless = orig +
context hygiene; lean = orig + context hygiene + per-gate review. min stays the
pure minimal-prompt variant (isolates verbosity vs orig).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 03:17:47 +00:00

Builder/Adversary example — context-lean + full per-gate review

The builder-adversary-stateless variant added context hygiene (compact at each checkpoint, read diffs rather than trees, keep loads lean). In benchmarking it also happened to run fewer review rounds, so its token saving came partly from leaner context and partly from less scrutiny. This variant separates the two effects: it keeps all of the context hygiene but requires full per-gate review granularity, meaning one claim(<gate>) per gate and one independent Adversary verdict per gate, with no batching.
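The per-gate rule above can be checked mechanically. The sketch below is illustrative only: the event-log format (a list of `(kind, gate_names)` tuples), the function name `check_per_gate`, and the gate names are all hypothetical and are not part of agents.py or the prompts. It assumes a gate may be re-claimed after a failed verdict, so it only flags batching and missing coverage.

```python
from collections import Counter


def check_per_gate(events, gates):
    """Return a list of violations of the per-gate review rule.

    events: hypothetical list of (kind, gate_names) tuples, where kind is
            "claim" or "verdict" and gate_names is a tuple of gate names.
    gates:  the gate names the task defines.
    """
    problems = []
    seen = {"claim": Counter(), "verdict": Counter()}

    for kind, gate_names in events:
        # A claim or verdict that names more than one gate is a batch.
        if len(gate_names) != 1:
            problems.append(f"batched {kind}: {', '.join(gate_names)}")
        seen[kind].update(gate_names)

    for gate in gates:
        # Every gate must be covered by at least one claim and one verdict.
        for kind in ("claim", "verdict"):
            if seen[kind][gate] == 0:
                problems.append(f"gate {gate} has no {kind}")
    return problems


# Hypothetical log: "test" and "docs" are claimed together, "docs" is never judged.
log = [
    ("claim", ("build",)),
    ("verdict", ("build",)),
    ("claim", ("test", "docs")),
    ("verdict", ("test",)),
]
print(check_per_gate(log, ["build", "test", "docs"]))
```

An empty result means every gate received its own claim and its own verdict, which is the granularity this variant enforces.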

The point: if this variant keeps most of the token saving while doing as many review passes as the original (or more), then the saving is real efficiency (less carried and reloaded context), not a reduction in adversarial scrutiny.
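The reasoning above amounts to a small calculation. The following is a minimal sketch under stated assumptions: the `Run` record, the `attribute_saving` helper, and every number in the usage are hypothetical placeholders, not output of the agent-orchestrator-benchmark repo.

```python
from dataclasses import dataclass


@dataclass
class Run:
    tokens: int          # total tokens consumed by the run
    review_rounds: int   # number of Adversary review passes


def attribute_saving(original, stateless, lean):
    """Return (fraction of the stateless saving that lean keeps, scrutiny_kept)."""
    stateless_saving = original.tokens - stateless.tokens
    if stateless_saving <= 0:
        raise ValueError("stateless run shows no saving to attribute")
    retained = (original.tokens - lean.tokens) / stateless_saving
    # The argument only holds if lean reviewed at least as often as the original.
    scrutiny_kept = lean.review_rounds >= original.review_rounds
    return retained, scrutiny_kept


# Placeholder numbers for illustration only, NOT benchmark results.
retained, scrutiny_kept = attribute_saving(
    Run(tokens=1000, review_rounds=6),
    Run(tokens=600, review_rounds=3),
    Run(tokens=700, review_rounds=6),
)
print(retained, scrutiny_kept)
```

A retained fraction close to 1 with `scrutiny_kept` true would support the "real efficiency" reading; a low fraction would suggest the stateless saving came mostly from reviewing less.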

Compared with the other variants:

| variant | context hygiene | review granularity |
| --- | --- | --- |
| builder-adversary | no | as the agents choose |
| builder-adversary-min | no | as the agents choose |
| builder-adversary-stateless | yes | as the agents choose (tended to batch → fewer rounds) |
| builder-adversary-lean | yes | per-gate, enforced (no batching) |

Everything else is identical: the pattern, AI-as-adversary cold verification, the claim(/review( handoff, and machine-docs/ coordination. The agent-orchestrator-benchmark repo runs this variant head-to-head with the others on the same multi-phase task.

python3 ../../agents.py status --config agents.toml
python3 ../../agents.py up     --config agents.toml      # needs `claude` on PATH

Prompt base: these prompts are the full original builder-adversary prompts plus the additions above, NOT the minimal ones. Comparing this variant to builder-adversary therefore isolates its specific change (context hygiene and review granularity) without the drop in testing pressure that the minimal prompts cause.