agent-orchestrator/examples/builder-adversary-stateless
mfowler c6c7ce8640 change: base stateless + lean on the FULL original prompts (not minimal)
So that "stateless vs builder-adversary" and "lean vs stateless" isolate context
hygiene / review granularity WITHOUT the confound of the minimal prompts' reduced
testing pressure (which we found cuts ~25% of test methods). stateless = orig +
context hygiene; lean = orig + context hygiene + per-gate review. min stays the
pure minimal-prompt variant (isolates verbosity vs orig).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 03:17:47 +00:00

Builder/Adversary example — context-lean ("stateless") variant

Same pattern, same AI-as-adversary verification, same gates as ../builder-adversary and ../builder-adversary-min — but the role prompts add a context hygiene discipline so each loop carries and reloads as little conversation as possible. Nothing about what the agents do or how they verify changes; only how much context they drag from turn to turn.

Why

In a long autonomous loop the dominant token cost is cache-read: every turn re-sends the conversation so far. The unchanged prefix is billed as cache-read at roughly 10% of the input price, but it is billed again on every turn, so cumulative cost ≈ average context length × number of turns. Against that, the length of the role prose is a rounding error. The win is keeping the conversation short and not carrying it where it isn't needed.
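The arithmetic behind that claim can be sketched with a toy model. This is an illustration only, not the benchmark's accounting: the function name, the fixed 2,000 tokens added per turn, and the 1,000-token post-compaction summary are all assumed numbers chosen to make the sums easy to check.

```python
def cache_read_tokens(turn_sizes, compact_every=None, summary_size=0):
    """Total tokens re-sent as cached prefix over a whole loop.

    turn_sizes    -- tokens each turn appends to the conversation (assumed values)
    compact_every -- if set, the context is compacted after every N turns
    summary_size  -- tokens left in context right after a compaction (assumed)
    """
    context = 0  # tokens currently in the conversation
    total = 0    # cumulative cache-read tokens
    for turn, added in enumerate(turn_sizes, start=1):
        total += context   # the existing prefix is re-sent on this turn
        context += added   # this turn's new content joins the prefix
        if compact_every and turn % compact_every == 0:
            context = summary_size  # checkpoint: conversation replaced by a summary
    return total


turns = [2000] * 40  # hypothetical loop: 40 turns, 2,000 new tokens each

uncompacted = cache_read_tokens(turns)
compacted = cache_read_tokens(turns, compact_every=10, summary_size=1000)

print(uncompacted)              # 1560000
print(compacted)                # 390000
print(compacted / uncompacted)  # 0.25
```

In this toy setup, compacting every 10 turns cuts cache-read volume to a quarter. The uncompacted total grows with the square of the turn count, while the compacted total grows only linearly in the number of checkpoints, which is why shortening the conversation matters more than shortening the prompts.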

This protocol already makes that safe: the durable state is on disk (git + the plan + STATUS/REVIEW/JOURNAL), so the conversation is disposable scratch. These prompts exploit that:

  • Compact at every checkpoint. After each gate is committed (Builder) or each verdict is written (Adversary), run /compact. Compaction discards conversational detail, but nothing durable is lost here, because the agent reloads from git + STATUS/REVIEW.
  • Read diffs, not trees. git diff <last-sha>..HEAD and only the touched files — never re-read the whole repo.
  • Spill bulk to files. Long build/test/verification output goes to a file; read back only the slice you need, instead of dumping it into context.
  • Adversary loads only {plan, STATUS, diff} per gate — full cold AI judgment, tiny footprint.
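The "read diffs" and "spill bulk to files" habits above can be sketched as small helpers. These are hypothetical illustrations, not part of agents.py or the role prompts: the names diff_command, spill, and read_slice are invented here, and the prompts may express the same discipline purely as shell usage.

```python
import tempfile
from pathlib import Path


def diff_command(last_sha):
    """argv for 'what changed since the last checkpoint' (git diff <last-sha>..HEAD)."""
    return ["git", "diff", f"{last_sha}..HEAD"]


def spill(text, directory=None, suffix=".log"):
    """Write bulky output to a file and return its path, keeping it out of context."""
    fd, name = tempfile.mkstemp(suffix=suffix, dir=directory)
    with open(fd, "w", encoding="utf-8") as fh:  # open() takes over and closes the fd
        fh.write(text)
    return Path(name)


def read_slice(path, last_n=20):
    """Return only the last `last_n` lines of a spilled file."""
    if last_n <= 0:
        return []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return lines[-last_n:]


# Hypothetical test run: 500 passing lines followed by one failure.
output = "\n".join(f"test_{i} ... ok" for i in range(500)) + "\nFAILED test_500"
log = spill(output)
print(read_slice(log, last_n=3))
# ['test_498 ... ok', 'test_499 ... ok', 'FAILED test_500']
```

Only the three-line tail enters the conversation; the other 498 lines stay on disk, where they can still be read later if a gate needs them.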

Config note

Run the loop agents non-resumed. This is the default in this agents.toml: loop agents don't set resume = true. Each time the watchdog restarts a loop (notably at every phase advance), it therefore starts a fresh session rather than carrying the prior phase's whole conversation forward. Within a phase, the context is shrunk by /compact, per the prompts above.

A natural future engine lever (not yet implemented) would be a watchdog policy that recycles a loop's session after each checkpoint commit (claim/review), giving fresh context per gate rather than per phase — the same idea, enforced by the harness instead of the prompt.

Compared

The agent-orchestrator-benchmark repo runs this variant head-to-head against builder-adversary and builder-adversary-min on the same multi-phase task (all on Sonnet), reporting tokens per loop — to quantify how much the context discipline saves while keeping identical gate outcomes.

To run this variant from this directory:

python3 ../../agents.py status --config agents.toml
python3 ../../agents.py up     --config agents.toml      # needs `claude` on PATH

Prompt base: these prompts are the full original builder-adversary prompts plus the additions above — NOT the minimal ones — so that comparing this variant to builder-adversary isolates its specific change (context hygiene / review granularity) without the minimal-prompt testing-pressure drop.