agent-orchestrator/examples/builder-adversary-stateless/README.md
mfowler 985d33dd51 docs(examples): add builder-adversary-stateless — context-lean variant
Same pattern + AI-as-adversary verification as builder-adversary-min, but the
role prompts add CONTEXT HYGIENE: /compact at every checkpoint (lossless — state
is on disk), read diffs not trees, spill bulk output to files, adversary loads
only {plan, STATUS, diff}. Loop agents non-resumed → fresh session per phase.
Targets cache-read (the dominant cost in a long loop) without changing what the
agents do or how they verify.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 20:47:58 +00:00

# Builder/Adversary example — context-lean ("stateless") variant
Same pattern, same **AI-as-adversary** verification, same gates as
[`../builder-adversary`](../builder-adversary/) and
[`../builder-adversary-min`](../builder-adversary-min/) — but the role prompts add a **context
hygiene** discipline so each loop carries and reloads as little conversation as possible. Nothing
about *what* the agents do or *how* they verify changes; only how much context they drag from turn to
turn.
## Why
In a long autonomous loop the dominant token cost is **cache-read**: every turn re-sends the
conversation so far. The unchanged prefix is billed as cache-read at roughly 10% of the input price,
but it is billed on *every turn*, so cumulative cost ≈ average context length × number of turns. The
role prose is a rounding error against that. The win is keeping the conversation short and not
carrying it where it isn't needed.
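
The cost relation above can be made concrete with a toy model. This is only a sketch: the function name `cache_read_tokens`, the 2,000 tokens added per turn, the 100-turn loop, and the "compact every 10 turns back to a 3,000-token floor" policy are all illustrative assumptions, not measurements from this repo. It also ignores output tokens, cache writes, and the cost of the compaction step itself.

```python
def cache_read_tokens(turns, growth_per_turn, compact_every=None, floor=0):
    """Total prefix tokens re-read across a loop (toy model, not a billing formula).

    Each turn re-reads the whole context so far, then adds growth_per_turn tokens.
    If compact_every is set, the context is cut back to `floor` tokens after
    every compact_every-th turn (an assumed stand-in for a checkpoint compaction).
    """
    context = floor
    total = 0
    for turn in range(1, turns + 1):
        total += context              # the prefix is re-read on this turn
        context += growth_per_turn    # this turn's new messages and tool output
        if compact_every and turn % compact_every == 0:
            context = floor           # checkpoint: state is on disk, drop the chat
    return total


# Hypothetical numbers, chosen only to show the shape of the curve.
uncompacted = cache_read_tokens(turns=100, growth_per_turn=2_000)
compacted = cache_read_tokens(
    turns=100, growth_per_turn=2_000, compact_every=10, floor=3_000
)
print(uncompacted, compacted)  # 9900000 1200000
```

Under these assumptions the uncompacted loop re-reads about eight times as many tokens, because its context grows for all 100 turns instead of being reset every 10.
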
This protocol already makes that safe: the **durable state is on disk** (git + the plan +
STATUS/REVIEW/JOURNAL), so the conversation is disposable scratch. These prompts exploit that:
- **Compact at every checkpoint.** After each gate is committed (Builder) or each verdict is written
(Adversary), run `/compact` — lossless here, because the agent reloads from git + STATUS/REVIEW.
- **Read diffs, not trees.** `git diff <last-sha>..HEAD` and only the touched files — never re-read
the whole repo.
- **Spill bulk to files.** Long build/test/verification output goes to a file; read back only the
slice you need, instead of dumping it into context.
- **Adversary loads only {plan, STATUS, diff}** per gate: it still judges each gate cold, with no
  carried-over conversation, but from a very small context.
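
The "read diffs" and "spill bulk to files" rules in the list above could be supported by small helpers like the ones below. This is a sketch under assumptions: `run_spilled` and `diff_command` are hypothetical names that are not part of this repo, and the role prompts may have the agent do the same thing with plain shell redirection instead.

```python
import subprocess
from pathlib import Path


def run_spilled(cmd, log_path, tail_lines=20):
    """Run cmd, write all of its output to log_path, return (exit code, last lines).

    Only the returned tail is meant to enter the conversation; the full log
    stays on disk and can be read back in slices if needed.
    """
    log_path = Path(log_path)
    with log_path.open("w", encoding="utf-8") as log:
        # stderr is merged into stdout so the log keeps the original ordering.
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    lines = log_path.read_text(encoding="utf-8", errors="replace").splitlines()
    return proc.returncode, lines[-tail_lines:]


def diff_command(last_sha, paths=()):
    """Build the `git diff <last-sha>..HEAD` argv, optionally limited to paths."""
    cmd = ["git", "diff", f"{last_sha}..HEAD"]
    if paths:
        cmd += ["--", *paths]   # "--" separates revisions from file paths
    return cmd
```

For example, `run_spilled(["make", "test"], "test.log", tail_lines=30)` would keep a long test run out of context and hand back only its last 30 lines, and `run_spilled(diff_command(last_sha), "gate.diff")` would do the same for a large diff. The `make test` command here is a placeholder for whatever the gate actually runs.
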
## Config note
Run the loop agents **non-resumed** (the default in this `agents.toml` — loop agents don't set
`resume = true`), so each time the watchdog restarts a loop (notably at every phase advance) it
starts a *fresh* session rather than carrying the prior phase's whole conversation forward. The
in-phase shrinking is done by `/compact` per the prompts above.
> A natural future engine lever (not yet implemented) would be a watchdog policy that **recycles a
> loop's session after each checkpoint commit** (claim/review), giving fresh context *per gate*
> rather than per phase — the same idea, enforced by the harness instead of the prompt.
## Compared
The **`agent-orchestrator-benchmark`** repo runs this variant head-to-head against
`builder-adversary` and `builder-adversary-min` on the same multi-phase task (all on Sonnet),
reporting tokens per loop — to quantify how much the context discipline saves while keeping identical
gate outcomes.

To check status and start this example's agents, run from this directory:
```bash
python3 ../../agents.py status --config agents.toml
python3 ../../agents.py up --config agents.toml # needs `claude` on PATH
```