docs(examples): add builder-adversary-stateless — context-lean variant

Same pattern + AI-as-adversary verification as builder-adversary-min, but the role prompts add CONTEXT HYGIENE: /compact at every checkpoint (lossless, since state is on disk), read diffs not trees, spill bulk output to files, adversary loads only {plan, STATUS, diff}. Loop agents non-resumed → fresh session per phase. Targets cache-read (the dominant cost in a long loop) without changing what the agents do or how they verify.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 20:47:58 +00:00
parent 737ef81066
commit 985d33dd51
8 changed files with 255 additions and 0 deletions
--- a/examples/builder-adversary-stateless/README.md
+++ b/examples/builder-adversary-stateless/README.md
@@ -0,0 +1,49 @@
+# Builder/Adversary example — context-lean ("stateless") variant
+
+Same pattern, same **AI-as-adversary** verification, same gates as
+[`../builder-adversary`](../builder-adversary/) and
+[`../builder-adversary-min`](../builder-adversary-min/) — but the role prompts add a **context
+hygiene** discipline so each loop carries and reloads as little conversation as possible. Nothing
+about *what* the agents do or *how* they verify changes; only how much context they drag from turn to
+turn.
+
+## Why
+
+In a long autonomous loop the dominant token cost is **cache-read**: every turn re-sends the
+conversation so far (the unchanged prefix is billed as cache-read, ~10% of input price, but it's
+billed *every turn*). So cost ≈ context length × turns. The role prose is a rounding error against
+that. The win is keeping the conversation short and not carrying it where it isn't needed.
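
Annotation on the cost claim above: a rough back-of-envelope sketch, not a measurement. It assumes the conversation grows by a fixed number of tokens per turn and that compaction resets it to a fixed floor. The function name `cache_read_tokens` and every number here are hypothetical, and the model ignores the cost of the compaction turn itself and of re-caching the shortened prefix.

```python
def cache_read_tokens(turns, growth_per_turn, compact_every=None, floor=0):
    """Total tokens re-sent as an unchanged prefix over a whole loop.

    Each turn re-sends the conversation so far, then the conversation grows
    by `growth_per_turn`. If `compact_every` is set, the conversation is
    reset to `floor` tokens after every `compact_every` turns.
    All parameters are illustrative assumptions, not measured values.
    """
    context = floor
    total = 0
    for turn in range(1, turns + 1):
        total += context              # prefix re-read on this turn
        context += growth_per_turn    # this turn's new messages and tool output
        if compact_every and turn % compact_every == 0:
            context = floor           # checkpoint: state is on disk, so reset
    return total


if __name__ == "__main__":
    # Hypothetical loop: 100 turns, 2,000 new tokens per turn, 5,000-token floor.
    uncompacted = cache_read_tokens(100, 2_000, floor=5_000)
    compacted = cache_read_tokens(100, 2_000, compact_every=10, floor=5_000)
    print(uncompacted)  # 10400000
    print(compacted)    # 1400000
```

Under these assumed numbers the uncompacted loop re-reads about 7.4 times as many prefix tokens as the one compacted every ten turns. The ratio grows with loop length, which is consistent with the "context length × turns" framing.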
+
+This protocol already makes that safe: the **durable state is on disk** (git + the plan +
+STATUS/REVIEW/JOURNAL), so the conversation is disposable scratch. These prompts exploit that:
+
+- **Compact at every checkpoint.** After each gate is committed (Builder) or each verdict is written
+  (Adversary), run `/compact` — lossless here, because the agent reloads from git + STATUS/REVIEW.
+- **Read diffs, not trees.** `git diff <last-sha>..HEAD` and only the touched files — never re-read
+  the whole repo.
+- **Spill bulk to files.** Long build/test/verification output goes to a file; read back only the
+  slice you need, instead of dumping it into context.
+- **Adversary loads only {plan, STATUS, diff}** per gate — full cold AI judgment, tiny footprint.
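
Annotation on the "read diffs" and "spill bulk" rules above: a minimal sketch of what they could look like as helper functions, assuming the agent shells out from Python. The names `changed_files` and `run_spilled` and the log path are hypothetical and not part of this repo. The only external call is the standard `git diff --name-only <sha>..HEAD`.

```python
import subprocess
import sys
from collections import deque
from pathlib import Path


def changed_files(last_sha, repo="."):
    """Return the paths touched between `last_sha` and HEAD.

    Uses `git diff --name-only`, so the caller can open only these files
    instead of re-reading the whole tree.
    """
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{last_sha}..HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def run_spilled(cmd, log_path, tail_lines=20):
    """Run `cmd` with stdout and stderr written to `log_path`.

    Returns (exit_code, last `tail_lines` lines), so only a small slice of
    the output ever needs to enter the conversation.
    """
    log_path = Path(log_path)
    with log_path.open("w") as log:
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    with log_path.open() as log:
        tail = deque(log, maxlen=tail_lines)  # keeps only the final lines
    return proc.returncode, [line.rstrip("\n") for line in tail]


if __name__ == "__main__":
    # Stand-in for a noisy build or test command: 1,000 lines of output.
    noisy = [sys.executable, "-c", "for i in range(1000): print(i)"]
    code, tail = run_spilled(noisy, "build.log", tail_lines=3)
    print(code, tail)
```

The full log stays on disk for later inspection, and a failing command can be examined by reading a different slice of the same file rather than re-running it.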
+
+## Config note
+
+Run the loop agents **non-resumed** (the default in this `agents.toml` — loop agents don't set
+`resume = true`), so each time the watchdog restarts a loop (notably at every phase advance) it
+starts a *fresh* session rather than carrying the prior phase's whole conversation forward. The
+in-phase shrinking is done by `/compact` per the prompts above.
+
+> A natural future engine lever (not yet implemented) would be a watchdog policy that **recycles a
+> loop's session after each checkpoint commit** (claim/review), giving fresh context *per gate*
+> rather than per phase — the same idea, enforced by the harness instead of the prompt.
+
+## Compared
+
+The **`agent-orchestrator-benchmark`** repo runs this variant head-to-head against
+`builder-adversary` and `builder-adversary-min` on the same multi-phase task (all on Sonnet),
+reporting tokens per loop — to quantify how much the context discipline saves while keeping identical
+gate outcomes.
+
+```bash
+python3 ../../agents.py status --config agents.toml
+python3 ../../agents.py up     --config agents.toml      # needs `claude` on PATH
+```