docs(examples): add builder-adversary-deferred — verify after a long segment

Coarsest review cadence: the Builder self-certifies the build phases and the
Adversary does ONE comprehensive cold-verification of the whole accumulated build
in a final `review` phase (vs orig per-phase, lean per-gate). Full original
prompts + a DEFERRED REVIEW CADENCE override, so it isolates verification cadence.
Cheapest coordination; the trade-off is the independent check arrives late (late
rework risk + self-certification drift on build phases). README spells it out.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 00:02:44 +00:00
parent c6c7ce8640
commit 90375f004e
9 changed files with 300 additions and 0 deletions


@ -0,0 +1,48 @@
# Builder/Adversary example — deferred review (verify after a long segment)
The coarsest point on the **review-cadence spectrum**. Same pattern, same full original prompts as
`../builder-adversary`; only *when* the Adversary verifies changes:

| variant | the Adversary verifies… | handshakes (calculator task) |
|---|---|--:|
| `builder-adversary-lean` | per **gate** | ~12 claim/verify round-trips |
| `builder-adversary` (orig) | per **phase** | ~3 |
| **`builder-adversary-deferred`** | **once, after the whole build** | **1** |
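
The handshake column can be read as simple arithmetic over phases and gates. Below is a minimal sketch, assuming a hypothetical split of four gates in each of three phases; the table gives only the approximate totals, and `handshakes`, `GATES_PER_PHASE`, and the phase labels are illustrative names, not part of `agents.py`.

```python
def handshakes(cadence, gates_per_phase):
    """Count claim/verify round-trips for one run under a given review cadence."""
    if cadence == "per-gate":      # builder-adversary-lean: one handshake per gate
        return sum(gates_per_phase.values())
    if cadence == "per-phase":     # builder-adversary (orig): one handshake per phase
        return len(gates_per_phase)
    if cadence == "deferred":      # builder-adversary-deferred: one final review
        return 1
    raise ValueError(f"unknown cadence: {cadence!r}")


# ASSUMPTION: the real gate split per phase is not stated in this README.
GATES_PER_PHASE = {"phase-1": 4, "phase-2": 4, "phase-3": 4}

for cadence in ("per-gate", "per-phase", "deferred"):
    print(cadence, handshakes(cadence, GATES_PER_PHASE))
```

Under that assumed split the counts come out as 12, 3, and 1, matching the rough figures in the table.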
## How it works
The Builder **self-certifies** the build phases (`wc`, then `json`): it builds to each phase's
Definition of Done (DoD), runs its own tests until green, writes `## DONE`, and advances *without*
waiting for the Adversary. The Adversary stays out of the build. Only in the final **`review` phase**
does it do **one comprehensive cold-verification of the entire accumulated calculator**
(`plans/review.md`): it re-runs every DoD item from every phase from a fresh clone, adds
cross-feature break-it probes, files all findings at once, re-verifies after fixes, then issues PASS.
That single pass is the only adversary gate in the run.
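
The control flow above can be sketched as a toy simulation. This is not how `agents.py` is implemented; `Build`, `builder_phase`, `adversary_review`, and `run_deferred` are hypothetical names, and the "defect" is a stand-in for whatever a real cold-verification would find.

```python
from dataclasses import dataclass, field


@dataclass
class Build:
    done: list = field(default_factory=list)    # phases the Builder marked DONE
    defects: set = field(default_factory=set)   # latent bugs self-certification missed


def builder_phase(build, phase, latent_defects=()):
    """Builder finishes a phase and advances without waiting for the Adversary."""
    build.defects.update(latent_defects)
    build.done.append(phase)


def adversary_review(build):
    """One cold pass over everything built so far; returns all findings at once."""
    return sorted(build.defects)


def run_deferred():
    build = Build()
    # Build phases are self-certified: no adversary gate between them.
    builder_phase(build, "wc", latent_defects={"hypothetical wc bug"})
    builder_phase(build, "json")

    # Final review phase: verify, file findings, let the Builder fix, re-verify.
    passes = 0
    while True:
        passes += 1
        findings = adversary_review(build)
        if not findings:
            return "PASS", passes, build.done
        build.defects.difference_update(findings)  # Builder fixes everything filed


print(run_deferred())  # ('PASS', 2, ['wc', 'json'])
```

The two verification passes both happen inside the single `review` phase: the first files the finding, the second confirms the fix.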
## The trade-off
- **Cheapest coordination.** One handshake instead of roughly 3 to 12: no per-gate or per-phase
  round-trips, and the Builder isn't interrupted mid-build. (The benchmark showed coordination
  round-trips are a real token cost; deferring to one pass minimises them.)
- **But the independent check arrives late.** Two risks the per-gate/per-phase cadences guard
  against:
  - **Late discovery / rework.** If the Builder built phase 2 on a wrong assumption from phase 1, an
    early adversary would have caught it at gate 1; here it surfaces only at the end, after more work
    was piled on the flaw: potentially a larger, costlier fix.
  - **Self-certification drift.** The build phases are self-certified, so a bug the Builder
    rubber-stamps survives until the final review. The comprehensive pass is the only safety net, so
    it must be thorough.
- **Better at cross-feature bugs.** Because it verifies the whole system at once, it's positioned to
  catch *interactions* (e.g. `--json` × every flag) that a per-gate view, looking at one item at a
  time, can miss.

So `deferred` trades *early, incremental* assurance for *minimal coordination + one holistic pass*.
It suits work where features are independent and cheap to fix late; it's risky where early decisions
constrain later ones.
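
The late-rework risk can be illustrated with a toy model that counts which phases need revisiting when a flaw is found. It is a sketch under stated assumptions, not a measurement: `phases_to_rework` is a hypothetical helper, the `json`-builds-on-`wc` dependency is assumed for illustration, and counting phases ignores how expensive each fix actually is.

```python
def phases_to_rework(phases, builds_on, flawed, deferred):
    """Return the phases that need rework once a flaw in `flawed` is discovered.

    phases:    phase names in build order
    builds_on: phase -> set of earlier phases whose decisions it relies on
    deferred:  True if the flaw is found only in the final review
    """
    if not deferred:
        # Found at the flawed phase's own review, before anything is built on it.
        return [flawed]
    tainted = {flawed}
    for phase in phases[phases.index(flawed) + 1:]:
        if builds_on.get(phase, set()) & tainted:
            tainted.add(phase)  # built on top of the flaw, so it inherits it
    return [p for p in phases if p in tainted]


PHASES = ["wc", "json"]
COUPLED = {"json": {"wc"}}   # ASSUMPTION: json relies on decisions made in wc
INDEPENDENT = {}             # features that do not constrain each other

print(phases_to_rework(PHASES, COUPLED, "wc", deferred=False))     # ['wc']
print(phases_to_rework(PHASES, COUPLED, "wc", deferred=True))      # ['wc', 'json']
print(phases_to_rework(PHASES, INDEPENDENT, "wc", deferred=True))  # ['wc']
```

When phases are independent, deferring costs nothing extra in this model; when later phases build on earlier ones, the flaw spreads before anyone independent looks at it.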
```bash
python3 ../../agents.py status --config agents.toml
python3 ../../agents.py up --config agents.toml # needs `claude` on PATH
```
> **Prompt base:** the full original `builder-adversary` prompts + a DEFERRED REVIEW CADENCE override
> — so comparing this to `builder-adversary`/`lean` isolates *only* the verification cadence.