mfowler 90375f004e docs(examples): add builder-adversary-deferred — verify after a long segment
Coarsest review cadence: the Builder self-certifies the build phases and the
Adversary does ONE comprehensive cold-verification of the whole accumulated build
in a final `review` phase (vs orig per-phase, lean per-gate). Full original
prompts + a DEFERRED REVIEW CADENCE override, so it isolates verification cadence.
Cheapest coordination; the trade-off is the independent check arrives late (late
rework risk + self-certification drift on build phases). README spells it out.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 00:02:44 +00:00

# Builder/Adversary example — deferred review (verify after a long segment)
The coarsest point on the **review-cadence spectrum**. The pattern and the full original prompts are
the same as in `../builder-adversary`; the only thing that changes is *when* the Adversary verifies:

| variant | the Adversary verifies… | handshakes (calculator task) |
|---|---|--:|
| `builder-adversary-lean` | per **gate** | ~12 claim/verify round-trips |
| `builder-adversary` (orig) | per **phase** | ~3 |
| **`builder-adversary-deferred`** | **once, after the whole build** | **1** |
## How it works
The Builder **self-certifies** the build phases (`wc`, then `json`): it builds to each phase's
Definition of Done (DoD), runs its own tests until green, writes `## DONE`, and advances *without*
waiting for the Adversary. The Adversary stays out of the build. Only in the final **`review` phase**
does it perform **one comprehensive cold-verification of the entire accumulated calculator**
(`plans/review.md`): it re-runs every DoD item from every phase against a fresh clone, adds
cross-feature break-it probes, files all findings at once, re-verifies after fixes, and then issues
PASS. That single pass is the only Adversary gate in the run.
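
The three cadences can be compared with a small model of where the Adversary steps in. This is only a sketch: the phase names come from this README, but the gate counts per phase are placeholders chosen so the totals land near the table's approximate figures, and `adversary_checkpoints` is a hypothetical helper, not part of `agents.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    gates: int  # number of DoD items; ASSUMED counts, not read from the plans


# Phase names are from the README; four gates each is an illustrative guess.
PHASES = [Phase("wc", 4), Phase("json", 4), Phase("review", 4)]


def adversary_checkpoints(phases: list[Phase], cadence: str) -> list[str]:
    """Return, in order, the points at which the Adversary verifies."""
    if cadence == "per-gate":    # builder-adversary-lean
        return [f"{p.name}:gate{g}" for p in phases for g in range(1, p.gates + 1)]
    if cadence == "per-phase":   # builder-adversary (orig)
        return [p.name for p in phases]
    if cadence == "deferred":    # builder-adversary-deferred
        # Build phases are self-certified; only the final phase is verified.
        return [phases[-1].name]
    raise ValueError(f"unknown cadence: {cadence!r}")


if __name__ == "__main__":
    for cadence in ("per-gate", "per-phase", "deferred"):
        points = adversary_checkpoints(PHASES, cadence)
        print(f"{cadence:>9}: {len(points)} handshake(s)")
```

Each checkpoint corresponds to one claim/verify round-trip, so the length of the returned list is the handshake count for that cadence under these assumed gate counts.
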
## The trade-off
- **Cheapest coordination.** One handshake instead of roughly 3 to 12: no per-gate or per-phase
  round-trips, and the Builder isn't interrupted mid-build. (The benchmark showed coordination
  round-trips are a real token cost; deferring to one pass minimises them.)
- **But the independent check arrives late.** Two risks the per-gate/per-phase cadences guard
  against:
  - **Late discovery / rework.** If the Builder built phase 2 on a wrong assumption from phase 1, an
    early adversary would have caught it at gate 1; here it surfaces only at the end, after more work
    was piled on the flaw, potentially a larger, costlier fix.
  - **Self-certification drift.** The build phases are self-certified, so a bug the Builder
    rubber-stamps survives until the final review. The comprehensive pass is the only safety net, so
    it must be thorough.
- **Better at cross-feature bugs.** Because it verifies the whole system at once, it's positioned to
  catch *interactions* (e.g. `--json` × every flag) that a per-gate view, looking at one item at a
  time, can miss.

So `deferred` trades *early, incremental* assurance for *minimal coordination + one holistic pass*.
It suits work where features are independent and cheap to fix late; it's risky where early decisions
constrain later ones.
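
The "`--json` × every flag" idea from the cross-feature point above can be sketched as a probe matrix. This is an illustration under stated assumptions: only `--json` appears in this README, the other flag names are hypothetical, and the helper names are invented for the example rather than taken from `plans/review.md`.

```python
from itertools import combinations

# Only `--json` is named in the README; the remaining flags are HYPOTHETICAL.
FLAGS = ["--json", "--lines", "--words", "--chars"]


def anchored_probes(flags: list[str], anchor: str = "--json") -> list[tuple[str, str]]:
    """Pair `anchor` with every other flag (the 'anchor x every flag' matrix)."""
    return [(anchor, other) for other in flags if other != anchor]


def all_pairs(flags: list[str]) -> list[tuple[str, str]]:
    """Every unordered pair of flags, for a broader interaction sweep."""
    return list(combinations(flags, 2))


if __name__ == "__main__":
    for pair in anchored_probes(FLAGS):
        print("probe:", " ".join(pair))
    print("all pairwise combinations:", len(all_pairs(FLAGS)))
```

A per-gate check exercises each flag in isolation, whereas a holistic pass can afford to walk a matrix like this one, which is where interaction bugs tend to show up.
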

To run this example:

```bash
python3 ../../agents.py status --config agents.toml
python3 ../../agents.py up --config agents.toml # needs `claude` on PATH
```
> **Prompt base:** the full original `builder-adversary` prompts + a DEFERRED REVIEW CADENCE override
> — so comparing this to `builder-adversary`/`lean` isolates *only* the verification cadence.