docs(examples): add builder-adversary-deferred — verify after a long segment
Coarsest review cadence: the Builder self-certifies the build phases and the Adversary does ONE comprehensive cold-verification of the whole accumulated build in a final `review` phase (vs orig per-phase, lean per-gate). Full original prompts + a DEFERRED REVIEW CADENCE override, so it isolates verification cadence. Cheapest coordination; the trade-off is the independent check arrives late (late rework risk + self-certification drift on build phases). README spells it out. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
32
examples/builder-adversary-deferred/plans/json.md
Normal file
32
examples/builder-adversary-deferred/plans/json.md
Normal file
@ -0,0 +1,32 @@
|
||||
# Phase `json` — machine-readable output
|
||||
|
||||
**Mission.** Extend the `wc.py` from the previous phase with a `--json` mode, without regressing any
|
||||
`wc`-phase behaviour. Single source of truth for this phase.
|
||||
|
||||
(The phase config gives the Builder `claude-opus-4-8` for this phase — an example of a per-phase
|
||||
model override; the Adversary stays on the default model.)
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- **D1 — json output.** `python wc.py --json FILE` prints a single JSON object:
|
||||
`{"lines": N, "words": N, "chars": N, "file": "FILE"}` (valid JSON, parseable by `json.loads`).
|
||||
With stdin (no FILE), `"file"` is `null`.
|
||||
- **D2 — composes with flags.** `--json` honours `-l/-w/-c`: only the requested counts appear as keys
|
||||
(plus `file`). E.g. `wc.py --json -l FILE` → `{"lines": N, "file": "FILE"}`.
|
||||
- **D3 — no regression.** Every `wc`-phase gate (D1–D4 there) still passes unchanged.
|
||||
- **D4 — tests green.** `test_wc.py` is extended for the JSON cases and `pytest -q` is all-green.
|
||||
|
||||
## How the Adversary verifies (cold)
|
||||
|
||||
```bash
|
||||
pytest -q # D4 + D3 regression
|
||||
printf 'a b c\nd e\n' > /tmp/f.txt
|
||||
python wc.py --json /tmp/f.txt | python -c 'import sys,json; d=json.load(sys.stdin); \
|
||||
assert d=={"lines":2,"words":5,"chars":10,"file":"/tmp/f.txt"}, d; print("ok")' # D1
|
||||
python wc.py --json -l /tmp/f.txt # D2: expect {"lines": 2, "file": "/tmp/f.txt"}
|
||||
```
|
||||
|
||||
The Builder restates the exact commands, expected JSON, and commit sha in
|
||||
`machine-docs/STATUS-json.md`. When every DoD item has a fresh PASS in `machine-docs/REVIEW-json.md`
|
||||
and there is no `## VETO`, the Builder writes `## DONE` to `STATUS-json.md` — this is the last phase,
|
||||
so the watchdog then fires the one-shot `reporter` (see `agents.toml` `[loop].on_complete`).
|
||||
24
examples/builder-adversary-deferred/plans/review.md
Normal file
24
examples/builder-adversary-deferred/plans/review.md
Normal file
@ -0,0 +1,24 @@
|
||||
# Phase `review` — comprehensive deferred verification
|
||||
|
||||
This phase adds **no new features**. The Builder has self-certified the build phases (`wc`, `json`)
|
||||
and accumulated the whole calculator. Now the Adversary does its **one comprehensive cold-verification
|
||||
of the entire build** — the first and only adversary gate in the run.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- **D1 — full cold re-verify.** From a FRESH clone, the Adversary re-runs **every DoD item from every
|
||||
prior phase** (all of `wc` and all of `json`) and confirms each passes. Nothing is taken on the
|
||||
Builder's word.
|
||||
- **D2 — full suite green.** The complete test suite (`python -m unittest`) passes, 0 failures.
|
||||
- **D3 — cross-feature break-it.** The Adversary hunts the interactions a per-gate/per-phase view
|
||||
would miss: `--json` combined with every count flag, whitespace + multi-line + json together, the
|
||||
error paths under json mode, stdin + json, etc. — and files any defects it finds.
|
||||
- **D4 — findings cleared.** Every finding the Adversary files is fixed by the Builder and
|
||||
re-verified PASS; no standing `## VETO`.
|
||||
|
||||
## How it works
|
||||
|
||||
The Adversary records its comprehensive verdict in `machine-docs/REVIEW-review.md`
|
||||
(`review(all): PASS`, or findings with repro). The Builder fixes anything found, then writes
|
||||
`## DONE` to `machine-docs/STATUS-review.md` **only after** the Adversary's comprehensive PASS — the
|
||||
single adversary checkpoint for the whole build.
|
||||
43
examples/builder-adversary-deferred/plans/wc.md
Normal file
43
examples/builder-adversary-deferred/plans/wc.md
Normal file
@ -0,0 +1,43 @@
|
||||
# Phase `wc` — a word-count CLI
|
||||
|
||||
**Mission.** Build a small, dependency-free `wc` clone in Python: a script `wc.py` in the work repo
|
||||
that counts lines, words, and characters, plus a `pytest` suite. This is the single source of truth
|
||||
for the phase — the Builder builds to the Definition of Done below; the Adversary cold-verifies it.
|
||||
|
||||
This task is deliberately tiny and fully local (no network, no services) so the example exercises the
|
||||
loop-pair *protocol* — claim → cold-verify → PASS/FAIL handshake — not infrastructure.
|
||||
|
||||
## Definition of Done
|
||||
|
||||
Each Dn is an independent gate. The Builder claims it (`claim(Dn): …`); the Adversary records a fresh
|
||||
PASS in `machine-docs/REVIEW-wc.md` after re-running the check from its own clone.
|
||||
|
||||
- **D1 — default output.** `python wc.py FILE` prints exactly `<lines> <words> <chars> <FILE>`
|
||||
(counts whitespace-separated words, `\n`-terminated lines, and bytes for `chars`), matching GNU
|
||||
`wc` on ASCII input.
|
||||
- **D2 — flags.** `-l`, `-w`, `-c` restrict the output to that single count (e.g. `wc.py -l FILE`
|
||||
prints `<lines> <FILE>`). Flags may combine; output order is lines, words, chars.
|
||||
- **D3 — stdin.** With no FILE argument, `wc.py` reads stdin and prints the counts with no filename.
|
||||
- **D4 — tests green.** A `test_wc.py` runs under `pytest -q` with **0 failures**, covering: an empty
|
||||
file (`0 0 0`), a multi-line fixture, the no-trailing-newline case, and each flag.
|
||||
|
||||
## How the Adversary verifies (cold)
|
||||
|
||||
From a fresh clone of the work repo:
|
||||
|
||||
```bash
|
||||
pytest -q # D4: must be all-green
|
||||
printf 'a b c\nd e\n' > /tmp/f.txt
|
||||
python wc.py /tmp/f.txt # D1: expect "2 5 10 /tmp/f.txt"
|
||||
python wc.py -l /tmp/f.txt # D2: expect "2 /tmp/f.txt"
|
||||
printf 'a b c\nd e\n' | python wc.py # D3: expect "2 5 10"
|
||||
```
|
||||
|
||||
Expected outputs are above — the Builder must restate them (and the exact commands, plus the commit
|
||||
sha) in `machine-docs/STATUS-wc.md` so the Adversary can re-run without reading the Builder's
|
||||
reasoning. Any mismatch is a FAIL with repro steps in `machine-docs/REVIEW-wc.md`.
|
||||
|
||||
## Out of scope (defer to a later phase or DEFERRED.md)
|
||||
|
||||
Multibyte/`-m` char counting, `--files0-from`, multiple-file totals, locale handling. JSON output is
|
||||
the next phase (`plans/json.md`).
|
||||
Reference in New Issue
Block a user