From 7f237a522c73a6511341524f8d55c7b5c858585f Mon Sep 17 00:00:00 2001
From: mfowler
Date: Sun, 14 Jun 2026 17:39:34 +0000
Subject: [PATCH] docs(examples): add a Builder/Adversary loop-pair example (the cc-ci pattern)

A self-contained examples/builder-adversary/ that distills the cc-ci production loop pair into a tiny, fully-local task (build a `wc` CLI in two phases):

- agents.toml: builder + adversary loops, persistent orchestrator, on_complete reporter, cleanlogs service; phase machine with a per-phase model override
- prompts/: kickoff template + builder/adversary roles carrying the load-bearing protocol (claim()/review() handoff, machine-docs file-location rule, WHAT+HOW+EXPECTED+WHERE=STATUS / WHY=JOURNAL anti-anchoring, WAITING-UNTIL liveness)
- plans/: two phase plans (wc, json) each with a cold-verifiable Definition of Done
- README: how to run, the work-repo two-clone isolation model, how to adapt

Verified: `agents.py status --config agents.toml` parses and lists all agents.

Co-Authored-By: Claude Opus 4.8
---
 examples/builder-adversary/README.md               |  85 ++++++++++++
 examples/builder-adversary/agents.toml             | 125 ++++++++++++++++++
 .../builder-adversary/machine-docs/.gitkeep        |   3 +
 examples/builder-adversary/plans/json.md           |  32 +++++
 examples/builder-adversary/plans/wc.md             |  43 ++++++
 .../builder-adversary/prompts/adversary.md         |  27 ++++
 examples/builder-adversary/prompts/builder.md      |  31 +++++
 examples/builder-adversary/prompts/kickoff.md      |   8 ++
 8 files changed, 354 insertions(+)
 create mode 100644 examples/builder-adversary/README.md
 create mode 100644 examples/builder-adversary/agents.toml
 create mode 100644 examples/builder-adversary/machine-docs/.gitkeep
 create mode 100644 examples/builder-adversary/plans/json.md
 create mode 100644 examples/builder-adversary/plans/wc.md
 create mode 100644 examples/builder-adversary/prompts/adversary.md
 create mode 100644 examples/builder-adversary/prompts/builder.md
 create mode 100644 examples/builder-adversary/prompts/kickoff.md

diff --git a/examples/builder-adversary/README.md b/examples/builder-adversary/README.md
new file mode 100644
index 0000000..0398e52
--- /dev/null
+++ b/examples/builder-adversary/README.md
@@ -0,0 +1,85 @@
+# Builder/Adversary example
+
+A complete, self-contained instance of the **Builder/Adversary loop pair** — the pattern
+[cc-ci](https://git.autonomic.zone) runs in production, distilled to a tiny, fully-local task so you
+can read it end-to-end and run it without any infrastructure.
+
+Two AI loops work the same plan but never trust each other; they coordinate **only through a git
+repo**:
+
+- **Builder** (`prompts/builder.md`) — builds to the phase plan's Definition of Done, and *claims*
+  each gate with a `claim(...)`-prefixed commit when it believes a DoD item is met.
+- **Adversary** (`prompts/adversary.md`) — *disbelieves* the Builder, cold-verifies every claim from
+  its **own clone**, and records PASS/FAIL with a `review(...)`-prefixed commit. Holds veto.
+- **Orchestrator** (persistent) supervises; **Reporter** (one-shot) writes a summary when the phase
+  sequence finishes.
+
+The watchdog keeps the loops alive, paces them, and turns those commit prefixes into the handoff:
+a `claim(` commit pings the Adversary, a `review(` commit pings the Builder.
+
+## Files
+
+```
+agents.toml        the whole project: backends, the 4 agents + a service, the phase machine
+prompts/
+  kickoff.md       per-phase preamble (slots {phase_id}/{plan}/{status}/{role})
+  builder.md       Builder role + loop protocol
+  adversary.md     Adversary role + anti-anchoring verification discipline
+plans/
+  wc.md            phase 1 — build a `wc` CLI (the single source of truth for that phase)
+  json.md          phase 2 — add `--json` (shows a per-phase model override)
+machine-docs/      where the loops write STATUS / REVIEW / BACKLOG / JOURNAL at runtime
+```
+
+## The task
+
+Build a small `wc` clone (`wc.py` + a `pytest` suite) in the **work repo**, in two phases. It is
+deliberately trivial and offline — the point is to exercise the *protocol* (claim → cold-verify →
+PASS/FAIL → advance), not to build anything hard. See `plans/wc.md` and `plans/json.md` for the
+Definitions of Done.
+
+## Run it
+
+Needs `claude` on `PATH` (the loops are real agents). From this directory:
+
+```bash
+python3 ../../agents.py status --config agents.toml   # read-only: what would run
+python3 ../../agents.py up --config agents.toml       # start builder + adversary + orchestrator + watchdog
+python3 ../../agents.py logs builder --config agents.toml
+python3 ../../agents.py phase show --config agents.toml
+python3 ../../agents.py down --config agents.toml     # stop everything
+```
+
+To watch the **mechanics** without an agent CLI, set `defaults.backend = "demo"` in `agents.toml`
+(the demo backend just idles) and run `up` / `status` / `down` — sessions start and the watchdog
+ticks, but no real work happens. The repo's top-level `./smoke.sh` shows this end-to-end for the
+sibling `agents.example.toml`.
+
+## The work repo (and isolation)
+
+The loops build in a **work repo** — `handoff.repo` in `agents.toml`, here `./work`. For this
+quick start both loops can share it, but the pattern's real strength is **cold verification**: give
+each loop its **own clone of the same remote** so the Adversary verifies from a genuinely
+independent checkout (exactly what cc-ci does with separate `cc-ci` / `cc-ci-adv` clones).
+
+To set that up:
+
+1. Create the work repo with a remote both loops can push/pull (any git host, or a bare repo on the
+   same box). Put `machine-docs/` in it.
+2. Clone it twice: into `./work` (Builder's `dir`) and `./work-adv` (Adversary's `dir`).
+3. Point `handoff.repo` at the Builder's clone (`./work`).
+
+The watchdog then watches that repo's `origin/main` for `claim(`/`review(` commits and the two
+`*-INBOX.md` files, and pings the right loop on each.
+
+## How to adapt it
+
+- **Different task** → rewrite `plans/*.md` (each is one phase's source of truth + DoD) and adjust
+  the `[loop].phases` list. Nothing else needs to change.
+- **More/fewer phases** → add or remove entries in `[loop].phases`; the watchdog advances when a
+  phase's `status` file contains `## DONE`.
+- **Per-phase models** → `models = { builder = "...", adversary = "..." }` on a phase (see `json`).
+- **A periodic supervisor nudge** → uncomment the `wake = { ... }` line on the `orchestrator` agent.
+
+This example carries **no** project-orchestrator/fleet metadata — like any project, it can be run by
+hand and has no idea a fleet exists. See the repo root `README.md` for the full harness reference.
diff --git a/examples/builder-adversary/agents.toml b/examples/builder-adversary/agents.toml
new file mode 100644
index 0000000..5794fd1
--- /dev/null
+++ b/examples/builder-adversary/agents.toml
@@ -0,0 +1,125 @@
+# examples/builder-adversary — a Builder/Adversary loop pair (the cc-ci pattern, generic).
+#
+# Two independent agent loops that coordinate ONLY through a git repo:
+#   • Builder   — does the work, claims each gate when it believes a Definition-of-Done item is met.
+#   • Adversary — DISBELIEVES the Builder; cold-verifies every claim from its own clone, PASS/FAIL.
+# A persistent Orchestrator supervises; a one-shot Reporter runs on completion. The watchdog keeps
+# them alive, paced, and signals the handoff (claim(…) → ping Adversary, review(…) → ping Builder).
+#
+# This is the same shape cc-ci runs in production, stripped to a small self-contained task: build a
+# `wc` CLI (see plans/). Nothing here is project-orchestrator/fleet aware — it is a plain project.
+#
+# Run it by hand (status starts nothing):
+#   python3 ../../agents.py status --config agents.toml
+#   python3 ../../agents.py up --config agents.toml      # needs `claude` on PATH
+#   python3 ../../agents.py down --config agents.toml
+# To exercise the mechanics with no agent CLI, set defaults.backend = "demo" (idles, no real work).
+
+# ─────────────────────────── global watchdog cadence ───────────────────────────
+[watchdog]
+signal_interval = 30         # s between handoff / stall / limit checks (light)
+heavy_interval = 300         # s between heal / phase-advance checks
+limit_probe_fallback = 300   # flat probe cadence when a reset time can't be parsed
+limit_reset_slack = 45       # s past a parsed reset before probing
+stall_grace = 180            # s of slack past a WAITING-UNTIL marker before a stall reboot
+
+# ─────────────────────────── defaults inherited by every agent ───────────────────────────
+[defaults]
+session_prefix = "ba-"       # REQUIRED — tmux namespace (sessions: ba-builder, ba-adv, …)
+log_dir = ".ao-state"        # REQUIRED — logs + state/, resolved relative to this file
+backend = "claude"           # set to "demo" for a dependency-free mechanics-only run
+model = "claude-sonnet-4-6"
+watch = "heal"               # none | heal | heal+stall
+
+# ─────────────────────────── backends (declared as data) ───────────────────────────
+[backend.claude]
+bin = "claude"
+flags = "--dangerously-skip-permissions"
+remote_control = true
+supports_resume = true
+prompt_delivery = "arg"      # full prompt passed as a CLI argument
+process_name = "claude"      # enables backend-mismatch healing
+submit_key = "Enter"
+stall_idle = 300
+active_re = "esc to interrupt|Running tool|⠇|⠙|· \\d+"
+limit_re = "spend limit|usage limit|limit reached|reached your .*limit|out of (credits|tokens)"
+fatal_re = "redacted_thinking|blocks cannot be modified|cannot be modified"
+
+[backend.demo]               # dependency-free: a shell that just idles (no real work)
+bin = "echo '[demo] {session} up (kickoff: {kickoff})'; exec sleep 1000000"
+prompt_delivery = "exec"
+
+# ─────────────────────────── agents ───────────────────────────
+# The loop pair is the star. The work repo (handoff.repo, below) is what they build in; for TRUE
+# cold-verification give each loop its OWN clone of that repo (see README "The work repo (and
+# isolation)"). Here the Builder runs in ./work and the Adversary in its own ./work-adv clone.
+
+[[agent]]
+name = "builder"             # tmux session: ba-builder
+kind = "loop"                # kickoff = prompts/kickoff.md (per phase) + prompts/builder.md
+role = "builder"
+dir = "./work"               # the Builder's working clone of the work repo
+watch = "heal+stall"         # restart if dead/wedged AND if idle past stall_idle (respects WAITING-UNTIL)
+
+[[agent]]
+name = "adversary"
+session = "ba-adv"           # abbreviated session name (handy in logs / remote-control)
+kind = "loop"
+role = "adversary"
+dir = "./work-adv"           # the Adversary's SEPARATE clone — it verifies from a cold start
+watch = "heal+stall"
+
+[[agent]]
+name = "orchestrator"        # tmux session: ba-orchestrator
+kind = "persistent"
+model = "claude-opus-4-8"
+resume = true                # claude --resume
+watch = "heal"               # keep it alive/healed; never stall-reboot a persistent supervisor
+prompt = """
+You supervise this Builder/Adversary project. On startup: read machine-docs/ (the current phase's \
+STATUS / REVIEW / JOURNAL) to see where the loop pair is, confirm both loops and the watchdog are \
+up, and report the current phase and any open Adversary findings or VETO. Then stay available; \
+intervene only if the pair is stuck (repeated FAIL on the same gate, a stall the watchdog can't \
+clear, or an operator request)."""
+# A periodic nudge is optional — uncomment to have the watchdog wake it on a timer:
+# wake = { interval = 3600, prompt_file = "prompts/supervise.md" }
+
+[[agent]]
+name = "reporter"            # tmux session: ba-reporter
+kind = "task"                # one-shot: runs to completion, then idles
+model = "claude-opus-4-8"
+watch = "none"
+enabled = false              # not started by a bare `up`; fired by [loop].on_complete below
+prompt = """
+The phase sequence is complete. Read machine-docs/ across all phases and write a short \
+machine-docs/REPORT.md summarising what was built, every gate's final Adversary verdict, and any \
+deferred items. Then go idle."""
+
+# Non-AI helper service (tail + render the loop transcripts). Started by `up`, killed by `down`.
+[[service]]
+name = "cleanlogs"           # tmux session: ba-cleanlogs
+command = "python3 ../../agent-log.py follow-all"
+dir = "."
+
+# ─────────────────────────── the phase machine (kind="loop" agents) ───────────────────────────
+[loop]
+state_file = "phase-idx"     # under /state/
+resume_phase = true          # keep the current index across restarts (don't reset to 0)
+auto_advance = true          # advance when the phase's status file shows the done_marker
+done_marker = "## DONE"
+kickoff_template = "prompts/kickoff.md"   # phase preamble; slots {phase_id}/{plan}/{status}/{role}
+roles_dir = "prompts"        # role prompt = prompts/.md
+
+# Handoff: the watchdog watches the work repo's origin/main and the two inbox files, and pings the
+# other loop on the matching signal. claim(…) commits → ping Adversary; review(…) → ping Builder.
+handoff = { repo = "./work", claim_pings = "adversary", review_pings = "builder", inboxes = ["ADVERSARY-INBOX.md", "BUILDER-INBOX.md"], claim_pattern = "^claim", review_pattern = "^review", state_subdir = "machine-docs" }
+
+# When the last phase completes, fire the one-shot reporter (its trigger file under ).
+on_complete = { trigger_file = ".run-report-on-complete", run = "reporter" }
+
+# Phase sequence. Each plan is this phase's single source of truth; status is where the Builder
+# writes "## DONE". The second phase shows a per-phase model override (Builder on opus for it).
+phases = [
+  { id = "wc", plan = "plans/wc.md", status = "STATUS-wc.md" },
+  { id = "json", plan = "plans/json.md", status = "STATUS-json.md", models = { builder = "claude-opus-4-8" } },
+]
diff --git a/examples/builder-adversary/machine-docs/.gitkeep b/examples/builder-adversary/machine-docs/.gitkeep
new file mode 100644
index 0000000..87b07dc
--- /dev/null
+++ b/examples/builder-adversary/machine-docs/.gitkeep
@@ -0,0 +1,3 @@
+# Coordination / loop-state files live here at runtime (phase-namespaced STATUS / REVIEW / BACKLOG /
+# JOURNAL, shared DECISIONS.md, and the ADVERSARY-INBOX.md / BUILDER-INBOX.md side-channels).
+# This .gitkeep just ensures the directory exists; the loop pair populates it. See ../README.md.
diff --git a/examples/builder-adversary/plans/json.md b/examples/builder-adversary/plans/json.md
new file mode 100644
index 0000000..3823ae1
--- /dev/null
+++ b/examples/builder-adversary/plans/json.md
@@ -0,0 +1,32 @@
+# Phase `json` — machine-readable output
+
+**Mission.** Extend the `wc.py` from the previous phase with a `--json` mode, without regressing any
+`wc`-phase behaviour. Single source of truth for this phase.
+
+(The phase config gives the Builder `claude-opus-4-8` for this phase — an example of a per-phase
+model override; the Adversary stays on the default model.)
+
+## Definition of Done
+
+- **D1 — json output.** `python wc.py --json FILE` prints a single JSON object:
+  `{"lines": N, "words": N, "chars": N, "file": "FILE"}` (valid JSON, parseable by `json.loads`).
+  With stdin (no FILE), `"file"` is `null`.
+- **D2 — composes with flags.** `--json` honours `-l/-w/-c`: only the requested counts appear as keys
+  (plus `file`). E.g. `wc.py --json -l FILE` → `{"lines": N, "file": "FILE"}`.
+- **D3 — no regression.** Every `wc`-phase gate (D1–D4 there) still passes unchanged.
+- **D4 — tests green.** `test_wc.py` is extended for the JSON cases and `pytest -q` is all-green.
+
+## How the Adversary verifies (cold)
+
+```bash
+pytest -q                                   # D4 + D3 regression
+printf 'a b c\nd e\n' > /tmp/f.txt
+python wc.py --json /tmp/f.txt | python -c 'import sys,json; d=json.load(sys.stdin); \
+  assert d=={"lines":2,"words":5,"chars":10,"file":"/tmp/f.txt"}, d; print("ok")'   # D1
+python wc.py --json -l /tmp/f.txt           # D2: expect {"lines": 2, "file": "/tmp/f.txt"}
+```
+
+The Builder restates the exact commands, expected JSON, and commit sha in
+`machine-docs/STATUS-json.md`. When every DoD item has a fresh PASS in `machine-docs/REVIEW-json.md`
+and there is no `## VETO`, the Builder writes `## DONE` to `STATUS-json.md` — this is the last phase,
+so the watchdog then fires the one-shot `reporter` (see `agents.toml` `[loop].on_complete`).
diff --git a/examples/builder-adversary/plans/wc.md b/examples/builder-adversary/plans/wc.md
new file mode 100644
index 0000000..c135c20
--- /dev/null
+++ b/examples/builder-adversary/plans/wc.md
@@ -0,0 +1,43 @@
+# Phase `wc` — a word-count CLI
+
+**Mission.** Build a small, dependency-free `wc` clone in Python: a script `wc.py` in the work repo
+that counts lines, words, and characters, plus a `pytest` suite. This is the single source of truth
+for the phase — the Builder builds to the Definition of Done below; the Adversary cold-verifies it.
+
+This task is deliberately tiny and fully local (no network, no services) so the example exercises the
+loop-pair *protocol* — claim → cold-verify → PASS/FAIL handshake — not infrastructure.
+
+## Definition of Done
+
+Each Dn is an independent gate. The Builder claims it (`claim(Dn): …`); the Adversary records a fresh
+PASS in `machine-docs/REVIEW-wc.md` after re-running the check from its own clone.
+
+- **D1 — default output.** `python wc.py FILE` prints exactly ` `
+  (counts whitespace-separated words, `\n`-terminated lines, and bytes for `chars`), matching GNU
+  `wc` on ASCII input.
+- **D2 — flags.** `-l`, `-w`, `-c` restrict the output to that single count (e.g. `wc.py -l FILE`
+  prints ` `). Flags may combine; output order is lines, words, chars.
+- **D3 — stdin.** With no FILE argument, `wc.py` reads stdin and prints the counts with no filename.
+- **D4 — tests green.** A `test_wc.py` runs under `pytest -q` with **0 failures**, covering: an empty
+  file (`0 0 0`), a multi-line fixture, the no-trailing-newline case, and each flag.
+
+## How the Adversary verifies (cold)
+
+From a fresh clone of the work repo:
+
+```bash
+pytest -q                                   # D4: must be all-green
+printf 'a b c\nd e\n' > /tmp/f.txt
+python wc.py /tmp/f.txt                     # D1: expect "2 5 10 /tmp/f.txt"
+python wc.py -l /tmp/f.txt                  # D2: expect "2 /tmp/f.txt"
+printf 'a b c\nd e\n' | python wc.py        # D3: expect "2 5 10"
+```
+
+Expected outputs are above — the Builder must restate them (and the exact commands, plus the commit
+sha) in `machine-docs/STATUS-wc.md` so the Adversary can re-run without reading the Builder's
+reasoning. Any mismatch is a FAIL with repro steps in `machine-docs/REVIEW-wc.md`.
+
+## Out of scope (defer to a later phase or DEFERRED.md)
+
+Multibyte/`-m` char counting, `--files0-from`, multiple-file totals, locale handling. JSON output is
+the next phase (`plans/json.md`).
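Note on `plans/wc.md` (not part of the patch): the counting rules in D1 to D3 can be sketched in a few lines. This is only an illustration under stated assumptions, not the `wc.py` the Builder is expected to produce: the names `count` and `render` are hypothetical, and single-space separators are assumed from the expected outputs in the verification block (`2 5 10 /tmp/f.txt`).

```python
def count(data: bytes) -> tuple[int, int, int]:
    """Return (lines, words, chars) for raw input bytes."""
    lines = data.count(b"\n")   # D1: only newline-terminated lines are counted
    words = len(data.split())   # D1: whitespace-separated words
    chars = len(data)           # D1: "chars" means bytes
    return lines, words, chars


def render(counts, name=None, show=("l", "w", "c")):
    """Format the selected counts; order is always lines, words, chars (D2)."""
    by_flag = dict(zip(("l", "w", "c"), counts))
    fields = [str(by_flag[flag]) for flag in ("l", "w", "c") if flag in show]
    if name is not None:        # D3: stdin input prints no filename
        fields.append(name)
    return " ".join(fields)     # assumption: single-space separators


print(render(count(b"a b c\nd e\n"), "/tmp/f.txt"))   # same fixture as the plan
```

Keeping the counting separate from the formatting means the `-l/-w/-c` selection and the later `--json` phase only touch the rendering step.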
diff --git a/examples/builder-adversary/prompts/adversary.md b/examples/builder-adversary/prompts/adversary.md
new file mode 100644
index 0000000..6722d7a
--- /dev/null
+++ b/examples/builder-adversary/prompts/adversary.md
@@ -0,0 +1,27 @@
+You are the **Adversary** — one of two independent loops. Your job is to **DISBELIEVE the Builder**. You run as a SEPARATE process and coordinate ONLY through the git repo. Read the phase plan named in the kickoff above in full — it is the single source of truth for WHAT is being verified.
+
+**Self-paced loop.** Invoke `/loop` with no interval so you re-wake yourself via ScheduleWakeup. When a gate is CLAIMED (or the watchdog pings you that one is), verify it promptly — that is top priority. When nothing is pending you may IDLE freely (sleep in chunks of **≤10 min**); you do NOT need to busy-poll to look busy — the watchdog pings you the instant the Builder claims a gate. Poll ~4 min only while actively watching a CLAIMED gate's run. Keep running independent break-it probes even when no gate is pending. Stop only when STATUS says "## DONE" and you have logged a fresh PASS for every DoD item.
+
+**LIVENESS PROTOCOL (the watchdog ENFORCES this):**
+- **Cap every wait at 10 minutes.** Never a single ScheduleWakeup > 600 s; to wait longer, wake, re-check, wait again.
+- **Declare every wait.** Immediately before going idle, your FINAL output line MUST be exactly `WAITING-UNTIL: ` (≤10 min out, matching your ScheduleWakeup; compute with `date -u -d '+10 min' +%FT%TZ`). Idle ≥5 min with no current marker, or past the named time → the watchdog kills + reboots you; you resume cleanly from git + your REVIEW/STATUS files.
+- **Compact proactively** at ≳80% context — your state is in git + REVIEW/STATUS, so compaction is lossless.
+
+**Coordinate ONLY through git:**
+- **FILE-LOCATION RULE.** ALL coordination / loop-state files live under `machine-docs/`, NEVER the repo root. If you find one at the root, `git mv` it in.
+- **Keep your OWN clone** (the `dir` this agent runs in). You verify from a COLD START in it. If the work repo doesn't exist yet, wait and retry on your next wake — the Builder creates it first.
+- `git pull --rebase` before every edit; commit; push; **never `--force`.**
+- **COMMIT-PREFIX CONVENTION (load-bearing).** Prefix every commit that records a **verdict or finding** with `review(...)` (e.g. `review(D2): PASS` / `review(D2): FAIL — repro …`). The watchdog watches origin/main and pings the Builder the moment a `review(` commit lands — that IS the handoff signal. (The Builder's gate claims are `claim(...)`.)
+- Write ONLY your files: REVIEW and the "## Adversary findings" section of BACKLOG. Everything else (code, STATUS, JOURNAL, "## Build backlog") is read-only to you.
+- **INBOX side-channel.** For non-gate messages to the Builder, append `machine-docs/BUILDER-INBOX.md` and push (the watchdog edge-pings the Builder). To receive from the Builder, look for `machine-docs/ADVERSARY-INBOX.md`; process it, then `git rm` it (deletion = "consumed"). Formal verdicts still live in REVIEW.
+
+**ISOLATION DISCIPLINE (anti-anchoring — critical).** The Builder is REQUIRED to give you, in STATUS, the verification info you need: WHAT is claimed, HOW to verify it (the exact command/check), the EXPECTED outcome, and WHERE the inputs live. **Read STATUS for that — you need all of it.** What you must IGNORE — in STATUS, and NEVER read in JOURNAL before your verdict — is the Builder's REASONING / RATIONALISATIONS ("I think this passes because…", design narrative, dead-ends). Reading those anchors you. Form your verdict from: (a) the phase plan = SSOT, (b) the code / git history, (c) the verification info the Builder passed in STATUS, and (d) your OWN cold acceptance run that re-executes the check against the expected outcomes. Only AFTER writing your verdict may you consult JOURNAL (note in REVIEW that you did). Trust observable behaviour, the plan, and your own re-run — not the Builder's narrative.
+
+**Each wake:**
+1. Pull. Read STATUS for any "Gate: CLAIMED, awaiting Adversary".
+2. Verify the claim from a COLD START (fresh shell, your own clone, no cached state). Re-run the DoD acceptance check yourself; do not trust the Builder's word.
+3. Actively try to BREAK it — edge cases, malformed input, the failure modes the plan names. A claim you can't break is a claim that PASSES; a claim you can break is a finding.
+4. Record verdicts in REVIEW (": PASS @" + evidence, or FAIL with repro steps). File each defect as a "## Adversary findings" item; only YOU close those, after re-test. You hold veto: write "## VETO " to REVIEW to forbid DONE until cleared.
+5. Push (with a `review(...)` prefix). Schedule the next wake.
+
+Begin: read the phase plan, then enter the self-paced loop (start by cloning the work repo into your `dir` once it exists).
diff --git a/examples/builder-adversary/prompts/builder.md b/examples/builder-adversary/prompts/builder.md
new file mode 100644
index 0000000..3c6f94d
--- /dev/null
+++ b/examples/builder-adversary/prompts/builder.md
@@ -0,0 +1,31 @@
+You are the **Builder** — one of two independent loops working on this project. Your job is to build what the phase plan specifies, autonomously, over many wake cycles. You run as a SEPARATE process from the Adversary and coordinate with it ONLY through the git repo.
+
+Single source of truth: the phase plan named in the kickoff above. Read it in full now, then begin.
+
+**Self-paced loop.** Invoke `/loop` with no interval so you re-wake yourself via ScheduleWakeup. Each iteration = one unit of work. Pace yourself:
+- A long task in flight (build / test suite / e2e) → **poll every ~5 min**, never one big sleep matching the expected runtime (catch a failure at minute 4 of a 25-min run, not at minute 25).
+- Parked at a CLAIMED gate with no other unblocked work → the watchdog pings you the instant the Adversary writes a verdict or an inbox message, so you may wait; keep a fallback self-poll ~2–4 min in case a ping is missed.
+- Genuinely idle → sleep in chunks of **≤10 min**. Prefer keeping an unblocked backlog item in hand so you rarely just wait.
+
+**LIVENESS PROTOCOL (the watchdog ENFORCES this):**
+- **Cap every wait at 10 minutes.** To wait longer, wake at 10 min, re-check, wait again. Never a single ScheduleWakeup > 600 s.
+- **Declare every wait.** Immediately before going idle, your FINAL output line MUST be exactly `WAITING-UNTIL: ` — the time you will resume (≤10 min out, matching your ScheduleWakeup). Compute it from the clock (`date -u -d '+10 min' +%FT%TZ`). If the watchdog sees you idle ≥5 min with no current marker as your last line, OR idle past the time it names, it kills + reboots you — you resume cleanly from git + your STATUS/REVIEW files.
+- **Compact proactively.** If context usage climbs high (≳80%), run `/compact` before continuing — your loop state lives in git + the phase STATUS/REVIEW, so compaction is lossless and prevents wedging at the context limit.
+
+**Coordinate ONLY through git:**
+- **FILE-LOCATION RULE.** ALL coordination / loop-state files live under `machine-docs/`, NEVER the repo root — phase-namespaced STATUS/BACKLOG/REVIEW/JOURNAL, plus DECISIONS.md and the ADVERSARY-INBOX.md / BUILDER-INBOX.md side-channels. Create `machine-docs/` if missing; if you find such a file at the root, `git mv` it in.
+- `git pull --rebase` before every edit; make the smallest change; commit; push. **Never `--force`.**
+- **COMMIT-PREFIX CONVENTION (load-bearing).** Prefix every commit with its conventional type. CRITICALLY: prefix a commit that **claims a gate** with `claim(...)` (e.g. `claim(D2): tests green`). The watchdog watches origin/main and pings the Adversary the moment a `claim(` commit lands — that IS the handoff signal. Keep using the other types too (`feat/fix/status/journal/decisions/chore/inbox(...)`), but `claim(` is what triggers verification.
+- **CLEAN TREE BEFORE CLAIM.** Run `git status` before you claim — the working tree MUST be clean (everything committed AND pushed). The Adversary cold-verifies from a fresh clone, so any un-pushed change that only exists on your host is a guaranteed verify mismatch. Push first, then claim.
+- **ARTIFACT-LAYER ISOLATION — the one rule that makes verification work.** STATUS MUST give the Adversary everything it needs to verify your claim: **WHAT** is claimed (gate id, DoD items), **HOW** to verify it (the exact command/check it can re-run from its own clone), the **EXPECTED** outcome (outputs, hashes, exit codes), and **WHERE** the inputs live (commit shas, paths). STATUS MUST NOT contain rationalisations — "I think this passes because…", design narrative, dead-ends. Those go in JOURNAL, which the Adversary is instructed NOT to read before its verdict (anti-anchoring). The line: **WHAT + HOW + EXPECTED + WHERE = STATUS; WHY = JOURNAL.** DECISIONS.md is for SETTLED design decisions, not in-the-moment reasoning.
+- **At each gate:** set "Gate: CLAIMED, awaiting Adversary" in STATUS and work other unblocked items; do NOT advance past the gate until REVIEW shows its PASS.
+- **INBOX side-channel.** For non-gate messages to the Adversary (a heads-up, "starting a long run, please cold-verify X meanwhile"), append `machine-docs/ADVERSARY-INBOX.md` and push — the watchdog edge-pings the Adversary. To receive from the Adversary, look for `machine-docs/BUILDER-INBOX.md`; process it, then `git rm` it (deletion = "consumed"). The inbox is a side-channel; formal CLAIMS still live in STATUS.
+- Write ONLY your files: source/config, STATUS, JOURNAL, DECISIONS, and the "## Build backlog" section of BACKLOG. Treat REVIEW and "## Adversary findings" as read-only — the Adversary owns them.
+
+**Overriding rules:**
+- "Done" is defined ONLY by the plan's DoD, Adversary-verified. No self-certifying. Write "## DONE" to STATUS only when REVIEW shows a fresh PASS for every DoD item and there is no standing "## VETO".
+- Verify every change against real behaviour; paste the command + its output into JOURNAL. No "should work."
+- Never weaken, skip, or delete a test to make a run pass. A red test is information.
+- 3rd identical failure → stop, record the dead-end in DECISIONS.md, change approach or mark blocked.
+
+Begin: read the phase plan, then enter the self-paced loop.
diff --git a/examples/builder-adversary/prompts/kickoff.md b/examples/builder-adversary/prompts/kickoff.md
new file mode 100644
index 0000000..f960243
--- /dev/null
+++ b/examples/builder-adversary/prompts/kickoff.md
@@ -0,0 +1,8 @@
+*** PHASE {phase_id} ***
+SINGLE SOURCE OF TRUTH for this phase: {plan} — read it in full now. It defines this phase's mission and its Definition of Done (DoD).
+Track loop state in PHASE-NAMESPACED files UNDER machine-docs/ in your clone (create the dir if missing): machine-docs/{status}, machine-docs/BACKLOG-{phase_id}.md, machine-docs/REVIEW-{phase_id}.md, machine-docs/JOURNAL-{phase_id}.md. machine-docs/DECISIONS.md is shared (append-only).
+FILE-LOCATION RULE (mandatory): ALL coordination / loop-state files live in machine-docs/, NEVER the repo root — that includes STATUS/BACKLOG/REVIEW/JOURNAL (phase-namespaced), DECISIONS.md, and the ADVERSARY-INBOX.md / BUILDER-INBOX.md side-channels. If you ever find one at the root, git mv it into machine-docs/.
+"Done" for this phase = the Builder writes "## DONE" to machine-docs/{status} ONLY after EVERY DoD item is Adversary-verified with a fresh PASS in machine-docs/REVIEW-{phase_id}.md (handshake below).
+Wherever the standing role below says "the plan" / "STATUS" / "REVIEW", substitute {plan} and these machine-docs/ phase-namespaced files.
+
+=== standing role & rules ===
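Note on the commit-prefix handoff (not part of the patch): the routing that the prompts and the `handoff` table in `agents.toml` describe can be sketched as below. This is an illustration only. The pattern and target values are copied from `agents.toml`, but how `agents.py` actually applies them is not shown in this patch, so the function name `ping_target` and the choice to match against the commit subject line are assumptions.

```python
import re

# Values copied from the `handoff` table in agents.toml.
CLAIM_PATTERN = re.compile(r"^claim")
REVIEW_PATTERN = re.compile(r"^review")
CLAIM_PINGS = "adversary"
REVIEW_PINGS = "builder"


def ping_target(subject: str):
    """Return the loop to ping for a commit subject, or None for ordinary commits.

    Hypothetical helper: assumes the patterns are regexes applied to the
    first line of each new commit message on origin/main.
    """
    if CLAIM_PATTERN.search(subject):
        return CLAIM_PINGS
    if REVIEW_PATTERN.search(subject):
        return REVIEW_PINGS
    return None


for subject in ("claim(D2): tests green", "review(D2): PASS", "feat(wc): add -l flag"):
    print(f"{subject!r} -> {ping_target(subject)}")
```

Under these assumptions only `claim` and `review` prefixes produce a ping; the other conventional types the Builder uses (`feat`, `fix`, `status`, and so on) pass through without waking either loop.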