fix(watchdog): stop phase-machine handoff/gate-token work after SEQUENCE-COMPLETE

Gate-token tracking + handoff pings kept running on the completed phase machine, churning 0-token gate records every tick. Gate them on `not seq_done`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6

fix(wake): persistent-agent wakes survive SEQUENCE-COMPLETE
2026-06-24 15:38:02 +00:00 · 2026-06-24 15:36:02 +00:00 · 2026-06-24 02:40:14 +00:00 · 2026-06-23 05:17:07 +00:00 · 2026-06-23 04:40:34 +00:00 · 2026-06-22 07:27:50 +00:00
11 changed files with 513 additions and 21 deletions
--- a/README.md
+++ b/README.md
@@ -16,6 +16,7 @@ agents.py            the driver + watchdog (pure Python stdlib; needs python >=
 agent-log.py         render claude JSONL transcripts into clean, greppable logs
 agents.example.toml  a self-contained 2-agent example project
 prompts/             generic role + kickoff templates (builder / adversary / kickoff)
+examples/            runnable example projects — the Builder/Adversary variant family, snakepit, …
 smoke.sh             bring the example up + tear it down in an isolated sandbox, then clean up
 tests/               the test suite — unit tests + isolated live backend smokes + a runner
 flake.nix/.lock      a Nix devShell with the runtime deps (python311, tmux, git)
@@ -49,6 +50,42 @@ python3 agents.py --config agents.toml phase show    # where the loop phase mach

 ---

+## Examples
+
+`examples/` holds runnable example projects — copy one, point `agents.py` at its `agents.toml`, and
+go. The headline set is a family of **Builder/Adversary** variants that build the *same* task but each
+differ in one dimension — useful both as templates and as a study of the pattern:
+
+- **`builder-adversary`** — the canonical loop pair: a Builder that builds and an Adversary that
+  cold-verifies every claim, coordinating only through git (`claim(`/`review(` commits + the watchdog
+  handoff). **Start here.**
+- **`builder-adversary-min`** — the same pattern with the prompts compressed to a minimal token count.
+- **`builder-adversary-stateless`** — `builder-adversary` + **context hygiene** (compact at each
+  checkpoint, read diffs not trees, lean loads) to minimise carried/reloaded context.
+- **`builder-adversary-lean`** — context hygiene + **per-gate** review (one claim/verdict per gate).
+- **`builder-adversary-deferred`** — the Adversary verifies **once**, after the whole build, in a
+  final comprehensive `review` phase (vs per-phase / per-gate).
+- **`builder-solo`** — a single Builder that self-certifies, with **no Adversary** (the control).
+- **`snakepit`** — a different topology entirely: a pool of identical worker "snakes" pulling tasks
+  from a shared filesystem queue, plus cleanup specialists. (`examples/IDEAS.md` sketches more.)
+
+Each example has its own `README.md`. Run one by hand:
+
+```bash
+cd examples/builder-adversary
+python3 ../../agents.py status --config agents.toml      # read-only
+python3 ../../agents.py up     --config agents.toml       # needs `claude` on PATH
+```
+
+**Benchmark.** The separate
+[`agent-orchestrator-benchmark`](https://git.autonomic.zone/recipe-maintainers/agent-orchestrator-benchmark)
+repo runs these Builder/Adversary variants head-to-head (N=5, real `agents.py up` runs) to measure
+what drives token cost. Short version: an independent adversary costs **~4.7×** a solo builder, but
+the review *cadence* (per-gate / per-phase / deferred) is **nearly token-neutral**, and **context
+hygiene** is the one clean win, cutting token cost by **~22%**. See that repo's `FINDINGS.md`.
+
+---
+
 ## The config: `agents.toml`

 Five section types: `[watchdog]`, `[backend.<name>]`, `[defaults]`, `[[agent]]` / `[[service]]`,
@@ -230,6 +267,9 @@ phases = [
  subject matches `claim_pattern` / `review_pattern`, and watches the two `inboxes` files. When a
  claim lands it pings the `claim_pings` agent; a review pings `review_pings`; an inbox change
  pings the relevant side. This is how the Builder and Adversary coordinate purely through git.
+  `claim_pings` / `review_pings` may be a single agent name **or a list** — e.g.
+  `claim_pings = ["correctness-adversary", "readability-adversary"]` pings every reviewer on a claim
+  (each in its own session), for multi-reviewer setups.

 ---

--- a/agents.py
+++ b/agents.py
@@ -466,6 +466,30 @@ def limit_tick(cfg, agent, pane):
 # ── stall detection ──────────────────────────────────────────────────────────────

 _idle_since: dict[str, float] = {}
+_done_nudged: dict[str, bool] = {}   # per-session: sent the one-time "write the done marker" nudge this phase
+
+def _done_nudge_msg(cfg, ph):
+    """The DONE-nudge: prompts a stalled loop agent to finalize a built-but-unmarked phase."""
+    dm = cfg["loop"].get("done_marker", "## DONE")
+    pid = ph.get("id", "")
+    status = ph.get("status", f"STATUS-{pid}.md")
+    sub = _state_subdir(cfg)
+    return (f"watchdog nudge: you've stalled in phase '{pid}', which is NOT yet marked '{dm}'. Resume now "
+            f"— pull any pending review/inbox and continue. If (and ONLY if) every DoD item has a fresh "
+            f"PASS from BOTH adversaries with no standing veto, write '{dm}' to {sub}/{status} and push, "
+            f"so the phase settles and auto-advances. Do not stay idle.")
+
+def _pane_last_active(session):
+    """Unix timestamp of the tmux window's last activity (last output change), or None.
+    Seeds idle-duration from the agent's REAL last activity rather than `now`, so stalls are
+    detected regardless of when the watchdog process started — restarting the watchdog no longer
+    resets every agent's stall clock."""
+    r = subprocess.run(f"tmux display-message -p -t {session!r} '#{{window_activity}}'",
+                       shell=True, capture_output=True, text=True)
+    try:
+        return float(r.stdout.strip())
+    except (ValueError, AttributeError):
+        return None

 def _last_nonempty_line(text):
    for line in reversed(text.splitlines()):
@@ -502,7 +526,9 @@ def stall_check_one(cfg, agent):
    if pane_active(cfg, agent, pane):
        _idle_since[session] = 0.0
        return
-    since = _idle_since.get(session) or now
+    # Seed from the pane's real last-activity (not `now`), so a watchdog that just (re)started still
+    # sees an already-idle pane as idle-for-its-true-duration instead of resetting the clock.
+    since = _idle_since.get(session) or _pane_last_active(session) or now
    _idle_since[session] = since
    idle = now - since
    grace = int(cfg["watchdog"].get("stall_grace", 180))
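The seeding rule in `stall_check_one` relies on `or` chaining: `0.0` (the "was active" sentinel) and `None` (tmux lookup failed) are both falsy, so the clock falls back from the cached idle start, to the pane's real last activity, to `now`. A minimal pure-function sketch, where `idle_seconds` is a hypothetical name and `last_active` stands in for what `_pane_last_active` reads from tmux's `#{window_activity}`:

```python
def idle_seconds(idle_since, session, last_active, now):
    """Return how long `session` has been idle, seeding its clock on first sight.

    idle_since  -- dict session -> idle-start epoch; 0.0 marks "was active last tick"
    last_active -- the pane's last-activity epoch, or None if it could not be read
    """
    # Unseeded (missing) or just-active (0.0) sessions fall through to the pane's
    # real last activity, and only then to `now`.
    since = idle_since.get(session) or last_active or now
    idle_since[session] = since
    return now - since

clock = {}
print(idle_seconds(clock, "badef-builder", 400.0, 1000.0))   # 600.0: true idle, not 0
```

Before this change the fallback went straight to `now`, so a freshly restarted watchdog saw every pane as idle for 0 s.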
@@ -516,6 +542,17 @@ def stall_check_one(cfg, agent):
        if idle < stall_idle:
            return
        reason = f"idle {int(idle)}s with no WAITING-UNTIL marker"
+    # Ceremony-lag guard: a loop agent idling in a phase that's built but NOT marked done won't let the
+    # phase advance (the recurring "all gates PASS but no ## DONE written" stall). Nudge it ONCE per phase
+    # to finalize (write the done marker if the DoD is met) before falling back to the blunt kill+reboot.
+    if (cfg["loop"].get("done_nudge", True) and agent.get("kind") == "loop" and phases(cfg)
+            and not phase_done(cfg, cur_phase(cfg).get("status", "")) and not _done_nudged.get(session)):
+        log(f"stall: {agent['name']} ({session}) {reason} — DONE-nudge (phase built but not marked done)")
+        ping_session(session, _done_nudge_msg(cfg, cur_phase(cfg)),
+                     submit_key=backend_of(cfg, agent).get("submit_key", "Enter"))
+        _done_nudged[session] = True
+        _idle_since[session] = now      # fresh idle window to act on the nudge before reboot escalates
+        return
    log(f"stall: {agent['name']} ({session}) {reason} — kill + reboot")
    start_agent(cfg, agent, force=True)
    _idle_since[session] = 0.0
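The ceremony-lag guard escalates in two steps: the first stall in an unmarked phase gets a DONE-nudge, and any later stall in that phase falls through to kill + reboot; `phase_advance_check` clears `_done_nudged` so each new phase gets a fresh budget. A hedged sketch of that decision as a pure function (`stall_action` is hypothetical, and it omits the idle-clock reset the real guard also performs):

```python
def stall_action(nudged, session, is_loop, phase_marked_done, nudge_enabled=True):
    """Decide what a detected stall triggers (hypothetical mirror of the DONE-nudge guard)."""
    if nudge_enabled and is_loop and not phase_marked_done and not nudged.get(session):
        nudged[session] = True
        return "nudge"     # first stall in an unmarked phase: ask the agent to finalize
    return "reboot"        # already nudged this phase, or the guard does not apply

nudged = {}
print(stall_action(nudged, "badef-builder", True, False))   # nudge
print(stall_action(nudged, "badef-builder", True, False))   # reboot
nudged.clear()                                              # what a phase advance does
print(stall_action(nudged, "badef-builder", True, False))   # nudge again
```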
@@ -564,6 +601,15 @@ def wake_agent(cfg, agent):
    if not wake:
        return True
    session = agent["session"]
+    # A one-shot `task` is "woken" by RE-RUNNING it fresh — it has no persistent REPL to re-prompt — so
+    # scheduled work (e.g. a coverage audit) recurs autonomously on its interval, no operator needed.
+    # Skip only while its previous run is still going; otherwise kill + restart for a clean re-run.
+    if agent.get("kind") == "task":
+        if session_alive(session) and pane_active(cfg, agent, capture_pane(session, 25)):
+            return False
+        log(f"wake: re-running task {agent['name']} ({session})")
+        start_agent(cfg, agent, force=True)
+        return True
    if not session_alive(session):
        return False
    backend = backend_of(cfg, agent)
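For a one-shot `task`, `wake_agent` returns `False` while the previous run is still alive and active, and the caller only resets `wake_elapsed` on `True`, so a busy task is retried on every following tick until it is free. The sketch below simulates that counter under a simplifying assumption: one unit of elapsed time per tick, whereas the real loop adds the tick interval in seconds. Both function names are hypothetical.

```python
def task_wake(alive, busy):
    """True if a one-shot `task` should be re-run now (hypothetical mirror of wake_agent)."""
    # Skip only while the previous run is still alive AND visibly working.
    return not (alive and busy)

def rerun_ticks(ticks, interval, busy_at):
    """Simulate the wake counter, one unit per tick; return the ticks where the task re-runs."""
    elapsed, reruns = 0, []
    for t in range(ticks):
        elapsed += 1
        if elapsed >= interval and task_wake(True, t in busy_at):
            reruns.append(t)
            elapsed = 0          # only a successful wake resets the counter
    return reruns

print(rerun_ticks(10, 3, set()))   # [2, 5, 8]
print(rerun_ticks(10, 3, {2}))     # [3, 6, 9]: the busy tick delays the cycle by one
```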
@@ -599,15 +645,22 @@ def _show_pushed(cfg, repo, path):
            return r.stdout
    return ""

+def _ping_agents(cfg, value, default, msg):
+    """Ping one or more agents. `value` is an agent name, a LIST of names, or falsy (→ default).
+    Each target is pinged in its own session with its own backend's submit key — so a handoff can
+    notify multiple reviewers (e.g. claim_pings = ["correctness-adversary", "readability-adversary"])."""
+    names = value if isinstance(value, list) else [value or default]
+    for name in names:
+        agent = cfg["agents"].get(name)
+        session = agent["session"] if agent and agent.get("session") else (cfg["session_prefix"] + str(name))
+        submit = backend_of(cfg, agent).get("submit_key", "Enter") if agent else "Enter"
+        ping_session(session, msg, submit_key=submit)
+
 def handoff_check(cfg):
    h = cfg["loop"].get("handoff")
    if not h:
        return
    repo = handoff_repo(cfg)
-    sub = lambda name: cfg["agents"].get(name, {}).get("session", cfg["session_prefix"] + name)
-    builder_name = h.get("review_pings", "builder")
-    submit = (backend_of(cfg, cfg["agents"][builder_name]).get("submit_key", "Enter")
-              if builder_name in cfg["agents"] else "Enter")
    claim_pat = h.get("claim_pattern", "^claim")
    review_pat = h.get("review_pattern", "^review")
    _git(repo, "fetch -q origin")
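The removed lines computed one `submit` key from the `review_pings` agent and reused it for every ping; `_ping_agents` instead resolves the session and submit key per target. A minimal sketch of that per-target lookup, assuming a hypothetical `backend` key on each agent dict and a plain `backends` mapping in place of `backend_of`:

```python
def resolve_target(agents, backends, prefix, name):
    """(session, submit_key) for one ping target (hypothetical mirror of _ping_agents)."""
    agent = agents.get(name)
    # An explicit `session = "..."` wins; otherwise derive the tmux session from the prefix.
    session = agent["session"] if agent and agent.get("session") else prefix + str(name)
    # Each target uses its OWN backend's submit key; unknown agents fall back to "Enter".
    backend = backends.get(agent.get("backend"), {}) if agent else {}
    return session, backend.get("submit_key", "Enter")

agents = {"builder": {"backend": "claude"},
          "adversary": {"session": "badef-adv", "backend": "alt"}}
backends = {"claude": {"submit_key": "Enter"}, "alt": {"submit_key": "C-m"}}
for name in ["builder", "adversary", "auditor"]:
    print(name, resolve_target(agents, backends, "badef-", name))
```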
@@ -618,15 +671,15 @@ def handoff_check(cfg):
        elif head != _hand["sha"]:
            subjects = _git(repo, f"log --format=%s {_hand['sha']}..origin/main").stdout
            if re.search(claim_pat, subjects, re.M | re.I):
-                log("handoff: claim commit → pinging reviewer")
-                ping_session(sub(h.get("claim_pings", "adversary")),
+                log("handoff: claim commit → pinging reviewer(s)")
+                _ping_agents(cfg, h.get("claim_pings", "adversary"), "adversary",
                    "watchdog ping: the other loop pushed a gate CLAIM commit. "
-                    "Pull and verify the claimed gate now.", submit_key=submit)
+                    "Pull and verify the claimed gate now.")
            if re.search(review_pat, subjects, re.M | re.I):
                log("handoff: review commit → pinging builder")
-                ping_session(sub(h.get("review_pings", "builder")),
+                _ping_agents(cfg, h.get("review_pings", "builder"), "builder",
                    "watchdog ping: the other loop pushed a verdict/finding commit. "
-                    "Pull the review file and act.", submit_key=submit)
+                    "Pull the review file and act.")
            _hand["sha"] = head
    inboxes = h.get("inboxes", [])
    md5 = lambda s: hashlib.md5(s.encode()).hexdigest()
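`handoff_check` scans the new subjects (`git log --format=%s <old>..origin/main`, one per line) with `re.M | re.I`, so `^claim` matches at the start of any subject regardless of case, and a single batch can trigger both pings. A small sketch of that classification; `handoff_events` is a hypothetical name:

```python
import re

def handoff_events(subjects, claim_pat="^claim", review_pat="^review"):
    """Which pings a batch of newline-separated commit subjects triggers (hypothetical helper)."""
    events = []
    # re.M makes ^ match at the start of EVERY subject line; re.I ignores case.
    if re.search(claim_pat, subjects, re.M | re.I):
        events.append("claim")
    if re.search(review_pat, subjects, re.M | re.I):
        events.append("review")
    return events

print(handoff_events("review(D2): PASS\nfix: typo\nClaim(D3): stdin support\n"))   # ['claim', 'review']
```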
@@ -642,9 +695,9 @@ def handoff_check(cfg):
            hh = md5(content)
            if hh != _hand[key]:
                log(f"handoff: {fname} changed → pinging {target}")
-                ping_session(sub(target),
+                _ping_agents(cfg, target, target,
                    f"watchdog ping: the other loop pushed {sub_dir}/{fname} — pull, read it, "
-                    f"act, then delete the file (commit + push) to mark it consumed.", submit_key=submit)
+                    f"act, then delete the file (commit + push) to mark it consumed.")
                _hand[key] = hh
        else:
            _hand[key] = ""
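Inbox files are tracked by content digest: a changed MD5 triggers a ping, and once the file is consumed (deleted) the stored digest is reset to `""`, so a later re-post with identical text still pings. A hedged sketch of one inbox's per-tick logic, where `content` is `None` when the file is absent and any first-tick seeding the real `_hand` state does is ignored; `inbox_tick` is hypothetical:

```python
import hashlib

def inbox_tick(seen, key, content):
    """True if this tick should ping for inbox `key` (hypothetical helper)."""
    if content:
        digest = hashlib.md5(content.encode()).hexdigest()
        if digest != seen.get(key, ""):
            seen[key] = digest
            return True
        return False
    seen[key] = ""          # file consumed: forget the digest so a re-post pings again
    return False
```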
@@ -757,6 +810,93 @@ def token_phase_flush(cfg, next_phase_id):
    else:
        sf.unlink(missing_ok=True)

+# ── token logging granularity: per phase, or also per GATE (log_tokens) ────────────
+# Phases are tracked per phase (token_phase_begin/flush). With token_granularity="gate" (the default)
+# tokens are ALSO attributed to each gate. A "gate" is a claimed unit — any `claim(<label>)` commit on
+# the work repo's origin/main (e.g. claim(D1-D5), claim(feat:multi-file)); a leading "feat:" is
+# stripped for readability. A change in the most-recently-claimed label is a boundary; on each
+# boundary the previous gate's per-agent token delta + duration is appended to token-log.jsonl tagged
+# phase_id="<phase>:<label>", so `agents.py tokens` lists it as its own row. The per-phase rollup
+# record is written either way; "phase" granularity logs only that.
+_gate_claim_re = re.compile(r"^claim\(\s*([^)]+?)\s*\)", re.I)
+
+def token_granularity(cfg):
+    """'gate' (default; per claimed gate, plus the per-phase rollup) or 'phase' (per phase only)."""
+    g = (cfg.get("watchdog", {}).get("token_granularity")
+         or cfg.get("loop", {}).get("token_granularity") or "gate")
+    return g if g in ("gate", "phase") else "gate"
+
+def _token_gate_state_path(cfg):
+    return Path(cfg["state_dir"]) / "token-gate.json"
+
+def _latest_claimed_gate(cfg):
+    """Label of the most-recent `claim(<label>)` subject on the work repo's origin/main, or None.
+    A leading 'feat:' is stripped so feature gates read as their bare name."""
+    r = _git(handoff_repo(cfg), "log -1 --format=%s --grep 'claim(' origin/main")
+    m = _gate_claim_re.match((r.stdout or "").strip())
+    if not m:
+        return None
+    label = m.group(1).strip()
+    return label[5:].strip() if label.lower().startswith("feat:") else label
+
+def _write_token_delta(cfg, phase_id, st):
+    """Append one per-agent token delta record (vs the baseline in st) to token-log.jsonl."""
+    cur = _token_cumulative(cfg)
+    base = st.get("baseline", {})
+    started = st.get("started")
+    try:
+        dur = round((datetime.now() - datetime.fromisoformat(started)).total_seconds(), 1)
+    except Exception:
+        dur = None
+    per_agent = {n: _tok_delta(cur.get(n, {}), base.get(n, {})) for n in cur}
+    total = {k: sum(per_agent[n][k] for n in per_agent) for k in _TOKEN_KEYS}
+    rec = {"phase_id": phase_id, "started": started,
+           "ended": datetime.now().isoformat(timespec="seconds"), "duration_s": dur,
+           "agents": per_agent, "total": total}
+    with _token_log_path(cfg).open("a") as fh:
+        fh.write(json.dumps(rec) + "\n")
+    return rec, per_agent, total
+
+def gate_token_flush(cfg):
+    """Close out the currently-tracked gate (if any): write its token delta, then drop the state."""
+    sf = _token_gate_state_path(cfg)
+    try:
+        st = json.loads(sf.read_text())
+    except Exception:
+        return
+    gate = st.get("gate")
+    if not gate:
+        return
+    phase_id = f"{st['phase']}:{gate}" if st.get("phase") else gate
+    rec, per_agent, total = _write_token_delta(cfg, phase_id, st)
+    parts = ", ".join(f"{n}={per_agent[n]['total']:,}" for n in per_agent)
+    log(f"[log_tokens] gate {phase_id}: {total['total']:,} tok in {rec['duration_s']}s ({parts})")
+    try:
+        sf.unlink()
+    except Exception:
+        pass
+
+def gate_token_check(cfg):
+    """When token_granularity=='gate', detect gate boundaries and flush per-gate token deltas."""
+    if not log_tokens_enabled(cfg) or token_granularity(cfg) != "gate":
+        return
+    current = _latest_claimed_gate(cfg)
+    if not current:
+        return
+    sf = _token_gate_state_path(cfg)
+    try:
+        tracked = json.loads(sf.read_text()).get("gate")
+    except Exception:
+        tracked = None
+    if tracked == current:
+        return
+    if tracked:
+        gate_token_flush(cfg)               # close out the previous gate before starting the next
+    sf.write_text(json.dumps({"gate": current, "phase": cur_phase(cfg).get("id"),
+                              "started": datetime.now().isoformat(timespec="seconds"),
+                              "baseline": _token_cumulative(cfg)}))
+    log(f"[log_tokens] tracking gate: {current}")
+
 def phase_advance_check(cfg):
    """On heavy tick: if the current phase is DONE, advance (or finish the sequence).

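The gate boundary logic reduces to "extract a label from each `claim(...)` subject, and flush whenever the label changes". The real `gate_token_check` samples only the latest claim per tick (`git log -1 --grep 'claim('`), so two claims landing between ticks collapse into one boundary; the sketch below walks a list of subjects instead. `gate_label` and `gate_boundaries` are hypothetical names.

```python
import re

# Same pattern as _gate_claim_re in the diff above.
GATE_CLAIM_RE = re.compile(r"^claim\(\s*([^)]+?)\s*\)", re.I)

def gate_label(subject):
    """Label of a `claim(<label>)` subject with a leading 'feat:' stripped, or None."""
    m = GATE_CLAIM_RE.match(subject.strip())
    if not m:
        return None
    label = m.group(1).strip()
    return label[5:].strip() if label.lower().startswith("feat:") else label

def gate_boundaries(subjects, phase):
    """Yield (closed, opened) phase-tagged gate ids each time the claimed label changes.

    `closed` is None for the very first gate (nothing to flush yet)."""
    tracked = None
    for s in subjects:
        current = gate_label(s)
        if not current or current == tracked:
            continue               # not a claim, or a re-claim of the gate already tracked
        yield (f"{phase}:{tracked}" if tracked else None), f"{phase}:{current}"
        tracked = current

subjects = ["claim(D1): a", "review(D1): PASS", "claim(D1): retry", "claim(feat:json)"]
print(list(gate_boundaries(subjects, "wc")))   # [(None, 'wc:D1'), ('wc:D1', 'wc:json')]
```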
@@ -772,6 +912,8 @@ def phase_advance_check(cfg):
    ph = ps[idx]
    if not phase_done(cfg, ph["status"]):
        return False
+    if log_tokens_enabled(cfg) and token_granularity(cfg) == "gate":
+        gate_token_flush(cfg)      # close out the last in-flight gate before leaving the phase
    nxt = idx + 1
    if nxt < len(ps):
        log(f"PHASE {ph['id']} DONE — auto-transitioning to {ps[nxt]['id']}")
@@ -781,6 +923,7 @@ def phase_advance_check(cfg):
        if marker.exists():
            marker.unlink()   # resuming into a (freshly-appended) phase — clear stale completion
        handoff_reset()
+        _done_nudged.clear()   # fresh DONE-nudge budget for the new phase
        start_loops(cfg)
        return True
    # last phase is DONE → sequence complete
@@ -819,14 +962,15 @@ def watchdog_loop(cfg_path):
    wake_elapsed = {a["name"]: 0 for a in cfg["agents"].values() if a.get("wake")}
    if log_tokens_enabled(cfg):
        token_phase_begin(cfg, cur_phase(cfg).get("id"))
-        log("[log_tokens] enabled — per-phase token + time logging to token-log.jsonl")
+        log(f"[log_tokens] enabled (granularity={token_granularity(cfg)}) — token-log.jsonl")
    while True:
        cfg = load_config(cfg_path)   # re-read every tick: config is authoritative, no env drift
        has_loops = bool(loop_agents(cfg))
        seq_done = (Path(cfg["log_dir"]) / "SEQUENCE-COMPLETE").exists()

-        if has_loops:
+        if has_loops and not seq_done:
            handoff_check(cfg)
+            gate_token_check(cfg)
        for a in watched(cfg):
            if a["watch"] == "heal+stall":
                stall_check_one(cfg, a)
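This hunk is the fix the first commit message describes: once `SEQUENCE-COMPLETE` exists in `log_dir`, the per-tick handoff and gate-token jobs stop, so no more 0-token gate records are written. A minimal sketch of that gate, using a temporary directory in place of the real `log_dir`; `tick_jobs` is a hypothetical name.

```python
import tempfile
from pathlib import Path

def tick_jobs(log_dir, has_loops):
    """Per-tick phase-machine jobs that should run (hypothetical mirror of watchdog_loop)."""
    seq_done = (Path(log_dir) / "SEQUENCE-COMPLETE").exists()
    jobs = []
    if has_loops and not seq_done:
        jobs += ["handoff_check", "gate_token_check"]
    return jobs

with tempfile.TemporaryDirectory() as d:
    print(tick_jobs(d, True))                  # both jobs run
    (Path(d) / "SEQUENCE-COMPLETE").touch()
    print(tick_jobs(d, True))                  # []: quiet after completion
```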
@@ -834,16 +978,24 @@ def watchdog_loop(cfg_path):
                if session_alive(a["session"]):
                    limit_tick(cfg, a, capture_pane(a["session"], 40))

-        if not seq_done:
-            for name, el in list(wake_elapsed.items()):
-                interval = int(cfg["agents"][name]["wake"].get("interval", 3600))
-                if el >= interval:
-                    if wake_agent(cfg, cfg["agents"][name]):
-                        wake_elapsed[name] = 0
+        for name, el in list(wake_elapsed.items()):
+            agent = cfg["agents"][name]
+            # After the phase sequence completes, quiet the loop-tied wakes (e.g. the on-demand
+            # auditor) — but a PERSISTENT agent (the operator-facing supervisor) keeps waking, so its
+            # hourly supervision survives SEQUENCE-COMPLETE and can drive follow-on work (a second build).
+            if seq_done and agent.get("kind") != "persistent":
+                continue
+            interval = int(agent["wake"].get("interval", 3600))
+            if el >= interval:
+                if wake_agent(cfg, agent):
+                    wake_elapsed[name] = 0
+
+        # Auto-advance is checked EVERY tick (not just the heavy tick) so a completed phase advances
+        # within signal_interval of its `## DONE` landing, instead of idling up to heavy_interval.
+        advanced = phase_advance_check(cfg) if has_loops else False

        if elapsed >= heavy:
            elapsed = 0
-            advanced = phase_advance_check(cfg) if has_loops else False
            if not advanced:
                for a in watched(cfg):
                    if seq_done and a["kind"] == "loop":
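The reworked wake loop (the second commit) no longer skips every wake after completion; it skips only agents whose `kind` is not `persistent`. A sketch of which wakes fall due on a tick, with hypothetical agent names and a hypothetical `due_wakes` helper:

```python
def due_wakes(agents, wake_elapsed, seq_done):
    """Names whose wake interval has elapsed this tick (hypothetical mirror of the wake loop)."""
    due = []
    for name, el in wake_elapsed.items():
        agent = agents[name]
        # After SEQUENCE-COMPLETE only `persistent` agents keep waking.
        if seq_done and agent.get("kind") != "persistent":
            continue
        if el >= int(agent["wake"].get("interval", 3600)):
            due.append(name)
    return due

agents = {"supervisor": {"kind": "persistent", "wake": {"interval": 3600}},
          "auditor": {"kind": "task", "wake": {"interval": 600}}}
elapsed = {"supervisor": 3600, "auditor": 900}
print(due_wakes(agents, elapsed, seq_done=False))   # ['supervisor', 'auditor']
print(due_wakes(agents, elapsed, seq_done=True))    # ['supervisor']
```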
--- a/examples/builder-adversary-deferred/README.md
+++ b/examples/builder-adversary-deferred/README.md
@@ -0,0 +1,48 @@
+# Builder/Adversary example — deferred review (verify after a long segment)
+
+The coarsest point on the **review-cadence spectrum**. Same pattern, same full original prompts as
+`../builder-adversary` — only *when* the Adversary verifies changes:
+
+| variant | the Adversary verifies… | handshakes (benchmark calculator task) |
+|---|---|--:|
+| `builder-adversary-lean` | per **gate** | ~12 claim/verify round-trips |
+| `builder-adversary` (orig) | per **phase** | ~3 |
+| **`builder-adversary-deferred`** | **once, after the whole build** | **1** |
+
+## How it works
+
+The Builder **self-certifies** the build phases (`wc`, then `json`) — builds to each phase's DoD, runs
+its own tests until green, writes `## DONE`, and advances *without* waiting for the Adversary. The
+Adversary stays out of the build. Only in the final **`review` phase** does it do **one comprehensive
+cold-verification of the entire accumulated `wc.py` build** (`plans/review.md`): re-run every DoD item
+from every phase from a fresh clone, plus cross-feature break-it probes, file all findings at once,
+re-verify after fixes, then PASS. That single pass is the only adversary gate in the run.
+
+## The trade-off
+
+- **Cheapest coordination.** One handshake instead of 3–12 — no per-gate/per-phase round-trips, the
+  Builder isn't interrupted mid-build. (The benchmark showed coordination round-trips are a real
+  token cost; deferring to one pass minimises them.)
+- **But the independent check arrives late.** Two risks the per-gate/per-phase cadences guard
+  against:
+  - **Late discovery / rework.** If the Builder built phase 2 on a wrong assumption from phase 1, an
+    early adversary would have caught it at gate 1; here it surfaces only at the end, after more work
+    was piled on the flaw — potentially a larger, costlier fix.
+  - **Self-certification drift.** The build phases are self-certified, so a bug the Builder
+    rubber-stamps survives until the final review. The comprehensive pass is the only safety net, so
+    it must be thorough.
+- **Better at cross-feature bugs.** Because it verifies the whole system at once, it's positioned to
+  catch *interactions* (e.g. `--json` × every flag) that a per-gate view, looking at one item at a
+  time, can miss.
+
+So `deferred` trades *early, incremental* assurance for *minimal coordination + one holistic pass*.
+It suits work where features are independent and cheap to fix late; it's risky where early decisions
+constrain later ones.
+
+```bash
+python3 ../../agents.py status --config agents.toml
+python3 ../../agents.py up     --config agents.toml      # needs `claude` on PATH
+```
+
+> **Prompt base:** the full original `builder-adversary` prompts + a DEFERRED REVIEW CADENCE override
+> — so comparing this to `builder-adversary`/`lean` isolates *only* the verification cadence.
--- a/examples/builder-adversary-deferred/agents.toml
+++ b/examples/builder-adversary-deferred/agents.toml
@@ -0,0 +1,79 @@
+# examples/builder-adversary-deferred — Adversary verifies ONCE, after a long segment of building.
+#
+# Same pattern + full original prompts as ../builder-adversary, but the REVIEW CADENCE is coarsest:
+#   • lean      = the Adversary verifies per gate (finest)
+#   • orig      = the Adversary verifies per phase (medium)
+#   • deferred  = the Adversary verifies ONCE, comprehensively, after the whole build (coarsest)
+# The Builder SELF-CERTIFIES the build phases (wc, json) to advance; the Adversary stays out until the
+# final `review` phase, where it cold-verifies the ENTIRE accumulated wc.py build in one pass. Cheapest
+# coordination, but the independent check arrives late (see README for the trade-off).
+#
+#   python3 ../../agents.py status --config agents.toml
+#   python3 ../../agents.py up     --config agents.toml      # needs `claude` on PATH
+
+[watchdog]
+signal_interval      = 30
+heavy_interval       = 300
+limit_probe_fallback = 300
+limit_reset_slack    = 45
+stall_grace          = 180
+
+[defaults]
+session_prefix = "badef-"        # tmux namespace: badef-builder, badef-adv, …
+log_dir        = ".ao-state"
+backend        = "claude"        # set to "demo" for a dependency-free mechanics-only run
+model          = "claude-sonnet-4-6"
+watch          = "heal"
+
+[backend.claude]
+bin             = "claude"
+flags           = "--dangerously-skip-permissions"
+remote_control  = true
+supports_resume = true
+prompt_delivery = "arg"
+process_name    = "claude"
+submit_key      = "Enter"
+stall_idle      = 300
+active_re = "esc to interrupt|Running tool|⠇|⠙|· \\d+"
+limit_re  = "spend limit|usage limit|limit reached|reached your .*limit|out of (credits|tokens)"
+fatal_re  = "redacted_thinking|blocks cannot be modified|cannot be modified"
+
+[backend.demo]
+bin             = "echo '[demo] {session} up (kickoff: {kickoff})'; exec sleep 1000000"
+prompt_delivery = "exec"
+
+[[agent]]
+name  = "builder"                # tmux session: badef-builder
+kind  = "loop"
+role  = "builder"
+dir   = "./work"
+watch = "heal+stall"
+
+[[agent]]
+name    = "adversary"
+session = "badef-adv"
+kind    = "loop"
+role    = "adversary"
+dir     = "./work-adv"
+watch   = "heal+stall"
+
+[[service]]
+name    = "cleanlogs"
+command = "python3 ../../agent-log.py follow-all"
+dir     = "."
+
+[loop]
+state_file       = "phase-idx"
+resume_phase     = true
+auto_advance     = true
+done_marker      = "## DONE"
+kickoff_template = "prompts/kickoff.md"
+roles_dir        = "prompts"
+handoff = { repo = "./work", claim_pings = "adversary", review_pings = "builder", inboxes = ["ADVERSARY-INBOX.md", "BUILDER-INBOX.md"], claim_pattern = "^claim", review_pattern = "^review", state_subdir = "machine-docs" }
+# Build phases (wc, json) are self-certified by the Builder; the final `review` phase is the single
+# comprehensive Adversary gate over the whole accumulated build.
+phases = [
+  { id = "wc",     plan = "plans/wc.md",     status = "STATUS-wc.md" },
+  { id = "json",   plan = "plans/json.md",   status = "STATUS-json.md" },
+  { id = "review", plan = "plans/review.md", status = "STATUS-review.md" },
+]
--- a/examples/builder-adversary-deferred/machine-docs/.gitkeep
+++ b/examples/builder-adversary-deferred/machine-docs/.gitkeep
--- a/examples/builder-adversary-deferred/plans/json.md
+++ b/examples/builder-adversary-deferred/plans/json.md
@@ -0,0 +1,32 @@
+# Phase `json` — machine-readable output
+
+**Mission.** Extend the `wc.py` from the previous phase with a `--json` mode, without regressing any
+`wc`-phase behaviour. Single source of truth for this phase.
+
+(A phase entry in `agents.toml` can override the Builder's model for that phase, e.g. `claude-opus-4-8`;
+this example's `json` phase sets no override, so both agents run on the `[defaults]` model.)
+
+## Definition of Done
+
+- **D1 — json output.** `python wc.py --json FILE` prints a single JSON object:
+  `{"lines": N, "words": N, "chars": N, "file": "FILE"}` (valid JSON, parseable by `json.loads`).
+  With stdin (no FILE), `"file"` is `null`.
+- **D2 — composes with flags.** `--json` honours `-l/-w/-c`: only the requested counts appear as keys
+  (plus `file`). E.g. `wc.py --json -l FILE` → `{"lines": N, "file": "FILE"}`.
+- **D3 — no regression.** Every `wc`-phase gate (D1–D4 there) still passes unchanged.
+- **D4 — tests green.** `test_wc.py` is extended for the JSON cases and `pytest -q` is all-green.
+
+## How the Adversary verifies (cold)
+
+```bash
+pytest -q                                              # D4 + D3 regression
+printf 'a b c\nd e\n' > /tmp/f.txt
+python wc.py --json /tmp/f.txt | python -c 'import sys,json; d=json.load(sys.stdin); \
+  assert d=={"lines":2,"words":5,"chars":10,"file":"/tmp/f.txt"}, d; print("ok")'   # D1
+python wc.py --json -l /tmp/f.txt                      # D2: expect {"lines": 2, "file": "/tmp/f.txt"}
+```
+
+The Builder restates the exact commands, expected JSON, and commit sha in
+`machine-docs/STATUS-json.md`. In this deferred variant the Builder self-certifies: once its own run
+of every check above is green, it writes `## DONE` to `STATUS-json.md` and the watchdog auto-advances
+to the final `review` phase, where the Adversary cold-verifies the whole build in one pass.
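D1 and D2 together pin down the JSON shape: every count by default, only the requested counts when `-l/-w/-c` are given, and `"file"` always present (`null` for stdin). A hypothetical reference function, not the Builder's `wc.py`, that the expected outputs above can be checked against:

```python
import json

def wc_json(lines, words, chars, file=None, flags=()):
    """Build the D1/D2 JSON object (hypothetical reference, not the Builder's wc.py)."""
    names = {"-l": "lines", "-w": "words", "-c": "chars"}
    keys = [names[f] for f in ("-l", "-w", "-c") if f in flags] or ["lines", "words", "chars"]
    counts = {"lines": lines, "words": words, "chars": chars}
    out = {k: counts[k] for k in keys}
    out["file"] = file          # None serialises as JSON null (the stdin case)
    return json.dumps(out)

print(wc_json(2, 5, 10, "/tmp/f.txt"))           # D1
print(wc_json(2, 5, 10, "/tmp/f.txt", ["-l"]))   # D2: {"lines": 2, "file": "/tmp/f.txt"}
```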
--- a/examples/builder-adversary-deferred/plans/review.md
+++ b/examples/builder-adversary-deferred/plans/review.md
@@ -0,0 +1,24 @@
+# Phase `review` — comprehensive deferred verification
+
+This phase adds **no new features**. The Builder has self-certified the build phases (`wc`, `json`)
+and accumulated the whole `wc.py` build. Now the Adversary does its **one comprehensive cold-verification
+of the entire build** — the first and only adversary gate in the run.
+
+## Definition of Done
+
+- **D1 — full cold re-verify.** From a FRESH clone, the Adversary re-runs **every DoD item from every
+  prior phase** (all of `wc` and all of `json`) and confirms each passes. Nothing is taken on the
+  Builder's word.
+- **D2 — full suite green.** The complete test suite (`pytest -q`) passes with 0 failures.
+- **D3 — cross-feature break-it.** The Adversary hunts the interactions a per-gate/per-phase view
+  would miss: `--json` combined with every count flag, whitespace + multi-line + json together, the
+  error paths under json mode, stdin + json, etc. — and files any defects it finds.
+- **D4 — findings cleared.** Every finding the Adversary files is fixed by the Builder and
+  re-verified PASS; no standing `## VETO`.
+
+## How it works
+
+The Adversary records its comprehensive verdict in `machine-docs/REVIEW-review.md`
+(`review(all): PASS`, or findings with repro). The Builder fixes anything found, then writes
+`## DONE` to `machine-docs/STATUS-review.md` **only after** the Adversary's comprehensive PASS — the
+single adversary checkpoint for the whole build.
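D3 asks for `--json` combined with every count flag, on both file and stdin input. One way to make "every" mechanical is to enumerate the combinations up front; `json_probe_argvs` is a hypothetical helper that only builds argument lists, leaving execution (and feeding stdin) to the Adversary.

```python
from itertools import combinations

FLAGS = ("-l", "-w", "-c")

def json_probe_argvs(path):
    """Every --json x count-flag-subset x input-mode argv (hypothetical probe list)."""
    subsets = [c for r in range(len(FLAGS) + 1) for c in combinations(FLAGS, r)]
    argvs = []
    for subset in subsets:
        argvs.append(["--json", *subset, path])   # file input
        argvs.append(["--json", *subset])         # stdin input
    return argvs

probes = json_probe_argvs("/tmp/f.txt")
print(len(probes))   # 16: 8 flag subsets x 2 input modes
```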
--- a/examples/builder-adversary-deferred/plans/wc.md
+++ b/examples/builder-adversary-deferred/plans/wc.md
@@ -0,0 +1,43 @@
+# Phase `wc` — a word-count CLI
+
+**Mission.** Build a small, dependency-free `wc` clone in Python: a script `wc.py` in the work repo
+that counts lines, words, and characters, plus a `pytest` suite. This is the single source of truth
+for the phase — the Builder builds to the Definition of Done below; the Adversary cold-verifies it.
+
+This task is deliberately tiny and fully local (no network, no services) so the example exercises the
+loop-pair *protocol* — claim → cold-verify → PASS/FAIL handshake — not infrastructure.
+
+## Definition of Done
+
+Each Dn is an independent gate. In this deferred variant the Builder self-certifies each one against
+the checks below; the Adversary re-runs them from its own clone in the final `review` phase.
+
+- **D1 — default output.** `python wc.py FILE` prints exactly `<lines> <words> <chars> <FILE>`
+  (counts whitespace-separated words, `\n`-terminated lines, and bytes for `chars`), with the counts
+  (not necessarily the column spacing) matching GNU `wc` on ASCII input.
+- **D2 — flags.** `-l`, `-w`, `-c` restrict the output to that single count (e.g. `wc.py -l FILE`
+  prints `<lines> <FILE>`). Flags may combine; output order is lines, words, chars.
+- **D3 — stdin.** With no FILE argument, `wc.py` reads stdin and prints the counts with no filename.
+- **D4 — tests green.** A `test_wc.py` runs under `pytest -q` with **0 failures**, covering: an empty
+  file (`0 0 0`), a multi-line fixture, the no-trailing-newline case, and each flag.
+
+## How the Adversary verifies (cold)
+
+From a fresh clone of the work repo:
+
+```bash
+pytest -q                                  # D4: must be all-green
+printf 'a b c\nd e\n' > /tmp/f.txt
+python wc.py /tmp/f.txt                     # D1: expect "2 5 10 /tmp/f.txt"
+python wc.py -l /tmp/f.txt                  # D2: expect "2 /tmp/f.txt"
+printf 'a b c\nd e\n' | python wc.py        # D3: expect "2 5 10"
+```
+
+Expected outputs are above — the Builder must restate them (and the exact commands, plus the commit
+sha) in `machine-docs/STATUS-wc.md` so the Adversary can re-run without reading the Builder's
+reasoning. Any mismatch is a FAIL with repro steps in `machine-docs/REVIEW-wc.md`.
+
+## Out of scope (defer to a later phase or DEFERRED.md)
+
+Multibyte/`-m` char counting, `--files0-from`, multiple-file totals, locale handling. JSON output is
+the next phase (`plans/json.md`).
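The counting rules in D1 to D3 are easy to get subtly wrong (bytes versus characters, a last line with no trailing `\n`), so here is a minimal sketch of them. It is an illustration, not a reference solution the Builder must follow. The helper names `count`, `render` and `run` are hypothetical. Only the semantics come from the plan: whitespace-separated words, `\n`-terminated lines, bytes for `chars`, a fixed lines/words/chars order, and no filename for stdin.

```python
from __future__ import annotations

import sys
from typing import BinaryIO

FLAGS = ("-l", "-w", "-c")  # fixed output order: lines, words, chars (D2)


def count(data: bytes) -> tuple[int, int, int]:
    """Return (lines, words, chars) for raw input bytes, per D1."""
    lines = data.count(b"\n")  # only '\n'-terminated lines count, like GNU wc
    words = len(data.split())  # bytes.split() splits on runs of ASCII whitespace
    chars = len(data)          # D1 defines chars as bytes, not code points
    return lines, words, chars


def render(counts: tuple[int, int, int], flags: set[str], name: str | None) -> str:
    """Join the selected counts (all three when no flag is given) and the filename."""
    fields = [str(c) for flag, c in zip(FLAGS, counts) if not flags or flag in flags]
    if name is not None:  # D3: input read from stdin prints no filename
        fields.append(name)
    return " ".join(fields)


def run(argv: list[str], stdin: BinaryIO) -> str:
    """Split argv into flags and at most one FILE, then return the output line."""
    flags = {a for a in argv if a in FLAGS}
    files = [a for a in argv if a not in FLAGS]
    if files:
        with open(files[0], "rb") as fh:  # binary mode so chars == bytes
            return render(count(fh.read()), flags, files[0])
    return render(count(stdin.read()), flags, None)


def main() -> int:
    print(run(sys.argv[1:], sys.stdin.buffer))
    return 0
```

A script entry point would end with `if __name__ == "__main__": sys.exit(main())`. The sketch reads a single FILE and treats a combined flag such as `-lw` as a filename; both are simplifications the plan does not pin down.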
--- a/examples/builder-adversary-deferred/prompts/adversary.md
+++ b/examples/builder-adversary-deferred/prompts/adversary.md
@ -0,0 +1,31 @@
+You are the **Adversary** — one of two independent loops. Your job is to **DISBELIEVE the Builder**. You run as a SEPARATE process and coordinate ONLY through the git repo. Read the phase plan named in the kickoff above in full — it is the single source of truth for WHAT is being verified.
+
+**Self-paced loop.** Invoke `/loop` with no interval so you re-wake yourself via ScheduleWakeup. When a gate is CLAIMED (or the watchdog pings you that one is), verify it promptly — that is top priority. When nothing is pending you may IDLE freely (sleep in chunks of **≤10 min**); you do NOT need to busy-poll to look busy — the watchdog pings you the instant the Builder claims a gate. Poll ~4 min only while actively watching a CLAIMED gate's run. Keep running independent break-it probes even when no gate is pending. Stop only when STATUS says "## DONE" and you have logged a fresh PASS for every DoD item.
+
+**LIVENESS PROTOCOL (the watchdog ENFORCES this):**
+- **Cap every wait at 10 minutes.** Never a single ScheduleWakeup > 600 s; to wait longer, wake, re-check, wait again.
+- **Declare every wait.** Immediately before going idle, your FINAL output line MUST be exactly `WAITING-UNTIL: <ISO-8601 UTC>` (≤10 min out, matching your ScheduleWakeup; compute with `date -u -d '+10 min' +%FT%TZ`). Idle ≥5 min with no current marker, or past the named time → the watchdog kills + reboots you; you resume cleanly from git + your REVIEW/STATUS files.
+- **Compact proactively** at ≳80% context — your state is in git + REVIEW/STATUS, so compaction is lossless.
+
+**Coordinate ONLY through git:**
+- **FILE-LOCATION RULE.** ALL coordination / loop-state files live under `machine-docs/`, NEVER the repo root. If you find one at the root, `git mv` it in.
+- **Keep your OWN clone** (the `dir` this agent runs in). You verify from a COLD START in it. If the work repo doesn't exist yet, wait and retry on your next wake — the Builder creates it first.
+- `git pull --rebase` before every edit; commit; push; **never `--force`.**
+- **COMMIT-PREFIX CONVENTION (load-bearing).** Prefix every commit that records a **verdict or finding** with `review(...)` (e.g. `review(D2): PASS` / `review(D2): FAIL — repro …`). The watchdog watches origin/main and pings the Builder the moment a `review(` commit lands — that IS the handoff signal. (The Builder's gate claims are `claim(...)`.)
+- Write ONLY your files: REVIEW and the "## Adversary findings" section of BACKLOG. Everything else (code, STATUS, JOURNAL, "## Build backlog") is read-only to you.
+- **INBOX side-channel.** For non-gate messages to the Builder, append `machine-docs/BUILDER-INBOX.md` and push (the watchdog edge-pings the Builder). To receive from the Builder, look for `machine-docs/ADVERSARY-INBOX.md`; process it, then `git rm` it (deletion = "consumed"). Formal verdicts still live in REVIEW.
+
+**ISOLATION DISCIPLINE (anti-anchoring — critical).** The Builder is REQUIRED to give you, in STATUS, the verification info you need: WHAT is claimed, HOW to verify it (the exact command/check), the EXPECTED outcome, and WHERE the inputs live. **Read STATUS for that — you need all of it.** What you must IGNORE — in STATUS, and NEVER read in JOURNAL before your verdict — is the Builder's REASONING / RATIONALISATIONS ("I think this passes because…", design narrative, dead-ends). Reading those anchors you. Form your verdict from: (a) the phase plan = SSOT, (b) the code / git history, (c) the verification info the Builder passed in STATUS, and (d) your OWN cold acceptance run that re-executes the check against the expected outcomes. Only AFTER writing your verdict may you consult JOURNAL (note in REVIEW that you did). Trust observable behaviour, the plan, and your own re-run — not the Builder's narrative.
+
+**Each wake:**
+1. Pull. Read STATUS for any "Gate: <id> CLAIMED, awaiting Adversary".
+2. Verify the claim from a COLD START (fresh shell, your own clone, no cached state). Re-run the DoD acceptance check yourself; do not trust the Builder's word.
+3. Actively try to BREAK it — edge cases, malformed input, the failure modes the plan names. A claim you can't break is a claim that PASSES; a claim you can break is a finding.
+4. Record verdicts in REVIEW ("<id>: PASS @<ts>" + evidence, or FAIL with repro steps). File each defect as a "## Adversary findings" item; only YOU close those, after re-test. You hold veto: write "## VETO <reason>" to REVIEW to forbid DONE until cleared.
+5. Push (with a `review(...)` prefix). Schedule the next wake.
+
+REVIEW CADENCE — DEFERRED (this OVERRIDES the "verify each claimed gate per wake" rule above): you verify ONCE, comprehensively, after the whole build — not per gate or per phase.
+- During the BUILD phases (before the final `review` phase): the Builder self-certifies and advances; you do NOT gate those. You may run early break-it probes, but the authoritative check is deferred — don't write per-gate verdicts.
+- In the `review` phase: do ONE comprehensive cold-verification of the ENTIRE accumulated build from a fresh clone: re-run EVERY DoD item from EVERY prior phase, and hunt cross-feature / integration breaks (interactions between features, not just isolated gates). File all findings together; re-verify after the Builder's fixes; PASS only when the whole system holds. This single comprehensive pass replaces per-gate review.
+
+Begin: read the phase plan, then enter the self-paced loop (start by cloning the work repo into your `dir` once it exists).
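The watchdog's side of the LIVENESS PROTOCOL can be sketched as a pure function. This is an illustration under assumptions, not the code in `agents.py`: the names `marker_for`, `parse_marker` and `should_reboot`, and the idea that the watchdog judges an idle agent from its last output line plus an idle duration, are hypothetical. The marker format and the 5-minute and 10-minute thresholds come from the prompts.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta, timezone

# Hypothetical constants mirroring the prompts (not read from agents.py).
IDLE_LIMIT = timedelta(minutes=5)  # idle this long with no current marker -> reboot
MAX_WAIT = timedelta(minutes=10)   # agent-side cap on any single wait

# %F%T in `date -u ... +%FT%TZ` is the same as %Y-%m-%dT%H:%M:%S.
STAMP = "%Y-%m-%dT%H:%M:%SZ"
MARKER = re.compile(r"WAITING-UNTIL: (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)")


def marker_for(now: datetime, wait: timedelta = MAX_WAIT) -> str:
    """Build the line an agent prints before idling, capping the wait at 10 min."""
    until = now.astimezone(timezone.utc) + min(wait, MAX_WAIT)
    return "WAITING-UNTIL: " + until.strftime(STAMP)


def parse_marker(last_line: str) -> datetime | None:
    """Return the declared resume time if the line is exactly a marker, else None."""
    m = MARKER.fullmatch(last_line.rstrip("\n"))
    if m is None:
        return None
    return datetime.strptime(m.group(1), STAMP).replace(tzinfo=timezone.utc)


def should_reboot(last_line: str, idle_for: timedelta, now: datetime) -> bool:
    """Apply the two kill conditions to an agent the watchdog already sees as idle."""
    until = parse_marker(last_line)
    if until is None:
        return idle_for >= IDLE_LIMIT  # no marker: tolerate only a short idle
    return now > until                 # marker present: reboot once it has lapsed
```

Keeping the decision pure (clock and pane text passed in) is a design choice that makes the rule testable without tmux; a real watchdog would feed it the pane's last line and its measured idle time.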
--- a/examples/builder-adversary-deferred/prompts/builder.md
+++ b/examples/builder-adversary-deferred/prompts/builder.md
@ -0,0 +1,35 @@
+You are the **Builder** — one of two independent loops working on this project. Your job is to build what the phase plan specifies, autonomously, over many wake cycles. You run as a SEPARATE process from the Adversary and coordinate with it ONLY through the git repo.
+
+Single source of truth: the phase plan named in the kickoff above. Read it in full now, then begin.
+
+**Self-paced loop.** Invoke `/loop` with no interval so you re-wake yourself via ScheduleWakeup. Each iteration = one unit of work. Pace yourself:
+- A long task in flight (build / test suite / e2e) → **poll every ~5 min**, never one big sleep matching the expected runtime (catch a failure at minute 4 of a 25-min run, not at minute 25).
+- Parked at a CLAIMED gate with no other unblocked work → the watchdog pings you the instant the Adversary writes a verdict or an inbox message, so you may wait; keep a fallback self-poll ~2–4 min in case a ping is missed.
+- Genuinely idle → sleep in chunks of **≤10 min**. Prefer keeping an unblocked backlog item in hand so you rarely just wait.
+
+**LIVENESS PROTOCOL (the watchdog ENFORCES this):**
+- **Cap every wait at 10 minutes.** To wait longer, wake at 10 min, re-check, wait again. Never a single ScheduleWakeup > 600 s.
+- **Declare every wait.** Immediately before going idle, your FINAL output line MUST be exactly `WAITING-UNTIL: <ISO-8601 UTC>` — the time you will resume (≤10 min out, matching your ScheduleWakeup). Compute it from the clock (`date -u -d '+10 min' +%FT%TZ`). If the watchdog sees you idle ≥5 min with no current marker as your last line, OR idle past the time it names, it kills + reboots you — you resume cleanly from git + your STATUS/REVIEW files.
+- **Compact proactively.** If context usage climbs high (≳80%), run `/compact` before continuing — your loop state lives in git + the phase STATUS/REVIEW, so compaction is lossless and prevents wedging at the context limit.
+
+**Coordinate ONLY through git:**
+- **FILE-LOCATION RULE.** ALL coordination / loop-state files live under `machine-docs/`, NEVER the repo root — phase-namespaced STATUS/BACKLOG/REVIEW/JOURNAL, plus DECISIONS.md and the ADVERSARY-INBOX.md / BUILDER-INBOX.md side-channels. Create `machine-docs/` if missing; if you find such a file at the root, `git mv` it in.
+- `git pull --rebase` before every edit; make the smallest change; commit; push. **Never `--force`.**
+- **COMMIT-PREFIX CONVENTION (load-bearing).** Prefix every commit with its conventional type. CRITICALLY: prefix a commit that **claims a gate** with `claim(...)` (e.g. `claim(D2): tests green`). The watchdog watches origin/main and pings the Adversary the moment a `claim(` commit lands — that IS the handoff signal. Keep using the other types too (`feat/fix/status/journal/decisions/chore/inbox(...)`), but `claim(` is what triggers verification.
+- **CLEAN TREE BEFORE CLAIM.** Run `git status` before you claim — the working tree MUST be clean (everything committed AND pushed). The Adversary cold-verifies from a fresh clone, so any un-pushed change that only exists on your host is a guaranteed verify mismatch. Push first, then claim.
+- **ARTIFACT-LAYER ISOLATION — the one rule that makes verification work.** STATUS MUST give the Adversary everything it needs to verify your claim: **WHAT** is claimed (gate id, DoD items), **HOW** to verify it (the exact command/check it can re-run from its own clone), the **EXPECTED** outcome (outputs, hashes, exit codes), and **WHERE** the inputs live (commit shas, paths). STATUS MUST NOT contain rationalisations — "I think this passes because…", design narrative, dead-ends. Those go in JOURNAL, which the Adversary is instructed NOT to read before its verdict (anti-anchoring). The line: **WHAT + HOW + EXPECTED + WHERE = STATUS; WHY = JOURNAL.** DECISIONS.md is for SETTLED design decisions, not in-the-moment reasoning.
+- **At each gate:** set "Gate: <id> CLAIMED, awaiting Adversary" in STATUS and work other unblocked items; do NOT advance past the gate until REVIEW shows its PASS.
+- **INBOX side-channel.** For non-gate messages to the Adversary (a heads-up, "starting a long run, please cold-verify X meanwhile"), append `machine-docs/ADVERSARY-INBOX.md` and push — the watchdog edge-pings the Adversary. To receive from the Adversary, look for `machine-docs/BUILDER-INBOX.md`; process it, then `git rm` it (deletion = "consumed"). The inbox is a side-channel; formal CLAIMS still live in STATUS.
+- Write ONLY your files: source/config, STATUS, JOURNAL, DECISIONS, and the "## Build backlog" section of BACKLOG. Treat REVIEW and "## Adversary findings" as read-only — the Adversary owns them.
+
+**Overriding rules:**
+- "Done" is defined ONLY by the plan's DoD, Adversary-verified. No self-certifying. Write "## DONE" to STATUS only when REVIEW shows a fresh PASS for every DoD item and there is no standing "## VETO".
+- Verify every change against real behaviour; paste the command + its output into JOURNAL. No "should work."
+- Never weaken, skip, or delete a test to make a run pass. A red test is information.
+- 3rd identical failure → stop, record the dead-end in DECISIONS.md, change approach or mark blocked.
+
+REVIEW CADENCE — DEFERRED (this OVERRIDES the per-phase "Adversary-verified / no self-certifying" rule above, for build phases only): the Adversary verifies in ONE comprehensive pass at the END, not per gate or per phase.
+- BUILD phases (every phase before the final `review` phase): SELF-CERTIFY. Build to the phase DoD, run your own tests until green, then write "## DONE" to advance — do NOT claim or wait for the Adversary on a build phase. Accumulate the whole build.
+- The final `review` phase: do not add features. The Adversary now cold-verifies the ENTIRE accumulated build at once; address every finding it files, then write "## DONE" only after its comprehensive PASS. (Here the normal Adversary-verified rule applies.)
+
+Begin: read the phase plan, then enter the self-paced loop.
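Under the DEFERRED cadence, the Builder's key decision is whether it may write "## DONE" yet. Here is a minimal sketch that assumes hypothetical helper names and a REVIEW file with one `<id>: PASS ...` line per verdict (the prompts' `"<id>: PASS @<ts>"`). It does not model freshness, a cleared veto, or a FAIL recorded after an earlier PASS.

```python
import re

VETO = re.compile(r"^## VETO\b", re.MULTILINE)
PASS_LINE = re.compile(r"^(\S+): PASS\b", re.MULTILINE)


def passed_items(review_md: str) -> set[str]:
    """Collect the DoD ids that have a '<id>: PASS' verdict line in REVIEW."""
    return set(PASS_LINE.findall(review_md))


def may_write_done(phase_id: str, dod_ids: list[str], own_tests_green: bool, review_md: str) -> bool:
    """Decide whether '## DONE' may be written for this phase."""
    if phase_id != "review":
        # Build phase: self-certify on the Builder's own green tests; no claim, no wait.
        return own_tests_green
    # Final review phase: the normal rule applies, so any standing veto blocks DONE...
    if VETO.search(review_md):
        return False
    # ...and every DoD item needs an Adversary PASS.
    return set(dod_ids) <= passed_items(review_md)
```

The asymmetry is the whole trade-off of this variant: build phases advance on self-certification alone, so the only independent check is the final subset test.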
--- a/examples/builder-adversary-deferred/prompts/kickoff.md
+++ b/examples/builder-adversary-deferred/prompts/kickoff.md
@ -0,0 +1,8 @@
+*** PHASE {phase_id} ***
+SINGLE SOURCE OF TRUTH for this phase: {plan} — read it in full now. It defines this phase's mission and its Definition of Done (DoD).
+Track loop state in PHASE-NAMESPACED files UNDER machine-docs/ in your clone (create the dir if missing): machine-docs/{status}, machine-docs/BACKLOG-{phase_id}.md, machine-docs/REVIEW-{phase_id}.md, machine-docs/JOURNAL-{phase_id}.md. machine-docs/DECISIONS.md is shared (append-only).
+FILE-LOCATION RULE (mandatory): ALL coordination / loop-state files live in machine-docs/, NEVER the repo root — that includes STATUS/BACKLOG/REVIEW/JOURNAL (phase-namespaced), DECISIONS.md, and the ADVERSARY-INBOX.md / BUILDER-INBOX.md side-channels. If you ever find one at the root, git mv it into machine-docs/.
+"Done" for this phase = the Builder writes "## DONE" to machine-docs/{status} ONLY after EVERY DoD item is Adversary-verified with a fresh PASS in machine-docs/REVIEW-{phase_id}.md (handshake below).
+Wherever the standing role below says "the plan" / "STATUS" / "REVIEW", substitute {plan} and these machine-docs/ phase-namespaced files.
+
+=== standing role & rules ===
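The kickoff above is a template with `{phase_id}`, `{plan}` and `{status}` placeholders, followed by the standing role. This diff does not show how `agents.py` fills it, so the sketch below assumes Python's `str.format` and an abridged copy of the template. The function name `build_prompt` and the example values (`wc`, `plans/wc.md`, `STATUS-wc.md`) are illustrative.

```python
# Abridged copy of the kickoff template's placeholder lines.
KICKOFF = (
    "*** PHASE {phase_id} ***\n"
    "SINGLE SOURCE OF TRUTH for this phase: {plan}\n"
    "Track loop state in machine-docs/{status}, machine-docs/BACKLOG-{phase_id}.md, "
    "machine-docs/REVIEW-{phase_id}.md, machine-docs/JOURNAL-{phase_id}.md.\n"
)


def build_prompt(phase_id: str, plan: str, status: str, role: str) -> str:
    """Fill the kickoff placeholders, then append the standing role text verbatim."""
    head = KICKOFF.format(phase_id=phase_id, plan=plan, status=status)
    # The role prompt is concatenated, not formatted, so literal braces in it stay safe.
    return head + "\n=== standing role & rules ===\n" + role
```

Formatting only the kickoff and appending the role unformatted avoids a `KeyError` or `ValueError` if a role prompt ever contains a literal `{...}`.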
| Author | SHA1 | Message | Date |
|---|---|---|---|
| mfowler | 08bbb60343 | fix(watchdog): stop phase-machine handoff/gate-token work after SEQUENCE-COMPLETE Gate-token tracking + handoff pings kept running on the completed phase machine, churning 0-token gate records every tick. Gate them on `not seq_done`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6 | 2026-06-24 15:38:02 +00:00 |
| mfowler | 164df87e98 | fix(wake): persistent-agent wakes survive SEQUENCE-COMPLETE The watchdog gated ALL scheduled wakes behind `if not seq_done`, so once a phase sequence completed, even a persistent operator-facing supervisor stopped waking. That breaks follow-on supervision (e.g. a second build started after the first sequence finishes). Now: loop-tied wakes (on-demand auditor etc.) still quiet after completion, but persistent agents keep waking — their hourly supervision survives. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6 | 2026-06-24 15:36:02 +00:00 |
| mfowler | 44bb1da1be | feat(watchdog): DONE-nudge for ceremony-lag (built-but-unmarked phase) before kill+reboot Recurring stall: a phase is substantively complete (all DoD gates PASS from both adversaries, no veto) but the builder never writes the done marker, so auto-advance cannot fire and the loops idle. A blunt stall kill+reboot does not fix it (the re-kickoffed agent just re-idles). On a stall, if the agent is a loop agent and the current phase is NOT marked done, send a one-time DONE-nudge (ping) telling it to write the done marker IF the DoD is met (both adversaries PASS, no veto), giving a fresh idle window; only escalate to the kill+reboot if it stays stalled. One nudge per phase (cleared on phase advance). Gated by [loop].done_nudge (default true); message uses the configured done_marker and the phase status file. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6 | 2026-06-24 02:40:14 +00:00 |
| mfowler | e6b53513d4 | feat(wake): re-run one-shot task agents on their wake interval (autonomous cadence) wake_agent only re-prompted a live persistent/loop session and returned False for a dead one, so a "task" agent (one-shot, exits after its run) could not be re-run on a schedule — its wake never fired. Now, for kind=="task", a wake kills+restarts the task for a clean re-run (skipping only while its previous run is still active). This makes scheduled work like a coverage audit recur autonomously, no operator trigger. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6 | 2026-06-23 05:17:07 +00:00 |
| mfowler | 65ceeb3a7b | fix(watchdog): seed stall clock from pane's real last-activity, not watchdog start Stall detection tracked idle time in an in-memory _idle_since map seeded to now() on first observation, so a freshly-(re)started watchdog reset every agent's stall clock and had to wait a full stall_idle before it could nudge — an agent idle for an hour looked freshly-idle after a watchdog restart. Seed from the tmux window's last-activity timestamp (#{window_activity}) instead, so idle duration reflects the agent's real last activity regardless of when the watchdog started. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6 | 2026-06-23 04:40:34 +00:00 |
| mfowler | 57082acc05 | fix(tokens): restore token_phase_flush re-baseline; drop stray block from gate_token_check The per-gate functions were inserted immediately after token_phase_flush's log line, which split the function: its trailing re-baseline block (the 'if next_phase_id is not None: ...' that re-seeds the per-phase baseline for the next phase, or finalizes when None) was orphaned onto the end of gate_token_check, where next_phase_id is undefined. The watchdog therefore crashed with NameError on the first tick of every start. Move that block back into token_phase_flush (where next_phase_id/cur/sf are in scope) and end gate_token_check at its log line. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6 | 2026-06-22 07:27:50 +00:00 |
| mfowler | 188c12ad9e | feat: configurable per-gate token logging + responsive phase auto-advance Two watchdog/metrics improvements to the loop machine: 1) Token-logging granularity is configurable via [watchdog].token_granularity: 'gate' (default) or 'phase'. In 'gate' mode, tokens are attributed to each claimed gate -- any 'claim(<label>)' commit on the work repo's origin/main (e.g. claim(D1-D5), claim(feat:multi-file); a leading 'feat:' is stripped) -- in addition to the per-phase rollup, appended to token-log.jsonl tagged phase_id='<phase>:<label>'. A change in the most-recently-claimed label is the boundary; the in-flight gate is also flushed when the phase ends. 'phase' mode keeps the original per-phase-only behaviour. 2) Phase auto-advance is now evaluated on EVERY signal tick instead of only the heavy tick, so a completed phase advances within signal_interval of its '## DONE' landing rather than idling up to heavy_interval. Healing stays on the heavy cadence. Note: gate-boundary detection assumes the loop's 'claim(<label>)' commit convention. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6 | 2026-06-22 05:15:08 +00:00 |
| mfowler | 98d198baa9 | feat(handoff): claim_pings/review_pings accept a list — ping every reviewer Multi-reviewer setups (e.g. a correctness + a readability adversary) can now have the watchdog ping ALL reviewers on a claim, each in its own session with its own submit key. A bare string still works (single agent). _ping_agents() helper. | 2026-06-22 00:24:41 +00:00 |
| mfowler | 781db071dd | docs(readme): add Examples section (Builder/Adversary variants, snakepit) + benchmark note | 2026-06-16 02:35:40 +00:00 |
| mfowler | 90375f004e | docs(examples): add builder-adversary-deferred — verify after a long segment Coarsest review cadence: the Builder self-certifies the build phases and the Adversary does ONE comprehensive cold-verification of the whole accumulated build in a final `review` phase (vs orig per-phase, lean per-gate). Full original prompts + a DEFERRED REVIEW CADENCE override, so it isolates verification cadence. Cheapest coordination; the trade-off is the independent check arrives late (late rework risk + self-certification drift on build phases). README spells it out. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> | 2026-06-16 00:02:44 +00:00 |
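The commit notes above describe the git handoff the prompts rely on. A `claim(` subject pings the reviewer(s) and a `review(` subject pings the Builder. `claim_pings`/`review_pings` accept a bare string or a list (98d198baa9), a leading `feat:` is stripped from a gate label (188c12ad9e), and handoff work stops once the sequence is complete (08bbb60343). The sketch below restates those rules hypothetically; it is not the watchdog's code. The function names, the `CONFIG` dict shape, and the agent names are assumptions.

```python
from __future__ import annotations

import re

# Hypothetical config shape modelled on the commit notes: a bare string or a list.
CONFIG = {"claim_pings": ["adversary", "readability-adversary"], "review_pings": "builder"}

PREFIX = re.compile(r"^(claim|review)\((.*?)\)")


def gate_label(subject: str) -> str | None:
    """Return the label of a claim(<label>) subject, with a leading 'feat:' stripped."""
    m = PREFIX.match(subject)
    if m is None or m.group(1) != "claim":
        return None
    label = m.group(2).strip()
    return label[len("feat:"):].strip() if label.startswith("feat:") else label


def agents_to_ping(subject: str, config: dict, seq_done: bool = False) -> list[str]:
    """Return the agents a newly landed origin/main commit should ping."""
    if seq_done:
        return []  # after SEQUENCE-COMPLETE the phase machine does no handoff work
    m = PREFIX.match(subject)
    if m is None:
        return []  # feat(...), fix(...), status(...) etc. are not handoff signals
    target = config.get(m.group(1) + "_pings", [])
    # A bare string names one agent; a list names every agent to ping.
    return [target] if isinstance(target, str) else list(target)
```

Matching only at the start of the subject keeps a commit that merely mentions `claim(` in its body from triggering a handoff.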