Watchdog handoff signalling: ping the waiting loop on gate-claim / verdict (kill double-idle)

launch.sh watchdog now runs a fast (~30s) handoff_check alongside the heavy (300s) restart/DONE
check: when the Builder writes a CLAIMED gate it pings the Adversary to verify now; when the
Adversary updates REVIEW.md it pings the Builder to proceed (edge-triggered, reads local clones).
So a pending handoff resolves in <~30s instead of a whole idle interval. Pacing revised: the
Adversary may idle freely when nothing's pending (no pointless re-verify/busy-poll) and is woken
by the watchdog; Builder waits on the ping + a fallback ~2-4m self-poll. kickoff documents the
new "handoff signalling" role.
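
A minimal sketch of the edge-triggered check described above, not the actual `launch.sh` code. Every name here (`fire_on_change`, `handoff_check`, `ping_loop`, the `GATES.md` file name, the state directory) is hypothetical. It assumes the Builder's gates live in a file in the Builder's local clone and that `REVIEW.md` sits in the Adversary's local clone.

```shell
#!/bin/sh
# Hypothetical sketch; none of these names are taken from launch.sh.

# fire_on_change STATE_FILE VALUE
# Succeeds only when VALUE is non-empty and differs from the value recorded
# on the previous call, then records VALUE. This is the edge trigger: a
# standing, unchanged condition does not fire a second time.
fire_on_change() {
    state_file=$1
    value=$2
    last=""
    [ -f "$state_file" ] && last=$(cat "$state_file")
    printf '%s' "$value" > "$state_file"
    [ -n "$value" ] && [ "$value" != "$last" ]
}

# Stand-in for whatever mechanism actually wakes a loop (assumption).
ping_loop() {
    printf 'ping %s: %s\n' "$1" "$2"
}

# handoff_check BUILDER_CLONE ADVERSARY_CLONE STATE_DIR
handoff_check() {
    builder_clone=$1
    adversary_clone=$2
    state_dir=$3
    mkdir -p "$state_dir"

    # Builder -> Adversary: the set of CLAIMED lines is the fingerprint.
    # GATES.md is an assumed file name.
    claimed=$(grep -s 'CLAIMED' "$builder_clone/GATES.md" || true)
    if fire_on_change "$state_dir/claimed" "$claimed"; then
        ping_loop adversary "gate claimed, verify now"
    fi

    # Adversary -> Builder: any change to REVIEW.md counts as a verdict/finding.
    # Note: an existing REVIEW.md also fires once on the very first check.
    review=""
    [ -f "$adversary_clone/REVIEW.md" ] && review=$(cksum < "$adversary_clone/REVIEW.md")
    if fire_on_change "$state_dir/review" "$review"; then
        ping_loop builder "review updated, proceed"
    fi
}

# The fast loop would then be roughly:
#   while :; do handoff_check "$B" "$A" "$S"; sleep 30; done
```

Keeping the last-seen value on disk means a watchdog restart does not re-ping for a gate that has already been signalled.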

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 06:15:25 +01:00
parent deca47d9c7
commit 239dfd8e26
5 changed files with 81 additions and 30 deletions


@@ -652,17 +652,21 @@ the *specific* thing. Three cases:
 1. **Something in flight** (build/deploy/`nixos-rebuild`) → re-check on a short cadence (≈4 min) to
 stay cache-warm; keep polling *it*, don't treat it as idle, and don't spin on a minutes-long build.
 2. **Blocked on the *other* loop** — Builder parked at a `CLAIMED` gate awaiting the Adversary, or
-Adversary waiting for the Builder to fix an `[adversary]` finding **poll on the short ≈4 min
-cadence for the counterpart's response; do NOT use the long idle sleep.** A pending handoff is not
-idleness — the other loop may respond any moment, and if *both* loops long-idle here you get dead
-wall-clock where neither advances. (This is the common "both waiting" trap — avoid it.)
+Adversary waiting for the Builder to fix an `[adversary]` finding. **You don't need to busy-poll
+here: the watchdog signals across the handoff.** The moment the Builder writes a `CLAIMED` gate,
+the watchdog pings the Adversary to verify *now*; the moment the Adversary updates `REVIEW.md`
+(verdict/finding), it pings the Builder to proceed (`launch.sh`, ~30 s detection). So you may sleep
+while blocked and trust the ping — but keep a **fallback self-poll on a modest cadence (~2-4 min)**
+in case a ping is missed (a dead session is restarted by the watchdog and re-orients from the repo
+anyway). The goal: a pending handoff resolves in well under a minute, not a whole idle interval.
 3. **Genuinely idle, nothing pending from either loop** → sleep ~10-15 min, then re-orient.
-Corollary for the Adversary: a standing `CLAIMED` gate is immediate top-priority work (verify it now,
-don't idle past it); absent a gate, run background break-it probes / re-verify stale D-gates rather
-than sleeping — so the Adversary is rarely idle while the Builder is active. Corollary for the
-Builder: prefer keeping an unblocked backlog item in hand so you're not fully blocked on a gate; only
-hit case 2 when everything is genuinely gated behind the pending verification.
+Notes: **The Adversary may idle freely when nothing is pending — it should NOT pointlessly re-verify
+or busy-poll to look busy.** It gets woken by the watchdog the instant the Builder claims a gate, so
+"start verifying very soon after the Builder waits" is handled by the signal, not by the Adversary
+spinning. **The Builder** should prefer keeping an unblocked backlog item in hand so it's rarely
+*fully* blocked on a gate; only hit case 2 when everything is genuinely gated behind the pending
+verification — and then rely on the watchdog ping (+ fallback poll) rather than a long idle.
 **Anti-drift guards.**
 - Cap retries: if an approach fails 3× the same way, stop, write the dead-end in `DECISIONS.md`,