plan §7: recommend Monitor-on-convergence pattern for long deploys (builder's idea)

For a long deploy/convergence, arm a Monitor that polls the node every ~30s and
wakes on convergence OR failure, with a longer fallback heartbeat (ScheduleWakeup)
as a backstop. The Monitor proceeds the instant the deploy converges (no
over-waiting) and surfaces failures promptly, while the heartbeat bounds the
wait. Size the timeout sanely (longer if justified, never absurd like the
~40-min ghost case). Credit: builder.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 05:17:18 +01:00
parent e85e16318c
commit a89b082240

@@ -716,6 +716,14 @@ the *specific* thing. Three cases:
min e2e gets 5 short cache-warm polls, not one 25-min cache-cold blackout. The wakeup that wakes
you mid-task is *cheap* (one cache hit, one quick status check); the value of catching a deploy
that died at minute 4 of a 25-min budget is large. Keep polling *it*, don't treat it as idle.
- **Recommended pattern for long deploys/convergence (builder, 2026-05-30):** **arm a `Monitor`**
that polls the node every ~30s and **wakes you on convergence OR failure**, with a **longer
fallback heartbeat** (`ScheduleWakeup`) as a backstop if the Monitor never fires. This proceeds
the *instant* the deploy converges (no over-waiting if it finishes early) and surfaces a failure
promptly, while the heartbeat bounds the wait if the condition is never met. Size the convergence
timeout sanely: longer than a few minutes if a recipe genuinely needs it, but **never absurd**
(e.g. the ~40-min ghost timeout was excessive). This pattern beats both a single big blind sleep
and a fixed coarse poll.
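
A minimal Python sketch of the control flow described in the bullet above. It is not the real `Monitor` or `ScheduleWakeup` API; it only illustrates the shape of the pattern. Every name here (`Status`, `wait_for_convergence`, `check`, the 600 s default timeout) is hypothetical, and the sketch folds the Monitor and the fallback heartbeat into a single loop, where the plan treats them as two separate mechanisms.

```python
import time
from enum import Enum
from typing import Callable


class Status(Enum):
    PENDING = "pending"
    CONVERGED = "converged"
    FAILED = "failed"
    TIMED_OUT = "timed_out"


def wait_for_convergence(
    check: Callable[[], Status],          # hypothetical node status probe
    poll_interval_s: float = 30.0,        # the ~30s poll from the plan
    timeout_s: float = 600.0,             # assumed backstop; size per recipe
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Status:
    """Poll `check` until it reports a terminal status or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        status = check()
        if status in (Status.CONVERGED, Status.FAILED):
            # Wake immediately on convergence OR failure: no over-waiting.
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            # Backstop: the wait is bounded even if the condition never fires.
            return Status.TIMED_OUT
        # Never sleep past the deadline.
        sleep(min(poll_interval_s, remaining))
```

The `sleep` and `clock` parameters are injectable so the loop can be exercised with a fake clock instead of real waiting. A caller would treat `CONVERGED` as "proceed now", `FAILED` as "surface the failure", and `TIMED_OUT` as the heartbeat case where the condition was never met.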
2. **Blocked on the *other* loop** — Builder parked at a `CLAIMED` gate awaiting the Adversary, or
Adversary waiting for the Builder to fix an `[adversary]` finding. **You don't need to busy-poll
here: the watchdog signals across the handoff.** The moment the Builder writes a `CLAIMED` gate,