watchdog: fix _completed() false-positive that abandoned the run

The 2026-07-03 finish run wedged because _completed() returned True while the run
was still mid-work, so the watchdog exited early and nothing recovered the wedge.

Cause: it scanned part.get('text') across ALL parts of every assistant message, so
DONE_MARKER inside a TOOL part (a subagent `task` prompt or a bash command that
referenced 'print UPGRADE RUN COMPLETE') matched.

Now: require the marker in the LAST assistant TEXT (prose) message, the genuine
sign-off. Tool-call args are ignored, and a mid-run echo of the instruction no
longer counts once any later assistant prose follows it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WxbpH3DquKzoSTSwGvGuET
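
For illustration, here is a minimal sketch of the two rules side by side, run against a hand-built transcript that reproduces the false positive. It is not the watchdog's actual code path: the value of DONE_MARKER is inferred from the commit message, the message shape (an "info" dict with a "role", plus "parts" carrying "type" and "text") is assumed from the diff, and msg(), completed_old() and completed_new() are hypothetical names that take the message list directly instead of fetching it from the server.

```python
DONE_MARKER = "UPGRADE RUN COMPLETE"  # assumed value, inferred from the commit message


def completed_old(msgs):
    """Pre-fix rule: any assistant part whose 'text' contains the marker."""
    for m in msgs:
        if (m.get("info") or {}).get("role") != "assistant":
            continue
        for part in m.get("parts") or []:
            t = part.get("text")
            if isinstance(t, str) and DONE_MARKER in t:
                return True
    return False


def completed_new(msgs):
    """Post-fix rule: the marker must be in the last non-empty assistant prose."""
    last_prose = None
    for m in msgs:
        if (m.get("info") or {}).get("role") != "assistant":
            continue
        # Only type == 'text' parts count as prose; tool-call parts are skipped.
        prose = "".join(
            p.get("text", "")
            for p in m.get("parts") or []
            if p.get("type") == "text" and isinstance(p.get("text"), str)
        )
        if prose.strip():
            last_prose = prose
    return bool(last_prose and DONE_MARKER in last_prose)


def msg(role, *parts):
    """Hypothetical helper: build one message in the assumed server shape."""
    return {"info": {"role": role}, "parts": [{"type": t, "text": s} for t, s in parts]}


# A run that is still mid-work: the marker appears only in the user prompt
# and inside a tool part, never in the assistant's own prose.
mid_run = [
    msg("user", ("text", f"When finished, print {DONE_MARKER}")),
    msg("assistant",
        ("text", "Delegating the upgrade."),
        ("tool", f"task: do the work, then print {DONE_MARKER}")),
    msg("assistant", ("text", "Still working on step 2.")),
]
# The same run after a genuine sign-off.
finished = mid_run + [msg("assistant", ("text", DONE_MARKER))]

print(completed_old(mid_run), completed_new(mid_run))    # True False
print(completed_old(finished), completed_new(finished))  # True True
```

One property of the new rule worth noting: only a message with non-empty prose replaces last_prose, so an echo of the marker in prose followed solely by tool-only assistant messages would still read as done under this sketch.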
@@ -325,21 +325,26 @@ def _run_pids(sid=None):
     return out
 
 def _completed():
-    # The run is done only when the MODEL prints DONE_MARKER — i.e. it appears in an ASSISTANT
-    # message's output. NOT a log grep: the kickoff/resume PROMPT (a user message) also contains
-    # the marker (it instructs the model to print it), which would false-positive.
+    # Done only when the MODEL signs off with DONE_MARKER as its FINAL word: the marker in the LAST
+    # assistant TEXT (prose) message. This guards THREE false-positives that each abandoned a run:
+    # (1) the kickoff/resume PROMPT contains the marker — but that's a USER message (skipped).
+    # (2) the marker inside a TOOL part — a subagent `task` prompt or a bash command that echoes
+    # "print <marker>" — so we look ONLY at type=='text' prose, never tool-call args.
+    # (3) the model ECHOING the instruction mid-run ("…then I'll print <marker>") — only the FINAL
+    # assistant prose counts, so any further work after the echo means it's NOT done.
     sid = _session_id()
     msgs = _server_get(f"/session/{sid}/message") if sid else None
     if msgs is not None:
         msgs = msgs if isinstance(msgs, list) else msgs.get("data", [])
+        last_prose = None
         for m in msgs:
             if ((m.get("info") or {}).get("role")) != "assistant":
                 continue
-            for part in (m.get("parts") or []):
-                t = part.get("text")
-                if isinstance(t, str) and DONE_MARKER in t:
-                    return True
-        return False
+            prose = "".join(p.get("text", "") for p in (m.get("parts") or [])
+                            if p.get("type") == "text" and isinstance(p.get("text"), str))
+            if prose.strip():
+                last_prose = prose
+        return bool(last_prose and DONE_MARKER in last_prose)
     # Server unreachable → conservative log fallback that excludes the prompt's own mention.
     try:
         with open(LOG_FILE, errors="ignore") as f: