watchdog: fix _completed() false-positive that abandoned the run
The 2026-07-03 finish run wedged because _completed() returned True while the run
was still mid-work — so the watchdog exited early and nothing recovered the wedge.
Cause: it scanned part.get('text') across ALL message parts, so DONE_MARKER inside
a TOOL part (a subagent `task` prompt / bash command that referenced 'print
UPGRADE RUN COMPLETE') matched. Now: require the marker in the LAST assistant
TEXT (prose) message — the genuine sign-off — ignoring tool-call args and any
mid-run echo of the instruction (work after the echo disqualifies it).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WxbpH3DquKzoSTSwGvGuET
@@ -325,21 +325,26 @@ def _run_pids(sid=None):
     return out
 
 def _completed():
-    # The run is done only when the MODEL prints DONE_MARKER — i.e. it appears in an ASSISTANT
-    # message's output. NOT a log grep: the kickoff/resume PROMPT (a user message) also contains
-    # the marker (it instructs the model to print it), which would false-positive.
+    # Done only when the MODEL signs off with DONE_MARKER as its FINAL word: the marker in the LAST
+    # assistant TEXT (prose) message. This guards THREE false-positives that each abandoned a run:
+    # (1) the kickoff/resume PROMPT contains the marker — but that's a USER message (skipped).
+    # (2) the marker inside a TOOL part — a subagent `task` prompt or a bash command that echoes
+    # "print <marker>" — so we look ONLY at type=='text' prose, never tool-call args.
+    # (3) the model ECHOING the instruction mid-run ("…then I'll print <marker>") — only the FINAL
+    # assistant prose counts, so any further work after the echo means it's NOT done.
     sid = _session_id()
     msgs = _server_get(f"/session/{sid}/message") if sid else None
     if msgs is not None:
         msgs = msgs if isinstance(msgs, list) else msgs.get("data", [])
+        last_prose = None
         for m in msgs:
             if ((m.get("info") or {}).get("role")) != "assistant":
                 continue
-            for part in (m.get("parts") or []):
-                t = part.get("text")
-                if isinstance(t, str) and DONE_MARKER in t:
-                    return True
-        return False
+            prose = "".join(p.get("text", "") for p in (m.get("parts") or [])
+                            if p.get("type") == "text" and isinstance(p.get("text"), str))
+            if prose.strip():
+                last_prose = prose
+        return bool(last_prose and DONE_MARKER in last_prose)
     # Server unreachable → conservative log fallback that excludes the prompt's own mention.
     try:
         with open(LOG_FILE, errors="ignore") as f:
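To make the behavioral change concrete, here is a minimal, self-contained sketch that runs the pre-fix and post-fix checks as pure functions over an in-memory message list. Several things are assumptions and not taken from the real watchdog: the value of `DONE_MARKER` (inferred from the commit message), the helper `msg`, the function names `completed_old` / `completed_new`, and the shape of a tool part (modeled here as `{"type": "tool", "text": ...}`). The server call and the log fallback are left out.

```python
DONE_MARKER = "UPGRADE RUN COMPLETE"  # assumed value, inferred from the commit message


def completed_old(msgs):
    """Pre-fix logic: any assistant part whose 'text' holds the marker counts."""
    for m in msgs:
        if ((m.get("info") or {}).get("role")) != "assistant":
            continue
        for part in (m.get("parts") or []):
            t = part.get("text")
            if isinstance(t, str) and DONE_MARKER in t:
                return True
    return False


def completed_new(msgs):
    """Post-fix logic: only the last non-empty assistant prose message counts."""
    last_prose = None
    for m in msgs:
        if ((m.get("info") or {}).get("role")) != "assistant":
            continue
        # Join only type == "text" parts; tool-call parts are ignored.
        prose = "".join(
            p.get("text", "")
            for p in (m.get("parts") or [])
            if p.get("type") == "text" and isinstance(p.get("text"), str)
        )
        if prose.strip():
            last_prose = prose
    return bool(last_prose and DONE_MARKER in last_prose)


def msg(role, *parts):
    """Hypothetical helper: build a message in the shape the diff reads."""
    return {"info": {"role": role},
            "parts": [{"type": t, "text": s} for t, s in parts]}


kickoff = msg("user", ("text", f"When finished, print {DONE_MARKER}"))
tool_call = msg("assistant", ("tool", f"bash: echo 'print {DONE_MARKER}'"))
echo = msg("assistant", ("text", f"Plan: do the work, then I'll print {DONE_MARKER}"))
more_work = msg("assistant", ("text", "Running the next step now."))
sign_off = msg("assistant", ("text", DONE_MARKER))

mid_run_tool = [kickoff, tool_call, more_work]   # false-positive (2)
mid_run_echo = [kickoff, echo, more_work]        # false-positive (3)
finished = [kickoff, echo, tool_call, more_work, sign_off]

for name, run in [("tool", mid_run_tool), ("echo", mid_run_echo), ("done", finished)]:
    print(name, completed_old(run), completed_new(run))
```

In this sketch the old check reports completion for both mid-run transcripts, while the new check reports it only for `finished`. One property worth keeping in mind: an assistant message made up solely of tool parts yields empty prose and so does not replace `last_prose`, meaning tool-only activity after a genuine-looking sign-off would not flip the result back to false.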