When [watchdog].log_tokens (or [loop].log_tokens) is true, the watchdog records
for each phase how many tokens each agent used (and the total) and how long the
phase took, appended to <log_dir>/token-log.jsonl. Tokens are summed from each
agent's session transcript, attributed by working dir. View with `agents.py
tokens`. Baseline snapshot at phase start + delta at phase advance/complete;
robust across watchdog restarts. Validated: the transcript sum matches an
independent external collector exactly.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Unit tests (no agents/tmux): config load + defaults merge, kickoff-template
assembly, phase machine (advance/idempotent-complete/append-resumes), limit
reset-banner parsing, WAITING-UNTIL/stall parsing, claude+opencode activity
detectors. Live smokes bring a throwaway project up THROUGH agents.py on each
real backend in an isolated sandbox (unique prefix, opencode on a non-4096
port), verify attach + status + down, and clean up. tests/run.sh runs unit
always + smokes when backends present; README documents it.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>