feat: optional log_tokens — per-phase token + time accounting

When [watchdog].log_tokens (or [loop].log_tokens) is true, the watchdog records, for each phase, how many tokens each agent used (and the total) and how long the phase took, appended to <log_dir>/token-log.jsonl.

Tokens are summed from each agent's session transcript and attributed by working dir. View the report with `agents.py tokens`.

The watchdog takes a baseline snapshot at phase start and writes the delta at phase advance/complete, so the accounting is robust across watchdog restarts.

Validated: the transcript sum matches an independent external collector exactly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 21:48:17 +00:00
parent e0425e6108
commit 924874aafa
2 changed files with 149 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -63,6 +63,23 @@ heavy_interval       = 300   # seconds between heal + phase-advance checks
 limit_probe_fallback = 300   # re-probe cadence for a usage-limited agent when reset time is unparsable
 limit_reset_slack    = 45    # seconds to wait past a parsed reset before probing
 stall_grace          = 180   # seconds of slack past a WAITING-UNTIL marker before a stall reboot
+log_tokens           = false # opt-in: record per-phase token + time usage (see below)
+```
+
+**Per-phase token + time logging (`log_tokens`).** Set `log_tokens = true` (under `[watchdog]` or
+`[loop]`) and the watchdog records, for **each phase**, how many tokens **each agent** used and how
+long the phase took — appended as one JSON object per phase to `<log_dir>/token-log.jsonl`. Tokens
+are summed from each agent's Claude Code session transcript and attributed **by working dir**, so
+give each agent its own `dir` (the Builder/Adversary loop pair already uses separate clones) for
+accurate per-agent numbers. The watchdog snapshots a baseline when a phase starts and writes the
+delta (per agent, and the total) when the phase advances or the sequence completes — robust across
+watchdog restarts. Pretty-print it with `agents.py tokens`:
+
+```
+phase        dur(s)   builder adversary         TOTAL
+-----------------------------------------------------
+lex           372.0 3,910,118 3,221,447     7,131,565
+parse         410.5 ...
 ```

 ### `[defaults]` — inherited by every agent
@@ -240,6 +257,7 @@ agents.py status                   table of every agent: kind, backend, model, w
 agents.py watchdog                 the supervisor loop (what the <prefix>watchdog session runs)
 agents.py logs <name>              tail that session's log
 agents.py phase [show|next|set N]  inspect / move the loop phase index
+agents.py tokens                   per-phase token + time report (when [watchdog].log_tokens = true)
 agents.py selftest                 regression-test the backend activity detector (needs no config)
 agents.py init [dir]               scaffold a starter agents.toml + prompts/ in a project dir
  --config PATH                    use a specific config (default: ./agents.toml)
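
Note: the baseline-plus-delta accounting described in the commit message could be sketched roughly as below. This is an illustration only, not the code from this commit: the function names (`start_phase`, `finish_phase`), the state file, and the record fields (`duration_s`, `agents`, `total`) are all hypothetical, and the real `token-log.jsonl` schema is not shown in this diff. It assumes the caller already has cumulative per-agent token totals, and it persists the baseline to disk so a restarted watchdog can still compute the delta.

```python
import json
import time
from pathlib import Path


def start_phase(state_path, phase, totals, now=None):
    """Persist a baseline of cumulative per-agent token totals at phase start.

    Hypothetical helper: writing the baseline to disk is what lets the
    delta survive a watchdog restart in this sketch.
    """
    baseline = {
        "phase": phase,
        "started": time.time() if now is None else now,
        "totals": dict(totals),
    }
    Path(state_path).write_text(json.dumps(baseline), encoding="utf-8")
    return baseline


def finish_phase(state_path, log_path, totals, now=None):
    """Append one JSON object holding the per-agent delta since the baseline."""
    baseline = json.loads(Path(state_path).read_text(encoding="utf-8"))
    ended = time.time() if now is None else now
    # Delta per agent: current cumulative total minus the stored baseline.
    per_agent = {
        agent: count - baseline["totals"].get(agent, 0)
        for agent, count in totals.items()
    }
    record = {
        "phase": baseline["phase"],
        "duration_s": round(ended - baseline["started"], 1),  # assumed field name
        "agents": per_agent,                                   # assumed field name
        "total": sum(per_agent.values()),                      # assumed field name
    }
    # One JSON object per line, appended, as in a JSONL log.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Calling `start_phase` when a phase begins and `finish_phase` when it advances yields one line per phase. Because only the difference of two cumulative totals is logged, tokens spent before the phase started are never attributed to it.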