agent-orchestrator

Author	SHA1	Message	Date
mfowler	c591f5430d	tests: cover build-aware stall detection. 13 tests across 4 classes: the build-proc match set (build tools match; python/node/bash/claude and substring look-alikes don't), _proc_descendants (real child tree, roots excluded), _build_running (session-scoped, comm-matched, never the claude root, custom regex override), and stall_check_one's defer/reboot/hard-cap behavior. Full suite: 64 pass.	2026-07-08 16:53:08 +00:00
mfowler	ce66948245	watchdog: build-aware stall detection; defer reboot while a real build/test runs (scoped to the watched session's child procs), with a stall_idle_max hard cap. A silent pane whose claude session has a live descendant compile/coverage/test process (cargo, rustc, cc1, llvm-cov, lichen-server, chromium, …) is a running build, not a stall. _build_running() inspects ONLY the descendants of that session's pane_pid (never the claude root, whose args embed the prompt), matching on process comm. It defers the kill+reboot until the build finishes, but never past stall_idle_max (default 1800s), so a hung build still recovers. Configurable via build_procs_re / stall_idle_max.	2026-07-08 16:42:57 +00:00
mfowler	071d74f21f	feat(tools): tangled_pr.py files a Tangled PR via the appview OAuth web endpoint. Reusable utility (not project-specific): Tangled's appview only indexes pulls created through its OAuth-authenticated web endpoint POST /{owner}/{repo}/pulls/ (it fetches the knot patch and inserts it into its DB directly); a raw com.atproto.repo.createRecord does NOT index. This tool reuses a browser session cookie (gitignored .tangled-session) to POST the target/source branch names. No blob upload, no CSRF. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6	2026-07-08 05:10:38 +00:00
mfowler	997f73af1f	fix(watchdog): don't treat the idle footer's '· N shells' as active (default active_re). The default active_re matched a bare middle-dot+number, which also matches the TUI idle footer's '· 3 shells' / '· 1 shell still running'. Any background shell then read as ACTIVE, masking a genuine idle from the stall detector: the agent hung at an empty/stranded prompt and never rebooted. Drop the timer token; genuine activity is 'esc to interrupt'/'Running tool' plus the log-recently-touched grace. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6	2026-07-07 20:02:05 +00:00
mfowler	234d6a054e	fix(pipeline): PIPELINE-COMPLETE mirrors the live stage list, not a write-once latch. pipeline_check wrote PIPELINE-COMPLETE on the first tick at which all then-configured stage markers existed, and never cleared it. Appending stages to an already-completed pipeline (as the orchestrator did, adding rust-linecov-e2e and then two -realpds stages after the fork sequence finished) left a stale 'done' sentinel: the new stages ran, but the file falsely reported the pipeline complete, misleading operators reading state. Now the sentinel is recomputed every tick from ALL current stages' markers: written only when every marker exists, and CLEARED the moment any stage is incomplete, so appending stages self-corrects. Gated on the markers directly, not on 'active is None' (which is also None when a stage names a missing agent). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6	2026-07-07 19:25:12 +00:00
mfowler	08fd58ccc0	feat(pipeline): push-on-retire, so no commits are stranded when a stage completes. The pipeline retires a stage the instant its completion marker appears; if the agent wrote the marker before its final 'git push' landed, its last commits were stranded locally (observed: a review stage left 7 unpushed commits). Now, before retiring a COMPLETED stage, the watchdog does a best-effort 'git push origin HEAD:main' in that stage's dir (never raises; 90s timeout), so the deliverable reaches the remote unless the push itself fails. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6	2026-07-04 17:47:23 +00:00
mfowler	b739504e1f	feat(watchdog): [pipeline], sequential standalone-agent phases via completion markers. Adds a [pipeline] block: an ordered sequence of DISTINCT agents (different prompt/model/dir) run one at a time, advancing when each writes its completion marker. It is the standalone-agent analog of the [loop] phase machine (which drives FIXED loop agents via ## DONE). The watchdog reconciles it every tick, statelessly (markers are the source of truth): exactly the first not-yet-complete stage runs, earlier stages are retired, and the active stage is stall/heal-watched. Pipeline agents are enabled=false (the pipeline owns their lifecycle). Also fixes a latent watchdog crash: wake_elapsed is built once at startup but config is re-read each tick, so removing an agent's 'wake' mid-run (e.g. winding down a wake) hit agent['wake'] -> KeyError and silently killed the whole watchdog (stalling phase advancement and stall recovery). Agents whose wake was removed are now skipped. IDEAS.md: note that [pipeline] and [loop].phases are the same shape and should be unified via an optional per-phase agent/dir/done (fold pipeline into the phase machine and delete the parallel path). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6	2026-07-04 13:45:47 +00:00
mfowler	08bbb60343	fix(watchdog): stop phase-machine handoff/gate-token work after SEQUENCE-COMPLETE. Gate-token tracking and handoff pings kept running on the completed phase machine, churning 0-token gate records every tick. Gate them on `not seq_done`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6	2026-06-24 15:38:02 +00:00
mfowler	164df87e98	fix(wake): persistent-agent wakes survive SEQUENCE-COMPLETE. The watchdog gated ALL scheduled wakes behind `if not seq_done`, so once a phase sequence completed, even a persistent operator-facing supervisor stopped waking. That breaks follow-on supervision (e.g. a second build started after the first sequence finishes). Now loop-tied wakes (on-demand auditor etc.) still go quiet after completion, but persistent agents keep waking, so their hourly supervision survives. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6	2026-06-24 15:36:02 +00:00
mfowler	44bb1da1be	feat(watchdog): DONE-nudge for ceremony-lag (built-but-unmarked phase) before kill+reboot. Recurring stall: a phase is substantively complete (all DoD gates PASS from both adversaries, no veto) but the builder never writes the done marker, so auto-advance cannot fire and the loops idle. A blunt stall kill+reboot does not fix it (the re-kicked-off agent just idles again). On a stall, if the agent is a loop agent and the current phase is NOT marked done, send a one-time DONE-nudge (ping) telling it to write the done marker IF the DoD is met (both adversaries PASS, no veto), giving it a fresh idle window; escalate to the kill+reboot only if it stays stalled. One nudge per phase (cleared on phase advance). Gated by [loop].done_nudge (default true); the message uses the configured done_marker and the phase status file. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6	2026-06-24 02:40:14 +00:00
mfowler	e6b53513d4	feat(wake): re-run one-shot task agents on their wake interval (autonomous cadence). wake_agent only re-prompted a live persistent/loop session and returned False for a dead one, so a "task" agent (one-shot, exits after its run) could not be re-run on a schedule: its wake never fired. Now, for kind=="task", a wake kills and restarts the task for a clean re-run (skipping only while its previous run is still active). This makes scheduled work like a coverage audit recur autonomously, with no operator trigger. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6	2026-06-23 05:17:07 +00:00
mfowler	65ceeb3a7b	fix(watchdog): seed the stall clock from the pane's real last activity, not watchdog start. Stall detection tracked idle time in an in-memory _idle_since map seeded to now() on first observation, so a freshly (re)started watchdog reset every agent's stall clock and had to wait a full stall_idle before it could nudge: an agent idle for an hour looked freshly idle after a watchdog restart. Seed from the tmux window's last-activity timestamp (#{window_activity}) instead, so idle duration reflects the agent's real last activity regardless of when the watchdog started. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6	2026-06-23 04:40:34 +00:00
mfowler	57082acc05	fix(tokens): restore token_phase_flush re-baseline; drop stray block from gate_token_check. The per-gate functions were inserted immediately after token_phase_flush's log line, which split the function: its trailing re-baseline block (the 'if next_phase_id is not None: ...' that re-seeds the per-phase baseline for the next phase, or finalizes when None) was orphaned onto the end of gate_token_check, where next_phase_id is undefined. The watchdog therefore crashed with NameError on the first tick of every start. Move that block back into token_phase_flush (where next_phase_id/cur/sf are in scope) and end gate_token_check at its log line. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6	2026-06-22 07:27:50 +00:00
mfowler	188c12ad9e	feat: configurable per-gate token logging + responsive phase auto-advance. Two watchdog/metrics improvements to the loop machine: 1) Token-logging granularity is configurable via [watchdog].token_granularity: 'gate' (default) or 'phase'. In 'gate' mode, tokens are attributed to each claimed gate, i.e. any 'claim(<label>)' commit on the work repo's origin/main (e.g. claim(D1-D5), claim(feat:multi-file); a leading 'feat:' is stripped), in addition to the per-phase rollup, and appended to token-log.jsonl tagged phase_id='<phase>:<label>'. A change in the most recently claimed label marks the boundary; the in-flight gate is also flushed when the phase ends. 'phase' mode keeps the original per-phase-only behaviour. 2) Phase auto-advance is now evaluated on EVERY signal tick instead of only the heavy tick, so a completed phase advances within signal_interval of its '## DONE' landing rather than idling for up to heavy_interval. Healing stays on the heavy cadence. Note: gate-boundary detection assumes the loop's 'claim(<label>)' commit convention. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6	2026-06-22 05:15:08 +00:00
mfowler	98d198baa9	feat(handoff): claim_pings/review_pings accept a list, so the watchdog can ping every reviewer. Multi-reviewer setups (e.g. a correctness + a readability adversary) can now have the watchdog ping ALL reviewers on a claim, each in its own session with its own submit key. A bare string still works (single agent). Adds a _ping_agents() helper.	2026-06-22 00:24:41 +00:00
mfowler	781db071dd	docs(readme): add Examples section (Builder/Adversary variants, snakepit) + benchmark note	2026-06-16 02:35:40 +00:00
mfowler	90375f004e	docs(examples): add builder-adversary-deferred, verifying after a long segment. Coarsest review cadence: the Builder self-certifies the build phases and the Adversary does ONE comprehensive cold verification of the whole accumulated build in a final `review` phase (vs. per-phase in orig, per-gate in lean). Full original prompts plus a DEFERRED REVIEW CADENCE override, so it isolates verification cadence. Cheapest coordination; the trade-off is that the independent check arrives late (risk of late rework, plus self-certification drift on build phases). The README spells this out. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 00:02:44 +00:00
mfowler	c6c7ce8640	change: base stateless + lean on the FULL original prompts (not minimal). This way "stateless vs builder-adversary" and "lean vs stateless" isolate context hygiene / review granularity WITHOUT the confound of the minimal prompts' reduced testing pressure (which we found cuts ~25% of test methods). stateless = orig + context hygiene; lean = orig + context hygiene + per-gate review. min stays the pure minimal-prompt variant (isolates verbosity vs orig). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 03:17:47 +00:00
mfowler	a0f7652e9e	docs(examples): add builder-solo, a single builder with no adversary (control). A single Builder that builds AND self-verifies (same DoD rigor), with NO independent Adversary and no claim/review handoff. It is the control for measuring what the AI adversary costs (its tokens, ~half of a loop-pair run) and what it buys (independent cold verification vs. self-certification). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 02:34:50 +00:00
mfowler	924874aafa	feat: optional log_tokens, per-phase token + time accounting. When [watchdog].log_tokens (or [loop].log_tokens) is true, the watchdog records for each phase how many tokens each agent used (and the total) and how long the phase took, appended to <log_dir>/token-log.jsonl. Tokens are summed from each agent's session transcript, attributed by working dir. View with `agents.py tokens`. A baseline snapshot is taken at phase start and the delta is recorded at phase advance/complete, which is robust across watchdog restarts. Validated: the transcript sum matches an independent external collector exactly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 21:48:17 +00:00
mfowler	e0425e6108	docs(examples): add builder-adversary-lean, context hygiene + per-gate review. Isolates the two effects conflated in builder-adversary-stateless: keeps all the CONTEXT HYGIENE (compact/diffs/lean loads) but ENFORCES full per-gate review granularity (one claim per gate, one independent verdict per gate, no batching). Tests whether the token saving is real efficiency or reduced scrutiny. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 21:42:12 +00:00
mfowler	985d33dd51	docs(examples): add builder-adversary-stateless, a context-lean variant. Same pattern + AI-as-adversary verification as builder-adversary-min, but the role prompts add CONTEXT HYGIENE: /compact at every checkpoint (lossless, since state is on disk), read diffs not trees, spill bulk output to files, and the adversary loads only {plan, STATUS, diff}. Loop agents are non-resumed → fresh session per phase. Targets cache-read (the dominant cost in a long loop) without changing what the agents do or how they verify. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 20:47:58 +00:00
mfowler	737ef81066	docs(examples): add builder-adversary-min, a minimal-prompt variant. Same topology/behaviour as builder-adversary (loop pair, phase machine, claim()/review() handoff, machine-docs coordination, cold verification), but the role + kickoff prompts are compressed to minimal tokens while keeping every load-bearing rule. Config and plans are unchanged. The separate agent-orchestrator-benchmark repo runs a head-to-head token comparison. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 20:18:33 +00:00
mfowler	11843f41a4	docs(examples): add IDEAS.md, a backlog of creative example topologies. A sketch backlog of further examples, each teaching a distinct orchestration topology (anthill/stigmergy, kitchen line/pipeline, incident room/blackboard, senate/debate, baton/mutex+failover, immune system/reactive, evolution chamber, plus ATC and day-night extras). Not implemented; ideas only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 18:13:48 +00:00
mfowler	e4453dcfdd	docs(examples): add the "snake pit" worker-pool example. Based on @ponder.ooo's "snake pit agent orchestrator" idea (bsky 2026-05-28) and Claude's metaphor-mapping elaboration: agents are snakes, tasks are food tossed into a shared pit; snakes devour/digest/regurgitate/excrete. A worker-pool-over-a-shared-queue topology (contrast the builder-adversary phase machine): - pit/ is a filesystem queue; snakes claim by atomic mv (no two eat the same food) - species = specialized agents: keeper (zookeeper), planner (regurgitation IS task decomposition), snake-1..3 (worker pool), cleanup (scavenger + coprophagy) - no [loop] phase machine; persistent agents self-pace via /loop - README carries the full bio→compute mapping table from the thread image. Verified: `agents.py status --config agents.toml` lists all 6 agents + service. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 17:50:42 +00:00
mfowler	7f237a522c	docs(examples): add a Builder/Adversary loop-pair example (the cc-ci pattern). A self-contained examples/builder-adversary/ that distills the cc-ci production loop pair into a tiny, fully local task (build a `wc` CLI in two phases): - agents.toml: builder + adversary loops, persistent orchestrator, on_complete reporter, cleanlogs service; phase machine with a per-phase model override - prompts/: kickoff template + builder/adversary roles carrying the load-bearing protocol (claim()/review() handoff, machine-docs file-location rule, WHAT+HOW+EXPECTED+WHERE=STATUS / WHY=JOURNAL anti-anchoring, WAITING-UNTIL liveness) - plans/: two phase plans (wc, json), each with a cold-verifiable Definition of Done - README: how to run, the work-repo two-clone isolation model, how to adapt. Verified: `agents.py status --config agents.toml` parses and lists all agents. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 17:50:42 +00:00
autonomic-bot	cdcece9a9a	test: add tests/ with a unit suite, isolated live claude/opencode smokes, and a runner. Unit tests (no agents/tmux): config load + defaults merge, kickoff-template assembly, phase machine (advance/idempotent-complete/append-resumes), limit reset-banner parsing, WAITING-UNTIL/stall parsing, claude+opencode activity detectors. Live smokes bring a throwaway project up THROUGH agents.py on each real backend in an isolated sandbox (unique prefix, opencode on a non-4096 port), verify attach + status + down, and clean up. tests/run.sh always runs the unit tests, plus the smokes when backends are present; the README documents it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 18:55:34 +00:00
autonomic-bot	289ef07df4	feat: agent-orchestrator v0.1.0, a generic multi-agent harness. Extracted and generalized from a project-specific agent launch engine. No project specifics remain in code: paths, the loop kickoff preamble, handoff conventions, and the on-complete hook are all config/template driven; session_prefix + log_dir are required. - agents.py: driver + watchdog (data-driven backends via prompt_delivery arg|ping|exec; required session_prefix/log_dir; project-rooted path resolution; configurable kickoff template, handoff patterns, on_complete task; tmux-safe; selftest + init verbs) - agent-log.py: config-driven claude transcript renderer - agents.example.toml: self-contained 2-agent example (dependency-free demo backend) - prompts/: generic builder/adversary/kickoff templates - smoke.sh: isolated up+down sandbox proof that cleans up after itself - flake.nix/.lock: devShell (python311 + tmux + git) - README.md: schema + verbs + AI-PO usage + nix Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> v0.1.0	2026-06-13 18:39:00 +00:00

28 Commits
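The build-aware stall detection in commit ce66948245 (defer the kill+reboot while a descendant of the session's pane_pid runs a build tool, capped by stall_idle_max) can be illustrated with the minimal sketch below. It is not the project's _build_running() or stall_check_one; the helper names, the process-table maps (which on Linux could be filled from `ps -e -o pid=,ppid=,comm=`), and the regex (which lists only the tool names the commit spells out) are all assumptions.

```python
import re
from typing import Dict, List, Mapping, Set

# Hypothetical pattern: only the build tools named in commit ce66948245.
# The real build_procs_re default lists more names (elided as "…" in the log).
BUILD_PROCS_RE = re.compile(r"cargo|rustc|cc1|llvm-cov|lichen-server|chromium")


def proc_descendants(parent_of: Mapping[int, int], root: int) -> Set[int]:
    """Return every transitive child of `root`; `root` itself is excluded."""
    children: Dict[int, List[int]] = {}
    for pid, ppid in parent_of.items():
        children.setdefault(ppid, []).append(pid)
    found: Set[int] = set()
    stack = list(children.get(root, []))
    while stack:
        pid = stack.pop()
        if pid not in found:
            found.add(pid)
            stack.extend(children.get(pid, []))
    return found


def build_running(parent_of: Mapping[int, int], comm_of: Mapping[int, str],
                  pane_pid: int, pattern: "re.Pattern[str]" = BUILD_PROCS_RE) -> bool:
    """True if a descendant of pane_pid has a comm that is exactly a build tool.

    Only comm is matched (never argv, which for claude embeds the prompt), and
    fullmatch rejects substring look-alikes such as "cargo-watch".
    """
    return any(pattern.fullmatch(comm_of.get(pid, ""))
               for pid in proc_descendants(parent_of, pane_pid))


def stall_decision(idle_s: float, stall_idle: float, stall_idle_max: float,
                   building: bool) -> str:
    """Return "ok", "defer" (build running, under the hard cap) or "reboot"."""
    if idle_s < stall_idle:
        return "ok"
    if building and idle_s < stall_idle_max:
        return "defer"
    return "reboot"
```

In a check like `stall_decision(900, 600, 1800, True)` the result is "defer"; past 1800s the same call returns "reboot", which is how a hung build still recovers. The 600s stall_idle is an arbitrary example value; only the 1800s stall_idle_max default comes from the commit.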
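The active_re fix in commit 997f73af1f hinges on a regex detail: a bare middle-dot+number token also matches the idle footer. The sketch below shows that failure shape and a pattern keyed only on the two activity tokens the commit names, plus the log-recently-touched grace. Both regexes and the 30s grace_s are hypothetical stand-ins, not the project's actual default active_re.

```python
import re

# Illustrates the bug's shape only (not the actual old default): a bare
# middle-dot + number also matches the idle footer "· 3 shells".
BARE_DOT_NUMBER = re.compile(r"·\s*\d+")

# Hypothetical fixed pattern keyed only on the genuine-activity tokens.
ACTIVE_RE = re.compile(r"esc to interrupt|Running tool")


def pane_is_active(pane_text: str, log_age_s: float, grace_s: float = 30.0) -> bool:
    """Active if the pane shows an activity token or the log was touched within grace_s."""
    return ACTIVE_RE.search(pane_text) is not None or log_age_s < grace_s
```

With this pattern a pane showing only "· 3 shells" reads as idle unless its log was written within the grace window.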
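Commits b739504e1f and 234d6a054e describe a stateless pipeline reconcile: the first stage without a completion marker is active, and the PIPELINE-COMPLETE sentinel is recomputed from all current markers on every tick rather than latched. A minimal sketch under those rules follows; reconcile_pipeline and the marker paths are hypothetical, not pipeline_check itself.

```python
from pathlib import Path
from typing import List, Optional


def reconcile_pipeline(markers: List[Path], sentinel: Path) -> Optional[int]:
    """Return the index of the active stage (first without a marker), or None if all are done.

    The sentinel mirrors the live stage list: it is written only when every
    marker exists and removed as soon as any stage is incomplete.
    """
    done = [m.exists() for m in markers]
    if all(done):
        sentinel.write_text("complete\n")
        return None
    sentinel.unlink(missing_ok=True)  # clear a stale sentinel (Python 3.8+)
    return done.index(False)
```

Because the result depends only on which marker files exist, appending a stage to a finished pipeline makes the next call return that stage's index and remove the stale sentinel.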
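Push-on-retire (commit 08fd58ccc0) is a best-effort `git push origin HEAD:main` in the stage's directory with a 90s timeout that never raises. One way to express that contract with the standard library is sketched below; the function name is hypothetical.

```python
import subprocess
from pathlib import Path


def push_on_retire(stage_dir: Path, timeout_s: float = 90.0) -> bool:
    """Best-effort push before retiring a completed stage. Never raises.

    Returns True only if git exited 0. A missing git binary or directory
    (OSError) and a timeout (TimeoutExpired, a SubprocessError) return False.
    """
    try:
        proc = subprocess.run(
            ["git", "push", "origin", "HEAD:main"],
            cwd=stage_dir, capture_output=True, text=True, timeout=timeout_s,
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return proc.returncode == 0
```

Returning a bool instead of raising lets the caller log the failure and still retire the stage.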
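Three wake commits interact: b739504e1f skips agents whose wake was removed mid-run (avoiding the agent['wake'] KeyError), 164df87e98 keeps only persistent agents waking after SEQUENCE-COMPLETE, and e6b53513d4 restarts one-shot task agents unless their previous run is still active. The sketch below combines them. The config shape (wake as an interval in seconds, kind as "persistent", "loop" or "task") and the function name are assumptions.

```python
from typing import AbstractSet, Dict, Mapping


def due_wakes(agents: Mapping[str, dict], wake_elapsed: Mapping[str, float],
              seq_done: bool, running: AbstractSet[str]) -> Dict[str, str]:
    """Map agent name -> "prompt" or "restart" for every wake that should fire now.

    Hypothetical schema: agents[name]["wake"] is an interval in seconds and
    agents[name]["kind"] is "persistent", "loop" or "task".
    """
    actions: Dict[str, str] = {}
    for name, elapsed_s in wake_elapsed.items():  # built once at startup
        agent = agents.get(name) or {}            # config is re-read every tick
        interval = agent.get("wake")              # .get, not ["wake"]: it may be gone
        if interval is None:
            continue
        kind = agent.get("kind")
        if seq_done and kind != "persistent":
            continue  # loop-tied wakes go quiet after SEQUENCE-COMPLETE
        if elapsed_s < interval:
            continue
        if kind == "task":
            if name not in running:
                actions[name] = "restart"  # one-shot task: kill + clean re-run
        else:
            actions[name] = "prompt"  # live session: re-prompt it
    return actions
```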
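The DONE-nudge in commit 44bb1da1be is a small state machine: on a stall of a loop agent whose phase is unmarked, nudge once per phase, reset on phase advance, and otherwise fall through to kill+reboot. A minimal sketch follows. The class and function names and the message wording are hypothetical; the commit says only that the message uses the configured done_marker and the phase status file.

```python
from dataclasses import dataclass, field
from typing import Set


def done_nudge_message(done_marker: str, status_file: str) -> str:
    # Hypothetical wording built from the two configured values.
    return (f"If this phase's DoD is met (both adversaries PASS, no veto), "
            f"write '{done_marker}' to {status_file} now.")


@dataclass
class DoneNudger:
    enabled: bool = True  # mirrors [loop].done_nudge (default true)
    nudged: Set[str] = field(default_factory=set)  # phases already nudged

    def on_phase_advance(self) -> None:
        self.nudged.clear()

    def on_stall(self, is_loop_agent: bool, phase_id: str, phase_done: bool) -> str:
        """Return "nudge" once per unmarked phase, otherwise "reboot"."""
        if (self.enabled and is_loop_agent and not phase_done
                and phase_id not in self.nudged):
            self.nudged.add(phase_id)
            return "nudge"
        return "reboot"
```

A second stall in the same phase returns "reboot", matching the commit's "only escalate to the kill+reboot if it stays stalled".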
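Commit 65ceeb3a7b seeds the stall clock from tmux's `#{window_activity}` (epoch seconds of the window's last activity) instead of from the watchdog's start time. The sketch below queries it with `tmux display-message -p` and falls back to "now" when tmux is unavailable. The function names are hypothetical, and the command runner is injectable so the logic can be exercised without a live tmux server.

```python
import subprocess
from typing import Any, Callable, Dict, Optional

Runner = Callable[..., Any]


def window_last_activity(target: str, run: Runner = subprocess.run) -> Optional[int]:
    """Epoch seconds of the tmux window's last activity, or None if unavailable."""
    try:
        out = run(["tmux", "display-message", "-p", "-t", target, "#{window_activity}"],
                  capture_output=True, text=True, timeout=5)
    except (OSError, subprocess.SubprocessError):
        return None
    text = (out.stdout or "").strip()
    return int(text) if out.returncode == 0 and text.isdigit() else None


def idle_seconds(idle_since: Dict[str, float], target: str, now: float,
                 run: Runner = subprocess.run) -> float:
    """Seed _idle_since-style state from tmux on first sight instead of from now."""
    if target not in idle_since:
        last = window_last_activity(target, run)
        idle_since[target] = float(last) if last is not None else now
    return now - idle_since[target]
```

An agent idle for an hour therefore reads as idle for an hour on the watchdog's first tick, rather than as freshly idle.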
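Per-gate token logging in commit 188c12ad9e keys on the most recent 'claim(<label>)' commit, strips a leading 'feat:', treats a label change as a gate boundary, and tags records phase_id='<phase>:<label>'. The parsing side is sketched below, taking commit subjects newest-first (for instance from `git log --format=%s origin/main`). The regex and helper names are assumptions, not the watchdog's code.

```python
import re
from typing import Iterable, Optional, Tuple

CLAIM_RE = re.compile(r"claim\(([^)]+)\)")


def latest_claim_label(subjects_newest_first: Iterable[str]) -> Optional[str]:
    """Label of the most recent claim(<label>) commit, with a leading 'feat:' stripped."""
    for subject in subjects_newest_first:
        m = CLAIM_RE.search(subject)
        if m:
            label = m.group(1).strip()
            if label.startswith("feat:"):
                label = label[len("feat:"):].strip()
            return label
    return None


def gate_boundary(prev_label: Optional[str],
                  subjects_newest_first: Iterable[str]) -> Tuple[Optional[str], bool]:
    """Return (current label, True if it differs from prev_label, i.e. flush the old gate)."""
    cur = latest_claim_label(subjects_newest_first)
    return cur, prev_label is not None and cur != prev_label


def gate_phase_id(phase: str, label: str) -> str:
    return f"{phase}:{label}"  # tag for token-log.jsonl records
```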
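The snake-pit example (commit e4453dcfdd) has snakes claim food from a pit/ directory by atomic mv, so no two eat the same item. The sketch below shows that claim with os.rename, which POSIX makes atomic within a single filesystem: the loser of a race gets FileNotFoundError and moves on. The per-snake "belly" directory layout and the function name are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def claim_food(pit: Path, belly: Path) -> Optional[Path]:
    """Atomically move one task file from pit/ into this snake's own directory."""
    belly.mkdir(parents=True, exist_ok=True)
    for food in sorted(pit.iterdir()):
        if not food.is_file():
            continue
        target = belly / food.name
        try:
            os.rename(food, target)  # atomic on one filesystem: only one snake wins
        except FileNotFoundError:
            continue  # another snake ate it first
        return target
    return None
```

Keeping pit/ and every belly on the same filesystem matters; across filesystems a move is a copy plus delete and loses this guarantee.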