agent-orchestrator

Author	SHA1	Message	Date
mfowler	57082acc05	fix(tokens): restore token_phase_flush re-baseline; drop stray block from gate_token_check The per-gate functions were inserted immediately after token_phase_flush's log line, which split the function: its trailing re-baseline block (the 'if next_phase_id is not None: ...' that re-seeds the per-phase baseline for the next phase, or finalizes when None) was orphaned onto the end of gate_token_check, where next_phase_id is undefined. The watchdog therefore crashed with NameError on the first tick of every start. Move that block back into token_phase_flush (where next_phase_id/cur/sf are in scope) and end gate_token_check at its log line. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6	2026-06-22 07:27:50 +00:00
mfowler	188c12ad9e	feat: configurable per-gate token logging + responsive phase auto-advance Two watchdog/metrics improvements to the loop machine: 1) Token-logging granularity is configurable via [watchdog].token_granularity: 'gate' (default) or 'phase'. In 'gate' mode, tokens are attributed to each claimed gate -- any 'claim(<label>)' commit on the work repo's origin/main (e.g. claim(D1-D5), claim(feat:multi-file); a leading 'feat:' is stripped) -- in addition to the per-phase rollup, appended to token-log.jsonl tagged phase_id='<phase>:<label>'. A change in the most-recently-claimed label is the boundary; the in-flight gate is also flushed when the phase ends. 'phase' mode keeps the original per-phase-only behaviour. 2) Phase auto-advance is now evaluated on EVERY signal tick instead of only the heavy tick, so a completed phase advances within signal_interval of its '## DONE' landing rather than idling up to heavy_interval. Healing stays on the heavy cadence. Note: gate-boundary detection assumes the loop's 'claim(<label>)' commit convention. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UWTdUq2bsic7JZGqJp3nD6	2026-06-22 05:15:08 +00:00
mfowler	98d198baa9	feat(handoff): claim_pings/review_pings accept a list — ping every reviewer Multi-reviewer setups (e.g. a correctness + a readability adversary) can now have the watchdog ping ALL reviewers on a claim, each in its own session with its own submit key. A bare string still works (single agent). _ping_agents() helper.	2026-06-22 00:24:41 +00:00
mfowler	781db071dd	docs(readme): add Examples section (Builder/Adversary variants, snakepit) + benchmark note	2026-06-16 02:35:40 +00:00
mfowler	90375f004e	docs(examples): add builder-adversary-deferred — verify after a long segment Coarsest review cadence: the Builder self-certifies the build phases and the Adversary does ONE comprehensive cold-verification of the whole accumulated build in a final `review` phase (vs orig per-phase, lean per-gate). Full original prompts + a DEFERRED REVIEW CADENCE override, so it isolates verification cadence. Cheapest coordination; the trade-off is the independent check arrives late (late rework risk + self-certification drift on build phases). README spells it out. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 00:02:44 +00:00
mfowler	c6c7ce8640	change: base stateless + lean on the FULL original prompts (not minimal) So that "stateless vs builder-adversary" and "lean vs stateless" isolate context hygiene / review granularity WITHOUT the confound of the minimal prompts' reduced testing pressure (which we found cuts ~25% of test methods). stateless = orig + context hygiene; lean = orig + context hygiene + per-gate review. min stays the pure minimal-prompt variant (isolates verbosity vs orig). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 03:17:47 +00:00
mfowler	a0f7652e9e	docs(examples): add builder-solo — single builder, no adversary (control) A single Builder that builds AND self-verifies (same DoD rigor), with NO independent Adversary and no claim/review handoff. The control for measuring what the AI adversary costs (its tokens, ~half of a loop-pair run) and buys (independent cold verification vs self-certification). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 02:34:50 +00:00
mfowler	924874aafa	feat: optional log_tokens — per-phase token + time accounting When [watchdog].log_tokens (or [loop].log_tokens) is true, the watchdog records for each phase how many tokens each agent used (and the total) and how long the phase took, appended to <log_dir>/token-log.jsonl. Tokens are summed from each agent's session transcript, attributed by working dir. View with `agents.py tokens`. Baseline snapshot at phase start + delta at phase advance/complete; robust across watchdog restarts. Validated: the transcript sum matches an independent external collector exactly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 21:48:17 +00:00
mfowler	e0425e6108	docs(examples): add builder-adversary-lean — context hygiene + per-gate review Isolates the two effects conflated in builder-adversary-stateless: keeps all the CONTEXT HYGIENE (compact/diffs/lean loads) but ENFORCES full per-gate review granularity (one claim per gate, one independent verdict per gate, no batching). Tests whether the token saving is real efficiency vs reduced scrutiny. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 21:42:12 +00:00
mfowler	985d33dd51	docs(examples): add builder-adversary-stateless — context-lean variant Same pattern + AI-as-adversary verification as builder-adversary-min, but the role prompts add CONTEXT HYGIENE: /compact at every checkpoint (lossless — state is on disk), read diffs not trees, spill bulk output to files, adversary loads only {plan, STATUS, diff}. Loop agents non-resumed → fresh session per phase. Targets cache-read (the dominant cost in a long loop) without changing what the agents do or how they verify. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 20:47:58 +00:00
mfowler	737ef81066	docs(examples): add builder-adversary-min — minimal-prompt variant Same topology/behaviour as builder-adversary (loop pair, phase machine, claim()/review() handoff, machine-docs coordination, cold verification) but the role + kickoff prompts are compressed to minimal tokens, keeping every load-bearing rule. Config and plans are unchanged. The separate agent-orchestrator-benchmark repo runs a head-to-head token comparison. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 20:18:33 +00:00
mfowler	11843f41a4	docs(examples): add IDEAS.md — backlog of creative example topologies A sketch backlog of further examples, each teaching a distinct orchestration topology (anthill/stigmergy, kitchen line/pipeline, incident room/blackboard, senate/debate, baton/mutex+failover, immune system/reactive, evolution chamber, plus ATC and day-night extras). Not implemented — ideas only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 18:13:48 +00:00
mfowler	e4453dcfdd	docs(examples): add the "snake pit" worker-pool example Based on @ponder.ooo's "snake pit agent orchestrator" idea (bsky 2026-05-28) and Claude's metaphor-mapping elaboration: agents are snakes, tasks are food tossed into a shared pit; snakes devour/digest/regurgitate/excrete. A worker-pool-over-a-shared-queue topology (contrast the builder-adversary phase machine): - pit/ is a filesystem queue; snakes claim by atomic mv (no two eat the same food) - species = specialized agents: keeper (zookeeper), planner (regurgitation IS task decomposition), snake-1..3 (worker pool), cleanup (scavenger + coprophagy) - no [loop] phase machine; persistent agents self-pace via /loop - README carries the full bio→compute mapping table from the thread image Verified: `agents.py status --config agents.toml` lists all 6 agents + service. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 17:50:42 +00:00
mfowler	7f237a522c	docs(examples): add a Builder/Adversary loop-pair example (the cc-ci pattern) A self-contained examples/builder-adversary/ that distills the cc-ci production loop pair into a tiny, fully-local task (build a `wc` CLI in two phases): - agents.toml: builder + adversary loops, persistent orchestrator, on_complete reporter, cleanlogs service; phase machine with a per-phase model override - prompts/: kickoff template + builder/adversary roles carrying the load-bearing protocol (claim()/review() handoff, machine-docs file-location rule, WHAT+HOW+EXPECTED+WHERE=STATUS / WHY=JOURNAL anti-anchoring, WAITING-UNTIL liveness) - plans/: two phase plans (wc, json) each with a cold-verifiable Definition of Done - README: how to run, the work-repo two-clone isolation model, how to adapt Verified: `agents.py status --config agents.toml` parses and lists all agents. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 17:50:42 +00:00
autonomic-bot	cdcece9a9a	test: add tests/ — unit suite + isolated live claude/opencode smokes + runner Unit tests (no agents/tmux): config load + defaults merge, kickoff-template assembly, phase machine (advance/idempotent-complete/append-resumes), limit reset-banner parsing, WAITING-UNTIL/stall parsing, claude+opencode activity detectors. Live smokes bring a throwaway project up THROUGH agents.py on each real backend in an isolated sandbox (unique prefix, opencode on a non-4096 port), verify attach + status + down, and clean up. tests/run.sh runs unit always + smokes when backends present; README documents it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 18:55:34 +00:00
autonomic-bot	289ef07df4	feat: agent-orchestrator v0.1.0 — generic multi-agent harness Extracted and generalized from a project-specific agent launch engine. No project specifics remain in code: paths, the loop kickoff preamble, handoff conventions, and the on-complete hook are all config/template driven; session_prefix + log_dir are required. - agents.py: driver + watchdog (data-driven backends via prompt_delivery arg|ping|exec; required session_prefix/log_dir; project-rooted path resolution; configurable kickoff template, handoff patterns, on_complete task; tmux-safe; selftest + init verbs) - agent-log.py: config-driven claude transcript renderer - agents.example.toml: self-contained 2-agent example (dependency-free demo backend) - prompts/: generic builder/adversary/kickoff templates - smoke.sh: isolated up+down sandbox proof that cleans up after itself - flake.nix/.lock: devShell (python311 + tmux + git) - README.md: schema + verbs + AI-PO usage + nix Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> v0.1.0	2026-06-13 18:39:00 +00:00
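
Commit 188c12ad9e describes gate boundaries as a change in the most recently claimed label, where a label comes from a `claim(<label>)` commit and a leading `feat:` is stripped. The sketch below illustrates that rule only; it is not the code in agents.py. The names `gate_label`, `gate_boundaries` and `tagged_phase_id` are hypothetical, and the input is assumed to be commit subjects in oldest-first order.

```python
import re
from typing import Iterable, Iterator, Optional

# Assumption: the label is whatever sits between "claim(" and the next ")".
CLAIM_RE = re.compile(r"claim\(([^)]+)\)")


def gate_label(subject: str) -> Optional[str]:
    """Return the gate label of a claim commit subject, or None if it is not a claim."""
    match = CLAIM_RE.search(subject)
    if match is None:
        return None
    label = match.group(1).strip()
    # The log says a leading "feat:" is stripped from the label.
    if label.startswith("feat:"):
        label = label[len("feat:"):]
    return label or None


def gate_boundaries(subjects: Iterable[str]) -> Iterator[str]:
    """Yield a label each time the most recently claimed label changes."""
    current = None
    for subject in subjects:
        label = gate_label(subject)
        if label is not None and label != current:
            current = label
            yield label


def tagged_phase_id(phase: str, label: str) -> str:
    """Build the phase_id='<phase>:<label>' tag mentioned in the log."""
    return f"{phase}:{label}"
```

Non-claim commits and repeated claims of the same label do not open a new gate in this sketch; only a different label does.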

16 Commits
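
Commit 924874aafa describes per-phase accounting as a baseline snapshot at phase start plus a delta at phase advance or completion, appended to token-log.jsonl. A minimal sketch of that arithmetic follows. The record field names (`agents`, `total`, `seconds`) and the function names are assumptions made for illustration; only `phase_id` and the JSONL format come from the log.

```python
import json
from typing import Dict


def phase_delta(baseline: Dict[str, int], current: Dict[str, int]) -> Dict[str, object]:
    """Tokens used per agent since the baseline snapshot, plus the total."""
    agents = sorted(set(baseline) | set(current))
    # An agent missing from either snapshot is treated as having 0 tokens there.
    per_agent = {a: current.get(a, 0) - baseline.get(a, 0) for a in agents}
    return {"agents": per_agent, "total": sum(per_agent.values())}


def log_record(phase_id: str, baseline: Dict[str, int], current: Dict[str, int],
               started: float, ended: float) -> str:
    """Render one JSONL line for a finished phase (hypothetical field names)."""
    record = {"phase_id": phase_id, "seconds": ended - started}
    record.update(phase_delta(baseline, current))
    return json.dumps(record, sort_keys=True)
```

Because the baseline is plain data, persisting it to disk at phase start is one way the delta could survive a watchdog restart, which is the robustness property the commit message mentions.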