Coarsest review cadence: the Builder self-certifies the build phases and the
Adversary does ONE comprehensive cold verification of the whole accumulated build
in a final `review` phase (vs. per-phase in orig, per-gate in lean). Uses the full
original prompts plus a DEFERRED REVIEW CADENCE override, so it isolates
verification cadence. Cheapest coordination; the trade-off is that the independent
check arrives late (late-rework risk, plus self-certification drift during the
build phases). The README spells it out.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
So that "stateless vs builder-adversary" and "lean vs stateless" isolate context
hygiene / review granularity WITHOUT the confound of the minimal prompts' reduced
testing pressure (which we found cuts ~25% of test methods). stateless = orig +
context hygiene; lean = orig + context hygiene + per-gate review. min stays the
pure minimal-prompt variant (isolates verbosity vs orig).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A single Builder that builds AND self-verifies (same DoD rigor), with NO
independent Adversary and no claim/review handoff. The control for measuring
what the AI adversary costs (its tokens, ~half of a loop-pair run) and buys
(independent cold verification vs self-certification).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
When [watchdog].log_tokens (or [loop].log_tokens) is true, the watchdog records
for each phase how many tokens each agent used (and the total) and how long the
phase took, appended to <log_dir>/token-log.jsonl. Tokens are summed from each
agent's session transcript and attributed by working directory. View with
`agents.py tokens`. A baseline snapshot is taken at phase start and the delta is
recorded at phase advance/complete; this is robust across watchdog restarts.
Validated: the transcript sum matches an independent external collector exactly.
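A minimal sketch of the baseline-plus-delta bookkeeping described above, assuming each agent exposes a cumulative token count. The function names and the record fields (`phase`, `agents`, `total`, `seconds`) are hypothetical and are not the actual schema of token-log.jsonl:

```python
import json
import time
from pathlib import Path


def snapshot(counts):
    """Record cumulative per-agent token counts and the wall-clock time at phase start."""
    return {"t": time.time(), "counts": dict(counts)}


def phase_record(phase, baseline, counts, now=None):
    """Build one log record: tokens used since the baseline, per agent and in total."""
    now = time.time() if now is None else now
    used = {agent: n - baseline["counts"].get(agent, 0) for agent, n in counts.items()}
    return {
        "phase": phase,
        "agents": used,
        "total": sum(used.values()),
        "seconds": round(now - baseline["t"], 3),
    }


def append_jsonl(log_dir, record):
    """Append one JSON object per line to <log_dir>/token-log.jsonl."""
    path = Path(log_dir) / "token-log.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return path
```

Persisting the baseline to disk (not shown) is one way such a delta could survive a watchdog restart; whether the real implementation does this is an assumption.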
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Isolates the two effects conflated in builder-adversary-stateless: keeps all the
CONTEXT HYGIENE (compact/diffs/lean loads) but ENFORCES full per-gate review
granularity (one claim per gate, one independent verdict per gate, no batching).
Tests whether the token saving is real efficiency vs reduced scrutiny.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Same pattern + AI-as-adversary verification as builder-adversary-min, but the
role prompts add CONTEXT HYGIENE: /compact at every checkpoint (lossless — state
is on disk), read diffs not trees, spill bulk output to files, adversary loads
only {plan, STATUS, diff}. Loop agents are not resumed, so each phase gets a
fresh session. Targets cache-read (the dominant cost in a long loop) without
changing what the agents do or how they verify.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Same topology/behaviour as builder-adversary (loop pair, phase machine,
claim()/review() handoff, machine-docs coordination, cold verification) but the
role + kickoff prompts are compressed to minimal tokens, keeping every
load-bearing rule. Config and plans are unchanged. The separate
agent-orchestrator-benchmark repo runs a head-to-head token comparison.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A sketch backlog of further examples, each teaching a distinct orchestration
topology (anthill/stigmergy, kitchen line/pipeline, incident room/blackboard,
senate/debate, baton/mutex+failover, immune system/reactive, evolution chamber,
plus ATC and day-night extras). Not implemented — ideas only.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Based on @ponder.ooo's "snake pit agent orchestrator" idea (bsky 2026-05-28) and
Claude's metaphor-mapping elaboration: agents are snakes, tasks are food tossed
into a shared pit; snakes devour/digest/regurgitate/excrete.
A worker-pool-over-a-shared-queue topology (contrast the builder-adversary phase
machine):
- pit/ is a filesystem queue; snakes claim by atomic mv (no two eat the same food)
- species = specialized agents: keeper (zookeeper), planner (regurgitation IS
  task decomposition), snake-1..3 (worker pool), cleanup (scavenger + coprophagy)
- no [loop] phase machine; persistent agents self-pace via /loop
- README carries the full bio→compute mapping table from the thread image
Verified: `agents.py status --config agents.toml` lists all 6 agents + service.
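The claim-by-atomic-mv rule above can be sketched as follows. This is an illustration only: the `claim` function, the per-snake destination directory, and the file naming are hypothetical, and the example's agents may simply run `mv` from the shell instead:

```python
import os
from pathlib import Path


def claim(pit, claimed_dir, snake):
    """Try to claim one food item from pit/ by renaming it into the snake's own directory.

    os.rename is atomic on POSIX when source and destination are on the same
    filesystem, so if two snakes race for the same file, only one rename succeeds.
    """
    dest_dir = Path(claimed_dir) / snake
    dest_dir.mkdir(parents=True, exist_ok=True)
    for food in sorted(Path(pit).iterdir()):
        if not food.is_file():
            continue
        try:
            os.rename(food, dest_dir / food.name)
        except FileNotFoundError:
            continue  # another snake got there first; try the next item
        return dest_dir / food.name
    return None  # pit is empty
```

The loser of a race sees the source file vanish and moves on, which is what makes a plain directory usable as a queue without a lock.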
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A self-contained examples/builder-adversary/ that distills the cc-ci production
loop pair into a tiny, fully-local task (build a `wc` CLI in two phases):
- agents.toml: builder + adversary loops, persistent orchestrator, on_complete
  reporter, cleanlogs service; phase machine with a per-phase model override
- prompts/: kickoff template + builder/adversary roles carrying the load-bearing
  protocol (claim()/review() handoff, machine-docs file-location rule,
  WHAT+HOW+EXPECTED+WHERE=STATUS / WHY=JOURNAL anti-anchoring, WAITING-UNTIL liveness)
- plans/: two phase plans (wc, json) each with a cold-verifiable Definition of Done
- README: how to run, the work-repo two-clone isolation model, how to adapt
Verified: `agents.py status --config agents.toml` parses and lists all agents.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Unit tests (no agents/tmux): config load + defaults merge, kickoff-template
assembly, phase machine (advance/idempotent-complete/append-resumes), limit
reset-banner parsing, WAITING-UNTIL/stall parsing, claude+opencode activity
detectors. Live smokes bring a throwaway project up THROUGH agents.py on each
real backend in an isolated sandbox (unique prefix, opencode on a non-4096
port), verify attach + status + down, and clean up. tests/run.sh always runs the
unit tests and runs the smokes when the backends are present; the README
documents it.
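For orientation, here is one plausible reading of the phase-machine properties under test (advance, idempotent complete, append resumes). The `PhaseMachine` class is a hypothetical in-memory stand-in, not the implementation in agents.py:

```python
class PhaseMachine:
    """Hypothetical phase machine: an ordered list of phases and a cursor."""

    def __init__(self, phases):
        self.phases = list(phases)
        self.index = 0

    @property
    def current(self):
        """Name of the active phase, or None once every phase is complete."""
        return self.phases[self.index] if self.index < len(self.phases) else None

    def complete(self, phase):
        """Mark `phase` complete; return True only if this call advanced the machine."""
        if phase != self.current:
            return False  # stale or repeated report: treated as a no-op
        self.index += 1
        return True

    def append(self, phase):
        """Add a phase at the end; a finished machine resumes at the new phase."""
        self.phases.append(phase)
```

Under this reading, reporting the same phase complete twice advances only once, and appending a phase to a finished machine makes that phase current again.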
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>