agent-orchestrator-benchmark

recipe-maintainers/agent-orchestrator-benchmark

Commit Graph

Author	SHA1	Message	Date

mfowler

37032ee363

feat: campaign mode — repeat each variant N times, aggregate distributions

run-harness-bench.sh now loops VARIANTS × BENCH_REPEATS (default 5), writes each
run's row to RESULTS-campaign.md.data immediately (survives interruption), and
aggregates per-variant median/mean/min/max/stdev + median duration into
RESULTS-campaign.md. Frees each run's repo/transcripts after tallying.

2026-06-14 22:19:10 +00:00

mfowler

11eda4a8b1

chore: gitignore the runner's transient .tmp file

2026-06-14 20:40:26 +00:00

mfowler

27df2c7b55

feat: agent-orchestrator-benchmark — prompt token comparison harness

A standalone repo (engine vendored as a submodule at the examples commit) that
runs a head-to-head between the builder-adversary and builder-adversary-min
example variants: same task, independent headless runs, both on Sonnet, with
token counts. Includes the roman-numeral test problem and run-bench.sh.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-14 20:20:05 +00:00

3 Commits
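
The campaign mode described in commit 37032ee363 (append each run's row to the data file right away, then aggregate per variant) might look roughly like the sketch below. This is not the repository's run-harness-bench.sh, which is a shell script whose contents are not shown here. The tab-separated row layout, the choice of token count as the aggregated metric, the use of sample standard deviation, and the function names `append_row` and `aggregate` are all assumptions made for illustration.

```python
import statistics
from pathlib import Path


def append_row(data_file: Path, variant: str, tokens: int, seconds: float) -> None:
    """Append one run's result immediately, so an interrupted campaign keeps its rows.

    Assumed row layout (hypothetical): variant <TAB> tokens <TAB> duration in seconds.
    """
    with data_file.open("a", encoding="utf-8") as fh:
        fh.write(f"{variant}\t{tokens}\t{seconds}\n")


def aggregate(data_file: Path) -> dict[str, dict[str, float]]:
    """Group rows by variant and compute the statistics named in the commit message."""
    runs: dict[str, list[tuple[int, float]]] = {}
    for line in data_file.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the data file
        variant, tokens, seconds = line.split("\t")
        runs.setdefault(variant, []).append((int(tokens), float(seconds)))

    summary: dict[str, dict[str, float]] = {}
    for variant, rows in runs.items():
        tokens = [t for t, _ in rows]
        durations = [s for _, s in rows]
        summary[variant] = {
            "median": statistics.median(tokens),
            "mean": statistics.mean(tokens),
            "min": min(tokens),
            "max": max(tokens),
            # Sample stdev needs at least two runs; report 0.0 for a single run.
            "stdev": statistics.stdev(tokens) if len(tokens) > 1 else 0.0,
            "median_duration": statistics.median(durations),
        }
    return summary
```

Appending after every run rather than writing once at the end is what lets a partially completed campaign still be aggregated from whatever rows were recorded.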