3 Commits

Author SHA1 Message Date
37032ee363 feat: campaign mode — repeat each variant N times, aggregate distributions
run-harness-bench.sh now loops VARIANTS × BENCH_REPEATS (default 5), writes each
run's row to RESULTS-campaign.md.data immediately (survives interruption), and
aggregates per-variant median/mean/min/max/stdev + median duration into
RESULTS-campaign.md. Frees each run's repo/transcripts after tallying.
2026-06-14 22:19:10 +00:00
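
The aggregation step described in the commit above can be sketched roughly as follows. This is an illustrative Python sketch, not the logic of run-harness-bench.sh itself, which is a shell script and is not shown in this log. The tab-separated row layout (variant, metric, duration in seconds) and the names append_row, parse_line, and aggregate are assumptions made for the example; the metric is presumably a token count, but the commit message does not say.

```python
import statistics
from collections import defaultdict


def append_row(path, variant, metric, duration_s):
    """Append one run's row immediately, so finished runs survive an interruption.

    The tab-separated layout is an assumption for this sketch.
    """
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{variant}\t{metric}\t{duration_s}\n")


def parse_line(line):
    """Parse one hypothetical data row into (variant, metric, duration_s)."""
    variant, metric, duration_s = line.rstrip("\n").split("\t")
    return variant, float(metric), float(duration_s)


def aggregate(rows):
    """Group rows by variant and compute the per-variant summary statistics."""
    by_variant = defaultdict(list)
    for variant, metric, duration_s in rows:
        by_variant[variant].append((metric, duration_s))

    summary = {}
    for variant, runs in by_variant.items():
        metrics = [m for m, _ in runs]
        durations = [d for _, d in runs]
        summary[variant] = {
            "runs": len(runs),
            "median": statistics.median(metrics),
            "mean": statistics.mean(metrics),
            "min": min(metrics),
            "max": max(metrics),
            # Sample standard deviation needs at least two runs.
            "stdev": statistics.stdev(metrics) if len(metrics) > 1 else 0.0,
            "median_duration": statistics.median(durations),
        }
    return summary


if __name__ == "__main__":
    # Made-up rows purely to exercise the functions; not benchmark results.
    demo = [parse_line(l) for l in ("a\t100\t10\n", "a\t200\t20\n", "a\t300\t30\n")]
    for variant, stats in aggregate(demo).items():
        print(variant, stats)
```

Appending each row as soon as a run finishes keeps the data file usable after a partial campaign: the summary can be rebuilt from whatever rows exist, at the cost of recomputing the statistics from scratch each time.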
11eda4a8b1 chore: gitignore the runner's transient .tmp file
2026-06-14 20:40:26 +00:00
27df2c7b55 feat: agent-orchestrator-benchmark — prompt token comparison harness
A standalone repo (engine vendored as a submodule at the examples commit) that
runs a head-to-head between the builder-adversary and builder-adversary-min
example variants: same task, independent headless runs, both on Sonnet, with
token counts. Includes the roman-numeral test problem and run-bench.sh.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 20:20:05 +00:00