Commit Graph

3 Commits

Author SHA1 Message Date
fc0608ede1 feat: builder-solo control runner (run after campaign) + limit-detect for it
run-solo-bench.sh runs the builder-solo variant (single builder, self-verify,
no adversary) 5× on the same calculator and appends rows to the shared campaign
data file (adversary column = 0). It is a separate script so the live campaign
runner is untouched. analyze.py limit-detection now also covers the solo run
layout. Engine example builder-solo committed at a0f7652; benchmark engine to be
re-pinned to it before running solo (after the main campaign completes).
2026-06-15 02:36:58 +00:00
25a77f5d3c fix: flag usage-limit-affected runs; correct tok/sec
A run that hits a usage-limit pause has an inflated duration (idle wait) but an
accurate token total. analyze.py now scans each run's watchdog log for 'limit
hit', flags the run as LIMIT in the raw table, and excludes it from the
tokens/sec stat (token total, tok/LOC, tok/commit unaffected). Caught because
campaign run r2 hit the limit at ~00:40 and recovered at the 00:50 reset; the
watchdog handled it.
2026-06-15 01:29:54 +00:00
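The limit-flagging step described in the commit above could look roughly like the sketch below. It is not the actual analyze.py code, which is not shown here. The function names, the run dictionary keys (`tokens`, `seconds`, `limit`) and the log handling are assumptions for illustration; only the 'limit hit' marker string comes from the commit message.

```python
from pathlib import Path

# Marker string named in the commit message; everything else is hypothetical.
LIMIT_MARKER = "limit hit"


def hit_usage_limit(watchdog_log: Path) -> bool:
    """Return True if the watchdog log records a usage-limit pause."""
    if not watchdog_log.is_file():
        # Assumption: a missing log is treated as "no limit hit".
        return False
    text = watchdog_log.read_text(encoding="utf-8", errors="replace")
    return LIMIT_MARKER in text


def tokens_per_sec_values(runs):
    """Collect tokens/sec only from runs whose duration is trustworthy.

    Each run is a dict with hypothetical keys: tokens, seconds, limit.
    Limit-affected runs are skipped because their duration includes idle
    wait; their token totals remain usable for other statistics.
    """
    return [
        run["tokens"] / run["seconds"]
        for run in runs
        if not run["limit"] and run["seconds"] > 0
    ]
```

Flagged runs stay in the input list, so token totals and the other ratios can still be computed from them; only the duration-dependent statistic drops them.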
33eeb3ce6b feat: analyze.py — efficiency ratios (tokens/LOC, tokens/sec, tokens/commit)
Standalone analysis over RESULTS-campaign.md.data (safe: independent of the live
runner). Adds the normalised efficiency ratios per run with min/median/max per
variant, alongside the token distributions, commit/LOC medians, correlations,
and full raw table. Run: python3 analyze.py (regenerates RESULTS-campaign.md).

Orig baseline (5 runs): tokens/LOC ~25k–34k, tokens/sec ~11.3k–14.0k.
2026-06-15 00:15:46 +00:00
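As a minimal sketch of the per-variant min/median/max summary this commit describes, assuming each run is available as a dictionary: the keys (`variant`, `tokens`, `loc`, `seconds`, `commits`) and both function names are hypothetical, and the real layout of RESULTS-campaign.md.data is not shown in this log.

```python
from collections import defaultdict
from statistics import median


def efficiency_ratios(run):
    """Normalised ratios for one run; None where the denominator is zero."""
    tokens = run["tokens"]
    return {
        "tokens/LOC": tokens / run["loc"] if run["loc"] else None,
        "tokens/sec": tokens / run["seconds"] if run["seconds"] else None,
        "tokens/commit": tokens / run["commits"] if run["commits"] else None,
    }


def summarize_by_variant(runs):
    """Map variant -> ratio name -> (min, median, max) over that variant's runs."""
    grouped = defaultdict(lambda: defaultdict(list))
    for run in runs:
        for name, value in efficiency_ratios(run).items():
            if value is not None:
                grouped[run["variant"]][name].append(value)
    return {
        variant: {
            name: (min(values), median(values), max(values))
            for name, values in ratios.items()
        }
        for variant, ratios in grouped.items()
    }
```

Computing the ratios per run first and summarising afterwards keeps the raw table and the per-variant statistics derived from the same values.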