Full-harness benchmark — campaign analysis
Each run takes the real agents.py up stack (Builder/Adversary loop pair plus watchdog) through the 3-phase calculator to SEQUENCE-COMPLETE. Both loops run on Sonnet. Tokens are summed from each loop's session transcript; commits is the work-repo commit count; LOC is the number of non-blank lines in calc/*.py (code + tests). 25 of 27 runs succeeded.
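The LOC metric described above can be reproduced with a few lines of Python. This is a minimal sketch, not the harness's own counter: the function name `count_nonblank_loc` and the `repo_root` argument are hypothetical, and it assumes "non-blank" means a line with at least one non-whitespace character.

```python
from pathlib import Path


def count_nonblank_loc(repo_root: str, pattern: str = "calc/*.py") -> int:
    """Count non-blank lines in files matching `pattern` under `repo_root`.

    Assumption: a line counts if it contains any non-whitespace character,
    so comment-only lines are included and whitespace-only lines are not.
    """
    total = 0
    for path in sorted(Path(repo_root).glob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as fh:
            total += sum(1 for line in fh if line.strip())
    return total
```

The glob is non-recursive, matching the calc/*.py wording; a nested package layout would need `calc/**/*.py` instead.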
Per-variant total tokens (successful runs)
| variant | runs (ok/total) | median | mean | min | max | spread (max/min) |
|---|---|---|---|---|---|---|
| builder-adversary | 5/5 | 13,037,683 | 12,919,657 | 11,117,474 | 14,960,414 | 1.35x |
| builder-adversary-min | 5/6 | 9,772,581 | 9,917,040 | 9,135,718 | 11,386,415 | 1.25x |
| builder-adversary-stateless | 5/5 | 10,122,375 | 10,735,401 | 9,992,834 | 13,009,792 | 1.30x |
| builder-adversary-lean | 5/6 | 13,409,349 | 13,216,582 | 12,101,355 | 13,815,595 | 1.14x |
| builder-solo | 5/5 | 2,773,634 | 2,744,840 | 2,417,528 | 2,948,467 | 1.22x |
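As a cross-check, the builder-adversary row can be recomputed from the five successful totals in the raw table. This is a sketch only: `summarize` is a hypothetical helper, and spread is taken to be max divided by min, which is inferred from the reported figures rather than stated in the report.

```python
from statistics import mean, median


def summarize(totals):
    """Return median, mean, min, max and spread (assumed to be max/min)."""
    lo, hi = min(totals), max(totals)
    return {
        "median": median(totals),
        "mean": mean(totals),
        "min": lo,
        "max": hi,
        "spread": hi / lo,
    }


# Successful builder-adversary totals, copied from the raw runs table.
builder_adversary = [11_117_474, 13_579_616, 14_960_414, 13_037_683, 11_903_098]
stats = summarize(builder_adversary)
print(f"{stats['median']:,} {stats['mean']:,.0f} {stats['spread']:.2f}x")
# 13,037,683 12,919,657 1.35x
```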
Efficiency ratios — min / median / max (successful runs)
tokens/sec excludes runs flagged LIMIT (a usage-limit pause inflates duration without adding tokens, so it would understate the true rate); tokens/LOC and tokens/commit are unaffected and include all successful runs.
| variant | tokens / LOC | tokens / sec | tokens / commit |
|---|---|---|---|
| builder-adversary | 23,857 / 30,391 / 32,665 | 8,083 / 11,670 / 13,852 | 793,540 / 935,026 / 1,044,586 |
| builder-adversary-min | 23,527 / 25,252 / 32,814 | 8,173 / 14,807 / 15,814 | 582,292 / 669,789 / 712,415 |
| builder-adversary-stateless | 21,802 / 27,799 / 30,861 | 10,544 / 11,620 / 12,755 | 697,172 / 765,282 / 778,644 |
| builder-adversary-lean | 28,077 / 33,238 / 38,966 | 12,416 / 13,523 / 14,403 | 432,191 / 478,905 / 575,650 |
| builder-solo | 6,029 / 6,611 / 6,969 | 6,542 / 6,715 / 7,020 | 392,494 / 483,506 / 737,117 |
| all | 6,029 / 28,077 / 38,966 | 6,542 / 11,712 / 15,814 | 392,494 / 697,172 / 1,044,586 |
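The min / median / max triples, including the LIMIT exclusion for tokens/sec, can be sketched as follows. The `Run` record and `efficiency` function are hypothetical names, and rounding each per-run ratio to the nearest integer before taking the median is an assumption. The example data is the five builder-solo runs from the raw table.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Run:
    ok: bool       # run reached SEQUENCE-COMPLETE
    limit: bool    # run hit a usage-limit pause
    tokens: int
    dur_s: int
    commits: int
    loc: int


def min_median_max(values):
    """Return (min, median, max), or None when no runs qualify."""
    return (min(values), median(values), max(values)) if values else None


def efficiency(runs):
    good = [r for r in runs if r.ok]
    # tokens/sec drops LIMIT runs: the pause inflates duration only.
    timed = [r for r in good if not r.limit]
    return {
        "tok_per_loc": min_median_max([round(r.tokens / r.loc) for r in good]),
        "tok_per_sec": min_median_max([round(r.tokens / r.dur_s) for r in timed]),
        "tok_per_commit": min_median_max([round(r.tokens / r.commits) for r in good]),
    }


solo = [
    Run(True, False, 2_417_528, 360, 5, 401),
    Run(True, False, 2_747_457, 420, 7, 450),
    Run(True, False, 2_837_115, 420, 6, 426),
    Run(True, False, 2_773_634, 420, 5, 398),
    Run(True, False, 2_948_467, 420, 4, 446),
]
result = efficiency(solo)
print(result["tok_per_loc"])     # (6029, 6611, 6969)
print(result["tok_per_sec"])     # (6542, 6715, 7020)
print(result["tok_per_commit"])  # (392494, 483506, 737117)
```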
Per-variant medians (commits / LOC / duration)
| variant | median commits | median LOC | median duration (s) |
|---|---|---|---|
| builder-adversary | 14 | 449 | 1020 |
| builder-adversary-min | 15 | 367 | 720 |
| builder-adversary-stateless | 14 | 400 | 900 |
| builder-adversary-lean | 28 | 390 | 960 |
| builder-solo | 5 | 426 | 420 |
Correlations with total tokens (pooled, n=25)
| tokens vs | Pearson r |
|---|---|
| duration | +0.83 |
| commits | +0.79 |
| LOC | -0.04 |
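For reference, Pearson r over pooled runs can be computed as below. This is a generic sketch of the standard formula, not the report's own script; `pearson_r` is a hypothetical helper that would be called with the 25 successful totals as `xs` and the matching durations, commit counts, or LOC values as `ys`.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / sqrt(var_x * var_y)
```

Because the runs are pooled across variants, a strong r here can reflect between-variant differences (builder-solo is both the cheapest and the fastest) as much as any within-variant relationship.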
All runs (raw)
| variant | rep | ok | limit | total tokens | dur(s) | commits | LOC | tok/LOC | tok/sec | tok/commit |
|---|---|---|---|---|---|---|---|---|---|---|
| builder-adversary | 1 | YES | | 11,117,474 | 960 | 14 | 466 | 23,857 | 11,581 | 794,105 |
| builder-adversary | 2 | YES | | 13,579,616 | 1680 | 13 | 449 | 30,244 | 8,083 | 1,044,586 |
| builder-adversary | 3 | YES | | 14,960,414 | 1080 | 16 | 458 | 32,665 | 13,852 | 935,026 |
| builder-adversary | 4 | YES | | 13,037,683 | 1020 | 13 | 429 | 30,391 | 12,782 | 1,002,899 |
| builder-adversary | 5 | YES | | 11,903,098 | 1020 | 15 | 381 | 31,242 | 11,670 | 793,540 |
| builder-adversary-min | 1 | YES | | 9,135,718 | 780 | 15 | 367 | 24,893 | 11,712 | 609,048 |
| builder-adversary-min | 2 | YES | | 11,386,415 | 720 | 17 | 347 | 32,814 | 15,814 | 669,789 |
| builder-adversary-min | 3 | YES | | 9,316,676 | 1140 | 16 | 396 | 23,527 | 8,173 | 582,292 |
| builder-adversary-min | 4 | YES | | 9,973,813 | 660 | 14 | 347 | 28,743 | 15,112 | 712,415 |
| builder-adversary-min | 5 | NO | | 3,693,171 | 1800 | 4 | 128 | 28,853 | 2,052 | 923,293 |
| builder-adversary-min | 1 | YES | | 9,772,581 | 660 | 14 | 387 | 25,252 | 14,807 | 698,042 |
| builder-adversary-stateless | 1 | YES | | 10,457,577 | 900 | 15 | 400 | 26,144 | 11,620 | 697,172 |
| builder-adversary-stateless | 2 | YES | | 9,992,834 | 840 | 13 | 341 | 29,304 | 11,896 | 768,680 |
| builder-adversary-stateless | 3 | YES | | 10,094,430 | 900 | 14 | 463 | 21,802 | 11,216 | 721,031 |
| builder-adversary-stateless | 4 | YES | | 10,122,375 | 960 | 13 | 328 | 30,861 | 10,544 | 778,644 |
| builder-adversary-stateless | 5 | YES | | 13,009,792 | 1020 | 17 | 468 | 27,799 | 12,755 | 765,282 |
| builder-adversary-lean | 1 | YES | | 12,962,701 | 900 | 28 | 390 | 33,238 | 14,403 | 462,954 |
| builder-adversary-lean | 2 | YES | | 13,409,349 | 1080 | 28 | 451 | 29,732 | 12,416 | 478,905 |
| builder-adversary-lean | 3 | NO | LIMIT | 6,518,422 | 1800 | 11 | 259 | 25,168 | 3,621 | 592,584 |
| builder-adversary-lean | 4 | YES | | 13,815,595 | 960 | 24 | 378 | 36,549 | 14,391 | 575,650 |
| builder-adversary-lean | 5 | YES | | 12,101,355 | 960 | 28 | 431 | 28,077 | 12,606 | 432,191 |
| builder-adversary-lean | 1 | YES | | 13,793,914 | 1020 | 25 | 354 | 38,966 | 13,523 | 551,757 |
| builder-solo | 1 | YES | | 2,417,528 | 360 | 5 | 401 | 6,029 | 6,715 | 483,506 |
| builder-solo | 2 | YES | | 2,747,457 | 420 | 7 | 450 | 6,105 | 6,542 | 392,494 |
| builder-solo | 3 | YES | | 2,837,115 | 420 | 6 | 426 | 6,660 | 6,755 | 472,852 |
| builder-solo | 4 | YES | | 2,773,634 | 420 | 5 | 398 | 6,969 | 6,604 | 554,727 |
| builder-solo | 5 | YES | | 2,948,467 | 420 | 4 | 446 | 6,611 | 7,020 | 737,117 |
Stats over successful runs. LIMIT = the run hit a usage-limit pause (duration/tok-sec distorted, token total fine). Repos kept under the run root for analysis.