results: full-harness 3-way (orig/min/stateless) on the calculator
All three completed the 3-phase calculator to SEQUENCE-COMPLETE with full adversary verification (calc correct, tests OK). Tokens: orig 10,756,363 min 10,119,225 (-5.9%) stateless 5,964,829 (-44.5%) Prompt-prose minimization barely moved tokens; context hygiene (stateless) nearly halved them, driven by ~48% lower cache_read. Quality held. Also fix the runner's success check: it grepped the word "veto" and matched "No veto" → false failures; now matches the "## VETO" marker. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
@ -169,7 +169,8 @@ run_variant() {
|
||||
local out; out="$( cd "$run/work-adv" && python calc.py '2+3*4' 2>/dev/null )"
|
||||
[ "$out" = "14" ] && cli=yes
|
||||
local reviews; reviews="$(grep -rhoiE '(lex|parse|eval)/D[0-9]+:?\s*PASS' "$run/work-adv/machine-docs/" 2>/dev/null | sort -u | wc -l)"
|
||||
local veto="no"; grep -rqi 'VETO' "$run/work-adv/machine-docs/" 2>/dev/null && veto=yes
|
||||
# a standing veto is the "## VETO" marker (per the prompts) — NOT the word "veto" (matches "No veto")
|
||||
local veto="no"; grep -rqiE '##[[:space:]]*VETO' "$run/work-adv/machine-docs/" 2>/dev/null && veto=yes
|
||||
SUM_PHASES[$v]="$(cat "$run/.ao-state/state/phase-idx" 2>/dev/null || echo '?')"
|
||||
|
||||
local success=NO
|
||||
|
||||
Reference in New Issue
Block a user