agent-orchestrator-benchmark/calculators/README.md

# calculators/ — the artifacts the benchmark built

Every benchmark run had a Builder/Adversary loop pair (or a solo Builder) build a Python calculator
to the spec in [`../plans/calc/`](../plans/calc/). This folder preserves the **actual calculators
they produced** — the 5 canonical successful runs per variant (the N=5 the analysis is based on; the
wedged/limit/superseded runs are not included). 30 calculators in all.

## Layout

```
calculators/<variant>/run-NN/
  calc.py                  the CLI entry point
  calc/                    lexer.py, parser.py, evaluator.py + test_*.py (the built calculator)
  machine-docs/            the loop's coordination artifacts for this run:
                             STATUS-<phase>.md   (Builder's claims: WHAT/HOW/EXPECTED/WHERE)
                             REVIEW-<phase>.md   (Adversary's verdicts + findings)
                             JOURNAL-<phase>.md  (Builder's reasoning — kept out of STATUS)
                             BACKLOG/DECISIONS.md
  GIT-LOG.txt              the run's commit history — the claim()/review() handshake
  SOURCE.txt               the original /tmp run path
```

`<variant>` is one of the six: `builder-adversary`, `builder-adversary-min`,
`builder-adversary-stateless`, `builder-adversary-lean`, `builder-adversary-deferred`, `builder-solo`.

These are **working-tree snapshots** (not nested git repos — that would confuse the parent repo). The
commit history that shows *how* each was built — the per-gate/per-phase `claim(`/`review(` exchange —
is captured in each `GIT-LOG.txt`. Compare, say, a `builder-adversary-lean` log (per-gate, ~28
commits) against a `builder-adversary-deferred` log (one comprehensive review at the end) to see the
cadence difference in action.

## What they're good for

- **Inspect the deliverable** each variant produced (all behaviorally identical — verified — but the
  code/test style and volume vary; e.g. `-min` runs have leaner test suites).
- **Read the actual review exchange** in `machine-docs/REVIEW-*.md` + `GIT-LOG.txt` — the Adversary's
  cold verdicts, findings, and the Builder's STATUS hand-offs.

Run any of them:

```bash
cd calculators/builder-adversary/run-01
python -m unittest -q        # tests pass
python calc.py "2+3*4"       # 14
```

See [`../FINDINGS.md`](../FINDINGS.md) for what the benchmark concluded and
[`../RESULTS-campaign.md`](../RESULTS-campaign.md) for the per-run numbers.