# calculators/ — the artifacts the benchmark built

In every benchmark run, a Builder/Adversary loop pair (or a solo Builder) built a Python calculator
to the spec in [`../plans/calc/`](../plans/calc/). This folder preserves the **actual calculators
they produced**: the 5 canonical successful runs for each of the six variants (the N=5 the analysis
is based on; the wedged/limit/superseded runs are not included), 30 calculators in all.

## Layout

```
calculators/<variant>/run-NN/
  calc.py            the CLI entry point
  calc/              lexer.py, parser.py, evaluator.py + test_*.py  (the built calculator)
  machine-docs/      the loop's coordination artifacts for this run:
                       STATUS-<phase>.md   (Builder's claims: WHAT/HOW/EXPECTED/WHERE)
                       REVIEW-<phase>.md   (Adversary's verdicts + findings)
                       JOURNAL-<phase>.md  (Builder's reasoning, kept out of STATUS)
                       BACKLOG/DECISIONS.md
  GIT-LOG.txt        the run's commit history: the claim()/review() handshake
  SOURCE.txt         the original /tmp run path
```
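As a quick sanity check on the layout above, a minimal sketch like the following reports which expected entries are missing from a run directory. The helper name `missing_entries` and the `EXPECTED` list are illustrative, not part of the repository. The list assumes `GIT-LOG.txt` and `SOURCE.txt` sit at the top of each run directory, and it skips the contents of `machine-docs/`, whose file names vary by phase. Run it from the directory that contains `calculators/`.

```python
from pathlib import Path

# Entries each run directory is assumed to contain, read off the layout above.
# Assumption: GIT-LOG.txt and SOURCE.txt live at the top of run-NN/.
# The per-phase files inside machine-docs/ are not checked.
EXPECTED = [
    "calc.py",
    "calc/lexer.py",
    "calc/parser.py",
    "calc/evaluator.py",
    "machine-docs",
    "GIT-LOG.txt",
    "SOURCE.txt",
]


def missing_entries(run_dir):
    """Return the expected entries that do not exist under run_dir."""
    root = Path(run_dir)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    # Walk calculators/<variant>/run-NN/ relative to the current directory.
    for run in sorted(Path("calculators").glob("*/run-*")):
        gaps = missing_entries(run)
        print(run, "OK" if not gaps else f"missing: {gaps}")
```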

`<variant>` is one of six: `builder-adversary`, `builder-adversary-min`,
`builder-adversary-stateless`, `builder-adversary-lean`, `builder-adversary-deferred`, `builder-solo`.

These are **working-tree snapshots**, not nested git repos, which would confuse the parent repo. The
commit history that shows *how* each was built (the per-gate/per-phase `claim(`/`review(` exchange)
is captured in each run's `GIT-LOG.txt`. Compare, say, a `builder-adversary-lean` log (per-gate, ~28
commits) against a `builder-adversary-deferred` log (one comprehensive review at the end) to see the
cadence difference in action.
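To put rough numbers on that cadence difference, a small sketch like this one tallies the `claim(` and `review(` lines in a log. It assumes each commit occupies one line of `GIT-LOG.txt`, as `git log --oneline` would produce; the actual file format is not documented here, so adjust the matching if the logs are multi-line. `handshake_counts` is a hypothetical helper name.

```python
import sys
from pathlib import Path


def handshake_counts(log_text):
    """Tally non-blank log lines by whether they mention claim( or review(.

    Assumption: one commit per line; anything else is counted as "other".
    """
    counts = {"claim": 0, "review": 0, "other": 0}
    for line in log_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if "claim(" in line:
            counts["claim"] += 1
        elif "review(" in line:
            counts["review"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Usage: pass one or more GIT-LOG.txt paths on the command line.
    for path in sys.argv[1:]:
        print(path, handshake_counts(Path(path).read_text()))
```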

## What they're good for

- **Inspect the deliverable** each variant produced. All are verified to be behaviorally identical,
  but code/test style and volume vary; e.g. `-min` runs have leaner test suites.
- **Read the actual review exchange** in `machine-docs/REVIEW-*.md` and `GIT-LOG.txt`: the
  Adversary's cold verdicts and findings, and the Builder's STATUS hand-offs.

Run any of them:

```bash
cd calculators/builder-adversary/run-01
python -m unittest -q          # tests pass
python calc.py "2+3*4"         # 14
```
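To repeat the `calc.py` check across all 30 snapshots, something along these lines should work when run from the directory that contains `calculators/`. It is a sketch: `smoke_check` is a hypothetical helper, and it assumes `calc.py` prints the bare result (`14`) to stdout and exits with status 0.

```python
import subprocess
import sys
from pathlib import Path


def smoke_check(run_dir, expr="2+3*4", expected="14"):
    """Run calc.py inside run_dir on expr and compare its stdout to expected.

    Assumption: calc.py prints only the result and exits with status 0.
    """
    result = subprocess.run(
        [sys.executable, "calc.py", expr],
        cwd=run_dir,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and result.stdout.strip() == expected


if __name__ == "__main__":
    # Walk calculators/<variant>/run-NN/ relative to the current directory.
    for run in sorted(Path("calculators").glob("*/run-*")):
        print(run, "ok" if smoke_check(run) else "FAILED")
```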

See [`../FINDINGS.md`](../FINDINGS.md) for what the benchmark concluded and
[`../RESULTS-campaign.md`](../RESULTS-campaign.md) for the per-run numbers.