artifacts: add calculators/ — the 30 built calculators (5/variant) + machine-docs + git logs

2026-06-16 15:39:42 +00:00
parent 64bc360fc0
commit bb85aa9f11
728 changed files with 34148 additions and 0 deletions
--- a/calculators/README.md
+++ b/calculators/README.md
@@ -0,0 +1,48 @@
+# calculators/ — the artifacts the benchmark built
+
+In every benchmark run, a Builder/Adversary loop pair (or a solo Builder) built a Python calculator
+to the spec in [`../plans/calc/`](../plans/calc/). This folder preserves the **actual calculators
+they produced**: the 5 canonical successful runs per variant (the N=5 the analysis is based on;
+wedged, limit, and superseded runs are not included). With six variants, that is 30 calculators in all.
+
+## Layout
+
+```
+calculators/<variant>/run-NN/
+  calc.py                  the CLI entry point
+  calc/                    lexer.py, parser.py, evaluator.py + test_*.py (the built calculator)
+  machine-docs/            the loop's coordination artifacts for this run:
+                             STATUS-<phase>.md   (Builder's claims: WHAT/HOW/EXPECTED/WHERE)
+                             REVIEW-<phase>.md   (Adversary's verdicts + findings)
+                             JOURNAL-<phase>.md  (Builder's reasoning — kept out of STATUS)
+                             BACKLOG/DECISIONS.md
+  GIT-LOG.txt              the run's commit history — the claim()/review() handshake
+  SOURCE.txt               the original /tmp run path
+```
+
+`<variant>` is one of the six: `builder-adversary`, `builder-adversary-min`,
+`builder-adversary-stateless`, `builder-adversary-lean`, `builder-adversary-deferred`, `builder-solo`.
+
+These are **working-tree snapshots** (not nested git repos — that would confuse the parent repo). The
+commit history that shows *how* each was built — the per-gate/per-phase `claim(`/`review(` exchange —
+is captured in each `GIT-LOG.txt`. Compare, say, a `builder-adversary-lean` log (per-gate, ~28
+commits) against a `builder-adversary-deferred` log (one comprehensive review at the end) to see the
+cadence difference in action.
+
+## What they're good for
+
+- **Inspect the deliverable** each variant produced (all behaviorally identical — verified — but the
+  code/test style and volume vary; e.g. `-min` runs have leaner test suites).
+- **Read the actual review exchange** in `machine-docs/REVIEW-*.md` + `GIT-LOG.txt` — the Adversary's
+  cold verdicts, findings, and the Builder's STATUS hand-offs.
+
+Run any of them:
+
+```bash
+cd calculators/builder-adversary/run-01
+python -m unittest -q        # tests pass
+python calc.py "2+3*4"       # 14
+```
+
+See [`../FINDINGS.md`](../FINDINGS.md) for what the benchmark concluded and
+[`../RESULTS-campaign.md`](../RESULTS-campaign.md) for the per-run numbers.
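
The cadence comparison the README suggests (a per-gate `builder-adversary-lean` log against a `builder-adversary-deferred` log) could be tallied with a small script instead of read by eye. The following is a minimal sketch, not part of this commit. It assumes each `GIT-LOG.txt` lists one commit subject per line and that handshake commits contain the literal substrings `claim(` or `review(`; the README does not specify the log format, so adjust the matching if the files differ. The helper names `count_handshake` and `summarize` are hypothetical.

```python
from pathlib import Path


def count_handshake(log_text: str) -> dict[str, int]:
    """Count lines that mention claim( or review( in one GIT-LOG.txt.

    Assumption: one commit subject per line. A line containing both
    substrings is counted once under each key.
    """
    counts = {"claim": 0, "review": 0}
    for line in log_text.splitlines():
        if "claim(" in line:
            counts["claim"] += 1
        if "review(" in line:
            counts["review"] += 1
    return counts


def summarize(root: str = "calculators") -> None:
    """Print one tally line per run, using the calculators/<variant>/run-NN/ layout."""
    for log in sorted(Path(root).glob("*/run-*/GIT-LOG.txt")):
        counts = count_handshake(log.read_text(encoding="utf-8", errors="replace"))
        variant, run = log.parts[-3], log.parts[-2]
        print(f"{variant}/{run}: claim={counts['claim']} review={counts['review']}")
```

Calling `summarize()` from the repository root would print one line per run. If the assumptions about the log format hold, the per-gate variants should show noticeably more `review(` lines than the deferred variant.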