claim(3 U0): results.json + level ladder — gate CLAIMED

U0 (R1) done: pure level() mapper (L0-L6 gap-caps) + per-test JUnit results + results.json, all emitted best-effort (never changes verdict, R7). Two real runs bracket the gate: custom-html-tiny=L2 (functional N/A; backup N/A caps at L2) and uptime-kuma=L4 (full climb; no SSO surface caps at L4). 28 unit tests + Adversary fuzz-clean. Rung-mapping contract in DECISIONS. Verify: STATUS-3.md HOW/EXPECTED. Awaiting Adversary cold-verify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 06:03:49 +00:00
parent 757511e4e7
commit 5b6b378ade
3 changed files with 97 additions and 23 deletions
--- a/machine-docs/JOURNAL-3.md
+++ b/machine-docs/JOURNAL-3.md
@@ -40,3 +40,32 @@ a P3 blocker; operator kicked Phase 3 off manually. Proceeding.
   the run's exit code (R7: cosmetics never block).
 4. Artifact hosting (U0.4): runner writes to a host dir; dashboard bind-mounts it read-only to serve
   `/runs/<id>/...`. Decide details + record in DECISIONS.
+
+## 2026-05-31T06:00Z — U0 complete + CLAIMED
+
+Implemented U0.1–U0.4. Two real end-to-end runs on cc-ci confirm the translation layer (the binding
+risk the Adversary flagged at df54693) produces correct levels:
+- **custom-html-tiny** (stateless, not backup-capable, ≥2 versions): install+upgrade pass, backup/
+  restore skip→N/A, no custom → **level=2**, cap "L3 backup/restore N/A". Proves gap-caps on real data.
+- **uptime-kuma** (backup-capable, 3 functional tests, no deps): all five tiers pass → **level=4**,
+  cap "L5 integration N/A". Proves a full clean climb with no SSO surface caps at L4.
+Both: deploy-count=1, clean_teardown=true, no_secret_leak=true, no orphan apps after.
+
+Design notes / WHY:
+- Chose STRICT monotonic capping (N/A caps like FAIL, distinct reason) over "N/A transparent for middle
+  rungs" because the only worked example in §4.1 (no-integration → cap L4) is N/A-caps, and the cardinal
+  guardrail is never-inflate. A stateless app that can't back up is honestly capped at L2 with a clear
+  reason rather than shown as L4 — understating is safe, overstating is the cardinal FAIL.
+- Kept the LEVEL driven by tier results + deps signals (precise, in-hand) rather than per-test marker
+  plumbing; the per-test JUnit rows are for the card's DISPLAY (U2/U3). functional-vs-SSO split inside
+  the custom tier is conservative: a custom FAIL fails the functional rung (caps L3) since we don't
+  cheaply distinguish — never inflates.
+- results.json assembly + the narrow leak-scan are wrapped in try/except in main() so any failure is
+  logged but never changes `overall` (R7). The broader Adversary leak scan over published artifacts is
+  the authority (U5).
+- "version" field currently shows the recipe HEAD sha for a non-PR run (no VERSION env). Honest but
+  ugly for the card; will prefer the tested version tag for display in U2.
+
+Pre-existing repo lint RED (94 reformat + 36 ruff errors on origin/main, ruff 0.7.3 on CI devshell):
+not mine, flagged in STATUS for the operator. My new files are clean; run_recipe_ci.py left better
+than found (1 vs 4 errors). NOT reformatting 94 cross-phase files in Phase 3 (out of scope, huge noise).
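
Illustrative sketch (not part of the commit): the strict monotonic capping described in the design notes could look roughly like the following. The `Outcome` and `Rung` types, the rung names, and the L1-L5 ordering are assumptions inferred from the two runs above; the real `level()` mapper, its inputs, and the L6 rung are not shown in this diff and may differ.

```python
from enum import Enum
from typing import NamedTuple, Optional, Sequence, Tuple


class Outcome(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    NA = "N/A"


class Rung(NamedTuple):
    # Hypothetical shape: one entry per ladder rung, lowest first.
    level: int
    name: str
    outcome: Outcome


def level(rungs: Sequence[Rung]) -> Tuple[int, Optional[str]]:
    """Climb from L1; the first rung that is not PASS caps the level.

    N/A caps exactly like FAIL (strict monotonic), but the reason string
    records which of the two it was. A missing rung also caps, so a gap
    in the ladder can never inflate the result.
    """
    achieved = 0
    for rung in sorted(rungs, key=lambda r: r.level):
        if rung.level != achieved + 1:
            return achieved, f"L{achieved + 1} missing"
        if rung.outcome is not Outcome.PASS:
            return achieved, f"L{rung.level} {rung.name} {rung.outcome.value}"
        achieved = rung.level
    return achieved, None


# Rung outcomes modelled on the two runs described in the journal entry.
custom_html_tiny = [
    Rung(1, "install", Outcome.PASS),
    Rung(2, "upgrade", Outcome.PASS),
    Rung(3, "backup/restore", Outcome.NA),
    Rung(4, "functional", Outcome.NA),
    Rung(5, "integration", Outcome.NA),
]
uptime_kuma = [
    Rung(1, "install", Outcome.PASS),
    Rung(2, "upgrade", Outcome.PASS),
    Rung(3, "backup/restore", Outcome.PASS),
    Rung(4, "functional", Outcome.PASS),
    Rung(5, "integration", Outcome.NA),
]

print(level(custom_html_tiny))
print(level(uptime_kuma))
```

Under these assumptions the stateless app stops at level 2 with the reason "L3 backup/restore N/A", and the full climb stops at level 4 with "L5 integration N/A". The alternative design, treating a middle N/A as transparent, would let the first app reach a higher level than it demonstrated, which is the inflation the journal entry rules out.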