claim(3 U0): results.json + level ladder — gate CLAIMED

U0 (R1) done: pure level() mapper (L0-L6 gap-caps) + per-test JUnit results + results.json, all emitted best-effort (never changes verdict, R7). Two real runs bracket the gate: custom-html-tiny=L2 (functional N/A, backup N/A; capped at L2) and uptime-kuma=L4 (full climb; no SSO surface, so L5 N/A caps it at L4). 28 unit tests + Adversary fuzz-clean. Rung-mapping contract in DECISIONS. Verify: STATUS-3.md HOW/EXPECTED. Awaiting Adversary cold-verify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
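A minimal Python sketch of the three pieces this commit describes: gap-caps level mapping, JUnit-to-per-test rows, and a best-effort results.json write. It assumes rungs are numbered L1 to L6, with L0 as the floor when L1 does not pass, and that N/A caps the level exactly as FAIL does (as the "cap L3 N/A" / "cap L5 N/A" notes suggest). `Rung`, `level`, `parse_junit` and `write_results_best_effort` are hypothetical names. The real `harness/level.py` and `harness/results.py` may differ, and the authoritative rung-mapping contract is the one in DECISIONS.

```python
from __future__ import annotations

import json
import xml.etree.ElementTree as ET
from enum import Enum
from pathlib import Path
from typing import Optional


class Rung(Enum):
    """Outcome of one ladder rung (hypothetical names, not harness/level.py)."""
    PASS = "pass"
    FAIL = "fail"
    NA = "n/a"  # rung does not apply, e.g. a recipe with no SSO surface


def level(rungs: dict[int, Rung]) -> tuple[int, Optional[str]]:
    """Gap-caps: climb L1..L6 and stop at the first rung that did not pass.

    A FAIL, an N/A, or a missing rung caps the level one below it, so a
    later pass can never jump a gap. Returns (level, cap_reason).
    """
    for n in range(1, 7):
        status = rungs.get(n, Rung.NA)  # assumption: missing rung counts as N/A
        if status is not Rung.PASS:
            return n - 1, f"L{n} {status.value}"
    return 6, None  # full climb, nothing capped


def parse_junit(xml_text: str) -> list[dict]:
    """Flatten one tier's JUnit XML report into per-test ✔/✘ rows."""
    rows = []
    for case in ET.fromstring(xml_text).iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            mark = "✘"
        elif case.find("skipped") is not None:
            mark = "skipped"
        else:
            mark = "✔"
        name = f"{case.get('classname', '')}::{case.get('name', '')}"
        rows.append({"test": name, "result": mark})
    return rows


def write_results_best_effort(run_dir: Path, payload: dict) -> bool:
    """Write results.json without ever raising, so a reporting failure
    cannot change the test verdict. Returns True if the file was written."""
    try:
        text = json.dumps(payload, indent=2, ensure_ascii=False)  # may raise TypeError
        run_dir.mkdir(parents=True, exist_ok=True)
        (run_dir / "results.json").write_text(text, encoding="utf-8")
        return True
    except Exception:  # deliberately broad: artifacts are best-effort only
        return False
```

Fed the shapes of the two gate runs (L1-L2 pass then L3 N/A; L1-L4 pass then L5 N/A), `level()` returns `(2, "L3 n/a")` and `(4, "L5 n/a")`. Treating N/A like FAIL for capping is what keeps the level from claiming a rung that was never exercised. The caller keeps its own verdict and ignores the boolean except for logging.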
2026-05-31 06:03:49 +00:00
parent 757511e4e7
commit 5b6b378ade
3 changed files with 97 additions and 23 deletions
--- a/machine-docs/BACKLOG-3.md
+++ b/machine-docs/BACKLOG-3.md
@@ -6,16 +6,15 @@ Milestones U0–U5 (plan §5); each ends with an Adversary gate. DoD items R1–
 ## Build backlog

 ### U0 — Results schema + level  (R1)
- [ ] U0.1 — Pure `level()` function: map per-tier results (+ deps/SSO/recipe-local signal) → integer
-      level L0–L6 with gap-caps-level semantics (§4.1). Unit-tested (pass-through-L4 and fail-at-L2-capped).
- [ ] U0.2 — Per-tier pytest emits structured per-test results (JUnit XML per tier → parsed) so
-      results.json carries per-stage AND per-test ✔/✘ breakdown.
- [ ] U0.3 — `run_recipe_ci.py` writes `results.json` per run (recipe, version, pr, ref, stages[],
-      per-test rows, level, level_cap_reason, invariant flags: clean-teardown, no-secret-leak) to a
-      run-scoped artifact dir. Never blocks/fails the test verdict (R7).
- [ ] U0.4 — Decide & wire the artifact hosting path (run-scoped dir on host + dashboard serves
-      `/runs/<id>/...`). Record in DECISIONS.
- GATE U0: level correct for a recipe through L4 and one capped at L2.
+- [x] U0.1 — Pure `level()` function (harness/level.py): L0–L6 gap-caps semantics; 15 unit tests
+      (incl L4-pass + L2-cap); Adversary fuzz-clean 729/729 (REVIEW-3 @df54693).
+- [x] U0.2 — Per-tier pytest emits JUnit XML (parsed by harness/results.py) → results.json per-stage
+      AND per-test ✔/✘ breakdown.
+- [x] U0.3 — `run_recipe_ci.py` writes `results.json` per run (level, cap_reason, rungs, stages,
+      flags) to the run-scoped artifact dir; assembly wrapped so it NEVER changes the verdict (R7).
+- [x] U0.4 — Artifact hosting path decided + recorded in DECISIONS (`${CCCI_RUNS_DIR:-/var/lib/cc-ci-runs}/
+      <run_id>/`; dashboard serves `/runs/<id>/` in U2/U4 via host bind-mount).
+- GATE U0: **CLAIMED 2026-05-31** — real runs: custom-html-tiny=L2 (cap L3 N/A), uptime-kuma=L4 (cap L5 N/A).

 ### U1 — App screenshot  (R4)
 - [ ] U1.1 — Harness captures a real Playwright screenshot of the deployed app while it is up