feat(3 U0.2+U0.3): per-test results + results.json with computed level

harness/results.py:
- JUnit-XML parsing (stdlib) → per-stage/per-test rows
- derive_rungs: documented tier+deps/SSO → rung mapping
- build_results: assembles results.json {recipe, version, pr, ref, run_id, stages[], level, level_cap_reason, rungs, flags{clean_teardown, no_secret_leak}, screenshot, summary_card}
- write_results: atomic write

run_recipe_ci.py:
- tiers emit --junitxml and append {tier, source, file, rc, junit} records
- main() assembles and writes results.json, wrapped so a failure NEVER changes the verdict (R7), incl. a narrow leak-scan of the serialised artifact

17 new unit tests (test_results.py).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
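
For readers without harness/results.py at hand, the two mechanisms named in the message (stdlib JUnit-XML parsing into per-test rows, and an atomic results.json write) might look roughly like the sketch below. It is an illustration only, not the code in this commit: the function names `parse_junit` and `write_json_atomic`, the row keys, and the pass/fail/skip status strings are all assumptions.

```python
import json
import os
import tempfile
import xml.etree.ElementTree as ET


def parse_junit(xml_text: str) -> list[dict]:
    """Flatten JUnit XML into one row per <testcase> (hypothetical row shape)."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() also matches <testcase> nested under <testsuites>/<testsuite>.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            status = "fail"
        elif case.find("skipped") is not None:
            status = "skip"
        else:
            status = "pass"
        rows.append(
            {
                "name": case.get("name", ""),
                "classname": case.get("classname", ""),
                "time": float(case.get("time", "0") or 0),
                "status": status,
            }
        )
    return rows


def write_json_atomic(path: str, payload: dict) -> None:
    """Write JSON to a temp file in the target directory, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace is atomic when source and destination share a filesystem,
        # so a reader never observes a half-written results.json.
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

A caller that must never let artifact writing affect the verdict (the R7 constraint above) would presumably wrap both calls in a broad try/except and only log the failure.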
2026-05-31 05:55:52 +00:00
parent df54693449
commit 52e5d210d8
5 changed files with 819 additions and 63 deletions
--- a/runner/harness/level.py
+++ b/runner/harness/level.py
@@ -66,7 +66,9 @@ def compute_level(rungs: dict[str, str]) -> tuple[int, str]:
    for name in RUNGS:
        st = rungs.get(name)
        if st not in VALID:
-            raise ValueError(f"rung {name!r} has invalid status {st!r} (expect one of {sorted(VALID)})")
+            raise ValueError(
+                f"rung {name!r} has invalid status {st!r} (expect one of {sorted(VALID)})"
+            )

    # L0: install did not pass.
    if rungs["install"] != "pass":