claim(3 U0): results.json + level ladder — gate CLAIMED
U0 (R1) done: pure level() mapper (L0-L6 gap-caps), per-test JUnit results, and results.json, all emitted best-effort (never changes the verdict, R7).

Two real runs bracket the gate: custom-html-tiny=L2 (functional N/A; backup N/A caps the level at L2) and uptime-kuma=L4 (full climb; no SSO surface, so L5 is N/A and the level caps at L4). 28 unit tests; Adversary fuzz-clean. Rung-mapping contract is recorded in DECISIONS.

Verify: STATUS-3.md HOW/EXPECTED. Awaiting Adversary cold-verify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -40,3 +40,32 @@ a P3 blocker; operator kicked Phase 3 off manually. Proceeding.
the run's exit code (R7: cosmetics never block).
4. Artifact hosting (U0.4): runner writes to a host dir; dashboard bind-mounts it read-only to serve
`/runs/<id>/...`. Decide details + record in DECISIONS.

## 2026-05-31T06:00Z — U0 complete + CLAIMED

Implemented U0.1–U0.4. Two real end-to-end runs on cc-ci confirm the translation layer (the binding
risk the Adversary flagged at df54693) produces correct levels:
- **custom-html-tiny** (stateless, not backup-capable, ≥2 versions): install+upgrade pass, backup/
restore skip→N/A, no custom → **level=2**, cap "L3 backup/restore N/A". Proves gap-caps on real data.
- **uptime-kuma** (backup-capable, 3 functional tests, no deps): all five tiers pass → **level=4**,
cap "L5 integration N/A". Proves that a full clean climb with no SSO surface caps at L4.
Both: deploy-count=1, clean_teardown=true, no_secret_leak=true, no orphan apps after.
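
A minimal sketch of how strict capping could yield the two levels above. It is not the actual `level()` in the repo: the `Outcome` enum, the `Level` tuple, the rung names, and the caller-supplied ladder are all assumptions made for illustration, and the L6 rung is left out because this entry does not define it.

```python
from enum import Enum
from typing import NamedTuple, Optional, Sequence, Tuple


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    NA = "n/a"


class Level(NamedTuple):
    level: int
    cap: Optional[str]  # why the climb stopped, or None if every rung passed


def level(rungs: Sequence[Tuple[str, Outcome]]) -> Level:
    """Climb while rungs PASS; the first FAIL or N/A stops the climb (never inflates)."""
    reached = 0
    for index, (name, outcome) in enumerate(rungs, start=1):
        if outcome is not Outcome.PASS:
            # N/A caps exactly like FAIL, but keeps a distinct reason.
            reason = "N/A" if outcome is Outcome.NA else "FAIL"
            return Level(reached, f"L{index} {name} {reason}")
        reached = index
    return Level(reached, None)


# Hypothetical ladders mirroring the two runs (rung names are assumed).
tiny = level([
    ("install", Outcome.PASS),
    ("upgrade", Outcome.PASS),
    ("backup/restore", Outcome.NA),
    ("functional", Outcome.NA),
    ("integration", Outcome.NA),
])
kuma = level([
    ("install", Outcome.PASS),
    ("upgrade", Outcome.PASS),
    ("backup/restore", Outcome.PASS),
    ("functional", Outcome.PASS),
    ("integration", Outcome.NA),
])
print(tiny)  # Level(level=2, cap='L3 backup/restore N/A')
print(kuma)  # Level(level=4, cap='L5 integration N/A')
```

Because the function is pure (no I/O, no globals), the same inputs always map to the same level, which is what makes it cheap to unit-test and fuzz.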

Design notes / WHY:
- Chose STRICT monotonic capping (N/A caps like FAIL, distinct reason) over "N/A transparent for middle
rungs" because the only worked example in §4.1 (no-integration → cap L4) is itself an N/A cap, and the
cardinal guardrail is never-inflate. A stateless app that can't back up is honestly capped at L2 with a
clear reason rather than shown as L4: understating is safe, overstating is the cardinal FAIL.
- Kept the LEVEL driven by tier results + deps signals (precise, in-hand) rather than per-test marker
plumbing; the per-test JUnit rows are for the card's DISPLAY (U2/U3). The functional-vs-SSO split inside
the custom tier is conservative: a custom FAIL fails the functional rung (capping the level at L3)
because we can't cheaply tell a functional failure from an SSO one. This can understate, never inflate.
- results.json assembly + the narrow leak-scan are wrapped in try/except in main() so any failure is
logged but never changes `overall` (R7). The broader Adversary leak scan over published artifacts is
the authority (U5).
- "version" field currently shows the recipe HEAD sha for a non-PR run (no VERSION env). Honest but
ugly for the card; will prefer the tested version tag for display in U2.
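
A minimal sketch of the best-effort pattern described in the notes above (log the failure, never touch `overall`). The helper name `emit_best_effort`, its parameters, and the payload shape are hypothetical; the real wrapping lives inside main() and may look different.

```python
import json
import logging
import tempfile
from pathlib import Path

log = logging.getLogger("results")


def emit_best_effort(overall: int, build_results, out_path: Path) -> int:
    """Try to write results.json; return `overall` unchanged whatever happens."""
    try:
        payload = build_results()
        out_path.write_text(json.dumps(payload, indent=2))
    except Exception:
        # Deliberately broad: reporting is cosmetic and must never change the verdict.
        log.exception("results.json emission failed; verdict unchanged")
    return overall


def broken_builder():
    # Stand-in for any assembly or leak-scan failure.
    raise RuntimeError("boom")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "results.json"
    ok_code = emit_best_effort(0, lambda: {"level": 4}, target)
    written = json.loads(target.read_text())
    fail_code = emit_best_effort(1, broken_builder, target)
```

Returning `overall` from the helper, instead of letting it compute an exit code, keeps the verdict on a single path that the reporting code cannot reach.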

Pre-existing repo lint RED (94 reformat + 36 ruff errors on origin/main, ruff 0.7.3 on CI devshell):
not mine, flagged in STATUS for the operator. My new files are clean; run_recipe_ci.py left better
than found (1 error now vs 4 before). NOT reformatting 94 cross-phase files in Phase 3 (out of scope,
huge noise).