From 5b6b378ade7ed7be0fd9113f9e26f9c65b49ff35 Mon Sep 17 00:00:00 2001 From: autonomic-bot Date: Sun, 31 May 2026 06:03:49 +0000 Subject: [PATCH] =?UTF-8?q?claim(3=20U0):=20results.json=20+=20level=20lad?= =?UTF-8?q?der=20=E2=80=94=20gate=20CLAIMED?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit U0 (R1) done: pure level() mapper (L0-L6 gap-caps) + per-test JUnit results + results.json, all emitted best-effort (never changes verdict, R7). Two real runs bracket the gate: custom-html-tiny=L2 (functional N/A, backup N/A caps at L2) and uptime-kuma=L4 (full climb, no SSO surface caps at L5). 28 unit tests + Adversary fuzz-clean. Rung-mapping contract in DECISIONS. Verify: STATUS-3.md HOW/EXPECTED. Awaiting Adversary cold-verify. Co-Authored-By: Claude Opus 4.8 (1M context) --- machine-docs/BACKLOG-3.md | 19 +++++------ machine-docs/JOURNAL-3.md | 29 ++++++++++++++++ machine-docs/STATUS-3.md | 72 ++++++++++++++++++++++++++++++++------- 3 files changed, 97 insertions(+), 23 deletions(-) diff --git a/machine-docs/BACKLOG-3.md b/machine-docs/BACKLOG-3.md index 85c4709..fb8ac7d 100644 --- a/machine-docs/BACKLOG-3.md +++ b/machine-docs/BACKLOG-3.md @@ -6,16 +6,15 @@ Milestones U0–U5 (plan §5); each ends with an Adversary gate. DoD items R1– ## Build backlog ### U0 — Results schema + level (R1) -- [ ] U0.1 — Pure `level()` function: map per-tier results (+ deps/SSO/recipe-local signal) → integer - level L0–L6 with gap-caps-level semantics (§4.1). Unit-tested (pass-through-L4 and fail-at-L2-capped). -- [ ] U0.2 — Per-tier pytest emits structured per-test results (JUnit XML per tier → parsed) so - results.json carries per-stage AND per-test ✔/✘ breakdown. -- [ ] U0.3 — `run_recipe_ci.py` writes `results.json` per run (recipe, version, pr, ref, stages[], - per-test rows, level, level_cap_reason, invariant flags: clean-teardown, no-secret-leak) to a - run-scoped artifact dir. Never blocks/fails the test verdict (R7). 
-- [ ] U0.4 — Decide & wire the artifact hosting path (run-scoped dir on host + dashboard serves - `/runs//...`). Record in DECISIONS. -- GATE U0: level correct for a recipe through L4 and one capped at L2. +- [x] U0.1 — Pure `level()` function (harness/level.py): L0–L6 gap-caps semantics; 15 unit tests + (incl L4-pass + L2-cap); Adversary fuzz-clean 729/729 (REVIEW-3 @df54693). +- [x] U0.2 — Per-tier pytest emits JUnit XML (parsed by harness/results.py) → results.json per-stage + AND per-test ✔/✘ breakdown. +- [x] U0.3 — `run_recipe_ci.py` writes `results.json` per run (level, cap_reason, rungs, stages, + flags) to the run-scoped artifact dir; assembly wrapped so it NEVER changes the verdict (R7). +- [x] U0.4 — Artifact hosting path decided + recorded in DECISIONS (`${CCCI_RUNS_DIR:-/var/lib/cc-ci-runs}/ + /`; dashboard serves `/runs//` in U2/U4 via host bind-mount). +- GATE U0: **CLAIMED 2026-05-31** — real runs: custom-html-tiny=L2 (cap L3 N/A), uptime-kuma=L4 (cap L5 N/A). ### U1 — App screenshot (R4) - [ ] U1.1 — Harness captures a real Playwright screenshot of the deployed app while it is up diff --git a/machine-docs/JOURNAL-3.md b/machine-docs/JOURNAL-3.md index 34b5661..efdb7ca 100644 --- a/machine-docs/JOURNAL-3.md +++ b/machine-docs/JOURNAL-3.md @@ -40,3 +40,32 @@ a P3 blocker; operator kicked Phase 3 off manually. Proceeding. the run's exit code (R7: cosmetics never block). 4. Artifact hosting (U0.4): runner writes to a host dir; dashboard bind-mounts it read-only to serve `/runs//...`. Decide details + record in DECISIONS. + +## 2026-05-31T06:00Z — U0 complete + CLAIMED + +Implemented U0.1–U0.4. Two real end-to-end runs on cc-ci confirm the translation layer (the binding +risk the Adversary flagged at df54693) produces correct levels: +- **custom-html-tiny** (stateless, not backup-capable, ≥2 versions): install+upgrade pass, backup/ + restore skip→N/A, no custom → **level=2**, cap "L3 backup/restore N/A". Proves gap-caps on real data. 
+- **uptime-kuma** (backup-capable, 3 functional tests, no deps): all five tiers pass → **level=4**, + cap "L5 integration N/A". Proves a full clean climb with no SSO surface caps at L4. +Both: deploy-count=1, clean_teardown=true, no_secret_leak=true, no orphan apps after. + +Design notes / WHY: +- Chose STRICT monotonic capping (N/A caps like FAIL, distinct reason) over "N/A transparent for middle + rungs" because the only worked example in §4.1 (no-integration → cap L4) is N/A-caps, and the cardinal + guardrail is never-inflate. A stateless app that can't back up is honestly capped at L2 with a clear + reason rather than shown as L4 — understating is safe, overstating is the cardinal FAIL. +- Kept the LEVEL driven by tier results + deps signals (precise, in-hand) rather than per-test marker + plumbing; the per-test JUnit rows are for the card's DISPLAY (U2/U3). functional-vs-SSO split inside + the custom tier is conservative: a custom FAIL fails the functional rung (caps L3) since we don't + cheaply distinguish — never inflates. +- results.json assembly + the narrow leak-scan are wrapped in try/except in main() so any failure is + logged but never changes `overall` (R7). The broader Adversary leak scan over published artifacts is + the authority (U5). +- "version" field currently shows the recipe HEAD sha for a non-PR run (no VERSION env). Honest but + ugly for the card; will prefer the tested version tag for display in U2. + +Pre-existing repo lint RED (94 reformat + 36 ruff errors on origin/main, ruff 0.7.3 on CI devshell): +not mine, flagged in STATUS for the operator. My new files are clean; run_recipe_ci.py left better +than found (1 vs 4 errors). NOT reformatting 94 cross-phase files in Phase 3 (out of scope, huge noise). 
diff --git a/machine-docs/STATUS-3.md b/machine-docs/STATUS-3.md index 797bf36..4dc6637 100644 --- a/machine-docs/STATUS-3.md +++ b/machine-docs/STATUS-3.md @@ -6,22 +6,68 @@ State files (this phase): `machine-docs/{STATUS,BACKLOG,REVIEW,JOURNAL}-3.md`. D **WHAT + HOW + EXPECTED + WHERE live here; WHY → JOURNAL-3.md.** ## Phase context -- Phase 2b is `## DONE` (Adversary-verified, no VETO). Phase 3 kicked off **manually by the operator** - (plan-phase3 transition = manual). Note for honesty: Phase-2 (recipe-tests) `## DONE` is not yet - flipped and REVIEW-2 carries a standing VETO on full Phase-2 DONE authorization; cross-phase - sequencing is an operator call — Phase 3 proceeds per the operator kickoff. Adversary concurs this - is not a Phase-3 blocker (REVIEW-3 @05:42Z). +- Phase 2b is `## DONE` (Adversary-verified, no VETO). Phase 3 kicked off **manually by the operator**. + Note for honesty: Phase-2 `## DONE` not yet flipped (REVIEW-2 standing VETO on full Phase-2 DONE + authorization); cross-phase sequencing is an operator call. Adversary concurs it's not a P3 blocker + (REVIEW-3 @05:42Z). +- **Pre-existing repo-wide lint is RED on origin/main** (94 files `ruff format`-dirty + 36 `ruff check` + errors; confirmed on cc-ci CI devshell against clean `origin/main`, ruff 0.7.3). This predates Phase 3 + and is NOT introduced by my work — my NEW Phase-3 files are fully `ruff`-clean, and I left + `run_recipe_ci.py` with fewer ruff errors than main (1 vs 4). Flagged for the operator; not a Phase-3 + DoD item, and the U0 gate is verified by unit tests + real-run results.json, not repo-wide lint. -## Current state -- Phase-3 loop live. Bootstrapping state files + settling open decisions, then executing **U0**. -- No gate claimed yet. +--- -## In flight -- **U0 — Results schema + level (R1).** Building: pure `level()` mapper (L0–L6, gap-caps), - per-test structured results, `results.json` per run, artifact hosting path. 
+## Gate: U0 — CLAIMED, awaiting Adversary (Results schema + level; R1) -## Gate -(none claimed) +**WHAT.** `run_recipe_ci.py` now emits a per-run `results.json` with per-stage AND per-test ✔/✘ +breakdown and a computed integer **level** (L0–L6, YunoHost gap-caps semantics). DoD R1 (level ladder) +satisfied; U0 milestone acceptance ("level correct for a recipe through L4 and one capped at L2") +demonstrated on two real end-to-end runs. + +**WHERE (commits / files).** +- `9773e3f` `runner/harness/level.py` — pure `compute_level(rungs)->(level,cap_reason)` + helpers + `backup_restore_status`, `tier_to_rung`. `tests/unit/test_level.py` (15 tests). +- `52e5d21` `runner/harness/results.py` — JUnit-XML parse, `collect_stages`, `derive_rungs` (the + tier+deps/SSO→rung translation), `build_results`, `write_results`. `tests/unit/test_results.py` + (13 tests). `runner/run_recipe_ci.py` — tiers emit `--junitxml` + append `{tier,source,file,rc,junit}` + records; `main()` assembles+writes results.json wrapped so a failure NEVER changes the verdict (R7), + incl. a narrow self leak-scan of the serialised artifact. +- `757511e` `machine-docs/DECISIONS.md` (Phase-3 section) — the documented ladder + exact rung-mapping + contract `derive_rungs` implements + results.json schema + artifact-hosting decision. + +**HOW to verify (cold, from your clone on cc-ci).** +1. **Unit tests** (deterministic; also fuzz-verifiable): + `cc-ci-run -m pytest tests/unit/test_level.py tests/unit/test_results.py -q` +2. **Real-run L2-cap** (stateless, not backup-capable, ≥2 versions): + `RECIPE=custom-html-tiny STAGES=install,upgrade,backup,restore,custom CCCI_RUN_ID=adv-cht cc-ci-run runner/run_recipe_ci.py` + then read `/var/lib/cc-ci-runs/adv-cht/results.json`. +3. 
**Real-run L4-pass** (backup-capable, 3 functional tests, no deps): + `RECIPE=uptime-kuma STAGES=install,upgrade,backup,restore,custom CCCI_RUN_ID=adv-uk cc-ci-run runner/run_recipe_ci.py` + then read `/var/lib/cc-ci-runs/adv-uk/results.json`. + (Compare the `level`/`rungs` against the `results` dict + DECISIONS contract — a level greener than + the tiers would be a FAIL. Verify clean teardown: no orphan `*-pr*`/recipe service after.) + +**EXPECTED.** +1. `28 passed`. +2. custom-html-tiny: `level=2`, `level_cap_reason="L3 backup/restore (data integrity) N/A"`, + `rungs={install:pass, upgrade:pass, backup_restore:na, functional:na, integration:na, recipe_local:na}`, + `results={install:pass, upgrade:pass, backup:skip, restore:skip, custom:skip}`, + `flags={clean_teardown:true, no_secret_leak:true}`, stages=[install,upgrade] each w/ per-test rows. + (My run: `/var/lib/cc-ci-runs/u0-cht-L2/results.json`.) +3. uptime-kuma: `level=4`, `level_cap_reason="L5 integration (SSO/OIDC + cross-app) N/A"`, + `rungs={install:pass, upgrade:pass, backup_restore:pass, functional:pass, integration:na, recipe_local:na}`, + all five tiers pass, `flags.clean_teardown=true`, stages=[install,upgrade,backup,restore,custom] + with per-test rows (incl. 3 uptime-kuma functional tests, source `cc-ci`). + (My run: `/var/lib/cc-ci-runs/u0-uk-L4/results.json`.) + +These two bracket the gate: a recipe is capped at **L2** when the L3 backup/restore rung is N/A, +whatever the higher rungs report (gap-caps; never inflates), and a full clean climb with no SSO +surface is capped at **L4**. + +## In flight (next, post-gate) +- U1 — app screenshot (Playwright, post-login, secret-safe). Will start once U0 PASSes; meanwhile I + hold U1 design as the next unblocked item. ## Blocked (none)
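
The gap-caps semantics claimed for U0 can be illustrated with a minimal sketch. This is not the code in `runner/harness/level.py`. It assumes the rung order follows the `rungs` keys listed under EXPECTED, that a rung status is one of `pass`/`na`/`fail`, and that a missing rung counts as N/A. The function name `compute_level_sketch`, the `FAIL` reason wording, and the labels for L1, L2, L4 and L6 are hypothetical; only the L3 and L5 labels mirror the quoted cap reasons.

```python
from typing import Dict, List, Optional, Tuple

# Rung order L1..L6, taken from the `rungs` keys shown under EXPECTED.
# The L3 and L5 labels copy the quoted cap reasons; the others are placeholders.
LADDER: List[Tuple[str, str]] = [
    ("install", "L1 install"),
    ("upgrade", "L2 upgrade"),
    ("backup_restore", "L3 backup/restore (data integrity)"),
    ("functional", "L4 functional"),
    ("integration", "L5 integration (SSO/OIDC + cross-app)"),
    ("recipe_local", "L6 recipe-local"),
]


def compute_level_sketch(rungs: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Climb the ladder until the first rung that is not `pass`.

    Strict monotonic capping: an N/A rung stops the climb exactly like a
    failing rung, but is reported with a distinct reason. Rungs above the
    cap are never consulted, so the level cannot be inflated.
    """
    level = 0
    for key, label in LADDER:
        status = rungs.get(key, "na")  # assumption: a missing rung is N/A
        if status == "pass":
            level += 1
            continue
        suffix = "N/A" if status == "na" else "FAIL"  # assumed reason wording
        return level, f"{label} {suffix}"
    return level, None  # every rung passed: L6, nothing capped it


if __name__ == "__main__":
    stateless = {"install": "pass", "upgrade": "pass", "backup_restore": "na"}
    print(compute_level_sketch(stateless))
```

Fed the `rungs` values listed under EXPECTED, the sketch yields level 2 for custom-html-tiny and level 4 for uptime-kuma. A passing `functional` rung above an N/A `backup_restore` rung would still yield level 2, which is the never-inflate property the gate is meant to show.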