From 18d2bd1443b2782e9a6ee7435ab5f6d98690e428 Mon Sep 17 00:00:00 2001 From: autonomic-bot Date: Sun, 31 May 2026 06:53:34 +0000 Subject: [PATCH] =?UTF-8?q?review(3=20U0):=20PASS=20=E2=80=94=20results.js?= =?UTF-8?q?on=20schema=20+=20level=20ladder=20cold-verified?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cold/independent on the real cc-ci-run harness: - 29 unit tests pass (test_level + test_results, PYTHONPATH=runner). - Independent break-it probe EXIT 0, all 10 checks: compute_level 729 exhaustive vs own reference; no-inflation monotonicity; gap-cap; backup_restore_status; SSO gating (no-deps->L4, deps->L5, unverified->fail); derive_rungs no-pass-without-backing big fuzz; e2e custom-fail->L3 + upgrade-fail->L1; leak-clean; schema complete. - Real artifacts match EXPECTED exactly: custom-html-tiny L2 (cap L3 backup N/A), uptime-kuma L4 (cap L5 integration N/A). 0 real secret leaks (only field name no_secret_leak matched). Clean teardown (only traefik_app live). Emission R7-wrapped (try/except; return overall) so cosmetics never change the verdict. R1 (level ladder) cold-verified. Builder may proceed past U0. No VETO. Co-Authored-By: Claude Opus 4.8 (1M context) --- machine-docs/REVIEW-3.md | 91 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 88 insertions(+), 3 deletions(-) diff --git a/machine-docs/REVIEW-3.md b/machine-docs/REVIEW-3.md index a8804ac..68098c1 100644 --- a/machine-docs/REVIEW-3.md +++ b/machine-docs/REVIEW-3.md @@ -5,8 +5,8 @@ This is the Adversary-owned, append-only verdict log for Phase 3. The Builder ow JOURNAL-3.md / BACKLOG-3.md `## Build backlog`. I own this file + BACKLOG-3.md `## Adversary findings`. ## Definition of Done (Phase 3) — R1–R8, each to be Adversary cold-verified within 24h -- [ ] **R1 — Level ladder.** Documented ladder (§4.1) maps passed test sets → one integer level per - run; a missing lower rung caps the level (YunoHost semantics). 
+- [x] **R1 — Level ladder.** Documented ladder (§4.1) maps passed test sets → one integer level per + run; a missing lower rung caps the level (YunoHost semantics). **COLD-VERIFIED @U0 07:05Z.** - [ ] **R2 — Image-forward PR comment.** `!testme` posts/updates a Gitea PR comment: marker (🌻) + status/level badge + summary image, both linking to run/dashboard; re-run updates same comment. - [ ] **R3 — Summary card image.** Per-run PNG: recipe+version, level, per-stage/per-test ✔/✘ @@ -22,7 +22,7 @@ JOURNAL-3.md / BACKLOG-3.md `## Build backlog`. I own this file + BACKLOG-3.md ` - [ ] **R8 — Docs.** docs/ explains ladder, card/screenshot/badge generation, badge embedding. ## Milestone gates (each ends with an Adversary gate) — U0..U5 -- [ ] U0 — Results schema + level (results.json per-stage/per-test; level correct for L4-pass & L2-cap). +- [x] U0 — Results schema + level (results.json per-stage/per-test; level correct for L4-pass & L2-cap). **PASS @07:05Z.** - [ ] U1 — App screenshot (real, post-login, secret-safe). - [ ] U2 — Summary card + badge (HTML→PNG; level/✔✘/screenshot; SVG badge; stable URLs; pass+fail). - [ ] U3 — YunoHost-style PR comment (marker+badge+card, linked; updates on re-run; no secrets). @@ -73,3 +73,88 @@ Cold-run from a fresh clone on the cc-ci host @9773e3f (`cc-ci-run -m pytest tes `results.json` is actually emitted per real run. The pure function is sound; I will attack the mapping and the real emitted artifact when U0 is CLAIMED. Not anchoring on the Builder's narrative — this is my own cold re-run + fuzz. No verdict yet. + +### @2026-05-31T07:05Z — U0 GATE: **PASS** (Results schema + level; R1) + +**Claim (STATUS-3, `claim(3 U0)` @5b6b378).** `run_recipe_ci.py` emits per-run `results.json` with +per-stage AND per-test ✔/✘ breakdown + a computed integer **level** (L0–L6, YunoHost gap-cap). +Accept: level correct for an L4-pass recipe and one capped at the L2 rung. 
+ +**Verification was COLD + INDEPENDENT.** My clone is on the orchestrator VM; `cc-ci-run` lives only +on the cc-ci host, so I tar'd my clone's `runner/` + `tests/` to a fresh `/tmp/advverify` on cc-ci +and ran everything under the real `cc-ci-run` harness. Verdict formed from the plan (SSOT) + code + +STATUS-3 verification info + my own re-run/probe — JOURNAL-3 NOT read first (anti-anchoring §6.1). + +**1. Unit tests (cold, real harness).** `PYTHONPATH=runner cc-ci-run -m pytest +tests/unit/test_level.py tests/unit/test_results.py -q` → **29 passed in 0.09s**. +(Builder's STATUS said 28 @claim sha; origin HEAD has one more — superset, all green. NB: pytest +needs `tests/conftest.py:13` to put `runner/` on sys.path; the Builder runs from the repo root where +it loads natively, so this is an invocation detail of my /tmp copy, not a defect.) + +**2. My own independent break-it probe** (`/tmp/adv_probe_u0c.py`, written from scratch against the +actual source API `harness.level`/`harness.results`, re-implementing the DECISIONS Phase-3 contract +independently; run under `cc-ci-run` — **EXIT 0, all 10 checks OK**): +- `[1]` `compute_level` exhaustive **729 (3^6)** rung-combos == my independent reference (level = + count of leading contiguous passes); cap_reason empty iff L6, present iff /...`) and hold the cardinal invariant + (rendered card/level/screenshot never greener than raw results.json + actual outcomes) at U2–U4. +- Pre-existing repo-wide lint RED on origin/main (Builder-flagged) is not a Phase-3 DoD item and not + introduced by U0 — noted, not a finding.
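For readers of probe check `[1]` above: a minimal sketch of the kind of independent reference it describes (level = count of leading contiguous passes; cap_reason empty iff L6), checked over all 729 (3^6) rung combinations. This is not the `harness.level` source or the actual probe script. The state names `"pass"`/`"fail"`/`"na"`, the function names, and the cap_reason wording are assumptions for illustration only.

```python
from itertools import product

# Assumed rung states; the real harness may name or encode these differently.
STATES = ("pass", "fail", "na")
N_RUNGS = 6  # rungs L1..L6; level 0 means not even the first rung passed


def reference_level(rungs):
    """Return (level, cap_reason) for a sequence of rung states.

    The level is the number of leading contiguous "pass" rungs, so any
    non-pass rung (fail or N/A) caps the level below it.
    """
    level = 0
    for state in rungs:
        if state != "pass":
            break
        level += 1
    if level == len(rungs):
        return level, ""  # top level reached: nothing capped it
    # Hypothetical wording; only "empty iff top level" is taken from the review.
    return level, f"rung L{level + 1} is {rungs[level]}"


def check_exhaustive():
    """Check the invariants over every rung combination; return the count."""
    combos = list(product(STATES, repeat=N_RUNGS))
    for combo in combos:
        level, cap_reason = reference_level(combo)
        # cap_reason is empty exactly when the top level is reached.
        assert (cap_reason == "") == (level == N_RUNGS)
        # No inflation: turning any single rung into a pass never lowers the level.
        for i in range(N_RUNGS):
            upgraded = combo[:i] + ("pass",) + combo[i + 1:]
            assert reference_level(upgraded)[0] >= level
    return len(combos)


if __name__ == "__main__":
    print(check_exhaustive(), "combinations checked")
```

Under these assumptions, a run whose third rung is N/A while everything else passes comes out at level 2 with a non-empty cap_reason, which has the same shape as the custom-html-tiny artifact (L2, capped at L3).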