Revert "docs(lvl5): results-ux.md → 5-rung de-capped ladder + schema 2; recipe-customization.md EXPECTED_NA/BACKUP_CAPABLE rows to new semantics"
This reverts commit af7488a498.
This commit is contained in:
@ -115,8 +115,8 @@ _This table is GENERATED from the `runner/harness/meta.py` KEYS registry by `scr
|
||||
| `HEALTH_OK` | `tuple[int]` | `(200, 301, 302)` | Acceptable HTTP status codes for health. |
|
||||
| `DEPLOY_TIMEOUT` | `int` | `600` | Max seconds to wait for swarm convergence per deploy. |
|
||||
| `HTTP_TIMEOUT` | `int` | `300` | Max seconds to wait for HTTP health after convergence. |
|
||||
| `BACKUP_CAPABLE` | `bool` | `None` | Override the backup-tier capability auto-detect (compose `backupbot.backup` labels). `False` forces an intentional skip of the backup/restore rung; `True` forces the tier on; unset = auto-detect. |
|
||||
| `EXPECTED_NA` | `dict` | `None` | Declare a non-run rung an INTENTIONAL skip: `{rung: reason}`. The level climbs past an intentional skip; an undeclared non-run rung is *unverified* and blocks the level above it (phase lvl5; classification table in `machine-docs/DECISIONS.md`). Never overrides an exercised pass/fail; the `lint` rung has no escape hatch. |
|
||||
| `BACKUP_CAPABLE` | `bool` | `None` | Override the backup-tier capability auto-detect (compose `backupbot.backup` labels). `False` forces N/A; `True` forces the tier on; unset = auto-detect. |
|
||||
| `EXPECTED_NA` | `dict` | `None` | Declare an N/A rung intentional: `{rung: reason}`. The cap stands either way; only the report wording changes. |
|
||||
| `READY_PROBE` | `hook` | `None` | Callable `(ctx) -> [probe, ...]` returning extra readiness probes, run after install AND after upgrade: HTTP `{host, path, ok}` or TCP `{tcp_host, tcp_port, stable}`. |
|
||||
| `UPGRADE_BASE_VERSION` | `str` | `None` | Exact published tag overriding the upgrade tier's base (default: `recipe_versions[-2]`). |
|
||||
| `BACKUP_VERIFY` | `hook` | `None` | Callable `(ctx) -> bool` post-backup data-capture check; `False` re-runs the backup (truncated-dump race guard), retried up to 3 attempts. |
|
||||
|
||||
@ -10,9 +10,12 @@ It is the R8 reference for Phase 3 (`plan-phase3-results-ux.md`).
|
||||
|
||||
---
|
||||
|
||||
## 1. The level ladder (phase lvl5 semantics, operator-decided 2026-06-11)
|
||||
## 1. The level ladder (R1)
|
||||
|
||||
Every run earns a single integer **level 0–5** over the FIVE essential rungs:
|
||||
Every run earns a single integer **level 0–6**. The ladder is cumulative with **YunoHost
|
||||
gap-caps-the-level** semantics: you earn level `L` only if **every rung 1..L was a clean PASS**. The
|
||||
first rung that is not a clean PASS — a real **FAIL** *or* genuinely **N/A** for this recipe — stops
|
||||
the climb, and `level_cap_reason` records which rung and why.
|
||||
|
||||
| Level | Rung | Earned when |
|
||||
|------:|------|-------------|
|
||||
@ -21,52 +24,42 @@ Every run earns a single integer **level 0–5** over the FIVE essential rungs:
|
||||
| **L2** | upgrade | previous published version → PR/latest, stays healthy, data intact. |
|
||||
| **L3** | backup/restore | seeded data survives backup → wipe → restore. |
|
||||
| **L4** | functional | the recipe-specific functional tests pass. |
|
||||
| **L5** | lint | `abra recipe lint` passes against the exact ref under test. |
|
||||
| **L5** | integration | SSO/OIDC + cross-app integration tests pass. |
|
||||
| **L6** | recipe-local | the recipe repo's own `tests/` (D4) pass and are merged. |
|
||||
|
||||
Each rung has one of FOUR statuses, and the level is:
|
||||
**N/A caps, fairly.** A rung that does not apply to a recipe (only one published version → no
|
||||
upgrade; not backup-capable; no SSO/integration surface; no recipe-local tests) is **N/A**, which
|
||||
caps the climb at the rung below it with a recorded reason — it is *not* counted as a failure. This is
|
||||
the only fair reading of "a missing lower rung caps the level": e.g. a recipe with **no integration
|
||||
surface caps at L4 by definition**, shown as `level_cap_reason = "L5 integration … N/A"`. A stateless
|
||||
app whose functional tests pass but which cannot be backed up is honestly capped at **L2** (`"L3
|
||||
backup/restore … N/A"`) rather than shown as L4 — understating is safe; overstating is forbidden.
|
||||
|
||||
level = the highest rung that PASSED, where every rung below it is "pass" or an intentional skip
|
||||
|
||||
- **pass / fail** — the rung was exercised. A FAIL blocks: no rung above it counts, however green.
|
||||
- **skip (intentional)** — the rung *genuinely does not apply*, from a declared or structural fact:
|
||||
not backup-capable (declared), only one published version (no upgrade target), or a declared
|
||||
`EXPECTED_NA`. Intentional skips are **climbed past** — a stateless recipe with passing
|
||||
functional tests and a clean lint reaches **L5**, not the old "capped at 2".
|
||||
- **unver (unverified)** — the rung *should* have run but didn't: infra error, missing tool,
|
||||
harness exception, prior-stage abort, timeout. **The level cannot rise above an unverified
|
||||
rung** — it blocks exactly like a fail (we never claim what we didn't check). Anything
|
||||
unclassifiable defaults to unver (conservative).
|
||||
|
||||
There is **no capping concept** (no `cap_reason`, no `capped`): the per-rung table
|
||||
(✔ / ✘ / intentional-skip / unverified) on the card and in `results.json.rungs` is the sole
|
||||
carrier of "why isn't this level higher". Worked examples:
|
||||
|
||||
- install ✔, upgrade ✘, backup ✔, functional ✔, lint ✔ → **level 1** (fail blocks).
|
||||
- install ✔, upgrade ✔, backup skip (not capable), functional ✔, lint ✔ → **level 5**.
|
||||
- install ✔, upgrade ✔, backup unver (harness error), functional ✔, lint ✔ → **level 2**.
|
||||
- all four ✔, lint unver (abra missing) → **level 4** (an unverified top rung isn't earned).
|
||||
|
||||
Integration (SSO/OIDC + cross-app) and recipe-local tests are **optional capabilities**, not
|
||||
rungs — they never affect the level (SSO remains enforced for the run VERDICT).
|
||||
Worked examples (real runs):
|
||||
- `uptime-kuma` — install+upgrade+backup+restore+functional all pass, no SSO surface → **L4**
|
||||
(`cap = "L5 integration (SSO/OIDC + cross-app) N/A"`).
|
||||
- `custom-html-tiny` — stateless, not backup-capable: install+upgrade pass, backup/restore N/A →
|
||||
**L2** (`cap = "L3 backup/restore (data integrity) N/A"`).
|
||||
|
||||
### How tiers map to rungs (the translation layer)
|
||||
|
||||
`run_recipe_ci.py` holds the run's per-tier results (`install/upgrade/backup/restore/custom`) +
|
||||
structural signals; `runner/harness/results.py::derive_rungs` maps them to the rung-status dict
|
||||
that `runner/harness/level.py::compute_level` scores. The full intentional-vs-unintentional
|
||||
classification table for every N/A source is in `machine-docs/DECISIONS.md` (phase lvl5). Summary:
|
||||
deps/SSO signals; `runner/harness/results.py::derive_rungs` maps them to the rung-status dict that
|
||||
`runner/harness/level.py::compute_level` scores. The mapping (also in `DECISIONS.md`, Phase 3):
|
||||
|
||||
- **install** ← install tier (pass/fail; a non-run is unver — install always applies).
|
||||
- **upgrade** ← upgrade tier; tier skipped with no upgrade target (single published version,
|
||||
structural) → skip; declared `EXPECTED_NA` → skip; otherwise unver.
|
||||
- **install** ← install tier (pass/fail).
|
||||
- **upgrade** ← upgrade tier; `skip` → **na** (only one published version).
|
||||
- **backup_restore** ← backup AND restore tiers both pass → pass; either fail → fail; not
|
||||
backup-capable (structural/declared) → skip; unverified-while-capable → unver.
|
||||
- **functional** ← the custom tier; a custom failure conservatively fails this rung; no custom
|
||||
tests is a coverage GAP → unver, unless declared `EXPECTED_NA["functional"]` → skip.
|
||||
- **lint** ← the lint executor (`runner/harness/lint.py`): `abra recipe lint` on a pristine
|
||||
scratch clone of the run's recipe tree at the exact tested sha, 60s hard budget, full output in
|
||||
the run artifact `lint.txt`. pass/fail only — when lint can't run the rung is **unver** (never
|
||||
a silent pass, never an intentional skip). Lint never changes the run verdict.
|
||||
backup-capable → **na**.
|
||||
- **functional** ← the custom tier minus its SSO tests; a custom failure conservatively fails this
|
||||
rung (we don't split functional-vs-SSO failure → never inflate); no custom tests → **na**.
|
||||
- **integration** ← applies only if the recipe declares deps; pass iff deps wired and SSO verified and
|
||||
custom didn't fail; recipes with no declared deps → **na** (the "caps at L4" rule).
|
||||
- **recipe_local** ← the recipe repo's own `tests/` (discovery source `repo-local`) ran and passed;
|
||||
none present → **na**.
|
||||
|
||||
The pure scorer is exhaustively unit-tested + fuzz-verified (all 729 rung combinations: level ==
|
||||
count of leading consecutive passes, zero inflation).
|
||||
|
||||
### Invariant flags (shown, not climbed)
|
||||
|
||||
@ -84,29 +77,19 @@ build number, or the run's unique app domain for a hand-run). Schema:
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": 2, "run_id": "...", "recipe": "...", "version": "...", "pr": "...", "ref": "...",
|
||||
"schema": 1, "run_id": "...", "recipe": "...", "version": "...", "pr": "...", "ref": "...",
|
||||
"finished": 0.0,
|
||||
"level": 5,
|
||||
"rungs": {"install":"pass","upgrade":"pass","backup_restore":"skip","functional":"pass",
|
||||
"lint":"pass"},
|
||||
"lint": {"status":"pass","detail":"","rules_failed":[]},
|
||||
"skips": {"intentional": {"backup_restore": "not backup-capable (no backupbot labels / declared)"},
|
||||
"unintentional": []},
|
||||
"level": 4, "level_cap_reason": "L5 integration (SSO/OIDC + cross-app) N/A",
|
||||
"rungs": {"install":"pass","upgrade":"pass","backup_restore":"pass","functional":"pass",
|
||||
"integration":"na","recipe_local":"na"},
|
||||
"stages": [{"name":"install","status":"pass",
|
||||
"tests":[{"name":"test_serving","status":"pass","ms":168,"source":"generic"}]}],
|
||||
"results": {"install":"pass","upgrade":"pass","backup":"skip","restore":"skip","custom":"pass"},
|
||||
"results": {"install":"pass","upgrade":"pass","backup":"pass","restore":"pass","custom":"pass"},
|
||||
"flags": {"clean_teardown": true, "no_secret_leak": true},
|
||||
"screenshot": "screenshot.png", "summary_card": "summary.png"
|
||||
}
|
||||
```
|
||||
|
||||
`rungs` carries the four-status vocabulary above; `skips.intentional` maps each intentionally
|
||||
skipped rung to its (declared or structural) reason and `skips.unintentional` lists the
|
||||
unverified rungs. `lint` carries the L5 rung outcome + failing rule ids; the full
|
||||
`abra recipe lint` output is served at `/runs/<run_id>/lint.txt`. Pre-lvl5 artifacts
|
||||
(`"schema": 1`, 4-rung ladder, `level_cap_reason`/`level_cap_rung` present, `"na"` statuses)
|
||||
are still rendered as-is by the dashboard/card — their stored level is never recomputed.
|
||||
|
||||
Assembly is **best-effort**: a failure to build/write `results.json` is logged but never changes the
|
||||
run's exit code (cosmetics never block the pipeline, R7).
|
||||
|
||||
|
||||
Reference in New Issue
Block a user