cc-ci/runner/harness/level.py
autonomic-bot e219a7891d
feat(lvl5): P1 — 5-rung ladder (L5=abra recipe lint) + de-capped level semantics
level.py: RUNGS += lint; statuses {pass,fail,skip,unver}; compute_level = max passed
rung with all below pass-or-skip (fail/unver block); cap_reason/capped DELETED.
harness/lint.py: lint executor — pristine scratch clone of the per-run tree at the
exact tested ref (mirror-origin + untracked-overlay pollution solved by context, no
rule filtered), PTY via script -qec, 60s hard budget, lint.txt artifact, table-parse
classifier (rc only signals FATA), unver on any non-run (never silent pass).
results.py: derive_rungs classifies every N/A source (structural/declared → skip,
else unver), lint rung + synthetic lint stage + lint block in results.json, schema 2,
cap fields removed. run_recipe_ci.py: lint call before tiers (double-wrapped,
verdict-neutral), badge = level only. card/dashboard: 0-5 ramp, cap line → 'level N
of {4|5}', unverified rows, badge number+colour only, lint.txt servable, old schema-1
artifacts render untouched. Unit suite rewritten: 245 passed on cc-ci venv.
2026-06-11 07:42:30 +00:00


"""The level ladder — five rungs, no capping (phase lvl5, plan-phase-lvl5-lint-rung.md).
A single integer **level** summarising how far up the quality ladder a recipe run climbed:
L0 — install failed / app never became healthy.
L1 — Installs: deploys + passes health/readiness.
L2 — Upgrades: previous published version → PR version, stays healthy, data intact.
L3 — Backup/restore: seeded data survives backup → wipe → restore.
L4 — Functional: recipe-specific functional tests pass.
L5 — Lint: `abra recipe lint` passes against the exact ref under test.
Semantics (operator-decided 2026-06-11, recorded in DECISIONS.md — replaces the Phase-3
"N/A caps" rule):
level = max i such that rung_i == "pass" and every rung j < i is "pass" or "skip"; 0 if none.
A rung has one of FOUR statuses:
"pass" — exercised and passed.
"fail" — exercised and failed. Blocks: no rung above it can count.
"skip" — INTENTIONAL skip: the rung genuinely does not apply to this recipe, from a
declared or structural fact (not backup-capable; only one published version;
declared in recipe_meta.EXPECTED_NA). Does NOT stop the climb.
"unver" — UNINTENTIONAL not-verified: the rung SHOULD have run but didn't (infra error,
missing tool, harness exception, prior-stage abort, timeout). Blocks exactly
like a fail — the level never rises above a rung that wasn't actually checked.
The per-rung table (results.json `rungs`, card, dashboard) is the SOLE carrier of "why isn't
this level higher" — there is no cap_reason. The classification of every N/A source into
skip-vs-unver lives in derive_rungs (results.py) and is tabulated in DECISIONS.md; anything
unclassifiable defaults to "unver" (conservative: never claim what wasn't checked).
Integration (SSO/OIDC + cross-app) and recipe-local (the recipe repo's own tests/) remain
OPTIONAL capabilities — not rungs, never counted (SSO is still enforced for the run VERDICT
via the deps/SSO checks in run_recipe_ci.py).
This module is PURE (no I/O) so it is cheaply unit-testable and the Adversary can re-run the
unit test cold (`cc-ci-run -m pytest tests/unit/test_level.py -q`).
"""
from __future__ import annotations
# The climbable rungs in ascending order. install (L1) is the foundation; L0 means install
# itself did not pass. These five are the ESSENTIAL rungs — integration/recipe-local are
# optional and deliberately NOT in this tuple.
RUNGS = ("install", "upgrade", "backup_restore", "functional", "lint")
# Human-readable label per rung level, for the summary card / docs.
RUNG_LABEL = {
    1: "install (deploy + health)",
    2: "upgrade (prev published → PR)",
    3: "backup/restore (data integrity)",
    4: "functional (recipe-specific tests)",
    5: "lint (abra recipe lint)",
}
VALID = {"pass", "fail", "skip", "unver"}


def compute_level(rungs: dict[str, str]) -> int:
    """Map a rung-status dict → level 0..5.

    `rungs` must contain a status in VALID for every name in RUNGS. The level is the highest
    i such that rungs[i] == "pass" and every rung below i is "pass" or "skip" (an intentional
    skip does not stop the climb). A "fail" or "unver" rung blocks: rungs above it cannot
    count, however green. 0 when no rung qualifies.
    """
    for name in RUNGS:
        st = rungs.get(name)
        if st not in VALID:
            raise ValueError(
                f"rung {name!r} has invalid status {st!r} (expect one of {sorted(VALID)})"
            )
    level = 0
    for idx, name in enumerate(RUNGS, start=1):
        st = rungs[name]
        if st == "pass":
            level = idx
        elif st == "skip":
            continue
        else:  # fail / unver: nothing above this rung can count
            break
    return level


def backup_restore_status(backup: str | None, restore: str | None, backup_capable: bool) -> str:
    """Collapse the backup + restore tier results into the single L3 rung status.

    Not backup-capable (a declared/structural fact: no backupbot labels, or
    recipe_meta.BACKUP_CAPABLE=False) → "skip": the rung genuinely does not apply.
    Otherwise both tiers must pass for the rung to pass; a fail in either tier fails it; any
    other shape (tier skipped or never ran while backup-capable, e.g. a prior-stage abort)
    is "unver": the rung should have been verified and wasn't.
    """
    if not backup_capable:
        return "skip"
    vals = {backup, restore}
    if "fail" in vals:
        return "fail"
    if backup == "pass" and restore == "pass":
        return "pass"
    return "unver"


def tier_to_rung(status: str | None) -> str:
    """Map a single tier result ('pass'|'fail'|'skip'|None) to a rung status, with NO
    intentionality information: a tier that did not produce a pass/fail is "unver" (it should
    have run and wasn't verified). The caller (derive_rungs) upgrades "unver" to "skip" where
    a declared/structural fact makes the skip intentional, never the other way around."""
    if status == "pass":
        return "pass"
    if status == "fail":
        return "fail"
    return "unver"
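
As a worked illustration of the ladder semantics, the sketch below restates the climbing rule in a self-contained form and runs a few rung tables through it. The helper name `level_of` and the sample tables are hypothetical, chosen only for illustration. It also omits the VALID check, so it assumes every status is already one of pass/fail/skip/unver. The real entry point remains `compute_level` above.

```python
# Self-contained sketch: restates the ladder rule so the examples run standalone.
# `level_of` and the sample tables are hypothetical, not part of level.py.
RUNGS = ("install", "upgrade", "backup_restore", "functional", "lint")


def level_of(rungs: dict) -> int:
    """Highest passed rung whose lower rungs are all pass-or-skip."""
    level = 0
    for idx, name in enumerate(RUNGS, start=1):
        status = rungs[name]
        if status == "pass":
            level = idx
        elif status != "skip":  # fail / unver block the climb
            break
    return level


all_green = dict.fromkeys(RUNGS, "pass")
# An intentional skip does not stop the climb.
skipped_backup = {**all_green, "backup_restore": "skip"}
# An unverified rung blocks everything above it, however green.
unverified_upgrade = {**all_green, "upgrade": "unver"}
# A skipped top rung leaves the level at the last rung that passed.
skipped_lint = {**all_green, "lint": "skip"}
# A failed install pins the level at 0.
failed_install = {**all_green, "install": "fail"}

samples = {
    "all green": all_green,
    "backup skipped": skipped_backup,
    "upgrade unverified": unverified_upgrade,
    "lint skipped": skipped_lint,
    "install failed": failed_install,
}
for label, table in samples.items():
    print(f"{label}: L{level_of(table)}")
```

Under these assumptions the five tables come out at L5, L5, L1, L4 and L0 respectively. A skipped rung is transparent to the climb but never raises the level by itself, while a single "unver" caps the level just as a "fail" would.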