cc-ci-orchestrator/cc-ci-plan/plan-phase-lvl5-lint-rung.md
autonomic-bot 1f7fc7eb39 plan(lvl5): fold in de-capping — level = highest passed rung, N/A skips, fail blocks
Operator decision (explicit Q&A 2026-06-11): remove cap/cap_reason/capped
entirely. New formula: level = max i with rung_i==pass and all j<i in
{pass,na}. N/A no longer stops the climb (the confusing part — e.g.
non-backup-capable recipes were stuck at L2); a real FAIL still blocks.
Per-rung table + verdict carry the completeness story. Added: de-cap
implementation reqs, both-schema rendering, before/after level table for all
recipes, N/A-skip proof run, bad-canary designed-levels re-derivation under
the new formula.
2026-06-11 01:45:54 +00:00


# Phase `lvl5` — level-system changes: 5th rung (`abra recipe lint`) + remove "capping"
**Mission (operator-specified, two changes):**
1. **New top rung — Level 5 = `abra recipe lint` passes against the exact recipe ref
under test (the PR head on PR builds)**, after the existing four rungs (install,
upgrade, backup/restore, functional). The existing four rungs' meanings are UNCHANGED.
2. **Remove the "capping" concept entirely.** The operator finds cap/cap_reason confusing.
New level semantics (operator-decided 2026-06-11, explicit Q&A):
**level = the highest rung that PASSED, where every rung below it is either "pass" or
"na" — N/A rungs are SKIPPED (they no longer stop the climb), a real FAIL still
blocks.** Formally: `level = max i such that rung_i == "pass" and all j < i have
status in {"pass", "na"}`; 0 if no such i.
   - install ✔, upgrade ✘, backup ✔, functional ✔, lint ✔ → **level 1** (fail blocks)
   - install ✔, upgrade ✔, backup N/A, functional ✔, lint ✔ → **level 5** (N/A skipped;
     previously this capped at 2, which was the confusing part)
   - all four ✔, lint N/A (e.g. abra missing) → **level 4** (an N/A rung is never EARNED;
     it just doesn't block the ones above)

   The words "cap", "capped", "cap_reason" disappear from code, schema, card, dashboard,
   and docs. The per-rung ✔/✘/— table remains everywhere it exists today, so nothing is
   hidden: the table is now the SOLE carrier of "why isn't this level higher".
   NOTE: this is a deliberate operator override of the old "N/A caps so the level never
   overstates" stance from Phase 3. The number now reads "how far did it get", and the
   rung table + run verdict carry the completeness story. Record this in DECISIONS.md.

State files (phase-namespaced): `STATUS-lvl5.md`, `BACKLOG-lvl5.md`, `REVIEW-lvl5.md`,
`JOURNAL-lvl5.md`. DECISIONS.md shared (append).
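
A minimal Python sketch of the mission-item-2 rule, assuming rung statuses arrive as a plain dict of rung name to "pass"/"fail"/"na" (the real `compute_level()` in `runner/harness/level.py` may take a different shape). Treating a missing rung as "na" and any unrecognised status as blocking are assumptions of this sketch, chosen so that nothing can silently count as a pass. The asserts encode the three worked examples above.

```python
from typing import Mapping, Sequence

# Ladder order from §2 item 1: index 0 is level 1, index 4 is level 5.
RUNGS: Sequence[str] = ("install", "upgrade", "backup_restore", "functional", "lint")


def compute_level(status: Mapping[str, str], rungs: Sequence[str] = RUNGS) -> int:
    """Return the highest 1-based rung i whose status is "pass" while every
    rung below it is "pass" or "na"; return 0 if there is no such rung.

    Assumption: a rung missing from `status` is treated as "na" (skipped,
    never earned). Any status other than "pass"/"na" blocks the climb.
    """
    level = 0
    for i, rung in enumerate(rungs, start=1):
        s = status.get(rung, "na")
        if s == "pass":
            level = i  # earned; keep climbing
        elif s != "na":
            break  # a real FAIL (or an unknown status) stops the climb
        # "na": skipped; it neither raises the level nor blocks the rungs above
    return level


# The three worked examples from the mission text:
assert compute_level({"install": "pass", "upgrade": "fail", "backup_restore": "pass",
                      "functional": "pass", "lint": "pass"}) == 1
assert compute_level({"install": "pass", "upgrade": "pass", "backup_restore": "na",
                      "functional": "pass", "lint": "pass"}) == 5
assert compute_level({"install": "pass", "upgrade": "pass", "backup_restore": "pass",
                      "functional": "pass", "lint": "na"}) == 4
```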

---
## 1. Current system (file map — verified 2026-06-11)
- `runner/harness/level.py` (120 lines, PURE) — `RUNGS = ("install", "upgrade",
"backup_restore", "functional")`, `RUNG_LABEL` (levels 1–4), `compute_level()` (gap caps;
N/A caps with distinct reason), `tier_to_rung`, `backup_restore_status`.
- `runner/harness/results.py:137-218` — `derive_rungs()` builds the rung dict from tier
results; `compute_level` → results.json `level` + `cap_reason` + `capped`.
- `runner/harness/card.py` — `LEVEL_COLOR` map (line ~58), cap line hardcodes
*"full clean climb — top level (4)"* (line ~246).
- `dashboard/dashboard.py` — `_LEVEL_COLOR` (line ~81), corner level badge `_level_pill`,
`/badge/<recipe>.svg`.
- `tests/unit/test_level.py`, `test_results.py`, `test_card.py`, `test_dashboard.py`.
- **Lint today:** abra's pinned (non-chaos) deploy runs `abra recipe lint` internally and
FATALs on R014 because the CI's mirror-origin repointing trips a go-git path (see
`runner/harness/abra.py:109-114`), which is why chaos deploys (the PR-testing path!)
deliberately SKIP lint. So lint currently runs implicitly on some paths and not at all on
others; the new rung makes it explicit, uniform, and visible.
## 2. Design requirements
1. **New rung `lint` appended after `functional`** → ladder install(1) upgrade(2)
backup_restore(3) functional(4) **lint(5)**. `RUNG_LABEL[5] = "lint (abra recipe lint)"`
(or similar). Full clean climb is now 5.
2. **What is linted:** the EXACT recipe tree/ref the run deployed (PR head on
`!testme`/PR builds; the tested ref otherwise) — never some other branch. Run
`abra recipe lint <recipe>` (the abra on the CI host) against the run's own checkout
context. Capture rc + full output into the run artifacts (e.g. `lint.txt`), and put a
pass/fail + short excerpt in results.json.
3. **Mirror-plumbing must not pollute recipe lint results (CRITICAL, see abra.py:109-114):**
the R014/go-git failure caused by CI's origin-repointing is a HARNESS artifact, not a
recipe defect. The lint rung must evaluate the recipe's content. Solve it properly
(e.g. lint in a context where origin looks canonical, or pre-step the same
stash/revert dance abra.py already does) — do NOT blanket-ignore lint rules to make
the plumbing pass, and document exactly what (if anything) is filtered and why. Any
filtering is a named, unit-tested, Adversary-reviewed decision.
4. **Verdict semantics UNCHANGED:** lint is a level rung, not a run gate. A lint failure
leaves the level at 4 (given four passing lower rungs) and shows as ✘ in the rung table;
there is no `cap_reason` (see item 6). It must NOT fail or flip the run verdict, and it
must be time-bounded and best-effort in execution (a wedged lint can never hang a run:
hard timeout, ~60s class).
5. **No N/A escape hatch by default:** every recipe can be linted, so the rung is
pass/fail in practice. Keep "na" handling for totality (e.g. abra binary missing →
"na" + loud log, never silently "pass"); under the new semantics an N/A lint rung is
simply not earned, so a run that passed all four lower rungs stays at level 4.
6. **De-cap implementation (mission item 2):** `compute_level()` reimplemented to the
new formal rule; `cap_reason`/`capped` deleted from level.py, derive_rungs/results.json
schema, card (the "capped: …"/"full clean climb — top level (4)" line at card.py:~246
is replaced by the rung table alone or a neutral "level N of 5"), dashboard fields,
and docs. Unit tests rewritten to the new semantics, INCLUDING the three worked
examples from the mission and the old N/A-cases (single-published-version recipe,
non-backup-capable recipe) now climbing past their former caps.
7. **All consumers updated coherently:** RUNGS/labels, results schema, card (color map +
hardcoded top-level line), dashboard pills/badge SVG/legend text, docs
(testing.md / results docs / recipe-customization.md §levels if it references L4 as
top), and every unit test that assumes 4 is the ceiling or asserts cap_reason.
8. **History compatibility:** old results.json artifacts (level ≤ 4, lint rung absent,
cap_reason PRESENT) must still render correctly in dashboard/card history views — no
KeyErrors, no retroactive relabeling of old runs; renderers tolerate both schemas.
9. **Expected level shifts are findings, not regressions:** recipes previously capped by
an N/A rung will legitimately jump levels (e.g. a non-backup-capable recipe with
passing functional goes 2 → 4/5). P3/P4 must produce a before/after level table for
ALL enrolled recipes so the Adversary can check every shift against the new rule —
any shift NOT explained by the rule change is a real regression.
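
One way to meet items 6 and 8 together, sketched in Python under assumptions: the results.json field names `level` and `rungs`, the display labels, and the rule "no `lint` key means a 4-rung artifact" are all hypothetical. Fields are read with `.get()` and `cap_reason`/`capped` are never touched, so either schema renders without a KeyError, and old runs keep their stored level instead of being recomputed under the new formula.

```python
from typing import Any, Mapping

# Hypothetical display labels; the real ones would come from RUNG_LABEL in level.py.
LABELS = (
    ("install", "install"),
    ("upgrade", "upgrade"),
    ("backup_restore", "backup/restore"),
    ("functional", "functional"),
    ("lint", "lint (abra recipe lint)"),
)
GLYPH = {"pass": "✔", "fail": "✘", "na": "—"}


def render_summary(results: Mapping[str, Any]) -> str:
    """Render a neutral "level N of M" line plus the per-rung table from a
    results.json dict of EITHER schema.

    Old artifacts may carry "cap_reason"/"capped"; they are never read, so
    their presence or absence cannot raise a KeyError.
    """
    rungs = results.get("rungs") or {}
    # No retroactive relabeling: an artifact without a lint rung was produced
    # under the 4-rung ladder, so it is shown out of 4 with its stored level.
    ceiling = 5 if "lint" in rungs else 4
    lines = [f"level {int(results.get('level', 0))} of {ceiling}"]
    for key, label in LABELS:
        if key in rungs:  # old artifacts simply have no lint row
            lines.append(f"{GLYPH.get(rungs[key], '?')} {label}")
    return "\n".join(lines)
```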
## 3. Work plan
**P1 — Ladder + plumbing.** level.py: new rung + the de-capped `compute_level()` per
§2 item 6 (delete cap_reason/capped); lint executor (new `harness/lint.py` or a clean
home in abra.py) with timeout, artifact capture, mirror-context handling per §2.3;
derive_rungs wiring; results schema. Unit tests for: full climb = 5; fail-blocks
(upgrade ✘ → L1 even with higher passes); N/A-skip (backup na + functional ✔ → L4+);
lint na → stays 4; old-artifact rendering; mirror-filter decision (if any).
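
A possible shape for the P1 lint executor, sketched in Python. `run_lint`, `LintResult`, and the `lint.txt` layout are hypothetical names; the sketch assumes `abra recipe lint` signals failure through a non-zero exit code, and it treats a timeout as "fail" (a judgement call the plan leaves open between fail and na). A missing or unexecutable abra yields "na" with a warning, so no path produces a silent pass. The mirror-context handling from §2 item 3 is deliberately not shown.

```python
import logging
import shutil
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

log = logging.getLogger("lint")  # hypothetical logger name


@dataclass
class LintResult:
    status: str                # "pass" | "fail" | "na"; "pass" only when rc == 0
    returncode: Optional[int]  # None when abra never ran to completion
    excerpt: str               # short tail for results.json; full text in lint.txt


def run_lint(recipe: str, out_dir: Path, abra_bin: str = "abra",
             timeout_s: float = 60.0, excerpt_lines: int = 20) -> LintResult:
    """Run `abra recipe lint <recipe>` once, bounded by a hard timeout."""
    out_dir.mkdir(parents=True, exist_ok=True)
    artifact = out_dir / "lint.txt"

    exe = shutil.which(abra_bin)
    if exe is None:
        msg = f"abra binary {abra_bin!r} not found; lint rung is N/A"
        log.warning(msg)  # loud, never a silent pass
        artifact.write_text(msg + "\n", encoding="utf-8")
        return LintResult("na", None, msg)

    try:
        proc = subprocess.run(
            [exe, "recipe", "lint", recipe],
            capture_output=True, encoding="utf-8", errors="replace",
            timeout=timeout_s, check=False,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so the run moves on.
        msg = f"abra recipe lint {recipe} timed out after {timeout_s:g}s"
        artifact.write_text(msg + "\n", encoding="utf-8")
        return LintResult("fail", None, msg)  # assumption: timeout counts as fail
    except OSError as exc:
        msg = f"could not execute {exe}: {exc}; lint rung is N/A"
        log.warning(msg)
        artifact.write_text(msg + "\n", encoding="utf-8")
        return LintResult("na", None, msg)

    output = proc.stdout + proc.stderr
    artifact.write_text(
        f"$ abra recipe lint {recipe}\nrc={proc.returncode}\n{output}", encoding="utf-8"
    )
    excerpt = "\n".join(output.splitlines()[-excerpt_lines:])
    status = "pass" if proc.returncode == 0 else "fail"
    return LintResult(status, proc.returncode, excerpt)
```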

**P2 — Presentation.** Card, dashboard, badge, docs — a level-5 color/legend that reads
as "above functional". Regenerate anything generated.

**P3 — Reality pass over all enrolled recipes.** Run the lint rung against every enrolled
recipe's current main/HEAD (cheap — lint only, no deploys, respects the shared-checkout
rule below). Matrix in BACKLOG-lvl5.md: pass/fail + rule hits per recipe. For failing
recipes: the rung correctly holds them at ≤4 (that is WORKING AS DESIGNED, not a phase
blocker). Where the fix is mechanical and safe, open a PR against the recipe mirror
(NEVER push main / NEVER merge — operator decides), else file the finding in
DEFERRED.md with the rule output.
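
For the P3 matrix, a small Python helper can keep BACKLOG-lvl5.md consistent across recipes. This sketch assumes the per-recipe lint status and failing rule ids have already been extracted (how to parse abra's output is not assumed here); the function name and disposition wording are illustrative only. Failing recipes sort first because they are the ones that need a recipe-mirror PR or a DEFERRED.md entry.

```python
from typing import Mapping, Sequence, Tuple

# Sort key: failing (or unrecognised) recipes first, since they need a disposition.
_ORDER = {"fail": 0, "na": 1, "pass": 2}
_DISPOSITION = {
    "pass": "none needed",
    "na": "investigate: why was lint N/A?",
    "fail": "TODO: recipe-mirror PR or DEFERRED.md entry",
}


def lint_matrix(results: Mapping[str, Tuple[str, Sequence[str]]]) -> str:
    """results maps recipe -> (lint status, rule ids that failed).

    Returns a markdown table suitable for pasting into BACKLOG-lvl5.md.
    """
    rows = ["| recipe | lint | rule hits | disposition |",
            "| --- | --- | --- | --- |"]
    for recipe, (status, hits) in sorted(
        results.items(), key=lambda kv: (_ORDER.get(kv[1][0], 0), kv[0])
    ):
        disposition = _DISPOSITION.get(status, _DISPOSITION["fail"])
        rows.append(f"| {recipe} | {status} | {', '.join(hits) or '-'} | {disposition} |")
    return "\n".join(rows)
```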

**P4 — Real-CI proof.** Full-stage runs on enough recipes to prove the changes
end-to-end: ≥1 recipe reaching a genuine L5 (all five rungs green), ≥1 recipe blocked at
L4 by a lint failure (real or synthesized — a throwaway branch with a deliberate lint
violation is fine), ≥1 recipe demonstrating the N/A-skip (formerly capped by an N/A rung,
now climbing past it), ≥2 runs via the drone `!testme` path showing the rung on a real
PR, plus the canary suite green — the bad canaries must still land at their designed
levels under the NEW formula (re-derive what those designed levels are; backup/restore
fail still blocks). Card + dashboard visually verified (Read the PNGs/SVG output), and
the §2 item 9 before/after level table completed for all enrolled recipes.
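
The §2 item 9 check can be made mechanical: recompute each level from the recorded rung statuses and flag anything the formulas do not reproduce. In this Python sketch, `old_level` is a reconstruction of the pre-lvl5 rule from the description in §1 (the first non-pass rung, fail or N/A, stops the climb), and the `level`/`rungs` field names are assumed. A row that also shows changed rung statuses reflects a behavior change in the recipe or harness, not a rule-change shift. The same `new_level` can re-derive the bad canaries' designed levels; for example, a backup/restore FAIL still blocks at 2 when install and upgrade pass.

```python
from typing import Mapping

OLD_RUNGS = ("install", "upgrade", "backup_restore", "functional")
NEW_RUNGS = OLD_RUNGS + ("lint",)


def old_level(rungs: Mapping[str, str]) -> int:
    """Pre-lvl5 rule, reconstructed from the plan: count consecutive passes
    from the bottom; the first fail OR N/A caps the level."""
    level = 0
    for i, rung in enumerate(OLD_RUNGS, start=1):
        if rungs.get(rung) != "pass":
            break
        level = i
    return level


def new_level(rungs: Mapping[str, str]) -> int:
    """lvl5 rule: highest passed rung whose lower rungs are all pass or na."""
    level = 0
    for i, rung in enumerate(NEW_RUNGS, start=1):
        status = rungs.get(rung, "na")
        if status == "pass":
            level = i
        elif status != "na":
            break
    return level


def shift_row(recipe: str, before: Mapping, after: Mapping) -> str:
    """One markdown row of the before/after table. `before`/`after` are
    results.json-like dicts with "level" and "rungs" keys (assumed names)."""
    problems = []
    if before["level"] != old_level(before["rungs"]):
        problems.append("before level != old rule")
    if after["level"] != new_level(after["rungs"]):
        problems.append("after level != new rule")
    changed = [r for r in OLD_RUNGS
               if before["rungs"].get(r) != after["rungs"].get(r)]
    if changed:
        problems.append("rung status changed: " + ", ".join(changed))
    verdict = "consistent with new rule" if not problems else "CHECK: " + "; ".join(problems)
    return f"| {recipe} | {before['level']} | {after['level']} | {verdict} |"
```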
## 4. Gates
**M1 — Implementation complete (pre-merge).** Branch with P1+P2; Adversary cold-runs unit
suite + lint from a clean checkout; reviews the mirror-filter decision (§2.3) explicitly;
confirms verdict-neutrality by code inspection + a targeted test. PASS required before
merge to main.

**M2 — Proven in real CI.** P3 matrix complete for ALL enrolled recipes; P4 runs done
(L5 achieved, lint-blocked L4 demonstrated, N/A-skip climb demonstrated, drone path ×2,
canaries green, before/after level table checked); old artifacts still
render; run durations not materially inflated (lint adds ≤ ~60s). Fresh Adversary PASS →
Builder writes `## DONE` to STATUS-lvl5.md.
## 5. Guardrails (binding)
- **Rung statuses stay honest** (the Phase-3 rule, adapted to the new semantics): no
rung is ever silently "pass" — lint errors/timeouts/missing abra are fail/na, never
pass. The level FORMULA is the operator-decided rule in mission item 2 (N/A skips); the
per-rung table and the run verdict must always remain visible so completeness is
never hidden, and the verdict logic itself is untouched by this phase.
- **No gate weakening; no verdict changes.** Existing tests/assertions untouchable except
where the L4-ceiling assumption itself must change — those edits are mechanical and the
Adversary checks each one against the old intent.
- **Recipe mirrors:** lint findings in recipes → PRs or DEFERRED entries only; NEVER push
recipe-mirror main, NEVER merge their PRs.
- **Shared checkout race:** NEVER git-checkout `~/.abra/recipes/<recipe>` on cc-ci while
that recipe's CI build is running — the harness deploys from that tree. P3's lint-only
sweep must use its own scratch clones or run when the recipe is not mid-build.
- Real-CI etiquette: ≤2-3 concurrent live deploys; teardown every dev deploy on every
exit path; no secrets in logs/commits. Commit author `autonomic-bot
<autonomic-bot@noreply.git.autonomic.zone>`; push after every commit.
- CI host has no python3 on default PATH for remote one-liners — use shell or the
harness venv (`cc-ci-run`).
## 6. Definition of Done
The new level system is live on main and visible end-to-end (results.json → card →
dashboard → badge): L5 = abra recipe lint on the tested ref, capping concept fully
removed (level = highest passed rung with all lower rungs pass-or-N/A; N/A skips, fail
blocks; no cap/cap_reason anywhere), all enrolled recipes linted and dispositioned with
the before/after level table adversary-checked, ≥1 real L5 + ≥1 lint-blocked L4 + ≥1
N/A-skip climb demonstrated through real CI including the drone path, old artifacts
unharmed, M1+M2 fresh Adversary PASSes, no verdict or duration regressions.