Files
cc-ci-orchestrator/cc-ci-plan/plan-phase-lvl5-lint-rung.md
autonomic-bot 5ea17fca21 watchdog: fix limit-probe self-match + scrollback dedupe wedge; plan(lvl5): badge shows level only
Night-watch findings (monthly-spend-limit window, ~01:49-04:45):
- probe text said 'usage limit' which matches LIMIT_RE, so a submitted probe
  kept limited_now true forever -> reworded to 'quota window' with a CAUTION
  note (nudge text must never match LIMIT_RE)
- dedupe scanned all 40 captured lines, so once a probe scrolled into the
  conversation no further probe ever fired (builder/adv frozen at nudges=1,
  orchestrator probes degraded to hourly riding the wake scroll) -> dedupe
  now only checks the bottom 8 lines (input area)
Core invariant HELD: zero kill+reboots during the limit window.

plan(lvl5): operator addition - the top-corner level badge (card, dashboard
pill, badge SVG) shows only the level number+color, zero capping info; the
inline per-rung table keeps intentional-skip/unverified detail.
2026-06-11 05:52:26 +00:00

192 lines
13 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Phase `lvl5` — level-system changes: 5th rung (`abra recipe lint`) + remove "capping"
**Mission (operator-specified, two changes):**
1. **New top rung — Level 5 = `abra recipe lint` passes against the exact recipe ref
under test (the PR head on PR builds)**, after the existing four rungs (install,
upgrade, backup/restore, functional). The existing four rungs' meanings are UNCHANGED.
2. **Remove the "capping" concept entirely.** The operator finds cap/cap_reason confusing.
New level semantics (operator-decided 2026-06-11, explicit Q&A + refinement):
**level = the highest rung that PASSED, where every rung below it is "pass" or an
INTENTIONAL skip.** A real FAIL blocks. A not-applicable rung is split in two:
- **Intentional skip (declared/structural):** the rung genuinely does not apply to
this recipe — e.g. not backup-capable (declared), only one published version exists
(no upgrade target). These are SKIPPED: they do not stop the climb.
- **Unintentional N/A (unverified):** the rung SHOULD have run but didn't — infra
error, missing tool (abra absent), harness exception, prior-stage abort, timeout.
**The level cannot rise above an unverified rung** — it blocks exactly like a fail
(the rungs below it still count).
Formally, with statuses {pass, fail, skip (intentional), unver (unintentional)}:
`level = max i such that rung_i == "pass" and all j < i have status in {"pass","skip"}`;
0 if no such i.
- install ✔, upgrade ✘, backup ✔, functional ✔, lint ✔ → **level 1** (fail blocks)
- install ✔, upgrade ✔, backup skip(not capable), functional ✔, lint ✔ → **level 5**
(intentional skip — previously this capped at 2; that was the confusing part)
- install ✔, upgrade ✔, backup UNVER (harness error), functional ✔, lint ✔ →
**level 2** (unverified blocks — we cannot claim what we didn't check)
- all four ✔, lint unver (abra missing) → **level 4** (an unverified top rung is
simply not earned)
**Every N/A source in derive_rungs must be explicitly classified** intentional vs
unintentional, from declared recipe meta / structural facts vs runtime failure —
the classification table goes in DECISIONS.md and is Adversary-reviewed. Default for
an unclassifiable N/A is UNINTENTIONAL (conservative).
The words "cap", "capped", "cap_reason" disappear from code, schema, card, dashboard,
and docs. The per-rung table remains everywhere it exists today (now distinguishing
✔ / ✘ / intentional-skip / unverified), so nothing is hidden — the table is the SOLE
carrier of "why isn't this level higher". Record the whole semantics change in
DECISIONS.md (it deliberately replaces the Phase-3 "N/A caps" stance; the
unverified-blocks rule preserves its honest core: never claim what wasn't checked).
State files (phase-namespaced): `STATUS-lvl5.md`, `BACKLOG-lvl5.md`, `REVIEW-lvl5.md`,
`JOURNAL-lvl5.md`. DECISIONS.md shared (append).
---
## 1. Current system (file map — verified 2026-06-11)
- `runner/harness/level.py` (120 lines, PURE) — `RUNGS = ("install", "upgrade",
"backup_restore", "functional")`, `RUNG_LABEL` 14, `compute_level()` (gap caps; N/A
caps with distinct reason), `tier_to_rung`, `backup_restore_status`.
- `runner/harness/results.py:137-218` — `derive_rungs()` builds the rung dict from tier
results; `compute_level` → results.json `level` + `cap_reason` + `capped`.
- `runner/harness/card.py` — `LEVEL_COLOR` map (line ~58), cap line hardcodes
*"full clean climb — top level (4)"* (line ~246).
- `dashboard/dashboard.py` — `_LEVEL_COLOR` (line ~81), corner level badge `_level_pill`,
`/badge/<recipe>.svg`.
- `tests/unit/test_level.py`, `test_results.py`, `test_card.py`, `test_dashboard.py`.
- **Lint today:** abra's pinned (non-chaos) deploy runs `abra recipe lint` internally and
FATALs on R014 — see `runner/harness/abra.py:109-114`: the CI's mirror-origin repointing
tripped a go-git path, which is why chaos deploys (the PR-testing path!) deliberately
SKIP lint. So lint currently runs implicitly on some paths and not at all on others —
the new rung makes it explicit, uniform, and visible.
## 2. Design requirements
1. **New rung `lint` appended after `functional`** → ladder install(1) upgrade(2)
backup_restore(3) functional(4) **lint(5)**. `RUNG_LABEL[5] = "lint (abra recipe lint)"`
(or similar). Full clean climb is now 5.
2. **What is linted:** the EXACT recipe tree/ref the run deployed (PR head on
`!testme`/PR builds; the tested ref otherwise) — never some other branch. Run
`abra recipe lint <recipe>` (the abra on the CI host) against the run's own checkout
context. Capture rc + full output into the run artifacts (e.g. `lint.txt`), and put a
pass/fail + short excerpt in results.json.
3. **Mirror-plumbing must not pollute recipe lint results (CRITICAL, see abra.py:109-114):**
the R014/go-git failure caused by CI's origin-repointing is a HARNESS artifact, not a
recipe defect. The lint rung must evaluate the recipe's content. Solve it properly
(e.g. lint in a context where origin looks canonical, or pre-step the same
stash/revert dance abra.py already does) — do NOT blanket-ignore lint rules to make
the plumbing pass, and document exactly what (if anything) is filtered and why. Any
filtering is a named, unit-tested, Adversary-reviewed decision.
4. **Verdict semantics UNCHANGED:** lint is a level rung, not a run gate. A lint failure
caps the level at 4 with `cap_reason "L5 ... FAILED"`; it must NOT fail/flip the run
verdict, and must be time-bounded + best-effort in execution (a wedged lint can never
hang a run — hard timeout, ~60s class).
5. **No N/A escape hatch by default:** every recipe can be linted, so the rung is
pass/fail in practice. When lint can't run (abra binary missing, timeout) the status
is **unver** + loud log — never silently "pass", never an intentional skip; the rung
is not earned and the level stays ≤ 4.
6. **De-cap implementation (mission item 2):** `compute_level()` reimplemented to the
new formal rule; `cap_reason`/`capped` deleted from level.py, derive_rungs/results.json
schema, card (the "capped: …"/"full clean climb — top level (4)" line at card.py:~246
is replaced by the rung table alone or a neutral "level N of 5"), dashboard fields,
and docs. **The top-corner level badge in particular** (the big LEVEL badge on the
card, the dashboard `_level_pill`, and `/badge/<recipe>.svg`) shows ONLY the level —
number + color, zero capping/cap-reason annotation (operator-specified 2026-06-11).
The INLINE detail keeps the intentional-skip information: the per-rung table still
marks each rung ✔ / ✘ / intentional-skip / unverified, so "why isn't it higher" lives
there and only there. Unit tests rewritten to the new semantics, INCLUDING the worked
examples from the mission and the old N/A-cases (single-published-version recipe,
non-backup-capable recipe) now climbing past their former caps.
7. **All consumers updated coherently:** RUNGS/labels, results schema, card (color map +
hardcoded top-level line), dashboard pills/badge SVG/legend text, docs
(testing.md / results docs / recipe-customization.md §levels if it references L4 as
top), and every unit test that assumes 4 is the ceiling or asserts cap_reason.
8. **History compatibility:** old results.json artifacts (level ≤ 4, lint rung absent,
cap_reason PRESENT) must still render correctly in dashboard/card history views — no
KeyErrors, no retroactive relabeling of old runs; renderers tolerate both schemas.
9. **Expected level shifts are findings, not regressions:** recipes previously capped by
an N/A rung will legitimately jump levels (e.g. a non-backup-capable recipe with
passing functional goes 2 → 4/5). P3/P4 must produce a before/after level table for
ALL enrolled recipes so the Adversary can check every shift against the new rule —
any shift NOT explained by the rule change is a real regression.
## 3. Work plan
**P1 — Ladder + plumbing.** level.py: new rung + the de-capped `compute_level()` per
§2 item 2 (delete cap_reason/capped); lint executor (new `harness/lint.py` or a clean
home in abra.py) with timeout, artifact capture, mirror-context handling per §2.3;
derive_rungs wiring (incl. the intentional-vs-unintentional classification of every N/A
source, §2 item 2); results schema. Unit tests for: full climb = 5; fail-blocks
(upgrade ✘ → L1 even with higher passes); intentional-skip climbs (backup skip +
functional ✔ → L4+); unverified-blocks (backup unver + functional ✔ → L2); lint unver →
stays 4; unclassifiable N/A defaults to unver; old-artifact rendering; mirror-filter
decision (if any).
**P2 — Presentation.** Card, dashboard, badge, docs — a level-5 color/legend that reads
as "above functional". Regenerate anything generated.
**P3 — Reality pass over all enrolled recipes.** Run the lint rung against every enrolled
recipe's current main/HEAD (cheap — lint only, no deploys, respects the shared-checkout
rule below). Matrix in BACKLOG-lvl5.md: pass/fail + rule hits per recipe. For failing
recipes: the rung correctly caps them at ≤4 (that is WORKING AS DESIGNED, not a phase
blocker). Where the fix is mechanical and safe, open a PR against the recipe mirror
(NEVER push main / NEVER merge — operator decides), else file the finding in
DEFERRED.md with the rule output.
**P4 — Real-CI proof.** Full-stage runs on enough recipes to prove the changes
end-to-end: ≥1 recipe reaching a genuine L5 (all five rungs green), ≥1 recipe blocked at
L4 by a lint failure (real or synthesized — a throwaway branch with a deliberate lint
violation is fine), ≥1 recipe demonstrating the N/A-skip (formerly capped by an N/A rung,
now climbing past it), ≥2 runs via the drone `!testme` path showing the rung on a real
PR, plus the canary suite green — the bad canaries must still land at their designed
levels under the NEW formula (re-derive what those designed levels are; backup/restore
fail still blocks). The unverified-blocks rule is proven by unit test + one synthesized
run (induce a tier abort and check the level stays below the unverified rung). Card +
dashboard visually verified (Read the PNGs/SVG output), and the §2 item 9 before/after
level table completed for all enrolled recipes.
## 4. Gates
**M1 — Implementation complete (pre-merge).** Branch with P1+P2; Adversary cold-runs unit
suite + lint from a clean checkout; reviews the mirror-filter decision (§2.3) explicitly;
confirms verdict-neutrality by code inspection + a targeted test. PASS required before
merge to main.
**M2 — Proven in real CI.** P3 matrix complete for ALL enrolled recipes; P4 runs done
(L5 achieved, L4-capped demonstrated, drone path ×2, canaries green); old artifacts still
render; run durations not materially inflated (lint adds ≤ ~60s). Fresh Adversary PASS →
Builder writes `## DONE` to STATUS-lvl5.md.
## 5. Guardrails (binding)
- **Rung statuses stay honest** (the Phase-3 rule, adapted to the new semantics): no
rung is ever silently "pass" — lint errors/timeouts/missing abra are fail/na, never
pass. The level FORMULA is the operator-decided rule in §2 item 2 (N/A skips); the
per-rung table and the run verdict must always remain visible so completeness is
never hidden, and the verdict logic itself is untouched by this phase.
- **No gate weakening; no verdict changes.** Existing tests/assertions untouchable except
where the L4-ceiling assumption itself must change — those edits are mechanical and the
Adversary checks each one against the old intent.
- **Recipe mirrors:** lint findings in recipes → PRs or DEFERRED entries only; NEVER push
recipe-mirror main, NEVER merge their PRs.
- **Shared checkout race:** NEVER git-checkout `~/.abra/recipes/<recipe>` on cc-ci while
that recipe's CI build is running — the harness deploys from that tree. P3's lint-only
sweep must use its own scratch clones or run when the recipe is not mid-build.
- Real-CI etiquette: ≤2-3 concurrent live deploys; teardown every dev deploy on every
exit path; no secrets in logs/commits. Commit author `autonomic-bot
<autonomic-bot@noreply.git.autonomic.zone>`; push after every commit.
- CI host has no python3 on default PATH for remote one-liners — use shell or the
harness venv (`cc-ci-run`).
## 6. Definition of Done
The new level system live on main and visible end-to-end (results.json → card →
dashboard → badge): L5 = abra recipe lint on the tested ref, capping concept fully
removed (level = highest passed rung with all lower rungs pass-or-intentional-skip;
declared skips climb, fails AND unverified rungs block; no cap/cap_reason anywhere),
all enrolled recipes linted and dispositioned with
the before/after level table adversary-checked, ≥1 real L5 + ≥1 lint-blocked L4 + ≥1
N/A-skip climb demonstrated through real CI including the drone path, old artifacts
unharmed, M1+M2 fresh Adversary PASSes, no verdict or duration regressions.