Night-watch findings (monthly-spend-limit window, ~01:49-04:45): - probe text said 'usage limit' which matches LIMIT_RE, so a submitted probe kept limited_now true forever -> reworded to 'quota window' with a CAUTION note (nudge text must never match LIMIT_RE) - dedupe scanned all 40 captured lines, so once a probe scrolled into the conversation no further probe ever fired (builder/adv frozen at nudges=1, orchestrator probes degraded to hourly riding the wake scroll) -> dedupe now only checks the bottom 8 lines (input area) Core invariant HELD: zero kill+reboots during the limit window. plan(lvl5): operator addition - the top-corner level badge (card, dashboard pill, badge SVG) shows only the level number+color, zero capping info; the inline per-rung table keeps intentional-skip/unverified detail.
192 lines
13 KiB
Markdown
192 lines
13 KiB
Markdown
# Phase `lvl5` — level-system changes: 5th rung (`abra recipe lint`) + remove "capping"
|
||
|
||
**Mission (operator-specified, two changes):**
|
||
|
||
1. **New top rung — Level 5 = `abra recipe lint` passes against the exact recipe ref
|
||
under test (the PR head on PR builds)**, after the existing four rungs (install,
|
||
upgrade, backup/restore, functional). The existing four rungs' meanings are UNCHANGED.
|
||
|
||
2. **Remove the "capping" concept entirely.** The operator finds cap/cap_reason confusing.
|
||
New level semantics (operator-decided 2026-06-11, explicit Q&A + refinement):
|
||
**level = the highest rung that PASSED, where every rung below it is "pass" or an
|
||
INTENTIONAL skip.** A real FAIL blocks. A not-applicable rung is split in two:
|
||
- **Intentional skip (declared/structural):** the rung genuinely does not apply to
|
||
this recipe — e.g. not backup-capable (declared), only one published version exists
|
||
(no upgrade target). These are SKIPPED: they do not stop the climb.
|
||
- **Unintentional N/A (unverified):** the rung SHOULD have run but didn't — infra
|
||
error, missing tool (abra absent), harness exception, prior-stage abort, timeout.
|
||
**The level cannot rise above an unverified rung** — it blocks exactly like a fail
|
||
(the rungs below it still count).
|
||
Formally, with statuses {pass, fail, skip (intentional), unver (unintentional)}:
|
||
`level = max i such that rung_i == "pass" and all j < i have status in {"pass","skip"}`;
|
||
0 if no such i.
|
||
- install ✔, upgrade ✘, backup ✔, functional ✔, lint ✔ → **level 1** (fail blocks)
|
||
- install ✔, upgrade ✔, backup skip(not capable), functional ✔, lint ✔ → **level 5**
|
||
(intentional skip — previously this capped at 2; that was the confusing part)
|
||
- install ✔, upgrade ✔, backup UNVER (harness error), functional ✔, lint ✔ →
|
||
**level 2** (unverified blocks — we cannot claim what we didn't check)
|
||
- all four ✔, lint unver (abra missing) → **level 4** (an unverified top rung is
|
||
simply not earned)
|
||
**Every N/A source in derive_rungs must be explicitly classified** intentional vs
|
||
unintentional, from declared recipe meta / structural facts vs runtime failure —
|
||
the classification table goes in DECISIONS.md and is Adversary-reviewed. Default for
|
||
an unclassifiable N/A is UNINTENTIONAL (conservative).
|
||
The words "cap", "capped", "cap_reason" disappear from code, schema, card, dashboard,
|
||
and docs. The per-rung table remains everywhere it exists today (now distinguishing
|
||
✔ / ✘ / intentional-skip / unverified), so nothing is hidden — the table is the SOLE
|
||
carrier of "why isn't this level higher". Record the whole semantics change in
|
||
DECISIONS.md (it deliberately replaces the Phase-3 "N/A caps" stance; the
|
||
unverified-blocks rule preserves its honest core: never claim what wasn't checked).
|
||
|
||
State files (phase-namespaced): `STATUS-lvl5.md`, `BACKLOG-lvl5.md`, `REVIEW-lvl5.md`,
|
||
`JOURNAL-lvl5.md`. DECISIONS.md shared (append).
|
||
|
||
---
|
||
|
||
## 1. Current system (file map — verified 2026-06-11)
|
||
|
||
- `runner/harness/level.py` (120 lines, PURE) — `RUNGS = ("install", "upgrade",
|
||
"backup_restore", "functional")`, `RUNG_LABEL` 1–4, `compute_level()` (gap caps; N/A
|
||
caps with distinct reason), `tier_to_rung`, `backup_restore_status`.
|
||
- `runner/harness/results.py:137-218` — `derive_rungs()` builds the rung dict from tier
|
||
results; `compute_level` → results.json `level` + `cap_reason` + `capped`.
|
||
- `runner/harness/card.py` — `LEVEL_COLOR` map (line ~58), cap line hardcodes
|
||
*"full clean climb — top level (4)"* (line ~246).
|
||
- `dashboard/dashboard.py` — `_LEVEL_COLOR` (line ~81), corner level badge `_level_pill`,
|
||
`/badge/<recipe>.svg`.
|
||
- `tests/unit/test_level.py`, `test_results.py`, `test_card.py`, `test_dashboard.py`.
|
||
- **Lint today:** abra's pinned (non-chaos) deploy runs `abra recipe lint` internally and
|
||
FATALs on R014 — see `runner/harness/abra.py:109-114`: the CI's mirror-origin repointing
|
||
tripped a go-git path, which is why chaos deploys (the PR-testing path!) deliberately
|
||
SKIP lint. So lint currently runs implicitly on some paths and not at all on others —
|
||
the new rung makes it explicit, uniform, and visible.
|
||
|
||
## 2. Design requirements
|
||
|
||
1. **New rung `lint` appended after `functional`** → ladder install(1) upgrade(2)
|
||
backup_restore(3) functional(4) **lint(5)**. `RUNG_LABEL[5] = "lint (abra recipe lint)"`
|
||
(or similar). Full clean climb is now 5.
|
||
2. **What is linted:** the EXACT recipe tree/ref the run deployed (PR head on
|
||
`!testme`/PR builds; the tested ref otherwise) — never some other branch. Run
|
||
`abra recipe lint <recipe>` (the abra on the CI host) against the run's own checkout
|
||
context. Capture rc + full output into the run artifacts (e.g. `lint.txt`), and put a
|
||
pass/fail + short excerpt in results.json.
|
||
3. **Mirror-plumbing must not pollute recipe lint results (CRITICAL, see abra.py:109-114):**
|
||
the R014/go-git failure caused by CI's origin-repointing is a HARNESS artifact, not a
|
||
recipe defect. The lint rung must evaluate the recipe's content. Solve it properly
|
||
(e.g. lint in a context where origin looks canonical, or pre-step the same
|
||
stash/revert dance abra.py already does) — do NOT blanket-ignore lint rules to make
|
||
the plumbing pass, and document exactly what (if anything) is filtered and why. Any
|
||
filtering is a named, unit-tested, Adversary-reviewed decision.
|
||
4. **Verdict semantics UNCHANGED:** lint is a level rung, not a run gate. A lint failure
|
||
caps the level at 4 with `cap_reason "L5 ... FAILED"`; it must NOT fail/flip the run
|
||
verdict, and must be time-bounded + best-effort in execution (a wedged lint can never
|
||
hang a run — hard timeout, ~60s class).
|
||
5. **No N/A escape hatch by default:** every recipe can be linted, so the rung is
|
||
pass/fail in practice. When lint can't run (abra binary missing, timeout) the status
|
||
is **unver** + loud log — never silently "pass", never an intentional skip; the rung
|
||
is not earned and the level stays ≤ 4.
|
||
6. **De-cap implementation (mission item 2):** `compute_level()` reimplemented to the
|
||
new formal rule; `cap_reason`/`capped` deleted from level.py, derive_rungs/results.json
|
||
schema, card (the "capped: …"/"full clean climb — top level (4)" line at card.py:~246
|
||
is replaced by the rung table alone or a neutral "level N of 5"), dashboard fields,
|
||
and docs. **The top-corner level badge in particular** (the big LEVEL badge on the
|
||
card, the dashboard `_level_pill`, and `/badge/<recipe>.svg`) shows ONLY the level —
|
||
number + color, zero capping/cap-reason annotation (operator-specified 2026-06-11).
|
||
The INLINE detail keeps the intentional-skip information: the per-rung table still
|
||
marks each rung ✔ / ✘ / intentional-skip / unverified, so "why isn't it higher" lives
|
||
there and only there. Unit tests rewritten to the new semantics, INCLUDING the worked
|
||
examples from the mission and the old N/A-cases (single-published-version recipe,
|
||
non-backup-capable recipe) now climbing past their former caps.
|
||
7. **All consumers updated coherently:** RUNGS/labels, results schema, card (color map +
|
||
hardcoded top-level line), dashboard pills/badge SVG/legend text, docs
|
||
(testing.md / results docs / recipe-customization.md §levels if it references L4 as
|
||
top), and every unit test that assumes 4 is the ceiling or asserts cap_reason.
|
||
8. **History compatibility:** old results.json artifacts (level ≤ 4, lint rung absent,
|
||
cap_reason PRESENT) must still render correctly in dashboard/card history views — no
|
||
KeyErrors, no retroactive relabeling of old runs; renderers tolerate both schemas.
|
||
9. **Expected level shifts are findings, not regressions:** recipes previously capped by
|
||
an N/A rung will legitimately jump levels (e.g. a non-backup-capable recipe with
|
||
passing functional goes 2 → 4/5). P3/P4 must produce a before/after level table for
|
||
ALL enrolled recipes so the Adversary can check every shift against the new rule —
|
||
any shift NOT explained by the rule change is a real regression.
|
||
|
||
## 3. Work plan
|
||
|
||
**P1 — Ladder + plumbing.** level.py: new rung + the de-capped `compute_level()` per
|
||
§2 item 2 (delete cap_reason/capped); lint executor (new `harness/lint.py` or a clean
|
||
home in abra.py) with timeout, artifact capture, mirror-context handling per §2.3;
|
||
derive_rungs wiring (incl. the intentional-vs-unintentional classification of every N/A
|
||
source, §2 item 2); results schema. Unit tests for: full climb = 5; fail-blocks
|
||
(upgrade ✘ → L1 even with higher passes); intentional-skip climbs (backup skip +
|
||
functional ✔ → L4+); unverified-blocks (backup unver + functional ✔ → L2); lint unver →
|
||
stays 4; unclassifiable N/A defaults to unver; old-artifact rendering; mirror-filter
|
||
decision (if any).
|
||
|
||
**P2 — Presentation.** Card, dashboard, badge, docs — a level-5 color/legend that reads
|
||
as "above functional". Regenerate anything generated.
|
||
|
||
**P3 — Reality pass over all enrolled recipes.** Run the lint rung against every enrolled
|
||
recipe's current main/HEAD (cheap — lint only, no deploys, respects the shared-checkout
|
||
rule below). Matrix in BACKLOG-lvl5.md: pass/fail + rule hits per recipe. For failing
|
||
recipes: the rung correctly caps them at ≤4 (that is WORKING AS DESIGNED, not a phase
|
||
blocker). Where the fix is mechanical and safe, open a PR against the recipe mirror
|
||
(NEVER push main / NEVER merge — operator decides), else file the finding in
|
||
DEFERRED.md with the rule output.
|
||
|
||
**P4 — Real-CI proof.** Full-stage runs on enough recipes to prove the changes
|
||
end-to-end: ≥1 recipe reaching a genuine L5 (all five rungs green), ≥1 recipe blocked at
|
||
L4 by a lint failure (real or synthesized — a throwaway branch with a deliberate lint
|
||
violation is fine), ≥1 recipe demonstrating the N/A-skip (formerly capped by an N/A rung,
|
||
now climbing past it), ≥2 runs via the drone `!testme` path showing the rung on a real
|
||
PR, plus the canary suite green — the bad canaries must still land at their designed
|
||
levels under the NEW formula (re-derive what those designed levels are; backup/restore
|
||
fail still blocks). The unverified-blocks rule is proven by unit test + one synthesized
|
||
run (induce a tier abort and check the level stays below the unverified rung). Card +
|
||
dashboard visually verified (Read the PNGs/SVG output), and the §2 item 9 before/after
|
||
level table completed for all enrolled recipes.
|
||
|
||
## 4. Gates
|
||
|
||
**M1 — Implementation complete (pre-merge).** Branch with P1+P2; Adversary cold-runs unit
|
||
suite + lint from a clean checkout; reviews the mirror-filter decision (§2.3) explicitly;
|
||
confirms verdict-neutrality by code inspection + a targeted test. PASS required before
|
||
merge to main.
|
||
|
||
**M2 — Proven in real CI.** P3 matrix complete for ALL enrolled recipes; P4 runs done
|
||
(L5 achieved, L4-capped demonstrated, drone path ×2, canaries green); old artifacts still
|
||
render; run durations not materially inflated (lint adds ≤ ~60s). Fresh Adversary PASS →
|
||
Builder writes `## DONE` to STATUS-lvl5.md.
|
||
|
||
## 5. Guardrails (binding)
|
||
|
||
- **Rung statuses stay honest** (the Phase-3 rule, adapted to the new semantics): no
|
||
rung is ever silently "pass" — lint errors/timeouts/missing abra are fail/na, never
|
||
pass. The level FORMULA is the operator-decided rule in §2 item 2 (N/A skips); the
|
||
per-rung table and the run verdict must always remain visible so completeness is
|
||
never hidden, and the verdict logic itself is untouched by this phase.
|
||
- **No gate weakening; no verdict changes.** Existing tests/assertions untouchable except
|
||
where the L4-ceiling assumption itself must change — those edits are mechanical and the
|
||
Adversary checks each one against the old intent.
|
||
- **Recipe mirrors:** lint findings in recipes → PRs or DEFERRED entries only; NEVER push
|
||
recipe-mirror main, NEVER merge their PRs.
|
||
- **Shared checkout race:** NEVER git-checkout `~/.abra/recipes/<recipe>` on cc-ci while
|
||
that recipe's CI build is running — the harness deploys from that tree. P3's lint-only
|
||
sweep must use its own scratch clones or run when the recipe is not mid-build.
|
||
- Real-CI etiquette: ≤2-3 concurrent live deploys; teardown every dev deploy on every
|
||
exit path; no secrets in logs/commits. Commit author `autonomic-bot
|
||
<autonomic-bot@noreply.git.autonomic.zone>`; push after every commit.
|
||
- CI host has no python3 on default PATH for remote one-liners — use shell or the
|
||
harness venv (`cc-ci-run`).
|
||
|
||
## 6. Definition of Done
|
||
|
||
The new level system live on main and visible end-to-end (results.json → card →
|
||
dashboard → badge): L5 = abra recipe lint on the tested ref, capping concept fully
|
||
removed (level = highest passed rung with all lower rungs pass-or-intentional-skip;
|
||
declared skips climb, fails AND unverified rungs block; no cap/cap_reason anywhere),
|
||
all enrolled recipes linted and dispositioned with
|
||
the before/after level table adversary-checked, ≥1 real L5 + ≥1 lint-blocked L4 + ≥1
|
||
N/A-skip climb demonstrated through real CI including the drone path, old artifacts
|
||
unharmed, M1+M2 fresh Adversary PASSes, no verdict or duration regressions.
|