Files

autonomic-bot 5ea17fca21 watchdog: fix limit-probe self-match + scrollback dedupe wedge; plan(lvl5): badge shows level only

Night-watch findings (monthly-spend-limit window, ~01:49-04:45):
- probe text said 'usage limit' which matches LIMIT_RE, so a submitted probe
  kept limited_now true forever -> reworded to 'quota window' with a CAUTION
  note (nudge text must never match LIMIT_RE)
- dedupe scanned all 40 captured lines, so once a probe scrolled into the
  conversation no further probe ever fired (builder/adv frozen at nudges=1,
  orchestrator probes degraded to hourly riding the wake scroll) -> dedupe
  now only checks the bottom 8 lines (input area)
Core invariant HELD: zero kill+reboots during the limit window.

plan(lvl5): operator addition - the top-corner level badge (card, dashboard
pill, badge SVG) shows only the level number+color, zero capping info; the
inline per-rung table keeps intentional-skip/unverified detail.

2026-06-11 05:52:26 +00:00

13 KiB

Raw Blame History

Phase `lvl5` — level-system changes: 5th rung (`abra recipe lint`) + remove "capping"

Mission (operator-specified, two changes):

New top rung — Level 5 = abra recipe lint passes against the exact recipe ref under test (the PR head on PR builds), after the existing four rungs (install, upgrade, backup/restore, functional). The existing four rungs' meanings are UNCHANGED.
Remove the "capping" concept entirely. The operator finds cap/cap_reason confusing. New level semantics (operator-decided 2026-06-11, explicit Q&A + refinement): level = the highest rung that PASSED, where every rung below it is "pass" or an INTENTIONAL skip. A real FAIL blocks. A not-applicable rung is split in two:
- Intentional skip (declared/structural): the rung genuinely does not apply to this recipe — e.g. not backup-capable (declared), only one published version exists (no upgrade target). These are SKIPPED: they do not stop the climb.
- Unintentional N/A (unverified): the rung SHOULD have run but didn't — infra error, missing tool (abra absent), harness exception, prior-stage abort, timeout. The level cannot rise above an unverified rung — it blocks exactly like a fail (the rungs below it still count). Formally, with statuses {pass, fail, skip (intentional), unver (unintentional)}: level = max i such that rung_i == "pass" and all j < i have status in {"pass","skip"}; 0 if no such i.
- install ✔, upgrade ✘, backup ✔, functional ✔, lint ✔ → level 1 (fail blocks)
- install ✔, upgrade ✔, backup skip(not capable), functional ✔, lint ✔ → level 5 (intentional skip — previously this capped at 2; that was the confusing part)
- install ✔, upgrade ✔, backup UNVER (harness error), functional ✔, lint ✔ → level 2 (unverified blocks — we cannot claim what we didn't check)
- all four ✔, lint unver (abra missing) → level 4 (an unverified top rung is simply not earned) Every N/A source in derive_rungs must be explicitly classified intentional vs unintentional, from declared recipe meta / structural facts vs runtime failure — the classification table goes in DECISIONS.md and is Adversary-reviewed. Default for an unclassifiable N/A is UNINTENTIONAL (conservative). The words "cap", "capped", "cap_reason" disappear from code, schema, card, dashboard, and docs. The per-rung table remains everywhere it exists today (now distinguishing ✔ / ✘ / intentional-skip / unverified), so nothing is hidden — the table is the SOLE carrier of "why isn't this level higher". Record the whole semantics change in DECISIONS.md (it deliberately replaces the Phase-3 "N/A caps" stance; the unverified-blocks rule preserves its honest core: never claim what wasn't checked).

State files (phase-namespaced): STATUS-lvl5.md, BACKLOG-lvl5.md, REVIEW-lvl5.md, JOURNAL-lvl5.md. DECISIONS.md shared (append).

1. Current system (file map — verified 2026-06-11)

runner/harness/level.py (120 lines, PURE) — RUNGS = ("install", "upgrade", "backup_restore", "functional"), RUNG_LABEL 1–4, compute_level() (gap caps; N/A caps with distinct reason), tier_to_rung, backup_restore_status.
runner/harness/results.py:137-218 — derive_rungs() builds the rung dict from tier results; compute_level → results.json level + cap_reason + capped.
runner/harness/card.py — LEVEL_COLOR map (line ~58), cap line hardcodes "full clean climb — top level (4)" (line ~246).
dashboard/dashboard.py — _LEVEL_COLOR (line ~81), corner level badge _level_pill, /badge/<recipe>.svg.
tests/unit/test_level.py, test_results.py, test_card.py, test_dashboard.py.
Lint today: abra's pinned (non-chaos) deploy runs abra recipe lint internally and FATALs on R014 — see runner/harness/abra.py:109-114: the CI's mirror-origin repointing tripped a go-git path, which is why chaos deploys (the PR-testing path!) deliberately SKIP lint. So lint currently runs implicitly on some paths and not at all on others — the new rung makes it explicit, uniform, and visible.

2. Design requirements

New rung lint appended after functional → ladder install(1) upgrade(2) backup_restore(3) functional(4) lint(5). RUNG_LABEL[5] = "lint (abra recipe lint)" (or similar). Full clean climb is now 5.
What is linted: the EXACT recipe tree/ref the run deployed (PR head on !testme/PR builds; the tested ref otherwise) — never some other branch. Run abra recipe lint <recipe> (the abra on the CI host) against the run's own checkout context. Capture rc + full output into the run artifacts (e.g. lint.txt), and put a pass/fail + short excerpt in results.json.
Mirror-plumbing must not pollute recipe lint results (CRITICAL, see abra.py:109-114): the R014/go-git failure caused by CI's origin-repointing is a HARNESS artifact, not a recipe defect. The lint rung must evaluate the recipe's content. Solve it properly (e.g. lint in a context where origin looks canonical, or pre-step the same stash/revert dance abra.py already does) — do NOT blanket-ignore lint rules to make the plumbing pass, and document exactly what (if anything) is filtered and why. Any filtering is a named, unit-tested, Adversary-reviewed decision.
Verdict semantics UNCHANGED: lint is a level rung, not a run gate. A lint failure caps the level at 4 with cap_reason "L5 ... FAILED"; it must NOT fail/flip the run verdict, and must be time-bounded + best-effort in execution (a wedged lint can never hang a run — hard timeout, ~60s class).
No N/A escape hatch by default: every recipe can be linted, so the rung is pass/fail in practice. When lint can't run (abra binary missing, timeout) the status is unver + loud log — never silently "pass", never an intentional skip; the rung is not earned and the level stays ≤ 4.
De-cap implementation (mission item 2): compute_level() reimplemented to the new formal rule; cap_reason/capped deleted from level.py, derive_rungs/results.json schema, card (the "capped: …"/"full clean climb — top level (4)" line at card.py:~246 is replaced by the rung table alone or a neutral "level N of 5"), dashboard fields, and docs. The top-corner level badge in particular (the big LEVEL badge on the card, the dashboard _level_pill, and /badge/<recipe>.svg) shows ONLY the level — number + color, zero capping/cap-reason annotation (operator-specified 2026-06-11). The INLINE detail keeps the intentional-skip information: the per-rung table still marks each rung ✔ / ✘ / intentional-skip / unverified, so "why isn't it higher" lives there and only there. Unit tests rewritten to the new semantics, INCLUDING the worked examples from the mission and the old N/A-cases (single-published-version recipe, non-backup-capable recipe) now climbing past their former caps.
All consumers updated coherently: RUNGS/labels, results schema, card (color map + hardcoded top-level line), dashboard pills/badge SVG/legend text, docs (testing.md / results docs / recipe-customization.md §levels if it references L4 as top), and every unit test that assumes 4 is the ceiling or asserts cap_reason.
History compatibility: old results.json artifacts (level ≤ 4, lint rung absent, cap_reason PRESENT) must still render correctly in dashboard/card history views — no KeyErrors, no retroactive relabeling of old runs; renderers tolerate both schemas.
Expected level shifts are findings, not regressions: recipes previously capped by an N/A rung will legitimately jump levels (e.g. a non-backup-capable recipe with passing functional goes 2 → 4/5). P3/P4 must produce a before/after level table for ALL enrolled recipes so the Adversary can check every shift against the new rule — any shift NOT explained by the rule change is a real regression.

3. Work plan

P1 — Ladder + plumbing. level.py: new rung + the de-capped compute_level() per §2 item 2 (delete cap_reason/capped); lint executor (new harness/lint.py or a clean home in abra.py) with timeout, artifact capture, mirror-context handling per §2.3; derive_rungs wiring (incl. the intentional-vs-unintentional classification of every N/A source, §2 item 2); results schema. Unit tests for: full climb = 5; fail-blocks (upgrade ✘ → L1 even with higher passes); intentional-skip climbs (backup skip + functional ✔ → L4+); unverified-blocks (backup unver + functional ✔ → L2); lint unver → stays 4; unclassifiable N/A defaults to unver; old-artifact rendering; mirror-filter decision (if any).

P2 — Presentation. Card, dashboard, badge, docs — a level-5 color/legend that reads as "above functional". Regenerate anything generated.

P3 — Reality pass over all enrolled recipes. Run the lint rung against every enrolled recipe's current main/HEAD (cheap — lint only, no deploys, respects the shared-checkout rule below). Matrix in BACKLOG-lvl5.md: pass/fail + rule hits per recipe. For failing recipes: the rung correctly caps them at ≤4 (that is WORKING AS DESIGNED, not a phase blocker). Where the fix is mechanical and safe, open a PR against the recipe mirror (NEVER push main / NEVER merge — operator decides), else file the finding in DEFERRED.md with the rule output.

P4 — Real-CI proof. Full-stage runs on enough recipes to prove the changes end-to-end: ≥1 recipe reaching a genuine L5 (all five rungs green), ≥1 recipe blocked at L4 by a lint failure (real or synthesized — a throwaway branch with a deliberate lint violation is fine), ≥1 recipe demonstrating the N/A-skip (formerly capped by an N/A rung, now climbing past it), ≥2 runs via the drone !testme path showing the rung on a real PR, plus the canary suite green — the bad canaries must still land at their designed levels under the NEW formula (re-derive what those designed levels are; backup/restore fail still blocks). The unverified-blocks rule is proven by unit test + one synthesized run (induce a tier abort and check the level stays below the unverified rung). Card + dashboard visually verified (Read the PNGs/SVG output), and the §2 item 9 before/after level table completed for all enrolled recipes.

4. Gates

M1 — Implementation complete (pre-merge). Branch with P1+P2; Adversary cold-runs unit suite + lint from a clean checkout; reviews the mirror-filter decision (§2.3) explicitly; confirms verdict-neutrality by code inspection + a targeted test. PASS required before merge to main.

M2 — Proven in real CI. P3 matrix complete for ALL enrolled recipes; P4 runs done (L5 achieved, L4-capped demonstrated, drone path ×2, canaries green); old artifacts still render; run durations not materially inflated (lint adds ≤ ~60s). Fresh Adversary PASS → Builder writes ## DONE to STATUS-lvl5.md.

5. Guardrails (binding)

Rung statuses stay honest (the Phase-3 rule, adapted to the new semantics): no rung is ever silently "pass" — lint errors/timeouts/missing abra are fail/na, never pass. The level FORMULA is the operator-decided rule in §2 item 2 (N/A skips); the per-rung table and the run verdict must always remain visible so completeness is never hidden, and the verdict logic itself is untouched by this phase.
No gate weakening; no verdict changes. Existing tests/assertions untouchable except where the L4-ceiling assumption itself must change — those edits are mechanical and the Adversary checks each one against the old intent.
Recipe mirrors: lint findings in recipes → PRs or DEFERRED entries only; NEVER push recipe-mirror main, NEVER merge their PRs.
Shared checkout race: NEVER git-checkout ~/.abra/recipes/<recipe> on cc-ci while that recipe's CI build is running — the harness deploys from that tree. P3's lint-only sweep must use its own scratch clones or run when the recipe is not mid-build.
Real-CI etiquette: ≤2-3 concurrent live deploys; teardown every dev deploy on every exit path; no secrets in logs/commits. Commit author autonomic-bot <autonomic-bot@noreply.git.autonomic.zone>; push after every commit.
CI host has no python3 on default PATH for remote one-liners — use shell or the harness venv (cc-ci-run).

6. Definition of Done

The new level system live on main and visible end-to-end (results.json → card → dashboard → badge): L5 = abra recipe lint on the tested ref, capping concept fully removed (level = highest passed rung with all lower rungs pass-or-intentional-skip; declared skips climb, fails AND unverified rungs block; no cap/cap_reason anywhere), all enrolled recipes linted and dispositioned with the before/after level table adversary-checked, ≥1 real L5 + ≥1 lint-blocked L4 + ≥1 N/A-skip climb demonstrated through real CI including the drone path, old artifacts unharmed, M1+M2 fresh Adversary PASSes, no verdict or duration regressions.

13 KiB Raw Blame History Unescape Escape

Phase lvl5 — level-system changes: 5th rung (abra recipe lint) + remove "capping"