machine-docs: move all per-phase coordination files out of repo root

STATUS/BACKLOG/REVIEW/JOURNAL for bsky/conc/dstamp/kuma/lvl5/mailu/rcust/shot (32 files) were at the repo root; move them into machine-docs/ to match the mandated file-location rule (DECISIONS/DEFERRED/INBOX + older phases already live there). AGENTS.md gains an explicit File-location rule. No content change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 20:57:03 +00:00
parent 560e772b5f
commit 85a781368a
33 changed files with 8 additions and 0 deletions
--- a/machine-docs/BACKLOG-lvl5.md
+++ b/machine-docs/BACKLOG-lvl5.md
@@ -0,0 +1,99 @@
+# BACKLOG — Phase lvl5
+
+## Build backlog
+
+- [x] B1 (P1) `level.py`: append rung `lint` (L5); new status vocabulary {pass, fail, skip, unver}; `compute_level()` → new formula (level = max i: rung_i pass ∧ ∀j<i status ∈ {pass,skip}); DELETE cap_reason/capped concepts.
+- [x] B2 (P1) lint executor (`harness/lint.py`): `abra recipe lint <recipe>` against the exact tested ref; hard ~60s timeout; rc+full output → `lint.txt` artifact; pass/fail/unver classification (missing abra / timeout / exception → unver, never pass, never skip); mirror-context handling per phase-plan §2.3 (probe abra behavior first; any filtering = named + unit-tested + DECISIONS.md).
+- [x] B3 (P1) `results.py`: wire lint into `derive_rungs` + explicit intentional-vs-unintentional classification of EVERY N/A source; drop level_cap_reason/level_cap_rung from schema; `skips()` reflects new statuses; orchestrator (`run_recipe_ci.py`) runs lint executor at the tested-ref point + passes result through; verdict-neutral (R7 wrap).
+- [x] B4 (P1) unit tests: rewrite test_level.py/test_results.py to new semantics incl. mission worked examples (fail-blocks → L1; intentional-skip climbs → L5; unver-blocks → L2; lint unver → L4; unclassifiable N/A → unver default); lint executor tests; old-artifact rendering compat tests.
+- [x] B5 (P2) `card.py`: 0–5 color ramp; cap line removed ("level N of 5" neutral); rung table renders ✔/✘/intentional-skip/unverified; level_badge_svg loses cap_skip third segment (badge = number+color only); tolerate old artifacts.
+- [x] B6 (P2) `dashboard.py`: _LEVEL_COLOR 5-scale; _level_pill/badge SVG number-only; legend text; old results.json (cap_reason present, lint absent) render without KeyError.
+- [x] B7 (P2) docs: results-ux.md, testing.md, recipe-customization.md §EXPECTED_NA wording — L5 ladder, de-cap semantics.
+- [x] B8 (P1) DECISIONS.md: semantics change record (replaces Phase-3 "N/A caps"); N/A classification table (every derive_rungs N/A source → intentional|unintentional); mirror-filter decision for lint (if any filtering).
+- [x] B9 — gate M1: claim (branch w/ P1+P2; clean tree; cold-verifiable).
+- [x] B10 (P3) lint sweep over ALL enrolled recipes (scratch clones — never touch ~/.abra/recipes during builds); matrix here (pass/fail + rule hits); mechanical fixes → mirror PRs (never push main/never merge); rest → DEFERRED.md.
+- [x] B11 (P4) real-CI proofs: ≥1 genuine L5; ≥1 lint-blocked L4 (synth branch ok); ≥1 N/A-skip climb; 2× drone !testme; canary suite at re-derived designed levels; 1 synthesized unver-blocks run; before/after level table for ALL enrolled recipes; card/dashboard PNG/SVG visually verified.
+- [x] B12 — gate M2: claim; then ## DONE after fresh PASS.
+
+## Adversary findings
+
+## P3 lint sweep matrix (B10) — all 19 enrolled, mirror main HEAD, 2026-06-11
+
+Method: per recipe, fresh scratch clone of its canonical origin (mirror for the 17
+recipe-maintainers recipes; coopcloud upstream for bluesky-pds/custom-html-tiny/mumble) +
+upstream version tags fetched (production fetch_recipe shape), then `harness.lint.run_lint`
+from phase-lvl5 @ 3d8d286 in a scratch ABRA_DIR (`/tmp/lvl5-sweep` on cc-ci; full outputs in
+`/tmp/lvl5-sweep/art/<recipe>/lint.txt`). Canonical `~/.abra/recipes` never touched.
+
+**Result: 19/19 PASS** (no error-severity rule unsatisfied anywhere). No recipe-mirror PRs and
+no DEFERRED entries needed. Warn-severity misses (informational, do not fail the rung):
+
+| recipe | lint | warn-rule misses |
+|---|---|---|
+| bluesky-pds | pass | R002 R007 R015 |
+| cryptpad | pass | R002 R005 R007 |
+| custom-html | pass | R002 R004 R005 |
+| custom-html-tiny | pass | R002 |
+| discourse | pass | R002 R007 R015 |
+| ghost | pass | R015 |
+| hedgedoc | pass | R015 |
+| immich | pass | R002 R005 |
+| keycloak | pass | R002 R015 |
+| lasuite-docs | pass | R005 |
+| lasuite-drive | pass | R002 R005 |
+| lasuite-meet | pass | R002 |
+| mailu | pass | R002 |
+| matrix-synapse | pass | R002 R015 |
+| mattermost-lts | pass | R002 R015 |
+| mumble | pass | R002 |
+| n8n | pass | R002 R015 |
+| plausible | pass | R002 R005 R007 |
+| uptime-kuma | pass | R015 |
+
+Note: lasuite-meet's historically-lightweight tag `0.3.0+v1.16.0` is now ANNOTATED upstream
+(verified `git cat-file -t` = tag on all three version tags) — R014 passes genuinely; the
+abra.py:105 lightweight-tag deploy fallback simply no longer triggers for it.
+
+## Before/after level table skeleton (§2.9 — "after" to be filled by P4 real runs)
+
+Baseline = latest results.json on cc-ci per recipe re-scored under the CURRENT (pre-lvl5,
+4-rung) rule; ancient 6-rung artifacts (builds ≤205, integration/recipe_local era) re-read on
+their four essential rungs. Predicted = same tier outcomes + sweep lint result under the new
+rule (assumption flagged; P4 produces the real values).
+
+| recipe | baseline rungs (latest artifact) | baseline level | predicted new level | REAL new level (P4 run) | why it shifts |
+|---|---|---|---|---|---|
+| bluesky-pds | no artifact (deploy-gated upstream, shot-phase N/A) | — | — | — (still deploy-gated; documented N/A) | still deploy-gated |
+| cryptpad | I✔ U✔ B✔ F✔ (#181) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
+| custom-html | I✔ U✔ B✔ F✔ (#182) | 4 | 5 | **4** (#405 PR4 lintdemo: lint fail R011; main analytic 5) | + lint pass |
+| custom-html-tiny | I✔ U✔ B-na F-na (#205, predates functional/) | 2 | 5 | **5** (#399 — N/A-skip climb, was 2) | de-cap: backup skip declared; functional/ tests exist now; + lint |
+| discourse | I✔ U✔ B✔ F✔ (#184) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
+| ghost | I✔ U✔ B✔ F✔ (#185) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
+| hedgedoc | I✔ U✔ B✔ F✔ (#113) | 4 | 5 | **5** (#398, 100s) | + lint pass |
+| immich | I✔ U✔ B✔ F✔ (#370) | 4 | 5 | **5** (#406, drone !testme PR2, 199s) | + lint pass |
+| keycloak | I✔ U✔ B✔ F✔ (#187) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
+| lasuite-docs | I✔ U✔ B✔ F✔ (#188) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
+| lasuite-drive | I✔ U✔ B✔ F✔ (#189) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
+| lasuite-meet | I✔ U✔ B✔ F✔ (#204) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
+| mailu | I✔ U✔ B-na F✔ (#191) | 2 | 5 | (not re-run; analytic 5 — same de-cap as #399) | de-cap: not backup-capable → skip climbs (the §2.9 N/A-skip demo) |
+| matrix-synapse | I✔ U✔ B✔ F✔ (#203) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
+| mattermost-lts | I✔ U✔ B✔ F✔ (#196) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
+| mumble | no results.json artifact retained | — | — | **5** (#413, 80s — first retained artifact) | P4 run to establish |
+| n8n | I✔ U✔ B✔ F✔ (#197) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
+| plausible | I✔ U✔ B✔ F✔ (#371) | 4 | 5 | **5** (#407, drone !testme PR3, 164s) | + lint pass |
+| uptime-kuma | I✔ U✔ B✔ F✔ (#165) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
+
+Canaries (designed levels under the NEW formula, re-derived): custom-html-bkp-bad /
+custom-html-rst-bad — backup-capable with a failing backup/restore tier → backup_restore rung
+FAIL → level 2 (fail still blocks; run verdict red as today). To be proven in P4.
+
+### Canary designed-level re-derivation (P4, runs 415/416 — 2026-06-11)
+
+Under the NEW formula the bad canaries' designed level is **1**, not the old 2: their mirrors
+carry no published version tags on the SRC+REF path → upgrade = intentional skip (climbs past
+but never earns), backup_restore = FAIL blocks → level = install = 1. Verified live: 415
+(bkp-bad) + 416 (rst-bad) both **verdict FAILURE (red)**, rungs
+{install: pass, upgrade: skip, backup_restore: fail, functional: unver (post-failure abort),
+lint: pass}, LEVEL 1. Backup/restore fail still blocks; verdict logic untouched.
+(First attempts 411/412 failed in 1s: canaries are mirror-only, not catalogue recipes — they
+need SRC+REF params, as prior phases ran them.)
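The level rule stated in B1 (level = max i: rung_i pass ∧ ∀j<i status ∈ {pass, skip}) and the canary re-derivation above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `level.py`: the rung order is taken from the rung set recorded for runs 415/416, the `RUNGS` constant and the function body are hypothetical, and treating a missing rung as `unver` is an assumption modelled on the "unclassifiable N/A → unver default" example in B4.

```python
# Hypothetical sketch of the lvl5 level formula; names are illustrative only.
RUNGS = ("install", "upgrade", "backup_restore", "functional", "lint")  # L1..L5
CLIMBABLE = {"pass", "skip"}  # statuses that let the climb continue past a rung


def compute_level(statuses):
    """Return the highest rung index i whose status is 'pass' while every
    earlier rung is 'pass' or 'skip'. Returns 0 if no rung qualifies."""
    level = 0
    for i, rung in enumerate(RUNGS, start=1):
        status = statuses.get(rung, "unver")  # assumption: missing means unver
        if status == "pass":
            level = i          # this rung is earned
        elif status not in CLIMBABLE:
            break              # fail or unver blocks everything above it
        # 'skip' climbs past the rung without earning it
    return level


# Bad canaries (runs 415/416): upgrade skip, backup_restore fail.
canary = {"install": "pass", "upgrade": "skip", "backup_restore": "fail",
          "functional": "unver", "lint": "pass"}

# Intentional backup skip with every other rung passing (the N/A-skip climb).
skip_climb = {"install": "pass", "upgrade": "pass", "backup_restore": "skip",
              "functional": "pass", "lint": "pass"}

# Lint failure on an otherwise fully passing recipe.
lint_blocked = {"install": "pass", "upgrade": "pass", "backup_restore": "pass",
                "functional": "pass", "lint": "fail"}

print(compute_level(canary), compute_level(skip_climb), compute_level(lint_blocked))
```

In this sketch a `skip` never raises the level by itself, which is why the canaries land on 1 rather than 2: the skipped upgrade rung is climbed past, and the failing backup_restore rung stops the climb before any higher rung can be earned.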