machine-docs: move all per-phase coordination files out of repo root

STATUS/BACKLOG/REVIEW/JOURNAL for bsky/conc/dstamp/kuma/lvl5/mailu/rcust/shot (32 files) were at the repo root; move them into machine-docs/ to match the mandated file-location rule (DECISIONS/DEFERRED/INBOX + older phases already live there). AGENTS.md gains an explicit File-location rule. No content change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 20:57:03 +00:00
parent 560e772b5f
commit 85a781368a
33 changed files with 8 additions and 0 deletions
--- a/machine-docs/REVIEW-lvl5.md
+++ b/machine-docs/REVIEW-lvl5.md
@ -0,0 +1,148 @@
+# REVIEW — Phase lvl5 (L5 lint rung + de-cap) — Adversary verdicts
+
+Cold-verification ledger (append-only). Each verdict formed from the plan (SSOT), the code/git
+history, the verification info in STATUS-lvl5.md, and my own cold re-run — NOT from JOURNAL
+(anti-anchoring, §6.1). JOURNAL not consulted before this verdict.
+
+---
+
+## M1 — Implementation complete (pre-merge): **PASS** @ 2026-06-11T07:54Z
+
+Branch `phase-lvl5` @ `3d8d286cf3f2df7d164bf458f07bbb916cc18f2b` (claim 24baac5). Implementation
+deliberately NOT on main (reverts 589943f/cd62743 hold it pre-merge) — confirmed; only the
+DECISIONS entry (392f7df) is on main. Verified from a **fresh cold clone** on the cc-ci host
+(`/tmp/adv-lvl5`, cloned from origin, checked out phase-lvl5; HEAD matched 3d8d286).
+
+**Acceptance per plan §4 M1 — all satisfied:**
+
+1. **Cold clone + HEAD** — `git rev-parse HEAD` = 3d8d286 ✓ (matches claim).
+2. **Unit suite (CI host venv)** — `cc-ci-run -m pytest tests/unit/ -q` → **246 passed** in 5.32s
+   ✓ (matches claimed count).
+3. **Repo lint** — `nix develop .#lint --command bash scripts/lint.sh` → **lint: PASS** ✓.
+4. **De-capped `compute_level` correct on ALL 4 mission worked examples** (hand-traced against
+   `level.py` + verified by the rewritten test_level.py):
+   - install✔ upgrade✘ backup✔ functional✔ lint✔ → **L1** (fail blocks) ✓
+   - install✔ upgrade✔ backup skip functional✔ lint✔ → **L5** (intentional skip climbs — the
+     de-cap; was L2 under old rule) ✓
+   - install✔ upgrade✔ backup **unver** functional✔ lint✔ → **L2** (unver blocks) ✓
+   - all four ✔, lint unver → **L4** (unverified top rung not earned) ✓
+   Formula `level = max i: rung_i==pass ∧ all j<i ∈ {pass,skip}` implemented exactly
+   (pass→advance, skip→continue, fail/unver→break). 0 if none.
+5. **N/A classification table matches code.** `derive_rungs` (results.py) implements the
+   DECISIONS table verbatim, incl. the subtle upgrade split: `skip ∧ ¬has_upgrade_target` →
+   `skip` (structural, climbs); a prior-stage abort (`skip`/None WITH a target, undeclared) →
+   `unver` (blocks). install never skips; backup_restore skip iff not-capable or EXPECTED_NA;
+   functional skip iff EXPECTED_NA else unver; **lint pass/fail-or-unver, NEVER skip** (no N/A
+   escape hatch, §2 item 5; EXPECTED_NA["lint"] ignored). Default-unclassifiable = unver. ✓
+6. **§2.3 mirror-context decision reviewed — NO rule filtered.** Executor (`lint.py`) lints a
+   pristine scratch clone of the per-run tree at the tested sha; origin→local path makes abra's
+   tag force-fetch work offline (no auth, no go-git "reference not found"), and the run's real
+   tags ride along so R014 evaluates real content. The plumbing pollution is solved by context,
+   not exemptions. Confirmed by **real-abra behavioral probe** (not just synthetic fixtures):
+   - `run_lint("hedgedoc", …)` clean → `{'status':'pass',...}` ✓ (proves scratch-clone makes
+     abra lint actually run — no FATA).
+   - inject lightweight tag → `{'status':'fail','detail':'error rule(s) unsatisfied: R014',
+     'rules_failed':['R014']}` ✓ (proves the classifier has teeth; R014 is NOT suppressed).
+   Classifier correctly recognizes `rc=0`-with-critical-errors (parses table + "critical errors
+   present" sentinel, fails closed on disagreement); only content-FATA ("unable to validate
+   recipe") → fail, all other non-zero → unver.
+7. **Verdict-neutrality — code inspection + targeted tests.** `run_lint` invoked once
+   (run_recipe_ci.py:942), defaults to `unver`, double-wrapped in try/except (crash → stays
+   unver, non-fatal print), runs BEFORE the tiers at `head_ref` (the exact tested ref). Its
+   result is consumed ONLY at build_results (line 1278, "non-fatal, verdict unaffected"); NO
+   verdict computation reads it. 60s hard budget, never raises. Targeted tests pass:
+   `test_run_lint_missing_recipe_is_unver_not_raise`,
+   `test_build_results_no_lint_given_is_unverified_never_pass`. ✓
+8. **cap/cap_reason/capped fully removed** from active code/schema/card/dashboard/docs. grep over
+   runner/dashboard/docs/tests finds the words only in (a) the unrelated screenshot timeout-cap,
+   (b) "capable"/max-users, (c) explicit test/doc assertions that the fields are ABSENT in
+   schema 2 and that old schema-1 artifacts (which carry level_cap_reason) still render with no
+   relabeling — history-compat covered by test_card/test_dashboard (green). ✓
+
+No verdict regression, no run-verdict coupling, no rule suppression, no silent pass. **M1 PASS.**
+Builder cleared to merge phase-lvl5 → main and proceed to P3/P4 (M2). No VETO.
+
+**Scope note (carried to M2):** M1 verified the lint executor + classifier + level math on real
+abra output and the unit surface. M2 must still prove, on real CI end-to-end: ≥1 genuine L5,
+≥1 lint-blocked L4, ≥1 N/A-skip climb, drone `!testme` ×2, canaries at designed levels under the
+NEW formula, old artifacts rendering live, durations not inflated (lint ≤~60s; observed ~0.7s),
+the before/after level table for ALL enrolled recipes, and card/dashboard/badge visually (PNG/SVG).
+
+---
+
+## M2 — Proven in real CI: **PASS** @ 2026-06-11T11:27Z
+
+Main @ `a521d43` (impl merged 08e6cc8 + PR-path fix 68c3486). Cold-verified from a **fresh clone
+of main** on the cc-ci host (`/tmp/adv-m2`), drone API (token from /run/secrets), live HTTPS
+artifacts, and Read PNGs. JOURNAL not consulted before this verdict.
+
+**Acceptance per plan §4 M2 + §6 DoD — all satisfied:**
+
+1. **Unit suite + lint (fresh clone main).** `cc-ci-run -m pytest tests/unit/ -q` → **247 passed**;
+   `scripts/lint.sh` → PASS. The new PR-path regression test
+   `test_run_lint_detached_pr_tree_lints_exact_ref` passes (covers fix 68c3486: abra lint checks
+   out the repo DEFAULT BRANCH, so a detached scratch clone would FATA or silently lint a stale
+   branch; fix forces local main AT the tested ref + repoints origin to scratch → lints the PR
+   head content). My M1 smoke only exercised the HEAD path; this closes that gap.
+2. **Genuine L5 (full clean climb).** Runs 398 hedgedoc / 406 immich / 407 plausible / 413 mumble:
+   results.json schema=2, level=5, all 5 rungs pass, no cap keys, drone build status=success.
+3. **Lint-blocked L4, verdict-neutral — the central claim.** Run 405 custom-html PR4:
+   results.json level=4, lint=fail rules_failed=[R011], all five TIERS pass
+   (install/upgrade/backup/restore/custom), **drone build 405 status=SUCCESS**, and the bridge
+   `reflected outcome build 405 (custom-html PR #4): success` to the PR. A lint failure caps the
+   level at 4 but does NOT flip the run verdict. Card PNG shows lint ✗ FAIL red, "level 4 of 5",
+   badge #a0b93f. Neutrality proven BOTH directions (415/416 red with lint=pass — see #6).
+4. **N/A-skip climb (the de-cap).** Run 399 custom-html-tiny: backup_restore=skip with declared
+   reason in skips.intentional ("stateless static file server … no backupbot.backup label"),
+   other rungs pass, **level=5** (was L2 @ #205). Card PNG shows backup/restore "⊘ INTENTIONAL
+   SKIP" + reason, level 5 of 5. A formerly-capped non-backup-capable recipe now climbs.
+5. **Drone !testme path ×3, GENUINE (not manual API).** ccci-bridge poll logs:
+   `[poll] triggered build 405 for custom-html@36b362aa (PR #4, comment 14332)`,
+   `406 immich@107d7220 (PR #2, comment 14333)`, `407 plausible@13458fac (PR #3, comment 14334)`,
+   each followed by `reflected outcome … success`. Build params confirm RECIPE/PR/REF match the
+   real PR heads. ≥2 required; 3 delivered, all on real PRs showing the lint rung.
+6. **Canaries at re-derived designed level + backup-fail still blocks.** 415 (bkp-bad) / 416
+   (rst-bad): drone build status=**failure** (red), results.json level=1, rungs {install pass,
+   upgrade skip(structural — no version tags on SRC+REF mirror), backup_restore FAIL, functional
+   unver, lint pass}. New-formula trace: install(1) → upgrade skip(climb) → backup_restore
+   fail(BLOCK) → L1. RED is caused by the failing backup/restore TIER (verdict logic untouched),
+   NOT by lint (lint=pass). Re-derivation is sound; matches OLD-rule level too (old: upgrade N/A
+   caps at L1) — no regression, same designed level, red either way.
+7. **Unverified-blocks (mission example #3), synthesized.** host run
+   `/var/lib/cc-ci-runs/lvl5-unver-demo/results.json`: schema=2, level=2, rungs {install pass,
+   upgrade pass, backup_restore UNVER, functional pass, lint pass}, skips.unintentional=
+   [backup_restore]. backup unver blocks at L2 even though functional+lint pass above it. ✓
+8. **Durations not inflated.** drone build wall-times: 398=100s, 399=45s, 405=61s, 406 immich=199s
+   (shot baseline 198-199s), 407 plausible=164s (shot baseline 166s), 413=80s. lint adds ~0.7s;
+   the two cross-phase baselines are flat (407 slightly faster). No duration regression.
+9. **Old artifacts render, no relabel.** /runs/370 (schema=1, level=4, level_cap_reason present)
+   serves 200 (results.json + summary.png); dashboard `/` + `/recipe/immich` 200 with mixed
+   schema-1/schema-2 rows; unit history-compat tests green.
+10. **lint.txt served.** /runs/398/lint.txt 200 — full real abra table (HEAVY-box), cmd + rc=0 +
+    status=pass header, ref=09bf4d54 (hedgedoc's EXACT tested ref).
+11. **Badges number+colour only.** hedgedoc badge ">level 5<" #3fb950; custom-html ">level 4<"
+    #a0b93f; grep finds NO cap/skip/na/reason language in badge SVGs. Matches operator spec.
+12. **P3 matrix 19/19 lint PASS** (BACKLOG-lvl5.md) via documented scratch-clone method; no mirror
+    PRs / DEFERRED needed; warn-severity misses only (don't fail the rung). lasuite-meet R014 now
+    passes genuinely (tag annotated upstream — not suppressed). **Before/after table: every level
+    shift is explained by the rule change** — L4→L5 (+lint, baseline from real artifacts + P3
+    sweep), de-cap L2→L5 (custom-html-tiny proven #399; mailu same mechanism), L4 lintdemo (#405),
+    canary L1, bluesky N/A consistent. **No unexplained shift / no downward regression.** "Analytic
+    5" cells are derivation-checkable from two evidenced inputs (real baseline tiers + proven lint).
+13. **No secret leak.** Independent sweep: no /run/secrets infra-secret VALUES and no generated
+    app-credential patterns appear in any published run artifact (the new lint.txt surface incl.).
+    results.json flags no_secret_leak=true + clean_teardown=true across runs.
+
+**§6 Definition of Done satisfied:** new level system live on main and visible end-to-end
+(results.json→card→dashboard→badge); L5 = abra recipe lint on the tested ref; capping fully
+removed (no cap/cap_reason/capped); all 19 enrolled recipes linted + dispositioned with an
+adversary-checked before/after table; ≥1 real L5 + ≥1 lint-blocked L4 + ≥1 N/A-skip climb through
+real CI incl. the drone path ×3; old artifacts unharmed; M1 (cfc87fd) + M2 fresh Adversary
+PASSes; no verdict or duration regressions.
+
+**No VETO. Builder is cleared to write `## DONE` to STATUS-lvl5.md.**
+
+Out-of-scope note (Builder's STATUS query): the WC5 promote-on-green-cold observation (a
+STAGES-filtered hand-run promoted custom-html's canonical) is pre-existing and orthogonal to the
+level system — NOT a lvl5 finding/regression and not a DONE blocker. If the Builder wants it
+tracked, DEFERRED.md/IDEAS.md is the right home; I'm not filing it as an [adversary] finding.