Files
cc-ci/machine-docs/REVIEW-lvl5.md
autonomic-bot 85a781368a
Some checks failed
continuous-integration/drone/push Build is failing
machine-docs: move all per-phase coordination files out of repo root
STATUS/BACKLOG/REVIEW/JOURNAL for bsky/conc/dstamp/kuma/lvl5/mailu/rcust/shot
(32 files) were at the repo root; move them into machine-docs/ to match the
mandated file-location rule (DECISIONS/DEFERRED/INBOX + older phases already
live there). AGENTS.md gains an explicit File-location rule. No content change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 20:57:03 +00:00

149 lines
11 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# REVIEW — Phase lvl5 (L5 lint rung + de-cap) — Adversary verdicts
Cold-verification ledger (append-only). Each verdict formed from the plan (SSOT), the code/git
history, the verification info in STATUS-lvl5.md, and my own cold re-run — NOT from JOURNAL
(anti-anchoring, §6.1). JOURNAL not consulted before this verdict.
---
## M1 — Implementation complete (pre-merge): **PASS** @ 2026-06-11T07:54Z
Branch `phase-lvl5` @ `3d8d286cf3f2df7d164bf458f07bbb916cc18f2b` (claim 24baac5). Implementation
deliberately NOT on main (reverts 589943f/cd62743 hold it pre-merge) — confirmed; only the
DECISIONS entry (392f7df) is on main. Verified from a **fresh cold clone** on the cc-ci host
(`/tmp/adv-lvl5`, cloned from origin, checked out phase-lvl5; HEAD matched 3d8d286).
**Acceptance per plan §4 M1 — all satisfied:**
1. **Cold clone + HEAD**`git rev-parse HEAD` = 3d8d286 ✓ (matches claim).
2. **Unit suite (CI host venv)**`cc-ci-run -m pytest tests/unit/ -q`**246 passed** in 5.32s
✓ (matches claimed count).
3. **Repo lint**`nix develop .#lint --command bash scripts/lint.sh`**lint: PASS** ✓.
4. **De-capped `compute_level` correct on ALL 4 mission worked examples** (hand-traced against
`level.py` + verified by the rewritten test_level.py):
- install✔ upgrade✘ backup✔ functional✔ lint✔ → **L1** (fail blocks) ✓
- install✔ upgrade✔ backup skip functional✔ lint✔ → **L5** (intentional skip climbs — the
de-cap; was L2 under old rule) ✓
- install✔ upgrade✔ backup **unver** functional✔ lint✔ → **L2** (unver blocks) ✓
- all four ✔, lint unver → **L4** (unverified top rung not earned) ✓
Formula `level = max i: rung_i==pass ∧ all j<i ∈ {pass,skip}` implemented exactly
(pass→advance, skip→continue, fail/unver→break). 0 if none.
5. **N/A classification table matches code.** `derive_rungs` (results.py) implements the
DECISIONS table verbatim, incl. the subtle upgrade split: `skip ∧ ¬has_upgrade_target`
`skip` (structural, climbs); a prior-stage abort (`skip`/None WITH a target, undeclared) →
`unver` (blocks). install never skips; backup_restore skip iff not-capable or EXPECTED_NA;
functional skip iff EXPECTED_NA else unver; **lint pass/fail-or-unver, NEVER skip** (no N/A
escape hatch, §2 item 5; EXPECTED_NA["lint"] ignored). Default-unclassifiable = unver. ✓
6. **§2.3 mirror-context decision reviewed — NO rule filtered.** Executor (`lint.py`) lints a
pristine scratch clone of the per-run tree at the tested sha; origin→local path makes abra's
tag force-fetch work offline (no auth, no go-git "reference not found"), and the run's real
tags ride along so R014 evaluates real content. The plumbing pollution is solved by context,
not exemptions. Confirmed by **real-abra behavioral probe** (not just synthetic fixtures):
- `run_lint("hedgedoc", …)` clean → `{'status':'pass',...}` ✓ (proves scratch-clone makes
abra lint actually run — no FATA).
- inject lightweight tag → `{'status':'fail','detail':'error rule(s) unsatisfied: R014',
'rules_failed':['R014']}` ✓ (proves the classifier has teeth; R014 is NOT suppressed).
Classifier correctly recognizes `rc=0`-with-critical-errors (parses table + "critical errors
present" sentinel, fails closed on disagreement); only content-FATA ("unable to validate
recipe") → fail, all other non-zero → unver.
7. **Verdict-neutrality — code inspection + targeted tests.** `run_lint` invoked once
(run_recipe_ci.py:942), defaults to `unver`, double-wrapped in try/except (crash → stays
unver, non-fatal print), runs BEFORE the tiers at `head_ref` (the exact tested ref). Its
result is consumed ONLY at build_results (line 1278, "non-fatal, verdict unaffected"); NO
verdict computation reads it. 60s hard budget, never raises. Targeted tests pass:
`test_run_lint_missing_recipe_is_unver_not_raise`,
`test_build_results_no_lint_given_is_unverified_never_pass`. ✓
8. **cap/cap_reason/capped fully removed** from active code/schema/card/dashboard/docs. grep over
runner/dashboard/docs/tests finds the words only in (a) the unrelated screenshot timeout-cap,
(b) "capable"/max-users, (c) explicit test/doc assertions that the fields are ABSENT in
schema 2 and that old schema-1 artifacts (which carry level_cap_reason) still render with no
relabeling — history-compat covered by test_card/test_dashboard (green). ✓
No verdict regression, no run-verdict coupling, no rule suppression, no silent pass. **M1 PASS.**
Builder cleared to merge phase-lvl5 → main and proceed to P3/P4 (M2). No VETO.
**Scope note (carried to M2):** M1 verified the lint executor + classifier + level math on real
abra output and the unit surface. M2 must still prove, on real CI end-to-end: ≥1 genuine L5,
≥1 lint-blocked L4, ≥1 N/A-skip climb, drone `!testme` ×2, canaries at designed levels under the
NEW formula, old artifacts rendering live, durations not inflated (lint ≤~60s; observed ~0.7s),
the before/after level table for ALL enrolled recipes, and card/dashboard/badge visually (PNG/SVG).
---
## M2 — Proven in real CI: **PASS** @ 2026-06-11T11:27Z
Main @ `a521d43` (impl merged 08e6cc8 + PR-path fix 68c3486). Cold-verified from a **fresh clone
of main** on the cc-ci host (`/tmp/adv-m2`), drone API (token from /run/secrets), live HTTPS
artifacts, and Read PNGs. JOURNAL not consulted before this verdict.
**Acceptance per plan §4 M2 + §6 DoD — all satisfied:**
1. **Unit suite + lint (fresh clone main).** `cc-ci-run -m pytest tests/unit/ -q` → **247 passed**;
`scripts/lint.sh` → PASS. The new PR-path regression test
`test_run_lint_detached_pr_tree_lints_exact_ref` passes (covers fix 68c3486: abra lint checks
out the repo DEFAULT BRANCH, so a detached scratch clone would FATA or silently lint a stale
branch; fix forces local main AT the tested ref + repoints origin to scratch → lints the PR
head content). My M1 smoke only exercised the HEAD path; this closes that gap.
2. **Genuine L5 (full clean climb).** Runs 398 hedgedoc / 406 immich / 407 plausible / 413 mumble:
results.json schema=2, level=5, all 5 rungs pass, no cap keys, drone build status=success.
3. **Lint-blocked L4, verdict-neutral — the central claim.** Run 405 custom-html PR4:
results.json level=4, lint=fail rules_failed=[R011], all five TIERS pass
(install/upgrade/backup/restore/custom), **drone build 405 status=SUCCESS**, and the bridge
`reflected outcome build 405 (custom-html PR #4): success` to the PR. A lint failure caps the
level at 4 but does NOT flip the run verdict. Card PNG shows lint ✗ FAIL red, "level 4 of 5",
badge #a0b93f. Neutrality proven BOTH directions (415/416 red with lint=pass — see #6).
4. **N/A-skip climb (the de-cap).** Run 399 custom-html-tiny: backup_restore=skip with declared
reason in skips.intentional ("stateless static file server … no backupbot.backup label"),
other rungs pass, **level=5** (was L2 @ #205). Card PNG shows backup/restore "⊘ INTENTIONAL
SKIP" + reason, level 5 of 5. A formerly-capped non-backup-capable recipe now climbs.
5. **Drone !testme path ×3, GENUINE (not manual API).** ccci-bridge poll logs:
`[poll] triggered build 405 for custom-html@36b362aa (PR #4, comment 14332)`,
`406 immich@107d7220 (PR #2, comment 14333)`, `407 plausible@13458fac (PR #3, comment 14334)`,
each followed by `reflected outcome … success`. Build params confirm RECIPE/PR/REF match the
real PR heads. ≥2 required; 3 delivered, all on real PRs showing the lint rung.
6. **Canaries at re-derived designed level + backup-fail still blocks.** 415 (bkp-bad) / 416
(rst-bad): drone build status=**failure** (red), results.json level=1, rungs {install pass,
upgrade skip(structural — no version tags on SRC+REF mirror), backup_restore FAIL, functional
unver, lint pass}. New-formula trace: install(1) → upgrade skip(climb) → backup_restore
fail(BLOCK) → L1. RED is caused by the failing backup/restore TIER (verdict logic untouched),
NOT by lint (lint=pass). Re-derivation is sound; matches OLD-rule level too (old: upgrade N/A
caps at L1) — no regression, same designed level, red either way.
7. **Unverified-blocks (mission example #3), synthesized.** host run
`/var/lib/cc-ci-runs/lvl5-unver-demo/results.json`: schema=2, level=2, rungs {install pass,
upgrade pass, backup_restore UNVER, functional pass, lint pass}, skips.unintentional=
[backup_restore]. backup unver blocks at L2 even though functional+lint pass above it. ✓
8. **Durations not inflated.** drone build wall-times: 398=100s, 399=45s, 405=61s, 406 immich=199s
(shot baseline 198-199s), 407 plausible=164s (shot baseline 166s), 413=80s. lint adds ~0.7s;
the two cross-phase baselines are flat (407 slightly faster). No duration regression.
9. **Old artifacts render, no relabel.** /runs/370 (schema=1, level=4, level_cap_reason present)
serves 200 (results.json + summary.png); dashboard `/` + `/recipe/immich` 200 with mixed
schema-1/schema-2 rows; unit history-compat tests green.
10. **lint.txt served.** /runs/398/lint.txt 200 — full real abra table (HEAVY-box), cmd + rc=0 +
status=pass header, ref=09bf4d54 (hedgedoc's EXACT tested ref).
11. **Badges number+colour only.** hedgedoc badge ">level 5<" #3fb950; custom-html ">level 4<"
#a0b93f; grep finds NO cap/skip/na/reason language in badge SVGs. Matches operator spec.
12. **P3 matrix 19/19 lint PASS** (BACKLOG-lvl5.md) via documented scratch-clone method; no mirror
PRs / DEFERRED needed; warn-severity misses only (don't fail the rung). lasuite-meet R014 now
passes genuinely (tag annotated upstream — not suppressed). **Before/after table: every level
shift is explained by the rule change** — L4→L5 (+lint, baseline from real artifacts + P3
sweep), de-cap L2→L5 (custom-html-tiny proven #399; mailu same mechanism), L4 lintdemo (#405),
canary L1, bluesky N/A consistent. **No unexplained shift / no downward regression.** "Analytic
5" cells are derivation-checkable from two evidenced inputs (real baseline tiers + proven lint).
13. **No secret leak.** Independent sweep: no /run/secrets infra-secret VALUES and no generated
app-credential patterns appear in any published run artifact (the new lint.txt surface incl.).
results.json flags no_secret_leak=true + clean_teardown=true across runs.
**§6 Definition of Done satisfied:** new level system live on main and visible end-to-end
(results.json→card→dashboard→badge); L5 = abra recipe lint on the tested ref; capping fully
removed (no cap/cap_reason/capped); all 19 enrolled recipes linted + dispositioned with an
adversary-checked before/after table; ≥1 real L5 + ≥1 lint-blocked L4 + ≥1 N/A-skip climb through
real CI incl. the drone path ×3; old artifacts unharmed; M1 (cfc87fd) + M2 fresh Adversary
PASSes; no verdict or duration regressions.
**No VETO. Builder is cleared to write `## DONE` to STATUS-lvl5.md.**
Out-of-scope note (Builder's STATUS query): the WC5 promote-on-green-cold observation (a
STAGES-filtered hand-run promoted custom-html's canonical) is pre-existing and orthogonal to the
level system — NOT a lvl5 finding/regression and not a DONE blocker. If the Builder wants it
tracked, DEFERRED.md/IDEAS.md is the right home; I'm not filing it as an [adversary] finding.