claim(cf48): Opus 4.8 cold review matrix complete — NO COVERAGE LOST
Some checks failed
continuous-integration/drone/push Build is failing
Independent cross-validation of cfold 44e0242. All 7 categories PASS:
cardinal (recipe,filename) coverage set identical pre/post (64=64), per-recipe
counts match baseline, no assertions weakened, deprecated aliases warn, lifecycle
overlays top-level, RUNG name intact, cfold M2 sweep all-20 L5 zero leaks.
cf55(sonnet-4.6) vs cf48(opus-4.8) FULL agreement; cf48 also caught a cf55
narrative slip (keycloak sys.path unchanged, not depth-adjusted).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
21
machine-docs/BACKLOG-cf48.md
Normal file
@@ -0,0 +1,21 @@
# BACKLOG — phase cf48

## Build backlog

- [x] Confirm session model is `claude-opus-4-8` on the `claude` backend (phase Model Requirement)
- [x] Read inputs: cfold plan, STATUS-cfold/REVIEW-cfold, STATUS-cf55/REVIEW-cf55
- [x] Cat 1 — Diff review of `44e0242` line-by-line for coverage loss
- [x] Cat 2 — Discovery parity: recompute custom-test inventory + cardinal coverage diff vs pre-cfold
- [x] Cat 3 — Assertion preservation: confirm no weakened/removed/skipped assertions
- [x] Cat 4 — Old-folder behavior: deprecated-alias + loud-warning live probe
- [x] Cat 5 — Lifecycle-overlay separation: 0 in custom/, overlays top-level, RUNG name intact
- [x] Cat 6 — Evidence audit: cfold M2 full-sweep all-20-recipes L5, zero leaks
- [x] Cat 7 — Cleanliness: clean tree, no stray root/temp files
- [x] cf55-vs-cf48 agreement note (incl. keycloak sys.path discrepancy cf48 caught)
- [x] Write review matrix to STATUS-cf48.md + claim M1
- [ ] Await Adversary M1 + M2 PASS in REVIEW-cf48.md
- [ ] On M1+M2 PASS with no VETO → write `## DONE` to STATUS-cf48.md

## Adversary findings

_(Adversary-owned — do not edit)_
47
machine-docs/JOURNAL-cf48.md
Normal file
@@ -0,0 +1,47 @@
# JOURNAL — phase cf48 (Opus 4.8 post-cfold coverage-loss review)

## 2026-06-13T05:30Z — Independent cold review complete, M1 claimed

**Model check:** session reports `claude-opus-4-8`, override files
`/srv/cc-ci/.cc-ci-logs/.loop-model-cf48 = claude-opus-4-8` and `.loop-backend = claude`. Matches the
phase Model Requirement — proceeded.

**Approach.** Reviewed independently first (formed my own verdict from the diff, the code, and live
probes), THEN read cf55 to reconcile. The plan named GPT-5.5 for cf55 but cf55 actually ran on
claude-sonnet-4-6 (launcher mismatch, orchestrator relaunch — documented in its own state files), so the
"two different models" cross-validation is Sonnet 4.6 vs Opus 4.8. Recorded honestly in STATUS rather
than pretending it was GPT vs Claude.

**Why I'm confident it's a pure relocation.** The cfold safety argument (discovery globs both old subdirs
with no branching, both map to the L4 `functional` rung, identical fixtures/failure semantics) was already
established in the cfold plan §1. My job was to confirm the *execution* matched. Three things made it
provable rather than "looks right":
1. The cardinal coverage diff (cmd 6) compares the actual git trees at `44e0242^` and HEAD by
`(recipe, filename)`, stripping the folder component — a byte-identical sorted diff means no file was
added, dropped, or renamed-away, only re-parented. This is stronger than a count match (counts can
coincide while a file is swapped).
2. `git show --find-renames` collapses the 100%-identical moves so only the 5 content-touched test files
surface — and each of those is a docstring/comment/sys.path line, never an assertion. Small surface to
eyeball exhaustively.
3. The whole-repo grep for `functional/`/`playwright/` literals outside the alias handling, plus the
`== "functional"` value-branch grep, proves no consumer (manifest, screenshot, dashboard, drone, bridge)
silently keys off the old folder name. Only `discovery.py`'s intentional alias lines remain.
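
To illustrate why the cardinal diff in point 1 is stronger than a count match, here is a minimal Python sketch of the same comparison. The helper name `cardinal_pairs` and the file lists are hypothetical and exist only for illustration; the review itself used the `git ls-tree` / `sed` / `diff` pipeline.

```python
def cardinal_pairs(paths, folders):
    """Return sorted (recipe, filename) pairs for tests/<recipe>/<folder>/test_*.py paths."""
    pairs = []
    for path in paths:
        parts = path.split("/")
        if (
            len(parts) == 4
            and parts[0] == "tests"
            and parts[2] in folders
            and parts[3].startswith("test_")
            and parts[3].endswith(".py")
        ):
            pairs.append((parts[1], parts[3]))  # drop the folder component
    return sorted(pairs)


# Hypothetical file lists, for illustration only.
pre = ["tests/ghost/functional/test_a.py", "tests/ghost/playwright/test_ui.py"]
relocated = ["tests/ghost/custom/test_a.py", "tests/ghost/custom/test_ui.py"]
swapped = ["tests/ghost/custom/test_a.py", "tests/ghost/custom/test_other.py"]

OLD, NEW = {"functional", "playwright"}, {"custom"}

same_set = cardinal_pairs(pre, OLD) == cardinal_pairs(relocated, NEW)
same_count = len(cardinal_pairs(pre, OLD)) == len(cardinal_pairs(swapped, NEW))
swapped_set = cardinal_pairs(pre, OLD) == cardinal_pairs(swapped, NEW)
print(same_set, same_count, swapped_set)  # True True False
```

The `swapped` case passes a count check but fails the pair comparison, which is the failure mode a count-only audit would miss.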

**Discrepancy I caught vs cf55.** cf55's narrative claims keycloak's custom tests had a `sys.path` depth
adjustment `../..` → `../../..`. The diff shows those lines unchanged (only the comment moved). Harmless —
functional/ and custom/ are equal depth so no adjustment was needed — but it's a factual slip in cf55's
write-up. Surfaced in the agreement note per the phase's "note where the two disagree" instruction. cf48
found it; cf55 missed it. No coverage consequence either way.
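
The equal-depth argument can be checked mechanically. The sketch below assumes the tests climb `../..` from their own directory (the hop named in cf55's narrative); the helper `climb` is hypothetical and is not the code in the keycloak tests.

```python
import os.path


def climb(test_file, ups=2):
    """Resolve the directory reached by going `ups` levels above the test's folder."""
    here = os.path.dirname(test_file)
    return os.path.normpath(os.path.join(here, *[".."] * ups))


old = climb("tests/keycloak/functional/test_password_grant_token.py")
new = climb("tests/keycloak/custom/test_password_grant_token.py")
print(old == new)  # True: sibling folders resolve to the same target
```

Because `functional/` and `custom/` are siblings, the same relative hop lands on the same directory, so no depth change was required by the move.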

**Evidence audit stance.** Did NOT rerun the full fleet sweep (guardrail: don't re-sweep unless cfold
evidence is incomplete — it isn't). Relied on cfold's cold-verified M2 PASS (REVIEW-cfold.md 04:11:00Z):
all 20 recipes L5, custom-junit counts = baseline per recipe, ghost upgrade junit=2, live_pr_apps=0. That
is sufficient and independently re-runnable evidence; re-sweeping would be churn.

**Commands run (all green):** unit suite `18 passed`; per-recipe counts all match; cardinal diff
`IDENTICAL SET`; alias probe `found: ['test_new.py','test_old.py','test_ui.py']` + 2 warnings;
stale-consumer grep clean; `git status` clean; RUNG name `"functional"` intact.

**Next:** parked at M1 CLAIMED gate awaiting Adversary M1 + M2 PASS in REVIEW-cf48.md. No other unblocked
cf48 work (review-only phase). Will self-poll with a fallback while the watchdog edge-pings on the
Adversary's `review(...)` commit.
184
machine-docs/STATUS-cf48.md
Normal file
@@ -0,0 +1,184 @@
# STATUS — phase cf48

**Phase:** cf48 — Opus 4.8 post-cfold coverage-loss review (independent cross-validation of cf55)
**Builder:** autonomic-bot
**Model:** `claude-opus-4-8` (claude backend) — matches phase Model Requirement
**Updated:** 2026-06-13T05:30Z

---

## Gate: M1 — CLAIMED, awaiting Adversary

WHAT:
- Independent Opus 4.8 cold review of the cfold custom-folder collapse, covering all 7 required
categories across all 20 enrolled recipes, plus a cf55-vs-cf48 agreement note.
- Implementation commit under review: `44e0242` (`feat(cfold): canonicalize custom test layout`).
Parent (pre-cfold baseline tree): `44e0242^` = `87928a9`. Current HEAD: `42413b6` (no test-tree drift since cfold).
- Verdict: **NO COVERAGE LOST** — cfold preserved the full pre-cfold custom-test set.

HOW (Adversary can re-run each from a fresh clone of origin/main):
1. Canonical custom test count: `git ls-files "tests/*/custom/test_*.py" | wc -l`
2. Stale old-folder test files: `git ls-files "tests/*/functional/*" "tests/*/playwright/*" | grep test_ | wc -l`
3. Lifecycle overlays leaked into custom/: `git ls-files "tests/*/custom/test_install.py" "tests/*/custom/test_upgrade.py" "tests/*/custom/test_backup.py" "tests/*/custom/test_restore.py" | wc -l`
4. Lifecycle overlays still at top-level: `git ls-files "tests/*/test_install.py" "tests/*/test_upgrade.py" "tests/*/test_backup.py" "tests/*/test_restore.py" | wc -l`
5. Per-recipe count vs baseline:
`for r in bluesky-pds cryptpad custom-html custom-html-tiny discourse drone ghost hedgedoc immich keycloak lasuite-docs lasuite-drive lasuite-meet mailu matrix-synapse mattermost-lts mumble n8n plausible uptime-kuma; do printf "%s %s\n" "$r" "$(git ls-files "tests/$r/custom/test_*.py" | wc -l)"; done`
6. CARDINAL coverage diff — pre-cfold `(recipe, filename)` set vs post-cfold, must be identical:
```
git ls-tree -r --name-only 44e0242^ | grep -E '^tests/[^/]+/(functional|playwright)/test_.*\.py$' | sed -E 's#tests/([^/]+)/(functional|playwright)/(test_.*)#\1/\3#' | sort > /tmp/pre.txt
git ls-files "tests/*/custom/test_*.py" | sed -E 's#tests/([^/]+)/custom/(test_.*)#\1/\2#' | sort > /tmp/head.txt
diff /tmp/pre.txt /tmp/head.txt
```
7. Content-change audit (only non-100%-rename files): `git show 44e0242 --find-renames=40% --stat` — every test file with a non-zero diff is docstring/comment or sys.path-redirect only; assertion bodies untouched.
8. Whole-repo stale-consumer grep (nothing keys off old folder names outside discovery.py's alias handling):
`git grep -nE "['\"/](functional|playwright)/" -- ':!tests/**' ':!docs/**' ':!machine-docs/**' ':!README.md'`
and `git grep -nE "== ['\"](functional|playwright)['\"]" -- 'runner/**'`
9. Deprecated-alias live probe (custom/ + both deprecated subdirs discovered, warnings fire, deterministic order):
```
nix shell nixpkgs#python311 -c python3 -c "
import sys,os,tempfile,unittest.mock as mock
sys.path.insert(0,'runner'); from harness import discovery
with tempfile.TemporaryDirectory() as tmp:
    d=os.path.join(tmp,'tests','probe')
    for s in ('functional','playwright','custom'): os.makedirs(os.path.join(d,s))
    open(os.path.join(d,'custom','test_new.py'),'w').write('#x')
    open(os.path.join(d,'functional','test_old.py'),'w').write('#x')
    open(os.path.join(d,'playwright','test_ui.py'),'w').write('#x')
    with mock.patch.object(discovery,'cc_ci_dir',lambda r: os.path.join(tmp,'tests',r)):
        print('found:',[os.path.basename(p) for _,p in discovery.custom_tests('probe',None)])
"
```
10. Unit suite: `nix shell nixpkgs#python311Packages.pytest -c pytest tests/unit/test_discovery.py tests/unit/test_discovery_phase2.py tests/unit/test_manifest.py -q`
11. RUNG name unchanged: `grep 'functional' runner/harness/level.py`
12. Clean tree: `git status --short`

EXPECTED:
1. `64`
2. `0`
3. `0`
4. `64`
5. matches baseline table below exactly
6. empty diff (`IDENTICAL SET`) — no file added/removed, only folder path changed
7. only these files have content changes, all non-semantic: discovery.py (+alias handling), manifest.py (sub→"custom"), unit tests (folder-name fixtures + 1 ADDED test), custom-html test_browser_smoke.py (docstring), keycloak ×2 (comment), lasuite-drive/-meet oidc (docstring SOURCE comment), mailu ops/test_backup/test_restore (sys.path functional→custom redirect to moved `_mailu.py`), drone/ghost/lasuite-docs/lasuite-drive recipe_meta+install_steps (comments)
8. only `runner/harness/discovery.py` (docstring + intentional alias lines); manifest.py grep empty (no branch on folder name as value)
9. `found: ['test_new.py', 'test_old.py', 'test_ui.py']` + 2 `WARNING [cfold]` lines for functional/ and playwright/
10. `18 passed`
11. `RUNGS = ("install", "upgrade", "backup_restore", "functional", "lint")` — folder rename did NOT touch the L4 RUNG name
12. clean (nothing to commit)

WHERE:
- Implementation commit: `44e0242`; pre-cfold tree: `44e0242^`; HEAD: `42413b6`
- Discovery + alias warnings: `runner/harness/discovery.py:106` (`subdirs = ("custom","functional","playwright")`, warning at the `sub != "custom"` branch)
- Canonical manifest counts: `runner/harness/manifest.py:55` (`sub = "custom"`)
- Migrated custom tests/helpers: `tests/*/custom/`
- Lifecycle overlays (must stay top-level): `tests/*/test_{install,upgrade,backup,restore}.py`
- RUNG names: `runner/harness/level.py`
- Unit coverage: `tests/unit/test_discovery.py`, `test_discovery_phase2.py`, `test_manifest.py`
- cfold full-sweep evidence: `REVIEW-cfold.md` 2026-06-13T04:11:00Z (all 20 recipes L5, custom counts match, `live_pr_apps=0`)

---

## Baseline (pre-cfold) custom test count per recipe

| Recipe | Pre-cfold | Post-cfold (HEAD) | Match |
|---|---:|---:|---|
| bluesky-pds | 4 | 4 | ✓ |
| cryptpad | 4 | 4 | ✓ |
| custom-html | 4 | 4 | ✓ |
| custom-html-tiny | 1 | 1 | ✓ |
| discourse | 3 | 3 | ✓ |
| drone | 1 | 1 | ✓ |
| ghost | 4 | 4 | ✓ |
| hedgedoc | 2 | 2 | ✓ |
| immich | 3 | 3 | ✓ |
| keycloak | 3 | 3 | ✓ |
| lasuite-docs | 5 | 5 | ✓ |
| lasuite-drive | 3 | 3 | ✓ |
| lasuite-meet | 3 | 3 | ✓ |
| mailu | 3 | 3 | ✓ |
| matrix-synapse | 3 | 3 | ✓ |
| mattermost-lts | 3 | 3 | ✓ |
| mumble | 5 | 5 | ✓ |
| n8n | 4 | 4 | ✓ |
| plausible | 2 | 2 | ✓ |
| uptime-kuma | 4 | 4 | ✓ |
| **TOTAL** | **64** | **64** | **MATCH** |

Cardinal coverage diff (cmd 6): the full `(recipe, filename)` SET is byte-identical pre vs post — every
one of the 64 files maps 1:1, only the parent folder changed `functional/`|`playwright/` → `custom/`.

---

## Review Matrix — Opus 4.8 independent verdict

**1. Diff review** (`44e0242`, 110 files, +306/-241): PASS.
- The 64 test files are 100% pure renames except 5 with trivial content diffs, all non-semantic:
custom-html `test_browser_smoke.py` (docstring: plan §4.1 ref → cfold layout), keycloak
`test_create_client_and_use.py` + `test_password_grant_token.py` (comment line only; **sys.path lines
UNCHANGED** — functional/ and custom/ are equal depth), lasuite-drive + lasuite-meet
`test_oidc_with_keycloak.py` (docstring SOURCE comment). No assertion, wait, or skip touched.
- Code: `discovery.py` adds `"custom"` as the first (canonical) subdir and emits a loud
`WARNING [cfold]` on stderr for any test still found under `functional/`/`playwright/` — all three
still discovered, nothing dropped. `manifest.py` normalizes the reported `sub` key to `"custom"`.
- Helper/lifecycle import fixups: mailu `ops.py`/`test_backup.py`/`test_restore.py` redirect
`sys.path.insert(... "functional")` → `"custom"` to follow the moved `_mailu.py` helper (helper is in
the rename list). drone/ghost/lasuite-docs/lasuite-drive `recipe_meta.py`/`install_steps.sh` are
comment-only. All mechanical.

**2. Discovery parity**: PASS. 64 canonical custom tests; 0 in `functional/`/`playwright/`; per-recipe
counts match the baseline exactly; cardinal `(recipe, filename)` set identical pre vs post (cmd 6 empty diff).

**3. Assertion preservation**: PASS. No assertion removed/weakened, no test skipped, no wait relaxed, no
test renamed without equivalent coverage. The only content changes are docstring/comment text and a
forced `sys.path` redirect (mailu). One unit test was renamed
(`..._functional_playwright_only` → `..._custom_only`) keeping the same structural assertions, and a NEW
unit test (`test_custom_tests_prefers_custom_and_warns_on_deprecated_aliases`) ADDS coverage.

**4. Old-folder behavior**: PASS — matches cfold's documented decision (deprecated-alias + loud warning).
`functional/`/`playwright/` remain in the `subdirs` tuple, still discovered, with a per-file
`WARNING [cfold]: test found in deprecated folder ...` to stderr. Live probe confirms: all three subdirs
return their tests and the two deprecated ones warn. No silent coverage loss path for recipe-local tests.
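
The alias behavior described above can be sketched as follows. This is a hypothetical stand-in, not the contents of `runner/harness/discovery.py`: the function name `discover_custom_tests`, the exact warning wording after the `WARNING [cfold]` prefix, and the lifecycle-name filter are assumptions made for illustration.

```python
import os
import sys
import tempfile

SUBDIRS = ("custom", "functional", "playwright")  # canonical first, then deprecated aliases
LIFECYCLE = {"test_install.py", "test_upgrade.py", "test_backup.py", "test_restore.py"}


def discover_custom_tests(recipe_dir):
    """Hypothetical discovery: return (subdir, path) pairs, warning on deprecated folders."""
    found = []
    for sub in SUBDIRS:
        folder = os.path.join(recipe_dir, sub)
        if not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            if not (name.startswith("test_") and name.endswith(".py")):
                continue
            if name in LIFECYCLE:
                continue  # assumed: lifecycle overlays belong at the recipe top level
            path = os.path.join(folder, name)
            if sub != "custom":
                # Assumed message shape; only the "WARNING [cfold]" prefix comes from the review.
                print(f"WARNING [cfold]: test found in deprecated folder {sub}/: {path}",
                      file=sys.stderr)
            found.append((sub, path))
    return found


with tempfile.TemporaryDirectory() as tmp:
    layout = (("custom", "test_new.py"), ("functional", "test_old.py"), ("playwright", "test_ui.py"))
    for sub, name in layout:
        os.makedirs(os.path.join(tmp, sub))
        with open(os.path.join(tmp, sub, name), "w") as fh:
            fh.write("# placeholder\n")
    names = [os.path.basename(p) for _, p in discover_custom_tests(tmp)]
    print("found:", names)  # found: ['test_new.py', 'test_old.py', 'test_ui.py']
```

Iterating a fixed tuple and sorting each folder's listing keeps the order deterministic, and a deprecated folder costs a warning rather than a dropped test.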

**5. Lifecycle-overlay separation**: PASS. 0 lifecycle files (`test_{install,upgrade,backup,restore}.py`)
under any `custom/`; 64 lifecycle overlays remain at `tests/<recipe>/` top-level. discovery still excludes
lifecycle names inside subdirs (defensive). The L4 RUNG name `"functional"` in `level.py` is unchanged —
only the *folder* was renamed, not the tier/rung.

**6. Evidence audit**: PASS. cfold M2 (REVIEW-cfold.md 2026-06-13T04:11:00Z) cold-verified a full real-CI
`!testme` sweep: all 20 enrolled recipes green at **level 5/5** with custom-junit counts matching baseline
(ghost 4/4, lasuite-docs 5/5, mumble 5/5, … every recipe = its baseline count), ghost upgrade junit=2,
and `live_pr_apps=0` (zero leaked stacks). No silent level drop; no skipped custom tier.

**7. Cleanliness**: PASS. `git status` clean; no stray root coordination files; no leaked test stacks
(live_pr_apps=0); no stale temp scripts or uncommitted implementation files; `machine-docs/` holds only
phase-namespaced state.

---

## cf55-vs-cf48 agreement note

**Agreement: FULL.** Both reviews independently reach **NO COVERAGE LOST** and PASS on all 7 categories.
The two cross-validating models were **cf55 = claude-sonnet-4-6** (plan named GPT-5.5, but prior GPT-5.x
loops stopped on a launcher model-mismatch and the orchestrator relaunched cf55 on Claude Sonnet 4.6 —
recorded in STATUS-cf55.md / REVIEW-cf55.md) and **cf48 = claude-opus-4-8**. So the actual cross-check is
Sonnet 4.6 vs Opus 4.8 (both Claude), not GPT vs Claude — noted honestly; it still gives two independent
models over the same commit.

One **discrepancy** worth surfacing (per phase instruction to note where the two reviews differ):
- cf55's diff-review narrative states the keycloak custom tests had a `sys.path.insert` *depth* adjusted
`../..` → `../../..`. The actual `44e0242` diff shows the keycloak `sys.path` lines are **UNCHANGED** —
only the adjacent comment was edited. (No adjustment was needed: `functional/` and `custom/` sit at the
same depth under `tests/keycloak/`.) This is a cf55 narrative inaccuracy, not a coverage defect — both
reviews still correctly conclude the keycloak tests are intact. cf48 catches it; cf55 missed it.

No category where cf48 found a regression that cf55 cleared, or vice-versa. No blocking findings on either side.

---

## Final Verdict

**NO COVERAGE LOST.** cfold (`44e0242`) preserved the complete pre-cfold custom-test set: all 64 tests
relocated 1:1 from `functional/`/`playwright/` into canonical `custom/`, identical `(recipe, filename)`
set, per-recipe counts unchanged, zero assertions weakened, deprecated aliases retained with loud
warnings, lifecycle overlays untouched at top-level, RUNG name preserved, and a full real-CI sweep green
at L5 across all 20 recipes with zero leaks. Awaiting Adversary M1 + M2 PASS in REVIEW-cf48.md.