Independent cross-validation of cfold 44e0242. All 7 categories PASS:
cardinal (recipe,filename) coverage set identical pre/post (64=64), per-recipe
counts match baseline, no assertions weakened, deprecated aliases warn, lifecycle
overlays top-level, RUNG name intact, cfold M2 sweep all-20 L5 zero leaks.
cf55(sonnet-4.6) vs cf48(opus-4.8) FULL agreement; cf48 also caught a cf55
narrative slip (keycloak sys.path unchanged, not depth-adjusted).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
STATUS — phase cf48
Phase: cf48 — Opus 4.8 post-cfold coverage-loss review (independent cross-validation of cf55)
Builder: autonomic-bot
Model: claude-opus-4-8 (claude backend) — matches phase Model Requirement
Updated: 2026-06-13T05:30Z
Gate: M1 — CLAIMED, awaiting Adversary
WHAT:
- Independent Opus 4.8 cold review of the cfold custom-folder collapse, covering all 7 required categories across all 20 enrolled recipes, plus a cf55-vs-cf48 agreement note.
- Implementation commit under review: `44e0242` (feat(cfold): canonicalize custom test layout). Parent (pre-cfold baseline tree): `44e0242^` = `87928a9`. Current HEAD: `42413b6` (no test-tree drift since cfold).
- Verdict: NO COVERAGE LOST. cfold preserved the full pre-cfold custom-test set.
HOW (Adversary can re-run each from a fresh clone of origin/main):
- Canonical custom test count: `git ls-files "tests/*/custom/test_*.py" | wc -l`
- Stale old-folder test files: `git ls-files "tests/*/functional/*" "tests/*/playwright/*" | grep test_ | wc -l`
- Lifecycle overlays leaked into custom/: `git ls-files "tests/*/custom/test_install.py" "tests/*/custom/test_upgrade.py" "tests/*/custom/test_backup.py" "tests/*/custom/test_restore.py" | wc -l`
- Lifecycle overlays still at top-level: `git ls-files "tests/*/test_install.py" "tests/*/test_upgrade.py" "tests/*/test_backup.py" "tests/*/test_restore.py" | wc -l`
- Per-recipe count vs baseline: `for r in bluesky-pds cryptpad custom-html custom-html-tiny discourse drone ghost hedgedoc immich keycloak lasuite-docs lasuite-drive lasuite-meet mailu matrix-synapse mattermost-lts mumble n8n plausible uptime-kuma; do printf "%s %s\n" "$r" "$(git ls-files "tests/$r/custom/test_*.py" | wc -l)"; done`
- CARDINAL coverage diff: the pre-cfold (recipe, filename) set vs post-cfold, must be identical:

      git ls-tree -r --name-only 44e0242^ | grep -E '^tests/[^/]+/(functional|playwright)/test_.*\.py$' | sed -E 's#tests/([^/]+)/(functional|playwright)/(test_.*)#\1/\3#' | sort > /tmp/pre.txt
      git ls-files "tests/*/custom/test_*.py" | sed -E 's#tests/([^/]+)/custom/(test_.*)#\1/\2#' | sort > /tmp/head.txt
      diff /tmp/pre.txt /tmp/head.txt

- Content-change audit (only non-100%-rename files): `git show 44e0242 --find-renames=40% --stat`. Every test file with a non-zero diff is docstring/comment or sys.path-redirect only; assertion bodies untouched.
- Whole-repo stale-consumer grep (nothing keys off old folder names outside discovery.py's alias handling): `git grep -nE "['\"/](functional|playwright)/" -- ':!tests/**' ':!docs/**' ':!machine-docs/**' ':!README.md'` and `git grep -nE "== ['\"](functional|playwright)['\"]" -- 'runner/**'`
- Deprecated-alias live probe (custom/ + both deprecated subdirs discovered, warnings fire, deterministic order):

      nix shell nixpkgs#python311 -c python3 -c "
      import sys,os,tempfile,unittest.mock as mock
      sys.path.insert(0,'runner'); from harness import discovery
      with tempfile.TemporaryDirectory() as tmp:
          d=os.path.join(tmp,'tests','probe')
          for s in ('functional','playwright','custom'): os.makedirs(os.path.join(d,s))
          open(os.path.join(d,'custom','test_new.py'),'w').write('#x')
          open(os.path.join(d,'functional','test_old.py'),'w').write('#x')
          open(os.path.join(d,'playwright','test_ui.py'),'w').write('#x')
          with mock.patch.object(discovery,'cc_ci_dir',lambda r: os.path.join(tmp,'tests',r)):
              print('found:',[os.path.basename(p) for _,p in discovery.custom_tests('probe',None)])
      "

- Unit suite: `nix shell nixpkgs#python311Packages.pytest -c pytest tests/unit/test_discovery.py tests/unit/test_discovery_phase2.py tests/unit/test_manifest.py -q`
- RUNG name unchanged: `grep 'functional' runner/harness/level.py`
- Clean tree: `git status --short`
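The cardinal coverage diff above reduces every test path to a (recipe, filename) pair and compares the two collections. A minimal Python sketch of that idea follows, assuming the path lists have already been obtained (for example from `git ls-tree` and `git ls-files`). The helper names `coverage_pairs` and `coverage_diff` and the sample paths are hypothetical and not part of the repository; the two regexes mirror the grep/sed patterns in the check.

```python
import re

# Patterns mirror the grep/sed expressions used in the cardinal coverage check.
OLD_LAYOUT = re.compile(r"^tests/([^/]+)/(?:functional|playwright)/(test_.*\.py)$")
NEW_LAYOUT = re.compile(r"^tests/([^/]+)/custom/(test_.*\.py)$")


def coverage_pairs(paths, pattern):
    """Return the set of (recipe, filename) pairs for paths matching pattern."""
    pairs = set()
    for path in paths:
        match = pattern.match(path)
        if match:
            pairs.add((match.group(1), match.group(2)))
    return pairs


def coverage_diff(pre_paths, post_paths):
    """Return (lost, added): pairs only in the old layout, pairs only in the new one."""
    pre = coverage_pairs(pre_paths, OLD_LAYOUT)
    post = coverage_pairs(post_paths, NEW_LAYOUT)
    return sorted(pre - post), sorted(post - pre)


# Illustrative paths only (hypothetical filenames, not taken from the repository).
pre_paths = [
    "tests/ghost/functional/test_a.py",
    "tests/ghost/playwright/test_ui.py",
    "tests/ghost/test_install.py",  # lifecycle overlay, ignored by both patterns
]
post_paths = [
    "tests/ghost/custom/test_a.py",
    "tests/ghost/custom/test_ui.py",
    "tests/ghost/test_install.py",
]

lost, added = coverage_diff(pre_paths, post_paths)
print(lost, added)  # [] []
```

One difference from the shell version: a set collapses duplicates, so a recipe that had the same filename under both `functional/` and `playwright/` would show up in the sorted-file `diff` but not here. Comparing sorted lists instead of sets would preserve that property.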
EXPECTED:
- Canonical custom test count: 64
- Stale old-folder test files: 0
- Lifecycle overlays leaked into custom/: 0
- Lifecycle overlays still at top-level: 64
- Per-recipe count: matches baseline table below exactly
- Cardinal coverage diff: empty diff (IDENTICAL SET); no file added/removed, only folder path changed
- Content-change audit: only these files have content changes, all non-semantic: discovery.py (+alias handling), manifest.py (sub→"custom"), unit tests (folder-name fixtures + 1 ADDED test), custom-html test_browser_smoke.py (docstring), keycloak ×2 (comment), lasuite-drive/-meet oidc (docstring SOURCE comment), mailu ops/test_backup/test_restore (sys.path functional→custom redirect to moved `_mailu.py`), drone/ghost/lasuite-docs/lasuite-drive recipe_meta+install_steps (comments)
- Stale-consumer grep: only `runner/harness/discovery.py` (docstring + intentional alias lines); manifest.py grep empty (no branch on folder name as value)
- Deprecated-alias probe: `found: ['test_new.py', 'test_old.py', 'test_ui.py']` + 2 `WARNING [cfold]` lines for functional/ and playwright/
- Unit suite: `18 passed`
- RUNG name: `RUNGS = ("install", "upgrade", "backup_restore", "functional", "lint")`; folder rename did NOT touch the L4 RUNG name
- Clean tree: clean (nothing to commit)
WHERE:
- Implementation commit: `44e0242`; pre-cfold tree: `44e0242^`; HEAD: `42413b6`
- Discovery + alias warnings: `runner/harness/discovery.py:106` (`subdirs = ("custom","functional","playwright")`, warning at the `sub != "custom"` branch)
- Canonical manifest counts: `runner/harness/manifest.py:55` (`sub = "custom"`)
- Migrated custom tests/helpers: `tests/*/custom/`
- Lifecycle overlays (must stay top-level): `tests/*/test_{install,upgrade,backup,restore}.py`
- RUNG names: `runner/harness/level.py`
- Unit coverage: `tests/unit/test_discovery.py`, `test_discovery_phase2.py`, `test_manifest.py`
- cfold full-sweep evidence: `REVIEW-cfold.md` 2026-06-13T04:11:00Z (all 20 recipes L5, custom counts match, `live_pr_apps=0`)
Baseline (pre-cfold) custom test count per recipe
| Recipe | Pre-cfold | Post-cfold (HEAD) | Match |
|---|---|---|---|
| bluesky-pds | 4 | 4 | ✓ |
| cryptpad | 4 | 4 | ✓ |
| custom-html | 4 | 4 | ✓ |
| custom-html-tiny | 1 | 1 | ✓ |
| discourse | 3 | 3 | ✓ |
| drone | 1 | 1 | ✓ |
| ghost | 4 | 4 | ✓ |
| hedgedoc | 2 | 2 | ✓ |
| immich | 3 | 3 | ✓ |
| keycloak | 3 | 3 | ✓ |
| lasuite-docs | 5 | 5 | ✓ |
| lasuite-drive | 3 | 3 | ✓ |
| lasuite-meet | 3 | 3 | ✓ |
| mailu | 3 | 3 | ✓ |
| matrix-synapse | 3 | 3 | ✓ |
| mattermost-lts | 3 | 3 | ✓ |
| mumble | 5 | 5 | ✓ |
| n8n | 4 | 4 | ✓ |
| plausible | 2 | 2 | ✓ |
| uptime-kuma | 4 | 4 | ✓ |
| TOTAL | 64 | 64 | MATCH |
Cardinal coverage diff (cmd 6): the full (recipe, filename) SET is identical pre vs post. Every
one of the 64 files maps 1:1; only the parent folder changed (functional/ or playwright/ → custom/).
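The per-recipe comparison in the table can also be checked mechanically. The sketch below counts `tests/<recipe>/custom/test_*.py` paths per recipe and reports any recipe whose count differs from a baseline mapping. The function names and the sample filenames are hypothetical; only the two baseline counts shown (drone 1, hedgedoc 2) come from the table.

```python
from collections import Counter


def per_recipe_counts(paths):
    """Count tests/<recipe>/custom/test_*.py files per recipe."""
    counts = Counter()
    for path in paths:
        parts = path.split("/")
        if (len(parts) == 4 and parts[0] == "tests" and parts[2] == "custom"
                and parts[3].startswith("test_") and parts[3].endswith(".py")):
            counts[parts[1]] += 1
    return counts


def count_mismatches(baseline, paths):
    """Return {recipe: (expected, actual)} for every recipe whose count differs."""
    counts = per_recipe_counts(paths)
    recipes = sorted(set(baseline) | set(counts))
    return {
        r: (baseline.get(r, 0), counts.get(r, 0))
        for r in recipes
        if baseline.get(r, 0) != counts.get(r, 0)
    }


# Two rows of the baseline table; the filenames below are placeholders.
baseline = {"drone": 1, "hedgedoc": 2}
paths = [
    "tests/drone/custom/test_a.py",
    "tests/hedgedoc/custom/test_a.py",
    "tests/hedgedoc/custom/test_b.py",
    "tests/hedgedoc/test_install.py",  # top-level lifecycle overlay, not counted
]

print(count_mismatches(baseline, paths))      # {}
print(count_mismatches(baseline, paths[:2]))  # {'hedgedoc': (2, 1)}
```

Matching counts alone would not detect one test being swapped for another, which is why the review also relies on the (recipe, filename) set comparison.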
Review Matrix — Opus 4.8 independent verdict
1. Diff review (44e0242, 110 files, +306/-241): PASS.
   - The 64 test files are 100% pure renames except 5 with trivial content diffs, all non-semantic: custom-html `test_browser_smoke.py` (docstring: plan §4.1 ref → cfold layout), keycloak `test_create_client_and_use.py` + `test_password_grant_token.py` (comment line only; sys.path lines UNCHANGED, since functional/ and custom/ are equal depth), lasuite-drive + lasuite-meet `test_oidc_with_keycloak.py` (docstring SOURCE comment). No assertion, wait, or skip touched.
   - Code: `discovery.py` adds `"custom"` as the first (canonical) subdir and emits a loud `WARNING [cfold]` on stderr for any test still found under `functional/` or `playwright/`; all three are still discovered, nothing dropped. `manifest.py` normalizes the reported `sub` key to `"custom"`.
   - Helper/lifecycle import fixups: mailu `ops.py` / `test_backup.py` / `test_restore.py` redirect `sys.path.insert(... "functional")` → `"custom"` to follow the moved `_mailu.py` helper (the helper is in the rename list). drone/ghost/lasuite-docs/lasuite-drive `recipe_meta.py` / `install_steps.sh` are comment-only. All mechanical.
2. Discovery parity: PASS. 64 canonical custom tests; 0 in functional/ or playwright/; per-recipe
counts match the baseline exactly; cardinal (recipe, filename) set identical pre vs post (cmd 6 empty diff).
3. Assertion preservation: PASS. No assertion removed/weakened, no test skipped, no wait relaxed, no
test renamed without equivalent coverage. The only content changes are docstring/comment text and a
forced sys.path redirect (mailu). One unit test was renamed
(..._functional_playwright_only → ..._custom_only) keeping the same structural assertions, and a NEW
unit test (test_custom_tests_prefers_custom_and_warns_on_deprecated_aliases) ADDS coverage.
4. Old-folder behavior: PASS — matches cfold's documented decision (deprecated-alias + loud warning).
functional/ and playwright/ remain in the subdirs tuple, still discovered, with a per-file
WARNING [cfold]: test found in deprecated folder ... to stderr. Live probe confirms: all three subdirs
return their tests and the two deprecated ones warn. No silent coverage-loss path for recipe-local tests.
5. Lifecycle-overlay separation: PASS. 0 lifecycle files (test_{install,upgrade,backup,restore}.py)
under any custom/; 64 lifecycle overlays remain at tests/<recipe>/ top-level. discovery still excludes
lifecycle names inside subdirs (defensive). The L4 RUNG name "functional" in level.py is unchanged —
only the folder was renamed, not the tier/rung.
6. Evidence audit: PASS. cfold M2 (REVIEW-cfold.md 2026-06-13T04:11:00Z) cold-verified a full real-CI
!testme sweep: all 20 enrolled recipes green at level 5/5 with custom-junit counts matching baseline
(ghost 4/4, lasuite-docs 5/5, mumble 5/5, … every recipe = its baseline count), ghost upgrade junit=2,
and live_pr_apps=0 (zero leaked stacks). No silent level drop; no skipped custom tier.
7. Cleanliness: PASS. git status clean; no stray root coordination files; no leaked test stacks
(live_pr_apps=0); no stale temp scripts or uncommitted implementation files; machine-docs/ holds only
phase-namespaced state.
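The behaviour reviewed in categories 4 and 5 (canonical folder first, deprecated folders still discovered with a warning, lifecycle names excluded inside subdirs) can be illustrated with a small stand-alone sketch. This is not the code in `runner/harness/discovery.py`: the function name, its signature, the returned tuple shape, and the tail of the warning message are assumptions made for illustration. Only the subdir tuple, the lifecycle filenames, and the `WARNING [cfold]: test found in deprecated folder` prefix are taken from the text above.

```python
import io
import os
import sys
import tempfile

SUBDIRS = ("custom", "functional", "playwright")  # canonical folder first
LIFECYCLE_NAMES = {"test_install.py", "test_upgrade.py",
                   "test_backup.py", "test_restore.py"}


def discover_custom_tests(recipe_dir, log=None):
    """Return [(subdir, path)] in deterministic order, warning on deprecated folders."""
    log = log if log is not None else sys.stderr
    found = []
    for sub in SUBDIRS:
        folder = os.path.join(recipe_dir, sub)
        if not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            if not (name.startswith("test_") and name.endswith(".py")):
                continue
            if name in LIFECYCLE_NAMES:
                continue  # lifecycle overlays belong at the recipe top level
            if sub != "custom":
                # Text after the prefix is an assumed format, not the real message.
                print(f"WARNING [cfold]: test found in deprecated folder {sub}/: {name}",
                      file=log)
            found.append((sub, os.path.join(folder, name)))
    return found


def demo():
    """Build a throwaway recipe tree and return (discovered names, warning count)."""
    with tempfile.TemporaryDirectory() as tmp:
        layout = (("custom", "test_new.py"), ("functional", "test_old.py"),
                  ("playwright", "test_ui.py"), ("custom", "test_install.py"))
        for sub, name in layout:
            os.makedirs(os.path.join(tmp, sub), exist_ok=True)
            with open(os.path.join(tmp, sub, name), "w") as fh:
                fh.write("# placeholder\n")
        log = io.StringIO()
        names = [os.path.basename(p) for _, p in discover_custom_tests(tmp, log)]
        return names, log.getvalue().count("WARNING [cfold]")


print(demo())  # (['test_new.py', 'test_old.py', 'test_ui.py'], 2)
```

Keeping the deprecated folders in the scan and warning per file, rather than dropping them, is what closes the silent coverage-loss path: a test left behind in an old folder still runs and is visible in the log.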
cf55-vs-cf48 agreement note
Agreement: FULL. Both reviews independently reach NO COVERAGE LOST and PASS on all 7 categories. The two cross-validating models were cf55 = claude-sonnet-4-6 and cf48 = claude-opus-4-8. The plan named GPT-5.5 for cf55, but prior GPT-5.x loops stopped on a launcher model-mismatch and the orchestrator relaunched cf55 on Claude Sonnet 4.6 (recorded in STATUS-cf55.md / REVIEW-cf55.md). The actual cross-check is therefore Sonnet 4.6 vs Opus 4.8 (both Claude), not GPT vs Claude; it still provides two independent models reviewing the same commit.
One discrepancy worth surfacing (per phase instruction to note where the two reviews differ):
- cf55's diff-review narrative states the keycloak custom tests had a `sys.path.insert` depth adjusted `../..` → `../../..`. The actual `44e0242` diff shows the keycloak `sys.path` lines are UNCHANGED; only the adjacent comment was edited. (No adjustment was needed: `functional/` and `custom/` sit at the same depth under `tests/keycloak/`.) This is a cf55 narrative inaccuracy, not a coverage defect: both reviews still correctly conclude the keycloak tests are intact. cf48 catches it; cf55 missed it.
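The depth argument can be checked with plain path arithmetic. The sketch below resolves the same relative segment from a test file under `functional/` and under `custom/`. The `../..` value is taken from cf55's narrative as an illustration only; the exact expression used in the keycloak tests is not reproduced here, and `resolve_from` is a hypothetical helper.

```python
import posixpath


def resolve_from(test_file, relative):
    """Resolve a relative segment against the directory containing test_file."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(test_file), relative))


old = resolve_from("tests/keycloak/functional/test_password_grant_token.py", "../..")
new = resolve_from("tests/keycloak/custom/test_password_grant_token.py", "../..")
print(old, new)  # tests tests
```

Because both folders are direct children of `tests/keycloak/`, an unchanged relative segment lands on the same directory after the move, so no depth adjustment is required.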
No category where cf48 found a regression that cf55 cleared, or vice-versa. No blocking findings on either side.
Final Verdict
NO COVERAGE LOST. cfold (44e0242) preserved the complete pre-cfold custom-test set: all 64 tests
relocated 1:1 from functional/ and playwright/ into canonical custom/, identical (recipe, filename)
set, per-recipe counts unchanged, zero assertions weakened, deprecated aliases retained with loud
warnings, lifecycle overlays untouched at top-level, RUNG name preserved, and a full real-CI sweep green
at L5 across all 20 recipes with zero leaks. Awaiting Adversary M1 + M2 PASS in REVIEW-cf48.md.