cc-ci/machine-docs/STATUS-cf48.md
autonomic-bot 580c250497
Some checks failed
continuous-integration/drone/push Build is failing
claim(cf48): Opus 4.8 cold review matrix complete — NO COVERAGE LOST
Independent cross-validation of cfold 44e0242. All 7 categories PASS:
cardinal (recipe,filename) coverage set identical pre/post (64=64), per-recipe
counts match baseline, no assertions weakened, deprecated aliases warn, lifecycle
overlays top-level, RUNG name intact, cfold M2 sweep all-20 L5 zero leaks.
cf55(sonnet-4.6) vs cf48(opus-4.8) FULL agreement; cf48 also caught a cf55
narrative slip (keycloak sys.path unchanged, not depth-adjusted).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 05:24:46 +00:00

# STATUS — phase cf48
**Phase:** cf48 — Opus 4.8 post-cfold coverage-loss review (independent cross-validation of cf55)
**Builder:** autonomic-bot
**Model:** `claude-opus-4-8` (claude backend) — matches phase Model Requirement
**Updated:** 2026-06-13T05:30Z

---
## Gate: M1 — CLAIMED, awaiting Adversary
WHAT:
- Independent Opus 4.8 cold review of the cfold custom-folder collapse, covering all 7 required
categories across all 20 enrolled recipes, plus a cf55-vs-cf48 agreement note.
- Implementation commit under review: `44e0242` (`feat(cfold): canonicalize custom test layout`).
Parent (pre-cfold baseline tree): `44e0242^` = `87928a9`. Current HEAD: `42413b6` (no test-tree drift since cfold).
- Verdict: **NO COVERAGE LOST** — cfold preserved the full pre-cfold custom-test set.

HOW (Adversary can re-run each from a fresh clone of origin/main):
1. Canonical custom test count: `git ls-files "tests/*/custom/test_*.py" | wc -l`
2. Stale old-folder test files: `git ls-files "tests/*/functional/*" "tests/*/playwright/*" | grep test_ | wc -l`
3. Lifecycle overlays leaked into custom/: `git ls-files "tests/*/custom/test_install.py" "tests/*/custom/test_upgrade.py" "tests/*/custom/test_backup.py" "tests/*/custom/test_restore.py" | wc -l`
4. Lifecycle overlays still at top-level: `git ls-files "tests/*/test_install.py" "tests/*/test_upgrade.py" "tests/*/test_backup.py" "tests/*/test_restore.py" | wc -l`
5. Per-recipe count vs baseline:
`for r in bluesky-pds cryptpad custom-html custom-html-tiny discourse drone ghost hedgedoc immich keycloak lasuite-docs lasuite-drive lasuite-meet mailu matrix-synapse mattermost-lts mumble n8n plausible uptime-kuma; do printf "%s %s\n" "$r" "$(git ls-files "tests/$r/custom/test_*.py" | wc -l)"; done`
6. CARDINAL coverage diff — pre-cfold `(recipe, filename)` set vs post-cfold, must be identical:
```
git ls-tree -r --name-only 44e0242^ | grep -E '^tests/[^/]+/(functional|playwright)/test_.*\.py$' | sed -E 's#tests/([^/]+)/(functional|playwright)/(test_.*)#\1/\3#' | sort > /tmp/pre.txt
git ls-files "tests/*/custom/test_*.py" | sed -E 's#tests/([^/]+)/custom/(test_.*)#\1/\2#' | sort > /tmp/head.txt
diff /tmp/pre.txt /tmp/head.txt
```
7. Content-change audit (only non-100%-rename files): `git show 44e0242 --find-renames=40% --stat` — every test file with a non-zero diff is docstring/comment or sys.path-redirect only; assertion bodies untouched.
8. Whole-repo stale-consumer grep (nothing keys off old folder names outside discovery.py's alias handling):
`git grep -nE "['\"/](functional|playwright)/" -- ':!tests/**' ':!docs/**' ':!machine-docs/**' ':!README.md'`
and `git grep -nE "== ['\"](functional|playwright)['\"]" -- 'runner/**'`
9. Deprecated-alias live probe (custom/ + both deprecated subdirs discovered, warnings fire, deterministic order):
```
nix shell nixpkgs#python311 -c python3 -c "
import sys,os,tempfile,unittest.mock as mock
sys.path.insert(0,'runner'); from harness import discovery
with tempfile.TemporaryDirectory() as tmp:
    d=os.path.join(tmp,'tests','probe')
    for s in ('functional','playwright','custom'): os.makedirs(os.path.join(d,s))
    open(os.path.join(d,'custom','test_new.py'),'w').write('#x')
    open(os.path.join(d,'functional','test_old.py'),'w').write('#x')
    open(os.path.join(d,'playwright','test_ui.py'),'w').write('#x')
    with mock.patch.object(discovery,'cc_ci_dir',lambda r: os.path.join(tmp,'tests',r)):
        print('found:',[os.path.basename(p) for _,p in discovery.custom_tests('probe',None)])
"
```
10. Unit suite: `nix shell nixpkgs#python311Packages.pytest -c pytest tests/unit/test_discovery.py tests/unit/test_discovery_phase2.py tests/unit/test_manifest.py -q`
11. RUNG name unchanged: `grep 'functional' runner/harness/level.py`
12. Clean tree: `git status --short`

EXPECTED:
1. `64`
2. `0`
3. `0`
4. `64`
5. matches baseline table below exactly
6. empty diff (`IDENTICAL SET`) — no file added/removed, only folder path changed
7. only these files have content changes, all non-semantic: discovery.py (+alias handling), manifest.py (sub→"custom"), unit tests (folder-name fixtures + 1 ADDED test), custom-html test_browser_smoke.py (docstring), keycloak ×2 (comment), lasuite-drive/-meet oidc (docstring SOURCE comment), mailu ops/test_backup/test_restore (sys.path functional→custom redirect to moved `_mailu.py`), drone/ghost/lasuite-docs/lasuite-drive recipe_meta+install_steps (comments)
8. only `runner/harness/discovery.py` (docstring + intentional alias lines); manifest.py grep empty (no branch on folder name as value)
9. `found: ['test_new.py', 'test_old.py', 'test_ui.py']` + 2 `WARNING [cfold]` lines for functional/ and playwright/
10. `18 passed`
11. `RUNGS = ("install", "upgrade", "backup_restore", "functional", "lint")` — folder rename did NOT touch the L4 RUNG name
12. clean (nothing to commit)

WHERE:
- Implementation commit: `44e0242`; pre-cfold tree: `44e0242^`; HEAD: `42413b6`
- Discovery + alias warnings: `runner/harness/discovery.py:106` (`subdirs = ("custom","functional","playwright")`, warning at the `sub != "custom"` branch)
- Canonical manifest counts: `runner/harness/manifest.py:55` (`sub = "custom"`)
- Migrated custom tests/helpers: `tests/*/custom/`
- Lifecycle overlays (must stay top-level): `tests/*/test_{install,upgrade,backup,restore}.py`
- RUNG names: `runner/harness/level.py`
- Unit coverage: `tests/unit/test_discovery.py`, `test_discovery_phase2.py`, `test_manifest.py`
- cfold full-sweep evidence: `REVIEW-cfold.md` 2026-06-13T04:11:00Z (all 20 recipes L5, custom counts match, `live_pr_apps=0`)
---
## Baseline (pre-cfold) custom test count per recipe
| Recipe | Pre-cfold | Post-cfold (HEAD) | Match |
|---|---:|---:|---|
| bluesky-pds | 4 | 4 | ✓ |
| cryptpad | 4 | 4 | ✓ |
| custom-html | 4 | 4 | ✓ |
| custom-html-tiny | 1 | 1 | ✓ |
| discourse | 3 | 3 | ✓ |
| drone | 1 | 1 | ✓ |
| ghost | 4 | 4 | ✓ |
| hedgedoc | 2 | 2 | ✓ |
| immich | 3 | 3 | ✓ |
| keycloak | 3 | 3 | ✓ |
| lasuite-docs | 5 | 5 | ✓ |
| lasuite-drive | 3 | 3 | ✓ |
| lasuite-meet | 3 | 3 | ✓ |
| mailu | 3 | 3 | ✓ |
| matrix-synapse | 3 | 3 | ✓ |
| mattermost-lts | 3 | 3 | ✓ |
| mumble | 5 | 5 | ✓ |
| n8n | 4 | 4 | ✓ |
| plausible | 2 | 2 | ✓ |
| uptime-kuma | 4 | 4 | ✓ |
| **TOTAL** | **64** | **64** | **MATCH** |

Cardinal coverage diff (cmd 6): the sorted `(recipe, filename)` listings are identical pre vs post. Each
of the 64 files maps 1:1; only the parent folder changed, from `functional/` or `playwright/` to `custom/`.

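
The set comparison behind cmd 6 can be sketched in Python as follows. This is a minimal illustration, not part of the harness: the function names and the sample paths are hypothetical, and the regexes are a simplified reading of the `sed` expressions in cmd 6.

```python
import re
from collections import Counter

# Pre-cfold layout: tests/<recipe>/(functional|playwright)/test_*.py
PRE = re.compile(r"^tests/([^/]+)/(?:functional|playwright)/(test_[^/]*\.py)$")
# Post-cfold layout: tests/<recipe>/custom/test_*.py
POST = re.compile(r"^tests/([^/]+)/custom/(test_[^/]*\.py)$")


def coverage_set(paths, pattern):
    """Return the set of (recipe, filename) pairs for paths matching pattern."""
    pairs = set()
    for path in paths:
        match = pattern.match(path)
        if match:
            pairs.add((match.group(1), match.group(2)))
    return pairs


def coverage_diff(pre_paths, post_paths):
    """Return (lost, gained) pairs; both empty means the sets are identical."""
    pre = coverage_set(pre_paths, PRE)
    post = coverage_set(post_paths, POST)
    return sorted(pre - post), sorted(post - pre)


def per_recipe_counts(pairs):
    """Count custom tests per recipe, as in the baseline table."""
    return Counter(recipe for recipe, _ in pairs)


# Illustrative paths only, not the real repository listing.
pre_paths = [
    "tests/ghost/functional/test_a.py",
    "tests/ghost/playwright/test_b.py",
    "tests/ghost/test_install.py",  # lifecycle overlay, matched by neither pattern
]
post_paths = [
    "tests/ghost/custom/test_a.py",
    "tests/ghost/custom/test_b.py",
    "tests/ghost/test_install.py",
]
lost, gained = coverage_diff(pre_paths, post_paths)
print(lost, gained)
print(per_recipe_counts(coverage_set(post_paths, POST)))
```

One design caveat of this sketch: a set collapses duplicates, so a recipe that had the same filename under both `functional/` and `playwright/` would look unchanged here. The per-recipe counts (cmd 5) and the 64 = 64 total are the complementary check for that case.
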
---
## Review Matrix — Opus 4.8 independent verdict
**1. Diff review** (`44e0242`, 110 files, +306/-241): PASS.
- The 64 test files are 100% pure renames except 5 with trivial content diffs, all non-semantic:
custom-html `test_browser_smoke.py` (docstring: plan §4.1 ref → cfold layout), keycloak
`test_create_client_and_use.py` + `test_password_grant_token.py` (comment line only; **sys.path lines
UNCHANGED** — functional/ and custom/ are equal depth), lasuite-drive + lasuite-meet
`test_oidc_with_keycloak.py` (docstring SOURCE comment). No assertion, wait, or skip touched.
- Code: `discovery.py` adds `"custom"` as the first (canonical) subdir and emits a loud
`WARNING [cfold]` on stderr for any test still found under `functional/`/`playwright/` — all three
still discovered, nothing dropped. `manifest.py` normalizes the reported `sub` key to `"custom"`.
- Helper/lifecycle import fixups: mailu `ops.py`/`test_backup.py`/`test_restore.py` redirect
`sys.path.insert(... "functional")` → `"custom"` to follow the moved `_mailu.py` helper (helper is in
the rename list). drone/ghost/lasuite-docs/lasuite-drive `recipe_meta.py`/`install_steps.sh` are
comment-only. All mechanical.

**2. Discovery parity**: PASS. 64 canonical custom tests; 0 in `functional/`/`playwright/`; per-recipe
counts match the baseline exactly; cardinal `(recipe, filename)` set identical pre vs post (cmd 6 empty diff).

**3. Assertion preservation**: PASS. No assertion removed/weakened, no test skipped, no wait relaxed, no
test renamed without equivalent coverage. The only content changes are docstring/comment text and a
forced `sys.path` redirect (mailu). One unit test was renamed
(`..._functional_playwright_only` → `..._custom_only`) keeping the same structural assertions, and a NEW
unit test (`test_custom_tests_prefers_custom_and_warns_on_deprecated_aliases`) ADDS coverage.

**4. Old-folder behavior**: PASS. Matches cfold's documented decision (deprecated-alias + loud warning).
`functional/`/`playwright/` remain in the `subdirs` tuple, still discovered, with a per-file
`WARNING [cfold]: test found in deprecated folder ...` to stderr. Live probe confirms: all three subdirs
return their tests and the two deprecated ones warn. No silent coverage loss path for recipe-local tests.

**5. Lifecycle-overlay separation**: PASS. 0 lifecycle files (`test_{install,upgrade,backup,restore}.py`)
under any `custom/`; 64 lifecycle overlays remain at `tests/<recipe>/` top-level. discovery still excludes
lifecycle names inside subdirs (defensive). The L4 RUNG name `"functional"` in `level.py` is unchanged —
only the *folder* was renamed, not the tier/rung.
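
The behavior described in categories 4 and 5 (canonical folder first, deprecated aliases still discovered with a warning, lifecycle names excluded inside subdirs) can be illustrated with a small stand-alone sketch. It is not the code in `runner/harness/discovery.py`: `find_custom_tests` is a hypothetical stand-in, and the warning text after the `WARNING [cfold]` prefix is an assumption.

```python
import os
import sys
import tempfile

# Canonical folder first, then the two deprecated aliases.
SUBDIRS = ("custom", "functional", "playwright")
# Lifecycle overlay names belong at the recipe top level, not inside subdirs.
LIFECYCLE = {"test_install.py", "test_upgrade.py", "test_backup.py", "test_restore.py"}


def find_custom_tests(recipe_dir):
    """Hypothetical stand-in for discovery: return (subdir, path) pairs in a stable order."""
    found = []
    for sub in SUBDIRS:
        folder = os.path.join(recipe_dir, sub)
        if not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            if not (name.startswith("test_") and name.endswith(".py")):
                continue
            if name in LIFECYCLE:
                continue  # defensive: lifecycle names are never custom tests
            if sub != "custom":
                # Assumed message format; only the prefix comes from the report.
                print(f"WARNING [cfold]: test found in deprecated folder {sub}/: {name}",
                      file=sys.stderr)
            found.append((sub, os.path.join(folder, name)))
    return found


with tempfile.TemporaryDirectory() as tmp:
    layout = (
        ("custom", "test_new.py"),
        ("functional", "test_old.py"),
        ("playwright", "test_ui.py"),
        ("custom", "test_install.py"),  # lifecycle name leaked into custom/
    )
    for sub, name in layout:
        os.makedirs(os.path.join(tmp, sub), exist_ok=True)
        with open(os.path.join(tmp, sub, name), "w") as handle:
            handle.write("# placeholder\n")
    names = [os.path.basename(path) for _, path in find_custom_tests(tmp)]
    print("found:", names)
```

In this sketch the leaked `custom/test_install.py` is skipped, the two deprecated folders each produce one warning on stderr, and the result order follows the `SUBDIRS` tuple, which is what makes the output deterministic.
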

**6. Evidence audit**: PASS. cfold M2 (REVIEW-cfold.md 2026-06-13T04:11:00Z) cold-verified a full real-CI
`!testme` sweep: all 20 enrolled recipes green at **level 5/5** with custom-junit counts matching baseline
(ghost 4/4, lasuite-docs 5/5, mumble 5/5, … every recipe = its baseline count), ghost upgrade junit=2,
and `live_pr_apps=0` (zero leaked stacks). No silent level drop; no skipped custom tier.

**7. Cleanliness**: PASS. `git status` clean; no stray root coordination files; no leaked test stacks
(live_pr_apps=0); no stale temp scripts or uncommitted implementation files; `machine-docs/` holds only
phase-namespaced state.
---
## cf55-vs-cf48 agreement note
**Agreement: FULL.** Both reviews independently reach **NO COVERAGE LOST** and PASS on all 7 categories.
The two cross-validating models were **cf55 = claude-sonnet-4-6** and **cf48 = claude-opus-4-8**. The plan
named GPT-5.5 for cf55, but prior GPT-5.x loops stopped on a launcher model-mismatch and the orchestrator
relaunched cf55 on Claude Sonnet 4.6 (recorded in STATUS-cf55.md / REVIEW-cf55.md). The actual cross-check
is therefore Sonnet 4.6 vs Opus 4.8 (both Claude), not GPT vs Claude. It still gives two independent
models over the same commit.

One **discrepancy** worth surfacing (per phase instruction to note where the two reviews differ):
- cf55's diff-review narrative states the keycloak custom tests had a `sys.path.insert` *depth* adjusted
`../..` → `../../..`. The actual `44e0242` diff shows the keycloak `sys.path` lines are **UNCHANGED** —
only the adjacent comment was edited. (No adjustment was needed: `functional/` and `custom/` sit at the
same depth under `tests/keycloak/`.) This is a cf55 narrative inaccuracy, not a coverage defect — both
reviews still correctly conclude the keycloak tests are intact. cf48 catches it; cf55 missed it.

There is no category where cf48 found a regression that cf55 cleared, or vice versa. Neither review has blocking findings.
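
The depth argument in the discrepancy above can be checked mechanically. The sketch below uses a hypothetical file name and takes the `../..` hop from cf55's narrative as the example; the real `sys.path` target in the keycloak tests is not reproduced here.

```python
import posixpath


def resolve_from(test_file, relative):
    """Resolve a relative path against the directory containing test_file."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(test_file), relative))


# Hypothetical file name; only the folder depth matters here.
old = "tests/keycloak/functional/test_example.py"
new = "tests/keycloak/custom/test_example.py"

# The same relative hop resolves to the same directory from either folder,
# so a rename between sibling folders needs no depth adjustment.
print(resolve_from(old, "../.."), resolve_from(new, "../.."))
```

Both calls resolve to the same directory because `functional/` and `custom/` are siblings under `tests/keycloak/`; adding a third `..` after the rename would have pointed one level too high.
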
---
## Final Verdict
**NO COVERAGE LOST.** cfold (`44e0242`) preserved the complete pre-cfold custom-test set: all 64 tests
relocated 1:1 from `functional/`/`playwright/` into canonical `custom/`, identical `(recipe, filename)`
set, per-recipe counts unchanged, zero assertions weakened, deprecated aliases retained with loud
warnings, lifecycle overlays untouched at top-level, RUNG name preserved, and a full real-CI sweep green
at L5 across all 20 recipes with zero leaks. Awaiting Adversary M1 + M2 PASS in REVIEW-cf48.md.