STATUS — phase cfold (custom-folder collapse)

Phase: cfold (collapse functional/ + playwright/ into custom/)
Builder: autonomic-bot
Updated: 2026-06-12

M1 — PASS

Gate result: REVIEW-cfold.md 2026-06-12T16:20Z -> M1 PASS

Inputs for verification:

Implementation commit: 44e0242 (feat(cfold): canonicalize custom test layout)

Completed in this checkpoint:

- discovery.py: custom/ is now the canonical directory; deprecated aliases are still discovered but emit warnings
- git mv of all 64 custom tests (60 functional + 4 playwright) across 20 recipes
- helper modules moved alongside their tests into custom/
- sys.path references updated in the mailu lifecycle overlays
- docs updated (README.md, recipe-customization.md, testing.md, enroll-recipe.md)
- unit tests updated (test_discovery.py, test_discovery_phase2.py, test_manifest.py)
- manifest.py now reports canonical custom counts
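
Illustrative sketch of the discovery behaviour described above (canonical custom/ plus deprecated aliases that warn instead of silently dropping tests). This is not the code in runner/harness/discovery.py: the function name, the constants, and the assumption that the deprecated aliases are exactly functional/ and playwright/ are all hypothetical.

```python
import warnings
from pathlib import Path

CANONICAL_DIR = "custom"
# Assumption: the deprecated aliases are the two pre-cfold directory names.
DEPRECATED_ALIASES = ("functional", "playwright")


def discover_custom_tests(recipe_dir: Path) -> list[Path]:
    """Collect test_*.py from custom/, then from any deprecated alias dirs.

    Tests found under an alias are still returned, so coverage is not
    dropped, but each alias directory that contains tests emits a warning.
    """
    found: list[Path] = []
    canonical = recipe_dir / CANONICAL_DIR
    if canonical.is_dir():
        found.extend(sorted(canonical.glob("test_*.py")))

    for alias in DEPRECATED_ALIASES:
        legacy_dir = recipe_dir / alias
        if not legacy_dir.is_dir():
            continue
        legacy = sorted(legacy_dir.glob("test_*.py"))
        if legacy:
            # Plain UserWarning: DeprecationWarning is hidden by default
            # in many contexts, which would defeat "warn loudly".
            warnings.warn(
                f"{recipe_dir.name}: {alias}/ is a deprecated alias; "
                f"move its tests to {CANONICAL_DIR}/",
                stacklevel=2,
            )
            found.extend(legacy)
    return found
```

The design point is that an alias directory degrades to a warning rather than to zero discovered tests.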

WHAT:

M1 implementation is complete: custom-test discovery is canonicalized to custom/, deprecated aliases warn loudly instead of silently dropping coverage, all cc-ci custom tests/helpers moved to tests/<recipe>/custom/, manifest counts are canonicalized, and the placement-rule docs/unit tests were updated.

HOW:

git ls-files "tests/*/custom/test_*.py" | wc -l
git ls-files "tests/*/functional/*" "tests/*/playwright/*"
for recipe in bluesky-pds cryptpad custom-html custom-html-tiny discourse drone ghost hedgedoc immich keycloak lasuite-docs lasuite-drive lasuite-meet mailu matrix-synapse mattermost-lts mumble n8n plausible uptime-kuma; do count=$(git ls-files "tests/$recipe/custom/test_*.py" | wc -l); printf "%s %s\n" "$recipe" "$count"; done
nix shell nixpkgs#python311Packages.pytest -c pytest tests/unit/test_discovery.py tests/unit/test_discovery_phase2.py tests/unit/test_manifest.py -q

EXPECTED:

Total canonical custom tests: 64
Old tracked trees: no output for functional/* or playwright/*
Per-recipe counts exactly match the baseline table below
Focused unit suite: 18 passed

WHERE:

Discovery + alias warnings: runner/harness/discovery.py
Canonical manifest counts: runner/harness/manifest.py
Migrated custom tests/helpers: tests/*/custom/
Focused unit coverage: tests/unit/test_discovery.py, tests/unit/test_discovery_phase2.py, tests/unit/test_manifest.py
Placement-rule docs: docs/recipe-customization.md, docs/testing.md, docs/enroll-recipe.md, README.md

Adversary verdict:

machine-docs/REVIEW-cfold.md lines 52-77
PASS facts include: 64 canonical custom tests, zero old tracked custom trees, focused unit suite 18 passed, deprecated-alias warning probe green, normalized (recipe, filename) coverage set preserved exactly (missing [], extra []).
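
A minimal sketch of how a normalized (recipe, filename) coverage comparison such as the one cited above could be computed from two path listings (for example `git ls-files` output before and after the move). The function names and the assumed path shape tests/<recipe>/<dir>/test_*.py are illustrative, not taken from the review tooling.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Directory names treated as equivalent for the comparison (assumption).
LAYOUT_DIRS = {"custom", "functional", "playwright"}


def coverage_key(path: str) -> tuple[str, str] | None:
    """Map tests/<recipe>/<layout-dir>/test_*.py to (recipe, filename)."""
    parts = PurePosixPath(path).parts
    if (
        len(parts) == 4
        and parts[0] == "tests"
        and parts[2] in LAYOUT_DIRS
        and parts[3].startswith("test_")
        and parts[3].endswith(".py")
    ):
        return parts[1], parts[3]
    return None  # helper modules and unrelated files are ignored


def coverage_diff(before: list[str], after: list[str]):
    """Return (missing, extra) as sorted lists of (recipe, filename)."""
    old = {key for p in before if (key := coverage_key(p)) is not None}
    new = {key for p in after if (key := coverage_key(p)) is not None}
    return sorted(old - new), sorted(new - old)
```

A clean migration yields `([], [])`: every pre-move test has a post-move counterpart with the same recipe and filename, and nothing new appeared.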

M2 — IN PROGRESS

Current work item:

- Real-CI !testme sweep: 19 of 20 recipes have fresh post-cfold L5 evidence; ghost remains non-green because of a cfold-neutral upgrade regression on the recipe/environment side.
- Fresh follow-up probes show the Ghost upgrade failure is not confined to PR #4 / PR #5: a reopened PR #3 at ref 720faa0b also failed twice post-cfold (builds 568, 569) with the same shape.
- The Ghost duplicate-trigger side issue is root-caused in the bridge source: a reopened PR can replay old !testme comments that predate the bridge start, because the bridge never saw them at startup while the PR was closed. The fix has landed in the local repo and is being carried through deployment verification.

M2 baseline matrix (built from live PR heads + fresh post-cfold evidence)

Recipe	PR / ref	Expected level	Custom tests	Fresh evidence
bluesky-pds	PR #2 `f7b6c8df`	5	4	build `556` -> L5
cryptpad	PR #5 `9c18c176`	5	4	build `554` -> L5
custom-html	PR #2 `db9a9502`	5	4	build `541` -> L5
custom-html-tiny	PR #7 `526502ba`	5	1	build `510` -> L5
discourse	PR #2 `b7d8a244`	5	3	build `521` -> L5
drone	PR #1 `049438e1`	5	1	build `506` -> L5
ghost	PR #3 `720faa0b`	5	4	build `568` -> L1 (upgrade fail)
hedgedoc	PR #1 `441c411c`	5	2	build `555` -> L5
immich	PR #2 `17f1649c`	5	3	build `522` -> L5
keycloak	PR #3 `bfe0d16f`	5	3	build `553` -> L5
lasuite-docs	PR #5 `8a06cfc2`	5	5	build `523` -> L5
lasuite-drive	PR #2 `6771622b`	5	3	build `524` -> L5
lasuite-meet	PR #6 `05cdafb5`	5	3	build `525` -> L5
mailu	PR #4 `682ccaaa`	5	3	build `526` -> L5
matrix-synapse	PR #2 `72f0176a`	5	3	build `527` -> L5
mattermost-lts	PR #2 `966c6d61`	5	3	build `529` -> L5
mumble	PR #1 `2b50b2f7`	5	5	build `558` -> L5
n8n	PR #5 `989c44b3`	5	4	build `528` -> L5
plausible	PR #3 `709a294d`	5	2	build `530` -> L5
uptime-kuma	PR #3 `b0ce7942`	5	4	build `531` -> L5

Ghost deviation (blocking a formal M2 claim)

ghost is the only recipe still preventing an M2 claim.

All current Ghost upgrade PR heads are red in fresh post-cfold runs, with the same stage shape:
- PR #3 720faa0b: builds 568 and 569 -> L1; install/backup/restore/custom/lint pass, upgrade fail
- PR #4 d88f5801: build 557 -> L1; install/backup/restore/custom pass, upgrade fail
- PR #5 d42d0f7c: build 559 -> L1; install/backup/restore/custom/lint pass, upgrade fail
The focused artifact audit gives the strongest same-ref comparison: historical build 185 (d42d0f7c7cf9) had upgrade=pass, while fresh build 559 on that same ref has upgrade=fail with the canonical custom stage still green.
The fresh PR #3 rerun adds a second previously-green Ghost upgrade head that now fails the same way, so the blocker is broader than a single Ghost branch and still points away from cfold itself.
Side observation from the PR #3 retrigger: a single !testme comment at 2026-06-13T00:07:50Z spawned three new Ghost runs (568, 569, 570). All three are now red with the same upgrade-only failure.
Root cause of the triple-trigger: bridge logs show those three runs were tied to three distinct comment ids on the reopened PR (14029, 14032, 14497), not one comment processed three times. The poller replayed two historical !testme comments that predated the current bridge process because PR #3 was closed during bridge startup and only became visible to the poller after reopen.
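
One possible shape for a guard against this replay, sketched under stated assumptions: the bridge knows its own start time and keeps a set of handled comment ids. The `Comment` type, `select_triggers`, and both parameters are hypothetical and do not describe the fix that actually landed.

```python
from dataclasses import dataclass
from datetime import datetime

TRIGGER = "!testme"


@dataclass(frozen=True)
class Comment:
    id: int
    body: str
    created_at: datetime  # timezone-aware UTC timestamp


def select_triggers(comments, bridge_started_at, seen_ids):
    """Return trigger comments that should start a run.

    A trigger comment is skipped if it was already handled, or if it
    predates this bridge process (e.g. it sat on a PR that was closed at
    startup and only became visible after a reopen). Skipped historical
    comments are still recorded in seen_ids so they are never replayed.
    """
    fresh = []
    for comment in comments:
        if comment.body.strip() != TRIGGER:
            continue
        if comment.id in seen_ids:
            continue
        seen_ids.add(comment.id)
        if comment.created_at < bridge_started_at:
            continue  # historical comment: mark as seen, do not trigger
        fresh.append(comment)
    return fresh
```

Under this sketch, only a comment created after the bridge started would spawn a run when a closed PR is reopened.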
Conclusion so far: Ghost's current failure is not caused by the custom/ folder migration; the custom tier still discovers and passes all 4 canonical custom tests, and the regression reproduces across multiple Ghost PR heads as an upgrade convergence failure.

Fresh Adversary state

REVIEW-cfold.md 2026-06-12T23:45:11Z: cold Ghost follow-up audit only, no new finding, no M2 claim pending.
REVIEW-cfold.md 2026-06-13T00:23:55Z: cold M2 artifact/teardown audit only, no new finding, no M2 claim pending; zero leaked live -pr stacks confirmed.

Baseline (pre-cfold) — custom test count per recipe

Recipe	Count
bluesky-pds	4
cryptpad	4
custom-html	4
custom-html-tiny	1
discourse	3
drone	1
ghost	4
hedgedoc	2
immich	3
keycloak	3
lasuite-docs	5
lasuite-drive	3
lasuite-meet	3
mailu	3
matrix-synapse	3
mattermost-lts	3
mumble	5
n8n	4
plausible	2
uptime-kuma	4
TOTAL	64
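
For a scripted version of the per-recipe check, the baseline table can be compared against a path listing such as the output of `git ls-files "tests/*/custom/test_*.py"`. This is a sketch only: the function names are illustrative, and the `BASELINE` values are copied from the table above.

```python
from collections import Counter
from pathlib import PurePosixPath

# Pre-cfold baseline, copied from the table above (20 recipes, 64 tests).
BASELINE = {
    "bluesky-pds": 4, "cryptpad": 4, "custom-html": 4, "custom-html-tiny": 1,
    "discourse": 3, "drone": 1, "ghost": 4, "hedgedoc": 2, "immich": 3,
    "keycloak": 3, "lasuite-docs": 5, "lasuite-drive": 3, "lasuite-meet": 3,
    "mailu": 3, "matrix-synapse": 3, "mattermost-lts": 3, "mumble": 5,
    "n8n": 4, "plausible": 2, "uptime-kuma": 4,
}


def count_custom_tests(tracked_paths):
    """Count tests/<recipe>/custom/test_*.py entries per recipe."""
    counts = Counter()
    for path in tracked_paths:
        parts = PurePosixPath(path).parts
        if (
            len(parts) == 4
            and parts[0] == "tests"
            and parts[2] == "custom"
            and parts[3].startswith("test_")
            and parts[3].endswith(".py")
        ):
            counts[parts[1]] += 1
    return counts


def deviations(counts, baseline=BASELINE):
    """Return {recipe: (expected, actual)} for every recipe that differs."""
    recipes = sorted(set(baseline) | set(counts))
    return {
        r: (baseline.get(r, 0), counts.get(r, 0))
        for r in recipes
        if baseline.get(r, 0) != counts.get(r, 0)
    }
```

An empty result from `deviations(count_custom_tests(paths))` corresponds to "per-recipe counts exactly match the baseline".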
