cc-ci-orchestrator/cc-ci-plan/plan-phase-cfold-custom-folder.md
autonomic-bot af2b2e8156 plan: phase 'cfold' — collapse functional/+playwright/ into custom/ + full !testme recipe sweep (queued after drone)
The functional/playwright split is purely organizational (discovery globs both
with no branching; same custom tier -> L4 rung, same fixtures, same failure
semantics). Migrate all custom tests to one custom/ folder; M1 proves coverage
identical before/after (no silent drops), M2 is a full real-CI !testme sweep
across all recipes confirming levels unchanged. cfold becomes the last phase so
the queued /upgrade-all fires after it (folder change verified before upgrade).
2026-06-11 22:52:45 +00:00


Phase cfold — collapse custom-test folders into one custom/ + full recipe CI sweep

Mission (operator-specified): custom recipe tests currently live in TWO folders — tests/<recipe>/functional/ and tests/<recipe>/playwright/ — a split that is purely organizational (the harness treats both identically). Collapse them into a single tests/<recipe>/custom/ folder, then prove the change with a full real-CI recipe sweep confirming every recipe's !testme still works and no custom test was silently dropped.

State files (machine-docs/, per the file-location rule): machine-docs/STATUS-cfold.md, BACKLOG-cfold.md, REVIEW-cfold.md, JOURNAL-cfold.md. DECISIONS.md shared.

1. Why this is safe (investigation already done, 2026-06-11)

The split carries ZERO semantic weight — verified:

  • runner/harness/discovery.py:103 custom_tests() globs subdirs = ("functional", "playwright") with NO branching on which folder.
  • runner/run_recipe_ci.py:579 run_custom() runs both in the same custom tier with the same pytest command.
  • Same fixtures for both (recipe/meta/live_app/op_state/deps); playwright tests simply import sync_playwright from playwright.sync_api directly, with no special fixture.
  • Both map to the functional rung (L4); folder name does NOT affect tier/rung/level.
  • Failure semantics identical. So merging loses nothing.

The ONE distinction that DOES matter and MUST be preserved: a top-level test_<op>.py is a lifecycle overlay, NOT a custom test (top-level non-lifecycle files are not discovered). custom/ is still a subdir, so that distinction survives.

2. Implementation (P1)

  1. Discovery: discovery.custom_tests() → canonical subdir is custom/. To prevent SILENT coverage loss, do NOT do a blind cutover: either (RECOMMENDED) keep recognizing functional/ and playwright/ as deprecated aliases AND emit a loud one-line warning whenever a test is found in a deprecated folder, OR have discovery raise/log loudly if a non-empty functional/ or playwright/ remains after migration. The end-state canonical home is custom/; nothing may be dropped without a loud signal. Decide + record in DECISIONS.md; the Adversary reviews the choice.
  2. Migrate cc-ci's own tests: git mv tests/<recipe>/{functional,playwright}/test_*.py tests/<recipe>/custom/ for EVERY recipe. Preserve any per-recipe conftest.py / helper modules those tests import (move/adjust imports as needed; mechanical only).
  3. Repo-local (HC2-gated) tests: recipes' OWN repo tests/ may still use the old folder names. Keep discovery recognizing them (deprecated-alias path) OR document the rename requirement; do NOT silently stop discovering them. State the decision.
  4. Docs: update the placement rule everywhere: docs/recipe-customization.md §3 + §5.3 + the tree, docs/testing.md §4, docs/enroll-recipe.md worked examples. Regenerate anything generated.
  5. Unit test: tests/unit/ coverage for custom_tests() — finds custom/, ignores top-level lifecycle overlays, (alias behavior if kept), deterministic ordering.
  6. Nothing else may key off the names: grep the whole repo for functional/ and playwright/ string literals (harness, bridge, dashboard, results, screenshot, drone pipeline) and fix every consumer. The screenshot SCREENSHOT hook + manifest must be unaffected.
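
The recommended option in step 1 (custom/ as the canonical home, old folders kept as loud deprecated aliases) could look roughly like the sketch below. It is illustrative only: the real discovery.custom_tests() signature is not shown in this plan, so the recipe_dir parameter, the constant names, and the use of the logging module for the warning are all assumptions.

```python
import logging
from pathlib import Path

log = logging.getLogger(__name__)

# Assumed names; the actual harness may structure this differently.
CANONICAL_SUBDIR = "custom"
DEPRECATED_SUBDIRS = ("functional", "playwright")


def custom_tests(recipe_dir: Path) -> list[Path]:
    """Return the custom test files of one recipe in deterministic order.

    Only subdirectories are scanned, so a top-level test_<op>.py
    (a lifecycle overlay) is never returned as a custom test.
    """
    found: list[Path] = []
    for subdir in (CANONICAL_SUBDIR, *DEPRECATED_SUBDIRS):
        folder = recipe_dir / subdir
        if not folder.is_dir():
            continue
        for path in folder.glob("test_*.py"):
            if subdir in DEPRECATED_SUBDIRS:
                # One loud line per test still living in an old folder.
                log.warning(
                    "DEPRECATED custom-test folder %s/: move %s to %s/",
                    subdir, path, CANONICAL_SUBDIR,
                )
            found.append(path)
    # Sort by full path so ordering does not depend on filesystem order.
    return sorted(found)
```

The same function shape also covers the properties step 5 asks the unit test to pin down: custom/ is found, top-level overlays are ignored, aliases still discover, and ordering is stable.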

3. Gates

M1 — Migration complete + coverage-preserving (pre-sweep). All recipes' custom tests relocated to custom/; discovery + docs + unit tests updated; full-repo grep shows no stale consumer. Coverage-diff proof (cardinal, mirrors rcust M1): the SET of discovered + executed custom tests per recipe is IDENTICAL before and after — same files, same count, just relocated; NONE dropped, NONE newly skipped. Adversary cold-verifies the diff from a clean checkout and confirms no consumer still keys off the old folder names and no test assertion was weakened.
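
One way to mechanize the discovery half of the M1 coverage diff is sketched below. It assumes the before and after discovery results are available as lists of repo-relative POSIX paths and that tests sit directly inside the old folders; the function names are hypothetical. It also flags name collisions (the same file name in both functional/ and playwright/), which a merge into one folder would otherwise overwrite. The executed and skipped side of the proof would still need the actual run output.

```python
from collections import Counter
from pathlib import PurePosixPath

OLD_SUBDIRS = {"functional", "playwright"}


def relocated(path: str) -> str:
    """Map tests/<recipe>/{functional,playwright}/x.py to tests/<recipe>/custom/x.py."""
    p = PurePosixPath(path)
    if p.parent.name in OLD_SUBDIRS:
        return str(p.parent.parent / "custom" / p.name)
    return str(p)


def coverage_diff(before: list[str], after: list[str]) -> dict[str, list[str]]:
    """Compare pre-migration discovery with post-migration discovery."""
    expected = Counter(relocated(p) for p in before)
    return {
        # Two old files that would land on the same custom/ path.
        "collisions": sorted(p for p, n in expected.items() if n > 1),
        # Expected after relocation but no longer discovered.
        "dropped": sorted(set(expected) - set(after)),
        # Discovered after migration but absent from the baseline.
        "added": sorted(set(after) - set(expected)),
    }
```

Under these assumptions, M1 holds for discovery only when all three lists are empty for every recipe.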

M2 — Full recipe CI sweep (the operator-required proof). Run a real-CI sweep across ALL enrolled recipes via the drone !testme path confirming every recipe's custom tier still discovers + runs + passes its tests at the same level as its pre-cfold baseline. Build the baseline matrix (recipe → expected level + custom-test set) BEFORE the change. Then sweep: every recipe's !testme green, custom tests present in the run output / manifest, levels unchanged, zero leaked apps. Max 2-3 concurrent live deploys; canary suite green. Any deviation must be explained as cfold-neutral or fixed. Fresh Adversary PASS → Builder writes ## DONE.
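
The baseline-versus-sweep comparison for M2 can be expressed as a small check like the one below. This is a sketch under stated assumptions: the RecipeResult record, the integer level encoding, and the idea that the custom-test set can be read back from the run output or manifest are all hypothetical, not the harness's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecipeResult:
    level: int                    # assumed encoding, e.g. 4 for the L4 rung
    custom_tests: frozenset[str]  # test file names seen in the run output


def sweep_deviations(
    baseline: dict[str, RecipeResult], sweep: dict[str, RecipeResult]
) -> dict[str, list[str]]:
    """Return, per recipe, every way the sweep differs from the baseline."""
    deviations: dict[str, list[str]] = {}
    for recipe in sorted(baseline.keys() | sweep.keys()):
        before, after = baseline.get(recipe), sweep.get(recipe)
        problems: list[str] = []
        if before is None:
            problems.append("not in baseline")
        elif after is None:
            problems.append("missing from sweep")
        else:
            if after.level != before.level:
                problems.append(f"level {before.level} -> {after.level}")
            missing = sorted(before.custom_tests - after.custom_tests)
            if missing:
                problems.append(f"custom tests not run: {', '.join(missing)}")
        if problems:
            deviations[recipe] = problems
    return deviations
```

An empty result would correspond to "levels unchanged, custom tests present"; any entry is a deviation that must be explained as cfold-neutral or fixed. Leaked apps and canary status are outside what this check covers.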

4. Guardrails (binding)

  • No silent coverage loss — the whole point of M1's coverage diff. A custom test that stops being discovered without a loud signal is an automatic FAIL.
  • No test weakening — this is a pure relocation; assertions are untouchable. The only content changes allowed are import-path adjustments forced by the move (mechanical, Adversary-checked line-by-line).
  • File-location rule still applies to loop-state files (machine-docs/).
  • Real-CI etiquette: ≤2-3 concurrent deploys, teardown every dev deploy on every exit path, never git-checkout ~/.abra/recipes/<recipe> mid-build, no secrets in logs.
  • Recipe mirrors: PR only, never merge. Commit author autonomic-bot <autonomic-bot@noreply.git.autonomic.zone>; push every commit. CI host: no python3 on default PATH (use cc-ci-run).

5. Definition of Done

All custom tests live under tests/<recipe>/custom/ (functional/playwright collapsed), discovery + docs + unit tests updated, no consumer keys off the old names, coverage proven identical before/after, and a full !testme recipe sweep is green with unchanged levels and zero leaks. M1 + M2 fresh Adversary PASSes.