# STATUS — sub-phase rcust (recipe-customization restructure)

Plan: /srv/cc-ci/cc-ci-plan/recipe-custom-restructure-full-plan.md (SSOT for this phase).
Reference spec: docs/recipe-customization.md @ 76a4b6b.
Work branch: `restructure/recipe-custom` (one commit per phase P1–P6; merged to main only after M1 PASS).

## Phase progress

- [x] P1 — single loader + key registry + migrate L1–L6 + unit tests + doc gen (branch commit 472a68b)
- [x] P2 — delete legacy keys/paths: compose.ccci.yml first-class+auto-chaos; install-time deps only (lasuite-docs migrated, setup_custom_tests.sh gone); SKIP_GENERIC meta deleted (env dev-only + loud CI warning); conftest cleanup (deployed/deployed_app/app_domain gone, one `deps` fixture) (branch commit 8cd72fd)
- [x] P3 — uniform ctx hook convention: HookCtx(.domain/.base_url/.meta/.deps/.op); all hooks take ctx; legacy signatures raise MetaError at load naming the migration (branch fd02d9f)
- [x] P4 — custom-test ergonomics: placement rule (custom under functional/+playwright/ only), op_state fixture, deps fixture tests (branch 29a28e2)
- [x] P5 — customization manifest: one block at run start (non-default meta keys, hooks, overlays, custom-test counts, active CCCI_SKIP_GENERIC* env overrides with !!
  CI flag) printed + embedded verbatim in results.json under "customization"; pure presentation, HC2-honoring (branch commit 68954be — new runner/harness/manifest.py + tests/unit/test_manifest.py)
- [x] P6 — docs rewritten to the end state: recipe-customization.md is now the REFERENCE (was review spec) — §8 records R1–R9 resolutions, §4 keeps the generated table + HookCtx, §5 the end-state shapes; testing.md invariant updated to install-time-deps isolation, generic opt-out documented dev-only; enroll-recipe.md worked examples (lasuite-docs install-time OIDC, mumble post-F2-14c), deps fixture, ctx signatures (branch commit da558ca)
- [x] Adversary inbox 19:06Z (P5 manifest dashboard hygiene) — addressed: secret-NAMED meta values (top-level + nested dict keys) render as '' in manifest + results.json; key names stay visible; unit-test pinned (branch commit 858e0f5)

## P1–P6 verification facts (for the eventual M1 cold-verify)

- WHERE: branch `restructure/recipe-custom`, P1=472a68b, P2=8cd72fd, P3=fd02d9f, P4=29a28e2, P5=68954be, P6=da558ca, manifest-redaction fix=858e0f5 (branch head).
- HOW: `cc-ci-run -m pytest tests/unit -q` and `nix develop .#lint --command scripts/lint.sh` from a clean checkout of the branch.
- EXPECTED: 192 passed; `lint: PASS`.
- New single loader: `runner/harness/meta.py::load()`; all-recipes typo gate + R2 proof in `tests/unit/test_meta.py`; docs §4 table generated by `scripts/gen-meta-docs.py` (sync pinned by unit test).

## M2 baseline matrix (built BEFORE merge, per plan M2.1)

Expected outcome per recipe dir for the post-merge regression sweep = most recent known-good evidence. Levels are results.json `level`; evidence = run id under /var/lib/cc-ci-runs// (on cc-ci) unless noted. Bad canaries are EXPECTED to fail at their designed tier.

| Recipe | Expected | Evidence |
|---|---|---|
| bluesky-pds | full lifecycle green: 5 tiers + 4 custom pass, deploy-count=1 (L4-equiv; pre-results-era) | Adversary cold run, REVIEW e45e0ee (Phase 2 Q4.3); weekly 06-05: up-to-date |
| cryptpad | L4 (all four essential rungs pass) | run 181 (06-05) |
| custom-html | L4 | run 182 (06-05) |
| custom-html-bkp-bad | DESIGNED-BAD: backup tier fail → backup_restore=fail, L1 | run regression-bad-restore-2 (06-02) |
| custom-html-rst-bad | DESIGNED-BAD: restore tier fail → backup_restore=fail, L1 | run regression-bad-restore-3 (06-02) |
| custom-html-tiny | L2 (backup_restore N/A — declared EXPECTED_NA; functional N/A) | run 205 (06-09) |
| discourse | L4 | run 184 (06-05) |
| ghost | L4 | run 185 (06-05) |
| hedgedoc | L4 | run 113 (06-02) |
| immich | L4 | run 307 (06-10) |
| keycloak | L4 | run 187 (06-05) |
| lasuite-docs | L5 (integration pass) | run 188 (06-05) |
| lasuite-drive | L5 (integration pass) | run 189 (06-05) |
| lasuite-meet | L5 (integration pass) | run 204 (06-09) |
| mailu | L2 (backup_restore N/A — no backupbot labels; functional pass) | run 191 (06-05) |
| matrix-synapse | L4 | run 203 (06-08) |
| mattermost-lts | L4 | run 196 (06-05) |
| mumble | all 5 tiers pass, deploy-count=1 (L4-equiv; pre-results-era) | log ~/ccci-mumble-f214c.log on cc-ci (05-31) |
| n8n | L4 | run 197 (06-05) |
| plausible | L4 | run 308 (06-10) |
| uptime-kuma | L4 | run 165 (06-02) |

Customization-executed spot-greps for M2.4 (mumble READY_PROBE tcp lines, cryptpad SANDBOX_DOMAIN, ghost/discourse BACKUP_VERIFY + overlay copy + chaos base, lasuite-* deps provisioning + OIDC skip-count 0, immich ops.py seeds, manifest block in every log) apply on the sweep runs, not retroactively here.

## Gate

**Gate: M2 IN PROGRESS** — M1 PASS in REVIEW-rcust.md (01f9f70, 2026-06-10).

- M2.0 merge: `restructure/recipe-custom` merged to main as 01e6d49 (merge commit, no force); push build green: drone build **326 success** on 01e6d49 (API-verified).
- M2.2 canary suite: **7/7 PASSED** in 286s (fresh clone of merged main at /root/m2-sweep on cc-ci, log /root/m2-canary.log) — green canaries pass, all four RED canaries still caught at their designed tiers (bad-install/bad-upgrade/bad-backup/bad-restore).
- M2.3 per-recipe sweep (driver /root/m2-driver.sh, 2 concurrent, REF = mirror heads; logs /root/m2-logs/.log; results /var/lib/cc-ci-runs/m2r-/): first pass **15/21 matched baseline**.
  - At baseline level: hedgedoc/custom-html/custom-html-tiny/uptime-kuma/n8n/cryptpad/ghost/keycloak/mumble/mailu/matrix-synapse/lasuite-docs/lasuite-meet; both DESIGNED-BAD canaries failed at exactly their designed tier (bkp-bad: backup fail; rst-bad: backup pass→restore fail).
  - 6 below baseline, ALL flake-shaped (known modes, not new assertion semantics):
    - discourse+plausible+mattermost-lts+immich restore data-integrity (the documented pre-existing truncated-dump capture race — discourse BACKUP_VERIFY honestly failed 3/3 attempts, its docstring + the 06-05 weekly report record this exact mode pre-restructure; seeds verified committed by ops.py read-back asserts, i.e. the migrated ctx hooks executed correctly);
    - bluesky-pds abra `FATA deploy timed out` at default 600s during concurrent image pulls;
    - lasuite-drive pre_install MinIO one-shot 90s timeout (bucket appeared later — every subsequent tier passed).
  - Serial re-runs (MAX=1, /root/m2-rerun.sh, logs /root/m2-rerun-logs/, results m2rr-/) completed 20:44Z — but ran default heads, not baseline refs (superseded by the targeted runs below).
- M2.3 reconciliation runs (serial, MAX=1):
  - **Baseline-ref re-runs on merged main** (/root/m2-baseline-runs.sh, logs /root/m2-baseline-logs/, results m2b-/): **plausible L4, mattermost-lts L4, immich L4** at their exact baseline refs — baseline REPRODUCED on the restructured harness; restore-race cluster closed for those three.
    m2b-discourse @7ae7b0f (ran PR=0; baseline run 184 was PR=2): **L1, NEW mode** — upgrade HC1 `deployed chaos commit 'eb96de94+U', not PR-head '7ae7b0f76efb'`.
    Investigated facts (cold-checkable in /var/lib/cc-ci-runs/m2b-discourse/): `eb96de94` IS the prev-base tag commit `0.7.0+3.3.1` (`git -C .../abra/recipes/discourse rev-list -n1 0.7.0+3.3.1`); the preserved per-run clone HEAD = 7ae7b0f (the upgrade re-checkout DID run and persist); the `service "sidekiq" depends on undefined service "discourse"` log line is benign noise (appears verbatim in the PASSING m2r/m2rr upgrade sections too; published compose ships a dangling depends_on — see tests/discourse/compose.ccci.yml NOTE).
    So the chaos redeploy itself left the base stamp in place at this ref. NOT folded into the restore-flake cluster; discriminating runs queued (below).
  - **Old-main A/B at the m2r ref** (/root/m2-ab.sh, /root/m2-ab-logs/, results ab--oldmain/): discourse @7d53d4ec on OLD main = **L2 restore fail** == new-main m2r L2 at the same ref → restore race harness-neutral at that ref. bluesky-pds @b2d86ef on OLD main = **L0 install fail**.
  - **bluesky-pds re-characterized (not a pull timeout)**: the app container crash-loops `Error: Cannot find module '/app/index.js'` (MODULE_NOT_FOUND, Node v24.15.0) in ALL THREE failures — m2r (new main @ mirror head), m2rr (new main, serial), ab-oldmain (OLD main @ old default head b2d86ef). Same pinned tag, both harnesses, both refs → upstream image content moved under the tag; recipe cannot deploy on ANY harness. Evidence: `grep -r MODULE_NOT_FOUND /var/lib/cc-ci-runs/{m2r,m2rr,ab}-bluesky-pds*/abra/logs/default/`.
    Restructure-neutral (old==new L0).
- M2.3 in-flight proof runs (serial queue /root/m2-proof.sh + /root/m2-proof2.sh, logs /root/m2-proof-logs/, driver /root/m2-proof-logs/driver.log):
  1. **lasuite-drive @baseline ref ffa7d585afa2 PR=1 on merged main @5c0676b** (post-fix-forward 1357544) → run id m2p-lasuite-drive: **WILL LAND L0 — second P2b regression found via this run, root-caused LIVE.**
     The 1357544 best-effort path WORKED (`!!` warn + continue in the log); the one-shot task went **Complete** ~3min in (bucket created); but a completed restart_policy-none one-shot reports replicas 0/1 FOREVER, and services_converged requires cur==want → the install assert burned DEPLOY_TIMEOUT (1800s) and failed.
     Old world never saw this: setup_custom_tests.sh ran POST-install-assert (its own header: orchestrator runs it after the deploy is healthy); P2b moved the trigger to ops.py pre_install = PRE-assert.
     Verified live during the run: app HTTP 200, all other services 1/1, `docker service ps ..._minio-createbuckets` = Complete, pytest in converge loop 27+ min.
     **Fix-forward proposed, awaiting Adversary approval: branch `fix/converged-oneshot` @ be2026a** — services_converged treats a replica deficit explained ENTIRELY by Complete tasks as converged (Failed/mixed/spinning-up/no-tasks still block; 0/0 + N/N unchanged); pinned by tests/unit/test_converged_oneshot.py (7 cases). Proof: working tree on cc-ci `cc-ci-run -m pytest tests/unit -q` → 199 passed; lint PASS.
     **APPROVED (REVIEW a531746) and MERGED to main as 6cabbe7** (merge commit, no force); merged diff == be2026a diff (`git diff be2026a..main -- runner/harness/lifecycle.py tests/unit/test_converged_oneshot.py` = empty).
     Push build green: drone build **350 success** on 914c166 (branch head incl. the merge; verify on cc-ci: `docker cp :/data/database.sqlite /tmp/d.sqlite && sqlite3 /tmp/d.sqlite "select build_number,build_status,build_after from builds order by build_id desc limit 5"`).
     Post-fix re-run QUEUED: /root/m2-proof3.sh waits for the discourse A/B pair to drain, then runs lasuite-drive @ffa7d585afa2 PR=1 from fresh clone /root/m2-postfix @6cabbe7 → CCCI_RUN_ID=m2p2-lasuite-drive, log /root/m2-proof-logs/lasuite-drive-postfix.log. EXPECTED **L5** (binding condition 1 of the approval).
     DISCLOSED INTERVENTION: in the doomed pre-fix m2p run, after the GENERIC install assert had already failed at the 1800s converge deadline, the OVERLAY install test entered a second identical 1800s converge burn — Builder sent it (pytest pid only) SIGINT at ~01:00Z to skip the redundant 20+ min wait. The log therefore shows `KeyboardInterrupt` at generic.py:97 (the converge poll — the exact diagnosed line). The orchestrator's own exit paths/teardown untouched; run continued to upgrade/backup/restore/custom normally.
     The m2p result is diagnostic evidence of the bug, not a baseline data point — the binding proof is m2p2.
  2. **discourse @7ae7b0f PR=2 on merged main** (exact baseline-184 invocation) → m2p-discourse; discriminates PR=0-artifact/race vs deterministic-at-ref. Unaffected by the one-shot issue.
  3. **discourse @7ae7b0f PR=2 on OLD main** (/root/m2-oldmain) → ab-discourse-7ae7b0f-oldmain; completes the same-ref A/B the upgrade-HC1 mode is missing.
- M2.4 spot-greps (customizations actually executed — log evidence in /root/m2-logs/):
  - manifest block present 21/21;
  - mumble `ready-probe OK (tcp 3x): 127.0.0.1:64738`;
  - ghost+discourse `ccci-overlay: provided compose.ccci.yml ... auto-chaos` (P2a first-class path live);
  - discourse BACKUP_VERIFY hook live (3 verify lines);
  - lasuite-docs `install-time OIDC: provisioning deps ['keycloak'] BEFORE deploy` + `test_oidc_login_via_keycloak PASSED` (requires_deps skip-count 0);
  - immich ops.py pre_upgrade/pre_backup/pre_restore seed lines;
  - cryptpad EXTRA_ENV='' in manifest + its 4 overlays + playwright green (hook applied);
  - 19 screenshot.png across m2r-* dirs.
- Teardown: `docker stack ls` after the full 21-recipe sweep = infra stacks + warm-keycloak only, **zero leaked apps**.
- Drone→harness path: !testme on two open recipe PRs pending after the re-runs.

**Gate history: M1 CLAIMED 2026-06-10 → PASS** (branch head 858e0f5)

- WHAT: P1–P6 complete on branch `restructure/recipe-custom` (P1=472a68b, P2=8cd72fd, P3=fd02d9f, P4=29a28e2, P5=68954be, P6=da558ca, +858e0f5 manifest redaction). Working tree clean, all pushed.
- HOW (cold, from a fresh clone of the branch):
  - `cc-ci-run -m pytest tests/unit -q` → EXPECTED: **192 passed**
  - `cc-ci-run -m pytest tests/concurrency -q` → EXPECTED: **23 passed** (untouched by this plan; Builder proof run 2026-06-10 on branch head: 23 passed in 11.46s)
  - `nix develop .#lint --command scripts/lint.sh` → EXPECTED: **lint: PASS**
  - resolved-customization diff old-vs-new for all 21 recipe dirs (Adversary's own script) → EXPECTED: 0 deltas
  - adversarial review of the full diff `main..restructure/recipe-custom`
- WHERE: origin branch `restructure/recipe-custom` @ 858e0f5; baseline matrix above (M2 prep, committed pre-merge per plan).

## Current

M2 in progress: merge done (01e6d49, build 326 green); canary suite running on cc-ci; 21-recipe sweep queued behind it. Evidence lands here as steps complete.
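
Illustrative note on the `fix/converged-oneshot` rule recorded under the M2.3 proof runs: the stated behaviour (a replica deficit explained entirely by Complete tasks counts as converged; Failed/mixed/spinning-up/no-tasks still block; 0/0 and N/N unchanged) can be sketched roughly as below. This is a minimal sketch under assumptions, not the code in runner/harness/lifecycle.py: the function name `replicas_converged`, its parameters, and the idea of passing one state string per current task are all hypothetical.

```python
from typing import Sequence


def replicas_converged(current: int, wanted: int, task_states: Sequence[str]) -> bool:
    """Hypothetical stand-in for the per-service convergence rule.

    current / wanted: the replica counts a service reports (e.g. 0 and 1).
    task_states: one state string per current task of the service
                 (assumed input shape; matched case-insensitively).
    """
    # Unchanged cases: 0/0 and N/N are converged as before.
    if current == wanted:
        return True

    deficit = wanted - current
    if deficit < 0:
        # More replicas than wanted is not something this sketch explains away.
        return False

    # Tasks that are not running are the only candidates to explain the deficit.
    not_running = [s.lower() for s in task_states if s.lower() != "running"]

    # No tasks at all, or too few to account for the deficit: still blocked.
    if len(not_running) < deficit:
        return False

    # The deficit must be explained ENTIRELY by Complete tasks; any Failed,
    # Preparing, Starting, etc. task (including a mixed set) keeps blocking.
    return all(state == "complete" for state in not_running)


if __name__ == "__main__":
    # A finished restart_policy-none one-shot: 0/1 with a single Complete task.
    print(replicas_converged(0, 1, ["Complete"]))
    # The same deficit with a Failed task still blocks.
    print(replicas_converged(0, 1, ["Failed"]))
```

Under these assumptions the lasuite-drive case above (one-shot at 0/1 with its only task Complete) would be treated as converged, while a failed or still-starting one-shot would keep the install assert waiting until its deadline.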