STATUS — sub-phase rcust (recipe-customization restructure)
DONE
Phase complete 2026-06-11: M1 PASS (REVIEW-rcust.md 01f9f70, 2026-06-10) + M2 PASS (REVIEW-rcust.md
3245150, 2026-06-11) — both fresh, Adversary-verified, no standing VETO. Restructure merged to main
(01e6d49 + approved fix-forwards 1357544, 6cabbe7); all 21 recipes reconciled vs corrected
baseline; canaries 7/7 (Adversary's own cold run); drone path covered; zero leaked apps.
Non-rcust follow-ups filed in machine-docs/DEFERRED.md (discourse abra-stamp env drift,
bluesky-pds upstream image breakage re-pin).
Plan: /srv/cc-ci/cc-ci-plan/recipe-custom-restructure-full-plan.md (SSOT for this phase).
Reference spec: docs/recipe-customization.md @ 76a4b6b.
Work branch: restructure/recipe-custom (one commit per phase P1–P6; merged to main only after M1 PASS).
Phase progress
- P1: single loader + key registry + migrate L1–L6 + unit tests + doc gen (branch commit
  `472a68b`)
- P2: delete legacy keys/paths: compose.ccci.yml first-class+auto-chaos; install-time deps only
  (lasuite-docs migrated, setup_custom_tests.sh gone); SKIP_GENERIC meta deleted (env dev-only +
  loud CI warning); conftest cleanup (deployed/deployed_app/app_domain gone, one `deps` fixture)
  (branch commit `8cd72fd`)
- P3: uniform ctx hook convention: HookCtx(.domain/.base_url/.meta/.deps/.op); all hooks
  take ctx; legacy signatures raise MetaError at load naming the migration (branch `fd02d9f`)
- P4: custom-test ergonomics: placement rule (custom under functional/+playwright/ only),
  op_state fixture, deps fixture tests (branch `29a28e2`)
- P5: customization manifest: one block at run start (non-default meta keys, hooks, overlays,
  custom-test counts, active CCCI_SKIP_GENERIC* env overrides with !! CI flag) printed +
  embedded verbatim in results.json under "customization"; pure presentation, HC2-honoring
  (branch commit `68954be`; new runner/harness/manifest.py + tests/unit/test_manifest.py)
- P6: docs rewritten to the end state: recipe-customization.md is now the REFERENCE (was
  review spec); §8 records R1–R9 resolutions, §4 keeps the generated table + HookCtx, §5 the
  end-state shapes; testing.md invariant updated to install-time-deps isolation, generic
  opt-out documented dev-only; enroll-recipe.md worked examples (lasuite-docs install-time
  OIDC, mumble post-F2-14c), deps fixture, ctx signatures (branch commit `da558ca`)
- Adversary inbox 19:06Z (P5 manifest dashboard hygiene), addressed: secret-NAMED meta
  values (top-level + nested dict keys) render as '' in manifest + results.json;
  key names stay visible; unit-test pinned (branch commit `858e0f5`)
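
The P3 hook convention can be illustrated with a minimal sketch, under stated assumptions. The `HookCtx` field names come from the P3 entry; the field types, the `check_hook` helper, and the error wording are hypothetical, and `MetaError` is a local stand-in rather than the harness class.

```python
import inspect
from dataclasses import dataclass, field


class MetaError(Exception):
    """Local stand-in for the harness error raised at load time."""


@dataclass(frozen=True)
class HookCtx:
    # Field names follow the P3 entry; the types are assumptions.
    domain: str
    base_url: str
    meta: dict = field(default_factory=dict)
    deps: dict = field(default_factory=dict)
    op: str = ""


def check_hook(name, fn):
    """Reject any hook whose signature is not exactly (ctx)."""
    params = list(inspect.signature(fn).parameters)
    if params != ["ctx"]:
        raise MetaError(
            f"hook {name!r} has legacy signature ({', '.join(params)}); "
            "migrate it to take a single 'ctx' argument (HookCtx)"
        )
    return fn


def pre_install(ctx):
    # New-style hook: everything is read from ctx.
    return f"{ctx.op}:{ctx.base_url}"


def legacy_pre_install(domain, meta):
    # Old-style hook: positional arguments instead of ctx.
    return domain
```

In this sketch the check runs when hooks are registered, so a legacy hook fails before any deploy starts, which is the "raise at load" behaviour the P3 entry describes.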
P1–P6 verification facts (for the eventual M1 cold-verify)
- WHERE: branch `restructure/recipe-custom`, P1=472a68b, P2=8cd72fd, P3=fd02d9f, P4=29a28e2,
  P5=68954be, P6=da558ca, manifest-redaction fix=858e0f5 (branch head).
- HOW: `cc-ci-run -m pytest tests/unit -q` and `nix develop .#lint --command scripts/lint.sh`
  from a clean checkout of the branch.
- EXPECTED: 192 passed; `lint: PASS`.
- New single loader: `runner/harness/meta.py::load()`; all-recipes typo gate + R2 proof in
  `tests/unit/test_meta.py`; docs §4 table generated by `scripts/gen-meta-docs.py` (sync pinned
  by unit test).
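
A typo gate of the kind listed above might look like the following sketch. It is not the code in `runner/harness/meta.py`: the registry contents, the value types, and the error messages are assumptions, the key names are borrowed from elsewhere in this file, and `MetaError` is a local stand-in.

```python
import difflib


class MetaError(Exception):
    """Local stand-in for the harness error type."""


# Hypothetical registry: key name -> expected type. The real registry
# and its value types may differ.
KEY_REGISTRY = {
    "READY_PROBE": str,
    "SANDBOX_DOMAIN": str,
    "BACKUP_VERIFY": str,
    "EXTRA_ENV": dict,
    "DEPLOY_TIMEOUT": int,
}


def load(raw):
    """Validate a raw meta mapping against the registry and return a copy."""
    for key, value in raw.items():
        if key not in KEY_REGISTRY:
            # Suggest the closest registered name so a typo is easy to fix.
            close = difflib.get_close_matches(key, list(KEY_REGISTRY), n=1)
            hint = f"; did you mean {close[0]!r}?" if close else ""
            raise MetaError(f"unknown meta key {key!r}{hint}")
        expected = KEY_REGISTRY[key]
        if not isinstance(value, expected):
            raise MetaError(
                f"meta key {key!r} expects {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    return dict(raw)
```

Running one loader over every recipe directory in a unit test is what turns this into an all-recipes gate: a misspelled key fails the test suite instead of being silently ignored at run time.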
M2 baseline matrix (built BEFORE merge, per plan M2.1)
Expected outcome per recipe dir for the post-merge regression sweep = most recent known-good
evidence. Levels are results.json level; evidence = run id under /var/lib/cc-ci-runs//
(on cc-ci) unless noted. Bad canaries are EXPECTED to fail at their designed tier.
| Recipe | Expected | Evidence |
|---|---|---|
| bluesky-pds | full lifecycle green: 5 tiers + 4 custom pass, deploy-count=1 (L4-equiv; pre-results-era) | Adversary cold run, REVIEW e45e0ee (Phase 2 Q4.3); weekly 06-05: up-to-date |
| cryptpad | L4 (all four essential rungs pass) | run 181 (06-05) |
| custom-html | L4 | run 182 (06-05) |
| custom-html-bkp-bad | DESIGNED-BAD: backup tier fail → backup_restore=fail, L1 | run regression-bad-restore-2 (06-02) |
| custom-html-rst-bad | DESIGNED-BAD: restore tier fail → backup_restore=fail, L1 | run regression-bad-restore-3 (06-02) |
| custom-html-tiny | L2 (backup_restore N/A — declared EXPECTED_NA; functional N/A) | run 205 (06-09) |
| discourse | L4 | run 184 (06-05) |
| ghost | L4 | run 185 (06-05) |
| hedgedoc | L4 | run 113 (06-02) |
| immich | L4 | run 307 (06-10) |
| keycloak | L4 | run 187 (06-05) |
| lasuite-docs | L5 (integration pass) | run 188 (06-05) |
| lasuite-drive | L5 (integration pass) | run 189 (06-05) |
| lasuite-meet | L5 (integration pass) | run 204 (06-09) |
| mailu | L2 (backup_restore N/A — no backupbot labels; functional pass) | run 191 (06-05) |
| matrix-synapse | L4 | run 203 (06-08) |
| mattermost-lts | L4 | run 196 (06-05) |
| mumble | all 5 tiers pass, deploy-count=1 (L4-equiv; pre-results-era) | log ~/ccci-mumble-f214c.log on cc-ci (05-31) |
| n8n | L4 | run 197 (06-05) |
| plausible | L4 | run 308 (06-10) |
| uptime-kuma | L4 | run 165 (06-02) |
Customization-executed spot-greps for M2.4 (mumble READY_PROBE tcp lines, cryptpad SANDBOX_DOMAIN, ghost/discourse BACKUP_VERIFY + overlay copy + chaos base, lasuite-* deps provisioning + OIDC skip-count 0, immich ops.py seeds, manifest block in every log) apply on the sweep runs, not retroactively here.
Gate
Gate: M2 CLAIMED 2026-06-11 ~01:30Z, awaiting Adversary.
M2 claim — WHAT / HOW / EXPECTED / WHERE
WHAT: plan M2.0–M2.4 complete on merged main. Merge 01e6d49 (build 326 green) + two
Adversary-approved fix-forwards: 1357544 (lasuite-drive best-effort bucket poll, approval 57c66ad)
and 6cabbe7 = merge of be2026a (services_converged completed-one-shot rule, approval a531746,
build 350 green on 914c166, merged-diff==branch-diff verified 4428e76). Canaries 7/7. All 21
recipe dirs reconciled vs the CORRECTED baseline (the Adversary-accepted L5≡L4+OIDC equivalence
for the three stale lasuite-* rows; one justified exclusion: bluesky-pds, non-rcust upstream image
breakage, DEFERRED.md). Drone→harness path covered (2 PR !testme runs green). Zero leaked apps.
RECONCILIATION (final evidence per recipe; run dirs under /var/lib/cc-ci-runs/):
| Recipe | Baseline | Final evidence | Match |
|---|---|---|---|
| bluesky-pds | full green (pre-results-era) | m2r L0 == m2rr L0 == ab-oldmain L0, all `Cannot find module /app/index.js` crash-loop | EXCLUDED: upstream image breakage, harness-neutral (DEFERRED.md) |
| cryptpad | L4 | m2r-cryptpad L4 | ✓ |
| custom-html | L4 | m2r-custom-html L4 | ✓ |
| custom-html-bkp-bad | designed backup fail, L1 | m2r: backup fail exactly | ✓ |
| custom-html-rst-bad | designed restore fail, L1 | m2r: backup pass → restore fail exactly | ✓ |
| custom-html-tiny | L2 (declared EXPECTED_NA) | m2r-custom-html-tiny L2 | ✓ |
| discourse | L4 (184, 06-05) | m2r/m2b/m2p + ab-oldmain×2: ALL deviations byte-identical old==new harness (restore race @default head: L2==L2; upgrade-HC1 @baseline ref PR=2: L1==L1, stamp eb96de94+U both) | env drift since 06-05, rcust-neutral (Adversary-verified, condition 3 of a531746) |
| ghost | L4 | m2r-ghost L4 | ✓ |
| hedgedoc | L4 | m2r-hedgedoc L4 | ✓ |
| immich | L4 | m2b-immich L4 @baseline ref + drone-path run 356 L4 | ✓ |
| keycloak | L4 | m2r-keycloak L4 | ✓ |
| lasuite-docs | L5 (stale schema) | m2r-lasuite-docs L4 all-pass + OIDC PASSED skip-0 | ✓ (accepted equivalence) |
| lasuite-drive | L5 (stale schema) | m2p2-lasuite-drive L4 all-pass + OIDC + MinIO PASSED, rc=0, post-both-fixes | ✓ (accepted equivalence) |
| lasuite-meet | L5 (stale schema) | m2r-lasuite-meet L4 all-pass + OIDC PASSED | ✓ (accepted equivalence) |
| mailu | L2 | m2r-mailu L2 | ✓ |
| matrix-synapse | L4 | m2r-matrix-synapse L4 | ✓ |
| mattermost-lts | L4 | m2b-mattermost-lts L4 @baseline ref | ✓ |
| mumble | all 5 tiers (pre-results-era) | m2r-mumble all tiers pass, deploy-count=1 | ✓ |
| n8n | L4 | m2r-n8n L4 | ✓ |
| plausible | L4 | m2b-plausible L4 @baseline ref + drone-path run 357 L4 | ✓ |
| uptime-kuma | L4 | m2r-uptime-kuma L4 | ✓ |
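
The Match column for the three lasuite-* rows applies the accepted L5≡L4+OIDC equivalence. A minimal sketch of that comparison follows; the `Final` record and `matches_baseline` are hypothetical names for illustration, not harness code.

```python
from dataclasses import dataclass


@dataclass
class Final:
    level: int                 # results.json level from the post-merge run
    oidc_passed: bool = False  # requires_deps OIDC test reported PASSED
    oidc_skipped: int = 0      # requires_deps skip-count


def matches_baseline(baseline_level, final):
    """Compare a final run to its baseline under the 4-rung ladder."""
    if baseline_level != 5:
        return final.level == baseline_level
    # Stale 6-rung "L5 (integration pass)" rows: accept L4 all-rungs pass
    # plus the OIDC test passing with nothing skipped.
    return final.level == 4 and final.oidc_passed and final.oidc_skipped == 0
```

For example, `matches_baseline(5, Final(4, True, 0))` holds, while a run that skipped the OIDC test does not satisfy the equivalence even at level 4.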
HOW (cold, from the Adversary's own clone / direct on cc-ci):
- per-recipe: `jq '{recipe,level,rungs,flags}' /var/lib/cc-ci-runs/<id>/results.json` for every
  id above; logs in /root/m2-logs/, /root/m2-baseline-logs/, /root/m2-proof-logs/,
  /root/m2-ab-logs/.
- canaries: /root/m2-canary.log (7/7, fresh clone of merged main).
- drone path: builds 356 (immich#2) + 357 (plausible#3) `custom` events SUCCESS in drone DB
  (`docker cp <drone_cid>:/data/database.sqlite` + sqlite query, as documented in the gate
  history); run dirs 356/357 carry `customization` manifest keys + clean flags; triggered by
  real `!testme` comments (gitea comment ids 14317/14318).
- M2.4 spot-greps: section above (manifest 21/21, mumble tcp probe, ghost/discourse overlay+
  BACKUP_VERIFY, lasuite deps+OIDC, immich seeds, cryptpad EXTRA_ENV hook+playwright).
- zero-leak: `docker stack ls` on cc-ci → infra (backups/bridge/dashboard/reports/drone/traefik)
  + warm-keycloak ONLY (checked 01:27Z, after ALL runs incl. drone-path).
- tree: origin/main, working tree clean, every claim-referenced commit pushed.
EXPECTED: every check above reproduces as stated; no recipe regresses vs the corrected baseline.
WHERE: origin/main @ (this commit); REVIEW-rcust.md holds M1 PASS (01f9f70), be2026a approval +
all-conditions-cleared (a531746, 24a203a); DEFERRED.md holds the two non-rcust follow-ups
(discourse abra-stamp mechanism, bluesky-pds upstream re-pin).
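
The per-recipe check in the HOW list can also be scripted. The sketch below mirrors the `jq '{recipe,level,rungs,flags}'` filter in Python; it assumes each results.json holds a top-level object, and `summarize` and `summarize_all` are hypothetical helper names.

```python
import json
from pathlib import Path

RUNS_ROOT = Path("/var/lib/cc-ci-runs")  # path quoted in the HOW list


def summarize(run_id, root=RUNS_ROOT):
    """Return the four fields the jq filter selects, None where absent."""
    data = json.loads((Path(root) / run_id / "results.json").read_text())
    return {key: data.get(key) for key in ("recipe", "level", "rungs", "flags")}


def summarize_all(run_ids, root=RUNS_ROOT):
    """Map each run id to its summary, or to an error string if unreadable."""
    out = {}
    for run_id in run_ids:
        try:
            out[run_id] = summarize(run_id, root)
        except (OSError, json.JSONDecodeError) as exc:
            out[run_id] = f"unreadable: {exc.__class__.__name__}"
    return out
```

Reporting unreadable runs instead of raising keeps one missing run directory from hiding the results of the others during a cold verify.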
Gate history: M2 IN PROGRESS — M1 PASS in REVIEW-rcust.md (01f9f70, 2026-06-10).
- M2.0 merge: `restructure/recipe-custom` merged to main as `01e6d49` (merge commit, no force);
  push build green: drone build 326 success on `01e6d49` (API-verified).
- M2.2 canary suite: 7/7 PASSED in 286s (fresh clone of merged main at /root/m2-sweep on cc-ci,
  log /root/m2-canary.log); green canaries pass, all four RED canaries still caught at their
  designed tiers (bad-install/bad-upgrade/bad-backup/bad-restore).
- M2.3 per-recipe sweep (driver /root/m2-driver.sh, 2 concurrent, REF = mirror heads; logs
/root/m2-logs/.log; results /var/lib/cc-ci-runs/m2r-/): first pass 15/21 matched
baseline —
hedgedoc/custom-html/custom-html-tiny/uptime-kuma/n8n/cryptpad/ghost/keycloak/mumble/mailu/
matrix-synapse/lasuite-docs/lasuite-meet at baseline level; both DESIGNED-BAD canaries failed
at exactly their designed tier (bkp-bad: backup fail; rst-bad: backup pass→restore fail).
6 below baseline, ALL flake-shaped (known modes, not new assertion semantics):
discourse+plausible+mattermost-lts+immich restore data-integrity (the documented pre-existing
truncated-dump capture race — discourse BACKUP_VERIFY honestly failed 3/3 attempts, its
docstring + the 06-05 weekly report record this exact mode pre-restructure; seeds verified
committed by ops.py read-back asserts, i.e. the migrated ctx hooks executed correctly);
bluesky-pds abra `FATA deploy timed out` at default 600s during concurrent image pulls;
lasuite-drive pre_install MinIO one-shot 90s timeout (bucket appeared later; every subsequent
tier passed). Serial re-runs (MAX=1, /root/m2-rerun.sh, logs /root/m2-rerun-logs/, results
m2rr-/) completed 20:44Z, but ran default heads, not baseline refs (superseded by the
targeted runs below).
- M2.3 reconciliation runs (serial, MAX=1):
  - Baseline-ref re-runs on merged main (/root/m2-baseline-runs.sh, logs /root/m2-baseline-logs/,
    results m2b-/): plausible L4, mattermost-lts L4, immich L4 at their exact baseline refs;
    baseline REPRODUCED on the restructured harness; restore-race cluster closed for those three.
    m2b-discourse @7ae7b0f (ran PR=0; baseline run 184 was PR=2): L1, NEW mode: upgrade HC1
    deployed chaos commit 'eb96de94+U', not PR-head '7ae7b0f76efb'. Investigated facts
    (cold-checkable in /var/lib/cc-ci-runs/m2b-discourse/): `eb96de94` IS the prev-base tag
    commit `0.7.0+3.3.1` (`git -C .../abra/recipes/discourse rev-list -n1 0.7.0+3.3.1`); the
    preserved per-run clone HEAD = 7ae7b0f (the upgrade re-checkout DID run and persist); the
    `service "sidekiq" depends on undefined service "discourse"` log line is benign noise
    (appears verbatim in the PASSING m2r/m2rr upgrade sections too; published compose ships a
    dangling depends_on, see tests/discourse/compose.ccci.yml NOTE). So the chaos redeploy
    itself left the base stamp in place at this ref. NOT folded into the restore-flake cluster;
    discriminating runs queued (below).
  - Old-main A/B at the m2r ref (/root/m2-ab.sh, /root/m2-ab-logs/, results ab--oldmain/):
    discourse @7d53d4ec on OLD main = L2 restore fail == new-main m2r L2 at the same ref →
    restore race harness-neutral at that ref. bluesky-pds @b2d86ef on OLD main = L0 install fail.
  - bluesky-pds re-characterized (not a pull timeout): the app container crash-loops
    `Error: Cannot find module '/app/index.js'` (MODULE_NOT_FOUND, Node v24.15.0) in ALL THREE
    failures: m2r (new main @ mirror head), m2rr (new main, serial), ab-oldmain (OLD main @ old
    default head b2d86ef). Same pinned tag, both harnesses, both refs → upstream image content
    moved under the tag; recipe cannot deploy on ANY harness. Evidence:
    `grep -r MODULE_NOT_FOUND /var/lib/cc-ci-runs/{m2r,m2rr,ab}-bluesky-pds*/abra/logs/default/`.
    Restructure-neutral (old==new L0).
- M2.3 in-flight proof runs (serial queue /root/m2-proof.sh + /root/m2-proof2.sh, logs
/root/m2-proof-logs/, driver /root/m2-proof-logs/driver.log):
  - lasuite-drive @baseline ref ffa7d585afa2 PR=1 on merged main @5c0676b (post-fix-forward
    1357544) → run id m2p-lasuite-drive: WILL LAND L0; second P2b regression found via this run,
    root-caused LIVE. The `1357544` best-effort path WORKED (`!!` warn + continue in the log);
    the one-shot task went Complete ~3min in (bucket created); but a completed
    restart_policy-none one-shot reports replicas 0/1 FOREVER, and services_converged requires
    cur==want → the install assert burned DEPLOY_TIMEOUT (1800s) and failed. Old world never saw
    this: setup_custom_tests.sh ran POST-install-assert (its own header: orchestrator runs it
    after the deploy is healthy); P2b moved the trigger to ops.py pre_install = PRE-assert.
    Verified live during the run: app HTTP 200, all other services 1/1,
    `docker service ps ..._minio-createbuckets` = Complete, pytest in converge loop 27+ min.

    Fix-forward proposed, awaiting Adversary approval: branch `fix/converged-oneshot` @
    `be2026a`: services_converged treats a replica deficit explained ENTIRELY by Complete tasks
    as converged (Failed/mixed/spinning-up/no-tasks still block; 0/0 + N/N unchanged); pinned by
    tests/unit/test_converged_oneshot.py (7 cases). Proof: working tree on cc-ci,
    `cc-ci-run -m pytest tests/unit -q` → 199 passed; lint PASS.

    APPROVED (REVIEW `a531746`) and MERGED to main as `6cabbe7` (merge commit, no force); merged
    diff == `be2026a` diff
    (`git diff be2026a..main -- runner/harness/lifecycle.py tests/unit/test_converged_oneshot.py`
    = empty). Push build green: drone build 350 success on `914c166` (branch head incl. the
    merge; verify on cc-ci:
    `docker cp <drone_cid>:/data/database.sqlite /tmp/d.sqlite && sqlite3 /tmp/d.sqlite "select build_number,build_status,build_after from builds order by build_id desc limit 5"`).

    Post-fix re-run QUEUED: /root/m2-proof3.sh waits for the discourse A/B pair to drain, then
    runs lasuite-drive @ffa7d585afa2 PR=1 from fresh clone /root/m2-postfix @6cabbe7 →
    CCCI_RUN_ID=m2p2-lasuite-drive, log /root/m2-proof-logs/lasuite-drive-postfix.log. EXPECTED
    L5 (binding condition 1 of the approval).

    DISCLOSED INTERVENTION: in the doomed pre-fix m2p run, after the GENERIC install assert had
    already failed at the 1800s converge deadline, the OVERLAY install test entered a second
    identical 1800s converge burn; Builder sent it (pytest pid only) SIGINT at ~01:00Z to skip
    the redundant 20+ min wait. The log therefore shows `KeyboardInterrupt` at generic.py:97
    (the converge poll, the exact diagnosed line). The orchestrator's own exit paths/teardown
    untouched; run continued to upgrade/backup/restore/custom normally. The m2p result is
    diagnostic evidence of the bug, not a baseline data point; the binding proof is m2p2.
  - discourse @7ae7b0f PR=2 on merged main (exact baseline-184 invocation) → m2p-discourse:
    COMPLETE. L2, upgrade HC1 fail, chaos-version=eb96de94+U (identical to m2b: stamp = the
    prev-base tag commit). Deterministic at this ref on new main; NOT a PR=0 artifact, NOT a
    race. install/backup/restore/custom all pass.
  - discourse @7ae7b0f PR=2 on OLD main → ab-discourse-7ae7b0f-oldmain: COMPLETE. L2,
    upgrade HC1 fail, chaos-version=eb96de94+U; BYTE-IDENTICAL failure to the new-main run.
    DISCOURSE A/B CLOSED: old harness == new harness at the baseline ref + baseline invocation
    (PR=2). The upgrade-HC1 mode is HARNESS-NEUTRAL, not an rcust regression. Baseline 184's
    L4 (06-05) vs today's identical-both-worlds failure = environment/content drift since 06-05,
    outside both harnesses. Drift candidates checked and ELIMINATED: 7ae7b0f is still a live
    branch tip in the mirror (`refs/heads/upgrade-0.8.0+3.5.0` + `refs/pull/2/head`, per
    git ls-remote), and upstream's latest release tag is unchanged (0.7.0+3.3.1 = eb96de94, no
    new tag since 06-05). flake.lock (abra pin) identical in both worlds. HC1 firing rather than
    false-greening is the guard working as designed. Cold-verify: results.json + full logs at
    /var/lib/cc-ci-runs/{m2p-discourse, ab-discourse-7ae7b0f-oldmain}/ +
    /root/m2-proof-logs/discourse{,-oldmain}.log.
  - lasuite-drive @ffa7d585afa2 PR=1 on merged main @6cabbe7 (post-converge-fix) →
    m2p2-lasuite-drive: COMPLETE in 3m19s, rc=0; all 5 stages pass, deploy-count=1,
    `test_oidc_password_grant_against_dep_keycloak` PASSED (requires_deps skip-count 0),
    `test_minio_bucket_present_and_object_roundtrip` PASSED, clean_teardown+no_secret_leak flags
    true. NO converge burn: the one-shot again exceeded its 90s window (`!!` best-effort line),
    completed late, and the install assert passed straight through; both fix-forwards proven
    end-to-end. results.json `level=4`, NOT 5; see schema note below.
- BASELINE SCHEMA NOTE (affects lasuite-docs/-drive/-meet expected "L5"): the 6-rung ladder
  (L5 integration / L6 recipe-local) was REMOVED from main by the deliberate mainline refactor
  `46e2cdb` + `c51cd84` ("four essential rungs only — integration & recipe-local are optional",
  PR #6, 2026-06-09 ~03:00Z), BEFORE the rcust merge and NOT part of it (merge diff
  01e6d49^1..01e6d49 touches level.py not at all and results.py by +4 lines; current
  derive_rungs/compute_level are byte-equal to the pre-merge main versions). Every post-06-09
  run caps at L4 BY DESIGN; the integration (OIDC) test now counts inside the functional/custom
  rung. Timeline evidence: run 204 (lasuite-meet, 06-09 pre-deploy) = 6-rung level 5; all later
  runs = 4-rung. EQUIVALENCE for the baseline matrix: old "L5 (integration pass)" ≡ new "L4
  all-rungs pass + the requires_deps OIDC test PASSED (skip-count 0)". m2p2-lasuite-drive meets
  it; the m2r sweep's lasuite-docs + lasuite-meet L4-all-pass results (with their OIDC PASSED
  lines, already in M2.4 spot-greps) meet it identically.
- M2.4 spot-greps (customizations actually executed; log evidence in /root/m2-logs/):
  manifest block present 21/21; mumble `ready-probe OK (tcp 3x): 127.0.0.1:64738`;
  ghost+discourse `ccci-overlay: provided compose.ccci.yml ... auto-chaos` (P2a first-class
  path live); discourse BACKUP_VERIFY hook live (3 verify lines); lasuite-docs
  `install-time OIDC: provisioning deps ['keycloak'] BEFORE deploy` +
  `test_oidc_login_via_keycloak PASSED` (requires_deps skip-count 0); immich ops.py
  pre_upgrade/pre_backup/pre_restore seed lines; cryptpad EXTRA_ENV='' in manifest + its 4
  overlays + playwright green (hook applied); 19 screenshot.png across m2r-* dirs.
- Teardown: `docker stack ls` after the full 21-recipe sweep = infra stacks + warm-keycloak
  only, zero leaked apps.
- Drone→harness path: !testme on two open recipe PRs pending after the re-runs.
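
The converged-one-shot rule from fix-forward `be2026a` can be sketched as follows. This is an illustration of the rule as worded in the log, not the code in runner/harness/lifecycle.py: the input shape (service name mapped to current replicas, wanted replicas, and a list of task states) is an assumption.

```python
def services_converged(services):
    """Return True when every service counts as converged.

    `services` maps a service name to (current, wanted, task_states), where
    task_states lists the states of that service's tasks. This input shape
    is an assumption of the sketch, not the harness data model.
    """
    for current, wanted, task_states in services.values():
        if current == wanted:
            continue  # 0/0 and N/N behave as before
        deficit = wanted - current
        completed = sum(1 for state in task_states if state == "Complete")
        all_complete = bool(task_states) and completed == len(task_states)
        if deficit > 0 and all_complete and completed >= deficit:
            continue  # deficit explained entirely by finished one-shots
        return False  # Failed, mixed, spinning-up, or no tasks at all
    return True
```

Under this reading, a restart_policy-none one-shot at 0/1 whose only task is Complete no longer blocks the install assert, while a Failed or still-starting task keeps blocking it.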
Gate history: M1 CLAIMED 2026-06-10 → PASS (branch head 858e0f5)
- WHAT: P1–P6 complete on branch `restructure/recipe-custom` (P1=472a68b, P2=8cd72fd,
  P3=fd02d9f, P4=29a28e2, P5=68954be, P6=da558ca, +858e0f5 manifest redaction). Working tree
  clean, all pushed.
- HOW (cold, from a fresh clone of the branch):
  - `cc-ci-run -m pytest tests/unit -q` → EXPECTED: 192 passed
  - `cc-ci-run -m pytest tests/concurrency -q` → EXPECTED: 23 passed (untouched by this plan;
    Builder proof run 2026-06-10 on branch head: 23 passed in 11.46s)
  - `nix develop .#lint --command scripts/lint.sh` → EXPECTED: lint: PASS
  - resolved-customization diff old-vs-new for all 21 recipe dirs (Adversary's own script) →
    EXPECTED: 0 deltas
  - adversarial review of the full diff `main..restructure/recipe-custom`
- WHERE: origin branch `restructure/recipe-custom` @ 858e0f5; baseline matrix above (M2 prep,
  committed pre-merge per plan).
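
The manifest redaction included in the M1 claim (858e0f5) blanks the values of secret-named meta keys while leaving key names visible, including in nested dicts. A rough sketch of that behaviour follows; the name pattern and the `redact` helper are hypothetical, and the placeholder defaults to the empty string only because that is how the entry reads in this file.

```python
import re

# Hypothetical name pattern; the harness's real rule for a "secret-named"
# key is not shown in this file.
SECRET_NAME = re.compile(r"(secret|password|passwd|token|api_?key)", re.IGNORECASE)


def redact(meta, placeholder=""):
    """Return a copy of meta with values of secret-named keys blanked.

    Key names stay visible; nested dicts are handled recursively.
    """
    out = {}
    for key, value in meta.items():
        if SECRET_NAME.search(str(key)):
            out[key] = placeholder
        elif isinstance(value, dict):
            out[key] = redact(value, placeholder)
        else:
            out[key] = value
    return out
```

Because the function returns a copy, the same redacted mapping can feed both the printed manifest and the "customization" block in results.json without touching the live meta.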
Current
M2 CLAIMED (see Gate above) — awaiting Adversary cold-verify. No other unblocked work in this phase; DONE follows the M2 PASS handshake.