cc-ci/STATUS-rcust.md
autonomic-bot 1bcb2ed8fe
status(rcust): ## DONE — M1 (01f9f70) + M2 (3245150) both PASS, no VETO; phase complete
2026-06-11 01:16:27 +00:00


STATUS — sub-phase rcust (recipe-customization restructure)

DONE

Phase complete 2026-06-11: M1 PASS (REVIEW-rcust.md 01f9f70, 2026-06-10) + M2 PASS (REVIEW-rcust.md 3245150, 2026-06-11) — both fresh, Adversary-verified, no standing VETO. Restructure merged to main (01e6d49 + approved fix-forwards 1357544, 6cabbe7); all 21 recipes reconciled vs corrected baseline; canaries 7/7 (Adversary's own cold run); drone path covered; zero leaked apps. Non-rcust follow-ups filed in machine-docs/DEFERRED.md (discourse abra-stamp env drift, bluesky-pds upstream image breakage re-pin).

Plan: /srv/cc-ci/cc-ci-plan/recipe-custom-restructure-full-plan.md (SSOT for this phase). Reference spec: docs/recipe-customization.md @ 76a4b6b. Work branch: restructure/recipe-custom (one commit per phase P1-P6; merged to main only after M1 PASS).

Phase progress

  • P1 — single loader + key registry + migrate L1-L6 + unit tests + doc gen (branch commit 472a68b)
  • P2 — delete legacy keys/paths: compose.ccci.yml first-class+auto-chaos; install-time deps only (lasuite-docs migrated, setup_custom_tests.sh gone); SKIP_GENERIC meta deleted (env dev-only + loud CI warning); conftest cleanup (deployed/deployed_app/app_domain gone, one deps fixture) (branch commit 8cd72fd)
  • P3 — uniform ctx hook convention: HookCtx(.domain/.base_url/.meta/.deps/.op); all hooks take ctx; legacy signatures raise MetaError at load naming the migration (branch fd02d9f)
  • P4 — custom-test ergonomics: placement rule (custom under functional/+playwright/ only), op_state fixture, deps fixture tests (branch 29a28e2)
  • P5 — customization manifest: one block at run start (non-default meta keys, hooks, overlays, custom-test counts, active CCCI_SKIP_GENERIC* env overrides with !! CI flag) printed + embedded verbatim in results.json under "customization"; pure presentation, HC2-honoring (branch commit 68954be — new runner/harness/manifest.py + tests/unit/test_manifest.py)
  • P6 — docs rewritten to the end state: recipe-customization.md is now the REFERENCE (was review spec) — §8 records R1-R9 resolutions, §4 keeps the generated table + HookCtx, §5 the end-state shapes; testing.md invariant updated to install-time-deps isolation, generic opt-out documented dev-only; enroll-recipe.md worked examples (lasuite-docs install-time OIDC, mumble post-F2-14c), deps fixture, ctx signatures (branch commit da558ca)
  • Adversary inbox 19:06Z (P5 manifest dashboard hygiene) — addressed: secret-NAMED meta values (top-level + nested dict keys) render as '' in manifest + results.json; key names stay visible; unit-test pinned (branch commit 858e0f5)
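
The P3 hook convention in the list above can be pictured with a small sketch. Only the attribute names (`domain`, `base_url`, `meta`, `deps`, `op`) and the rule that legacy signatures fail at load time come from this document; the field types, the `MetaError` stand-in, the `check_hook_signature` helper and both example hooks are hypothetical and are not the code in the repository.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Callable


class MetaError(Exception):
    """Stand-in for the loader's error type (assumed definition)."""


@dataclass(frozen=True)
class HookCtx:
    # Attribute names follow the P3 bullet; the types are assumptions.
    domain: str
    base_url: str
    meta: dict[str, Any] = field(default_factory=dict)
    deps: dict[str, Any] = field(default_factory=dict)
    op: str = ""


def check_hook_signature(name: str, hook: Callable[..., Any]) -> None:
    """Reject any hook that does not take exactly one parameter named ctx."""
    params = list(inspect.signature(hook).parameters)
    if params != ["ctx"]:
        raise MetaError(
            f"hook {name!r} has legacy signature ({', '.join(params)}); "
            "migrate it to take a single 'ctx' argument"
        )


def pre_backup(ctx: HookCtx) -> str:
    # New-style hook: everything it needs arrives through ctx.
    return f"{ctx.op}: seeding via {ctx.base_url}"


def legacy_pre_backup(domain, meta):
    # Old-style hook, kept only to show the load-time rejection.
    return domain
```

Checking the signature when the recipe is loaded, rather than when the hook is first called, means a recipe that was not migrated fails before any deploy starts.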

P1-P6 verification facts (for the eventual M1 cold-verify)

  • WHERE: branch restructure/recipe-custom, P1=472a68b, P2=8cd72fd, P3=fd02d9f, P4=29a28e2, P5=68954be, P6=da558ca, manifest-redaction fix=858e0f5 (branch head).
  • HOW: cc-ci-run -m pytest tests/unit -q and nix develop .#lint --command scripts/lint.sh from a clean checkout of the branch.
  • EXPECTED: 192 passed; lint: PASS.
  • New single loader: runner/harness/meta.py::load(); all-recipes typo gate + R2 proof in tests/unit/test_meta.py; docs §4 table generated by scripts/gen-meta-docs.py (sync pinned by unit test).
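
A typo gate of the kind mentioned for the single loader could look roughly like the sketch below. The registry contents are an assumption (the names are taken from elsewhere in this document, and the real key set lives in runner/harness/meta.py); `validate_keys` and the `MetaError` stand-in are hypothetical names used only for illustration.

```python
import difflib


class MetaError(Exception):
    """Stand-in for the loader's error type (assumed definition)."""


# Hypothetical registry; the authoritative key set is in runner/harness/meta.py.
KNOWN_KEYS = {"READY_PROBE", "BACKUP_VERIFY", "EXTRA_ENV", "EXPECTED_NA", "SANDBOX_DOMAIN"}


def validate_keys(recipe: str, meta: dict) -> None:
    """Fail on the first key not in the registry, suggesting the closest match."""
    for key in sorted(meta):
        if key in KNOWN_KEYS:
            continue
        close = difflib.get_close_matches(key, sorted(KNOWN_KEYS), n=1)
        hint = f" (did you mean {close[0]!r}?)" if close else ""
        raise MetaError(f"{recipe}: unknown meta key {key!r}{hint}")
```

Running such a check over every recipe dir in a unit test is one way to turn a silently ignored misspelling into a load-time failure.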

M2 baseline matrix (built BEFORE merge, per plan M2.1)

Expected outcome per recipe dir for the post-merge regression sweep = most recent known-good evidence. Levels are results.json level; evidence = run id under /var/lib/cc-ci-runs// (on cc-ci) unless noted. Bad canaries are EXPECTED to fail at their designed tier.

| Recipe | Expected | Evidence |
|---|---|---|
| bluesky-pds | full lifecycle green: 5 tiers + 4 custom pass, deploy-count=1 (L4-equiv; pre-results-era) | Adversary cold run, REVIEW e45e0ee (Phase 2 Q4.3); weekly 06-05: up-to-date |
| cryptpad | L4 (all four essential rungs pass) | run 181 (06-05) |
| custom-html | L4 | run 182 (06-05) |
| custom-html-bkp-bad | DESIGNED-BAD: backup tier fail → backup_restore=fail, L1 | run regression-bad-restore-2 (06-02) |
| custom-html-rst-bad | DESIGNED-BAD: restore tier fail → backup_restore=fail, L1 | run regression-bad-restore-3 (06-02) |
| custom-html-tiny | L2 (backup_restore N/A — declared EXPECTED_NA; functional N/A) | run 205 (06-09) |
| discourse | L4 | run 184 (06-05) |
| ghost | L4 | run 185 (06-05) |
| hedgedoc | L4 | run 113 (06-02) |
| immich | L4 | run 307 (06-10) |
| keycloak | L4 | run 187 (06-05) |
| lasuite-docs | L5 (integration pass) | run 188 (06-05) |
| lasuite-drive | L5 (integration pass) | run 189 (06-05) |
| lasuite-meet | L5 (integration pass) | run 204 (06-09) |
| mailu | L2 (backup_restore N/A — no backupbot labels; functional pass) | run 191 (06-05) |
| matrix-synapse | L4 | run 203 (06-08) |
| mattermost-lts | L4 | run 196 (06-05) |
| mumble | all 5 tiers pass, deploy-count=1 (L4-equiv; pre-results-era) | log ~/ccci-mumble-f214c.log on cc-ci (05-31) |
| n8n | L4 | run 197 (06-05) |
| plausible | L4 | run 308 (06-10) |
| uptime-kuma | L4 | run 165 (06-02) |

Customization-executed spot-greps for M2.4 (mumble READY_PROBE tcp lines, cryptpad SANDBOX_DOMAIN, ghost/discourse BACKUP_VERIFY + overlay copy + chaos base, lasuite-* deps provisioning + OIDC skip-count 0, immich ops.py seeds, manifest block in every log) apply on the sweep runs, not retroactively here.

Gate

Gate: M2 CLAIMED 2026-06-11 ~01:30Z, awaiting Adversary.

M2 claim — WHAT / HOW / EXPECTED / WHERE

WHAT: plan M2.0-M2.4 complete on merged main. Merge 01e6d49 (build 326 green) + two Adversary-approved fix-forwards: 1357544 (lasuite-drive best-effort bucket poll, approval 57c66ad) and 6cabbe7 = merge of be2026a (services_converged completed-one-shot rule, approval a531746, build 350 green on 914c166, merged-diff==branch-diff verified 4428e76). Canaries 7/7. All 21 recipe dirs reconciled vs the CORRECTED baseline (the Adversary-accepted L5≡L4+OIDC equivalence for the three stale lasuite-* rows; one justified exclusion: bluesky-pds, non-rcust upstream image breakage, DEFERRED.md). Drone→harness path covered (2 PR !testme runs green). Zero leaked apps.

RECONCILIATION (final evidence per recipe; run dirs under /var/lib/cc-ci-runs/):

| Recipe | Baseline | Final evidence | Match |
|---|---|---|---|
| bluesky-pds | full green (pre-results-era) | m2r L0 == m2rr L0 == ab-oldmain L0, all Cannot find module /app/index.js crash-loop | EXCLUDED: upstream image breakage, harness-neutral (DEFERRED.md) |
| cryptpad | L4 | m2r-cryptpad L4 | |
| custom-html | L4 | m2r-custom-html L4 | |
| custom-html-bkp-bad | designed backup fail, L1 | m2r: backup fail exactly | |
| custom-html-rst-bad | designed restore fail, L1 | m2r: backup pass → restore fail exactly | |
| custom-html-tiny | L2 (declared EXPECTED_NA) | m2r-custom-html-tiny L2 | |
| discourse | L4 (184, 06-05) | m2r/m2b/m2p + ab-oldmain×2: ALL deviations byte-identical old==new harness (restore race @default head: L2==L2; upgrade-HC1 @baseline ref PR=2: L1==L1, stamp eb96de94+U both) | env drift since 06-05, rcust-neutral (Adversary-verified, condition 3 of a531746) |
| ghost | L4 | m2r-ghost L4 | |
| hedgedoc | L4 | m2r-hedgedoc L4 | |
| immich | L4 | m2b-immich L4 @baseline ref + drone-path run 356 L4 | |
| keycloak | L4 | m2r-keycloak L4 | |
| lasuite-docs | L5 (stale schema) | m2r-lasuite-docs L4 all-pass + OIDC PASSED skip-0 | ✓ (accepted equivalence) |
| lasuite-drive | L5 (stale schema) | m2p2-lasuite-drive L4 all-pass + OIDC + MinIO PASSED, rc=0, post-both-fixes | ✓ (accepted equivalence) |
| lasuite-meet | L5 (stale schema) | m2r-lasuite-meet L4 all-pass + OIDC PASSED | ✓ (accepted equivalence) |
| mailu | L2 | m2r-mailu L2 | |
| matrix-synapse | L4 | m2r-matrix-synapse L4 | |
| mattermost-lts | L4 | m2b-mattermost-lts L4 @baseline ref | |
| mumble | all 5 tiers (pre-results-era) | m2r-mumble all tiers pass, deploy-count=1 | |
| n8n | L4 | m2r-n8n L4 | |
| plausible | L4 | m2b-plausible L4 @baseline ref + drone-path run 357 L4 | |
| uptime-kuma | L4 | m2r-uptime-kuma L4 | |
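
The "accepted equivalence" used for the three lasuite-* rows can be stated as a small predicate. This is a sketch under assumptions: the function name and its parameters are hypothetical, and it encodes only the rule recorded in this file (old "L5 (integration pass)" is met by a 4-rung L4 result whose requires_deps OIDC test passed with skip-count 0).

```python
def matches_baseline(expected: str, level: int, oidc_passed: bool, oidc_skipped: int) -> bool:
    """Compare a 4-rung result against a baseline-matrix expectation.

    expected is a label such as "L2", "L4" or "L5"; "L5" comes from the
    removed 6-rung ladder and is mapped through the accepted equivalence.
    """
    if expected == "L5":
        # Old L5 is met by L4 with the OIDC test passed and not skipped.
        return level == 4 and oidc_passed and oidc_skipped == 0
    # For every other label the numeric level must match exactly.
    return level == int(expected.lstrip("L"))
```

For example, `matches_baseline("L5", 4, True, 0)` holds, while the same result with one skipped OIDC test does not.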

HOW (cold, from the Adversary's own clone / direct on cc-ci):

  • per-recipe: jq '{recipe,level,rungs,flags}' /var/lib/cc-ci-runs/<id>/results.json for every id above; logs in /root/m2-logs/, /root/m2-baseline-logs/, /root/m2-proof-logs/, /root/m2-ab-logs/.
  • canaries: /root/m2-canary.log (7/7, fresh clone of merged main).
  • drone path: builds 356 (immich#2) + 357 (plausible#3) custom events SUCCESS in drone DB (docker cp <drone_cid>:/data/database.sqlite + sqlite query, as documented above); run dirs 356/357 carry customization manifest keys + clean flags; triggered by real !testme comments (gitea comment ids 14317/14318).
  • M2.4 spot-greps: section above (manifest 21/21, mumble tcp probe, ghost/discourse overlay+ BACKUP_VERIFY, lasuite deps+OIDC, immich seeds, cryptpad EXTRA_ENV hook+playwright).
  • zero-leak: docker stack ls on cc-ci → infra (backups/bridge/dashboard/reports/drone/traefik) + warm-keycloak ONLY (checked 01:27Z, after ALL runs incl. drone-path).
  • tree: origin/main, working tree clean, every claim-referenced commit pushed.
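
The per-recipe check in the list above is a jq one-liner per run id; for a batch comparison, a sketch along the following lines might be convenient. It assumes only what the jq filter implies, namely that each results.json carries a top-level `level` field; the helper names and the shape of the `expected` mapping are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def load_levels(runs_root: Path, run_ids: list[str]) -> dict[str, Optional[int]]:
    """Return {run_id: level} from each run's results.json; None if the file is missing."""
    levels: dict[str, Optional[int]] = {}
    for run_id in run_ids:
        path = runs_root / run_id / "results.json"
        if not path.is_file():
            levels[run_id] = None
            continue
        levels[run_id] = json.loads(path.read_text())["level"]
    return levels


def mismatches(levels: dict[str, Optional[int]], expected: dict[str, int]) -> list[str]:
    """List the run ids whose recorded level differs from the expected level."""
    return sorted(rid for rid, want in expected.items() if levels.get(rid) != want)
```

On cc-ci this would be called with `Path("/var/lib/cc-ci-runs")` and the run ids from the reconciliation table; an empty list from `mismatches` means every checked run sits at its expected level.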

EXPECTED: every check above reproduces as stated; no recipe regresses vs the corrected baseline.

WHERE: origin/main @ (this commit); REVIEW-rcust.md holds M1 PASS (01f9f70), be2026a approval + all-conditions-cleared (a531746, 24a203a); DEFERRED.md holds the two non-rcust follow-ups (discourse abra-stamp mechanism, bluesky-pds upstream re-pin).

Gate history: M2 IN PROGRESS — M1 PASS in REVIEW-rcust.md (01f9f70, 2026-06-10).

  • M2.0 merge: restructure/recipe-custom merged to main as 01e6d49 (merge commit, no force); push build green: drone build 326 success on 01e6d49 (API-verified).
  • M2.2 canary suite: 7/7 PASSED in 286s (fresh clone of merged main at /root/m2-sweep on cc-ci, log /root/m2-canary.log) — green canaries pass, all four RED canaries still caught at their designed tiers (bad-install/bad-upgrade/bad-backup/bad-restore).
  • M2.3 per-recipe sweep (driver /root/m2-driver.sh, 2 concurrent, REF = mirror heads; logs /root/m2-logs/.log; results /var/lib/cc-ci-runs/m2r-/): first pass 15/21 matched baseline — hedgedoc/custom-html/custom-html-tiny/uptime-kuma/n8n/cryptpad/ghost/keycloak/mumble/mailu/matrix-synapse/lasuite-docs/lasuite-meet at baseline level; both DESIGNED-BAD canaries failed at exactly their designed tier (bkp-bad: backup fail; rst-bad: backup pass→restore fail). 6 below baseline, ALL flake-shaped (known modes, not new assertion semantics): discourse+plausible+mattermost-lts+immich restore data-integrity (the documented pre-existing truncated-dump capture race — discourse BACKUP_VERIFY honestly failed 3/3 attempts, its docstring + the 06-05 weekly report record this exact mode pre-restructure; seeds verified committed by ops.py read-back asserts, i.e. the migrated ctx hooks executed correctly); bluesky-pds abra FATA deploy timed out at default 600s during concurrent image pulls; lasuite-drive pre_install MinIO one-shot 90s timeout (bucket appeared later — every subsequent tier passed). Serial re-runs (MAX=1, /root/m2-rerun.sh, logs /root/m2-rerun-logs/, results m2rr-/) completed 20:44Z — but ran default heads, not baseline refs (superseded by the targeted runs below).
  • M2.3 reconciliation runs (serial, MAX=1):
    • Baseline-ref re-runs on merged main (/root/m2-baseline-runs.sh, logs /root/m2-baseline-logs/, results m2b-/): plausible L4, mattermost-lts L4, immich L4 at their exact baseline refs — baseline REPRODUCED on the restructured harness; restore-race cluster closed for those three. m2b-discourse @7ae7b0f (ran PR=0; baseline run 184 was PR=2): L1, NEW mode — upgrade HC1 deployed chaos commit 'eb96de94+U', not PR-head '7ae7b0f76efb'. Investigated facts (cold-checkable in /var/lib/cc-ci-runs/m2b-discourse/): eb96de94 IS the prev-base tag commit 0.7.0+3.3.1 (git -C .../abra/recipes/discourse rev-list -n1 0.7.0+3.3.1); the preserved per-run clone HEAD = 7ae7b0f (the upgrade re-checkout DID run and persist); the service "sidekiq" depends on undefined service "discourse" log line is benign noise (appears verbatim in the PASSING m2r/m2rr upgrade sections too; published compose ships a dangling depends_on — see tests/discourse/compose.ccci.yml NOTE). So the chaos redeploy itself left the base stamp in place at this ref. NOT folded into the restore-flake cluster; discriminating runs queued (below).
    • Old-main A/B at the m2r ref (/root/m2-ab.sh, /root/m2-ab-logs/, results ab--oldmain/): discourse @7d53d4ec on OLD main = L2 restore fail == new-main m2r L2 at the same ref → restore race harness-neutral at that ref. bluesky-pds @b2d86ef on OLD main = L0 install fail.
    • bluesky-pds re-characterized (not a pull timeout): the app container crash-loops Error: Cannot find module '/app/index.js' (MODULE_NOT_FOUND, Node v24.15.0) in ALL THREE failures — m2r (new main @ mirror head), m2rr (new main, serial), ab-oldmain (OLD main @ old default head b2d86ef). Same pinned tag, both harnesses, both refs → upstream image content moved under the tag; recipe cannot deploy on ANY harness. Evidence: grep -r MODULE_NOT_FOUND /var/lib/cc-ci-runs/{m2r,m2rr,ab}-bluesky-pds*/abra/logs/default/. Restructure-neutral (old==new L0).
  • M2.3 in-flight proof runs (serial queue /root/m2-proof.sh + /root/m2-proof2.sh, logs /root/m2-proof-logs/, driver /root/m2-proof-logs/driver.log):
    1. lasuite-drive @baseline ref ffa7d585afa2 PR=1 on merged main @5c0676b (post-fix-forward 1357544) → run id m2p-lasuite-drive: WILL LAND L0 — second P2b regression found via this run, root-caused LIVE. The 1357544 best-effort path WORKED (!! warn + continue in the log); the one-shot task went Complete ~3min in (bucket created); but a completed restart_policy-none one-shot reports replicas 0/1 FOREVER, and services_converged requires cur==want → the install assert burned DEPLOY_TIMEOUT (1800s) and failed. Old world never saw this: setup_custom_tests.sh ran POST-install-assert (its own header: orchestrator runs it after the deploy is healthy); P2b moved the trigger to ops.py pre_install = PRE-assert. Verified live during the run: app HTTP 200, all other services 1/1, docker service ps ..._minio-createbuckets = Complete, pytest in converge loop 27+ min. Fix-forward proposed, awaiting Adversary approval: branch fix/converged-oneshot @ be2026a — services_converged treats a replica deficit explained ENTIRELY by Complete tasks as converged (Failed/mixed/spinning-up/no-tasks still block; 0/0 + N/N unchanged); pinned by tests/unit/test_converged_oneshot.py (7 cases). Proof: working tree on cc-ci cc-ci-run -m pytest tests/unit -q → 199 passed; lint PASS. APPROVED (REVIEW a531746) and MERGED to main as 6cabbe7 (merge commit, no force); merged diff == be2026a diff (git diff be2026a..main -- runner/harness/lifecycle.py tests/unit/test_converged_oneshot.py = empty). Push build green: drone build 350 success on 914c166 (branch head incl. the merge; verify on cc-ci: docker cp <drone_cid>:/data/database.sqlite /tmp/d.sqlite && sqlite3 /tmp/d.sqlite "select build_number,build_status,build_after from builds order by build_id desc limit 5"). 
Post-fix re-run QUEUED: /root/m2-proof3.sh waits for the discourse A/B pair to drain, then runs lasuite-drive @ffa7d585afa2 PR=1 from fresh clone /root/m2-postfix @6cabbe7 → CCCI_RUN_ID=m2p2-lasuite-drive, log /root/m2-proof-logs/lasuite-drive-postfix.log. EXPECTED L5 (binding condition 1 of the approval). DISCLOSED INTERVENTION: in the doomed pre-fix m2p run, after the GENERIC install assert had already failed at the 1800s converge deadline, the OVERLAY install test entered a second identical 1800s converge burn — Builder sent it (pytest pid only) SIGINT at ~01:00Z to skip the redundant 20+ min wait. The log therefore shows KeyboardInterrupt at generic.py:97 (the converge poll — the exact diagnosed line). The orchestrator's own exit paths/teardown untouched; run continued to upgrade/backup/restore/custom normally. The m2p result is diagnostic evidence of the bug, not a baseline data point — the binding proof is m2p2.
    2. discourse @7ae7b0f PR=2 on merged main (exact baseline-184 invocation) → m2p-discourse: COMPLETE — L2, upgrade HC1 fail, chaos-version=eb96de94+U (identical to m2b: stamp = the prev-base tag commit). Deterministic at this ref on new main; NOT a PR=0 artifact, NOT a race. install/backup/restore/custom all pass.
    3. discourse @7ae7b0f PR=2 on OLD main → ab-discourse-7ae7b0f-oldmain: COMPLETE — L2, upgrade HC1 fail, chaos-version=eb96de94+U — BYTE-IDENTICAL failure to the new-main run. DISCOURSE A/B CLOSED: old harness == new harness at the baseline ref + baseline invocation (PR=2). The upgrade-HC1 mode is HARNESS-NEUTRAL — not an rcust regression. Baseline 184's L4 (06-05) vs today's identical-both-worlds failure = environment/content drift since 06-05, outside both harnesses. Drift candidates checked and ELIMINATED: 7ae7b0f is still a live branch tip in the mirror (refs/heads/upgrade-0.8.0+3.5.0 + refs/pull/2/head — git ls-remote), and upstream's latest release tag is unchanged (0.7.0+3.3.1 = eb96de94, no new tag since 06-05). flake.lock (abra pin) identical in both worlds. HC1 firing rather than false-greening is the guard working as designed. Cold-verify: results.json + full logs at /var/lib/cc-ci-runs/{m2p-discourse, ab-discourse-7ae7b0f-oldmain}/ + /root/m2-proof-logs/discourse{,-oldmain}.log.
    4. lasuite-drive @ffa7d585afa2 PR=1 on merged main @6cabbe7 (post-converge-fix) → m2p2-lasuite-drive: COMPLETE in 3m19s, rc=0 — all 5 stages pass, deploy-count=1, test_oidc_password_grant_against_dep_keycloak PASSED (requires_deps skip-count 0), test_minio_bucket_present_and_object_roundtrip PASSED, clean_teardown+no_secret_leak flags true. NO converge burn: the one-shot again exceeded its 90s window (!! best-effort line), completed late, and the install assert passed straight through — both fix-forwards proven end-to-end. results.json level=4, NOT 5 — see schema note below.
  • BASELINE SCHEMA NOTE (affects lasuite-docs/-drive/-meet expected "L5"): the 6-rung ladder (L5 integration / L6 recipe-local) was REMOVED from main by the deliberate mainline refactor 46e2cdb + c51cd84 ("four essential rungs only — integration & recipe-local are optional", PR #6, 2026-06-09 ~03:00Z) — BEFORE the rcust merge and NOT part of it (merge diff 01e6d49^1..01e6d49 touches level.py not at all and results.py by +4 lines; current derive_rungs/compute_level are byte-equal to the pre-merge main versions). Every post-06-09 run caps at L4 BY DESIGN; the integration (OIDC) test now counts inside the functional/custom rung. Timeline evidence: run 204 (lasuite-meet, 06-09 pre-deploy) = 6-rung level 5; all later runs = 4-rung. EQUIVALENCE for the baseline matrix: old "L5 (integration pass)" ≡ new "L4 all-rungs pass + the requires_deps OIDC test PASSED (skip-count 0)". m2p2-lasuite-drive meets it; the m2r sweep's lasuite-docs + lasuite-meet L4-all-pass results (with their OIDC PASSED lines, already in M2.4 spot-greps) meet it identically.
  • M2.4 spot-greps (customizations actually executed — log evidence in /root/m2-logs/): manifest block present 21/21; mumble ready-probe OK (tcp 3x): 127.0.0.1:64738; ghost+discourse ccci-overlay: provided compose.ccci.yml ... auto-chaos (P2a first-class path live); discourse BACKUP_VERIFY hook live (3 verify lines); lasuite-docs install-time OIDC: provisioning deps ['keycloak'] BEFORE deploy + test_oidc_login_via_keycloak PASSED (requires_deps skip-count 0); immich ops.py pre_upgrade/pre_backup/pre_restore seed lines; cryptpad EXTRA_ENV='' in manifest + its 4 overlays + playwright green (hook applied); 19 screenshot.png across m2r-* dirs.
  • Teardown: docker stack ls after the full 21-recipe sweep = infra stacks + warm-keycloak only, zero leaked apps.
  • Drone→harness path: !testme on two open recipe PRs pending after the re-runs.
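
The completed-one-shot rule from the be2026a fix-forward can be illustrated with a pure function. This is a sketch, not the code in runner/harness/lifecycle.py: the function name, its parameters and the task-state strings ("Running", "Complete", "Failed", "Preparing") are assumptions chosen to mirror the rule as described in this file.

```python
def converged_with_oneshots(current: int, wanted: int, task_states: list[str]) -> bool:
    """Decide convergence for one service from replica counts and task states.

    A replica deficit counts as converged only when it is explained entirely
    by tasks that finished successfully.
    """
    if current == wanted:
        # Covers 0/0 and N/N: unchanged from the plain cur == want rule.
        return True
    deficit = wanted - current
    if deficit < 0 or not task_states:
        # Over-provisioned, or no tasks reported yet: still blocked.
        return False
    running = task_states.count("Running")
    complete = task_states.count("Complete")
    # Any Failed or spinning-up task makes the picture mixed, so it blocks.
    only_known = running + complete == len(task_states)
    return only_known and running == current and complete == deficit
```

Under this rule a one-shot reporting 0/1 with a single Complete task is treated as converged, while 0/1 with a Failed or still-preparing task keeps the install assert waiting.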

Gate history: M1 CLAIMED 2026-06-10 → PASS (branch head 858e0f5)

  • WHAT: P1-P6 complete on branch restructure/recipe-custom (P1=472a68b, P2=8cd72fd, P3=fd02d9f, P4=29a28e2, P5=68954be, P6=da558ca, +858e0f5 manifest redaction). Working tree clean, all pushed.
  • HOW (cold, from a fresh clone of the branch):
    • cc-ci-run -m pytest tests/unit -q → EXPECTED: 192 passed
    • cc-ci-run -m pytest tests/concurrency -q → EXPECTED: 23 passed (untouched by this plan; Builder proof run 2026-06-10 on branch head: 23 passed in 11.46s)
    • nix develop .#lint --command scripts/lint.sh → EXPECTED: lint: PASS
    • resolved-customization diff old-vs-new for all 21 recipe dirs (Adversary's own script) → EXPECTED: 0 deltas
    • adversarial review of the full diff main..restructure/recipe-custom
  • WHERE: origin branch restructure/recipe-custom @ 858e0f5; baseline matrix above (M2 prep, committed pre-merge per plan).
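
The manifest redaction referenced in this claim (858e0f5) masks values by key name, at the top level and in nested dict keys, while leaving the key names visible. A minimal sketch of that idea follows; the name pattern, the placeholder text and the `redact` helper are all assumptions and may differ from what runner/harness/manifest.py does.

```python
import re

# Assumed naming heuristic; the real pattern may differ.
SECRET_NAME = re.compile(r"(secret|password|passwd|token|api_?key)", re.IGNORECASE)
REDACTED = "<redacted>"  # placeholder text is an assumption


def redact(value, key=""):
    """Return a copy of a meta value with secret-named entries masked."""
    if key and SECRET_NAME.search(key):
        # The key name stays visible in the parent dict; only the value is hidden.
        return REDACTED
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    return value
```

Calling `redact(meta)` on the whole meta dict before printing and before writing results.json keeps both outputs consistent, since they are built from the same redacted copy.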

Current

M2 CLAIMED (see Gate above) — awaiting Adversary cold-verify. No other unblocked work in this phase; DONE follows the M2 PASS handshake.