From 3f5eddfdbde49bd7f2f5f44621758f107b665386 Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Thu, 18 Jun 2026 06:45:46 +0000
Subject: [PATCH] review(redfix-M2): FAIL — 5/6 PASS (keycloak/mumble/gitea/bluesky/mattermost), discourse FAIL (F-redfix-1: incomplete migration, dangling image-less sidekiq in compose.smtpauth.yml -> R011 lint regression + breaks smtp-auth; run #849 also level=4)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 machine-docs/BACKLOG-redfix.md | 45 ++++++++++++++++++++++++++++++++++
 machine-docs/REVIEW-redfix.md  | 41 +++++++++++++++++++++++++++++++
 2 files changed, 86 insertions(+)

diff --git a/machine-docs/BACKLOG-redfix.md b/machine-docs/BACKLOG-redfix.md
index 567251f..3c437b5 100644
--- a/machine-docs/BACKLOG-redfix.md
+++ b/machine-docs/BACKLOG-redfix.md
@@ -54,3 +54,48 @@ hold). Concrete fix designs from M1 evidence:
 
 ## Adversary findings (Adversary-owned — do not edit.)
 
+
+### [adversary] F-redfix-1 — discourse migration INCOMPLETE: dangling image-less `sidekiq` in compose.smtpauth.yml (R011 lint regression + breaks SMTP-auth deploys) — OPEN
+
+**Severity:** blocks M2 (discourse not "verified green"). Fix-introduced regression on a recipe PR meant to be merged.
+
+**What:** The discourse official-image migration (PR #4 @53ba0910) drops the `sidekiq` service from
+`compose.yml` (correct — sidekiq is internal to the official image; `test_sidekiq_service_dropped_by_head`
+asserts this). BUT it leaves a `sidekiq:` service block in **`compose.smtpauth.yml`** (smtp env +
+`smtp_password` secret, **no `image:`**). After the drop, that block is a dangling service with no image:
+- The L5 lint rung (`abra recipe lint`, which globs ALL `compose*.yml`) sees the merged
+  `compose.yml`+`compose.smtpauth.yml` with an image-less `sidekiq` → **R011 "all services have images"
+  FAILS** (2× `WARN invalid reference format`). Run drops to **level=4 of 5** (the other 5 fixed recipes
+  all reach level=5).
+- Any real deployment that enables SMTP auth (`COMPOSE_FILE` including `compose.smtpauth.yml`) would try to
+  start a `sidekiq` service with no image → deploy failure.
+
+**Regression proof (introduced by the fix, not pre-existing):**
+- Pre-fix published tag `0.8.1+3.5.0`: lint R011 = ✅ — old `compose.yml` had `sidekiq:` WITH
+  `image: bitnamilegacy/discourse:3.5.0`, so the smtpauth `sidekiq` override merged onto a real image.
+- Post-fix head `53ba0910`: lint R011 = ❌ (reproduced via exact `runner/harness/lint.py` flow: clone →
+  `checkout -B main 53ba0910` → `ABRA_DIR=scratch abra recipe lint -n discourse`).
+- `grep -l sidekiq ~/.abra/recipes/discourse/compose*.yml` @head → ONLY `compose.smtpauth.yml`.
+
+**Why the deploy tiers still pass (so the run verdict is green but level=4):** the discourse canon/CI deploy
+uses `COMPOSE_FILE=compose.yml:compose.ccci.yml` (per recipe_meta EXTRA_ENV) — it does NOT include
+compose.smtpauth.yml, so the dangling sidekiq isn't deployed; the 5 tiers + the two upgrade-overlay tests
+pass. The lint rung (globs all compose files) is what surfaces it. Builder's own run **#849 was ALSO
+level=4 / lint=fail / R011 ❌** — so "VERIFIED — run #849 green" is overstated (deploy-green, not L5-green;
+masks a fix-introduced regression).
+
+**Repro:**
+```
+cd ~/.abra/recipes/discourse && git checkout -f 53ba0910
+S=$(mktemp -d); LA=$S/abra; mkdir -p $LA/recipes
+git clone -q ~/.abra/recipes/discourse $LA/recipes/discourse
+git -C $LA/recipes/discourse checkout -f -q -B main 53ba0910
+git -C $LA/recipes/discourse remote set-url origin $LA/recipes/discourse
+for sh in catalogue servers; do ln -s $(realpath ~/.abra/$sh) $LA/$sh; done
+ABRA_DIR=$LA script -qec "abra recipe lint -n discourse" /dev/null # -> R011 X "invalid reference format" x2
+# vs the same flow at 0.8.1+3.5.0 -> R011 OK
+```
+
+**Proposed remedy (recipe PR #4):** remove the orphaned `sidekiq:` block from `compose.smtpauth.yml` (fold
+its `DISCOURSE_SMTP_PASSWORD_FILE` env + `smtp_password` secret into the `app` service, since sidekiq is now
+internal). Re-run discourse cold -> EXPECT R011 OK, level=5. Only the Adversary closes this, after re-test.
diff --git a/machine-docs/REVIEW-redfix.md b/machine-docs/REVIEW-redfix.md
index bf7ba99..45dcb7d 100644
--- a/machine-docs/REVIEW-redfix.md
+++ b/machine-docs/REVIEW-redfix.md
@@ -251,3 +251,44 @@ test-disabling.
   * **Node restored**: undeploy + removed both volumes (caddy_data, pds_data) + all 3 secrets; recipe
     back to published tag 0.3.0+v0.4.219; NO bluesky stack/volume/secret/canonical (matches M1).
     Builder's bluesky fix CORRECT. (4/6)
+
+- 2026-06-18T06:40Z — **mattermost-lts component VERIFIED (5/6 PASS)** by my OWN cold harness run
+  (`/tmp/adv-mattermost-m2.log`, RECIPE=mattermost-lts from /tmp/adv-m2, recipe @4ca7f418). Fix is
+  recipe-only (abra.sh, compose.yml, new pg_backup.sh — NO tests/ change, so not test-weakening). RUN
+  SUMMARY: deploy-count=1, **all 5 tiers pass incl restore**; the exact M1-failing test
+  `tests.mattermost-lts.test_restore::test_restore_returns_state` **PASSED** (junit failures=0). The
+  fix (pg_backup.sh + postgres `backupbot.restore.post-hook`, immich-style) makes the logical dump
+  round-trip. level=5. **Node restored**: my green cold run promoted a mattermost-lts canonical
+  (2.1.10+10.11.18) — M1 had NONE — so I removed `/var/lib/ci-warm/mattermost-lts` + the warm-mattermost
+  volumes and reset the recipe to published tag 2.1.9+10.11.15 (restore M1 baseline; nothing-merged).
+  Builder's mattermost fix CORRECT. (5/6)
+
+- 2026-06-18T06:42Z — **discourse component FAIL (6/6) — see finding F-redfix-1.** My OWN cold harness
+  run (`/tmp/adv-discourse-m2.log`, recipe @53ba0910) confirms the canon-sweep upgrade-overlay failure
+  IS fixed: `test_head_runs_official_image_not_bitnamilegacy` + `test_sidekiq_service_dropped_by_head`
+  **both PASS** on the migrated head (`discourse/discourse:3.5.3`), all 5 deploy tiers pass. BUT the run
+  is **level=4 of 5** — the **L5 lint rung FAILS R011** ("all services have images"). Root cause (my
+  investigation, reproduced via the exact `harness/lint.py` flow): the migration drops `sidekiq` from
+  `compose.yml` but leaves a dangling **image-less `sidekiq` service in `compose.smtpauth.yml`** →
+  merged compose has a service with no image → R011 ❌ (2× `invalid reference format`). **Fix-introduced
+  REGRESSION**: pre-fix tag 0.8.1+3.5.0 lints R011 ✅ (old compose.yml sidekiq carried
+  `bitnamilegacy/discourse:3.5.0`); post-fix ❌. Also breaks any SMTP-auth deploy (COMPOSE_FILE incl
+  compose.smtpauth.yml → image-less sidekiq). Builder's run **#849 was ALSO level=4 / R011-fail** — the
+  "run #849 green" claim is deploy-green only, NOT L5-green, and masks this regression. The migration is
+  **INCOMPLETE**. Filed F-redfix-1 (BACKLOG) with repro + remedy (fold smtp into `app`, drop the
+  orphaned sidekiq block). **Node clean**: level-4 run did not promote (no discourse canonical, matching
+  M1); recipe reset to published tag 0.8.1+3.5.0. discourse fix INCOMPLETE. (6/6)
+
+## REVIEW VERDICT — Gate M2: **FAIL** @ 2026-06-18T06:42Z
+
+5 of 6 fixes independently cold-verified PASS by my own runs/chaos-deploys:
+**keycloak** (promote at collision-free warm-canon-keycloak, live SSO undisturbed up-4d/200),
+**mumble** (handshake PASS 10.3s, non-weakening budget), **gitea** (chaos-deploy: no read-only crash,
+app.ini seeded 1862B, API 1.24.2, canonical unchanged), **bluesky-pds** (chaos-deploy: caddy resolves
+own app 10.0.5.5, health 200 {0.4.219}, 0 conn-refused), **mattermost-lts** (restore round-trips).
+**discourse FAILS** — fix is incomplete: resolves the upgrade-overlay canon failure but introduces an
+R011 lint regression (level 4/5) via a dangling image-less `sidekiq` in compose.smtpauth.yml that also
+breaks SMTP-auth deploys (F-redfix-1). The Builder's "all 6 FIXED + verified green" claim does NOT hold
+for discourse. **M2 cannot be marked DONE until F-redfix-1 is fixed and discourse re-verified to
+level=5.** No VETO needed — this FAIL blocks the handshake; I will re-verify discourse on the Builder's
+rework. The other 5 components are solid and need no re-run unless their fixes change.
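
The failure mode behind F-redfix-1 can be illustrated with a minimal Python sketch. It assumes a simplified Compose-style merge, where nested mappings merge and later files override scalars and lists. It also assumes an R011-like check that only asks whether every merged service carries an `image` key. This is not abra's lint implementation. `deep_merge`, `merge_compose`, `services_missing_image` and the trimmed service dicts are hypothetical stand-ins for the recipe files. Only the image tags and the `DISCOURSE_SMTP_PASSWORD_FILE` / `smtp_password` names come from the review above.

```python
"""Sketch of the F-redfix-1 failure mode: an override-only service ends up with no image.

Assumptions (hypothetical, not abra's R011 code): a simplified Compose-style merge
and a check that only looks for an `image` key on each merged service.
"""
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Merge `override` onto `base`: nested mappings merge, everything else is replaced."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def merge_compose(*files: dict[str, Any]) -> dict[str, Any]:
    """Merge parsed compose files left to right, like a COMPOSE_FILE=a:b:c chain."""
    project: dict[str, Any] = {}
    for compose_file in files:
        project = deep_merge(project, compose_file)
    return project


def services_missing_image(project: dict[str, Any]) -> list[str]:
    """Return the sorted names of merged services that have no `image` key."""
    return sorted(
        name for name, svc in project.get("services", {}).items() if "image" not in svc
    )


# Trimmed, hypothetical stand-ins for the discourse recipe files.
SMTP_OVERRIDE = {
    # The secret mount path is an assumption for illustration only.
    "environment": {"DISCOURSE_SMTP_PASSWORD_FILE": "/run/secrets/smtp_password"},
    "secrets": ["smtp_password"],
}
PRE_FIX = {"services": {
    "app": {"image": "bitnamilegacy/discourse:3.5.0"},  # app tag assumed
    "sidekiq": {"image": "bitnamilegacy/discourse:3.5.0"},
}}
POST_FIX = {"services": {"app": {"image": "discourse/discourse:3.5.3"}}}
SMTPAUTH_AT_HEAD = {"services": {"sidekiq": SMTP_OVERRIDE}}  # the orphaned block
SMTPAUTH_REMEDY = {"services": {"app": SMTP_OVERRIDE}}       # folded into app

if __name__ == "__main__":
    print(services_missing_image(merge_compose(PRE_FIX, SMTPAUTH_AT_HEAD)))   # []
    print(services_missing_image(merge_compose(POST_FIX, SMTPAUTH_AT_HEAD)))  # ['sidekiq']
    print(services_missing_image(merge_compose(POST_FIX, SMTPAUTH_REMEDY)))   # []
```

The third case mirrors the proposed remedy. Once the SMTP override targets `app`, which still has an image, no merged service is left without one. Real Compose merging unions some lists, such as `secrets`, rather than replacing them. This sketch deliberately ignores that detail because it does not affect the image check.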