journal(redfix): M1 discourse isolation — canon root-cause wrong; deploys fine, only upgrade overlay (unreleased official-image migration) fails
Some checks failed
continuous-integration/drone/push Build is failing
@@ -23,3 +23,43 @@ bluesky routing, discourse compose), then run isolation re-runs. discourse's rec
UPSTREAM compose defect (`sidekiq.depends_on: discourse` while service is `app`) that FATAs before any
deploy. That's deterministic, not a load timeout, so it may not even need a long isolation run to
confirm; verify the compose at the latest tag directly first.

## 2026-06-17T23:40Z — M1: discourse isolation run — CANON ROOT-CAUSE WAS WRONG

Ran discourse ALONE on cc-ci (`recipe_checkout discourse 0.8.1+3.5.0` + `RECIPE=discourse
CCCI_SKIP_FETCH=1 cc-ci-run runner/run_recipe_ci.py`, log `/tmp/redfix-discourse.log`).

RESULT: **install PASS, upgrade FAIL, backup PASS, restore PASS, custom PASS**. The recipe deploys,
serves (200 /srv/status), backs up and restores cleanly. NOT a deploy timeout, NOT a 51-min wedge,
NOT a deploy FATA. The canon DECISIONS root-cause ("`abra app deploy` FATAs: service sidekiq depends
on undefined service discourse → invalid compose project") is **misattributed**: that string appears
ONLY from the non-fatal prepull `docker compose config --images` (rc=15, harness logs "skipping
(deploy will pull as usual)"). The real `abra app deploy` is a swarm `docker stack deploy`, which
ignores `depends_on` entirely → the stack converges (`UpdateStatus=completed`).

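The dangling reference that the prepull validation complains about can be reproduced offline. A minimal sketch, assuming the compose services have already been loaded into a plain dict; `dangling_depends_on` and the sample dict are hypothetical names for illustration, not part of the harness, and the sample carries only the service names and the single `depends_on` edge mentioned in this entry:

```python
def dangling_depends_on(services: dict) -> dict:
    """Map each service to the depends_on targets that are not defined services."""
    dangling = {}
    for name, spec in services.items():
        deps = (spec or {}).get("depends_on", [])
        # Compose accepts a list or a mapping here; list() yields the
        # target names in both cases (keys, for the mapping form).
        missing = [target for target in list(deps) if target not in services]
        if missing:
            dangling[name] = missing
    return dangling


# Illustrative stand-in for the parsed compose file (assumption, not the real file).
services = {
    "app": {},
    "db": {},
    "redis": {},
    "sidekiq": {"depends_on": ["discourse"]},
}

print(dangling_depends_on(services))  # {'sidekiq': ['discourse']}
```

A non-empty result is the condition a strict compose validation rejects. Per the finding above, the swarm deploy does not consult `depends_on`, so the same file can still converge.
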
The ONLY failure is the cc-ci upgrade OVERLAY `tests/discourse/test_upgrade.py`:
- `test_head_runs_official_image_not_bitnamilegacy`: app image is `bitnamilegacy/discourse:3.5.0`;
  test demands `discourse/discourse:3.5.3` (official).
- `test_sidekiq_service_dropped_by_head`: services `['app','db','redis','sidekiq']`; test demands
  sidekiq dropped.

These `prevb`-phase overlay tests are PR-FAITHFULNESS assertions for a specific migration PR
(bitnamilegacy → official `discourse/discourse:3.5.3`, drop sidekiq). Verified that migration exists
in **NO upstream release tag and NOT in main**: `git show main:compose.yml` and every tag
(`0.1.0…0.8.1+3.5.0`) all use `bitnamilegacy/discourse:3.5.0` + sidekiq. So the overlay asserts a
state that doesn't exist anywhere upstream → deterministic RED whenever the sweep tests the latest
release tag. The head DID deploy (chaos-version label = head f87c612d+U, converged); the test
expectation is simply wrong for the released recipe.

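The "exists in no tag and not in main" check lends itself to a small script. A rough sketch, assuming the compose text for each ref has already been captured (for example from `git show <ref>:compose.yml`); `images_in_compose`, `refs_carrying` and the `LEGACY` fragment are hypothetical names, and the fragment is a hand-written stand-in rather than the real file:

```python
import re

# Captures the value of every `image:` key in compose text (optionally quoted).
IMAGE_RE = re.compile(r"^\s*image:\s*[\"']?([^\s\"'#]+)", re.MULTILINE)


def images_in_compose(compose_text: str) -> list[str]:
    """Return all image references found in one compose file's text."""
    return IMAGE_RE.findall(compose_text)


def refs_carrying(target_image: str, compose_by_ref: dict[str, str]) -> list[str]:
    """Return the refs whose compose text uses exactly target_image."""
    return [
        ref
        for ref, text in compose_by_ref.items()
        if target_image in images_in_compose(text)
    ]


# Trimmed illustrative fragment (assumption): only the app image line matters here.
LEGACY = "services:\n  app:\n    image: bitnamilegacy/discourse:3.5.0\n"
compose_by_ref = {"main": LEGACY, "0.8.1+3.5.0": LEGACY}

print(refs_carrying("discourse/discourse:3.5.3", compose_by_ref))  # []
```

An empty list for the migration target across every ref is the condition under which the overlay assertion cannot pass on a canonical sweep.
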
Note (M2 design): migrating discourse from the deprecated `bitnamilegacy` image to official
`discourse/discourse` is a MAJOR recipe rewrite (different fs layout, entrypoint, no `/opt/bitnami`
sidekiq run.sh), not a 1-line image swap. So the overlay test's `discourse/discourse:3.5.3`
expectation may not be a realistic near-term recipe change. The bitnamilegacy deprecation is real
(bitnami sunset legacy images), so a migration is the right long-term direction, but the test as
written hard-codes a migration target absent upstream. Classification + fix approach to settle in M1
table / M2.

Classification: **stale/PR-specific cc-ci OVERLAY test mismatched to the canonical-sweep context**
(NOT a flake, NOT a load timeout, NOT a recipe-deploy defect, NOT warm-machinery). Teardown clean (no
discourse stack left). Evidence: `/tmp/redfix-discourse.log` on cc-ci; junit under
`/var/lib/cc-ci-runs/manual/junit/upgrade__cc-ci__test_upgrade.xml`.

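For re-checking the junit evidence, a minimal sketch that lists failing test names. It assumes the file follows the common pytest junit layout (`testsuite` / `testcase` with a `failure` or `error` child); the real file's contents are not reproduced here, and `failed_testcases`, `SAMPLE` and the passing case name are hypothetical:

```python
import xml.etree.ElementTree as ET


def failed_testcases(junit_xml: str) -> list[str]:
    """Return the names of test cases that carry a <failure> or <error> child."""
    root = ET.fromstring(junit_xml)
    failed = []
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append(case.get("name"))
    return failed


# Hand-written sample in the assumed layout, NOT the contents of the real junit file.
SAMPLE = """\
<testsuites>
  <testsuite name="pytest" tests="3" failures="2">
    <testcase classname="test_upgrade" name="test_head_runs_official_image_not_bitnamilegacy">
      <failure message="illustrative"/>
    </testcase>
    <testcase classname="test_upgrade" name="test_sidekiq_service_dropped_by_head">
      <failure message="illustrative"/>
    </testcase>
    <testcase classname="test_upgrade" name="test_hypothetical_passing_case"/>
  </testsuite>
</testsuites>
"""

for name in failed_testcases(SAMPLE):
    print(name)
```

Against the real file, the text would be read from the junit path cited above and passed to `failed_testcases` unchanged.
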
@@ -35,8 +35,8 @@ flake source per phase plan §2.1). Runs execute on cc-ci from `/etc/cc-ci`.
| Recipe | Isolation run | Result | Root cause | Classification |
|---|---|---|---|---|
| discourse | pending | — | — | — |
| mattermost-lts | pending | — | — | — |
| discourse | DONE @23:40Z (`/tmp/redfix-discourse.log` on cc-ci) | install/backup/restore/custom PASS; **upgrade overlay FAIL**. Deploys+serves fine — NOT a timeout/FATA. | cc-ci overlay `tests/discourse/test_upgrade.py` asserts head runs official `discourse/discourse:3.5.3` + drops sidekiq; latest tag `0.8.1+3.5.0` AND main both still `bitnamilegacy/discourse:3.5.0`+sidekiq (migration exists in no release/main). The `depends_on discourse` string is a non-fatal prepull-only warning, not the deploy. | **stale/PR-specific cc-ci OVERLAY test** mismatched to canonical-sweep context (not flake/timeout/recipe-deploy/warm-machinery) |
| mattermost-lts | running (isolation) | — | — | — |
| mumble | pending | — | — | — |
| bluesky-pds | pending | — | — | — |
| gitea | pending | — | — | — |