full4 base deploy timed out at 2400s on the 7-GiB single node. Root causes:
(1) sidekiq.depends_on referenced undefined service 'discourse' (main svc is 'app') → abra config
--images rc=15 → prepull SKIPPED → 2.4GB image pulled inline during deploy, eating convergence
budget. Overlay now overrides sidekiq.depends_on:[app] (swarm ignores depends_on → no-op at
runtime, masks nothing) so prepull resolves+pre-pulls images on both base+head deploys.
(2) bumped DEPLOY_TIMEOUT/TIMEOUT 2400→3600 for headroom on the RAM/CPU-constrained Rails cold boot.
Also pre-cached bitnamilegacy/discourse:3.3.1 by tag on cc-ci (was dangling <none>).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
version: "3.8"
# cc-ci overlay (Phase 2 Q4.6) — minimal, single-purpose: make the UPGRADE-tier BASE deploy (the
# previous published discourse version) deployable so upgrade-to-latest can run.
#
# WHY THIS OVERLAY EXISTS (plan-ccci-compose-overlay-policy.md §1 "minimal justified fallback" +
# the §1 mandate that upgrade-to-latest must ALWAYS run): the harness base-deploys the from-version
# (UPGRADE_BASE_VERSION = 0.7.0+3.3.1), then `deploy --chaos` to the recipe-PR head. Two blockers on
# that published base, both resolved here, NEITHER weakening any test:
# 1. RE-PIN: every published discourse tag pins `bitnami/discourse:3.3.1` (and 0.6.3 → 3.1.2),
# but Docker Hub REMOVED the bitnami/discourse namespace (404). The recipe-PR (recipe-maintainers/
# discourse#1) re-pins app+sidekiq to `bitnamilegacy/discourse:3.3.1` (the legit upstream
# relocation of the identical image). This overlay applies the SAME namespace-only re-pin to the
# BASE 0.7.0 (identical version 3.3.1, identical image content) so the from-version pulls — exactly
# the policy-blessed "minimal bitnami→bitnamilegacy re-pin overlay on the 0.7.0 from-version".
# 2. GRACE: discourse's Rails cold first boot (DB migrate + asset precompile) is 15-25min on cc-ci,
# exceeding the published 5m start_period → swarm kills the still-booting app. start_period CANNOT
# be an env var (abra validates the literal 'duration' BEFORE substitution → FATA; Adversary-
# reproduced, REVIEW-2 4b862f6), so we widen it to a literal 20m on the BASE. The PR head already
# ships 20m, so this overlay is idempotent on the head (it persists untracked across the checkout).
# Both changes are namespace/grace-only: identical image content, a healthy check still marks healthy
# immediately → NO assertion is weakened and no defect is masked.
#
# 3. PREPULL-ENABLE (compose-validity): the published recipe (BOTH 0.7.0 base AND the PR head) ships
# `sidekiq.depends_on: [discourse]`, but the main service is named `app` — `discourse` is an
# UNDEFINED service, so `abra app config --images` (the harness prepull step) returns
# `invalid compose project` (rc=15) and prepull is SKIPPED → the 2.4GB discourse image is pulled
# INLINE during `abra app deploy`, eating the convergence budget on the 7-GiB single node and
# pushing the slow Rails cold boot past the deploy timeout (full4 base-deploy timed out at 2400s).
# This overlay overrides sidekiq.depends_on to the real service `app` so the merged compose is
# VALID → prepull resolves+pre-pulls the images before deploy → reliable convergence. This is a
# NO-OP at runtime: `docker stack deploy` (swarm) IGNORES depends_on entirely, so correcting a
# dangling lint-only reference to the real service changes NOTHING swarm does — it masks no defect
# and weakens no test; it only unblocks the image pre-pull. (The same dangling ref exists upstream;
# filed as a recipe nit, but the fix here is the immutable-published-base + harness-prepull path.)
services:
  app:
    image: bitnamilegacy/discourse:3.3.1
    healthcheck:
      start_period: 20m
  sidekiq:
    image: bitnamilegacy/discourse:3.3.1
    depends_on:
      - app
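
The PREPULL-ENABLE reasoning above turns on a dangling `depends_on` reference making the compose project invalid. A minimal sketch of that kind of check follows. It is illustrative only and is not abra's actual validation code; the function name `undefined_dependencies` and the two reduced service maps are hypothetical stand-ins for the published recipe and the overlaid result.

```python
def undefined_dependencies(compose: dict) -> dict:
    """Map each service to the depends_on entries that name no defined service."""
    services = compose.get("services") or {}
    problems = {}
    for name, spec in services.items():
        deps = (spec or {}).get("depends_on") or []
        # depends_on may be a plain list of names or a mapping keyed by name.
        dep_names = list(deps.keys()) if isinstance(deps, dict) else list(deps)
        missing = [dep for dep in dep_names if dep not in services]
        if missing:
            problems[name] = missing
    return problems


# Hypothetical reduced form of the published recipe: sidekiq points at a
# service called 'discourse', but only 'app' and 'sidekiq' are defined.
published = {"services": {"app": {}, "sidekiq": {"depends_on": ["discourse"]}}}

# Hypothetical reduced form with sidekiq.depends_on set to the real service.
overlaid = {"services": {"app": {}, "sidekiq": {"depends_on": ["app"]}}}

print(undefined_dependencies(published))  # {'sidekiq': ['discourse']}
print(undefined_dependencies(overlaid))   # {}
```

A check of this shape could run in the harness before the prepull step, so that a dangling reference is reported explicitly instead of surfacing later as a skipped prepull.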