decisions(2): plausible clickhouse-backup boot-download = upstream robustness defect; recipe-PR deferred (Q4.7b)

2026-05-29 18:55:45 +01:00
parent f9ebb3f610
commit 19f1ea6da4


@@ -831,3 +831,29 @@ only); an end-to-end UDP media-relay path to a per-run container is an **environ
limitation**, not a test-quality gap (§7.1 env-blocker exception). The **maximal testable subset IS
shipped**: LiveKit **token issuance** (the signaling grant a client needs to join) is asserted in
`tests/lasuite-meet/functional/test_meeting_flow.py` (create room → JWT token granting the room).

## plausible: clickhouse-backup boot-download is an upstream robustness defect (2026-05-29)

**Decision (settled):** plausible's `entrypoint.clickhouse.sh` downloads a 22 MB `clickhouse-backup`
tarball from GitHub at every container start, under `set -e` and with no retry or cache, BEFORE exec'ing
clickhouse-server. A failed download exits 1 (clickhouse never starts) → swarm crash-loops →
each restart re-downloads 22 MB → GitHub secondary rate-limiting kicks in → sustained crash-loop →
`abra app deploy` times out. The deploy converges only when GitHub answers the *first* wget (the normal
single-deploy CI case); it is not robust to a transient first-wget failure, and back-to-back heavy testing
exhausts the host IP's GitHub budget and induces the spiral.
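
The failure mode can be reproduced in miniature. The sketch below is not the recipe's actual entrypoint: `false` stands in for a failed wget, and the function names `fragile_boot` and `tolerant_boot` are hypothetical. It only illustrates how `set -e` turns a failed pre-start step into a non-zero exit before the server line is ever reached.

```shell
# Stand-in for the boot sequence; `false` simulates the failed download.

# Under `set -e` the shell exits at the failed step, so the "server" line
# is never reached and the container would exit non-zero.
fragile_boot() {
    sh -c 'set -e; false; echo "server would start"'
}

# With the failure handled explicitly, the boot sequence carries on.
tolerant_boot() {
    sh -c 'false || echo "download failed, continuing" >&2; echo "server would start"'
}
```

Calling `fragile_boot` prints nothing and returns a non-zero status, which is what swarm sees on every restart; `tolerant_boot` logs the failure to stderr and still reaches the server line.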

**This is an UPSTREAM RECIPE defect, not a cc-ci test/harness defect** (same class as Q3.2b
lasuite-drive collabora and immich's missing pg_dump hook). The cc-ci test content for plausible
(event-tracking §4.3 tests, lifecycle overlays, /api/health readiness probe) is correct and the §4.3
functional tests are proven green.

**The durable fix is a recipe PR** hardening the clickhouse entrypoint: cache the binary on the
persistent `/var/lib/clickhouse` volume (skip-if-present, so restarts don't re-download → no
amplification), retry with backoff, and `set +e` around the download so a failure never blocks
clickhouse-server start (DB must come up; backup degrades gracefully). IMPORTANT CONSTRAINT: the cc-ci
**install** tier deploys the PREVIOUS PUBLISHED version (`recipe_checkout` to the tag), so a recipe PR
only fixes the **upgrade** tier and FUTURE installs once released; it does NOT make this gate's install
tier converge under an active throttle. Therefore the gate's full-lifecycle green still depends on GitHub
answering the install-tier deploy's first download (achievable with a single clean run after a rate-limit
cooldown). The recipe PR is filed as a deferred robustness follow-up (Q4.7b), mirroring the Q3.2b/immich
pattern; Adversary/operator weigh whether it gates Phase-2 DONE.
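
A minimal sketch of what the hardened download step could look like, assuming a POSIX `sh` entrypoint. Everything named here is hypothetical and not taken from the recipe: the cache directory under `/var/lib/clickhouse`, the `BACKUP_URL` variable (left empty rather than guessing the release URL), the attempt count, and the `download` and `fetch_backup_tool` helpers.

```shell
#!/bin/sh
# Sketch only: paths, variables and function names are assumptions.

CACHE_DIR="${CACHE_DIR:-/var/lib/clickhouse/.cache}"   # lives on the persistent volume
TARBALL="${CACHE_DIR}/clickhouse-backup.tar.gz"
BACKUP_URL="${BACKUP_URL:-}"                           # release tarball URL, supplied by the recipe
MAX_ATTEMPTS="${MAX_ATTEMPTS:-4}"

# download <url> <dest>: kept separate so it can be swapped out offline.
download() {
    wget -q -O "$2" "$1"
}

fetch_backup_tool() {
    # Skip-if-present: a restart must not re-download the tarball.
    if [ -s "$TARBALL" ]; then
        return 0
    fi
    mkdir -p "$CACHE_DIR" || return 1
    attempt=1
    delay="${INITIAL_DELAY:-2}"
    while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
        # Download to a temporary name so a partial file is never cached.
        if download "$BACKUP_URL" "${TARBALL}.part"; then
            mv "${TARBALL}.part" "$TARBALL"
            return 0
        fi
        rm -f "${TARBALL}.part"
        if [ "$attempt" -lt "$MAX_ATTEMPTS" ]; then
            sleep "$delay"
            delay=$((delay * 2))   # exponential backoff between attempts
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# In the entrypoint the result is logged but never fatal, for example:
#   fetch_backup_tool || echo "clickhouse-backup unavailable; continuing" >&2
# followed by whatever exec line the recipe already uses for clickhouse-server.
```

The function returns non-zero on exhaustion instead of exiting, so the caller decides what a missing backup tool means; with the `||` guard shown in the trailing comment, clickhouse-server still starts. As noted above, this would only help the upgrade tier and future installs once released.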