From 19f1ea6da439ab31e287cd08134dfb720842f0f3 Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Fri, 29 May 2026 18:55:45 +0100
Subject: [PATCH] decisions(2): plausible clickhouse-backup boot-download = upstream robustness defect; recipe-PR deferred (Q4.7b)

---
 machine-docs/DECISIONS.md | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/machine-docs/DECISIONS.md b/machine-docs/DECISIONS.md
index 45b237b..a01ffdf 100644
--- a/machine-docs/DECISIONS.md
+++ b/machine-docs/DECISIONS.md
@@ -831,3 +831,29 @@ only); an end-to-end UDP media-relay path to a per-run container is an **environ
 limitation**, not a test-quality gap (§7.1 env-blocker exception). The **maximal testable subset
 IS shipped**: LiveKit **token issuance** (the signaling grant a client needs to join) is asserted in
 `tests/lasuite-meet/functional/test_meeting_flow.py` (create room → JWT token granting the room).
+
+## plausible: clickhouse-backup boot-download is an upstream robustness defect (2026-05-29)
+
+**Decision (settled):** plausible's `entrypoint.clickhouse.sh` downloads a 22MB `clickhouse-backup`
+tarball from GitHub at every container start with `set -e` and no retry/cache, BEFORE exec'ing
+clickhouse-server. A failed download exits 1 (clickhouse never starts) → swarm crash-loops →
+re-downloads 22MB each restart → triggers GitHub secondary rate-limiting → sustained crash-loop →
+`abra app deploy` times out. The deploy converges only when GitHub answers the *first* wget (normal
+single-deploy CI), but is not robust to a transient first-wget failure, and back-to-back heavy testing
+exhausts the host IP's GitHub budget and induces the spiral.
+
+**This is an UPSTREAM RECIPE defect, not a cc-ci test/harness defect** (same class as Q3.2b
+lasuite-drive collabora and immich's missing pg_dump hook). The cc-ci test content for plausible
+(event-tracking §4.3 tests, lifecycle overlays, /api/health readiness probe) is correct and the §4.3
+functional tests are proven green.
+
+**The durable fix is a recipe PR** hardening the clickhouse entrypoint: cache the binary on the
+persistent `/var/lib/clickhouse` volume (skip-if-present so restarts don't re-download → no
+amplification), retry-with-backoff, and `set +e` so a download failure never blocks clickhouse-server
+start (DB must come up; backup degrades gracefully). IMPORTANT CONSTRAINT: the cc-ci **install** tier
+deploys the PREVIOUS PUBLISHED version (`recipe_checkout` to the tag), so a recipe PR only fixes the
+**upgrade** tier and FUTURE installs once released; it does NOT make this gate's install tier converge
+under an active throttle. Therefore the gate's full-lifecycle green still depends on GitHub answering
+the install-tier deploy's first download (achievable after a rate-limit cooldown with a single clean
+run). The recipe PR is filed as a deferred robustness follow-up (Q4.7b), mirroring the Q3.2b/immich
+pattern; Adversary/operator weigh whether it gates Phase-2 DONE.
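The hardening described in the durable-fix paragraph (cache on the persistent volume, skip-if-present, retry-with-backoff, never block server start) could look roughly like the sketch below. It is not the recipe's actual entrypoint, which this patch does not show. Every name in it is a hypothetical chosen for illustration: `CACHE_DIR`, `TOOL_URL`, `MAX_TRIES`, `BASE_DELAY`, `download`, `fetch_with_retry` and `ensure_backup_tool`. The release URL is deliberately left unset, and the sketch caches the downloaded tarball and omits extraction.

```shell
#!/bin/sh
# Hypothetical hardening sketch. All variable and function names are
# illustrative assumptions, not taken from the real entrypoint.clickhouse.sh.
CACHE_DIR="${CACHE_DIR:-/var/lib/clickhouse/.cache}"  # lives on the persistent volume
TOOL_URL="${TOOL_URL:-}"         # release tarball URL (not filled in here)
MAX_TRIES="${MAX_TRIES:-4}"      # total download attempts
BASE_DELAY="${BASE_DELAY:-2}"    # seconds; doubled after each failed attempt

# download URL DEST: kept as its own function so it can be swapped out in tests.
download() {
    wget -q -O "$2" "$1"
}

# fetch_with_retry URL DEST: retry with exponential backoff. Returns non-zero
# after MAX_TRIES failures instead of aborting the caller.
fetch_with_retry() {
    url=$1 dest=$2 try=1 delay=$BASE_DELAY
    while [ "$try" -le "$MAX_TRIES" ]; do
        # Download to a temp name first so a partial file is never cached.
        if download "$url" "$dest.part" && mv "$dest.part" "$dest"; then
            return 0
        fi
        rm -f "$dest.part"
        if [ "$try" -lt "$MAX_TRIES" ]; then
            sleep "$delay"
        fi
        delay=$((delay * 2))
        try=$((try + 1))
    done
    return 1
}

# ensure_backup_tool: skip-if-present, so container restarts do not re-download.
ensure_backup_tool() {
    tarball="$CACHE_DIR/clickhouse-backup.tar.gz"
    if [ -s "$tarball" ]; then
        return 0
    fi
    mkdir -p "$CACHE_DIR" || return 1
    fetch_with_retry "$TOOL_URL" "$tarball"
}

# Intended use in an entrypoint that runs WITHOUT `set -e` around this step
# (the exec line is an assumption about how the server is started):
#   ensure_backup_tool || echo "WARN: clickhouse-backup unavailable" >&2
#   exec clickhouse-server "$@"
```

With this shape a failed download only degrades backups: the server still starts, and a crash-loop cannot amplify GitHub traffic because a successful download is reused on every later restart.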