plausible/entrypoint.clickhouse.sh
autonomic-bot 9f8bcbc9e3
Some checks failed
cc-ci/testme cc-ci: failure
fix: clickhouse-backup install must succeed loudly, never silently degrade
Replaces the previous best-effort (|| true) approach: a deploy without
clickhouse-backup would have silently broken backup/restore, so the
entrypoint now hard-fails (visibly, in service logs) if the tool truly
cannot be installed — but makes that case effectively unreachable:

- cache the VERIFIED binary on the persistent clickhouse volume, keyed
  by version: downloaded at most once per app; container restarts never
  re-fetch (kills the re-download amplification that turned a GitHub
  throttle into a permanent crash-loop)
- canonical Altinity release URL (project moved; old path is a redirect)
- bounded retries with backoff + wget read timeout (a stalled connection
  can no longer hang the deploy)
- verify the binary executes before trusting or caching it (catches
  truncated downloads and a corrupt cache)
- compose: fix app depends_on to the real service name
  (plausible_events_db) — docker compose config was failing on it, which
  disabled CI image prepull and pushed pulls into the deploy window
- bump CLICKHOUSE_ENTRYPOINT_VERSION v4 -> v5 (swarm configs immutable)

Verified on a dev deploy: fresh download path, cached-restart path,
clickhouse-backup create/list/delete, and /api/health all green.
2026-06-09 19:09:13 +00:00
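The "bounded retries with backoff" bullet describes linear backoff. After failed attempt n, the script pauses n × base seconds, with no pause after the final attempt. The Bash function below is a minimal sketch of that loop as a reusable helper. `retry_with_backoff` is a hypothetical name and is not defined by the recipe.

```shell
#!/bin/bash
# Hypothetical helper (not part of the recipe): run a command up to MAX times,
# sleeping attempt*BASE seconds after each failed attempt except the last.
retry_with_backoff() {
    local max=$1 base=$2
    shift 2
    local attempt
    for ((attempt = 1; attempt <= max; attempt++)); do
        if "$@"; then
            echo "succeeded on attempt ${attempt}/${max}"
            return 0
        fi
        echo "attempt ${attempt}/${max} failed" >&2
        # Linear backoff; no pointless sleep after the final attempt.
        if ((attempt < max)); then
            sleep $((attempt * base))
        fi
    done
    echo "giving up after ${max} attempts" >&2
    return 1
}
```

Calling `retry_with_backoff 5 10 some_fetch_command` reproduces the entrypoint's schedule. In the worst case the sleeps alone total 10 + 20 + 30 + 40 = 100 s, on top of download time. wget's `-T 30` fails a read that receives no data for 30 s. It does not cap the total duration of a slow but still-progressing transfer.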


#!/bin/bash
# clickhouse-backup powers this recipe's backup/restore (the backupbot pre/post-hooks run
# `clickhouse-backup create/restore`). A deploy without it would have silently broken backups,
# so if it truly cannot be installed the deploy must FAIL LOUDLY rather than degrade.
#
# The published recipe fetched it with a single silenced no-retry wget at every container start:
# any transient GitHub failure exited the container (set -e) before clickhouse-server started,
# and the swarm restart loop re-downloaded the 22 MB asset on every restart, amplifying a
# throttle into a permanent crash-loop and a deploy timeout (cc-ci Q4.7).
#
# Hardening (no behaviour change when the fetch succeeds first try):
# - cache the VERIFIED binary on the persistent clickhouse volume, keyed by version, so it is
#   downloaded at most once per app and container restarts never re-fetch;
# - canonical Altinity URL (the project moved; the old AlexAkulov path is just a redirect);
# - bounded retries with backoff + a read timeout, so a stalled connection cannot hang the
#   deploy and a transient failure cannot kill it;
# - verify the binary actually executes before trusting or caching it (catches truncated
#   downloads and a corrupt cache);
# - un-silenced: every attempt and the final verdict are visible in `docker service logs`.
set -e
CLICKHOUSE_BACKUP_VERSION=2.4.2
ARCH=$(uname -m)
if [[ $ARCH =~ "aarch64" ]]; then
    ARCH="arm64"
elif [[ $ARCH =~ "armv5l" ]]; then
    ARCH="armv5"
elif [[ $ARCH =~ "armv6l" ]]; then
    ARCH="armv6"
elif [[ $ARCH =~ "armv7l" ]]; then
    ARCH="armv7"
elif [[ $ARCH =~ "x86_64" ]]; then
    ARCH="amd64"
fi
CACHE_DIR=/var/lib/clickhouse/.ccci-bin
CACHED="${CACHE_DIR}/clickhouse-backup-v${CLICKHOUSE_BACKUP_VERSION}"
BIN=/usr/local/bin/clickhouse-backup
URL="https://github.com/Altinity/clickhouse-backup/releases/download/v${CLICKHOUSE_BACKUP_VERSION}/clickhouse-backup-linux-${ARCH}.tar.gz"
binary_ok() {
    "$1" --version >/dev/null 2>&1
}
install_clickhouse_backup() {
    local tarball=/tmp/clickhouse-backup.tar.gz
    mkdir -p "$CACHE_DIR"
    if [ -x "$CACHED" ] && binary_ok "$CACHED"; then
        cp -f "$CACHED" "$BIN"
        echo "clickhouse-backup: using verified cached binary ($CACHED)"
        return 0
    fi
    rm -f "$CACHED" # absent or fails to execute — re-fetch
    for attempt in 1 2 3 4 5; do
        # --tries=1: wget retries up to 20 times by default, so keep this loop the only retry layer.
        # --continue resumes a partial tarball left behind by an interrupted attempt.
        if wget -T 30 --tries=1 --continue --output-document="$tarball" "$URL"; then
            if tar -xf "$tarball" --directory=/usr/local/bin --strip-components=3 \
                && binary_ok "$BIN"; then
                cp -f "$BIN" "$CACHED" 2>/dev/null || true
                echo "clickhouse-backup: downloaded, verified + cached (attempt ${attempt})"
                return 0
            fi
            # A complete but unusable tarball would otherwise be "resumed" as-is by every
            # later attempt; discard it so the next attempt downloads from scratch.
            rm -f "$tarball"
        fi
        echo "clickhouse-backup: fetch attempt ${attempt}/5 failed" >&2
        [ "$attempt" -lt 5 ] && sleep $((attempt * 10))
    done
    echo "clickhouse-backup: could not install after 5 attempts — failing the deploy (without it backup/restore would be silently broken)" >&2
    return 1
}
install_clickhouse_backup
exec /entrypoint.sh
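The entrypoint caches the binary with a plain `cp -f`. It relies on re-verification at the next start to catch a half-written cache file. One alternative is to copy into a temporary file beside the target and then `mv` it into place. Assuming the cache directory sits on a single filesystem, that move is an atomic rename, so the version-keyed name only ever points at a complete, verified binary. `cache_verified_binary` below is a hypothetical helper, not part of the recipe; it reuses the script's `binary_ok` check.

```shell
#!/bin/bash
# Same check as the entrypoint: the binary must actually run.
binary_ok() {
    "$1" --version >/dev/null 2>&1
}

# Hypothetical helper (not part of the recipe):
#   cache_verified_binary SRC CACHE_DIR NAME VERSION
# Publishes SRC as CACHE_DIR/NAME-vVERSION only if it executes, via temp file + rename,
# and prints the cached path on success.
cache_verified_binary() {
    local src=$1 cache_dir=$2 name=$3 version=$4
    local dest="${cache_dir}/${name}-v${version}"
    binary_ok "$src" || return 1                 # never cache something that does not run
    mkdir -p "$cache_dir" || return 1
    local tmp
    tmp=$(mktemp "${dest}.XXXXXX") || return 1   # same directory, so mv below is a rename
    # mktemp creates the file 0600 and cp onto an existing file keeps that mode: fix it.
    if cp -f "$src" "$tmp" && chmod 0755 "$tmp" && mv -f "$tmp" "$dest"; then
        echo "$dest"
        return 0
    fi
    rm -f "$tmp"                                 # never leave a stray partial copy
    return 1
}
```

A reader of the cache never sees a truncated file under the final name. The worst leftover is an orphaned `NAME-vVERSION.XXXXXX` temp file if the container dies mid-copy.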