fix: clickhouse entrypoint - backup download is best-effort (server must start regardless)
Some checks failed
cc-ci/testme cc-ci: failure
The previous entrypoint treated clickhouse-backup as required: a download failure (rate limit or transient network error) made install_clickhouse_backup return 1, which under set -e exited the entrypoint before /entrypoint.sh ran. ClickHouse never started, so the swarm restarted the container and retried the download, amplifying the throttle -> crash-loop -> deploy timeout (cc-ci Q4.7b).

Fix: install_clickhouse_backup || true, so the server starts even if the backup tool cannot be fetched. Backup/restore stay degraded until a later restart fetches the tool.

Also: fix a stray trailing quote in backupbot.restore.post-hook; bump CLICKHOUSE_ENTRYPOINT_VERSION v3 -> v4 (config content changed).
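A minimal sketch of the failure mode and the fix, relying only on bash's documented `set -e` behaviour. `fetch_tool` and `start_server` are hypothetical stand-ins for `install_clickhouse_backup` and `exec /entrypoint.sh`; each variant runs in its own subshell so the one that aborts does not stop the demo. Bash does not apply `set -e` to a command on the left of `||` (or inside a function called there), which is exactly why `|| true` neutralises the failure; the same rule means the demo must call the variants as plain commands, not inside an `if` or `||` test.

```shell
#!/bin/bash
# Hypothetical stand-ins: fetch_tool for install_clickhouse_backup (always
# failing here), start_server for `exec /entrypoint.sh`.
fetch_tool()   { echo "fetch failed" >&2; return 1; }
start_server() { echo "server started"; }

# Old entrypoint shape: under set -e the failed fetch exits the subshell,
# so start_server never runs.
old_entrypoint() (
  set -e
  fetch_tool
  start_server
)

# New entrypoint shape: `|| true` makes the fetch failure non-fatal.
new_entrypoint() (
  set -e
  fetch_tool || true
  start_server
)

# Call each variant as a plain command (not inside if/||); otherwise bash
# would ignore set -e inside it and the demo would prove nothing.
old_entrypoint 2>/dev/null
echo "old exit status: $?"
new_entrypoint 2>/dev/null
echo "new exit status: $?"
```

Running it prints `old exit status: 1`, then `server started` and `new exit status: 0`.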
abra.sh
@@ -1,3 +1,3 @@
 export CLICKHOUSE_CONF_VERSION=v2
 export CLICKHOUSE_USER_CONF_VERSION=v2
-export CLICKHOUSE_ENTRYPOINT_VERSION=v3
+export CLICKHOUSE_ENTRYPOINT_VERSION=v4
@@ -82,7 +82,7 @@ services:
 backupbot.backup.path: "/var/lib/clickhouse/backup/events"
 backupbot.backup.post-hook: "rm -rf /var/lib/clickhouse/backup/events"
 backupbot.restore: "true"
-backupbot.restore.post-hook: clickhouse-backup restore --rm events && rm -rf /var/lib/clickhouse/backup/events"
+backupbot.restore.post-hook: clickhouse-backup restore --rm events && rm -rf /var/lib/clickhouse/backup/events
 
 volumes:
 db-data:
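Why the stray quote matters: YAML accepts a `"` in the middle or at the end of an unquoted value, so the compose file still parses and the quote simply ends up inside the label. It only breaks once the hook string reaches a shell. Assuming backupbot runs the hook as `sh -c "<label value>"` (an assumption; this page does not show how hooks are executed), an unmatched `"` is a parse error and the hook fails. A minimal sketch, with `echo` standing in for the real `clickhouse-backup` and `rm` commands and `run_hook` a hypothetical helper:

```shell
#!/bin/sh
# Hook strings modelled on backupbot.restore.post-hook; `echo` replaces the
# real clickhouse-backup / rm commands so the sketch is safe to run.
hook_fixed='echo restore && echo cleanup'
hook_stray='echo restore && echo cleanup"'

# Hypothetical helper; assumes the hook is executed via `sh -c`.
run_hook() {
  sh -c "$1"
}

run_hook "$hook_fixed"                  # runs both commands
run_hook "$hook_stray" 2>/dev/null \
  || echo "stray quote: hook failed to parse (status $?)"
```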
@@ -1,12 +1,17 @@
 #!/bin/bash
-
-# clickhouse-backup is a backup tool (backupbot pre/post-hooks: `clickhouse-backup create/restore`).
-# It is a 22 MB GitHub download (rate-limit / network), which can fail to download, and lead to crash loop and download throttling.
+# clickhouse-backup is the BACKUP tool (backupbot pre/post-hooks: `clickhouse-backup create/restore`).
+# It is NOT required for clickhouse-SERVER (`/entrypoint.sh`) to run. The published recipe fetched it
+# with `set -ex` + a single silenced no-retry wget to ephemeral /tmp, so ANY transient failure of the
+# 22 MB GitHub download (rate-limit / network) exited the container BEFORE the server started -> swarm
+# restarted it -> re-downloaded -> amplified the throttle -> crash-loop -> deploy timeout (cc-ci Q4.7).
 #
-# to make the download smoother:
-# - cache the binary on the persistent clickhouse data volume (/var/lib/clickhouse) so it is fetched
+# Hardening (no behaviour change when the download succeeds first try):
+# - cache the binary on the PERSISTENT clickhouse data volume (/var/lib/clickhouse) so it is fetched
 # at most once and reused on every container restart (no re-download amplification);
-# - retry with backoff to ride out transient GitHub failures
+# - retry with backoff;
+# - NEVER let a download failure block the server start (best-effort: the server comes up, backup/
+# restore degrade until the next successful fetch);
+# - un-silenced so a failure is diagnosable in `docker service logs`.
 
 set -e
 
@@ -47,11 +52,11 @@ install_clickhouse_backup() {
 echo "clickhouse-backup: fetch attempt ${attempt} failed; backing off $((attempt * 10))s" >&2
 sleep $((attempt * 10))
 done
-echo "clickhouse-backup: fetch FAILED after all retries — aborting; clickhouse-server will NOT start (backup tool is required)" >&2
+echo "clickhouse-backup: fetch FAILED after retries — starting clickhouse-server WITHOUT the backup tool (backup/restore unavailable until a later restart fetches it)" >&2
 return 1
 }
 
-#if the backup tool cannot be installed after retries, it aborts (set -e) so the deploy fails
-install_clickhouse_backup
+# Best-effort: the server MUST start even if the backup-tool fetch fails (it is not a server dependency).
+install_clickhouse_backup || true
 
 exec /entrypoint.sh
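The comment block and the retry-loop tail in this diff describe how `install_clickhouse_backup` now behaves: cache the binary on the persistent volume, retry with a linear backoff (`attempt * 10` seconds), log failures to stderr, and never block the server. The function's full body is outside the diff, so what follows is only a minimal sketch of that pattern under stated assumptions: `fetch_cached`, `CACHE_BIN`, `MAX_ATTEMPTS`, `BACKOFF_BASE`, `fake_fetch` and the injected fetch command are all hypothetical, and the real download command and URL are not shown here.

```shell
#!/bin/bash
set -e

# Hypothetical knobs. In the real entrypoint the cache lives on the persistent
# clickhouse data volume (/var/lib/clickhouse); the exact path, attempt count
# and backoff used by the recipe are not shown in this diff.
CACHE_BIN="${CACHE_BIN:-/var/lib/clickhouse/bin/clickhouse-backup}"
MAX_ATTEMPTS="${MAX_ATTEMPTS:-3}"
BACKOFF_BASE="${BACKOFF_BASE:-10}"   # attempt N sleeps N * BACKOFF_BASE seconds

# fetch_cached FETCH_CMD [ARGS...]
# Runs FETCH_CMD with a destination path appended; FETCH_CMD must write the
# binary there. Returns 0 on a cache hit or successful fetch, 1 otherwise.
fetch_cached() {
  local attempt tmp="${CACHE_BIN}.partial"

  # Cache hit: a non-empty binary from an earlier container start is reused.
  if [ -s "$CACHE_BIN" ]; then
    echo "clickhouse-backup: using cached $CACHE_BIN" >&2
    return 0
  fi

  mkdir -p "$(dirname "$CACHE_BIN")" || return 1
  for ((attempt = 1; attempt <= MAX_ATTEMPTS; attempt++)); do
    # Write to a temporary name and rename only after success, so an
    # interrupted download is never mistaken for a cached binary.
    if "$@" "$tmp" && chmod +x "$tmp" && mv -f "$tmp" "$CACHE_BIN"; then
      echo "clickhouse-backup: fetched on attempt ${attempt}" >&2
      return 0
    fi
    rm -f "$tmp"
    echo "clickhouse-backup: fetch attempt ${attempt} failed; backing off $((attempt * BACKOFF_BASE))s" >&2
    sleep $((attempt * BACKOFF_BASE))
  done
  echo "clickhouse-backup: fetch FAILED after ${MAX_ATTEMPTS} attempts; continuing without it" >&2
  return 1
}

# Demo: a hypothetical fetch that fails twice, then writes a stub binary.
FAILS_LEFT=2
fake_fetch() {
  if [ "$FAILS_LEFT" -gt 0 ]; then
    FAILS_LEFT=$((FAILS_LEFT - 1))
    return 1
  fi
  printf '#!/bin/sh\necho stub clickhouse-backup\n' > "$1"
}

CACHE_BIN="$(mktemp -d)/bin/clickhouse-backup"
BACKOFF_BASE=0
fetch_cached fake_fetch || true   # best-effort, as in the entrypoint
fetch_cached fake_fetch || true   # second "restart": cache hit, no fetch
# The real entrypoint would now `exec /entrypoint.sh` regardless.
```

The fetch command is passed in as an argument so the sketch can be exercised without network access. The download-then-rename step is this sketch's own choice, not something the diff confirms; it keeps a half-written file from counting as a cache hit on the next start.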