30 Commits

Author SHA1 Message Date
13458fac56 refactor: extract backup/restore into config scripts, trim comments
All checks were successful
cc-ci/testme cc-ci: success
Move the postgres and clickhouse backup/restore hook logic out of inline
compose labels into dedicated pg_backup.sh / clickhouse_backup.sh config
scripts (the pattern other recipes use), and trim the verbose explanatory
comments down to the essential rationale, now living in the scripts.
2026-06-10 16:55:20 +00:00
270c8404ce fix: make restore correct under a live app (CI restore + custom tiers)
All checks were successful
cc-ci/testme cc-ci: success
Three independent bugs made `abra app restore` leave the stack broken:

1. ClickHouse: schema_migrations is a TinyLog table and clickhouse-backup
   can only FREEZE MergeTree data - it backed up the table schema but
   not its rows, so a restore emptied the migration ledger. The next app
   boot re-ran every IngestRepo migration against the fully-built tables
   and crash-looped (DUPLICATE_COLUMN: utm_medium) - the post-restore 502
   in CI build 237. Fix: export the ledger to TSV into the backup dir
   (rides in the snapshotted backup/events path) in the backup pre-hook,
   reload it in the restore post-hook.

2. App restart policy: condition was on-failure, but when postgres is
   disrupted under the app the BEAM supervision tree escalates and Erlang
   exits GRACEFULLY (status 0) - swarm marks the task Complete and never
   restarts it (reproduced: app stranded at 0/1). Fix: condition any.

3. pg_restore: --clean without --if-exists exits 1 when a dropped object
   is absent ("errors ignored"), killing the && chain and leaving the
   dump behind. Fix: --if-exists, plus pg_terminate_backend afterwards so
   the app's pooled connections reconnect against the recreated objects.

Validated on a dev deploy: marker + truncated ClickHouse events both
return on restore, migration ledger intact (17 rows), post-restore event
ingestion for a new site works, and an app reboot after restore migrates
cleanly. Known cosmetic caveat: until the app is restarted, its Postgrex
type cache holds stale OIDs and background Oban jobs log "cache lookup
failed for type" - ingestion and serving are unaffected; an operator
restart after a restore clears it.
2026-06-09 23:01:24 +00:00
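The failure mode in item 3 of the commit above is ordinary shell short-circuiting, so it can be reproduced without Postgres. A minimal sketch, assuming a stand-in function `fake_pg_restore` (hypothetical, not part of the recipe) that exits 1 the way the message says `pg_restore --clean` does when it reports ignored errors; `run_chain` is likewise only an illustration of an inline `&&` hook:

```shell
#!/bin/sh
# Hypothetical stand-in for `pg_restore --clean` without --if-exists:
# the restore itself went through, but the exit status is 1.
fake_pg_restore() {
    return 1
}

# Run a restore command followed by cleanup in an `&&` chain, the way an
# inline hook would, then report whether the dump file is still present.
run_chain() {
    dump=$(mktemp)
    "$1" && rm -f "$dump"
    if [ -e "$dump" ]; then
        echo "dump left behind"
        rm -f "$dump"
    else
        echo "dump removed"
    fi
}

run_chain fake_pg_restore   # prints "dump left behind"
run_chain true              # prints "dump removed"
```

Because `rm` only runs when the command before it exits 0, any non-zero status from the restore step skips the cleanup, which matches the "leaving the dump behind" symptom described in the message.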
4cab6b5146 fix: backup labels to backup-bot-two v2 volume syntax (restore was a no-op)
Some checks failed
cc-ci/testme cc-ci: failure
backup-bot-two 2.4.0 snapshots paths INSIDE named volumes
(backupbot.backup.volumes.<vol>.path, relative to the volume root) and
IGNORES the old backupbot.backup.path label. The db pre-hook wrote
/postgres.dump.gz to the container's ephemeral root fs — outside every
volume — so the dump never reached the snapshot and the restore post-hook
failed on a missing file (gzip: /postgres.dump.gz: No such file).

- db: dump into the db-data volume (transient; hooks remove it) and
  snapshot only that file via backupbot.backup.volumes.db-data.path —
  same pattern as keycloak, which passes backup/restore on this CI.
  Also use $POSTGRES_DB in the restore hook: the previous $PLAUSIBLE_DB
  is defined nowhere and only connected via libpq's username fallback.
- clickhouse: snapshot only backup/events (the clickhouse-backup output)
  inside the event-data volume instead of the whole volume — restoring
  raw data files under a running server is unsafe; the post-hook performs
  the logical restore.
2026-06-09 21:53:18 +00:00
9f8bcbc9e3 fix: clickhouse-backup install must succeed loudly, never silently degrade
Some checks failed
cc-ci/testme cc-ci: failure
Replaces the previous best-effort (|| true) approach: a deploy without
clickhouse-backup would have silently broken backup/restore, so the
entrypoint now hard-fails (visibly, in service logs) if the tool truly
cannot be installed — but makes that case effectively unreachable:

- cache the VERIFIED binary on the persistent clickhouse volume, keyed
  by version: downloaded at most once per app; container restarts never
  re-fetch (kills the re-download amplification that turned a GitHub
  throttle into a permanent crash-loop)
- canonical Altinity release URL (project moved; old path is a redirect)
- bounded retries with backoff + wget read timeout (a stalled connection
  can no longer hang the deploy)
- verify the binary executes before trusting or caching it (catches
  truncated downloads and a corrupt cache)
- compose: fix app depends_on to the real service name
  (plausible_events_db) — docker compose config was failing on it, which
  disabled CI image prepull and pushed pulls into the deploy window
- bump CLICKHOUSE_ENTRYPOINT_VERSION v4 -> v5 (swarm configs immutable)

Verified on a dev deploy: fresh download path, cached-restart path,
clickhouse-backup create/list/delete, and /api/health all green.
2026-06-09 19:09:13 +00:00
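The "bounded retries with backoff" idea from the commit above can be sketched as a small helper. This is an illustration under assumptions, not the recipe's code: the function name `retry` and the variable `RETRY_DELAY_UNIT` are hypothetical, and the delay grows linearly with the attempt number.

```shell
#!/bin/sh
# Seconds multiplied by the attempt number between tries (assumed default: 10).
RETRY_DELAY_UNIT=${RETRY_DELAY_UNIT:-10}

# retry MAX CMD [ARGS...]: run CMD until it succeeds, at most MAX times,
# sleeping attempt * RETRY_DELAY_UNIT seconds after each failed attempt
# except the last one. Returns 0 on success, 1 once all attempts are used up.
retry() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt ${attempt}/${max} failed" >&2
        if [ "$attempt" -lt "$max" ]; then
            sleep $((attempt * RETRY_DELAY_UNIT))
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

A call such as `retry 5 wget -T 30 -O /tmp/tool.tar.gz "$URL"` would then give up after five attempts instead of looping forever, and the caller decides whether a final failure should abort the entrypoint.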
b90a8c4239 fix: clickhouse entrypoint - backup download is best-effort (server must start regardless)
Some checks failed
cc-ci/testme cc-ci: failure
The previous entrypoint treated clickhouse-backup as required: a download failure
(rate-limit or transient network) caused install_clickhouse_backup to return 1 which
with set -e exited the entrypoint before /entrypoint.sh ran. ClickHouse never started,
the swarm restarted it, the download was retried, amplifying the throttle -> crash-loop
-> deploy timeout (cc-ci Q4.7b).

Fix: install_clickhouse_backup || true — the server starts even if the backup tool
cannot be fetched. Backup/restore degrades until a later restart fetches it.

Also: fix stray trailing quote in backupbot.restore.post-hook; bump
CLICKHOUSE_ENTRYPOINT_VERSION v3->v4 (config content changed).
2026-06-09 18:30:18 +00:00
50a3715caa chore: upgrade to 3.1.0+v2.0.0
Some checks failed
cc-ci/testme cc-ci: failure
Minor bump — no operator action required (Postgres/ClickHouse changes are automatic).

- Postgres: use pgautoupgrade/pgautoupgrade:18-alpine in place of the custom
  pg_upgrade entrypoint. The existing cluster is upgraded in place automatically
  on deploy; PGDATA pinned to the legacy path; adds a pg_isready healthcheck.
  Removes entrypoint.postgres.sh.tmpl and DB_ENTRYPOINT_VERSION.
- ClickHouse backup fetch: cache the clickhouse-backup binary on the persistent
  volume and retry with backoff to avoid the download crash-loop. The tool is
  required — if it can't be installed after retries the entrypoint aborts and
  the server does not start, rather than coming up without backup/restore.
- Add CLICKHOUSE_DATABASE_URL; bump the clickhouse entrypoint config version.
- Remove a stray broken link reference in the README.
2026-06-09 15:46:28 +00:00
da159375d8 Update .drone.yml 2025-01-08 10:09:13 -08:00
f83774500d Remove swapfile 2024-05-03 09:10:11 +00:00
71dfab1129 Merge pull request 'fix backup and restore when database was already created' (#5) from p4u1/plausible:fix-backup-restore into main
Reviewed-on: https://git.coopcloud.tech/coop-cloud/plausible/pulls/5
2024-04-06 18:08:54 +00:00
2330e73915 SMTP settings incl now 2024-03-05 10:45:49 -05:00
bdc6e77e40 fix backup and restore 2023-11-11 13:18:40 +01:00
3wc
b26d957cad Add scary release note warning 2023-11-09 20:35:00 +00:00
3wc
4a70aadfb4 chore: publish 3.0.1+v2.0.0 release 2023-11-09 20:31:44 +00:00
3wc
6c73753dc3 Add x86_64 support 2023-11-09 20:27:10 +00:00
3wc
1a29f24eba chore: publish 3.0.0+v2.0.0 release 2023-11-09 19:20:28 +00:00
a30993cdb1 Merge pull request 'add backup labels for postgres and clickhouse' (#4) from p4u1/plausible:backup into main
Reviewed-on: https://git.coopcloud.tech/coop-cloud/plausible/pulls/4
2023-11-09 19:18:58 +00:00
b72203b089 add backup labels for postgres and clickhouse 2023-11-09 11:17:45 +01:00
3wc
7fa53d58eb Rename release note to make it actually work 2023-08-11 12:26:19 +02:00
3wc
c5b29affd8 chore: publish 2.0.0+v1.5.1 release 2023-08-11 12:15:40 +02:00
3wc
ee337feaea Custom postgres entrypoint to handle major upgrades..
..appropriated from the coop-cloud/outline recipe.
2023-08-11 12:15:23 +02:00
3wc
276f4f6933 Remove admin user stuff..
..it's done on first launch now
2023-08-11 12:15:23 +02:00
3wc
f5f1bdd5eb Remove deprecated init-admin from startup command 2023-08-11 12:15:23 +02:00
fa5e91fc33 Changes to clickhouse logger 2023-08-11 12:15:23 +02:00
013352258b Readme update 2023-08-11 12:15:23 +02:00
3wc
d61a6c0bba Change version to 1.1.0 2023-08-11 12:15:23 +02:00
c53bf21e35 Change Plausible version from latest to a pinned version, upgrade PostgreSQL, and change ClickHouse to the actual Docker image instead of Yandex 2023-08-11 12:15:13 +02:00
3wc
c9227acce5 Switch to self-hosted stack-ssh-deploy image [mass update] 2023-08-11 12:15:13 +02:00
3wc
10b628f075 Fix CI by adding networks: [mass update] 2023-08-11 12:15:13 +02:00
3wc
a94abcb823 Automatically generate catalogue on release [mass update]
Re: coop-cloud/recipes-catalogue-json#4
2023-08-11 12:15:13 +02:00
3wc
f0820ed7b8 Update abra syntax in examples (finally) [mass update] 2023-08-11 12:15:13 +02:00
10 changed files with 195 additions and 16 deletions


@@ -45,7 +45,7 @@ steps:
from_secret: drone_abra-bot_token
fork: true
repositories:
- coop-cloud/auto-recipes-catalogue-json
- toolshed/auto-recipes-catalogue-json
trigger:
event: tag


@@ -6,9 +6,6 @@ DOMAIN=plausible.example.com
 #EXTRA_DOMAINS=', `www.plausible.example.com`'
 LETS_ENCRYPT_ENV=production
-ADMIN_USER_EMAIL=replace-me
-ADMIN_USER_NAME=replace-me
-ADMIN_USER_PWD=replace-me
 SECRET_KEY_BASE=replace-me
 DISABLE_AUTH=replace-me # true or false
 DISABLE_REGISTRATION=replace-me # true or false


@@ -7,7 +7,7 @@
 * **Status**: 1, alpha
 * **Image**: [`plausible/analytics`](https://hub.docker.com/plausible/analytics), 4, upstream
 * **Healthcheck**:
-* **Backups**: No
+* **Backups**: Yes
 * **Email**: No
 * **Tests**:
 * **SSO**: No
@@ -26,4 +26,3 @@
 [`abra`]: https://git.coopcloud.tech/coop-cloud/abra
 [`coop-cloud/traefik`]: https://git.coopcloud.tech/coop-cloud/traefik
-p-cloud/traefik


@@ -1,2 +1,5 @@
-export CLICKHOUSE_CONF_VERSION=v1
+export CLICKHOUSE_CONF_VERSION=v2
 export CLICKHOUSE_USER_CONF_VERSION=v2
+export CLICKHOUSE_ENTRYPOINT_VERSION=v6
+export PG_BACKUP_VERSION=v1
+export CLICKHOUSE_BACKUP_SCRIPT_VERSION=v1

clickhouse_backup.sh Normal file

@@ -0,0 +1,30 @@
#!/bin/bash
set -e
# clickhouse-backup output lives inside the event-data volume (snapshotted via
# backupbot.backup.volumes.event-data.path). Restoring the raw data files under a
# running server is unsafe, so restore performs a logical restore instead.
BACKUP_DIR=/var/lib/clickhouse/backup/events
MIGRATIONS_TSV="$BACKUP_DIR/schema_migrations.tsv"
backup() {
clickhouse-backup create events
# schema_migrations is a TinyLog table — clickhouse-backup only FREEZEs MergeTree
# data, so its rows aren't captured. Export them alongside the backup, else a restore
# leaves the ledger empty and the next boot re-runs every migration (DUPLICATE_COLUMN).
clickhouse-client --query "SELECT * FROM plausible_events_db.schema_migrations FORMAT TSV" > "$MIGRATIONS_TSV"
}
backup_cleanup() {
rm -rf "$BACKUP_DIR"
}
restore() {
clickhouse-backup restore --rm events
clickhouse-client --query "TRUNCATE TABLE plausible_events_db.schema_migrations"
clickhouse-client --query "INSERT INTO plausible_events_db.schema_migrations FORMAT TSV" < "$MIGRATIONS_TSV"
rm -rf "$BACKUP_DIR"
}
"$@"


@@ -3,20 +3,22 @@ version: "3.8"
services:
app:
image: plausible/analytics:v1.5.1
command: sh -c "sleep 10 && /entrypoint.sh db createdb && /entrypoint.sh db migrate && /entrypoint.sh db init-admin && /entrypoint.sh run"
image: plausible/analytics:v2.0.0
command: sh -c "sleep 10 && /entrypoint.sh db createdb && /entrypoint.sh db migrate && /entrypoint.sh run"
depends_on:
- db
- events_db
- plausible_events_db
environment:
- BASE_URL=https://$DOMAIN
- ADMIN_USER_EMAIL
- ADMIN_USER_NAME
- ADMIN_USER_PWD
- SECRET_KEY_BASE
- DATABASE_URL=postgres://plausible:plausible@${STACK_NAME}_db:5432/plausible
- CLICKHOUSE_DATABASE_URL=http://${STACK_NAME}_plausible_events_db:8123/plausible_events_db
- SMTP_HOST_ADDR
- MAILER_EMAIL
- SMTP_HOST_PORT
- SMTP_USER_NAME
- SMTP_USER_PWD
- SMTP_HOST_SSL_ENABLED
- DISABLE_REGISTRATION
- DISABLE_AUTH
networks:
@@ -24,36 +26,72 @@ services:
- internal
deploy:
restart_policy:
condition: on-failure
# `any`, not `on-failure`: a restore disrupts postgres under the app and Erlang then
# shuts down gracefully (exit 0), which on-failure treats as done and never restarts.
condition: any
labels:
- "traefik.enable=true"
- "traefik.http.services.${STACK_NAME}.loadbalancer.server.port=8000"
- "traefik.http.routers.${STACK_NAME}.rule=Host(`${DOMAIN}`${EXTRA_DOMAINS})"
- "traefik.http.routers.${STACK_NAME}.entrypoints=web-secure"
- "traefik.http.routers.${STACK_NAME}.tls.certresolver=${LETS_ENCRYPT_ENV}"
- coop-cloud.${STACK_NAME}.version=1.1.0+1.5.1
- coop-cloud.${STACK_NAME}.version=3.1.0+v2.0.0
db:
image: postgres:13.11-alpine
image: pgautoupgrade/pgautoupgrade:18-alpine
volumes:
- db-data:/var/lib/postgresql/data
environment:
# pin legacy PGDATA so the existing cluster on the volume is upgraded in place, not re-init'd
- PGDATA=/var/lib/postgresql/data
- POSTGRES_USER=plausible
- POSTGRES_PASSWORD=plausible
- POSTGRES_DB=plausible
networks:
- internal
healthcheck:
test: ["CMD-SHELL", "pg_isready -U plausible -d plausible"]
interval: 5s
timeout: 5s
retries: 60
configs:
- source: pg_backup
target: /pg_backup.sh
mode: 0555
deploy:
labels:
backupbot.backup: "true"
backupbot.backup.volumes.db-data.path: "postgres.dump.gz"
backupbot.backup.pre-hook: "/pg_backup.sh backup"
backupbot.backup.post-hook: "/pg_backup.sh backup_cleanup"
backupbot.restore: "true"
backupbot.restore.post-hook: "/pg_backup.sh restore"
plausible_events_db:
image: clickhouse/clickhouse-server:23.4.2.11-alpine
volumes:
- event-data:/var/lib/clickhouse
entrypoint: /custom-entrypoint.sh
configs:
- source: clickhouse-config
target: /etc/clickhouse-server/config.d/logging.xml
- source: clickhouse-user-config
target: /etc/clickhouse-server/users.d/clickhouse-user-config.xml
- source: clickhouse_entrypoint
target: /custom-entrypoint.sh
mode: 0555
- source: clickhouse_backup
target: /clickhouse_backup.sh
mode: 0555
networks:
- internal
deploy:
labels:
backupbot.backup: "true"
backupbot.backup.volumes.event-data.path: "backup/events"
backupbot.backup.pre-hook: "/clickhouse_backup.sh backup"
backupbot.backup.post-hook: "/clickhouse_backup.sh backup_cleanup"
backupbot.restore: "true"
backupbot.restore.post-hook: "/clickhouse_backup.sh restore"
volumes:
db-data:
@@ -71,3 +109,12 @@ configs:
clickhouse-user-config:
name: ${STACK_NAME}_clickhouse_user_config_${CLICKHOUSE_USER_CONF_VERSION}
file: clickhouse-user-config.xml
clickhouse_entrypoint:
name: ${STACK_NAME}_clickhouse_entrypoint_${CLICKHOUSE_ENTRYPOINT_VERSION}
file: entrypoint.clickhouse.sh
pg_backup:
name: ${STACK_NAME}_pg_backup_${PG_BACKUP_VERSION}
file: pg_backup.sh
clickhouse_backup:
name: ${STACK_NAME}_clickhouse_backup_${CLICKHOUSE_BACKUP_SCRIPT_VERSION}
file: clickhouse_backup.sh

entrypoint.clickhouse.sh Normal file

@@ -0,0 +1,59 @@
#!/bin/bash
# Install clickhouse-backup (powers this recipe's backup/restore hooks) before starting the
# server. The binary is cached on the persistent volume keyed by version (downloaded at most
# once per app) and fetched with bounded retries + a read timeout; the binary is verified before
# being trusted or cached. If it truly cannot be installed the deploy fails loudly rather than
# silently shipping broken backups.
set -e
CLICKHOUSE_BACKUP_VERSION=2.4.2
ARCH=$(uname -m)
if [[ $ARCH =~ "aarch64" ]]; then
ARCH="arm64"
elif [[ $ARCH =~ "armv5l" ]]; then
ARCH="armv5"
elif [[ $ARCH =~ "armv6l" ]]; then
ARCH="armv6"
elif [[ $ARCH =~ "armv7l" ]]; then
ARCH="armv7"
elif [[ $ARCH =~ "x86_64" ]]; then
ARCH="amd64"
fi
CACHE_DIR=/var/lib/clickhouse/.ccci-bin
CACHED="${CACHE_DIR}/clickhouse-backup-v${CLICKHOUSE_BACKUP_VERSION}"
BIN=/usr/local/bin/clickhouse-backup
URL="https://github.com/Altinity/clickhouse-backup/releases/download/v${CLICKHOUSE_BACKUP_VERSION}/clickhouse-backup-linux-${ARCH}.tar.gz"
binary_ok() {
"$1" --version >/dev/null 2>&1
}
install_clickhouse_backup() {
mkdir -p "$CACHE_DIR"
if [ -x "$CACHED" ] && binary_ok "$CACHED"; then
cp -f "$CACHED" "$BIN"
echo "clickhouse-backup: using verified cached binary ($CACHED)"
return 0
fi
rm -f "$CACHED" # absent or fails to execute — re-fetch
for attempt in 1 2 3 4 5; do
if wget -T 30 --continue --output-document=/tmp/clickhouse-backup.tar.gz "$URL" \
&& tar -xf /tmp/clickhouse-backup.tar.gz --directory=/usr/local/bin --strip-components=3 \
&& binary_ok "$BIN"; then
cp -f "$BIN" "$CACHED" 2>/dev/null || true
echo "clickhouse-backup: downloaded, verified + cached (attempt ${attempt})"
return 0
fi
echo "clickhouse-backup: fetch attempt ${attempt}/5 failed" >&2
[ "$attempt" -lt 5 ] && sleep $((attempt * 10))
done
echo "clickhouse-backup: could not install after 5 attempts — failing the deploy (without it backup/restore would be silently broken)" >&2
return 1
}
install_clickhouse_backup
exec /entrypoint.sh

pg_backup.sh Normal file

@@ -0,0 +1,29 @@
#!/bin/sh
set -e
# The dump lives at the db-data volume root: backup-bot-two v2 snapshots paths inside
# named volumes (backupbot.backup.volumes.db-data.path), not the container root fs.
DUMP=/var/lib/postgresql/data/postgres.dump
backup() {
pg_dump -U "$POSTGRES_USER" -Fc "$POSTGRES_DB" | gzip > "$DUMP.gz"
}
backup_cleanup() {
rm -f "$DUMP.gz"
}
restore() {
gzip -d "$DUMP.gz"
# --if-exists: otherwise DROPs on objects absent from the live db error out and
# pg_restore exits 1, killing the chain and leaving the dump behind.
pg_restore --clean --if-exists -U "$POSTGRES_USER" --dbname="$POSTGRES_DB" < "$DUMP"
rm -f "$DUMP"
# pg_restore --clean recreates objects under the live app, so its pooled connections
# keep stale type-OID caches ('cache lookup failed for type ...' crash loops, e.g.
# Oban). Terminate them so Ecto reconnects fresh.
psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = current_database() AND pid <> pg_backend_pid();"
}
"$@"

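Both hook scripts in this diff end in `"$@"`, which runs the script's arguments as a command, so `/pg_backup.sh backup` ends up calling the `backup` function. A minimal sketch of that dispatch idiom with placeholder function bodies; the temporary script and the echoed strings are made up for illustration and do not touch any database:

```shell
#!/bin/sh
# Write a throwaway script that uses the same dispatch idiom as the hook scripts.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/sh
set -e
backup() { echo "would dump the database"; }
backup_cleanup() { echo "would remove the dump"; }
restore() { echo "would restore from the dump"; }
# Run the first argument as a function name, passing along any remaining arguments.
"$@"
EOF

sh "$demo" backup           # prints "would dump the database"
sh "$demo" backup_cleanup   # prints "would remove the dump"
```

With no arguments `"$@"` expands to nothing and the script exits 0; an unknown name fails with "not found" and a non-zero status, so a misspelled hook label would surface as a failed hook rather than pass silently.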
release/2.0.0+v1.5.1 Normal file

@@ -0,0 +1,7 @@
If you're upgrading from a pre-release version, there will be a major Postgresql
version upgrade -- this should happen automatically, but **please take a
backup**, at least of the `<stack_name>_db` volume, if not all the data volumes,
before the upgrade.
If you haven't taken a backup already, it's probably safest to bail using
Ctrl+C, take the backup, and re-run your `upgrade` / `deploy` command.

release/3.0.0+v2.0.0 Normal file

@@ -0,0 +1,8 @@
⚠ WARNING! ⚠
This major version upgrade of Plausible requires running a manual data migration
-- otherwise you'll see all historical data disappear (don't worry, it's
"probably" still there).
Take a manual docker volume backup, then see here, and strap in:
https://github.com/plausible/analytics/discussions/3132