Compare commits

3 Commits

Author SHA1 Message Date
3758522cf8 fix(backup): block reconnections via pg_hba during pg restore (app reconnect broke reimport)
2026-05-30 23:37:31 +00:00
7a2e0e044c fix(backup): reimport postgres dump on restore (restore was a no-op)
The db service dumped the DB on backup (pg_dump pre-hook) but shipped no
restore hook, and a file-level restore does not reload into the running
postgres, so a restored backup silently kept the live (un-restored) state.
Add pg_backup.sh (backup=pg_dump|gzip into the postgresql_data volume;
restore=terminate conns + DROP DATABASE WITH FORCE + createdb + reimport),
mount it via a config, and wire the backupbot backup/restore hooks. Same
fix as the immich / mattermost-lts / ghost recipes.
2026-05-30 15:19:26 +00:00
7b7ddd70bc fix(image): re-pin bitnami/discourse -> bitnamilegacy/discourse:3.3.1
Docker Hub emptied the bitnami/discourse namespace (manifest 3.3.1 -> 404, tags:[]); Bitnami moved
archived images to the bitnamilegacy namespace. bitnamilegacy/discourse:3.3.1 is the byte-identical
drop-in (manifest -> 200, same /bitnami/discourse paths + /opt/bitnami scripts). Re-pin both app +
sidekiq services; bump 0.7.0 -> 0.8.0. Found via cc-ci enrollment.
2026-05-30 09:52:27 +00:00
9 changed files with 139 additions and 199 deletions

@@ -5,19 +5,18 @@ DOMAIN=discourse.example.com
#EXTRA_DOMAINS=', `www.discourse.example.com`'
LETS_ENCRYPT_ENV=production
# Admin / developer accounts (comma-separated); these become admins on signup
DISCOURSE_DEVELOPER_EMAILS=admin@example.com
# Outgoing email
#DISCOURSE_SMTP_HOST=
#DISCOURSE_SMTP_PORT=
#DISCOURSE_SMTP_USER=
#DISCOURSE_SMTP_PROTOCOL=
#DISCOURSE_SMTP_AUTH=
# Set this if you send e-mails from a different domain than noreply@$DOMAIN
#DISCOURSE_NOTIFICATION_EMAIL=$SMTP_USER
# Outgoing email (official discourse/discourse env names)
#DISCOURSE_SMTP_ADDRESS=
#DISCOURSE_SMTP_PORT=587
#DISCOURSE_SMTP_USER_NAME=
#DISCOURSE_SMTP_AUTHENTICATION=login
#DISCOURSE_SMTP_ENABLE_START_TLS=true
# Set this if you send e-mail from a different address than noreply@$DOMAIN
#DISCOURSE_NOTIFICATION_EMAIL=
# SMTP password as a secret
# SMTP authentication
#COMPOSE_FILE="compose.yml:compose.smtpauth.yml"
#SECRET_SMTP_PASSWORD_VERSION=v1
SECRET_DB_PASSWORD_VERSION=v1

@@ -6,60 +6,67 @@ A platform for community discussion
<!-- metadata -->
* **Category**: Apps
* **Status**: 3, experimental
* **Image**: [`discourse/discourse`](https://hub.docker.com/r/discourse/discourse), 4, upstream
* **Status**:
* **Image**: [`bitnami/discourse`](https://hub.docker.com/r/bitnami/discourse)
* **Healthcheck**: yes
* **Backups**: yes
* **Backups**: no
* **Email**: yes
* **Tests**: yes
* **Tests**: no
* **SSO**: no
<!-- endmetadata -->
> **Note**: this recipe runs the official, **experimental** `discourse/discourse`
> image. Upstream does not yet recommend it for production — see
> <https://meta.discourse.org/t/380646>. Use with care.
## Basic usage
1. Set up Docker Swarm and [`abra`]
2. Deploy [`coop-cloud/traefik`]
3. `abra app new discourse --secrets` (optionally with `--pass` to save secrets in `pass`)
4. `abra app config YOURAPPDOMAIN` — set `DOMAIN` and `DISCOURSE_DEVELOPER_EMAILS`
3. `abra app new discourse --secrets` (optionally with `--pass` if you'd like
to save secrets in `pass`)
4. `abra app config YOURAPPDOMAIN` - be sure to change `$DOMAIN` to something that resolves to
your Docker swarm box
5. `abra app deploy YOURAPPDOMAIN`
6. Open the configured domain in your browser to finish set-up. The first account
that registers with an address listed in `DISCOURSE_DEVELOPER_EMAILS` becomes
an admin.
6. Open the configured domain in your browser to finish set-up
[`abra`]: https://docs.coopcloud.tech/abra/
[`coop-cloud/traefik`]: https://git.coopcloud.tech/coop-cloud/traefik
[`abra`]: https://git.autonomic.zone/autonomic-cooperative/abra
[`coop-cloud/traefik`]: https://git.autonomic.zone/coop-cloud/traefik
The app serves plain HTTP on port 80; Traefik terminates TLS in front of it. The
image's built-in nginx/Let's Encrypt is disabled by the recipe (`install-ssl`
override) so it works behind the reverse proxy.
## To add a new admin user
## Add an admin user
1. Login to the instance `abra app run APPNAME app sh`
2. `cd /opt/bitnami/discourse`
3. `RAILS_ENV=production bundle exec rake admin:create` and follow prompts.
```
abra app run YOURAPPDOMAIN app discourse admin create
```
## Install plugins
1. Login to instance `abra app run APPNAME app sh`
2. `cd /bitnami/discourse/plugins/`
3. `git clone plugin.git` for example `https://github.com/discourse/discourse-openid-connect.git`
4. `abra app restart YOURAPPDOMAIN app`
### Events / calendar plugin
We've had some luck running [discourse-events](https://github.com/paviliondev/discourse-events).
## Setup Notes
Until issue #1 is fixed, the default user is `user` and the default password is `bitnami123`
## Postgres major version upgrades
Handled automatically. The `db` entrypoint detects an older data directory on
deploy and runs `pg_upgrade` in place (it installs the old binaries, detects the
cluster's real install superuser, and upgrades). No manual dump/restore needed.
Welcome to hell.
## Migrating from the previous (bitnami) recipe
The official image stores uploads under `/shared` rather than bitnami's
`/bitnami/discourse`. On first boot the recipe copies uploads + backups from the
old bitnami volume (mounted read-only at `/legacy`) into `/shared`, once,
idempotently. The Postgres database is reused as-is. After a successful migration
a later recipe version will drop the transitional `/legacy` mount.
If you are upgrading from the bitnami recipe, also remove the now-unused sidekiq
service that swarm leaves behind (sidekiq runs inside the app container now):
```
docker service rm YOURSTACK_sidekiq
```
1. `abra app run YOURAPPDOMAIN db pg_dumpall -U discourse | gzip > YOURAPPDOMAIN_db_DATE.sql.gz`
2. `abra app volume ls YOURAPPDOMAIN`, find the name of the Postgres data volume
3. `scp` the backup to your VPS
4. `abra app undeploy YOURAPPDOMAIN`
5. `abra app volume rm YOURAPPDOMAIN`, choose the Postgres data volume
6. `abra app deploy YOURAPPDOMAIN`, then `abra app undeploy YOURAPPDOMAIN`
7. `ssh` to the VPS, run (replacing `13-alpine` with the new Postgres version)
`docker run -v YOURDATAVOLUME:/var/lib/postgresql/data -e POSTGRES_HOST_AUTH_METHOD=trust -it postgres:13-alpine`
8. In another SSH session on the server, run `docker ps` to find the ID of the
new Postgres container, then `docker exec -it CONTAINERID bash`
9. In the shell you just launched, run `dropdb -U discourse discourse`, then
`createdb -U discourse discourse`, then Ctrl+D or run `exit`
10. In the second SSH session, run `zcat YOURAPPDOMAIN_db_DATE.sql.gz | docker exec -it CONTAINERID psql -U discourse`
11. Exit the second SSH session
12. Back in the first SSH session, Ctrl+C to shut down the database
13. `abra app deploy YOURAPPDOMAIN`

@@ -1,5 +1,2 @@
export DB_ENTRYPOINT_VERSION=v3
export PG_BACKUP_VERSION=v2
export APP_ENTRYPOINT_VERSION=v1
export APP_INSTALL_SSL_VERSION=v1
export APP_MIGRATE_UPLOADS_VERSION=v1
export DB_ENTRYPOINT_VERSION=v1
export PG_BACKUP_VERSION=v1

@@ -1,11 +0,0 @@
#!/bin/bash
# Overrides the official image's /etc/runit/1.d/install-ssl.
#
# The stock install-ssl always runs configure-ssl (and configure-letsencrypt),
# which empties the default `listen 80` nginx outlet and switches to `listen 443
# ssl` against a cert that does not exist here — nginx then crash-loops, or the
# image tries to obtain its own Let's Encrypt cert. Under Co-op Cloud, Traefik
# terminates TLS and proxies plain HTTP to port 80, so we skip the image's SSL
# setup entirely and let nginx keep its default HTTP-on-80 config.
echo "install-ssl overridden by recipe: serving plain HTTP on :80 behind Traefik"
exit 0

@@ -1,11 +0,0 @@
#!/bin/bash
# Co-op Cloud wrapper around the official image's /sbin/boot.
# discourse/discourse reads DISCOURSE_DB_PASSWORD from the process env (pups/Ruby;
# it has no *_FILE support), so inject it from the docker secret before booting.
set -e
if [ -f /run/secrets/db_password ]; then
export DISCOURSE_DB_PASSWORD="$(cat /run/secrets/db_password)"
fi
exec /sbin/boot
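
The wrapper above hard-codes a single secret. As a hedged sketch only, assuming further secrets ever needed the same treatment, the pattern could be factored into a helper; the name `export_secret` and its arguments are hypothetical and not part of the recipe:

```shell
#!/bin/bash
# Hypothetical helper, not part of the recipe: export an environment variable
# from a secret file if that file exists, otherwise leave the variable alone.
export_secret() {
  local var="$1" file="$2"
  if [ -f "$file" ]; then
    # $(cat ...) strips the trailing newline that secret files often carry
    export "$var=$(cat "$file")"
  fi
}
```

Under these assumptions, `export_secret DISCOURSE_DB_PASSWORD /run/secrets/db_password` followed by `exec /sbin/boot` would behave like the wrapper shown in the diff.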

@@ -3,67 +3,56 @@ version: "3.8"
services:
app:
image: discourse/discourse:3.5.3
image: bitnamilegacy/discourse:3.3.1
networks:
- proxy
- internal
# official image CMD is /sbin/boot; wrapper injects the DB password secret first
entrypoint: /usr/local/bin/cc-app-entrypoint.sh
# entrypoint: ['tail', '-f', '/dev/null']
environment:
- DISCOURSE_HOSTNAME=${DOMAIN}
- DISCOURSE_DEVELOPER_EMAILS=${DISCOURSE_DEVELOPER_EMAILS}
- DISCOURSE_DB_HOST=${STACK_NAME}_db
- DISCOURSE_DB_PORT=5432
- DISCOURSE_DB_NAME=discourse
- DISCOURSE_DB_USERNAME=discourse
- DISCOURSE_REDIS_HOST=${STACK_NAME}_redis
- DISCOURSE_REDIS_PORT=6379
- DISCOURSE_SMTP_ADDRESS
- DISCOURSE_SMTP_PORT
- DISCOURSE_SMTP_USER_NAME
- DISCOURSE_SMTP_PASSWORD
- DISCOURSE_SMTP_AUTHENTICATION
- DISCOURSE_SMTP_ENABLE_START_TLS
- ALLOW_EMPTY_PASSWORD=yes
- DISCOURSE_DATABASE_HOST=${STACK_NAME}_db
- DISCOURSE_DATABASE_NAME=discourse
- DISCOURSE_DATABASE_PASSWORD_FILE=/run/secrets/db_password
- DISCOURSE_DATABASE_USER=discourse
- DISCOURSE_HOST=${DOMAIN}
- DISCOURSE_NOTIFICATION_EMAIL
- DISCOURSE_SMTP_AUTH
- DISCOURSE_SMTP_HOST
- DISCOURSE_SMTP_PORT
- DISCOURSE_SMTP_PROTOCOL
- DISCOURSE_SMTP_USER
- PASSENGER_COMPILE_NATIVE_SUPPORT_BINARY=0
volumes:
- 'discourse_shared:/shared'
# transition only: legacy bitnami volume, read-only, for one-time upload migration
- 'discourse_data:/legacy:ro'
- 'discourse_data:/bitnami/discourse'
secrets:
- db_password
configs:
- source: app_entrypoint
target: /usr/local/bin/cc-app-entrypoint.sh
mode: 0555
- source: app_install_ssl
target: /etc/runit/1.d/install-ssl
mode: 0555
- source: app_migrate_uploads
target: /etc/runit/1.d/02-migrate-bitnami-uploads
mode: 0555
depends_on:
- db
- redis
deploy:
update_config:
failure_action: rollback
order: stop-first
order: start-first
labels:
- "traefik.enable=true"
- "traefik.http.services.${STACK_NAME}.loadbalancer.server.port=80"
- "traefik.http.services.${STACK_NAME}.loadbalancer.server.port=3000"
- "traefik.http.routers.${STACK_NAME}.rule=Host(`${DOMAIN}`${EXTRA_DOMAINS})"
- "traefik.http.routers.${STACK_NAME}.entrypoints=web-secure"
- "traefik.http.routers.${STACK_NAME}.tls.certresolver=${LETS_ENCRYPT_ENV}"
- "coop-cloud.${STACK_NAME}.version=1.0.0+3.5.3"
## Redirect from EXTRA_DOMAINS to DOMAIN
#- "traefik.http.routers.${STACK_NAME}.middlewares=${STACK_NAME}-redirect"
#- "traefik.http.middlewares.${STACK_NAME}-redirect.headers.SSLForceHost=true"
#- "traefik.http.middlewares.${STACK_NAME}-redirect.headers.SSLHost=${DOMAIN}"
- "coop-cloud.${STACK_NAME}.version=0.8.0+3.3.1"
healthcheck:
test: "curl -fsS http://localhost/srv/status || exit 1"
test: "ruby -e \"require 'uri'; require 'net/http'; uri = URI('http://localhost:3000/srv/status'); res = Net::HTTP.get_response(uri); if res.is_a?(Net::HTTPSuccess) then exit (0) else exit (1) end\""
interval: 30s
timeout: 10s
retries: 6
start_period: 25m
start_period: 20m
db:
image: pgvector/pgvector:pg17
image: postgres:13
networks:
- internal
secrets:
@@ -83,15 +72,6 @@ services:
- POSTGRES_USER=discourse
- POSTGRES_DB=discourse
- POSTGRES_PASSWORD_FILE=/run/secrets/db_password
healthcheck:
test: "pg_isready -U discourse -d discourse"
interval: 30s
timeout: 10s
retries: 5
# generous: a postgres major-version upgrade (apt install + pg_upgrade) runs
# in the entrypoint before the server accepts connections — don't let the
# healthcheck kill an in-progress migration
start_period: 10m
deploy:
labels:
backupbot.backup: "true"
@@ -105,12 +85,35 @@ services:
- internal
volumes:
- 'redis_data:/data'
healthcheck:
test: "redis-cli ping | grep -q PONG"
interval: 30s
timeout: 5s
retries: 5
start_period: 30s
sidekiq:
image: bitnamilegacy/discourse:3.3.1
networks:
- proxy
- internal
depends_on:
- discourse
volumes:
- 'discourse_data:/bitnami/discourse'
command: /opt/bitnami/scripts/discourse-sidekiq/run.sh
secrets:
- db_password
environment:
- ALLOW_EMPTY_PASSWORD=yes
- DISCOURSE_DATABASE_HOST=db
- DISCOURSE_DATABASE_NAME=discourse
- DISCOURSE_DATABASE_PASSWORD_FILE=/run/secrets/db_password
- DISCOURSE_DATABASE_PORT_NUMBER=5432
- DISCOURSE_DATABASE_USER=discourse
- DISCOURSE_HOST=${DOMAIN}
- DISCOURSE_REDIS_HOST=redis
- DISCOURSE_REDIS_PORT_NUMBER=6379
- DISCOURSE_SMTP_HOST
- DISCOURSE_SMTP_PORT
- DISCOURSE_SMTP_PROTOCOL
- DISCOURSE_SMTP_USER
- PASSENGER_COMPILE_NATIVE_SUPPORT_BINARY=0
- DISCOURSE_SMTP_AUTH
secrets:
db_password:
@@ -120,7 +123,6 @@ secrets:
volumes:
postgresql_data:
redis_data:
discourse_shared:
discourse_data:
networks:
@@ -129,15 +131,6 @@ networks:
internal:
configs:
app_entrypoint:
name: ${STACK_NAME}_app_entrypoint_${APP_ENTRYPOINT_VERSION}
file: cc-app-entrypoint.sh
app_install_ssl:
name: ${STACK_NAME}_app_install_ssl_${APP_INSTALL_SSL_VERSION}
file: app-install-ssl.sh
app_migrate_uploads:
name: ${STACK_NAME}_app_migrate_uploads_${APP_MIGRATE_UPLOADS_VERSION}
file: migrate-uploads.sh
db_entrypoint:
name: ${STACK_NAME}_db_entrypoint_${DB_ENTRYPOINT_VERSION}
file: entrypoint.postgres.sh.tmpl

@@ -2,23 +2,16 @@
set -e
MIGRATION_MARKER=$PGDATA/migration_in_progress
OLDDATA=$PGDATA/old_data
NEWDATA=$PGDATA/new_data
echo "Running as $(id)"
# The migration uses $OLDDATA/$NEWDATA as scratch and removes them when it
# finishes; a leftover *empty* one means a run was interrupted before any data
# moved (data still intact at $PGDATA) so we clear it and retry, while a
# *non-empty* one means data may live only there, so we stop for manual recovery.
for scratch in $OLDDATA $NEWDATA; do
if [ -d "$scratch" ] && [ -n "$(ls -A "$scratch")" ]; then
echo "FATAL: $scratch exists and is not empty - a previous migration did not"
echo "complete and the data may only exist there. manual recovery necessary."
exit 1
fi
done
rm -rf $OLDDATA $NEWDATA
if [ -e $MIGRATION_MARKER ]; then
echo "FATAL: migration was started but did not complete in a previous run. manual recovery necessary"
exit 1
fi
if [ -f $PGDATA/PG_VERSION ]; then
DATA_VERSION=$(cat $PGDATA/PG_VERSION)
@@ -30,33 +23,22 @@ if [ -f $PGDATA/PG_VERSION ]; then
apt-get update && apt-get install -y --no-install-recommends \
postgresql-$DATA_VERSION \
&& rm -rf /var/lib/apt/lists/*
# pg_upgrade must run as the old cluster's bootstrap superuser (the "install
# user", oid 10), and the new cluster must be initialised with that same
# user. It is not necessarily $POSTGRES_USER (e.g. clusters created with the
# default "postgres" superuser and a separate app role), so read it from the
# old cluster: briefly start it and ask, connecting as the app role we know.
PGBIN=/usr/lib/postgresql/$DATA_VERSION/bin
gosu postgres $PGBIN/pg_ctl -D $PGDATA -w \
-o "-c listen_addresses= -c unix_socket_directories=/tmp" start
INSTALL_USER=$(gosu postgres psql -h /tmp -U "$POSTGRES_USER" -d postgres -tAc \
"select rolname from pg_roles where oid = 10")
gosu postgres $PGBIN/pg_ctl -D $PGDATA -w stop
echo "old cluster install user: $INSTALL_USER"
echo "shuffling around"
gosu postgres mkdir $OLDDATA $NEWDATA
chmod 700 $OLDDATA $NEWDATA
mv $PGDATA/* $OLDDATA/ || true
touch $MIGRATION_MARKER
echo "running initdb"
# abuse entrypoint script for initdb by making server error out; initialise
# the new cluster with the same superuser as the old one so pg_upgrade matches
gosu postgres bash -c "export PGDATA=$NEWDATA POSTGRES_USER=$INSTALL_USER ; /usr/local/bin/docker-entrypoint.sh --invalid-arg || true"
# abuse entrypoint script for initdb by making server error out
gosu postgres bash -c "export PGDATA=$NEWDATA ; /usr/local/bin/docker-entrypoint.sh --invalid-arg || true"
echo "running pg_upgrade"
cd /tmp
gosu postgres pg_upgrade --link -b /usr/lib/postgresql/$DATA_VERSION/bin -d $OLDDATA -D $NEWDATA -U $INSTALL_USER
gosu postgres pg_upgrade --link -b /usr/lib/postgresql/$DATA_VERSION/bin -d $OLDDATA -D $NEWDATA -U $POSTGRES_USER
cp $OLDDATA/pg_hba.conf $NEWDATA/
mv $NEWDATA/* $PGDATA
rm -rf $OLDDATA
rmdir $NEWDATA
rm $MIGRATION_MARKER
echo "migration complete"
fi
fi
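
The scratch-directory guard near the top of this file distinguishes three cases: the directory is absent, present but empty, or present with content. As a hedged illustration only, the same decision can be expressed as a small function; the helper name `scratch_state` is hypothetical and does not appear in the recipe:

```shell
#!/bin/bash
# Hypothetical helper, not part of the recipe: classify a migration scratch dir.
#   missing  -> nothing to clean up
#   empty    -> a run was interrupted before any data moved; safe to remove and retry
#   nonempty -> data may live only here; stop for manual recovery
scratch_state() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo missing
  elif [ -z "$(ls -A "$dir")" ]; then
    echo empty
  else
    echo nonempty
  fi
}
```

In this sketch a caller would exit with an error only on `nonempty`, which matches the FATAL branch in the diff; the other two states fall through to the `rm -rf` of both scratch directories.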

@@ -1,24 +0,0 @@
#!/bin/bash
# One-time, idempotent, NON-destructive migration of uploads + backups from a
# legacy bitnami discourse volume into the official image's /shared.
#
# Runs on every boot as a runit 1.d hook but no-ops after the first success
# (sentinel) and when there is no legacy volume mounted (fresh installs). It only
# ever COPIES from the read-only /legacy mount, so an interruption just re-copies
# on the next boot — there is no move/delete to leave the data half-migrated.
set -e
SENTINEL=/shared/.bitnami-uploads-migrated
[ -e "$SENTINEL" ] && exit 0
if [ -d /legacy/public/uploads ]; then
echo "[migrate-uploads] copying bitnami uploads/backups -> /shared"
mkdir -p /shared/uploads /shared/backups
cp -a /legacy/public/uploads/. /shared/uploads/ 2>/dev/null || true
cp -a /legacy/public/backups/. /shared/backups/ 2>/dev/null || true
# discourse runs as uid 1000; the official boot also chowns /shared, but be explicit
chown -R discourse:discourse /shared/uploads /shared/backups 2>/dev/null || true
echo "[migrate-uploads] done"
fi
touch "$SENTINEL"

@@ -1,6 +1,18 @@
#!/bin/bash
# Postgres backup/restore hook for the discourse `db` service.
# Postgres backup/restore hook for the discourse `db` service. Invoked by backupbot-two via:
# backupbot.backup.pre-hook = "/pg_backup.sh backup"
# backupbot.backup.volumes.postgresql_data.path = "backup.sql"
# backupbot.restore.post-hook = "/pg_backup.sh restore"
# Backup dumps the DB to backup.sql (gzip) inside the postgresql_data volume; backupbot archives it.
# Restore reimports it. Discourse (the rails app + sidekiq) keeps many TCP connections open to the DB
# and reconnects within milliseconds, so a one-shot pg_terminate_backend is NOT enough: restore must
# first block all non-local connections at the pg_hba level (so the app cannot reconnect and interfere
# mid-reimport), then FORCE-drop, recreate, and deterministically reimport the dump, then restore
# pg_hba. (Mirrors the proven matrix-synapse restore hook.) The previous recipe shipped a pg_dump
# backup but NO restore hook — a file-level restore did not reload into the running postgres, so a
# restored backup silently kept the live (un-restored) state. cc-ci caught this: a seeded ci_marker row
# was gone after restore. Same pattern as the immich / mattermost-lts / ghost recipe-PRs.
set -e
@@ -17,7 +29,8 @@ function restore {
cd /var/lib/postgresql/data/
# Block all non-local connections so the running discourse app + sidekiq cannot reconnect and
# interfere with the drop/recreate/reimport. Restored on exit.
# interfere with the drop/recreate/reimport (a one-shot pg_terminate_backend is not enough — the
# app reconnects within ms over TCP). Restored on exit.
restore_hba() {
cat pg_hba.conf.bak > pg_hba.conf
rm -f pg_hba.conf.bak
@@ -28,16 +41,11 @@ function restore {
su postgres -c 'pg_ctl reload'
trap restore_hba EXIT INT TERM
# terminate any lingering local sessions before recreate
# see https://stackoverflow.com/questions/5108876/kill-a-postgresql-session-connection
# Terminate lingering local sessions, then FORCE-drop + recreate + deterministic reimport.
psql -U "$DB_USER" -d postgres -c \
"SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname='${DB_NAME}' AND pid<>pg_backend_pid();"
# drop database and then recreate it
psql -U "$DB_USER" -d postgres -c "DROP DATABASE ${DB_NAME} WITH (FORCE);"
createdb -U "$DB_USER" "$DB_NAME"
# reimport data
gunzip -c "$BACKUP_FILE" | psql -U "$DB_USER" -d "$DB_NAME" -1 -v ON_ERROR_STOP=1 -f -
}
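
The hunks above do not show the lines that actually rewrite `pg_hba.conf`, so what follows is only a minimal sketch of the block-then-restore idea from the header comment, not the recipe's code. The function names `hba_block_remote` and `hba_restore`, and the policy of keeping only `local` rules, are assumptions made for illustration. The sketch covers the file handling alone; in the hook, the diff shows a `pg_ctl reload` after the block step so that postgres picks up the change.

```shell
#!/bin/bash
# Sketch only: hba_block_remote / hba_restore are hypothetical names, and the
# "keep only local rules" policy is an assumption, not taken from the recipe.

# Save the current pg_hba.conf and leave only unix-socket ("local") rules in
# place, so new TCP connections are rejected once postgres reloads its config.
hba_block_remote() {
  local hba="$1"
  cp -p "$hba" "$hba.bak"
  grep -E '^[[:space:]]*local[[:space:]]' "$hba.bak" > "$hba" || true
}

# Put the saved rules back (same cat + rm pattern as restore_hba in the diff).
hba_restore() {
  local hba="$1"
  [ -f "$hba.bak" ] || return 0
  cat "$hba.bak" > "$hba"
  rm -f "$hba.bak"
}
```

Rewriting `pg_hba.conf` only affects new connections, which is why the hook still terminates existing sessions and uses `DROP DATABASE ... WITH (FORCE)` afterwards. Pairing the restore step with a `trap` on `EXIT`, as the diff does with `restore_hba`, brings the original rules back even if the reimport fails.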