feat(db): switch to discourse/postgres image (auto-upgrade)

Move the db off the bitnami-era pgvector:pg17 + hand-rolled pg_upgrade
entrypoint to discourse/postgres:pg18 (pgvector + discourse's auto-upgrade
layer). The image runs the in-place major-version pg_upgrade itself on boot;
the recipe configures it via env:

- a small inline entrypoint injects the db password secret into $DB_PASSWORD
  (the image expects it in the env, no *_FILE support)
- POSTGRES_USER (the install user pg_upgrade must match) defaults to
  'postgres' -- correct for fresh installs and bitnami-origin clusters --
  overridable from .env
- POSTGRES_INITDB_ARGS=--no-data-checksums so the new pg18 cluster matches
  pre-18 clusters (pg18 initdb enables checksums by default; pg_upgrade needs
  a match)
- mount postgresql_data at /var/lib/postgresql (versioned PGDATA .../18/docker)
- pg_backup.sh uses POSTGRES_USER for the dump/drop/recreate; fix paths
- document the POSTGRES_USER override in .env.sample, README and the release
  note
- drop entrypoint.postgres.sh.tmpl

Tested on cctest: pg17->pg18 upgrade preserves data and serves over HTTPS;
fresh install works; backup+restore round-trips.
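
Operator note on the POSTGRES_USER override: a minimal sketch of how the
install-user check could be turned into the .env line to add. The helper name
suggest_postgres_user is hypothetical and not part of this commit; it only
assumes that the old cluster's install user has already been read with the
`select rolname from pg_roles where oid = 10` query.

```shell
#!/bin/bash
# Hypothetical helper, not shipped with the recipe: given the old cluster's
# install user (the role with oid 10), print the line to add to the app .env,
# or print nothing when the default `postgres` already matches.
suggest_postgres_user() {
  local install_user="$1"
  # strip whitespace that may surround the value in psql -tA output
  install_user="${install_user//[[:space:]]/}"
  if [ -z "$install_user" ]; then
    echo "could not determine the install user" >&2
    return 1
  fi
  if [ "$install_user" != "postgres" ]; then
    echo "POSTGRES_USER=$install_user"
  fi
}
```

Called as `suggest_postgres_user "$rolname"`, it stays silent for `postgres`
and prints e.g. `POSTGRES_USER=discourse` otherwise, so the output can be
appended to the app .env before upgrading.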
2026-06-22 19:57:54 +00:00
parent 0c4539b7ad
commit 1f77af93bd
7 changed files with 76 additions and 95 deletions
--- a/.env.sample
+++ b/.env.sample
@@ -21,3 +21,9 @@ DISCOURSE_DEVELOPER_EMAILS=admin@example.com
 #SECRET_SMTP_PASSWORD_VERSION=v1

 SECRET_DB_PASSWORD_VERSION=v1
+
+# Postgres bootstrap superuser (the cluster's "install user"). Defaults to
+# `postgres`, which matches fresh installs and bitnami-origin clusters. Only set
+# this if you are upgrading a cluster that was bootstrapped with a different
+# superuser (e.g. `discourse`) — a postgres major upgrade fails unless it matches.
+#POSTGRES_USER=postgres
--- a/README.md
+++ b/README.md
@@ -43,6 +43,22 @@ override) so it works behind the reverse proxy.
 abra app run YOURAPPDOMAIN app discourse admin create
 ```

+## Postgres major version upgrades
+
+Handled automatically by the [`discourse/postgres`] image (pgvector + an
+auto-upgrade layer). On deploy it finds an older cluster, installs the old
+binaries and runs `pg_upgrade` into the new versioned data directory. No manual
+dump/restore needed.
+
+`pg_upgrade` must run as the old cluster's bootstrap superuser (its "install
+user"). The recipe uses `POSTGRES_USER`, which defaults to `postgres` — the right
+value for fresh installs and for clusters that came from the old bitnami recipe.
+If your cluster was bootstrapped with a different superuser (e.g. `discourse`),
+set `POSTGRES_USER` in the app `.env` before upgrading, otherwise `pg_upgrade`
+will refuse with an install-user mismatch.
+
+[`discourse/postgres`]: https://github.com/discourse/discourse-postgres
+
 ## Migrating from the previous (bitnami) recipe

 The official image stores uploads under `/shared` rather than bitnami's
--- a/abra.sh
+++ b/abra.sh
@@ -1,5 +1,4 @@
-export DB_ENTRYPOINT_VERSION=v3
-export PG_BACKUP_VERSION=v2
+export PG_BACKUP_VERSION=v4
 export APP_ENTRYPOINT_VERSION=v2
 export APP_INSTALL_SSL_VERSION=v1
 export APP_MIGRATE_UPLOADS_VERSION=v1
--- a/compose.yml
+++ b/compose.yml
@@ -63,35 +63,51 @@ services:
      start_period: 25m

  db:
-    image: pgvector/pgvector:pg17
+    # discourse/postgres = pgvector + discourse's postgres management layer, which
+    # auto-upgrades an older cluster in place on boot (pg_upgrade into the versioned
+    # PGDATA /var/lib/postgresql/${MAJOR}/docker); everything is driven by the env below.
+    image: discourse/postgres:pg18
    networks:
      - internal
    secrets:
      - db_password
    volumes:
-      - 'postgresql_data:/var/lib/postgresql/data'
+      # the image expects the whole cluster tree mounted here (not the data subdir);
+      # an existing pg17 cluster at the volume root is found and upgraded into /18/docker
+      - 'postgresql_data:/var/lib/postgresql'
    configs:
-      - source: db_entrypoint
-        target: /docker-entrypoint.sh
-        mode: 0555
      - source: pg_backup
        target: /pg_backup.sh
        mode: 0555
-    entrypoint: /docker-entrypoint.sh
+    entrypoint:
+      - /bin/bash
+      - -c
+      - |
+        if [ -f /run/secrets/db_password ]; then
+          DB_PASSWORD="$$(cat /run/secrets/db_password)"
+          export DB_PASSWORD POSTGRES_PASSWORD="$$DB_PASSWORD"
+        fi
+        exec run-postgres.sh postgres
    environment:
+      # internal-only overlay network; keep all-trust so the app and the
+      # backup/restore hooks connect without juggling the superuser password
      - POSTGRES_HOST_AUTH_METHOD=trust
-      - POSTGRES_USER=discourse
      - POSTGRES_DB=discourse
-      - POSTGRES_PASSWORD_FILE=/run/secrets/db_password
+      - DB_USER=discourse
+      # pg_upgrade runs as this role and initdb's the new cluster with it; it must
+      # match the OLD cluster's bootstrap superuser (oid 10). The image default
+      # `postgres` matches fresh installs and bitnami-origin clusters. Override in
+      # the app .env (POSTGRES_USER=...) only for a cluster bootstrapped differently.
+      - POSTGRES_USER=${POSTGRES_USER:-postgres}
+      # pg18's initdb enables data checksums by default, but pg13-17 clusters here
+      # have them off and pg_upgrade requires a match -> initialise without them.
+      - POSTGRES_INITDB_ARGS=--no-data-checksums
    healthcheck:
      test: "pg_isready -U discourse -d discourse"
      interval: 30s
      timeout: 10s
      retries: 5
-      # generous: a postgres major-version upgrade (apt install + pg_upgrade) runs
-      # in the entrypoint before the server accepts connections — don't let the
-      # healthcheck kill an in-progress migration
-      start_period: 10m
+      start_period: 15m
    deploy:
      labels:
        backupbot.backup: "true"
@@ -138,10 +154,6 @@ configs:
  app_migrate_uploads:
    name: ${STACK_NAME}_app_migrate_uploads_${APP_MIGRATE_UPLOADS_VERSION}
    file: migrate-uploads.sh
-  db_entrypoint:
-    name: ${STACK_NAME}_db_entrypoint_${DB_ENTRYPOINT_VERSION}
-    file: entrypoint.postgres.sh.tmpl
-    template_driver: golang
  pg_backup:
    name: ${STACK_NAME}_pg_backup_${PG_BACKUP_VERSION}
    file: pg_backup.sh
--- a/entrypoint.postgres.sh.tmpl
+++ b/entrypoint.postgres.sh.tmpl
@@ -1,64 +0,0 @@
-#!/bin/bash
-
-set -e
-
-OLDDATA=$PGDATA/old_data
-NEWDATA=$PGDATA/new_data
-
-echo "Running as $(id)"
-
-# The migration uses $OLDDATA/$NEWDATA as scratch and removes them when it
-# finishes; a leftover *empty* one means a run was interrupted before any data
-# moved (data still intact at $PGDATA) so we clear it and retry, while a
-# *non-empty* one means data may live only there, so we stop for manual recovery.
-for scratch in $OLDDATA $NEWDATA; do
-  if [ -d "$scratch" ] && [ -n "$(ls -A "$scratch")" ]; then
-    echo "FATAL: $scratch exists and is not empty - a previous migration did not"
-    echo "complete and the data may only exist there. manual recovery necessary."
-    exit 1
-  fi
-done
-rm -rf $OLDDATA $NEWDATA
-
-if [ -f $PGDATA/PG_VERSION ]; then
-  DATA_VERSION=$(cat $PGDATA/PG_VERSION)
-
-  if [ -n "$DATA_VERSION" -a "$PG_MAJOR" != "$DATA_VERSION" ]; then
-    echo "postgres data version $DATA_VERSION found, but need $PG_MAJOR. Starting migration"
-    echo "Installing postgres $DATA_VERSION"
-    sed -i "s/$/ $DATA_VERSION/" /etc/apt/sources.list.d/pgdg.list
-    apt-get update && apt-get install -y --no-install-recommends \
-      postgresql-$DATA_VERSION \
-      && rm -rf /var/lib/apt/lists/*
-    # pg_upgrade must run as the old cluster's bootstrap superuser (the "install
-    # user", oid 10), and the new cluster must be initialised with that same
-    # user. It is not necessarily $POSTGRES_USER (e.g. clusters created with the
-    # default "postgres" superuser and a separate app role), so read it from the
-    # old cluster: briefly start it and ask, connecting as the app role we know.
-    PGBIN=/usr/lib/postgresql/$DATA_VERSION/bin
-    gosu postgres $PGBIN/pg_ctl -D $PGDATA -w \
-      -o "-c listen_addresses= -c unix_socket_directories=/tmp" start
-    INSTALL_USER=$(gosu postgres psql -h /tmp -U "$POSTGRES_USER" -d postgres -tAc \
-      "select rolname from pg_roles where oid = 10")
-    gosu postgres $PGBIN/pg_ctl -D $PGDATA -w stop
-    echo "old cluster install user: $INSTALL_USER"
-    echo "shuffling around"
-    gosu postgres mkdir $OLDDATA $NEWDATA
-    chmod 700 $OLDDATA $NEWDATA
-    mv $PGDATA/* $OLDDATA/ || true
-    echo "running initdb"
-    # abuse entrypoint script for initdb by making server error out; initialise
-    # the new cluster with the same superuser as the old one so pg_upgrade matches
-    gosu postgres bash -c "export PGDATA=$NEWDATA POSTGRES_USER=$INSTALL_USER ; /usr/local/bin/docker-entrypoint.sh --invalid-arg || true"
-    echo "running pg_upgrade"
-    cd /tmp
-    gosu postgres pg_upgrade --link -b /usr/lib/postgresql/$DATA_VERSION/bin -d $OLDDATA -D $NEWDATA -U $INSTALL_USER
-    cp $OLDDATA/pg_hba.conf $NEWDATA/
-    mv $NEWDATA/* $PGDATA
-    rm -rf $OLDDATA
-    rmdir $NEWDATA
-    echo "migration complete"
-  fi
-fi
-
-/usr/local/bin/docker-entrypoint.sh postgres
--- a/pg_backup.sh
+++ b/pg_backup.sh
@@ -1,44 +1,47 @@
 #!/bin/bash

-# Postgres backup/restore hook for the discourse `db` service.
+# Postgres backup/restore hook for the discourse `db` service (discourse/postgres image).

 set -e

-BACKUP_FILE='/var/lib/postgresql/data/backup.sql'
-export PGPASSWORD=$(cat "${POSTGRES_PASSWORD_FILE:-/run/secrets/db_password}")
-DB_USER="${POSTGRES_USER:-discourse}"
+# dump goes at the volume root so backupbot's backup.sql label finds it
+BACKUP_FILE='/var/lib/postgresql/backup.sql'
+DATADIR="${PGDATA:-/var/lib/postgresql/18/docker}"
 DB_NAME="${POSTGRES_DB:-discourse}"

+# bootstrap superuser for the dump/drop/recreate; same POSTGRES_USER the db service sets
+SU="${POSTGRES_USER:-postgres}"
+
 function backup {
-  pg_dump -U "$DB_USER" "$DB_NAME" | gzip > "$BACKUP_FILE"
+  pg_dump -U "$SU" "$DB_NAME" | gzip > "$BACKUP_FILE"
 }

 function restore {
-  cd /var/lib/postgresql/data/
+  cd "$DATADIR"

  # Block all non-local connections so the running discourse app + sidekiq cannot reconnect and
  # interfere with the drop/recreate/reimport. Restored on exit.
  restore_hba() {
    cat pg_hba.conf.bak > pg_hba.conf
    rm -f pg_hba.conf.bak
-    su postgres -c 'pg_ctl reload'
+    su postgres -c "pg_ctl -D '$DATADIR' reload"
  }
  cp pg_hba.conf pg_hba.conf.bak
  echo 'local all all trust' > pg_hba.conf
-  su postgres -c 'pg_ctl reload'
+  su postgres -c "pg_ctl -D '$DATADIR' reload"
  trap restore_hba EXIT INT TERM

  # terminate any lingering local sessions before recreate
  # see https://stackoverflow.com/questions/5108876/kill-a-postgresql-session-connection
-  psql -U "$DB_USER" -d postgres -c \
+  psql -U "$SU" -d postgres -c \
    "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname='${DB_NAME}' AND pid<>pg_backend_pid();"

  # drop database and then recreate it
-  psql -U "$DB_USER" -d postgres -c "DROP DATABASE ${DB_NAME} WITH (FORCE);"
-  createdb -U "$DB_USER" "$DB_NAME"
+  psql -U "$SU" -d postgres -c "DROP DATABASE ${DB_NAME} WITH (FORCE);"
+  createdb -U "$SU" "$DB_NAME"

-  # reimport data 
-  gunzip -c "$BACKUP_FILE" | psql -U "$DB_USER" -d "$DB_NAME" -1 -v ON_ERROR_STOP=1 -f -
+  # reimport data
+  gunzip -c "$BACKUP_FILE" | psql -U "$SU" -d "$DB_NAME" -1 -v ON_ERROR_STOP=1 -f -
 }

 $@
--- a/release/1.0.0+3.5.3
+++ b/release/1.0.0+3.5.3
@@ -8,3 +8,12 @@ Rename these in your app's .env (the values carry over):
  DISCOURSE_SMTP_USER     --> DISCOURSE_SMTP_USER_NAME
  DISCOURSE_SMTP_AUTH     --> DISCOURSE_SMTP_AUTHENTICATION
  DISCOURSE_SMTP_PROTOCOL --> DISCOURSE_SMTP_ENABLE_START_TLS (takes a boolean true/false, not the old tls/ssl value, so translate it rather than copying it straight across)
+
+WARNING: if your deployment's database has an "install user" other than `postgres`
+(some older deployments do), you must set the POSTGRES_USER env var in your .env
+for this migration, otherwise the postgres upgrade aborts with an install-user
+mismatch.
+
+Check your old deployment's install user before upgrading (if this command returns `postgres`, you do not need to set POSTGRES_USER):
+
+  abra app run YOURAPPDOMAIN db -- psql -U discourse -tAc 'select rolname from pg_roles where oid = 10'