fix(db): make pg_upgrade migration idempotent & crash-safe #3

Closed
autonomic-bot wants to merge 0 commits from fix-pg-migration-idempotent into main

Makes the postgres `pg_upgrade` major-version migration idempotent and crash-safe: a state-driven guard on the `old_data`/`new_data` scratch dirs replaces the marker file, so an interrupted migration auto-recovers (empty leftovers) or stops cleanly for manual recovery (non-empty) instead of crash-looping on `mkdir: File exists` or silently re-initdb-ing over live data. Bumps `DB_ENTRYPOINT_VERSION` v1->v2 so swarm reloads the config.
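
As a rough illustration of the state-driven guard described above, the sketch below classifies leftover scratch dirs and either discards them or refuses to continue. It is not the recipe's actual entrypoint: the function names `scratch_state` and `guard_scratch_dirs` are hypothetical, and the scratch dir paths are assumed to be passed in as arguments.

```shell
#!/bin/sh
# Hypothetical helper: classify leftover scratch dirs from a previous run.
# Prints "clean" (no dirs exist), "retry" (only empty dirs exist),
# or "manual" (at least one dir contains something).
scratch_state() {
    state=clean
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # ls -A also lists dotfiles, so any entry counts as non-empty.
        if [ -n "$(ls -A "$dir")" ]; then
            echo manual
            return 0
        fi
        state=retry
    done
    echo "$state"
}

# Hypothetical guard: remove empty leftovers so the migration can be retried,
# or fail without touching anything when data may only live in a scratch dir.
guard_scratch_dirs() {
    case "$(scratch_state "$@")" in
        manual)
            echo "FATAL: scratch dir is not empty; manual recovery required" >&2
            return 1
            ;;
        retry)
            for dir in "$@"; do
                if [ -d "$dir" ]; then
                    rmdir "$dir"
                fi
            done
            ;;
    esac
    return 0
}
```

A caller would run something like `guard_scratch_dirs "$PGDATA/old_data" "$PGDATA/new_data" || exit 1` before creating the dirs again; the exact location of the scratch dirs is an assumption here. Using `rmdir` rather than `rm -rf` means the cleanup itself cannot delete data even if the emptiness check were wrong.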
autonomic-bot added 1 commit 2026-06-16 17:00:35 +00:00
The postgres major-version migration in the db entrypoint was not safe to
re-run. If the container was killed mid-migration it could crash-loop forever
("mkdir: cannot create directory .../old_data: File exists") or silently initdb
a fresh empty cluster over the live data once PG_VERSION had been moved out of
$PGDATA but before the in-progress marker was written.

Replace the marker file with a state-driven guard keyed on the scratch dirs:
empty old_data/new_data means the run was interrupted before any data moved, so
discard and retry (idempotent); non-empty means data may only live there, so
stop for manual recovery. Bump DB_ENTRYPOINT_VERSION v1->v2 so swarm picks up
the new (immutable) config.
Author
Owner

Tested on cctest by running the new entrypoint in `pgvector/pgvector:pg17` against a real seeded postgres 13 cluster: (1) full 13→17 migration completes with data intact; (2) a stuck deploy with empty `old_data`/`new_data` auto-recovers and migrates; (3) a non-empty scratch dir exits FATAL without deleting the data.
autonomic-bot added 1 commit 2026-06-16 17:59:29 +00:00
pg_upgrade must run as the old cluster's bootstrap superuser (oid 10), and the
new cluster must be initialised with that same user, otherwise it fails the
"database user is the install user" consistency check. The install user is not
necessarily $POSTGRES_USER: clusters created with the default "postgres"
superuser plus a separate app role (e.g. discourse) are common.

Detect it from the old cluster by briefly starting it and reading pg_roles
(oid = 10) as the known app role, then use it for both the new cluster's initdb
and the pg_upgrade -U argument.
Author
Owner

Added commit `57f5ee2`: pg_upgrade was hardcoded to `-U $POSTGRES_USER` (discourse), which fails the "database user is the install user" check on clusters whose bootstrap superuser is `postgres` with `discourse` as a separate app role. It now detects the old cluster's real install user (briefly starts it and reads `pg_roles` where `oid = 10`) and uses that for both the new cluster's initdb and `pg_upgrade -U`. Verified on cctest against a prod-like v13 cluster (superuser `postgres`, non-superuser `discourse` role): 13→17 completes, data intact.
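
The detection step could be sketched as follows. This is a minimal illustration, not the code from the commit: the function name `detect_install_user` and its four arguments (old binaries dir, old data dir, a role that can log in, and a database to connect to) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the bootstrap superuser (oid 10) of an old cluster.
#   $1 = directory holding the OLD version's pg_ctl and psql (assumed layout)
#   $2 = old cluster data directory
#   $3 = role that is able to log in, e.g. the known app role
#   $4 = database to connect to
detect_install_user() {
    old_bin=$1
    old_data=$2
    login_role=$3
    login_db=$4

    # Briefly start the old cluster; -w waits until it accepts connections.
    "$old_bin/pg_ctl" -D "$old_data" -w start >/dev/null || return 1

    # -A (unaligned) and -t (tuples only) leave just the bare role name.
    status=0
    install_user=$("$old_bin/psql" -U "$login_role" -d "$login_db" -At \
        -c "SELECT rolname FROM pg_roles WHERE oid = 10") || status=$?

    # Always stop the old cluster again, even if the query failed.
    "$old_bin/pg_ctl" -D "$old_data" -m fast -w stop >/dev/null

    if [ "$status" -ne 0 ] || [ -z "$install_user" ]; then
        return 1
    fi
    printf '%s\n' "$install_user"
}
```

The result would then be passed to both `initdb -U` for the new cluster and `pg_upgrade -U`, so the two clusters agree on the install user. Reading `pg_roles` does not need superuser rights, which is why logging in as the app role is enough under these assumptions.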
Owner

!testme
autonomic-bot added 1 commit 2026-06-16 18:04:08 +00:00
fix(db): bump DB_ENTRYPOINT_VERSION to v3 so the entrypoint config reloads
All checks were successful
cc-ci/testme cc-ci: success
bd5f181737
The install-user fix changed the entrypoint content; swarm configs are
immutable, so the config name (which embeds DB_ENTRYPOINT_VERSION) must change
for a redeploy to pick up the new script.
Author
Owner

<!-- cc-ci:testme --> 🌻 **cc-ci** — `discourse` @ `bd5f1817` ✅ **passed** [![cc-ci result card](https://ci.commoninternet.net/runs/699/summary.png)](https://drone.ci.commoninternet.net/recipe-maintainers/cc-ci/699) [![level](https://ci.commoninternet.net/runs/699/badge.svg)](https://drone.ci.commoninternet.net/recipe-maintainers/cc-ci/699) [full logs](https://drone.ci.commoninternet.net/recipe-maintainers/cc-ci/699) · [dashboard](https://ci.commoninternet.net/)
autonomic-bot force-pushed fix-pg-migration-idempotent from bd5f181737 to 33add86dd3 2026-06-16 18:22:28 +00:00 Compare
autonomic-bot force-pushed fix-pg-migration-idempotent from 33add86dd3 to 5d71fc560d 2026-06-16 18:26:41 +00:00 Compare
autonomic-bot force-pushed fix-pg-migration-idempotent from 5d71fc560d to 6ae2d2cf51 2026-06-16 18:27:48 +00:00 Compare
autonomic-bot force-pushed fix-pg-migration-idempotent from 6ae2d2cf51 to a9f08eed28 2026-06-16 18:29:27 +00:00 Compare
Author
Owner

Auto-closed by cc-ci canonical sweep: its changes are already in upstream main (merged upstream); mirror main re-synced
autonomic-bot closed this pull request 2026-06-17 07:24:31 +00:00

Reference: recipe-maintainers/discourse#3