diff --git a/machine-docs/JOURNAL-2.md b/machine-docs/JOURNAL-2.md
index 20a0381..d464b46 100644
--- a/machine-docs/JOURNAL-2.md
+++ b/machine-docs/JOURNAL-2.md
@@ -1256,3 +1256,33 @@ Validated: full run 2 (`/root/ccci-matrix-full2.log`) GREEN — `[register] …: (attempt 1) → succeeded (attempt 2)`, all 5 tiers pass, deploy-count=1, clean teardown. Claimed `9a8850a`. (This is a general pattern other DB-write functional tests may need after the restore tier; noted for the remaining recipes.)
+
+---
+## 2026-05-30T~00:30 — Q4.5 mattermost-lts: P4 overlay caught a real recipe restore defect
+
+Authored the mattermost P4 overlay (ops.py postgres ci_marker + test_install/upgrade/backup/restore).
+First run failed on a self-inflicted bug: the postgres service is named `postgres`, not `db` (I misread
+compose; `exec_in_app(service="db")` → "no running container"). Fixed (commit 012a477), re-ran.
+
+Re-run: install+upgrade+backup+custom GREEN (ci_marker survives the upgrade chaos crossover
+2.1.9+10.11.15→2.1.10+10.11.18, captured "original" at backup; all 3 functional tests pass incl.
+create_message_roundtrip §4.3). **restore FAILED**: after `abra app restore`, `relation "ci_marker"
+does not exist`.
+
+**Root cause = recipe defect (same class as immich, different shape).** mattermost's `postgres`
+service backs up via a pg_dump pre-hook (→ /var/lib/postgresql/data/postgres-backup.sql) + archives the
+whole PGDATA dir (`backup.path=/var/lib/postgresql/data/`), but ships **NO `backupbot.restore.post-hook`**.
+backupbot's restore extracts the archived files into the volume, but the RUNNING postgres doesn't
+reload PGDATA without a restart, so the live DB keeps the post-drop (pre_restore) state → the seeded
+marker is gone. The logical dump is in the archive but never reimported.
+
+**Fix = recipe-PR (immich pattern):** add a `backupbot.restore.post-hook` that reimports the dump into
+the live DB deterministically — terminate connections → DROP DATABASE … WITH (FORCE) → createdb →
+`psql -f postgres-backup.sql`. (Validate the mechanism live first, like immich, since the dump is a
+plain pg_dump reimported into a fresh DB.) Mirror+PR `recipe-maintainers/mattermost-lts`, then
+`RECIPE=mattermost-lts PR=` proves restore green. QUEUED as the next mattermost unit.
+
+This is the 2nd recipe (after immich) where the P4 data-integrity overlay caught a genuine
+backup/restore defect — strong evidence the phase's P4 requirement is doing real work. The remaining
+backup-capable recipes (bluesky-pds, uptime-kuma, ghost) should be assumed similarly suspect until their
+restore is proven to round-trip seeded data.
diff --git a/machine-docs/STATUS-2.md b/machine-docs/STATUS-2.md
index 85f7bc1..817f97d 100644
--- a/machine-docs/STATUS-2.md
+++ b/machine-docs/STATUS-2.md
@@ -62,12 +62,18 @@ deploy-count=1, clean teardown; the retry log proves the transient (POST 500 att
 attempt 2) and the synapse capture log shows the cause (restore-tier `DROP DATABASE FORCE` closed
 synapse's DB pool: `psycopg2.InterfaceError: connection already closed`).
 
-**Q4.5 mattermost-lts — P4 overlay authored, full-lifecycle run IN FLIGHT @2026-05-30**
-(`/root/ccci-mattermost-full.log`). Added P4 data-integrity overlay (ops.py postgres ci_marker seed
-+ test_install/upgrade/backup/restore). Recipe postgres-backed; its db backup is the whole PGDATA dir
-(pg_dump pre-hook + backup.path=/var/lib/postgresql/data/) with NO explicit restore post-hook →
-verifying empirically whether the file-level restore brings the ci_marker back (else recipe-PR like
-immich). P3 present (create_message_roundtrip §4.3 + system_ping). NOT claimed.
+**Q4.5 mattermost-lts — P4 RESTORE BROKEN (recipe defect); recipe-PR needed @2026-05-30.** Full run
+`/root/ccci-mattermost-full2.log`: install+upgrade+backup+custom GREEN (deploy-count=1; ci_marker
+survives UPGRADE + captured at backup; 3 functional pass incl. create_message_roundtrip §4.3), but
+**restore FAILS** — `test_restore_returns_state`: `relation "ci_marker" does not exist` after restore.
+ROOT CAUSE (recipe defect, same class as immich): the `postgres` service has a backup (pg_dump
+pre-hook → postgres-backup.sql + `backup.path=/var/lib/postgresql/data/`) but **NO
+`backupbot.restore.post-hook`**, and the file-level PGDATA restore doesn't take effect on the running
+postgres → the DB keeps the post-drop state. FIX = recipe-PR adding a DB restore that reimports the
+dump (immich pg_backup.sh pattern: terminate conns + drop+recreate + reimport postgres-backup.sql).
+P4 overlay is correct + non-vacuous (it caught a real bug). NOT claimed — recipe-PR queued (BACKLOG-2
+/ DEFERRED). Node clean. (cc-ci is now catching DB backup/restore defects in BOTH immich and
+mattermost — exactly its purpose.)
 
 **Q4.6 discourse — BLOCKED/DEFERRED @2026-05-29.** Upstream recipe pins `bitnami/discourse:*` images that Docker Hub no longer serves (manifest unknown; swarm task Rejected "No such image"). Image exists