decisions(2): ghost P4 restore dead-end + root cause (abra backup intermittently omits mysql volume; restore post-hook silent no-op); fix plan
@ -1054,3 +1054,43 @@ sub-plan's own lasuite-drive collabora "start_period [KEYSTONE]" recipe-PR.
- **discourse**: recipe-PR `recipe-maintainers/discourse#1` sets `start_period: 20m` (covers the
  15-25min Rails first-boot; default was 5m). cc-ci recipe_meta no longer sets APP_START_PERIOD.
- **ghost (E1)**: must use the SAME literal-bump approach, NOT an env var (same abra limitation).

## 2026-05-30 — ghost P4 restore: 3rd-failure DEAD-END + root cause (backup omits mysql volume)

**Dead-end (stop per §guardrail, 3 identical fails):** ghost full5/full6/full7 (REF=ae43ffe, db-grace
overlay db@15m start_period) ALL failed P4 restore (`ci_marker` table absent post-restore), after
full3 (app-only overlay, db@native 1m) passed it once. Re-running unchanged is futile.

**Definitive root cause (restic snapshot inspection, repo /backups/restic in backupbot container):**
`abra app backup create` is INTERMITTENTLY OMITTING the mysql volume from the snapshot.
- full5 backup snapshot `b6200e44`: contains `…_mysql/_data/backup.sql.gz` ✓
- full6 `7daac418` + full7 `410a63b9` (latest): contain ONLY `…_ghost_content/_data` + `/secrets`,
  **no mysql volume / no backup.sql.gz**.

`abra app restore` restores the LATEST snapshot → for full6/7 that snapshot has no db dump →
`/var/lib/mysql/backup.sql.gz` absent at restore → the recipe restore post-hook
`gunzip -c …/backup.sql.gz | mysql -u root` reads nothing. mysql_backup.sh has `set -e` but NOT
`set -o pipefail`, so the pipeline's exit status is mysql's alone: gunzip fails on the missing file,
mysql receives empty input and exits 0, and `set -e` never fires → restore "succeeds" SILENTLY
while reimporting nothing → dropped ci_marker never returns. (cc-ci P4 correctly caught a real
data-loss path; generic test_restore_healthy passed = app up, masking it. That is exactly why P4
exists.)

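The `set -e` without `pipefail` behaviour described above can be reproduced in isolation. This is a minimal sketch, not the recipe's mysql_backup.sh: `fake_mysql` is a hypothetical stand-in for `mysql -u root` (it consumes stdin and exits 0), and the dump path is an assumed-absent placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for `mysql -u root`: reads stdin, always exits 0.
fake_mysql() { cat >/dev/null; }

# Mirrors the failing shape: `set -e` only. The pipeline's status is the
# status of its last command (fake_mysql), so the gunzip failure is lost.
restore_plain() (
  set -e
  gunzip -c "$1" 2>/dev/null | fake_mysql
)

# Same pipeline with pipefail: a failing gunzip now fails the whole pipeline.
restore_pipefail() (
  set -e
  set -o pipefail
  gunzip -c "$1" 2>/dev/null | fake_mysql
)

missing=/nonexistent/backup.sql.gz   # assumed-absent path, for illustration only

rc_plain=0;    restore_plain "$missing"    || rc_plain=$?
rc_pipefail=0; restore_pipefail "$missing" || rc_pipefail=$?

echo "without pipefail: rc=$rc_plain"
echo "with pipefail:    rc=$rc_pipefail"
```

The first call reports rc 0 even though nothing was decompressed; the second reports a nonzero rc, which is the loud failure the restore hook currently lacks.
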
**Two distinct defects, both in the ghost recipe-PR (recipe-maintainers/ghost#1), NOT cc-ci tests:**
1. **Unreliable backup capture.** The db service uses `backupbot.backup.volumes.mysql.path: backup.sql.gz`,
   a schema whose volume capture is intermittent here (worked full5, silently skipped full6/7;
   correlates with the db-grace overlay disturbing db-container settle timing around the upgrade
   chaos-redeploy + backup). The PROVEN pattern in this project is mattermost-lts#1's
   `backupbot.backup.path: "/var/lib/postgresql/data/"` (container-path schema, pre-hook dumps into the
   dir, post-hook rm's the dump, physical restore), which has never been intermittent.
2. **Silent no-op restore.** mysql_backup.sh restore lacks `set -o pipefail` + a backup-file existence
   guard, so a missing/empty dump silently restores nothing instead of failing loud.

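A guard of the kind defect 2 calls for could look like the sketch below. It is an illustration under assumptions, not the contents of mysql_backup.sh: `restore_dump` is a hypothetical helper name, and the import command is passed in as arguments so the function can be exercised without a database.

```bash
#!/usr/bin/env bash
# restore_dump DUMP CMD [ARGS...]   (hypothetical helper, for illustration)
# Streams a gzip'd SQL dump into CMD and fails loudly when the dump is
# missing, empty, or not valid gzip, instead of silently importing nothing.
restore_dump() {
  local dump="$1"
  shift
  # -s is true only for a file that exists and has size > 0.
  if [ ! -s "$dump" ]; then
    echo "restore: $dump is missing or empty; refusing to no-op" >&2
    return 1
  fi
  # Verify archive integrity before touching the database.
  if ! gzip -t "$dump" 2>/dev/null; then
    echo "restore: $dump is not a valid gzip archive" >&2
    return 1
  fi
  # Subshell so pipefail does not leak into the caller's shell options.
  (
    set -o pipefail
    gunzip -c "$dump" | "$@"
  )
}
```

In the recipe's restore hook this would be invoked along the lines of `restore_dump /var/lib/mysql/backup.sql.gz mysql -u root`, so that a snapshot without the dump turns the restore red rather than green.
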
**Fix plan (next tick, methodical, not a blind re-run):**

(a) Re-run ONCE with a FIXED diagnostic watcher that samples db `.State.Health.Status` + the snapshot
volume list DURING the backup tier, to confirm the "db unsettled at backup → volume skipped" link.

(b) Recipe-PR ghost#1: harden mysql_backup.sh (`set -o pipefail`; restore fails loud if backup.sql.gz
missing/empty) AND switch the db backup to the proven mattermost `backupbot.backup.path` schema so
the dump is reliably archived+restored. Keep logical dump+reimport (physical InnoDB copy of a live
data dir is consistency-risky). Then full ghost run green incl upgrade-to-latest → claim.

NB also flagged: snapshots accumulate across runs under the SAME deterministic domain hash
(ghos-9431a1). Restore picks the global LATEST, which can be a prior run's; teardown removes app +
volumes but not the restic repo. Confirm the harness restore targets THIS run's snapshot (likely fine
since each run's latest is its own backup, but worth a harness check).

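The diagnostic watcher from step (a) might be shaped roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the container name is supplied by the caller, and the snapshot check only looks for the `_mysql/_data/backup.sql.gz` suffix seen in the full5 snapshot (the elided path prefix is not guessed at).

```bash
#!/usr/bin/env bash
# Reads a file listing on stdin (one path per line, as `restic ls` prints)
# and succeeds only if the listing contains the mysql dump.
listing_has_mysql_dump() {
  grep -q '_mysql/_data/backup\.sql\.gz$'
}

# Prints one timestamped health sample for the given container.
# Falls back to "unknown" if docker inspect fails (e.g. no healthcheck).
sample_db_health() {
  local container="$1"
  printf '%s db_health=%s\n' "$(date -u +%FT%TZ)" \
    "$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null || echo unknown)"
}

# Samples health every INTERVAL seconds (default 5) until killed.
# Intended to run in the background for the duration of the backup tier.
watch_db_health() {
  local container="$1" interval="${2:-5}"
  while true; do
    sample_db_health "$container"
    sleep "$interval"
  done
}
```

After the backup completes, the snapshot side can be checked from inside the backupbot container with something like `restic -r /backups/restic ls latest | listing_has_mysql_dump || echo "mysql dump missing"`, assuming the repo credentials are already present in that container's environment. Lining up the health samples against the backup timestamp is what would confirm or refute the "db unsettled → volume skipped" link.
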