journal(2): backupbot enumerate-once flow → harness must verify+re-invoke backup if db volume missing (chosen fix)

autonomic-bot
2026-05-30 21:19:08 +00:00
parent ad7b3d0e8c
commit c2c66f21d8


@@ -1434,3 +1434,21 @@ does it enumerate volumes relative to that?) to confirm the cycle+capture intera
candidate = harness verifies the backup snapshot contains the db volume and retries if not, AND/OR the
recipe-PR backup is made resilient (+ `set -o pipefail` + fail-loud on missing dump so it can never be
silent again). 5 ghost runs done (full4 timeout-fixed; full5/6/7/8 restore race) — stop blind re-runs.
## 2026-05-30T21:18Z — backupbot backup flow read: enumerate-once → no retry recovers a dropped volume
Read backup-bot-two `/usr/bin/backup` `create`: it computes (pre_cmds, post_cmds, backup_paths) ONCE
via get_backup_details (which resolves each labelled volume's host path from the RUNNING service spec),
then runs pre_cmds (mysqldump via docker exec), then `backup_volumes(backup_paths, retries)` (restic),
then post_cmds. It does NOT stop/cycle the db, so the db cycle I observed during backup comes from
swarm/mysqld, NOT backupbot. Critically: backup_paths are enumerated ONCE up-front; if the db service is
mid-cycle at enumeration, its mysql path is omitted from backup_paths, and abra's `--retries` (which only
retries the restic step over that same list) can NEVER recover it. So a per-restic retry is useless here.
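
A toy model of the flow described above, to make the failure mode concrete. This is a sketch, not backupbot's code: `create_once`, `enumerate_paths`, `backup_volumes` and the `/volumes/...` paths are hypothetical stand-ins, and the "db is mid-cycle on the first look" behaviour is simulated. It only shows that retrying the copy step over a list computed once cannot bring back a path that was missing at enumeration time.

```python
from typing import Callable, List


def create_once(
    enumerate_paths: Callable[[], List[str]],
    backup_volumes: Callable[[List[str]], None],
    retries: int,
) -> List[str]:
    """Toy model: enumerate paths once, then retry only the copy step."""
    paths = enumerate_paths()  # single enumeration, never refreshed
    last_error = None
    for _ in range(retries + 1):
        try:
            backup_volumes(paths)  # every retry reuses the same list
            return paths
        except RuntimeError as exc:
            last_error = exc
    raise last_error


def demo_enumerate_once() -> List[str]:
    calls = {"enumerate": 0, "copy": 0}

    def enumerate_paths() -> List[str]:
        calls["enumerate"] += 1
        # Simulated race: the db volume is invisible on the first look only.
        if calls["enumerate"] == 1:
            return ["/volumes/app"]
        return ["/volumes/app", "/volumes/db"]

    def backup_volumes(paths: List[str]) -> None:
        calls["copy"] += 1
        # Simulated transient failure so the retry loop actually runs.
        if calls["copy"] < 3:
            raise RuntimeError("transient copy failure")

    return create_once(enumerate_paths, backup_volumes, retries=5)
```

In this model the copy step succeeds on its third attempt, yet the returned path list still lacks the db volume, because enumeration ran only once.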
FIX (chosen, harness-side, general for all DB recipes): after perform_backup, VERIFY the resulting
snapshot includes the db service's backupbot-labelled volume path; if missing, RE-INVOKE the whole
`abra app backup create` (fresh enumeration) up to N times. This closes the enumerate-during-cycle race
generally. Pair with recipe-PR mysql_backup.sh `set -o pipefail` + fail-loud-on-missing-dump so a
dump-less restore can never silently no-op again. (Still-open minor: the db cycle's own trigger during
backup — not OOM/not-healthcheck — left as a separate observation; the harness verify+retry makes the
backup correct regardless.) Implement next tick, then ghost full run to verify green incl upgrade.
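
A minimal sketch of the harness-side verify+re-invoke loop, under stated assumptions. `backup_with_verify`, `run_backup`, `snapshot_paths` and `required_path` are hypothetical names: `run_backup` stands in for one whole `abra app backup create` invocation, and `snapshot_paths` for however the harness ends up listing the paths in the newest snapshot (that mechanism is not decided here). The point is only that each attempt triggers a fresh enumeration.

```python
from typing import Callable, Iterable


def backup_with_verify(
    run_backup: Callable[[], None],
    snapshot_paths: Callable[[], Iterable[str]],
    required_path: str,
    max_attempts: int = 3,
) -> int:
    """Re-run the whole backup until the snapshot contains required_path.

    Returns the 1-based attempt number that produced a good snapshot.
    """
    prefix = required_path.rstrip("/") + "/"
    for attempt in range(1, max_attempts + 1):
        run_backup()  # full re-invocation, so paths are enumerated afresh
        found = any(
            p == required_path or p.startswith(prefix)
            for p in snapshot_paths()
        )
        if found:
            return attempt
    # Fail loud: a snapshot without the db volume must not pass silently.
    raise RuntimeError(
        f"{required_path} missing from snapshot after {max_attempts} attempts"
    )
```

Matching on the exact path or a `/`-terminated prefix keeps a sibling such as `/backup/dbx` from satisfying a check for `/backup/db`. Raising after the last attempt keeps the harness consistent with the fail-loud goal for the dump script.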