decisions(2): SETTLED — harness BACKUP_VERIFY hook + backup retry closes the backup-capture race (recipe-scoped, additive)
This commit is contained in:
@ -1094,3 +1094,22 @@ NB also flagged: snapshots accumulate across runs under the SAME deterministic d
|
||||
(ghos-9431a1) — restore picks the global LATEST, which can be a prior run's; teardown removes app+
|
||||
volumes but not the restic repo. Confirm the harness restore targets THIS run's snapshot (likely fine
|
||||
since each run's latest is its own backup, but worth a harness check).
|
||||
|
||||
## 2026-05-30 — SETTLED: harness BACKUP_VERIFY hook + backup retry (backup-capture integrity)
|
||||
|
||||
New shared-harness capability (run_recipe_ci `_perform_op` backup branch + meta allowlist): a recipe's
|
||||
`recipe_meta.py` MAY define `BACKUP_VERIFY(domain) -> bool`, a READ-ONLY probe run AFTER the backup op
|
||||
that confirms the backup actually captured the recipe's critical data. If it returns False, the harness
|
||||
re-runs the entire `abra app backup create` (fresh restic snapshot, re-stabilised services) up to 3
|
||||
attempts. Recipes that don't define it keep the single-backup behaviour (zero blast radius).
|
||||
|
||||
Rationale: DB recipes dump their data in a backupbot `backup.pre-hook`; backup-bot-two enumerates the
|
||||
volume paths, runs the pre-hook (dump), then restic-captures the paths. If the DB container cycles
|
||||
mid-dump (observed intermittent on the loaded single CI node — ghost full5/6/7 RED, full8 green; ruled
|
||||
out OOM + healthcheck), the dump is truncated/absent → restic snapshots an empty path → `abra app
|
||||
backup create` still returns success → a later restore reimports nothing → silent data loss that P4
|
||||
catches as RED. A per-restic `--retries` cannot help (paths enumerated once, up-front). The verify+retry
|
||||
closes the race generally. It is NOT a test weakening: BACKUP_VERIFY is read-only and only retries a
|
||||
flaky CAPTURE so the P4 restore assertion is exercised reliably instead of luck-dependently. Companion
|
||||
recipe-PR hardening (mysql_backup.sh `set -o pipefail` + fail-loud on missing dump) still wanted so the
|
||||
reimport can never silently no-op. ghost BACKUP_VERIFY: backup.sql.gz is a valid non-empty gzip.
|
||||
|
||||
Reference in New Issue
Block a user