journal(2): ghost full5 restore RED (ci_marker absent) — full6 instrumented re-run to characterize flaky vs systematic

This commit is contained in:
autonomic-bot
2026-05-30 20:14:13 +00:00
parent 3a706bd96e
commit 01fd43bcd5

View File

@ -1368,3 +1368,24 @@ migration — pure can't-connect), so the only cost is the mysql-init head start
Fix (4a160f6): bump ghost DEPLOY_TIMEOUT + EXTRA_ENV TIMEOUT 1200→2400s (matches discourse). Not a
test weakening — the wait is bounded; a genuine hang still fails at 40min. Teardown after full4 was
clean (no leftover stack/volume/secret). Re-running as full5.
## 2026-05-30T20:10Z — ghost full5: P4 restore RED (ci_marker table absent post-restore) — investigating
full5 (`/root/ccci-ghost-full5.log`, 2400s timeout): deploy-count=1, install/upgrade/backup/custom
PASS, **restore FAIL**. `test_restore_returns_state`: `Table 'ghost.ci_marker' doesn't exist` after
restore (generic test_restore_healthy PASSED → app up; my P4 overlay caught a data-integrity gap).
Recipe-PR head ae43ffe DOES ship the reimport hook (verified ~/.abra/recipes/ghost on cc-ci):
compose db service has `backupbot.backup.pre-hook=/mysql_backup.sh backup`,
`backupbot.backup.volumes.mysql.path=backup.sql.gz`, `backupbot.restore.post-hook=/mysql_backup.sh
restore`; mysql_backup.sh restore = `gunzip -c /var/lib/mysql/backup.sql.gz | mysql -u root`.
Puzzle: full3 (17:23, app-ONLY overlay, db@native 1m) was FULLY GREEN incl restore; full5 (after
3ca45c7 added db@15m grace) regressed on restore. db-grace was observed-necessary (run#2 mysql-init
exit-137 redo-corruption deadlock under load), so I can't just drop it. But db start_period only
changes WHEN swarm marks unhealthy — it shouldn't mechanically break the reimport. So leading
hypotheses: (a) load-dependent flake in backupbot restore / the reimport; (b) recipe-hook robustness
gap — `gunzip -c | mysql` has `set -e` but NOT `set -o pipefail`, so a failed/empty gunzip silently
reimports nothing yet returns 0. Action: full6 re-run + instrument the restore tier live (capture
backupbot restore output, backup.sql.gz presence, whether reimport populated ci_marker). NOT claiming
ghost until restore is reliably green. Stack/vol teardown after full5 was clean.