From 01fd43bcd5fd8cd2f7820ac89334088c499bc396 Mon Sep 17 00:00:00 2001 From: autonomic-bot Date: Sat, 30 May 2026 20:14:13 +0000 Subject: [PATCH] =?UTF-8?q?journal(2):=20ghost=20full5=20restore=20RED=20(?= =?UTF-8?q?ci=5Fmarker=20absent)=20=E2=80=94=20full6=20instrumented=20re-r?= =?UTF-8?q?un=20to=20characterize=20flaky=20vs=20systematic?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- machine-docs/JOURNAL-2.md | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/machine-docs/JOURNAL-2.md b/machine-docs/JOURNAL-2.md index 02412b0..a5699a2 100644 --- a/machine-docs/JOURNAL-2.md +++ b/machine-docs/JOURNAL-2.md @@ -1368,3 +1368,24 @@ migration — pure can't-connect), so the only cost is the mysql-init head start Fix (4a160f6): bump ghost DEPLOY_TIMEOUT + EXTRA_ENV TIMEOUT 1200→2400s (matches discourse). Not a test weakening — the wait is bounded; a genuine hang still fails at 40min. Teardown after full4 was clean (no leftover stack/volume/secret). Re-running as full5. + +## 2026-05-30T20:10Z — ghost full5: P4 restore RED (ci_marker table absent post-restore) — investigating + +full5 (`/root/ccci-ghost-full5.log`, 2400s timeout): deploy-count=1, install/upgrade/backup/custom +PASS, **restore FAIL**. `test_restore_returns_state`: `Table 'ghost.ci_marker' doesn't exist` after +restore (generic test_restore_healthy PASSED → app up; my P4 overlay caught a data-integrity gap). + +Recipe-PR head ae43ffe DOES ship the reimport hook (verified ~/.abra/recipes/ghost on cc-ci): +compose db service has `backupbot.backup.pre-hook=/mysql_backup.sh backup`, +`backupbot.backup.volumes.mysql.path=backup.sql.gz`, `backupbot.restore.post-hook=/mysql_backup.sh +restore`; mysql_backup.sh restore = `gunzip -c /var/lib/mysql/backup.sql.gz | mysql -u root`. + +Puzzle: full3 (17:23, app-ONLY overlay, db@native 1m) was FULLY GREEN incl restore; full5 (after +3ca45c7 added db@15m grace) regressed on restore. db-grace was observed-necessary (run#2 mysql-init +exit-137 redo-corruption deadlock under load), so I can't just drop it. But db start_period only +changes WHEN swarm marks unhealthy — it shouldn't mechanically break the reimport. So leading +hypotheses: (a) load-dependent flake in backupbot restore / the reimport; (b) recipe-hook robustness +gap — `gunzip -c | mysql` has `set -e` but NOT `set -o pipefail`, so a failed/empty gunzip silently +reimports nothing yet returns 0. Action: full6 re-run + instrument the restore tier live (capture +backupbot restore output, backup.sql.gz presence, whether reimport populated ci_marker). NOT claiming +ghost until restore is reliably green. Stack/vol teardown after full5 was clean.