journal+decisions(2): ghost migration-lock deadlock root cause + healthcheck-overlay fix + abra +U chaos-version normalization
This commit is contained in:
@ -1315,3 +1315,36 @@ or a sqlite-aware restore hook). Deploy-test first to find out (don't assume).
|
||||
|
||||
Checkpointing here (node clean, no gate pending — all 3 claims this session PASSed) to take bluesky
|
||||
fresh next cycle; the analysis above lets it start at the overlay, not the investigation.
|
||||
|
||||
## 2026-05-30 — Q4.4 ghost: P3 create-post GREEN + P4 non-vacuous; migration-lock deadlock + +U fixes
|
||||
|
||||
Authored ghost P4 overlays (MySQL `ci_marker` in the `ghost` DB — recipe is MySQL not sqlite; stale
|
||||
comment) + §4.3 create-post round-trip (cookie-aware Admin API client `_ghost.py`). Run-4 results
|
||||
(`/root/ccci-ghost-4.log`): deploy-count=1; install/backup/custom PASS; `test_create_post_roundtrip
|
||||
PASSED (22s)`; P4 upgrade+backup markers PASS; restore RED (real recipe gap — no reimport-on-restore).
|
||||
|
||||
**Why two deploys failed first (NOT test issues):**
|
||||
1. **migrations_lock deadlock.** Ghost's fresh-DB first boot runs a ~6-9min schema migration (dozens
|
||||
of CREATE TABLEs, each a separate MySQL round-trip — round-trip-bound, NOT CPU-bound: hit on BOTH
|
||||
2- and 4-vCPU). The recipe healthcheck `start_period:1m` (+10×30s ≈ 6min grace) marks the still-
|
||||
migrating task unhealthy → swarm kills it mid-migration → leaves `migrations_lock.locked=1,
|
||||
released_at=NULL` → every later task boots, sees the held lock, refuses (`MigrationsAreLockedError`)
|
||||
→ permanent deadlock. Bumping the abra TIMEOUT does NOT help (the lock never clears). FIX: a cc-ci
|
||||
DEPLOY overlay `compose.ccci-health.yml` raising the app healthcheck start_period to 900s (failures
|
||||
ignored during it; a PASS still marks healthy at once) so the fresh migration finishes + releases
|
||||
the lock. Wired via recipe_meta COMPOSE_FILE + install_steps.sh + CHAOS_BASE_DEPLOY. NOT a test
|
||||
change — the real healthcheck still gates readiness. Validated: migration ran past the old kill
|
||||
point, install converged 1/1. (Operator bumped the VM 2→4 vCPU mid-session; didn't fix this — the
|
||||
migration is round-trip-bound — but made everything else snappier.)
|
||||
2. **`+U` chaos-version marker.** The untracked overlay makes abra stamp `chaos-version='<commit>+U'`
|
||||
(U=untracked). The commit equals head_ref (HC1 satisfied) but `+U` broke assert_upgraded's exact-
|
||||
prefix match → spurious upgrade FAIL. FIX: strip the working-tree-state marker before the commit
|
||||
match (commit identity still enforced; HC1 preserved). mumble dodged this only because its overlay
|
||||
is tracked natively in newer versions; cc-ci overlays generally aren't → general harness fix.
|
||||
|
||||
**P4 restore gap (real recipe defect → recipe-PR):** ghost db service has `mysqldump --tab` backup
|
||||
pre-hook but NO `backupbot.restore.*` hook, and the mysql data volume isn't backupbot-labelled → the
|
||||
dump is restored to disk but never reimported → dropped `ci_marker` doesn't return. Non-vacuous
|
||||
(backup PASS with marker, restore RED). Same class as immich#1 / mattermost-lts#1. FIX = recipe-PR
|
||||
adding a mysql dump+reimport hook (mirror mattermost `pg_backup.sh` → `mysql_backup.sh`). Ghost not
|
||||
yet mirrored on gitea (404) → mirror first (plan §0b), then PR, then final green run, then claim.
|
||||
|
||||
Reference in New Issue
Block a user