3.8 KiB
Plan — debug & fix the ghost recipe upgrade PR
Context: during the 2026-06-12 weekly upgrade, ghost (ghost 6.42.0→6.44.1 + mysql bump) was the
recipe whose !testme kept wedging. Its test deploys (ghos-bdd2f3 etc.) hung at 0/1 in Swarm
New state — which we now know was the proxy VIP exhaustion (see
proxy-vip-exhaustion-runbook / plan-proxy-vip-exhaustion-fix.md), NOT necessarily a ghost
defect. It also got run by a DUPLICATE subagent during the interrupt churn, so the PR/branch state
may be messy. This plan figures out what actually went wrong and leaves the ghost PR clean + green.
Execute AFTER the proxy VIP fix (so the infra confound is gone) and the current upgrade settles.
Owner: orchestrator, or a focused /recipe-upgrade ghost re-run.
What the upgrader already found (2026-06-12 summary — start here)
ghost 6.42.0→6.44.1 (+ mysql 8.0→8.4). PR is recipe-maintainers/ghost #4, open with an analysis
comment. Three !testme attempts:
- run 1 (2026-06-05): PASS at 6.44.0 + mysql:8.4 (under lighter load).
- run 2 (2026-06-12): FAIL — IPAM/proxy-VIP exhaustion (infra; the proxy-vip-exhaustion-runbook issue).
- run 3 (2026-06-12): FAIL — a REAL issue: Swarm
UpdateStatus=pausedon the mysql 8.0→8.4 data-dir upgrade race — the default 5supdate_config.monitoris too tight for the mysql data-dir migration under load, so Swarm marks the update paused/failed. The upgrader's suggested fix: addupdate_config.monitor: 300s(and likelyfailure_action: continue/ a longerstart_period) to the ghost app service so the mysql data-dir upgrade has time to converge. So the likely real fix is a small recipe-PR change to ghost'supdate_config, then re-verify — NOT a test change.
Steps
- Inventory the ghost PR state. On recipe-maintainers/ghost: list open PRs — is there ONE
upgrade PR or a DUPLICATE (two branches/PRs from the two ghost subagents)? Capture each PR's
branch, diff (image tag + version-label bumps), and its
!testmecomment history / build results. Read the upgrader transcript for both ghost subagents to see what each did. - Separate infra failure from real failure. The deploy wedges were proxy-VIP exhaustion
(infra). Determine whether ghost ALSO has a genuine upgrade problem: does ghost 6.44.1 + the
mysql bump deploy + pass its tests on a HEALTHY swarm? Re-run
!testmeon the ghost PR now that the box is healthy (post docker-restart / post proxy fix) and watch the real result. - Dedup. If two ghost PRs/branches exist, keep the correct one (right version bump, clean
diff), close the duplicate with a note, and ensure no leftover
dev-ghost/ghos-*stacks remain (reap). - Fix forward to green. If
!testmeis RED for a REAL reason (e.g. ghost 6.44.1 needs a config change, or the mysql major bump needs a migration step / a genuinely-stale test): apply the minimal recipe fix per/recipe-upgraderules — recipe PR changes only; if a cc-ci TEST is genuinely stale, leave an explanatory PR COMMENT (do NOT edit tests in default mode). Iterate!testme≤3× to green. - Leave it operator-ready. One clean ghost PR,
!testmeGREEN (or a clear comment explaining a legitimately-deferred issue), no duplicate, no leaked deploys. NEVER merge — operator merges.
Acceptance
The ghost upgrade is represented by exactly one PR with a clear, green (or clearly-explained)
!testme, the duplicate-subagent mess cleaned, and a one-line note on whether ghost's original
failure was purely the proxy-VIP infra issue or a real upgrade problem (and how it was fixed).
Guardrails
Recipe mirror = PR only, never merge / never push main. Reap any dev-ghost/ghos-* test stacks on
exit. No secrets in logs/commits. Don't run while the proxy recreate (maintenance window) is in
progress.