Phase ghost — re-evaluate ghost after proxy fix and leave one clean PR
Mission: re-evaluate the ghost upgrade failure after the Swarm proxy/IPAM infra
confound has been removed, then leave exactly one operator-ready ghost PR: green if the
recipe is sound, or clearly explained with the minimum required recipe fix/comment if a real
Ghost/MySQL upgrade issue remains.
State files live under machine-docs/: STATUS-ghost.md, BACKLOG-ghost.md,
REVIEW-ghost.md, JOURNAL-ghost.md.
Context
The 2026-06-12 /upgrade-all recorded ghost as the only failed recipe, but the evidence
was mixed:
- One failure was definitely infra: VIP exhaustion on the shared proxy overlay network left tasks stuck in the Swarm New state.
- A later failure may be recipe-specific: MySQL 8.0 to 8.4 data-dir upgrade timing under Swarm's default update monitor, producing UpdateStatus=paused under load.
- A previous run on 2026-06-05 passed the Ghost/MySQL path under lighter load.
- Duplicate ghost subagent churn may have left the branch, PR, and comment state messy.
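The infra-versus-recipe split above can be made explicit. The following is a minimal Python sketch, not part of cc-ci: FailureSnapshot, classify, and the proxy_vip_exhausted flag are hypothetical names, and the state strings are assumed to be the lowercase values the Docker Engine API reports for tasks and for a service's UpdateStatus.State. Collecting them (for example with docker service ps and docker service inspect) is left out. A paused update is only flagged as a recipe suspect, because the 2026-06-12 pause happened under load and still needs a clean post-proxy run to confirm it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Task states (lowercase, as reported by the Docker Engine API) in which a task
# has not yet been assigned to a node. Tasks parked here point at allocation
# trouble such as VIP/IPAM exhaustion rather than a recipe bug. Heuristic only.
NOT_YET_ASSIGNED = {"new", "allocated", "pending"}


@dataclass
class FailureSnapshot:
    """Hypothetical record of one failed ghost upgrade attempt."""
    update_state: Optional[str]  # service UpdateStatus.State, e.g. "paused"
    task_states: list[str] = field(default_factory=list)
    proxy_vip_exhausted: bool = False  # hypothetical flag from infra checks


def classify(snap: FailureSnapshot) -> str:
    """Return "infra", "recipe-suspect", or "unclassified"."""
    states = {s.lower() for s in snap.task_states}
    # Known VIP exhaustion, or every task still waiting for allocation: infra.
    if snap.proxy_vip_exhausted or (states and states <= NOT_YET_ASSIGNED):
        return "infra"
    # A paused rolling update is only a suspect until a clean rerun confirms it.
    if (snap.update_state or "").lower() == "paused":
        return "recipe-suspect"
    return "unclassified"
```

Returning "recipe-suspect" rather than "recipe" keeps the sketch honest about the mixed evidence: only a fresh post-proxy run can upgrade a suspect to a real recipe failure.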
Existing focused plan/background: /srv/cc-ci/cc-ci-plan/plan-ghostpr-debug-fix.md.
Required Work
- Inventory PR state. On recipe-maintainers/ghost, list all open PRs and branches related to the upgrade. Identify the correct PR (expected to be ghost PR #4). Close or clearly mark a duplicate only if it is truly superseded. Never merge recipe PRs.
- Separate infra from recipe behavior. After pvfix and pvcheck, trigger a fresh !testme on the correct ghost PR and watch the run. Do not count pre-proxy failures as current recipe evidence.
- If green: record that the prior failure was infra/timing-confounded, ensure no stale stacks or volumes remain, and leave the PR ready for operator review.
- If red for a real recipe reason: make the smallest recipe PR change needed. The suspected fix is a longer Swarm update monitor or start grace period around the MySQL 8.0 to 8.4 data-dir migration, e.g. update_config.monitor: 300s plus related minimal service health timing. Validate the hypothesis with logs; do not cargo-cult timing knobs.
- If the test is genuinely stale: the default recipe-upgrade policy applies, so leave an explanatory PR comment for the operator. Do not edit cc-ci tests in this phase unless the operator explicitly asks for a test-update phase.
- Deduplicate and clean up. Ensure exactly one relevant open ghost upgrade PR remains, its comments explain the final state, and no ghos-*/dev-ghost stacks or volumes leak.
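Before raising a timing knob, the measured migration time should justify it. A minimal sketch, assuming the MySQL 8.0 to 8.4 data-dir upgrade durations have already been read from logs as seconds: parse_duration accepts only the h, m, s, and ms units of the Go-style duration strings Swarm uses (such as 300s or 1m30s), and monitor_covers with its 1.5x safety margin is hypothetical. For reference, Docker documents a default update monitor of 5s and a default update failure action of pause, which is consistent with the UpdateStatus=paused symptom but does not by itself prove the timing hypothesis.

```python
from __future__ import annotations

import re

# Units accepted by this sketch; Go-style durations also allow ns/us, omitted here.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
# "ms" is listed before "m" so that "250ms" matches as milliseconds.
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def parse_duration(text: str) -> float:
    """Parse strings like '5s', '300s' or '1m30s' into seconds."""
    pos, total = 0, 0.0
    for match in _PART.finditer(text):
        if match.start() != pos:  # reject junk before or between parts
            raise ValueError(f"bad duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):  # empty input or trailing junk
        raise ValueError(f"bad duration: {text!r}")
    return total


def monitor_covers(window: str, observed_seconds: list[float], margin: float = 1.5) -> bool:
    """True if the window is at least `margin` times the slowest observed migration."""
    if not observed_seconds:
        raise ValueError("need at least one measured migration duration")
    return parse_duration(window) >= max(observed_seconds) * margin
```

Taking the slowest observed run, not the average, matches the failure mode described above: one slow migration under load is enough to pause the update.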
Gates
M1 — State inventory and clean retry. Builder documents PR/branch/comment/build state,
identifies the correct PR, and runs one clean post-proxy !testme. Adversary verifies that
pre-proxy infra failures were not misclassified as current recipe failures.
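The Adversary's M1 check reduces to comparing run start times with the moment the proxy fix landed. A sketch under assumptions: Run, current_evidence, and misclassified are hypothetical names, run records are assumed to carry timezone-aware start times, and the fix timestamp has to come from the proxy phase's own record rather than being guessed.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Run:
    """Hypothetical record of one !testme run on the ghost PR."""
    run_id: str
    started: datetime  # timezone-aware start time
    result: str        # e.g. "green" or "red"


def current_evidence(runs: list[Run], proxy_fixed_at: datetime) -> list[Run]:
    """Runs that started strictly after the proxy/IPAM fix, oldest first."""
    return sorted((r for r in runs if r.started > proxy_fixed_at), key=lambda r: r.started)


def misclassified(runs: list[Run], proxy_fixed_at: datetime, cited_ids: list[str]) -> set[str]:
    """IDs cited as current recipe failures that actually ran before the fix."""
    by_id = {r.run_id: r for r in runs}
    return {rid for rid in cited_ids if rid in by_id and by_id[rid].started <= proxy_fixed_at}
```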
M2 — Operator-ready outcome. The ghost PR is green, or it has the minimal justified recipe fix/comment and a clear current blocker. Duplicate PR/branch mess is resolved and no ghost resources leak. Adversary verifies live PR state, build evidence, and cleanup.
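The cleanup half of M2 can be audited mechanically. A minimal sketch: the leak patterns are copied from the brief, matching volumes by a stack-name prefix assumes Docker's convention of naming stack-created volumes as the stack name plus an underscore, and leaked and m2_problems are hypothetical helpers whose inputs (open upgrade PR numbers, stack names, volume names) would come from the forge and from docker stack ls / docker volume ls.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Patterns from the brief. Assumes stack-created volumes are named
# "<stack>_<volume>", so each pattern also matches "<pattern>_*".
LEAK_PATTERNS = ("ghos-*", "dev-ghost")


def leaked(names: list[str], patterns: tuple[str, ...] = LEAK_PATTERNS) -> list[str]:
    """Sorted stack/volume names that belong to a ghost test stack."""
    def belongs(name: str) -> bool:
        return any(fnmatchcase(name, p) or fnmatchcase(name, p + "_*") for p in patterns)
    return sorted(n for n in names if belongs(n))


def m2_problems(open_upgrade_prs: list[int], stacks: list[str], volumes: list[str]) -> list[str]:
    """Empty list when M2's dedup and cleanup preconditions hold."""
    problems = []
    if len(open_upgrade_prs) != 1:
        problems.append(f"expected 1 open ghost upgrade PR, found {len(open_upgrade_prs)}")
    leaks = leaked(stacks + volumes)
    if leaks:
        problems.append("leaked: " + ", ".join(leaks))
    return problems
```

Returning a list of problems rather than a bare boolean gives the Adversary text that can be pasted straight into machine-docs/REVIEW-ghost.md.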
Guardrails
- Recipe PRs are never merged by agents.
- Do not weaken tests to get green.
- Do not re-run ghost during proxy maintenance or while cfold owns a broad CI sweep.
- Keep iterations bounded: at most three fresh post-proxy !testme attempts unless the operator authorizes more.
- Preserve useful failure evidence in PR comments and machine-docs/STATUS-ghost.md.
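The rerun guardrails can be checked before each trigger. A minimal sketch: the boolean inputs stand in for however proxy maintenance and a cfold sweep are actually signalled (an assumption), may_retest is a hypothetical name, and operator_extra models attempts the operator has explicitly authorized beyond the default three.

```python
from __future__ import annotations

MAX_POST_PROXY_ATTEMPTS = 3  # limit taken from the guardrails


def may_retest(post_proxy_attempts: int, proxy_maintenance: bool,
               cfold_sweep_active: bool, operator_extra: int = 0) -> tuple[bool, str]:
    """Return (allowed, reason) for triggering another ghost !testme."""
    if proxy_maintenance:
        return False, "proxy maintenance in progress"
    if cfold_sweep_active:
        return False, "cfold owns a broad CI sweep"
    if post_proxy_attempts >= MAX_POST_PROXY_ATTEMPTS + operator_extra:
        return False, "attempt budget spent; ask the operator"
    return True, "ok"
```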
Definition of Done
Exactly one ghost upgrade PR is operator-ready, with a fresh post-proxy verdict and clear
classification of the 2026-06-12 failure. Any real recipe fix is minimal and verified;
otherwise the PR is green or has a precise operator-facing explanation. Adversary has
signed off on M1 and M2 in machine-docs/REVIEW-ghost.md; Builder writes ## DONE only
after both gates have fresh Adversary PASSes.
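The DONE condition can be stated as a small predicate. In this sketch, "fresh" is assumed to mean a PASS recorded after the Builder's last change; may_write_done and the verdict mapping are hypothetical and do not reflect the real layout of machine-docs/REVIEW-ghost.md.

```python
from __future__ import annotations

from datetime import datetime

REQUIRED_GATES = ("M1", "M2")


def may_write_done(verdicts: dict[str, tuple[str, datetime]],
                   last_builder_change: datetime) -> bool:
    """True only if every gate has an Adversary PASS newer than the Builder's last change."""
    for gate in REQUIRED_GATES:
        entry = verdicts.get(gate)
        if entry is None:
            return False  # gate never reviewed
        verdict, recorded_at = entry
        if verdict != "PASS" or recorded_at <= last_builder_change:
            return False  # failing or stale verdict
    return True
```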