| name | description | metadata |
|---|---|---|
| swarm-updatestatus-convergence-gotchas | N/N replicas is not convergence during stop-first rolling updates, and a 'paused' UpdateStatus persists forever; both bit cc-ci harness waits (builds 238/241) | |
Two docker-swarm facts that broke cc-ci convergence waits on 2026-06-09:
- N/N ≠ converged. A service update (e.g. a chaos redeploy changing a db image) is registered
  immediately but may not have started: the OLD task still shows 1/1, then dies seconds later
  (stop-first). Build 238: backupbot exec'd a pre-hook into the just-killed db container → 409 →
  empty snapshot → RED. Convergence must also check
  `docker service inspect --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}'`.
- `paused` persists forever. Swarm's default `update-failure-action: pause` flips UpdateStatus to
  `paused` on ONE task flicker, and the flag never clears (until the next update) even when the
  service recovers to N/N healthy. Build 241 hung 22min treating it as in-flight. Only `updating`
  and `rollback_started` are active states worth waiting on.
Both encoded in cc-ci runner/harness/lifecycle.py::services_converged (commits 68ef0f8 + e6d55b5).
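A minimal sketch of the two rules above, for orientation only. It is NOT the actual `services_converged` code; the names `is_converged`, `read_update_state`, and `ACTIVE_UPDATE_STATES` are hypothetical, and it assumes the replica counts have already been obtained elsewhere.

```python
import subprocess

# States in which an update is genuinely in flight (per the notes above).
# "paused" is deliberately excluded: it never clears until the next update.
ACTIVE_UPDATE_STATES = {"updating", "rollback_started"}


def is_converged(running: int, desired: int, state: str) -> bool:
    """Return True only if replicas match AND no update is in flight.

    `state` is the UpdateStatus.State string, or "" when the service has
    no UpdateStatus at all (never updated).
    """
    if running != desired:
        return False
    return state not in ACTIVE_UPDATE_STATES


def read_update_state(service: str) -> str:
    """Read UpdateStatus.State via the docker CLI ("" if unset)."""
    fmt = "{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}"
    result = subprocess.run(
        ["docker", "service", "inspect", "--format", fmt, service],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

A caller would combine the two, e.g. `is_converged(1, 1, read_update_state("db"))`, where `db` is a placeholder service name. Keeping the decision in a pure function lets it be unit-tested without a swarm.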
Remember this when writing any NEW wait/health logic against swarm. Related: shared-recipe-checkout-race