fix(harness): a paused swarm update is settled — only active states block convergence
68ef0f8 made services_converged() require UpdateStatus settled, treating
'paused' as in flight. But swarm's default update-failure-action pauses the
update on a single task flicker and the flag persists FOREVER (until the next
update): immich CI 241 had the app service 'paused' from a restart during
restore while the service was back at 1/1 and healthy — every subsequent wait
hung to its deadline and the run had to be killed.
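
To illustrate why a predicate that can never become true costs the full deadline on every wait, here is a minimal sketch of a polling loop. It is not the harness's actual wait code: `wait_for`, its parameters, and the injectable `clock`/`sleep` hooks are hypothetical names introduced only for this example.

```python
import time
from typing import Callable


def wait_for(
    predicate: Callable[[], bool],
    deadline_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,  # injectable for testing (assumption)
    sleep: Callable[[float], None] = time.sleep,  # injectable for testing (assumption)
) -> bool:
    """Poll `predicate` until it returns True or `deadline_s` seconds elapse."""
    end = clock() + deadline_s
    while True:
        if predicate():
            return True
        if clock() >= end:
            # The predicate never became true: the whole deadline was spent.
            return False
        sleep(interval_s)
```

If the predicate treats a persistent 'paused' flag as in flight, it stays false for as long as the flag exists, so every call of this shape runs to its deadline.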
Only 'updating' and 'rollback_started' now block convergence: those are the
states swarm is actively driving (the 238 stop-first race lives in 'updating').
'paused'/'rollback_paused' make no progress without intervention, so waiting on
them is pointless — N/N replicas is already required, and the HTTP-health and
tier assertions still gate whether the app actually works.
lint: PASS, unit tests: 138 passed.
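
The rule described above can be sketched in isolation as a pure function over the state strings. This is an illustration only: `ACTIVE_UPDATE_STATES` and `update_states_settled` are hypothetical names, and the input is assumed to be one UpdateStatus state per line, as the loop in `services_converged()` suggests.

```python
# States that swarm is actively driving; only these block convergence.
ACTIVE_UPDATE_STATES = frozenset({"updating", "rollback_started"})


def update_states_settled(stdout: str) -> bool:
    """Return True unless any line names an actively progressing update state.

    Assumes `stdout` holds one UpdateStatus state per line; an empty line
    stands for a service that has never been updated.
    """
    for state in stdout.split("\n"):
        if state.strip() in ACTIVE_UPDATE_STATES:
            return False
    return True


# 'paused' and 'rollback_paused' count as settled; 'updating' does not.
print(update_states_settled("completed\npaused\n"))
print(update_states_settled("completed\nupdating\n"))
```

A deny-list of active states also means any state string the harness does not know about is treated as settled, where the earlier allow-list treated it as in flight. Replica counts and the health assertions remain the backstop in that case.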
@@ -384,7 +384,13 @@ def services_converged(domain: str) -> bool:
     if proc.returncode != 0:
         return False # a service vanished mid-check — not settled
     for state in proc.stdout.split("\n"):
-        if state.strip() not in ("", "completed", "rollback_completed"):
+        # Only ACTIVE states block convergence. 'paused'/'rollback_paused' are terminal-without-
+        # intervention: swarm's default update-failure-action pauses the update on one task flicker
+        # and the flag then persists FOREVER (immich CI 241: app service 'paused' from a restart
+        # during restore, service back at 1/1 and healthy — the wait hung to its deadline). With
+        # N/N already required above, a paused update is settled for our purposes; the HTTP-health
+        # and tier assertions still gate whether the app actually works.
+        if state.strip() in ("updating", "rollback_started"):
             return False
     return True