fix: abra app upgrade -c (no-converge-checks) — abra false-fails slow heavy rolling upgrades
Diagnosed via instrumented diag: lasuite-docs upgrade reported 'FATA deploy failed' while all 9 services converged 1/1 — abra's convergence poll gives up too early on the slow stop-first roll (pulling new images). Disable abra's check; the harness wait_healthy + data-survival assertion is the real, more-patient gate (a genuine failure still fails the test: app never gets healthy). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@ -101,10 +101,14 @@ def upgrade(domain: str, version: Optional[str] = None, timeout: int = 900) -> N
|
||||
args = ["app", "upgrade", domain]
|
||||
if version:
|
||||
args.append(version)
|
||||
# -f no prompt, -D skip public-DNS checks (our per-run domains route via the gateway), -o offline
|
||||
# (use local tags — incl. the upstream tags fetched at clone — and DON'T fetch from the private
|
||||
# mirror origin, which 401s). upgrade has no --chaos flag.
|
||||
args += ["-f", "-D", "-n", "-o"]
|
||||
# -f no prompt, -D skip public-DNS checks, -o offline (local tags, no private-origin 401),
|
||||
# -c no-converge-checks: abra's convergence poll gives up too early on a slow heavy rolling
|
||||
# upgrade (e.g. lasuite-docs' 9-service stop-first roll while pulling new images) and reports a
|
||||
# FALSE "deploy failed" even though all services do converge. We disable abra's check and rely on
|
||||
# the harness's own wait_healthy + data-survival assertion (more patient + the real test) to gate
|
||||
# the upgrade. A genuinely-failed upgrade still fails the test (app never gets healthy). upgrade
|
||||
# has no --chaos flag.
|
||||
args += ["-f", "-D", "-n", "-o", "-c"]
|
||||
_run(args, timeout=timeout)
|
||||
|
||||
|
||||
|
||||
Reference in New Issue
Block a user