fix(2): F2-12 lasuite-drive upgrade tier — own convergence wait (abra -c) + collabora READY_PROBE
Adversary cold-verify FAILed Q3.2 (F2-12): the prev→PR-head chaos upgrade's abra converge monitor FATAs while the NEW collabora 25.04.9.4.1's healthcheck is still in start_period (jail/config init), even though it converges given swarm's healthcheck retries. My WOPI pre-gate fixed the OLD collabora being killed mid-boot but not the NEW collabora's convergence. Flaky (3x green for me, 1x fail cold). Fix (cc-ci-side, stronger verification — not weaker): - abra.deploy gains no_converge_checks (`-c`); chaos_redeploy passes it for the upgrade op so abra's impatient monitor no longer FATAs (the stack spec is applied regardless). - perform_upgrade now OWNS the convergence verification after the redeploy: wait_healthy (services N/N + app HEALTH_PATH) + new lifecycle.wait_ready_probes (recipe READY_PROBE), bounded by the recipe DEPLOY_TIMEOUT (generous) not abra's impatient window. meta threaded _perform_op→perform_upgrade. - recipe_meta READY_PROBE hook (added to _load_meta whitelist): lasuite-drive probes collabora WOPI discovery (/hosting/discovery on collabora-<domain>) → 200. Called after install deploy AND after the upgrade redeploy. No-op for recipes without a READY_PROBE. NOT re-claiming yet — validating the upgrade tier is now reliably green (incl. the slow-collabora crossover) across multiple runs before re-claiming Q3.2. F2-12 stays open (Adversary-owned). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@ -117,10 +117,17 @@ def secret_generate(domain: str, timeout: int = 300) -> None:
|
||||
)
|
||||
|
||||
|
||||
def deploy(domain: str, chaos: bool = True, timeout: int = 900) -> None:
|
||||
def deploy(domain: str, chaos: bool = True, timeout: int = 900, no_converge_checks: bool = False) -> None:
|
||||
args = ["app", "deploy", domain, "-o", "-n"]
|
||||
if chaos:
|
||||
args.append("-C")
|
||||
if no_converge_checks:
|
||||
# `-c`: skip abra's own post-deploy convergence monitor. Used by the upgrade chaos redeploy
|
||||
# of heavy stacks (lasuite-drive): abra's monitor FATAs while a slow service (collabora's
|
||||
# new-version jail/config init) is still becoming healthy, even though it converges given
|
||||
# time. The caller then performs its OWN, stricter convergence+health wait (services N/N +
|
||||
# app health + recipe READY_PROBE) with a generous deadline — see lifecycle.chaos_redeploy.
|
||||
args.append("-c")
|
||||
_run(args, timeout=timeout)
|
||||
|
||||
|
||||
|
||||
Reference in New Issue
Block a user