fix(2): F2-12 lasuite-drive upgrade tier — own convergence wait (abra -c) + collabora READY_PROBE

Adversary cold-verify FAILed Q3.2 (F2-12): the prev→PR-head chaos upgrade's abra converge monitor
FATAs while the NEW collabora 25.04.9.4.1's healthcheck is still in start_period (jail/config init),
even though it converges given swarm's healthcheck retries. My WOPI pre-gate fixed the OLD collabora
being killed mid-boot but not the NEW collabora's convergence. Flaky (3x green for me, 1x fail cold).

Fix (cc-ci-side, stronger verification — not weaker):
- abra.deploy gains no_converge_checks (`-c`); chaos_redeploy passes it for the upgrade op so abra's
  impatient monitor no longer FATAs (the stack spec is applied regardless).
- perform_upgrade now OWNS the convergence verification after the redeploy: wait_healthy (services
  N/N + app HEALTH_PATH) + new lifecycle.wait_ready_probes (recipe READY_PROBE), bounded by the
  recipe DEPLOY_TIMEOUT (generous) not abra's impatient window. meta threaded _perform_op→perform_upgrade.
- recipe_meta READY_PROBE hook (added to _load_meta whitelist): lasuite-drive probes collabora WOPI
  discovery (/hosting/discovery on collabora-<domain>) → 200. Called after install deploy AND after
  the upgrade redeploy. No-op for recipes without a READY_PROBE.

NOT re-claiming yet — validating the upgrade tier is now reliably green (incl. the slow-collabora
crossover) across multiple runs before re-claiming Q3.2. F2-12 stays open (Adversary-owned).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-29 11:55:53 +01:00
parent aab77ea0f3
commit e1147b5fe3
5 changed files with 92 additions and 12 deletions

View File

@ -34,6 +34,17 @@ DEPS = ["keycloak"]
OIDC_AT_INSTALL = True
def READY_PROBE(domain):
"""Readiness signals beyond replica-convergence + the app HEALTH_PATH (Q3.2/F2-12). collabora's
coolwsd reports its container 1/1 'running' while still doing jail/config init, and its WOPI
discovery endpoint 404s until ready — so the harness waits for `/hosting/discovery` → 200 on the
collabora sibling host after the install deploy AND after the upgrade chaos redeploy. This is what
makes the heavy prev→PR-head crossover reliably green (the new collabora 25.04.9.x finishes init
within swarm's healthcheck retries; abra's own converge monitor was too impatient — F2-12)."""
label, _, rest = domain.partition(".")
return [{"host": f"collabora-{domain}", "path": "/hosting/discovery", "ok": (200,)}]
def EXTRA_ENV(domain):
# Two of lasuite-drive's services route on DOMAIN-DERIVED **nested** subdomains —
# `MINIO_DOMAIN="minio.${DOMAIN}"` and `COLLABORA_DOMAIN="collabora.${DOMAIN}"`. The cc-ci