Q3.2a run 1: Part A (install-time OIDC) is GREEN: deploy-count=1, and install/backup/restore/custom +
OIDC test all PASS. BUT the upgrade tier FAILED: the in-place `abra app deploy --chaos` redeploy landed
on a STILL-BOOTING collabora (coolwsd takes ~2min to boot: it pre-reads 1300+ l10n files and runs RSA
keygen) and SIGTERMed it mid-init ("Shutdown requested while starting up", forced exit 70), so abra
aborted the deploy. Root cause: the install wait_healthy returns as soon as the container is 1/1, while
coolwsd is still loading. Fixes (plan §C readiness-gating, no test weakened):
- tests/lasuite-drive/ops.py::pre_upgrade: wait for collabora WOPI discovery (/hosting/discovery
on collabora-<domain>) to return 200 BEFORE the chaos redeploy, so it replaces a ready collabora cleanly.
- runner/harness/lifecycle.chaos_redeploy + generic.perform_upgrade + run_recipe_ci._perform_op:
plumb the recipe DEPLOY_TIMEOUT through to the upgrade chaos redeploy. It previously used abra.deploy's
900s default while the .env internal TIMEOUT is 1500s, so Python could SIGKILL abra mid-wait on the
slow collabora/onlyoffice reconverge. Mirrors the install deploy_app timeout plumbing.
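The timeout plumbing in the second fix can be sketched roughly as below. This is a minimal illustration, not the harness code: `resolve_deploy_timeout`, `run_with_timeout`, and the idea that the recipe settings arrive as a dict with a `DEPLOY_TIMEOUT` key are all hypothetical stand-ins. Only the 900s default comes from the notes above. The point it shows is that the outer Python-side timeout should be taken from the recipe rather than a fixed default, so it is not shorter than the wait the wrapped command performs internally.

```python
import subprocess
import sys

# Previous outer default, per the notes above.
ABRA_DEFAULT_TIMEOUT = 900


def resolve_deploy_timeout(recipe_env, default=ABRA_DEFAULT_TIMEOUT):
    """Return the outer timeout in seconds.

    `recipe_env` is a hypothetical dict of recipe-level settings; if it
    carries DEPLOY_TIMEOUT, that value wins over the fixed default.
    """
    raw = recipe_env.get("DEPLOY_TIMEOUT")
    return int(raw) if raw else default


def run_with_timeout(cmd, timeout):
    """Run cmd and return its exit code, or None if the outer timeout fired.

    subprocess.run kills the child and raises TimeoutExpired when the
    timeout elapses, which is the "Python kills abra mid-wait" failure mode.
    """
    try:
        return subprocess.run(cmd, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return None
```

With an internal wait of 1500s, any recipe value above that (for example an illustrative `DEPLOY_TIMEOUT=1800`) keeps the outer kill from firing before the wrapped command gives up on its own.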
Also (operator naming change 2026-05-29): renamed `--extra-tests` -> `--extra` in DEFERRED.md +
BACKLOG-2.md Build-backlog section. 3 refs remain in BACKLOG-2 Adversary-findings section
(241/248/292, closed findings) — left for the Adversary (single-writer); orchestrator updated
IDEAS.md/plan-sso-dep-testing.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
70 lines
2.9 KiB
Python
"""lasuite-drive — pre-op seed hooks (Phase 1e HC3). The orchestrator runs these BEFORE the op; the
matching test_<op>.py asserts post-op (assertion-only). The marker is a dedicated `ci_marker` row in
postgres (independent of the app's Django migrations: CREATE TABLE IF NOT EXISTS), written via psql
in the `db` service. The backup path exercises the recipe's pg_backup.sh DB-dump hook (postgres is
backupbot-labelled)."""

import os
import sys
import time

sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
from harness import lifecycle  # noqa: E402


def _wait_collabora_ready(domain, timeout=420):
    """Gate the upgrade op on collabora being FULLY ready (WOPI discovery endpoint → 200), not just
    container 1/1 'running'. coolwsd takes ~2min to boot (pre-reads 1300+ l10n files + RSA keygen);
    the install wait_healthy returns on container 1/1 while coolwsd is still loading. An in-place
    `abra app deploy --chaos` upgrade that lands on a still-booting collabora SIGTERMs it mid-init
    ("Shutdown requested while starting up", forced exit 70) → abra aborts the deploy (Q3.2a run 1,
    JOURNAL 2026-05-29). Waiting for discovery=200 first makes the redeploy replace a ready collabora
    cleanly. collabora routes on the COLLABORA_DOMAIN sibling (collabora-<domain>); /hosting/discovery
    is the WOPI discovery endpoint celery's configure_wopi calls."""
    host = f"collabora-{domain}"
    deadline = time.time() + timeout
    last = 0
    while time.time() < deadline:
        last = lifecycle.http_get(host, "/hosting/discovery", timeout=15)
        if last == 200:
            print(f" pre_upgrade: collabora WOPI discovery ready (200) on {host}", flush=True)
            return
        time.sleep(5)
    raise AssertionError(
        f"collabora WOPI discovery not ready on {host} (last status {last}) within {timeout}s"
    )


def _psql(domain, sql):
    cmd = f'PGPASSWORD=$(cat /run/secrets/postgres_p) psql -U drive -d drive -tAc "{sql}"'
    return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()


def _seed(domain, value):
    _psql(
        domain,
        "CREATE TABLE IF NOT EXISTS ci_marker(v text); DELETE FROM ci_marker; "
        f"INSERT INTO ci_marker VALUES('{value}');",
    )
    assert _psql(domain, "SELECT v FROM ci_marker;") == value


def pre_upgrade(domain, meta):
    # Gate the chaos redeploy on a fully-ready collabora (else it kills a still-booting coolwsd and
    # abra aborts the upgrade deploy; see Q3.2a run 1). Then seed the data-integrity marker.
    _wait_collabora_ready(domain)
    _seed(domain, "upgrade-survives")


def pre_backup(domain, meta):
    _seed(domain, "original")


def pre_restore(domain, meta):
    # drop the marker table (diverge from the backup) so a successful restore is observable
    _psql(domain, "DROP TABLE ci_marker;")
    assert _psql(domain, "SELECT to_regclass('public.ci_marker');") in (
        "",
        "NULL",
    ), "drop did not take"