cc-ci/tests/mumble/recipe_meta.py
autonomic-bot ec76072489 fix(2): Q4.2 mumble — TCP voice-server READY_PROBE gates backup past upgrade host-port churn
Diagnostic (RECIPE=mumble STAGES=install,backup,restore,custom, no upgrade) PROVED backup+restore green
on a stable 1.0.0 deploy incl. ci_marker survival (P4). The full-run backup 409 ('container not
running') was the chaos UPGRADE redeploy: host-mode 64738 must be released by the old task + rebound by
the new, and HEALTH_PATH '/' only proves the mumble-web sidecar (not the voice server), so wait_healthy
passed while the app churned → backup-bot execed a not-running container. Fix: extend
lifecycle.wait_ready_probes to support a TCP probe ({tcp_host,tcp_port,stable=N consecutive connects});
mumble recipe_meta READY_PROBE returns 64738 (stable=3) so the harness waits for the voice server up
after install AND upgrade before backup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 20:19:07 +01:00


# Per-recipe harness config for mumble (Phase 2 Q4.2 — a TCP/voice recipe, not HTTP-native).
#
# Mumble's voice server speaks its own TLS protocol on 64738 (no HTTP API). To fit cc-ci's
# HTTP-readiness + on-host test model we deploy two recipe overlays:
#   - compose.mumbleweb.yml  -> a mumble-web HTTP client routed through Traefik on the app domain,
#     giving the generic harness a real HTTP readiness/serving signal (HEALTH_PATH "/") AND the
#     web_client.py parity surface.
#   - compose.host-ports.yml -> publishes 64738 (tcp+udp) directly on the cc-ci host (mode: host).
#     Tests run on-host (cc-ci-run), so the protocol tests connect to 127.0.0.1:64738.
# Both overlays are shipped by the upstream recipe; this is a documented deployment mode, not a fork.
#
# Distinctive config markers (read back by the recipe-specific functional tests, proving our config
# actually propagated into the running server — version-independent, not hard-coded upstream values):
#   WELCOME_TEXT -> MUMBLE_CONFIG_WELCOMETEXT, surfaced as ServerSync.welcome_text.
#   USERS        -> MUMBLE_CONFIG_USERS (max users), surfaced as ServerConfig.max_users.
HEALTH_PATH = "/" # mumble-web client UI
HEALTH_OK = (200,)
# install_steps.sh provides compose.host-ports.yml to recipe versions that predate it (the upgrade
# tier's base deploy is the previous published version, 0.2.0+, which lacks the upstream overlay).
# That untracked file makes abra's clean-tree check for a PINNED base deploy fail (FATA), so deploy
# the explicitly checked-out pinned version with chaos instead (skips lint/clean-tree; deploys that
# version, not LATEST). No-op for the upgrade tier (already a PR-head chaos redeploy). See DECISIONS.md.
CHAOS_BASE_DEPLOY = True
DEPLOY_TIMEOUT = 900 # two images to pull (mumble-server + mumble-web) on a cold node
HTTP_TIMEOUT = 300
# A unique, stable welcome-text marker the round-trip test asserts surfaces over the protocol.
WELCOME_TEXT_MARKER = "cc-ci-mumble-welcome-7f3a9c"
# A distinctive max-users value (not the recipe default 100) the server_config test asserts.
MAX_USERS = 42
EXTRA_ENV = {
    "COMPOSE_FILE": "compose.yml:compose.mumbleweb.yml:compose.host-ports.yml",
    "WELCOME_TEXT": WELCOME_TEXT_MARKER,
    "USERS": str(MAX_USERS),
}


def READY_PROBE(domain):
    # HEALTH_PATH "/" only proves the mumble-web HTTP sidecar; it does NOT reflect the voice server.
    # After a chaos upgrade redeploy the host-mode 64738 port must be released by the old task and
    # rebound by the new one: a window where the app (voice) container isn't yet serving while
    # mumble-web still returns 200. backup-bot then execs its sqlite pre-hook into a not-running app
    # container and gets a 409. Gate readiness on the voice port being STABLY listening (3
    # consecutive connects) before the harness proceeds to the backup tier. The port is
    # host-published (compose.host-ports.yml), so we probe it on the cc-ci host where the run executes.
    return [{"tcp_host": "127.0.0.1", "tcp_port": 64738, "stable": 3}]
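
A minimal sketch of how a harness could consume the probe dict returned by READY_PROBE above. This is not the actual lifecycle.wait_ready_probes implementation, which is not shown here; the names tcp_connects, wait_tcp_stable, deadline_s and interval_s are hypothetical. It illustrates the "N consecutive connects" idea: any failed connect resets the streak, so a port that flaps during a redeploy is not reported ready.

```python
import socket
import time


def tcp_connects(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all as "not listening".
        return False


def wait_tcp_stable(probe, deadline_s=60.0, interval_s=1.0):
    """Poll until probe["stable"] consecutive connects succeed, or the deadline passes.

    `probe` uses the same keys as the READY_PROBE entry: tcp_host, tcp_port, stable.
    Returns True once the streak is reached, False if the deadline expires first.
    """
    need = int(probe.get("stable", 1))
    streak = 0
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if tcp_connects(probe["tcp_host"], probe["tcp_port"]):
            streak += 1
            if streak >= need:
                return True
        else:
            streak = 0  # one failure invalidates earlier successes
        time.sleep(interval_s)
    return False
```

Under these assumptions, a caller would loop over the list returned by READY_PROBE(domain) and call wait_tcp_stable on each entry before moving on to the backup tier. A bare connect only shows that something is listening on the port; it does not perform the Mumble TLS handshake.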