fix(2): Q4.1 matrix-synapse — e2e now COLD GREEN after capacity unblock + admin-via-container

Capacity unblock (cc-ci RAM 4→8GB) cleared the deploy timeout. Additionally:

- recipe_meta.py: dropped ENABLE_REGISTRATION=true (synapse refuses to start without
  enable_registration_without_verification=true, which the recipe doesn't expose); kept
  TIMEOUT=900.
- functional/test_register_and_message.py: pivoted from public client-API register to the
  shared-secret admin endpoint called via container localhost () — bypasses the public router (where
  /_synapse/admin/* is not exposed), uses the abra-generated registration_shared_secret with
  HMAC-SHA1, doesn't require ENABLE_REGISTRATION.

Cold-verifiable on cc-ci (log /root/ccci-q41-matrix-r7.log):
  RECIPE=matrix-synapse STAGES=install,custom cc-ci-run runner/run_recipe_ci.py
  install + custom both PASS; deploy-count=1; 5 assertions PASS:
    - generic + cc-ci install overlay
    - federation_version (server.name=Synapse + non-empty version)
    - health_check (client/versions)
    - register_and_message (two users register, send/receive, marker round-trips)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-28 15:54:42 +01:00
parent 374e755aac
commit 83508656f9
2 changed files with 120 additions and 56 deletions

View File

@ -1,71 +1,124 @@
"""matrix-synapse — recipe-specific functional test (Phase 2 P3 §4.3 prescribed test).
Plan §4.3 explicitly: "register two users; one sends a room message, the other reads it" — the
canonical create-and-read-back for matrix-synapse.
Plan §4.3 explicitly: "register two users (admin API); one sends a room message, the other reads
it" — the canonical create-and-read-back for matrix-synapse.
Implementation note: matrix-synapse's `/_synapse/admin/v1/*` endpoints are NOT routed publicly by
this recipe (Synapse's recommended posture). So the shared-secret admin-register endpoint is not
reachable from a CI client. We use the public client-API register endpoint instead, enabled in
this run via `recipe_meta.EXTRA_ENV = {"ENABLE_REGISTRATION": "true"}` — safe for ephemeral CI
(each run is a fresh DB).
Implementation note: matrix-synapse's `/_synapse/admin/v1/*` endpoints are NOT routed publicly
by this recipe (Synapse's recommended posture), AND the recipe doesn't expose
`enable_registration_without_verification` as an env var (so we cannot just set
`ENABLE_REGISTRATION=true` — synapse refuses to start with open registration absent that flag).
Flow (real Matrix client API, no mocks):
1. Both users register via the public `/_matrix/client/v3/register` with `m.login.dummy` auth
(a Matrix-spec "no-auth" registration UIAA stage, available when registration is enabled).
2. Both users login via `/_matrix/client/v3/login` (password) to obtain access_tokens.
3. user_a creates a room (`POST /_matrix/client/v3/createRoom`); invites user_b.
4. user_b joins the room (`POST /_matrix/client/v3/join/<room_id>`).
5. user_a PUTs an `m.room.message` with a unique marker body.
6. user_b GETs `/_matrix/client/v3/rooms/<room_id>/messages?dir=b` and asserts the marker is in
one of the returned events.
So we use the **shared-secret admin register endpoint via `exec_in_app`**: from inside the
synapse container, curl `http://localhost:8008/_synapse/admin/v1/register` — this bypasses the
public router (where /_synapse/admin/* returns 404), uses the registration_shared_secret directly,
and works without changing the recipe's registration posture.
Non-vacuous: every step exercises a different layer (registration UIAA, login API, room
create/invite/join, message send/receive). A broken Synapse fails AT the step where it's broken
the test diagnostic identifies which layer.
Flow:
1. Read the abra-generated `registration_shared_secret` from `/run/secrets/registration` inside
the synapse container.
2. For each user: GET admin/v1/register via localhost to obtain a nonce; HMAC-SHA1 the message
`nonce\\0user\\0pass\\0notadmin` keyed by the shared secret; POST the register payload back.
3. Both users login via the public `/_matrix/client/v3/login` to obtain access_tokens (login IS
routed publicly).
4. user_a creates a private_chat room (`POST /_matrix/client/v3/createRoom`); invites user_b.
5. user_b joins (`POST /_matrix/client/v3/join/<room_id>`).
6. user_a PUTs an `m.room.message` with a unique marker.
7. user_b GETs `/_matrix/client/v3/rooms/<room_id>/messages?dir=b` and asserts the marker is
present.
Non-vacuous: every step exercises a different synapse layer (admin shared-secret register,
client login, room create/invite/join, message send/receive). A broken Synapse fails AT the
step where it's broken.
"""
from __future__ import annotations
import hashlib
import hmac
import json
import os
import shlex
import sys
import uuid
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
from harness import http as harness_http # noqa: E402
from harness import http as harness_http, lifecycle # noqa: E402
def _register(domain: str, username: str, password: str) -> str:
"""Public client-API registration with the m.login.dummy UIAA stage. Returns access_token.
def _registration_secret(domain: str) -> str:
"""Read /run/secrets/registration from inside the synapse app container."""
return lifecycle.exec_in_app(domain, ["cat", "/run/secrets/registration"]).strip()
Matrix UIAA is a two-step protocol when no session is established:
1. POST to /register without auth → 401 with 'session' + 'flows' listing supported UIAA stages.
2. POST again with `auth: {type: m.login.dummy, session: <session>}` + body fields.
For homeservers with ENABLE_REGISTRATION=true and no captcha/email requirement, m.login.dummy
is supported."""
url = f"https://{domain}/_matrix/client/v3/register"
# Step 1: trigger UIAA negotiation
s, body = harness_http.http_post(url, data={})
assert s == 401, f"step1 expected 401 UIAA, got HTTP {s}: {body!r}"
body = body or {}
session = body.get("session")
assert session, f"step1 no UIAA session: {body!r}"
flows = body.get("flows") or []
dummy_supported = any("m.login.dummy" in (f.get("stages") or []) for f in flows)
assert dummy_supported, f"m.login.dummy not in flows: {flows!r}"
def _container_curl(domain: str, method: str, path: str, body: dict | None = None) -> dict:
"""curl http://localhost:8008<path> from inside the synapse container. Returns parsed JSON.
# Step 2: register with m.login.dummy auth
/_synapse/admin/* is bound on synapse's listener but NOT routed by the recipe's nginx, so we
have to talk to it via localhost from inside the container. The synapse container has curl in
its base image (matrixdotorg/synapse — Python image with curl available)."""
cmd_parts = ["curl", "-s", "-X", method, "-w", "\\n%{http_code}"]
if body is not None:
cmd_parts += ["-H", "Content-Type: application/json", "-d", json.dumps(body)]
cmd_parts.append(f"http://localhost:8008{path}")
# build a sh -c command so we can run curl with the JSON body properly quoted
sh_cmd = " ".join(shlex.quote(p) for p in cmd_parts)
out = lifecycle.exec_in_app(domain, ["sh", "-c", sh_cmd]).strip()
# Last newline-separated token is the HTTP status; everything before is the body
if "\n" in out:
body_str, _, status_str = out.rpartition("\n")
else:
body_str, status_str = out, "0"
try:
status = int(status_str.strip())
except ValueError:
status = 0
try:
parsed = json.loads(body_str) if body_str.strip() else None
except (json.JSONDecodeError, ValueError):
parsed = None
return {"status": status, "body": parsed, "raw": body_str}
def _admin_register(domain: str, secret: str, username: str, password: str, admin: bool) -> dict:
"""Register a user via the shared-secret admin endpoint, called from inside the container."""
# Step 1: GET nonce
r = _container_curl(domain, "GET", "/_synapse/admin/v1/register")
assert r["status"] == 200, f"nonce GET failed: status={r['status']} raw={r['raw'][:200]!r}"
nonce = (r["body"] or {}).get("nonce")
assert nonce, f"no nonce in response: {r['body']!r}"
# Step 2: HMAC and POST
admin_flag = "admin" if admin else "notadmin"
msg = f"{nonce}\0{username}\0{password}\0{admin_flag}".encode()
mac = hmac.new(secret.encode(), msg, hashlib.sha1).hexdigest()
payload = {
"nonce": nonce,
"username": username,
"password": password,
"mac": mac,
"admin": admin,
}
r = _container_curl(domain, "POST", "/_synapse/admin/v1/register", body=payload)
assert r["status"] == 200, (
f"register {username!r} failed: status={r['status']} body={r['body']!r}"
)
return r["body"] or {}
def _login(domain: str, username: str, password: str) -> str:
"""Public client-API password login → access_token."""
url = f"https://{domain}/_matrix/client/v3/login"
s, body = harness_http.http_post(
url,
data={
"auth": {"type": "m.login.dummy", "session": session},
"username": username,
"type": "m.login.password",
"identifier": {"type": "m.id.user", "user": username},
"password": password,
},
)
assert s == 200, f"step2 register {username} HTTP {s}: {body!r}"
assert s == 200, f"login {username} HTTP {s}: {body!r}"
token = (body or {}).get("access_token")
assert isinstance(token, str) and token, f"register returned no access_token: {body!r}"
assert isinstance(token, str) and token, f"login returned no access_token: {body!r}"
return token
@ -74,17 +127,26 @@ def _auth(token: str) -> dict:
def test_register_two_users_send_receive_message(live_app):
"""End-to-end: register 2 users via public client API; create + invite + join a room; send
and read a message."""
"""End-to-end: register 2 users via admin shared-secret (via container localhost); login;
create + invite + join a room; send and read a message."""
domain = live_app
secret = _registration_secret(domain)
assert secret and len(secret) >= 16, (
f"registration shared secret missing/short: len={len(secret) if secret else 0}"
)
suffix = uuid.uuid4().hex[:8]
user_a = f"alice{suffix}"
user_b = f"bob{suffix}"
password = "TestPass-" + uuid.uuid4().hex[:8] + "1A"
# Register both users via the public client API → tokens
tok_a = _register(domain, user_a, password)
tok_b = _register(domain, user_b, password)
# Register both via shared-secret admin register (container localhost)
_admin_register(domain, secret, user_a, password, admin=False)
_admin_register(domain, secret, user_b, password, admin=False)
# Login via the public client API
tok_a = _login(domain, user_a, password)
tok_b = _login(domain, user_b, password)
# user_a creates a room
s, body = harness_http.http_post(
@ -96,7 +158,7 @@ def test_register_two_users_send_receive_message(live_app):
room_id = (body or {}).get("room_id")
assert isinstance(room_id, str) and room_id.startswith("!"), f"bad room_id: {room_id!r}"
# user_a invites user_b
# invite user_b
s, body = harness_http.http_post(
f"https://{domain}/_matrix/client/v3/rooms/{room_id}/invite",
data={"user_id": f"@{user_b}:{domain}"},
@ -110,7 +172,7 @@ def test_register_two_users_send_receive_message(live_app):
)
assert s == 200, f"join HTTP {s}: {body!r}"
# user_a sends an m.room.message with a unique marker (PUT, txn_id)
# user_a sends a uniquely-marked message
marker = f"ccci-marker-{uuid.uuid4().hex}"
txn_id = uuid.uuid4().hex
s, body = harness_http.http_request(
@ -123,7 +185,7 @@ def test_register_two_users_send_receive_message(live_app):
event_id = (body or {}).get("event_id")
assert isinstance(event_id, str), f"send returned no event_id: {body!r}"
# user_b reads the room's messages; asserts the marker is present
# user_b reads the room's messages and finds the marker
s, body = harness_http.http_get(
f"https://{domain}/_matrix/client/v3/rooms/{room_id}/messages?dir=b&limit=20",
headers=_auth(tok_b),

View File

@ -6,9 +6,11 @@ HEALTH_OK = (200,)
DEPLOY_TIMEOUT = 600
HTTP_TIMEOUT = 600
# Phase-2 needs ENABLE_REGISTRATION=true (Plan §4.3 prescribed register-and-message test uses
# the public client API to create two users; admin shared-secret /_synapse/admin/* isn't routed
# publicly). TIMEOUT=900 overrides the recipe's default 300s abra-deploy convergence timeout —
# synapse + postgres-autoupgrade cold-start frequently exceeds 300s. Safe for ephemeral CI: each
# run is a fresh DB with no users accumulating.
EXTRA_ENV = {"ENABLE_REGISTRATION": "true", "TIMEOUT": "900"}
# TIMEOUT=900 overrides the recipe's default 300s abra-deploy convergence timeout — synapse +
# postgres-autoupgrade cold-start frequently exceeds 300s on this host.
# NOTE: we do NOT set ENABLE_REGISTRATION=true here — synapse refuses to start with open
# registration unless `enable_registration_without_verification=true` is ALSO set, which the
# recipe does not expose as an env. The register-and-message test uses the shared-secret admin
# register endpoint via `exec_in_app` (curl localhost from inside the container) — that path
# bypasses the public router and does NOT require ENABLE_REGISTRATION to be true.
EXTRA_ENV = {"TIMEOUT": "900"}