review(2): Q4.1 matrix-synapse PASS — COLD first-hand full lifecycle GREEN (my clone, log adv-matrix-cold); 5 tiers + 3 custom, deploy-count=1, real upgrade crossover 7.1.0→7.1.1, P4 restore ci_marker survives; §4.3 register retry verified NON-VACUOUS + reproduced the real post-restore transient (500 attempt1/2 → succeeded attempt3, full register→room→send→readback chain intact, 4xx fail-fast, timeout RAISEs); clean teardown; no veto

This commit is contained in:
2026-05-30 01:07:53 +01:00
parent b73018c9ab
commit c503f7d51c

View File

@ -1542,3 +1542,48 @@ clean teardown. **The previously-OPEN Q3.5 P4-restore RED is CLOSED.** No `## VE
**Isolation note:** verdict formed from the plan + code (ops/test_backup/test_restore + the 2 functional
tests + recipe-PR `pg_backup.sh`/`compose.yml`) + the STATUS claim verification info + my own cold
full-lifecycle re-run + direct recipe-checkout inspection. JOURNAL-2 not consulted before this verdict.
---
## Q4.1 matrix-synapse — PASS @2026-05-30T~01:07Z (COLD, first-hand, my clone /root/adv-verify @origin/main b73018c)
Re-ran the FULL harness myself cold: `RECIPE=matrix-synapse PR=0 cc-ci-run runner/run_recipe_ci.py`.
Log `/root/adv-matrix-cold.log`. **All 5 tiers + 3 custom GREEN; deploy-count=1; clean teardown.** The
contested fix (a bounded readiness-retry on the §4.3 register test) is honest and non-vacuous, and I
**independently reproduced the exact transient** it handles.
- **RUN SUMMARY:** `deploy-count = 1 (expect 1)`; install/upgrade/backup/restore/custom **all pass**.
- **Upgrade — real crossover (HC1):** `upgrade→PR-head: head_ref=5b21a6b4 chaos-version=5b21a6b4
version=7.1.0+v1.149.1→7.1.1+v1.149.1` (head_ref==chaos-version; genuine prev→latest recipe-version
crossover, chaos redeploy on the PR-head checkout).
- **§4.3 register test — the crux — PASSED, and I OBSERVED the real transient.** The custom-tier log
shows: `[register] alice…: POST transient 500 (attempt 1, synapse recovering) — retrying` → `(attempt
2) — retrying` → `succeeded on attempt 3 (synapse recovered)`, then PASSED (39s). This independently
confirms the Builder's root cause: the restore tier's `DROP DATABASE … WITH (FORCE)` (pg_backup.sh)
force-closes synapse's postgres pool, so a registration (a DB *write*) 500s during the pool-recovery
window while HTTP health (a read) is already green. The retry is **NOT a weakening** (I audited
`_admin_register`): 90s bounded deadline; retries only on 5xx/transport-error re-fetching a fresh
nonce; **4xx → immediate `raise` (fail-fast, real rejections not retried)**; timeout → `raise
AssertionError` (fails loud, never silent-skips). The full assertion chain is intact and ran to
completion: register 2 users (shared-secret admin via container localhost) → public login → createRoom
→ invite → join → send `m.room.message` w/ unique marker → user_b read-back asserts the marker. Each
step exercises a distinct synapse layer; a broken synapse fails at that step.
- **P4 (data-integrity) — restore PASSED.** `test_restore_returns_state PASSED` + `test_backup_captures_
state PASSED` — the postgres `ci_marker` survives the recipe's real `pg_backup.sh` DB-dump
backup→wipe→restore. Non-vacuous (`ops.pre_restore` DROPs the table and asserts the drop took).
- **P2 parity:** `health_check.py`→`test_synapse_client_versions_returns_json` (PASSED). Heavy
operational parity ports (compress_state/complexity/purge) deferred to `--extra` (DEFERRED.md,
operator-confirmed). Accepted.
- **P3 — 2 separate non-vacuous functional tests (both PASSED):** `test_register_two_users_send_receive_
message` (§4.3 above) + `test_federation_version_endpoint` (`/_matrix/federation/v1/version` — the
distinctive federation surface). Distinct code paths.
- **P5/P6 — N/A justified.** Self-contained (postgres in-recipe, no external dep); core function is the
client/federation API, fully exercised; no browser-only UX owed.
- **Teardown:** post-run no `matr-*` stack, volumes, or secrets. Clean.
**Verdict: Q4.1 matrix-synapse PASS.** Full lifecycle GREEN cold, deploy-count=1, real upgrade
crossover, P4 ci_marker survives the real DB backup→restore, both P3 tests non-vacuous, the §4.3
register flow genuinely completes (I reproduced the post-restore transient → bounded-retry → success;
the fix is honest, fail-loud, and does not mask a persistent failure), clean teardown. No `## VETO`.
**Isolation note:** verdict from the plan + code (`_admin_register` retry logic + the full §4.3 flow +
ops/test_backup/test_restore) + STATUS claim verification info + my own cold full-lifecycle re-run
(which reproduced the transient first-hand). JOURNAL-2 not consulted before this verdict.