From 9a8850affac832b9dfe00a3ffdf04e6529a0d0ed Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Sat, 30 May 2026 01:00:38 +0100
Subject: [PATCH] =?UTF-8?q?claim(Q4.1):=20matrix-synapse=20full=20lifecycl?=
 =?UTF-8?q?e=20GREEN=20=E2=80=94=20=C2=A74.3=20register=20transient=20post?=
 =?UTF-8?q?-restore=20500=20root-caused=20(synapse=20DB=20pool=20closed=20?=
 =?UTF-8?q?by=20restore=20DROP=20DATABASE=20FORCE)=20+=20fixed=20with=20bo?=
 =?UTF-8?q?unded=20readiness-retry=20(not=20weakened);=205=20tiers=20+=203?=
 =?UTF-8?q?=20functional=20pass,=20P4=20ci=5Fmarker=20survives,=20deploy-c?=
 =?UTF-8?q?ount=3D1,=20clean=20teardown?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 machine-docs/STATUS-2.md | 75 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 66 insertions(+), 9 deletions(-)

diff --git a/machine-docs/STATUS-2.md b/machine-docs/STATUS-2.md
index 2144f60..7825ab1 100644
--- a/machine-docs/STATUS-2.md
+++ b/machine-docs/STATUS-2.md
@@ -54,14 +54,12 @@ deploy-count=1, real upgrade crossover, P4 restore `test_restore_returns_state`
 recipe-PR `recipe-maintainers/immich#1` pg_dump backup is a real fix, published-recipe bug statically
 confirmed), 2 distinct P3 functional tests, clean teardown. **P4-restore RED CLOSED, no veto.** DONE.
 
-**Q4.1 matrix-synapse — install/upgrade/backup/restore GREEN; custom §4.3 register test FAILED,
-DIAGNOSING @2026-05-30.** Full run `/root/ccci-matrix-full.log`: install+upgrade+backup+restore pass
-(deploy-count=1, P4 ci_marker survives), 2/3 functional pass (federation-version + client-versions-JSON);
-but `test_register_two_users_send_receive_message` FAILED — synapse returns **HTTP 500 M_UNKNOWN
-Internal server error** on the shared-secret admin register POST (nonce GET succeeds, so the endpoint
-is enabled). Synapse v1.149.1 (latest published). Reproducing with synapse-log capture
-(`/root/matrix-synapse-debug.log` + `/root/ccci-matrix-debug.log`) to get the traceback. NOT a test to
-weaken — diagnosing the 500 (recipe config vs synapse regression vs test). NOT claimed.
+**Q4.1 matrix-synapse — ✅ FULL LIFECYCLE GREEN @2026-05-30 — CLAIMED (see ## Gate Q4.1), awaiting
+Adversary.** The §4.3 register test's transient post-restore 500 was root-caused + fixed with a
+bounded readiness-retry (NOT weakened). Full run `/root/ccci-matrix-full2.log`: all 5 tiers pass,
+deploy-count=1, clean teardown; the retry log proves the transient (POST 500 attempt 1 → succeeded
+attempt 2) and the synapse capture log shows the cause (restore-tier `DROP DATABASE FORCE` closed
+synapse's DB pool: `psycopg2.InterfaceError: connection already closed`).
 
 **Q4.6 discourse — BLOCKED/DEFERRED @2026-05-29.** Upstream recipe pins `bitnami/discourse:*` images
 that Docker Hub no longer serves (manifest unknown; swarm task Rejected "No such image"). Image exists
@@ -203,7 +201,66 @@ SKIP no longer yields a GREEN `!testme`.
 
 ## Gate
 
-**Gate: Q3.5 immich — CLAIMED @2026-05-30, awaiting Adversary.**
+**Gate: Q4.1 matrix-synapse — CLAIMED @2026-05-30, awaiting Adversary.**
+
+**WHAT.** matrix-synapse (DB + media-store category; synapse `app` + postgres `db` + nginx `web`)
+runs its **full lifecycle GREEN** — install + upgrade (real prev→latest crossover) + backup + restore
++ custom. One real defect was found + fixed honestly (not weakened): the §4.3 register test hit a
+**transient post-restore HTTP 500** because the restore tier's `DROP DATABASE … WITH (FORCE)`
+(pg_backup.sh restore) force-closes synapse's postgres connection pool, and a registration (a DB
+*write*) issued during synapse's pool-recovery window 500s even while HTTP health (a read) is green.
+Fixed by a **bounded readiness-retry** in `_admin_register` (re-fetch nonce + re-POST on 5xx/transport
+error, ≤90s, then RAISE; 4xx = fail-fast). The assertion is unchanged (two users must register +
+send/receive a message). Root cause is proven: the synapse capture log shows
+`server closed the connection unexpectedly` / `psycopg2.InterfaceError: connection already closed` at
+the restore moment, and the retry diagnostic shows `POST transient 500 (attempt 1) → succeeded
+(attempt 2)`.
+- **P2 parity:** `health_check.py` → `functional/test_health_check.py::test_synapse_client_versions_returns_json`.
+  (Heavy operational parity ports — compress_state/complexity/purge — deferred to the `--extra` flag,
+  DEFERRED.md, operator-confirmed.)
+- **P3 (≥2 separate non-vacuous functional tests):** `test_register_and_message.py` (§4.3: register
+  two users via shared-secret admin → login → create room → invite → join → send → read-back a unique
+  marker) + `test_federation_version.py` (`/_matrix/federation/v1/version` reachable — the distinctive
+  federation surface).
+- **P4 (data-integrity):** `ops.py` seeds a postgres `ci_marker` in synapse's DB; survives upgrade
+  (chaos crossover) + backup→wipe→restore via the recipe's real `pg_backup.sh` DB-dump hook
+  (`test_backup_captures_state` + `test_restore_returns_state` PASS; non-vacuous — pre_restore DROPs
+  the table and asserts the drop took).
+- **P5/P6 N/A:** self-contained (postgres is in-recipe, no external dep); core function is the
+  client/federation API, fully exercised — no browser-only UX owed.
+
+**HOW (Adversary, cold, on cc-ci):**
+```
+ssh cc-ci 'cd /root/ && git pull && RECIPE=matrix-synapse PR=0 \
+  cc-ci-run runner/run_recipe_ci.py'
+```
+
+**EXPECTED:**
+- RUN SUMMARY: `deploy-count = 1 (expect 1)`; `install/upgrade/backup/restore/custom` **all pass**.
+- Upgrade: `upgrade→PR-head: head_ref=5b21a6b4 chaos-version=5b21a6b4 version=7.1.0+v1.149.1→
+  7.1.1+v1.149.1` (HC1, head_ref==chaos-version, real prev→latest crossover).
+- Restore: `tests/matrix-synapse/test_restore.py::test_restore_returns_state PASSED` (postgres
+  ci_marker survives the recipe's pg_backup.sh backup→restore).
+- Custom — **3 PASS**: `test_federation_version_endpoint`,
+  `test_synapse_client_versions_returns_json`, `test_register_two_users_send_receive_message`. The
+  register test MAY log `[register] …: POST transient 500 (attempt N, synapse recovering) — retrying`
+  then `succeeded` — that is the EXPECTED post-restore recovery, not a failure (it still PASSES; a
+  *persistent* 500 would RAISE after 90s).
+- Clean teardown: post-run no `matr-*` stack/volumes/secrets.
+
+**WHERE.** Fix commit `db124d5` (`tests/matrix-synapse/functional/test_register_and_message.py`
+readiness-retry). matrix tests: `tests/matrix-synapse/{recipe_meta.py,PARITY.md,ops.py,test_install.py,
+test_upgrade.py,test_backup.py,test_restore.py,functional/{test_health_check.py,test_federation_version.py,
+test_register_and_message.py}}`. Authoritative log `/root/ccci-matrix-full2.log` (5 tiers green,
+deploy-count=1, retry diagnostic, clean teardown). Root-cause evidence: synapse capture
+`/root/matrix-synapse-debug.log` (psycopg2 connection-closed at restore).
+
+---
+
+**Gate: Q3.5 immich — ✅ Adversary PASS @2026-05-30 (REVIEW-2 `11c5498`).** Cold full lifecycle GREEN,
+deploy-count=1, P4 restore non-vacuous PASS (recipe-PR pg_dump fix real; published-recipe bug
+statically confirmed), 2 distinct P3 tests, clean teardown. P4-restore RED CLOSED, no veto. (Claim
+detail retained below.)
 
 **WHAT.** immich (D10 object-storage / large-volume photo+video manager; self-contained: app +
 machine-learning + redis + postgres) runs its **full lifecycle GREEN** — install + upgrade (real