claim(Q4.1): matrix-synapse full lifecycle GREEN — §4.3 register transient post-restore 500 root-caused (synapse DB pool closed by restore DROP DATABASE FORCE) + fixed with bounded readiness-retry (not weakened); 5 tiers + 3 functional pass, P4 ci_marker survives, deploy-count=1, clean teardown

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 01:00:38 +01:00
parent db124d5107
commit 9a8850affa


@ -54,14 +54,12 @@ deploy-count=1, real upgrade crossover, P4 restore `test_restore_returns_state`
recipe-PR `recipe-maintainers/immich#1` pg_dump backup is a real fix, published-recipe bug statically
confirmed), 2 distinct P3 functional tests, clean teardown. **P4-restore RED CLOSED, no veto.** DONE.
**Q4.1 matrix-synapse — install/upgrade/backup/restore GREEN; custom §4.3 register test FAILED,
DIAGNOSING @2026-05-30.** Full run `/root/ccci-matrix-full.log`: install+upgrade+backup+restore pass
(deploy-count=1, P4 ci_marker survives), 2/3 functional pass (federation-version + client-versions-JSON);
but `test_register_two_users_send_receive_message` FAILED — synapse returns **HTTP 500 M_UNKNOWN
Internal server error** on the shared-secret admin register POST (nonce GET succeeds, so the endpoint
is enabled). Synapse v1.149.1 (latest published). Reproducing with synapse-log capture
(`/root/matrix-synapse-debug.log` + `/root/ccci-matrix-debug.log`) to get the traceback. NOT a test to
weaken — diagnosing the 500 (recipe config vs synapse regression vs test). NOT claimed.
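
The shared-secret admin register flow mentioned here is a nonce GET followed by a POST to Synapse's `/_synapse/admin/v1/register` that carries an HMAC-SHA1 `mac`. Below is a minimal sketch of how that `mac` is built. It assumes the NUL-separated `nonce` / `username` / `password` / `admin|notadmin` layout described in Synapse's admin API documentation, and it omits the optional `user_type` field. `registration_mac` and `register_body` are hypothetical helpers, not the test's actual `_admin_register` code.

```python
import hashlib
import hmac


def registration_mac(shared_secret: str, nonce: str, username: str,
                     password: str, admin: bool = False) -> str:
    """Hex HMAC-SHA1 over the NUL-separated fields (hypothetical helper)."""
    mac = hmac.new(shared_secret.encode("utf-8"), digestmod=hashlib.sha1)
    # Field order and NUL separators follow the documented shared-secret flow.
    for part in (nonce, username, password):
        mac.update(part.encode("utf-8"))
        mac.update(b"\x00")
    mac.update(b"admin" if admin else b"notadmin")
    return mac.hexdigest()


def register_body(shared_secret: str, nonce: str, username: str,
                  password: str, admin: bool = False) -> dict:
    """JSON body for the register POST; `nonce` comes from the preceding GET."""
    return {
        "nonce": nonce,
        "username": username,
        "password": password,
        "admin": admin,
        "mac": registration_mac(shared_secret, nonce, username, password, admin),
    }
```

Because the `mac` is bound to the nonce, every POST attempt needs a nonce from its own GET. This sketch assumes the same holds for any retry of the POST.
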
**Q4.1 matrix-synapse — ✅ FULL LIFECYCLE GREEN @2026-05-30 — CLAIMED (see ## Gate Q4.1), awaiting
Adversary.** The §4.3 register test's transient post-restore 500 was root-caused + fixed with a
bounded readiness-retry (NOT weakened). Full run `/root/ccci-matrix-full2.log`: all 5 tiers pass,
deploy-count=1, clean teardown; the retry log proves the transient (POST 500 attempt 1 → succeeded
attempt 2) and the synapse capture log shows the cause (restore-tier `DROP DATABASE FORCE` closed
synapse's DB pool: `psycopg2.InterfaceError: connection already closed`).
**Q4.6 discourse — BLOCKED/DEFERRED @2026-05-29.** Upstream recipe pins `bitnami/discourse:*` images
that Docker Hub no longer serves (manifest unknown; swarm task Rejected "No such image"). Image exists
@ -203,7 +201,66 @@ SKIP no longer yields a GREEN `!testme`.
## Gate
**Gate: Q3.5 immich — CLAIMED @2026-05-30, awaiting Adversary.**
**Gate: Q4.1 matrix-synapse — CLAIMED @2026-05-30, awaiting Adversary.**
**WHAT.** matrix-synapse (DB + media-store category; synapse `app` + postgres `db` + nginx `web`)
runs its **full lifecycle GREEN** — install + upgrade (real prev→latest crossover) + backup + restore
+ custom. One real defect was found + fixed honestly (not weakened): the §4.3 register test hit a
**transient post-restore HTTP 500** because the restore tier's `DROP DATABASE … WITH (FORCE)`
(pg_backup.sh restore) force-closes synapse's postgres connection pool, and a registration (a DB
*write*) issued during synapse's pool-recovery window 500s even while HTTP health (a read) is green.
Fixed by a **bounded readiness-retry** in `_admin_register` (re-fetch nonce + re-POST on 5xx/transport
error, ≤90s, then RAISE; 4xx = fail-fast). The assertion is unchanged (two users must register +
send/receive a message). Root cause is proven: the synapse capture log shows
`server closed the connection unexpectedly` / `psycopg2.InterfaceError: connection already closed` at
the restore moment, and the retry diagnostic shows `POST transient 500 (attempt 1) → succeeded
(attempt 2)`.
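
The bounded readiness-retry described above can be sketched as follows. This is a hypothetical stand-in for `_admin_register`, not its actual code. `fetch_nonce` and `post` are assumed callables, with `post` returning the HTTP status instead of raising on non-2xx, and the 2 s delay between attempts is an assumption. Only the 90 s budget, the nonce re-fetch, the retry on 5xx or transport errors, and the 4xx fail-fast come from the text. Injecting `clock` and `sleep` makes the budget testable without waiting 90 s.

```python
import time
from typing import Callable


class RegistrationError(RuntimeError):
    """Raised on a 4xx (fail-fast) or once the retry budget is exhausted."""


def register_with_retry(
    fetch_nonce: Callable[[], str],
    post: Callable[[str], int],
    deadline_s: float = 90.0,
    delay_s: float = 2.0,  # assumed back-off between attempts
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> int:
    """Return the attempt number that succeeded; raise RegistrationError otherwise."""
    start = clock()
    attempt = 0
    while True:
        attempt += 1
        try:
            nonce = fetch_nonce()  # fresh nonce for every attempt
            status = post(nonce)   # HTTP status of the register POST
        except OSError as exc:     # transport error (ConnectionError etc. subclass OSError)
            reason = f"transport error {exc!r}"
        else:
            if 200 <= status < 300:
                log(f"[register] succeeded (attempt {attempt})")
                return attempt
            if 400 <= status < 500:  # request/config bug: never retried
                raise RegistrationError(f"POST {status} (attempt {attempt}), fail-fast")
            if status < 500:
                raise RegistrationError(f"unexpected status {status}")
            reason = f"POST transient {status}"
        # Check the budget before sleeping so a persistent 5xx raises by the deadline.
        if clock() - start + delay_s > deadline_s:
            raise RegistrationError(f"{reason} persisted past {deadline_s}s budget")
        log(f"[register] {reason} (attempt {attempt}, synapse recovering), retrying")
        sleep(delay_s)
```

Retrying only 5xx and transport errors tolerates the post-restore pool-recovery window. A genuine request bug (4xx) still fails immediately instead of being masked as slowness.
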
- **P2 parity:** `health_check.py` → `functional/test_health_check.py::test_synapse_client_versions_returns_json`.
(Heavy operational parity ports — compress_state/complexity/purge — deferred to the `--extra` flag,
DEFERRED.md, operator-confirmed.)
- **P3 (≥2 separate non-vacuous functional tests):** `test_register_and_message.py` (§4.3: register
two users via shared-secret admin → login → create room → invite → join → send → read-back a unique
marker) + `test_federation_version.py` (`/_matrix/federation/v1/version` reachable — the distinctive
federation surface).
- **P4 (data-integrity):** `ops.py` seeds a postgres `ci_marker` in synapse's DB; survives upgrade
(chaos crossover) + backup→wipe→restore via the recipe's real `pg_backup.sh` DB-dump hook
(`test_backup_captures_state` + `test_restore_returns_state` PASS; non-vacuous — pre_restore DROPs
the table and asserts the drop took).
- **P5/P6 N/A:** self-contained (postgres is in-recipe, no external dep); core function is the
client/federation API, fully exercised — no browser-only UX owed.
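
The P4 non-vacuity argument can be illustrated with a self-contained sketch. It uses an in-memory SQLite database and `sqlite3.Connection.backup` as stand-ins for synapse's postgres DB and the recipe's `pg_backup.sh` dump/restore hooks. `seed_marker`, `has_marker` and `backup_wipe_restore` are hypothetical names. What carries over is the guard between wipe and restore. Without the assertion that the drop took, the restore check would also pass if the wipe had silently done nothing.

```python
import sqlite3


def seed_marker(conn: sqlite3.Connection, marker: str) -> None:
    """Seed a ci_marker row (stand-in for ops.py seeding synapse's DB)."""
    conn.execute("CREATE TABLE IF NOT EXISTS ci_marker (value TEXT NOT NULL)")
    conn.execute("INSERT INTO ci_marker (value) VALUES (?)", (marker,))
    conn.commit()


def has_marker(conn: sqlite3.Connection, marker: str) -> bool:
    """True iff the ci_marker table exists and holds `marker`."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'ci_marker'"
    ).fetchall()
    if not tables:
        return False
    rows = conn.execute(
        "SELECT value FROM ci_marker WHERE value = ?", (marker,)
    ).fetchall()
    return len(rows) > 0


def backup_wipe_restore(live: sqlite3.Connection, marker: str) -> bool:
    """Back up, wipe, prove the wipe took, restore; return whether the marker survived."""
    dump = sqlite3.connect(":memory:")
    live.backup(dump)                     # stand-in for the backup (DB-dump) hook
    live.execute("DROP TABLE ci_marker")  # pre_restore wipe
    live.commit()
    # Non-vacuous guard: a marker "restored" after a no-op wipe would prove nothing.
    assert not has_marker(live, marker), "pre_restore DROP did not take"
    dump.backup(live)                     # stand-in for the restore hook
    dump.close()
    return has_marker(live, marker)
```
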
**HOW (Adversary, cold, on cc-ci):**
```
ssh cc-ci 'cd /root/<your-clone> && git pull && RECIPE=matrix-synapse PR=0 \
cc-ci-run runner/run_recipe_ci.py'
```
**EXPECTED:**
- RUN SUMMARY: `deploy-count = 1 (expect 1)`; `install/upgrade/backup/restore/custom` **all pass**.
- Upgrade: `upgrade→PR-head: head_ref=5b21a6b4 chaos-version=5b21a6b4 version=7.1.0+v1.149.1→
7.1.1+v1.149.1` (HC1, head_ref==chaos-version, real prev→latest crossover).
- Restore: `tests/matrix-synapse/test_restore.py::test_restore_returns_state PASSED` (postgres
ci_marker survives the recipe's pg_backup.sh backup→restore).
- Custom — **3 PASS**: `test_federation_version_endpoint`,
`test_synapse_client_versions_returns_json`, `test_register_two_users_send_receive_message`. The
register test MAY log `[register] …: POST transient 500 (attempt N, synapse recovering) — retrying`
then `succeeded` — that is the EXPECTED post-restore recovery, not a failure (it still PASSES; a
*persistent* 500 would RAISE after 90s).
- Clean teardown: post-run no `matr-*` stack/volumes/secrets.
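
The two summary lines under EXPECTED can be checked mechanically. The following is a hypothetical checker, not the runner's own logic. It assumes each line appears on one line in the run log, in the format quoted above, which may not match the runner's real output exactly. It flags a `deploy-count` that differs from its expectation, a `head_ref` that differs from `chaos-version` (HC1), and an upgrade whose previous and new versions are identical (no real crossover).

```python
import re
from typing import List

DEPLOY_RE = re.compile(r"deploy-count = (\d+) \(expect (\d+)\)")
UPGRADE_RE = re.compile(
    r"head_ref=(?P<head>\w+) chaos-version=(?P<chaos>\w+) "
    r"version=(?P<prev>\S+?)→\s*(?P<new>\S+)"
)


def check_run_summary(log_text: str) -> List[str]:
    """Return a list of problems; an empty list means both checks held."""
    problems = []
    m = DEPLOY_RE.search(log_text)
    if m is None:
        problems.append("no deploy-count line")
    elif m.group(1) != m.group(2):
        problems.append(f"deploy-count {m.group(1)} != expected {m.group(2)}")
    u = UPGRADE_RE.search(log_text)
    if u is None:
        problems.append("no upgrade line")
    else:
        if u["head"] != u["chaos"]:
            problems.append("HC1 violated: head_ref != chaos-version")
        if u["prev"] == u["new"]:
            problems.append("no real crossover: prev version == new version")
    return problems
```

For a run matching EXPECTED, passing the log's text to `check_run_summary` should return an empty list.
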
**WHERE.** Fix commit `db124d5` (`tests/matrix-synapse/functional/test_register_and_message.py`
readiness-retry). Matrix tests: `tests/matrix-synapse/{recipe_meta.py,PARITY.md,ops.py,test_install.py,test_upgrade.py,test_backup.py,test_restore.py,functional/{test_health_check.py,test_federation_version.py,test_register_and_message.py}}`. Authoritative log `/root/ccci-matrix-full2.log` (5 tiers green,
deploy-count=1, retry diagnostic, clean teardown). Root-cause evidence: synapse capture
`/root/matrix-synapse-debug.log` (psycopg2 connection-closed at restore).
---
**Gate: Q3.5 immich — ✅ Adversary PASS @2026-05-30 (REVIEW-2 `11c5498`).** Cold full lifecycle GREEN,
deploy-count=1, P4 restore non-vacuous PASS (recipe-PR pg_dump fix real; published-recipe bug
statically confirmed), 2 distinct P3 tests, clean teardown. P4-restore RED CLOSED, no veto. (Claim
detail retained below.)
**WHAT.** immich (D10 object-storage / large-volume photo+video manager; self-contained: app +
machine-learning + redis + postgres) runs its **full lifecycle GREEN** — install + upgrade (real