From bcc32d997b407267224e130cba13b6184fde970a Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Sun, 31 May 2026 00:30:15 +0000
Subject: [PATCH] =?UTF-8?q?status(2):=20discourse=20=E2=80=94=202=20bugs?=
 =?UTF-8?q?=20root-caused=20(post-upgrade=20backup=20race=20+=20mint=5Fadm?=
 =?UTF-8?q?in=20ruby=20PATH),=20fixes=20in=20full4=20validation?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 machine-docs/STATUS-2.md | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/machine-docs/STATUS-2.md b/machine-docs/STATUS-2.md
index d122830..26378d1 100644
--- a/machine-docs/STATUS-2.md
+++ b/machine-docs/STATUS-2.md
@@ -94,17 +94,25 @@ Standing VETO on DONE (REVIEW-2 @16:22:07Z) requires: ghost + discourse + mumble
 **upgrade-to-latest** green with justified `compose.ccci.yml` overlays. Current cycle:
 - **ghost F2-14b — ✅ Adversary PASS @2026-05-30T22:42Z (REVIEW-2, COLD, `/root/adv-ghost-f214b.log`).**
   Closes the GHOST portion of the DONE VETO checklist. DONE.
-- **discourse Q4.6 — restore-hook fix, RE-RUNNING.** full1 (`/root/ccci-discourse-full1.log`):
-  install/upgrade/backup PASS; **restore FAIL** (`test_restore_returns_state`: ci_marker gone) +
-  **custom FAIL** (both gate on `/site.json` 200, which never converged). ROOT CAUSE (single):
-  the pg_backup.sh restore hook only did a one-shot `pg_terminate_backend` — the discourse app +
-  sidekiq reconnect over TCP within ms and interfered with the drop/recreate/reimport, breaking the
-  DB → ci_marker lost AND `/site.json` 500 in the post-restore custom tier. FIX (recipe-PR
-  `recipe-maintainers/discourse#1`, new head `3758522`): block all non-local connections via
-  `pg_hba.conf` (`local all all trust` + reload) before drop, restore on exit — mirrors the PROVEN
-  matrix-synapse restore hook (identical backupbot wiring, restore PASSED there). Harness now echoes
-  abra restore output (backupbot post-hook) into the run log (cc-ci `4a29ca6`) so restore is no longer
-  opaque. Run shape full `install,upgrade,backup,restore,custom`. PR head `3758522` (was `7a2e0e0`).
+- **discourse Q4.6 — TWO bugs root-caused + fixed, VALIDATING (full4).** Investigation across full1-3:
+  - **(A) backup race — backup.sql not captured after the upgrade tier.** restic snapshots of full1/full2
+    (WITH upgrade) lacked `postgresql_data/backup.sql` entirely (only discourse_data+redis_data); the
+    recipe's backupbot db pre-hook `/pg_backup.sh backup` didn't produce the dump at backup time, so
+    restore reimported nothing → ci_marker lost AND `/site.json` 500 in the post-restore custom tier.
+    Proven NOT a script bug: manual `bash -c 'set -o pipefail;/pg_backup.sh backup'` on the live db
+    yields a valid 922KB dump (exit 0); matrix-synapse uses the identical pattern and its snapshots DO
+    contain `postgres/_data/backup.sql`. full3 (WITHOUT upgrade) ran the pre-hook fine + restore PASSED.
+    Conclusion: the immediately-preceding UPGRADE chaos-redeploy cycles the db; pg_dump races that cycle
+    → dump truncated/absent (same race ghost F2-14b hit). FIX: `BACKUP_VERIFY` probe in
+    `tests/discourse/recipe_meta.py` (gzip-valid + non-empty backup.sql; False → harness re-runs the
+    whole backup, caps 3 then proceeds → non-masking; restore stays the real gate). Also kept the pg_hba
+    connection-block restore hook (recipe-PR head `3758522`) — correct hardening regardless.
+  - **(B) create_topic — `mint_admin` ruby not on PATH.** `bin/rails runner` (`#!/usr/bin/env ruby`) under
+    `bash -lc` (login shell resets PATH) → `env: 'ruby': No such file or directory` (rc=127). FIX: `bash -c`
+    (inherit image ENV) + discover ruby (`command -v ruby || /opt/bitnami/ruby/bin/ruby`) + invoke explicitly.
+  - Harness now echoes abra backup+restore output into the run log (cc-ci `4a29ca6`,`2f6a684`) — backup/
+    restore no longer opaque. cc-ci fixes `8d689d6`. Validation run `/root/ccci-discourse-full4.log`
+    (full `install,upgrade,backup,restore,custom`, PR head `3758522`).
 - mumble F2-14c + plausible Q4.7b still open.
 
 ## Gate F2-14b — CLAIMED @2026-05-30T22:10Z (ghost upgrade-to-latest + reliable P4 backup-integrity)
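
The `BACKUP_VERIFY` probe described in (A) can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in `tests/discourse/recipe_meta.py`: the names `backup_sql_is_valid` and `backup_with_verify`, the callable-based harness hook, and the reading of "caps 3" as at most three backup runs are all hypothetical. It shows the two checks named in the status entry (gzip-valid, non-empty) and the capped re-run that proceeds without failing, so restore stays the real gate.

```python
import gzip
import zlib
from pathlib import Path
from typing import Callable, Tuple

# Assumption: "caps 3" means at most three backup runs in total.
MAX_BACKUP_ATTEMPTS = 3


def backup_sql_is_valid(dump_path: Path) -> bool:
    """Return True if dump_path is a complete gzip stream with a non-empty payload."""
    try:
        with gzip.open(dump_path, "rb") as fh:
            saw_data = False
            # Read to the end so truncation and CRC errors surface.
            while fh.read(1 << 20):
                saw_data = True
            return saw_data
    except (OSError, EOFError, zlib.error):
        # Missing file, bad magic, truncated stream, or corrupt deflate data.
        return False


def backup_with_verify(
    run_backup: Callable[[], None],
    verify: Callable[[], bool],
    max_attempts: int = MAX_BACKUP_ATTEMPTS,
) -> Tuple[int, bool]:
    """Re-run the whole backup until verify() passes or the cap is reached.

    Returns (attempts_made, verified). It never raises on a failed probe:
    the caller proceeds either way and lets the restore tier decide pass/fail.
    """
    verified = False
    attempts = 0
    for attempts in range(1, max_attempts + 1):
        run_backup()
        verified = verify()
        if verified:
            break
    return attempts, verified
```

A harness would pass its own backup invocation as `run_backup` and something like `lambda: backup_sql_is_valid(path_to_dump)` as `verify`. Returning the `verified` flag instead of raising keeps the probe non-masking: a dump that never validates still reaches the restore tier, which fails visibly.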