cc-ci/machine-docs/BACKLOG-redfix.md


BACKLOG — phase redfix

Build backlog

M1 — investigate + isolate + classify (all six)

  • discourse — reproduce cold-deploy timeout/wedge in isolation; root-cause (headroom vs convergence bug vs upstream compose defect sidekiq.depends_on: discourse); classify.
  • mattermost-lts — test_restore.py::test_restore_returns_state in isolation: green→load flake, red→diagnose restore (recipe vs test).
  • mumble — custom/test_protocol_handshake.py::test_handshake_completes_with_channel_presence in isolation (canonical already present from today → likely flake; confirm).
  • bluesky-pds — warm-canonical promote routing: find out why warm-bluesky-pds… returns 000 (no HTTP response) over HTTPS while the container is healthy internally and the cold-test domain routes correctly. Find the cc-ci warm-machinery defect.
  • gitea — 3.5.3→3.6.0 warm advance crashes: app.ini is mounted read-only, so 3.6.0 cannot save its JWT secret. Classify: recipe vs harness.
  • keycloak — de-enrolled (live-warm OIDC collision). Design collision-free warm domain/namespace.
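A minimal probe sketch for the bluesky-pds routing comparison. It assumes we want curl's status convention, where "000" means no HTTP response came back at all (DNS, refused connection, TLS failure, timeout). That is a different failure from an HTTP error such as 502. http_status is a hypothetical helper, not an existing cc-ci function.

```python
import http.client
import ssl
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0, verify_tls: bool = True) -> str:
    """Return the HTTP status of a GET to `url` as a 3-digit string.

    Follows curl's %{http_code} convention: "000" means no HTTP response was
    received at all. Unlike plain curl, urllib follows redirects, so a 3xx
    is reported as the status of the final hop.
    """
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Skip certificate checks, e.g. before on-demand TLS has issued a cert.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return f"{resp.status:03d}"
    except urllib.error.HTTPError as err:
        # The server answered, just with a 4xx/5xx status.
        return f"{err.code:03d}"
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Nothing HTTP came back: the case curl reports as 000.
        return "000"
```

Usage idea: probe the warm domain and the cold-test domain with the same path (e.g. /xrpc/_health). A result of "000" on warm against "200" on cold points at the proxy/DNS/TLS layer in front of the container, not at the PDS itself.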

M2 — FIX + verify all six (recipe PR or harness improvement)

Execution gated on M1 PASS (avoid node contention with Adversary M1 re-runs; classifications must hold). Concrete fix designs from M1 evidence:

  • mattermost-lts (recipe PR, clearest) — add pg_backup.sh (immich pattern, no VectorChord bits): backup(){ pg_dump -U mattermost mattermost | gzip > /var/lib/postgresql/data/backup.sql; } restore(){ gunzip -c …/backup.sql | psql -U mattermost -d mattermost -f -; }. compose: add configs: pg_backup → /pg_backup.sh; postgres labels → backup.pre-hook: /pg_backup.sh backup, restore.post-hook: /pg_backup.sh restore, backup.volumes.postgres.path: backup.sql (dump-only, drop the whole-PGDATA backup.path + the rm post-hook). Verify via !testme → restore green.
  • bluesky-pds (recipe PR) — eliminate the app-alias collision on the shared proxy: give the PDS service a unique name (e.g. pds) OR a unique network alias, and update the caddy refs (reverse_proxy, on_demand_tls ask http://…/tls-check), the healthcheck, backup labels, and ops/test service= refs. Verify warm promote → 200 on /xrpc/_health. (NOTE: check whether the cc-ci harness ops.py/tests reference service="app" for bluesky, and update them if the recipe service is renamed. The recipe mirror is PR-only, so the cc-ci-side refs are a separate cc-ci change.) Confirm exact approach in M2.
  • gitea (recipe PR) — make app.ini writable on the warm-reattach advance so 3.6.0 can persist the JWT secret: render app.ini into the WRITABLE config:/etc/gitea volume via the existing docker-setup.sh entrypoint (copy the templated config to a writable path) instead of the read-only app_ini docker-config mount; OR ensure the persisted JWT secret is accepted without rewrite. Verify the 3.5.3→3.6.0 advance promotes. (Ties to LFS PR #1.)
  • keycloak (harness, cc-ci branch) — canonical.canonical_domain(r): return a collision-free domain when r is a live-warm provider (r in warm.WARM_DOMAINS) → e.g. warm-canon-<r>.ci.commoninternet.net; else keep warm-<r> (zero blast radius on the 15 others). Set keycloak WARM_CANONICAL=True. Verify keycloak promotes at warm-canon-keycloak WITHOUT disrupting live warm-keycloak (200 throughout).
  • mumble (harness, cc-ci branch) — stabilize the handshake under load: add a READY_PROBE readiness gate (TCP 64738 stably listening + a successful handshake) before the custom tier and/or raise the retry_handshake budget; verify green under a concurrent-load re-run.
  • discourse (TRICKIEST, decide in M2) — the overlay test_upgrade.py asserts a bitnamilegacy→official image migration that no release (nor main) actually performs. Options: (a) cc-ci test PR (--with-tests) scoping the faithfulness assertion to fire ONLY when the head actually performs the migration (image still bitnamilegacy → N/A, not RED). This is NOT a weakening but a correct scope; also file an upstream recipe issue/PR for the real bitnamilegacy→official migration. (b) recipe PR doing the migration (a major rewrite: the official discourse image is launcher-based, so a clean migration is likely infeasible). Lean (a) + tracked upstream; may need operator input (DEFERRED?). Assess in M2.
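A sketch of the keycloak canonical_domain change. It rests on three assumptions. First, warm.WARM_DOMAINS supports a membership test on the recipe name (a set, or a dict keyed by recipe). Second, every warm domain sits under ci.commoninternet.net, as in the warm-canon-<r> example. Third, the live-warm provider serves on warm-<r>. LIVE_WARM_PROVIDERS, live_warm_domain and the extra live_warm parameter are hypothetical stand-ins added so the sketch runs on its own. Per the item above, the real function takes only r.

```python
from collections.abc import Collection

# Assumption: every warm deploy lives under this suffix.
BASE_DOMAIN = "ci.commoninternet.net"

# Illustrative stand-in for warm.WARM_DOMAINS (recipes that also run a
# live-warm provider); the real contents live in cc-ci.
LIVE_WARM_PROVIDERS = frozenset({"keycloak"})


def live_warm_domain(recipe: str) -> str:
    """Domain the live-warm provider for `recipe` is assumed to serve on."""
    return f"warm-{recipe}.{BASE_DOMAIN}"


def canonical_domain(recipe: str, live_warm: Collection[str] = LIVE_WARM_PROVIDERS) -> str:
    """Domain for the warm-canonical deploy of `recipe`.

    Live-warm providers get a distinct warm-canon- prefix so the canonical
    deploy never shadows the live one. Every other recipe keeps its existing
    warm-<recipe> domain, so nothing changes for them.
    """
    if recipe in live_warm:
        return f"warm-canon-{recipe}.{BASE_DOMAIN}"
    return f"warm-{recipe}.{BASE_DOMAIN}"
```

The change is a single branch keyed on the live-warm set. That is what keeps the blast radius at zero for the other recipes.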
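A sketch of the mumble readiness gate. It assumes "stably listening" means N consecutive successful TCP connects to 64738, spaced a fixed interval apart, with any failed connect resetting the streak. The Mumble handshake itself is not implemented here. It is passed in as a hypothetical handshake() callable, standing in for whatever the custom tier already uses. wait_until_ready and tcp_accepts are illustrative names, not the harness's READY_PROBE API.

```python
import socket
import time
from typing import Callable, Optional


def tcp_accepts(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to (host, port) succeeds right now."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until_ready(
    host: str,
    port: int = 64738,
    *,
    stable_checks: int = 3,
    interval: float = 1.0,
    deadline: float = 120.0,
    handshake: Optional[Callable[[], bool]] = None,
) -> bool:
    """Block until the server is stably listening and (optionally) handshakes.

    Stable = `stable_checks` consecutive successful connects, `interval`
    seconds apart; a failed connect resets the streak. Once the streak is
    reached, `handshake()` (if given) is retried each interval until it
    returns True. Returns False if `deadline` seconds pass first.
    """
    end = time.monotonic() + deadline
    streak = 0
    while time.monotonic() < end:
        if tcp_accepts(host, port):
            streak += 1
            if streak >= stable_checks and (handshake is None or handshake()):
                return True
        else:
            streak = 0
        time.sleep(interval)
    return False
```

Design choice: the TCP streak filters out a listener that flaps during startup. The handshake check then confirms the server actually speaks the protocol, which is the step that flakes under load.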
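A sketch of option (a) for discourse: scoping the faithfulness assertion to heads that actually migrate. It assumes the pre-migration image lives under a registry namespace literally named bitnamilegacy, and that the overlay test can see the head's image reference. Verdict, is_legacy_image and migration_verdict are hypothetical names, not existing cc-ci code.

```python
from enum import Enum

# Assumption: legacy images are referenced as [registry/]bitnamilegacy/<name>[:tag].
LEGACY_NAMESPACE = "bitnamilegacy"


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"  # RED
    NOT_APPLICABLE = "n/a"  # head never attempted the migration


def is_legacy_image(image: str) -> bool:
    """True if the image reference sits under the legacy namespace."""
    ref = image.split("@", 1)[0]  # drop any @sha256:... digest
    path = ref.split("/")[:-1]  # registry/namespace parts; the tag is on the last part
    return LEGACY_NAMESPACE in path


def migration_verdict(head_image: str, migration_faithful: bool) -> Verdict:
    """Judge the bitnamilegacy→official migration only when the head performs it."""
    if is_legacy_image(head_image):
        # Head still ships the legacy image: there is no migration to judge.
        return Verdict.NOT_APPLICABLE
    return Verdict.PASS if migration_faithful else Verdict.FAIL
```

In a pytest overlay, NOT_APPLICABLE would map naturally to pytest.skip(...) with a reason string. The N/A then stays visible in the report instead of silently passing, which keeps the change a scope fix rather than a weakening.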

Adversary findings

(Adversary-owned — do not edit.)