review(2): §7.1 — discourse Q4.6 sign-off DENIED (bitnamilegacy/discourse:3.3.1 served → 1-line re-pin recipe-PR unblocks; not a hard upstream block); plausible Q4.7-full root-caused (CH crash-loop = silenced-wget restart-storm in custom entrypoint, clickhouse-server never starts; recipe-PR-fixable, not env-immutable) sign-off HELD→leaning-DENY pending retry loop

2026-05-30 10:29:41 +01:00
parent 3b6066648c
commit e850281bd6

@@ -1902,3 +1902,59 @@ few retries should land a fully-green run. Launched a 5-attempt cold retry loop
### (3) discourse Q4.6 — upstream image-pull blocker. **PENDING first-hand confirmation** (will check
the `bitnami/discourse` tag is truly unservable on Docker Hub before signing off).
---
## §7.1 SIGN-OFF — discourse Q4.6 (Q4.6 upstream block): **SIGN-OFF DENIED** @2026-05-30T~09:25Z
Builder claim: *"recipe pins `bitnami/discourse:*` images Docker Hub no longer serves; can't unblock
until upstream releases a fixed version."* **First-hand checks REFUTE the "can't unblock" half:**
- Recipe (`/root/.abra/recipes/discourse/*.yml`) pulls exactly THREE images:
`bitnami/discourse:3.3.1`, `postgres:13`, `redis:7.4-alpine`. Only the bitnami one is affected;
postgres/redis are official and serve fine.
- `registry-1.docker.io/v2/bitnami/discourse/manifests/3.3.1` → **404**; `…/latest` → **404**;
`…/tags/list` → **`{"tags":[]}`** (the whole `bitnami/discourse` repo was emptied — the Bitnami
Docker Hub catalog removal). So the pinned image is genuinely unservable. **That half is true.**
- BUT Bitnami's documented migration namespace **`bitnamilegacy/discourse:3.3.1` → manifest 200**
(full tag list present, incl. `3.3.1`). It is a byte-identical archive of the old image (same
paths/env), a drop-in. So the unblock path is a **one-line recipe-PR**:
`image: bitnami/discourse:3.3.1` → `image: bitnamilegacy/discourse:3.3.1`.
- Per §7.1, "upstream moved the image" is **not** a valid "untestable" excuse when a re-pin path
exists — the recipe-PR mechanism (tests run against PR head) is exactly for this. The maximal
testable subset here is the **FULL** discourse suite against a re-pinned PR head, not zero.
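
The registry checks above can be reproduced with a short script. This is a minimal sketch, not the exact commands used in the review: the function names (`extract_token`, `manifest_status`, `verdict`) are hypothetical, and it assumes `curl` and `sed` are available and that Docker Hub's anonymous token flow is reachable from the host.

```shell
#!/bin/sh
# Sketch: ask Docker Hub whether it still serves a given repo:tag.
# Function names are illustrative; the endpoints are the public Docker Hub
# token service and the registry v2 manifest route.

# extract_token: read the token-service JSON on stdin, print the "token" value.
extract_token() {
    sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# manifest_status REPO TAG: print the HTTP status code of the manifest request.
manifest_status() {
    repo=$1
    tag=$2
    token=$(curl -fsS "https://auth.docker.io/token?service=registry.docker.io&scope=repository:${repo}:pull" | extract_token)
    curl -s -o /dev/null -w '%{http_code}' \
        -H "Authorization: Bearer ${token}" \
        -H 'Accept: application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.index.v1+json, application/vnd.oci.image.manifest.v1+json' \
        "https://registry-1.docker.io/v2/${repo}/manifests/${tag}"
}

# verdict STATUS: map a status code to a short label.
verdict() {
    case $1 in
        200) echo served ;;
        404) echo missing ;;
        *)   echo "unclear ($1)" ;;
    esac
}

# Example (requires network access):
#   verdict "$(manifest_status bitnami/discourse 3.3.1)"
#   verdict "$(manifest_status bitnamilegacy/discourse 3.3.1)"
```

Anything other than 200 or 404 (for example a rate-limit response) is reported as unclear rather than guessed at, so a throttled probe is not mistaken for a removed image.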
**VERDICT: §7.1 sign-off for discourse Q4.6 DENIED.** Not a hard upstream blocker — a low-effort
re-pin recipe-PR (`bitnamilegacy/discourse:3.3.1`, confirmed served) unblocks the full enroll. This is
in-scope Builder work, not a deferral. (Not a VETO — discourse is not claimed DONE — but it does NOT
qualify for the §8 env-blocker exception.)
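
For illustration, the one-line re-pin could be applied to a checkout of the recipe roughly as follows. This is a sketch under assumptions: `repin_image` is a hypothetical helper, the compose file path is passed in by the caller (the review only gives the `*.yml` glob), and the substitution touches only the exact `bitnami/discourse:3.3.1` reference.

```shell
#!/bin/sh
# Sketch: re-pin the discourse image to the bitnamilegacy namespace.
# repin_image is a hypothetical helper; the file to edit is the caller's choice.

# repin_image FILE: rewrite the image reference in place, keeping FILE.bak.
repin_image() {
    sed -i.bak \
        's|image: bitnami/discourse:3\.3\.1|image: bitnamilegacy/discourse:3.3.1|' \
        "$1"
}

# Example, run from a recipe checkout (file name is an assumption):
#   repin_image compose.yml && diff compose.yml.bak compose.yml
```

The `postgres:13` and `redis:7.4-alpine` lines are left alone because the pattern matches the full old reference, not just the `bitnami/` prefix.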
## §7.1 SIGN-OFF — plausible Q4.7 full lifecycle: ROOT-CAUSE NAILED; sign-off **HELD → leaning DENY** @2026-05-30T~09:29Z
First-hand diagnosis of the live crash-loop (attempt 1 of my cold retry loop, stack `plau-8abbd9`):
- `plausible_events_db` (ClickHouse `clickhouse/clickhouse-server:23.4.2.11-alpine`) crash-loops
`task: non-zero exit (1)` every ~10s; `docker service logs` AND `docker logs <dead container>` both
**EMPTY**. Confirms the "no stdout" symptom — but NOT "inaccessible/undiagnosable."
- **Both mounted volumes are EMPTY**: the data vol (`…_event-data` → `/var/lib/clickhouse`) and the
log vol (`` → `/var/log/clickhouse-server`) contain nothing; `ExitCode=1`, `OOMKilled=false`.
⇒ **clickhouse-server NEVER STARTS.** The failure is UPSTREAM of it, in the recipe's custom
`entrypoint.clickhouse.sh`.
- That entrypoint: `set -e`; then `wget --quiet … 2>/dev/null` of a 22 MB clickhouse-backup v2.4.2
tarball from `github.com/AlexAkulov/clickhouse-backup`; then `tar -x`; then `/entrypoint.sh`. With
`set -e` + stderr silenced, ANY wget hiccup ⇒ silent `exit 1` with empty data+logs — exactly what I
observe.
- I replicated wget+tar in a fresh container: **succeeds in isolation** (22.4 MB, rc=0, binary
extracted); both download URLs (AlexAkulov + the renamed Altinity repo) → **200** from the host.
So the download works *once*; the failure is the **self-amplifying restart storm**: each ~10s
restart re-pulls 22 MB (no caching: `/tmp` is container-local and fresh per restart, so
`--continue`/`--no-clobber` are no-ops), hammering GitHub until throttled. That yields a persistent
crash-loop within a run, and the throttle bleeds into back-to-back retries (which explains the
Builder's "3 consecutive failures").
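
The inference used above (empty data and log volumes, exit code 1, no OOM kill, therefore the server process never ran) can be written down as a small triage helper. This is a sketch of the reasoning only: `triage` and `count_entries` are hypothetical names, and the inputs are assumed to come from `docker inspect` and from listing the volume mountpoints on the host.

```shell
#!/bin/sh
# Sketch: classify a crash-looping container from four observations.
# Helper names and output labels are illustrative assumptions.

# count_entries DIR: number of entries directly inside DIR (0 for an empty dir).
count_entries() {
    ls -A "$1" | wc -l | tr -d ' '
}

# triage EXIT_CODE OOM_KILLED DATA_ENTRIES LOG_ENTRIES
triage() {
    exit_code=$1
    oom=$2
    data_entries=$3
    log_entries=$4
    if [ "$oom" = "true" ]; then
        echo "oom-killed"
    elif [ "$exit_code" -ne 0 ] && [ "$data_entries" -eq 0 ] && [ "$log_entries" -eq 0 ]; then
        # Nothing was ever written: the failure is upstream of the server.
        echo "failed-before-server-start"
    elif [ "$exit_code" -ne 0 ]; then
        echo "failed-after-server-start"
    else
        echo "clean-exit"
    fi
}

# Gathering the inputs (placeholders in angle brackets are not filled in here):
#   docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' <container>
#   docker volume inspect --format '{{.Mountpoint}}' <volume>
```

With the values observed in this run, `triage 1 false 0 0` lands in the "failed before server start" branch, which is what points the investigation at the custom entrypoint.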
**This is a RECIPE-LEVEL defect with known durable fixes**, not an immutable environment limit:
cache the tarball on a volume (download once), add wget retry/backoff, drop `2>/dev/null`, and/or
`set +e` with a fallback — i.e. the Builder's own described "Q4.7b recipe-PR." The harness runs tests
against PR head, so a fixed-entrypoint PR is fully in-scope. Per §7.1 this is **testable with effort**,
so a blanket "§4.3-floor is all we can do, env-blocked" sign-off is **not** justified on the merits.
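
A minimal sketch of what the cache + retry + un-silence fix might look like in the entrypoint follows. It is not the recipe's actual code: `fetch_cached`, `download`, the attempt count, and the delays are all assumptions, and the cache destination is expected to live on a persistent volume chosen by the recipe author. It uses only `wget -O`, which both GNU and BusyBox wget accept, and leaves stderr visible.

```shell
#!/bin/sh
# Sketch: download once, cache on a persistent path, retry with backoff.
# All names and numbers here are illustrative assumptions.

# download URL DEST: a single attempt; stderr is deliberately NOT silenced.
download() {
    wget -O "$2" "$1"
}

# fetch_cached URL DEST [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 immediately if DEST already exists and is non-empty.
fetch_cached() {
    url=$1
    dest=$2
    attempts=${3:-5}
    delay=${4:-2}
    if [ -s "$dest" ]; then
        echo "cache hit: $dest" >&2
        return 0
    fi
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Write to a .part file so a truncated download never looks cached.
        if download "$url" "$dest.part"; then
            mv "$dest.part" "$dest"
            return 0
        fi
        echo "download attempt $i/$attempts failed" >&2
        rm -f "$dest.part"
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
            delay=$((delay * 2))
        fi
        i=$((i + 1))
    done
    echo "giving up on $url" >&2
    return 1
}

# Example wiring (BACKUP_URL and CACHE_DIR are hypothetical variables):
#   fetch_cached "$BACKUP_URL" "$CACHE_DIR/clickhouse-backup.tar.gz" \
#       || echo "continuing without clickhouse-backup" >&2
```

Because a restarted container finds the cached tarball and skips the network entirely, a crash-loop no longer multiplies download traffic, and a failed fetch degrades to a logged warning instead of a silent `exit 1`.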
HELD pending my 5-attempt cold retry loop: if ANY attempt's first ClickHouse boot wins the race and
the run goes 5-tier green, Q4.7-full is **PROVEN** (best outcome). If all 5 fail, the required path is
the Q4.7b recipe-PR (cache+retry+un-silence), NOT a §8 deferral. Will finalize on loop completion.