review(2): §7.1 — discourse Q4.6 sign-off DENIED (bitnamilegacy/discourse:3.3.1 served → 1-line re-pin recipe-PR unblocks; not a hard upstream block); plausible Q4.7-full root-caused (CH crash-loop = silenced-wget restart-storm in custom entrypoint, clickhouse-server never starts; recipe-PR-fixable, not env-immutable) sign-off HELD→leaning-DENY pending retry loop
@ -1902,3 +1902,59 @@ few retries should land a fully-green run. Launched a 5-attempt cold retry loop
### (3) discourse Q4.6: upstream image-pull blocker. **PENDING first-hand confirmation** (will check
whether the `bitnami/discourse` tag is truly unservable on Docker Hub before signing off).

---

## §7.1 SIGN-OFF - discourse Q4.6 (upstream block): **SIGN-OFF DENIED** @2026-05-30T~09:25Z

Builder claim: *"recipe pins `bitnami/discourse:*` images Docker Hub no longer serves; can't unblock
until upstream releases a fixed version."* **First-hand checks REFUTE the "can't unblock" half:**
- Recipe (`/root/.abra/recipes/discourse/*.yml`) pulls exactly THREE images:
  `bitnami/discourse:3.3.1`, `postgres:13`, `redis:7.4-alpine`. Only the bitnami one is affected;
  postgres/redis are official and serve fine.
- `registry-1.docker.io/v2/bitnami/discourse/manifests/3.3.1` → **404**; `…/latest` → **404**;
  `…/tags/list` → **`{"tags":[]}`** (the whole `bitnami/discourse` repo was emptied by the Bitnami
  Docker Hub catalog removal). So the pinned image is genuinely unservable. **That half is true.**
- BUT Bitnami's documented migration namespace serves it: **`bitnamilegacy/discourse:3.3.1` → manifest 200**
  (full tag list present, incl. `3.3.1`). It is a byte-identical archive of the old image (same
  paths/env), a drop-in. So the unblock path is a **one-line recipe-PR**:
  `image: bitnami/discourse:3.3.1` → `image: bitnamilegacy/discourse:3.3.1`.
- Per §7.1, "upstream moved the image" is **not** a valid "untestable" excuse when a re-pin path
  exists; the recipe-PR mechanism (tests run against PR head) is exactly for this. The maximal
  testable subset here is the **FULL** discourse suite against a re-pinned PR head, not zero.

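The registry checks in the bullets above could be reproduced with a small shell helper. This is a minimal sketch, assuming `curl` on the host and anonymous pull tokens from Docker Hub's token endpoint (`auth.docker.io`); `manifest_status` is a hypothetical name, not part of the recipe or the harness.

```shell
# manifest_status REPO TAG
# Prints the HTTP status code returned for the manifest of REPO:TAG on Docker Hub.
manifest_status() {
    repo="$1"
    tag="$2"
    # Request an anonymous pull token scoped to this repository and extract
    # the "token" field from the JSON response without needing jq.
    token=$(curl -fsS "https://auth.docker.io/token?service=registry.docker.io&scope=repository:${repo}:pull" \
        | sed -n 's/.*"token" *: *"\([^"]*\)".*/\1/p')
    # Ask for the manifest; only the status code is printed, the body is discarded.
    curl -s -o /dev/null -w '%{http_code}' \
        -H "Authorization: Bearer ${token}" \
        -H 'Accept: application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json' \
        "https://registry-1.docker.io/v2/${repo}/manifests/${tag}"
}
```

Calling `manifest_status bitnami/discourse 3.3.1` and `manifest_status bitnamilegacy/discourse 3.3.1` side by side gives the served/unserved comparison in two commands; the statuses should match those recorded above only as long as the registry state is unchanged.
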
**VERDICT: §7.1 sign-off for discourse Q4.6 DENIED.** This is not a hard upstream blocker: a low-effort
re-pin recipe-PR (`bitnamilegacy/discourse:3.3.1`, confirmed served) unblocks the full enroll. This is
in-scope Builder work, not a deferral. (Not a VETO, since discourse is not claimed DONE, but it does NOT
qualify for the §8 env-blocker exception.)

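The re-pin itself could be applied mechanically. A minimal sketch, assuming the image is declared on an `image:` line of a compose file in the recipe directory; the function name `repin_discourse_image` and the `.bak` backup suffix are illustrative choices, not harness conventions.

```shell
# repin_discourse_image COMPOSE_FILE
# Rewrites `image: bitnami/discourse:<tag>` to `image: bitnamilegacy/discourse:<tag>`,
# keeping the tag untouched. A copy of the original is left at COMPOSE_FILE.bak.
repin_discourse_image() {
    compose_file="$1"
    # The pattern requires "bitnami/" exactly, so an already re-pinned
    # "bitnamilegacy/" line is not matched and the edit is safe to repeat.
    sed -i.bak 's|image: bitnami/discourse:|image: bitnamilegacy/discourse:|' "$compose_file"
}
```

Other `image:` lines (such as `postgres:13` and `redis:7.4-alpine`) are left as they are, which keeps the recipe-PR diff to the single line named above.
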
## §7.1 SIGN-OFF - plausible Q4.7 full lifecycle: ROOT-CAUSE NAILED; sign-off **HELD → leaning DENY** @2026-05-30T~09:29Z

First-hand diagnosis of the live crash-loop (attempt 1 of my cold retry loop, stack `plau-8abbd9`):
- `plausible_events_db` (ClickHouse `clickhouse/clickhouse-server:23.4.2.11-alpine`) crash-loops with
  `task: non-zero exit (1)` every ~10s; `docker service logs` AND `docker logs <dead container>` are both
  **EMPTY**. This confirms the "no stdout" symptom, but NOT "inaccessible/undiagnosable."
- **Both mounted volumes are EMPTY**: the data vol (`…_event-data` → `/var/lib/clickhouse`) and the
  log vol (`…` → `/var/log/clickhouse-server`) contain nothing; `ExitCode=1`, `OOMKilled=false`.
  ⇒ **clickhouse-server NEVER STARTS.** The failure is UPSTREAM of it, in the recipe's custom
  `entrypoint.clickhouse.sh`.
- That entrypoint runs `set -e`; then `wget --quiet … 2>/dev/null` of a 22 MB clickhouse-backup v2.4.2
  tarball from `github.com/AlexAkulov/clickhouse-backup`; then `tar -x`; then `/entrypoint.sh`. With
  `set -e` + stderr silenced, ANY wget hiccup ⇒ silent `exit 1` with empty data+logs, which is exactly
  what I observe.
- I replicated wget+tar in a fresh container: it **succeeds in isolation** (22.4 MB, rc=0, binary
  extracted); both download URLs (AlexAkulov + the renamed Altinity repo) → **200** from the host.
  So the download works *once*; the failure is the **self-amplifying restart storm**: each 10s
  restart re-pulls 22 MB (no caching: `/tmp` is container-local and fresh per restart, so
  `--continue`/`--no-clobber` are no-ops), hammering GitHub until throttled ⇒ persistent crash-loop
  "within a run" plus GitHub-throttle bleed into back-to-back retries (which explains the Builder's
  "3 consecutive failures").

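The silent-exit mechanism described above can be shown without any network access. This is a minimal sketch: `failing_download` is a stand-in for the entrypoint's wget step (an assumption, not the recipe's code), and `simulate_entrypoint` is a hypothetical name. It runs the steps in a child `sh` so that `set -e` behaves as it would in a real entrypoint.

```shell
# simulate_entrypoint MODE
# MODE=silenced mimics the current entrypoint shape (download stderr sent to /dev/null);
# any other MODE leaves stderr visible.
simulate_entrypoint() {
    sh -c '
        set -e
        # Stand-in for the download step: reports an error on stderr, then fails.
        failing_download() {
            echo "download failed: connection reset" >&2
            return 1
        }
        if [ "$1" = "silenced" ]; then
            failing_download 2>/dev/null
        else
            failing_download
        fi
        # Never reached when the download fails, because set -e aborts the script.
        echo "clickhouse-server started"
    ' sh "$1"
}
```

In `silenced` mode the child exits non-zero having printed nothing at all, which matches the empty-logs symptom; in the other mode the same failure leaves one diagnostic line behind.
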
**This is a RECIPE-LEVEL defect with known durable fixes**, not an immutable environment limit:
cache the tarball on a volume (download once), add wget retry/backoff, drop `2>/dev/null`, and/or
`set +e` with a fallback, i.e. the Builder's own described "Q4.7b recipe-PR." The harness runs tests
against PR head, so a fixed-entrypoint PR is fully in-scope. Per §7.1 this is **testable with effort**,
so a blanket "§4.3-floor is all we can do, env-blocked" sign-off is **not** justified on the merits.

HELD pending my 5-attempt cold retry loop: if ANY attempt's first ClickHouse boot wins the race and
the run goes 5-tier green, Q4.7-full is **PROVEN** (best outcome). If all 5 fail, the required path is
the Q4.7b recipe-PR (cache+retry+un-silence), NOT a §8 deferral. Will finalize on loop completion.

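The cache+retry+un-silence fix could look roughly like the helper below. This is a sketch under stated assumptions, not the Builder's actual Q4.7b patch: `fetch_cached`, `DOWNLOAD_CMD`, and `BACKOFF_START` are hypothetical names, and the destination is assumed to live on a volume that survives container restarts.

```shell
# fetch_cached URL DEST [MAX_TRIES]
# Downloads URL to DEST once; later calls reuse the cached file.
# DOWNLOAD_CMD (hypothetical knob) is invoked as: $DOWNLOAD_CMD <file> <url>
# and defaults to `wget -O`, with stderr left visible.
fetch_cached() {
    url="$1"
    dest="$2"
    max_tries="${3:-5}"
    # Cache hit: an earlier start already stored a non-empty file.
    if [ -s "$dest" ]; then
        echo "cache hit: $dest"
        return 0
    fi
    try=1
    delay="${BACKOFF_START:-2}"
    while [ "$try" -le "$max_tries" ]; do
        # Write to a temporary name so a partial file is never taken for a cache hit.
        if ${DOWNLOAD_CMD:-wget -O} "$dest.part" "$url"; then
            mv "$dest.part" "$dest"
            echo "downloaded on attempt $try"
            return 0
        fi
        echo "attempt $try/$max_tries failed for $url" >&2
        rm -f "$dest.part"
        # Exponential backoff between attempts; no sleep after the last one.
        if [ "$try" -lt "$max_tries" ]; then
            sleep "$delay"
            delay=$((delay * 2))
        fi
        try=$((try + 1))
    done
    echo "giving up on $url after $max_tries attempts" >&2
    return 1
}
```

Because the download sits in an `if` condition, a failed attempt does not trip `set -e`, and every failure writes a line to stderr, so a crash-looping task would at least leave logs. With the tarball cached on a volume, restarts stop re-pulling 22 MB each time.
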