review(2): Q4.7 plausible CONSOLIDATED verdict — self-corrects 0efcc36+1ecae1c (both had errors); §4.3 green in ONE clean Builder log + non-vacuous; full-lifecycle unproven (upstream clickhouse stall); not cleared, no veto

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 19:45:51 +01:00
parent 3360f1b266
commit 27abce678b
1 changed files with 56 additions and 0 deletions
--- a/machine-docs/REVIEW-2.md
+++ b/machine-docs/REVIEW-2.md
@ -1235,3 +1235,59 @@ fault blocked confirmation this turn); I will read it back next wake.
  stronger than my erroneous prior entry implied. No VETO; no gate-FAIL (Q4.7 not claimed DONE).
  Lesson logged: never write a "no evidence" verdict off a single search when the output channel is
  known-flaky — retry/corroborate first.
+
+
+---
+
+## Q4.7 plausible — CONSOLIDATED verdict (SUPERSEDES `0efcc36` + `1ecae1c`; both contained factual errors) @2026-05-29T~18:50Z
+
+**Why this entry exists / self-correction.** My two earlier Q4.7 entries this session were each written
+off partially-buffered tool output and are FACTUALLY WRONG. Correcting the record:
+- `0efcc36` (and its dup `8761548`) said *"the '§4.3 event tests proven green' claim has NO surviving
+  evidence on cc-ci."* **FALSE** — `/root/ccci-plausible-instcustom.log` does show it. My first host
+  search returned empty due to an output-buffering fault and I wrote the verdict off that empty result.
+- `1ecae1c` ("CORRECTION") then over-corrected with fresh errors: it claimed *"two Builder logs, both
+  green"*, called `instcustom.log` *"curated/contaminated"*, and called `fix2.log` *"a clean
+  corroborating capture."* **All three FALSE.** Only ONE log shows the tests green; `instcustom.log`
+  is a plain pytest capture (NOT curated); `fix2.log` shows a FAILED deploy, not corroboration.
+
+**GROUND TRUTH (from full reads of each artifact this session):**
+- `/root/ccci-plausible-instcustom.log` (4468 B, plain `cc-ci-run` pytest capture, rootdir
+  `/root/builder-clone`, app `plau-2f2c63`): custom tier
+  `test_event_tracking.py::test_pageview_event_roundtrip PASSED` +
+  `test_custom_event_roundtrip PASSED` (**2 passed in 73.58s**) and
+  `test_health_check.py::test_plausible_root_serves PASSED`. Its INSTALL tier
+  `tests/plausible/test_install.py::test_serving` **FAILED** (`/`→500, the pre-`b4f39cb` `/`-probe
+  issue, since fixed to probe `/api/health`). RUN SUMMARY: **install: fail / custom: pass**.
+  → This is the ONE log that demonstrates the §4.3 event tests green. It is genuine, not curated.
+- `/root/ccci-plausible-fix2.log` (full 5-tier, 3.0.0+v2.0.0): **`FATA deploy failed`**, install:fail,
+  all other tiers **skip**. Does NOT show the event tests. NOT corroboration.
+- `/root/ccci-q47-plausible.log`: deploy not healthy (`/`→500), install:fail, custom:skip.
+- **My OWN cold run** (`/root/adv-q47-plausible-cold.log`, from `/root/adv-verify`): launched ~18:28,
+  **hung in the deploy/install stage ~32 min in** (log frozen at 385 B / deploy-start; runner pid still
+  alive past the 1200s DEPLOY_TIMEOUT). First-hand confirmation that the full deploy does NOT converge
+  under current conditions — exactly the documented upstream clickhouse-backup boot-download stall.
+
+**Assessment (accurate):**
+- **(a) Test content NON-VACUOUS** — code-read of `tests/plausible/functional/test_event_tracking.py`:
+  registers the site in postgres (sites_cache gate), POSTs `/api/event` with a browser UA, asserts the
+  202 ack, then polls ClickHouse `events_v2` on a **unique pathname** and asserts `count>=1` plus
+  stored `name`/`pathname`/`hostname` equality; the custom test asserts the goal name is stored
+  verbatim (not coerced to `pageview`). A broken ingestion path raises → FAILS. ClickHouse-direct
+  read-back (Stats API unavailable under `DISABLE_AUTH`) is the *stronger* persistence assertion, accepted.
+- **(b) §4.3 event tests GREEN** — demonstrated in exactly ONE clean Builder log (`instcustom.log`).
+  My own cold-run first-hand PASS is NOT yet obtained (the deploy hung). So §4.3-green currently rests
+  on a single Builder-produced log + my code-read of non-vacuousness, NOT on my own green run.
+- **(c) Full 5-tier lifecycle NOT proven** — multiple deploy attempts (mine + fix2 + q47) fail to
+  converge at install; root cause is the upstream `entrypoint.clickhouse.sh` 22 MB boot-download with
+  `set -e`/no-cache/no-retry → crash-loop + GitHub secondary-rate-limit amplification. The Q4.7b
+  recipe-PR deferral (cache-on-volume + retry + `set +e`) is the right durable fix and is a legitimate
+  §8 env-blocker-class deferral (same family as lasuite-meet/drive/immich).
+
+**VERDICT: Q4.7 NOT fully cleared.** §4.3 functional content is sound and shown green once (Builder
+log) but I have not reproduced it first-hand; the full lifecycle does not converge under the active
+upstream defect. **No `## VETO`** and **no gate-FAIL** — Q4.7 is not claimed DONE; this is a
+documented-deferral-under-scrutiny, not a refuted gate. To upgrade to a first-hand §4.3 PASS I need a
+single clean cold run (after a GitHub-rate-limit cooldown) where ClickHouse converges and both
+`*_event_roundtrip` tests PASS in my own re-run. Pending items: confirm my hung cold run tears down
+its `plau-0c70fd` stack cleanly (runner auto-teardown; will verify).