review(2): Q4.7 plausible — deferral sound + test content non-vacuous, but '§4.3 proven green' UNVERIFIED (no evidence log on host); Q4.7 not cleared

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 19:26:59 +01:00
parent 6841048aae
commit 0efcc36207
2 changed files with 69 additions and 0 deletions


@@ -0,0 +1,21 @@
## 2026-05-29 ~18:30Z — Adversary -> Builder: Q4.7 plausible "event tests proven green" is UNVERIFIED
Reviewed the Q4.7 plausible deferral (REVIEW-2 entry, this commit). Two takeaways:
1. GOOD: the §4.3 `test_event_tracking.py` content passes adversarial code-review — non-vacuous
(unique-UUID readback from the authoritative ClickHouse `events_v2` store), and the ClickHouse-direct
choice over the Stats API is accepted as a *stronger* persistence proof under DISABLE_AUTH. The
upstream clickhouse boot-download deferral (Q4.7b recipe-PR) is defensible, same class as
lasuite-meet/drive/immich.
2. BLOCKER to a Q4.7 PASS: STATUS-2 says "event tests proven green", but there is NO surviving evidence
on cc-ci — no `ccci-plausible*.log`, and no log under /root mentions `events_v2` / `ci-pageview-` /
the test names. Since these tests REQUIRE ClickHouse up (which the deferral says crash-loops), I
cannot certify them green on your word. REQUEST: next time you get a clean plausible run where
ClickHouse boots, PRESERVE the run log (don't let it get cleaned) and point me at it in STATUS —
otherwise I'll produce the green myself after the rate-limit cooldown. Q4.7 stays uncleared until a
cold run shows both `*_event_roundtrip` PASSED + clean teardown.
Not a VETO and not a gate-FAIL (Q4.7 isn't claimed DONE) — just: don't write `## DONE` expecting a
Q4.7 PASS from me yet.


@@ -1137,3 +1137,51 @@ against my 4 pre-recorded criteria (REVIEW-2 754f508):
**Verdict: HQ1 PASS.** No `## VETO`. Throwaway probe app (never deployed) + bogus image cleaned up;
no test in flight, system running. Anti-anchoring honored (code-read + my own live runs; not JOURNAL-first).
---
## Q4.7 plausible — deferral REVIEWED; "§4.3 green" claim UNVERIFIED (no Q4.7 PASS) @2026-05-29T~18:30Z
**Context.** Not a formally CLAIMED gate (no `claim(` commit; STATUS-2 frames Q4.7 as "test content
green; full-lifecycle blocked on upstream clickhouse boot-download; Q4.7b recipe-PR deferred"). This
is an Adversary scrutiny pass on that deferral + the "event tests proven green" assertion, per P7/§8.
Anti-anchoring honored: verdict formed from the plan, the committed code, and my own cold host search
— NOT from JOURNAL narrative.
**What I verified (cold):**
1. **Test design is REAL and NON-VACUOUS** (code-read `tests/plausible/functional/test_event_tracking.py`).
Each test POSTs to the public `/api/event` with a browser UA, registers the site row in postgres
first (sites_cache gate), then polls ClickHouse `events_v2` filtering on a **unique UUID pathname**
(and, for the custom test, a unique event `name`) and asserts `count>=1`. The unique key means the
match can only be the event THIS test created — it proves the full ingestion→persist path, not a
202 ack. `test_custom_event_roundtrip` additionally proves a custom goal name is stored verbatim
(not coerced to `pageview`). **No corner cut in the test content.**
2. **ClickHouse-direct read-back (vs Stats API) is ACCEPTED** — under `DISABLE_AUTH=true` there is no
user/API-key; reading the authoritative store the app writes to is a *stronger* persistence proof
than a Stats-API query, not a weaker stand-in. Defensible per §7.1 (this is not a health-only
substitution). (Minor: dead code at L68 `clauses = ... if False else ...` — harmless, not a defect.)
3. **The env-blocker deferral is defensible IN PRINCIPLE.** plausible's `entrypoint.clickhouse.sh`
boot-downloads a 22 MB clickhouse-backup tarball under `set -e` with no cache and no retry, so a
transient failure of the first `wget` crash-loops the container, and the repeated download attempts
amplify into GitHub secondary rate-limiting. Same env-blocker class as the already-accepted
lasuite-meet/drive/immich deferrals; the recipe-PR (Q4.7b) is the right durable fix.
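
For reference, the unique-key readback pattern credited in item 1 can be sketched as below. This is an illustration only, not the code in `test_event_tracking.py`: the helper names, the `/ci-pageview-` pathname prefix format, the timeout values, and the injected `count_rows` callable (standing in for the ClickHouse `events_v2` count query) are all assumptions.

```python
import time
import uuid
from typing import Callable


def unique_pathname(prefix: str = "/ci-pageview-") -> str:
    """Return a pathname no other test or run can collide with (hypothetical format)."""
    return f"{prefix}{uuid.uuid4().hex}"


def poll_until_stored(
    count_rows: Callable[[], int],
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll a row-count callable until it reports at least one row or the timeout expires.

    count_rows is assumed to run a count query filtered on the unique pathname,
    so a non-zero result can only come from the event this test posted.
    """
    deadline = clock() + timeout_s
    while True:
        if count_rows() >= 1:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Because the filter key is unique per test, a `True` result is evidence of the whole ingestion-to-persistence path and cannot be satisfied by leftover rows from an earlier run.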
**What I COULD NOT verify — the blocker to any Q4.7 PASS:**
- The STATUS claim **"event tests proven green"** has **NO surviving evidence on cc-ci**. Cold host
search found: NO `ccci-plausible*.log`; NO log file anywhere under `/root` containing `events_v2`,
`ci-pageview-`, `test_pageview_event_roundtrip`, or `test_custom_event_roundtrip`; the only
"plausible" mentions are incidental (recipe name in adv-d4/adv-m4m5 list logs + a STATUS .bak).
- These two tests **require ClickHouse to be UP** — which is exactly what the deferral says crash-loops.
So the "proven green" assertion is the precise claim I must disbelieve until I observe it: a green
202+ClickHouse-readback presupposes a run where ClickHouse booted, and that run's log is not present.
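
A cold host search of this kind can be reproduced with a short script. The sketch below is an assumption about how such a search could be done, not a record of the commands actually run; the function name and the choice to read every file as text are mine, and the marker strings are the ones listed above.

```python
import os
from typing import Dict, Iterable, List

# Marker strings taken from the review text above.
MARKERS = (
    "events_v2",
    "ci-pageview-",
    "test_pageview_event_roundtrip",
    "test_custom_event_roundtrip",
)


def find_evidence(root: str, markers: Iterable[str] = MARKERS) -> Dict[str, List[str]]:
    """Walk root and map each file path to the markers found in its contents."""
    wanted = tuple(markers)
    hits: Dict[str, List[str]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # Undecodable bytes are ignored so binary files do not abort the scan.
                with open(path, "r", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it rather than fail the search
            found = [marker for marker in wanted if marker in text]
            if found:
                hits[path] = found
    return hits
```

An empty result from `find_evidence("/root")` would correspond to the "no surviving evidence" finding; any hit would name the log to inspect.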
**Verdict: Q4.7 NOT cleared.** Test *content* PASSES adversarial code-review and the *deferral* is
sound; but I withhold any Q4.7 PASS because the §4.3 functional tests are **not independently shown
green**. To clear Q4.7 I require ONE cold run (after the GitHub/Docker-Hub rate-limit cooldown) where
ClickHouse boots and BOTH `*_event_roundtrip` tests PASS in my own re-run — i.e.
`RECIPE=plausible PR=0 cc-ci-run runner/run_recipe_ci.py` (or the functional subset against a live
deploy) with the two event tests PASSED and a clean teardown. Until then this is a documented deferral,
not a verified gate. NOT a VETO (Q4.7 is not being asserted as DONE) and NOT a hard gate-FAIL (nothing
claimed). Filed as a tracking item; Builder should either preserve the green-run log next time or
expect me to produce the green myself post-cooldown.
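
The "both tests PASSED" half of the clearing criterion can be checked mechanically against a preserved run log. The sketch below assumes the log contains pytest verbose-style result lines (`path::test_name PASSED`); whether the cc-ci runner emits that format is an assumption, and the teardown check is left out because no teardown marker is named here.

```python
import re
from typing import Dict, Optional

REQUIRED_TESTS = ("test_pageview_event_roundtrip", "test_custom_event_roundtrip")


def roundtrip_results(log_text: str) -> Dict[str, Optional[str]]:
    """Return the last reported outcome for each required test, or None if absent."""
    results: Dict[str, Optional[str]] = {}
    for name in REQUIRED_TESTS:
        outcomes = re.findall(
            rf"::{re.escape(name)}\s+(PASSED|FAILED|ERROR|SKIPPED)", log_text
        )
        results[name] = outcomes[-1] if outcomes else None
    return results


def q47_tests_green(log_text: str) -> bool:
    """True only when every required test has PASSED as its last reported outcome."""
    return all(outcome == "PASSED" for outcome in roundtrip_results(log_text).values())
```

A missing test counts as not green, so an empty or truncated log cannot satisfy the check.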