Phase-2 plan: harden Adversary mandate — no skipped tests / corners cut

Add §7.1 Adversary mandate: the default assumption is that everything meaningful is testable (OIDC/SSO, federation, media, WOPI, WebRTC connectivity, backup data survival); the job is to write a good test, not to declare it impossible. The Adversary reads test bodies, rejects skip/xfail/mock/health-only/empty-assertion tests and bogus parity renames, and re-runs suites cold. "Untestable" is a rare exception requiring a true environment blocker, the maximal testable subset, and Adversary sign-off; "needs browser/SSO/another app" is not a valid reason. Tighten P7 and §8 to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 03:50:33 +01:00
parent 07faa6007f
commit 781f9fd91f
1 changed files with 31 additions and 4 deletions
--- a/cc-ci-plan/plan-phase2-recipe-tests.md
+++ b/cc-ci-plan/plan-phase2-recipe-tests.md
@@ -64,8 +64,12 @@ each within 24h** (logged in `REVIEW.md`):
      (respecting `MAX_TESTS`/node budget) and SSO setup runs automatically (§4.2).
 - [ ] **P6 — Browser flows where they matter (D3).** Recipes whose core function is a UI flow have a
      Playwright test of that flow (login, create-an-object, etc.), not just API checks.
-- [ ] **P7 — No weakened tests.** Every assertion is real; nothing is `skip`/`xfail`'d to go green.
-      Genuinely-untestable aspects are documented findings (`DECISIONS.md`), not silent skips.
+- [ ] **P7 — No weakened tests, no corners cut (§7.1).** Every assertion is real and checks app
+      state; nothing is `skip`/`xfail`'d, mocked, or reduced to a health-only stand-in to go green.
+      The bar: anything meaningful is testable with effort (OIDC/SSO, federation, media, WOPI, WebRTC
+      connectivity, data survival all included). Any "untestable" claim is the rare exception — a true
+      environment-level blocker only, with the maximal subset still implemented and **Adversary
+      sign-off** (§8); "needs a browser / SSO / another app" is not a valid excuse.
 - [ ] **P8 — Docs.** `docs/enroll-recipe.md` updated with the per-recipe test contract (§4.1) and a
      worked example; a new engineer can add a recipe's full suite from the docs.

@@ -208,6 +212,26 @@ Same as `plan.md` §6/§6.1/§7/§9. Phase-2-specific emphases:
  for health-only stand-ins, `skip`/`xfail`, or assertions that don't actually check app state.
 - **Real data-integrity for backups** (P4) — "service is up after restore" is *not* sufficient; the
  seeded data must be proven to survive.
+
+### 7.1 Adversary mandate (Phase 2) — no skipped tests, no corners cut
+The default assumption is that **everything meaningful about an app is testable with enough effort —
+the job is to write a *good* test, not to declare it impossible.** OIDC/SSO login, token issuance and
+JWT validation, federation endpoints, media upload/download, WOPI discovery, WebRTC ICE/connectivity,
+backup data survival — these are all testable end-to-end and **must** be tested, not stubbed. The
+Adversary actively enforces this and **reads the test bodies, not just pass/fail**:
+- **Reject** any test that is `skip`/`xfail`/commented-out, mocked, mutated to a `health_check`
+  stand-in, asserts nothing material (e.g. only `status==200`), or is hard-coded to pass.
+- **Reject "we couldn't test X"** unless it is a genuine *environment-level* limitation (e.g. the
+  test host cannot receive inbound UDP, so the full lasuite-meet *media relay* path can't complete) —
+  and even then demand the **maximal testable subset** (e.g. signaling, token issuance, ICE candidate
+  gathering) plus a `DECISIONS.md` justification with the specific technical blocker. "It's hard",
+  "needs a browser", "needs SSO setup", "needs another app deployed" are **not** valid reasons —
+  Playwright, the SSO-setup harness (§4.2), and the dependency resolver exist precisely to remove
+  those excuses.
+- **Verify parity for real (P2):** for each `PARITY.md` row, confirm the cc-ci test checks the *same
+  thing* the recipe-maintainer original did — not a hollow rename.
+- **Re-run cold and inspect:** the Adversary re-runs a sampled recipe's suite from a clean state and
+  reads the diffs/assertions; a green run with empty assertions is a FAIL and an `[adversary]` finding.
 - **Respect Phase-1 resource caps** — deps multiply live apps per run; keep within `MAX_TESTS`/node
  budget; sequence heavy recipes; teardown (incl. deps) is guaranteed.
 - **Tests are recipe-versioned** — they run against the PR's recipe version; don't hardcode values
@@ -225,5 +249,8 @@ Same as `plan.md` §6/§6.1/§7/§9. Phase-2-specific emphases:
  store; SSO test users/clients are class-B, generated per run, destroyed at teardown).
 - How many recipe-specific tests beyond the **≥2** floor per recipe (scale with the app's surface;
  don't gold-plate trivial recipes).
-- Recipes that genuinely can't be CI'd in this environment (e.g. ones needing inbound UDP/TURN like
-  lasuite-meet's media path) → document the tested subset + why (mirror Phase-1 D10's honesty rule).
+- A test deemed "impossible" is the **rare exception**, not a convenient out (§7.1). It is only
+  acceptable for a true environment-level blocker (e.g. no inbound UDP for lasuite-meet's *media
+  relay*), requires the **maximal testable subset** still implemented, a specific technical reason in
+  `DECISIONS.md`, **and Adversary sign-off**. SSO/OIDC, browser flows, multi-app dependencies, and
+  data-integrity are explicitly **not** exceptions — they are testable and required.