review(2): PASS gate Q3.2 lasuite-drive (re-claim a13d2ae/code e1147b5+6506c4a) — F2-12 CLOSED. Cold re-run: all 5 tiers GREEN, upgrade tier now passes, deploy-count=1, ready-probe OK(200) twice, OIDC+minio round-trip PASS (not skipped), data-integrity survives, teardown clean. abra -c + owned wait_healthy/READY_PROBE proven non-vacuous (5 P7-negative units + code-read RAISE paths). DECISIONS: record operator READY_PROBE principle

2026-05-29 12:59:49 +01:00
parent ac241d44c7
commit 3f5d58a7c2
3 changed files with 66 additions and 1 deletions

@@ -547,7 +547,7 @@ Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`
until those primitives ship in Q2/Q3. Recording so a future Q2/Q3 verdict checks them off.
- Filed by Adversary @2026-05-28.
- [ ] **F2-12 [adversary] — BLOCKS Q3.2 gate** — lasuite-drive **upgrade tier FAILS on cold re-run**,
- [x] **F2-12 [adversary] — CLOSED @2026-05-29** (re-verified PASS; was BLOCKS Q3.2 gate) — lasuite-drive **upgrade tier FAILS on cold re-run**,
contradicting the claim "full lifecycle 3× green". Cold-verified @2026-05-29 from `/root/adv-verify`
@ origin/main `911680f` (code `4b38b66`, git==host). `RECIPE=lasuite-drive PR=0 cc-ci-run
runner/run_recipe_ci.py` → RUN SUMMARY: install/backup/restore/custom **pass**, **upgrade FAIL**,
@@ -568,3 +568,7 @@ Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`
per my standing veto-eligible obligation (disk lifted; deferral void). Full verdict: REVIEW-2.md
"## Q3.2 lasuite-drive — FAIL @2026-05-29".
- Filed by Adversary @2026-05-29.
- **CLOSED @2026-05-29:** cold re-run of the F2-12 fix (re-claim a13d2ae) — upgrade tier
GREEN, all 5 tiers pass, deploy-count=1, ready-probe OK(200) twice, clean teardown; `-c`+owned
wait proven non-vacuous (5 P7-negative unit tests pass + code-read of services_converged/
wait_healthy/wait_ready_probes RAISE on stuck convergence). Verdict: REVIEW-2 "## Q3.2 … PASS".

@@ -782,3 +782,20 @@ bounded by `DEPLOY_TIMEOUT`. Teeth proven by `tests/unit/test_f212_upgrade_conve
`docker service scale …minio-createbuckets` is NOT a bypass — it triggers the recipe's own
`replicas:0` one-shot (Adversary-confirmed). The Adversary still owns confirming "not a weakening" at
the Q3.2 cold-verify.
## 2026-05-29 — READY_PROBE / abra `-c` policy (operator principle; Adversary-recorded)
**Decision (operator, plan.md §9):** deploys/upgrades use REAL abra commands — **never** `docker
service update/scale` to mutate app state. **PREFER abra's own convergence checks by default.** Only
skip abra convergence (`-c`/`--no-converge-checks`) + use a harness `READY_PROBE` when abra genuinely
does not fit — e.g. its window is too short for a heavy app and it FATAs on a deploy that DOES converge
(F2-12: lasuite-drive new collabora 25.04.9.4.1 in healthcheck `start_period`). When skipping:
- the deploy stays **real abra** (only abra's *waiting* is replaced, not the deploy);
- the custom probe MUST be genuinely **STRICT** — all services N/N **plus** a real app-level check —
and **RAISE** on actual non-readiness; never a no-op that masks a failed deploy;
- prove it has **teeth** with a negative test (cf. F2-12 P7-negative
`tests/unit/test_f212_upgrade_convergence.py`).
**Adversary status:** the F2-12 lasuite-drive READY_PROBE was cold-verified non-weakening at the Q3.2
re-claim (REVIEW-2 "## Q3.2 … PASS @2026-05-29"): `-c`+owned `wait_healthy`(services N/N + HEALTH_PATH)
+`wait_ready_probes`(collabora WOPI 200) all RAISE on stuck convergence (5 unit tests pass + code-read);
upgrade tier GREEN on the Adversary's own cold run. This is the accepted pattern for future heavy
recipes — same teeth + negative-test requirement applies each time.
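A minimal sketch of what a STRICT probe in the sense above could look like, assuming the replica counts arrive as `"cur/want"` strings keyed by service name. The names `NotReady`, `services_converged` (as written here) and `strict_ready` are illustrative only and are not the harness's actual code; the real checks live in the harness and may differ.

```python
from typing import Callable, Mapping


class NotReady(RuntimeError):
    """Raised when the stack is not genuinely ready (hypothetical name)."""


def services_converged(replicas: Mapping[str, str]) -> bool:
    """True only if every service reports current == desired replicas.

    `replicas` maps a service name to a "cur/want" string such as "1/1".
    A `replicas:0` one-shot reports "0/0" and counts as converged.
    An empty mapping is treated as NOT converged, so the check cannot
    pass vacuously when no services were found.
    """
    if not replicas:
        return False
    for value in replicas.values():
        cur, want = value.split("/", 1)
        if int(cur) != int(want):
            return False
    return True


def strict_ready(replicas: Mapping[str, str], app_status: Callable[[], int]) -> None:
    """Raise NotReady unless all services are N/N AND the app check returns 200."""
    if not services_converged(replicas):
        raise NotReady(f"services not converged: {dict(replicas)}")
    status = app_status()
    if status != 200:
        raise NotReady(f"app-level check returned {status}, expected 200")
```

Both conditions must hold: a stack at N/N whose app-level check answers 502 still raises, which is the property the negative test is meant to pin down.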

@@ -903,3 +903,47 @@ P7 weakening — so I scrutinized whether the replacement is genuinely stronger
3. deploy-count == 1; clean teardown.
F2-12 stays OPEN (Adversary-owned). NO verdict until Q3.2 is re-claimed. Anti-anchoring: not reading
JOURNAL before the verdict.
## Q3.2 lasuite-drive — PASS @2026-05-29 (cold re-verify after F2-12 fix; re-claim a13d2ae / code e1147b5+6506c4a)
Cold-verified from my own clone `/root/adv-verify` @ origin/main `a13d2ae` (git==host: Builder
`/root/builder-clone` also a13d2ae). `RECIPE=lasuite-drive PR=0 cc-ci-run runner/run_recipe_ci.py`
(log `/root/adv-q32-reclaim-114620.log`). **F2-12 CLOSED.**
**RUN SUMMARY (verbatim):** `deploy-count = 1 (expect 1)`; **install/upgrade/backup/restore/custom
ALL pass** — the upgrade tier (which FAILed my first cold run, aab77ea) is now GREEN.
**Every per-test PASSED (read the lines — nothing skipped/health-only):**
- install: `test_serving` + `test_serving_and_frontend`.
- **upgrade: `test_upgrade_reconverges` + `test_upgrade_preserves_data`** (ci_marker survives the real
prev→PR-head chaos crossover — collabora/code 25.04.9.1.1→25.04.9.4.1, drive v0.12→v0.18, onlyoffice
9.2→9.3).
- backup: `test_backup_artifact` + `test_backup_captures_state`; restore: `test_restore_healthy` +
`test_restore_returns_state` (real backup data-integrity, P4).
- custom: `test_health_check`, **`test_minio_storage` (real S3 upload→list→cat readback round-trip
inside the minio container)**, **`test_oidc_password_grant_against_dep_keycloak` PASSED — NOT skipped**
(real password-grant JWT vs a per-run realm on warm keycloak).
- Log shows `ready-probe OK (200)` **TWICE** — post-install AND post-upgrade — on
`collabora-lasu-e511fe…/hosting/discovery`.
**F2-12 fix is NOT a P7 weakening (the crux — orchestrator 2026-05-29 requires the probe have teeth):**
the upgrade redeploy is still REAL abra (`abra app deploy … -C -c`); only abra's *impatient converge
monitor* is replaced — `docker stack deploy` still applies the spec. The harness then OWNS a STRICTER
wait, and I verified it is non-vacuous by reading the code AND running the negative tests:
- `services_converged` (lifecycle.py:171) checks **EVERY** stack service `cur==want` (N/N), returns
False on any `0/1` still-spinning service (correctly treats `replicas:0` one-shots as 0/0 converged).
- `wait_healthy` RAISES `TimeoutError` if services never converge, OR converge but the app never serves
an OK code. `wait_ready_probes` RAISES if collabora `/hosting/discovery` never returns 200.
- `tests/unit/test_f212_upgrade_convergence.py` — **5 passed** on my clone — asserts exactly those
RAISE paths (probe-never-ready→raise; converge-but-502→raise; never-converge→raise) with a fake
clock; plus returns-when-ready and no-op-without-probe. A genuinely broken upgrade stays RED → `-c`
is not green-washing.
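For readers unfamiliar with the fake-clock technique, the shape of such a negative test can be sketched as follows. This is not the content of `tests/unit/test_f212_upgrade_convergence.py`; `FakeClock`, `wait_until` and both test functions are hypothetical names, and the timeout and interval values are arbitrary.

```python
from typing import Callable


class FakeClock:
    """Deterministic clock: sleeping advances time without really waiting."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def wait_until(check: Callable[[], bool], timeout: float, interval: float,
               clock: FakeClock, what: str) -> None:
    """Poll `check` until it returns True; raise TimeoutError at the deadline."""
    deadline = clock.time() + timeout
    while True:
        if check():
            return
        if clock.time() >= deadline:
            raise TimeoutError(f"{what} not ready after {timeout}s")
        clock.sleep(interval)


def test_never_ready_raises() -> None:
    """Negative case: a probe that never succeeds must raise, not return."""
    clock = FakeClock()
    try:
        wait_until(lambda: False, 60, 5, clock, "ready probe")
    except TimeoutError:
        return
    raise AssertionError("wait_until returned although the probe never passed")


def test_returns_when_ready() -> None:
    """Positive case: the wait returns as soon as the probe passes."""
    clock = FakeClock()
    answers = iter([False, False, True])
    wait_until(lambda: next(answers), 60, 5, clock, "ready probe")
    assert clock.now == 10.0  # two sleeps of 5s before the third check passed


test_never_ready_raises()
test_returns_when_ready()
```

Injecting the clock keeps the negative case fast and deterministic: the timeout path is exercised without any real sleeping, so a wait that silently returns on timeout would fail the test immediately.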
**Robustness bonus:** my run passed while the Builder was concurrently running a cryptpad full-suite
(3 `run_recipe_ci` procs live) — the upgrade converged even under resource contention.
**Teardown sacred:** post-run NO `lasu` stack, NO per-run `lasu` volume; warm custom-html + keycloak
canonical volumes intact. deploy-count=1 (HC1 in-place upgrade, not a 2nd install).
**Verdict: Q3.2 PASS. F2-12 CLOSED.** No `## VETO`. Anti-anchoring honored (verdict from plan + code +
my own run; did not read JOURNAL first). Remaining open Adversary item: cryptpad F2-9 create-pad
(separate cold-verify pending — Builder's `05d0dc1` test + its full-suite run).