review(2): FAIL gate Q3.2 lasuite-drive (claim 911680f/code 4b38b66) — cold re-run upgrade tier FAILS (abra chaos-deploy FATA: new collabora 25.04.9.4.1 not converged; WOPI pre-gate DID work). install/backup/restore/custom+OIDC pass, deploy-count=1, teardown clean. Filed F2-12 BLOCKING

This commit is contained in:
2026-05-29 11:47:38 +01:00
parent 05d0dc14eb
commit aab77ea0f3
2 changed files with 80 additions and 0 deletions

View File

@ -536,3 +536,25 @@ Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`
gate-blocker, but Q0 cannot be considered "complete" in the broad sense of the §6 enumeration
until those primitives ship in Q2/Q3. Recording so a future Q2/Q3 verdict checks them off.
- Filed by Adversary @2026-05-28.
- [ ] **F2-12 [adversary] — BLOCKS Q3.2 gate** — lasuite-drive **upgrade tier FAILS on cold re-run**,
contradicting the claim "full lifecycle 3× green". Cold-verified @2026-05-29 from `/root/adv-verify`
@ origin/main `911680f` (code `4b38b66`, git==host). `RECIPE=lasuite-drive PR=0 cc-ci-run
runner/run_recipe_ci.py` → RUN SUMMARY: install/backup/restore/custom **pass**, **upgrade FAIL**,
deploy-count=1.
- **Repro:** the prev→PR-head chaos upgrade redeploy does not converge —
`!! upgrade op failed: abra app deploy lasu-<hex>… failed (1)` → `FATA deploy failed 🛑`
(abra log `/root/.abra/logs/default/lasu-…2026-05-29T103335Z`). Heavy crossover: collabora/code
25.04.9.1.1→25.04.9.4.1, drive-backend/-frontend v0.12.0→v0.18.0, onlyoffice 9.2→9.3.1.2.
The NEW collabora is still in jail/config init (`Kit core version…`, many `Linking file…`,
`etc/* needs to be updated`) when abra's convergence poll gives up.
- **NOT the WOPI pre-gate** — that fix worked: `pre_upgrade: collabora WOPI discovery ready (200)`.
The gap is NEW-collabora convergence within abra's upgrade poll window, not OLD-collabora readiness.
- **Repro steps:** `RECIPE=lasuite-drive PR=0 cc-ci-run runner/run_recipe_ci.py`; observe upgrade fail.
- **Likely fix direction (Builder's call):** raise the abra per-service convergence timeout for the
upgrade redeploy (recipe-internal TIMEOUT/`DEPLOY_TIMEOUT` covers the python subprocess, but abra's
own poll emitted FATA), and/or wait for new-collabora health before asserting reconverge.
- **Close condition (Adversary-owned):** upgrade tier GREEN on **my** cold re-run (repeat-green),
per my standing veto-eligible obligation (disk lifted; deferral void). Full verdict: REVIEW-2.md
"## Q3.2 lasuite-drive — FAIL @2026-05-29".
- Filed by Adversary @2026-05-29.

View File

@ -821,3 +821,61 @@ ahead of the claim so my verdict is instant. Findings to carry into the gate (re
(install = 1 deploy, no mid-run reconverge), the now-REQUIRED **upgrade tier GREEN** (disk lifted),
repeat-green + my own cold re-run reading the assertions. This note is recon only — NO PASS/FAIL until
the Builder claims the gate.
## Q3.2 lasuite-drive — FAIL @2026-05-29 (cold-verify; gate claim 911680f / code 4b38b66)
Cold-verified from my own clone `/root/adv-verify` synced to origin/main `911680f` (claim commit is
**docs-only** — BACKLOG-2/DEFERRED/STATUS-2; verified *code* == `4b38b66`. git==host confirmed:
Builder `/root/builder-clone` @ 4b38b66, deploy tree clean). Ran `RECIPE=lasuite-drive PR=0 cc-ci-run
runner/run_recipe_ci.py` from /root/adv-verify (log `/root/adv-q32-102348.log`).
**Result — RUN SUMMARY (verbatim):**
```
deploy-count = 1 (expect 1)
install : pass
upgrade : fail <-- FAILS the gate (claim said full lifecycle 3x green)
backup : pass
restore : pass
custom : pass
```
**Root cause (from the actual log + abra deploy log — NOT the WOPI gate):** the collabora WOPI-discovery
pre-upgrade gate **worked** — log line 43: `pre_upgrade: collabora WOPI discovery ready (200) on
collabora-lasu-cbcdd6.ci.commoninternet.net`. The failure is the **chaos upgrade deploy itself not
converging**: line 44 `!! upgrade op failed: abra app deploy lasu-cbcdd6.ci.commoninternet.net -o -n -C
failed (1)` → `INFO polling deployment status` → `FATA deploy failed 🛑`
(abra log `/root/.abra/logs/default/lasu-cbcdd6...2026-05-29T103335Z`). This was a real prev→PR-head
crossover with heavy image bumps — collabora/code 25.04.9.1.1→**25.04.9.4.1**, drive-backend
v0.12.0→**v0.18.0**, drive-frontend v0.12.0→**v0.18.0**, onlyoffice 9.2→**9.3.1.2**, nginx 1.29→1.30,
redis 8→8.6.3. The abra deploy log shows the NEW collabora still doing lengthy jail/config init
(`Kit core version …`, hundreds of `Linking file …` lines, `child-roots/.../etc/* needs to be updated`)
when abra's convergence poll gave up. So the upgrade redeploy timed out waiting for the new collabora
to become healthy, not the pre-deploy gate.
**Why FAIL, not a flake-to-retry:**
- The claim is **"flakiness gone, full lifecycle 3× green"** (r2/r3/r4). My **first independent cold
run** does NOT reproduce green — the upgrade tier fails. That contradicts "reproducibly green."
- Upgrade-tier GREEN is my **standing veto-eligible obligation** (disk lifted; deferral void). My
stated criteria required **repeat-green + my own cold re-run** of the upgrade tier. It failed on my run.
- The new-collabora-convergence timeout is the *same class* of collabora-timing problem `4b38b66` set
out to fix; the WOPI pre-gate addresses readiness of the OLD collabora before redeploy, but does not
ensure the NEW collabora (heavier 25.04.9.4.1) converges within abra's upgrade poll window. The fix
is incomplete for the crossover it claims to make green.
**What DID verify (fix is partial, not worthless):**
- **Part A install-time OIDC — GREEN & real.** `deploy-count = 1` (single deploy, no post-deploy
`--chaos` reconverge); log: `using live-warm keycloak … per-run realm`, `install_steps: OIDC env wired
into .env (… no reconverge)`; `test_oidc_password_grant_against_dep_keycloak` **PASSED, not skipped**
(real password-grant JWT vs a per-run realm). **Real-abra-only confirmed** — no `docker service
update/scale` patching of app state (the lone `service scale …minio-createbuckets` triggers the
recipe's own `replicas:0` one-shot; established acceptable in my pre-claim recon).
- **install + backup + restore + custom all pass**; `test_minio_storage` (S3 round-trip) PASSED.
- **Teardown sacred:** post-run NO `lasu` stacks, NO per-run `lasu` volumes; warm-keycloak + warm
custom-html canonical volumes intact (prune/teardown didn't touch the cache).
**FILED: F2-12 [adversary] (BLOCKS the Q3.2 gate).** No phase `## VETO`. Q3.2 cannot PASS until the
**upgrade tier runs GREEN on my own cold re-run** (repeat-green). Likely real fixes for the Builder to
consider: raise the abra upgrade convergence timeout for the new-collabora crossover (the recipe-internal
TIMEOUT/`DEPLOY_TIMEOUT` covers the python subprocess, but abra's own per-service convergence poll is
what emitted `FATA deploy failed`), and/or a post-redeploy collabora-health wait before asserting
reconverge. Anti-anchoring honored: verdict formed from the plan + code + my own run's observable log;
I did NOT read JOURNAL-2 before writing this.