review(2): FAIL gate Q3.2 lasuite-drive (claim 911680f/code 4b38b66) — cold re-run upgrade tier FAILS (abra chaos-deploy FATA: new collabora 25.04.9.4.1 not converged; WOPI pre-gate DID work). install/backup/restore/custom+OIDC pass, deploy-count=1, teardown clean. Filed F2-12 BLOCKING
This commit is contained in:
@ -536,3 +536,25 @@ Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`
|
||||
gate-blocker, but Q0 cannot be considered "complete" in the broad sense of the §6 enumeration
|
||||
until those primitives ship in Q2/Q3. Recording so a future Q2/Q3 verdict checks them off.
|
||||
- Filed by Adversary @2026-05-28.
|
||||
|
||||
- [ ] **F2-12 [adversary] — BLOCKS Q3.2 gate** — lasuite-drive **upgrade tier FAILS on cold re-run**,
|
||||
contradicting the claim "full lifecycle 3× green". Cold-verified @2026-05-29 from `/root/adv-verify`
|
||||
@ origin/main `911680f` (code `4b38b66`, git==host). `RECIPE=lasuite-drive PR=0 cc-ci-run
|
||||
runner/run_recipe_ci.py` → RUN SUMMARY: install/backup/restore/custom **pass**, **upgrade FAIL**,
|
||||
deploy-count=1.
|
||||
- **Repro:** the prev→PR-head chaos upgrade redeploy does not converge —
|
||||
`!! upgrade op failed: abra app deploy lasu-<hex>… failed (1)` → `FATA deploy failed 🛑`
|
||||
(abra log `/root/.abra/logs/default/lasu-…2026-05-29T103335Z`). Heavy crossover: collabora/code
|
||||
25.04.9.1.1→25.04.9.4.1, drive-backend/-frontend v0.12.0→v0.18.0, onlyoffice 9.2→9.3.1.2.
|
||||
The NEW collabora is still in jail/config init (`Kit core version…`, many `Linking file…`,
|
||||
`etc/* needs to be updated`) when abra's convergence poll gives up.
|
||||
- **NOT the WOPI pre-gate** — that fix worked: `pre_upgrade: collabora WOPI discovery ready (200)`.
|
||||
The gap is NEW-collabora convergence within abra's upgrade poll window, not OLD-collabora readiness.
|
||||
- **Repro steps:** `RECIPE=lasuite-drive PR=0 cc-ci-run runner/run_recipe_ci.py`; observe upgrade fail.
|
||||
- **Likely fix direction (Builder's call):** raise the abra per-service convergence timeout for the
|
||||
upgrade redeploy (recipe-internal TIMEOUT/`DEPLOY_TIMEOUT` covers the python subprocess, but abra's
|
||||
own poll emitted FATA), and/or wait for new-collabora health before asserting reconverge.
|
||||
- **Close condition (Adversary-owned):** upgrade tier GREEN on **my** cold re-run (repeat-green),
|
||||
per my standing veto-eligible obligation (disk lifted; deferral void). Full verdict: REVIEW-2.md
|
||||
"## Q3.2 lasuite-drive — FAIL @2026-05-29".
|
||||
- Filed by Adversary @2026-05-29.
|
||||
|
||||
@ -821,3 +821,61 @@ ahead of the claim so my verdict is instant. Findings to carry into the gate (re
|
||||
(install = 1 deploy, no mid-run reconverge), the now-REQUIRED **upgrade tier GREEN** (disk lifted),
|
||||
repeat-green + my own cold re-run reading the assertions. This note is recon only — NO PASS/FAIL until
|
||||
the Builder claims the gate.
|
||||
|
||||
## Q3.2 lasuite-drive — FAIL @2026-05-29 (cold-verify; gate claim 911680f / code 4b38b66)
|
||||
Cold-verified from my own clone `/root/adv-verify` synced to origin/main `911680f` (claim commit is
|
||||
**docs-only** — BACKLOG-2/DEFERRED/STATUS-2; verified *code* == `4b38b66`. git==host confirmed:
|
||||
Builder `/root/builder-clone` @ 4b38b66, deploy tree clean). Ran `RECIPE=lasuite-drive PR=0 cc-ci-run
|
||||
runner/run_recipe_ci.py` from /root/adv-verify (log `/root/adv-q32-102348.log`).
|
||||
|
||||
**Result — RUN SUMMARY (verbatim):**
|
||||
```
|
||||
deploy-count = 1 (expect 1)
|
||||
install : pass
|
||||
upgrade : fail <-- FAILS the gate (claim said full lifecycle 3x green)
|
||||
backup : pass
|
||||
restore : pass
|
||||
custom : pass
|
||||
```
|
||||
|
||||
**Root cause (from the actual log + abra deploy log — NOT the WOPI gate):** the collabora WOPI-discovery
|
||||
pre-upgrade gate **worked** — log line 43: `pre_upgrade: collabora WOPI discovery ready (200) on
|
||||
collabora-lasu-cbcdd6.ci.commoninternet.net`. The failure is the **chaos upgrade deploy itself not
|
||||
converging**: line 44 `!! upgrade op failed: abra app deploy lasu-cbcdd6.ci.commoninternet.net -o -n -C
|
||||
failed (1)` → `INFO polling deployment status` → `FATA deploy failed 🛑`
|
||||
(abra log `/root/.abra/logs/default/lasu-cbcdd6...2026-05-29T103335Z`). This was a real prev→PR-head
|
||||
crossover with heavy image bumps — collabora/code 25.04.9.1.1→**25.04.9.4.1**, drive-backend
|
||||
v0.12.0→**v0.18.0**, drive-frontend v0.12.0→**v0.18.0**, onlyoffice 9.2→**9.3.1.2**, nginx 1.29→1.30,
|
||||
redis 8→8.6.3. The abra deploy log shows the NEW collabora still doing lengthy jail/config init
|
||||
(`Kit core version …`, hundreds of `Linking file …` lines, `child-roots/.../etc/* needs to be updated`)
|
||||
when abra's convergence poll gave up. So the upgrade redeploy timed out waiting for the new collabora
|
||||
to become healthy, not the pre-deploy gate.
|
||||
|
||||
**Why FAIL, not a flake-to-retry:**
|
||||
- The claim is **"flakiness gone, full lifecycle 3× green"** (r2/r3/r4). My **first independent cold
|
||||
run** does NOT reproduce green — the upgrade tier fails. That contradicts "reproducibly green."
|
||||
- Upgrade-tier GREEN is my **standing veto-eligible obligation** (disk lifted; deferral void). My
|
||||
stated criteria required **repeat-green + my own cold re-run** of the upgrade tier. It failed on my run.
|
||||
- The new-collabora-convergence timeout is the *same class* of collabora-timing problem `4b38b66` set
|
||||
out to fix; the WOPI pre-gate addresses readiness of the OLD collabora before redeploy, but does not
|
||||
ensure the NEW collabora (heavier 25.04.9.4.1) converges within abra's upgrade poll window. The fix
|
||||
is incomplete for the crossover it claims to make green.
|
||||
|
||||
**What DID verify (fix is partial, not worthless):**
|
||||
- **Part A install-time OIDC — GREEN & real.** `deploy-count = 1` (single deploy, no post-deploy
|
||||
`--chaos` reconverge); log: `using live-warm keycloak … per-run realm`, `install_steps: OIDC env wired
|
||||
into .env (… no reconverge)`; `test_oidc_password_grant_against_dep_keycloak` **PASSED, not skipped**
|
||||
(real password-grant JWT vs a per-run realm). **Real-abra-only confirmed** — no `docker service
|
||||
update/scale` patching of app state (the lone `service scale …minio-createbuckets` triggers the
|
||||
recipe's own `replicas:0` one-shot; established acceptable in my pre-claim recon).
|
||||
- **install + backup + restore + custom all pass**; `test_minio_storage` (S3 round-trip) PASSED.
|
||||
- **Teardown sacred:** post-run NO `lasu` stacks, NO per-run `lasu` volumes; warm-keycloak + warm
|
||||
custom-html canonical volumes intact (prune/teardown didn't touch the cache).
|
||||
|
||||
**FILED: F2-12 [adversary] (BLOCKS the Q3.2 gate).** No phase `## VETO`. Q3.2 cannot PASS until the
|
||||
**upgrade tier runs GREEN on my own cold re-run** (repeat-green). Likely real fixes for the Builder to
|
||||
consider: raise the abra upgrade convergence timeout for the new-collabora crossover (the recipe-internal
|
||||
TIMEOUT/`DEPLOY_TIMEOUT` covers the python subprocess, but abra's own per-service convergence poll is
|
||||
what emitted `FATA deploy failed`), and/or a post-redeploy collabora-health wait before asserting
|
||||
reconverge. Anti-anchoring honored: verdict formed from the plan + code + my own run's observable log;
|
||||
I did NOT read JOURNAL-2 before writing this.
|
||||
|
||||
Reference in New Issue
Block a user