status(2): lasuite-drive Q3.2 NOT claimed — OIDC setup redeploy flaky (collabora reconverge); --detach fix validated; test assertions proven correct (run 1); Q3.2a robustness item added; prune-during-deploy lesson recorded

This commit is contained in:
2026-05-29 07:27:50 +01:00
parent 75ae226c0d
commit 426a953c2b
2 changed files with 47 additions and 2 deletions

View File

@ -678,3 +678,39 @@ lasuite-drive's actual Q3.2 CONTENT works: parity health, the real MinIO S3 uplo
round-trip, and the OIDC password-grant + JWT-claims flow against the dep keycloak. Per §7.1 the
maximal subset is implemented and only the genuinely-disk-blocked upgrade tier is outstanding —
pending Adversary sign-off on the env-blocker.
---
## 2026-05-29 — lasuite-drive: --detach fix validated, but OIDC setup redeploy is FLAKY (NOT claiming Q3.2 yet)
Ran lasuite-drive maximal subset (install,backup,restore,custom) four times today:
- **Run 1** (`ccci-drive-subset.log`): all tiers + all 3 functional GREEN (health, MinIO round-trip,
OIDC JWT) — but required a manual kill of the hung `docker service scale` (the bug I then fixed with
`--detach`, commit `f1c626c`). So the test ASSERTIONS are all correct and CAN pass.
- **Runs 2 & 3** (`-clean`, `-clean2`): corrupted by MY OWN over-eager `docker image prune -f` mid-deploy
— it removed the just-pulled, not-yet-attached digest-pinned images (drive-frontend, onlyoffice),
so swarm rejected with "No such image" and install failed/timed out. **LESSON: never
`docker image prune` during an active deploy — mid-pull images look dangling and get removed.**
Confirmed self-inflicted: `docker pull lasuite/drive-frontend@sha256:eeef` succeeded (image is on
hub), and after seeding it the stack converged. Not a recipe/test issue.
- **Run 4** (`-clean3`, warm images, hands-off, fixed `--detach`): install/backup/restore all PASS,
health + MinIO PASS, **but the OIDC test SKIPPED because `setup_custom_tests.sh` exited 1** — its
step-3 in-place `abra app deploy --force --chaos` (applies the OIDC env) FAILED to converge
("FATA deploy failed"; abra log shows backend `Permission denied: /.gunicorn` + celery
`configure_wopi: 404 from collabora discovery url`). Per F2-11 the run correctly went RED (no false
green) — `custom: pass (1 requires_deps SKIPPED — SSO UNVERIFIED)`, overall=1. The `--detach` fix
itself works (bucket scale returned, secret inserted v2); the failure is the full-stack redeploy.
**Root finding: the OIDC-wiring step (a full 12-service in-place `--chaos` redeploy) is FLAKY on this
heaviest stack** — collabora's reconverge race + a transient backend gunicorn-perms/WOPI-404 window
mean the redeploy succeeds only sometimes (run 1 yes, run 4 no). The OIDC env change only affects
backend/app, so re-converging collabora/onlyoffice is unnecessary exposure. Fix direction (BACKLOG):
wire OIDC at INSTALL time (no post-deploy redeploy — like lasuite-docs install_steps), or make the
setup redeploy resilient (retry / wait for collabora WOPI discovery 200 before declaring ready).
**Decision:** NOT claiming Q3.2 — a flaky OIDC setup is not a reliable green, and claiming would risk
an Adversary cold-verify FAIL. lasuite-drive stays [~]: test content proven correct (run 1), `--detach`
bug fixed, two open issues (disk-blocker on upgrade tier [DEFERRED/operator]; flaky OIDC redeploy
[BACKLOG, needs robustness work]). **Pivoting to lighter recipes for broad Phase-2 progress**;
lasuite-drive's OIDC robustness + upgrade-disk return later. Host left clean (all stacks torn down,
disk 65%, infra healthy).