status(2): lasuite-drive Q3.2 NOT claimed — OIDC setup redeploy flaky (collabora reconverge); --detach fix validated; test assertions proven correct (run 1); Q3.2a robustness item added; prune-during-deploy lesson recorded
This commit is contained in:
@ -678,3 +678,39 @@ lasuite-drive's actual Q3.2 CONTENT works: parity health, the real MinIO S3 uplo
|
||||
round-trip, and the OIDC password-grant + JWT-claims flow against the dep keycloak. Per §7.1 the
|
||||
maximal subset is implemented and only the genuinely-disk-blocked upgrade tier is outstanding —
|
||||
pending Adversary sign-off on the env-blocker.
|
||||
|
||||
---
|
||||
|
||||
## 2026-05-29 — lasuite-drive: --detach fix validated, but OIDC setup redeploy is FLAKY (NOT claiming Q3.2 yet)
|
||||
|
||||
Ran lasuite-drive maximal subset (install,backup,restore,custom) four times today:
|
||||
- **Run 1** (`ccci-drive-subset.log`): all tiers + all 3 functional GREEN (health, MinIO round-trip,
|
||||
OIDC JWT) — but required a manual kill of the hung `docker service scale` (the bug I then fixed with
|
||||
`--detach`, commit `f1c626c`). So the test ASSERTIONS are all correct and CAN pass.
|
||||
- **Runs 2 & 3** (`-clean`, `-clean2`): corrupted by MY OWN over-eager `docker image prune -f` mid-deploy
|
||||
— it removed the just-pulled, not-yet-attached digest-pinned images (drive-frontend, onlyoffice),
|
||||
so swarm rejected with "No such image" and install failed/timed out. **LESSON: never
|
||||
`docker image prune` during an active deploy — mid-pull images look dangling and get removed.**
|
||||
Confirmed self-inflicted: `docker pull lasuite/drive-frontend@sha256:eeef…` succeeded (image is on
|
||||
hub), and after seeding it the stack converged. Not a recipe/test issue.
|
||||
- **Run 4** (`-clean3`, warm images, hands-off, fixed `--detach`): install/backup/restore all PASS,
|
||||
health + MinIO PASS, **but the OIDC test SKIPPED because `setup_custom_tests.sh` exited 1** — its
|
||||
step-3 in-place `abra app deploy --force --chaos` (applies the OIDC env) FAILED to converge
|
||||
("FATA deploy failed"; abra log shows backend `Permission denied: /.gunicorn` + celery
|
||||
`configure_wopi: 404 from collabora discovery url`). Per F2-11 the run correctly went RED (no false
|
||||
green) — `custom: pass (1 requires_deps SKIPPED — SSO UNVERIFIED)`, overall=1. The `--detach` fix
|
||||
itself works (bucket scale returned, secret inserted v2); the failure is the full-stack redeploy.
|
||||
|
||||
**Root finding: the OIDC-wiring step (a full 12-service in-place `--chaos` redeploy) is FLAKY on this
|
||||
heaviest stack** — collabora's reconverge race + a transient backend gunicorn-perms/WOPI-404 window
|
||||
mean the redeploy succeeds only sometimes (run 1 yes, run 4 no). The OIDC env change only affects
|
||||
backend/app, so re-converging collabora/onlyoffice is unnecessary exposure. Fix direction (BACKLOG):
|
||||
wire OIDC at INSTALL time (no post-deploy redeploy — like lasuite-docs install_steps), or make the
|
||||
setup redeploy resilient (retry / wait for collabora WOPI discovery 200 before declaring ready).
|
||||
|
||||
**Decision:** NOT claiming Q3.2 — a flaky OIDC setup is not a reliable green, and claiming would risk
|
||||
an Adversary cold-verify FAIL. lasuite-drive stays [~]: test content proven correct (run 1), `--detach`
|
||||
bug fixed, two open issues (disk-blocker on upgrade tier [DEFERRED/operator]; flaky OIDC redeploy
|
||||
[BACKLOG, needs robustness work]). **Pivoting to lighter recipes for broad Phase-2 progress**;
|
||||
lasuite-drive's OIDC robustness + upgrade-disk return later. Host left clean (all stacks torn down,
|
||||
disk 65%, infra healthy).
|
||||
|
||||
Reference in New Issue
Block a user