status(2): lasuite-drive Q3.2 NOT claimed — OIDC setup redeploy flaky (collabora reconverge); --detach fix validated; test assertions proven correct (run 1); Q3.2a robustness item added; prune-during-deploy lesson recorded
@@ -75,8 +75,17 @@ Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`
 conditional/deferred sign-off (NOT a §7.1 waiver): upgrade tier stays a veto-eligible open item,
 must run green + cold-verified before Phase-2 DONE once disk grows. Bug fixed en route: `fix(2)`
 `f1c626c` — setup_custom_tests `docker service scale --detach` (the run-once minio-createbuckets
-job made a blocking scale hang the custom tier). Clean unassisted re-run with the fix in flight
-before the formal Q3.2 claim.
+job made a blocking scale hang the custom tier). **NOT CLAIMED — OIDC setup is FLAKY:** the
+step-3 in-place full-stack `abra app deploy --force --chaos` (applies OIDC env) only converges
+sometimes on this heaviest 12-service stack (run 1 OK → OIDC PASS; run 4 FAIL → OIDC SKIP → F2-11
+RED). Test assertions are all correct (run 1 proved health+MinIO+OIDC green); the flakiness is in
+the redeploy infra. **Two open issues block a reliable Q3.2 green:** (a) [Q3.2a] flaky OIDC
+redeploy — see below; (b) upgrade tier disk-blocker (DEFERRED/operator). See JOURNAL-2 2026-05-29.
+- [ ] **Q3.2a** — Make lasuite-drive OIDC wiring reliable. The full 12-service `--chaos` redeploy to
+apply OIDC env exposes collabora's flaky reconverge (+ transient backend gunicorn-perms / celery
+WOPI-404). Fix direction: wire OIDC at INSTALL time (install_steps, no post-deploy redeploy — the
+lasuite-docs model) OR make setup_custom_tests redeploy resilient (retry + wait for collabora WOPI
+discovery 200 before ready). Then re-run subset to a reliable green before claiming Q3.2.
 - [ ] **Q3.3** — lasuite-meet: parity (health_check, oidc_login, meeting_flow, webrtc-media,
 webrtc-relay) + specific (create-a-room, two-user LiveKit token issuance, ICE-candidate gathering).
 - [~] **Q3.4** — cryptpad: parity port (health_check) ✓ + 2 NEW recipe-specific
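
The "resilient redeploy" option named under Q3.2a (retry, then wait for collabora WOPI discovery to answer 200) could look roughly like the sketch below. This is only an illustration, not what `setup_custom_tests.sh` does today: the helper names `retry`, `http_status` and `is_200`, and the variables `APP_DOMAIN` and `COLLABORA_DOMAIN`, are all assumptions. The `/hosting/discovery` path is Collabora Online's usual WOPI discovery endpoint, but the URL the celery `configure_wopi` task actually polls is not shown in this commit.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; none of these names come from setup_custom_tests.sh.

# retry MAX DELAY CMD...: run CMD until it succeeds or MAX attempts are used up.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# http_status URL: print the HTTP status code (curl prints 000 if the request failed).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# is_200 URL: succeed only when URL answers with HTTP 200.
is_200() {
  [ "$(http_status "$1")" = "200" ]
}

# Possible use in the setup script (APP_DOMAIN / COLLABORA_DOMAIN are assumed variables):
#   retry 3 30 abra app deploy --force --chaos "$APP_DOMAIN" || exit 1
#   retry 30 10 is_200 "https://$COLLABORA_DOMAIN/hosting/discovery" || exit 1
```

Retrying the deploy and gating readiness on discovery are kept as separate steps so that a failure log shows which of the two never converged. The attempt counts and delays in the usage comments are placeholders, not tuned values.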
@@ -678,3 +678,39 @@ lasuite-drive's actual Q3.2 CONTENT works: parity health, the real MinIO S3 uplo
 round-trip, and the OIDC password-grant + JWT-claims flow against the dep keycloak. Per §7.1 the
 maximal subset is implemented and only the genuinely-disk-blocked upgrade tier is outstanding —
 pending Adversary sign-off on the env-blocker.
+
+---
+
+## 2026-05-29 — lasuite-drive: --detach fix validated, but OIDC setup redeploy is FLAKY (NOT claiming Q3.2 yet)
+
+Ran lasuite-drive maximal subset (install,backup,restore,custom) four times today:
+- **Run 1** (`ccci-drive-subset.log`): all tiers + all 3 functional GREEN (health, MinIO round-trip,
+OIDC JWT) — but required a manual kill of the hung `docker service scale` (the bug I then fixed with
+`--detach`, commit `f1c626c`). So the test ASSERTIONS are all correct and CAN pass.
+- **Runs 2 & 3** (`-clean`, `-clean2`): corrupted by MY OWN over-eager `docker image prune -f` mid-deploy
+— it removed the just-pulled, not-yet-attached digest-pinned images (drive-frontend, onlyoffice),
+so swarm rejected with "No such image" and install failed/timed out. **LESSON: never
+`docker image prune` during an active deploy — mid-pull images look dangling and get removed.**
+Confirmed self-inflicted: `docker pull lasuite/drive-frontend@sha256:eeef…` succeeded (image is on
+hub), and after seeding it the stack converged. Not a recipe/test issue.
+- **Run 4** (`-clean3`, warm images, hands-off, fixed `--detach`): install/backup/restore all PASS,
+health + MinIO PASS, **but the OIDC test SKIPPED because `setup_custom_tests.sh` exited 1** — its
+step-3 in-place `abra app deploy --force --chaos` (applies the OIDC env) FAILED to converge
+("FATA deploy failed"; abra log shows backend `Permission denied: /.gunicorn` + celery
+`configure_wopi: 404 from collabora discovery url`). Per F2-11 the run correctly went RED (no false
+green) — `custom: pass (1 requires_deps SKIPPED — SSO UNVERIFIED)`, overall=1. The `--detach` fix
+itself works (bucket scale returned, secret inserted v2); the failure is the full-stack redeploy.
+
+**Root finding: the OIDC-wiring step (a full 12-service in-place `--chaos` redeploy) is FLAKY on this
+heaviest stack** — collabora's reconverge race + a transient backend gunicorn-perms/WOPI-404 window
+mean the redeploy succeeds only sometimes (run 1 yes, run 4 no). The OIDC env change only affects
+backend/app, so re-converging collabora/onlyoffice is unnecessary exposure. Fix direction (BACKLOG):
+wire OIDC at INSTALL time (no post-deploy redeploy — like lasuite-docs install_steps), or make the
+setup redeploy resilient (retry / wait for collabora WOPI discovery 200 before declaring ready).
+
+**Decision:** NOT claiming Q3.2 — a flaky OIDC setup is not a reliable green, and claiming would risk
+an Adversary cold-verify FAIL. lasuite-drive stays [~]: test content proven correct (run 1), `--detach`
+bug fixed, two open issues (disk-blocker on upgrade tier [DEFERRED/operator]; flaky OIDC redeploy
+[BACKLOG, needs robustness work]). **Pivoting to lighter recipes for broad Phase-2 progress**;
+lasuite-drive's OIDC robustness + upgrade-disk return later. Host left clean (all stacks torn down,
+disk 65%, infra healthy).
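
The prune-during-deploy lesson from runs 2 and 3 can be turned into a small guard, sketched below under stated assumptions. The function names `list_replicas`, `all_converged` and `safe_prune` are hypothetical, and "every service shows running == desired" is only a heuristic for "no deploy in flight": it does not detect a pull that has started before any service is listed, and a run-once job that is left at `0/1` after finishing (possibly the case for minio-createbuckets) would make the guard refuse forever unless that service is filtered out.

```shell
#!/usr/bin/env bash
# Hypothetical guard; the function names are illustrative, not part of cc-ci.

# list_replicas: one "running/desired" entry per swarm service, e.g. "1/1".
list_replicas() {
  docker service ls --format '{{.Replicas}}'
}

# all_converged: succeed only if every listed service has running == desired.
all_converged() {
  local line pair running desired
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    pair="${line%% *}"        # drop suffixes such as " (max 1 per node)"
    running="${pair%%/*}"
    desired="${pair##*/}"
    [ "$running" = "$desired" ] || return 1
  done <<EOF
$(list_replicas)
EOF
}

# safe_prune: remove dangling images only when no service is still converging.
safe_prune() {
  if all_converged; then
    docker image prune -f
  else
    echo "safe_prune: services still converging, skipping prune" >&2
    return 1
  fi
}
```

Calling `safe_prune` instead of a bare `docker image prune -f` between runs would have skipped the prune while drive-frontend and onlyoffice were still at `0/1`, at the cost of occasionally skipping a prune that would have been harmless.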