feat(2): lasuite-drive Q3.2a Part A — wire OIDC at INSTALL, eliminate flaky redeploy
Q3.2a / plan-lasuite-drive-oidc-robustness.md, Part A.

The old setup_custom_tests.sh did a post-deploy, in-place `abra app deploy --force --chaos` of the heavy 12-service stack to apply the OIDC env. That redeploy was flaky (collabora WOPI-discovery race + gunicorn perms; see JOURNAL Step 0). Since the OIDC env only affects backend/app and keycloak is live-warm, provision the per-run realm BEFORE the single deploy and wire OIDC into the .env at install time (no reconverge).

- runner/run_recipe_ci.py: new _provision_deps() helper (warm/cold split + SSO enrich + write $CCCI_DEPS_FILE), used by both paths. New per-recipe OIDC_AT_INSTALL meta flag (added to the _load_meta whitelist). When the flag is set and deps are live-warm: provision BEFORE deploy_app; the install tier's install_steps.sh wires OIDC into the single deploy; the post-deploy step runs only the MinIO bucket one-shot (no re-provision, no redeploy). The legacy post-deploy path is unchanged for all other dep recipes (gated on `not oidc_at_install`).
- tests/lasuite-drive/install_steps.sh (NEW): install-time OIDC env + secret wiring; no-ops on an empty deps file (recipe still boots, OIDC test skips → F2-11 RED).
- tests/lasuite-drive/setup_custom_tests.sh: trimmed to MinIO-bucket-only (OIDC moved out).
- tests/lasuite-drive/recipe_meta.py: OIDC_AT_INSTALL = True.
- JOURNAL-2: Step-0 root-cause failure logs captured before the fix.

NOT a claim: validating 3x green (incl. the now-required upgrade tier) before claiming Q3.2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
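
The runner-side ordering described above can be sketched as a pure planning function. This is a minimal illustration under stated assumptions, not the actual `run_recipe_ci.py` code. `plan_steps` and the step labels are hypothetical. Only the names `OIDC_AT_INSTALL`, `_provision_deps`, `deploy_app`, `install_steps` and `setup_custom_tests` come from this commit. The sketch also assumes that a recipe with the flag set but cold deps falls back to the legacy order.

```python
from typing import List, Mapping


def plan_steps(meta: Mapping[str, object], deps_live_warm: bool) -> List[str]:
    """Return the ordered CI steps for one recipe run (hypothetical sketch)."""
    # Assumption: the flag only takes effect when deps (keycloak) are live-warm.
    oidc_at_install = bool(meta.get("OIDC_AT_INSTALL", False)) and deps_live_warm
    if oidc_at_install:
        # Realm is provisioned first, install_steps writes OIDC into the .env,
        # then the app is deployed once; post-deploy is only the MinIO bucket.
        return ["_provision_deps", "install_steps", "deploy_app", "minio_bucket_oneshot"]
    # Legacy path (`not oidc_at_install`): deploy, then provision, then the
    # post-deploy setup script (which, for lasuite-drive, used to redeploy).
    return ["deploy_app", "_provision_deps", "setup_custom_tests"]
```

Both branches call `deploy_app` once. The difference is that the legacy branch's post-deploy setup may itself redeploy, as lasuite-drive's old setup_custom_tests.sh did. That second reconverge is what Part A removes.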
@@ -796,3 +796,33 @@ LIFTING). After cc-ci is healthy I can:
3. Resume broad heavy-recipe coverage (immich, lasuite-meet) with real disk headroom.

Note: with 70GB, I can also be less aggressive about teardown/prune churn between heavy runs.

---

## 2026-05-29 — lasuite-drive Q3.2a Step 0: root-cause failure logs captured (BEFORE any fix)

Resuming Q3.2a (plan-lasuite-drive-oidc-robustness.md) after Phase 2pc DONE. The Adversary's
cold-verify criterion #1 requires real captured failure logs before any fix. The logs below come
from the flaky run-4 deploy (`/root/.abra/logs/default/lasu-288dfd...2026-05-29T062401Z`, the
`abra app deploy --force --chaos` OIDC-setup redeploy that exited 1 with "FATA deploy failed"):

1. **gunicorn perms race**: `backend [1] [ERROR] Control server error: [Errno 13] Permission
   denied: '/.gunicorn'`. gunicorn tries to create its control-server temp dir under HOME=`/`
   (not writable). (Part B fix: set perms or a writable HOME in the entrypoint before `exec gunicorn`.)
2. **WOPI-discovery race**: `celery RuntimeError: status code 404 return by discovery url for
   wopi client collabora is invalid` at `/app/wopi/tasks/configure_wopi.py:53`. The celery
   `configure_wopi_clients` task hits collabora's discovery URL at boot (06:21:54) while collabora
   is still caching its 132+ l10n files (finishes ~06:24) → 404 → task raises. (Part B fix:
   collabora WOPI healthcheck gating + backend retry/backoff on discovery.)
3. **transient db-not-ready**: `db FATAL: database "drive" does not exist` + celery
   `Could not connect to database: failed to resolve host 'db'`. These are early-boot DNS/init
   races that self-heal and are harmless on a fresh deploy with the full TIMEOUT window.
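
The "backend retry/backoff on discovery" half of the Part B fix for race 2 could look roughly like the sketch below. It is hypothetical: neither `wait_for_discovery` nor `http_status` exists in the recipe. It uses only the Python standard library and treats a 404 or a connection failure as "collabora not ready yet". The default schedule waits 183 s in total, which would cover the roughly two-minute l10n caching window seen in the run-4 log.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Sequence


def http_status(url: str, timeout: float = 5.0) -> int:
    """One GET; return the HTTP status, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 404 while collabora is still caching l10n files
    except OSError:
        return 0  # connection refused, DNS failure or timeout: not ready


def wait_for_discovery(
    fetch_status: Callable[[], int],
    delays: Sequence[float] = (1, 2, 4, 8, 16, 32, 60, 60),
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry with growing delays until fetch_status() returns 200; raise otherwise."""
    status = fetch_status()
    for delay in delays:
        if status == 200:
            return status
        sleep(delay)
        status = fetch_status()
    if status != 200:
        raise RuntimeError(f"discovery still returned {status} after {len(delays)} retries")
    return status
```

In a real setup the poll would target collabora's `/hosting/discovery` endpoint, for example `wait_for_discovery(lambda: http_status("http://collabora:9980/hosting/discovery"))`. The `collabora:9980` host and port are assumptions about this stack. A healthcheck on the same URL (the other half of the Part B fix) would gate startup instead of retrying inside the task.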

**Key observation that shapes the fix:** the FIRST install deploy converges reliably **every** run
(install: pass in runs 1–4, incl. run 4). Only the post-install in-place `--force --chaos` redeploy
(applied to push the OIDC env) is flaky. The OIDC env touches ONLY backend/app, so re-converging
collabora/onlyoffice/minio is unnecessary exposure. → **Part A: wire OIDC into the .env at INSTALL
time (between `abra app new` and the single `abra app deploy`) so the recipe deploys ONCE with OIDC
already set; no post-deploy reconverge.** keycloak is live-warm (always up), so the per-run realm is
a lightweight API call provisioned before the single deploy. Part B (recipe robustness PR) remains
the deeper fix so that ANY reconverge (incl. the upgrade-tier prev→PR-head crossover) is race-free.
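
Part A's install-time wiring (done by `install_steps.sh` in this commit) is sketched below in Python for testability. The JSON deps-file format, the function names and the `.env` key mapping are all hypothetical, and secret handling is left out. The point is the ordering and the no-op: with an empty or missing deps file, the `.env` is left untouched, so the recipe still boots and the OIDC test skips.

```python
import json
import re
from pathlib import Path
from typing import Dict, Mapping


def load_deps(path: Path) -> Dict[str, str]:
    """Read the per-run deps file (JSON assumed); missing or empty means no OIDC."""
    if not path.exists():
        return {}
    text = path.read_text().strip()
    return json.loads(text) if text else {}


def wire_oidc(env_text: str, deps: Mapping[str, str], mapping: Mapping[str, str]) -> str:
    """Set each mapped .env key from deps; return env_text unchanged if deps is empty.

    mapping is {.env key: deps key}; both sides are hypothetical names.
    A matching KEY= line (commented out or not) is replaced in place;
    keys not present in the file are appended.
    """
    if not deps:
        return env_text  # no realm provisioned: deploy without OIDC
    lines = env_text.splitlines()
    for env_key, dep_key in mapping.items():
        new_line = f"{env_key}={deps[dep_key]}"
        pattern = re.compile(rf"^#?\s*{re.escape(env_key)}=")
        for i, line in enumerate(lines):
            if pattern.match(line):
                lines[i] = new_line
                break
        else:
            lines.append(new_line)
    return "\n".join(lines) + "\n"
```

A run would apply `wire_oidc` to the app's `.env` between `abra app new` and the single `abra app deploy`, then write the result back, so the only deploy already sees the OIDC settings. Values are written unquoted here; a real script would need to quote values containing spaces or `#`.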