claim(2): Q3.2 lasuite-drive — full lifecycle 3x green via install-time OIDC + collabora-ready upgrade gate

3× repeat-green (logs /root/ccci-drive-q32a-{r2,r3,r4}.log): install+upgrade+backup+restore+custom all
pass, OIDC password-grant PASSED (not skip), deploy-count=1, clean teardown each run. Resolves the
Adversary's standing veto-eligible obligation (lasuite-drive upgrade tier GREEN + reliable OIDC).

Fixes: install-time OIDC wiring (a151489: _provision_deps before single deploy + OIDC_AT_INSTALL +
install_steps.sh) eliminated the flaky post-deploy --chaos reconverge; collabora-WOPI-ready upgrade
gate + DEPLOY_TIMEOUT plumbing (4b38b66) fixed the upgrade tier (the chaos redeploy was killing a
still-booting collabora, exit 70). Gate evidence + cold-verify HOW/EXPECTED/WHERE in STATUS-2.md. BACKLOG-2 Q3.2/Q3.2a ticked;
DEFERRED.md disk follow-on noted done.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 11:16:18 +01:00
parent 5e0af07b86
commit 911680f843
3 changed files with 70 additions and 11 deletions


@@ -63,7 +63,12 @@ Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`
lasuite-docs OIDC env wiring (install_steps.sh wires dep keycloak's client_secret +
OIDC env into lasuite-docs's .env at install time). Documented in tests/lasuite-docs/
PARITY.md.
- [~] **Q3.2** — lasuite-drive: enrolled (mirrored). Maximal testable subset GREEN @2026-05-29
- [x] **Q3.2** — lasuite-drive: **FULL LIFECYCLE 3× GREEN @2026-05-29 — CLAIMED (STATUS-2 Gate Q3.2),
awaiting Adversary.** install+upgrade+backup+restore+custom all pass; OIDC password-grant PASSED
(not skip); deploy-count=1; clean teardown; data-integrity (ci_marker) survives upgrade +
backup/restore. Fixed via install-time OIDC (commit `a151489`) + collabora-ready upgrade gate +
DEPLOY_TIMEOUT plumbing (commit `4b38b66`). Logs r2/r3/r4. Original [~] detail retained below.
- [~] **Q3.2 (original)** — lasuite-drive: enrolled (mirrored). Maximal testable subset GREEN @2026-05-29
(`/root/ccci-drive-subset.log`): install (generic+cc-ci test_serving_and_frontend) + backup
(P4 test_backup_captures_state) + restore (P4 test_restore_returns_state) + custom — all 3
functional PASS: test_health_check (parity), test_minio_storage (real S3 upload→list→download→
@@ -80,7 +85,15 @@ Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`
RED). Test assertions are all correct (run 1 proved health+MinIO+OIDC green); the flakiness is in
the redeploy infra. **Two open issues block a reliable Q3.2 green:** (a) [Q3.2a] flaky OIDC
redeploy — see below; (b) upgrade tier disk-blocker (DEFERRED/operator). See JOURNAL-2 2026-05-29.
- [ ] **Q3.2a** — Make lasuite-drive OIDC wiring reliable. **PLAN:**
- [x] **Q3.2a** — **DONE @2026-05-29 (Part A + harness upgrade gate; claimed under Q3.2).** Part A
(install-time OIDC, deploy-once, no mid-run reconverge — real abra only) landed `a151489`;
Step 0 root-cause logs captured (JOURNAL-2). The upgrade-tier flakiness (collabora killed
mid-boot by the chaos redeploy) was fixed in the **harness** via a collabora-WOPI-ready gate in
`pre_upgrade` + DEPLOY_TIMEOUT plumbing (`4b38b66`) — 3× repeat-green, so **Part B (recipe PR)
is NOT required for CI green**. (Part B remains an optional upstream-robustness improvement; may
file separately. The `--chaos` reconverge is now race-free because it replaces a fully-ready
collabora.) Original plan detail retained below.
- [~] **Q3.2a (original plan)** — Make lasuite-drive OIDC wiring reliable. **PLAN:**
`cc-ci-plan/plan-lasuite-drive-oidc-robustness.md` (orchestrator, 2026-05-29). The full
12-service `--chaos` redeploy to apply OIDC env exposes collabora's flaky reconverge (+ transient
backend gunicorn-perms / WOPI-404). Structured as: **Step 0** capture real failure logs first;


@@ -165,6 +165,11 @@ before the build is called done) — but does **not** force closure.
run lasuite-drive's FULL lifecycle incl. the upgrade tier GREEN + Adversary cold-verify for the
Q3.2 gate (per the Adversary, the upgrade tier is no longer validly deferrable); then re-confirm
immich/lasuite-meet/lasuite-docs upgrade tiers. Tracked under BACKLOG-2 Q3.2.
**UPDATE @2026-05-29:** lasuite-drive full lifecycle (incl. upgrade tier) is now **3× green**
(commits `a151489` install-time OIDC + `4b38b66` collabora-ready upgrade gate; logs r2/r3/r4);
Q3.2 CLAIMED, awaiting Adversary. The upgrade tier converged cleanly at 64G disk with the
collabora-ready gate (the old 28GB pull-overflow concern below is moot at 64G). Remaining
follow-on: re-confirm immich/lasuite-meet/lasuite-docs upgrade tiers when those recipes' gates run.
- [ ] **What:** The upgrade tier for the heaviest recipes cannot complete on the 28GB host. Proven
on **lasuite-drive**: the prev→PR-head chaos upgrade crosses two multi-GB office image versions
at once — onlyoffice/documentserver-de `9.2 → 9.3.1.2` (3.94GB each) + collabora/code


@@ -49,15 +49,9 @@ tree must carry:
- **Q5** — Completeness + docs; flip `## DONE`.
## In flight
**Q3.2a — lasuite-drive OIDC robustness (Part A) — VALIDATING (not claimed).**
Phase 2pc DONE; resumed the Adversary's standing veto-eligible obligation (lasuite-drive upgrade
tier GREEN + reliable OIDC). Step 0 root-cause logs captured (JOURNAL-2; gunicorn `/.gunicorn`
perms + celery WOPI-404 vs collabora boot). Part A landed @ commit `a151489`: OIDC wired at INSTALL
time (provision warm-keycloak realm BEFORE the single deploy; `tests/lasuite-drive/install_steps.sh`
writes OIDC env into the one deploy) — the flaky post-deploy `--force --chaos` 12-service reconverge
is gone. Running the full suite (install+upgrade+backup+restore+custom) on cc-ci to confirm 3× green
incl. the now-required upgrade tier before claiming Q3.2. First run in flight @
`/root/ccci-drive-q32a-r1.log` (install-time OIDC confirmed wired in log head).
**Q3.2 lasuite-drive — CLAIMED (Gate: Q3.2 below), awaiting Adversary.** Full lifecycle 3× green
(install+upgrade+backup+restore+custom incl. OIDC). Resolves the Adversary's standing veto-eligible
obligation (upgrade tier GREEN + reliable OIDC). Working on the next unblocked item in the meantime.
---
**Q3 + Q4 — recipe enrollment sprint.** After capacity unblock + Adversary checkpoint, landed:
@@ -113,6 +107,53 @@ SKIP no longer yields a GREEN `!testme`.
straight-line read→sum→predicate→overall wiring is unexercised by a live deploy.
## Gate
**Gate: Q3.2 lasuite-drive — CLAIMED @2026-05-29, awaiting Adversary.**
**WHAT.** lasuite-drive (the heaviest Phase-2 stack: 12 services incl. collabora + onlyoffice +
minio/S3 + postgres, OIDC-dependent) now runs its **full lifecycle GREEN, repeatably** — install +
upgrade (prev→PR-head chaos crossover) + backup + restore + custom (health + MinIO round-trip + OIDC
password-grant), via **two fixes**:
1. **Install-time OIDC wiring** (commit `a151489`) — the orchestrator provisions the per-run realm on
the live-warm keycloak BEFORE the single `abra app deploy`, and `tests/lasuite-drive/install_steps.sh`
writes the OIDC env + client secret into that one deploy. This **eliminates the flaky post-deploy
`--force --chaos` 12-service reconverge** the old `setup_custom_tests.sh` did (collabora WOPI-discovery
race; JOURNAL Step 0). New per-recipe `OIDC_AT_INSTALL` meta flag + reusable `_provision_deps()`
helper; legacy post-deploy path unchanged for all other dep recipes (gated on `not oidc_at_install`).
2. **collabora-ready upgrade gate + DEPLOY_TIMEOUT plumbing** (commit `4b38b66`) — `ops.py::pre_upgrade`
waits for collabora WOPI discovery (`/hosting/discovery` on `collabora-<domain>`) → 200 BEFORE the
chaos redeploy, so it no longer SIGTERMs a still-booting collabora (which caused exit 70 / "FATA
deploy failed" in run 1); `DEPLOY_TIMEOUT` is now passed through to the upgrade `chaos_redeploy`,
which previously fell back to abra's 900s default even though the .env internal TIMEOUT was 1500s.
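
The readiness gate in fix 2 can be pictured as a bounded poll that only lets the redeploy proceed once the discovery endpoint answers 200. The following is a minimal sketch of that idea, not the code in `ops.py::pre_upgrade`: the names `http_status` and `wait_until_ready`, the polling interval, and the `https://` scheme in the usage note are all assumptions made for illustration.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status of a GET on url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, just not with a success status (e.g. 404, 503).
        return exc.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, DNS failure, timeout, broken response: not ready yet.
        return None


def wait_until_ready(
    url: str,
    deadline_s: float,
    interval_s: float = 5.0,
    probe: Callable[[str], Optional[int]] = http_status,  # hypothetical hook, eases testing
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns 200 or the deadline would be exceeded."""
    stop_at = time.monotonic() + deadline_s
    while True:
        if probe(url) == 200:
            return True
        if time.monotonic() + interval_s > stop_at:
            return False  # caller decides whether to abort or redeploy anyway
        sleep(interval_s)
```

A caller would run something like `wait_until_ready("https://collabora-<domain>/hosting/discovery", deadline_s=600)` before triggering the chaos redeploy. The 600-second deadline here is illustrative only and is not taken from the harness.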
**HOW (Adversary, cold, on cc-ci):**
```
ssh cc-ci 'cd /root/<your-clone> && git pull && RECIPE=lasuite-drive PR=0 cc-ci-run runner/run_recipe_ci.py'
```
**EXPECTED:**
- RUN SUMMARY: `deploy-count = 1 (expect 1)`; `install/upgrade/backup/restore/custom` **all `pass`**.
- `tests/lasuite-drive/functional/test_oidc_with_keycloak.py::test_oidc_password_grant_against_dep_keycloak`
**PASSED** (NOT skipped) — real password-grant JWT against a per-run realm on warm keycloak.
- `test_minio_storage` PASSED (real S3 upload→list→cat readback round-trip inside the minio container).
- Data integrity: `test_upgrade_preserves_data` (ci_marker survives the prev→PR-head chaos crossover);
ci_marker also survives backup/restore.
- Log shows `install-time OIDC: deps provisioned` + `install_steps: OIDC env wired` (no post-deploy
reconverge) and `pre_upgrade: collabora WOPI discovery ready (200)` before the upgrade redeploy.
- Clean teardown: post-run `docker stack ls | grep lasu` and `docker volume ls | grep lasu` both empty.
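
The log-based expectations above lend themselves to a quick mechanical check before reading the log by hand. This is a hedged sketch, not part of the harness: `missing_markers` and `outcome_of` are hypothetical helpers, and the sketch assumes the marker strings appear verbatim in the run log and that test outcomes are printed in pytest's verbose `name OUTCOME` form.

```python
import re
from typing import List, Optional

# Marker strings quoted from the EXPECTED list; assumed to appear verbatim in the log.
REQUIRED_MARKERS = [
    "deploy-count = 1 (expect 1)",
    "install-time OIDC: deps provisioned",
    "install_steps: OIDC env wired",
    "pre_upgrade: collabora WOPI discovery ready (200)",
]

OIDC_TEST = "test_oidc_password_grant_against_dep_keycloak"


def missing_markers(log_text: str, markers: List[str] = REQUIRED_MARKERS) -> List[str]:
    """Return the expected markers that do not occur anywhere in the log."""
    return [marker for marker in markers if marker not in log_text]


def outcome_of(log_text: str, test_name: str) -> Optional[str]:
    """Return the first outcome printed after test_name, or None if absent.

    Assumes pytest verbose output, e.g. 'path::test_name PASSED [ 33%]'.
    """
    pattern = re.compile(
        re.escape(test_name) + r"\s+(PASSED|FAILED|SKIPPED|ERROR|XFAIL|XPASS)"
    )
    match = pattern.search(log_text)
    return match.group(1) if match else None
```

For a cold verify, reading a run log into a string and requiring `missing_markers(text) == []` together with `outcome_of(text, OIDC_TEST) == "PASSED"` distinguishes a real pass from a skip. The teardown check still needs the `docker stack ls` and `docker volume ls` commands on the host.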
**WHERE.** Commits `a151489` (Part A) + `4b38b66` (upgrade gate). Files: `runner/run_recipe_ci.py`
(`_provision_deps`, `OIDC_AT_INSTALL` branch, `_perform_op` timeout), `runner/harness/lifecycle.py`
(`chaos_redeploy` timeout), `runner/harness/generic.py` (`perform_upgrade` timeout),
`tests/lasuite-drive/{install_steps.sh,setup_custom_tests.sh,ops.py,recipe_meta.py}`.
**3× repeat-green** (flakiness gone, not merely absent once): `/root/ccci-drive-q32a-r2.log`,
`…-r3.log`, `…-r4.log` — each full-suite green, deploy-count=1, OIDC PASSED, clean teardown
(run 1 `…-r1.log` showed the upgrade-tier failure that `4b38b66` fixed). Step-0 root-cause logs in
JOURNAL-2 (2026-05-29). DEFERRED.md disk-blocker entry CLOSED (host grew to 64G); flaky-OIDC
BACKLOG-2 Q3.2a item now resolved.
---
**Gate: Q2 — Adversary PASS @2026-05-28** (REVIEW-2 `## Q2 — PASS @2026-05-28 (re-verify after
F2-5 fix + F2-6 collateral resolution)`; cold e2e on `/root/adv-verify` HEAD `874bfbb`:
deploy-count=2, all 5 assertions PASS, DEPS teardown clean, post-run docker stack/volume/secret