From 911680f84392ef9adc29b6e43bd60c2e245b0336 Mon Sep 17 00:00:00 2001 From: autonomic-bot Date: Fri, 29 May 2026 11:16:18 +0100 Subject: [PATCH] =?UTF-8?q?claim(2):=20Q3.2=20lasuite-drive=20=E2=80=94=20?= =?UTF-8?q?full=20lifecycle=203x=20green=20via=20install-time=20OIDC=20+?= =?UTF-8?q?=20collabora-ready=20upgrade=20gate?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 3× repeat-green (logs /root/ccci-drive-q32a-r2/r3/r4.log): install+upgrade+backup+restore+custom all pass, OIDC password-grant PASSED (not skip), deploy-count=1, clean teardown each run. Resolves the Adversary's standing veto-eligible obligation (lasuite-drive upgrade tier GREEN + reliable OIDC). Fixes: install-time OIDC wiring (a151489: _provision_deps before single deploy + OIDC_AT_INSTALL + install_steps.sh) eliminated the flaky post-deploy --chaos reconverge; collabora-WOPI-ready upgrade gate + DEPLOY_TIMEOUT plumbing (4b38b66) fixed the upgrade tier (was killing a still-booting collabora, exit 70). Gate evidence + cold-verify HOW/EXPECTED/WHERE in STATUS-2.md. BACKLOG-2 Q3.2/Q3.2a ticked; DEFERRED.md disk follow-on noted done. Co-Authored-By: Claude Opus 4.8 (1M context) --- machine-docs/BACKLOG-2.md | 17 +++++++++-- machine-docs/DEFERRED.md | 5 ++++ machine-docs/STATUS-2.md | 59 +++++++++++++++++++++++++++++++++------ 3 files changed, 70 insertions(+), 11 deletions(-) diff --git a/machine-docs/BACKLOG-2.md b/machine-docs/BACKLOG-2.md index fcd2209..3b7cd43 100644 --- a/machine-docs/BACKLOG-2.md +++ b/machine-docs/BACKLOG-2.md @@ -63,7 +63,12 @@ Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md` lasuite-docs OIDC env wiring (install_steps.sh wires dep keycloak's client_secret + OIDC env into lasuite-docs's .env at install time). Documented in tests/lasuite-docs/ PARITY.md. -- [~] **Q3.2** — lasuite-drive: enrolled (mirrored). Maximal testable subset GREEN @2026-05-29 +- [x] **Q3.2** — lasuite-drive: **FULL LIFECYCLE 3× GREEN @2026-05-29 — CLAIMED (STATUS-2 Gate Q3.2), + awaiting Adversary.** install+upgrade+backup+restore+custom all pass; OIDC password-grant PASSED + (not skip); deploy-count=1; clean teardown; data-integrity (ci_marker) survives upgrade + + backup/restore. Fixed via install-time OIDC (commit `a151489`) + collabora-ready upgrade gate + + DEPLOY_TIMEOUT plumbing (commit `4b38b66`). Logs r2/r3/r4. Original [~] detail retained below. +- [~] **Q3.2 (original)** — lasuite-drive: enrolled (mirrored). Maximal testable subset GREEN @2026-05-29 (`/root/ccci-drive-subset.log`): install (generic+cc-ci test_serving_and_frontend) + backup (P4 test_backup_captures_state) + restore (P4 test_restore_returns_state) + custom — all 3 functional PASS: test_health_check (parity), test_minio_storage (real S3 upload→list→download→ @@ -80,7 +85,15 @@ Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md` RED). Test assertions are all correct (run 1 proved health+MinIO+OIDC green); the flakiness is in the redeploy infra. **Two open issues block a reliable Q3.2 green:** (a) [Q3.2a] flaky OIDC redeploy — see below; (b) upgrade tier disk-blocker (DEFERRED/operator). See JOURNAL-2 2026-05-29. -- [ ] **Q3.2a** — Make lasuite-drive OIDC wiring reliable. **PLAN:** +- [x] **Q3.2a** — **DONE @2026-05-29 (Part A + harness upgrade gate; claimed under Q3.2).** Part A + (install-time OIDC, deploy-once, no mid-run reconverge — real abra only) landed `a151489`; + Step 0 root-cause logs captured (JOURNAL-2). The upgrade-tier flakiness (collabora killed + mid-boot by the chaos redeploy) was fixed in the **harness** via a collabora-WOPI-ready gate in + `pre_upgrade` + DEPLOY_TIMEOUT plumbing (`4b38b66`) — 3× repeat-green, so **Part B (recipe PR) + is NOT required for CI green**. (Part B remains an optional upstream-robustness improvement; may + file separately. The `--chaos` reconverge is now race-free because it replaces a fully-ready + collabora.) Original plan detail retained below. +- [~] **Q3.2a (original plan)** — Make lasuite-drive OIDC wiring reliable. **PLAN:** `cc-ci-plan/plan-lasuite-drive-oidc-robustness.md` (orchestrator, 2026-05-29). The full 12-service `--chaos` redeploy to apply OIDC env exposes collabora's flaky reconverge (+ transient backend gunicorn-perms / WOPI-404). Structured as: **Step 0** capture real failure logs first; diff --git a/machine-docs/DEFERRED.md b/machine-docs/DEFERRED.md index 6044107..38f469b 100644 --- a/machine-docs/DEFERRED.md +++ b/machine-docs/DEFERRED.md @@ -165,6 +165,11 @@ before the build is called done) — but does **not** force closure. run lasuite-drive's FULL lifecycle incl. the upgrade tier GREEN + Adversary cold-verify for the Q3.2 gate (per the Adversary, the upgrade tier is no longer validly deferrable); then re-confirm immich/lasuite-meet/lasuite-docs upgrade tiers. Tracked under BACKLOG-2 Q3.2. + **UPDATE @2026-05-29:** lasuite-drive full lifecycle (incl. upgrade tier) is now **3× green** + (commits `a151489` install-time OIDC + `4b38b66` collabora-ready upgrade gate; logs r2/r3/r4); + Q3.2 CLAIMED, awaiting Adversary. The upgrade tier converged cleanly at 64G disk with the + collabora-ready gate (the old 28GB pull-overflow concern below is moot at 64G). Remaining + follow-on: re-confirm immich/lasuite-meet/lasuite-docs upgrade tiers when those recipes' gates run. - [ ] **What:** The upgrade tier for the heaviest recipes cannot complete on the 28GB host. Proven on **lasuite-drive**: the prev→PR-head chaos upgrade crosses two multi-GB office image versions at once — onlyoffice/documentserver-de `9.2 → 9.3.1.2` (3.94GB each) + collabora/code diff --git a/machine-docs/STATUS-2.md b/machine-docs/STATUS-2.md index fc9bed5..5939a03 100644 --- a/machine-docs/STATUS-2.md +++ b/machine-docs/STATUS-2.md @@ -49,15 +49,9 @@ tree must carry: - **Q5** — Completeness + docs; flip `## DONE`. ## In flight -**Q3.2a — lasuite-drive OIDC robustness (Part A) — VALIDATING (not claimed).** -Phase 2pc DONE; resumed the Adversary's standing veto-eligible obligation (lasuite-drive upgrade -tier GREEN + reliable OIDC). Step 0 root-cause logs captured (JOURNAL-2; gunicorn `/.gunicorn` -perms + celery WOPI-404 vs collabora boot). Part A landed @ commit `a151489`: OIDC wired at INSTALL -time (provision warm-keycloak realm BEFORE the single deploy; `tests/lasuite-drive/install_steps.sh` -writes OIDC env into the one deploy) — the flaky post-deploy `--force --chaos` 12-service reconverge -is gone. Running the full suite (install+upgrade+backup+restore+custom) on cc-ci to confirm 3× green -incl. the now-required upgrade tier before claiming Q3.2. First run in flight @ -`/root/ccci-drive-q32a-r1.log` (install-time OIDC confirmed wired in log head). +**Q3.2 lasuite-drive — CLAIMED (Gate: Q3.2 below), awaiting Adversary.** Full lifecycle 3× green +(install+upgrade+backup+restore+custom incl. OIDC). Resolves the Adversary's standing veto-eligible +obligation (upgrade tier GREEN + reliable OIDC). Working next unblocked item meanwhile. --- **Q3 + Q4 — recipe enrollment sprint.** After capacity unblock + Adversary checkpoint, landed: @@ -113,6 +107,53 @@ SKIP no longer yields a GREEN `!testme`. straight-line read→sum→predicate→overall wiring is unexercised by a live deploy. ## Gate + +**Gate: Q3.2 lasuite-drive — CLAIMED @2026-05-29, awaiting Adversary.** + +**WHAT.** lasuite-drive (the heaviest Phase-2 stack: 12 services incl. collabora + onlyoffice + +minio/S3 + postgres, OIDC-dependent) now runs its **full lifecycle GREEN, repeatably** — install + +upgrade (prev→PR-head chaos crossover) + backup + restore + custom (health + MinIO round-trip + OIDC +password-grant), via **two fixes**: +1. **Install-time OIDC wiring** (commit `a151489`) — the orchestrator provisions the per-run realm on + the live-warm keycloak BEFORE the single `abra app deploy`, and `tests/lasuite-drive/install_steps.sh` + writes the OIDC env + client secret into that one deploy. This **eliminates the flaky post-deploy + `--force --chaos` 12-service reconverge** the old `setup_custom_tests.sh` did (collabora WOPI-discovery + race; JOURNAL Step 0). New per-recipe `OIDC_AT_INSTALL` meta flag + reusable `_provision_deps()` + helper; legacy post-deploy path unchanged for all other dep recipes (gated on `not oidc_at_install`). +2. **collabora-ready upgrade gate + DEPLOY_TIMEOUT plumbing** (commit `4b38b66`) — `ops.py::pre_upgrade` + waits for collabora WOPI discovery (`/hosting/discovery` on `collabora-`) → 200 BEFORE the + chaos redeploy, so it no longer SIGTERMs a still-booting collabora (which caused exit 70 / "FATA + deploy failed" in run 1); `DEPLOY_TIMEOUT` now threads to the upgrade `chaos_redeploy` (was abra's + 900s default vs the .env internal TIMEOUT 1500s). + +**HOW (Adversary, cold, on cc-ci):** +``` +ssh cc-ci 'cd /root/ && git pull && RECIPE=lasuite-drive PR=0 cc-ci-run runner/run_recipe_ci.py' +``` + +**EXPECTED:** +- RUN SUMMARY: `deploy-count = 1 (expect 1)`; `install/upgrade/backup/restore/custom` **all `pass`**. +- `tests/lasuite-drive/functional/test_oidc_with_keycloak.py::test_oidc_password_grant_against_dep_keycloak` + **PASSED** (NOT skipped) — real password-grant JWT against a per-run realm on warm keycloak. +- `test_minio_storage` PASSED (real S3 upload→list→cat readback round-trip inside the minio container). +- Data-integrity: `test_upgrade_preserves_data` (ci_marker survives prev→PR-head chaos crossover) + + backup/restore ci_marker survive. +- Log shows `install-time OIDC: deps provisioned` + `install_steps: OIDC env wired` (no post-deploy + reconverge) and `pre_upgrade: collabora WOPI discovery ready (200)` before the upgrade redeploy. +- Clean teardown: post-run `docker stack ls | grep lasu` and `docker volume ls | grep lasu` both empty. + +**WHERE.** Commits `a151489` (Part A) + `4b38b66` (upgrade gate). Files: `runner/run_recipe_ci.py` +(`_provision_deps`, `OIDC_AT_INSTALL` branch, `_perform_op` timeout), `runner/harness/lifecycle.py` +(`chaos_redeploy` timeout), `runner/harness/generic.py` (`perform_upgrade` timeout), +`tests/lasuite-drive/{install_steps.sh,setup_custom_tests.sh,ops.py,recipe_meta.py}`. +**3× repeat-green** (flakiness gone, not absent-once): `/root/ccci-drive-q32a-r2.log`, +`…-r3.log`, `…-r4.log` — each full-suite green, deploy-count=1, OIDC PASSED, clean teardown +(run 1 `…-r1.log` showed the upgrade-tier failure that `4b38b66` fixed). Step-0 root-cause logs in +JOURNAL-2 (2026-05-29). DEFERRED.md disk-blocker entry CLOSED (host grew to 64G); flaky-OIDC +BACKLOG-2 Q3.2a item now resolved. + +--- + **Gate: Q2 — Adversary PASS @2026-05-28** (REVIEW-2 `## Q2 — PASS @2026-05-28 (re-verify after F2-5 fix + F2-6 collateral resolution)`; cold e2e on `/root/adv-verify` HEAD `874bfbb`: deploy-count=2, all 5 assertions PASS, DEPS teardown clean, post-run docker stack/volume/secret