From 7a87dc02b10ae00ad6ec92e4fa0289093ac17f93 Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Fri, 29 May 2026 12:58:36 +0100
Subject: [PATCH] plan: lasuite-drive recipe-robustness PR sub-plan (collabora healthcheck + perms + lazy OIDC)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Operator (2026-05-29): dedicated sub-plan for the upstream recipe PR. Fixes
collabora WOPI healthcheck/start_period (keystone — fixes F2-12 at the source
so cc-ci can return to abra-native convergence + drop the -c/READY_PROBE
backstop), backend WOPI retry, gunicorn-perms race, lazy OIDC. PR is 'working'
only when cc-ci runs the full suite incl. upgrade tier green + Adversary
cold-verify, then operator merges. Broken out from
plan-lasuite-drive-oidc-robustness.md Part B (now points here).

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .../plan-lasuite-drive-oidc-robustness.md     |  4 ++
 cc-ci-plan/plan-lasuite-drive-recipe-pr.md    | 72 +++++++++++++++++++
 2 files changed, 76 insertions(+)
 create mode 100644 cc-ci-plan/plan-lasuite-drive-recipe-pr.md

diff --git a/cc-ci-plan/plan-lasuite-drive-oidc-robustness.md b/cc-ci-plan/plan-lasuite-drive-oidc-robustness.md
index 686ad6e..8155d53 100644
--- a/cc-ci-plan/plan-lasuite-drive-oidc-robustness.md
+++ b/cc-ci-plan/plan-lasuite-drive-oidc-robustness.md
@@ -52,6 +52,10 @@ dep is already running, so configure OIDC **before the single deploy**:
 
 ## Part B — lasuite-drive recipe PR (root-cause robustness, we're maintainers)
 
+> **Broken out into its own sub-plan: `plan-lasuite-drive-recipe-pr.md`** (operator, 2026-05-29) —
+> it now also folds in the F2-12 upgrade-convergence fix (collabora healthcheck) so cc-ci can return
+> to abra-native convergence. The summary below is retained; the sub-plan is authoritative.
+
 Fix the fragility in the recipe itself so it's robust under ANY reconverge — this helps real
 operators, not just CI, and is the real payoff of maintaining the recipe.
 On a branch of the lasuite-drive recipe (recipe-maintainer):
diff --git a/cc-ci-plan/plan-lasuite-drive-recipe-pr.md b/cc-ci-plan/plan-lasuite-drive-recipe-pr.md
new file mode 100644
index 0000000..458a587
--- /dev/null
+++ b/cc-ci-plan/plan-lasuite-drive-recipe-pr.md
@@ -0,0 +1,72 @@
+# Sub-plan — lasuite-drive recipe robustness PR (fix the root cause upstream)
+
+**Status:** QUEUED — a **recipe-maintainer PR to the lasuite-drive recipe** (we maintain it). Picks up
+after the Q3.2 lasuite-drive test work settles. Complements — and largely **obsoletes** — the
+CI-side workarounds the harness currently uses for lasuite-drive's fragility.
+**Owner:** Builder + Adversary loops. **This file:** `/srv/cc-ci/cc-ci-plan/plan-lasuite-drive-recipe-pr.md`
+**Relationship:** this is the **recipe-side** deliverable. The cc-ci **harness-side** OIDC-at-install
+work is `plan-lasuite-drive-oidc-robustness.md` Part A; this sub-plan is its Part B, broken out.
+
+---
+
+## 0. Why (CI surfaced real recipe bugs — fix them at the source)
+
+cc-ci has surfaced genuine fragility in the lasuite-drive recipe that a **real operator would also
+hit**, currently papered over by CI-side workarounds:
+- **Install-time:** backend comes up before collabora's WOPI discovery is ready → transient
+  **WOPI-404** + a **gunicorn-perms** startup race. The flaky 12-service `--chaos` OIDC redeploy.
+- **Upgrade-time (F2-12):** upgrading to the heavier new collabora (25.04.9.4.1) **does not converge
+  within abra's monitor window** → abra FATAs. The harness currently works around this by skipping
+  abra's convergence monitor (`-c`) and using its own collabora WOPI-200 `READY_PROBE`.
+
+These are recipe defects. Fixing them upstream helps every lasuite-drive operator **and** lets cc-ci
+**go back to abra's native convergence** (per the guardrail "prefer abra convergence; custom probe
+only when necessary") — turning the harness `-c`/READY_PROBE from a *necessity* into a *backstop*.
+
+## 1. The fixes (lasuite-drive recipe)
+
+1. **Collabora healthcheck + start_period (the keystone).** Add a real Docker **healthcheck** to the
+   collabora service — WOPI discovery endpoint returns 200 — with a `start_period` generous enough
+   for the heavy 25.04 image to boot. Effect: (a) swarm/abra see collabora as *unhealthy until WOPI
+   is actually up*, so **abra's own convergence monitor waits correctly** (fixes F2-12 at the source
+   — no `-c` skip needed); (b) the install-time WOPI-404 window closes because dependents can gate on
+   collabora health.
+2. **Backend tolerates / waits for collabora WOPI.** Make backend **retry WOPI discovery with
+   backoff** (and/or order it behind collabora health) instead of failing on the transient 404.
+3. **Fix the gunicorn-perms startup race.** Set the volume permissions in the backend entrypoint
+   (or an init step) **before** exec'ing gunicorn, so there's no read/write race on a freshly-mounted
+   volume at startup.
+4. **Lazy / retrying OIDC discovery.** Backend resolves the OIDC provider **at first login with
+   retry**, not eagerly at boot — so the app boots cleanly with OIDC env set even if the provider
+   isn't reachable yet. (This is also what the harness-side OIDC-at-install pattern relies on, and
+   what keeps the generic-first invariant safe.)
+
+## 2. Mechanics — branch, PR, and the merge rule
+
+- Make the change on a **lasuite-drive recipe branch** and open a PR via the **`recipe-create-pr`
+  skill** (`/srv/recipe-maintainer/.opencode/skills/recipe-create-pr/SKILL.md`) — mirror to
+  `git.autonomic.zone/recipe-maintainers/lasuite-drive` as needed; upstream is
+  `git.coopcloud.tech`.
+- **The PR is "working" ONLY when cc-ci verifies it green** (operator rule): trigger cc-ci
+  (`!testme` on the lasuite-drive PR) and require the **full suite incl. the UPGRADE tier** to pass
+  **repeatedly-green** (not a one-off), **Adversary cold-verified**. **Only then does the operator
+  merge.** This dogfoods cc-ci: the CI that found the bugs gates the fix.
+
+## 3. Definition of done
+- [ ] Recipe branch with fixes #1–#4; PR opened (recipe-create-pr).
+- [ ] **cc-ci runs the full suite (install + upgrade + backup + restore + custom/OIDC) on the PR,
+      repeatedly green, Adversary cold-verified.**
+- [ ] **Root-cause proof:** with the collabora healthcheck in place, demonstrate the upgrade tier
+      passes under **abra's NATIVE convergence** (i.e. drop `-c` for lasuite-drive and it still
+      converges + stays green) — confirming the recipe fix resolved F2-12 at the source. If it still
+      needs the harness backstop, say so honestly (record why).
+- [ ] Operator merges the recipe PR. Then: cc-ci can **revert the lasuite-drive `-c`/READY_PROBE
+      workaround to abra-native convergence** (per the guardrail), and close the lasuite-drive flaky
+      items.
+
+## 4. Guardrails
+- **Don't weaken any test** to make the PR pass — the fixes must make the recipe genuinely robust,
+  proven by repeated-green cc-ci runs, not by loosening assertions.
+- **Real abra path** throughout (no docker-level bypass).
+- **Bounded** — the four targeted robustness fixes; not a recipe rewrite. Bigger recipe improvements
+  → upstream issues / IDEAS, not this PR.
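
Reviewer sketch for fix 2 in section 1 (backend retries WOPI discovery with backoff). This is a minimal illustration in Python under stated assumptions, not the backend's or the recipe's actual code: `wait_for_wopi_discovery`, `fetch_status`, `DiscoveryUnavailable`, and the retry defaults are hypothetical names and values chosen for the example.

```python
import time
from typing import Callable, Optional


class DiscoveryUnavailable(Exception):
    """Raised when WOPI discovery never returned HTTP 200 within the retry budget."""


def wait_for_wopi_discovery(
    fetch_status: Callable[[], int],
    attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until fetch_status() returns 200; return the number of attempts used.

    fetch_status is a hypothetical callable that returns the HTTP status of
    collabora's WOPI discovery endpoint. Connection failures are assumed to
    surface as OSError (urllib.error.URLError is a subclass of OSError).
    """
    delay = base_delay
    last: Optional[object] = None
    for attempt in range(1, attempts + 1):
        try:
            status = fetch_status()
            if status == 200:
                return attempt
            last = status  # e.g. the transient 404 while collabora boots
        except OSError as exc:
            last = exc  # collabora not accepting connections yet
        if attempt < attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
    raise DiscoveryUnavailable(
        f"WOPI discovery not ready after {attempts} attempts (last: {last!r})"
    )
```

With the defaults above the waits are 1, 2, 4, 8, 16 seconds, so the call gives up after roughly half a minute. If something like this were adopted, the budget would presumably need to be sized against the collabora `start_period` chosen in fix 1, and `fetch_status` would wrap an HTTP GET against whatever discovery URL the backend is configured with.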