plan: lasuite-drive recipe-robustness PR sub-plan (collabora healthcheck + perms + lazy OIDC)
Operator (2026-05-29): dedicated sub-plan for the upstream recipe PR. Fixes collabora WOPI healthcheck/start_period (keystone — fixes F2-12 at the source so cc-ci can return to abra-native convergence + drop the -c/READY_PROBE backstop), backend WOPI retry, gunicorn-perms race, lazy OIDC. PR is 'working' only when cc-ci runs the full suite incl. upgrade tier green + Adversary cold-verify, then operator merges. Broken out from plan-lasuite-drive-oidc-robustness.md Part B (now points here). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -52,6 +52,10 @@ dep is already running, so configure OIDC **before the single deploy**:

## Part B: lasuite-drive recipe PR (root-cause robustness, we're maintainers)

> **Broken out into its own sub-plan: `plan-lasuite-drive-recipe-pr.md`** (operator, 2026-05-29).
> It now also folds in the F2-12 upgrade-convergence fix (collabora healthcheck) so cc-ci can return
> to abra-native convergence. The summary below is retained; the sub-plan is authoritative.

Fix the fragility in the recipe itself so it's robust under ANY reconverge. This helps real
operators, not just CI, and is the real payoff of maintaining the recipe. On a branch of the
lasuite-drive recipe (recipe-maintainer):

72
cc-ci-plan/plan-lasuite-drive-recipe-pr.md
Normal file
@@ -0,0 +1,72 @@
# Sub-plan: lasuite-drive recipe robustness PR (fix the root cause upstream)

**Status:** QUEUED. A **recipe-maintainer PR to the lasuite-drive recipe** (we maintain it). Picks up
after the Q3.2 lasuite-drive test work settles. Complements, and largely **obsoletes**, the
CI-side workarounds the harness currently uses for lasuite-drive's fragility.
**Owner:** Builder + Adversary loops. **This file:** `/srv/cc-ci/cc-ci-plan/plan-lasuite-drive-recipe-pr.md`
**Relationship:** this is the **recipe-side** deliverable. The cc-ci **harness-side** OIDC-at-install
work is `plan-lasuite-drive-oidc-robustness.md` Part A; this sub-plan is its Part B, broken out.

---

## 0. Why (CI surfaced real recipe bugs; fix them at the source)

cc-ci has surfaced genuine fragility in the lasuite-drive recipe that a **real operator would also
hit**, currently papered over by CI-side workarounds:
- **Install-time:** backend comes up before collabora's WOPI discovery is ready → transient
  **WOPI-404** + a **gunicorn-perms** startup race. The flaky 12-service `--chaos` OIDC redeploy.
- **Upgrade-time (F2-12):** upgrading to the heavier new collabora (25.04.9.4.1) **does not converge
  within abra's monitor window** → abra FATAs. The harness currently works around this by skipping
  abra's convergence monitor (`-c`) and using its own collabora WOPI-200 `READY_PROBE`.

These are recipe defects. Fixing them upstream helps every lasuite-drive operator **and** lets cc-ci
**go back to abra's native convergence** (per the guardrail "prefer abra convergence; custom probe
only when necessary"), turning the harness `-c`/READY_PROBE from a *necessity* into a *backstop*.

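
The harness-side `READY_PROBE` mentioned above amounts to polling collabora until WOPI discovery answers 200. The Python sketch below illustrates that idea only; it is not the harness's actual code. The names `http_status` and `wait_until_ready`, the default timeouts, and the example hostname in the usage note are hypothetical, and `/hosting/discovery` is assumed to be the WOPI discovery path that collabora serves.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout_s: float = 5.0) -> int:
    """Return the HTTP status code for a GET on url (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code, e.g. the transient 404.
        return err.code


def wait_until_ready(
    fetch_status: Callable[[], int],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll fetch_status() until it returns 200 or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        try:
            if fetch_status() == 200:
                return True
        except OSError:
            # Connection refused, DNS failure, socket timeout: not ready yet.
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller might use it as `wait_until_ready(lambda: http_status("https://collabora.example.test/hosting/discovery"))`, where the hostname is a placeholder. Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting.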
## 1. The fixes (lasuite-drive recipe)

1. **Collabora healthcheck + start_period (the keystone).** Add a real Docker **healthcheck** to the
   collabora service (WOPI discovery endpoint returns 200) with a `start_period` generous enough
   for the heavy 25.04 image to boot. Effect: (a) swarm/abra see collabora as *not healthy until WOPI
   is actually up*, so **abra's own convergence monitor waits correctly** (fixes F2-12 at the source;
   no `-c` skip needed); (b) the install-time WOPI-404 window closes because dependents can gate on
   collabora health.
2. **Backend tolerates / waits for collabora WOPI.** Make backend **retry WOPI discovery with
   backoff** (and/or order it behind collabora health) instead of failing on the transient 404.
3. **Fix the gunicorn-perms startup race.** Set the volume permissions in the backend entrypoint
   (or an init step) **before** exec'ing gunicorn, so there's no read/write race on a freshly-mounted
   volume at startup.
4. **Lazy / retrying OIDC discovery.** Backend resolves the OIDC provider **at first login with
   retry**, not eagerly at boot, so the app boots cleanly with OIDC env set even if the provider
   isn't reachable yet. (This is also what the harness-side OIDC-at-install pattern relies on, and
   what keeps the generic-first invariant safe.)

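
Fixes 2 and 4 share one pattern: retry a discovery call with backoff, and defer it until first use rather than running it at boot. The Python sketch below shows that pattern under stated assumptions; it is not the backend's actual code. `retry_with_backoff`, `LazyDiscovery`, and all parameter names and defaults are hypothetical.

```python
import time
from typing import Any, Callable


def retry_with_backoff(
    fetch: Callable[[], Any],
    attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() until it succeeds, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delay = base_delay_s
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay = min(delay * 2, max_delay_s)


class LazyDiscovery:
    """Resolve a discovery document on first use, then cache the result."""

    def __init__(self, fetch: Callable[[], Any], **retry_kwargs: Any) -> None:
        self._fetch = fetch
        self._retry_kwargs = retry_kwargs
        self._value: Any = None
        self._resolved = False

    def get(self) -> Any:
        # Nothing is fetched at construction time, so the process can start
        # even when the remote endpoint is not reachable yet.
        if not self._resolved:
            self._value = retry_with_backoff(self._fetch, **self._retry_kwargs)
            self._resolved = True
        return self._value
```

Constructing `LazyDiscovery(fetch_oidc_config)` at startup costs nothing; the first `get()` (for example, at first login) performs the retried fetch, and later calls reuse the cached value. Here `fetch_oidc_config` stands for whatever callable fetches the provider's discovery document and is likewise hypothetical. A failed resolution is not cached, so a later `get()` tries again.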
## 2. Mechanics: branch, PR, and the merge rule

- Make the change on a **lasuite-drive recipe branch** and open a PR via the **`recipe-create-pr`
  skill** (`/srv/recipe-maintainer/.opencode/skills/recipe-create-pr/SKILL.md`). Mirror to
  `git.autonomic.zone/recipe-maintainers/lasuite-drive` as needed; upstream is
  `git.coopcloud.tech`.
- **The PR is "working" ONLY when cc-ci verifies it green** (operator rule): trigger cc-ci
  (`!testme` on the lasuite-drive PR) and require the **full suite incl. the UPGRADE tier** to pass
  **repeatedly-green** (not a one-off), **Adversary cold-verified**. **Only then does the operator
  merge.** This dogfoods cc-ci: the CI that found the bugs gates the fix.

## 3. Definition of done
- [ ] Recipe branch with fixes #1–#4; PR opened (recipe-create-pr).
- [ ] **cc-ci runs the full suite (install + upgrade + backup + restore + custom/OIDC) on the PR,
  repeatedly green, Adversary cold-verified.**
- [ ] **Root-cause proof:** with the collabora healthcheck in place, demonstrate the upgrade tier
  passes under **abra's NATIVE convergence** (i.e. drop `-c` for lasuite-drive and it still
  converges + stays green), confirming the recipe fix resolved F2-12 at the source. If it still
  needs the harness backstop, say so honestly (record why).
- [ ] Operator merges the recipe PR. Then: cc-ci can **revert the lasuite-drive `-c`/READY_PROBE
  workaround to abra-native convergence** (per the guardrail), and close the lasuite-drive flaky
  items.

## 4. Guardrails
- **Don't weaken any test** to make the PR pass. The fixes must make the recipe genuinely robust,
  proven by repeated-green cc-ci runs, not by loosening assertions.
- **Real abra path** throughout (no docker-level bypass).
- **Bounded:** the four targeted robustness fixes; not a recipe rewrite. Bigger recipe improvements
  → upstream issues / IDEAS, not this PR.