Sub-plan — lasuite-drive recipe robustness PR (fix the root cause upstream)
Status: QUEUED — a recipe-maintainer PR to the lasuite-drive recipe (we maintain it). Picks up
after the Q3.2 lasuite-drive test work settles. Complements — and largely obsoletes — the
CI-side workarounds the harness currently uses for lasuite-drive's fragility.
Owner: Builder + Adversary loops. This file: /srv/cc-ci/cc-ci-plan/plan-lasuite-drive-recipe-pr.md
Relationship: this is the recipe-side deliverable. The cc-ci harness-side OIDC-at-install
work is plan-lasuite-drive-oidc-robustness.md Part A; this sub-plan is its Part B, broken out.
0. Why (CI surfaced real recipe bugs — fix them at the source)
cc-ci has surfaced genuine fragility in the lasuite-drive recipe that a real operator would also hit, currently papered over by CI-side workarounds:
- Install-time: backend comes up before collabora's WOPI discovery is ready → transient
WOPI-404 + a gunicorn-perms startup race. The flaky 12-service --chaos OIDC redeploy.
- Upgrade-time (F2-12): upgrading to the heavier new collabora (25.04.9.4.1) does not converge
within abra's monitor window → abra FATAs. The harness currently works around this by skipping
abra's convergence monitor (-c) and using its own collabora WOPI-200 READY_PROBE.
These are recipe defects. Fixing them upstream helps every lasuite-drive operator and lets cc-ci
go back to abra's native convergence (per the guardrail "prefer abra convergence; custom probe
only when necessary") — turning the harness -c/READY_PROBE from a necessity into a backstop.
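For orientation, a WOPI-200 readiness check of the kind the harness backstop performs could look roughly like the sketch below. It is not the harness's actual READY_PROBE: the function name, hostname, and port are illustrative assumptions, and the /hosting/discovery path is assumed to be where collabora serves WOPI discovery.

```python
import urllib.error
import urllib.request

# Assumption: collabora serves WOPI discovery at /hosting/discovery. The host
# and port are placeholders, not values taken from the recipe.
DISCOVERY_URL = "http://collabora:9980/hosting/discovery"


def wopi_ready(url=DISCOVERY_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Return True only if the discovery endpoint answers HTTP 200."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status
        # (urlopen raises HTTPError, a URLError subclass, for e.g. a 404).
        return False
```

The opener parameter exists only so the check can be exercised without a running collabora; a real probe would call it with the defaults and map the boolean to an exit code.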
1. The fixes (lasuite-drive recipe)
- Collabora healthcheck + start_period (the keystone). Add a real Docker healthcheck to the collabora service (WOPI discovery endpoint returns 200) with a start_period generous enough for the heavy 25.04 image to boot. Effect: (a) swarm/abra see collabora as unhealthy until WOPI is actually up, so abra's own convergence monitor waits correctly (fixes F2-12 at the source; no -c skip needed); (b) the install-time WOPI-404 window closes because dependents can gate on collabora health.
- Backend tolerates / waits for collabora WOPI. Make backend retry WOPI discovery with backoff (and/or order it behind collabora health) instead of failing on the transient 404.
- Fix the gunicorn-perms startup race. Set the volume permissions in the backend entrypoint (or an init step) before exec'ing gunicorn, so there's no read/write race on a freshly-mounted volume at startup.
- Lazy / retrying OIDC discovery. Backend resolves the OIDC provider at first login with retry, not eagerly at boot, so the app boots cleanly with OIDC env set even if the provider isn't reachable yet. (This is also what the harness-side OIDC-at-install pattern relies on, and what keeps the generic-first invariant safe.)
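The retry-with-backoff and lazy-discovery ideas in the fixes above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the backend's actual code: retry_with_backoff, LazyDiscovery, and the fetch callable are hypothetical names, and the attempt counts and delays are placeholders.

```python
import time


def retry_with_backoff(fetch, attempts=6, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay = min(delay * 2, max_delay)


class LazyDiscovery:
    """Resolve a discovery document on first use, not at boot."""

    def __init__(self, fetch, **retry_options):
        self._fetch = fetch  # hypothetical callable, e.g. an HTTP GET
        self._retry_options = retry_options
        self._document = None

    def get(self):
        if self._document is None:
            # If every attempt fails, the error propagates and nothing is
            # cached, so the next call starts a fresh round of retries.
            self._document = retry_with_backoff(
                self._fetch, **self._retry_options
            )
        return self._document
```

Constructing a LazyDiscovery at boot does no network I/O, which is the property the lazy OIDC fix depends on: the first get() call (for example at first login) triggers the fetch, and a transient failure costs a short delay rather than a failed startup.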
2. Mechanics — branch, PR, and the merge rule
- Make the change on a lasuite-drive recipe branch and open a PR via the recipe-create-pr skill (/srv/recipe-maintainer/.opencode/skills/recipe-create-pr/SKILL.md); mirror to git.autonomic.zone/recipe-maintainers/lasuite-drive as needed; upstream is git.coopcloud.tech.
- The PR is "working" ONLY when cc-ci verifies it green (operator rule): trigger cc-ci (!testme on the lasuite-drive PR) and require the full suite incl. the UPGRADE tier to pass repeatedly-green (e.g. 3 consecutive passes, not a one-off), Adversary cold-verified. Only then does the operator merge. This dogfoods cc-ci: the CI that found the bugs gates the fix.
  - SCOPE (operator, 2026-05-29): this repeated-green / 3× bar is specific to lasuite-drive because it was demonstrably FLAKY; it's a flakiness proof (show the fix made it reliably green, not green-by-luck-once). It is NOT the general testing standard. Normal recipe gates remain one Adversary cold-verified green (plan.md §6.1); do not generalize 3× to other recipes/gates.
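The repeated-green bar can be stated precisely as a check over the run history. The sketch below is only an illustration of the rule, with a hypothetical function name and input shape; it is not part of cc-ci.

```python
def repeatedly_green(results, required=3):
    """True if the most recent `required` runs all passed.

    `results` is ordered oldest to newest; each entry is True for a green run.
    """
    if required < 1:
        raise ValueError("required must be at least 1")
    recent = results[-required:]
    # Fewer than `required` runs on record never satisfies the bar.
    return len(recent) == required and all(recent)
```

With required=3 this expresses the lasuite-drive flakiness proof: a red run anywhere in the last three resets the count. With required=1 it reduces to the normal single-green gate.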
3. Definition of done
- Recipe branch with fixes #1–#4; PR opened (recipe-create-pr).
- cc-ci runs the full suite (install + upgrade + backup + restore + custom/OIDC) on the PR, repeatedly green, Adversary cold-verified.
- Root-cause proof: with the collabora healthcheck in place, demonstrate the upgrade tier passes under abra's NATIVE convergence (i.e. drop -c for lasuite-drive and it still converges + stays green), confirming the recipe fix resolved F2-12 at the source. If it still needs the harness backstop, say so honestly (record why).
- Operator merges the recipe PR. Then: cc-ci can revert the lasuite-drive -c/READY_PROBE workaround to abra-native convergence (per the guardrail), and close the lasuite-drive flaky items.
4. Guardrails
- Don't weaken any test to make the PR pass — the fixes must make the recipe genuinely robust, proven by repeated-green cc-ci runs, not by loosening assertions.
- Real abra path throughout (no docker-level bypass).
- Bounded — the four targeted robustness fixes; not a recipe rewrite. Bigger recipe improvements → upstream issues / IDEAS, not this PR.