Sub-plan — lasuite-drive recipe robustness PR (fix the root cause upstream)

Status: QUEUED. A recipe-maintainer PR to the lasuite-drive recipe (we maintain it). Picks up after the Q3.2 lasuite-drive test work settles. Complements, and largely obsoletes, the CI-side workarounds the harness currently uses for lasuite-drive's fragility.
Owner: Builder + Adversary loops.
This file: /srv/cc-ci/cc-ci-plan/plan-lasuite-drive-recipe-pr.md
Relationship: this is the recipe-side deliverable. The cc-ci harness-side OIDC-at-install work is plan-lasuite-drive-oidc-robustness.md Part A; this sub-plan is its Part B, broken out.


0. Why (CI surfaced real recipe bugs — fix them at the source)

cc-ci has surfaced genuine fragility in the lasuite-drive recipe that a real operator would also hit, currently papered over by CI-side workarounds:

  • Install-time: backend comes up before collabora's WOPI discovery is ready → transient WOPI-404, plus a gunicorn-perms startup race. Together these make the 12-service --chaos OIDC redeploy flaky.
  • Upgrade-time (F2-12): upgrading to the heavier new collabora (25.04.9.4.1) does not converge within abra's monitor window → abra FATAs. The harness currently works around this by skipping abra's convergence monitor (-c) and using its own collabora WOPI-200 READY_PROBE.

These are recipe defects. Fixing them upstream helps every lasuite-drive operator and lets cc-ci go back to abra's native convergence (per the guardrail "prefer abra convergence; custom probe only when necessary") — turning the harness -c/READY_PROBE from a necessity into a backstop.

1. The fixes (lasuite-drive recipe)

  1. Collabora healthcheck + start_period (the keystone). Add a real Docker healthcheck to the collabora service — WOPI discovery endpoint returns 200 — with a start_period generous enough for the heavy 25.04 image to boot. Effect: (a) swarm/abra see collabora as unhealthy until WOPI is actually up, so abra's own convergence monitor waits correctly (fixes F2-12 at the source — no -c skip needed); (b) the install-time WOPI-404 window closes because dependents can gate on collabora health.
  2. Backend tolerates / waits for collabora WOPI. Make backend retry WOPI discovery with backoff (and/or order it behind collabora health) instead of failing on the transient 404.
  3. Fix the gunicorn-perms startup race. Set the volume permissions in the backend entrypoint (or an init step) before exec'ing gunicorn, so there's no read/write race on a freshly-mounted volume at startup.
  4. Lazy / retrying OIDC discovery. Backend resolves the OIDC provider at first login with retry, not eagerly at boot — so the app boots cleanly with OIDC env set even if the provider isn't reachable yet. (This is also what the harness-side OIDC-at-install pattern relies on, and what keeps the generic-first invariant safe.)
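
Fixes 2 and 4 share one pattern: do not fail on the first unreachable dependency, and do not resolve it before it is needed. The following is a minimal Python sketch of that pattern, not the backend's actual code. The names `retry_with_backoff` and `LazyDiscovery`, the default attempt count, and the delays are all hypothetical; the `fetch` callable stands in for whatever performs the WOPI or OIDC discovery request.

```python
import time
from typing import Any, Callable

_UNSET = object()  # sentinel so that a cached None is still treated as resolved


def retry_with_backoff(
    fetch: Callable[[], Any],
    attempts: int = 5,          # hypothetical default, tune per service
    base_delay: float = 1.0,    # seconds before the second attempt
    max_delay: float = 30.0,    # cap on any single wait
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(min(delay, max_delay))
            delay *= 2


class LazyDiscovery:
    """Resolve a discovery document on first use, then cache the result."""

    def __init__(self, fetch: Callable[[], Any], **retry_kwargs: Any) -> None:
        self._fetch = fetch
        self._retry_kwargs = retry_kwargs
        self._cached: Any = _UNSET

    def get(self) -> Any:
        # Nothing is fetched at construction time, so the app can boot
        # even while the provider is still unreachable.
        if self._cached is _UNSET:
            self._cached = retry_with_backoff(self._fetch, **self._retry_kwargs)
        return self._cached
```

Under these assumptions, constructing `LazyDiscovery(fetch)` at boot costs nothing, and the first `get()` call (for example at first login) absorbs a transient 404 by retrying. The `sleep` parameter is injectable only so the behaviour can be exercised without real waiting.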

2. Mechanics — branch, PR, and the merge rule

  • Make the change on a lasuite-drive recipe branch and open a PR via the recipe-create-pr skill (/srv/recipe-maintainer/.opencode/skills/recipe-create-pr/SKILL.md) — mirror to git.autonomic.zone/recipe-maintainers/lasuite-drive as needed; upstream is git.coopcloud.tech.
  • The PR is "working" ONLY when cc-ci verifies it green (operator rule): trigger cc-ci (!testme on the lasuite-drive PR) and require the full suite, including the UPGRADE tier, to pass repeatedly (e.g. 3 consecutive passes, not a one-off), with the result Adversary cold-verified. Only then does the operator merge. This dogfoods cc-ci: the CI that found the bugs gates the fix.
    • SCOPE (operator, 2026-05-29): this repeated-green / 3× bar is specific to lasuite-drive because it was demonstrably FLAKY — it's a flakiness proof (show the fix made it reliably green, not green-by-luck-once). It is NOT the general testing standard. Normal recipe gates remain one Adversary cold-verified green (plan.md §6.1); do not generalize 3× to other recipes/gates.
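
The repeated-green bar can be stated precisely as "the most recent N runs all passed". A small Python sketch of that check follows. The function name `is_reliably_green` and the representation of run outcomes as a list of booleans (oldest first) are assumptions for illustration; how cc-ci actually records results is not shown here.

```python
from typing import Sequence


def is_reliably_green(results: Sequence[bool], required: int = 3) -> bool:
    """Return True when the most recent `required` runs all passed.

    `results` is ordered oldest to newest; True means a green run.
    A failure inside the trailing window resets the streak.
    """
    if required < 1:
        raise ValueError("required must be at least 1")
    if len(results) < required:
        return False  # not enough runs yet to prove anything
    return all(results[-required:])
```

With `required=3` this expresses the lasuite-drive flakiness bar; with `required=1` it reduces to the normal single-green gate used for other recipes.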

3. Definition of done

  • Recipe branch with fixes #1 to #4; PR opened (recipe-create-pr).
  • cc-ci runs the full suite (install + upgrade + backup + restore + custom/OIDC) on the PR, repeatedly green, Adversary cold-verified.
  • Root-cause proof: with the collabora healthcheck in place, demonstrate the upgrade tier passes under abra's NATIVE convergence (i.e. drop -c for lasuite-drive and it still converges + stays green) — confirming the recipe fix resolved F2-12 at the source. If it still needs the harness backstop, say so honestly (record why).
  • Operator merges the recipe PR. Then: cc-ci can revert the lasuite-drive -c/READY_PROBE workaround to abra-native convergence (per the guardrail), and close the lasuite-drive flaky items.

4. Guardrails

  • Don't weaken any test to make the PR pass — the fixes must make the recipe genuinely robust, proven by repeated-green cc-ci runs, not by loosening assertions.
  • Real abra path throughout (no docker-level bypass).
  • Bounded — the four targeted robustness fixes; not a recipe rewrite. Bigger recipe improvements → upstream issues / IDEAS, not this PR.