feat(2): Q3.2 lasuite-drive base enrollment + nested-subdomain + replicas:0 harness fixes
- harness: services_converged treats replicas:0 one-shot (minio-createbuckets) as converged (cur==want); removes the want==0 rejection that hung deploys. DECISIONS.md.
- recipe_meta.EXTRA_ENV flattens MINIO_DOMAIN/COLLABORA_DOMAIN to single-label wildcard siblings (the *.ci.commoninternet.net cert covers one label only). DECISIONS.md.
- lifecycle overlays (install/upgrade/backup/restore) + ops.py postgres ci_marker data-integrity (db user/name=drive). Parity health_check functional test. PARITY.md.
- DEPS=[keycloak] + OIDC/WOPI/upload functional tests deferred to the SSO iteration (probe-before-assert: prove the ~10-service base deploy converges first).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -480,3 +480,44 @@ SPA's main menu via a stable accessibility tree (role-based selectors instead of
Adversary may file F2-N requesting full create-pad coverage; the answer above is the
honest technical reason + the maximal subset. Logged here per plan §7.1.

---
## Phase 2 — nested DOMAIN-derived subdomains flattened to single-label wildcard siblings
**Decision (settled):** When an enrolled recipe routes additional services on **nested subdomains
derived from `DOMAIN`** (e.g. lasuite-drive `MINIO_DOMAIN="minio.${DOMAIN}"` +
`COLLABORA_DOMAIN="collabora.${DOMAIN}"`; lasuite-meet `LIVEKIT_DOMAIN="livekit.${DOMAIN}"`), the
recipe's `recipe_meta.EXTRA_ENV(domain)` MUST override those vars to a **single-label sibling under
the wildcard** (`minio-<domain>`, `collabora-<domain>`, `livekit-<domain>`), NOT the recipe's
default `<svc>.<domain>`.

**Why:** cc-ci's TLS cert is the operator's pre-issued wildcard `*.ci.commoninternet.net` (+ bare
`ci.commoninternet.net`; §4.0/§1.5, renewed out-of-band, no ACME). A wildcard matches exactly **one**
label, and the per-run app domain already uses that label (`lasuite-drive-pr<n>-<sha>.ci.commoninternet.net`).
A nested `minio.lasuite-drive-pr<n>-<sha>.ci.commoninternet.net` therefore has **2 labels** where the
wildcard allows one, so the cert does NOT cover it → Traefik would present a cert that is not valid for
that host, clients would fail TLS verification, and the service would be unreachable over HTTPS.
Re-prefixing with a hyphen keeps it one label (`minio-lasuite-drive-pr<n>-<sha>` +
`.ci.commoninternet.net`), covered by the same wildcard and routed by Traefik's swarm provider with
**no cert work and no gateway change** (the gateway already passes the whole wildcard, §4.0). We must
NOT mint per-host certs / ACME for these (class-A1 boundary, §9).

**Scope:** purely a per-recipe `EXTRA_ENV` concern (no shared-harness change). Recipes with no
DOMAIN-derived nested subdomains (most) are unaffected.
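The override rule and the one-label wildcard constraint can be sketched together in Python. This is a minimal sketch, assuming `EXTRA_ENV(domain)` receives the full per-run app domain and returns a plain `dict` of env overrides. The return shape, the `wildcard_covers` helper, and the sample run domain are hypothetical illustrations, not the harness's actual code. The helper applies a simplified RFC 6125 rule (a leftmost `*` matches exactly one non-empty label) to show that the flattened names are covered and the recipe's nested defaults are not.

```python
WILDCARD = "*.ci.commoninternet.net"  # the operator's pre-issued wildcard SAN


def EXTRA_ENV(domain: str) -> dict[str, str]:
    """Hypothetical lasuite-drive recipe_meta hook (assumed shape):
    flatten nested <svc>.<domain> defaults to single-label <svc>-<domain> siblings."""
    return {
        "MINIO_DOMAIN": f"minio-{domain}",
        "COLLABORA_DOMAIN": f"collabora-{domain}",
    }


def wildcard_covers(pattern: str, host: str) -> bool:
    """Hypothetical helper: a leftmost '*' label matches exactly one non-empty
    label (simplified RFC 6125 rule; partial-label wildcards are not handled)."""
    p_labels = pattern.lower().split(".")
    h_labels = host.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False  # '*' never stands for more (or fewer) than one label
    if p_labels[0] == "*":
        return h_labels[0] != "" and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


if __name__ == "__main__":
    # Illustrative run domain; pr12-abc1234 stands in for pr<n>-<sha>.
    app = "lasuite-drive-pr12-abc1234.ci.commoninternet.net"
    for name, host in EXTRA_ENV(app).items():
        print(name, host, wildcard_covers(WILDCARD, host))
    # The recipe's nested default puts two labels under the wildcard:
    print("nested", wildcard_covers(WILDCARD, f"minio.{app}"))
```

Note that the bare `ci.commoninternet.net` is not matched by the `*` pattern either. That is why the cert carries it as a separate name.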
## Phase 2 — `services_converged` treats a `replicas: 0` one-shot as converged
**Decision (settled):** `runner/harness/lifecycle.py::services_converged` now considers a service
converged when `cur == want` (desired replica count met), removing the prior
`or want == "0"` rejection.

**Why:** lasuite-drive's `minio-createbuckets` is declared `deploy: {mode: replicated, replicas: 0,
restart_policy: {condition: none}}`: an **on-demand one-shot** (scaled up manually only when buckets
need (re)creating; it runs `mc mb …` and exits with status 0). `docker stack services` reports it `0/0`.
The old check rejected any `want == "0"` row, so the stack could **never** report converged → every
deploy hung until `deploy_timeout`. A service AT its desired count (including 0/0) is converged; a
service still spinning up shows `0/1` (`cur != want`) and is correctly not-yet-converged, so the HTTP
readiness wait still gates real liveness. Safe for all currently-green recipes (their services are
all N/N with N>0; the `0/0` case did not previously occur). Buckets/migrations that the one-shot
performs are run on-demand in the recipe's `setup_custom_tests.sh` (post-deploy), not relied upon for
generic-install convergence (the SPA at `/` serves 200 without them).
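The before/after convergence predicate can be sketched in Python as follows. This is a minimal sketch, assuming the harness reads each service's `REPLICAS` column from `docker stack services` as a `cur/want` string. The parsing, the function names, and the sample rows are illustrative assumptions, not the code in `lifecycle.py`. The old predicate is reconstructed from the removed `or want == "0"` clause. The sketch shows how one `0/0` row keeps the old check false forever, while the new check still waits on a `0/1` service.

```python
def parse_replicas(replicas: str) -> tuple[str, str]:
    """Split a REPLICAS cell such as '1/1' or '0/0' into (cur, want).
    Only the first whitespace-separated token is used, in case extra text follows."""
    cur, want = replicas.split()[0].split("/", 1)
    return cur, want


def old_converged(rows: dict[str, str]) -> bool:
    # Reconstructed prior behaviour: any want == "0" row counts as not converged.
    for cur, want in map(parse_replicas, rows.values()):
        if cur != want or want == "0":
            return False
    return True


def new_converged(rows: dict[str, str]) -> bool:
    # New behaviour: converged iff every service is at its desired count.
    return all(cur == want for cur, want in map(parse_replicas, rows.values()))


# Hypothetical snapshots of `docker stack services` (service -> REPLICAS).
settled = {"app": "1/1", "minio": "1/1", "minio-createbuckets": "0/0"}
starting = {"app": "0/1", "minio": "1/1", "minio-createbuckets": "0/0"}

if __name__ == "__main__":
    print(old_converged(settled), new_converged(settled))    # False True
    print(old_converged(starting), new_converged(starting))  # False False
```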