Commit Graph

5 Commits

Author SHA1 Message Date
e1147b5fe3 fix(2): F2-12 lasuite-drive upgrade tier — own convergence wait (abra -c) + collabora READY_PROBE
Adversary cold-verify FAILed Q3.2 (F2-12): the prev→PR-head chaos upgrade's abra converge monitor
FATAs while the NEW collabora 25.04.9.4.1's healthcheck is still in start_period (jail/config init),
even though it converges given swarm's healthcheck retries. My WOPI pre-gate fixed the OLD collabora
being killed mid-boot but not the NEW collabora's convergence. Flaky (3x green for me, 1x fail cold).

Fix (cc-ci-side, stronger verification — not weaker):
- abra.deploy gains no_converge_checks (`-c`); chaos_redeploy passes it for the upgrade op so abra's
  impatient monitor no longer FATAs (the stack spec is applied regardless).
- perform_upgrade now OWNS the convergence verification after the redeploy: wait_healthy (services
  N/N + app HEALTH_PATH) + new lifecycle.wait_ready_probes (recipe READY_PROBE), bounded by the
  recipe DEPLOY_TIMEOUT (generous) not abra's impatient window. meta threaded _perform_op→perform_upgrade.
- recipe_meta READY_PROBE hook (added to _load_meta whitelist): lasuite-drive probes collabora WOPI
  discovery (/hosting/discovery on collabora-<domain>) → 200. Called after install deploy AND after
  the upgrade redeploy. No-op for recipes without a READY_PROBE.

NOT re-claiming yet — validating the upgrade tier is now reliably green (incl. the slow-collabora
crossover) across multiple runs before re-claiming Q3.2. F2-12 stays open (Adversary-owned).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 11:55:53 +01:00
a151489996 feat(2): lasuite-drive Q3.2a Part A — wire OIDC at INSTALL, eliminate flaky redeploy
Q3.2a / plan-lasuite-drive-oidc-robustness.md Part A. The old setup_custom_tests.sh did a
post-deploy in-place `abra app deploy --force --chaos` of the heavy 12-service stack to apply
the OIDC env — flaky (collabora WOPI-discovery race + gunicorn-perms; JOURNAL Step 0). Since
the OIDC env only affects backend/app and keycloak is live-warm, provision the per-run realm
BEFORE the single deploy and wire OIDC into the .env at install time (no reconverge).

- runner/run_recipe_ci.py: new _provision_deps() helper (warm/cold split + SSO enrich + write
  $CCCI_DEPS_FILE), used by both paths. New per-recipe OIDC_AT_INSTALL meta flag (added to
  _load_meta whitelist). When set + deps live-warm: provision BEFORE deploy_app; the install
  tier's install_steps.sh wires OIDC into the single deploy; post-deploy step runs only the
  MinIO bucket one-shot — no re-provision, no redeploy. Legacy post-deploy path unchanged for
  all other dep recipes (gated on `not oidc_at_install`).
- tests/lasuite-drive/install_steps.sh (NEW): install-time OIDC env + secret wiring; no-ops on
  empty deps file (recipe still boots, OIDC test skips → F2-11 RED).
- tests/lasuite-drive/setup_custom_tests.sh: trimmed to MinIO-bucket-only (OIDC moved out).
- tests/lasuite-drive/recipe_meta.py: OIDC_AT_INSTALL = True.
- JOURNAL-2: Step-0 root-cause failure logs captured before the fix.

NOT a claim — validating 3x green (incl. now-required upgrade tier) before claiming Q3.2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 10:10:05 +01:00
6557197858 feat(2): Q3.2 lasuite-drive SSO iteration — keycloak dep + OIDC test + MinIO storage round-trip
- recipe_meta: DEPS=[keycloak] enabled (base proven cold-green).
- setup_custom_tests.sh: wire OIDC env (explicit keycloak realm endpoints) + insert oidc_rpcs
  secret at bumped version + clear FranceConnect eidas1 acr + in-place redeploy (adapted from
  the proven lasuite-docs hook).
- functional/test_oidc_with_keycloak.py: SSO discovery + password grant + JWT claims vs dep
  keycloak realm 'lasuite-drive' (@requires_deps; F2-11 fails run on skip).
- functional/test_minio_storage.py: §4.3 specific — drive-media-storage bucket present + real
  upload->list->download round-trip via mc inside the minio container.
- PARITY.md: OIDC + MinIO rows landed; backup data-integrity (ci_marker) already real.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 22:28:35 +01:00
1138d77cbb blocked(2): Q3.2 drive base-deploy hits Docker Hub rate limit + Gitea outage
- recipe_meta: bump drive abra TIMEOUT 900->1500, DEPLOY_TIMEOUT 1200->1800 (12-svc
  stack w/ onlyoffice+collabora; cold pulls need a wide window).
- STATUS-2 ## Blocked: two Class-A1 external blocks documented w/ verify commands —
  (1) Docker Hub anon pull rate limit (registry-creds finding per plan §1.5; blocks all
  new deploys), (2) Gitea git.autonomic.zone 404 outage (coordination down; 2 watchdog
  pings unconsumable until recovery). JOURNAL-2: full disk->prune->rate-limit chain.
- Queued locally; push + Adversary-inbox processing deferred to Gitea recovery.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 20:48:52 +01:00
f59d8e6996 feat(2): Q3.2 lasuite-drive base enrollment + nested-subdomain + replicas:0 harness fixes
- harness: services_converged treats replicas:0 one-shot (minio-createbuckets) as
  converged (cur==want); removes the want==0 rejection that hung deploys. DECISIONS.md.
- recipe_meta.EXTRA_ENV flattens MINIO_DOMAIN/COLLABORA_DOMAIN to single-label wildcard
  siblings (the *.ci.commoninternet.net cert covers one label only). DECISIONS.md.
- lifecycle overlays (install/upgrade/backup/restore) + ops.py postgres ci_marker
  data-integrity (db user/name=drive). Parity health_check functional test. PARITY.md.
- DEPS=[keycloak] + OIDC/WOPI/upload functional tests deferred to the SSO iteration
  (probe-before-assert: prove the ~10-service base deploy converges first).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 19:54:31 +01:00