claim(2): Q3.2 re-claim — F2-12 fixed (own convergence wait + READY_PROBE; upgrade 3x green; P7-negative unit-proven)

lasuite-drive full lifecycle 3x repeat-green (logs ccci-drive-f212-v1/v2/v3): install+upgrade+backup+
restore+custom all pass, OIDC password-grant PASSED (not skip), deploy-count=1, clean teardown, ready-
probe OK (200) twice (post-install + post-upgrade collabora WOPI). F2-12 fix e1147b5: upgrade chaos
redeploy uses abra -c (drop abra's impatient converge monitor that FATA'd while new collabora 25.04.9.4.1
was in healthcheck start_period) + perform_upgrade OWNS a stricter convergence wait (services N/N + app
health + collabora WOPI READY_PROBE) bounded by DEPLOY_TIMEOUT. Non-vacuous proven by 5 P7-negative unit
tests (6506c4a). Gate evidence/HOW/EXPECTED/WHERE in STATUS-2. F2-12 Adversary-owned (left to close).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-29 12:45:02 +01:00
parent 6506c4ac3a
commit a13d2ae48b

View File

@ -49,13 +49,10 @@ tree must carry:
- **Q5** — Completeness + docs; flip `## DONE`.
## In flight
**Q3.2 lasuite-drive — Adversary FAILed (F2-12); fix `e1147b5` validating (NOT re-claimed yet).**
The Adversary's cold re-run hit the upgrade tier failing: abra's converge monitor FATAs while the
NEW collabora 25.04.9.4.1 healthcheck is still in start_period (my WOPI pre-gate fixed the OLD
collabora; the new one's convergence was still abra-impatient → flaky: 3× green for me, 1× fail cold).
Fix `e1147b5`: upgrade chaos redeploy uses `abra … -c` (no impatient converge monitor) + perform_upgrade
OWNS a stricter convergence wait (services N/N + app health + collabora WOPI READY_PROBE) bounded by
DEPLOY_TIMEOUT. Validating across multiple full runs (`/root/ccci-drive-f212-v1.log` …) before re-claim.
**Q3.2 lasuite-drive — RE-CLAIMED (Gate: Q3.2 below) after the F2-12 fix; awaiting Adversary.**
F2-12 fixed by `e1147b5` (abra `-c` + own convergence wait + collabora READY_PROBE) + `6506c4a`
(P7-negative unit tests). 3× repeat-green (v1/v2/v3), upgrade tier passes, deploy-count=1, clean
teardown. F2-12 stays Adversary-owned (for them to close on cold-verify).
**cryptpad F2-9 — RESOLVED (test landed `05d0dc1`, 3/3 green; Adversary to close).** New
`tests/cryptpad/playwright/test_pad_content_roundtrip.py` does the §4.3 create-pad → type → FRESH
@ -118,12 +115,15 @@ SKIP no longer yields a GREEN `!testme`.
## Gate
**Gate: Q3.2 lasuite-drive — CLAIMED @2026-05-29, awaiting Adversary.**
**Gate: Q3.2 lasuite-drive — RE-CLAIMED @2026-05-29 (after F2-12 fix), awaiting Adversary.**
(First claim `911680f` FAILed cold-verify — F2-12: the upgrade chaos redeploy's abra converge monitor
FATA'd while the NEW collabora 25.04.9.4.1 was still in its healthcheck `start_period`. Fixed by
`e1147b5`; re-validated 3× green. F2-12 is Adversary-owned — left for the Adversary to close.)
**WHAT.** lasuite-drive (the heaviest Phase-2 stack: 12 services incl. collabora + onlyoffice +
minio/S3 + postgres, OIDC-dependent) now runs its **full lifecycle GREEN, repeatably** — install +
upgrade (prev→PR-head chaos crossover) + backup + restore + custom (health + MinIO round-trip + OIDC
password-grant), via **two fixes**:
password-grant), via **three fixes**:
1. **Install-time OIDC wiring** (commit `a151489`) — the orchestrator provisions the per-run realm on
the live-warm keycloak BEFORE the single `abra app deploy`, and `tests/lasuite-drive/install_steps.sh`
writes the OIDC env + client secret into that one deploy. This **eliminates the flaky post-deploy
@ -131,36 +131,47 @@ password-grant), via **two fixes**:
race; JOURNAL Step 0). New per-recipe `OIDC_AT_INSTALL` meta flag + reusable `_provision_deps()`
helper; legacy post-deploy path unchanged for all other dep recipes (gated on `not oidc_at_install`).
2. **collabora-ready upgrade gate + DEPLOY_TIMEOUT plumbing** (commit `4b38b66`) — `ops.py::pre_upgrade`
waits for collabora WOPI discovery (`/hosting/discovery` on `collabora-<domain>`) → 200 BEFORE the
chaos redeploy, so it no longer SIGTERMs a still-booting collabora (which caused exit 70 / "FATA
deploy failed" in run 1); `DEPLOY_TIMEOUT` now threads to the upgrade `chaos_redeploy` (was abra's
900s default vs the .env internal TIMEOUT 1500s).
waits for collabora WOPI discovery → 200 BEFORE the chaos redeploy, so it no longer SIGTERMs a
still-booting OLD collabora; `DEPLOY_TIMEOUT` threads to the upgrade `chaos_redeploy`.
3. **F2-12 fix — own the upgrade convergence verification** (commit `e1147b5`). The upgrade chaos
redeploy now runs `abra … -c` (`--no-converge-checks`): abra's own post-deploy monitor — which
FATA'd while the NEW collabora 25.04.9.4.1's healthcheck was still in `start_period` (jail/config
init) — is dropped. `docker stack deploy` still applies the spec; `generic.perform_upgrade` then
OWNS a **stricter** verification with a generous (DEPLOY_TIMEOUT) deadline: `lifecycle.wait_healthy`
(every swarm service N/N + app HEALTH_PATH 200) **then** `lifecycle.wait_ready_probes`
(recipe `READY_PROBE` → collabora WOPI `/hosting/discovery` 200). The new collabora converges
through swarm's healthcheck retries; HC1 (chaos-version == PR-head) + deploy-count=1 preserved.
**Non-vacuous (P7-negative) PROVEN** by `tests/unit/test_f212_upgrade_convergence.py` (commit
`6506c4a`, 5 tests): `wait_ready_probes`/`wait_healthy` RAISE `TimeoutError` on a stuck/never-serving
convergence — so a genuinely broken upgrade stays RED; this is not green-washing abra's skipped check.
**HOW (Adversary, cold, on cc-ci):**
```
ssh cc-ci 'cd /root/<your-clone> && git pull && RECIPE=lasuite-drive PR=0 cc-ci-run runner/run_recipe_ci.py'
ssh cc-ci 'cd /root/<your-clone> && cc-ci-run -m pytest tests/unit/test_f212_upgrade_convergence.py -q'
```
**EXPECTED:**
- RUN SUMMARY: `deploy-count = 1 (expect 1)`; `install/upgrade/backup/restore/custom` **all `pass`**.
- `tests/lasuite-drive/functional/test_oidc_with_keycloak.py::test_oidc_password_grant_against_dep_keycloak`
**PASSED** (NOT skipped) — real password-grant JWT against a per-run realm on warm keycloak.
- `test_minio_storage` PASSED (real S3 upload→list→cat readback round-trip inside the minio container).
- `test_oidc_password_grant_against_dep_keycloak` **PASSED** (NOT skipped) — real password-grant JWT.
- `test_minio_storage` PASSED (real S3 upload→list→cat readback inside the minio container).
- Data-integrity: `test_upgrade_preserves_data` (ci_marker survives prev→PR-head chaos crossover) +
backup/restore ci_marker survive.
- Log shows `install-time OIDC: deps provisioned` + `install_steps: OIDC env wired` (no post-deploy
reconverge) and `pre_upgrade: collabora WOPI discovery ready (200)` before the upgrade redeploy.
reconverge) and **`ready-probe OK (200)` TWICE** (post-install + post-upgrade, collabora WOPI).
- Clean teardown: post-run `docker stack ls | grep lasu` and `docker volume ls | grep lasu` both empty.
- Unit: **5 passed** in `tests/unit/test_f212_upgrade_convergence.py` (the P7-negative proof).
**WHERE.** Commits `a151489` (Part A) + `4b38b66` (upgrade gate). Files: `runner/run_recipe_ci.py`
(`_provision_deps`, `OIDC_AT_INSTALL` branch, `_perform_op` timeout), `runner/harness/lifecycle.py`
(`chaos_redeploy` timeout), `runner/harness/generic.py` (`perform_upgrade` timeout),
`tests/lasuite-drive/{install_steps.sh,setup_custom_tests.sh,ops.py,recipe_meta.py}`.
**3× repeat-green** (flakiness gone, not absent-once): `/root/ccci-drive-q32a-r2.log`,
`…-r3.log`, `…-r4.log` — each full-suite green, deploy-count=1, OIDC PASSED, clean teardown
(run 1 `…-r1.log` showed the upgrade-tier failure that `4b38b66` fixed). Step-0 root-cause logs in
JOURNAL-2 (2026-05-29). DEFERRED.md disk-blocker entry CLOSED (host grew to 64G); flaky-OIDC
BACKLOG-2 Q3.2a item now resolved.
**WHERE.** Commits `a151489` (Part A) + `4b38b66` (upgrade gate) + `e1147b5` (F2-12 own-convergence) +
`6506c4a` (P7-negative unit tests). Files: `runner/run_recipe_ci.py` (`_provision_deps`,
`OIDC_AT_INSTALL` branch, `_perform_op` meta+timeout, post-install `wait_ready_probes`),
`runner/harness/abra.py` (`deploy(no_converge_checks)`), `runner/harness/lifecycle.py`
(`chaos_redeploy(no_converge_checks)`, `wait_ready_probes`), `runner/harness/generic.py`
(`perform_upgrade` own-wait), `tests/lasuite-drive/{install_steps.sh,setup_custom_tests.sh,ops.py,recipe_meta.py}`
(`READY_PROBE`), `tests/unit/test_f212_upgrade_convergence.py`.
**3× repeat-green of the F2-12 fix** (flakiness gone, not absent-once): `/root/ccci-drive-f212-v1.log`,
`…-v2.log`, `…-v3.log` — each full-suite green, deploy-count=1, OIDC PASSED, ready-probe OK twice,
clean teardown. Step-0 root-cause logs in JOURNAL-2. DEFERRED.md disk-blocker CLOSED (host 64G).
---