decisions/deferred(2): lasuite-drive upgrade tier = disk env-blocker (28GB host, dual multi-GB office image crossover); maximal subset in flight; operator disk-resize escalation; adversary heads-up
@@ -635,3 +635,46 @@ Q4.10 drone (specifics only), + deferral lift cryptpad create-pad (F2-9, must li
(log `/root/ccci-resume-lasuite-drive.log`); lasuite-drive suite (health parity + real MinIO S3
upload/list/download round-trip + OIDC password-grant JWT-claims against dep keycloak) is fully
authored; driving it to its first verified-green full run (the Q3.2 acceptance evidence).

---
## 2026-05-29 — lasuite-drive full e2e: upgrade tier hits a DISK-SIZE env blocker (host health emergency handled)

Drove lasuite-drive (heaviest §5 recipe, with BOTH office backends) toward its first verified-green full
run. The install tier PASSED (generic test_serving + cc-ci test_serving_and_frontend; all 12 services
converged after collabora won its startup race, see below). The backup tier PASSED. Then the **upgrade
tier FAILED** and disk usage hit **99% (522M free)**, risking a host wedge.

**Root cause (definitive, from the abra DEPLOY OVERVIEW in the log):** the prev→PR-head upgrade
crosses *two different multi-GB office image versions simultaneously*:

- onlyoffice/documentserver-de: 9.2 → **9.3.1.2** (3.94GB image)
- collabora/code: 25.04.9.1.1 → 25.04.9.4.1 (~1GB)
- (+ small drive-backend/frontend v0.12.0→v0.18.0, redis, nginx)

abra's in-place chaos rolling update must hold BOTH the running prev office images AND pull the new
ones before swapping, so ~10GB of office images sit on disk transiently. The 28GB host has only ~14GB of
docker headroom over the ~13GB baseline (nix store ~9.6GB + infra images ~1.75GB), so the PR-head pull
overflowed. **No harness mitigation exists:** the prev images are *in use by running containers* (not
dangling) when the new ones must be pulled, and `docker rmi` refuses to remove an image that a running
container uses; a pre-upgrade prune finds nothing dangling. It is fundamentally a disk-SIZE constraint,
driven by the recipe legitimately bumping office image tags across releases. Not a test-quality issue
and not weakenable.
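
A pre-flight free-space check cannot create headroom, but it could turn an overflow like this one into an early, clean failure instead of a near-wedge. A minimal Python sketch, assuming the harness knows (or can estimate) the on-disk sizes of the images it is about to pull. `pull_fits`, `free_bytes`, and the 2GB `reserve_gb` margin are hypothetical names and values, not part of the existing harness:

```python
import shutil

GB = 1000 ** 3  # decimal gigabytes; an assumption about how the sizes are reported


def free_bytes(path: str = "/") -> int:
    """Free space on the filesystem holding `path` (stdlib shutil.disk_usage)."""
    return shutil.disk_usage(path).free


def pull_fits(free: int, pull_sizes_gb: list[float], reserve_gb: float = 2.0) -> bool:
    """True if the new images plus a safety reserve fit into `free` bytes.

    Only the NEW images are counted: the prev images are already on disk
    (held by running containers), so they are reflected in `free`.
    """
    needed = (sum(pull_sizes_gb) + reserve_gb) * GB
    return free >= needed


if __name__ == "__main__":
    # Sizes taken from the journal entry: onlyoffice 9.3.1.2 (3.94GB), collabora (~1GB).
    new_images_gb = [3.94, 1.0]
    if not pull_fits(free_bytes("/"), new_images_gb):
        print("upgrade tier skipped: not enough disk for the PR-head pull")
```

With the figures from this entry, the check refuses at the 522M low point and passes on the recovered host (17GB free). Image sizes here are nominal; unpacked layers and volumes can take more, which is why the reserve is a parameter.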

**collabora startup race (separate, self-resolving):** collabora/code logs
`/usr/bin/coolmount: Operation not permitted` (CapAdd=[] + default seccomp blocks mount()) and falls back
to slow file-COPYING into its jail; the healthcheck killed an early task (exit 137), but a later task
finished the copy and reached 1/1. So collabora converges, it just flaps once or twice first. Not the
blocker; noted in case it recurs on a slower disk.

**Emergency handled, host fully restored:** killed the run (`pkill -f run_recipe_ci.py`), removed the
orphaned `lasu-7ea5e3` stack + its volumes (minio, postgres) + 8 leftover secrets (the killed run's
teardown never ran), and pruned dangling images. Disk usage recovered from 99% to 37% (17GB free). Infra
stacks (traefik/drone/dashboard/bridge/backups/warm-keycloak) were untouched and healthy throughout.
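
The recovery steps above could be captured as a repeatable, dry-run-first plan. This is a sketch only: apart from the `pkill` line, the exact commands used during the incident are not recorded here, so the docker invocations are an assumption (standard `docker stack rm`, `docker volume rm`, `docker secret rm`, `docker image prune -f`), and `cleanup_plan` / `run_plan` are hypothetical helpers. The volume name in the usage line is made up for illustration.

```python
import subprocess


def cleanup_plan(stack: str, volumes: list[str], secrets: list[str]) -> list[list[str]]:
    """Build the ordered argv list for cleaning up after a killed run."""
    plan = [
        ["pkill", "-f", "run_recipe_ci.py"],   # stop the runner first (from the journal)
        ["docker", "stack", "rm", stack],      # remove the orphaned swarm stack
    ]
    if volumes:
        plan.append(["docker", "volume", "rm", *volumes])
    if secrets:
        plan.append(["docker", "secret", "rm", *secrets])
    plan.append(["docker", "image", "prune", "-f"])  # dangling images only
    return plan


def run_plan(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(" ".join(argv))
        if not dry_run:
            # check=False: pkill exits non-zero when nothing matched, and
            # volume removal can fail while stack tasks are still stopping.
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    # Hypothetical volume name; real names come from `docker volume ls`.
    run_plan(cleanup_plan("lasu-7ea5e3", ["lasu-7ea5e3_example"], []))
```

Scoping every removal to the one stack name is what keeps the infra stacks out of the blast radius; the plan deliberately has no unfiltered `docker system prune`.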

**Decision:** the upgrade tier for lasuite-drive (and very likely for other heavy recipes: lasuite-docs
also ships collabora; immich ships multi-GB ML images; lasuite-meet) is a genuine **Class A1 env-level
disk blocker**; the clean fix is a larger host disk (operator). Filed in DEFERRED.md + DECISIONS.md +
BACKLOG-2; flagged to the operator (PushNotification) and the Adversary (inbox). Meanwhile, banking the
**maximal testable subset** (install+backup+restore+custom: single version, fits disk) to prove that
lasuite-drive's actual Q3.2 CONTENT works: parity health, the real MinIO S3 upload→list→download
round-trip, and the OIDC password-grant + JWT-claims flow against the dep keycloak. Per §7.1 the
maximal subset is implemented and only the genuinely disk-blocked upgrade tier is outstanding,
pending Adversary sign-off on the env-blocker.
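
One way to make the "maximal subset" explicit in a harness is to record blocked tiers with their reason and derive the runnable set from that. A hedged Python sketch: the tier names come from this entry, but their order, the `ALL_TIERS` constant, and `maximal_subset` are assumptions for illustration, not the harness's actual interface.

```python
# Tier names as used in this journal entry; the ordering is an assumption.
ALL_TIERS = ("install", "backup", "restore", "custom", "upgrade")


def maximal_subset(blocked: dict[str, str]) -> list[str]:
    """Return every tier that is not blocked, in ALL_TIERS order.

    `blocked` maps tier name -> human-readable reason, so the deferral
    stays documented next to the selection logic.
    """
    unknown = set(blocked) - set(ALL_TIERS)
    if unknown:
        raise ValueError(f"unknown tiers: {sorted(unknown)}")
    return [tier for tier in ALL_TIERS if tier not in blocked]


if __name__ == "__main__":
    blocked = {"upgrade": "Class A1 env-level disk blocker (28GB host)"}
    print(maximal_subset(blocked))
```

Lifting the deferral after a disk resize is then a one-line change: drop the `upgrade` entry from `blocked`.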