Q3.2a run 1: Part A (install-time OIDC) GREEN — deploy-count=1, install/backup/restore/custom +
OIDC test all PASS. BUT upgrade tier FAILED: the in-place `abra app deploy --chaos` redeploy landed
on a STILL-BOOTING collabora (coolwsd ~2min boot: 1300+ l10n files + RSA keygen) and SIGTERMed it
mid-init ("Shutdown requested while starting up", forced exit 70) → abra aborted the deploy. The
install wait_healthy returns on container 1/1 while coolwsd is still loading. Fixes (plan §C
readiness-gating, no test weakened):
- tests/lasuite-drive/ops.py::pre_upgrade — wait for collabora WOPI discovery (/hosting/discovery
on collabora-<domain>) → 200 BEFORE the chaos redeploy, so it replaces a ready collabora cleanly.
- runner/harness/lifecycle.chaos_redeploy + generic.perform_upgrade + run_recipe_ci._perform_op —
plumb the recipe DEPLOY_TIMEOUT to the upgrade chaos redeploy (was abra.deploy's 900s default,
while the .env internal TIMEOUT is 1500s → Python could SIGKILL abra mid-wait on the slow
collabora/onlyoffice reconverge). Mirrors the install deploy_app timeout plumbing.
Also (operator naming change 2026-05-29): renamed `--extra-tests` -> `--extra` in DEFERRED.md +
BACKLOG-2.md Build-backlog section. 3 refs remain in BACKLOG-2 Adversary-findings section
(241/248/292, closed findings) — left for the Adversary (single-writer); orchestrator updated
IDEAS.md/plan-sso-dep-testing.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Q3.2a / plan-lasuite-drive-oidc-robustness.md Part A. The old setup_custom_tests.sh did a
post-deploy in-place `abra app deploy --force --chaos` of the heavy 12-service stack to apply
the OIDC env — flaky (collabora WOPI-discovery race + gunicorn-perms; JOURNAL Step 0). Since
the OIDC env only affects backend/app and keycloak is live-warm, provision the per-run realm
BEFORE the single deploy and wire OIDC into the .env at install time (no reconverge).
- runner/run_recipe_ci.py: new _provision_deps() helper (warm/cold split + SSO enrich + write
$CCCI_DEPS_FILE), used by both paths. New per-recipe OIDC_AT_INSTALL meta flag (added to
_load_meta whitelist). When set + deps live-warm: provision BEFORE deploy_app; the install
tier's install_steps.sh wires OIDC into the single deploy; post-deploy step runs only the
MinIO bucket one-shot — no re-provision, no redeploy. Legacy post-deploy path unchanged for
all other dep recipes (gated on `not oidc_at_install`).
- tests/lasuite-drive/install_steps.sh (NEW): install-time OIDC env + secret wiring; no-ops on
empty deps file (recipe still boots, OIDC test skips → F2-11 RED).
- tests/lasuite-drive/setup_custom_tests.sh: trimmed to MinIO-bucket-only (OIDC moved out).
- tests/lasuite-drive/recipe_meta.py: OIDC_AT_INSTALL = True.
- JOURNAL-2: Step-0 root-cause failure logs captured before the fix.
NOT a claim — validating 3x green (incl. now-required upgrade tier) before claiming Q3.2.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 2pc complete: conservative surgical gated prune (ci-docker-prune) live + reproducible from
git, local Docker store retained as the cache (PAT-authenticated, layer reuse proven), registry
pull-through cache deferred to IDEAS. Adversary review(2pc) 486d162 PASS @2026-05-29. Watchdog
auto-returns to Phase 2.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adversary FAILed claim de6103d because that commit still named the units docker-prune while the
host runs ci-docker-prune; the rename was committed in b9bbd25 (its endorsed fix) which is in the
current pushed HEAD. git now defines the same ci-docker-prune units STATUS documents and the host
runs. Behavior was already cold-verified GREEN. Inert NixOS-builtin docker-prune.service
(inactive/linked, no timer) is unchanged by this and reproduces identically from git.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Removes virtualisation.docker.autoPrune (daily `docker system prune --all` evicted in-use base
images → cold re-pull → Hub rate-limit churn, JOURNAL-2). Adds modules/docker-prune.nix: daily
timer + oneshot that prunes only dangling+until=24h, gated on disk pressure (>=80%) AND no run-app
live AND no swarm service converging; never --all, never --volumes. Teardown unchanged (never
removes images). Registry pull-through cache dropped per operator scope correction.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
## DONE written to STATUS-2w. Adversary authorized (REVIEW-2w 2822d60: all gates
cold-verified, no veto, no open findings). Final state healthy: keycloak+traefik
200, custom-html canonical idle@1.11.0+1.29.0, nightly-sweep timer active, system
running 0 failed, disk 50%. Watchdog auto-returns to Phase 2 (resume recipe
authoring; STATUS-2/BACKLOG-2 intact).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
canonical.enrolled_recipes; runner/nightly_sweep.py (roll keycloak+traefik →
serial full-cold over enrolled on latest → green promotes; skip if test active;
operate against CCCI_REPO checkout for tests/); nix/modules/nightly-sweep.nix
(timer 03:00 Persistent + oneshot service) wired in. 2 bugs fixed via live
service run (repo-relative enrolled scan; util-linux for backup PTY). Live
SERVICE sweep: enrolled=['custom-html'] → all tiers green → canonical advanced
1.10.0→1.11.0; red-run correctly does NOT promote. 71 unit pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
should_promote_canonical (enrolled+green+cold+latest) + promote_canonical
(re-seed canonical at green-verified latest, snapshot+registry, old known-good
replaced only on green). +5 unit (70 pass). Live: custom-html canonical advanced
1.10.0+1.28.0 → 1.11.0+1.29.0 via a full green cold run; snapshot refreshed; idle;
per-run app torn down. WC6 nightly sweep next.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>