Commit Graph

432 Commits

Author SHA1 Message Date
f7ed2d967c review(2): cryptpad F2-9 + F2-13 CLOSED — re-verify after fix b44d75b (poll-all-frames). create-pad roundtrip test_cryptpad_pad_content_survives_fresh_session PASSED (46s, was 340s timeout), all 5 tiers green, deploy-count=1, clean teardown. Fix non-vacuous (still asserts marker surfaces in fresh context = server-side encrypted persistence). §4.3 create-pad floor demonstrated; conditional sign-off satisfied 2026-05-29 15:37:12 +01:00
62ac9b59e0 journal/status(2): F2-13 cryptpad read-back robustness FIXED (b44d75b, poll-all-frames) — 3x green vs cold probe; awaiting Adversary re-verify/F2-9 close 2026-05-29 15:26:25 +01:00
82dc2d733d feat(2): immich §4.3 asset upload→read-back→thumbnail test + PARITY
test_asset_upload.py: admin-sign-up → login → POST /api/assets (multipart, unique content → 201) →
GET /api/assets/{id} (200, IMAGE, read-back) → GET .../thumbnail (200, derivative generated, polled).
Verified GREEN against a live immich probe (app v2.7.5). PARITY: health_check port; oidc_login non-port
(authentik-specific, immich OIDC optional, keycloak-default policy). §4.3 floor + characteristic
derivative-generation feature met.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 15:13:11 +01:00
b44d75b89c fix(2): F2-13 cryptpad roundtrip read-back robustness — poll all frames for marker
Adversary cold-verify of F2-9 FAILED: the read-back's CKEditor-frame-attach wait timed out on a fresh
cold context (flaky, not 3x-reliable). Fix: read-back now polls EVERY frame's body text for the marker
(don't require the specific ckeditor-inner frame to attach — that's the flaky part) with a generous
~240s deadline + periodic reloads to unstick cold loads. The marker appearing in a fresh context still
proves server-side E2E-encrypted persistence (only URL+fragment key carried over). Also bumped the
session-1 post-type sync wait 9s→12s. F2-13 Adversary-owned; will validate cold before it closes F2-9.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 15:08:52 +01:00
1cbb1ccd73 review(2): cryptpad F2-9 NOT closed — create-pad roundtrip read-back leg FAILED on cold-verify (CKEditor frame never attached on fresh context, line 133; 1 failed in 340s) → test is flaky not 3x-reliable. Filed F2-13: make read-back robust before F2-9 closes. install/upgrade/backup/restore pass, only the §4.3-floor pad-persist test red; teardown clean. NOT a VETO (F2-9 was conditional/open) 2026-05-29 15:05:22 +01:00
754f508231 review(2): record forward-looking Adversary criteria for pre-pull harness unit (plan-prepull-images.md) — verify warm-cache no-redownload + bad-tag=clear-pull-error-pre-deploy + abra stays real/unchanged + honest scope (pull-time not init-time; F2-12 init races still need healthcheck) 2026-05-29 14:58:38 +01:00
f8af5b2307 backlog(2): HQ1 — image pre-pull harness unit (plan-prepull-images.md), near-term; fixes the first-deploy 'No such image' race 2026-05-29 14:56:18 +01:00
d4eae4ee49 fix(2): set time.timeZone=UTC on cc-ci → create /etc/localtime (immich bind-mount)
immich's compose bind-mounts the host /etc/localtime into the app container; NixOS without a set
timezone leaves /etc/localtime absent → 'bind source path does not exist: /etc/localtime' → app
service rejected (never converges). time.timeZone=UTC creates /etc/localtime (UTC = deterministic CI
timestamps). Nix-declared, reversible; helps any recipe binding /etc/localtime.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 14:51:33 +01:00
b0f1e0b0ad status(2): Q3.3 lasuite-meet Adversary PASS (a46f7d4); immich Q3.5 validating 2026-05-29 14:44:09 +01:00
98a37d44b5 feat(2): Q3.5 immich enrollment (recipe_meta + ops + lifecycle overlays + health parity)
immich (object-storage/large-volume photo mgmt; D10 category): 3 services (app incl. ML + web, redis,
database/postgres), self-contained (no SSO dep — local admin; OIDC optional). recipe_meta (HTTP health,
DEPLOY_TIMEOUT=1500), ops.py postgres ci_marker (postgres/immich, backupbot-labelled), lifecycle
overlays, health_check parity. §4.3 upload-asset→list→thumbnail test next (after live-API discovery).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 14:40:57 +01:00
a46f7d4593 review(2): PASS gate Q3.3 lasuite-meet (claim 5af513e/code 1f7806a) — cold-verify all 5 tiers GREEN, deploy-count=1, real upgrade crossover 0.2.0+v1.15.0->0.3.0+v1.16.0, meeting_flow (room create->read-back->LiveKit video-grant JWT->delete) PASSED, OIDC PASSED not-skipped, ci_marker survives, teardown clean+realm reaped. WebRTC media-relay non-port: ADVERSARY SIGN-OFF (genuine UDP env-blocker, maximal subset=LiveKit token issuance shipped) 2026-05-29 14:40:15 +01:00
5af513e2c8 claim(2): Q3.3 lasuite-meet — full lifecycle green (meeting_flow §4.3 + OIDC; R014 chaos-base; webrtc env-blocker non-port)
lasuite-meet full suite GREEN (log /root/ccci-meet-full6.log): install/upgrade/backup/restore/custom
all pass, deploy-count=1, clean teardown, real upgrade crossover 0.2.0+v1.15.0→0.3.0+v1.16.0.
- §4.3 test_meeting_flow: create-room (201) → read-back (200) → LiveKit join token (JWT room grant) →
  delete. test_oidc_password_grant PASSED. Parity: health_check + oidc_login. Reused lasuite-drive
  OIDC-at-install machinery.
- R014 fix (72719fe): upstream lightweight tag → chaos-base deploy of the checked-out prev version
  (skips lint, deploys prev not latest — verified by the crossover).
- webrtc-media/relay UDP media-relay = documented env-blocker non-port; maximal subset (LiveKit token
  issuance) shipped in meeting_flow.
Gate evidence/HOW/EXPECTED/WHERE in STATUS-2. DECISIONS: R014 chaos-base + webrtc non-port. BACKLOG-2
[idea]: harness image pre-pull. Single cold-verified green is the bar (operator clarification).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 14:33:31 +01:00
1f7806a9c4 fix(2): lasuite-meet meeting_flow — tolerant best-effort delete-verify (meet 0.3.0 soft-deletes)
Full suite #5: install/upgrade/backup/restore + OIDC + create-room/read-back/LiveKit-token ALL pass
(R014 chaos-base fix validated: upgrade crossover real 0.2.0→0.3.0). Only the final 404-after-DELETE
assert failed — meet 0.3.0+v1.16.0 soft/async-deletes (DELETE 2xx, re-GET still 200). The §4.3 floor
(create+read-back+LiveKit token) stays HARD-asserted; delete-gone is now a best-effort poll (not a
§4.3 requirement). PARITY.md noted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 14:24:21 +01:00
72719fe0d7 fix(2): R014 — chaos base deploy for recipes with lightweight tags (replaces fragile origin-repoint)
The origin-repoint approach hit go-git 'reference not found' (mirror HEAD→master vs main). Simpler +
robust: detect lightweight version tags (has_lightweight_version_tags, read-only) and, for the pinned
base deploy of such a recipe, use chaos — which SKIPS abra lint (so no R014 FATA) and deploys the
EXPLICITLY-checked-out pinned version (recipe_checkout already ran; chaos uses the current checkout,
so it's the prev version, NOT LATEST — F1d-2's hazard was the missing checkout). No-op / stays pinned
for all-annotated recipes. The upgrade tier's prev→PR-head crossover + HC1 (chaos-version==head_ref)
still hold (verified by the run's upgrade-tier log).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 14:15:07 +01:00
ad06a5dd3f fix(2): R014 normalize — use git clone --mirror (not --bare) so abra's later fetches find refs/heads/main
--bare lacked refs/heads/main, so abra's post-normalize git ops (app secret insert / deploy) failed
'unable to fetch tags: reference not found' when fetching from the repointed local origin. --mirror
copies all refs (heads+tags) → abra fetch OK + R014 passes (both verified).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 14:05:26 +01:00
da44e2ca8a fix(2): R014 normalize — repoint recipe origin to local bare with annotated tag (abra force-fetches tags before lint, reverting in-place re-annotation)
Diagnosed: abra runs git fetch --tags --force from origin before its pinned-deploy lint, so
re-annotating the lightweight tag in place is reverted before R014 runs. Fix: after re-annotating,
clone the recipe to a local bare repo (carrying the annotated tag) and repoint origin at it, so
abra's force-fetch pulls the annotated tag. Verified: abra recipe lint R014 then PASSES and the
annotation sticks. Deployed commit unchanged. No-op for all-annotated recipes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 13:59:03 +01:00
8c19b1fadc fix(2): normalize lightweight recipe tags to annotated before pinned deploy (R014)
lasuite-meet upgrade tier failed at the prev-version base deploy: abra's pinned-deploy lint FATA'd on
R014 'only annotated tags used for recipe version' because upstream coop-cloud lasuite-meet ships a
stray LIGHTWEIGHT tag (0.3.0+v1.16.0). chaos deploys skip lint (so install,custom passed) but the
upgrade tier's pinned prev-version deploy lints. New abra.normalize_recipe_tags() re-creates each
lightweight version tag as annotated at the SAME commit (no deployed content changes); called in
lifecycle.deploy_app after recipe_checkout when version is pinned. Idempotent; no-op for all-annotated
recipes (lasuite-drive etc.). Helps any recipe with a stray upstream lightweight tag.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 13:48:55 +01:00
9c6cb539ee feat(2): Q3.3 lasuite-meet §4.3 meeting_flow test + PARITY.md
test_meeting_flow.py: OIDC token → POST /api/v1.0/rooms/ (201 + LiveKit token) → GET read-back (200) →
assert LiveKit JWT grants the room → DELETE (204) → verify gone (404). The §4.3 create-an-object+
read-it-back + the distinctive WebRTC-signaling feature (LiveKit token issuance). PARITY.md maps
health_check/oidc_login/meeting_flow ports + documents webrtc-media/relay non-port (UDP media relay =
env-blocker per §7.1; maximal subset = LiveKit token issuance, shipped). install+OIDC already validated
green (/root/ccci-meet-v1.log). Note: first-deploy 'No such image' was a one-time cold-pull race
(images now cached + kept by conservative prune); deploy converges reliably.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 13:39:32 +01:00
9c9a0059c1 journal(2): record operator clarification — 3x repeat-green is flakiness-specific (lasuite-drive), not the general gate standard (normal = 1 cold-verified green) 2026-05-29 13:25:56 +01:00
c7b36ebb6a review(2): record operator clarification — 3x repeat-green bar is lasuite-drive-recipe-PR ONLY (flakiness proof); normal gates = ONE cold-verified green per §6.1; cryptpad F2-9 needs only 1x 2026-05-29 13:25:46 +01:00
31bda3995d feat(2): Q3.3 lasuite-meet — install_steps (OIDC-at-install) + lifecycle overlays + health/OIDC parity tests
Mirrors lasuite-drive machinery (sibling La Suite recipe): install_steps.sh wires OIDC at install
(client_id from deps, scopes 'openid email'); ops.py + test_{install,upgrade,backup,restore}.py
lifecycle overlays (postgres meet/meet ci_marker data-integrity); functional/test_health_check.py
(parity) + test_oidc_with_keycloak.py (password-grant JWT vs dep keycloak, realm lasuite-meet-<6hex>).
§4.3 meeting_flow + webrtc specifics next (after install+OIDC validated). No setup_custom_tests.sh
(no post-deploy step — OIDC at install, no minio/collabora).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 13:22:30 +01:00
32a743f501 feat(2): Q3.3 lasuite-meet recipe_meta — DEPS=keycloak + OIDC_AT_INSTALL + livekit-domain flatten (reuses lasuite-drive machinery)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 13:14:42 +01:00
3a8c5ca076 journal(2): both Phase-2 blockers cleared (Q3.2 PASS, F2-9 resolved); scout Q3.3 lasuite-meet as next (reuses lasuite-drive OIDC-at-install machinery) 2026-05-29 13:13:32 +01:00
a48543f57b status/journal/deferred(2): cryptpad F2-9 RESOLVED — roundtrip green in full harness custom tier (cold deploy); awaiting Adversary close
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 13:11:35 +01:00
118305b92f status(2): Q3.2 lasuite-drive Adversary PASS (F2-12 closed); cryptpad roundtrip cold-timing fix in validation 2026-05-29 13:01:43 +01:00
3484d25b5c fix(2): cryptpad roundtrip — more patient pad-creation wait (240s + reload) for cold fresh deploy
Full-suite custom-tier run showed the pad #/2/pad/edit fragment didn't appear within 80s on a fresh
cold deploy (passed on the warm probe). Bump _open_pad hash-wait to ~240s + one mid-way reload.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 13:01:43 +01:00
af1481f6fc review(2): record forward-looking Adversary criteria for parked lasuite-drive recipe-PR (Q3.2b) — keystone collabora healthcheck must let cc-ci drop -c backstop to abra-native convergence w/o regressing F2-12; repeat-green+cold-verify before operator merge. Does NOT reopen Q3.2 (PASS stands) 2026-05-29 13:01:01 +01:00
3f5d58a7c2 review(2): PASS gate Q3.2 lasuite-drive (re-claim a13d2ae/code e1147b5+6506c4a) — F2-12 CLOSED. Cold re-run: all 5 tiers GREEN, upgrade tier now passes, deploy-count=1, ready-probe OK(200) twice, OIDC+minio round-trip PASS (not skipped), data-integrity survives, teardown clean. abra -c + owned wait_healthy/READY_PROBE proven non-vacuous (5 P7-negative units + code-read RAISE paths). DECISIONS: record operator READY_PROBE principle 2026-05-29 12:59:52 +01:00
ac241d44c7 backlog(2): park Q3.2b — lasuite-drive recipe-PR (plan-lasuite-drive-recipe-pr.md) behind Q3.2; keystone collabora healthcheck lets cc-ci drop the F2-12 -c backstop later
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 12:59:37 +01:00
7dab4f5cb6 decisions(2): record operator principle — real-abra-only deploys, abra convergence by default, READY_PROBE (strict + negative-tested) only when abra doesn't fit; F2-12 applied
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 12:57:41 +01:00
a13d2ae48b claim(2): Q3.2 re-claim — F2-12 fixed (own convergence wait + READY_PROBE; upgrade 3x green; P7-negative unit-proven)
lasuite-drive full lifecycle 3x repeat-green (logs ccci-drive-f212-v1/v2/v3): install+upgrade+backup+
restore+custom all pass, OIDC password-grant PASSED (not skip), deploy-count=1, clean teardown, ready-
probe OK (200) twice (post-install + post-upgrade collabora WOPI). F2-12 fix e1147b5: upgrade chaos
redeploy uses abra -c (drop abra's impatient converge monitor that FATA'd while new collabora 25.04.9.4.1
was in healthcheck start_period) + perform_upgrade OWNS a stricter convergence wait (services N/N + app
health + collabora WOPI READY_PROBE) bounded by DEPLOY_TIMEOUT. Non-vacuous proven by 5 P7-negative unit
tests (6506c4a). Gate evidence/HOW/EXPECTED/WHERE in STATUS-2. F2-12 Adversary-owned (left to close).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 12:45:02 +01:00
6506c4ac3a test(2): F2-12 P7-negative unit tests — owned upgrade-convergence wait fails on stuck convergence
Proactively addresses the Adversary's pre-claim recon (f7c5681): since the F2-12 fix replaces abra's
converge monitor (-c) with the harness's own wait, prove the replacement genuinely FAILS a broken
convergence (non-vacuous), not just passes a slow one. 5 deterministic tests (fake clock, no deploy):
- wait_ready_probes RAISES TimeoutError when the READY_PROBE never returns 200 (collabora wedged).
- wait_ready_probes returns when it reaches 200; no-op without a READY_PROBE.
- wait_healthy RAISES when services never converge, and when converged-but-never-serving.
Run: cc-ci-run -m pytest tests/unit/test_f212_upgrade_convergence.py -q → 5 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 12:23:34 +01:00
f7c5681cd0 review(2): pre-claim recon F2-12 fix e1147b5 — abra -c skips converge monitor BUT harness owns stricter wait_healthy(N/N all svcs)+READY_PROBE(collabora 200, raises on timeout); plausibly not-a-weakening, MUST cold-verify upgrade-GREEN + P7-negative at re-claim; NO verdict yet 2026-05-29 12:21:30 +01:00
cc4af49c99 status(2): Q3.2 F2-12 FAIL acknowledged, fix e1147b5 validating; cryptpad F2-9 test landed 3/3 green
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 11:58:03 +01:00
e1147b5fe3 fix(2): F2-12 lasuite-drive upgrade tier — own convergence wait (abra -c) + collabora READY_PROBE
Adversary cold-verify FAILed Q3.2 (F2-12): the prev→PR-head chaos upgrade's abra converge monitor
FATAs while the NEW collabora 25.04.9.4.1's healthcheck is still in start_period (jail/config init),
even though it converges given swarm's healthcheck retries. My WOPI pre-gate fixed the OLD collabora
being killed mid-boot but not the NEW collabora's convergence. Flaky (3x green for me, 1x fail cold).

Fix (cc-ci-side, stronger verification — not weaker):
- abra.deploy gains no_converge_checks (`-c`); chaos_redeploy passes it for the upgrade op so abra's
  impatient monitor no longer FATAs (the stack spec is applied regardless).
- perform_upgrade now OWNS the convergence verification after the redeploy: wait_healthy (services
  N/N + app HEALTH_PATH) + new lifecycle.wait_ready_probes (recipe READY_PROBE), bounded by the
  recipe DEPLOY_TIMEOUT (generous) not abra's impatient window. meta threaded _perform_op→perform_upgrade.
- recipe_meta READY_PROBE hook (added to _load_meta whitelist): lasuite-drive probes collabora WOPI
  discovery (/hosting/discovery on collabora-<domain>) → 200. Called after install deploy AND after
  the upgrade redeploy. No-op for recipes without a READY_PROBE.

NOT re-claiming yet — validating the upgrade tier is now reliably green (incl. the slow-collabora
crossover) across multiple runs before re-claiming Q3.2. F2-12 stays open (Adversary-owned).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 11:55:53 +01:00
aab77ea0f3 review(2): FAIL gate Q3.2 lasuite-drive (claim 911680f/code 4b38b66) — cold re-run upgrade tier FAILS (abra chaos-deploy FATA: new collabora 25.04.9.4.1 not converged; WOPI pre-gate DID work). install/backup/restore/custom+OIDC pass, deploy-count=1, teardown clean. Filed F2-12 BLOCKING 2026-05-29 11:47:58 +01:00
05d0dc14eb feat(2): cryptpad create-pad content roundtrip Playwright test — resolves F2-9 (§4.3 create+read-back)
Adds tests/cryptpad/playwright/test_pad_content_roundtrip.py: open /pad/ → CryptPad auto-creates a
fragment-keyed pad → type a unique marker into the CKEditor body → wait for encrypted sync → open a
FRESH browser context (no shared localStorage/cookies) → navigate to the captured pad URL → assert
the marker survives in the re-decrypted body. Proves genuine end-to-end-encrypted server-side
persistence (the fresh session carries only the URL+fragment key), the §4.3 create-and-read-back
floor F2-9 requires — not a health/SPA stand-in.

Empirically mapped against CryptPad 2026.2.0 (the prior deferral cited version-fragility on 5.7.0):
editor is the deep nested frame …/pad/ckeditor-inner.html; ~15s cold-cache LESS-compile init; the
fragment-keyed pad URL DOES appear after init; transient net::ERR_NETWORK_CHANGED handled by the
shared goto_with_retry + a mid-load reload retry in the frame wait. PASSED against a live probe
instance. PARITY.md updated (roundtrip = the P3/§4.3 test; SPA-render test kept as fast liveness).

F2-9 is Adversary-owned — left for the Adversary to close after cold-verify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 11:46:02 +01:00
911680f843 claim(2): Q3.2 lasuite-drive — full lifecycle 3x green via install-time OIDC + collabora-ready upgrade gate
3× repeat-green (logs /root/ccci-drive-q32a-r2/r3/r4.log): install+upgrade+backup+restore+custom all
pass, OIDC password-grant PASSED (not skip), deploy-count=1, clean teardown each run. Resolves the
Adversary's standing veto-eligible obligation (lasuite-drive upgrade tier GREEN + reliable OIDC).

Fixes: install-time OIDC wiring (a151489: _provision_deps before single deploy + OIDC_AT_INSTALL +
install_steps.sh) eliminated the flaky post-deploy --chaos reconverge; collabora-WOPI-ready upgrade
gate + DEPLOY_TIMEOUT plumbing (4b38b66) fixed the upgrade tier (was killing a still-booting collabora,
exit 70). Gate evidence + cold-verify HOW/EXPECTED/WHERE in STATUS-2.md. BACKLOG-2 Q3.2/Q3.2a ticked;
DEFERRED.md disk follow-on noted done.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 11:16:18 +01:00
5e0af07b86 journal(2): Q3.2a fixed-code run 1 FULL SUITE GREEN (collabora-ready gate fixed upgrade tier); launching 3x repeat-green 2026-05-29 10:52:44 +01:00
e0a80124bc inbox(2): consume BUILDER-INBOX (flag rename relay) + finish --extra rename in BACKLOG-2 Adversary-section lines 241/248/292 (Adversary explicitly delegated)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 10:40:49 +01:00
a22ba9c9cc inbox(2): relay orchestrator flag rename --extra-tests -> --extra to Builder (DEFERRED.md 12 occ + BACKLOG-2 4 occ; single-writer files, not editing them myself) 2026-05-29 10:39:46 +01:00
4b38b66fa5 fix(2): lasuite-drive Q3.2a — gate upgrade redeploy on collabora-ready + plumb DEPLOY_TIMEOUT
Q3.2a run 1: Part A (install-time OIDC) GREEN — deploy-count=1, install/backup/restore/custom +
OIDC test all PASS. BUT upgrade tier FAILED: the in-place `abra app deploy --chaos` redeploy landed
on a STILL-BOOTING collabora (coolwsd ~2min boot: 1300+ l10n files + RSA keygen) and SIGTERMed it
mid-init ("Shutdown requested while starting up", forced exit 70) → abra aborted the deploy. The
install wait_healthy returns on container 1/1 while coolwsd is still loading. Fixes (plan §C
readiness-gating, no test weakened):

- tests/lasuite-drive/ops.py::pre_upgrade — wait for collabora WOPI discovery (/hosting/discovery
  on collabora-<domain>) → 200 BEFORE the chaos redeploy, so it replaces a ready collabora cleanly.
- runner/harness/lifecycle.chaos_redeploy + generic.perform_upgrade + run_recipe_ci._perform_op —
  plumb the recipe DEPLOY_TIMEOUT to the upgrade chaos redeploy (was abra.deploy's 900s default,
  while the .env internal TIMEOUT is 1500s → Python could SIGKILL abra mid-wait on the slow
  collabora/onlyoffice reconverge). Mirrors the install deploy_app timeout plumbing.

Also (operator naming change 2026-05-29): renamed `--extra-tests` -> `--extra` in DEFERRED.md +
BACKLOG-2.md Build-backlog section. 3 refs remain in BACKLOG-2 Adversary-findings section
(241/248/292, closed findings) — left for the Adversary (single-writer); orchestrator updated
IDEAS.md/plan-sso-dep-testing.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 10:37:55 +01:00
0b558529c9 review(2): pre-claim recon lasuite-drive Q3.2a Part A — minio scale is recipe one-shot (replicas:0) NOT a bypass; install-time OIDC=deploy-once; minio test is real round-trip; NO verdict (gate not claimed) 2026-05-29 10:33:01 +01:00
f89cf9b1b8 status(2): Q3.2a lasuite-drive Part A in validation — install-time OIDC landed, full-suite run in flight
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 10:13:21 +01:00
a151489996 feat(2): lasuite-drive Q3.2a Part A — wire OIDC at INSTALL, eliminate flaky redeploy
Q3.2a / plan-lasuite-drive-oidc-robustness.md Part A. The old setup_custom_tests.sh did a
post-deploy in-place `abra app deploy --force --chaos` of the heavy 12-service stack to apply
the OIDC env — flaky (collabora WOPI-discovery race + gunicorn-perms; JOURNAL Step 0). Since
the OIDC env only affects backend/app and keycloak is live-warm, provision the per-run realm
BEFORE the single deploy and wire OIDC into the .env at install time (no reconverge).

- runner/run_recipe_ci.py: new _provision_deps() helper (warm/cold split + SSO enrich + write
  $CCCI_DEPS_FILE), used by both paths. New per-recipe OIDC_AT_INSTALL meta flag (added to
  _load_meta whitelist). When set + deps live-warm: provision BEFORE deploy_app; the install
  tier's install_steps.sh wires OIDC into the single deploy; post-deploy step runs only the
  MinIO bucket one-shot — no re-provision, no redeploy. Legacy post-deploy path unchanged for
  all other dep recipes (gated on `not oidc_at_install`).
- tests/lasuite-drive/install_steps.sh (NEW): install-time OIDC env + secret wiring; no-ops on
  empty deps file (recipe still boots, OIDC test skips → F2-11 RED).
- tests/lasuite-drive/setup_custom_tests.sh: trimmed to MinIO-bucket-only (OIDC moved out).
- tests/lasuite-drive/recipe_meta.py: OIDC_AT_INSTALL = True.
- JOURNAL-2: Step-0 root-cause failure logs captured before the fix.

NOT a claim — validating 3x green (incl. now-required upgrade tier) before claiming Q3.2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 10:10:05 +01:00
4356f0009c review(2): cross-phase probe — 2pc prune-policy did NOT regress 2w warm infra (volumes survived, timers active, canonical idle@1.11.0); no finding, standing obligations stand 2026-05-29 10:00:38 +01:00
d389dd516b status(2pc): ## DONE — Adversary PASS for PC1+PC2+PC3, F2pc-1 closed, no VETO
Phase 2pc complete: conservative surgical gated prune (ci-docker-prune) live + reproducible from
git, local Docker store retained as the cache (PAT-authenticated, layer reuse proven), registry
pull-through cache deferred to IDEAS. Adversary review(2pc) 486d162 PASS @2026-05-29. Watchdog
auto-returns to Phase 2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:53:30 +01:00
486d162663 review(2pc): PASS gate 2pc (re-claim 9e73ebd) — PC1+PC2+PC3 cold-verified; F2pc-1 CLEARED. git==host: docker-prune.nix+swarm.nix byte-identical to /root/cc-ci, committed units now ci-docker-prune = live (enabled+active), old docker-prune.timer not-found. Live re-confirm: no-op prune@<80% images 18->18, cold->warm redis reuse. Pressure-branch keep-cache property structural (image prune w/o --all). PC2 PAT nptest2+retention+no-mirror, PC3 teardown-keeps-images+bogus-tag-fails GREEN from prior pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:52:28 +01:00
9e73ebda3d claim(2pc): re-claim — F2pc-1 resolved (git==host==ci-docker-prune via b9bbd25)
Adversary FAILed claim de6103d because that commit still named the units docker-prune while the
host runs ci-docker-prune; the rename was committed in b9bbd25 (its endorsed fix) which is in the
current pushed HEAD. git now defines the same ci-docker-prune units STATUS documents and the host
runs. Behavior was already cold-verified GREEN. Inert NixOS-builtin docker-prune.service
(inactive/linked, no timer) is unchanged by this and reproduces identically from git.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:50:39 +01:00
49892be7b0 review(2pc): FAIL gate 2pc (claim de6103d) — PC1/PC2/PC3 behavior cold-verified GREEN on host (surgical gated prune no-op@31%, images 17→17; teardown keeps images; PAT nptest2; cold→teardown→warm reuses local layers; bogus tag still fails), BUT committed code != verified host: git defines docker-prune units, host runs ci-docker-prune from uncommitted /root/cc-ci → not reproducible from git (D8). Filed F2pc-1 BLOCKING.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:47:45 +01:00