Commit Graph

199 Commits

Author SHA1 Message Date
1dd7376ff4 status(2): HQ1 image pre-pull Adversary PASS (0215bd2) 2026-05-29 16:19:27 +01:00
0215bd2203 review(2): PASS gate HQ1 image pre-pull (claim 475ad5c/code 2bf40d6) — 4 unit pass (non-vacuous, raises on pull-fail); LIVE warm-cache skip (present n8n, zero network); LIVE bad-tag RAISES clear pull error BEFORE deploy (manifest unknown, not converge timeout); abra deploy real+UNCHANGED (prepull before, no service update/scale); honest scope (pull-time not init-time). No VETO 2026-05-29 16:18:28 +01:00
475ad5c774 claim(2): HQ1 image pre-pull — warm local store before deploy (4 unit tests + warm-cache-skip + bad-tag-clear-error + abra-unchanged)
lifecycle.prepull_images (commit 2bf40d6): docker compose config --images → docker pull skip-if-present,
before deploy_app's abra.deploy + perform_upgrade's chaos redeploy. Adversary criteria all met:
warm-cache 2nd run 'present' (no redownload, n8n-prepull2), bad-tag → clear RuntimeError pre-deploy,
abra deploy path unchanged (no service update/scale), real-run green. 4 unit tests pass. Gate evidence
in STATUS-2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 16:14:25 +01:00
e6e5436942 backlog(2): Q3.5 immich [~] partial — 4/5 green + §4.3; restore P4 blocked by upstream recipe (pg_dump hook needed, DEFERRED) 2026-05-29 15:54:10 +01:00
9272c20727 journal/deferred(2): Q3.5 immich PARTIAL — restore P4 blocked by upstream recipe (volume backup, no pg_dump hook); recipe-PR unit filed (drive/meet pg_backup.sh pattern) 2026-05-29 15:53:22 +01:00
250bed4768 status(2): cryptpad F2-9 + F2-13 Adversary CLOSED (f7ed2d9) — §4.3 create-pad floor demonstrated; DONE-blocker cleared 2026-05-29 15:38:21 +01:00
f7ed2d967c review(2): cryptpad F2-9 + F2-13 CLOSED — re-verify after fix b44d75b (poll-all-frames). create-pad roundtrip test_cryptpad_pad_content_survives_fresh_session PASSED (46s, was 340s timeout), all 5 tiers green, deploy-count=1, clean teardown. Fix non-vacuous (still asserts marker surfaces in fresh context = server-side encrypted persistence). §4.3 create-pad floor demonstrated; conditional sign-off satisfied 2026-05-29 15:37:12 +01:00
62ac9b59e0 journal/status(2): F2-13 cryptpad read-back robustness FIXED (b44d75b, poll-all-frames) — 3x green vs cold probe; awaiting Adversary re-verify/F2-9 close 2026-05-29 15:26:25 +01:00
1cbb1ccd73 review(2): cryptpad F2-9 NOT closed — create-pad roundtrip read-back leg FAILED on cold-verify (CKEditor frame never attached on fresh context, line 133; 1 failed in 340s) → test is flaky not 3x-reliable. Filed F2-13: make read-back robust before F2-9 closes. install/upgrade/backup/restore pass, only the §4.3-floor pad-persist test red; teardown clean. NOT a VETO (F2-9 was conditional/open) 2026-05-29 15:05:22 +01:00
754f508231 review(2): record forward-looking Adversary criteria for pre-pull harness unit (plan-prepull-images.md) — verify warm-cache no-redownload + bad-tag=clear-pull-error-pre-deploy + abra stays real/unchanged + honest scope (pull-time not init-time; F2-12 init races still need healthcheck) 2026-05-29 14:58:38 +01:00
f8af5b2307 backlog(2): HQ1 — image pre-pull harness unit (plan-prepull-images.md), near-term; fixes the first-deploy 'No such image' race 2026-05-29 14:56:18 +01:00
b0f1e0b0ad status(2): Q3.3 lasuite-meet Adversary PASS (a46f7d4); immich Q3.5 validating 2026-05-29 14:44:09 +01:00
a46f7d4593 review(2): PASS gate Q3.3 lasuite-meet (claim 5af513e/code 1f7806a) — cold-verify all 5 tiers GREEN, deploy-count=1, real upgrade crossover 0.2.0+v1.15.0->0.3.0+v1.16.0, meeting_flow (room create->read-back->LiveKit video-grant JWT->delete) PASSED, OIDC PASSED not-skipped, ci_marker survives, teardown clean+realm reaped. WebRTC media-relay non-port: ADVERSARY SIGN-OFF (genuine UDP env-blocker, maximal subset=LiveKit token issuance shipped) 2026-05-29 14:40:15 +01:00
5af513e2c8 claim(2): Q3.3 lasuite-meet — full lifecycle green (meeting_flow §4.3 + OIDC; R014 chaos-base; webrtc env-blocker non-port)
lasuite-meet full suite GREEN (log /root/ccci-meet-full6.log): install/upgrade/backup/restore/custom
all pass, deploy-count=1, clean teardown, real upgrade crossover 0.2.0+v1.15.0→0.3.0+v1.16.0.
- §4.3 test_meeting_flow: create-room (201) → read-back (200) → LiveKit join token (JWT room grant) →
  delete. test_oidc_password_grant PASSED. Parity: health_check + oidc_login. Reused lasuite-drive
  OIDC-at-install machinery.
- R014 fix (72719fe): upstream lightweight tag → chaos-base deploy of the checked-out prev version
  (skips lint, deploys prev not latest — verified by the crossover).
- webrtc-media/relay UDP media-relay = documented env-blocker non-port; maximal subset (LiveKit token
  issuance) shipped in meeting_flow.
Gate evidence/HOW/EXPECTED/WHERE in STATUS-2. DECISIONS: R014 chaos-base + webrtc non-port. BACKLOG-2
[idea]: harness image pre-pull. Single cold-verified green is the bar (operator clarification).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 14:33:31 +01:00
9c9a0059c1 journal(2): record operator clarification — 3x repeat-green is flakiness-specific (lasuite-drive), not the general gate standard (normal = 1 cold-verified green) 2026-05-29 13:25:56 +01:00
c7b36ebb6a review(2): record operator clarification — 3x repeat-green bar is lasuite-drive-recipe-PR ONLY (flakiness proof); normal gates = ONE cold-verified green per §6.1; cryptpad F2-9 needs only 1x 2026-05-29 13:25:46 +01:00
3a8c5ca076 journal(2): both Phase-2 blockers cleared (Q3.2 PASS, F2-9 resolved); scout Q3.3 lasuite-meet as next (reuses lasuite-drive OIDC-at-install machinery) 2026-05-29 13:13:32 +01:00
a48543f57b status/journal/deferred(2): cryptpad F2-9 RESOLVED — roundtrip green in full harness custom tier (cold deploy); awaiting Adversary close
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 13:11:35 +01:00
118305b92f status(2): Q3.2 lasuite-drive Adversary PASS (F2-12 closed); cryptpad roundtrip cold-timing fix in validation 2026-05-29 13:01:43 +01:00
af1481f6fc review(2): record forward-looking Adversary criteria for parked lasuite-drive recipe-PR (Q3.2b) — keystone collabora healthcheck must let cc-ci drop -c backstop to abra-native convergence w/o regressing F2-12; repeat-green+cold-verify before operator merge. Does NOT reopen Q3.2 (PASS stands) 2026-05-29 13:01:01 +01:00
3f5d58a7c2 review(2): PASS gate Q3.2 lasuite-drive (re-claim a13d2ae/code e1147b5+6506c4a) — F2-12 CLOSED. Cold re-run: all 5 tiers GREEN, upgrade tier now passes, deploy-count=1, ready-probe OK(200) twice, OIDC+minio round-trip PASS (not skipped), data-integrity survives, teardown clean. abra -c + owned wait_healthy/READY_PROBE proven non-vacuous (5 P7-negative units + code-read RAISE paths). DECISIONS: record operator READY_PROBE principle 2026-05-29 12:59:52 +01:00
ac241d44c7 backlog(2): park Q3.2b — lasuite-drive recipe-PR (plan-lasuite-drive-recipe-pr.md) behind Q3.2; keystone collabora healthcheck lets cc-ci drop the F2-12 -c backstop later
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 12:59:37 +01:00
7dab4f5cb6 decisions(2): record operator principle — real-abra-only deploys, abra convergence by default, READY_PROBE (strict + negative-tested) only when abra doesn't fit; F2-12 applied
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 12:57:41 +01:00
a13d2ae48b claim(2): Q3.2 re-claim — F2-12 fixed (own convergence wait + READY_PROBE; upgrade 3x green; P7-negative unit-proven)
lasuite-drive full lifecycle 3x repeat-green (logs ccci-drive-f212-v1/v2/v3): install+upgrade+backup+
restore+custom all pass, OIDC password-grant PASSED (not skip), deploy-count=1, clean teardown, ready-
probe OK (200) twice (post-install + post-upgrade collabora WOPI). F2-12 fix e1147b5: upgrade chaos
redeploy uses abra -c (drop abra's impatient converge monitor that FATA'd while new collabora 25.04.9.4.1
was in healthcheck start_period) + perform_upgrade OWNS a stricter convergence wait (services N/N + app
health + collabora WOPI READY_PROBE) bounded by DEPLOY_TIMEOUT. Non-vacuous proven by 5 P7-negative unit
tests (6506c4a). Gate evidence/HOW/EXPECTED/WHERE in STATUS-2. F2-12 Adversary-owned (left to close).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 12:45:02 +01:00
f7c5681cd0 review(2): pre-claim recon F2-12 fix e1147b5 — abra -c skips converge monitor BUT harness owns stricter wait_healthy(N/N all svcs)+READY_PROBE(collabora 200, raises on timeout); plausibly not-a-weakening, MUST cold-verify upgrade-GREEN + P7-negative at re-claim; NO verdict yet 2026-05-29 12:21:30 +01:00
cc4af49c99 status(2): Q3.2 F2-12 FAIL acknowledged, fix e1147b5 validating; cryptpad F2-9 test landed 3/3 green
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 11:58:03 +01:00
aab77ea0f3 review(2): FAIL gate Q3.2 lasuite-drive (claim 911680f/code 4b38b66) — cold re-run upgrade tier FAILS (abra chaos-deploy FATA: new collabora 25.04.9.4.1 not converged; WOPI pre-gate DID work). install/backup/restore/custom+OIDC pass, deploy-count=1, teardown clean. Filed F2-12 BLOCKING 2026-05-29 11:47:58 +01:00
911680f843 claim(2): Q3.2 lasuite-drive — full lifecycle 3x green via install-time OIDC + collabora-ready upgrade gate
3× repeat-green (logs /root/ccci-drive-q32a-r2/r3/r4.log): install+upgrade+backup+restore+custom all
pass, OIDC password-grant PASSED (not skip), deploy-count=1, clean teardown each run. Resolves the
Adversary's standing veto-eligible obligation (lasuite-drive upgrade tier GREEN + reliable OIDC).

Fixes: install-time OIDC wiring (a151489: _provision_deps before single deploy + OIDC_AT_INSTALL +
install_steps.sh) eliminated the flaky post-deploy --chaos reconverge; collabora-WOPI-ready upgrade
gate + DEPLOY_TIMEOUT plumbing (4b38b66) fixed the upgrade tier (was killing a still-booting collabora,
exit 70). Gate evidence + cold-verify HOW/EXPECTED/WHERE in STATUS-2.md. BACKLOG-2 Q3.2/Q3.2a ticked;
DEFERRED.md disk follow-on noted done.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 11:16:18 +01:00
5e0af07b86 journal(2): Q3.2a fixed-code run 1 FULL SUITE GREEN (collabora-ready gate fixed upgrade tier); launching 3x repeat-green 2026-05-29 10:52:44 +01:00
e0a80124bc inbox(2): consume BUILDER-INBOX (flag rename relay) + finish --extra rename in BACKLOG-2 Adversary-section lines 241/248/292 (Adversary explicitly delegated)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 10:40:49 +01:00
a22ba9c9cc inbox(2): relay orchestrator flag rename --extra-tests -> --extra to Builder (DEFERRED.md 12 occ + BACKLOG-2 4 occ; single-writer files, not editing them myself) 2026-05-29 10:39:46 +01:00
4b38b66fa5 fix(2): lasuite-drive Q3.2a — gate upgrade redeploy on collabora-ready + plumb DEPLOY_TIMEOUT
Q3.2a run 1: Part A (install-time OIDC) GREEN — deploy-count=1, install/backup/restore/custom +
OIDC test all PASS. BUT upgrade tier FAILED: the in-place `abra app deploy --chaos` redeploy landed
on a STILL-BOOTING collabora (coolwsd ~2min boot: 1300+ l10n files + RSA keygen) and SIGTERMed it
mid-init ("Shutdown requested while starting up", forced exit 70) → abra aborted the deploy. The
install wait_healthy returns on container 1/1 while coolwsd is still loading. Fixes (plan §C
readiness-gating, no test weakened):

- tests/lasuite-drive/ops.py::pre_upgrade — wait for collabora WOPI discovery (/hosting/discovery
  on collabora-<domain>) → 200 BEFORE the chaos redeploy, so it replaces a ready collabora cleanly.
- runner/harness/lifecycle.chaos_redeploy + generic.perform_upgrade + run_recipe_ci._perform_op —
  plumb the recipe DEPLOY_TIMEOUT to the upgrade chaos redeploy (was abra.deploy's 900s default,
  while the .env internal TIMEOUT is 1500s → Python could SIGKILL abra mid-wait on the slow
  collabora/onlyoffice reconverge). Mirrors the install deploy_app timeout plumbing.

Also (operator naming change 2026-05-29): renamed `--extra-tests` -> `--extra` in DEFERRED.md +
BACKLOG-2.md Build-backlog section. 3 refs remain in BACKLOG-2 Adversary-findings section
(241/248/292, closed findings) — left for the Adversary (single-writer); orchestrator updated
IDEAS.md/plan-sso-dep-testing.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 10:37:55 +01:00
0b558529c9 review(2): pre-claim recon lasuite-drive Q3.2a Part A — minio scale is recipe one-shot (replicas:0) NOT a bypass; install-time OIDC=deploy-once; minio test is real round-trip; NO verdict (gate not claimed) 2026-05-29 10:33:01 +01:00
f89cf9b1b8 status(2): Q3.2a lasuite-drive Part A in validation — install-time OIDC landed, full-suite run in flight
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 10:13:21 +01:00
a151489996 feat(2): lasuite-drive Q3.2a Part A — wire OIDC at INSTALL, eliminate flaky redeploy
Q3.2a / plan-lasuite-drive-oidc-robustness.md Part A. The old setup_custom_tests.sh did a
post-deploy in-place `abra app deploy --force --chaos` of the heavy 12-service stack to apply
the OIDC env — flaky (collabora WOPI-discovery race + gunicorn-perms; JOURNAL Step 0). Since
the OIDC env only affects backend/app and keycloak is live-warm, provision the per-run realm
BEFORE the single deploy and wire OIDC into the .env at install time (no reconverge).

- runner/run_recipe_ci.py: new _provision_deps() helper (warm/cold split + SSO enrich + write
  $CCCI_DEPS_FILE), used by both paths. New per-recipe OIDC_AT_INSTALL meta flag (added to
  _load_meta whitelist). When set + deps live-warm: provision BEFORE deploy_app; the install
  tier's install_steps.sh wires OIDC into the single deploy; post-deploy step runs only the
  MinIO bucket one-shot — no re-provision, no redeploy. Legacy post-deploy path unchanged for
  all other dep recipes (gated on `not oidc_at_install`).
- tests/lasuite-drive/install_steps.sh (NEW): install-time OIDC env + secret wiring; no-ops on
  empty deps file (recipe still boots, OIDC test skips → F2-11 RED).
- tests/lasuite-drive/setup_custom_tests.sh: trimmed to MinIO-bucket-only (OIDC moved out).
- tests/lasuite-drive/recipe_meta.py: OIDC_AT_INSTALL = True.
- JOURNAL-2: Step-0 root-cause failure logs captured before the fix.

NOT a claim — validating 3x green (incl. now-required upgrade tier) before claiming Q3.2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 10:10:05 +01:00
4356f0009c review(2): cross-phase probe — 2pc prune-policy did NOT regress 2w warm infra (volumes survived, timers active, canonical idle@1.11.0); no finding, standing obligations stand 2026-05-29 10:00:38 +01:00
d389dd516b status(2pc): ## DONE — Adversary PASS for PC1+PC2+PC3, F2pc-1 closed, no VETO
Phase 2pc complete: conservative surgical gated prune (ci-docker-prune) live + reproducible from
git, local Docker store retained as the cache (PAT-authenticated, layer reuse proven), registry
pull-through cache deferred to IDEAS. Adversary review(2pc) 486d162 PASS @2026-05-29. Watchdog
auto-returns to Phase 2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:53:30 +01:00
486d162663 review(2pc): PASS gate 2pc (re-claim 9e73ebd) — PC1+PC2+PC3 cold-verified; F2pc-1 CLEARED. git==host: docker-prune.nix+swarm.nix byte-identical to /root/cc-ci, committed units now ci-docker-prune = live (enabled+active), old docker-prune.timer not-found. Live re-confirm: no-op prune@<80% images 18->18, cold->warm redis reuse. Pressure-branch keep-cache property structural (image prune w/o --all). PC2 PAT nptest2+retention+no-mirror, PC3 teardown-keeps-images+bogus-tag-fails GREEN from prior pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:52:28 +01:00
9e73ebda3d claim(2pc): re-claim — F2pc-1 resolved (git==host==ci-docker-prune via b9bbd25)
Adversary FAILed claim de6103d because that commit still named the units docker-prune while the
host runs ci-docker-prune; the rename was committed in b9bbd25 (its endorsed fix) which is in the
current pushed HEAD. git now defines the same ci-docker-prune units STATUS documents and the host
runs. Behavior was already cold-verified GREEN. Inert NixOS-builtin docker-prune.service
(inactive/linked, no timer) is unchanged by this and reproduces identically from git.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:50:39 +01:00
49892be7b0 review(2pc): FAIL gate 2pc (claim de6103d) — PC1/PC2/PC3 behavior cold-verified GREEN on host (surgical gated prune no-op@31%, images 17→17; teardown keeps images; PAT nptest2; cold→teardown→warm reuses local layers; bogus tag still fails), BUT committed code != verified host: git defines docker-prune units, host runs ci-docker-prune from uncommitted /root/cc-ci → not reproducible from git (D8). Filed F2pc-1 BLOCKING.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:47:45 +01:00
f6af7edd97 status(2pc): add probe-5 evidence — surgical prune reclaimed 2.34GB (dangling+old only), all tagged images kept, disk bounded without -af
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:44:57 +01:00
de6103d41d claim(2pc): PC1 conservative prune deployed+verified; PC2/PC3 local-store cache confirmed
ci-docker-prune (gated surgical prune) live on cc-ci: old autoPrune --all gone, new timer
enabled (daily), no-ops below 80% disk keeping the local image cache, never --all/--volumes.
Daemon stays PAT-authenticated (nptest2); /var/lib/docker retained across rebuild. PC3 proof:
redis:7-alpine deploy->teardown(service rm, image retained)->redeploy = "Image is up to date",
no layer re-download (cold 5303ms -> warm 674ms). Docs: runbook "Image cache & prune policy",
warm.md, DECISIONS Phase-2pc, IDEAS (registry pull-through cache deferred + revisit trigger).
Gate 2pc CLAIMED, awaiting Adversary cold-verify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:42:36 +01:00
16d177e73a feat(2pc): PC1 conservative prune — drop autoPrune --all, add gated surgical docker-prune
Removes virtualisation.docker.autoPrune (daily `docker system prune --all` evicted in-use base
images → cold re-pull → Hub rate-limit churn, JOURNAL-2). Adds modules/docker-prune.nix: daily
timer + oneshot that prunes only dangling+until=24h, gated on disk pressure (>=80%) AND no run-app
live AND no swarm service converging; never --all, never --volumes. Teardown unchanged (never
removes images). Registry pull-through cache dropped per operator scope correction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:30:07 +01:00
e42753c17c note(2pc): realign REVIEW-2pc to narrowed scope — registry pull-through cache DROPPED per operator; 2pc is now prune-policy only (PC1 surgical prune + teardown must NOT remove images, PC2 confirm PAT-auth+local-store retention, PC3 deploy/teardown/redeploy reuses local layers). Break-it checklist updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:25:55 +01:00
863bbac4de note(2pc): init REVIEW-2pc — AWAITING CLAIM; baseline recon of current prune (swarm.nix --all until=24h) + confirm no pull-through cache exists yet; break-it checklist staged
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:22:11 +01:00
78cf95aad3 status(2): Q3.2 truthful update — disk-blocker RESOLVED (cc-ci 64G); upgrade tier now REQUIRED green (not deferrable), runs via Q3.2a rework; F2-7 closed out-of-scope per SSO policy 2026-05-29 09:10:55 +01:00
139e8b9797 review(2): close F2-7 out-of-scope per operator SSO policy (keycloak default; Phase-2 DONE not gated on authentik; re-entry only if a recipe REQUIRES authentik); Builder owns DECISIONS/DEFERRED#9/cryptpad-keycloak edits 2026-05-29 09:10:00 +01:00
1537a928d5 decisions(2): record operator SSO-provider policy — keycloak DEFAULT for all recipe OIDC; authentik NOT a Phase-2 DONE gate (enroll only if a recipe REQUIRES it); cryptpad OIDC under keycloak; narrow DEFERRED #9 authentik re-entry trigger 2026-05-29 09:09:38 +01:00
779fb8917c status(2): link plan-lasuite-drive-oidc-robustness.md into Q3.2a (Step 0 logs → Part A install-time OIDC vs warm keycloak [deploy once, no reconverge, real-abra-only] → Part B recipe PR; 3x-green + cold-verified before Q3.2 claim) 2026-05-29 09:06:43 +01:00
542028a6a4 status(2): Q4.5 mattermost-lts DONE — full lifecycle green (install+upgrade+backup+restore+custom, deploy-count=1, clean teardown); P1+P3 met; P4 ops → Q5 sweep 2026-05-29 09:05:55 +01:00