9c9a0059c1
journal(2): record operator clarification — 3x repeat-green is flakiness-specific (lasuite-drive), not the general gate standard (normal = 1 cold-verified green)
2026-05-29 13:25:56 +01:00
c7b36ebb6a
review(2): record operator clarification — 3x repeat-green bar is lasuite-drive-recipe-PR ONLY (flakiness proof); normal gates = ONE cold-verified green per §6.1; cryptpad F2-9 needs only 1x
2026-05-29 13:25:46 +01:00
3a8c5ca076
journal(2): both Phase-2 blockers cleared (Q3.2 PASS, F2-9 resolved); scout Q3.3 lasuite-meet as next (reuses lasuite-drive OIDC-at-install machinery)
2026-05-29 13:13:32 +01:00
a48543f57b
status/journal/deferred(2): cryptpad F2-9 RESOLVED — roundtrip green in full harness custom tier (cold deploy); awaiting Adversary close
...
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 13:11:35 +01:00
118305b92f
status(2): Q3.2 lasuite-drive Adversary PASS (F2-12 closed); cryptpad roundtrip cold-timing fix in validation
2026-05-29 13:01:43 +01:00
af1481f6fc
review(2): record forward-looking Adversary criteria for parked lasuite-drive recipe-PR (Q3.2b) — keystone collabora healthcheck must let cc-ci drop -c backstop to abra-native convergence w/o regressing F2-12; repeat-green+cold-verify before operator merge. Does NOT reopen Q3.2 (PASS stands)
2026-05-29 13:01:01 +01:00
3f5d58a7c2
review(2): PASS gate Q3.2 lasuite-drive (re-claim a13d2ae/code e1147b5+6506c4a) — F2-12 CLOSED. Cold re-run: all 5 tiers GREEN, upgrade tier now passes, deploy-count=1, ready-probe OK(200) twice, OIDC+minio round-trip PASS (not skipped), data-integrity survives, teardown clean. abra -c + owned wait_healthy/READY_PROBE proven non-vacuous (5 P7-negative units + code-read RAISE paths). DECISIONS: record operator READY_PROBE principle
2026-05-29 12:59:52 +01:00
ac241d44c7
backlog(2): park Q3.2b — lasuite-drive recipe-PR (plan-lasuite-drive-recipe-pr.md) behind Q3.2; keystone collabora healthcheck lets cc-ci drop the F2-12 -c backstop later
...
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 12:59:37 +01:00
7dab4f5cb6
decisions(2): record operator principle — real-abra-only deploys, abra convergence by default, READY_PROBE (strict + negative-tested) only when abra doesn't fit; F2-12 applied
...
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 12:57:41 +01:00
a13d2ae48b
claim(2): Q3.2 re-claim — F2-12 fixed (own convergence wait + READY_PROBE; upgrade 3x green; P7-negative unit-proven)
...
lasuite-drive full lifecycle 3x repeat-green (logs ccci-drive-f212-v1/v2/v3): install+upgrade+backup+
restore+custom all pass, OIDC password-grant PASSED (not skip), deploy-count=1, clean teardown, ready-
probe OK (200) twice (post-install + post-upgrade collabora WOPI). F2-12 fix e1147b5 : upgrade chaos
redeploy uses abra -c (drop abra's impatient converge monitor that FATA'd while new collabora 25.04.9.4.1
was in healthcheck start_period) + perform_upgrade OWNS a stricter convergence wait (services N/N + app
health + collabora WOPI READY_PROBE) bounded by DEPLOY_TIMEOUT. Non-vacuous proven by 5 P7-negative unit
tests (6506c4a ). Gate evidence/HOW/EXPECTED/WHERE in STATUS-2. F2-12 Adversary-owned (left to close).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 12:45:02 +01:00
f7c5681cd0
review(2): pre-claim recon F2-12 fix e1147b5 — abra -c skips converge monitor BUT harness owns stricter wait_healthy(N/N all svcs)+READY_PROBE(collabora 200, raises on timeout); plausibly not-a-weakening, MUST cold-verify upgrade-GREEN + P7-negative at re-claim; NO verdict yet
2026-05-29 12:21:30 +01:00
cc4af49c99
status(2): Q3.2 F2-12 FAIL acknowledged, fix e1147b5 validating; cryptpad F2-9 test landed 3/3 green
...
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 11:58:03 +01:00
aab77ea0f3
review(2): FAIL gate Q3.2 lasuite-drive (claim 911680f/code 4b38b66) — cold re-run upgrade tier FAILS (abra chaos-deploy FATA: new collabora 25.04.9.4.1 not converged; WOPI pre-gate DID work). install/backup/restore/custom+OIDC pass, deploy-count=1, teardown clean. Filed F2-12 BLOCKING
2026-05-29 11:47:58 +01:00
911680f843
claim(2): Q3.2 lasuite-drive — full lifecycle 3x green via install-time OIDC + collabora-ready upgrade gate
...
3× repeat-green (logs /root/ccci-drive-q32a-r2/r3/r4.log): install+upgrade+backup+restore+custom all
pass, OIDC password-grant PASSED (not skip), deploy-count=1, clean teardown each run. Resolves the
Adversary's standing veto-eligible obligation (lasuite-drive upgrade tier GREEN + reliable OIDC).
Fixes: install-time OIDC wiring (a151489 : _provision_deps before single deploy + OIDC_AT_INSTALL +
install_steps.sh) eliminated the flaky post-deploy --chaos reconverge; collabora-WOPI-ready upgrade
gate + DEPLOY_TIMEOUT plumbing (4b38b66 ) fixed the upgrade tier (was killing a still-booting collabora,
exit 70). Gate evidence + cold-verify HOW/EXPECTED/WHERE in STATUS-2.md. BACKLOG-2 Q3.2/Q3.2a ticked;
DEFERRED.md disk follow-on noted done.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 11:16:18 +01:00
5e0af07b86
journal(2): Q3.2a fixed-code run 1 FULL SUITE GREEN (collabora-ready gate fixed upgrade tier); launching 3x repeat-green
2026-05-29 10:52:44 +01:00
e0a80124bc
inbox(2): consume BUILDER-INBOX (flag rename relay) + finish --extra rename in BACKLOG-2 Adversary-section lines 241/248/292 (Adversary explicitly delegated)
...
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 10:40:49 +01:00
a22ba9c9cc
inbox(2): relay orchestrator flag rename --extra-tests -> --extra to Builder (DEFERRED.md 12 occ + BACKLOG-2 4 occ; single-writer files, not editing them myself)
2026-05-29 10:39:46 +01:00
4b38b66fa5
fix(2): lasuite-drive Q3.2a — gate upgrade redeploy on collabora-ready + plumb DEPLOY_TIMEOUT
...
Q3.2a run 1: Part A (install-time OIDC) GREEN — deploy-count=1, install/backup/restore/custom +
OIDC test all PASS. BUT upgrade tier FAILED: the in-place `abra app deploy --chaos` redeploy landed
on a STILL-BOOTING collabora (coolwsd ~2min boot: 1300+ l10n files + RSA keygen) and SIGTERMed it
mid-init ("Shutdown requested while starting up", forced exit 70) → abra aborted the deploy. The
install wait_healthy returns on container 1/1 while coolwsd is still loading. Fixes (plan §C
readiness-gating, no test weakened):
- tests/lasuite-drive/ops.py::pre_upgrade — wait for collabora WOPI discovery (/hosting/discovery
on collabora-<domain>) → 200 BEFORE the chaos redeploy, so it replaces a ready collabora cleanly.
- runner/harness/lifecycle.chaos_redeploy + generic.perform_upgrade + run_recipe_ci._perform_op —
plumb the recipe DEPLOY_TIMEOUT to the upgrade chaos redeploy (was abra.deploy's 900s default,
while the .env internal TIMEOUT is 1500s → Python could SIGKILL abra mid-wait on the slow
collabora/onlyoffice reconverge). Mirrors the install deploy_app timeout plumbing.
Also (operator naming change 2026-05-29): renamed `--extra-tests` -> `--extra` in DEFERRED.md +
BACKLOG-2.md Build-backlog section. 3 refs remain in BACKLOG-2 Adversary-findings section
(241/248/292, closed findings) — left for the Adversary (single-writer); orchestrator updated
IDEAS.md/plan-sso-dep-testing.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 10:37:55 +01:00
0b558529c9
review(2): pre-claim recon lasuite-drive Q3.2a Part A — minio scale is recipe one-shot (replicas:0) NOT a bypass; install-time OIDC=deploy-once; minio test is real round-trip; NO verdict (gate not claimed)
2026-05-29 10:33:01 +01:00
f89cf9b1b8
status(2): Q3.2a lasuite-drive Part A in validation — install-time OIDC landed, full-suite run in flight
...
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 10:13:21 +01:00
a151489996
feat(2): lasuite-drive Q3.2a Part A — wire OIDC at INSTALL, eliminate flaky redeploy
...
Q3.2a / plan-lasuite-drive-oidc-robustness.md Part A. The old setup_custom_tests.sh did a
post-deploy in-place `abra app deploy --force --chaos` of the heavy 12-service stack to apply
the OIDC env — flaky (collabora WOPI-discovery race + gunicorn-perms; JOURNAL Step 0). Since
the OIDC env only affects backend/app and keycloak is live-warm, provision the per-run realm
BEFORE the single deploy and wire OIDC into the .env at install time (no reconverge).
- runner/run_recipe_ci.py: new _provision_deps() helper (warm/cold split + SSO enrich + write
$CCCI_DEPS_FILE), used by both paths. New per-recipe OIDC_AT_INSTALL meta flag (added to
_load_meta whitelist). When set + deps live-warm: provision BEFORE deploy_app; the install
tier's install_steps.sh wires OIDC into the single deploy; post-deploy step runs only the
MinIO bucket one-shot — no re-provision, no redeploy. Legacy post-deploy path unchanged for
all other dep recipes (gated on `not oidc_at_install`).
- tests/lasuite-drive/install_steps.sh (NEW): install-time OIDC env + secret wiring; no-ops on
empty deps file (recipe still boots, OIDC test skips → F2-11 RED).
- tests/lasuite-drive/setup_custom_tests.sh: trimmed to MinIO-bucket-only (OIDC moved out).
- tests/lasuite-drive/recipe_meta.py: OIDC_AT_INSTALL = True.
- JOURNAL-2: Step-0 root-cause failure logs captured before the fix.
NOT a claim — validating 3x green (incl. now-required upgrade tier) before claiming Q3.2.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 10:10:05 +01:00
4356f0009c
review(2): cross-phase probe — 2pc prune-policy did NOT regress 2w warm infra (volumes survived, timers active, canonical idle@1.11.0); no finding, standing obligations stand
2026-05-29 10:00:38 +01:00
d389dd516b
status(2pc): ## DONE — Adversary PASS for PC1+PC2+PC3, F2pc-1 closed, no VETO
...
Phase 2pc complete: conservative surgical gated prune (ci-docker-prune) live + reproducible from
git, local Docker store retained as the cache (PAT-authenticated, layer reuse proven), registry
pull-through cache deferred to IDEAS. Adversary review(2pc) 486d162 PASS @2026-05-29. Watchdog
auto-returns to Phase 2.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 09:53:30 +01:00
486d162663
review(2pc): PASS gate 2pc (re-claim 9e73ebd) — PC1+PC2+PC3 cold-verified; F2pc-1 CLEARED. git==host: docker-prune.nix+swarm.nix byte-identical to /root/cc-ci, committed units now ci-docker-prune = live (enabled+active), old docker-prune.timer not-found. Live re-confirm: no-op prune@<80% images 18->18, cold->warm redis reuse. Pressure-branch keep-cache property structural (image prune w/o --all). PC2 PAT nptest2+retention+no-mirror, PC3 teardown-keeps-images+bogus-tag-fails GREEN from prior pass.
...
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 09:52:28 +01:00
9e73ebda3d
claim(2pc): re-claim — F2pc-1 resolved (git==host==ci-docker-prune via b9bbd25)
...
Adversary FAILed claim de6103d because that commit still named the units docker-prune while the
host runs ci-docker-prune; the rename was committed in b9bbd25 (its endorsed fix) which is in the
current pushed HEAD. git now defines the same ci-docker-prune units STATUS documents and the host
runs. Behavior was already cold-verified GREEN. Inert NixOS-builtin docker-prune.service
(inactive/linked, no timer) is unchanged by this and reproduces identically from git.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 09:50:39 +01:00
49892be7b0
review(2pc): FAIL gate 2pc (claim de6103d) — PC1/PC2/PC3 behavior cold-verified GREEN on host (surgical gated prune no-op@31%, images 17→17; teardown keeps images; PAT nptest2; cold→teardown→warm reuses local layers; bogus tag still fails), BUT committed code != verified host: git defines docker-prune units, host runs ci-docker-prune from uncommitted /root/cc-ci → not reproducible from git (D8). Filed F2pc-1 BLOCKING.
...
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 09:47:45 +01:00
f6af7edd97
status(2pc): add probe-5 evidence — surgical prune reclaimed 2.34GB (dangling+old only), all tagged images kept, disk bounded without -af
...
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 09:44:57 +01:00
de6103d41d
claim(2pc): PC1 conservative prune deployed+verified; PC2/PC3 local-store cache confirmed
...
ci-docker-prune (gated surgical prune) live on cc-ci: old autoPrune --all gone, new timer
enabled (daily), no-ops below 80% disk keeping the local image cache, never --all/--volumes.
Daemon stays PAT-authenticated (nptest2); /var/lib/docker retained across rebuild. PC3 proof:
redis:7-alpine deploy->teardown(service rm, image retained)->redeploy = "Image is up to date",
no layer re-download (cold 5303ms -> warm 674ms). Docs: runbook "Image cache & prune policy",
warm.md, DECISIONS Phase-2pc, IDEAS (registry pull-through cache deferred + revisit trigger).
Gate 2pc CLAIMED, awaiting Adversary cold-verify.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 09:42:36 +01:00
16d177e73a
feat(2pc): PC1 conservative prune — drop autoPrune --all, add gated surgical docker-prune
...
Removes virtualisation.docker.autoPrune (daily `docker system prune --all` evicted in-use base
images → cold re-pull → Hub rate-limit churn, JOURNAL-2). Adds modules/docker-prune.nix: daily
timer + oneshot that prunes only dangling+until=24h, gated on disk pressure (>=80%) AND no run-app
live AND no swarm service converging; never --all, never --volumes. Teardown unchanged (never
removes images). Registry pull-through cache dropped per operator scope correction.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 09:30:07 +01:00
e42753c17c
note(2pc): realign REVIEW-2pc to narrowed scope — registry pull-through cache DROPPED per operator; 2pc is now prune-policy only (PC1 surgical prune + teardown must NOT remove images, PC2 confirm PAT-auth+local-store retention, PC3 deploy/teardown/redeploy reuses local layers). Break-it checklist updated.
...
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 09:25:55 +01:00
863bbac4de
note(2pc): init REVIEW-2pc — AWAITING CLAIM; baseline recon of current prune (swarm.nix --all until=24h) + confirm no pull-through cache exists yet; break-it checklist staged
...
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com >
2026-05-29 09:22:11 +01:00
78cf95aad3
status(2): Q3.2 truthful update — disk-blocker RESOLVED (cc-ci 64G); upgrade tier now REQUIRED green (not deferrable), runs via Q3.2a rework; F2-7 closed out-of-scope per SSO policy
2026-05-29 09:10:55 +01:00
139e8b9797
review(2): close F2-7 out-of-scope per operator SSO policy (keycloak default; Phase-2 DONE not gated on authentik; re-entry only if a recipe REQUIRES authentik); Builder owns DECISIONS/DEFERRED#9/cryptpad-keycloak edits
2026-05-29 09:10:00 +01:00
1537a928d5
decisions(2): record operator SSO-provider policy — keycloak DEFAULT for all recipe OIDC; authentik NOT a Phase-2 DONE gate (enroll only if a recipe REQUIRES it); cryptpad OIDC under keycloak; narrow DEFERRED #9 authentik re-entry trigger
2026-05-29 09:09:38 +01:00
779fb8917c
status(2): link plan-lasuite-drive-oidc-robustness.md into Q3.2a (Step 0 logs → Part A install-time OIDC vs warm keycloak [deploy once, no reconverge, real-abra-only] → Part B recipe PR; 3x-green + cold-verified before Q3.2 claim)
2026-05-29 09:06:43 +01:00
542028a6a4
status(2): Q4.5 mattermost-lts DONE — full lifecycle green (install+upgrade+backup+restore+custom, deploy-count=1, clean teardown); P1+P3 met; P4 ops → Q5 sweep
2026-05-29 09:05:55 +01:00
200d599c06
status(2): Q4.5 mattermost-lts ENROLLED + install+custom GREEN (create-message §4.3 round-trip validated live); full lifecycle in flight for P1
2026-05-29 08:59:57 +01:00
6ff68e625a
note(2): record Adversary cold-verify criteria for queued lasuite-drive Q3.2 rework (real-abra-only enforcement, repeat-green + upgrade tier required); not active yet
2026-05-29 08:58:32 +01:00
9b6c0e03dc
review(2): disk-blocker LIFTED — cold-verified 64G/44G-free + infra healthy post-resize; lasuite-drive upgrade tier now REQUIRED green (deferral void, veto-eligible open obligation); DEFERRED.md edit left to Builder
2026-05-29 08:42:52 +01:00
6df4757f85
status(2): CLOSE disk-blocker DEFERRED — cc-ci resized to 64G (44G free); heavy-recipe upgrade tiers runnable; lasuite-drive full-lifecycle Q3.2 now active backlog
2026-05-29 08:42:24 +01:00
aca1fd5185
inbox(2): consume Adversary BUILDER-INBOX — disk-blocker deferral VOID post-resize; Q3.2 now requires the FULL lasuite-drive lifecycle incl. a GREEN upgrade tier (cold-verified). Aligns with my plan: re-run full after cc-ci healthy, claim only when upgrade green.
2026-05-29 08:37:10 +01:00
4eae6eb208
inbox(2): disk resize 30→70GB in progress — deferral VOID; lasuite-drive upgrade tier now REQUIRED green for Q3.2 sign-off (no longer deferrable); pausing host verify during restart
2026-05-29 08:36:32 +01:00
dd137f9683
status(2): disk resize 30->70GB in progress (orchestrator) — disk-blocker LIFTING; deploys paused; plan to re-run lasuite-drive FULL lifecycle + mattermost after cc-ci healthy
2026-05-29 08:36:17 +01:00
9df900d1cc
journal(2): mumble scope correction — non-HTTP health = high-blast-radius core-harness feature (wait_healthy/canonical/generic), deserves dedicated effort; re-pick next unit = mattermost-lts (HTTP-native, no core changes)
2026-05-29 08:06:03 +01:00
7997b98935
journal(2): scouted mumble (Q4.2) — first non-HTTP recipe; design = python sidecar probe on app overlay network for the TLS protocol test; enrollment plan recorded for next tick
2026-05-29 07:47:42 +01:00
426a953c2b
status(2): lasuite-drive Q3.2 NOT claimed — OIDC setup redeploy flaky (collabora reconverge); --detach fix validated; test assertions proven correct (run 1); Q3.2a robustness item added; prune-during-deploy lesson recorded
2026-05-29 07:27:50 +01:00
75ae226c0d
status(2): Q3.2 lasuite-drive maximal subset GREEN (install+backup+restore+custom: health+MinIO roundtrip+OIDC JWT); upgrade tier deferred pending disk resize; clean re-run w/ --detach fix in flight before claim
2026-05-29 06:28:03 +01:00
d1aae43c7e
inbox(2): consume Adversary BUILDER-INBOX — conditional/deferred sign-off model for lasuite-drive upgrade tier (deferred pending disk resize, NOT waived; veto-eligible open item until cold-verified green). Q3.2 claim will frame accordingly.
2026-05-29 05:54:49 +01:00
ccc42699ff
chore(2): consume ADVERSARY-INBOX (Q3.2 lasuite-drive heads-up); reply via BUILDER-INBOX — disk blocker is operator-removable, will grant CONDITIONAL/deferred sign-off only, upgrade tier still blocks Phase-2 DONE
2026-05-29 05:53:51 +01:00
b78d708c49
decisions/deferred(2): lasuite-drive upgrade tier = disk env-blocker (28GB host, dual multi-GB office image crossover); maximal subset in flight; operator disk-resize escalation; adversary heads-up
2026-05-29 05:51:31 +01:00