cc-ci/nix/modules/warm-keycloak.nix
autonomic-bot a044abb298 feat(2w): W0.6 unpinned warm reconciler + WC1.2 safety gate + WC1.1 scaffold
runner/warm_reconcile.py (python, packaged into nix store, replaces bash
reconcile): UNPIN keycloak (deploy latest published version TAG; recipe fetched
at runtime -> D8 closure byte-identical). WC1.2 pre-deploy safety gate (runs
FIRST): major recipe/app-version bump OR releaseNotes manual-migration marker
-> hold-on-current + alert sentinel (no deploy churn). WC1.1 health-gated
upgrade-with-rollback: record last-good -> [keycloak: undeploy->warmsnap.snapshot
->deploy latest] -> health-gate -> commit-or-(restore+redeploy-prior+alert).
Alerts = /var/lib/ci-warm/alerts/*.json (Builder loop relays). current version
read from abra TYPE=<recipe>:<version>. CCCI_SKIP_FETCH test hook.
+8 unit tests for the version gate (56 unit pass).

Proven on cc-ci: nixos-rebuild switch -> warm-keycloak.service runs the python
reconciler -> noop-healthy (system 0-failed, /realms/master=200). WC1.2 holds
proven live: MAJOR bump -> held-major (keycloak untouched); minor+manual-
migration notes -> held-manual-migration (alert carries notes); no deploy churn.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 00:42:02 +01:00
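
The WC1.2 gate described in the commit message (major recipe/app-version bump OR manual-migration marker in the release notes -> hold) might look roughly like the sketch below. This is not the code in `runner/warm_reconcile.py`. The `<recipe-version>+<app-version>` tag layout, the marker strings, the function names, and the "noop" / "proceed" return values are assumptions made for illustration; only "held-major" and "held-manual-migration" come from the commit message.

```python
import re
from typing import Optional, Tuple

# Hypothetical marker strings; the markers the real reconciler looks for are not shown here.
MANUAL_MIGRATION_MARKERS = ("manual migration", "manual-migration")


def major(version: str) -> Optional[int]:
    """Return the first integer in a version string (e.g. 'v26.4.5' -> 26), or None."""
    m = re.match(r"\D*(\d+)", version)
    return int(m.group(1)) if m else None


def split_tag(tag: str) -> Tuple[str, str]:
    """Split a tag of the ASSUMED form '<recipe-version>+<app-version>'."""
    recipe, _, app = tag.partition("+")
    return recipe, app


def gate(current: str, latest: str, release_notes: str = "") -> str:
    """Decide whether an upgrade from `current` to `latest` may proceed."""
    if current == latest:
        return "noop"
    # Hold if either the recipe part or the app part crosses a major version.
    for old, new in zip(split_tag(current), split_tag(latest)):
        old_major, new_major = major(old), major(new)
        if old_major is not None and new_major is not None and new_major > old_major:
            return "held-major"
    # Hold if the release notes mention a manual migration step.
    notes = release_notes.lower()
    if any(marker in notes for marker in MANUAL_MIGRATION_MARKERS):
        return "held-manual-migration"
    return "proceed"
```

Because the gate runs before any deploy step, a held result leaves the running keycloak untouched; the caller would only write the alert sentinel.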


# Phase 2w / WC1+WC1.1+WC1.2 — a live-warm, shared keycloak SSO provider, auto-updating to LATEST
# with a pre-deploy safety gate + post-deploy health-gated rollback. Deployed via abra at a STABLE
# domain (distinct from cold per-run `<recipe[:4]>-<6hex>`; see DECISIONS.md Phase-2w). SSO-dependent
# recipe runs use this one instance (per-run namespaced realm, created+deleted) instead of
# co-deploying a fresh keycloak each run.
#
# The reconcile logic lives in `runner/warm_reconcile.py` (Python — reuses warmsnap/abra/lifecycle so
# there is ONE snapshot impl, also used by the runner for WC5). The runner/ tree is copied into the
# nix store, so this is D8-clean (no dependence on the /root/cc-ci checkout) and the recipe is fetched
# at *runtime* → the nix closure stays byte-identical regardless of which keycloak version is live
# (UNPINNED; the kcVersion pin is gone).
#
# Idempotent RECONCILE oneshot (like deploy-proxy / swarm-init): converges every activation/boot.
# WC1.2 safety gate (major / manual-migration → hold + alert, no churn) runs BEFORE WC1.1's
# health-gated upgrade-with-rollback (snapshot keycloak's data volume before upgrade; restore +
# redeploy prior version on an unhealthy upgrade). Alerts are sentinel JSON under
# /var/lib/ci-warm/alerts/ relayed by the Builder loop (see DECISIONS).
{ pkgs, ... }:
let
  runnerSrc = ../../runner;
  reconcile = pkgs.writeShellApplication {
    name = "cc-ci-reconcile-warm-keycloak";
    runtimeInputs = with pkgs; [ abra docker git curl jq gnused gnugrep gnutar coreutils ];
    text = ''
      export HOME=/root
      exec ${pkgs.python3}/bin/python3 ${runnerSrc}/warm_reconcile.py keycloak
    '';
  };
in
{
  systemd.services.warm-keycloak = {
    description = "Reconcile the live-warm shared keycloak SSO provider (WC1/WC1.1/WC1.2) via abra";
    after = [ "deploy-proxy.service" "swarm-init.service" "docker.service" "network-online.target" ];
    requires = [ "swarm-init.service" "docker.service" ];
    wants = [ "deploy-proxy.service" "network-online.target" ];
    wantedBy = [ "multi-user.target" ];
    environment.HOME = "/root";
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
      # Generous: a cold keycloak boot (JVM + DB migration) can take ~10min, and a health-gated
      # upgrade may snapshot + deploy + (rollback) within one run.
      TimeoutStartSec = "1800";
      ExecStart = "${reconcile}/bin/cc-ci-reconcile-warm-keycloak";
    };
  };
}
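
The WC1.1 flow that this unit runs (record last-good -> undeploy -> snapshot -> deploy latest -> health-gate -> commit, or restore + redeploy prior + alert) can be sketched in Python as below. This is a minimal illustration only, not the contents of `runner/warm_reconcile.py`: the function name and every callable parameter are hypothetical stand-ins for the real warmsnap/abra helpers, whose signatures are not shown in this file.

```python
from typing import Callable


def upgrade_with_rollback(
    current: str,
    latest: str,
    undeploy: Callable[[], None],
    snapshot: Callable[[], str],
    deploy: Callable[[str], None],
    healthy: Callable[[], bool],
    restore: Callable[[str], None],
    alert: Callable[[str], None],
) -> str:
    """Upgrade to `latest`; return the version left running afterwards."""
    last_good = current      # record last-good before touching anything
    undeploy()               # stop the app so the data volume is quiescent
    snap = snapshot()        # snapshot the data volume; returns a snapshot id
    deploy(latest)
    if healthy():
        return latest        # health gate passed: commit the upgrade
    # Health gate failed: put the data back and return to the prior version.
    restore(snap)
    deploy(last_good)
    alert(f"upgrade {last_good} -> {latest} unhealthy; rolled back")
    return last_good
```

Passing the side-effecting steps in as callables keeps the ordering logic testable without a live swarm; in a real run the `alert` step would write a sentinel JSON file under /var/lib/ci-warm/alerts/.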