cc-ci-orchestrator/ideas/coop-cloud-nixos-modules.md
2026-06-02 05:06:30 +00:00


Idea: Co-op Cloud NixOS modules

Status: research / pre-design. Not started.
Origin: conversation 2026-06-02 between mfowler and the assistant.


The idea

A public Nix flake that lets NixOS operators deploy Co-op Cloud apps declaratively (tracked in git, applied with nixos-rebuild switch) instead of through imperative abra commands. Each app is a thin NixOS module backed by a shared mkCcApp factory. Docker Swarm still does the actual container work, so the container-isolation story is unchanged; Nix manages what is deployed and at what version.

```nix
# In a user's NixOS configuration.nix:
services.coop-cloud.ghost = {
  enable     = true;
  domain     = "blog.example.org";
  version    = "1.3.0+6.42.0-alpine";
  autoUpdate = true;
};
```

Why this over native NixOS modules

NixOS already has native service modules for roughly 14 of the 18 maintained recipes (e.g. matrix-synapse, keycloak, hedgedoc, immich). The argument for doing it this way instead:

Container isolation is an advantage, not legacy baggage. Native NixOS modules run as systemd units that, by default, share the host's namespaces. Containers give hard network and filesystem isolation between apps: a compromised Ghost instance cannot see other services' sockets. For a single-node multi-app host, that matters.

The recipes already exist and are maintained. The Co-op Cloud recipe ecosystem (compose.yml, tested version combinations, backup hooks, Traefik wiring) is the real value. Wrapping it in Nix preserves that investment rather than reimplementing it as native modules.

The target user. Most Co-op Cloud operators are single-node. They want the isolation and curation of Co-op Cloud recipes but a declarative, git-tracked, idempotent deployment model rather than abra's imperative one. This gives them that without giving up containers.


Existing art

  • Arion (hercules-ci/arion): runs docker-compose projects defined through NixOS modules. Could be an implementation backend, but it needs a Docker daemon and adds another layer.
  • compose2nix (aksiksi/compose2nix): converts compose.yml into NixOS virtualisation.oci-containers config (Docker or Podman backend), run as systemd units. Loses the Docker Swarm model the recipes are written for.
  • No existing Co-op Cloud + Nix bridge was found. The space is open.

Proof of concept already in cc-ci

The cc-ci NixOS config (cc-ci/nix/modules/) already implements this pattern for its own internal services. The key modules:

| Module | Pattern |
|---|---|
| swarm.nix | Enables Docker, initialises a single-node Swarm and the proxy overlay network as a systemd oneshot |
| proxy.nix | Deploys the Co-op Cloud traefik recipe via abra app deploy, health-gated |
| warm-keycloak.nix | Deploys keycloak via abra app deploy, with snapshot → upgrade → health-gate → rollback |
| nightly-sweep.nix | systemd timer + oneshot that runs nightly upgrades across all warm apps |

These are bespoke (hard-coded app names, cc-ci-specific reconcile scripts) but the structure is exactly what a general mkCcApp would produce. The flake idea is: extract and parameterise this pattern, one thin wrapper per recipe.


Proposed design

mkCcApp — the shared factory

A function in lib/mkCcApp.nix that takes per-recipe parameters and returns a NixOS module (attrset of systemd.services, systemd.timers, sops.secrets, etc.):

```nix
mkCcApp {
  recipe         = "ghost";                       # Co-op Cloud recipe name
  appName        = "ghost";                       # abra app name (often == recipe)
  domain         = "blog.example.org";
  version        = "1.3.0+6.42.0-alpine";
  env            = { MAIL_TRANSPORT = "SMTP"; };  # extra .env vars
  healthPath     = "/ghost/api/admin/site";       # HTTP path for health gate
  healthOk       = [ 200 ];                       # accepted HTTP status codes
  healthTimeout  = 120;                           # seconds to wait for healthy
  stateful       = true;                          # snapshot data volumes before upgrade
  autoUpdate     = true;                          # add a nightly timer
  updateSchedule = "03:00:00";                    # systemd OnCalendar time
  after          = [ ];                           # extra systemd ordering deps
  timeout        = 600;                           # deploy timeout in seconds
}
```

This emits:

  1. systemd.services.cc-app-<appName> — a oneshot that:

    • Creates the abra app if it doesn't exist (abra app new <recipe> <appName>)
    • Writes env vars into the abra .env file
    • Runs abra app deploy <appName> --no-input
    • Orders after swarm-init.service and deploy-proxy.service
  2. systemd.services.cc-app-<appName>-reconcile (if autoUpdate = true) — a oneshot that implements the health-gated upgrade/rollback loop (see below), driven by a timer.

  3. systemd.timers.cc-app-<appName>-reconcile (if autoUpdate = true) — fires the reconcile service on updateSchedule.

  4. sops.secrets.* entries for any declared secrets, wired to paths the abra env file references.
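To make items 1-3 concrete, here is a minimal sketch of the units the factory could emit. It is not the cc-ci code. Several names are assumptions: `pkgs.abra` is a hypothetical package attribute, and `reconcileScript` is a hypothetical store path that implements the health-gated loop. `lib` and `pkgs` are passed alongside the factory arguments for brevity. App creation, `.env` writing and secrets (item 4) are omitted.

```nix
# Sketch only: one way mkCcApp could emit the deploy oneshot, the reconcile
# oneshot and its timer. pkgs.abra and reconcileScript are hypothetical.
{ lib, pkgs
, recipe
, appName ? recipe
, after ? [ ]
, timeout ? 600
, autoUpdate ? false
, updateSchedule ? "03:00:00"
, reconcileScript ? null
, ...
}:
assert autoUpdate -> reconcileScript != null;
{
  systemd.services = {
    "cc-app-${appName}" = {
      description = "Deploy Co-op Cloud app ${appName} (recipe ${recipe})";
      after = [ "swarm-init.service" "deploy-proxy.service" ] ++ after;
      requires = [ "swarm-init.service" "deploy-proxy.service" ];
      wantedBy = [ "multi-user.target" ];
      serviceConfig = {
        Type = "oneshot";
        RemainAfterExit = true;     # unit stays "active" after a successful deploy
        TimeoutStartSec = timeout;
      };
      # abra app new / .env writing omitted; abra's HOME/state setup omitted too.
      script = ''
        ${pkgs.abra}/bin/abra app deploy ${lib.escapeShellArg appName} --no-input
      '';
    };
  } // lib.optionalAttrs autoUpdate {
    "cc-app-${appName}-reconcile" = {
      description = "Health-gated upgrade of Co-op Cloud app ${appName}";
      after = [ "cc-app-${appName}.service" ];
      serviceConfig = {
        Type = "oneshot";
        ExecStart = "${reconcileScript}";
      };
    };
  };

  # A timer with the same name as a service activates that service by default.
  systemd.timers = lib.optionalAttrs autoUpdate {
    "cc-app-${appName}-reconcile" = {
      wantedBy = [ "timers.target" ];
      timerConfig = {
        OnCalendar = updateSchedule;  # "03:00:00" means daily at 03:00
        Persistent = true;            # run a missed firing after the host was off
      };
    };
  };
}
```

Because `lib.optionalAttrs` returns an empty set when `autoUpdate` is false, pinned apps get only the deploy unit. The `assert` catches a missing reconcile script at evaluation time rather than at 03:00.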

Per-recipe module (thin wrapper)

Each recipe becomes a file like apps/ghost.nix:

```nix
{ mkCcApp, ... }:
mkCcApp {
  recipe     = "ghost";
  healthPath = "/ghost/api/admin/site";
  stateful   = true;
}
```

An operator's NixOS config imports the recipe module, sets their domain/version, done.
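The thin wrapper above does not show where the operator-facing `services.coop-cloud.ghost` options come from. A minimal sketch of one way to declare them and forward them to the factory follows. It assumes that mkCcApp reaches modules via `specialArgs` and that it returns a plain config attrset (systemd.\*, sops.\*); both are assumptions, not settled design.

```nix
# apps/ghost.nix (sketch): declares the options an operator sets and forwards
# them to mkCcApp. mkCcApp via specialArgs is an assumption.
{ config, lib, mkCcApp, ... }:
let
  cfg = config.services.coop-cloud.ghost;
in
{
  options.services.coop-cloud.ghost = {
    enable = lib.mkEnableOption "the Co-op Cloud ghost recipe";
    domain = lib.mkOption {
      type = lib.types.str;
      description = "Public domain Traefik routes to this app.";
    };
    version = lib.mkOption {
      type = lib.types.str;
      description = "Recipe version to deploy, e.g. \"1.3.0+6.42.0-alpine\".";
    };
    autoUpdate = lib.mkOption {
      type = lib.types.bool;
      default = false;
      description = "Whether to run the nightly health-gated reconcile.";
    };
  };

  # Only emit units when the operator enables the app.
  config = lib.mkIf cfg.enable (mkCcApp {
    recipe = "ghost";
    inherit (cfg) domain version autoUpdate;
    healthPath = "/ghost/api/admin/site";
    stateful = true;
  });
}
```

Keeping the option declarations per recipe lets each wrapper add recipe-specific options later, while the shared logic stays in the factory.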

Health-gated upgrade/rollback

Modelled directly on cc-ci/runner/warm_reconcile.py. The reconcile oneshot:

```text
last-good = running version
latest    = latest available version (via abra)
if running == latest:
    health-check; if healthy, write last-good = running
    exit
if latest is a major-version jump from running:
    hold + alert; exit without deploying
persist last-good = running
if stateful:
    abra app undeploy                 # quiesce the app before snapshotting
    abra app snapshot (data volumes)
abra app upgrade → latest
wait for healthPath to return healthOk (up to healthTimeout)
if healthy:
    write last-good = latest
else:
    if stateful: abra app restore snapshot
    abra app deploy <last-good version>
    write alert sentinel to /var/lib/coop-cloud/alerts/<appName>.json
```

For stateless apps (e.g. traefik, custom-html) the snapshot/restore steps are skipped — only the version is rolled back.
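The "wait for healthPath to return healthOk" step could be built as a small script that the reconcile oneshot calls. The sketch below assumes several things: probing goes through Traefik at `https://<domain><healthPath>`, polling runs every 5 seconds, and `date`/`sleep` come from coreutils on the systemd unit's default PATH. The script name is hypothetical.

```nix
# Sketch of the health gate as a standalone script. URL scheme, poll
# interval and script name are assumptions.
{ lib, pkgs, appName, domain, healthPath
, healthOk ? [ 200 ], healthTimeout ? 120 }:
pkgs.writeShellScript "cc-app-${appName}-health-gate" ''
  url=${lib.escapeShellArg "https://${domain}${healthPath}"}
  ok=" ${lib.concatMapStringsSep " " toString healthOk} "
  deadline=$(( $(date +%s) + ${toString healthTimeout} ))

  while [ "$(date +%s)" -lt "$deadline" ]; do
    # curl prints 000 when it cannot connect; || true keeps the loop going.
    code=$(${pkgs.curl}/bin/curl -s -o /dev/null -w '%{http_code}' "$url" || true)
    # Succeed as soon as the status code is one of healthOk.
    case "$ok" in
      *" $code "*) echo "$url healthy ($code)"; exit 0 ;;
    esac
    sleep 5
  done

  echo "$url not healthy after ${toString healthTimeout}s" >&2
  exit 1
''
```

A non-zero exit is the signal the reconcile logic would treat as "unhealthy" and use to trigger rollback.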

Swarm bootstrap

A coop-cloud-base.nix module (imported once by the host, not per-app) handles:

  • virtualisation.docker.enable = true
  • swarm-init oneshot (identical to cc-ci/nix/modules/swarm.nix)
  • deploy-proxy oneshot for the traefik recipe

All per-app services order after deploy-proxy.service.
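A minimal sketch of the Docker and Swarm half of coop-cloud-base.nix follows, written to be idempotent so it can run on every boot. It is not a copy of cc-ci/nix/modules/swarm.nix, and the deploy-proxy oneshot is left out. The `proxy` overlay network is the one Co-op Cloud recipes attach Traefik-routed services to.

```nix
# coop-cloud-base.nix (sketch): Docker plus a single-node Swarm bootstrap.
{ config, ... }:
{
  virtualisation.docker.enable = true;

  systemd.services.swarm-init = {
    description = "Initialise single-node Docker Swarm and the proxy network";
    after = [ "docker.service" ];
    requires = [ "docker.service" ];
    wantedBy = [ "multi-user.target" ];
    path = [ config.virtualisation.docker.package ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
    };
    script = ''
      # Skip init if this node already belongs to a swarm.
      if [ "$(docker info --format '{{.Swarm.LocalNodeState}}')" != "active" ]; then
        # Hosts with several addresses may also need --advertise-addr.
        docker swarm init
      fi
      # Overlay network that Traefik-routed recipe services join.
      docker network inspect proxy >/dev/null 2>&1 \
        || docker network create --driver overlay proxy
    '';
  };
}
```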


Secrets model

The cc-ci approach is sops-nix: secrets live in a git-tracked encrypted secrets.yaml, decrypted at activation by the host's SSH key (age identity). That's the right model for operator use too — no out-of-band secret drops. Each mkCcApp call can declare its secrets:

```nix
secrets = {
  db_password   = { sopsPath = "ghost_db_password"; };
  smtp_password = { sopsPath = "ghost_smtp_password"; };
};
```

The factory generates the sops.secrets.ghost_db_password entry and wires the decrypted path into the abra .env file (or a swarm secret, depending on how the recipe reads it).
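One possible expansion of a `secrets` attrset like the one above is sketched here as a hypothetical helper. It produces one sops-nix declaration per secret, plus `.env` lines pointing at the decrypted files. The `<NAME>_FILE` variable convention and the fixed `/run/secrets` directory are assumptions; recipes differ in how they read secrets, and some will need a swarm secret instead.

```nix
# Hypothetical helper for mkCcApp: expand `secrets` into sops-nix entries
# and .env lines. The *_FILE naming is an assumption.
{ lib, secretsDir ? "/run/secrets" }:
secrets:
{
  # sops-nix uses the attribute name as the key in secrets.yaml and, by
  # default, decrypts it to /run/secrets/<name>.
  sopsSecrets = lib.mapAttrs' (_: s: lib.nameValuePair s.sopsPath { }) secrets;

  # e.g. db_password -> "DB_PASSWORD_FILE=/run/secrets/ghost_db_password"
  envLines = lib.mapAttrsToList
    (name: s: "${lib.toUpper name}_FILE=${secretsDir}/${s.sopsPath}")
    secrets;
}
```

The factory would assign `sopsSecrets` to `sops.secrets` and have the deploy oneshot append `envLines` to the app's `.env` file.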


Open questions

  1. Abra state vs Nix store. Abra manages its own state under ~/.abra/ (app .env files live in ~/.abra/servers/<server>/). The Nix module writes .env files there at deploy time. This is slightly un-Nix (mutable state outside the store), but it's how cc-ci works today and it's fine for single-node operators.

  2. Version pinning vs autoUpdate. If autoUpdate = false, the operator pins a version in their NixOS config and upgrades by bumping the string and running nixos-rebuild switch. Clean model. If autoUpdate = true, the reconciler diverges from the declared version — the Nix config becomes the floor ("at least this version") rather than the exact pin. Worth documenting this tension.

  3. Recipe flake vs per-operator flake. Two distribution models:

    • A single public coop-cloud-nix flake with all 18 recipes, which operators add as an input.
    • Operators fork or extend it.
    Probably start with the single public flake; per-recipe modules stay thin enough that forks are easy.
  4. Recipes without a clean health endpoint. Some apps (mumble, mailu) don't have a simple HTTP health path. The healthPath = null case would skip the gate and just wait for the swarm service to stabilise — weaker but still useful.

  5. Relationship to Co-op Cloud upstream. This is a parallel deployment interface for the same recipes, not a fork. Recipe compose.yml files stay upstream. The flake just wraps them. Worth coordinating with the Co-op Cloud maintainers rather than building in isolation.
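For the single-public-flake model, an operator's host flake might look like the sketch below. The coop-cloud-nix URL and its `nixosModules.default` output are hypothetical, since the flake does not exist yet. `sops-nix.nixosModules.sops` is sops-nix's real module output, needed for the secrets model.

```nix
# flake.nix for an operator's host (sketch). coop-cloud-nix URL and
# nixosModules.default are hypothetical.
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    sops-nix.url = "github:Mic92/sops-nix";
    coop-cloud-nix.url = "git+https://example.org/coop-cloud-nix";
  };

  outputs = { nixpkgs, sops-nix, coop-cloud-nix, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        sops-nix.nixosModules.sops
        coop-cloud-nix.nixosModules.default
        ./configuration.nix   # sets services.coop-cloud.ghost = { ... };
      ];
    };
  };
}
```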


Recipes to cover (the 18 maintained)

bluesky-pds, cryptpad, custom-html, custom-html-tiny, discourse, ghost, hedgedoc, immich, keycloak, lasuite-docs, lasuite-drive, lasuite-meet, mailu, matrix-synapse, mattermost-lts, mumble, n8n, plausible, uptime-kuma.

Notable gaps vs nixpkgs native modules: ghost and mailu (neither has a nixpkgs module). Most of the rest have native modules, but the container-isolation argument still applies.