# Idea: Co-op Cloud NixOS modules

**Status:** research / pre-design. Not started.
**Origin:** conversation 2026-06-02 between mfowler and the assistant.

---

## The idea

A public Nix flake that lets NixOS operators deploy Co-op Cloud apps declaratively (via git, via
`nixos-rebuild switch`) instead of via imperative `abra` commands. Each app is a thin NixOS module
backed by a shared `mkCcApp` factory. Docker Swarm still does the actual container work, so the
container-isolation story is unchanged; Nix manages *what* is deployed and *at what version*.

```nix
# In a user's NixOS configuration.nix:
services.coop-cloud.ghost = {
  enable = true;
  domain = "blog.example.org";
  version = "1.3.0+6.42.0-alpine";
  autoUpdate = true;
};
```

---

## Why this over native NixOS modules

NixOS already has native service modules for ~14 of the 18 maintained recipes (matrix-synapse,
keycloak, nextcloud, jitsi-meet, hedgedoc, immich, etc.). The argument for doing it this way
instead:

**Container isolation is an advantage, not a legacy.** Native NixOS modules run as systemd units
that share the host's namespaces by default. Containers give hard network and filesystem isolation
between apps: a compromised Ghost instance cannot see other services' sockets. For a single-node
multi-app host that matters.

**The recipes already exist and are maintained.** The Co-op Cloud recipe ecosystem (compose.yml,
tested version combinations, backup hooks, Traefik wiring) is the real value. Wrapping it in Nix
preserves that investment rather than reimplementing it as native modules.

**The target user.** Most Co-op Cloud operators are single-node. They want the isolation and
curation of Co-op Cloud recipes but a declarative, git-tracked, idempotent deployment model rather
than `abra`'s imperative one. This gives them that without giving up containers.

---

## Existing art

- **Arion** (hercules-ci/arion): runs docker-compose via NixOS modules. Could be an
  implementation backend but requires the Docker daemon; adds a layer.
- **compose2nix** (aksiksi/compose2nix): converts compose.yml to NixOS
  `virtualisation.oci-containers` configs, i.e. containers run as systemd services under Podman or
  Docker. Loses the Docker Swarm isolation model.
- **No existing Co-op Cloud + Nix bridge** was found. The space is open.

---

## Proof of concept already in cc-ci

The cc-ci NixOS config (`cc-ci/nix/modules/`) already implements this pattern for its own internal
services. The key modules:

| Module | Pattern |
|---|---|
| `swarm.nix` | Enables Docker, initialises single-node Swarm + `proxy` overlay network as a systemd oneshot |
| `proxy.nix` | Deploys the Co-op Cloud `traefik` recipe via `abra app deploy`, health-gated |
| `warm-keycloak.nix` | Deploys keycloak via `abra app deploy`, with snapshot→upgrade→health-gate→rollback |
| `nightly-sweep.nix` | systemd timer + oneshot that runs nightly upgrades across all warm apps |

These are bespoke (hard-coded app names, cc-ci-specific reconcile scripts) but the structure is
exactly what a general `mkCcApp` would produce. The flake idea is: extract and parameterise this
pattern, one thin wrapper per recipe.

---

## Proposed design

### `mkCcApp`: the shared factory

A function in `lib/mkCcApp.nix` that takes per-recipe parameters and returns a NixOS module
(attrset of `systemd.services`, `systemd.timers`, `sops.secrets`, etc.):

```nix
mkCcApp {
  recipe = "ghost";                     # Co-op Cloud recipe name
  appName = "ghost";                    # abra app name (often == recipe)
  domain = "blog.example.org";
  version = "1.3.0+6.42.0-alpine";
  env = { MAIL_TRANSPORT = "SMTP"; };   # extra .env vars
  healthPath = "/ghost/api/admin/site"; # HTTP path for health gate
  healthOk = [ 200 ];
  healthTimeout = 120;
  stateful = true;                      # snapshot data volumes before upgrade
  autoUpdate = true;                    # add a nightly timer
  updateSchedule = "03:00:00";          # systemd OnCalendar time
  after = [ ];                          # extra systemd ordering deps
  timeout = 600;                        # deploy timeout in seconds
}
```

This emits:

1. **`systemd.services.cc-app-<appName>`**, a oneshot that:
   - Creates the abra app if it doesn't exist (`abra app new <recipe> <appName>`)
   - Writes env vars into the abra `.env` file
   - Runs `abra app deploy <appName> --no-input`
   - Orders after `swarm-init.service` and `deploy-proxy.service`

2. **`systemd.services.cc-app-<appName>-reconcile`** (if `autoUpdate = true`), a oneshot that
   implements the health-gated upgrade/rollback loop (see below), driven by a timer.

3. **`systemd.timers.cc-app-<appName>-reconcile`** (if `autoUpdate = true`), which fires the
   reconcile service on `updateSchedule`.

4. **`sops.secrets.*`** entries for any declared secrets, wired to paths the abra env file
   references.

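The emitted units could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the cc-ci code: `abraPackage` is a hypothetical argument standing in for whatever provides the `abra` binary, the reconcile script is a placeholder, and app creation, `.env` writing and secrets are left out.

```nix
# lib/mkCcApp.nix (illustrative sketch only)
{ lib, pkgs }:

{ recipe
, appName ? recipe
, abraPackage                 # hypothetical: a derivation providing the `abra` binary
, autoUpdate ? false
, updateSchedule ? "03:00:00"
, after ? [ ]
, timeout ? 600
, ...
}:

let
  unit = "cc-app-${appName}";
in
{
  systemd.services = {
    # Deploy oneshot, ordered after the Swarm and proxy bootstrap units.
    ${unit} = {
      description = "Deploy Co-op Cloud recipe ${recipe} as ${appName}";
      wantedBy = [ "multi-user.target" ];
      after = [ "swarm-init.service" "deploy-proxy.service" ] ++ after;
      requires = [ "swarm-init.service" "deploy-proxy.service" ];
      path = [ abraPackage pkgs.docker ];
      serviceConfig = {
        Type = "oneshot";
        RemainAfterExit = true;
        TimeoutStartSec = timeout;
      };
      # Command as named in the list above.
      script = ''
        abra app deploy ${lib.escapeShellArg appName} --no-input
      '';
    };
  } // lib.optionalAttrs autoUpdate {
    # Placeholder for the health-gated upgrade/rollback loop.
    "${unit}-reconcile" = {
      description = "Reconcile ${appName} against the latest recipe version";
      serviceConfig.Type = "oneshot";
      script = "echo 'reconcile for ${appName} is not implemented in this sketch'";
    };
  };

  # A timer activates the service of the same name by default.
  systemd.timers = lib.optionalAttrs autoUpdate {
    "${unit}-reconcile" = {
      wantedBy = [ "timers.target" ];
      timerConfig = {
        OnCalendar = updateSchedule;
        Persistent = true;
      };
    };
  };
}
```

Because the function returns a plain attrset of config keys, the result can be dropped straight into a NixOS `imports` list.
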
### Per-recipe module (thin wrapper)

Each recipe becomes a file like `apps/ghost.nix`:

```nix
{ mkCcApp, ... }:
mkCcApp {
  recipe = "ghost";
  healthPath = "/ghost/api/admin/site";
  stateful = true;
}
```

An operator's NixOS config imports the recipe module, sets their domain/version, done.

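From the operator's side, that could look something like the flake below. Everything specific here is an assumption made for illustration: the flake URL is a placeholder, and the `nixosModules.base` and `nixosModules.ghost` output names are hypothetical.

```nix
# flake.nix in the operator's own repository (illustrative sketch)
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    # Placeholder URL: no published location for the flake exists yet.
    coop-cloud-nix.url = "github:example/coop-cloud-nix";
  };

  outputs = { self, nixpkgs, coop-cloud-nix, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        ./configuration.nix
        coop-cloud-nix.nixosModules.base    # hypothetical: Swarm + proxy bootstrap
        coop-cloud-nix.nixosModules.ghost   # hypothetical: the thin ghost wrapper
        {
          services.coop-cloud.ghost = {
            enable = true;
            domain = "blog.example.org";
            version = "1.3.0+6.42.0-alpine";
          };
        }
      ];
    };
  };
}
```
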
### Health-gated upgrade/rollback

Modelled directly on `cc-ci/runner/warm_reconcile.py`. The reconcile oneshot:

```
read running version (last-good)
fetch latest available version via abra
if running == latest → health-check → update last-good if healthy → exit
if major-version jump → hold + alert, no deploy
record last-good = current
if stateful:
    abra app undeploy
    abra app snapshot (data volumes)
abra app upgrade → latest
wait for healthPath to return healthOk (up to healthTimeout)
if healthy:
    write last-good = latest
if unhealthy:
    if stateful: abra app restore snapshot
    abra app deploy last-good version
    write alert sentinel to /var/lib/coop-cloud/alerts/<appName>.json
```

For stateless apps (e.g. traefik, custom-html) the snapshot/restore steps are skipped; only the
version is rolled back.

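The "wait for healthPath to return healthOk" step might be generated along these lines. This is a sketch under assumptions, not a port of `warm_reconcile.py`: it assumes the app is reachable at `https://<domain><healthPath>`, and the 5-second poll interval and 10-second per-request timeout are arbitrary choices.

```nix
# lib/healthGate.nix (illustrative sketch only)
{ pkgs, lib }:

{ domain, healthPath, healthOk ? [ 200 ], healthTimeout ? 120 }:

pkgs.writeShellScript "cc-health-gate" ''
  # Poll until an accepted status code is returned or the deadline passes.
  deadline=$(( $(date +%s) + ${toString healthTimeout} ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    code=$(${pkgs.curl}/bin/curl --silent --output /dev/null \
      --write-out '%{http_code}' --max-time 10 \
      ${lib.escapeShellArg "https://${domain}${healthPath}"} || true)
    # Accepted codes are baked in at build time as a space-separated list.
    case " ${lib.concatMapStringsSep " " toString healthOk} " in
      *" $code "*) exit 0 ;;
    esac
    sleep 5
  done
  exit 1
''
```

The reconcile unit would treat exit status 0 as healthy and anything else as the trigger for rollback.
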
### Swarm bootstrap

A `coop-cloud-base.nix` module (imported once by the host, not per-app) handles:

- `virtualisation.docker.enable = true`
- `swarm-init` oneshot (identical to `cc-ci/nix/modules/swarm.nix`)
- `deploy-proxy` oneshot for the traefik recipe

All per-app services order after `deploy-proxy.service`.

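A minimal sketch of the `swarm-init` part is shown here, written from the description above rather than copied from `cc-ci/nix/modules/swarm.nix`. The idempotency checks are an assumption about how such a oneshot would typically be made safe to re-run, and the `deploy-proxy` unit is omitted.

```nix
# coop-cloud-base.nix (illustrative sketch only)
{ pkgs, ... }:

{
  virtualisation.docker.enable = true;

  systemd.services.swarm-init = {
    description = "Initialise single-node Docker Swarm and the proxy overlay network";
    wantedBy = [ "multi-user.target" ];
    after = [ "docker.service" ];
    requires = [ "docker.service" ];
    path = [ pkgs.docker ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
    };
    script = ''
      # Only initialise when this node is not already part of a swarm.
      # Hosts with several network interfaces may also need --advertise-addr.
      state=$(docker info --format '{{.Swarm.LocalNodeState}}')
      if [ "$state" != "active" ]; then
        docker swarm init
      fi
      # Create the shared overlay network once.
      if ! docker network inspect proxy >/dev/null 2>&1; then
        docker network create --driver overlay proxy
      fi
    '';
  };
}
```
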
---

## Secrets model

The cc-ci approach is sops-nix: secrets live in a git-tracked encrypted `secrets.yaml`, decrypted
at activation by the host's SSH key (age identity). That's the right model for operator use too:
no out-of-band secret drops. Each `mkCcApp` call can declare its secrets:

```nix
secrets = {
  db_password = { sopsPath = "ghost_db_password"; };
  smtp_password = { sopsPath = "ghost_smtp_password"; };
};
```

The factory generates the `sops.secrets.ghost_db_password` entry and wires the decrypted path into
the abra `.env` file (or a swarm secret, depending on how the recipe reads it).

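The mapping from that `secrets` attrset to sops-nix entries could be sketched as below. It assumes the host already sets `sops.defaultSopsFile` and an age identity, and it only generates the `sops.secrets` side; wiring the decrypted paths into the `.env` file is not shown. The helper file name is hypothetical.

```nix
# lib/mkSecrets.nix (illustrative sketch only)
{ lib }:

# appName: abra app name; secrets: the attrset shown above.
{ appName, secrets }:

{
  # Turns { db_password = { sopsPath = "ghost_db_password"; }; } into
  # sops.secrets.ghost_db_password = { ... };
  sops.secrets = lib.mapAttrs'
    (_name: cfg: lib.nameValuePair cfg.sopsPath {
      # Re-run the deploy unit when the decrypted value changes.
      restartUnits = [ "cc-app-${appName}.service" ];
    })
    secrets;
}
```
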
---

## Open questions

1. **Abra state vs Nix store.** Abra manages its own state in `~/.abra/apps/`. The Nix module
   writes `.env` files there at deploy time. This is slightly un-Nix (mutable state outside the
   store), but it's how `cc-ci` works today and it's fine for single-node operators.

2. **Version pinning vs autoUpdate.** If `autoUpdate = false`, the operator pins a version in their
   NixOS config and upgrades by bumping the string and running `nixos-rebuild switch`. Clean model.
   If `autoUpdate = true`, the reconciler diverges from the declared version: the Nix config
   becomes the floor ("at least this version") rather than the exact pin. Worth documenting this
   tension.

3. **Recipe flake vs per-operator flake.** Two distribution models:
   - A single public `coop-cloud-nix` flake with all 18 recipes; operators add it as an input.
   - Operators fork/extend.

   Probably start with the single public flake; per-recipe modules stay thin enough that forks
   are easy.

4. **Recipes without a clean health endpoint.** Some apps (mumble, mailu) don't have a simple
   HTTP health path. The `healthPath = null` case would skip the gate and just wait for the swarm
   service to stabilise, which is weaker but still useful.

5. **Relationship to Co-op Cloud upstream.** This is a parallel deployment interface for the same
   recipes, not a fork. Recipe compose.yml files stay upstream. The flake just wraps them. Worth
   coordinating with the Co-op Cloud maintainers rather than building in isolation.

---

## Recipes to cover (the 18 maintained)

bluesky-pds, cryptpad, custom-html, custom-html-tiny, discourse, ghost, hedgedoc, immich,
keycloak, lasuite-docs, lasuite-drive, lasuite-meet, mailu, matrix-synapse, mattermost-lts,
mumble, n8n, plausible, uptime-kuma.

Notable gaps vs nixpkgs native modules: ghost (no nixpkgs module), mailu (no nixpkgs module).
Most of the rest have native modules, but the container-isolation argument still applies.