M1: proxy via real coop-cloud/traefik (abra, wildcard/no-ACME); recipe deploy+teardown; M1 CLAIMED

Orchestrator decision: deploy canonical coop-cloud traefik via abra instead of a
hand-rolled module. abra packaged in Nix (pinned). custom-html deployed over HTTPS
(200) via the gateway and torn down clean. docs/install.md seeded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 22:21:12 +01:00
parent c006083967
commit 12f86fd3fb
10 changed files with 224 additions and 106 deletions

@ -16,11 +16,14 @@ Two single-writer sections (§6.1): Builder edits only `## Build backlog`; Adver
### M1 — Swarm + abra target
- [x] Docker + single-node swarm via Nix (modules/swarm.nix: docker + swarm-init oneshot + `proxy`
overlay net + daily autoprune). Verified: Swarm=active, proxy overlay present.
- [x] Traefik (file provider → /var/lib/ci-certs/live/) as a swarm stack on `proxy`; wildcard cert
served as default cert. Verified end-to-end: gateway 143.244.213.108:443 SNI-passthrough →
cc-ci Traefik terminates TLS w/ `CN=*.ci.commoninternet.net` (LE E8), HTTP 404 (no router yet).
- [ ] abra installed; deploy + tear down a trivial recipe by hand over HTTPS
- [ ] Gate: M1 — recipe reachable over HTTPS at *.ci.commoninternet.net, torn down clean
- [x] Proxy = real coop-cloud/traefik via abra (orchestrator decision, replaces custom traefik.nix):
wildcard/file-provider mode, pre-issued cert as ssl_cert/ssl_key swarm secrets, LETS_ENCRYPT_ENV
empty → no ACME. `scripts/deploy-proxy.sh` (idempotent). Verified E2E via gateway: wildcard cert
served, 0 ACME log lines.
- [x] abra installed (modules/abra.nix, pinned 0.13.0-beta); deployed custom-html by hand over HTTPS
(HTTP 200 nginx page via gateway) and tore it down clean (services/volumes/secrets/containers=0).
- [x] Gate: M1 — recipe reachable over HTTPS at *.ci.commoninternet.net, torn down clean →
CLAIMED 2026-05-26, awaiting Adversary.
### M2 — Drone online
- [ ] Drone server + exec runner via Nix; Gitea OAuth app

@ -10,6 +10,28 @@ Architecture decisions and dead-ends. One line of rationale each. (§0, §8)
- **Git credentials:** helper script in repo-local git config sources `/srv/cc-ci/.testenv` at call
time — no secret values stored in `.git/config` or commits.
- **Proxy: real coop-cloud/traefik via abra — SETTLED (M1, orchestrator decision 2026-05-26,
overrides plan §3 `modules/traefik.nix`).** Instead of a hand-rolled Traefik we deploy the
canonical Co-op Cloud `traefik` recipe via abra in **wildcard / file-provider mode**, for
end-to-end fidelity (canonical `web`/`web-secure` entrypoints + proxy/swarm conventions every
recipe expects — this also fixed an entrypoint-name mismatch the custom build hit). NO ACME, NO
DNS token on the box:
- `WILDCARDS_ENABLED=1` + append `compose.wildcard.yml`; the pre-issued cert is fed as the
`ssl_cert`/`ssl_key` swarm secrets (v1) via `abra app secret insert … -f` from
`/var/lib/ci-certs/live/{fullchain,privkey}.pem`. The file provider serves it (`tls.certificates`).
- `LETS_ENCRYPT_ENV=` **empty** on the traefik app *and* on every test app → the recipe's
`tls.certresolver=${LETS_ENCRYPT_ENV}` label resolves to no resolver → routers serve the
wildcard via SNI from the file provider, ACME never fires. (Verified: 0 ACME log lines.)
- Reproducibility (D8): `scripts/deploy-proxy.sh` is idempotent (ensures local abra server, fetches
recipe, writes the wildcard/no-ACME env, inserts cert secrets, deploys). Documented in
`docs/install.md`. The custom `modules/traefik.nix` was removed; `modules/swarm.nix` keeps swarm
init + `proxy` net + firewall 80/443.
- **Renewal (manual, ~90d):** operator re-issues the wildcard at the same paths, then
`abra app secret rm traefik.ci.commoninternet.net ssl_cert -n` + re-insert at a new version (bump
`SECRET_WILDCARD_CERT_VERSION`) and redeploy. (Documented in docs/secrets.md at M7.)
- **abra teardown syntax** (for harness, §4.3): `abra app undeploy <d> -n`,
`abra app volume remove <d> -f -n`, `abra app secret remove <d> --all -n`. None take `--chaos`.
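A minimal sketch of the manual renewal step described above, assuming version labels of the form `v<N>`. `next_secret_version` and `renewal_plan` are hypothetical helpers, not part of this repo; the plan only prints the commands named in this file rather than running them, and the exact redeploy command is left to the operator.

```sh
#!/usr/bin/env bash
# Hypothetical helper: compute the next secret version label (v1 -> v2).
next_secret_version() {
  local cur="$1"
  # Accept only labels shaped like v<digits>; anything else is an error.
  if [[ ! "$cur" =~ ^v([0-9]+)$ ]]; then
    echo "bad version label: $cur" >&2
    return 1
  fi
  # 10# forces base 10 so a label like v08 is not parsed as octal.
  printf 'v%d\n' "$(( 10#${BASH_REMATCH[1]} + 1 ))"
}

# Hypothetical helper: print (do not run) the renewal steps for the wildcard cert.
renewal_plan() {
  local domain="$1" cur="$2" cert_dir="$3" next
  next="$(next_secret_version "$cur")" || return 1
  printf 'abra app secret rm %s ssl_cert -n\n' "$domain"
  printf 'abra app secret insert %s ssl_cert %s %s/fullchain.pem -f -n\n' \
    "$domain" "$next" "$cert_dir"
  # Value to write into the app env before redeploying.
  printf 'SECRET_WILDCARD_CERT_VERSION=%s\n' "$next"
}
```

Example: `renewal_plan traefik.ci.commoninternet.net v1 /var/lib/ci-certs/live` prints the three steps with `v2` filled in. If the re-issued cert comes with a new private key, `ssl_key` presumably needs the same treatment; that is an assumption, not something this entry states.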
## Open (defaults from §8, to confirm as reality lands)
- **Deploy mechanism — SETTLED (M0):** `nixos-rebuild switch --flake /root/cc-ci#cc-ci` run *on

@ -146,3 +146,43 @@ firewall 80/443 (gateway forwards over enp5s0).
**Next:** install abra (M1 last task), `abra app new` a trivial recipe (custom-html) → deploy →
reach over HTTPS at <app>.ci.commoninternet.net → teardown leaving no volumes. That completes M1
→ CLAIM M1 gate.
## 2026-05-26 — M1: proxy pivot to real coop-cloud/traefik via abra; recipe deploy/teardown (M1 CLAIMED)
**Orchestrator decision (mid-M1):** replace the hand-rolled Traefik with the canonical Co-op Cloud
`traefik` recipe deployed via abra, wildcard/file-provider mode, no ACME/token. Removed custom
`modules/traefik.nix`; moved firewall 80/443 into `modules/swarm.nix`. Recorded in DECISIONS.md.
**Why the pivot also fixed a real bug:** my custom Traefik used entrypoint `websecure`; coop-cloud
recipes label `entrypoints=web-secure`. While chasing that I also hit a sharp **systemd-run gotcha**:
`systemd-run … nixos-rebuild switch --flake .#cc-ci` runs with cwd `/`, so `.#` → `/` → "could not
find a flake.nix"; the switch silently failed while a post-`--collect` `systemctl show` returned a
stale `Result=success`. Fix: always use the **absolute** flake path `/root/cc-ci#cc-ci`, and read the
result before resetting. (rebuild6/7 had silently not applied; rebuild25 used the absolute path.)
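
One way to avoid the relative-path trap is to resolve the flake ref to an absolute path before handing it to `systemd-run`. A sketch only: `abs_flake_ref` is a hypothetical helper, not something in this repo, and it assumes the flake directory exists on the machine where it runs.

```sh
#!/usr/bin/env bash
# Hypothetical helper: make a flake ref absolute so it still resolves when the
# command later runs with a different cwd (systemd units run with cwd `/`).
abs_flake_ref() {
  local ref="$1" path attr=""
  path="${ref%%#*}"              # directory part, before the first '#'
  if [[ "$ref" == *"#"* ]]; then
    attr="#${ref#*#}"            # output attribute, after the first '#'
  fi
  # Resolve relative to the *current* cwd; fail if the directory is missing.
  path="$(cd -- "${path:-.}" && pwd)" || return 1
  printf '%s%s\n' "$path" "$attr"
}

# Intended use (not executed here):
#   systemd-run --unit=ccci-sw --property=Type=oneshot \
#     nixos-rebuild switch --flake "$(abs_flake_ref '.#cc-ci')"
```

Run from `/root/cc-ci`, `abs_flake_ref '.#cc-ci'` would yield `/root/cc-ci#cc-ci`, the form this entry settles on.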
**abra packaged** (modules/abra.nix): release binary 0.13.0-beta, pinned by sha256, autoPatchelf'd.
`abra --version` → `0.13.0-beta-06a57de`.
**scripts/deploy-proxy.sh** (idempotent, pure-bash — host has no python3): ensure local abra server,
fetch traefik, write wildcard/no-ACME env (`WILDCARDS_ENABLED=1`, `SECRET_WILDCARD_*_VERSION=v1`,
`COMPOSE_FILE=compose.yml:compose.wildcard.yml`, `LETS_ENCRYPT_ENV=` empty), insert cert secrets via
`abra app secret insert … -f` from /var/lib/ci-certs/live, deploy. Bugs fixed en route: multi-line
PEM must use `-f` (not arg); secret-presence must check `docker secret ls` (abra's recipe list always
shows the name with `created on server:false`).
**Traefik deploy:** `abra app deploy` → `deploy succeeded 🟢` (traefik v3.6.15 + socket-proxy).
Verify: `docker service ls` → app+socket-proxy 1/1; via gateway `curl --resolve probe.*:443:
143.244.213.108` → `CN=*.ci.commoninternet.net` (LE E8); **0 ACME log lines**.
**M1 gate (recipe over HTTPS + teardown):**
- `abra app new custom-html -s default -D cchtml1.ci.commoninternet.net -S -n` then set
`LETS_ENCRYPT_ENV=` and `abra app deploy -n -C` → `🟢` (nginx 1.29.0).
- `curl -ks --resolve cchtml1.ci.commoninternet.net:443:143.244.213.108 https://…/` →
`http_code=200 size=615`, served the nginx welcome page over HTTPS with the wildcard cert.
- Teardown: `abra app undeploy -n` → 🟢; `abra app volume remove -f -n` → "1 volumes removed";
leak check → services 0 / volumes 0 / secrets 0 / containers 0. **Clean.**
- Correct teardown syntax confirmed: `secret remove <d> --all -n` (not `--all-secrets`).
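
The leak check above could be scripted roughly as follows. This is a sketch under assumptions: `count_leftovers` and `leak_check` are hypothetical names, and the stack prefix (e.g. the app domain with dots replaced by underscores) is assumed rather than taken from this log, so it is passed in explicitly.

```sh
#!/usr/bin/env bash
# Hypothetical helper: read resource names on stdin, print how many start with "<prefix>_".
count_leftovers() {
  local prefix="$1" n=0 name
  while IFS= read -r name; do
    if [[ "$name" == "${prefix}_"* ]]; then
      n=$((n + 1))
    fi
  done
  printf '%d\n' "$n"
}

# Hypothetical helper: count leftover swarm resources for one stack prefix.
# Prints one line per resource kind; returns non-zero if anything is left.
leak_check() {
  local prefix="$1" kind n total=0
  for kind in service volume secret; do
    n="$(docker "$kind" ls --format '{{.Name}}' | count_leftovers "$prefix")"
    printf '%ss %d\n' "$kind" "$n"
    total=$((total + n))
  done
  n="$(docker ps -a --format '{{.Names}}' | count_leftovers "$prefix")"
  printf 'containers %d\n' "$n"
  total=$((total + n))
  [ "$total" -eq 0 ]
}
```

Usage would be along the lines of `leak_check <stack-prefix> || echo "teardown left resources behind"`, run after the undeploy, volume remove and secret remove steps.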
**docs/install.md** seeded (flake apply + deploy-proxy + verify). M1 gate CLAIMED in STATUS.md.
**Next:** M2 — Drone server + exec runner via Nix, Gitea OAuth app, hello-world .drone.yml green.

@ -1,9 +1,8 @@
# STATUS — cc-ci Builder
**Phase:** M0 → M1. M0 complete & CLAIMED; starting M1 (swarm + Traefik + abra) while awaiting verdict.
**In-flight:** M1: abra install + by-hand HTTPS deploy/teardown of a trivial recipe (M1 gate).
Swarm + Traefik (wildcard cert via gateway passthrough) both up and verified.
**Last updated:** 2026-05-26 (M1 Traefik up, HTTPS path proven)
**Phase:** M1 complete & CLAIMED; starting M2 (Drone). M0 PASS (Adversary @21:35Z). M1 awaiting verdict.
**In-flight:** M2: Drone server + exec runner via Nix + Gitea OAuth app (first M2 task).
**Last updated:** 2026-05-26 (M1 claimed)
## Gates
- **Gate: M0 — CLAIMED, awaiting Adversary** (2026-05-26). Evidence: flake rebuilds cc-ci from repo
@ -11,6 +10,13 @@ Swarm + Traefik (wildcard cert via gateway passthrough) both up and verified.
`/run/secrets/test_secret` (0400 root, value = generated `cc-ci-m0-…`). Repro: clone repo, sync to
host, `nixos-rebuild switch --flake .#cc-ci`, then `systemctl is-system-running` + check the secret.
Per §6.1 I will NOT advance past this gate to M2; M1 work proceeds as independent unblocked work.
**M0 PASS** logged by Adversary in REVIEW.md @2026-05-26T21:35Z (cold verify, leak probe clean).
- **Gate: M1 — CLAIMED, awaiting Adversary** (2026-05-26). Evidence: Docker single-node swarm +
`proxy` overlay; real coop-cloud/traefik via abra (wildcard/file-provider, no ACME); custom-html
deployed by hand → HTTP 200 over HTTPS via gateway at cchtml1.ci.commoninternet.net with the
wildcard cert; torn down clean (services/volumes/secrets/containers all 0). Repro:
`scripts/deploy-proxy.sh` + `abra app new/deploy/undeploy`. Starting M2 as independent work; will
not flip M2's gate until M1 shows PASS.
## Blocked
- (none)

docs/install.md: normal file, 54 lines

@ -0,0 +1,54 @@
# Installing cc-ci from scratch
> WORK IN PROGRESS — grows with each milestone; the full from-scratch rebuild is verified at M9 (D8).
cc-ci is declared as a NixOS flake (this repo) plus a reproducible proxy-deploy step. Target:
a NixOS 24.11 host reachable as `cc-ci` over SSH (root), with the operator preconditions in place.
## Operator preconditions (class-A1, see DECISIONS.md / docs/baseline.md)
- Wildcard TLS cert at `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}`
(`*.ci.commoninternet.net` + `ci.commoninternet.net`). **Renewed out-of-band; never ACME here.**
- DNS: `*.ci.commoninternet.net` (+ bare) → the **gateway**, which TLS-passthroughs (SNI) to cc-ci.
- Firewall path: gateway reaches cc-ci on tcp/80+443 (opened by `modules/swarm.nix`).
## 1. Apply the NixOS flake
The flake (`flake.nix`, `hosts/cc-ci/`, `modules/`) declares: base host, sops-nix (decrypts via the
host SSH key), Docker + single-node Swarm + the `proxy` overlay (`modules/swarm.nix`), and abra
(`modules/abra.nix`).
```sh
# materialise the repo on the host (the build runs on cc-ci itself — see DECISIONS.md deploy mech)
# e.g. git clone <repo> /root/cc-ci (or sync it)
nixos-rebuild switch --flake /root/cc-ci#cc-ci
# verify
systemctl is-system-running # -> running
docker info --format '{{.Swarm.LocalNodeState}}' # -> active
docker network ls | grep proxy # -> proxy ... overlay swarm
```
> Tip: when driving the switch over an SSH session that rides Tailscale, run it as a detached unit so
> it survives a momentary drop, and **use the absolute flake path** (systemd units run with cwd `/`):
> `systemd-run --unit=ccci-sw --property=Type=oneshot nixos-rebuild switch --flake /root/cc-ci#cc-ci`
## 2. Deploy the reverse proxy (coop-cloud traefik, wildcard/file-provider, no ACME)
```sh
bash /root/cc-ci/scripts/deploy-proxy.sh
```
This idempotently deploys the canonical Co-op Cloud `traefik` recipe via abra in wildcard mode,
serving the pre-issued cert as the `ssl_cert`/`ssl_key` swarm secrets, with `LETS_ENCRYPT_ENV` empty
so no ACME ever runs (see DECISIONS.md "Proxy: real coop-cloud/traefik via abra"). Verify:
```sh
docker service ls | grep traefik # app + socket-proxy, 1/1
# wildcard cert served end-to-end via the gateway:
curl -ksv --resolve probe.ci.commoninternet.net:443:<gateway-ip> https://probe.ci.commoninternet.net/ \
2>&1 | grep -E 'subject:|HTTP/' # -> CN=*.ci.commoninternet.net, HTTP 404 (no app router yet)
```
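
To back the "no ACME ever runs" claim with a number, the proxy logs can be scanned for ACME mentions. A rough sketch: `count_acme_lines` is a hypothetical helper, the service name is a placeholder, and a case-insensitive match on `acme` is only a heuristic.

```sh
#!/usr/bin/env bash
# Hypothetical helper: count log lines on stdin that mention ACME (case-insensitive).
count_acme_lines() {
  # grep -c prints the count but exits 1 when it is zero; mask that exit code
  # so callers running under `set -e` are not aborted by a clean result.
  grep -ci 'acme' || true
}

# Intended use (not executed here; <traefik-service> is a placeholder):
#   docker service logs <traefik-service> 2>&1 | count_acme_lines
```

A count of 0 matches what DECISIONS.md records for this deployment.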
## 3. (later milestones) Drone, comment-bridge, dashboard, recipe enrollment
See `docs/enroll-recipe.md` (D5), `docs/secrets.md` (D6), `docs/runbook.md`. Added as those land.

@ -7,7 +7,7 @@
./hardware.nix
../../modules/secrets.nix
../../modules/swarm.nix
../../modules/traefik.nix
../../modules/abra.nix
];
# --- Tailscale (ACCESS-CRITICAL: do not break, this is the only route in) ---

modules/abra.nix: normal file, 25 lines

@ -0,0 +1,25 @@
# abra — the Co-op Cloud CLI used by the harness to deploy/upgrade/backup recipes (M1+).
# Packaged from the upstream release binary, pinned by version + hash for reproducibility (D8).
{ pkgs, ... }:
let
  abra = pkgs.stdenv.mkDerivation rec {
    pname = "abra";
    version = "0.13.0-beta";
    src = pkgs.fetchurl {
      url = "https://git.coopcloud.tech/toolshed/abra/releases/download/${version}/abra_${version}_linux_amd64.tar.gz";
      sha256 = "12csk6wp1pk9cspzqfl4a6h5jdz8p055sf0ggxw9k7ljhpd5qvc6";
    };
    # Tarball has files at the root (LICENSE, README.md, abra), no common subdir.
    sourceRoot = ".";
    nativeBuildInputs = [ pkgs.autoPatchelfHook ];
    buildInputs = [ pkgs.stdenv.cc.cc.lib ];
    installPhase = ''
      runHook preInstall
      install -Dm755 abra "$out/bin/abra"
      runHook postInstall
    '';
  };
in
{
  environment.systemPackages = [ abra ];
}

@ -15,6 +15,10 @@
environment.systemPackages = [ pkgs.docker ];
# Gateway forwards 80/443 to cc-ci over the public interface (enp5s0); the coop-cloud
# traefik stack (deployed via abra, see docs/install.md) publishes these ports.
networking.firewall.allowedTCPPorts = [ 80 443 ];
# Bring up a single-node swarm + the shared `proxy` overlay network. Idempotent:
# safe to re-run every boot/rebuild. advertise-addr 127.0.0.1 is fine for a lone node.
systemd.services.swarm-init = {

@ -1,96 +0,0 @@
# Traefik for the test swarm (M1). Runs as a swarm service on the `proxy` overlay so it can
# reach recipe service VIPs (a host process couldn't). TLS terminates here using the operator's
# pre-issued wildcard cert via the file provider — NO ACME for commoninternet.net (§4.0).
# Recipe routers only need `traefik.enable=true` + a Host(...) rule + tls=true; the default
# certificate (the wildcard) is served for every *.ci.commoninternet.net host.
{ pkgs, ... }:
let
  # Static config. Docker *Swarm* provider (v3) + file provider for the cert.
  staticCfg = pkgs.writeText "traefik.yml" ''
    entryPoints:
      web:
        address: ":80"
      websecure:
        address: ":443"
    providers:
      swarm:
        endpoint: "unix:///var/run/docker.sock"
        exposedByDefault: false
        network: proxy
      file:
        directory: /etc/traefik/dynamic
        watch: true
    log:
      level: INFO
    accessLog: {}
    api:
      dashboard: false
    ping: {}
  '';

  # Dynamic config: serve the pre-issued wildcard as the DEFAULT certificate, so any
  # *.ci.commoninternet.net router with tls=true is covered without a cert resolver.
  certsCfg = pkgs.writeText "certs.yml" ''
    tls:
      stores:
        default:
          defaultCertificate:
            certFile: /var/lib/ci-certs/live/fullchain.pem
            keyFile: /var/lib/ci-certs/live/privkey.pem
      certificates:
        - certFile: /var/lib/ci-certs/live/fullchain.pem
          keyFile: /var/lib/ci-certs/live/privkey.pem
  '';

  stack = pkgs.writeText "traefik-stack.yml" ''
    version: "3.8"
    services:
      traefik:
        image: traefik:v3.3
        ports:
          - target: 80
            published: 80
            mode: host
          - target: 443
            published: 443
            mode: host
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock:ro
          - /var/lib/ci-certs/live:/var/lib/ci-certs/live:ro
          - ${staticCfg}:/etc/traefik/traefik.yml:ro
          - ${certsCfg}:/etc/traefik/dynamic/certs.yml:ro
        networks:
          - proxy
        deploy:
          mode: replicated
          replicas: 1
          placement:
            constraints:
              - node.role == manager
          restart_policy:
            condition: any
    networks:
      proxy:
        external: true
  '';
in
{
  # Gateway forwards 80/443 to cc-ci over the public interface (enp5s0), so open them.
  networking.firewall.allowedTCPPorts = [ 80 443 ];

  systemd.services.traefik-deploy = {
    description = "Deploy the Traefik swarm stack";
    after = [ "swarm-init.service" ];
    requires = [ "swarm-init.service" ];
    wantedBy = [ "multi-user.target" ];
    path = [ pkgs.docker ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
    };
    script = ''
      set -eu
      docker stack deploy --detach=true -c ${stack} traefik
    '';
  };
}

scripts/deploy-proxy.sh: executable file, 60 lines

@ -0,0 +1,60 @@
#!/usr/bin/env bash
# Reproducibly deploy the canonical Co-op Cloud `traefik` recipe as cc-ci's reverse proxy,
# in wildcard / file-provider mode — serving the operator's pre-issued wildcard cert, with
# NO ACME and NO DNS token on the box (see DECISIONS.md "Proxy: real coop-cloud/traefik").
#
# Idempotent: safe to re-run. Run as root on cc-ci (abra drives the local Docker swarm).
# ssh cc-ci 'bash /root/cc-ci/scripts/deploy-proxy.sh'
#
# Prereqs (declared in the flake): docker + single-node swarm + `proxy` overlay (modules/swarm.nix),
# abra (modules/abra.nix), and the wildcard cert at /var/lib/ci-certs/live/ (operator-provided).
set -euo pipefail

PROXY_DOMAIN="${PROXY_DOMAIN:-traefik.ci.commoninternet.net}"
CERT_DIR="${CERT_DIR:-/var/lib/ci-certs/live}"
ENV_FILE="$HOME/.abra/servers/default/${PROXY_DOMAIN}.env"
export PATH=/run/current-system/sw/bin:"$PATH"

echo "==> ensure local abra server"
abra server ls -m -n >/dev/null 2>&1 || abra server add --local -n || true

echo "==> fetch traefik recipe"
abra recipe fetch traefik -n >/dev/null

if [ ! -f "$ENV_FILE" ]; then
  echo "==> create traefik app ($PROXY_DOMAIN)"
  abra app new traefik -s default -D "$PROXY_DOMAIN" -n
fi

echo "==> configure wildcard / no-ACME env"
# Set each var deterministically: drop any existing (commented or not) line, then append.
# Empty LETS_ENCRYPT_ENV => the traefik router uses no cert resolver => no ACME ever fires.
set_env() {
  local key="$1" val="$2"
  sed -i -E "/^[[:space:]]*#?[[:space:]]*${key}=/d" "$ENV_FILE"
  printf '%s=%s\n' "$key" "$val" >> "$ENV_FILE"
}
set_env LETS_ENCRYPT_ENV ""
set_env WILDCARDS_ENABLED "1"
set_env SECRET_WILDCARD_CERT_VERSION "v1"
set_env SECRET_WILDCARD_KEY_VERSION "v1"
set_env COMPOSE_FILE '"compose.yml:compose.wildcard.yml"'
echo " env written: $ENV_FILE"

echo "==> insert wildcard cert secrets (v1) from $CERT_DIR (idempotent)"
# Check the actual swarm secret (generated name ${STACK_NAME}_<name>_v1), not abra's
# recipe-defined list (which always shows the names with "created on server":"false").
have_secret() { docker secret ls --format '{{.Name}}' | grep -q "_${1}_v1\$"; }

# Insert from file (-f) so the multi-line PEM is read verbatim, not arg-parsed.
if ! have_secret ssl_cert; then
  abra app secret insert "$PROXY_DOMAIN" ssl_cert v1 "$CERT_DIR/fullchain.pem" -f -n
fi
if ! have_secret ssl_key; then
  abra app secret insert "$PROXY_DOMAIN" ssl_key v1 "$CERT_DIR/privkey.pem" -f -n
fi

echo "==> deploy traefik"
abra app deploy "$PROXY_DOMAIN" -n -C

echo "==> done"