Break the deploy-proxy ↔ dashboard health-gate circular dependency (Adversary A1, pvfix):

- runner/warm_reconcile.py: remove health_domain override (was ci.commoninternet.net, the dashboard). Change health_path from / to /api/version. The probe now uses traefik.ci.commoninternet.net/api/version — traefik's own API, no backend/dashboard dep.
- nix/modules/proxy.nix: update comment to reflect new health probe.
- machine-docs/DECISIONS.md: pxgate fix logged (supersedes pvfix manual workaround).
- machine-docs/DEFERRED.md: 2026-06-13 circular-dependency entry closed.
- Consumed BUILDER-INBOX.md (Adversary orientation msg).

Controlled reproduction (dashboard swarm scaled to 0):
  OLD probe (ci.commoninternet.net): HTTP 404 ← gate would loop → timeout
  NEW probe (traefik.../api/version): HTTP 200 ← passes immediately

Stale false-alarm alert 20260613T054428Z-traefik-unhealthy-on-latest.json cleared on host.
No After=deploy-proxy consumers changed (ordering preserved).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
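The controlled reproduction above can be repeated by hand with a small probe helper. The following is a minimal sketch only: the helper name `cc-ci-probe-proxy` is hypothetical, and the `https://` scheme and the 5-second timeout are assumptions (the commit message gives only the host and path).

```nix
# Hypothetical helper module, not part of the repository.
{ pkgs, ... }:
let
  probeProxy = pkgs.writeShellApplication {
    name = "cc-ci-probe-proxy"; # hypothetical name
    runtimeInputs = [ pkgs.curl ];
    text = ''
      # Print only the HTTP status code of traefik's own API endpoint.
      # Assumption: the endpoint is served over HTTPS with the wildcard cert.
      curl -sS -o /dev/null --max-time 5 -w '%{http_code}\n' \
        "https://traefik.ci.commoninternet.net/api/version"
    '';
  };
in
{
  # Make the probe available on the host for manual checks.
  environment.systemPackages = [ probeProxy ];
}
```

With the dashboard swarm scaled to 0, a probe of this kind should still report 200, because the request is answered by traefik itself and not by a backend.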
48 lines
2.5 KiB
Nix
# Reverse proxy = the canonical Co-op Cloud `traefik` recipe, deployed via abra in
# wildcard / file-provider mode (wildcard cert as ssl_cert/ssl_key swarm secrets,
# LETS_ENCRYPT_ENV empty => NO ACME, no DNS token). See DECISIONS.md "Proxy: real coop-cloud/traefik".
# Phase-1c: the cert at CERT_DIR is sops-decrypted from git (cc-ci-secrets) at activation
# (modules/secrets.nix wildcard_cert/wildcard_key), NOT an out-of-band operator file drop.
#
# Phase-2w / WC1.1: traefik is now UNPINNED + health-gated like keycloak — the deploy is driven by
# the shared `runner/warm_reconcile.py traefik` (STATELESS = version-rollback-only, NO snapshot):
# record last-good version → deploy latest tag → health-gate (traefik.ci.commoninternet.net/api/version
# returns 200 — traefik's own API, no backend dep) → healthy commits last-good / unhealthy rolls back
# to last-good + alert. Phase-pxgate: changed from ci.commoninternet.net (dashboard) to avoid the
# cold-boot deadlock (deploy-dashboard is After=deploy-proxy; A1 fix). traefik's wildcard-cert/file-
# provider config (ssl_cert/ssl_key secrets, WILDCARDS_ENABLED, COMPOSE_FILE) is preserved EXACTLY by
# the spec's `setup` (warm_reconcile._traefik_setup). The runner/ tree is copied into the nix store →
# D8-clean; recipe fetched at runtime → closure stable.
#
# Idempotent-RECONCILE systemd oneshot (unchanged unit name `deploy-proxy` — other modules order
# after it): converges every activation/boot, self-healing drift. No run-once sentinel.
{ pkgs, ... }:
let
  runnerSrc = ../../runner;
  reconcile = pkgs.writeShellApplication {
    name = "cc-ci-reconcile-proxy";
    runtimeInputs = with pkgs; [ abra docker git curl jq gnused gnugrep gnutar coreutils ];
    text = ''
      export HOME=/root
      exec ${pkgs.python3}/bin/python3 ${runnerSrc}/warm_reconcile.py traefik
    '';
  };
in
{
  systemd.services.deploy-proxy = {
    description = "Reconcile the Co-op Cloud traefik proxy (wildcard/no-ACME, health-gated) via abra";
    after = [ "swarm-init.service" "docker.service" "network-online.target" ];
    requires = [ "swarm-init.service" "docker.service" ];
    wants = [ "network-online.target" ];
    wantedBy = [ "multi-user.target" ];
    environment.HOME = "/root";
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
      # Generous: a traefik (re)deploy + health-gate; rollback on an unhealthy upgrade.
      TimeoutStartSec = "900";
      ExecStart = "${reconcile}/bin/cc-ci-reconcile-proxy";
    };
  };
}
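
The header comment notes that other modules order themselves after the unchanged `deploy-proxy` unit. As an illustration of that contract, here is a minimal sketch of what such a consumer might look like. Only the unit name `deploy-dashboard` and its `After=deploy-proxy` ordering come from the comment above; the script name `cc-ci-reconcile-dashboard` and its body are placeholders, not the repository's actual dashboard module.

```nix
# Hypothetical consumer module, shown only to illustrate the ordering contract.
{ pkgs, ... }:
let
  # Placeholder script; the real dashboard reconcile logic is not shown in this file.
  reconcileDashboard = pkgs.writeShellApplication {
    name = "cc-ci-reconcile-dashboard"; # hypothetical name
    text = ''
      echo "dashboard reconcile would run here"
    '';
  };
in
{
  systemd.services.deploy-dashboard = {
    description = "Example consumer ordered after the proxy reconcile";
    # deploy-proxy is Type=oneshot, so this unit is started only after the
    # proxy's ExecStart (deploy + health-gate) has exited.
    after = [ "deploy-proxy.service" ];
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
      ExecStart = "${reconcileDashboard}/bin/cc-ci-reconcile-dashboard";
    };
  };
}
```

This ordering is why the health gate must not depend on the dashboard: on a cold boot the consumer cannot start until the proxy's oneshot finishes, so a gate that waits for the consumer to answer would wait until `TimeoutStartSec` expires.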