cc-ci/docs/secrets.md
autonomic-bot fc07d15800
M7/D6: secrets rotation doc + log redaction filter
docs/secrets.md documents the 3 secret classes (A1 external, A2 internal-generated, B recipe-app),
the sops-nix decryption chain, and rotation procedures for each (cert version bump, sops re-encrypt +
swarm-secret version bump, recipe-app ephemeral). run_recipe_ci streams each stage's output through a
redaction filter that masks any /run/secrets/* value (>=8 chars) before it reaches Drone logs —
belt-and-suspenders over 'harness never prints secrets + abra doesn't echo'. Live streaming + exit
code preserved (locally tested). Recipe-ci clones cc-ci fresh per build, so this applies next run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 07:44:53 +01:00

Secrets model & rotation (D6)

cc-ci handles three classes of secret in deliberately different ways (plan §4.4). No plaintext secret ever lives in git, logs, or the results UI — only sops-encrypted ciphertext and references-by-location. The Adversary's leak test greps published Drone logs + the dashboard for known secret patterns and any generated app password; it must find nothing.

Decryption chain (sops-nix)

  • Infra secrets live sops-encrypted in secrets/secrets.yaml (committed). /.sops.yaml lists two age recipients: the host key (age1h90ut…, derived from cc-ci's SSH host key via ssh-to-age) and an off-box master recovery key (age1cmk26t…; its private half is kept only at /srv/cc-ci/.sops/master-age.txt on the build host, never in the repo).
  • On cc-ci, sops-nix decrypts at activation using the host's ed25519 SSH host key as the age identity (sops.age.sshKeyPaths = ["/etc/ssh/ssh_host_ed25519_key"]), materialising each secret at /run/secrets/<name> (mode 0400 root). No extra key file to manage on the box.
  • Swarm services don't read /run/secrets directly; the reconcile oneshots copy each into a docker swarm secret (docker secret create … /run/secrets/<name>) which the service mounts. abra-managed apps use abra app secret ….
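
The copy step in the last bullet could be sketched roughly as below. This is a dry-run illustration, not the actual reconcile code: the function name `swarm_secret_commands`, the `versions` map, and the `<name>_v<n>` swarm-secret naming are assumptions made for the example. The only CLI form relied on is `docker secret create <name> <file>`.

```python
from pathlib import Path


def swarm_secret_commands(secrets_dir, versions):
    """Build (but do not run) one `docker secret create` argv per secret file.

    `versions` maps a secret file name to its current version number.
    The `<name>_v<n>` naming is a hypothetical convention for this sketch.
    """
    root = Path(secrets_dir)
    if not root.is_dir():
        return []  # nothing decrypted yet, so nothing to reconcile
    commands = []
    for path in sorted(root.iterdir()):
        if not path.is_file() or path.name not in versions:
            continue  # skip secrets that no service consumes
        swarm_name = f"{path.name}_v{versions[path.name]}"
        # Only the file path is passed along; the value never enters argv.
        commands.append(["docker", "secret", "create", swarm_name, str(path)])
    return commands


if __name__ == "__main__":
    for argv in swarm_secret_commands("/run/secrets", {"drone_rpc_secret": 1}):
        print(" ".join(argv))
```

Passing the file path rather than the value keeps the secret out of process arguments and shell history, which matches the references-by-location approach described above.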

Class A1 — external inputs (operator-provided; the loop CANNOT create them)

| Secret | Location | Rotation |
| --- | --- | --- |
| Tailscale auth key | /srv/cc-ci/.testenv (sandbox) | operator re-issues; re-run tailscale up |
| cc-ci SSH root key | ~/.ssh/cc-ci-root-ed25519 (sandbox) | operator re-keys authorized_keys |
| Gitea bot creds | /srv/cc-ci/.testenv (GITEA_USERNAME/PASSWORD) | operator resets; update .testenv |
| Wildcard TLS cert | cc-ci /var/lib/ci-certs/live/{fullchain,privkey}.pem | operator re-issues (LE DNS-01/Gandi, ~90d, next ~2026-08-24); see below |
| Registry pull creds (if needed) | sops secrets/secrets.yaml | operator-provided |

A missing/invalid A1 secret is a ## Blocked condition — the agent never invents or works around it, and never runs ACME/DNS-01 for commoninternet.net.

Wildcard cert rotation (manual, operator + agent):

  1. Operator re-issues the SAN cert (*.ci.commoninternet.net + ci.commoninternet.net) out-of-band and writes it to /var/lib/ci-certs/live/{fullchain,privkey}.pem on cc-ci.
  2. Bump SECRET_WILDCARD_CERT_VERSION / SECRET_WILDCARD_KEY_VERSION on the traefik app env (modules/proxy.nix) so the next reconcile inserts the new cert as a fresh swarm secret version.
  3. nixos-rebuild switch (re-runs the proxy reconcile → re-inserts + redeploys traefik). One cert covers every per-run subdomain (SNI), so no per-app cert work.

Class A2 — internal infra secrets (the loop GENERATES + manages; never a blocker)

All sops-encrypted in secrets/secrets.yaml, decrypted to /run/secrets/<name>:

| Secret | Used by | Generate |
| --- | --- | --- |
| drone_rpc_secret | Drone server ↔ exec runner RPC | openssl rand -hex 32 |
| drone_gitea_client_secret | Drone ↔ Gitea OAuth app | from the Gitea OAuth app creation |
| bridge_webhook_hmac | comment-bridge webhook HMAC | openssl rand -hex 32 |
| bridge_drone_token | bridge + dashboard → Drone API | minted Drone user token |
| bridge_gitea_token | bridge → Gitea API (poll/comment) | minted Gitea token (bot) |
| restic_password | backup-bot-two restic repo | abra-generated (abra app secret generate, kept stable across reconciles) |

Rotate an A2 secret (e.g. bridge_webhook_hmac):

  1. set -a; . /srv/cc-ci/.testenv; set +a (for the editor key, not echoed).
  2. In the repo: sops secrets/secrets.yaml → replace the value (or openssl rand -hex 32 | …), save. (Re-encrypts to both recipients automatically per .sops.yaml.)
  3. For swarm-secret-backed values, bump the consuming app's secret version so the reconcile re-creates the swarm secret (docker swarm secrets are immutable): e.g. drone RPC_SECRET_VERSION v1→v2 (modules/drone.nix), bridge cc_ci_bridge_*_v<n> (modules/bridge.nix). Update both ends (server + runner share drone_rpc_secret).
  4. git commit + push, sync to host, nixos-rebuild switch → reconcile re-inserts + redeploys.
  5. Verify: the consuming service is healthy and re-auth works (e.g. a fresh build triggers).
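
Steps 2 and 3 above can be illustrated with a small sketch. It assumes a Python equivalent of `openssl rand -hex 32` is acceptable for generating the replacement value, and that version labels have the `v<n>` shape shown in step 3. The helper names `new_hex_secret` and `bump_version` are hypothetical and do not exist in the repo.

```python
import re
import secrets


def new_hex_secret(num_bytes=32):
    """Return num_bytes of CSPRNG output as hex, like `openssl rand -hex 32`."""
    return secrets.token_hex(num_bytes)


def bump_version(label):
    """Turn a version label such as 'v1' into 'v2'.

    Swarm secrets are immutable, so a rotated value needs a new name
    rather than an in-place update.
    """
    match = re.fullmatch(r"v(\d+)", label)
    if match is None:
        raise ValueError(f"unexpected version label: {label!r}")
    return f"v{int(match.group(1)) + 1}"


if __name__ == "__main__":
    # Print only the new label and the value's length, never the value itself.
    print(bump_version("v1"), len(new_hex_secret()))
```

The generated value would still be pasted into `sops secrets/secrets.yaml` by hand as in step 2; the sketch only shows how the value and the next version label are derived.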

Re-key sops recipients (e.g. cc-ci host re-provisioned → new host age key): add the new age1… to /.sops.yaml, sops updatekeys secrets/secrets.yaml (run from the build host, which holds the master identity), commit. The master recovery key lets you re-encrypt even if the host key is lost.

Class B — recipe app secrets (the harness generates per run; NEVER a blocker)

  • Generated at install: abra app secret generate <app> --all (+ any deterministic test fixtures the harness chooses) when the recipe deploys.
  • Persisted for the run: the same generated values survive install → upgrade → backup/restore because abra/swarm holds them keyed by the per-run app name (<recipe[:4]>-<6hex>); the harness re-reads them between stages. Concurrent runs are isolated by the unique per-run app name (and MAX_TESTS=1 means no concurrency anyway).
  • Destroyed at teardown: the same teardown that removes the app/volumes runs abra app secret remove <app> --all (+ docker-secret cleanup by stack name as a fallback). Nothing generated for a run outlives it.
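
The per-run app name shape `<recipe[:4]>-<6hex>` mentioned above could be produced along these lines. This is a sketch under assumptions: the harness's real generator and its randomness source are not shown in this document, and `per_run_app_name` is a hypothetical helper.

```python
import secrets


def per_run_app_name(recipe):
    """Build a name shaped like <recipe[:4]>-<6hex> for one CI run."""
    if not recipe:
        raise ValueError("recipe name must be non-empty")
    # token_hex(3) yields 3 random bytes, i.e. exactly 6 hex characters.
    return f"{recipe[:4]}-{secrets.token_hex(3)}"


if __name__ == "__main__":
    print(per_run_app_name("wordpress"))
```

Because every secret is keyed by this name, a fresh suffix per run is what keeps one run's generated values from colliding with another's.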

No-plaintext guarantees

  • Secrets are referenced by /run/secrets/<name> path or read inline (e.g. PGPASSWORD=$(cat /run/secrets/…) inside the app container), never printed by the harness.
  • abra does not echo generated secret values; reconciles redirect secret-generate stdout to /dev/null.
  • The results dashboard renders run status only (no log bodies); per-run logs live in Drone's UI.
  • Adversary leak test: greps published Drone logs + the dashboard for the known infra-secret values and any generated app password; the number of matches must be zero. (Baseline + recipe-CI log scans: clean.)
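
The redaction filter described in the commit message and the leak test above can be sketched together as follows. This is a minimal illustration, not the code in run_recipe_ci: the 8-character threshold comes from the commit description, while the mask string and the function names `usable_secrets`, `redact`, and `find_leaks` are assumptions. The real filter streams each stage's output; this version works on plain strings.

```python
MIN_SECRET_LEN = 8  # shorter values are skipped so common substrings are not masked
MASK = "***REDACTED***"  # hypothetical placeholder text


def usable_secrets(values, min_len=MIN_SECRET_LEN):
    """Keep distinct values of at least min_len characters, longest first."""
    cleaned = {v.strip() for v in values}
    return sorted((v for v in cleaned if len(v) >= min_len), key=len, reverse=True)


def redact(line, values):
    """Replace every occurrence of a known secret value in one log line."""
    for value in usable_secrets(values):
        line = line.replace(value, MASK)
    return line


def find_leaks(log_text, values):
    """Return the 1-based line numbers that contain any known secret value.

    Only line numbers are returned, so the report itself cannot leak a value.
    """
    to_check = usable_secrets(values)
    return [
        number
        for number, line in enumerate(log_text.splitlines(), start=1)
        if any(value in line for value in to_check)
    ]


if __name__ == "__main__":
    known = ["example-hmac-value-0001"]  # made-up value for the demo
    raw = "stage start\ntoken=example-hmac-value-0001\nstage done"
    clean = "\n".join(redact(line, known) for line in raw.splitlines())
    print(find_leaks(raw, known), find_leaks(clean, known))
```

Masking longest values first avoids a partial replacement when one secret happens to be a prefix of another.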