docs/secrets.md documents the 3 secret classes (A1 external, A2 internal-generated, B recipe-app), the sops-nix decryption chain, and rotation procedures for each (cert version bump, sops re-encrypt + swarm-secret version bump, recipe-app ephemeral). run_recipe_ci streams each stage's output through a redaction filter that masks any /run/secrets/* value (>=8 chars) before it reaches Drone logs — belt-and-suspenders over 'harness never prints secrets + abra doesn't echo'. Live streaming + exit code preserved (locally tested). Recipe-ci clones cc-ci fresh per build, so this applies next run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Secrets model & rotation (D6)
cc-ci handles three classes of secret in deliberately different ways (plan §4.4). No plaintext secret ever lives in git, logs, or the results UI — only sops-encrypted ciphertext and references-by-location. The Adversary's leak test greps published Drone logs + the dashboard for known secret patterns and any generated app password; it must find nothing.
## Decryption chain (sops-nix)
- Infra secrets live sops-encrypted in `secrets/secrets.yaml` (committed). `/.sops.yaml` lists two age recipients: the host key (`age1h90ut…`, derived from cc-ci's SSH host key via `ssh-to-age`) and an off-box master recovery key (`age1cmk26t…`; its private half is kept only at `/srv/cc-ci/.sops/master-age.txt` on the build host, never in the repo).
- On cc-ci, `sops-nix` decrypts at activation using the host's ed25519 SSH host key as the age identity (`sops.age.sshKeyPaths = ["/etc/ssh/ssh_host_ed25519_key"]`), materialising each secret at `/run/secrets/<name>` (mode 0400, owner root). No extra key file to manage on the box.
- Swarm services don't read `/run/secrets` directly; the reconcile oneshots copy each value into a docker swarm secret (`docker secret create … /run/secrets/<name>`), which the service mounts. abra-managed apps use `abra app secret …` instead.
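
Anything on the host that needs one of these values should read the file, not copy it into env dumps or logs. The following minimal Python sketch illustrates that pattern. `read_secret` and its `root` parameter are hypothetical (not an existing cc-ci helper), and the permission check simply mirrors the 0400 mode that sops-nix applies.

```python
import os
import stat
from pathlib import Path

# Hypothetical helper for illustration; cc-ci's real tooling may differ.
SECRETS_ROOT = Path("/run/secrets")


def read_secret(name: str, root: Path = SECRETS_ROOT) -> str:
    """Return the value stored at <root>/<name> without ever printing it."""
    path = root / name
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # sops-nix writes secrets as 0400; refuse files with any group/other bits.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path} has mode {mode:o}; expected no group/other bits")
    # Drop only the trailing newline that editors and sops often leave behind.
    return path.read_text().rstrip("\n")
```

The error message names the path and mode but never the value, so a failure is safe to log.
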
## Class A1: external inputs (operator-provided; the loop CANNOT create them)
| Secret | Location | Rotation |
|---|---|---|
| Tailscale auth key | `/srv/cc-ci/.testenv` (sandbox) | operator re-issues; re-run `tailscale up` |
| cc-ci SSH root key | `~/.ssh/cc-ci-root-ed25519` (sandbox) | operator re-keys `authorized_keys` |
| Gitea bot creds | `/srv/cc-ci/.testenv` (`GITEA_USERNAME`/`PASSWORD`) | operator resets; update `.testenv` |
| Wildcard TLS cert | cc-ci `/var/lib/ci-certs/live/{fullchain,privkey}.pem` | operator re-issues (LE DNS-01/Gandi, ~90d, next ~2026-08-24); see below |
| Registry pull creds (if needed) | sops `secrets/secrets.yaml` | operator-provided |


A missing/invalid A1 secret is a `## Blocked` condition: the agent never invents or works around it, and never runs ACME/DNS-01 for `commoninternet.net`.

Wildcard cert rotation (manual, operator + agent):

- Operator re-issues the SAN cert (`*.ci.commoninternet.net` + `ci.commoninternet.net`) out-of-band and writes it to `/var/lib/ci-certs/live/{fullchain,privkey}.pem` on cc-ci.
- Bump `SECRET_WILDCARD_CERT_VERSION` / `SECRET_WILDCARD_KEY_VERSION` on the traefik app env (`modules/proxy.nix`) so the next reconcile inserts the new cert as a fresh swarm secret version.
- `nixos-rebuild switch` (re-runs the proxy reconcile → re-inserts + redeploys traefik).

One cert covers every per-run subdomain (SNI), so no per-app cert work is needed.
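
Since this rotation is manual, a small expiry check can flag when to ask the operator for a re-issue. This is a hypothetical Python sketch, not an existing cc-ci script: it assumes the input is the `notAfter=…` line printed by `openssl x509 -enddate -noout -in fullchain.pem`, parses it with the standard-library `ssl.cert_time_to_seconds`, and uses an arbitrary 30-day threshold.

```python
import ssl
from datetime import datetime, timezone

RENEW_WITHIN_DAYS = 30  # assumed threshold, not a cc-ci setting


def days_until_expiry(enddate_line: str, now: datetime) -> int:
    """Whole days from `now` until the cert's notAfter time (negative once expired)."""
    # Accept "notAfter=Aug 24 00:00:00 2026 GMT" or just the timestamp part.
    stamp = enddate_line.split("=", 1)[-1].strip()
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(stamp), tz=timezone.utc)
    return (expires - now).days


def needs_reissue(enddate_line: str, now: datetime) -> bool:
    """True when the cert expires within RENEW_WITHIN_DAYS of `now`."""
    return days_until_expiry(enddate_line, now) <= RENEW_WITHIN_DAYS
```

For example, `days_until_expiry("notAfter=Aug 24 00:00:00 2026 GMT", datetime(2026, 5, 26, tzinfo=timezone.utc))` returns 90.
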
## Class A2: internal infra secrets (the loop GENERATES + manages; never a blocker)
All sops-encrypted in `secrets/secrets.yaml`, decrypted to `/run/secrets/<name>`:

| Secret | Used by | Generate |
|---|---|---|
| `drone_rpc_secret` | Drone server ↔ exec runner RPC | `openssl rand -hex 32` |
| `drone_gitea_client_secret` | Drone ↔ Gitea OAuth app | from the Gitea OAuth app creation |
| `bridge_webhook_hmac` | comment-bridge webhook HMAC | `openssl rand -hex 32` |
| `bridge_drone_token` | bridge + dashboard → Drone API | minted Drone user token |
| `bridge_gitea_token` | bridge → Gitea API (poll/comment) | minted Gitea token (bot) |
| `restic_password` | backup-bot-two restic repo | abra-generated (`abra app secret generate`, kept stable across reconciles) |

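
`openssl rand -hex 32` yields 32 random bytes as 64 hex characters; Python's `secrets.token_hex(32)` produces the same shape. The minimal sketch below shows how a shared value like `bridge_webhook_hmac` typically authenticates webhooks. It assumes Gitea's convention of sending the hex HMAC-SHA256 of the raw request body (e.g. in `X-Gitea-Signature`); whether comment-bridge checks exactly that is an assumption, and `sign` / `verify_webhook` are hypothetical names.

```python
import hashlib
import hmac
import secrets


def new_hex_secret(nbytes: int = 32) -> str:
    """Same shape as `openssl rand -hex 32`: 2 * nbytes lowercase hex chars."""
    return secrets.token_hex(nbytes)


def sign(body: bytes, key: str) -> str:
    """Hex HMAC-SHA256 of the raw request body, keyed by the shared secret."""
    return hmac.new(key.encode(), body, hashlib.sha256).hexdigest()


def verify_webhook(body: bytes, signature_header: str, key: str) -> bool:
    """Constant-time comparison of the received signature with our own HMAC."""
    expected = sign(body, key).encode()
    received = signature_header.strip().lower().encode()
    return hmac.compare_digest(expected, received)
```

Comparing with `hmac.compare_digest` rather than `==` avoids leaking how many leading characters matched through timing.
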

Rotate an A2 secret (e.g. `bridge_webhook_hmac`):

- `set -a; . /srv/cc-ci/.testenv; set +a` (for the editor key, not echoed).
- In the repo: `sops secrets/secrets.yaml` → replace the value (or `openssl rand -hex 32 | …`), save. (sops re-encrypts to both recipients automatically per `.sops.yaml`.)
- For swarm-secret-backed values, bump the consuming app's secret version so the reconcile re-creates the swarm secret (docker swarm secrets are immutable): e.g. drone `RPC_SECRET_VERSION` v1→v2 (`modules/drone.nix`), bridge `cc_ci_bridge_*_v<n>` (`modules/bridge.nix`). Update both ends (server + runner share `drone_rpc_secret`).
- `git commit` + push, sync to host, `nixos-rebuild switch` → reconcile re-inserts + redeploys.
- Verify: the consuming service is healthy and re-auth works (e.g. a fresh build triggers).
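
Because swarm secrets are immutable, rotation points the service at a new secret name instead of editing the old one. The Python sketch below illustrates just that naming step; `bump_version` and `versioned_name` are hypothetical helpers, and the `<base>_v<n>` pattern is assumed from the `cc_ci_bridge_*_v<n>` names in the rotation steps.

```python
import re

_VERSION = re.compile(r"v(\d+)")


def bump_version(version: str) -> str:
    """'v1' -> 'v2'; rejects anything that is not exactly v<digits>."""
    m = _VERSION.fullmatch(version)
    if m is None:
        raise ValueError(f"unexpected secret version {version!r}")
    return f"v{int(m.group(1)) + 1}"


def versioned_name(base: str, version: str) -> str:
    """Swarm secret name for one version (hypothetical pattern <base>_<version>)."""
    return f"{base}_{version}"
```

For example, `versioned_name("cc_ci_bridge_webhook_hmac", bump_version("v1"))` gives `cc_ci_bridge_webhook_hmac_v2` (an illustrative name). Both ends of a shared secret must move to the same new version in one switch.
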

Re-key sops recipients (e.g. cc-ci host re-provisioned → new host age key): add the new `age1…` to `/.sops.yaml`, run `sops updatekeys secrets/secrets.yaml` from the build host (which holds the master identity), and commit. The master recovery key lets you re-encrypt even if the host key is lost.
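
The invariant behind this step is that the master recovery recipient is never dropped from the list. A minimal Python sketch of that bookkeeping follows; `updated_recipients` is hypothetical, the keys in the example are placeholders (real recipients are full `age1…` strings), and the actual re-encryption is still done by `sops updatekeys`.

```python
def updated_recipients(current: list[str], new_host_key: str, master_key: str) -> list[str]:
    """Recipient list for /.sops.yaml after adding a re-provisioned host's key."""
    if not new_host_key.startswith("age1"):
        raise ValueError("age recipients start with 'age1'")
    recipients = list(current)  # never mutate the caller's list
    if new_host_key not in recipients:
        recipients.append(new_host_key)
    # The master key must survive every edit: it is what allows re-encrypting
    # (sops updatekeys) after the host key has been lost.
    if master_key not in recipients:
        raise ValueError("master recovery recipient missing from /.sops.yaml")
    return recipients
```

Failing loudly when the master key is absent is deliberate: losing it along with the host key would leave `secrets/secrets.yaml` undecryptable.
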
## Class B: recipe app secrets (the harness generates per run; NEVER a blocker)
- Generated at install: `abra app secret generate <app> --all` (+ any deterministic test fixtures the harness chooses) when the recipe deploys.
- Persisted for the run: the same generated values survive install → upgrade → backup/restore because abra/swarm holds them keyed by the per-run app name (`<recipe[:4]>-<6hex>`); the harness re-reads them between stages. Concurrent runs are isolated by the unique per-run app name (and `MAX_TESTS=1` means no concurrency anyway).
- Destroyed at teardown: the same teardown that removes the app/volumes runs `abra app secret remove <app> --all` (+ docker-secret cleanup by stack name as a fallback). Nothing generated for a run outlives it.
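
The per-run naming and the teardown fallback can be sketched in Python as follows. `run_app_name` and `stack_secrets` are hypothetical helpers; the `<recipe[:4]>-<6hex>` shape comes from the list above, while the assumption that a stack's docker secrets are named `<stack>_…` (docker's namespacing for stack secrets) may not match every recipe. The input list could come from `docker secret ls --format '{{.Name}}'`, and the result would be passed to `docker secret rm`.

```python
import secrets


def run_app_name(recipe: str) -> str:
    """Per-run app name: first 4 chars of the recipe + 6 random hex chars."""
    return f"{recipe[:4]}-{secrets.token_hex(3)}"  # 3 random bytes -> 6 hex chars


def stack_secrets(all_secret_names: list[str], stack: str) -> list[str]:
    """Secrets belonging to `stack`, assuming the '<stack>_' name prefix."""
    prefix = f"{stack}_"  # trailing '_' stops 'gite-abc123' matching 'gite-abc1234'
    return [n for n in all_secret_names if n.startswith(prefix)]
```

Because every run name has the same fixed-length random suffix, the prefix match cannot sweep up another run's secrets.
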
## No-plaintext guarantees
- Secrets are referenced by `/run/secrets/<name>` path or read inline (e.g. `PGPASSWORD=$(cat /run/secrets/…)` inside the app container), never printed by the harness.
- abra does not echo generated secret values; reconciles redirect secret-generate stdout to `/dev/null`.
- The results dashboard renders run status only (no log bodies); per-run logs live in Drone's UI.
- Adversary leak test: greps published Drone logs + the dashboard for the known infra-secret values and any generated app password → must be zero. (Baseline + recipe-CI log scans: clean.)
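
The leak test reduces to a verbatim substring search for known values, and the same value set can drive a line-by-line redaction filter. A minimal Python sketch, with hypothetical names: it loads values from `/run/secrets` (generated app passwords would have to be added to the map separately), ignores values under 8 characters so short strings like `v1` don't mask ordinary log text, and reports leaked secret *names* so the test itself never prints a value.

```python
from pathlib import Path

MIN_LEN = 8  # shorter values would match ordinary log text


def load_secret_values(root: Path = Path("/run/secrets"), min_len: int = MIN_LEN) -> dict[str, str]:
    """Map secret name -> value for files under `root` long enough to match safely."""
    found = {}
    for p in sorted(root.iterdir()):
        if p.is_file():
            value = p.read_text().strip()
            if len(value) >= min_len:
                found[p.name] = value
    return found


def find_leaks(text: str, secrets_by_name: dict[str, str]) -> list[str]:
    """Names (never values) of secrets that appear verbatim in `text`; must be []."""
    return sorted(name for name, value in secrets_by_name.items() if value in text)


def redact(line: str, secrets_by_name: dict[str, str], mask: str = "********") -> str:
    """Mask every known value in one output line before it is published."""
    # Longest first, so a value that contains another is masked as a whole.
    for value in sorted(secrets_by_name.values(), key=len, reverse=True):
        line = line.replace(value, mask)
    return line
```

Running `find_leaks` on already-redacted output should always return an empty list; a non-empty result names which secret slipped through.
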