# Secrets model & rotation (D6)

cc-ci handles three classes of secret in deliberately different ways (plan §4.4). **No plaintext
secret ever lives in git, logs, or the results UI**: only sops-encrypted ciphertext and
references-by-location. The Adversary's leak test greps the published Drone logs and the dashboard
for known secret patterns and any generated app password; it must find nothing.

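The core of such a leak test can be sketched in Python. This is a minimal illustration, not the Adversary's actual tooling. It assumes the Drone log bodies and the rendered dashboard have already been fetched as strings; fetching is out of scope here. `find_leaks` and its parameters are hypothetical names. Literal values cover infra secrets and generated app passwords, and optional regexes stand in for "known secret patterns".

```python
from __future__ import annotations

import re
from typing import Iterable, Mapping


def find_leaks(
    documents: Mapping[str, str],
    secret_values: Iterable[str],
    patterns: Iterable[str] = (),
    min_len: int = 8,
) -> list[tuple[str, int]]:
    """Return (document name, hit count) for every document that leaks something.

    documents:     name -> text, e.g. one entry per Drone step log plus the dashboard HTML
    secret_values: literal values (infra secrets, generated app passwords)
    patterns:      regexes for known secret shapes (illustrative, supplied by the caller)
    min_len:       skip very short values, which would match by coincidence
    """
    needles = {v for v in secret_values if len(v) >= min_len}
    regexes = [re.compile(p) for p in patterns]
    hits = []
    for name, text in sorted(documents.items()):
        count = sum(text.count(v) for v in needles)
        count += sum(len(rx.findall(text)) for rx in regexes)
        if count:
            hits.append((name, count))
    return hits
```

The pass condition is an empty list. Returning counts per document, rather than stopping at the first hit, makes it easier to tell a single stray echo from a systematic leak.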
## Decryption chain (sops-nix)

- Infra secrets live sops-encrypted in `secrets/secrets.yaml` (committed). `/.sops.yaml` lists two
  age recipients: the **host key** (`age1h90ut…`, derived from cc-ci's SSH host key via ssh-to-age)
  and an off-box **master recovery key** (`age1cmk26t…`; its private half is kept only at
  `/srv/cc-ci/.sops/master-age.txt` on the build host, never in the repo).
- On cc-ci, `sops-nix` decrypts at activation time using the host's ed25519 SSH host key as the age
  identity (`sops.age.sshKeyPaths = ["/etc/ssh/ssh_host_ed25519_key"]`), materialising each secret at
  `/run/secrets/<name>` (mode 0400, owned by root). There is no extra key file to manage on the box.
- Swarm services don't read `/run/secrets` directly. Instead, the reconcile oneshots copy each value
  into a **docker swarm secret** (`docker secret create … /run/secrets/<name>`), which the service
  mounts. abra-managed apps go through `abra app secret …` instead.

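Swarm secrets are immutable, so a reconcile step like this only ever creates new names. The Python below is a hypothetical dry-run sketch of that planning. It assumes each swarm secret is named `<name>_<version>`; this is an illustrative scheme, and the real names live in the Nix modules. It also assumes the wanted versions are passed in as a dict. It builds the `docker secret create` argument lists without running them.

```python
from __future__ import annotations

import subprocess

SECRETS_DIR = "/run/secrets"  # where sops-nix materialises decrypted values


def existing_swarm_secrets() -> set[str]:
    """Names of swarm secrets that already exist (must run on a swarm manager)."""
    out = subprocess.run(
        ["docker", "secret", "ls", "--format", "{{.Name}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return set(out.split())


def plan_secret_creates(
    wanted: dict[str, str], existing: set[str], secrets_dir: str = SECRETS_DIR
) -> list[list[str]]:
    """Return the `docker secret create` argv lists needed to reach `wanted`.

    `wanted` maps a sops secret name to its version label, e.g. {"drone_rpc_secret": "v2"}.
    A version that already exists is left untouched: swarm secrets cannot be updated
    in place, so a changed value has to arrive under a new (bumped) name.
    """
    cmds = []
    for name, version in sorted(wanted.items()):
        swarm_name = f"{name}_{version}"  # hypothetical naming scheme
        if swarm_name not in existing:
            cmds.append(["docker", "secret", "create", swarm_name, f"{secrets_dir}/{name}"])
    return cmds
```

A real run would pass `existing_swarm_secrets()` as `existing` and execute each planned command. If nothing was bumped, the plan is empty, which keeps repeated reconciles idempotent.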
## Class A1 — external inputs (operator-provided; the loop CANNOT create them)

| Secret | Location | Rotation |
|---|---|---|
| Tailscale auth key | `/srv/cc-ci/.testenv` (sandbox) | operator re-issues; re-run `tailscale up` |
| cc-ci SSH root key | `~/.ssh/cc-ci-root-ed25519` (sandbox) | operator re-keys `authorized_keys` |
| Gitea bot creds | `/srv/cc-ci/.testenv` (`GITEA_USERNAME/PASSWORD`) | operator resets; update `.testenv` |
| **Wildcard TLS cert** | cc-ci `/var/lib/ci-certs/live/{fullchain,privkey}.pem` | **operator** re-issues (LE DNS-01/Gandi, ~90d, next ~2026-08-24); see below |
| Registry pull creds (if needed) | sops `secrets/secrets.yaml` | operator-provided |

A missing or invalid A1 secret is a `## Blocked` condition: the agent never invents or works around
it, and **never** runs ACME/DNS-01 for commoninternet.net.

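A pre-flight check can make the `## Blocked` rule mechanical. It looks for the operator-provided inputs and reports any that are missing instead of working around them. A minimal Python sketch follows. It assumes the Gitea credentials are stored as `GITEA_USERNAME` and `GITEA_PASSWORD` lines in `.testenv`. The function names are hypothetical, and the check covers presence only, not validity. `.testenv` lives in the sandbox while the cert lives on cc-ci, so in practice each check would run where its file is.

```python
from __future__ import annotations

from pathlib import Path

TESTENV = Path("/srv/cc-ci/.testenv")    # sandbox
CERT_DIR = Path("/var/lib/ci-certs/live")  # on cc-ci
REQUIRED_ENV_KEYS = ("GITEA_USERNAME", "GITEA_PASSWORD")  # assumed key names


def env_keys_with_values(text: str) -> set[str]:
    """Keys that have a non-empty value in a shell-style KEY=value file (values never returned)."""
    keys = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        if value.strip():
            keys.add(key)
    return keys


def missing_a1(testenv: Path = TESTENV, cert_dir: Path = CERT_DIR) -> list[str]:
    """Describe every missing A1 input; an empty list means not blocked."""
    missing = []
    if testenv.is_file():
        present = env_keys_with_values(testenv.read_text())
        missing += [f"{testenv}: {k}" for k in REQUIRED_ENV_KEYS if k not in present]
    else:
        missing.append(str(testenv))
    for pem in ("fullchain.pem", "privkey.pem"):
        if not (cert_dir / pem).is_file():
            missing.append(str(cert_dir / pem))
    return missing


def blocked_report(missing: list[str]) -> str:
    """Format findings as a `## Blocked` section; it names gaps and never guesses a fix."""
    lines = ["## Blocked", ""]
    lines += [f"- missing operator-provided (A1) input: {m}" for m in missing]
    return "\n".join(lines)
```

A caller would print `blocked_report(missing_a1())` whenever the list is non-empty, and stop there.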
**Wildcard cert rotation (manual, operator + agent):**

1. The operator re-issues the SAN cert (`*.ci.commoninternet.net` + `ci.commoninternet.net`)
   out-of-band and writes it to `/var/lib/ci-certs/live/{fullchain,privkey}.pem` on cc-ci.
2. Bump both `SECRET_WILDCARD_CERT_VERSION` and `SECRET_WILDCARD_KEY_VERSION` on the traefik app env
   (modules/proxy.nix) so the next reconcile inserts the new cert and key as fresh swarm secret
   versions.
3. Run `nixos-rebuild switch`. This re-runs the proxy reconcile, which re-inserts the secrets and
   redeploys traefik. One wildcard cert serves every per-run subdomain via SNI, so there is no
   per-app cert work.

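The operator re-issues the cert by hand, so it helps to know how much validity is left. Here is a small Python sketch, assuming `openssl` is on the PATH of the host holding the cert. It reads the first certificate in `fullchain.pem` with `openssl x509 -enddate -noout`; in the usual fullchain order, that is the leaf. It then parses the `notAfter=` line with the standard library's `ssl.cert_time_to_seconds`. The function names and the 21-day threshold are illustrative choices, not project settings.

```python
from __future__ import annotations

import ssl
import subprocess
import time
from typing import Optional

FULLCHAIN = "/var/lib/ci-certs/live/fullchain.pem"
WARN_DAYS = 21  # example threshold, not a project setting


def read_enddate(path: str = FULLCHAIN) -> str:
    """Return openssl's 'notAfter=...' line for the first cert in `path`."""
    out = subprocess.run(
        ["openssl", "x509", "-enddate", "-noout", "-in", path],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def not_after_epoch(enddate_line: str) -> int:
    """Parse e.g. 'notAfter=Aug 24 12:00:00 2026 GMT' into seconds since the epoch."""
    prefix = "notAfter="
    if not enddate_line.startswith(prefix):
        raise ValueError(f"unexpected openssl output: {enddate_line!r}")
    return ssl.cert_time_to_seconds(enddate_line[len(prefix):].strip())


def days_left(enddate_line: str, now: Optional[float] = None) -> int:
    """Whole days of validity remaining (negative once the cert has expired)."""
    now = time.time() if now is None else now
    return int((not_after_epoch(enddate_line) - now) // 86400)
```

On cc-ci, `days_left(read_enddate()) < WARN_DAYS` would be the cue to ask the operator for a re-issue.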
## Class A2 — internal infra secrets (the loop GENERATES + manages; never a blocker)

Unless marked abra-generated, each is sops-encrypted in `secrets/secrets.yaml` and decrypted to
`/run/secrets/<name>`:

| Secret | Used by | How to generate |
|---|---|---|
| `drone_rpc_secret` | Drone server ↔ exec runner RPC | `openssl rand -hex 32` |
| `drone_gitea_client_secret` | Drone ↔ Gitea OAuth app | issued when the Gitea OAuth app is created |
| `bridge_webhook_hmac` | comment-bridge webhook HMAC | `openssl rand -hex 32` |
| `bridge_drone_token` | bridge + dashboard → Drone API | minted Drone user token |
| `bridge_gitea_token` | bridge → Gitea API (poll/comment) | minted Gitea token (bot account) |
| `restic_password` | backup-bot-two restic repo | **abra-generated** (`abra app secret generate`, kept stable across reconciles) |

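`openssl rand -hex 32` prints 32 random bytes as 64 hex characters, and Python's `secrets.token_hex(32)` produces the same shape. For `bridge_webhook_hmac`, the sketch below also shows the receiving side. It assumes the bridge checks a Gitea-style signature, meaning a hex HMAC-SHA256 of the raw request body. The bridge's real code and header handling may differ, and the helper names are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_hex_secret(nbytes: int = 32) -> str:
    """Same shape as `openssl rand -hex 32`: nbytes random bytes as 2*nbytes hex chars."""
    return secrets.token_hex(nbytes)


def sign(body: bytes, key: str) -> str:
    """Hex HMAC-SHA256 of the raw body, keyed with the shared webhook secret."""
    return hmac.new(key.encode(), body, hashlib.sha256).hexdigest()


def verify(body: bytes, signature_hex: str, key: str) -> bool:
    """Compare in constant time so response timing doesn't reveal how much of the digest matched."""
    return hmac.compare_digest(sign(body, key), signature_hex.strip().lower())
```

Signing the raw bytes, rather than re-serialised JSON, matters: any re-encoding of the payload changes the digest.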
**Rotate an A2 secret** (e.g. `bridge_webhook_hmac`):

1. `set -a; . /srv/cc-ci/.testenv; set +a` (provides the key sops needs to edit the file; nothing is
   echoed).
2. In the repo, run `sops secrets/secrets.yaml`, replace the value (or `openssl rand -hex 32 | …`),
   and save. sops re-encrypts to the recipients recorded in the file, which are the two listed in
   `.sops.yaml`.
3. For swarm-secret-backed values, **bump the consuming app's secret version** so the reconcile
   re-creates the swarm secret (docker swarm secrets are immutable): e.g. drone `RPC_SECRET_VERSION`
   v1→v2 (modules/drone.nix), bridge `cc_ci_bridge_*_v<n>` (modules/bridge.nix). Update both ends
   of a shared secret (the Drone server and runner both use `drone_rpc_secret`).
4. `git commit` and push, sync to the host, then run `nixos-rebuild switch`; the reconcile re-inserts
   the secret and redeploys.
5. Verify that the consuming service is healthy and that re-auth works (e.g. a fresh build triggers).

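The version bump in these steps is easy to get half-right when a secret has several consumers. A minimal Python sketch follows, assuming version labels always end in `v<n>`, as `RPC_SECRET_VERSION` v1→v2 and `cc_ci_bridge_*_v<n>` do. `bump` and `bump_everywhere` are hypothetical helpers. The dicts stand in for env settings that actually live in the Nix modules, and `cc_ci_bridge_example_v9` is a made-up name.

```python
from __future__ import annotations

import re

# Greedy prefix up to the last "v" that is followed only by digits.
_VERSIONED = re.compile(r"^(?P<prefix>.*v)(?P<n>\d+)$")


def bump(label: str) -> str:
    """'v1' -> 'v2'; 'cc_ci_bridge_example_v9' -> 'cc_ci_bridge_example_v10'."""
    m = _VERSIONED.match(label)
    if m is None:
        raise ValueError(f"not a ...v<n> label: {label!r}")
    return f"{m['prefix']}{int(m['n']) + 1}"


def bump_everywhere(consumers: dict[str, dict[str, str]], key: str) -> dict[str, dict[str, str]]:
    """Bump `key` in every consumer's env; refuse if it is missing or already inconsistent."""
    versions = {name: env.get(key) for name, env in consumers.items()}
    distinct = set(versions.values())
    if len(distinct) != 1 or None in distinct:
        raise ValueError(f"{key} is missing or inconsistent across consumers: {versions!r}")
    new = bump(distinct.pop())
    return {name: {**env, key: new} for name, env in consumers.items()}
```

Refusing to bump when consumers disagree catches the half-rotated state (server on v2, runner still on v1) before it reaches a deploy.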
**Re-key sops recipients** (e.g. the cc-ci host is re-provisioned and gets a new host age key): add
the new `age1…` to `/.sops.yaml`, run `sops updatekeys secrets/secrets.yaml` from the build host
(which holds the master identity), and commit. The master recovery key lets you re-encrypt even if
the host key is lost.

## Class B — recipe app secrets (the harness generates per run; NEVER a blocker)

- **Generated at install:** when the recipe deploys, the harness runs
  `abra app secret generate <app> --all` (plus any deterministic test fixtures it chooses).
- **Persisted for the run:** the same generated values survive install → upgrade → backup/restore
  because abra/swarm holds them keyed by the per-run app name (`<recipe[:4]>-<6hex>`); the harness
  re-reads them between stages. Concurrent runs are isolated by the unique per-run app name (and
  `MAX_TESTS=1` rules out concurrency anyway).
- **Destroyed at teardown:** the same teardown that removes the app and its volumes runs
  `abra app secret remove <app> --all` (plus docker-secret cleanup by stack name as a fallback).
  Nothing generated for a run outlives it.

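The per-run naming and the generate-then-always-remove lifecycle can be sketched in Python. This illustrates the pattern under stated assumptions and is not the harness's code. `run_app_name`, `app_secrets`, and `quiet_run` are hypothetical; only the two `abra app secret` command lines come from the text above. The command runner is injected so the sketch can be exercised without abra. `quiet_run` discards stdout so generated values are not echoed.

```python
from __future__ import annotations

import secrets
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List

Runner = Callable[[List[str]], None]


def run_app_name(recipe: str) -> str:
    """`<recipe[:4]>-<6hex>`: first four characters plus 3 random bytes as 6 hex chars."""
    return f"{recipe[:4]}-{secrets.token_hex(3)}"


def quiet_run(argv: List[str]) -> None:
    """Run a command, raising on a non-zero exit, with its stdout discarded."""
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)


@contextmanager
def app_secrets(app: str, run: Runner = quiet_run) -> Iterator[str]:
    """Generate the app's secrets on entry and remove them on exit, even if a stage fails."""
    run(["abra", "app", "secret", "generate", app, "--all"])
    try:
        yield app  # install -> upgrade -> backup/restore stages would run here
    finally:
        run(["abra", "app", "secret", "remove", app, "--all"])
```

The `finally` is what makes "nothing outlives the run" hold when a stage raises. A `generate` that fails partway skips the `remove` here, and would be left to the stack-name docker-secret fallback.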
## No-plaintext guarantees

- Secrets are referenced by their `/run/secrets/<name>` path or read inline (e.g.
  `PGPASSWORD=$(cat /run/secrets/…)` *inside* the app container); the harness never prints them.
- abra does not echo generated secret values, and reconciles redirect secret-generate stdout to
  `/dev/null`.
- As a backstop, `run_recipe_ci` streams each stage's output through a redaction filter that masks
  any `/run/secrets/*` value of 8 or more characters before it reaches the Drone logs. Output stays
  live, and the stage's exit code is preserved.
- The results dashboard renders run status only (no log bodies); per-run logs live in Drone's UI.
- Adversary leak test: greps the published Drone logs and the dashboard for the known infra-secret
  values and any generated app password; the hit count must be zero. (Baseline and recipe-CI log
  scans: clean.)

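A log-redaction backstop can be sketched as a line filter. It loads every secret value of at least 8 characters, masks any occurrence in a stage's output as it streams, and passes the stage's exit code through. The Python below is an illustrative sketch under those assumptions, not the harness's actual code. `load_secret_values`, `make_redactor`, and `run_redacted` are hypothetical names. Reading `/run/secrets` needs root, since the files are mode 0400. Multi-line values such as PEM keys are split into lines, because masking happens line by line.

```python
from __future__ import annotations

import re
import subprocess
import sys
from pathlib import Path
from typing import Callable, Iterable, List

MIN_LEN = 8  # shorter values would mask too much unrelated output
MASK = "***"


def load_secret_values(root: Path = Path("/run/secrets"), min_len: int = MIN_LEN) -> List[str]:
    """Every line of every file under `root` that is at least `min_len` chars long."""
    if not root.is_dir():
        return []
    values = set()
    for path in root.rglob("*"):
        if path.is_file():
            for part in path.read_text(errors="replace").splitlines():
                part = part.strip()
                if len(part) >= min_len:
                    values.add(part)
    return sorted(values)


def make_redactor(values: Iterable[str]) -> Callable[[str], str]:
    """Return a function that replaces every occurrence of any value with MASK."""
    # Longest first: regex alternation takes the first alternative that matches, so a
    # secret that contains another secret is still masked in full.
    needles = sorted(set(values), key=len, reverse=True)
    if not needles:
        return lambda line: line
    pattern = re.compile("|".join(re.escape(v) for v in needles))
    return lambda line: pattern.sub(MASK, line)


def run_redacted(argv: List[str], redact: Callable[[str], str]) -> int:
    """Run argv, streaming merged stdout/stderr line by line through `redact`."""
    with subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        text=True, errors="replace", bufsize=1,
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            sys.stdout.write(redact(line))
            sys.stdout.flush()  # keep the log live instead of batching it
    return proc.returncode  # Popen's context exit has waited, so this is the stage's exit code
```

A CI step would finish with something like `sys.exit(run_redacted(stage_argv, make_redactor(load_secret_values())))`, where `stage_argv` is hypothetical. That way Drone still sees a failing stage as failed.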