Secrets model & rotation (D6)
cc-ci handles three classes of secret in deliberately different ways (plan §4.4). No plaintext secret ever lives in git, logs, or the results UI — only sops-encrypted ciphertext and references-by-location. The Adversary's leak test greps published Drone logs + the dashboard for known secret patterns and any generated app password; it must find nothing.
Where secrets live (Phase-1c: a private companion repo)
All sops-encrypted secret material — including the wildcard TLS cert+key — lives in a separate
private repo recipe-maintainers/cc-ci-secrets, mounted into this repo as a git submodule at
secrets/ (so the base resolves secrets/secrets.yaml). The base cc-ci repo holds no secrets,
only code/config + instance parameters; secrets/.sops.yaml (in the submodule) lists the two age
recipients: the host key (age1h90ut…, cc-ci's SSH host key via ssh-to-age) and the off-box
master/recovery key (age1cmk26t…; private half only at /srv/cc-ci/.sops/master-age.txt on the
build host / provisioned to a fresh host, never in either repo). Clone with git clone --recursive
(which needs bot/deploy creds for the private submodule); build with ?submodules=1 (see docs/install.md).
Decryption chain (sops-nix) — the ONE out-of-band secret
- Bootstrap age key (the only secret not in git): provisioned to /var/lib/sops-nix/key.txt (0600) before the first rebuild. sops.age.keyFile points there; sops.age.sshKeyPaths also offers cc-ci's SSH host key. On the canonical cc-ci the keyFile holds the host-derived age identity (ssh-to-age -private-key -i /etc/ssh/ssh_host_ed25519_key, == the host recipient). On a fresh/cloned host whose SSH key is NOT a recipient (e.g. the throwaway rebuild), it holds the recovery key, so any provisioned host can decrypt every secret. (sops-install-secrets aborts if a configured keyFile is missing, so it must exist before nixos-rebuild.)
- sops-nix decrypts at activation into /run/secrets/<name> (ramfs, mode 0400 root). The wildcard cert/key are placed at /var/lib/ci-certs/live/{fullchain,privkey}.pem (symlinks → /run/secrets) via sops.secrets.<name>.path, which is the path traefik reads (no out-of-band cert file).
- Swarm services don't read /run/secrets directly; the reconcile oneshots copy each into a docker swarm secret which the service mounts. abra-managed apps use abra app secret ….
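sops-install-secrets aborts when the configured keyFile is missing, so a pre-flight check before the first nixos-rebuild catches the problem early. Below is a minimal Python sketch built around a hypothetical helper, check_bootstrap_key, which is not part of cc-ci. It checks three things:

- the file is present;
- its mode is 0600;
- it holds an age identity line (AGE-SECRET-KEY-1…).

It never prints the key itself.

```python
import stat
from pathlib import Path

# Path taken from this doc; check_bootstrap_key is a hypothetical helper.
KEY_FILE = Path("/var/lib/sops-nix/key.txt")


def check_bootstrap_key(path: Path = KEY_FILE) -> list[str]:
    """Return problems with the sops-nix bootstrap age key; empty means OK."""
    if not path.is_file():
        # sops-install-secrets aborts if a configured keyFile is missing.
        return [f"{path} is missing; provision it before nixos-rebuild"]
    problems = []
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode != 0o600:
        problems.append(f"{path} has mode {oct(mode)}, expected 0o600")
    # age identity files hold AGE-SECRET-KEY-1... lines; '#' lines are comments.
    lines = (line.strip() for line in path.read_text().splitlines())
    if not any(line.startswith("AGE-SECRET-KEY-1") for line in lines):
        problems.append(f"{path} holds no AGE-SECRET-KEY-1 identity line")
    return problems
```

On the host, a non-empty result from check_bootstrap_key() means the rebuild would fail at activation. Fix those problems before running nixos-rebuild.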
Class A1 — external inputs (operator-provided; the loop CANNOT create them)
| Secret | Location | Rotation |
|---|---|---|
| Tailscale auth key | /srv/cc-ci/.testenv (sandbox) | operator re-issues; re-run tailscale up |
| cc-ci SSH root key | ~/.ssh/cc-ci-root-ed25519 (sandbox) | operator re-keys authorized_keys |
| Gitea bot creds | /srv/cc-ci/.testenv (GITEA_USERNAME/PASSWORD) | operator resets; update .testenv |
| Bootstrap age key | host /var/lib/sops-nix/key.txt (0600); the one out-of-band secret | host-derived (cc-ci) or recovery key (clone); re-provision on host re-key |
| Wildcard TLS cert+key | sops in cc-ci-secrets → decrypted to /var/lib/ci-certs/live/ | operator re-issues, then commits the new cert into cc-ci-secrets (see below) |
| Registry pull creds (if needed) | sops cc-ci-secrets/secrets.yaml | operator-provided |
A missing/invalid A1 secret is a ## Blocked condition — the agent never invents or works around it,
and never runs ACME/DNS-01 for commoninternet.net. (Phase-1c: the cert is now committed encrypted
in cc-ci-secrets, not dropped as a file — but issuance is still operator-only; the Gandi token never
touches the repo or the box.)
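The table gives variable names for only one A1 input: the Gitea bot creds. The sketch below applies the "report, don't work around" rule to them, listing missing creds as Blocked reasons.

Assumptions and hedges:

- .testenv is a shell-style KEY=VALUE file; its format isn't stated here.
- parse_env_file and blocked_reasons are hypothetical helpers.
- Only variable names are ever reported, never values.

```python
from pathlib import Path

# Names from the A1 table; other A1 variable names are not given in this doc.
REQUIRED_TESTENV_KEYS = ("GITEA_USERNAME", "GITEA_PASSWORD")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines (optional 'export ', optional quotes); skip comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.removeprefix("export ").strip()
        env[key] = value.strip().strip("'\"")
    return env


def blocked_reasons(testenv: Path) -> list[str]:
    """List missing or empty A1 inputs by name; values are never included."""
    if not testenv.is_file():
        return [f"{testenv} not found"]
    env = parse_env_file(testenv.read_text())
    return [f"{k} missing or empty in {testenv}"
            for k in REQUIRED_TESTENV_KEYS if not env.get(k)]
```

A non-empty list would go into the ## Blocked note verbatim. It is safe to publish because it contains names only.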
Wildcard cert rotation (operator; the cert now lives in git):
- Operator re-issues the SAN cert (*.ci.commoninternet.net + ci.commoninternet.net) out-of-band (LE DNS-01/Gandi, ~90d, next ~2026-08-24).
- Re-encrypt it into the secrets repo: sops cc-ci-secrets/secrets.yaml and replace wildcard_cert / wildcard_key (each a PEM block scalar); commit + push cc-ci-secrets, then bump the base submodule pointer.
- nixos-rebuild switch: sops re-writes /var/lib/ci-certs/live/* from git; the proxy reconcile re-inserts the swarm secret and redeploys traefik. One cert covers every per-run subdomain (SNI).
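The SAN list needs both names because a wildcard matches exactly one leftmost label. So *.ci.commoninternet.net covers any single-label per-run host, but not the bare ci.commoninternet.net and not a two-label host beneath it.

Below is a small Python sketch of RFC 6125-style matching. It is simplified: partial-label wildcards such as f*.example are not handled. The per-run hostnames in the test are purely illustrative.

```python
SANS = ("*.ci.commoninternet.net", "ci.commoninternet.net")


def san_matches(san: str, host: str) -> bool:
    """A leading '*.' stands for exactly one non-empty leftmost label."""
    san, host = san.lower().rstrip("."), host.lower().rstrip(".")
    if san.startswith("*."):
        label, sep, rest = host.partition(".")
        return bool(label) and sep == "." and rest == san[2:]
    return host == san


def covered(host: str, sans: tuple[str, ...] = SANS) -> bool:
    """True if any SAN on the cert matches the SNI hostname."""
    return any(san_matches(s, host) for s in sans)
```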
Class A2 — internal infra secrets (the loop GENERATES + manages; never a blocker)
All sops-encrypted in secrets/secrets.yaml, decrypted to /run/secrets/<name>:
| Secret | Used by | Generate |
|---|---|---|
| drone_rpc_secret | Drone server ↔ exec runner RPC | openssl rand -hex 32 |
| drone_gitea_client_secret | Drone ↔ Gitea OAuth app | from the Gitea OAuth app creation |
| bridge_webhook_hmac | comment-bridge webhook HMAC | openssl rand -hex 32 |
| bridge_drone_token | bridge + dashboard → Drone API | hex token; injected as the bot's Drone machine token via DRONE_USER_CREATE=…,token:$(cat /run/secrets/bridge_drone_token) (nix/modules/drone.nix) so it's reproducible on a fresh Drone DB (otherwise the bridge gets 401 on a clean-room rebuild) |
| bridge_gitea_token | bridge → Gitea API (poll/comment) | minted Gitea token (bot) |
| restic_password | backup-bot-two restic repo | abra-generated (abra app secret generate, kept stable across reconciles) |
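The hex secrets above have the same shape as openssl rand -hex 32 (32 random bytes, 64 hex characters), which Python's secrets.token_hex(32) also produces. The sketch below shows how a receiver can check a signed webhook using bridge_webhook_hmac.

Assumptions and hedges:

- The sender puts a hex HMAC-SHA256 of the raw body in a header, as Gitea's X-Gitea-Signature does.
- The HMAC key is the secret string as stored.
- The comment-bridge's actual code may differ.
- All function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_hex_secret(nbytes: int = 32) -> str:
    """Same shape as `openssl rand -hex 32`: nbytes random bytes as hex."""
    return secrets.token_hex(nbytes)


def sign(body: bytes, key: str) -> str:
    """Hex HMAC-SHA256 of the raw body, keyed with the secret string's bytes."""
    return hmac.new(key.encode(), body, hashlib.sha256).hexdigest()


def verify(body: bytes, key: str, signature: str) -> bool:
    """Constant-time comparison; also accepts a 'sha256=<hex>' form."""
    received = signature.strip().removeprefix("sha256=").lower()
    return hmac.compare_digest(sign(body, key).encode(), received.encode())
```

A rotated bridge_webhook_hmac only works once both the bridge and the webhook sender use the new value. Until then, verify() rejects every delivery, which is the expected failure mode.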
Rotate an A2 secret (e.g. bridge_webhook_hmac):
- Have an age identity that is a recipient (the host key via ssh-to-age, or the recovery key).
- In the cc-ci-secrets submodule: sops secrets.yaml → replace the value (or paste a fresh openssl rand -hex 32), then save (sops re-encrypts to both recipients per its .sops.yaml); commit + push cc-ci-secrets, then bump the base repo's submodule pointer (git add secrets && git commit).
- For swarm-secret-backed values, bump the consuming app's secret version so the reconcile re-creates the swarm secret (docker swarm secrets are immutable): e.g. drone RPC_SECRET_VERSION v1→v2 (nix/modules/drone.nix), bridge cc_ci_bridge_*_v<n> (nix/modules/bridge.nix). Update both ends (server + runner share drone_rpc_secret). git commit + push, sync to host, nixos-rebuild switch → the reconcile re-inserts + redeploys.
- Verify: the consuming service is healthy and re-auth works (e.g. a fresh build triggers).
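A docker swarm secret can't be updated in place. Rotating one therefore means creating it under a new name and pointing the service at that name.

The hypothetical helpers below show the version bump for two cases:

- names following the cc_ci_bridge_*_v<n> pattern;
- RPC_SECRET_VERSION-style tags.

The concrete secret name in the test is an illustrative instance of the pattern, not necessarily one cc-ci uses.

```python
import re

# Hypothetical: any name ending in _v<n>, e.g. cc_ci_bridge_<something>_v1.
_VERSIONED = re.compile(r"^(?P<base>.+)_v(?P<n>\d+)$")


def bump_secret_name(name: str) -> str:
    """Return the next versioned name: ..._v1 -> ..._v2."""
    m = _VERSIONED.match(name)
    if m is None:
        raise ValueError(f"{name!r} has no _v<n> suffix")
    return f"{m['base']}_v{int(m['n']) + 1}"


def bump_version(tag: str) -> str:
    """RPC_SECRET_VERSION-style tag: v1 -> v2."""
    if re.fullmatch(r"v\d+", tag) is None:
        raise ValueError(f"{tag!r} is not of the form v<n>")
    return f"v{int(tag[1:]) + 1}"
```

The old secret name stays in the swarm until it is removed. The running service keeps working on the old value until the redeploy switches it over.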
Re-key sops recipients (e.g. cc-ci host re-provisioned → new host age key): add the new
age1… to cc-ci-secrets/.sops.yaml, sops updatekeys secrets.yaml (run with the master identity),
commit cc-ci-secrets + bump the submodule pointer. The master/recovery key lets you re-encrypt even
if the host key is lost — and is itself the bootstrap key a fresh host uses (/var/lib/sops-nix/key.txt).
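sops accepts the age recipients of a .sops.yaml creation rule as a comma-separated list. sops updatekeys then re-encrypts the file's data key to whatever that list says. Editing .sops.yaml alone changes nothing for secrets.yaml until updatekeys runs.

The hypothetical helper below covers the "add the new age1…" step. It assumes the recipients sit in one such comma-separated age: value; the real layout of cc-ci-secrets/.sops.yaml isn't shown here. The recipients in the test are dummies.

```python
def add_age_recipient(recipients: str, new: str) -> str:
    """Append an age recipient to a comma-separated sops `age:` value, once."""
    new = new.strip()
    if not new.startswith("age1"):
        raise ValueError("age recipients start with 'age1'")
    keys = [k.strip() for k in recipients.split(",") if k.strip()]
    if new not in keys:
        keys.append(new)  # keep existing order; master key stays in the list
    return ",".join(keys)
```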
Class B — recipe app secrets (the harness generates per run; NEVER a blocker)
- Generated at install: abra app secret generate <app> --all (+ any deterministic test fixtures the harness chooses) when the recipe deploys.
- Persisted for the run: the same generated values survive install → upgrade → backup/restore because abra/swarm holds them keyed by the per-run app name (<recipe[:4]>-<6hex>); the harness re-reads them between stages. Concurrent runs are isolated by the unique per-run app name (and MAX_TESTS=1 means there is no concurrency anyway).
- Destroyed at teardown: the same teardown that removes the app/volumes runs abra app secret remove <app> --all (+ docker-secret cleanup by stack name as a fallback). Nothing generated for a run outlives it.
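The per-run name scheme and the fallback cleanup filter can be sketched in Python. Both run_app_name and stack_secrets are hypothetical helpers.

The <stack>_ prefix assumes Docker's default naming for resources created by docker stack deploy. abra may not follow it for every secret.

```python
import secrets


def run_app_name(recipe: str) -> str:
    """<recipe[:4]>-<6hex>: recipe prefix plus 3 random bytes as 6 hex chars."""
    return f"{recipe[:4]}-{secrets.token_hex(3)}"


def stack_secrets(all_secret_names: list[str], stack: str) -> list[str]:
    """Fallback cleanup filter: secrets whose names carry the '<stack>_' prefix."""
    prefix = f"{stack}_"
    return [name for name in all_secret_names if name.startswith(prefix)]
```

The trailing underscore in the prefix keeps one run's cleanup from matching a different app whose name merely starts with the same characters. The filtered names would then go to docker secret rm.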
No-plaintext guarantees
- Secrets are referenced by /run/secrets/<name> path or read inline (e.g. PGPASSWORD=$(cat /run/secrets/…) inside the app container), never printed by the harness.
- abra does not echo generated secret values; reconciles redirect secret-generate stdout to /dev/null.
- The results dashboard renders run status only (no log bodies); per-run logs live in Drone's UI.
- Adversary leak test: greps published Drone logs + the dashboard for the known infra-secret values and any generated app password → must be zero. (Baseline + recipe-CI log scans: clean.)
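The leak test reduces to a verbatim search for each known value across every published text.

Assumptions and hedges:

- The infra values are read from /run/secrets/<name> files.
- The published Drone logs and dashboard pages have already been fetched as strings.
- load_secrets and find_leaks are hypothetical helpers.
- Results report secret names only, and values too short to search for reliably are skipped.

```python
from pathlib import Path


def load_secrets(secret_dir: Path) -> dict[str, str]:
    """Map each file name under secret_dir (e.g. /run/secrets) to its contents."""
    return {p.name: p.read_text() for p in secret_dir.iterdir() if p.is_file()}


def find_leaks(texts: list[str], secret_values: dict[str, str],
               min_len: int = 8) -> list[str]:
    """Return the names (never the values) of secrets found verbatim in any text."""
    hits = set()
    for name, value in secret_values.items():
        needle = value.strip()  # secret files often end with a newline
        if len(needle) < min_len:
            continue  # too short to search for without false positives
        if any(needle in text for text in texts):
            hits.add(name)
    return sorted(hits)
```

A passing run is find_leaks(...) == []. Because the output is names only, the scanner's own log can be published without becoming a leak itself.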