1c/C7: docs — secrets.md + architecture.md updated to the 1c model (cc-ci-secrets submodule, cert-in-git, bootstrap age key, Drone-token injection, verified D8)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@@ -5,18 +5,31 @@ secret ever lives in git, logs, or the results UI** — only sops-encrypted ciph
references-by-location. The Adversary's leak test greps published Drone logs + the dashboard for
known secret patterns and any generated app password; it must find nothing.
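
Below is a minimal sketch of such a leak check. It assumes the published Drone logs and dashboard pages have first been dumped into a local directory, and that the known secret values sit one per line in a separate needle file. Both layouts are hypothetical, `leak_scan` is an illustrative name, and the Adversary's actual test is not shown in this document.

```shell
#!/usr/bin/env bash
# Hypothetical leak check: fail if any known secret value appears anywhere
# in a directory of dumped artifacts (saved Drone logs, dashboard HTML, ...).
# Exit codes: 0 clean, 1 leak found, 2 could not run the scan.

leak_scan() {
  local dir="$1" needles="$2" clean status=0
  if [ ! -r "$needles" ]; then
    echo "cannot read needle file $needles" >&2
    return 2
  fi
  clean=$(mktemp)
  # Drop empty lines: an empty pattern would match every line of every file.
  grep -v '^$' "$needles" > "$clean" || true
  if [ ! -s "$clean" ]; then
    rm -f "$clean"
    echo "needle file has no secret values" >&2
    return 2
  fi
  # -r recurse, -I skip binary files, -l list matching files,
  # -F treat each needle as a literal string, -f read needles from a file.
  grep -rIlF -f "$clean" -- "$dir" || status=$?
  rm -f "$clean"
  case "$status" in
    0) echo "LEAK: secret material found in the files listed above" >&2; return 1 ;;
    1) echo "leak scan clean" ;;
    *) echo "leak scan failed (grep exit $status)" >&2; return 2 ;;
  esac
}
```

Keep the needle file outside the scanned directory, or it will match itself. Every line is treated as its own pattern, so multi-line values such as PEM blocks would need a different comparison. Their `-----BEGIN …` lines would otherwise flag harmless public certs.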

## Where secrets live (Phase-1c: a private companion repo)

All sops-encrypted secret material — including the **wildcard TLS cert+key** — lives in a **separate
private repo `recipe-maintainers/cc-ci-secrets`**, mounted into this repo as a **git submodule at
`secrets/`** (so the base resolves `secrets/secrets.yaml`). The base `cc-ci` repo holds **no secrets**,
only code/config + instance parameters. `secrets/.sops.yaml` (in the submodule) lists the two age
recipients: the **host key** (`age1h90ut…`, derived from cc-ci's SSH host key via ssh-to-age) and the
off-box **master/recovery key** (`age1cmk26t…`). The recovery key's private half lives only at
`/srv/cc-ci/.sops/master-age.txt` on the build host, or is provisioned to a fresh host, and is never
in either repo. Clone with `git clone --recursive` (this needs bot/deploy creds for the private
submodule) and build with `?submodules=1` (see docs/install.md).
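
A clone made without `--recursive` leaves `secrets/` empty. That is cheap to catch before a build. The sketch below uses a hypothetical helper, `check_secrets_checkout`, which takes the base repo path. It checks that `secrets/secrets.yaml` exists and carries the markers sops writes into encrypted YAML: a top-level `sops:` metadata key and `ENC[...]` values. It is a heuristic, not a decryption test.

```shell
#!/usr/bin/env bash
# Hypothetical pre-build check: is the private cc-ci-secrets submodule checked
# out at secrets/, and does secrets.yaml look sops-encrypted (not plaintext)?

check_secrets_checkout() {
  local repo="$1"
  local f="$repo/secrets/secrets.yaml"
  if [ ! -s "$f" ]; then
    echo "missing $f: clone with --recursive or run 'git submodule update --init'" >&2
    return 1
  fi
  # sops-encrypted YAML has a top-level 'sops:' metadata key and ENC[...] values.
  if ! grep -q '^sops:' "$f" || ! grep -q 'ENC\[' "$f"; then
    echo "$f does not look sops-encrypted; refusing to build" >&2
    return 1
  fi
  echo "secrets submodule OK"
}
```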

## Decryption chain (sops-nix) — the ONE out-of-band secret

- **Bootstrap age key (the only secret not in git):** provisioned to `/var/lib/sops-nix/key.txt`
(0600) before the first rebuild. `sops.age.keyFile` points there; `sops.age.sshKeyPaths` also offers
cc-ci's SSH host key. On the canonical cc-ci the keyFile holds the host-derived age identity
(`ssh-to-age -private-key -i /etc/ssh/ssh_host_ed25519_key`, i.e. the private half of the `host`
recipient). On a fresh/cloned host whose SSH key is NOT a recipient (e.g. the throwaway rebuild), it
holds the **recovery key**. Either way, every provisioned host can decrypt every secret.
(sops-install-secrets aborts if a configured keyFile is missing, so the file must exist before
`nixos-rebuild`.)
- `sops-nix` decrypts at activation into `/run/secrets/<name>` (ramfs, mode 0400 root). The wildcard
cert/key are placed at `/var/lib/ci-certs/live/{fullchain,privkey}.pem` (symlinks → `/run/secrets`)
via `sops.secrets.<name>.path`, which is the path traefik reads (no out-of-band cert file).
- Swarm services don't read `/run/secrets` directly. The reconcile oneshots copy each secret into a
**docker swarm secret** (`docker secret create … /run/secrets/<name>`), which the service mounts.
abra-managed apps use `abra app secret …`.
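
sops-install-secrets aborts when the keyFile is missing, so a pre-flight check before `nixos-rebuild` can prevent a failed activation. This is a minimal sketch that assumes GNU coreutils `stat` (as on NixOS). `check_bootstrap_key` is a hypothetical name. It checks only three things: that the file exists, that its mode is 0600, and that it contains a line with the `AGE-SECRET-KEY-1` prefix age identities use. It does not prove the key is one of the recipients.

```shell
#!/usr/bin/env bash
# Hypothetical pre-rebuild check for the bootstrap age key. The default path
# is the sops.age.keyFile location described in this section.

check_bootstrap_key() {
  local key="${1:-/var/lib/sops-nix/key.txt}" mode
  if [ ! -f "$key" ]; then
    echo "missing $key: provision the host-derived or recovery age key first" >&2
    return 1
  fi
  mode=$(stat -c '%a' "$key")   # GNU stat: octal permission bits, e.g. 600
  if [ "$mode" != "600" ]; then
    echo "$key has mode $mode, expected 600" >&2
    return 1
  fi
  # An age identity line starts with AGE-SECRET-KEY-1 (comment lines may precede it).
  if ! grep -q '^AGE-SECRET-KEY-1' "$key"; then
    echo "$key contains no age identity" >&2
    return 1
  fi
  echo "bootstrap key OK"
}
```

On the canonical cc-ci, the file could be written with the command from the first bullet under a restrictive umask, e.g. `(umask 077; ssh-to-age -private-key -i /etc/ssh/ssh_host_ed25519_key > /var/lib/sops-nix/key.txt)`. On a clone, copy in the recovery key instead.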

## Class A1 — external inputs (operator-provided; the loop CANNOT create them)

@@ -25,19 +38,23 @@ known secret patterns and any generated app password; it must find nothing.
| Tailscale auth key | `/srv/cc-ci/.testenv` (sandbox) | operator re-issues; re-run `tailscale up` |
| cc-ci SSH root key | `~/.ssh/cc-ci-root-ed25519` (sandbox) | operator re-keys `authorized_keys` |
| Gitea bot creds | `/srv/cc-ci/.testenv` (`GITEA_USERNAME/PASSWORD`) | operator resets; update `.testenv` |
| **Bootstrap age key** | host `/var/lib/sops-nix/key.txt` (0600) — **the one out-of-band secret** | host-derived (cc-ci) or recovery key (clone); re-provision on host re-key |
| **Wildcard TLS cert+key** | sops in **`cc-ci-secrets`** → decrypted to `/var/lib/ci-certs/live/` | operator re-issues then **commits the new cert into `cc-ci-secrets`** (see below) |
| Registry pull creds (if needed) | sops `cc-ci-secrets/secrets.yaml` | operator-provided |

A missing/invalid A1 secret is a `## Blocked` condition — the agent never invents or works around it,
and **never** runs ACME/DNS-01 for commoninternet.net. (Phase-1c: the cert is now *committed encrypted*
in `cc-ci-secrets`, not dropped as a file — but issuance is still operator-only; the Gandi token never
touches the repo or the box.)

**Wildcard cert rotation (operator; the cert now lives in git):**
1. Operator re-issues the SAN cert (`*.ci.commoninternet.net` + `ci.commoninternet.net`) out-of-band
(LE DNS-01/Gandi, ~90d, next ~2026-08-24).
2. Re-encrypt it into the secrets repo: `sops cc-ci-secrets/secrets.yaml` and replace
`wildcard_cert` / `wildcard_key` (each a PEM block scalar); commit + push `cc-ci-secrets`, then bump
the submodule pointer in the base repo.
3. `nixos-rebuild switch`: sops-nix re-writes `/var/lib/ci-certs/live/*` from the encrypted secrets in
git; the proxy reconcile re-inserts the swarm secret + redeploys traefik. One cert covers every
per-run subdomain (SNI).
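
Before re-encrypting a re-issued cert into `cc-ci-secrets`, it helps to confirm the pair is usable. This is a minimal sketch assuming OpenSSL 1.1.1 or newer (needed for `x509 -ext`). `check_wildcard_pair` and the 30-day default threshold are illustrative choices, not part of the documented procedure. The function does three things. It compares the public key inside the cert with the one derived from the private key. It requires both SAN entries named in step 1. It rejects a cert that expires within the threshold.

```shell
#!/usr/bin/env bash
# Hypothetical pre-commit check for a re-issued wildcard cert, run on the
# plaintext PEMs before they are re-encrypted with sops.

check_wildcard_pair() {
  local cert="$1" key="$2" min_days="${3:-30}" sans name cert_pub key_pub
  # 1. The cert and the private key must carry the same public key.
  cert_pub=$(openssl x509 -in "$cert" -noout -pubkey) || return 1
  key_pub=$(openssl pkey -in "$key" -pubout) || return 1
  if [ -z "$cert_pub" ] || [ "$cert_pub" != "$key_pub" ]; then
    echo "certificate does not match private key" >&2
    return 1
  fi
  # 2. Both names from step 1 must appear as exact SAN entries (one per line).
  sans=$(openssl x509 -in "$cert" -noout -ext subjectAltName | tr ',' '\n' \
         | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
  for name in '*.ci.commoninternet.net' 'ci.commoninternet.net'; do
    if ! grep -qxF "DNS:$name" <<< "$sans"; then
      echo "missing SAN DNS:$name" >&2
      return 1
    fi
  done
  # 3. Refuse a cert that expires within min_days days.
  if ! openssl x509 -in "$cert" -noout -checkend $(( min_days * 86400 )) > /dev/null; then
    echo "certificate expires within $min_days days" >&2
    return 1
  fi
  echo "wildcard cert OK"
}
```

Typical use is `check_wildcard_pair fullchain.pem privkey.pem` on the operator's copies, before `sops` is opened. The SAN test matches whole entries, so a cert for a different subdomain does not pass by substring.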

## Class A2 — internal infra secrets (the loop GENERATES + manages; never a blocker)

@@ -48,14 +65,15 @@ All sops-encrypted in `secrets/secrets.yaml`, decrypted to `/run/secrets/<name>`
| `drone_rpc_secret` | Drone server ↔ exec runner RPC | `openssl rand -hex 32` |
| `drone_gitea_client_secret` | Drone↔Gitea OAuth app | from the Gitea OAuth app creation |
| `bridge_webhook_hmac` | comment-bridge webhook HMAC | `openssl rand -hex 32` |
| `bridge_drone_token` | bridge + dashboard → Drone API | hex token; **injected as the bot's Drone machine token** via `DRONE_USER_CREATE=…,token:$(cat /run/secrets/bridge_drone_token)` (modules/drone.nix), so the same token exists even on a fresh Drone DB (otherwise the bridge gets 401 after a clean-room rebuild) |
| `bridge_gitea_token` | bridge → Gitea API (poll/comment) | minted Gitea token (bot) |
| `restic_password` | backup-bot-two restic repo | **abra-generated** (`abra app secret generate`, kept stable across reconciles) |
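
Drone reads `DRONE_USER_CREATE` as comma-separated `key:value` pairs. The `bridge_drone_token` row appends `token:` plus the secret's contents. Below is a minimal sketch of building that value with a sanity check. `drone_user_create_value` is hypothetical. The prefix (the `…` in the table, i.e. the bot's username and flags in modules/drone.nix) is passed in by the caller rather than guessed.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring DRONE_USER_CREATE=…,token:$(cat /run/secrets/bridge_drone_token).
# The prefix ("…") is supplied by the caller; it is not reproduced here.

drone_user_create_value() {
  local prefix="$1" token_file="${2:-/run/secrets/bridge_drone_token}" token
  # $(...) strips trailing newlines, so a file written with echo is fine.
  token=$(cat "$token_file") || return 1
  # The table describes the value as a hex token; reject anything else early.
  if [[ -z "$token" || ! "$token" =~ ^[0-9a-fA-F]+$ ]]; then
    echo "$token_file does not hold a hex token" >&2
    return 1
  fi
  printf '%s,token:%s\n' "$prefix" "$token"
}
```

For example, a made-up prefix `username:ci-bot,admin:true` and a token file containing `0123abcd` yield `username:ci-bot,admin:true,token:0123abcd`.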

**Rotate an A2 secret** (e.g. `bridge_webhook_hmac`):
1. Have an age identity that is a recipient (the host key via ssh-to-age, or the recovery key).
2. In the **`cc-ci-secrets`** submodule: `sops secrets.yaml` → replace the value (or
`openssl rand -hex 32`), save (re-encrypts to both recipients per its `.sops.yaml`); commit + push
`cc-ci-secrets`, then bump the base repo's submodule pointer (`git add secrets && git commit`).
3. For swarm-secret-backed values, **bump the consuming app's secret version** so the reconcile
re-creates the swarm secret (docker swarm secrets are immutable): e.g. drone `RPC_SECRET_VERSION`
v1→v2 (modules/drone.nix), bridge `cc_ci_bridge_*_v<n>` (modules/bridge.nix). Update both ends
@@ -64,9 +82,9 @@ All sops-encrypted in `secrets/secrets.yaml`, decrypted to `/run/secrets/<name>`
5. Verify: the consuming service is healthy and re-auth works (e.g. a fresh build triggers).
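
Steps 2 and 3 can be sketched as two small helpers. The sketch assumes versioned swarm-secret names end in `_v<n>`, as in `cc_ci_bridge_*_v<n>`. `new_hex_secret` and `next_secret_name` are hypothetical names. The first helper produces a value the same way the table does (`openssl rand -hex 32`). The second computes the name the next reconcile should create, because an existing swarm secret cannot be changed in place.

```shell
#!/usr/bin/env bash
# Hypothetical rotation helpers: mint a replacement value and derive the next
# versioned swarm-secret name (swarm secrets are immutable, so a new name is needed).

new_hex_secret() {
  openssl rand -hex 32   # 32 random bytes -> 64 hex characters
}

next_secret_name() {
  local name="$1"
  if [[ "$name" =~ ^(.*_v)([0-9]+)$ ]]; then
    # 10# forces base 10 so a suffix like _v08 is not parsed as octal.
    echo "${BASH_REMATCH[1]}$(( 10#${BASH_REMATCH[2]} + 1 ))"
  else
    echo "no _v<n> suffix on '$name'" >&2
    return 1
  fi
}
```

For instance, `next_secret_name cc_ci_bridge_webhook_hmac_v1` prints `cc_ci_bridge_webhook_hmac_v2` (the name is illustrative). The new value still goes into `cc-ci-secrets` via `sops`, and the version bump still has to be made in the consuming module.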

**Re-key sops recipients** (e.g. cc-ci host re-provisioned → new host age key): add the new
`age1…` to `cc-ci-secrets/.sops.yaml`, run `sops updatekeys secrets.yaml` in the submodule with the
master identity, then commit `cc-ci-secrets` and bump the submodule pointer. The master/recovery key
lets you re-encrypt even if the host key is lost, and it is also the bootstrap key a fresh host uses
(`/var/lib/sops-nix/key.txt`).
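
The re-key sequence can be wrapped in a small script. This is a minimal sketch that assumes two things: the submodule sits at `<base>/secrets`, and sops finds the master identity through `SOPS_AGE_KEY_FILE` (for example pointing at `/srv/cc-ci/.sops/master-age.txt`). `rekey_secrets`, `run` and the commit messages are hypothetical. With `DRY_RUN=1` it only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical re-key wrapper. DRY_RUN=1 prints each command instead of running it.
# Assumes the cc-ci-secrets submodule is checked out at <base>/secrets.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

rekey_secrets() {
  local base="$1" recipient="$2" sub="$1/secrets"
  # Refuse to re-encrypt until the new age1… recipient is listed in .sops.yaml.
  if ! grep -qF -- "$recipient" "$sub/.sops.yaml"; then
    echo "add $recipient to $sub/.sops.yaml first" >&2
    return 1
  fi
  (cd "$sub" && run sops updatekeys secrets.yaml) || return 1
  run git -C "$sub" commit -am "re-key: add $recipient" || return 1
  run git -C "$sub" push || return 1
  # Record the new submodule commit in the base repo.
  run git -C "$base" add secrets || return 1
  run git -C "$base" commit -m "bump cc-ci-secrets (re-key)"
}
```

The new host's recipient can be derived from its public host key with `ssh-to-age < /etc/ssh/ssh_host_ed25519_key.pub` and added to `.sops.yaml` before this runs. `sops updatekeys` shows the recipient change and asks for confirmation, so the real (non-dry) run is interactive.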

## Class B — recipe app secrets (the harness generates per run; NEVER a blocker)