M7/D6: secrets rotation doc + log redaction filter
All checks were successful
continuous-integration/drone/push Build is passing
docs/secrets.md documents the 3 secret classes (A1 external, A2 internal-generated, B recipe-app), the sops-nix decryption chain, and rotation procedures for each (cert version bump, sops re-encrypt + swarm-secret version bump, recipe-app ephemeral).

run_recipe_ci streams each stage's output through a redaction filter that masks any /run/secrets/* value (>=8 chars) before it reaches Drone logs: belt-and-suspenders over 'harness never prints secrets + abra doesn't echo'. Live streaming + exit code preserved (locally tested). Recipe-ci clones cc-ci fresh per build, so this applies next run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
91
docs/secrets.md
Normal file
@@ -0,0 +1,91 @@
# Secrets model & rotation (D6)

cc-ci handles three classes of secret in deliberately different ways (plan §4.4). **No plaintext
secret ever lives in git, logs, or the results UI**: only sops-encrypted ciphertext and
references-by-location. The Adversary's leak test greps published Drone logs + the dashboard for
known secret patterns and any generated app password; it must find nothing.

## Decryption chain (sops-nix)

- Infra secrets live sops-encrypted in `secrets/secrets.yaml` (committed). `/.sops.yaml` lists two
  age recipients: the **host key** (`age1h90ut…`, derived from cc-ci's SSH host key via ssh-to-age)
  and an off-box **master recovery key** (`age1cmk26t…`; its private half is kept only at
  `/srv/cc-ci/.sops/master-age.txt` on the build host, never in the repo).
- On cc-ci, `sops-nix` decrypts at activation using the host's ed25519 SSH host key as the age
  identity (`sops.age.sshKeyPaths = ["/etc/ssh/ssh_host_ed25519_key"]`), materialising each secret at
  `/run/secrets/<name>` (mode 0400 root). No extra key file to manage on the box.
- Swarm services don't read `/run/secrets` directly; the reconcile oneshots copy each into a **docker
  swarm secret** (`docker secret create … /run/secrets/<name>`) which the service mounts. abra-managed
  apps use `abra app secret …`.

## Class A1: external inputs (operator-provided; the loop CANNOT create them)

| Secret | Location | Rotation |
|---|---|---|
| Tailscale auth key | `/srv/cc-ci/.testenv` (sandbox) | operator re-issues; re-run `tailscale up` |
| cc-ci SSH root key | `~/.ssh/cc-ci-root-ed25519` (sandbox) | operator re-keys `authorized_keys` |
| Gitea bot creds | `/srv/cc-ci/.testenv` (`GITEA_USERNAME/PASSWORD`) | operator resets; update `.testenv` |
| **Wildcard TLS cert** | cc-ci `/var/lib/ci-certs/live/{fullchain,privkey}.pem` | **operator** re-issues (LE DNS-01/Gandi, ~90d, next ~2026-08-24); see below |
| Registry pull creds (if needed) | sops `secrets/secrets.yaml` | operator-provided |

A missing/invalid A1 secret is a `## Blocked` condition: the agent never invents or works around it,
and **never** runs ACME/DNS-01 for commoninternet.net.

**Wildcard cert rotation (manual, operator + agent):**
1. Operator re-issues the SAN cert (`*.ci.commoninternet.net` + `ci.commoninternet.net`) out-of-band
   and writes it to `/var/lib/ci-certs/live/{fullchain,privkey}.pem` on cc-ci.
2. Bump `SECRET_WILDCARD_CERT_VERSION` / `SECRET_WILDCARD_KEY_VERSION` on the traefik app env
   (modules/proxy.nix) so the next reconcile inserts the new cert as a fresh swarm secret version.
3. `nixos-rebuild switch` (re-runs the proxy reconcile → re-inserts + redeploys traefik). One cert
   covers every per-run subdomain (SNI), so no per-app cert work.
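
To illustrate why one certificate is enough, the sketch below checks a hostname against the two SAN entries named in step 1. It assumes standard wildcard matching, where `*` covers exactly one leftmost DNS label. `covered_by_cert` is a hypothetical helper, and the shape of a per-run hostname (app name directly under `ci.commoninternet.net`) is an assumption.

```python
SAN_ENTRIES = ("*.ci.commoninternet.net", "ci.commoninternet.net")


def covered_by_cert(host: str, sans: tuple[str, ...] = SAN_ENTRIES) -> bool:
    """True if `host` matches a SAN entry; `*` matches exactly one leftmost label."""
    host = host.lower().rstrip(".")
    for san in sans:
        if san.startswith("*."):
            # Compare everything after the first label with the wildcard's suffix.
            label, _, rest = host.partition(".")
            if label and rest == san[2:]:
                return True
        elif host == san:
            return True
    return False
```

Under this assumption a name such as `gite-3fa9c1.ci.commoninternet.net` is covered, while a deeper name such as `a.b.ci.commoninternet.net` is not, which is why per-run apps would need to sit exactly one label below `ci.commoninternet.net`.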

## Class A2: internal infra secrets (the loop GENERATES + manages; never a blocker)

All sops-encrypted in `secrets/secrets.yaml`, decrypted to `/run/secrets/<name>`:

| Secret | Used by | Generate |
|---|---|---|
| `drone_rpc_secret` | Drone server ↔ exec runner RPC | `openssl rand -hex 32` |
| `drone_gitea_client_secret` | Drone↔Gitea OAuth app | from the Gitea OAuth app creation |
| `bridge_webhook_hmac` | comment-bridge webhook HMAC | `openssl rand -hex 32` |
| `bridge_drone_token` | bridge + dashboard → Drone API | minted Drone user token |
| `bridge_gitea_token` | bridge → Gitea API (poll/comment) | minted Gitea token (bot) |
| `restic_password` | backup-bot-two restic repo | **abra-generated** (`abra app secret generate`, kept stable across reconciles) |
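
For the two values generated with `openssl rand -hex 32`, an equivalent can be sketched with Python's standard `secrets` module: 32 random bytes rendered as 64 hex characters. `new_hex_secret` is a hypothetical helper name; the value would still have to be placed in `secrets/secrets.yaml` through sops.

```python
import secrets


def new_hex_secret(nbytes: int = 32) -> str:
    """Return `nbytes` of CSPRNG output as lowercase hex, like `openssl rand -hex <nbytes>`."""
    return secrets.token_hex(nbytes)
```

In keeping with the no-plaintext rule, the result should go straight into the sops editor or a pipe and never be printed to a logged terminal.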

**Rotate an A2 secret** (e.g. `bridge_webhook_hmac`):
1. `set -a; . /srv/cc-ci/.testenv; set +a` (for the editor key, not echoed).
2. In the repo: `sops secrets/secrets.yaml` → replace the value (or `openssl rand -hex 32 | …`),
   save. (Re-encrypts to both recipients automatically per `.sops.yaml`.)
3. For swarm-secret-backed values, **bump the consuming app's secret version** so the reconcile
   re-creates the swarm secret (docker swarm secrets are immutable): e.g. drone `RPC_SECRET_VERSION`
   v1→v2 (modules/drone.nix), bridge `cc_ci_bridge_*_v<n>` (modules/bridge.nix). Update both ends
   (server + runner share `drone_rpc_secret`).
4. `git commit` + push, sync to host, `nixos-rebuild switch` → reconcile re-inserts + redeploys.
5. Verify: the consuming service is healthy and re-auth works (e.g. a fresh build triggers).
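
Step 3 relies on a version suffix because an existing swarm secret cannot be updated in place. A minimal sketch of the bump, assuming names end in `_v<n>` as in the bridge pattern above. `bump_secret_version` and the example name `cc_ci_bridge_example_v1` are hypothetical; the real version values live in the nix modules.

```python
import re

# A stem ending in "_v" followed by the numeric version at the very end of the name.
_VERSION_SUFFIX = re.compile(r"^(?P<stem>.+_v)(?P<n>\d+)$")


def bump_secret_version(name: str) -> str:
    """Return `name` with its trailing `_v<n>` incremented by one."""
    m = _VERSION_SUFFIX.match(name)
    if m is None:
        raise ValueError(f"no _v<n> suffix in {name!r}")
    return f"{m.group('stem')}{int(m.group('n')) + 1}"
```

For example, `bump_secret_version("cc_ci_bridge_example_v1")` yields `cc_ci_bridge_example_v2`. The old version can be removed once nothing references it.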

**Re-key sops recipients** (e.g. cc-ci host re-provisioned → new host age key): add the new
`age1…` to `/.sops.yaml`, `sops updatekeys secrets/secrets.yaml` (run from the build host, which
holds the master identity), commit. The master recovery key lets you re-encrypt even if the host key
is lost.

## Class B: recipe app secrets (the harness generates per run; NEVER a blocker)

- **Generated at install:** `abra app secret generate <app> --all` (+ any deterministic test fixtures
  the harness chooses) when the recipe deploys.
- **Persisted for the run:** the same generated values survive install → upgrade → backup/restore
  because abra/swarm holds them keyed by the per-run app name (`<recipe[:4]>-<6hex>`); the harness
  re-reads them between stages. Concurrent runs are isolated by the unique per-run app name (and
  MAX_TESTS=1 means no concurrency anyway).
- **Destroyed at teardown:** the same teardown that removes the app/volumes runs
  `abra app secret remove <app> --all` (+ docker-secret cleanup by stack name as a fallback). Nothing
  generated for a run outlives it.
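
The per-run naming scheme `<recipe[:4]>-<6hex>` can be sketched as follows. This assumes the six hex characters are random; how the harness actually derives them is not stated here, and `per_run_app_name` is a hypothetical name.

```python
import secrets


def per_run_app_name(recipe: str) -> str:
    """First four characters of the recipe name, a dash, then six random hex characters."""
    return f"{recipe[:4]}-{secrets.token_hex(3)}"
```

Because the suffix differs on every call, two runs of the same recipe get different app names, so their secrets stay separate even if runs ever overlap.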

## No-plaintext guarantees

- Secrets are referenced by `/run/secrets/<name>` path or read inline (e.g.
  `PGPASSWORD=$(cat /run/secrets/…)` *inside* the app container), never printed by the harness.
- abra does not echo generated secret values; reconciles redirect secret-generate stdout to
  `/dev/null`.
- The results dashboard renders run status only (no log bodies); per-run logs live in Drone's UI.
- Adversary leak test: greps published Drone logs + the dashboard for the known infra-secret values
  and any generated app password → must be zero. (Baseline + recipe-CI log scans: clean.)
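
The leak test in the last bullet amounts to a substring search over published text. A minimal sketch, assuming the log and dashboard bodies have already been fetched as strings. `find_leaks` and the label-to-value mapping are hypothetical; the result reports labels only, so the scan itself never prints a secret.

```python
def find_leaks(published_text: str, known_secrets: dict[str, str]) -> list[str]:
    """Return the sorted labels of secrets whose value appears verbatim in the text."""
    return sorted(
        label
        for label, value in known_secrets.items()
        if value and value in published_text
    )
```

An empty list corresponds to the "must be zero" condition. A verbatim search would not catch encoded or truncated copies of a value, so this is a lower bound on what a real scan should check.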
@@ -33,6 +33,40 @@ STAGE_FILES = {
 }


+def _redact_values() -> list[str]:
+    """Values to scrub from published logs (D6 redaction filter, plan §4.4). The infra secrets
+    materialised at /run/secrets/*: if any subprocess ever echoes one, mask it. Only >=8-char
+    values, so it never false-positives on short strings / SHAs."""
+    vals = set()
+    for p in glob.glob("/run/secrets/*"):
+        try:
+            v = open(p).read().strip()
+        except OSError:
+            continue
+        if len(v) >= 8:
+            vals.add(v)
+    return sorted(vals, key=len, reverse=True)
+
+
+_REDACT = _redact_values()
+
+
+def run_stage_redacted(cmd: list[str], env: dict | None = None) -> int:
+    """Run a stage subprocess, streaming its output live (so Drone logs stay tail-able) but masking
+    any known infra-secret value first. Belt-and-suspenders: the harness already never prints
+    secrets and abra doesn't echo generated ones."""
+    proc = subprocess.Popen(cmd, cwd=ROOT, env=env, stdout=subprocess.PIPE,
+                            stderr=subprocess.STDOUT, text=True, bufsize=1)
+    assert proc.stdout is not None
+    for line in proc.stdout:
+        for v in _REDACT:
+            if v in line:
+                line = line.replace(v, "***REDACTED***")
+        sys.stdout.write(line)
+        sys.stdout.flush()
+    return proc.wait()
+
+
 def _gitea_token() -> str | None:
     tok = os.environ.get("GITEA_TOKEN")
     if not tok and os.path.exists("/run/secrets/bridge_gitea_token"):
@@ -94,7 +128,7 @@ def main() -> int:
             continue
         print(f"\n===== STAGE: {stage} =====", flush=True)
         # each stage is its own pytest invocation => its own reported result (D2 separate stages)
-        rc = subprocess.call([sys.executable, "-m", "pytest", "-v", "-rA", path], cwd=ROOT)
+        rc = run_stage_redacted([sys.executable, "-m", "pytest", "-v", "-rA", path])
         ran += 1
         if rc != 0:
             overall = rc
@@ -135,8 +169,7 @@ def run_recipe_local(recipe: str, local_tests: str | None) -> int | None:
         lifecycle.deploy_app(recipe, domain, version=os.environ.get("VERSION") or None)
         lifecycle.wait_healthy(domain)
         env = dict(os.environ, CCCI_APP_DOMAIN=domain, CCCI_BASE_URL=f"https://{domain}")
-        return subprocess.call([sys.executable, "-m", "pytest", "-v", "-rA", local_tests],
-                               cwd=ROOT, env=env)
+        return run_stage_redacted([sys.executable, "-m", "pytest", "-v", "-rA", local_tests], env=env)
     finally:
         lifecycle.teardown_app(domain, verify=False)