Phase-1c: split only secrets into a separate cc-ci-secrets repo; base stays parameterized
Per operator: the split boundary is secrecy, not modularity. Only the sops-encrypted secrets (incl. the wildcard cert) move to a separate private repo `cc-ci-secrets` (extra access-control layer), consumed by the base via a flake input. Instance non-secret vars (domain, gateway, recipients) stay in the well-parameterized base cc-ci repo — another admin repoints by editing params, no second config repo. Guardrail reworded: instance vars in base are fine; only plaintext SECRETS must never leak into base/store.

Updated model/C1/C2/W2/§6/§7 + README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -16,7 +16,7 @@ autonomous Claude loops (a Builder and an adversarial Reviewer) running over day
 |---|---|
 | `plan.md` | The Phase-1 plan (build the CI server). Agents treat it as their single source of truth. |
 | `plan-phase1b-review-lint.md` | **Phase 1b** (bounded pass at the end of Phase 1): deterministic linting/formatting in CI + a white-box review checklist (real tests, DRY harness, idempotent Nix, no footguns/secrets). |
-| `plan-phase1c-full-reproducibility.md` | **Phase 1c**: make the VM fully reproducible from git (all secrets incl. the wildcard cert in sops; generic base + private instance flake input) and do the **genuine throwaway-VM live rebuild** to close D8 honestly (the "infeasible by design" was overstated). |
+| `plan-phase1c-full-reproducibility.md` | **Phase 1c**: make the VM fully reproducible from git (all secrets incl. the wildcard cert in sops, in a separate private `cc-ci-secrets` repo as a flake input; base stays well-parameterized) and do the **genuine throwaway-VM live rebuild** to close D8 honestly (the "infeasible by design" was overstated). |
 | `plan-phase2-recipe-tests.md` | **Phase 2** (after Phase 1b): author comprehensive per-recipe tests — port every recipe-maintainer test + ≥2 recipe-specific tests per app. |
 | `plan-phase2b-test-performance.md` | **Phase 2b** (after Phase 2, before Phase 3): empirically measure where test time goes and reduce it (image cache, readiness tuning, dedup deploys, warm infra, concurrency) — no weakened tests. |
 | `plan-phase3-results-ux.md` | **Phase 3** (after Phase 2b): beautiful YunoHost-style results — per-run **level**, image-forward PR comment (badge + summary card + app screenshot), polished dashboard. |
@@ -21,8 +21,8 @@ Phase-1 D8 was marked PASS with the throwaway-VM **live rebuild "documented infe
 refused to self-certify; the bar then slipped to "infeasible."

 This phase does two connected things: **(A)** make the VM **fully reproducible from git, including
-all secrets** (move the wildcard cert and everything else into sops-in-git; split generic base from
-private instance), and **(B)** actually perform and verify the **throwaway-VM live rebuild**, closing
+all secrets** (move the wildcard cert and everything else into sops-in-git, in a separate private
+`cc-ci-secrets` repo), and **(B)** actually perform and verify the **throwaway-VM live rebuild**, closing
 D8 honestly. The byte-identical-closure evidence from Phase 1 stays valid as the *static* half of D8;
 this adds the *live* half it was missing.

@@ -30,7 +30,7 @@ this adds the *live* half it was missing.

 ## 1. Mission

-A blank NixOS VM, given only **(1)** the two git repos (generic base + private instance), **(2)** the
+A blank NixOS VM, given only **(1)** the two git repos (base `cc-ci` + private `cc-ci-secrets`), **(2)** the
 single bootstrap age key, and **(3)** the external DNS/gateway already pointing at it, becomes a
 working cc-ci via **`nixos-rebuild switch`** with **no undocumented manual steps** — secrets and the
 wildcard cert included, decrypted from git. Proven on a real throwaway VM by the loops.
@@ -39,32 +39,36 @@ wildcard cert included, decrypted from git. Proven on a real throwaway VM by the

 ## 2. The reproducibility model (target architecture)

-**Two repos, composed via a flake input (default) — generic base + private instance.**
+**Split only the *secrets* into their own repo. The base stays one well-parameterized repo.** The
+boundary is *secrecy*, not modularity: secrets get a separate private repo (an extra access-control
+layer); instance-specific **non-secret** vars (domain, gateway/DNS facts) stay in the base as plain,
+changeable parameters — another admin can repoint cc-ci by editing them, no second config repo needed.

-- **`cc-ci` (base, instance-agnostic):** `flake.nix` exposing a parameterized `nixosModules.cc-ci`,
-plus `runner/`, `tests/`, `docs/`. **No hardcoded domain, no instance secrets.**
-- **`cc-ci-instance` (private, e.g. `recipe-maintainers/cc-ci-instance`):** `instance.nix`
-(DOMAIN=`ci.commoninternet.net`, gateway/DNS facts, sops recipients), `secrets/secrets.yaml`
-(sops-encrypted: **wildcard cert + key**, Drone OAuth client_id/secret + RPC secret, webhook HMAC,
-registry creds if any, app/infra secrets), and `.sops.yaml`.
+- **`cc-ci` (base — one repo, well-parameterized):** `flake.nix`, modules, `runner/`, `tests/`,
+`docs/`, **and** the instance config as parameters/defaults (e.g. `DOMAIN = "ci.commoninternet.net"`,
+gateway/DNS facts, sops recipients). Instance *config* lives here; only *secret material* is external.
+Keep it well-parameterized so changing the domain/recipients is a one-line edit, not a fork.
+- **`cc-ci-secrets` (private, `recipe-maintainers/cc-ci-secrets`):** holds **only** the sops-encrypted
+secret material — `secrets/secrets.yaml` (**wildcard cert + key**, Drone OAuth client_id/secret +
+RPC secret, webhook HMAC, registry creds if any, app/infra secrets) + `.sops.yaml`. **No code, no
+config logic** — just encrypted secrets, as a separate security layer with its own access control.
 - **Linkage (default = flake input):** base `flake.nix` has
-`inputs.instance.url = "git+https://git.autonomic.zone/recipe-maintainers/cc-ci-instance"`
-(private; fetched via the bot token / a read deploy key), and
-`nixosConfigurations.cc-ci = nixpkgs.lib.nixosSystem { modules = [ self.nixosModules.cc-ci instance.nixosModules.instance ]; }`.
-*Alternative:* a git **submodule** at `cc-ci/instance/` (simpler single checkout; submodule
-footguns). Record the choice in `DECISIONS.md`; flake input is the recommended default.
-- **sops-nix wiring:** `sops.defaultSopsFile = instance secrets.yaml`; `sops.age.sshKeyPaths` = host
-key + the recovery recipient. The **wildcard cert/key are sops secrets** decrypted at activation to
-`/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` and fed into the Traefik recipe's
-`ssl_cert`/`ssl_key` swarm secrets — **no out-of-band cert file.**
-- **The one irreducible out-of-band secret:** the **age private key** that unlocks the repo's sops
-secrets (the host key, or the provisioned recovery key) — it cannot live in the repo it decrypts.
-This is the *only* permitted "not in git" secret, and it's provisioned to the host at creation.
+`inputs.secrets.url = "git+https://git.autonomic.zone/recipe-maintainers/cc-ci-secrets"`
+(private; fetched via the bot token / a read deploy key); sops-nix reads `secrets/secrets.yaml` from
+it. *Alternative:* a git **submodule** at `cc-ci/secrets/`. Record the choice in `DECISIONS.md`;
+flake input is the recommended default.
+- **sops-nix wiring:** `sops.defaultSopsFile` → the `cc-ci-secrets` `secrets.yaml`;
+`sops.age.sshKeyPaths` = host key + the recovery recipient. The **wildcard cert/key are sops
+secrets** decrypted at activation to `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` and fed
+into the Traefik recipe's `ssl_cert`/`ssl_key` swarm secrets — **no out-of-band cert file.**
+- **The one irreducible out-of-band secret:** the **age private key** that unlocks the secrets (the
+host key, or the provisioned recovery key) — it cannot live in the repo it decrypts. This is the
+*only* permitted "not in git" secret, provisioned to the host at creation.
 - **Still external (not the VM's git, by nature):** the DNS records + the TLS-passthrough gateway
 (network infra) — documented as preconditions. (IaC for those is out of scope — see §7.)
-- **Token discipline preserved:** only the cert *artifact* enters git (encrypted); the **Gandi DNS
-token never enters the repo or the agent**. Renewal = operator re-issues the cert out-of-band, then
-commits the new sops-encrypted cert to the instance repo (a versioned, reproducible renewal).
+- **Token discipline preserved:** only the cert *artifact* enters git (encrypted, in `cc-ci-secrets`);
+the **Gandi DNS token never enters any repo or the agent**. Renewal = operator re-issues the cert
+out-of-band, then commits the new sops-encrypted cert to `cc-ci-secrets` (versioned, reproducible).

 ---

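The linkage and sops-nix wiring described in the new §2 text could look roughly like the following `flake.nix`. This is a minimal sketch under stated assumptions, not the repo's actual flake: the nixpkgs branch, the `system` value, the `flake = false` setting, the host key path, and the secret key names `wildcard/fullchain` and `wildcard/privkey` are all hypothetical. Only the `cc-ci-secrets` URL, `secrets/secrets.yaml`, and the `/var/lib/ci-certs/live/` target paths come from the plan.

```nix
{
  description = "cc-ci base flake (illustrative sketch only)";

  inputs = {
    # Assumption: the plan does not say which nixpkgs branch is pinned.
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

    sops-nix.url = "github:Mic92/sops-nix";
    sops-nix.inputs.nixpkgs.follows = "nixpkgs";

    # URL as given in the plan. `flake = false` is an assumption: the plan says
    # the repo holds no code, only encrypted files, so it is fetched as a plain
    # source tree rather than evaluated as a flake.
    secrets = {
      url = "git+https://git.autonomic.zone/recipe-maintainers/cc-ci-secrets";
      flake = false;
    };
  };

  outputs = { self, nixpkgs, sops-nix, secrets, ... }: {
    nixosConfigurations.cc-ci = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux"; # assumption
      modules = [
        sops-nix.nixosModules.sops

        # Inline module for brevity. A real configuration would also pull in
        # the base's own modules (hardware, boot, runner, and so on).
        ({ ... }: {
          # Only the encrypted file is copied into the Nix store.
          sops.defaultSopsFile = "${secrets}/secrets/secrets.yaml";

          # Host SSH key used as the age identity (path is an assumption).
          sops.age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ];

          # Hypothetical key names inside secrets.yaml; decrypted at activation
          # and exposed at the paths the plan names.
          sops.secrets."wildcard/fullchain".path =
            "/var/lib/ci-certs/live/fullchain.pem";
          sops.secrets."wildcard/privkey".path =
            "/var/lib/ci-certs/live/privkey.pem";
        })
      ];
    };
  };
}
```

With this shape, the private input is the only place secret material enters the build, and it enters encrypted. Plaintext exists only after activation-time decryption on the host.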
@@ -73,10 +77,12 @@ wildcard cert included, decrypted from git. Proven on a real throwaway VM by the
 Terminates only when every item holds **and the Adversary has independently re-verified each within
 24h, from a cold start** (logged in `REVIEW.md`):

-- [ ] **C1 — Repo split.** Generic base + private instance repo, composed (flake input by default).
-The base builds with no instance secrets/domain baked in; the instance carries all instance
-specifics. `nixosConfigurations.cc-ci` still builds byte-identically to the running system.
-- [ ] **C2 — Cert in git.** The wildcard cert+key are sops secrets in the instance repo, decrypted
+- [ ] **C1 — Secrets-repo split.** A separate private `cc-ci-secrets` repo holds **only** the
+sops-encrypted secrets (+ `.sops.yaml`), consumed by the base via a flake input. The base
+`cc-ci` stays one well-parameterized repo — instance vars (domain, gateway, recipients) remain
+changeable parameters in the base, **not** moved out (only secrets are external).
+`nixosConfigurations.cc-ci` still builds byte-identically to the running system.
+- [ ] **C2 — Cert in git.** The wildcard cert+key are sops secrets in `cc-ci-secrets`, decrypted
 at activation to the cert path + Traefik secret; the prior "operator drops a cert file" step is
 gone. Verified: a rebuild serves valid TLS from the git-sourced cert.
 - [ ] **C3 — All secrets in git (one exception).** Every infra/app secret (cert, Drone OAuth/RPC,
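One way to keep instance vars as "changeable parameters in the base", as C1 requires, is a small NixOS options module. The sketch below is an assumption about shape only: the option names `cc-ci.domain` and `cc-ci.sopsRecipients` are hypothetical, and only the default domain value is taken from the plan.

```nix
# Hypothetical module in the base repo: non-secret instance parameters.
{ lib, config, ... }:
let
  cfg = config.cc-ci;
in
{
  options.cc-ci = {
    domain = lib.mkOption {
      type = lib.types.str;
      default = "ci.commoninternet.net"; # value from the plan
      description = "Public domain this cc-ci instance serves.";
    };

    sopsRecipients = lib.mkOption {
      type = lib.types.listOf lib.types.str;
      default = [ ]; # public age recipients only, never private keys
      description = "Public age recipients allowed to decrypt the secrets.";
    };
  };

  config = {
    # Example consumer: other modules read cfg.domain instead of hardcoding it.
    networking.domain = cfg.domain;
  };
}
```

Another admin could then repoint the instance with a one-line override such as `cc-ci.domain = "ci.example.org";`, without forking the base or touching `cc-ci-secrets`.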
@@ -126,9 +132,10 @@ out — see memory).
 1. **W1 — Headroom.** Resize `cc-nix-test` 6 GB→**4 GB** (stop→set→start) so a **4 GB** throwaway VM
 fits within the ~12 GB running guideline (4 + lichen 4 + throwaway 4). *Accept:* b1 has room;
 cc-nix-test healthy at 4 GB (avoid heavy recipe CI during 1c). *(Final sizing decided in W6.)*
-2. **W2 — Repo split + secrets into git.** Create the private `cc-ci-instance` repo; move instance
-specifics + all secrets (incl. the **wildcard cert+key**, read from `/var/lib/ci-certs/live`) into
-sops there; wire the base flake to consume it (flake input). *Accept:* `nixos-rebuild build` of the
+2. **W2 — Secrets repo + cert into git.** Create the private `cc-ci-secrets` repo; move **all secrets**
+into sops there — including the **wildcard cert+key** (read from the current `/var/lib/ci-certs/live`)
+and the existing `secrets/secrets.yaml` contents; keep instance vars parameterized in the base;
+wire the base flake to consume `cc-ci-secrets` (flake input). *Accept:* `nixos-rebuild build` of the
 restructured config is **byte-identical** to the running system (zero drift), and `cc-nix-test`
 `nixos-rebuild switch`es cleanly onto the new structure with TLS still served from the git cert.
 3. **W3 — Throwaway VM.** Create a blank NixOS VM in `terraform-ci` (the incus-base image), sized
@@ -155,8 +162,10 @@ out — see memory).
 accept.
 - **Gandi token stays out of repo/agent** — only the cert artifact is committed (encrypted). Renewal
 is operator-issues-then-commits.
-- **Base repo stays generic** — no instance domain/secret leakage into the base; the Adversary checks
-the base builds/clones clean of instance specifics.
+- **No plaintext secret leaks into the base (or the store).** Instance *vars* (domain, gateway) may
+live in the base as parameters — that's fine; what must NOT leak is any *secret* (cert/keys/tokens):
+those stay encrypted in `cc-ci-secrets`. The Adversary greps the base + the Nix store for plaintext
+secret material.
 - **Incus guardrails** (§4): terraform-ci only, respect the RAM cap, destroy the throwaway VM, don't
 touch other instances.
 - **No weakened tests / no drift** — the restructured config must remain byte-identical to running
@@ -165,7 +174,7 @@ out — see memory).
 ---

 ## 7. Open decisions (log in DECISIONS.md)
-- **Flake input vs git submodule** for the instance repo (default: flake input).
+- **Flake input vs git submodule** for the `cc-ci-secrets` repo (default: flake input).
 - **Bootstrap-key provisioning** for a new VM: provision the off-box recovery age key to the host
 (decrypt as-is) vs generate the new host's key + re-encrypt secrets to it. (Recovery key is
 simpler for a clone; per-host re-encrypt is cleaner long-term.)
@@ -173,5 +182,5 @@ out — see memory).
 reproducible VM** to be the canonical cc-ci and retire the old one.
 - **DNS/gateway as IaC** (terraform for the Gandi records + the gateway) — likely a separate future
 item ([[IDEAS]]), out of 1c scope; 1c keeps them as documented external preconditions.
-- Whether the instance repo is private under `recipe-maintainers` (bot is admin) and how the loops
-fetch it during rebuild (token-in-URL vs read deploy key).
+- How the loops fetch the private `cc-ci-secrets` repo during rebuild (bot token-in-URL vs a read
+deploy key) — it's private under `recipe-maintainers` (bot is admin).