diff --git a/BACKLOG-1c.md b/BACKLOG-1c.md
new file mode 100644
index 0000000..e6a24e2
--- /dev/null
+++ b/BACKLOG-1c.md
@@ -0,0 +1,34 @@
+# BACKLOG — Phase 1c
+
+Single-writer rule (§6.1): Builder edits `## Build backlog`; Adversary edits `## Adversary findings`.
+
+## Build backlog
+
+Method W1–W6 from the phase plan §5. Each milestone ends with an Adversary gate.
+
+- [ ] **W2 — Secrets repo + cert into git.**
+  - [ ] Create private repo `recipe-maintainers/cc-ci-secrets` (bot is admin).
+  - [ ] Move `secrets/secrets.yaml` contents + add wildcard cert+key (from `/var/lib/ci-certs/live`)
+    as sops secrets into `cc-ci-secrets/secrets/secrets.yaml`; copy `.sops.yaml`.
+  - [ ] Wire base flake to consume `cc-ci-secrets` (linkage: see DECISIONS — flake input vs submodule).
+  - [ ] secrets.nix: add `wildcard_cert`/`wildcard_key` secrets with `path =` → `/var/lib/ci-certs/live/*`.
+  - [ ] proxy.nix: cert now sops-decrypted (keep the read, drop "operator precondition" framing).
+  - [ ] Verify: `nixos-rebuild build --flake .#cc-ci` byte-identical to `/run/current-system`.
+  - [ ] Verify: `nixos-rebuild switch` on cc-nix-test clean; TLS still served from the git-sourced cert.
+  - [ ] **Gate W2 CLAIMED** → Adversary verifies byte-identical + TLS-from-git-cert.
+- [ ] **W1 — Headroom (just before W3).** Resize `cc-nix-test` 6 GB→4 GB (stop→set→start). Accept:
+  b1 has room; cc-nix-test healthy at 4 GB.
+- [ ] **W3 — Throwaway VM.** Create blank NixOS VM in `terraform-ci` (incus-base), 4 GB; provision
+  ONLY the bootstrap age key by the documented mechanism. Accept: VM reachable.
+- [ ] **W4 — Reproducible live rebuild.** On throwaway VM: clone base+secrets, `nixos-rebuild switch`,
+  watch oneshots converge, secrets+cert decrypt. Accept: fully up, no step outside docs/install.md;
+  capture evidence. **Gate W4 CLAIMED.**
+- [ ] **W5 — Adversary cold proof + honest D8.** Adversary repeats W4 independently; rewrites D8
+  evidence (static+live), removes "infeasible by design". Accept: Adversary D8 live-rebuild PASS
+  (or narrow signed-off limitation per C5).
+- [ ] **W6 — Cleanup + docs + final sizing.** Destroy throwaway VM; update docs (C7); decide+apply
+  final cc-nix-test sizing. Accept: no leftover; docs match; flip STATUS-1c → `## DONE`.
+
+## Adversary findings
+
+(none yet — Adversary owns this section)
diff --git a/DECISIONS.md b/DECISIONS.md
index 195757c..c2901d3 100644
--- a/DECISIONS.md
+++ b/DECISIONS.md
@@ -186,3 +186,28 @@ Architecture decisions and dead-ends. One line of rationale each. (§0, §8)
 ## Dead-ends
 
 - (none yet)
+
+## Phase 1c (full reproducibility + genuine D8 live rebuild) — 2026-05-27
+
+- **Secrets linkage = git SUBMODULE (deviates from plan §7 flake-input default).** `cc-ci-secrets`
+  is mounted as a submodule at `cc-ci/secrets/` rather than a flake `inputs.secrets`. Rationale: a
+  private flake input must be re-fetched at **every nix eval**, requiring the bot token persistently
+  in nix config/netrc on cc-ci AND the throwaway VM (a token in the store/config = a 2nd out-of-band
+  secret, which 1c forbids). A submodule makes `secrets/secrets.yaml` a plain path in the working
+  tree → `defaultSopsFile = ../secrets/secrets.yaml` is unchanged (minimal diff, trivially
+  byte-identical), and the only credential use is the one `git clone --recursive` at provisioning
+  ("the two repos are *given*", Mission §1). Build invocation becomes
+  `nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'` so the submodule tree is
+  included. (Revisit if `?submodules=1` proves unreliable on cc-ci's nix version.)
+- **Bootstrap key for the throwaway VM = the existing RECOVERY (master) age key, via
+  `sops.age.keyFile`.** The recovery key (`age1cmk26…`, private at `/srv/cc-ci/.sops/master-age.txt`)
+  is already a sops recipient, so a fresh host with a *different* ssh host key still decrypts every
+  secret with no re-keying — this is exactly the §0 argument that defeats "host-key binding".
+  Provisioned to the VM at a fixed path (the ONE out-of-band secret). cc-ci itself keeps decrypting
+  via its host key (`age.sshKeyPaths`); secrets.nix will offer both identity sources. (Per-host
+  re-encrypt is cleaner for a *permanent* new instance — documented as the alternative, not used for
+  the throwaway test.)
+- **Cert into git:** wildcard cert+key become sops secrets in `cc-ci-secrets`, decrypted at
+  activation back to `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` via
+  `sops.secrets..path`; proxy.nix keeps reading that path (now sops-sourced, not operator-drop).
+- **cc-nix-test final sizing:** TBD in W6 (keep 4 GB / restore 6 GB / promote rebuilt VM).
diff --git a/JOURNAL-1c.md b/JOURNAL-1c.md
new file mode 100644
index 0000000..11993c1
--- /dev/null
+++ b/JOURNAL-1c.md
@@ -0,0 +1,38 @@
+# JOURNAL — Phase 1c (Builder)
+
+Append-only. Each entry: what I did + verifying command/output + next.
+
+## 2026-05-27 — Phase 1c kickoff / orientation
+
+**Context:** Phase 1 closed DONE (1c10fa5; all D1–D10 Adversary-PASS, no VETO; Adversary loop
+terminated for Phase 1). Starting Phase 1c per `plan-phase1c-full-reproducibility.md`.
+
+**Did:**
+- Read the phase plan in full + plan.md §6.1/§7/§9. Verified access & surveyed current state.
+- `ssh cc-ci 'hostname && nixos-version && systemctl is-system-running'`
+  → `nixos` / `24.11.20250630.50ab793 (Vicuna)` / `running`. Healthy.
+- Surveyed secrets layout:
+  - `secrets/secrets.yaml` holds 6 infra secrets: test_secret, drone_rpc_secret,
+    drone_gitea_client_secret, bridge_drone_token, bridge_gitea_token, bridge_webhook_hmac.
+  - `.sops.yaml`: recipients = `&host` (ssh-to-age of cc-ci ed25519 host key) + `&master`
+    (off-box recovery, private at `/srv/cc-ci/.sops/master-age.txt`, sandbox-only).
+  - `modules/secrets.nix`: `defaultSopsFile = ../secrets/secrets.yaml`,
+    `age.sshKeyPaths = ["/etc/ssh/ssh_host_ed25519_key"]`.
+  - Wildcard cert OUT-OF-BAND at `cc-ci:/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}`
+    (fullchain 2909 B, privkey 227 B → privkey looks like an EC key). `modules/proxy.nix` reconcile
+    reads it from there and inserts as traefik `ssl_cert`/`ssl_key` swarm secrets; treats a missing
+    cert as FATAL (operator precondition).
+  - `flake.nix`: nixpkgs pinned `50ab793`, sops-nix pinned; single local `nixosConfigurations.cc-ci`.
+- Tooling note: sandbox host has NO sops/nix/age/ssh-to-age (`which` → only git). sops/age work
+  must run on cc-ci (has nix + host age key) or via a sops binary fetched there with the master key.
+- Bootstrapped Phase-1c state: STATUS-1c.md, BACKLOG-1c.md, JOURNAL-1c.md (this file). REVIEW-1c.md
+  left for the Adversary (its file per §6.1). Appended Phase-1c decisions to DECISIONS.md.
+
+**Decisions recorded (DECISIONS.md):** secrets linkage = **git submodule** (deviates from the
+flake-input default — rationale: no private-repo fetch credential needed at nix-eval on every
+rebuild, keeps `defaultSopsFile` a local path = minimal change + trivially byte-identical);
+bootstrap key for throwaway = **recovery age key via `sops.age.keyFile`**.
+
+**Next (W2):** create private `recipe-maintainers/cc-ci-secrets`; move secrets + wildcard cert into
+sops there as a submodule of the base; wire secrets.nix (cert→`/var/lib/ci-certs/live` via `path=`);
+prove byte-identical build + clean switch with TLS from the git cert. Then claim Gate W2.
diff --git a/STATUS-1c.md b/STATUS-1c.md
new file mode 100644
index 0000000..8336a36
--- /dev/null
+++ b/STATUS-1c.md
@@ -0,0 +1,41 @@
+# STATUS — Phase 1c (full git reproducibility + genuine D8 live rebuild)
+
+**Phase plan (SSOT):** `/srv/cc-ci/cc-ci-plan/plan-phase1c-full-reproducibility.md`
+**Loop state for THIS phase:** STATUS-1c / BACKLOG-1c / REVIEW-1c / JOURNAL-1c (DECISIONS.md shared).
+The repo's STATUS.md / BACKLOG.md / REVIEW.md are Phase-1 HISTORY — not this phase's state.
+
+## Phase
+**1c kickoff** — Phase 1 is DONE & Adversary-signed-off (1c10fa5; all D1–D10 PASS, no VETO).
+Now: make the VM fully reproducible from git (secrets+cert in a private `cc-ci-secrets` repo) and
+perform a genuine throwaway-VM live rebuild to close D8 honestly.
+
+## In flight
+- **W2 (next):** create private `cc-ci-secrets` repo; move all secrets + the wildcard cert into sops
+  there; wire the base flake to consume it. (W1 resize deferred until just before W3 — its only
+  purpose is RAM headroom for the throwaway VM, and it briefly stops the live server.)
+
+## Definition of Done (C1–C7 — see phase plan §3)
+- [ ] C1 — Secrets-repo split (private `cc-ci-secrets`, base stays one parameterized repo, byte-identical build)
+- [ ] C2 — Cert in git (wildcard cert+key as sops secrets, decrypted at activation; no operator cert-drop step)
+- [ ] C3 — All secrets in git, one exception = bootstrap age key (documented)
+- [ ] C4 — Genuine throwaway-VM live rebuild (Incus terraform-ci, only age key provisioned)
+- [ ] C5 — Honest D8 (static byte-identical + live rebuild; "infeasible by design" removed)
+- [ ] C6 — Resource fit + cleanup (cc-nix-test 6→4 GB, throwaway 4 GB, destroyed after; final sizing decided)
+- [ ] C7 — Docs (install.md/secrets.md/architecture.md + main plan refs updated to new model)
+
+## Gate
+None claimed yet. (Milestone gates W2/W4/W5 will be CLAIMED here per §6.1.)
+
+## Blocked
+(none)
+
+## Notes
+- Current secret layout: `secrets/secrets.yaml` (6 infra secrets), recipients = host age key
+  (ssh-to-age of cc-ci's ed25519 host key) + off-box master recovery key
+  (`/srv/cc-ci/.sops/master-age.txt`, sandbox-only). `.sops.yaml` at repo root.
+- Wildcard cert currently out-of-band at `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}`
+  (operator-provided, LE, next renewal ~2026-08-24); proxy.nix reads it from there. 1c moves it
+  into sops-in-git, decrypted back to that path at activation.
+- Sandbox host has NO sops/nix/age — sops ops run on cc-ci (has nix + host age key) or via the master
+  key with a sops binary fetched on cc-ci.
+- cc-nix-test == the live cc-ci server (100.90.116.4); resizing it (W1) briefly stops it.
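
The W2/C1 check ("`nixos-rebuild build --flake .#cc-ci` byte-identical to `/run/current-system`") reduces to comparing resolved store paths: `nixos-rebuild build` leaves a `result` symlink into `/nix/store`, and `/run/current-system` is a symlink to the running system's store path. If both resolve to the same path, the built system is the running one. A minimal Python sketch, assuming it runs on cc-ci in the directory where `nixos-rebuild build` left its `result` link; the helper names `resolve_system` and `is_byte_identical` are hypothetical, not part of the repo:

```python
import os


def resolve_system(link: str) -> str:
    """Follow a NixOS system link (e.g. ./result or /run/current-system)
    to the store path it ultimately points at."""
    # os.path.exists follows symlinks, so this also rejects dangling links.
    if not os.path.exists(link):
        raise FileNotFoundError(f"{link} is missing or a dangling symlink")
    return os.path.realpath(link)


def is_byte_identical(build_link: str = "result",
                      current_link: str = "/run/current-system") -> bool:
    """True when the freshly built system and the running system resolve to
    the same store path, i.e. the build reproduced the running system."""
    return resolve_system(build_link) == resolve_system(current_link)
```

For example, after `nixos-rebuild build --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'`, a `True` from `is_byte_identical()` is the evidence W2 asks for. Comparing store paths rather than hashing file trees is sufficient here: equal paths are the same directory in the store, so their contents cannot differ.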