1c: bootstrap Phase 1c loop state (STATUS/BACKLOG/JOURNAL-1c) + decisions (submodule linkage, recovery-key bootstrap)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 16:06:00 +01:00
parent be37eccd31
commit 8e2357e5bf
4 changed files with 138 additions and 0 deletions

BACKLOG-1c.md

@@ -0,0 +1,34 @@
# BACKLOG — Phase 1c
Single-writer rule (§6.1): Builder edits `## Build backlog`; Adversary edits `## Adversary findings`.
## Build backlog
Method W1–W6 from the phase plan §5. Each milestone ends with an Adversary gate.
- [ ] **W2 — Secrets repo + cert into git.**
  - [ ] Create private repo `recipe-maintainers/cc-ci-secrets` (bot is admin).
  - [ ] Move `secrets/secrets.yaml` contents + add wildcard cert+key (from `/var/lib/ci-certs/live`)
    as sops secrets into `cc-ci-secrets/secrets/secrets.yaml`; copy `.sops.yaml`.
  - [ ] Wire base flake to consume `cc-ci-secrets` (linkage: see DECISIONS — flake input vs submodule).
  - [ ] secrets.nix: add `wildcard_cert`/`wildcard_key` secrets with `path` set to the files under `/var/lib/ci-certs/live/`.
  - [ ] proxy.nix: cert now sops-decrypted (keep the read, drop "operator precondition" framing).
  - [ ] Verify: `nixos-rebuild build --flake .#cc-ci` byte-identical to `/run/current-system`.
  - [ ] Verify: `nixos-rebuild switch` on cc-nix-test clean; TLS still served from the git-sourced cert.
  - [ ] **Gate W2 CLAIMED** → Adversary verifies byte-identical + TLS-from-git-cert.
- [ ] **W1 — Headroom (just before W3).** Resize `cc-nix-test` 6 GB→4 GB (stop→set→start). Accept:
b1 has room; cc-nix-test healthy at 4 GB.
- [ ] **W3 — Throwaway VM.** Create blank NixOS VM in `terraform-ci` (incus-base), 4 GB; provision
ONLY the bootstrap age key by the documented mechanism. Accept: VM reachable.
- [ ] **W4 — Reproducible live rebuild.** On throwaway VM: clone base+secrets, `nixos-rebuild switch`,
watch oneshots converge, secrets+cert decrypt. Accept: fully up, no step outside docs/install.md;
capture evidence. **Gate W4 CLAIMED.**
- [ ] **W5 — Adversary cold proof + honest D8.** Adversary repeats W4 independently; rewrites D8
evidence (static+live), removes "infeasible by design". Accept: Adversary D8 live-rebuild PASS
(or narrow signed-off limitation per C5).
- [ ] **W6 — Cleanup + docs + final sizing.** Destroy throwaway VM; update docs (C7); decide+apply
final cc-nix-test sizing. Accept: no leftover; docs match; flip STATUS-1c → `## DONE`.
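
A minimal sketch of the W2 "byte-identical" check, assuming the build leaves a `./result` symlink in the working directory and that two symlinks resolving to the same store path is taken as the equality criterion. The helper name `same_closure` is hypothetical and not part of the repo.

```shell
#!/usr/bin/env bash
# same_closure A B: exit 0 when both symlinks resolve to the same path,
# exit 1 when they differ, exit 2 when either cannot be resolved.
# (Hypothetical helper; the name is an illustration only.)
same_closure() {
  local built running
  built=$(readlink -f -- "$1") || return 2
  running=$(readlink -f -- "$2") || return 2
  if [ "$built" = "$running" ]; then
    echo "IDENTICAL $built"
  else
    echo "DIFFERENT built=$built running=$running"
    return 1
  fi
}

# Intended use on cc-ci (not executed here):
#   nixos-rebuild build --flake .#cc-ci
#   same_closure ./result /run/current-system
```

Comparing resolved paths keeps the check cheap; the printed line can be pasted into the journal as gate evidence.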
## Adversary findings
(none yet — Adversary owns this section)

DECISIONS.md

@@ -186,3 +186,28 @@ Architecture decisions and dead-ends. One line of rationale each. (§0, §8)
## Dead-ends
- (none yet)
## Phase 1c (full reproducibility + genuine D8 live rebuild) — 2026-05-27
- **Secrets linkage = git SUBMODULE (deviates from plan §7 flake-input default).** `cc-ci-secrets`
is mounted as a submodule at `cc-ci/secrets/` rather than a flake `inputs.secrets`. Rationale: a
private flake input must be re-fetched at **every nix eval**, requiring the bot token persistently
in nix config/netrc on cc-ci AND the throwaway VM (a token in the store/config = a 2nd out-of-band
secret, which 1c forbids). A submodule makes `secrets/secrets.yaml` a plain path in the working
tree → `defaultSopsFile = ../secrets/secrets.yaml` is unchanged (minimal diff, trivially
byte-identical), and the only credential use is the one `git clone --recursive` at provisioning
("the two repos are *given*", Mission §1). Build invocation becomes
`nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'` so the submodule tree is
included. (Revisit if `?submodules=1` proves unreliable on cc-ci's nix version.)
- **Bootstrap key for the throwaway VM = the existing RECOVERY (master) age key, via
`sops.age.keyFile`.** The recovery key (`age1cmk26…`, private at `/srv/cc-ci/.sops/master-age.txt`)
is already a sops recipient, so a fresh host with a *different* ssh host key still decrypts every
secret with no re-keying — this is exactly the §0 argument that defeats "host-key binding".
Provisioned to the VM at a fixed path (the ONE out-of-band secret). cc-ci itself keeps decrypting
via its host key (`age.sshKeyPaths`); secrets.nix will offer both identity sources. (Per-host
re-encrypt is cleaner for a *permanent* new instance — documented as the alternative, not used for
the throwaway test.)
- **Cert into git:** wildcard cert+key become sops secrets in `cc-ci-secrets`, decrypted at
activation back to `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` via
`sops.secrets.<name>.path`; proxy.nix keeps reading that path (now sops-sourced, not operator-drop).
- **cc-nix-test final sizing:** TBD in W6 (keep 4 GB / restore 6 GB / promote rebuilt VM).
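
The build invocation from the submodule decision can be sketched as a small helper that assembles the flake reference from a checkout path and a configuration name. The function name `flake_ref` is hypothetical; the URL shape is the one quoted in the decision above, and whether `?submodules=1` behaves reliably on cc-ci's nix version remains the open question noted there.

```shell
#!/usr/bin/env bash
# flake_ref CHECKOUT ATTR: print a git+file flake reference that includes
# submodules, e.g. git+file:///root/cc-ci?submodules=1#cc-ci
# (Hypothetical helper; only the URL format comes from DECISIONS.md.)
flake_ref() {
  local checkout=$1 attr=$2
  case $checkout in
    /*) ;;  # absolute path, as the git+file:// form needs
    *) echo "checkout must be an absolute path: $checkout" >&2; return 1 ;;
  esac
  printf 'git+file://%s?submodules=1#%s\n' "$checkout" "$attr"
}

# Intended use at provisioning (not executed here; the repo URL is elided):
#   git clone --recursive <base-repo-url> /root/cc-ci
#   nixos-rebuild switch --flake "$(flake_ref /root/cc-ci cc-ci)"
```

Quoting the reference matters in practice, since `?` and `#` are otherwise interpreted by the shell.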

JOURNAL-1c.md

@@ -0,0 +1,38 @@
# JOURNAL — Phase 1c (Builder)
Append-only. Each entry: what I did + verifying command/output + next.
## 2026-05-27 — Phase 1c kickoff / orientation
**Context:** Phase 1 closed DONE (1c10fa5; all D1–D10 Adversary-PASS, no VETO; Adversary loop
terminated for Phase 1). Starting Phase 1c per `plan-phase1c-full-reproducibility.md`.
**Did:**
- Read the phase plan in full + plan.md §6.1/§7/§9. Verified access & surveyed current state.
- `ssh cc-ci 'hostname && nixos-version && systemctl is-system-running'`
Output: `nixos` / `24.11.20250630.50ab793 (Vicuna)` / `running`. Healthy.
- Surveyed secrets layout:
- `secrets/secrets.yaml` holds 6 infra secrets: test_secret, drone_rpc_secret,
drone_gitea_client_secret, bridge_drone_token, bridge_gitea_token, bridge_webhook_hmac.
- `.sops.yaml`: recipients = `&host` (ssh-to-age of cc-ci ed25519 host key) + `&master`
(off-box recovery, private at `/srv/cc-ci/.sops/master-age.txt`, sandbox-only).
- `modules/secrets.nix`: `defaultSopsFile = ../secrets/secrets.yaml`,
`age.sshKeyPaths = ["/etc/ssh/ssh_host_ed25519_key"]`.
- Wildcard cert OUT-OF-BAND at `cc-ci:/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}`
(fullchain 2909 B, privkey 227 B → privkey looks like an EC key). `modules/proxy.nix` reconcile
reads it from there and inserts as traefik `ssl_cert`/`ssl_key` swarm secrets; treats a missing
cert as FATAL (operator precondition).
- `flake.nix`: nixpkgs pinned `50ab793`, sops-nix pinned; single local `nixosConfigurations.cc-ci`.
- Tooling note: sandbox host has NO sops/nix/age/ssh-to-age (`which` → only git). sops/age work
must run on cc-ci (has nix + host age key) or via a sops binary fetched there with the master key.
- Bootstrapped Phase-1c state: STATUS-1c.md, BACKLOG-1c.md, JOURNAL-1c.md (this file). REVIEW-1c.md
left for the Adversary (its file per §6.1). Appended Phase-1c decisions to DECISIONS.md.
**Decisions recorded (DECISIONS.md):** secrets linkage = **git submodule** (deviates from the
flake-input default — rationale: no private-repo fetch credential needed at nix-eval on every
rebuild, keeps `defaultSopsFile` a local path = minimal change + trivially byte-identical);
bootstrap key for throwaway = **recovery age key via `sops.age.keyFile`**.
**Next (W2):** create private `recipe-maintainers/cc-ci-secrets`; move secrets + wildcard cert into
sops there as a submodule of the base; wire secrets.nix (cert→`/var/lib/ci-certs/live` via `path=`);
prove byte-identical build + clean switch with TLS from the git cert. Then claim Gate W2.
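
The tooling survey in this entry could be repeated with a short helper along these lines. It is a sketch only: the function name `missing_tools` is hypothetical, and the tool list in the usage comment is the one named in the tooling note.

```shell
#!/usr/bin/env bash
# missing_tools TOOL...: print, one per line, each named tool that is not on PATH.
# (Hypothetical helper for re-running the sandbox tooling survey.)
missing_tools() {
  local t
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || printf '%s\n' "$t"
  done
  return 0
}

# Intended use (not executed here):
#   missing_tools git sops nix age ssh-to-age
```

An empty result means every listed tool is available on that host.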

STATUS-1c.md

@@ -0,0 +1,41 @@
# STATUS — Phase 1c (full git reproducibility + genuine D8 live rebuild)
**Phase plan (SSOT):** `/srv/cc-ci/cc-ci-plan/plan-phase1c-full-reproducibility.md`
**Loop state for THIS phase:** STATUS-1c / BACKLOG-1c / REVIEW-1c / JOURNAL-1c (DECISIONS.md shared).
The repo's STATUS.md / BACKLOG.md / REVIEW.md are Phase-1 HISTORY — not this phase's state.
## Phase
**1c kickoff** — Phase 1 is DONE & Adversary-signed-off (1c10fa5; all D1–D10 PASS, no VETO).
Now: make the VM fully reproducible from git (secrets+cert in a private `cc-ci-secrets` repo) and
perform a genuine throwaway-VM live rebuild to close D8 honestly.
## In flight
- **W2 (next):** create private `cc-ci-secrets` repo; move all secrets + the wildcard cert into sops
there; wire the base flake to consume it. (W1 resize deferred until just before W3 — its only
purpose is RAM headroom for the throwaway VM, and it briefly stops the live server.)
## Definition of Done (C1–C7 — see phase plan §3)
- [ ] C1 — Secrets-repo split (private `cc-ci-secrets`, base stays one parameterized repo, byte-identical build)
- [ ] C2 — Cert in git (wildcard cert+key as sops secrets, decrypted at activation; no operator cert-drop step)
- [ ] C3 — All secrets in git, one exception = bootstrap age key (documented)
- [ ] C4 — Genuine throwaway-VM live rebuild (Incus terraform-ci, only age key provisioned)
- [ ] C5 — Honest D8 (static byte-identical + live rebuild; "infeasible by design" removed)
- [ ] C6 — Resource fit + cleanup (cc-nix-test 6→4 GB, throwaway 4 GB, destroyed after; final sizing decided)
- [ ] C7 — Docs (install.md/secrets.md/architecture.md + main plan refs updated to new model)
## Gate
None claimed yet. (Milestone gates W2/W4/W5 will be CLAIMED here per §6.1.)
## Blocked
(none)
## Notes
- Current secret layout: `secrets/secrets.yaml` (6 infra secrets), recipients = host age key
(ssh-to-age of cc-ci's ed25519 host key) + off-box master recovery key
(`/srv/cc-ci/.sops/master-age.txt`, sandbox-only). `.sops.yaml` at repo root.
- Wildcard cert currently out-of-band at `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}`
(operator-provided, LE, next renewal ~2026-08-24); proxy.nix reads it from there. 1c moves it
into sops-in-git, decrypted back to that path at activation.
- Sandbox host has NO sops/nix/age — sops ops run on cc-ci (has nix + host age key) or via the master
key with a sops binary fetched on cc-ci.
- cc-nix-test == the live cc-ci server (100.90.116.4); resizing it (W1) briefly stops it.
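
As a rough aid for the renewal note, the remaining lead time can be computed from two dates. This sketch assumes GNU `date` (the `-d` option), and the helper name `days_between` is hypothetical. The approximate renewal date comes from the note above; the certificate's actual expiry would have to be read on the server, for example with `openssl x509 -noout -enddate -in fullchain.pem`.

```shell
#!/usr/bin/env bash
# days_between FROM TO: whole days from FROM to TO (both YYYY-MM-DD, read as UTC).
# Assumption: GNU date; BSD date does not accept -d in this form.
days_between() {
  local from to
  from=$(date -u -d "$1" +%s) || return 1
  to=$(date -u -d "$2" +%s) || return 1
  echo $(( (to - from) / 86400 ))
}

# Commit date to the approximate renewal date from the note:
#   days_between 2026-05-27 2026-08-24   # prints 89
```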