M0: flake + base NixOS config, rebuilt from repo on cc-ci

Pins nixpkgs to the rev cc-ci already ran (no-op-then-base); deploy via
switch --flake on-host. System healthy (gen 3) post-switch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 21:25:48 +01:00
parent c21cce51b9
commit 9bffb55b28
9 changed files with 186 additions and 17 deletions


@@ -6,10 +6,11 @@ Two single-writer sections (§6.1): Builder edits only `## Build backlog`; Adver
## Build backlog
### M0 — Foundations
- [ ] Author flake.nix (NixOS host cc-ci) + hosts/cc-ci/{configuration,hardware}.nix from baseline
- [ ] Deploy mechanism decision + first rebuild from repo (DECISIONS.md)
- [x] Author flake.nix (NixOS host cc-ci) + hosts/cc-ci/{configuration,hardware}.nix from baseline
- [x] Deploy mechanism decision + first rebuild from repo (DECISIONS.md) — switch --flake on host
- [ ] sops-nix wiring: host age key, secrets/secrets.yaml, decrypt a test secret on host
- [ ] Gate: M0 — `ssh cc-ci 'systemctl is-system-running'` healthy after rebuild from repo
- [ ] Gate: M0 — `ssh cc-ci 'systemctl is-system-running'` healthy after rebuild from repo (base
rebuild verified healthy 2026-05-26; will CLAIM gate once sops test-secret also lands)
### M1 — Swarm + abra target
- [ ] Docker + single-node swarm via Nix


@@ -12,10 +12,17 @@ Architecture decisions and dead-ends. One line of rationale each. (§0, §8)
## Open (defaults from §8, to confirm as reality lands)
- **Deploy mechanism:** TBD in M0. Leaning `nixos-rebuild switch --flake` run *on cc-ci itself*
(repo cloned on host) rather than `--target-host`/deploy-rs from the sandbox, to avoid copying
large Nix closures over the userspace-tailscaled SOCKS proxy. Atomic-rollback is preserved by
Nix generations. Will record final choice + rationale when M0 lands.
- **Deploy mechanism — SETTLED (M0):** `nixos-rebuild switch --flake /root/cc-ci#cc-ci` run *on
cc-ci itself*, with the repo materialised on the host at `/root/cc-ci`. Chosen over
`--target-host`/deploy-rs to avoid pushing large closures over the userspace-tailscaled SOCKS
proxy (slow/fragile). Atomic rollback preserved by Nix generations (`nixos-rebuild --rollback`).
The switch is launched as a **detached transient systemd unit** (`systemd-run --unit=ccci-rebuild
--collect`) so it survives a momentary ssh-over-tailscale drop during activation. For the build
loop the host copy is synced from the sandbox clone via `tar | ssh` (rsync absent on host);
source of truth stays the git repo. D8/install.md will document the from-scratch path (clone repo
on a fresh host, then `nixos-rebuild switch --flake .#cc-ci`).
- **nixpkgs pin:** flake pins the exact rev cc-ci already ran (`50ab793…`) so the first rebuild
is a true no-op-then-base. Bump deliberately, never drift.
- **Webhook scope:** default per-repo via enroll script.
- **Drone runner type:** default exec (must drive host abra).
- **Secret tool:** default sops-nix.
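
The settled deploy path (sync the clone with `tar | ssh`, then switch in a detached transient unit) can be sketched as a small shell helper. This is a minimal sketch, not the repo's actual tooling: the function names `sync_cmd` and `switch_cmd`, the exact `tar` flags, and the local clone path argument are assumptions. The host alias, `/root/cc-ci`, the unit name, and the `systemd-run` flags are taken from the entries above. The helper only prints the commands so they can be reviewed before anything touches the host.

```shell
#!/bin/sh
# Sketch only: prints the deploy commands instead of executing them.
# HOST, HOST_REPO, FLAKE_ATTR and UNIT mirror values recorded in this commit;
# the function names and the tar flags are hypothetical.
HOST="cc-ci"
HOST_REPO="/root/cc-ci"
FLAKE_ATTR="cc-ci"
UNIT="ccci-rebuild"

# Sync a local clone to the host with tar over ssh (rsync is absent on the host).
# $1 = path of the local clone (assumption: the caller supplies it).
sync_cmd() {
  printf "tar -C %s -cf - . | ssh %s 'mkdir -p %s && tar -C %s -xf -'\n" \
    "$1" "$HOST" "$HOST_REPO" "$HOST_REPO"
}

# Launch the switch as a detached transient unit so an ssh drop does not kill it.
switch_cmd() {
  printf "ssh %s 'systemd-run --unit=%s --collect --property=Type=oneshot nixos-rebuild switch --flake %s#%s'\n" \
    "$HOST" "$UNIT" "$HOST_REPO" "$FLAKE_ATTR"
}
```

Calling `sync_cmd "$PWD"` followed by `switch_cmd` prints the two command lines in order. Printing rather than executing keeps the access-critical step (the switch) explicit and reviewable.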
@@ -25,9 +32,11 @@ Architecture decisions and dead-ends. One line of rationale each. (§0, §8)
## Risks
- **Disk:** cc-ci has only ~3.8 GiB free on an 8.9 GiB root. Multiple recipe images + volumes may
exhaust it during M6.5 breadth. Mitigation: aggressive teardown + image prune; if insufficient,
request operator grow the VM disk (Incus, recreatable per the incus skill). Not yet blocking.
- **Disk — RESOLVED 2026-05-26.** Original 8.9 GiB root had only ~3.8 GiB free *and* a hard
**inode** ceiling (586k total, ~6k free) — the flake's nixpkgs fetch (~50k files) hit ENOSPC on
inodes before bytes. Operator grew the VM to **28 GiB** (22 GiB free, 1.78M inodes / 1.21M free);
the ext4 fs auto-resized (new block groups carry proportional inodes). Keep aggressive teardown +
periodic `docker image prune` to avoid regressing during M6.5 breadth.
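
Inode exhaustion is easy to miss because byte-oriented `df` output looks healthy while `df -i` does not. The following is a minimal sketch of a check that could sit alongside the teardown and prune routine. The function names `inode_counts` and `inode_free_pct` are hypothetical, and the awk column positions assume the POSIX-format layout that GNU `df -Pi` prints (filesystem, inodes, used, free).

```shell
#!/bin/sh
# Print "<total> <free>" inode counts for a mount point.
# Assumes GNU-style `df -Pi` columns: Filesystem Inodes IUsed IFree IUse% Mounted-on.
inode_counts() {
  df -Pi "$1" | awk 'NR==2 { print $2, $4 }'
}

# Print the percentage of free inodes, rounded down.
# $1 = total inodes, $2 = free inodes.
inode_free_pct() {
  total=$1
  free=$2
  if [ "$total" -eq 0 ]; then
    # Some filesystems report no inode limit; treat that as "nothing to worry about".
    echo 100
  else
    echo $(( free * 100 / total ))
  fi
}
```

With the figures recorded for the old filesystem, `inode_free_pct 586336 6005` prints `1`, which is the condition that produced ENOSPC while bytes were still available.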
## Dead-ends
- (none yet)


@@ -22,3 +22,39 @@
- Seeded skeleton layout (§3) + loop-state files + docs/baseline.md.
**Next:** commit + push bootstrap, then M0 (flake + base config + sops test secret).
## 2026-05-26 — M0: flake + base config rebuilt from repo
**Authored** `flake.nix` (pins nixpkgs rev `50ab793786d9…`, the exact rev cc-ci ran),
`hosts/cc-ci/hardware.nix` (incus VM module + cloud-init + DHCP/nameservers) and
`hosts/cc-ci/configuration.nix` (faithful baseline repro: tailscale w/ hardcoded
`--hostname=cc-nix-test` since `builtins.readFile /etc/ts-hostname` is impure under flakes; sshd
root; firewall trust tailscale0 + tcp/22; base pkgs).
**Disk/inode hiccup → resolved:** first `nix flake lock`/build hit `No space left on device`
diagnosed as **inode** exhaustion (`df -i` → 6005 free of 586336; old 8.9 GiB fs). Operator grew
the VM to 28 GiB while I was measuring; ext4 auto-resized → 22 GiB free, 1.21M inodes free. Retried.
**Build + switch (commands + output):**
- `ssh cc-ci 'cd /root/cc-ci && nix flake lock && nixos-rebuild build --flake .#cc-ci'` → `BUILD EXIT 0`,
produced `nixos-system-nixos-24.11.20250630.50ab793`.
- `ssh cc-ci 'systemd-run --unit=ccci-rebuild --collect --property=Type=oneshot nixos-rebuild switch
--flake /root/cc-ci#cc-ci'` (detached so it survives ssh drop) → unit `Result=success
ExecMainStatus=0`.
**Gate verification:**
- `systemctl is-system-running` → `running`
- `readlink /run/current-system` → `…-nixos-system-nixos-24.11.20250630.50ab793` (gen 3, from flake)
- `systemctl is-active tailscaled` → `active`; `sshd.socket` → `active` (sshd is socket-activated, so
`sshd.service` reads inactive — live ssh proves it works)
- `systemctl --failed` → none
- `nixos-rebuild list-generations` → gen 3 current @20:23, prior channel gen 2 retained for rollback.
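
The checks above can be folded into a single verdict for the gate. This is a minimal sketch under stated assumptions: the function names `gate_verdict` and `gate_check` are hypothetical, and the rule (state must be `running` and no failed units) is one reasonable reading of "healthy", not a definition taken from the plan.

```shell
#!/bin/sh
# Decide the gate from two observations.
# $1 = output of `systemctl is-system-running`
# $2 = number of failed units
gate_verdict() {
  if [ "$1" = "running" ] && [ "$2" -eq 0 ]; then
    echo "PASS"
  else
    echo "FAIL"
  fi
}

# Gather the observations on the host and print the verdict.
# `is-system-running` exits non-zero for states such as "degraded", hence `|| true`.
gate_check() {
  state=$(systemctl is-system-running || true)
  failed=$(systemctl --failed --no-legend --plain | wc -l | tr -d ' ')
  gate_verdict "$state" "$failed"
}
```

Keeping the decision in `gate_verdict` separate from the collection in `gate_check` lets the rule be exercised without a live host, for example `gate_verdict degraded 1` prints `FAIL`.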
**Known warning (tracked, non-blocking):** incus module enables `systemd.network` while we keep
`networking.useDHCP=true` (scripted dhcpcd); Nix warns both may manage interfaces. Inherited from
baseline; networking is up. Clean up by choosing one stack later.
**Deploy mechanism settled** (DECISIONS.md): `switch --flake` on-host, repo synced via `tar | ssh`.
**Next:** sops-nix wiring (host age key from ssh host key + a decrypt-a-test-secret proof), then
CLAIM the M0 gate for the Adversary.


@@ -1,17 +1,22 @@
# STATUS — cc-ci Builder
**Phase:** M0 — Foundations
**In-flight:** Bootstrap complete; starting M0 (flake + base config + sops test secret).
**Last updated:** 2026-05-26 (bootstrap)
**In-flight:** Base flake config deployed + verified. Next M0 task: sops-nix + decrypt a test secret.
**Last updated:** 2026-05-26 (M0 base config live)
## Gates
- (none claimed yet)
- (none claimed yet — M0 gate pends sops wiring)
## Blocked
- (none)
## Notes
- cc-ci baseline: Incus VM, 2 vCPU, 3.5 GiB RAM, **3.8 GiB free disk** — tight for multi-recipe
docker deploys; watch disk pressure, may need operator to grow the VM disk before M6.5 breadth.
- Server config is currently channel-based `/etc/nixos/configuration.nix` (no flake). M0 converts
to a flake checked out from this repo on the host.
- **Disk RESOLVED:** operator grew the VM 8.9→**28 GiB** (22 GiB free) on 2026-05-26. Inodes
1.78M total / 1.21M free (was ~6k free — old 8.9 GiB fs had only 586k inodes, which the flake's
nixpkgs fetch exhausted). Both byte + inode pressure gone.
- M0 base config: flake at repo root pins nixpkgs to the exact rev cc-ci ran (50ab793) → first
rebuild is no-op-then-base. Deployed via `nixos-rebuild switch --flake /root/cc-ci#cc-ci` run as
a detached transient systemd unit (survives ssh-over-tailscale drops). Gen 3 current, healthy.
- Open warning: incus module enables `systemd.network` while we set `networking.useDHCP=true`
(scripted dhcpcd) — Nix warns both may manage interfaces. Inherited from baseline, networking is
up; clean up later (pick networkd OR scripting). Tracked, non-blocking.

27
flake.lock generated Normal file

@@ -0,0 +1,27 @@
{
  "nodes": {
    "nixpkgs": {
      "locked": {
        "lastModified": 1751274312,
        "narHash": "sha256-/bVBlRpECLVzjV19t5KMdMFWSwKLtb5RyXdjz3LJT+g=",
        "owner": "NixOS",
        "repo": "nixpkgs",
        "rev": "50ab793786d9de88ee30ec4e4c24fb4236fc2674",
        "type": "github"
      },
      "original": {
        "owner": "NixOS",
        "repo": "nixpkgs",
        "rev": "50ab793786d9de88ee30ec4e4c24fb4236fc2674",
        "type": "github"
      }
    },
    "root": {
      "inputs": {
        "nixpkgs": "nixpkgs"
      }
    }
  },
  "root": "root",
  "version": 7
}
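
Because the pin is meant to be bumped deliberately and never drift, it may be worth checking that the lock file still carries the expected revision before a rebuild. The following is a minimal sketch, assuming a single-input lock file like the one above. The function names `lock_revs` and `pin_matches` are hypothetical, and the check uses `sed` rather than a JSON parser so that it has no dependencies.

```shell
#!/bin/sh
# Print every 40-hex-digit "rev" value found in a flake.lock-style file, one per line.
lock_revs() {
  sed -n 's/.*"rev": *"\([0-9a-f]\{40\}\)".*/\1/p' "$1"
}

# Succeed only if the file contains at least one rev and all of them equal the pin.
# $1 = expected revision, $2 = path to the lock file.
# Assumption: a single-input lock, so every rev should be the nixpkgs pin.
pin_matches() {
  expected=$1
  lock=$2
  revs=$(lock_revs "$lock")
  [ -n "$revs" ] || return 1
  for r in $revs; do
    [ "$r" = "$expected" ] || return 1
  done
  return 0
}
```

A typical call would be `pin_matches 50ab793786d9de88ee30ec4e4c24fb4236fc2674 flake.lock`, which returns success for the lock file in this commit. Once more inputs are added, a per-input query such as `jq -r '.nodes.nixpkgs.locked.rev' flake.lock` would be the more precise check.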

28
flake.nix Normal file

@@ -0,0 +1,28 @@
{
  description = "cc-ci Co-op Cloud recipe CI server (NixOS)";

  inputs = {
    # Pinned to the exact revision cc-ci already runs, so the first rebuild from
    # this repo is a true no-op-then-base (M0). Bump deliberately, not drift.
    nixpkgs.url = "github:NixOS/nixpkgs/50ab793786d9de88ee30ec4e4c24fb4236fc2674";
  };

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
      pkgs = nixpkgs.legacyPackages.${system};
    in
    {
      nixosConfigurations.cc-ci = nixpkgs.lib.nixosSystem {
        inherit system;
        modules = [ ./hosts/cc-ci/configuration.nix ];
      };

      # Devshell for working on the harness/bridge locally.
      devShells.${system}.default = pkgs.mkShell {
        packages = with pkgs; [ git jq curl nixpkgs-fmt ];
      };

      formatter.${system} = pkgs.nixpkgs-fmt;
    };
}


@@ -0,0 +1,42 @@
# cc-ci machine config. M0 = faithful reproduction of the baseline (docs/baseline.md)
# so the first flake rebuild is a no-op-then-base. Services (swarm/Traefik/Drone/
# bridge/dashboard) are layered in via ./modules/* in later milestones.
{ pkgs, lib, ... }:
{
  imports = [
    ./hardware.nix
  ];

  # --- Tailscale (ACCESS-CRITICAL: do not break, this is the only route in) ---
  # Baseline read the hostname from /etc/ts-hostname at eval time; that is impure
  # under flakes, so we pin the known hostname. The reusable auth-key file persists.
  services.tailscale = {
    enable = true;
    authKeyFile = "/etc/ts-auth-key";
    extraUpFlags = [ "--hostname=cc-nix-test" ];
  };

  # --- SSH (root login over tailscale) ---
  services.openssh = {
    enable = true;
    settings.PermitRootLogin = "yes";
  };

  # --- Firewall: trust tailscale, allow SSH ---
  networking.firewall = {
    enable = true;
    trustedInterfaces = [ "tailscale0" ];
    allowedTCPPorts = [ 22 ];
  };

  environment.systemPackages = with pkgs; [
    curl
    git
    jq
    openssh
  ];

  nix.settings.experimental-features = [ "nix-command" "flakes" ];

  system.stateVersion = "24.11";
}

21
hosts/cc-ci/hardware.nix Normal file

@@ -0,0 +1,21 @@
# Hardware / platform for cc-ci: an Incus VM (x86_64) on the autonomic infra.
# Mirrors the pre-flake baseline (docs/baseline.md).
{ modulesPath, ... }:
{
  imports = [
    "${modulesPath}/virtualisation/incus-virtual-machine.nix"
  ];

  # incus-agent for `incus exec`
  virtualisation.incus.agent.enable = true;

  # cloud-init seeded the VM (network + /etc/ts-* files); keep it enabled.
  services.cloud-init = {
    enable = true;
    network.enable = true;
  };

  # DHCP from the incus bridge; bridge provides no resolver, so set our own.
  networking.useDHCP = true;
  networking.nameservers = [ "1.1.1.1" "8.8.8.8" ];
}