diff --git a/.drone.yml b/.drone.yml index 3aa04f7..db9b88e 100644 --- a/.drone.yml +++ b/.drone.yml @@ -35,7 +35,7 @@ steps: # the comment-bridge). Deploys the recipe at the PR head, runs install/upgrade/backup + any # recipe-local tests via the shared harness, then guarantees teardown (plan §4.2/§4.3). # -# Resource safety (plan §4.2/§4.3): MAX_TESTS=DRONE_RUNNER_CAPACITY=1 (modules/drone-runner.nix) is +# Resource safety (plan §4.2/§4.3): MAX_TESTS=DRONE_RUNNER_CAPACITY=1 (nix/modules/drone-runner.nix) is # the primary concurrency cap; concurrency.limit below is a redundant belt. CCCI_JANITOR_MAX_AGE=0 # makes the run-start janitor reap ANY orphaned run app before deploying — safe because capacity=1 # means no concurrent run exists (a SIGKILL'd/timed-out build leaves an orphan with no teardown). diff --git a/DECISIONS.md b/DECISIONS.md index 04df421..21ad97e 100644 --- a/DECISIONS.md +++ b/DECISIONS.md @@ -241,3 +241,25 @@ Architecture decisions and dead-ends. One line of rationale each. (§0, §8) recipe CI uses polling as primary, but cc-ci's *own* self-test/lint relies on the push webhook. The lint stage is correctly wired and proven green via the identical `nix develop .#lint` command; reliably auto-firing it on every push is tracked as a (pre-existing) infra item, not a 1b lint gap. + +## Phase 1b — repo layout (operator review items RL5/RL6, plan §7) +- **RL5 — all Nix code under `nix/`.** Moved `modules/`→`nix/modules/` and `hosts/`→`nix/hosts/`. + `flake.nix`/`flake.lock` STAY at the repo root (entry point) so the build ref `#cc-ci` and + `nixos-rebuild --flake '…#cc-ci'` are unchanged — only `flake.nix`'s internal + `./hosts/cc-ci/configuration.nix` → `./nix/hosts/cc-ci/configuration.nix` changed. 
Root-relative + refs inside the moved modules were re-based `../X` → `../../X` (secrets.nix → `../../secrets/`, + bridge.nix → `../../bridge/`, dashboard.nix → `../../dashboard/`); `configuration.nix`'s + `../../modules/*` imports are unchanged (both dirs moved under `nix/`, so the relative path still + resolves). **Toplevel is byte-identical (`8i3jcad9…`) before/after the move** — store derivations + are content-addressed on the copied file *contents*, and the module `.nix` files aren't part of the + runtime closure, so relocating folders doesn't change the build. (The operator anticipated a hash + change; in practice it's stable, which is even stronger for reproducibility.) Living docs + (README, architecture/install/secrets/enroll) + the `.drone.yml` comment updated to `nix/…`; + append-only history logs left as the record of what was true then. +- **RL6 — protocol files → `machine-docs/`: DEFERRED to the coordinated end of 1b.** Will `git mv` + `STATUS*/REVIEW*/JOURNAL*/BACKLOG*/DECISIONS.md` into `machine-docs/` (README.md STAYS at root — + operator decision, it's the human readme, not a protocol file). The live watchdog (`launch.sh`) + reads `STATUS-.md`/`REVIEW-.md` at the repo root for handoffs/transition, so this is done + LAST, in lockstep with the orchestrator updating `launch.sh` + restarting the watchdog — not + unilaterally and not while a phase transition is pending. The Adversary likewise `git mv`s its own + REVIEW files at the cutover (single-writer rule). 
diff --git a/README.md b/README.md index d83d060..c207af5 100644 --- a/README.md +++ b/README.md @@ -13,10 +13,10 @@ per-recipe test trees, and the docs to enroll a recipe or rebuild the box from s ## Layout ``` -flake.nix NixOS host(s) + devshell -hosts/cc-ci/ the cc-ci machine config -modules/ drone, comment-bridge, swarm, dashboard, secrets (Nix modules) -secrets/ sops-encrypted infra secrets +flake.nix NixOS entry point + devshells (stays at root; build ref #cc-ci) +nix/hosts/cc-ci/ the cc-ci machine config +nix/modules/ drone, comment-bridge, swarm, dashboard, secrets (Nix modules) +secrets/ sops-encrypted infra secrets (cc-ci-secrets submodule) bridge/ !testme webhook listener source runner/ run_recipe_ci.py + shared pytest harness dashboard/ results overview generator @@ -24,6 +24,9 @@ tests// per-recipe install/upgrade/backup tests + playwright/ docs/ install, enroll-recipe, secrets, architecture, runbook, baseline ``` +All `.nix` code lives under `nix/`; `flake.nix`/`flake.lock` stay at the repo root so the build +reference (`nixos-rebuild switch --flake '…#cc-ci'`) is unchanged. + ## Docs - `docs/install.md` — rebuild the server from scratch (D8) diff --git a/docs/architecture.md b/docs/architecture.md index b1d3441..aa7a85f 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -3,18 +3,26 @@ cc-ci turns a `!testme` PR comment into a real end-to-end deploy + test of a Co-op Cloud recipe and reports the result back. Everything on the `cc-ci` host is declared in this repo's NixOS flake. +## Repo layout + +All Nix code lives under **`nix/`** — `nix/hosts/cc-ci/` (the machine config) and `nix/modules/` +(the service modules). `flake.nix` / `flake.lock` stay at the **repo root** as the entry point, so +the build reference is unchanged (`nixos-rebuild switch --flake '…#cc-ci'`). Application source sits +at the root (`bridge/`, `dashboard/`, `runner/`, `tests/`); encrypted secrets are the `secrets/` +submodule. 
+ ## Components | Component | Where | Role | |---|---|---| -| **comment-bridge** | `bridge/bridge.py`, `modules/bridge.nix` (swarm svc, `ci.commoninternet.net/hook`) | Polls enrolled repos for `!testme` (primary, read-only) + optional admin webhook; authorizes the commenter (org membership); triggers a parameterized Drone build; posts/edits the PR comment with the run link + final pass/fail. | -| **Drone server** | `modules/drone.nix` — coop-cloud `drone` recipe via abra (`drone.ci.commoninternet.net`, Gitea SSO) | CI engine. Holds the `recipe-ci` (custom-event) and `self-test` (push) pipelines (`.drone.yml`). | -| **Drone exec runner** | `modules/drone-runner.nix` — host systemd service | Runs pipeline steps **on the host** so they can drive `abra`/Docker. `DRONE_RUNNER_CAPACITY=1` (MAX_TESTS) caps concurrent builds; the rest queue natively. | +| **comment-bridge** | `bridge/bridge.py`, `nix/modules/bridge.nix` (swarm svc, `ci.commoninternet.net/hook`) | Polls enrolled repos for `!testme` (primary, read-only) + optional admin webhook; authorizes the commenter (org membership); triggers a parameterized Drone build; posts/edits the PR comment with the run link + final pass/fail. | +| **Drone server** | `nix/modules/drone.nix` — coop-cloud `drone` recipe via abra (`drone.ci.commoninternet.net`, Gitea SSO) | CI engine. Holds the `recipe-ci` (custom-event) and `self-test` (push) pipelines (`.drone.yml`). | +| **Drone exec runner** | `nix/modules/drone-runner.nix` — host systemd service | Runs pipeline steps **on the host** so they can drive `abra`/Docker. `DRONE_RUNNER_CAPACITY=1` (MAX_TESTS) caps concurrent builds; the rest queue natively. | | **harness** | `runner/run_recipe_ci.py` + `runner/harness/` + `tests/` | Orchestrates per run: fetch recipe at the PR head → install → upgrade → backup/restore → recipe-local (D4) → guaranteed teardown. pytest + Playwright via the Nix `cc-ci-run` env. 
| -| **swarm + traefik** | `modules/swarm.nix`, `modules/proxy.nix` — coop-cloud `traefik` recipe via abra | Single-node Docker Swarm + `proxy` overlay; traefik terminates TLS with the wildcard cert (**sops-decrypted from git** to `/var/lib/ci-certs/live`, file provider, **no ACME**). The real deploy target for recipes-under-test. | -| **backup-bot-two** | `modules/backupbot.nix` | restic-based volume/DB backups; `abra app backup/restore` drive it. | -| **dashboard** | `dashboard/dashboard.py`, `modules/dashboard.nix` (`ci.commoninternet.net`) | YunoHost-CI-like overview: latest run per recipe + status badges + run links; `/badge/.svg`. | -| **secrets** | `modules/secrets.nix` + `secrets/` = **`cc-ci-secrets` submodule** (sops-nix) | **Phase-1c secrets model:** ALL secrets incl. the **wildcard TLS cert+key are sops-encrypted in git** in the private `cc-ci-secrets` repo, mounted as a **git submodule** at `secrets/` (the base `cc-ci` repo holds **no** secret material). Decrypted at activation by the **bootstrap age key** at `/var/lib/sops-nix/key.txt` (`sops.age.keyFile`) — cc-ci's host-derived age identity, or the **off-box recovery key on a fresh/cloned host** whose SSH key isn't a recipient; the host SSH key is also offered (`sops.age.sshKeyPaths`). The cert is decrypted to `/var/lib/ci-certs/live/` (no out-of-band file drop). This **one** age key is the only secret not in git. See `secrets.md`. | +| **swarm + traefik** | `nix/modules/swarm.nix`, `nix/modules/proxy.nix` — coop-cloud `traefik` recipe via abra | Single-node Docker Swarm + `proxy` overlay; traefik terminates TLS with the wildcard cert (**sops-decrypted from git** to `/var/lib/ci-certs/live`, file provider, **no ACME**). The real deploy target for recipes-under-test. | +| **backup-bot-two** | `nix/modules/backupbot.nix` | restic-based volume/DB backups; `abra app backup/restore` drive it. 
| +| **dashboard** | `dashboard/dashboard.py`, `nix/modules/dashboard.nix` (`ci.commoninternet.net`) | YunoHost-CI-like overview: latest run per recipe + status badges + run links; `/badge/.svg`. | +| **secrets** | `nix/modules/secrets.nix` + `secrets/` = **`cc-ci-secrets` submodule** (sops-nix) | **Phase-1c secrets model:** ALL secrets incl. the **wildcard TLS cert+key are sops-encrypted in git** in the private `cc-ci-secrets` repo, mounted as a **git submodule** at `secrets/` (the base `cc-ci` repo holds **no** secret material). Decrypted at activation by the **bootstrap age key** at `/var/lib/sops-nix/key.txt` (`sops.age.keyFile`) — cc-ci's host-derived age identity, or the **off-box recovery key on a fresh/cloned host** whose SSH key isn't a recipient; the host SSH key is also offered (`sops.age.sshKeyPaths`). The cert is decrypted to `/var/lib/ci-certs/live/` (no out-of-band file drop). This **one** age key is the only secret not in git. See `secrets.md`. | All swarm infra (traefik, drone, bridge, dashboard, backupbot) is brought up by **idempotent-reconcile systemd oneshots** that converge on every activation/boot (no run-once sentinels), **serialized** diff --git a/docs/enroll-recipe.md b/docs/enroll-recipe.md index fe5241e..9c8e037 100644 --- a/docs/enroll-recipe.md +++ b/docs/enroll-recipe.md @@ -44,7 +44,7 @@ env `CCCI_BASE_URL` (e.g. `https://.ci.commoninternet.net/`) and `CCCI_APP_ ## 4. Add the repo to the bridge poll list The trigger is **polling** (primary): add the repo's full name to the comment-bridge `POLL_REPOS` -csv (`modules/bridge.nix`) and `nixos-rebuild switch`. The bridge then polls that repo's open PRs +csv (`nix/modules/bridge.nix`) and `nixos-rebuild switch`. The bridge then polls that repo's open PRs every 30s and fires a run on a new `!testme` comment from an authorized org member. This needs only **read + comment** access — no webhook, no repo-admin. 
diff --git a/docs/install.md b/docs/install.md index a185eb2..a5f8c7e 100644 --- a/docs/install.md +++ b/docs/install.md @@ -29,17 +29,17 @@ switch` → fully converged cc-ci, 0 failed units — see DECISIONS.md Phase-1c **External infra (operator-owned, not on the box — class-A1):** - DNS: `*.ci.commoninternet.net` (+ bare) → the **gateway**, which TLS-passthroughs (SNI) to cc-ci. -- Firewall path: gateway reaches cc-ci on tcp/80+443 (opened by `modules/swarm.nix`). +- Firewall path: gateway reaches cc-ci on tcp/80+443 (opened by `nix/modules/swarm.nix`). - The wildcard cert is **renewed out-of-band** by the operator, who then re-encrypts it into `cc-ci-secrets` (sops) and rebuilds — the Gandi DNS token never touches the box; **never ACME here.** ## 1. Apply the NixOS flake (this is the whole install) -The flake (`flake.nix`, `hosts/cc-ci/`, `modules/`) declares: base host, sops-nix (decrypts via the +The flake (`flake.nix`, `nix/hosts/cc-ci/`, `nix/modules/`) declares: base host, sops-nix (decrypts via the host SSH key), Docker + single-node Swarm + the `proxy` overlay + firewall 80/443 -(`modules/swarm.nix`), abra (`modules/abra.nix` / `packages.nix`), the **traefik reconcile oneshot** -(`modules/proxy.nix`), the **Drone server reconcile oneshot** (`modules/drone.nix`), and the -**Drone exec runner** (`modules/drone-runner.nix`). +(`nix/modules/swarm.nix`), abra (`nix/modules/abra.nix` / `packages.nix`), the **traefik reconcile oneshot** +(`nix/modules/proxy.nix`), the **Drone server reconcile oneshot** (`nix/modules/drone.nix`), and the +**Drone exec runner** (`nix/modules/drone-runner.nix`). ```sh # 1. Clone base + the private secrets submodule (bot/deploy creds for cc-ci-secrets). 
diff --git a/docs/secrets.md b/docs/secrets.md index 4b758b0..5ae101b 100644 --- a/docs/secrets.md +++ b/docs/secrets.md @@ -65,7 +65,7 @@ All sops-encrypted in `secrets/secrets.yaml`, decrypted to `/run/secrets/` | `drone_rpc_secret` | Drone server ↔ exec runner RPC | `openssl rand -hex 32` | | `drone_gitea_client_secret` | Drone↔Gitea OAuth app | from the Gitea OAuth app creation | | `bridge_webhook_hmac` | comment-bridge webhook HMAC | `openssl rand -hex 32` | -| `bridge_drone_token` | bridge + dashboard → Drone API | hex token; **injected as the bot's Drone machine token** via `DRONE_USER_CREATE=…,token:$(cat /run/secrets/bridge_drone_token)` (modules/drone.nix) so it's reproducible on a fresh Drone DB (else the bridge gets 401 on a clean-room rebuild) | +| `bridge_drone_token` | bridge + dashboard → Drone API | hex token; **injected as the bot's Drone machine token** via `DRONE_USER_CREATE=…,token:$(cat /run/secrets/bridge_drone_token)` (nix/modules/drone.nix) so it's reproducible on a fresh Drone DB (else the bridge gets 401 on a clean-room rebuild) | | `bridge_gitea_token` | bridge → Gitea API (poll/comment) | minted Gitea token (bot) | | `restic_password` | backup-bot-two restic repo | **abra-generated** (`abra app secret generate`, kept stable across reconciles) | @@ -76,7 +76,7 @@ All sops-encrypted in `secrets/secrets.yaml`, decrypted to `/run/secrets/` `cc-ci-secrets`, then bump the base repo's submodule pointer (`git add secrets && commit`). 3. For swarm-secret-backed values, **bump the consuming app's secret version** so the reconcile re-creates the swarm secret (docker swarm secrets are immutable): e.g. drone `RPC_SECRET_VERSION` - v1→v2 (modules/drone.nix), bridge `cc_ci_bridge_*_v` (modules/bridge.nix). Update both ends + v1→v2 (nix/modules/drone.nix), bridge `cc_ci_bridge_*_v` (nix/modules/bridge.nix). Update both ends (server + runner share `drone_rpc_secret`). 4. 
`git commit` + push, sync to host, `nixos-rebuild switch` → reconcile re-inserts + redeploys. 5. Verify: the consuming service is healthy and re-auth works (e.g. a fresh build triggers). diff --git a/flake.nix b/flake.nix index b3d30af..3c65eeb 100644 --- a/flake.nix +++ b/flake.nix @@ -35,7 +35,7 @@ inherit system; modules = [ sops-nix.nixosModules.sops - ./hosts/cc-ci/configuration.nix + ./nix/hosts/cc-ci/configuration.nix ]; }; diff --git a/hosts/cc-ci/configuration.nix b/nix/hosts/cc-ci/configuration.nix similarity index 100% rename from hosts/cc-ci/configuration.nix rename to nix/hosts/cc-ci/configuration.nix diff --git a/hosts/cc-ci/hardware.nix b/nix/hosts/cc-ci/hardware.nix similarity index 100% rename from hosts/cc-ci/hardware.nix rename to nix/hosts/cc-ci/hardware.nix diff --git a/modules/abra.nix b/nix/modules/abra.nix similarity index 100% rename from modules/abra.nix rename to nix/modules/abra.nix diff --git a/modules/backupbot.nix b/nix/modules/backupbot.nix similarity index 100% rename from modules/backupbot.nix rename to nix/modules/backupbot.nix diff --git a/modules/bridge.nix b/nix/modules/bridge.nix similarity index 98% rename from modules/bridge.nix rename to nix/modules/bridge.nix index d3686cb..f6057b3 100644 --- a/modules/bridge.nix +++ b/nix/modules/bridge.nix @@ -7,13 +7,13 @@ let # bridge.py placed at /app/bridge.py inside the image. bridgeApp = pkgs.runCommand "cc-ci-bridge-app" { } '' mkdir -p $out/app - cp ${../bridge/bridge.py} $out/app/bridge.py + cp ${../../bridge/bridge.py} $out/app/bridge.py ''; # Content-derived tag so `docker stack deploy` rolls the service whenever bridge.py changes # (a fixed `:latest` + unchanged stack spec does NOT roll — swarm sees no change). 
imageTag = builtins.substring 0 12 (builtins.hashString "sha256" - (builtins.readFile ../bridge/bridge.py)); + (builtins.readFile ../../bridge/bridge.py)); image = pkgs.dockerTools.buildLayeredImage { name = "cc-ci-bridge"; diff --git a/modules/dashboard.nix b/nix/modules/dashboard.nix similarity index 97% rename from modules/dashboard.nix rename to nix/modules/dashboard.nix index 52985a0..5c2213d 100644 --- a/modules/dashboard.nix +++ b/nix/modules/dashboard.nix @@ -7,14 +7,14 @@ let dashApp = pkgs.runCommand "cc-ci-dashboard-app" { } '' mkdir -p $out/app - cp ${../dashboard/dashboard.py} $out/app/dashboard.py + cp ${../../dashboard/dashboard.py} $out/app/dashboard.py ''; # Content-derived tag: changes whenever dashboard.py changes, so `docker stack deploy` actually # rolls the service to the new image (a fixed `:latest` tag + unchanged stack spec does NOT roll — # swarm sees no change). Reproducible + self-healing. imageTag = builtins.substring 0 12 (builtins.hashString "sha256" - (builtins.readFile ../dashboard/dashboard.py)); + (builtins.readFile ../../dashboard/dashboard.py)); image = pkgs.dockerTools.buildLayeredImage { name = "cc-ci-dashboard"; diff --git a/modules/drone-runner.nix b/nix/modules/drone-runner.nix similarity index 100% rename from modules/drone-runner.nix rename to nix/modules/drone-runner.nix diff --git a/modules/drone.nix b/nix/modules/drone.nix similarity index 100% rename from modules/drone.nix rename to nix/modules/drone.nix diff --git a/modules/harness.nix b/nix/modules/harness.nix similarity index 100% rename from modules/harness.nix rename to nix/modules/harness.nix diff --git a/modules/packages.nix b/nix/modules/packages.nix similarity index 100% rename from modules/packages.nix rename to nix/modules/packages.nix diff --git a/modules/proxy.nix b/nix/modules/proxy.nix similarity index 100% rename from modules/proxy.nix rename to nix/modules/proxy.nix diff --git a/modules/secrets.nix b/nix/modules/secrets.nix similarity index 88% rename 
from modules/secrets.nix rename to nix/modules/secrets.nix index 8334c2f..f2a9059 100644 --- a/modules/secrets.nix +++ b/nix/modules/secrets.nix @@ -1,12 +1,13 @@ # sops-nix wiring (D6 infra secrets). cc-ci decrypts secrets at activation using its own # ed25519 SSH host key as the age identity (no separate key file to manage on the box). -# Encrypted material lives in ../secrets/secrets.yaml — Phase-1c moved this into the private -# `cc-ci-secrets` repo, mounted here as a git SUBMODULE at ../secrets/ (so the path is unchanged). -# Readable only by the recipients in secrets/.sops.yaml (host key + off-box master recovery key). +# Encrypted material lives in the repo-root `secrets/` git SUBMODULE (the private `cc-ci-secrets` +# repo, Phase-1c). RL5 put this module under nix/modules/, so the relative path is +# ../../secrets/secrets.yaml. Readable only by the recipients in secrets/.sops.yaml (host key + +# off-box master recovery key). { config, ... }: { sops = { - defaultSopsFile = ../secrets/secrets.yaml; + defaultSopsFile = ../../secrets/secrets.yaml; # Decrypt using the host's SSH host key (converted to an age identity by sops-nix). age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ]; # Phase-1c: also accept a bootstrap age key at a fixed path — THE one out-of-band secret, diff --git a/modules/swarm.nix b/nix/modules/swarm.nix similarity index 100% rename from modules/swarm.nix rename to nix/modules/swarm.nix
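
Reviewer note on the RL5 re-basing: the following is a minimal sketch, not part of the patch, that checks the relative-path reasoning recorded in DECISIONS.md. It assumes Nix path literals resolve relative to the directory of the file that contains them, and it models that with plain POSIX path normalisation. The `resolve` helper is hypothetical, and `drone.nix` stands in for any of the `../../modules/*` imports in `configuration.nix`.

```python
import posixpath


def resolve(containing_dir: str, ref: str) -> str:
    """Resolve a relative reference as if written in a file inside containing_dir.

    Hypothetical helper for illustration only: it normalises the joined path
    textually and never touches the filesystem.
    """
    return posixpath.normpath(posixpath.join(containing_dir, ref))


# Before the move, modules/bridge.nix reached the repo-root bridge/ with one "..".
before = resolve("modules", "../bridge/bridge.py")

# After the move, the old reference would land inside nix/, which has no bridge/.
stale = resolve("nix/modules", "../bridge/bridge.py")

# Re-basing ../X to ../../X restores the repo-root target.
rebased = resolve("nix/modules", "../../bridge/bridge.py")

# configuration.nix keeps its ../../modules/* imports: both directories moved
# under nix/, so the same relative reference now lands in nix/modules/.
host_before = resolve("hosts/cc-ci", "../../modules/drone.nix")
host_after = resolve("nix/hosts/cc-ci", "../../modules/drone.nix")

print(before, stale, rebased, host_before, host_after, sep="\n")
```

The same check applies to the `../../secrets/` and `../../dashboard/` references: each moved module sits one directory deeper, so each root-relative reference needs exactly one extra `..`.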