refactor(1b): RL5 — consolidate Nix code under nix/ (modules->nix/modules, hosts->nix/hosts)

flake.nix/flake.lock STAY at root so the build ref #cc-ci is unchanged; only flake.nix's internal
path to configuration.nix is updated. Root-relative refs inside the moved modules are re-based
../X -> ../../X (secrets/bridge/dashboard); configuration.nix's ../../modules imports are unchanged
(both dirs now live under nix/, so the relative path still resolves).

Living docs (README, architecture/install/secrets/enroll) and the .drone.yml comment are updated to
nix/...; append-only history logs are left as-is. DECISIONS.md records RL5 and the deferred,
coordinated RL6.

Verified on cc-ci: nixos-rebuild build 'path:#cc-ci' -> toplevel 8i3jcad9, BYTE-IDENTICAL to the
pre-move build (files copied into the store are addressed by name and content, not repo location,
and the module .nix files are not in the runtime closure); scripts/lint.sh -> lint: PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
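The living docs were edited by hand, so a grep can flag any leftover reference to the old top-level `modules/` or `hosts/` directories. This is a minimal sketch. `find_stale_refs` is a hypothetical helper, not part of the repo. Its regex is a heuristic: it skips `nix/modules/...` and relative forms such as `../../modules/...`, which are still valid inside `nix/hosts/cc-ci/configuration.nix`. It still catches `./hosts/...` and bare `modules/...`.

```sh
# Hypothetical helper (not in the repo): print matching lines (with line numbers)
# that reference the pre-RL5 top-level dirs, i.e. `modules/` or `hosts/` that are
# NOT preceded by `nix/`. Relative forms like ../../modules/ are skipped, but
# ./hosts/... and bare modules/... are reported.
# Exit status follows grep: 0 if stale refs were found, 1 if none.
find_stale_refs() {
  grep -nE '(^|[^[:alnum:]_./-])(\./)?(modules|hosts)/' "$@"
}
```

For example, from the repo root, `find_stale_refs README.md .drone.yml docs/*.md` should print nothing once the living docs are updated. This assumes the living docs are the files under `docs/`. The append-only history logs are deliberately left out, since they keep the old paths as a record.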
@@ -35,7 +35,7 @@ steps:
 # the comment-bridge). Deploys the recipe at the PR head, runs install/upgrade/backup + any
 # recipe-local tests via the shared harness, then guarantees teardown (plan §4.2/§4.3).
 #
-# Resource safety (plan §4.2/§4.3): MAX_TESTS=DRONE_RUNNER_CAPACITY=1 (modules/drone-runner.nix) is
+# Resource safety (plan §4.2/§4.3): MAX_TESTS=DRONE_RUNNER_CAPACITY=1 (nix/modules/drone-runner.nix) is
 # the primary concurrency cap; concurrency.limit below is a redundant belt. CCCI_JANITOR_MAX_AGE=0
 # makes the run-start janitor reap ANY orphaned run app before deploying — safe because capacity=1
 # means no concurrent run exists (a SIGKILL'd/timed-out build leaves an orphan with no teardown).

22
DECISIONS.md
@@ -241,3 +241,25 @@ Architecture decisions and dead-ends. One line of rationale each. (§0, §8)
 recipe CI uses polling as primary, but cc-ci's *own* self-test/lint relies on the push webhook.
 The lint stage is correctly wired and proven green via the identical `nix develop .#lint` command;
 reliably auto-firing it on every push is tracked as a (pre-existing) infra item, not a 1b lint gap.
+
+## Phase 1b — repo layout (operator review items RL5/RL6, plan §7)
+- **RL5 — all Nix code under `nix/`.** Moved `modules/`→`nix/modules/` and `hosts/`→`nix/hosts/`.
+`flake.nix`/`flake.lock` STAY at the repo root (entry point) so the build ref `#cc-ci` and
+`nixos-rebuild --flake '…#cc-ci'` are unchanged — only `flake.nix`'s internal
+`./hosts/cc-ci/configuration.nix` → `./nix/hosts/cc-ci/configuration.nix` changed. Root-relative
+refs inside the moved modules were re-based `../X` → `../../X` (secrets.nix → `../../secrets/`,
+bridge.nix → `../../bridge/`, dashboard.nix → `../../dashboard/`); `configuration.nix`'s
+`../../modules/*` imports are unchanged (both dirs moved under `nix/`, so the relative path still
+resolves). **Toplevel is byte-identical (`8i3jcad9…`) before/after the move** — store derivations
+are content-addressed on the copied file *contents*, and the module `.nix` files aren't part of the
+runtime closure, so relocating folders doesn't change the build. (The operator anticipated a hash
+change; in practice it's stable, which is even stronger for reproducibility.) Living docs
+(README, architecture/install/secrets/enroll) + the `.drone.yml` comment updated to `nix/…`;
+append-only history logs left as the record of what was true then.
+- **RL6 — protocol files → `machine-docs/`: DEFERRED to the coordinated end of 1b.** Will `git mv`
+`STATUS*/REVIEW*/JOURNAL*/BACKLOG*/DECISIONS.md` into `machine-docs/` (README.md STAYS at root —
+operator decision, it's the human readme, not a protocol file). The live watchdog (`launch.sh`)
+reads `STATUS-<id>.md`/`REVIEW-<id>.md` at the repo root for handoffs/transition, so this is done
+LAST, in lockstep with the orchestrator updating `launch.sh` + restarting the watchdog — not
+unilaterally and not while a phase transition is pending. The Adversary likewise `git mv`s its own
+REVIEW files at the cutover (single-writer rule).

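The `../X` → `../../X` re-basing in the RL5 entry follows from how Nix resolves a relative path literal. It is resolved against the directory of the file that contains it, with `..` handled lexically. The same arithmetic can be reproduced in shell to check a re-based reference. This sketch assumes GNU coreutils `realpath` (`-m`, `-s`, `--relative-to`). `resolve_ref` is a hypothetical helper.

```sh
# Hypothetical helper (not in the repo): resolve relative path REF as written inside
# FILE, the way a Nix path literal is resolved (relative to FILE's directory, ".."
# handled lexically), and print it relative to the repo root (the current directory).
# Needs GNU realpath: -m allows missing paths, -s skips symlink expansion.
resolve_ref() {
  file=$1
  ref=$2
  realpath -m -s --relative-to=. "$(dirname "$file")/$ref"
}
```

For example, `resolve_ref nix/modules/bridge.nix ../../bridge/bridge.py` yields `bridge/bridge.py`. The old `../bridge/bridge.py` would now resolve to `nix/bridge/bridge.py`, which does not exist. Meanwhile, `resolve_ref nix/hosts/cc-ci/configuration.nix ../../modules` still yields `nix/modules`, which is why those imports did not need to change.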
11
README.md
@@ -13,10 +13,10 @@ per-recipe test trees, and the docs to enroll a recipe or rebuild the box from s
 ## Layout

 ```
-flake.nix NixOS host(s) + devshell
-hosts/cc-ci/ the cc-ci machine config
-modules/ drone, comment-bridge, swarm, dashboard, secrets (Nix modules)
-secrets/ sops-encrypted infra secrets
+flake.nix NixOS entry point + devshells (stays at root; build ref #cc-ci)
+nix/hosts/cc-ci/ the cc-ci machine config
+nix/modules/ drone, comment-bridge, swarm, dashboard, secrets (Nix modules)
+secrets/ sops-encrypted infra secrets (cc-ci-secrets submodule)
 bridge/ !testme webhook listener source
 runner/ run_recipe_ci.py + shared pytest harness
 dashboard/ results overview generator
@@ -24,6 +24,9 @@ tests/<recipe>/ per-recipe install/upgrade/backup tests + playwright/
 docs/ install, enroll-recipe, secrets, architecture, runbook, baseline
 ```

+All `.nix` code lives under `nix/`; `flake.nix`/`flake.lock` stay at the repo root so the build
+reference (`nixos-rebuild switch --flake '…#cc-ci'`) is unchanged.
+
 ## Docs

 - `docs/install.md` — rebuild the server from scratch (D8)

@@ -3,18 +3,26 @@
 cc-ci turns a `!testme` PR comment into a real end-to-end deploy + test of a Co-op Cloud recipe and
 reports the result back. Everything on the `cc-ci` host is declared in this repo's NixOS flake.

+## Repo layout
+
+All Nix code lives under **`nix/`** — `nix/hosts/cc-ci/` (the machine config) and `nix/modules/`
+(the service modules). `flake.nix` / `flake.lock` stay at the **repo root** as the entry point, so
+the build reference is unchanged (`nixos-rebuild switch --flake '…#cc-ci'`). Application source sits
+at the root (`bridge/`, `dashboard/`, `runner/`, `tests/`); encrypted secrets are the `secrets/`
+submodule.
+
 ## Components

 | Component | Where | Role |
 |---|---|---|
-| **comment-bridge** | `bridge/bridge.py`, `modules/bridge.nix` (swarm svc, `ci.commoninternet.net/hook`) | Polls enrolled repos for `!testme` (primary, read-only) + optional admin webhook; authorizes the commenter (org membership); triggers a parameterized Drone build; posts/edits the PR comment with the run link + final pass/fail. |
-| **Drone server** | `modules/drone.nix` — coop-cloud `drone` recipe via abra (`drone.ci.commoninternet.net`, Gitea SSO) | CI engine. Holds the `recipe-ci` (custom-event) and `self-test` (push) pipelines (`.drone.yml`). |
-| **Drone exec runner** | `modules/drone-runner.nix` — host systemd service | Runs pipeline steps **on the host** so they can drive `abra`/Docker. `DRONE_RUNNER_CAPACITY=1` (MAX_TESTS) caps concurrent builds; the rest queue natively. |
+| **comment-bridge** | `bridge/bridge.py`, `nix/modules/bridge.nix` (swarm svc, `ci.commoninternet.net/hook`) | Polls enrolled repos for `!testme` (primary, read-only) + optional admin webhook; authorizes the commenter (org membership); triggers a parameterized Drone build; posts/edits the PR comment with the run link + final pass/fail. |
+| **Drone server** | `nix/modules/drone.nix` — coop-cloud `drone` recipe via abra (`drone.ci.commoninternet.net`, Gitea SSO) | CI engine. Holds the `recipe-ci` (custom-event) and `self-test` (push) pipelines (`.drone.yml`). |
+| **Drone exec runner** | `nix/modules/drone-runner.nix` — host systemd service | Runs pipeline steps **on the host** so they can drive `abra`/Docker. `DRONE_RUNNER_CAPACITY=1` (MAX_TESTS) caps concurrent builds; the rest queue natively. |
 | **harness** | `runner/run_recipe_ci.py` + `runner/harness/` + `tests/` | Orchestrates per run: fetch recipe at the PR head → install → upgrade → backup/restore → recipe-local (D4) → guaranteed teardown. pytest + Playwright via the Nix `cc-ci-run` env. |
-| **swarm + traefik** | `modules/swarm.nix`, `modules/proxy.nix` — coop-cloud `traefik` recipe via abra | Single-node Docker Swarm + `proxy` overlay; traefik terminates TLS with the wildcard cert (**sops-decrypted from git** to `/var/lib/ci-certs/live`, file provider, **no ACME**). The real deploy target for recipes-under-test. |
-| **backup-bot-two** | `modules/backupbot.nix` | restic-based volume/DB backups; `abra app backup/restore` drive it. |
-| **dashboard** | `dashboard/dashboard.py`, `modules/dashboard.nix` (`ci.commoninternet.net`) | YunoHost-CI-like overview: latest run per recipe + status badges + run links; `/badge/<recipe>.svg`. |
-| **secrets** | `modules/secrets.nix` + `secrets/` = **`cc-ci-secrets` submodule** (sops-nix) | **Phase-1c secrets model:** ALL secrets incl. the **wildcard TLS cert+key are sops-encrypted in git** in the private `cc-ci-secrets` repo, mounted as a **git submodule** at `secrets/` (the base `cc-ci` repo holds **no** secret material). Decrypted at activation by the **bootstrap age key** at `/var/lib/sops-nix/key.txt` (`sops.age.keyFile`) — cc-ci's host-derived age identity, or the **off-box recovery key on a fresh/cloned host** whose SSH key isn't a recipient; the host SSH key is also offered (`sops.age.sshKeyPaths`). The cert is decrypted to `/var/lib/ci-certs/live/` (no out-of-band file drop). This **one** age key is the only secret not in git. See `secrets.md`. |
+| **swarm + traefik** | `nix/modules/swarm.nix`, `nix/modules/proxy.nix` — coop-cloud `traefik` recipe via abra | Single-node Docker Swarm + `proxy` overlay; traefik terminates TLS with the wildcard cert (**sops-decrypted from git** to `/var/lib/ci-certs/live`, file provider, **no ACME**). The real deploy target for recipes-under-test. |
+| **backup-bot-two** | `nix/modules/backupbot.nix` | restic-based volume/DB backups; `abra app backup/restore` drive it. |
+| **dashboard** | `dashboard/dashboard.py`, `nix/modules/dashboard.nix` (`ci.commoninternet.net`) | YunoHost-CI-like overview: latest run per recipe + status badges + run links; `/badge/<recipe>.svg`. |
+| **secrets** | `nix/modules/secrets.nix` + `secrets/` = **`cc-ci-secrets` submodule** (sops-nix) | **Phase-1c secrets model:** ALL secrets incl. the **wildcard TLS cert+key are sops-encrypted in git** in the private `cc-ci-secrets` repo, mounted as a **git submodule** at `secrets/` (the base `cc-ci` repo holds **no** secret material). Decrypted at activation by the **bootstrap age key** at `/var/lib/sops-nix/key.txt` (`sops.age.keyFile`) — cc-ci's host-derived age identity, or the **off-box recovery key on a fresh/cloned host** whose SSH key isn't a recipient; the host SSH key is also offered (`sops.age.sshKeyPaths`). The cert is decrypted to `/var/lib/ci-certs/live/` (no out-of-band file drop). This **one** age key is the only secret not in git. See `secrets.md`. |

 All swarm infra (traefik, drone, bridge, dashboard, backupbot) is brought up by **idempotent-reconcile
 systemd oneshots** that converge on every activation/boot (no run-once sentinels), **serialized**

@@ -44,7 +44,7 @@ env `CCCI_BASE_URL` (e.g. `https://<app>.ci.commoninternet.net/`) and `CCCI_APP_
 ## 4. Add the repo to the bridge poll list

 The trigger is **polling** (primary): add the repo's full name to the comment-bridge `POLL_REPOS`
-csv (`modules/bridge.nix`) and `nixos-rebuild switch`. The bridge then polls that repo's open PRs
+csv (`nix/modules/bridge.nix`) and `nixos-rebuild switch`. The bridge then polls that repo's open PRs
 every 30s and fires a run on a new `!testme` comment from an authorized org member. This needs only
 **read + comment** access — no webhook, no repo-admin.


@@ -29,17 +29,17 @@ switch` → fully converged cc-ci, 0 failed units — see DECISIONS.md Phase-1c

 **External infra (operator-owned, not on the box — class-A1):**
 - DNS: `*.ci.commoninternet.net` (+ bare) → the **gateway**, which TLS-passthroughs (SNI) to cc-ci.
-- Firewall path: gateway reaches cc-ci on tcp/80+443 (opened by `modules/swarm.nix`).
+- Firewall path: gateway reaches cc-ci on tcp/80+443 (opened by `nix/modules/swarm.nix`).
 - The wildcard cert is **renewed out-of-band** by the operator, who then re-encrypts it into
 `cc-ci-secrets` (sops) and rebuilds — the Gandi DNS token never touches the box; **never ACME here.**

 ## 1. Apply the NixOS flake (this is the whole install)

-The flake (`flake.nix`, `hosts/cc-ci/`, `modules/`) declares: base host, sops-nix (decrypts via the
+The flake (`flake.nix`, `nix/hosts/cc-ci/`, `nix/modules/`) declares: base host, sops-nix (decrypts via the
 host SSH key), Docker + single-node Swarm + the `proxy` overlay + firewall 80/443
-(`modules/swarm.nix`), abra (`modules/abra.nix` / `packages.nix`), the **traefik reconcile oneshot**
-(`modules/proxy.nix`), the **Drone server reconcile oneshot** (`modules/drone.nix`), and the
-**Drone exec runner** (`modules/drone-runner.nix`).
+(`nix/modules/swarm.nix`), abra (`nix/modules/abra.nix` / `packages.nix`), the **traefik reconcile oneshot**
+(`nix/modules/proxy.nix`), the **Drone server reconcile oneshot** (`nix/modules/drone.nix`), and the
+**Drone exec runner** (`nix/modules/drone-runner.nix`).

 ```sh
 # 1. Clone base + the private secrets submodule (bot/deploy creds for cc-ci-secrets).

@@ -65,7 +65,7 @@ All sops-encrypted in `secrets/secrets.yaml`, decrypted to `/run/secrets/<name>`
 | `drone_rpc_secret` | Drone server ↔ exec runner RPC | `openssl rand -hex 32` |
 | `drone_gitea_client_secret` | Drone↔Gitea OAuth app | from the Gitea OAuth app creation |
 | `bridge_webhook_hmac` | comment-bridge webhook HMAC | `openssl rand -hex 32` |
-| `bridge_drone_token` | bridge + dashboard → Drone API | hex token; **injected as the bot's Drone machine token** via `DRONE_USER_CREATE=…,token:$(cat /run/secrets/bridge_drone_token)` (modules/drone.nix) so it's reproducible on a fresh Drone DB (else the bridge gets 401 on a clean-room rebuild) |
+| `bridge_drone_token` | bridge + dashboard → Drone API | hex token; **injected as the bot's Drone machine token** via `DRONE_USER_CREATE=…,token:$(cat /run/secrets/bridge_drone_token)` (nix/modules/drone.nix) so it's reproducible on a fresh Drone DB (else the bridge gets 401 on a clean-room rebuild) |
 | `bridge_gitea_token` | bridge → Gitea API (poll/comment) | minted Gitea token (bot) |
 | `restic_password` | backup-bot-two restic repo | **abra-generated** (`abra app secret generate`, kept stable across reconciles) |

@@ -76,7 +76,7 @@ All sops-encrypted in `secrets/secrets.yaml`, decrypted to `/run/secrets/<name>`
 `cc-ci-secrets`, then bump the base repo's submodule pointer (`git add secrets && commit`).
 3. For swarm-secret-backed values, **bump the consuming app's secret version** so the reconcile
 re-creates the swarm secret (docker swarm secrets are immutable): e.g. drone `RPC_SECRET_VERSION`
-v1→v2 (modules/drone.nix), bridge `cc_ci_bridge_*_v<n>` (modules/bridge.nix). Update both ends
+v1→v2 (nix/modules/drone.nix), bridge `cc_ci_bridge_*_v<n>` (nix/modules/bridge.nix). Update both ends
 (server + runner share `drone_rpc_secret`).
 4. `git commit` + push, sync to host, `nixos-rebuild switch` → reconcile re-inserts + redeploys.
 5. Verify: the consuming service is healthy and re-auth works (e.g. a fresh build triggers).

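Docker swarm secrets are immutable, so step 3 rotates a secret by moving to a new versioned name rather than updating it in place. For the bridge's `cc_ci_bridge_*_v<n>` names, the next name can be derived mechanically. The sketch below uses `bump_secret_version`, a hypothetical helper, and the secret name in the example is made up for illustration. As the step notes, every consumer must then reference the new name. For the RPC secret, that means both the Drone server and the runner.

```sh
# Hypothetical helper (not in the repo): given a versioned swarm secret name that
# ends in _v<n>, print the next version's name (..._v<n+1>).
# Returns 1 if the name has no _v<n> suffix or the version is not a plain integer.
bump_secret_version() {
  name=$1
  case "$name" in
    *_v*) ;;
    *) echo "no _v<n> suffix: $name" >&2; return 1 ;;
  esac
  n=${name##*_v}     # text after the last "_v"
  base=${name%_v*}   # text before the last "_v"
  case "$n" in
    '' | *[!0-9]* | 0?*) echo "unexpected version '$n' in: $name" >&2; return 1 ;;
  esac
  echo "${base}_v$((n + 1))"
}
```

For example, `bump_secret_version cc_ci_bridge_example_v1` prints `cc_ci_bridge_example_v2`.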
@@ -35,7 +35,7 @@
 inherit system;
 modules = [
 sops-nix.nixosModules.sops
-./hosts/cc-ci/configuration.nix
+./nix/hosts/cc-ci/configuration.nix
 ];
 };


@@ -7,13 +7,13 @@ let
 # bridge.py placed at /app/bridge.py inside the image.
 bridgeApp = pkgs.runCommand "cc-ci-bridge-app" { } ''
 mkdir -p $out/app
-cp ${../bridge/bridge.py} $out/app/bridge.py
+cp ${../../bridge/bridge.py} $out/app/bridge.py
 '';

 # Content-derived tag so `docker stack deploy` rolls the service whenever bridge.py changes
 # (a fixed `:latest` + unchanged stack spec does NOT roll — swarm sees no change).
 imageTag = builtins.substring 0 12 (builtins.hashString "sha256"
-(builtins.readFile ../bridge/bridge.py));
+(builtins.readFile ../../bridge/bridge.py));

 image = pkgs.dockerTools.buildLayeredImage {
 name = "cc-ci-bridge";

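The content-derived `imageTag` above is the first 12 hex characters of the SHA-256 of the script's contents. The dashboard module uses the same construction. The tag can therefore be reproduced outside Nix, for example to check which tag a rolled service should carry. Because only the file contents are hashed, the RL5 move leaves the tag unchanged. This sketch assumes GNU coreutils `sha256sum`, which prints lowercase hex as `builtins.hashString` does. `image_tag` is a hypothetical helper.

```sh
# Hypothetical helper (not in the repo): reproduce the modules' content-derived tag,
#   builtins.substring 0 12 (builtins.hashString "sha256" (builtins.readFile FILE))
# i.e. the first 12 lowercase hex chars of the SHA-256 of FILE's contents.
image_tag() {
  sha256sum -- "$1" | cut -c1-12
}
```

From the repo root, the calls would be `image_tag bridge/bridge.py` and `image_tag dashboard/dashboard.py`.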
@@ -7,14 +7,14 @@
 let
 dashApp = pkgs.runCommand "cc-ci-dashboard-app" { } ''
 mkdir -p $out/app
-cp ${../dashboard/dashboard.py} $out/app/dashboard.py
+cp ${../../dashboard/dashboard.py} $out/app/dashboard.py
 '';

 # Content-derived tag: changes whenever dashboard.py changes, so `docker stack deploy` actually
 # rolls the service to the new image (a fixed `:latest` tag + unchanged stack spec does NOT roll —
 # swarm sees no change). Reproducible + self-healing.
 imageTag = builtins.substring 0 12 (builtins.hashString "sha256"
-(builtins.readFile ../dashboard/dashboard.py));
+(builtins.readFile ../../dashboard/dashboard.py));

 image = pkgs.dockerTools.buildLayeredImage {
 name = "cc-ci-dashboard";

@@ -1,12 +1,13 @@
 # sops-nix wiring (D6 infra secrets). cc-ci decrypts secrets at activation using its own
 # ed25519 SSH host key as the age identity (no separate key file to manage on the box).
-# Encrypted material lives in ../secrets/secrets.yaml — Phase-1c moved this into the private
-# `cc-ci-secrets` repo, mounted here as a git SUBMODULE at ../secrets/ (so the path is unchanged).
-# Readable only by the recipients in secrets/.sops.yaml (host key + off-box master recovery key).
+# Encrypted material lives in the repo-root `secrets/` git SUBMODULE (the private `cc-ci-secrets`
+# repo, Phase-1c). RL5 put this module under nix/modules/, so the relative path is
+# ../../secrets/secrets.yaml. Readable only by the recipients in secrets/.sops.yaml (host key +
+# off-box master recovery key).
 { config, ... }:
 {
 sops = {
-defaultSopsFile = ../secrets/secrets.yaml;
+defaultSopsFile = ../../secrets/secrets.yaml;
 # Decrypt using the host's SSH host key (converted to an age identity by sops-nix).
 age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ];
 # Phase-1c: also accept a bootstrap age key at a fixed path — THE one out-of-band secret,