From bdc78da9212c9523a49169672ffcf9e9191297d1 Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Tue, 26 May 2026 20:46:28 +0100
Subject: [PATCH] Initial commit: cc-ci autonomous orchestrator

Planning + launch + setup material for the cc-ci Co-op Cloud recipe CI
server: plan.md (single source of truth), kickoff/launch supervision,
and the Builder/Adversary loop prompts. Secrets (.testenv) and runtime
dirs are gitignored.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .gitignore                      |  12 +
 cc-ci-plan/README.md            |  34 ++
 cc-ci-plan/brief.md             |  35 ++
 cc-ci-plan/kickoff.md           | 143 +++++++
 cc-ci-plan/launch.sh            | 203 ++++++++++
 cc-ci-plan/plan.md              | 635 ++++++++++++++++++++++++++++++++
 cc-ci-plan/prompts/adversary.md |  19 +
 cc-ci-plan/prompts/builder.md   |  25 ++
 references/recipe-maintainer    |   1 +
 9 files changed, 1107 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 cc-ci-plan/README.md
 create mode 100644 cc-ci-plan/brief.md
 create mode 100644 cc-ci-plan/kickoff.md
 create mode 100755 cc-ci-plan/launch.sh
 create mode 100644 cc-ci-plan/plan.md
 create mode 100644 cc-ci-plan/prompts/adversary.md
 create mode 100644 cc-ci-plan/prompts/builder.md
 create mode 120000 references/recipe-maintainer

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..36fcfc2
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,12 @@
+# Secrets — NEVER commit
+.testenv
+*.tfstate
+*.tfstate.*
+*.key
+*.pem
+
+# Loop runtime / working clones (created at launch by launch.sh)
+/cc-ci/
+/cc-ci-adv/
+/.cc-ci-watch/
+/.cc-ci-logs/
diff --git a/cc-ci-plan/README.md b/cc-ci-plan/README.md
new file mode 100644
index 0000000..d093444
--- /dev/null
+++ b/cc-ci-plan/README.md
@@ -0,0 +1,34 @@
+# cc-ci-plan
+
+Self-contained handoff package for building the **cc-ci** Co-op Cloud recipe CI server with two
+autonomous Claude loops (a Builder and an adversarial Reviewer) running over days.
+
+## Start here
+
+1. Read **`plan.md`** — the full plan and single source of truth (mission, Definition of Done,
+   architecture, milestones, the two-agent coordination protocol, loop discipline).
+2. Read **`kickoff.md`** — how to launch and supervise the loops.
+3. Run **`./launch.sh start`** to bring up both loops + the watchdog.
+
+## Files
+
+| File | Purpose |
+|---|---|
+| `plan.md` | The plan. Agents treat it as their single source of truth. |
+| `brief.md` | The original one-page brief (context only; `plan.md` supersedes it). |
+| `kickoff.md` | Launch & supervision guide. |
+| `launch.sh` | Starts both loops + a watchdog; restarts dead loops; stops on `## DONE`. |
+| `prompts/builder.md` | Builder loop prompt (fed to `claude` by the script). |
+| `prompts/adversary.md` | Adversary loop prompt. |
+
+## Before launching
+
+- Set the org in `plan.md` (`git.autonomic.zone/recipe-maintainers/cc-ci`) and lock the six proof recipes (§8).
+- Ensure the launching shell has: root SSH to `cc-ci` (via the Tailscale proxy), the Gitea bot credentials, and `git.autonomic.zone` access.
+- Preconfigure test-app DNS + TLS (plan §4.0): point a wildcard `*.ci.commoninternet.net` record at a gateway that TLS-passthroughs to cc-ci, and **pre-issue the wildcard cert** (`*.ci.commoninternet.net` + `ci.commoninternet.net`, via Gandi DNS-01) into `/var/lib/ci-certs/live/` on cc-ci. The agent handles everything else on cc-ci (Traefik file provider → that cert, swarm, routing) and does **no ACME**; renewal (~90 days) is an out-of-band operator task, so the DNS token never goes to the agent.
+- `export CC_CI_REPO=https://git.autonomic.zone/recipe-maintainers/cc-ci.git` so the watchdog can detect `## DONE`.
+
+## What "done" means
+
+The loops stop only when all of `plan.md` §2 (D1–D10) hold **and** the Adversary has independently
+re-verified each within 24h. The watchdog then tears the loops down automatically.
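The stop condition above can be read as a simple gate: every item D1–D10 needs an Adversary re-verification no older than 24 hours. The following is a minimal sketch that assumes the verification timestamps have already been parsed out of `REVIEW.md` into a dict. That file's format is not specified here, so `done_gate`, `DOD_ITEMS`, `MAX_AGE` and the input shape are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# plan.md §2: ten Definition-of-Done items, each needing a recent Adversary re-verification.
DOD_ITEMS = [f"D{i}" for i in range(1, 11)]
MAX_AGE = timedelta(hours=24)


def done_gate(verified_at: dict[str, datetime], now: datetime) -> tuple[bool, list[str]]:
    """Return (ok, problems).

    `verified_at` maps an item id such as "D3" to the UTC time of the Adversary's last
    successful re-verification (hypothetical input shape, not defined by the plan).
    """
    problems = []
    for item in DOD_ITEMS:
        ts = verified_at.get(item)
        if ts is None:
            problems.append(f"{item}: never re-verified")
        elif now - ts > MAX_AGE:
            problems.append(f"{item}: stale, verified {now - ts} ago")
    return not problems, problems


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    evidence = {item: now - timedelta(hours=2) for item in DOD_ITEMS}
    evidence["D8"] = now - timedelta(hours=30)  # e.g. the from-scratch rebuild check is too old
    # D8 is reported as stale, so ## DONE must not be written yet.
    print(done_gate(evidence, now))
```

The gate does not return a bare boolean. It returns the list of failing items so that the Adversary can record exactly which check needs re-running.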
diff --git a/cc-ci-plan/brief.md b/cc-ci-plan/brief.md
new file mode 100644
index 0000000..ef0883b
--- /dev/null
+++ b/cc-ci-plan/brief.md
@@ -0,0 +1,35 @@
+we are working on making a CI server
+
+I want you to work in an autonomous loop over the next few days until the CI server is fully functional, polished and documented
+
+on any PR on git.autonomic.zone it should be invokable by writing !testme as a comment
+
+this should invoke the set of CI tests to be run for the recipe code at that PR
+
+the CI tests should be run via drone
+
+the tests run for a recipe should be written in python. e2e testing via playwright should be used whe necessary to confirm functionality
+
+there should be tests which test
+- new install
+- upgrade
+- backups (including restore)
+
+all the tests should be fully e2e, with a real deployed recipe
+
+the CI runner should be deployed on a server called cc-ci which is running nixos
+
+cc-ci git repo should also live on git.autonomic.zone which contains all the nix configuration for the server, as well as the code for the CI test runner
+
+the CI test runner should have its own folder of tests, with one folder for each recipe, with each of those folders containg a set of tests as python files which get invoked for that recipe
+
+secrets should also be handled in a reasonable and repeatable way
+
+additionally, if a recipe repo itself contains a tests folder in the recipe, the CI runner should also invoke those tests as part of the CI run for those tests
+
+the results of the test run should be easily viewable, with trackable logs, and a final result, very similar in style to the way the yunohost CI runner looks and feels
+
+you will have ssh access to cc-ci server, as well as sudo access there
+
+you will also have access to create and modify repos on git.autonomic.zone
+
diff --git a/cc-ci-plan/kickoff.md b/cc-ci-plan/kickoff.md
new file mode 100644
index 0000000..0877e6f
--- /dev/null
+++ b/cc-ci-plan/kickoff.md
@@ -0,0 +1,143 @@
+# cc-ci — Kickoff & Launch
+
+Everything needed to start the autonomous cc-ci build loop. The substance lives in `plan.md`;
+this file explains how to launch and supervise the two agents.
+
+## Folder contents
+
+```
+cc-ci-plan/
+├── plan.md        # THE plan — single source of truth (read this in full)
+├── brief.md       # original one-page brief (context only; superseded by plan.md)
+├── kickoff.md     # this file — how to launch & supervise
+├── launch.sh      # starts both loops + watchdog, stops on ## DONE
+└── prompts/
+    ├── builder.md   # Builder loop prompt (fed to claude by launch.sh)
+    └── adversary.md # Adversary loop prompt
+```
+
+> Note: `/srv/cc-ci/cc-ci-plan/` (this folder) is the **planning + launch material**. The actual
+> CI project — NixOS config, runner, tests — lives in a **separate git repo** the Builder creates
+> at `git.autonomic.zone/recipe-maintainers/cc-ci`, cloned to `/srv/cc-ci/cc-ci` (Builder) and
+> `/srv/cc-ci/cc-ci-adv` (Adversary). Don't confuse the two.
+
+## Model: two independent loops (plan §6 / §6.1)
+
+- **Builder** — builds the CI server; owns code + `STATUS.md`/`JOURNAL.md`/`DECISIONS.md` + the
+  `## Build backlog` section of `BACKLOG.md`.
+- **Adversary** — independently disbelieves and re-verifies; owns `REVIEW.md` + `## Adversary
+  findings`. Holds veto over `## DONE`.
+
+They run as two separate processes and coordinate **only** through the git repo. Single-writer file
+ownership keeps concurrent pushes merge-clean.
+
+## Two layers of "looping" — and why you want both
+
+| Concern | Mechanism | Who provides it |
+|---|---|---|
+| **Iteration** — keep doing one unit of work, then wake again | `/loop` self-paced (ScheduleWakeup), per plan §7 pacing | each agent, in-session |
+| **Resilience** — restart a loop whose process/sandbox died; stop all on `## DONE` | `launch.sh` watchdog (tmux + git poll) | this script |
+
+`/loop` alone is bound to its process: if the sandbox restarts, that loop is gone until something
+relaunches it. The watchdog is that something. Use both.
+
+## Launch
+
+```bash
+cd /srv/cc-ci/cc-ci-plan
+
+# Optional but recommended once the repo exists, so the watchdog can detect ## DONE:
+export CC_CI_REPO=https://git.autonomic.zone/recipe-maintainers/cc-ci.git
+
+./launch.sh start              # starts cc-ci-builder + cc-ci-adv + cc-ci-watchdog (tmux sessions)
+./launch.sh status             # session + DONE state
+./launch.sh logs builder       # tail a loop; also: logs adversary | logs watchdog
+tmux attach -t cc-ci-builder   # watch a loop live locally (detach: Ctrl-b d)
+./launch.sh stop               # stop everything
+```
+
+`launch.sh` is idempotent — re-running `start` won't duplicate a live session. Each agent runs as an
+**interactive** `claude` in tmux (kickoff prompt passed as a positional arg, *not* piped — piping
+forces print mode and breaks `/loop`). With `REMOTE_CONTROL=1` (default) each agent is launched with
+`--remote-control`, so you can **watch and steer both loops from [claude.ai/code](https://claude.ai/code)**
+(or the Claude mobile app) — not just via `tmux attach`. The box must be logged into the claude.ai
+account (`claude auth status`); set `REMOTE_CONTROL=0` to skip the remote surface. The watchdog
+checks every 300s by default. It restarts any dead session (a network outage longer than about
+10 minutes exits the `claude` process; the watchdog then brings it back as a fresh remote-control
+session), and when `STATUS.md` shows `## DONE` it kills the loops and exits.
+
+Prerequisites the sessions inherit from your shell: SSH (root) to `cc-ci` via the Tailscale proxy
+(plan §1.5), Gitea bot creds, and `git.autonomic.zone` access. The loop also depends on
+**preconfigured** operator inputs (plan §4.0/§4.4): the wildcard `*.ci.commoninternet.net` DNS record
+pointing at a gateway that TLS-passthroughs to cc-ci, and the **pre-issued wildcard cert** at
+`/var/lib/ci-certs/live/` on cc-ci. The operator owns the DNS record + gateway + cert
+issuance/renewal; the agent builds Traefik (file provider → that cert) + routing on cc-ci and does
+**no ACME**. If any prerequisite is absent, the Builder parks at `STATUS.md ## Blocked` (plan §1/§9)
+rather than improvise.
+
+> Host deps: `launch.sh` needs **tmux** (and `claude`) — tmux is installed on this sandbox host
+> (3.5a). On a fresh host: `sudo apt-get install -y tmux`. The script's `*_DIR`
+> defaults now point at `/srv/cc-ci/...` (Builder clone `/srv/cc-ci/cc-ci`, Adversary
+> `/srv/cc-ci/cc-ci-adv`); override the `*_DIR` env vars only if your layout differs.
+
+## Optional: a cloud-side `/schedule` watchdog
+
+`launch.sh`'s watchdog is itself a local process — if the *whole host* goes down it stops too. For
+belt-and-suspenders durability, also create a `/schedule` routine (a remote agent that fires on a
+cron and re-orients from the repo). From inside a Claude session:
+
+```
+/schedule every 2 hours: read /srv/cc-ci/cc-ci-plan/plan.md §7 and the cc-ci repo STATUS.md; if the
+Builder/Adversary loops are not making progress (or launch.sh is not running), restart them via
+/srv/cc-ci/cc-ci-plan/launch.sh start; stop when STATUS.md says ## DONE.
+```
+
+This complements the local watchdog: scheduled runs are fresh, independent agents, so they survive
+process/context death that would take the in-session `/loop` and the local watchdog with it.
+
+## Fallback: restart/recreate the cc-ci VM (orchestrator only)
+
+**This is primarily an escape hatch for *you*, the supervising orchestrator.** The loops normally
+reconfigure cc-ci only from inside (via Nix); power-cycling or recreating the VM shouldn't be their
+default move, but it's not forbidden if one gets genuinely stuck. Reach for this when cc-ci itself
+is wedged at a level that can't be fixed from inside (won't boot, disk full, swarm/Docker corrupted,
+unreachable even after a proxy restart): use the Incus skill to power-cycle or rebuild the VM, then
+re-bootstrap.
+
+`cc-nix-test` (the cc-ci server, tailnet `100.90.116.4`) is a **NixOS Incus VM** on host **b1**
+(`100.117.251.31:8443`, Incus project `terraform-ci`). Skill + Terraform live at
+`/srv/incus-terraform-nix-vm-creator/` (`skills/incus-terraform/SKILL.md`); read that for full usage.
+
+- **Access:** b1 is on the *same* cc-ci tailnet, so reach the Incus API through the existing
+  `cc-ci-tailscaled` SOCKS proxy (`127.0.0.1:1055`) with the mTLS certs in that repo's
+  `terraform-secrets/` — no second tailscaled needed. Quick check:
+  ```bash
+  CRT=/srv/incus-terraform-nix-vm-creator/terraform-secrets/terraform.crt
+  KEY=/srv/incus-terraform-nix-vm-creator/terraform-secrets/terraform.key
+  curl --proxy socks5h://localhost:1055 --cert "$CRT" --key "$KEY" -k -s \
+    'https://100.117.251.31:8443/1.0/instances/cc-nix-test/state?project=terraform-ci'
+  ```
+- **Soft restart (keeps the disk — preferred):** `PUT .../1.0/instances/cc-nix-test/state?project=terraform-ci`
+  with `{"action":"restart"}` (or `"stop"` / `"start"`).
+- **Full recreate (last resort):** the Terraform module in `/srv/incus-terraform-nix-vm-creator/projects/`
+  (`terraform apply` with `-var incus_remote_address=100.117.251.31 -var incus_project=terraform-ci
+  -var ts_auth_key=$TSKEY`). ⚠ **Recreating wipes the VM disk** — you must then re-apply the cc-ci
+  preconditions: the pre-issued TLS cert into `/var/lib/ci-certs/live/` and the
+  `cc-ci-root-ed25519` pubkey into root's `authorized_keys` (see the access notes), and the loops
+  re-run §1 Bootstrap. Prefer a soft restart; only recreate if the VM is truly unrecoverable.
+
+(Project cap: keep total RAM across `terraform-ci` instances under 10 GB — check before recreating.)
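The following is a minimal sketch of a helper that builds the soft-restart call above as a `curl` argument list. Nothing is executed, so the command can be reviewed before it touches the VM. It assumes Incus's `PUT /1.0/instances/<name>/state` endpoint with a JSON body carrying `"action"` and `"timeout"` (seconds). The host, project, proxy and cert paths are the values given in this section. `incus_state_cmd` and the constant names are hypothetical.

```python
import json
import shlex
from urllib.parse import urlencode

# Values from kickoff.md; adjust if the VM or its host moves.
INCUS_API = "https://100.117.251.31:8443"
INCUS_PROJECT = "terraform-ci"
SECRETS_DIR = "/srv/incus-terraform-nix-vm-creator/terraform-secrets"
SOCKS_PROXY = "socks5h://localhost:1055"
ACTIONS = {"start", "stop", "restart"}


def incus_state_cmd(instance: str, action: str, timeout: int = 60) -> list[str]:
    """Build a curl argv asking Incus to change `instance`'s power state.

    Assumes the Incus REST endpoint PUT /1.0/instances/<name>/state; nothing is run here.
    """
    if action not in ACTIONS:
        raise ValueError(f"unsupported action {action!r}; expected one of {sorted(ACTIONS)}")
    url = f"{INCUS_API}/1.0/instances/{instance}/state?{urlencode({'project': INCUS_PROJECT})}"
    body = json.dumps({"action": action, "timeout": timeout})
    return [
        "curl", "--proxy", SOCKS_PROXY,
        "--cert", f"{SECRETS_DIR}/terraform.crt", "--key", f"{SECRETS_DIR}/terraform.key",
        "-k", "-s", "-X", "PUT", "-H", "Content-Type: application/json",
        "--data", body, url,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command for review; run it with subprocess.run(cmd, check=True) once satisfied.
    print(shlex.join(incus_state_cmd("cc-nix-test", "restart")))
```

`"delete"` is deliberately left out of the allowed actions, so the helper can only power-cycle the VM and never destroy it. Incus typically answers a state change with a background operation, so re-run the `GET .../state` quick check afterwards to confirm the VM came back.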
+
+## Manual launch (no script)
+
+If you'd rather not use `launch.sh`, start each agent interactively yourself (same result, no
+supervision/restart), passing the prompt as a positional argument so the session stays interactive
+and remote-controllable:
+
+```bash
+claude --remote-control 'cc-ci-builder' --dangerously-skip-permissions "$(cat prompts/builder.md)"
+claude --remote-control 'cc-ci-adv' --dangerously-skip-permissions "$(cat prompts/adversary.md)"
+```
+
+Do **not** pipe the prompt (`cat prompts/builder.md | claude …`) — that forces print/headless mode,
+which breaks `/loop` and remote control.
diff --git a/cc-ci-plan/launch.sh b/cc-ci-plan/launch.sh
new file mode 100755
index 0000000..ab9f17f
--- /dev/null
+++ b/cc-ci-plan/launch.sh
@@ -0,0 +1,203 @@
+#!/usr/bin/env bash
+#
+# launch.sh — start and supervise the two cc-ci autonomous loops + a watchdog.
+#
+# Model (see plan.md §6 / §6.1): two INDEPENDENT Claude Code sessions —
+#   • Builder   (tmux session: cc-ci-builder)  working clone /srv/cc-ci/cc-ci
+#   • Adversary (tmux session: cc-ci-adv)      working clone /srv/cc-ci/cc-ci-adv
+# coordinating only through the git repo on git.autonomic.zone.
+#
+# Each agent self-paces with a `/loop` (ScheduleWakeup) — that handles ITERATION.
+# This script's watchdog handles RESILIENCE: it restarts a session that has died
+# and stops everything once STATUS.md reports "## DONE".
+#
+# Usage:
+#   ./launch.sh start        # start both loops + watchdog (idempotent)
+#   ./launch.sh watchdog     # run only the supervision loop in the foreground
+#   ./launch.sh status       # show session + DONE state
+#   ./launch.sh logs builder|adversary|watchdog   # tail a session/log
+#   ./launch.sh stop         # stop both loops + watchdog
+#
+# Configure via env vars (defaults below). CC_CI_REPO already defaults to the planned
+# repo URL (polled for DONE); override it only if the Builder creates the repo elsewhere.
+
+set -euo pipefail
+
+# ----- config -------------------------------------------------------------
+PLAN_DIR="${PLAN_DIR:-/srv/cc-ci/cc-ci-plan}"
+CLAUDE_BIN="${CLAUDE_BIN:-claude}"
+# Flags for unattended operation in a sandbox. Override if your setup differs.
+CLAUDE_FLAGS="${CLAUDE_FLAGS:---dangerously-skip-permissions}"
+# REMOTE_CONTROL=1 launches each agent as an INTERACTIVE session with --remote-control,
+# viewable/steerable at claude.ai/code (and the Claude mobile app). Set REMOTE_CONTROL=0 for a
+# plain interactive session with no remote surface. Either way the session is interactive, which
+# /loop + ScheduleWakeup require (a piped/print-mode session cannot self-pace). The box must be
+# logged into the claude.ai account (check with `claude auth status`). Each agent gets its own
+# RC session named after its tmux session.
+REMOTE_CONTROL="${REMOTE_CONTROL:-1}"
+
+BUILDER_DIR="${BUILDER_DIR:-/srv/cc-ci/cc-ci}"       # Builder's repo clone (it creates this)
+ADV_DIR="${ADV_DIR:-/srv/cc-ci/cc-ci-adv}"           # Adversary's repo clone
+WATCH_DIR="${WATCH_DIR:-/srv/cc-ci/.cc-ci-watch}"    # tiny clone the watchdog reads STATUS.md from
+LOG_DIR="${LOG_DIR:-/srv/cc-ci/.cc-ci-logs}"
+
+CC_CI_REPO="${CC_CI_REPO:-https://git.autonomic.zone/recipe-maintainers/cc-ci.git}"  # CI project repo (DONE detection); harmless until the Builder creates it
+CC_CI_BRANCH="${CC_CI_BRANCH:-main}"
+
+WATCH_INTERVAL="${WATCH_INTERVAL:-300}"   # seconds between watchdog checks
+
+BUILDER_SESSION="cc-ci-builder"
+ADV_SESSION="cc-ci-adv"
+WATCHDOG_SESSION="cc-ci-watchdog"
+# --------------------------------------------------------------------------
+
+log() { printf '[launch %(%H:%M:%S)T] %s\n' -1 "$*"; }
+die() { log "ERROR: $*"; exit 1; }
+
+need() { command -v "$1" >/dev/null 2>&1 || die "missing dependency: $1"; }
+
+preflight() {
+  need tmux
+  command -v "$CLAUDE_BIN" >/dev/null 2>&1 || die "claude CLI not found (set CLAUDE_BIN)"
+  [[ -f "$PLAN_DIR/prompts/builder.md" ]]   || die "missing $PLAN_DIR/prompts/builder.md"
+  [[ -f "$PLAN_DIR/prompts/adversary.md" ]] || die "missing $PLAN_DIR/prompts/adversary.md"
+  mkdir -p "$LOG_DIR"
+}
+
+session_alive() { tmux has-session -t "$1" 2>/dev/null; }
+
+# Start one agent loop in its own tmux session, cd'd into its working dir, with
+# the kickoff prompt passed to claude as a positional argument (see below for why
+# not stdin).
+start_agent() {
+  local session="$1" workdir="$2" prompt_file="$3"
+  if session_alive "$session"; then
+    log "$session already running — leaving it"
+    return 0
+  fi
+  mkdir -p "$workdir"
+  log "starting $session (cwd=$workdir, remote_control=$REMOTE_CONTROL)"
+  # tmux gives claude a real PTY, so we run claude INTERACTIVELY (required for /loop +
+  # ScheduleWakeup). The kickoff prompt is passed as a POSITIONAL argument via an inner
+  # `$(cat ...)` — NOT piped on stdin, because piping forces print/headless mode which
+  # breaks both interactivity and --remote-control. The `\$(...)` defers to the inner shell
+  # so the whole multi-line prompt arrives as a single argument.
+  local rc=""
+  [[ "$REMOTE_CONTROL" == "1" ]] && rc="--remote-control '$session'"
+  tmux new-session -d -s "$session" -c "$workdir" \
+    "$CLAUDE_BIN $rc $CLAUDE_FLAGS \"\$(cat '$prompt_file')\""
+  # Log the pane WITHOUT redirecting claude's stdout: a `>>log` redirect makes stdout a
+  # non-tty and drops claude out of interactive/remote-control mode. pipe-pane mirrors the
+  # live pane to the log file while claude keeps the PTY tmux gave it.
+  tmux pipe-pane -o -t "$session" "cat >> '$LOG_DIR/$session.log'"
+}
+
+start_loops() {
+  start_agent "$BUILDER_SESSION" "$BUILDER_DIR" "$PLAN_DIR/prompts/builder.md"
+  start_agent "$ADV_SESSION"     "$ADV_DIR"     "$PLAN_DIR/prompts/adversary.md"
+}
+
+# Returns 0 (true) if the repo's STATUS.md contains a "## DONE" heading.
+is_done() {
+  [[ -n "$CC_CI_REPO" ]] || return 1
+  if [[ ! -d "$WATCH_DIR/.git" ]]; then
+    git clone --depth 1 --branch "$CC_CI_BRANCH" "$CC_CI_REPO" "$WATCH_DIR" >/dev/null 2>&1 || return 1
+  fi
+  git -C "$WATCH_DIR" fetch --depth 1 origin "$CC_CI_BRANCH" >/dev/null 2>&1 || return 1
+  git -C "$WATCH_DIR" reset --hard "origin/$CC_CI_BRANCH" >/dev/null 2>&1 || return 1
+  grep -qE '^##[[:space:]]+DONE' "$WATCH_DIR/STATUS.md" 2>/dev/null
+}
+
+watchdog_loop() {
+  log "watchdog up (interval=${WATCH_INTERVAL}s, repo=${CC_CI_REPO:-})"
+  while true; do
+    # 1) DONE? then wind everything down.
+    if is_done; then
+      log "STATUS.md reports ## DONE — stopping loops."
+      stop_loops
+      log "watchdog exiting (project complete)."
+      exit 0
+    fi
+    # 2) restart any dead loop (resilience the in-session /loop can't provide).
+    if ! session_alive "$BUILDER_SESSION"; then
+      log "builder session gone — restarting"
+      start_agent "$BUILDER_SESSION" "$BUILDER_DIR" "$PLAN_DIR/prompts/builder.md"
+    fi
+    if ! session_alive "$ADV_SESSION"; then
+      log "adversary session gone — restarting"
+      start_agent "$ADV_SESSION" "$ADV_DIR" "$PLAN_DIR/prompts/adversary.md"
+    fi
+    sleep "$WATCH_INTERVAL"
+  done
+}
+
+start_watchdog() {
+  if session_alive "$WATCHDOG_SESSION"; then
+    log "watchdog already running"
+    return 0
+  fi
+  log "starting watchdog"  # realpath: $0 may be relative, but the session's cwd is $PLAN_DIR
+  tmux new-session -d -s "$WATCHDOG_SESSION" -c "$PLAN_DIR" \
+    "exec >>'$LOG_DIR/watchdog.log' 2>&1; '$(realpath "$0")' watchdog"
+}
+
+stop_loops() {
+  for s in "$BUILDER_SESSION" "$ADV_SESSION"; do
+    if session_alive "$s"; then log "killing $s"; tmux kill-session -t "$s" || true; fi
+  done
+}
+
+cmd_status() {
+  for s in "$BUILDER_SESSION" "$ADV_SESSION" "$WATCHDOG_SESSION"; do
+    if session_alive "$s"; then echo "  $s: RUNNING"; else echo "  $s: stopped"; fi
+  done
+  if [[ -n "$CC_CI_REPO" ]]; then
+    if is_done; then echo "  project: ## DONE"; else echo "  project: in progress"; fi
+  else
+    echo "  project: (CC_CI_REPO unset — DONE-detection disabled)"
+  fi
+}
+
+case "${1:-}" in
+  start)
+    preflight
+    start_loops
+    start_watchdog
+    log "started. inspect with: ./launch.sh status  |  attach: tmux attach -t $BUILDER_SESSION"
+    ;;
+  watchdog) preflight; watchdog_loop ;;
+  status)   cmd_status ;;
+  logs)
+    case "${2:-}" in
+      builder)   tail -f "$LOG_DIR/$BUILDER_SESSION.log" ;;
+      adversary) tail -f "$LOG_DIR/$ADV_SESSION.log" ;;
+      watchdog)  tail -f "$LOG_DIR/watchdog.log" ;;
+      *) die "usage: $0 logs builder|adversary|watchdog" ;;
+    esac
+    ;;
+  stop)
+    stop_loops
+    if session_alive "$WATCHDOG_SESSION"; then log "killing $WATCHDOG_SESSION"; tmux kill-session -t "$WATCHDOG_SESSION" || true; fi
+    log "stopped."
+    ;;
+  *)
+    cat <}
+  CLAUDE_BIN     = $CLAUDE_BIN
+  CLAUDE_FLAGS   = $CLAUDE_FLAGS
+  REMOTE_CONTROL = $REMOTE_CONTROL  (1 = interactive --remote-control, viewable at claude.ai/code)
+  BUILDER_DIR    = $BUILDER_DIR
+  ADV_DIR        = $ADV_DIR
+  WATCH_INTERVAL = ${WATCH_INTERVAL}s
+EOF
+    ;;
+esac
diff --git a/cc-ci-plan/plan.md b/cc-ci-plan/plan.md
new file mode 100644
index 0000000..fade52c
--- /dev/null
+++ b/cc-ci-plan/plan.md
@@ -0,0 +1,635 @@
+# cc-ci — Co-op Cloud Recipe CI Server (Autonomous Build Plan)
+
+**Status:** ACTIVE — autonomous loop
+**Owner agent:** Builder (primary) + Adversary (reviewer)
+**Source brief:** `brief.md` (do not edit; this file supersedes it)
+**This file's canonical path:** `/srv/cc-ci/cc-ci-plan/plan.md`
+**Target server:** `cc-ci` (NixOS)
+**Code/config home:** `git.autonomic.zone/recipe-maintainers/cc-ci` (the CI project repo — distinct from this
+`/srv/cc-ci/cc-ci-plan/` planning+launch folder)
+**Last updated:** keep current via `STATUS.md` (see §7)
+
+---
+
+## 0. How to read this document
+
+This plan is written to be handed to an **autonomous Claude agent running in a sandbox over
+several days**, driving itself in a loop until the CI server is "done" per §2. A second agent
+(the **Adversary**) independently tries to disprove every "done" claim. Neither agent is
+trusted to mark its own work complete.
+
+If you are an agent waking up into this loop for the first time, go straight to **§1 Bootstrap**.
+On every subsequent wake, go to **§7 The Loop Protocol** and continue from `STATUS.md`.
+
+The rest of the document (§3–§6) is the technical design. Treat it as the default architecture,
+but you are allowed to revise it when reality disagrees — record any deviation in `DECISIONS.md`
+with a one-line rationale.
+
+---
+
+## 1. Bootstrap (first wake only)
+
+Do these in order. Each step is idempotent; re-running is safe.
+
+1. **Verify access.** (Full credential map + how each is used is in **§1.5** — read it first.)
+   - `ssh cc-ci 'hostname && whoami'` — you log in as **root** on cc-ci (NixOS), so there is no
+     separate sudo step. `ssh cc-ci` is preconfigured to tunnel through the userspace-tailscaled
+     SOCKS proxy (§1.5); if it fails, the proxy/daemon is probably down — restart it (§1.5) before
+     declaring blocked.
+   - `ssh cc-ci 'nixos-version'` — confirm NixOS.
+   - Confirm you can reach the Gitea API with the bot creds from `.testenv` (§1.5):
+     `curl -s https://$GITEA_URL/api/v1/version` proves reachability only (it is unauthenticated);
+     `curl -s -u "$GITEA_USERNAME:$GITEA_PASSWORD" https://$GITEA_URL/api/v1/user` proves the creds.
+     The bot authenticates with `GITEA_USERNAME`/`GITEA_PASSWORD` (basic auth) or a token you mint
+     from them via `POST /api/v1/users//tokens` — do **not** expect a ready-made `$GITEA_TOKEN`.
+   - Confirm the **preconfigured** test-app DNS (§4.0/§4.4): a random subdomain under the wildcard
+     resolves, e.g. `getent hosts probe-$RANDOM.ci.commoninternet.net` returns the **gateway's** IP
+     (not cc-ci's — the gateway TLS-passthroughs to cc-ci, so do not expect cc-ci's address; and use
+     `getent`, not `dig`, since this host's resolver is Tailscale-only — see §1.5).
+     Traefik is *not* up yet — you configure it (file provider → the pre-issued cert at
+     `/var/lib/ci-certs/live/`, **no ACME**); the DNS record + gateway passthrough + cert are the
+     preconditions, and full end-to-end HTTPS reachability is proven at M1, not now.
+     If the wildcard does not resolve at all, that's a `## Blocked` item (operator fixes DNS/gateway).
+   - If any check fails, write the failure to `STATUS.md` under `## Blocked` and stop — a human must fix access. Do **not** try to work around missing access.
+
+2. **Create the `cc-ci` repo** on git.autonomic.zone if it does not exist. Push an initial
+   skeleton (see §3 layout). The Builder clones to `/srv/cc-ci/cc-ci`; the Adversary loop keeps
+   its **own independent clone** at `/srv/cc-ci/cc-ci-adv`. The repo is the only channel between
+   the two loops (§6.1) — loop state lives inside it (`STATUS.md`, `BACKLOG.md`, etc.).
+
+3. **Snapshot the starting environment** into `cc-ci/docs/baseline.md`: current NixOS config on
+   the server (`/etc/nixos` or existing flake), installed packages, whether Docker/Swarm/abra
+   already exist, DNS that already points at the box. This is the rollback reference.
+
+4. **Seed the loop state files** (§7) if absent: `STATUS.md`, `BACKLOG.md`, `REVIEW.md`,
+   `JOURNAL.md`, `DECISIONS.md`. Give `BACKLOG.md` two H2 sections — `## Build backlog`
+   (populated from §5 milestones) and `## Adversary findings` (empty) — per the single-writer
+   rule in §6.1.
+
+5. Commit ("chore: bootstrap cc-ci loop state") and begin the loop at §7.
+
+---
+
+## 1.5 Credentials & access — where everything lives and how to use it
+
+The loops run **on the sandbox host** (not on cc-ci) and reach cc-ci over Tailscale. This section
+is the authoritative map of what credentials exist, where, and how to use them. **Never copy any
+secret value into the repo, a commit, a log, or the dashboard** (§9) — reference locations only.
+
+### Provided credentials (already in place)
+
+| What | Where | How to use |
+|---|---|---|
+| **Tailscale auth key** (joins cc-ci's tailnet `taila4a0bf.ts.net`) | `/srv/cc-ci/.testenv` → `TS_AUTH_KEY` (Tailscale SaaS key, keyID ends `CNTRL`) | Used to bring up the userspace tailscaled (below). It's reusable; re-run `tailscale up` with it if the node drops. |
+| **cc-ci SSH (root)** | private key `~/.ssh/cc-ci-root-ed25519`; config `Host cc-ci` in `~/.ssh/config` | Just run `ssh cc-ci` (logs in as **root**). The pubkey is already in cc-ci's `/root/.ssh/authorized_keys`. |
+| **Gitea bot account** | `/srv/cc-ci/.testenv` → `GITEA_USERNAME` (`autonomic-bot`), `GITEA_PASSWORD`, `GITEA_URL` (`git.autonomic.zone`) | Basic-auth to the Gitea API, or mint a scoped token: `POST https://$GITEA_URL/api/v1/users/$GITEA_USERNAME/tokens`. Used to create/push the `cc-ci` repo, read recipe repos, comment on PRs, and register `!testme` webhooks. |
+
+Load them in a shell with: `set -a; . /srv/cc-ci/.testenv; set +a` (don't echo the values).
+
+### The Tailscale connection (how `ssh cc-ci` and the proxy work)
+
+cc-ci (`cc-nix-test`, **100.90.116.4**) is on a *different* tailnet than the sandbox host's default
+one, so it is reached via a **second, userspace tailscaled** — this keeps the host's own tailnet
+untouched. State lives in `~/.cc-ci-ts/`; it exposes a **SOCKS5/HTTP proxy on `127.0.0.1:1055`**,
+which is the only route to that tailnet (userspace networking ⇒ the host OS can't route the tailnet
+IPs directly).
+
+It runs as a **persistent systemd service** (`cc-ci-tailscaled.service`, enabled, `Restart=always`,
+starts on boot; unit at `/etc/systemd/system/cc-ci-tailscaled.service`, runs as user `notplants`).
+It reuses the already-authenticated state in `~/.cc-ci-ts/`, so it reconnects across reboots/crashes
+without the auth key.
+
+- `ssh cc-ci` works out of the box (its `ProxyCommand` uses the proxy; logs in as root).
+- For HTTP(S) to cc-ci / `*.ci.commoninternet.net` from the sandbox, go through the proxy, e.g.
+  `curl --proxy socks5h://localhost:1055 https://.ci.commoninternet.net`.
+- **If connectivity is down:** `sudo systemctl restart cc-ci-tailscaled` (diagnose with
+  `systemctl status cc-ci-tailscaled` / `journalctl -u cc-ci-tailscaled`). A dead proxy is an access
+  failure to recover, not a `## Blocked`-and-stop condition — *unless* the auth key itself is
+  rejected (then re-auth with `tailscale --socket=$HOME/.cc-ci-ts/tailscaled.sock up
+  --auth-key="$TS_AUTH_KEY" --hostname=cc-ci-claude-sandbox --accept-routes --accept-dns=false`, and
+  if that fails the key is a class-A1 blocker).
+- **DNS gotcha:** this host's `/etc/resolv.conf` lists only Tailscale resolvers, so direct
+  `dig @1.1.1.1 …` queries get no answer and look falsely empty. Use `getent hosts ` to
+  resolve from the sandbox. `commoninternet.net` itself is a normal public zone hosted at **Gandi**.
+
+### Credentials the loop GENERATES itself (do not wait on a human for these)
+
+- **Drone RPC secret** and **webhook HMAC secret** — generate (`openssl rand -hex 32`), store
+  sops-encrypted in `secrets/`, and wire both ends. Internal shared secrets, not human inputs.
+- **Gitea OAuth app for Drone** — create it under the bot account via the API
+  (`POST /api/v1/user/applications/oauth2`); capture client id/secret into `secrets/`.
+- **cc-ci host age/GPG key for sops** — generate on the host (or derive from its SSH host key);
+  add as a sops recipient. Keep a recovery copy of the master age identity off-box if desired.
+- **Per-recipe app secrets** (class-B, §4.4) — the harness generates these per run.
+
+### Credentials STILL NEEDED from the operator (class-A — block if missing, per §9)
+
+- **Wildcard TLS cert — PROVIDED, not a token.** The operator has pre-issued the wildcard SAN cert
+  (`*.ci.commoninternet.net` + `ci.commoninternet.net`) and placed it on cc-ci at
+  `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` (§4.0). The agent points Traefik's file
+  provider at those paths and runs **no ACME** for this domain. **Do not request or expect a
+  `commoninternet.net` DNS token** — issuance/renewal is handled out-of-band by the operator (LE
+  90-day cert; next renewal ~2026-08-24). A missing/expired cert is a finding for the operator, not
+  an agent re-issue.
+- **Registry pull credentials** (e.g. Docker Hub) — *recommended* to avoid anonymous pull-rate
+  limits breaking deploys under load. Treat a rate-limit failure traced to this as a finding, then
+  request creds. Store sops-encrypted in `secrets/`.
+- **Gitea bot permissions** (a grant, not a secret) — confirm `autonomic-bot` can: create/push
+  `recipe-maintainers/cc-ci`, read the recipe repos to be enrolled, comment on their PRs, and add
+  webhooks to them. If any is missing, that's a `## Blocked` item for the operator to fix.
+
+---
+
+## 2. Definition of Done (the loop's exit condition)
+
+The loop terminates **only** when every item below is true *and the Adversary has independently
+re-verified each one within the last 24h* (logged in `REVIEW.md` with timestamps and command
+output). Partial credit does not count.
+
+- [ ] **D1 — Trigger.** Commenting `!testme` on any open PR in any enrolled recipe repo on
+      git.autonomic.zone starts a CI run for the code *at that PR's head commit* within 60s.
+      Other comments do not. Re-commenting re-runs.
+- [ ] **D2 — Test matrix.** For a recipe under test, the run executes, as separate reported
+      stages: **new install**, **upgrade** (previous published version → PR version), and
+      **backup + restore**. All are genuine end-to-end against a really-deployed recipe (real
+      containers, real Traefik routing, real volumes) — no mocks, no stubs.
+- [ ] **D3 — Python + Playwright.** Tests are Python. Functional assertions that require a
+      browser use Playwright against the live deployed app.
+- [ ] **D4 — Recipe-local tests.** If the recipe repo contains its own `tests/` folder, those
+      tests are also discovered and run as part of the same CI run, with results merged in.
+- [ ] **D5 — Per-recipe test tree.** The cc-ci repo holds `tests//` with the
+      install/upgrade/backup tests as Python files, plus a shared harness. Adding a new recipe is
+      a documented, small, repeatable operation.
+- [ ] **D6 — Secrets.** App + infra secrets are handled reproducibly (committed encrypted,
+      decrypted on the server), documented, and rotatable. No plaintext secrets in git, logs, or
+      the results UI.
+- [ ] **D7 — Results UX.** Each run has a stable URL with live, tail-able logs per stage and a
+      final pass/fail; there is an overview page listing recipes with their latest status —
+      look-and-feel comparable to the YunoHost app CI (`ci-apps.yunohost.org`). A PR comment links
+      back to its run and reflects the outcome.
+- [ ] **D8 — Reproducible server.** The entire server (Drone, runner, comment bridge, swarm,
+      Traefik, dashboard, secrets wiring) is declared in the `cc-ci` repo's NixOS flake and can be
+      rebuilt from scratch onto a blank NixOS host following `docs/install.md`, verified by the
+      Adversary doing exactly that on a throwaway VM (or documenting why a full from-scratch
+      rebuild was infeasible and what was tested instead).
+- [ ] **D9 — Documentation.** `README.md` + `docs/` explain architecture, how to enroll a recipe,
+      how to add/run tests locally, how to operate/rotate secrets, and how to debug a failed run.
+      A new engineer can enroll a recipe and get a green run using only the docs.
+- [ ] **D10 — Proof (breadth).** At least **six real recipes** spanning the meaningful
+      categories have a full green run triggered by `!testme` on a real PR, with all three stages
+      (install / upgrade / backup+restore) actually exercised. The set must cover:
+      a stateless/simple app, a single-DB app, a multi-service app, an SSO/identity app, and an
+      object-storage/large-volume app. **Target set (all previously verified deployable):**
+      `hedgedoc` (simple), `cryptpad` (stateful, no external DB), `keycloak` + `authentik`
+      (SSO/identity, DB-backed), `lasuite-docs` and/or `lasuite-drive` (multi-service + S3/MinIO),
+      `matrix-synapse` (DB + media store), `immich` (large volumes + Postgres), `bluesky-pds`
+      (TLS-passthrough/atproto). Pick six that together satisfy the categories; record the chosen
+      set and per-recipe green-run evidence in `REVIEW.md`. Any recipe that genuinely cannot be CI'd
+      is a documented finding (in `DECISIONS.md`) with the reason, not a silent omission.
+
+When all of D1–D10 hold and are Adversary-verified, write `## DONE` to `STATUS.md` with the
+evidence links and stop scheduling new iterations.
+
+---
+
+## 3. Repository layout (`git.autonomic.zone/recipe-maintainers/cc-ci`)
+
+```
+cc-ci/
+├── README.md
+├── flake.nix                  # NixOS host(s) + devshell
+├── flake.lock
+├── hosts/
+│   └── cc-ci/
+│       ├── configuration.nix  # the cc-ci machine
+│       └── hardware.nix
+├── modules/
+│   ├── drone.nix              # Drone server + runner (exec/docker)
+│   ├── comment-bridge.nix     # !testme webhook listener service
+│   ├── swarm.nix              # Docker + single-node swarm + Traefik for test apps
+│   ├── dashboard.nix          # results overview site
+│   └── secrets.nix            # sops-nix / agenix wiring
+├── secrets/                   # sops-encrypted (*.enc / *.age); see §4.4
+│   └── secrets.yaml
+├── bridge/                    # comment-bridge source (small Go/Python service)
+├── runner/                    # CI orchestration entrypoint invoked by Drone
+│   ├── run_recipe_ci.py       # top-level: deploy→test→teardown for a recipe@ref
+│   └── harness/               # shared pytest fixtures (abra wrappers, app lifecycle)
+├── dashboard/                 # results UI generator (reads Drone API → static site)
+├── tests/
+│   ├── conftest.py            # shared fixtures, recipe selection, teardown guarantees
+│   ├── /
+│   │   ├── test_install.py
+│   │   ├── test_upgrade.py
+│   │   ├── test_backup.py
+│   │   └── playwright/        # e2e flows for this recipe
+│   └── _template/             # copy-to-add-a-recipe template
+├── docs/
+│ ├── install.md # from-scratch server build (D8) +│ ├── enroll-recipe.md # how to add a recipe (D5) +│ ├── secrets.md # secret model + rotation (D6) +│ ├── architecture.md +│ ├── runbook.md # debugging failed runs +│ └── baseline.md # bootstrap snapshot +├── STATUS.md BACKLOG.md REVIEW.md JOURNAL.md DECISIONS.md # loop state (§7) +└── .drone.yml # pipeline for cc-ci's own repo (lint/self-test) +``` + +--- + +## 4. Technical design (default architecture) + +### 4.0 Domain model (where things live) + +Two DNS zones, deliberately separated — do **not** conflate them: + +- **`git.autonomic.zone` — source of truth for code (unchanged, not ours to reconfigure).** + The Gitea host: the enrolled recipe repos and the `cc-ci` config repo live here. The loop reads, + comments, and (when enrolling) adds a webhook here, but deploys **nothing** here. Per §9 this zone + is read/comment-only — never push recipe code, never point app DNS at it. +- **`commoninternet.net` — the CI server's own zone; everything CI-facing.** A wildcard + `*.ci.commoninternet.net` resolves to a **gateway** (not cc-ci directly — see Network path below). + Under it: + - **Apps under test:** each run deploys to a unique subdomain + `-pr-.ci.commoninternet.net`, so concurrent runs never collide on a + hostname. The subdomain (app, volumes, secrets, Traefik route) is torn down at run end (§4.3). + - **Results dashboard:** `ci.commoninternet.net` — overview page + per-recipe status badges (§4.5). + - **Webhook bridge:** `ci.commoninternet.net/hook` — the Gitea `issue_comment` receiver (§4.1). +- **Network path (gateway → TLS passthrough → cc-ci).** The wildcard record does **not** point at + cc-ci's IP. It points at a gateway that **passes TLS through** to cc-ci: the gateway routes by SNI + and forwards the raw encrypted stream without decrypting it, so TLS still **terminates on cc-ci's + Traefik**. 
Consequences the agent must respect: + - `dig .ci.commoninternet.net` returns the **gateway's** IP, not cc-ci's — do not assert the + record points at cc-ci. Reachability is proven end-to-end (an HTTPS request lands on cc-ci), + not by comparing A records. + - The gateway is assumed to pass through the **whole wildcard**, so a fresh per-run subdomain needs + **no gateway change** and **no cert work** (the pre-issued wildcard already covers it) — the + agent only adds the Traefik **router** on cc-ci. (If the gateway + instead needs per-host config, that's an operator/gateway concern and a `## Blocked` item, not + something the agent reconfigures — the gateway is not ours, only cc-ci is, per §9.) + - The gateway is operator-managed and out of scope; the agent configures only cc-ci. + - **Caveat for TLS-passthrough recipes** (e.g. `bluesky-pds`, §2 D10): the default path terminates + TLS at cc-ci's Traefik. A recipe that expects to terminate TLS in its own container needs cc-ci's + Traefik configured to pass through that host too (the outer gateway already passes the whole + wildcard). Treat this as a per-recipe harness quirk to absorb (§5 M6.5), or pick a non-passthrough + recipe for that D10 category and record the swap in `DECISIONS.md` — not a silent omission. +- **Wildcard TLS — operator pre-issues, agent serves it statically (no token in the agent).** + Routing and certs are separate: the preconfigured wildcard DNS solves routing only; a cert is + still needed because the gateway passes TLS through and cc-ci's Traefik terminates it. **The cert + is pre-provisioned out-of-band** so the DNS-editing token never enters the agent/repo. A wildcard + SAN cert covering **`*.ci.commoninternet.net` + `ci.commoninternet.net`** (issued via Let's + Encrypt DNS-01 against Gandi, by the operator, using a token the agent never sees) lives on cc-ci: + - `/var/lib/ci-certs/live/fullchain.pem` (leaf+intermediate) and `…/privkey.pem`.
+ - The agent configures **Traefik's file provider** (`tls.certificates`, `certFile`/`keyFile` + pointing at those paths) to serve it, and runs **no ACME resolver** for this domain. One cert + covers every per-run subdomain, so spinning up an app domain needs no cert work at all. + - **Renewal is a manual operator task** (LE 90-day cert): the operator re-issues out-of-band and + drops the new files at the same paths (Traefik file provider hot-reloads). The agent must **not** + attempt ACME/DNS-01 for `commoninternet.net` and must **not** expect a DNS token — a missing/ + expired cert is an operator action, surfaced as a finding, not something the agent re-issues. + (Rationale for choosing a wildcard cert over per-subdomain: a wildcard is reused for every churning + run subdomain and sidesteps LE's 50-certs/week-per-domain limit; only DNS-01 can mint a wildcard. + We keep that DNS-01 issuance with the operator rather than handing the agent the zone token.) +- Record the live facts in `docs/install.md`: the zone + DNS provider (Gandi), that the wildcard + `*.ci.commoninternet.net` (and bare `ci.commoninternet.net`) point at the **gateway**, that the + gateway TLS-passthroughs the wildcard to cc-ci, the gateway's address, the TTL, and that the + wildcard cert is pre-issued/operator-renewed at `/var/lib/ci-certs/live/` (no DNS token on cc-ci). + +### 4.1 The `!testme` trigger path + +Gitea does not natively forward PR-comment events to Drone, and Drone's built-in triggers fire on +push/PR-open, not on a magic comment. 
So: + +``` +PR comment "!testme" + │ Gitea webhook (issue_comment event) ──► comment-bridge (modules/comment-bridge.nix) + │ • verifies webhook HMAC secret + │ • checks comment body == "!testme" (exact, trimmed) + │ • checks commenter is allowed (org member / collaborator) + │ • resolves PR head repo + SHA via Gitea API + │ • calls Drone API: build for cc-ci pipeline, + │ params RECIPE= REF= PR= SRC= + ▼ +Drone build (cc-ci repo pipeline, parameterized) ──► runner/run_recipe_ci.py + ▼ +Bridge posts/updates a Gitea PR comment with the run URL and (on completion) pass/fail. +``` + +- The bridge is a tiny service (Go or Python+FastAPI). Keep it dependency-light; it's a NixOS + systemd service behind Traefik at e.g. `ci.commoninternet.net/hook` (§4.0). +- Enrollment = registering the Gitea webhook on a recipe repo (script in `runner/` or documented + in `enroll-recipe.md`) + ensuring a `tests//` dir exists. +- Decide and record in DECISIONS.md: one shared Gitea org-level webhook vs per-repo webhooks. + Org-level is fewer moving parts; per-repo is more explicit. Default: per-repo via enroll script. + +### 4.2 Drone + the test target + +- Drone server connects to Gitea via OAuth app (Gitea → Settings → Applications). Runner is the + **exec runner** (or a privileged docker runner) running **on cc-ci itself**, because tests must + drive `abra` to deploy real recipes onto a real swarm. +- cc-ci doubles as the **deploy target**: single-node Docker Swarm + Traefik, abra installed, + serving the `*.ci.commoninternet.net` wildcard, TLS terminated on cc-ci's Traefik using the + **pre-issued static wildcard cert** at `/var/lib/ci-certs/live/` (§4.0). The operator preconfigures + the wildcard DNS record (→ gateway), the gateway's TLS-passthrough to cc-ci, and the cert itself + (§4.4); the agent configures Traefik (file provider → that cert) and swarm on top — **no ACME**. 
+- Each CI run gets an isolated app domain `-pr-.ci.commoninternet.net` + (§4.0) so concurrent runs don't collide. Teardown removes app, secrets, and volumes. +- Consider a concurrency cap (1–2 deploys at a time) to avoid resource thrash; document it. + +### 4.3 The test harness & recipe test contract + +`runner/run_recipe_ci.py` orchestrates per run: +1. Fetch recipe at `$REF` (the PR head) via abra/git. +2. **Install stage** → `tests//test_install.py`: `abra app new`, generate secrets, + `abra app deploy`, wait healthy, run Playwright smoke + assertions. +3. **Upgrade stage** → deploy previous published version first, then upgrade to `$REF`; assert + data survives and app still healthy. +4. **Backup/restore stage** → `abra app backup`, mutate state, `abra app restore`, assert restored + state matches pre-mutation. +5. **Recipe-local tests (D4)** → if `/tests/` exists, discover & run it in the same + live environment; merge results. +6. **Teardown (always, even on failure)** → `abra app undeploy`, `abra app volume remove`, + `abra app secret remove`, DNS/route cleanup. + +Shared fixtures (`tests/conftest.py` + `runner/harness/`) wrap abra. **Known abra gotchas to bake +in from day one** (carried over from prior work, re-verify on the installed abra version): +- `abra app undeploy` and `abra app volume remove` do **not** accept `--chaos` → never pass it. +- Plumb a `timeout` kwarg through secret-generate/insert/remove-all calls. +- `abra app ls -S -m` returns nested `{server: {apps: [...]}}` — parse the inner structure. +- Pick robust health checks per app (e.g. Keycloak: `/realms/master`, not `/`). + +The teardown guarantee is sacred: a failed test must never leak a deployed app or volume into the +next run. Implement teardown as a pytest fixture finalizer / `try/finally` in the orchestrator and +add a janitor pass at run start that nukes any orphaned `*-pr*` apps older than N hours. 
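A minimal sketch of the janitor's selection step, in Python to match `runner/`. It assumes three things the plan does not pin down: that the top-level keys of `abra app ls -S -m` output are server names, that each app entry carries its name under an `appName` key (unverified; check against the installed abra), and that the harness records each run's start time in a mapping (the `started_at` argument is hypothetical). The teardown argv lists follow step 6 and deliberately omit `--chaos`; any extra arguments the installed abra needs (secret names, non-interactive flags) are left out rather than guessed.

```python
"""Janitor sketch: pick orphaned per-run apps from `abra app ls -S -m` JSON.

Assumptions (hypothetical, verify on the installed abra version):
- top-level keys are server names, each mapping to {"apps": [...]}
- each app entry stores its name under `name_key` (default "appName")
- the harness supplies `started_at`: app name -> run start (epoch seconds)
"""
import fnmatch
import json
import time
from typing import Dict, Iterator, List, Optional

ORPHAN_GLOB = "*-pr*"  # per-run app naming pattern from §4.3


def iter_apps(ls_json: str) -> Iterator[dict]:
    """Flatten the nested {server: {apps: [...]}} structure into app dicts."""
    data = json.loads(ls_json)
    for server, entry in data.items():
        for app in entry.get("apps", []):
            # Remember which server the app came from for later cleanup.
            yield {**app, "_server": server}


def find_orphans(
    ls_json: str,
    started_at: Dict[str, float],
    max_age_hours: float,
    now: Optional[float] = None,
    name_key: str = "appName",
) -> List[str]:
    """Return per-run apps older than max_age_hours, or with no known start."""
    now = time.time() if now is None else now
    cutoff_s = max_age_hours * 3600
    orphans = []
    for app in iter_apps(ls_json):
        name = app.get(name_key, "")
        if not fnmatch.fnmatchcase(name, ORPHAN_GLOB):
            continue  # skip anything outside the per-run naming pattern
        started = started_at.get(name)
        # Policy choice: no recorded start means no live run claims the app.
        if started is None or now - started > cutoff_s:
            orphans.append(name)
    return sorted(orphans)


def teardown_commands(app_name: str) -> List[List[str]]:
    """argv lists in the step-6 teardown order; never includes --chaos."""
    return [
        ["abra", "app", "undeploy", app_name],
        ["abra", "app", "volume", "remove", app_name],
        ["abra", "app", "secret", "remove", app_name],
    ]
```

Feeding each returned name through the same teardown routine the orchestrator's `finally` path uses keeps the janitor and normal teardown from drifting apart. Note that the `*-pr*` glob also matches unrelated names such as `foo-proxy`, so a tighter pattern derived from the real per-run naming scheme may be safer.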
+ +### 4.4 Secrets (D6) + +There are **two distinct classes of secret** and they are handled in opposite ways. Do not +conflate them. + +**(A) Infra secrets.** All of these end up `sops-nix`-encrypted in `secrets/`, decrypt at activation to `/run/secrets` (sops-nix's default), not into the world-readable Nix +store, and are never world-readable. But they split into two sub-classes — see §1.5 +for the concrete locations/usage — and only the first sub-class blocks: + +- **(A1) External inputs — provided by the operator, the loop cannot create them.** The Tailscale + auth key + Gitea bot creds (`/srv/cc-ci/.testenv`, already provided), the **pre-issued wildcard + TLS cert** at `/var/lib/ci-certs/live/` (§4.0 — *not* a DNS token; the agent serves it, never + issues it), and **registry pull creds** (if needed). If one of these is **missing or invalid, the + loop is blocked** — write it to `STATUS.md ## Blocked` and stop (§9). The agent must not invent or + work around an external input it wasn't given, and must **not** attempt ACME/DNS-01 for + `commoninternet.net`. +- **(A2) Internal secrets — the loop generates and manages these itself; never block on them.** + Drone RPC secret + webhook HMAC (`openssl rand`), the Gitea OAuth app for Drone (created via the + bot API), and the cc-ci host age/GPG key for sops. These are *not* human inputs; generate, store + in `secrets/`, and wire both ends. + +Alongside these, three **preconfigured network/cert facts** are operator-provided inputs the loop +also depends on (not secrets the agent makes, but class-A in the same "provided, don't improvise" +sense): (1) the wildcard `*.ci.commoninternet.net` record (and bare `ci.commoninternet.net`) already +points at the **gateway**, (2) the gateway **TLS-passthroughs** that wildcard to cc-ci (SNI-routed, +no decryption — see §4.0 Network path), and (3) the **pre-issued wildcard cert** is in place at +`/var/lib/ci-certs/live/`.
The operator owns the DNS record, the gateway, and cert issuance/renewal; +**everything else on cc-ci is the agent's job** — Traefik (pointed at the static cert), swarm, +per-run subdomain routing, and teardown. If the wildcard does not resolve, the gateway doesn't reach +cc-ci, or the cert is missing/expired, that is a `## Blocked` condition (operator action), not +something to work around (the gateway and DNS are not ours to reconfigure, per §9). + +**(B) Recipe app secrets — generated by the test, persisted within the run.** These are NOT a +blocker and are NOT pre-provisioned by a human. The harness creates them itself for each app under +test and is responsible for persisting them across the run so the multi-stage lifecycle works: + +- **Generate at install:** the harness runs `abra app secret generate` (+ inserts any deterministic + test fixtures like an admin password / test user it chooses) when it deploys the app. +- **Persist for the run's duration:** the *same* generated secrets must survive across stages — + install → upgrade and especially **backup → restore** — because an app cannot be upgraded or + restored against rotated credentials. Persist them in a per-run secret store keyed by the run's + unique app name (e.g. `-pr-`): the live abra/swarm secrets plus a sidecar record + the harness writes (e.g. the app's `.env` + the generated values) to a run-scoped, non-public + location on the runner, so any stage can re-read them. They are ephemeral by design. +- **Destroy at teardown:** the same teardown that removes the app/volumes also runs + `abra app secret remove` (with `timeout` plumbed) and deletes the per-run sidecar. Nothing + generated for a run outlives that run.
+- **How the harness should "figure out" persistence (acceptance for D6):** decide and document one + concrete mechanism — recommended default is "abra/swarm holds the live secrets; the harness keeps + a run-scoped sidecar file under a `runs//` dir on the runner (mode 600), and reloads + from it between stages." Whatever is chosen, it must (1) keep the same values stable across all + three stages, (2) isolate concurrent runs from each other, and (3) leave nothing behind. + +**(C) Drone CI tokens:** store as Drone org/repo secrets, referenced by the pipeline. Where a value +is an external input (A1, e.g. registry creds) it is provided; where it is internal (A2) it is +generated — see the (A) split above. + +Hard rule across all classes: scrub secrets from logs before they reach the dashboard; the results +UI shows sanitized logs only. Add a redaction filter in the log pipeline and an Adversary test that +greps published logs and the overview site for known secret patterns and any generated app +password. + +### 4.5 Results UX (D7) — YunoHost-CI-like + +- **Per-run logs:** Drone's native UI already gives live, per-stage, tail-able logs and a final + status — use it as the canonical run view; the PR comment links to it. +- **Overview page:** a small generator (`dashboard/`) polls the Drone API and renders a static + page at `ci.commoninternet.net` (§4.0): a table of enrolled recipes, latest run status badge + (pass/fail/running), last-tested version, link to history — mirroring the YunoHost app-list + feel. Served by Traefik; regenerated on build-completion webhook or a short timer. +- Provide a status badge endpoint per recipe for embedding in recipe READMEs. + +--- + +## 5. Milestones / initial BACKLOG + +Work top-down; each milestone ends with an **Adversary gate** (Adversary must independently +verify the acceptance check before the next milestone starts). Seed `BACKLOG.md` from this. 
+ +- **M0 — Foundations.** Repo created; flake builds; `nixos-rebuild` (or deploy-rs) applies a + no-op-then-base config to cc-ci; sops decrypts a test secret on the host. + *Accept:* `ssh cc-ci 'systemctl is-system-running'` healthy after a rebuild from the repo. +- **M1 — Swarm + abra target.** Docker + single-node swarm + Traefik up; wildcard DNS + TLS; + abra can deploy and tear down a trivial recipe by hand. + *Accept:* a recipe deployed via abra is reachable over HTTPS at `*.ci.commoninternet.net`, then + fully torn down leaving no volumes. +- **M2 — Drone online.** Drone server+runner via Nix, OAuth to Gitea; a hello-world `.drone.yml` + in cc-ci runs green; logs visible in Drone UI. + *Accept:* push to cc-ci triggers a visible green Drone build. +- **M3 — Comment bridge.** `!testme` on a PR triggers a parameterized Drone build; bridge posts a + PR comment with the run link; non-`!testme` comments and non-collaborators are ignored. + *Accept:* live demo on a scratch PR — comment in, build out, link back, auth enforced. +- **M4 — Harness + install stage.** `run_recipe_ci.py` + conftest; install stage green for one + simple recipe end-to-end with a Playwright assertion; guaranteed teardown. + *Accept:* full green install run for recipe #1, no orphaned app/volume afterward. +- **M5 — Upgrade + backup/restore stages.** Add the other two stages for recipe #1. + *Accept:* upgrade preserves data; backup→mutate→restore returns original state. +- **M6 — Recipe-local tests (D4) + second recipe.** Discover/run recipe-repo `tests/`; enroll a + second, DB-backed recipe via the documented flow. + *Accept:* both recipes green; recipe-local tests demonstrably executed and merged. +- **M6.5 — Breadth ramp.** Enroll recipes 3→6 covering the remaining D10 categories, one at a + time, each via the documented enroll flow (this is the real test of D5: enrolling recipe N + should be template-copy + recipe-specific tests/fixtures, with **no harness surgery**). 
Expect + per-recipe quirks — multi-service deps, S3/MinIO config, SSO client setup, TLS passthrough, + large-volume backups — and absorb them into the *shared* harness, not one-off per-recipe hacks. + When flakiness appears, add real readiness/wait robustness to the harness rather than sprinkling + `sleep`s. Run benchmarks/long deploys **sequentially**, never in parallel (network contention). + *Accept:* recipes 3–6 each have a full three-stage green run; enrolling N≥3 needed no changes to + shared harness code. +- **M7 — Secrets hardening (D6).** Full sops model, rotation doc, log redaction + leak test. + *Accept:* Adversary's secret-grep over published logs finds nothing; rotation doc followed. +- **M8 — Dashboard (D7).** Overview page + badges + PR-comment outcome reflection. + *Accept:* overview matches reality across several runs; outcomes mirrored to PR comments. +- **M9 — Reproducibility + docs (D8/D9).** `docs/install.md` rebuilds the server from scratch on a + blank VM; all docs complete. + *Accept:* Adversary rebuilds from docs onto a throwaway host (or records the tested subset). +- **M10 — Proof (D10).** All six chosen recipes green via real `!testme` PRs (the breadth set from + M6/M6.5 carried through the hardened pipeline), each with install/upgrade/backup-restore + exercised and Adversary-verified; flip `STATUS.md` to DONE. + +--- + +## 6. The two agents + +### Builder (primary) +Implements the backlog top-down. Discipline: +- One backlog item in flight at a time. Small, committed, reversible steps. +- Every change verified against the *real* system (server, Drone, Gitea) before claiming done — + never "should work". Paste the verifying command + output into `JOURNAL.md`. +- Touch production carefully: cc-ci is the only target; never deploy test apps onto unrelated + production servers; never reuse production domains. Idempotent server changes only (via Nix). 
+- If blocked on access/secrets/external state, write it to `STATUS.md ## Blocked` and pick up an + unblocked item rather than hacking around it. + +### Adversary (reviewer) +Runs as a **separate, independent loop in its own process/sandbox** (see §6.1 for how the two +loops coordinate). Its job is to **disbelieve**. It: +- Re-verifies each `Definition of Done` and milestone-acceptance claim independently, from a cold + start (fresh shell, own clone, no cached state), and logs PASS/FAIL + evidence in `REVIEW.md`. +- Actively tries to break things: comment `!testmexyz` (should NOT trigger), comment as a + non-collaborator (should be rejected), push a PR that fails tests (must report red, not green), + kill an app mid-run (teardown must still clean up), grep published logs/dashboard for secrets, + run two `!testme`s concurrently (no domain/volume/secret collision), confirm the same generated + app secrets persist across install→upgrade→backup/restore. +- Files every defect as a `BACKLOG.md` item tagged `[adversary]` with repro steps. The Builder + may not close an adversary item; only the Adversary closes it after re-test. +- Has veto power over `STATUS.md → DONE`. + +### 6.1 Coordination protocol (two independent loops, one shared repo) + +The two loops never talk directly; the **git repo is the only coordination medium**. Each agent +has its own clone (e.g. Builder in `/srv/cc-ci/cc-ci`, Adversary in `/srv/cc-ci/cc-ci-adv`) and +its own pacing. To make concurrent writes conflict-free: + +- **File ownership (one writer each — the other only reads):** + - Builder owns: all source code/config, `STATUS.md`, `JOURNAL.md`, `DECISIONS.md`. + - Adversary owns: `REVIEW.md`. + - `BACKLOG.md` is split into two H2 sections — `## Build backlog` (Builder-only) and + `## Adversary findings` (Adversary-only). Each agent edits **only its own section**, so git + merges the two cleanly. 
Closing an item = checking the box *in your own section*; the Builder + fixes an `[adversary]` finding and notes the fix in JOURNAL, but only the Adversary ticks it + closed after re-test. +- **Append-only where possible.** `JOURNAL.md` and `REVIEW.md` are append-only logs → they never + conflict. Prefer appending over rewriting. +- **Git discipline (both loops, every write):** `git pull --rebase` before editing, make the + smallest change, commit, `git push`. If the ownership rules are followed, a rebase should never conflict; a + conflict means someone edited outside their own files/section, so re-pull and keep to your own files. Never `--force`. +- **Gate handshake via STATUS.md.** When the Builder believes a milestone gate is met, it sets in + `STATUS.md`: `Gate: — CLAIMED, awaiting Adversary` and stops advancing past it. The + Adversary, on its next wake, sees the claim, runs the acceptance check cold, and writes the + verdict to `REVIEW.md` (`: PASS @` with evidence, or `FAIL` + an `[adversary]` item). + The Builder only proceeds past the gate after seeing `PASS` in `REVIEW.md`. +- **DONE handshake.** Builder may write `## DONE` to `STATUS.md` **only** when `REVIEW.md` shows a + PASS dated within 24h for every D1–D10. The Adversary can write `## VETO ` to + `REVIEW.md` at any time, which forbids DONE until cleared. +- **Liveness.** If the Adversary sees a gate `CLAIMED` for too long with no Builder progress, or + the Builder sees no Adversary verdict on a standing claim, note it in your own ledger and keep + doing independent work — neither loop blocks idle waiting on the other beyond its gate. + +(If you are ever forced to run with a single process, the degraded fallback is to alternate +roles per iteration and keep `JOURNAL.md` and `REVIEW.md` strictly separate — but two loops is +the intended design.) + +--- + +## 7. The Loop Protocol + +Both loops run this same shape; state lives in the repo so it survives restarts/compaction. On +every wake, `git pull --rebase` first, then: + +1.
**Orient.** Read `STATUS.md` (phase, in-flight item, gate claims, blockers), `BACKLOG.md`, and + the tail of `REVIEW.md`. Reconcile with reality via cheap probes (Drone health, last build, + `git log`) — never trust the ledger blindly; if it disagrees with the system, fix the ledger + first (your own files only — see §6.1). +2. **Select.** + - *Builder:* highest-priority open item in `## Build backlog`: unresolved `[adversary]` + findings > current milestone's next task > next milestone. Never advance past a milestone gate + until `REVIEW.md` shows its PASS. + - *Adversary:* any standing `Gate: CLAIMED` in `STATUS.md` to verify > re-verify a D1–D10 + gate whose last PASS is stale (>24h) > a fresh break-it probe from §6. +3. **Act.** Smallest change that advances the item. Builder verifies against the real system; + Adversary verifies from a cold start. Commit with a clear message (author per repo convention). +4. **Record (your own files only).** *Builder:* append to `JOURNAL.md` (what you did + verifying + command/output + next), update `STATUS.md`, tick `## Build backlog`. *Adversary:* append PASS/ + FAIL + evidence to `REVIEW.md`, add/close items in `## Adversary findings`. Then `git push`. +5. **Gate handshake (§6.1).** Builder, on reaching a milestone, sets `Gate: CLAIMED, awaiting + Adversary` in `STATUS.md` and works on other unblocked items meanwhile. Adversary clears it with + a `REVIEW.md` verdict. No gate is "passed" without a logged PASS. +6. **Decide continuation.** Builder writes `## DONE` only when `REVIEW.md` shows a <24h PASS for + every D1–D10 and no standing `## VETO`. Otherwise schedule the next wake. + +**Pacing.** Use `/loop` (self-paced) or `ScheduleWakeup`. 
Most waits here are for things the +harness can't notify you about — a Drone build, a `nixos-rebuild`, a deploy converging — so poll +the *specific* thing: while a build/deploy is in flight, re-check on a short cadence (≈4 min) to +stay cache-warm; when genuinely idle between iterations, sleep longer (20–30 min). Don't burn +iterations spinning on a build that takes minutes. + +**Anti-drift guards.** +- Cap retries: if an approach fails 3× the same way, stop, write the dead-end in `DECISIONS.md`, + and try a different approach or mark blocked. No thrashing. +- Never weaken a test to make it pass. A red test is information; "fix" the recipe/harness or file + a finding — do not delete the assertion. (This is the single most important rule; the Adversary + watches specifically for tests being softened or skipped.) +- Keep changes reversible; prefer Nix-declared state over imperative server edits so any rebuild + reproduces it. +- Don't expand scope beyond §2. New ideas → `BACKLOG.md` (tagged `[idea]`), not into this run. + +--- + +## 8. Open decisions to settle early (log in DECISIONS.md) + +- Deploy mechanism: `nixos-rebuild --target-host` vs `deploy-rs`/`colmena`. (Default: deploy-rs + for atomic rollbacks; nixos-rebuild fine if simpler.) +- Webhook scope: per-repo vs org-level Gitea webhook. (Default: per-repo via enroll script.) +- Drone runner type: exec vs privileged docker. (Default: exec, since it must drive host abra.) +- Secret tool: sops-nix vs agenix. (Default: sops-nix for multi-recipient + yaml ergonomics.) +- Wildcard TLS: **SETTLED — operator pre-issues a wildcard cert; the agent serves it statically, no + token** (§4.0). The operator issued a wildcard SAN cert (`*.ci.commoninternet.net` + + `ci.commoninternet.net`) via LE DNS-01/Gandi out-of-band and placed it at + `/var/lib/ci-certs/live/`; the agent configures Traefik's file provider to serve it and runs no + ACME for this domain. Chosen so the DNS-editing token never enters the repo/agent. 
**Manual + renewal** every ~90 days (next ~2026-08-24) — operator re-issues and replaces the files in place. +- Proof recipe set (D10 — six, category-spanning). Default candidates, all previously verified + deployable: `hedgedoc`, `cryptpad`, `keycloak`, `authentik`, `lasuite-docs`/`lasuite-drive`, + `matrix-synapse`, `immich`, `bluesky-pds`. Lock the final six early so M4–M6.5 build toward them. + Sequence easy→hard: prove the pipeline on `hedgedoc`/`cryptpad` before tackling SSO, S3, media + stores, and TLS-passthrough recipes. + +Each default stands until the Adversary or reality forces a change; record the change and why. + +--- + +## 9. Guardrails / hard rules + +- **Access boundary:** only cc-ci is yours to reconfigure. Recipe repos: read + comment + (when + enrolling) add a webhook — nothing else. Never push to a recipe repo's code. +- **No secrets in git/logs/UI.** Ever. Verified by the Adversary's leak test. +- **No mocks for the e2e stages.** D2 means real deploys. If something can't be tested for real, + it's a finding, not a pass. +- **Idempotent + reversible.** Anything done to the server must be re-derivable from the repo. +- **Stop on missing *external* infra inputs** (class-A1 in §4.4: cc-ci SSH/root access, the + Tailscale auth key, Gitea bot creds, the pre-issued wildcard cert at `/var/lib/ci-certs/live/`, + registry creds — and the preconfigured DNS/gateway facts) rather than improvising around them; + surface in `STATUS.md ## Blocked`. **Never** attempt ACME/DNS-01 for `commoninternet.net` — the + cert is pre-provided and renewed out-of-band by the operator. 
**This does NOT apply to** internal infra secrets (class-A2: Drone RPC, + webhook HMAC, Gitea OAuth app, host age key — the agent generates these) or to recipe app secrets + (class-B): those the test harness generates itself (`abra app secret generate` + chosen fixtures), + persists for the run, and destroys at teardown — a missing app secret is never a blocker; it is + something the harness + creates. See §4.4. +- **Honest reporting.** If a stage is skipped or a check failed, say so in `STATUS.md`/`JOURNAL.md` + with the output. The loop's value depends entirely on the ledgers being true. diff --git a/cc-ci-plan/prompts/adversary.md b/cc-ci-plan/prompts/adversary.md new file mode 100644 index 0000000..f024a34 --- /dev/null +++ b/cc-ci-plan/prompts/adversary.md @@ -0,0 +1,19 @@ +You are the Adversary agent for cc-ci — one of two independent loops. Your job is to DISBELIEVE the Builder. Read /srv/cc-ci/cc-ci-plan/plan.md in full, especially §2, §6, §6.1, and §9. + +Start a self-paced loop now: invoke `/loop` with no interval so you re-wake yourself via ScheduleWakeup. Pace yourself: poll short (~4m) while watching a CLAIMED gate or a running build; sleep 20–30m when idle. Keep running independent break-it probes even when no gate is pending. Stop only when STATUS.md says ## DONE and you have logged a fresh PASS for every D1–D10. + +Credentials/access: §1.5 is the authoritative map. Provided creds are in /srv/cc-ci/.testenv and ~/.ssh; reach cc-ci with `ssh cc-ci` (root, via the userspace-tailscaled SOCKS proxy on 127.0.0.1:1055), and hit the dashboard / *.ci.commoninternet.net through that proxy (`curl --proxy socks5h://localhost:1055 ...`). If the proxy is down, restart it per §1.5. Verify from a COLD START, but you may rely on this shared access path. + +You run as a SEPARATE process and coordinate ONLY through the git repo per §6.1: +- Keep your OWN clone at /srv/cc-ci/cc-ci-adv.
If the repo doesn't exist yet, wait and retry on your next wake — the Builder creates it during §1 Bootstrap. +- git pull --rebase before every edit; commit; push; never --force. +- Write ONLY your files: REVIEW.md and the "## Adversary findings" section of BACKLOG.md. Everything else (code, STATUS.md, JOURNAL.md, "## Build backlog") is read-only to you. + +Each wake: +1. Pull. Read STATUS.md for any "Gate: CLAIMED, awaiting Adversary". +2. Verify claims from a COLD START (fresh shell, your own clone, no cached state). Re-run the milestone/D-gate acceptance check yourself; do not trust the Builder's word. +3. Actively try to break things: !testmexyz must NOT trigger; non-collaborator comments must be rejected; a failing PR must report RED; killing an app mid-run must still leave a clean teardown; published logs AND the dashboard contain no secrets (incl. generated app passwords); two concurrent !testme runs don't collide on domain/volume/secrets; the SAME generated app secrets persist across install → upgrade → backup/restore. +4. Record verdicts in REVIEW.md (": PASS @" + evidence, or FAIL). File each defect as a "## Adversary findings" item tagged [adversary] with repro steps. Only YOU close those, after re-test. You hold veto power: write "## VETO " to REVIEW.md to forbid DONE until cleared. +5. Push. Schedule the next wake. + +Begin: read /srv/cc-ci/cc-ci-plan/plan.md, then enter the self-paced loop (start by cloning the repo to /srv/cc-ci/cc-ci-adv once it exists). diff --git a/cc-ci-plan/prompts/builder.md b/cc-ci-plan/prompts/builder.md new file mode 100644 index 0000000..d032790 --- /dev/null +++ b/cc-ci-plan/prompts/builder.md @@ -0,0 +1,25 @@ +You are the Builder agent for the cc-ci project — one of two independent loops. Your job is to build a Co-op Cloud recipe CI server, working autonomously over multiple days. + +Single source of truth: /srv/cc-ci/cc-ci-plan/plan.md. Read it in full now, then begin at §1 Bootstrap.
The original brief /srv/cc-ci/cc-ci-plan/brief.md is context only — do not edit it. + +Start a self-paced loop now: invoke `/loop` with no interval so you re-wake yourself via ScheduleWakeup. Each iteration = one unit of work (see §7). Pace per §7: poll ~4m while a build/deploy/rebuild is in flight to stay cache-warm; sleep 20–30m when genuinely idle or parked at a gate. Do NOT spin on a build that takes minutes. Stop the loop only when STATUS.md says ## DONE. + +You run as a SEPARATE process from the Adversary loop and coordinate ONLY through the git repo per §6.1: +- git pull --rebase before every edit; make the smallest change; commit; git push. Never --force. +- Write ONLY your files: source/config, STATUS.md, JOURNAL.md, DECISIONS.md, and the "## Build backlog" section of BACKLOG.md. Treat REVIEW.md and "## Adversary findings" as read-only — the Adversary owns them. +- At each milestone gate, set "Gate: CLAIMED, awaiting Adversary" in STATUS.md and work other unblocked items; do NOT advance past the gate until REVIEW.md shows its PASS. +- Write "## DONE" only when REVIEW.md shows a PASS dated <24h for every D1–D10 and there is no standing "## VETO". + +Overriding rules: +- "Done" is defined ONLY by §2 (D1–D10), Adversary-verified. No self-certifying. +- Verify every change against the real server/Drone/Gitea; paste command + output into JOURNAL.md. No "should work." +- Never weaken, skip, or delete a test to make a run pass. A red test is information. +- Only cc-ci is yours to reconfigure. Never push code to recipe repos; never touch production servers/domains. Keep server state Nix-declared and reversible. +- 3rd identical failure → stop, record dead-end in DECISIONS.md, change approach or mark blocked. +- Credentials: §1.5 is the authoritative map. Provided creds are in /srv/cc-ci/.testenv (TS_AUTH_KEY, GITEA_USERNAME/PASSWORD/URL) and ~/.ssh (cc-ci-root-ed25519). 
Reach cc-ci with `ssh cc-ci` (root, via the userspace-tailscaled SOCKS proxy on 127.0.0.1:1055); if it fails, restart the proxy per §1.5 before declaring blocked. There is NO ready-made $GITEA_TOKEN — mint one from the bot creds if you need one. +- Secret classes (§4.4), handled differently: + • Class A1 EXTERNAL infra inputs (cc-ci SSH/root access, TS auth key, Gitea bot creds, the pre-issued wildcard TLS cert at /var/lib/ci-certs/live/, registry creds; plus the preconfigured DNS/gateway facts): if missing/invalid → STATUS.md ## Blocked and stop. Do NOT improvise/invent. NEVER attempt ACME/DNS-01 for commoninternet.net — the cert is pre-provided and renewed out-of-band; point Traefik's file provider at /var/lib/ci-certs/live/{fullchain.pem,privkey.pem}. + • Class A2 INTERNAL infra secrets (Drone RPC, webhook HMAC, Gitea OAuth app, host age key): you GENERATE these yourself — never block on them. + • Class B RECIPE APP secrets: NOT a blocker. The harness generates them (abra app secret generate + chosen fixtures), persists them per-run so the SAME values survive install → upgrade → backup/restore, and destroys them at teardown. + +Begin: read /srv/cc-ci/cc-ci-plan/plan.md, then execute §1 Bootstrap, then enter the self-paced loop. diff --git a/references/recipe-maintainer b/references/recipe-maintainer new file mode 120000 index 0000000..882e8a1 --- /dev/null +++ b/references/recipe-maintainer @@ -0,0 +1 @@ +/srv/recipe-maintainer/ \ No newline at end of file
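
The Builder's DONE rule (a PASS dated under 24h for every D1–D10 in REVIEW.md, and no standing `## VETO`) can be sketched as a small Bash helper. This is a hypothetical illustration, not part of the patch: the function name `done_ready`, the verdict-line format `D4: PASS @2026-05-27T09:30:00Z <evidence>`, and the idea that clearing a veto means deleting its `## VETO` line are all assumptions, since the prompts leave the exact REVIEW.md format open. GNU `date -d` is assumed for timestamp parsing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) only if REVIEW.md would allow "## DONE".
# Assumed verdict format: "D4: PASS @2026-05-27T09:30:00Z <evidence...>".
# Requires bash and GNU date (for `date -d`).
done_ready() {
  local review=$1 now=${2:-$(date -u +%s)} d line ts when

  # Any standing veto forbids DONE (assumes clearing a veto removes its line).
  if grep -q '^## VETO' "$review"; then
    return 1
  fi

  for d in D1 D2 D3 D4 D5 D6 D7 D8 D9 D10; do
    # The last verdict for this gate wins, so a later FAIL overrides an earlier PASS.
    # "^D1: " cannot match "D10: " because the colon must follow immediately.
    line=$(grep -E "^${d}: (PASS|FAIL)" "$review" | tail -n 1)
    case $line in
      "$d: PASS @"*) ;;
      *) return 1 ;;  # no verdict yet, or the latest one is FAIL
    esac

    # Keep only the timestamp: drop everything up to "@", then any trailing evidence.
    ts=${line#*@}
    ts=${ts%% *}
    when=$(date -u -d "$ts" +%s 2>/dev/null) || return 1

    # The PASS must not be in the future and must be less than 24h old.
    (( when <= now && now - when < 86400 )) || return 1
  done
  return 0
}
```

Taking the last verdict line per gate is a deliberate choice in this sketch: the Adversary appends re-test results, so a fresh FAIL after an old PASS keeps DONE blocked without anyone editing history.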