Initial commit: cc-ci autonomous orchestrator

Planning + launch + setup material for the cc-ci Co-op Cloud recipe CI server:
plan.md (single source of truth), kickoff/launch supervision, and the
Builder/Adversary loop prompts. Secrets (.testenv) and runtime dirs are gitignored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 20:46:28 +01:00
commit bdc78da921
9 changed files with 1107 additions and 0 deletions

12
.gitignore vendored Normal file

@ -0,0 +1,12 @@
# Secrets — NEVER commit
.testenv
*.tfstate
*.tfstate.*
*.key
*.pem
# Loop runtime / working clones (created at launch by launch.sh)
/cc-ci/
/cc-ci-adv/
/.cc-ci-watch/
/.cc-ci-logs/

34
cc-ci-plan/README.md Normal file

@ -0,0 +1,34 @@
# cc-ci-plan
Self-contained handoff package for building the **cc-ci** Co-op Cloud recipe CI server with two
autonomous Claude loops (a Builder and an adversarial Reviewer) running over days.
## Start here
1. Read **`plan.md`** — the full plan and single source of truth (mission, Definition of Done,
architecture, milestones, the two-agent coordination protocol, loop discipline).
2. Read **`kickoff.md`** — how to launch and supervise the loops.
3. Run **`./launch.sh start`** to bring up both loops + the watchdog.
## Files
| File | Purpose |
|---|---|
| `plan.md` | The plan. Agents treat it as their single source of truth. |
| `brief.md` | The original one-page brief (context only; `plan.md` supersedes it). |
| `kickoff.md` | Launch & supervision guide. |
| `launch.sh` | Starts both loops + a watchdog; restarts dead loops; stops on `## DONE`. |
| `prompts/builder.md` | Builder loop prompt (fed to `claude` by the script). |
| `prompts/adversary.md` | Adversary loop prompt. |
## Before launching
- Set the org in `plan.md` (`git.autonomic.zone/recipe-maintainers/cc-ci`) and lock the six proof recipes (§8).
- Ensure the launching shell has: SSH+sudo to `cc-ci`, the Gitea token, `git.autonomic.zone` access.
- Preconfigure test-app DNS + TLS (plan §4.0): point a wildcard `*.ci.commoninternet.net` record at a gateway that TLS-passthroughs to cc-ci, and **pre-issue the wildcard cert** (`*.ci.commoninternet.net` + `ci.commoninternet.net`, via Gandi DNS-01) into `/var/lib/ci-certs/live/` on cc-ci. The agent handles everything else on cc-ci (Traefik file provider → that cert, swarm, routing) and does **no ACME**; renewal (~90 days) is an out-of-band operator task, so the DNS token never goes to the agent.
- `export CC_CI_REPO=https://git.autonomic.zone/recipe-maintainers/cc-ci.git` so the watchdog can detect `## DONE`.
## What "done" means
The loops stop only when all of `plan.md` §2 (D1–D10) hold **and** the Adversary has independently
re-verified each within 24h. The watchdog then tears the loops down automatically.

35
cc-ci-plan/brief.md Normal file

@ -0,0 +1,35 @@
we are working on making a CI server
I want you to work in an autonomous loop over the next few days until the CI server is fully functional, polished and documented
on any PR on git.autonomic.zone it should be invokable by writing !testme as a comment
this should invoke the set of CI tests to be run for the recipe code at that PR
the CI tests should be run via drone
the tests run for a recipe should be written in python. e2e testing via playwright should be used when necessary to confirm functionality
there should be tests which test
- new install
- upgrade
- backups (including restore)
all the tests should be fully e2e, with a real deployed recipe
the CI runner should be deployed on a server called cc-ci which is running nixos
cc-ci git repo should also live on git.autonomic.zone which contains all the nix configuration for the server, as well as the code for the CI test runner
the CI test runner should have its own folder of tests, with one folder for each recipe, with each of those folders containing a set of tests as python files which get invoked for that recipe
secrets should also be handled in a reasonable and repeatable way
additionally, if a recipe repo itself contains a tests folder in the recipe, the CI runner should also invoke those tests as part of the CI run for those tests
the results of the test run should be easily viewable, with trackable logs, and a final result, very similar in style to the way the yunohost CI runner looks and feels
you will have ssh access to cc-ci server, as well as sudo access there
you will also have access to create and modify repos on git.autonomic.zone

143
cc-ci-plan/kickoff.md Normal file

@ -0,0 +1,143 @@
# cc-ci — Kickoff & Launch
Everything needed to start the autonomous cc-ci build loop. The substance lives in `plan.md`;
this file explains how to launch and supervise the two agents.
## Folder contents
```
cc-ci-plan/
├── plan.md # THE plan — single source of truth (read this in full)
├── brief.md # original one-page brief (context only; superseded by plan.md)
├── kickoff.md # this file — how to launch & supervise
├── launch.sh # starts both loops + watchdog, stops on ## DONE
└── prompts/
├── builder.md # Builder loop prompt (fed to claude by launch.sh)
└── adversary.md # Adversary loop prompt
```
> Note: `/srv/cc-ci/cc-ci-plan/` (this folder) is the **planning + launch material**. The actual
> CI project — NixOS config, runner, tests — lives in a **separate git repo** the Builder creates
> at `git.autonomic.zone/recipe-maintainers/cc-ci`, cloned to `/srv/cc-ci/cc-ci` (Builder) and
> `/srv/cc-ci/cc-ci-adv` (Adversary). Don't confuse the two.
## Model: two independent loops (plan §6 / §6.1)
- **Builder** — builds the CI server; owns code + `STATUS.md`/`JOURNAL.md`/`DECISIONS.md` + the
`## Build backlog` section of `BACKLOG.md`.
- **Adversary** — independently disbelieves and re-verifies; owns `REVIEW.md` + `## Adversary
findings`. Holds veto over `## DONE`.
They run as two separate processes and coordinate **only** through the git repo. Single-writer file
ownership keeps concurrent pushes merge-clean.
## Two layers of "looping" — and why you want both
| Concern | Mechanism | Who provides it |
|---|---|---|
| **Iteration** — keep doing one unit of work, then wake again | `/loop` self-paced (ScheduleWakeup), per plan §7 pacing | each agent, in-session |
| **Resilience** — restart a loop whose process/sandbox died; stop all on `## DONE` | `launch.sh` watchdog (tmux + git poll) | this script |
`/loop` alone is bound to its process: if the sandbox restarts, that loop is gone until something
relaunches it. The watchdog is that something. Use both.
## Launch
```bash
cd /srv/cc-ci/cc-ci-plan
# Optional but recommended once the repo exists, so the watchdog can detect ## DONE:
export CC_CI_REPO=https://git.autonomic.zone/recipe-maintainers/cc-ci.git
./launch.sh start # starts cc-ci-builder + cc-ci-adv + cc-ci-watchdog (tmux sessions)
./launch.sh status # session + DONE state
./launch.sh logs builder # tail a loop; also: logs adversary | logs watchdog
tmux attach -t cc-ci-builder # watch a loop live locally (detach: Ctrl-b d)
./launch.sh stop # stop everything
```
`launch.sh` is idempotent — re-running `start` won't duplicate a live session. Each agent runs as an
**interactive** `claude` in tmux (kickoff prompt passed as a positional arg, *not* piped — piping
forces print mode and breaks `/loop`). With `REMOTE_CONTROL=1` (default) each agent is launched with
`--remote-control`, so you can **watch and steer both loops from [claude.ai/code](https://claude.ai/code)**
(or the Claude mobile app) — not just via `tmux attach`. The box must be logged into the claude.ai
account (`claude auth status`); set `REMOTE_CONTROL=0` to skip the remote surface. The watchdog
(default every 300s) restarts any dead session. Note that a network outage longer than about
10 minutes will exit the `claude` process, after which the watchdog brings it back as a fresh
remote-control session. When `STATUS.md` shows `## DONE`, the watchdog kills the loops and exits.
Prerequisites the sessions inherit from your shell: SSH (root) to `cc-ci` via the Tailscale proxy
(§1.5), Gitea bot creds, and `git.autonomic.zone` access. Plus **preconfigured** operator inputs the
loop depends on (plan §4.0/§4.4): the wildcard `*.ci.commoninternet.net` DNS record pointing at a
gateway that TLS-passthroughs to cc-ci, and the **pre-issued wildcard cert** at
`/var/lib/ci-certs/live/` on cc-ci. The operator owns the DNS record + gateway + cert
issuance/renewal; the agent builds Traefik (file provider → that cert) + routing on cc-ci and does
**no ACME**. If any prerequisite is absent, the Builder parks at `STATUS.md ## Blocked` (plan §1/§9)
rather than improvise.
> Host deps: `launch.sh` needs **tmux** (and `claude`) — tmux is installed on this sandbox host
> (3.5a). On a fresh host: `sudo apt-get install -y tmux`. The script's `*_DIR`
> defaults now point at `/srv/cc-ci/...` (Builder clone `/srv/cc-ci/cc-ci`, Adversary
> `/srv/cc-ci/cc-ci-adv`); override the `*_DIR` env vars only if your layout differs.
## Optional: a cloud-side `/schedule` watchdog
`launch.sh`'s watchdog is itself a local process — if the *whole host* goes down it stops too. For
belt-and-suspenders durability, also create a `/schedule` routine (a remote agent that fires on a
cron and re-orients from the repo). From inside a Claude session:
```
/schedule every 2 hours: read /srv/cc-ci/cc-ci-plan/plan.md §7 and the cc-ci repo STATUS.md; if the
Builder/Adversary loops are not making progress (or launch.sh is not running), restart them via
/srv/cc-ci/cc-ci-plan/launch.sh start; stop when STATUS.md says ## DONE.
```
This complements the local watchdog: scheduled runs are fresh, independent agents, so they survive
process/context death that would take the in-session `/loop` and the local watchdog with it.
## Fallback: restart/recreate the cc-ci VM (orchestrator only)
**This is primarily an escape hatch for *you*, the supervising orchestrator.** The loops normally
reconfigure cc-ci only from inside (via Nix); power-cycling or recreating the VM shouldn't be their
default move — but it's not forbidden if one gets genuinely stuck. Reach for this when cc-ci itself
is wedged at a level that can't be fixed from inside (won't boot, disk full, swarm/Docker corrupted,
unreachable even after a proxy restart): use the Incus skill to power-cycle or rebuild the VM, then
re-bootstrap.
`cc-nix-test` (the cc-ci server, tailnet `100.90.116.4`) is a **NixOS Incus VM** on host **b1**
(`100.117.251.31:8443`, Incus project `terraform-ci`). Skill + Terraform live at
`/srv/incus-terraform-nix-vm-creator/` (`skills/incus-terraform/SKILL.md`); read that for full usage.
- **Access:** b1 is on the *same* cc-ci tailnet, so reach the Incus API through the existing
`cc-ci-tailscaled` SOCKS proxy (`127.0.0.1:1055`) with the mTLS certs in that repo's
`terraform-secrets/` — no second tailscaled needed. Quick check:
```bash
CRT=/srv/incus-terraform-nix-vm-creator/terraform-secrets/terraform.crt
KEY=/srv/incus-terraform-nix-vm-creator/terraform-secrets/terraform.key
curl --proxy socks5h://localhost:1055 --cert "$CRT" --key "$KEY" -k -s \
https://100.117.251.31:8443/1.0/instances/cc-nix-test/state?project=terraform-ci
```
- **Soft restart (keeps the disk; preferred):** `PUT .../1.0/instances/cc-nix-test/state?project=terraform-ci`
with `{"action":"restart"}` (or `"stop"` / `"start"`).
- **Full recreate (last resort):** the Terraform module in `/srv/incus-terraform-nix-vm-creator/projects/`
(`terraform apply` with `-var incus_remote_address=100.117.251.31 -var incus_project=terraform-ci
-var ts_auth_key=$TSKEY`). ⚠ **Recreating wipes the VM disk** — you must then re-apply the cc-ci
preconditions: the pre-issued TLS cert into `/var/lib/ci-certs/live/` and the
`cc-ci-root-ed25519` pubkey into root's `authorized_keys` (see the access notes), and the loops
re-run §1 Bootstrap. Prefer a soft restart; only recreate if the VM is truly unrecoverable.
(Project cap: keep total RAM across `terraform-ci` instances under 10 GB — check before recreating.)
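
The soft-restart call can be sketched as below, reusing the proxy, cert paths, and instance details from the quick check. This is a rough sketch, not part of the Incus skill: the helper names `incus_state_url` and `incus_set_state` are hypothetical, and the use of `PUT` on the state endpoint assumes the standard Incus REST API, so check `skills/incus-terraform/SKILL.md` before relying on it.

```bash
# Hypothetical helpers; addresses and paths mirror the quick check above.
INCUS_API="https://100.117.251.31:8443"
INCUS_PROJECT="terraform-ci"
SECRETS_DIR="/srv/incus-terraform-nix-vm-creator/terraform-secrets"

# Print the state endpoint URL for the given instance name.
incus_state_url() {
  printf '%s/1.0/instances/%s/state?project=%s\n' "$INCUS_API" "$1" "$INCUS_PROJECT"
}

# Request a state change (restart | stop | start) through the SOCKS proxy.
# Assumption: the state endpoint accepts PUT with a JSON {"action": ...} body.
incus_set_state() {
  instance="$1"; action="$2"
  curl --proxy socks5h://localhost:1055 \
    --cert "$SECRETS_DIR/terraform.crt" --key "$SECRETS_DIR/terraform.key" \
    -k -s -X PUT -H 'Content-Type: application/json' \
    -d "{\"action\":\"$action\"}" \
    "$(incus_state_url "$instance")"
}
```

A soft restart would then be `incus_set_state cc-nix-test restart`; nothing is sent until that function is called.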
## Manual launch (no script)
If you'd rather not use `launch.sh`, start each agent interactively yourself (same result, no
supervision/restart), passing the prompt as a positional argument so the session stays interactive
and remote-controllable:
```bash
claude --remote-control 'cc-ci-builder' --dangerously-skip-permissions "$(cat prompts/builder.md)"
claude --remote-control 'cc-ci-adv' --dangerously-skip-permissions "$(cat prompts/adversary.md)"
```
Do **not** pipe the prompt (`cat prompts/builder.md | claude …`) — that forces print/headless mode,
which breaks `/loop` and remote control.

203
cc-ci-plan/launch.sh Executable file

@ -0,0 +1,203 @@
#!/usr/bin/env bash
#
# launch.sh — start and supervise the two cc-ci autonomous loops + a watchdog.
#
# Model (see plan.md §6 / §6.1): two INDEPENDENT Claude Code sessions —
# • Builder (tmux session: cc-ci-builder) working clone /srv/cc-ci/cc-ci
# • Adversary (tmux session: cc-ci-adv) working clone /srv/cc-ci/cc-ci-adv
# coordinating only through the git repo on git.autonomic.zone.
#
# Each agent self-paces with a `/loop` (ScheduleWakeup) — that handles ITERATION.
# This script's watchdog handles RESILIENCE: it restarts a session that has died
# and stops everything once STATUS.md reports "## DONE".
#
# Usage:
# ./launch.sh start # start both loops + watchdog (idempotent)
# ./launch.sh watchdog # run only the supervision loop in the foreground
# ./launch.sh status # show session + DONE state
# ./launch.sh logs builder|adversary|watchdog # tail a session/log
# ./launch.sh stop # stop both loops + watchdog
#
# Configure via env vars (defaults below). At minimum set CC_CI_REPO once the
# Builder has created the repo, so the watchdog can detect DONE.
set -euo pipefail
# ----- config -------------------------------------------------------------
PLAN_DIR="${PLAN_DIR:-/srv/cc-ci/cc-ci-plan}"
CLAUDE_BIN="${CLAUDE_BIN:-claude}"
# Flags for unattended operation in a sandbox. Override if your setup differs.
CLAUDE_FLAGS="${CLAUDE_FLAGS:---dangerously-skip-permissions}"
# REMOTE_CONTROL=1 launches each agent with --remote-control, making the session
# viewable/steerable at claude.ai/code (and the Claude mobile app). Either way the session
# is INTERACTIVE, which /loop + ScheduleWakeup require (a piped/print-mode session cannot
# self-pace). Set REMOTE_CONTROL=0 for a plain interactive session with no remote surface.
# With REMOTE_CONTROL=1 the box must be logged into the claude.ai account (check with
# `claude auth status`). Each agent gets its own RC session named after its tmux session.
REMOTE_CONTROL="${REMOTE_CONTROL:-1}"
BUILDER_DIR="${BUILDER_DIR:-/srv/cc-ci/cc-ci}" # Builder's repo clone (it creates this)
ADV_DIR="${ADV_DIR:-/srv/cc-ci/cc-ci-adv}" # Adversary's repo clone
WATCH_DIR="${WATCH_DIR:-/srv/cc-ci/.cc-ci-watch}" # tiny clone the watchdog reads STATUS.md from
LOG_DIR="${LOG_DIR:-/srv/cc-ci/.cc-ci-logs}"
CC_CI_REPO="${CC_CI_REPO:-https://git.autonomic.zone/recipe-maintainers/cc-ci.git}" # CI project repo (DONE detection); harmless until the Builder creates it
CC_CI_BRANCH="${CC_CI_BRANCH:-main}"
WATCH_INTERVAL="${WATCH_INTERVAL:-300}" # seconds between watchdog checks
BUILDER_SESSION="cc-ci-builder"
ADV_SESSION="cc-ci-adv"
WATCHDOG_SESSION="cc-ci-watchdog"
# --------------------------------------------------------------------------
log() { printf '[launch %(%H:%M:%S)T] %s\n' -1 "$*"; }
die() { log "ERROR: $*"; exit 1; }
need() { command -v "$1" >/dev/null 2>&1 || die "missing dependency: $1"; }
preflight() {
need tmux
command -v "$CLAUDE_BIN" >/dev/null 2>&1 || die "claude CLI not found (set CLAUDE_BIN)"
[[ -f "$PLAN_DIR/prompts/builder.md" ]] || die "missing $PLAN_DIR/prompts/builder.md"
[[ -f "$PLAN_DIR/prompts/adversary.md" ]] || die "missing $PLAN_DIR/prompts/adversary.md"
mkdir -p "$LOG_DIR"
}
session_alive() { tmux has-session -t "$1" 2>/dev/null; }
# Start one agent loop in its own tmux session, cd'd into its working dir, with
# the kickoff prompt passed to claude as a positional argument (see below for why
# not stdin).
start_agent() {
local session="$1" workdir="$2" prompt_file="$3"
if session_alive "$session"; then
log "$session already running — leaving it"
return 0
fi
mkdir -p "$workdir"
log "starting $session (cwd=$workdir, remote_control=$REMOTE_CONTROL)"
# tmux gives claude a real PTY, so we run claude INTERACTIVELY (required for /loop +
# ScheduleWakeup). The kickoff prompt is passed as a POSITIONAL argument via an inner
# `$(cat ...)` — NOT piped on stdin, because piping forces print/headless mode which
# breaks both interactivity and --remote-control. The `\$(...)` defers to the inner shell
# so the whole multi-line prompt arrives as a single argument.
local rc=""
[[ "$REMOTE_CONTROL" == "1" ]] && rc="--remote-control '$session'"
tmux new-session -d -s "$session" -c "$workdir" \
"$CLAUDE_BIN $rc $CLAUDE_FLAGS \"\$(cat '$prompt_file')\""
# Log the pane WITHOUT redirecting claude's stdout: a `>>log` redirect makes stdout a
# non-tty and drops claude out of interactive/remote-control mode. pipe-pane mirrors the
# live pane to the log file while claude keeps the PTY tmux gave it.
tmux pipe-pane -o -t "$session" "cat >> '$LOG_DIR/$session.log'"
}
start_loops() {
start_agent "$BUILDER_SESSION" "$BUILDER_DIR" "$PLAN_DIR/prompts/builder.md"
start_agent "$ADV_SESSION" "$ADV_DIR" "$PLAN_DIR/prompts/adversary.md"
}
# Returns 0 (true) if the repo's STATUS.md contains a "## DONE" heading.
is_done() {
[[ -n "$CC_CI_REPO" ]] || return 1
if [[ ! -d "$WATCH_DIR/.git" ]]; then
git clone --depth 1 --branch "$CC_CI_BRANCH" "$CC_CI_REPO" "$WATCH_DIR" >/dev/null 2>&1 || return 1
fi
git -C "$WATCH_DIR" fetch --depth 1 origin "$CC_CI_BRANCH" >/dev/null 2>&1 || return 1
git -C "$WATCH_DIR" reset --hard "origin/$CC_CI_BRANCH" >/dev/null 2>&1 || return 1
grep -qE '^##[[:space:]]+DONE' "$WATCH_DIR/STATUS.md" 2>/dev/null
}
watchdog_loop() {
log "watchdog up (interval=${WATCH_INTERVAL}s, repo=${CC_CI_REPO:-<unset: DONE-detection disabled>})"
while true; do
# 1) DONE? then wind everything down.
if is_done; then
log "STATUS.md reports ## DONE — stopping loops."
stop_loops
log "watchdog exiting (project complete)."
exit 0
fi
# 2) restart any dead loop (resilience the in-session /loop can't provide).
if ! session_alive "$BUILDER_SESSION"; then
log "builder session gone — restarting"
start_agent "$BUILDER_SESSION" "$BUILDER_DIR" "$PLAN_DIR/prompts/builder.md"
fi
if ! session_alive "$ADV_SESSION"; then
log "adversary session gone — restarting"
start_agent "$ADV_SESSION" "$ADV_DIR" "$PLAN_DIR/prompts/adversary.md"
fi
sleep "$WATCH_INTERVAL"
done
}
start_watchdog() {
if session_alive "$WATCHDOG_SESSION"; then
log "watchdog already running"
return 0
fi
log "starting watchdog"
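# Resolve $0 to an absolute path: the tmux session starts in $PLAN_DIR, not the caller's cwd,
# so a relative $0 would not be found when `start` is run from another directory.
tmux new-session -d -s "$WATCHDOG_SESSION" -c "$PLAN_DIR" \
"exec >>'$LOG_DIR/watchdog.log' 2>&1; '$(readlink -f "$0")' watchdog"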
}
stop_loops() {
for s in "$BUILDER_SESSION" "$ADV_SESSION"; do
if session_alive "$s"; then log "killing $s"; tmux kill-session -t "$s" || true; fi
done
}
cmd_status() {
for s in "$BUILDER_SESSION" "$ADV_SESSION" "$WATCHDOG_SESSION"; do
if session_alive "$s"; then echo " $s: RUNNING"; else echo " $s: stopped"; fi
done
if [[ -n "$CC_CI_REPO" ]]; then
if is_done; then echo " project: ## DONE"; else echo " project: in progress"; fi
else
echo " project: (CC_CI_REPO unset — DONE-detection disabled)"
fi
}
case "${1:-}" in
start)
preflight
start_loops
start_watchdog
log "started. inspect with: ./launch.sh status | attach: tmux attach -t $BUILDER_SESSION"
;;
watchdog) preflight; watchdog_loop ;;
status) cmd_status ;;
logs)
case "${2:-}" in
builder) tail -f "$LOG_DIR/$BUILDER_SESSION.log" ;;
adversary) tail -f "$LOG_DIR/$ADV_SESSION.log" ;;
watchdog) tail -f "$LOG_DIR/watchdog.log" ;;
*) die "usage: $0 logs builder|adversary|watchdog" ;;
esac
;;
stop)
stop_loops
if session_alive "$WATCHDOG_SESSION"; then log "killing $WATCHDOG_SESSION"; tmux kill-session -t "$WATCHDOG_SESSION" || true; fi
log "stopped."
;;
*)
cat <<EOF
cc-ci loop launcher
$0 start start both loops + watchdog (idempotent)
$0 status show session + DONE state
$0 logs builder|adversary|watchdog tail a log
$0 stop stop everything
$0 watchdog run supervision loop in foreground
Key env vars (current value):
CC_CI_REPO = ${CC_CI_REPO:-<unset — set to enable DONE detection>}
CLAUDE_BIN = $CLAUDE_BIN
CLAUDE_FLAGS = $CLAUDE_FLAGS
REMOTE_CONTROL = $REMOTE_CONTROL (1 = interactive --remote-control, viewable at claude.ai/code)
BUILDER_DIR = $BUILDER_DIR
ADV_DIR = $ADV_DIR
WATCH_INTERVAL = ${WATCH_INTERVAL}s
EOF
;;
esac

635
cc-ci-plan/plan.md Normal file

@ -0,0 +1,635 @@
# cc-ci — Co-op Cloud Recipe CI Server (Autonomous Build Plan)
**Status:** ACTIVE — autonomous loop
**Owner agent:** Builder (primary) + Adversary (reviewer)
**Source brief:** `brief.md` (do not edit; this file supersedes it)
**This file's canonical path:** `/srv/cc-ci/cc-ci-plan/plan.md`
**Target server:** `cc-ci` (NixOS)
**Code/config home:** `git.autonomic.zone/recipe-maintainers/cc-ci` (the CI project repo — distinct from this
`/srv/cc-ci/cc-ci-plan/` planning+launch folder)
**Last updated:** keep current via `STATUS.md` (see §7)
---
## 0. How to read this document
This plan is written to be handed to an **autonomous Claude agent running in a sandbox over
several days**, driving itself in a loop until the CI server is "done" per §2. A second agent
(the **Adversary**) independently tries to disprove every "done" claim. Neither agent is
trusted to mark its own work complete.
If you are an agent waking up into this loop for the first time, go straight to **§1 Bootstrap**.
On every subsequent wake, go to **§7 The Loop Protocol** and continue from `STATUS.md`.
The rest of the document (§3–§6) is the technical design. Treat it as the default architecture,
but you are allowed to revise it when reality disagrees — record any deviation in `DECISIONS.md`
with a one-line rationale.
---
## 1. Bootstrap (first wake only)
Do these in order. Each step is idempotent; re-running is safe.
1. **Verify access.** (Full credential map + how each is used is in **§1.5** — read it first.)
- `ssh cc-ci 'hostname && whoami'` — you log in as **root** on cc-ci (NixOS), so there is no
separate sudo step. `ssh cc-ci` is preconfigured to tunnel through the userspace-tailscaled
SOCKS proxy (§1.5); if it fails, the proxy/daemon is probably down — restart it (§1.5) before
declaring blocked.
- `ssh cc-ci 'nixos-version'` — confirm NixOS.
- Confirm you can reach the Gitea API with the bot creds from `.testenv` (§1.5):
`curl -s https://$GITEA_URL/api/v1/version`. The bot authenticates with
`GITEA_USERNAME`/`GITEA_PASSWORD` (basic auth) or a token you mint from them via
`POST /api/v1/users/<user>/tokens` — do **not** expect a ready-made `$GITEA_TOKEN`.
- Confirm the **preconfigured** test-app DNS (§4.0/§4.4): a random subdomain under the wildcard
resolves, e.g. `getent hosts probe-$RANDOM.ci.commoninternet.net` returns the **gateway's** IP
(not cc-ci's — the gateway TLS-passthroughs to cc-ci, so do not expect cc-ci's address; and use
`getent`, not `dig`, since this host's resolver is Tailscale-only — see §1.5).
Traefik is *not* up yet — you configure it (file provider → the pre-issued cert at
`/var/lib/ci-certs/live/`, **no ACME**); the DNS record + gateway passthrough + cert are the
preconditions, and full end-to-end HTTPS reachability is proven at M1, not now.
If the wildcard does not resolve at all, that's a `## Blocked` item (operator fixes DNS/gateway).
- If any check fails, write the failure to `STATUS.md` under `## Blocked` and stop — a human must fix access. Do **not** try to work around missing access.
2. **Create the `cc-ci` repo** on git.autonomic.zone if it does not exist. Push an initial
skeleton (see §3 layout). The Builder clones to `/srv/cc-ci/cc-ci`; the Adversary loop keeps
its **own independent clone** at `/srv/cc-ci/cc-ci-adv`. The repo is the only channel between
the two loops (§6.1) — loop state lives inside it (`STATUS.md`, `BACKLOG.md`, etc.).
3. **Snapshot the starting environment** into `cc-ci/docs/baseline.md`: current NixOS config on
the server (`/etc/nixos` or existing flake), installed packages, whether Docker/Swarm/abra
already exist, DNS that already points at the box. This is the rollback reference.
4. **Seed the loop state files** (§7) if absent: `STATUS.md`, `BACKLOG.md`, `REVIEW.md`,
`JOURNAL.md`, `DECISIONS.md`. Give `BACKLOG.md` two H2 sections — `## Build backlog`
(populated from §5 milestones) and `## Adversary findings` (empty) — per the single-writer
rule in §6.1.
5. Commit ("chore: bootstrap cc-ci loop state") and begin the loop at §7.
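
The step-1 rule (run each access check, record any failure under `## Blocked`, then stop) could be sketched as below. This is a minimal sketch under assumptions: the helper names `record_blocked` and `run_check` are hypothetical, and the `STATUS.md` layout (a `## Blocked` heading followed by bullet lines) is assumed rather than specified by this plan.

```bash
# Hypothetical helpers for the step-1 access checks.
# Assumption: STATUS.md keeps blockers as "- ..." bullets under a "## Blocked" heading.
STATUS_FILE="${STATUS_FILE:-STATUS.md}"

# Add a bullet directly under "## Blocked", creating the heading if it is absent.
record_blocked() {
  msg="$1"
  if grep -q '^## Blocked' "$STATUS_FILE" 2>/dev/null; then
    tmp="$(mktemp)"
    awk -v msg="$msg" '{ print } /^## Blocked/ { print "- " msg }' "$STATUS_FILE" > "$tmp"
    mv "$tmp" "$STATUS_FILE"
  else
    printf '## Blocked\n- %s\n' "$msg" >> "$STATUS_FILE"
  fi
}

# Run one named check; on failure, record it and return non-zero.
run_check() {
  name="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "ok: $name"
  else
    record_blocked "$name failed"
    echo "FAILED: $name"
    return 1
  fi
}
```

A call might look like `run_check "ssh to cc-ci" ssh cc-ci 'hostname && whoami' || exit 1`, which stops at the first failure rather than working around it.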
---
## 1.5 Credentials & access — where everything lives and how to use it
The loops run **on the sandbox host** (not on cc-ci) and reach cc-ci over Tailscale. This section
is the authoritative map of what credentials exist, where, and how to use them. **Never copy any
secret value into the repo, a commit, a log, or the dashboard** (§9) — reference locations only.
### Provided credentials (already in place)
| What | Where | How to use |
|---|---|---|
| **Tailscale auth key** (joins cc-ci's tailnet `taila4a0bf.ts.net`) | `/srv/cc-ci/.testenv` → `TS_AUTH_KEY` (Tailscale SaaS key, keyID ends `CNTRL`) | Used to bring up the userspace tailscaled (below). It's reusable; re-run `tailscale up` with it if the node drops. |
| **cc-ci SSH (root)** | private key `~/.ssh/cc-ci-root-ed25519`; config `Host cc-ci` in `~/.ssh/config` | Just run `ssh cc-ci` (logs in as **root**). The pubkey is already in cc-ci's `/root/.ssh/authorized_keys`. |
| **Gitea bot account** | `/srv/cc-ci/.testenv` → `GITEA_USERNAME` (`autonomic-bot`), `GITEA_PASSWORD`, `GITEA_URL` (`git.autonomic.zone`) | Basic-auth to the Gitea API, or mint a scoped token: `POST https://$GITEA_URL/api/v1/users/$GITEA_USERNAME/tokens`. Used to create/push the `cc-ci` repo, read recipe repos, comment on PRs, and register `!testme` webhooks. |
Load them in a shell with: `set -a; . /srv/cc-ci/.testenv; set +a` (don't echo the values).
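
`set -a` marks every variable assigned while the file is sourced for export, so child processes such as `curl` and `git` inherit them. A small sketch of a follow-up check that the expected variables are present without ever printing a value is shown here; the helper name `require_vars` is hypothetical.

```bash
# Hypothetical helper: confirm each named variable is set and non-empty,
# printing only the NAME of anything missing (never a value).
require_vars() {
  missing=0
  for var in "$@"; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  unset val
  return "$missing"
}
```

After loading `.testenv`, a call such as `require_vars TS_AUTH_KEY GITEA_USERNAME GITEA_PASSWORD GITEA_URL` returns non-zero if any of them is empty or unset.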
### The Tailscale connection (how `ssh cc-ci` and the proxy work)
cc-ci (`cc-nix-test`, **100.90.116.4**) is on a *different* tailnet than the sandbox host's default
one, so it is reached via a **second, userspace tailscaled** — this keeps the host's own tailnet
untouched. State lives in `~/.cc-ci-ts/`; it exposes a **SOCKS5/HTTP proxy on `127.0.0.1:1055`**,
which is the only route to that tailnet (userspace networking ⇒ the host OS can't route the tailnet
IPs directly).
It runs as a **persistent systemd service** (`cc-ci-tailscaled.service`, enabled, `Restart=always`,
starts on boot; unit at `/etc/systemd/system/cc-ci-tailscaled.service`, runs as user `notplants`).
It reuses the already-authenticated state in `~/.cc-ci-ts/`, so it reconnects across reboots/crashes
without the auth key.
- `ssh cc-ci` works out of the box (its `ProxyCommand` uses the proxy; logs in as root).
- For HTTP(S) to cc-ci / `*.ci.commoninternet.net` from the sandbox, go through the proxy, e.g.
`curl --proxy socks5h://localhost:1055 https://<app>.ci.commoninternet.net`.
- **If connectivity is down:** `sudo systemctl restart cc-ci-tailscaled` (diagnose with
`systemctl status cc-ci-tailscaled` / `journalctl -u cc-ci-tailscaled`). A dead proxy is an access
failure to recover, not a `## Blocked`-and-stop condition — *unless* the auth key itself is
rejected (then re-auth with `tailscale --socket=$HOME/.cc-ci-ts/tailscaled.sock up
--auth-key="$TS_AUTH_KEY" --hostname=cc-ci-claude-sandbox --accept-routes --accept-dns=false`, and
if that fails the key is a class-A1 blocker).
- **DNS gotcha:** this host's `/etc/resolv.conf` lists only Tailscale resolvers, so direct
`dig @1.1.1.1 …` queries get no answer and look falsely empty. Use `getent hosts <name>` to
resolve from the sandbox. `commoninternet.net` itself is a normal public zone hosted at **Gandi**.
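
The "recover, don't block" rule for a dead proxy can be sketched as a retry wrapper. This is only an illustration: `with_proxy_recovery`, `RECOVER_CMD`, and `RECOVER_WAIT` are hypothetical names, the settle time is a guess, and the rejected-auth-key case described above is not handled here.

```bash
# Hypothetical wrapper: run a command; if it fails, run RECOVER_CMD once and retry.
# RECOVER_CMD defaults to the proxy restart described above.
RECOVER_CMD="${RECOVER_CMD:-sudo systemctl restart cc-ci-tailscaled}"

with_proxy_recovery() {
  if "$@"; then
    return 0
  fi
  echo "command failed; attempting proxy recovery" >&2
  # Unquoted on purpose: RECOVER_CMD is a simple command line to be word-split.
  $RECOVER_CMD || return 1
  sleep "${RECOVER_WAIT:-5}"   # assumed settle time; tune as needed
  "$@"                         # the retry's exit status is the wrapper's status
}
```

For example, `with_proxy_recovery ssh cc-ci true` tries the connection, restarts the proxy service on failure, and tries once more. A second failure is left to the caller to classify.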
### Credentials the loop GENERATES itself (do not wait on a human for these)
- **Drone RPC secret** and **webhook HMAC secret** — generate (`openssl rand -hex 32`), store
sops-encrypted in `secrets/`, and wire both ends. Internal shared secrets, not human inputs.
- **Gitea OAuth app for Drone** — create it under the bot account via the API
(`POST /api/v1/user/applications/oauth2`); capture client id/secret into `secrets/`.
- **cc-ci host age/GPG key for sops** — generate on the host (or derive from its SSH host key);
add as a sops recipient. Keep a recovery copy of the master age identity off-box if desired.
- **Per-recipe app secrets** (class-B, §4.4) — the harness generates these per run.
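
Generating the internal shared secrets could look like the sketch below. `gen_hex_secret` is a hypothetical helper name; the `openssl rand -hex 32` call is the one named above, and the `/dev/urandom` fallback is an addition for hosts without `openssl`.

```bash
# Hypothetical helper: print N random bytes as lowercase hex (default 32 bytes = 64 hex chars).
gen_hex_secret() {
  nbytes="${1:-32}"
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex "$nbytes"
  else
    # Fallback when openssl is unavailable: read the kernel CSPRNG directly.
    head -c "$nbytes" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
    echo
  fi
}
```

Capture the value in a variable (for example `secret="$(gen_hex_secret)"`) and hand it straight to the sops workflow for `secrets/` rather than echoing it, in line with the no-secrets-in-logs rule.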
### Credentials STILL NEEDED from the operator (class-A — block if missing, per §9)
- **Wildcard TLS cert — PROVIDED, not a token.** The operator has pre-issued the wildcard SAN cert
(`*.ci.commoninternet.net` + `ci.commoninternet.net`) and placed it on cc-ci at
`/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` (§4.0). The agent points Traefik's file
provider at those paths and runs **no ACME** for this domain. **Do not request or expect a
`commoninternet.net` DNS token** — issuance/renewal is handled out-of-band by the operator (LE
90-day cert; next renewal ~2026-08-24). A missing/expired cert is a finding for the operator, not
an agent re-issue.
- **Registry pull credentials** (e.g. Docker Hub) — *recommended* to avoid anonymous pull-rate
limits breaking deploys under load. Treat a rate-limit failure traced to this as a finding, then
request creds. Store sops-encrypted in `secrets/`.
- **Gitea bot permissions** (a grant, not a secret) — confirm `autonomic-bot` can: create/push
`recipe-maintainers/cc-ci`, read the recipe repos to be enrolled, comment on their PRs, and add
webhooks to them. If any is missing, that's a `## Blocked` item for the operator to fix.
---
## 2. Definition of Done (the loop's exit condition)
The loop terminates **only** when every item below is true *and the Adversary has independently
re-verified each one within the last 24h* (logged in `REVIEW.md` with timestamps and command
output). Partial credit does not count.
- [ ] **D1 — Trigger.** Commenting `!testme` on any open PR in any enrolled recipe repo on
git.autonomic.zone starts a CI run for the code *at that PR's head commit* within 60s.
Other comments do not. Re-commenting re-runs.
- [ ] **D2 — Test matrix.** For a recipe under test, the run executes, as separate reported
stages: **new install**, **upgrade** (previous published version → PR version), and
**backup + restore**. All are genuine end-to-end against a really-deployed recipe (real
containers, real Traefik routing, real volumes) — no mocks, no stubs.
- [ ] **D3 — Python + Playwright.** Tests are Python. Functional assertions that require a
browser use Playwright against the live deployed app.
- [ ] **D4 — Recipe-local tests.** If the recipe repo contains its own `tests/` folder, those
tests are also discovered and run as part of the same CI run, with results merged in.
- [ ] **D5 — Per-recipe test tree.** The cc-ci repo holds `tests/<recipe>/` with the
install/upgrade/backup tests as Python files, plus a shared harness. Adding a new recipe is
a documented, small, repeatable operation.
- [ ] **D6 — Secrets.** App + infra secrets are handled reproducibly (committed encrypted,
decrypted on the server), documented, and rotatable. No plaintext secrets in git, logs, or
the results UI.
- [ ] **D7 — Results UX.** Each run has a stable URL with live, tail-able logs per stage and a
final pass/fail; there is an overview page listing recipes with their latest status —
look-and-feel comparable to the YunoHost app CI (`ci-apps.yunohost.org`). A PR comment links
back to its run and reflects the outcome.
- [ ] **D8 — Reproducible server.** The entire server (Drone, runner, comment bridge, swarm,
Traefik, dashboard, secrets wiring) is declared in the `cc-ci` repo's NixOS flake and can be
rebuilt from scratch onto a blank NixOS host following `docs/install.md`, verified by the
Adversary doing exactly that on a throwaway VM (or documenting why a full from-scratch
rebuild was infeasible and what was tested instead).
- [ ] **D9 — Documentation.** `README.md` + `docs/` explain architecture, how to enroll a recipe,
how to add/run tests locally, how to operate/rotate secrets, and how to debug a failed run.
A new engineer can enroll a recipe and get a green run using only the docs.
- [ ] **D10 — Proof (breadth).** At least **six real recipes** spanning the meaningful
categories have a full green run triggered by `!testme` on a real PR, with all three stages
(install / upgrade / backup+restore) actually exercised. The set must cover:
a stateless/simple app, a single-DB app, a multi-service app, an SSO/identity app, and an
object-storage/large-volume app. **Target set (all previously verified deployable):**
`hedgedoc` (simple), `cryptpad` (stateful, no external DB), `keycloak` + `authentik`
(SSO/identity, DB-backed), `lasuite-docs` and/or `lasuite-drive` (multi-service + S3/MinIO),
`matrix-synapse` (DB + media store), `immich` (large volumes + Postgres), `bluesky-pds`
(TLS-passthrough/atproto). Pick six that together satisfy the categories; record the chosen
set and per-recipe green-run evidence in `REVIEW.md`. Any recipe that genuinely cannot be CI'd
is a documented finding (in `DECISIONS.md`) with the reason, not a silent omission.
When all of D1–D10 hold and are Adversary-verified, write `## DONE` to `STATUS.md` with the
evidence links and stop scheduling new iterations.
---
## 3. Repository layout (`git.autonomic.zone/recipe-maintainers/cc-ci`)
```
cc-ci/
├── README.md
├── flake.nix # NixOS host(s) + devshell
├── flake.lock
├── hosts/
│ └── cc-ci/
│ ├── configuration.nix # the cc-ci machine
│ └── hardware.nix
├── modules/
│ ├── drone.nix # Drone server + runner (exec/docker)
│ ├── comment-bridge.nix # !testme webhook listener service
│ ├── swarm.nix # Docker + single-node swarm + Traefik for test apps
│ ├── dashboard.nix # results overview site
│ └── secrets.nix # sops-nix / agenix wiring
├── secrets/ # sops-encrypted (*.enc / *.age); see §4.4
│ └── secrets.yaml
├── bridge/ # comment-bridge source (small Go/Python service)
├── runner/ # CI orchestration entrypoint invoked by Drone
│ ├── run_recipe_ci.py # top-level: deploy→test→teardown for a recipe@ref
│ └── harness/ # shared pytest fixtures (abra wrappers, app lifecycle)
├── dashboard/ # results UI generator (reads Drone API → static site)
├── tests/
│ ├── conftest.py # shared fixtures, recipe selection, teardown guarantees
│ ├── <recipe>/
│ │ ├── test_install.py
│ │ ├── test_upgrade.py
│ │ ├── test_backup.py
│ │ └── playwright/ # e2e flows for this recipe
│ └── _template/ # copy-to-add-a-recipe template
├── docs/
│ ├── install.md # from-scratch server build (D8)
│ ├── enroll-recipe.md # how to add a recipe (D5)
│ ├── secrets.md # secret model + rotation (D6)
│ ├── architecture.md
│ ├── runbook.md # debugging failed runs
│ └── baseline.md # bootstrap snapshot
├── STATUS.md BACKLOG.md REVIEW.md JOURNAL.md DECISIONS.md # loop state (§7)
└── .drone.yml # pipeline for cc-ci's own repo (lint/self-test)
```
---
## 4. Technical design (default architecture)
### 4.0 Domain model (where things live)
Two DNS zones, deliberately separated — do **not** conflate them:
- **`git.autonomic.zone` — source of truth for code (unchanged, not ours to reconfigure).**
The Gitea host: the enrolled recipe repos and the `cc-ci` config repo live here. The loop reads,
comments, and (when enrolling) adds a webhook here, but deploys **nothing** here. Per §9 this zone
is read/comment-only — never push recipe code, never point app DNS at it.
- **`commoninternet.net` — the CI server's own zone; everything CI-facing.** A wildcard
`*.ci.commoninternet.net` resolves to a **gateway** (not cc-ci directly — see Network path below).
Under it:
- **Apps under test:** each run deploys to a unique subdomain
`<recipe>-pr<n>-<short-sha>.ci.commoninternet.net`, so concurrent runs never collide on a
hostname. The subdomain (app, volumes, secrets, Traefik route) is torn down at run end (§4.3).
- **Results dashboard:** `ci.commoninternet.net` — overview page + per-recipe status badges (§4.5).
- **Webhook bridge:** `ci.commoninternet.net/hook` — the Gitea `issue_comment` receiver (§4.1).
- **Network path (gateway → TLS passthrough → cc-ci).** The wildcard record does **not** point at
cc-ci's IP. It points at a gateway that **passes TLS through** to cc-ci: the gateway routes by SNI
and forwards the raw encrypted stream without decrypting it, so TLS still **terminates on cc-ci's
Traefik**. Consequences the agent must respect:
- `dig <sub>.ci.commoninternet.net` returns the **gateway's** IP, not cc-ci's — do not assert the
record points at cc-ci. Reachability is proven end-to-end (an HTTPS request lands on cc-ci),
not by comparing A records.
- The gateway is assumed to pass through the **whole wildcard**, so a fresh per-run subdomain needs
**no gateway change** and **no cert work** (the pre-issued wildcard already covers it) — the
agent only adds the Traefik **router** on cc-ci. (If the gateway
instead needs per-host config, that's an operator/gateway concern and a `## Blocked` item, not
something the agent reconfigures — the gateway is not ours, only cc-ci is, per §9.)
- The gateway is operator-managed and out of scope; the agent configures only cc-ci.
- **Caveat for TLS-passthrough recipes** (e.g. `bluesky-pds`, §2 D10): the default path terminates
TLS at cc-ci's Traefik. A recipe that expects to terminate TLS in its own container needs cc-ci's
Traefik configured to pass that host through too (the outer gateway already passes the whole
wildcard). Treat this as a per-recipe harness quirk to absorb (§5 M6.5), or pick a non-passthrough
recipe for that D10 category and record the swap in `DECISIONS.md` — not a silent omission.
- **Wildcard TLS — operator pre-issues, agent serves it statically (no token in the agent).**
Routing and certs are separate: the preconfigured wildcard DNS solves routing only; a cert is
still needed because the gateway passes TLS through and cc-ci's Traefik terminates it. **The cert
is pre-provisioned out-of-band** so the DNS-editing token never enters the agent/repo. A wildcard
SAN cert covering **`*.ci.commoninternet.net` + `ci.commoninternet.net`** (issued via Let's
Encrypt DNS-01 against Gandi, by the operator, using a token the agent never sees) lives on cc-ci:
- `/var/lib/ci-certs/live/fullchain.pem` (leaf+intermediate) and `…/privkey.pem`.
- The agent configures **Traefik's file provider** (`tls.certificates`, `certFile`/`keyFile`
pointing at those paths) to serve it, and runs **no ACME resolver** for this domain. One cert
covers every per-run subdomain, so spinning up an app domain needs no cert work at all.
- **Renewal is a manual operator task** (LE 90-day cert): the operator re-issues out-of-band and
drops the new files at the same paths (Traefik file provider hot-reloads). The agent must **not**
attempt ACME/DNS-01 for `commoninternet.net` and must **not** expect a DNS token — a missing/
expired cert is an operator action, surfaced as a finding, not something the agent re-issues.
(Rationale for choosing a wildcard cert over per-subdomain: a wildcard is reused for every churning
run subdomain and sidesteps LE's 50-certs/week-per-domain limit; only DNS-01 can mint a wildcard.
We keep that DNS-01 issuance with the operator rather than handing the agent the zone token.)
- Record the live facts in `docs/install.md`: the zone + DNS provider (Gandi), that the wildcard
`*.ci.commoninternet.net` (and bare `ci.commoninternet.net`) point at the **gateway**, that the
gateway TLS-passthroughs the wildcard to cc-ci, the gateway's address, the TTL, and that the
wildcard cert is pre-issued/operator-renewed at `/var/lib/ci-certs/live/` (no DNS token on cc-ci).
### 4.1 The `!testme` trigger path
Gitea does not natively forward PR-comment events to Drone, and Drone's built-in triggers fire on
push/PR-open, not on a magic comment. So:
```
PR comment "!testme"
│ Gitea webhook (issue_comment event) ──► comment-bridge (modules/comment-bridge.nix)
│ • verifies webhook HMAC secret
│ • checks comment body == "!testme" (exact, trimmed)
│ • checks commenter is allowed (org member / collaborator)
│ • resolves PR head repo + SHA via Gitea API
│ • calls Drone API: build for cc-ci pipeline,
│ params RECIPE=<repo> REF=<sha> PR=<n> SRC=<headrepo>
Drone build (cc-ci repo pipeline, parameterized) ──► runner/run_recipe_ci.py
Bridge posts/updates a Gitea PR comment with the run URL and (on completion) pass/fail.
```
- The bridge is a tiny service (Go or Python+FastAPI). Keep it dependency-light; it's a NixOS
systemd service behind Traefik at e.g. `ci.commoninternet.net/hook` (§4.0).
- Enrollment = registering the Gitea webhook on a recipe repo (script in `runner/` or documented
in `enroll-recipe.md`) + ensuring a `tests/<recipe>/` dir exists.
- Decide and record in DECISIONS.md: one shared Gitea org-level webhook vs per-repo webhooks.
Org-level is fewer moving parts; per-repo is more explicit. Default: per-repo via enroll script.
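The bridge's three gate checks (HMAC, exact trigger match, allowed commenter) can be sketched as below. This is a minimal sketch, not the bridge implementation: it assumes the signature arrives as a hex-encoded HMAC-SHA256 of the raw request body (commonly the `X-Gitea-Signature` header, to be re-verified against the installed Gitea version), and `allowed` is a hypothetical stand-in for the org-member/collaborator lookup via the Gitea API.

```python
import hashlib
import hmac

TRIGGER = "!testme"


def signature_is_valid(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Compare the received hex digest against HMAC-SHA256(secret, raw body)."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = signature_hex.strip().lower()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected.encode(), received.encode())


def is_trigger(comment_body: str) -> bool:
    """True only for an exact, whitespace-trimmed "!testme"."""
    return comment_body.strip() == TRIGGER


def should_start_run(
    secret: bytes,
    body: bytes,
    signature_hex: str,
    comment_body: str,
    commenter: str,
    allowed: set[str],  # hypothetical: pre-resolved allowed usernames
) -> bool:
    """All three checks must pass before the bridge calls the Drone API."""
    return (
        signature_is_valid(secret, body, signature_hex)
        and is_trigger(comment_body)
        and commenter in allowed
    )
```

Checking the signature first means an unauthenticated request never reaches the comment or permission logic.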
### 4.2 Drone + the test target
- Drone server connects to Gitea via OAuth app (Gitea → Settings → Applications). Runner is the
**exec runner** (or a privileged docker runner) running **on cc-ci itself**, because tests must
drive `abra` to deploy real recipes onto a real swarm.
- cc-ci doubles as the **deploy target**: single-node Docker Swarm + Traefik, abra installed,
serving the `*.ci.commoninternet.net` wildcard, TLS terminated on cc-ci's Traefik using the
**pre-issued static wildcard cert** at `/var/lib/ci-certs/live/` (§4.0). The operator preconfigures
the wildcard DNS record (→ gateway), the gateway's TLS-passthrough to cc-ci, and the cert itself
(§4.4); the agent configures Traefik (file provider → that cert) and swarm on top — **no ACME**.
- Each CI run gets an isolated app domain `<recipe>-pr<n>-<short-sha>.ci.commoninternet.net`
(§4.0) so concurrent runs don't collide. Teardown removes app, secrets, and volumes.
- Consider a concurrency cap (1–2 deploys at a time) to avoid resource thrash; document it.
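The per-run domain scheme can be captured in one small helper. This is a sketch under assumptions: the function name, the 7-character short SHA, and the DNS-label validation are illustrative choices, not something the plan fixes.

```python
import re

CI_ZONE = "ci.commoninternet.net"

# A DNS label: 1-63 chars, alphanumeric at both ends, hyphens allowed inside.
_LABEL_RE = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def run_app_domain(recipe: str, pr: int, sha: str, short_len: int = 7) -> str:
    """Build <recipe>-pr<n>-<short-sha>.ci.commoninternet.net for one run."""
    label = f"{recipe}-pr{pr}-{sha[:short_len]}".lower()
    if not _LABEL_RE.fullmatch(label):
        raise ValueError(f"not a valid DNS label: {label!r}")
    return f"{label}.{CI_ZONE}"
```

Because the label embeds recipe, PR number, and commit, two concurrent runs only share a hostname if they test the same commit of the same PR.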
### 4.3 The test harness & recipe test contract
`runner/run_recipe_ci.py` orchestrates per run:
1. Fetch recipe at `$REF` (the PR head) via abra/git.
2. **Install stage** → `tests/<recipe>/test_install.py`: `abra app new`, generate secrets,
`abra app deploy`, wait healthy, run Playwright smoke + assertions.
3. **Upgrade stage** → deploy previous published version first, then upgrade to `$REF`; assert
data survives and app still healthy.
4. **Backup/restore stage** → `abra app backup`, mutate state, `abra app restore`, assert restored
state matches pre-mutation.
5. **Recipe-local tests (D4)** → if `<recipe-repo>/tests/` exists, discover & run it in the same
live environment; merge results.
6. **Teardown (always, even on failure)** → `abra app undeploy`, `abra app volume remove`,
`abra app secret remove`, DNS/route cleanup.
Shared fixtures (`tests/conftest.py` + `runner/harness/`) wrap abra. **Known abra gotchas to bake
in from day one** (carried over from prior work, re-verify on the installed abra version):
- `abra app undeploy` and `abra app volume remove` do **not** accept `--chaos` → never pass it.
- Plumb a `timeout` kwarg through secret-generate/insert/remove-all calls.
- `abra app ls -S -m` returns nested `{server: {apps: [...]}}` — parse the inner structure.
- Pick robust health checks per app (e.g. Keycloak: `/realms/master`, not `/`).
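The nested-output gotcha above is easy to get wrong, so here is a minimal parsing sketch. It assumes only the `{server: {apps: [...]}}` nesting described in the list; the function name and the `appName` field in the usage example are hypothetical, and the real field names should be checked against the installed abra version.

```python
import json


def flatten_apps(raw_json: str) -> list[tuple[str, dict]]:
    """Turn {server: {"apps": [...]}} into a flat list of (server, app) pairs."""
    data = json.loads(raw_json)
    pairs = []
    for server, info in data.items():
        # Iterate the inner "apps" list, not the top-level mapping.
        for app in info.get("apps", []):
            pairs.append((server, app))
    return pairs
```

For example, `flatten_apps('{"cc-ci": {"apps": [{"appName": "x"}]}}')` yields one pair keyed by the server name `cc-ci`.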
The teardown guarantee is sacred: a failed test must never leak a deployed app or volume into the
next run. Implement teardown as a pytest fixture finalizer / `try/finally` in the orchestrator and
add a janitor pass at run start that nukes any orphaned `*-pr*` apps older than N hours.
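One way to express the teardown guarantee and the janitor selection is sketched below. The function names are hypothetical, stopping at the first failed stage is an assumption (the plan does not say whether later stages still run), and the janitor half only selects candidates; actually removing them would go through the abra wrappers.

```python
from fnmatch import fnmatch
from typing import Callable, Iterable


def run_stages(
    stages: Iterable[tuple[str, Callable[[], None]]],
    teardown: Callable[[], None],
) -> dict[str, str]:
    """Run stages in order; teardown runs no matter how the stages end."""
    results: dict[str, str] = {}
    try:
        for name, stage in stages:
            try:
                stage()
            except Exception as exc:
                results[name] = f"fail: {exc}"
                break  # assumption: stop at the first failing stage
            results[name] = "pass"
    finally:
        # Reached on success, on failure, and on interruption.
        teardown()
    return results


def orphaned_apps(
    created_at: dict[str, float], now: float, max_age_hours: float
) -> list[str]:
    """Names matching *-pr* whose creation time is older than the cutoff."""
    cutoff = now - max_age_hours * 3600
    return sorted(
        name
        for name, created in created_at.items()
        if fnmatch(name, "*-pr*") and created < cutoff
    )
```

Keeping teardown in `finally` rather than as a last stage is what makes it independent of test outcome.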
### 4.4 Secrets (D6)
There are **two distinct classes of secret** and they are handled in opposite ways. Do not
conflate them.
**(A) Infra secrets.** All of these end up `sops-nix`-encrypted in `secrets/`, are decrypted on the
host at activation (sops-nix writes the plaintext under `/run/secrets` by default, not into the
world-readable Nix store), and are never world-readable. But they split into two sub-classes — see §1.5
for the concrete locations/usage — and only the first sub-class blocks:
- **(A1) External inputs — provided by the operator, the loop cannot create them.** The Tailscale
auth key + Gitea bot creds (`/srv/cc-ci/.testenv`, already provided), the **pre-issued wildcard
TLS cert** at `/var/lib/ci-certs/live/` (§4.0 — *not* a DNS token; the agent serves it, never
issues it), and **registry pull creds** (if needed). If one of these is **missing or invalid, the
loop is blocked** — write it to `STATUS.md ## Blocked` and stop (§9). The agent must not invent or
work around an external input it wasn't given, and must **not** attempt ACME/DNS-01 for
`commoninternet.net`.
- **(A2) Internal secrets — the loop generates and manages these itself; never block on them.**
Drone RPC secret + webhook HMAC (`openssl rand`), the Gitea OAuth app for Drone (created via the
bot API), and the cc-ci host age/GPG key for sops. These are *not* human inputs; generate, store
in `secrets/`, and wire both ends.
Alongside these, three **preconfigured network/cert facts** are operator-provided inputs the loop
also depends on (not secrets the agent makes, but class-A in the same "provided, don't improvise"
sense): (1) the wildcard `*.ci.commoninternet.net` record (and bare `ci.commoninternet.net`) already
points at the **gateway**, (2) the gateway **TLS-passthroughs** that wildcard to cc-ci (SNI-routed,
no decryption — see §4.0 Network path), and (3) the **pre-issued wildcard cert** is in place at
`/var/lib/ci-certs/live/`. The operator owns the DNS record, the gateway, and cert issuance/renewal;
**everything else on cc-ci is the agent's job** — Traefik (pointed at the static cert), swarm,
per-run subdomain routing, and teardown. If the wildcard does not resolve, the gateway doesn't reach
cc-ci, or the cert is missing/expired, that is a `## Blocked` condition (operator action), not
something to work around (the gateway and DNS are not ours to reconfigure, per §9).
**(B) Recipe app secrets — generated by the test, persisted within the run.** These are NOT a
blocker and are NOT pre-provisioned by a human. The harness creates them itself for each app under
test and is responsible for persisting them across the run so the multi-stage lifecycle works:
- **Generate at install:** the harness runs `abra app secret generate` (+ inserts any deterministic
test fixtures like an admin password / test user it chooses) when it deploys the app.
- **Persist for the run's duration:** the *same* generated secrets must survive across stages —
install → upgrade and especially **backup → restore** — because an app cannot be upgraded or
restored against rotated credentials. Persist them in a per-run secret store keyed by the run's
unique app name (e.g. `<recipe>-pr<n>-<sha>`): the live abra/swarm secrets plus a sidecar record
the harness writes (e.g. the app's `.env` + the generated values) to a run-scoped, non-public
location on the runner, so any stage can re-read them. They are ephemeral by design.
- **Destroy at teardown:** the same teardown that removes the app/volumes also runs
`abra app secret remove` (with `timeout` plumbed) and deletes the per-run sidecar. Nothing
generated for a run outlives that run.
- **How the harness should "figure out" persistence (acceptance for D6):** decide and document one
concrete mechanism — recommended default is "abra/swarm holds the live secrets; the harness keeps
a run-scoped sidecar file under a `runs/<app-name>/` dir on the runner (mode 600), and reloads
from it between stages." Whatever is chosen, it must (1) keep the same values stable across all
three stages, (2) isolate concurrent runs from each other, and (3) leave nothing behind.
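The recommended sidecar mechanism could look roughly like this. It is a sketch under assumptions: the `secrets.json` file name, the JSON encoding, and the function names are illustrative, and only the `runs/<app-name>/` directory and mode 600 come from the text above.

```python
import json
import os
import shutil
from pathlib import Path


def save_run_secrets(runs_root: Path, app_name: str, values: dict[str, str]) -> Path:
    """Write the run's generated values to runs/<app-name>/secrets.json (mode 600)."""
    run_dir = runs_root / app_name
    run_dir.mkdir(parents=True, exist_ok=True, mode=0o700)
    path = run_dir / "secrets.json"
    # Create with restrictive permissions from the start, not chmod-after-write.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(values, fh)
    os.chmod(path, 0o600)  # also tighten a pre-existing file
    return path


def load_run_secrets(runs_root: Path, app_name: str) -> dict[str, str]:
    """Reload the same values in a later stage (upgrade, backup, restore)."""
    return json.loads((runs_root / app_name / "secrets.json").read_text())


def destroy_run_secrets(runs_root: Path, app_name: str) -> None:
    """Teardown: remove the whole per-run directory so nothing is left behind."""
    shutil.rmtree(runs_root / app_name, ignore_errors=True)
```

Keying the directory by the unique app name is what isolates concurrent runs; deleting the directory at teardown covers the "leave nothing behind" requirement.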
**(C) Drone CI tokens:** store as Drone org/repo secrets, referenced by the pipeline. Where a value
is an external input (A1, e.g. registry creds) it is provided; where it is internal (A2) it is
generated — see the (A) split above.
Hard rule across all classes: scrub secrets from logs before they reach the dashboard; the results
UI shows sanitized logs only. Add a redaction filter in the log pipeline and an Adversary test that
greps published logs and the overview site for known secret patterns and any generated app
password.
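A redaction filter and the matching leak check can be as simple as the sketch below. It assumes the harness already knows the literal secret values for the run (for example from its per-run store); the function names and the `[REDACTED]` mask are illustrative. Pattern-based detection of unknown secrets would be a separate, additional layer.

```python
def redact(text: str, secrets: list[str], mask: str = "[REDACTED]") -> str:
    """Replace every known secret value in text with a fixed mask."""
    # Longest first, so a secret that contains a shorter one is masked whole.
    for secret in sorted((s for s in secrets if s), key=len, reverse=True):
        text = text.replace(secret, mask)
    return text


def leaked(text: str, secrets: list[str]) -> list[str]:
    """Return the known secret values that still appear in text."""
    return [s for s in secrets if s and s in text]
```

The Adversary's leak test is then `leaked(published_log, known_values) == []` over every published log and the overview site.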
### 4.5 Results UX (D7) — YunoHost-CI-like
- **Per-run logs:** Drone's native UI already gives live, per-stage, tail-able logs and a final
status — use it as the canonical run view; the PR comment links to it.
- **Overview page:** a small generator (`dashboard/`) polls the Drone API and renders a static
page at `ci.commoninternet.net` (§4.0): a table of enrolled recipes, latest run status badge
(pass/fail/running), last-tested version, link to history — mirroring the YunoHost app-list
feel. Served by Traefik; regenerated on build-completion webhook or a short timer.
- Provide a status badge endpoint per recipe for embedding in recipe READMEs.
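The core of the overview generator is reducing a build list to the latest status per recipe. The sketch below assumes each build has already been mapped to a flat record with `recipe`, `number`, and `status` keys; that record shape and the function name are hypothetical, not the Drone API's response format.

```python
def overview_rows(builds: list[dict]) -> list[tuple[str, str, int]]:
    """Return (recipe, latest status, build number) rows, sorted by recipe."""
    latest: dict[str, dict] = {}
    for build in builds:
        current = latest.get(build["recipe"])
        # Higher build number means a more recent run.
        if current is None or build["number"] > current["number"]:
            latest[build["recipe"]] = build
    return [
        (recipe, b["status"], b["number"])
        for recipe, b in sorted(latest.items())
    ]
```

Each row then maps onto one table line and one status badge on the static page.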
---
## 5. Milestones / initial BACKLOG
Work top-down; each milestone ends with an **Adversary gate** (Adversary must independently
verify the acceptance check before the next milestone starts). Seed `BACKLOG.md` from this.
- **M0 — Foundations.** Repo created; flake builds; `nixos-rebuild` (or deploy-rs) applies a
no-op-then-base config to cc-ci; sops decrypts a test secret on the host.
*Accept:* `ssh cc-ci 'systemctl is-system-running'` healthy after a rebuild from the repo.
- **M1 — Swarm + abra target.** Docker + single-node swarm + Traefik up; wildcard DNS + TLS;
abra can deploy and tear down a trivial recipe by hand.
*Accept:* a recipe deployed via abra is reachable over HTTPS at `*.ci.commoninternet.net`, then
fully torn down leaving no volumes.
- **M2 — Drone online.** Drone server+runner via Nix, OAuth to Gitea; a hello-world `.drone.yml`
in cc-ci runs green; logs visible in Drone UI.
*Accept:* push to cc-ci triggers a visible green Drone build.
- **M3 — Comment bridge.** `!testme` on a PR triggers a parameterized Drone build; bridge posts a
PR comment with the run link; non-`!testme` comments and non-collaborators are ignored.
*Accept:* live demo on a scratch PR — comment in, build out, link back, auth enforced.
- **M4 — Harness + install stage.** `run_recipe_ci.py` + conftest; install stage green for one
simple recipe end-to-end with a Playwright assertion; guaranteed teardown.
*Accept:* full green install run for recipe #1, no orphaned app/volume afterward.
- **M5 — Upgrade + backup/restore stages.** Add the other two stages for recipe #1.
*Accept:* upgrade preserves data; backup→mutate→restore returns original state.
- **M6 — Recipe-local tests (D4) + second recipe.** Discover/run recipe-repo `tests/`; enroll a
second, DB-backed recipe via the documented flow.
*Accept:* both recipes green; recipe-local tests demonstrably executed and merged.
- **M6.5 — Breadth ramp.** Enroll recipes 3→6 covering the remaining D10 categories, one at a
time, each via the documented enroll flow (this is the real test of D5: enrolling recipe N
should be template-copy + recipe-specific tests/fixtures, with **no harness surgery**). Expect
per-recipe quirks — multi-service deps, S3/MinIO config, SSO client setup, TLS passthrough,
large-volume backups — and absorb them into the *shared* harness, not one-off per-recipe hacks.
When flakiness appears, add real readiness/wait robustness to the harness rather than sprinkling
`sleep`s. Run benchmarks/long deploys **sequentially**, never in parallel (network contention).
*Accept:* recipes 3–6 each have a full three-stage green run; enrolling N≥3 needed no changes to
shared harness code.
- **M7 — Secrets hardening (D6).** Full sops model, rotation doc, log redaction + leak test.
*Accept:* Adversary's secret-grep over published logs finds nothing; rotation doc followed.
- **M8 — Dashboard (D7).** Overview page + badges + PR-comment outcome reflection.
*Accept:* overview matches reality across several runs; outcomes mirrored to PR comments.
- **M9 — Reproducibility + docs (D8/D9).** `docs/install.md` rebuilds the server from scratch on a
blank VM; all docs complete.
*Accept:* Adversary rebuilds from docs onto a throwaway host (or records the tested subset).
- **M10 — Proof (D10).** All six chosen recipes green via real `!testme` PRs (the breadth set from
M6/M6.5 carried through the hardened pipeline), each with install/upgrade/backup-restore
exercised and Adversary-verified; flip `STATUS.md` to DONE.
---
## 6. The two agents
### Builder (primary)
Implements the backlog top-down. Discipline:
- One backlog item in flight at a time. Small, committed, reversible steps.
- Every change verified against the *real* system (server, Drone, Gitea) before claiming done —
never "should work". Paste the verifying command + output into `JOURNAL.md`.
- Touch production carefully: cc-ci is the only target; never deploy test apps onto unrelated
production servers; never reuse production domains. Idempotent server changes only (via Nix).
- If blocked on access/secrets/external state, write it to `STATUS.md ## Blocked` and pick up an
unblocked item rather than hacking around it.
### Adversary (reviewer)
Runs as a **separate, independent loop in its own process/sandbox** (see §6.1 for how the two
loops coordinate). Its job is to **disbelieve**. It:
- Re-verifies each `Definition of Done` and milestone-acceptance claim independently, from a cold
start (fresh shell, own clone, no cached state), and logs PASS/FAIL + evidence in `REVIEW.md`.
- Actively tries to break things: comment `!testmexyz` (should NOT trigger), comment as a
non-collaborator (should be rejected), push a PR that fails tests (must report red, not green),
kill an app mid-run (teardown must still clean up), grep published logs/dashboard for secrets,
run two `!testme`s concurrently (no domain/volume/secret collision), confirm the same generated
app secrets persist across install→upgrade→backup/restore.
- Files every defect as a `BACKLOG.md` item tagged `[adversary]` with repro steps. The Builder
may not close an adversary item; only the Adversary closes it after re-test.
- Has veto power over `STATUS.md → DONE`.
### 6.1 Coordination protocol (two independent loops, one shared repo)
The two loops never talk directly; the **git repo is the only coordination medium**. Each agent
has its own clone (e.g. Builder in `/srv/cc-ci/cc-ci`, Adversary in `/srv/cc-ci/cc-ci-adv`) and
its own pacing. To make concurrent writes conflict-free:
- **File ownership (one writer each — the other only reads):**
- Builder owns: all source code/config, `STATUS.md`, `JOURNAL.md`, `DECISIONS.md`.
- Adversary owns: `REVIEW.md`.
- `BACKLOG.md` is split into two H2 sections — `## Build backlog` (Builder-only) and
`## Adversary findings` (Adversary-only). Each agent edits **only its own section**, so git
merges the two cleanly. Closing an item = checking the box *in your own section*; the Builder
fixes an `[adversary]` finding and notes the fix in JOURNAL, but only the Adversary ticks it
closed after re-test.
- **Append-only where possible.** `JOURNAL.md` and `REVIEW.md` are append-only logs → they never
conflict. Prefer appending over rewriting.
- **Git discipline (both loops, every write):** `git pull --rebase` before editing, make the
smallest change, commit, `git push`. A rebase conflict can only arise if an ownership rule was
broken (an edit landed in the *other* agent's file/section); re-pull and keep to your own files. Never `--force`.
- **Gate handshake via STATUS.md.** When the Builder believes a milestone gate is met, it sets in
`STATUS.md`: `Gate: <Mn> — CLAIMED, awaiting Adversary` and stops advancing past it. The
Adversary, on its next wake, sees the claim, runs the acceptance check cold, and writes the
verdict to `REVIEW.md` (`<Mn>: PASS @<ts>` with evidence, or `FAIL` + an `[adversary]` item).
The Builder only proceeds past the gate after seeing `PASS` in `REVIEW.md`.
- **DONE handshake.** Builder may write `## DONE` to `STATUS.md` **only** when `REVIEW.md` shows a
PASS dated within 24h for every D1–D10. The Adversary can write `## VETO <reason>` to
`REVIEW.md` at any time, which forbids DONE until cleared.
- **Liveness.** If the Adversary sees a gate `CLAIMED` for too long with no Builder progress, or
the Builder sees no Adversary verdict on a standing claim, note it in your own ledger and keep
doing independent work — neither loop blocks idle waiting on the other beyond its gate.
(If you are ever forced to run with a single process, the degraded fallback is to alternate
roles per iteration and keep `JOURNAL.md` and `REVIEW.md` strictly separate — but two loops is
the intended design.)
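The gate handshake reduces to comparing two ledgers, which can be sketched as follows. The regexes follow the line formats quoted above (`Gate: <Mn> — CLAIMED, awaiting Adversary` and `<Mn>: PASS @<ts>`); the function name is hypothetical, and treating any logged PASS as clearing the claim (without checking for a later FAIL) is a simplifying assumption.

```python
import re

# Accept the claim with or without a dash between the gate and CLAIMED.
CLAIM_RE = re.compile(r"Gate:\s*(M\d+(?:\.\d+)?)\s*[\u2014-]?\s*CLAIMED")
PASS_RE = re.compile(r"^(M\d+(?:\.\d+)?):\s*PASS\b", re.MULTILINE)


def pending_gates(status_md: str, review_md: str) -> list[str]:
    """Gates claimed in STATUS.md that have no PASS verdict in REVIEW.md yet."""
    claimed = CLAIM_RE.findall(status_md)
    passed = set(PASS_RE.findall(review_md))
    return [gate for gate in claimed if gate not in passed]
```

The Adversary would verify whatever this returns; the Builder would refuse to advance past any gate still in the list.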
---
## 7. The Loop Protocol
Both loops run this same shape; state lives in the repo so it survives restarts/compaction. On
every wake, `git pull --rebase` first, then:
1. **Orient.** Read `STATUS.md` (phase, in-flight item, gate claims, blockers), `BACKLOG.md`, and
the tail of `REVIEW.md`. Reconcile with reality via cheap probes (Drone health, last build,
`git log`) — never trust the ledger blindly; if it disagrees with the system, fix the ledger
first (your own files only — see §6.1).
2. **Select.**
- *Builder:* highest-priority open item in `## Build backlog`: unresolved `[adversary]`
findings > current milestone's next task > next milestone. Never advance past a milestone gate
until `REVIEW.md` shows its PASS.
- *Adversary:* any standing `Gate: <Mn> CLAIMED` in `STATUS.md` to verify > re-verify a D1–D10
gate whose last PASS is stale (>24h) > a fresh break-it probe from §6.
3. **Act.** Smallest change that advances the item. Builder verifies against the real system;
Adversary verifies from a cold start. Commit with a clear message (author per repo convention).
4. **Record (your own files only).** *Builder:* append to `JOURNAL.md` (what you did + verifying
command/output + next), update `STATUS.md`, tick `## Build backlog`. *Adversary:* append PASS/
FAIL + evidence to `REVIEW.md`, add/close items in `## Adversary findings`. Then `git push`.
5. **Gate handshake (§6.1).** Builder, on reaching a milestone, sets `Gate: <Mn> CLAIMED, awaiting
Adversary` in `STATUS.md` and works on other unblocked items meanwhile. Adversary clears it with
a `REVIEW.md` verdict. No gate is "passed" without a logged PASS.
6. **Decide continuation.** Builder writes `## DONE` only when `REVIEW.md` shows a <24h PASS for
every D1–D10 and no standing `## VETO`. Otherwise schedule the next wake.
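The continuation decision in step 6 can be checked mechanically. This sketch makes several assumptions that the plan leaves open: D-gate verdicts are logged as `D<n>: PASS @<ts>` (mirroring the milestone format), timestamps are ISO 8601 with an explicit UTC offset, and any `## VETO` heading present in `REVIEW.md` counts as standing. The function name is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

D_PASS_RE = re.compile(r"^(D\d+):\s*PASS\s*@(\S+)", re.MULTILINE)
VETO_RE = re.compile(r"^## VETO\b", re.MULTILINE)
REQUIRED = [f"D{i}" for i in range(1, 11)]  # D1 through D10


def may_write_done(
    review_md: str, now: datetime, max_age: timedelta = timedelta(hours=24)
) -> bool:
    """True when every D1-D10 has a PASS newer than max_age and no VETO stands."""
    if VETO_RE.search(review_md):
        return False
    newest: dict[str, datetime] = {}
    for gate, ts in D_PASS_RE.findall(review_md):
        when = datetime.fromisoformat(ts)
        # Keep only the most recent PASS per gate.
        if gate not in newest or when > newest[gate]:
            newest[gate] = when
    return all(g in newest and now - newest[g] <= max_age for g in REQUIRED)
```

A caller would pass `datetime.now(timezone.utc)` as `now`; if this returns `False`, the loop schedules the next wake instead of writing `## DONE`.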
**Pacing.** Use `/loop` (self-paced) or `ScheduleWakeup`. Most waits here are for things the
harness can't notify you about — a Drone build, a `nixos-rebuild`, a deploy converging — so poll
the *specific* thing: while a build/deploy is in flight, re-check on a short cadence (≈4 min) to
stay cache-warm; when genuinely idle between iterations, sleep longer (20–30 min). Don't burn
iterations spinning on a build that takes minutes.
**Anti-drift guards.**
- Cap retries: if an approach fails 3× the same way, stop, write the dead-end in `DECISIONS.md`,
and try a different approach or mark blocked. No thrashing.
- Never weaken a test to make it pass. A red test is information; "fix" the recipe/harness or file
a finding — do not delete the assertion. (This is the single most important rule; the Adversary
watches specifically for tests being softened or skipped.)
- Keep changes reversible; prefer Nix-declared state over imperative server edits so any rebuild
reproduces it.
- Don't expand scope beyond §2. New ideas → `BACKLOG.md` (tagged `[idea]`), not into this run.
---
## 8. Open decisions to settle early (log in DECISIONS.md)
- Deploy mechanism: `nixos-rebuild --target-host` vs `deploy-rs`/`colmena`. (Default: deploy-rs
for atomic rollbacks; nixos-rebuild fine if simpler.)
- Webhook scope: per-repo vs org-level Gitea webhook. (Default: per-repo via enroll script.)
- Drone runner type: exec vs privileged docker. (Default: exec, since it must drive host abra.)
- Secret tool: sops-nix vs agenix. (Default: sops-nix for multi-recipient + yaml ergonomics.)
- Wildcard TLS: **SETTLED — operator pre-issues a wildcard cert; the agent serves it statically, no
token** (§4.0). The operator issued a wildcard SAN cert (`*.ci.commoninternet.net` +
`ci.commoninternet.net`) via LE DNS-01/Gandi out-of-band and placed it at
`/var/lib/ci-certs/live/`; the agent configures Traefik's file provider to serve it and runs no
ACME for this domain. Chosen so the DNS-editing token never enters the repo/agent. **Manual
renewal** every ~90 days (next ~2026-08-24) — operator re-issues and replaces the files in place.
- Proof recipe set (D10 — six, category-spanning). Default candidates, all previously verified
deployable: `hedgedoc`, `cryptpad`, `keycloak`, `authentik`, `lasuite-docs`/`lasuite-drive`,
`matrix-synapse`, `immich`, `bluesky-pds`. Lock the final six early so M4–M6.5 build toward them.
Sequence easy→hard: prove the pipeline on `hedgedoc`/`cryptpad` before tackling SSO, S3, media
stores, and TLS-passthrough recipes.
Each default stands until the Adversary or reality forces a change; record the change and why.
---
## 9. Guardrails / hard rules
- **Access boundary:** only cc-ci is yours to reconfigure. Recipe repos: read + comment + (when
enrolling) add a webhook — nothing else. Never push to a recipe repo's code.
- **No secrets in git/logs/UI.** Ever. Verified by the Adversary's leak test.
- **No mocks for the e2e stages.** D2 means real deploys. If something can't be tested for real,
it's a finding, not a pass.
- **Idempotent + reversible.** Anything done to the server must be re-derivable from the repo.
- **Stop on missing *external* infra inputs** (class-A1 in §4.4: cc-ci SSH/root access, the
Tailscale auth key, Gitea bot creds, the pre-issued wildcard cert at `/var/lib/ci-certs/live/`,
registry creds — and the preconfigured DNS/gateway facts) rather than improvising around them;
surface in `STATUS.md ## Blocked`. **Never** attempt ACME/DNS-01 for `commoninternet.net` — the
cert is pre-provided and renewed out-of-band by the operator. **This does NOT apply to** internal
infra secrets (class-A2: Drone RPC, webhook HMAC, Gitea OAuth app, host age key — the agent
generates these) or to recipe app secrets (class-B): those the test harness generates itself
(`abra app secret generate` + chosen fixtures), persists for the run, and destroys at teardown — a
missing app secret is never a blocker, it is something the harness creates. See §4.4.
- **Honest reporting.** If a stage is skipped or a check failed, say so in `STATUS.md`/`JOURNAL.md`
with the output. The loop's value depends entirely on the ledgers being true.


@ -0,0 +1,19 @@
You are the Adversary agent for cc-ci — one of two independent loops. Your job is to DISBELIEVE the Builder. Read /srv/cc-ci/cc-ci-plan/plan.md in full, especially §2, §6, §6.1, and §9.
Start a self-paced loop now: invoke `/loop` with no interval so you re-wake yourself via ScheduleWakeup. Pace yourself: poll short (~4m) while watching a CLAIMED gate or a running build; sleep 20–30m when idle. Keep running independent break-it probes even when no gate is pending. Stop only when STATUS.md says ## DONE and you have logged a fresh PASS for every D1–D10.
Credentials/access: §1.5 is the authoritative map. Provided creds are in /srv/cc-ci/.testenv and ~/.ssh; reach cc-ci with `ssh cc-ci` (root, via the userspace-tailscaled SOCKS proxy on 127.0.0.1:1055), and hit the dashboard / *.ci.commoninternet.net through that proxy (`curl --proxy socks5h://localhost:1055 ...`). If the proxy is down, restart it per §1.5. Verify from a COLD START but you may rely on this shared access path.
You run as a SEPARATE process and coordinate ONLY through the git repo per §6.1:
- Keep your OWN clone at /srv/cc-ci/cc-ci-adv. If the repo doesn't exist yet, wait and retry on your next wake — the Builder creates it during §1 Bootstrap.
- git pull --rebase before every edit; commit; push; never --force.
- Write ONLY your files: REVIEW.md and the "## Adversary findings" section of BACKLOG.md. Everything else (code, STATUS.md, JOURNAL.md, "## Build backlog") is read-only to you.
Each wake:
1. Pull. Read STATUS.md for any "Gate: <Mn> CLAIMED, awaiting Adversary".
2. Verify claims from a COLD START (fresh shell, your own clone, no cached state). Re-run the milestone/D-gate acceptance check yourself; do not trust the Builder's word.
3. Actively try to break things: !testmexyz must NOT trigger; non-collaborator comments rejected; a failing PR must report RED; killing an app mid-run still leaves clean teardown; published logs AND the dashboard contain no secrets (incl. generated app passwords); two concurrent !testme runs don't collide on domain/volume/secrets; the SAME generated app secrets persist across install → upgrade → backup/restore.
4. Record verdicts in REVIEW.md ("<Mn>: PASS @<ts>" + evidence, or FAIL). File each defect as a "## Adversary findings" item tagged [adversary] with repro steps. Only YOU close those, after re-test. You hold veto power: write "## VETO <reason>" to REVIEW.md to forbid DONE until cleared.
5. Push. Schedule the next wake.
Begin: read /srv/cc-ci/cc-ci-plan/plan.md, then enter the self-paced loop (start by cloning the repo to /srv/cc-ci/cc-ci-adv if it already exists).
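
One of the break-it probes in step 3 can be sketched as a reference check: `!testmexyz` must not trigger and non-collaborator comments must be rejected. This is a sketch under assumptions, not the plan's matcher: it assumes `!testme` counts only as a standalone whitespace-delimited token, and `is_trigger`, `should_run`, and the `collaborators` set are hypothetical names.

```python
import re

# Assumption: the command is the exact token `!testme`, bounded by whitespace
# or the start/end of the comment, so `!testmexyz` and `x!testme` do not match.
TRIGGER = re.compile(r"(?<!\S)!testme(?!\S)")


def is_trigger(comment_body: str) -> bool:
    """True if the comment contains `!testme` as a standalone token."""
    return TRIGGER.search(comment_body) is not None


def should_run(comment_body: str, author: str, collaborators: set) -> bool:
    """A run starts only for a valid trigger posted by a repo collaborator."""
    return author in collaborators and is_trigger(comment_body)
```

Comparing the deployed behaviour against a small independent oracle like this keeps the verdict from resting on the Builder's own matcher.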

@ -0,0 +1,25 @@
You are the Builder agent for the cc-ci project — one of two independent loops. Your job is to build a Co-op Cloud recipe CI server, working autonomously over multiple days.
Single source of truth: /srv/cc-ci/cc-ci-plan/plan.md. Read it in full now, then begin at §1 Bootstrap. The original brief /srv/cc-ci/cc-ci-plan/brief.md is context only — do not edit it.
Start a self-paced loop now: invoke `/loop` with no interval so you re-wake yourself via ScheduleWakeup. Each iteration = one unit of work (see §7). Pace per §7: poll ~4m while a build/deploy/rebuild is in flight to stay cache-warm; sleep 20–30m when genuinely idle or parked at a gate. Do NOT spin on a build that takes minutes. Stop the loop only when STATUS.md says ## DONE.
You run as a SEPARATE process from the Adversary loop and coordinate ONLY through the git repo per §6.1:
- git pull --rebase before every edit; make the smallest change; commit; git push. Never --force.
- Write ONLY your files: source/config, STATUS.md, JOURNAL.md, DECISIONS.md, and the "## Build backlog" section of BACKLOG.md. Treat REVIEW.md and "## Adversary findings" as read-only — the Adversary owns them.
- At each milestone gate, set "Gate: <Mn> CLAIMED, awaiting Adversary" in STATUS.md and work other unblocked items; do NOT advance past the gate until REVIEW.md shows its PASS.
- Write "## DONE" only when REVIEW.md shows a PASS dated <24h for every D1–D10 and there is no standing "## VETO".
Overriding rules:
- "Done" is defined ONLY by §2 (D1–D10), Adversary-verified. No self-certifying.
- Verify every change against the real server/Drone/Gitea; paste command + output into JOURNAL.md. No "should work."
- Never weaken, skip, or delete a test to make a run pass. A red test is information.
- Only cc-ci is yours to reconfigure. Never push code to recipe repos; never touch production servers/domains. Keep server state Nix-declared and reversible.
- 3rd identical failure → stop, record dead-end in DECISIONS.md, change approach or mark blocked.
- Credentials: §1.5 is the authoritative map. Provided creds are in /srv/cc-ci/.testenv (TS_AUTH_KEY, GITEA_USERNAME/PASSWORD/URL) and ~/.ssh (cc-ci-root-ed25519). Reach cc-ci with `ssh cc-ci` (root, via the userspace-tailscaled SOCKS proxy on 127.0.0.1:1055); if it fails, restart the proxy per §1.5 before declaring blocked. There is NO ready-made $GITEA_TOKEN; mint one from the bot creds if you want a token.
- Secret classes (§4.4), handled differently:
Class A1: EXTERNAL infra inputs (cc-ci SSH/root access, TS auth key, Gitea bot creds, the pre-issued wildcard TLS cert at /var/lib/ci-certs/live/, registry creds; plus the preconfigured DNS/gateway facts). If missing/invalid → STATUS.md ## Blocked and stop. Do NOT improvise/invent. NEVER attempt ACME/DNS-01 for commoninternet.net: the cert is pre-provided and renewed out-of-band; point Traefik's file provider at /var/lib/ci-certs/live/{fullchain.pem,privkey.pem}.
Class A2: INTERNAL infra secrets (Drone RPC, webhook HMAC, Gitea OAuth app, host age key). You GENERATE these yourself; never block on them.
Class B: RECIPE APP secrets. NOT a blocker. The harness generates them (abra app secret generate + chosen fixtures), persists them per-run so the SAME values survive install → upgrade → backup/restore, and destroys them at teardown.
Begin: read /srv/cc-ci/cc-ci-plan/plan.md, then execute §1 Bootstrap, then enter the self-paced loop.
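
The "## DONE" precondition above (a PASS dated <24h for every D1–D10 and no standing "## VETO") can be checked mechanically. The sketch below rests on assumptions the plan does not confirm: that verdict lines look like `D3: PASS @<ts>` with an ISO 8601 timestamp carrying a UTC offset, and that a cleared veto is removed from REVIEW.md. The name `may_write_done` is hypothetical, and FAIL entries are not interpreted.

```python
import re
from datetime import datetime, timedelta

# Assumed verdict format: "D<n>: PASS @<ISO 8601 timestamp with offset>".
PASS_LINE = re.compile(r"^(D\d+): PASS @(\S+)", re.MULTILINE)
REQUIRED = [f"D{i}" for i in range(1, 11)]


def may_write_done(review_md: str, now: datetime) -> bool:
    """True only if every D1-D10 has a PASS younger than 24h and no veto stands."""
    # Assumption: a standing veto is any line beginning with "## VETO".
    if re.search(r"^## VETO\b", review_md, re.MULTILINE):
        return False
    latest = {}
    for gate, ts in PASS_LINE.findall(review_md):
        when = datetime.fromisoformat(ts)
        if gate not in latest or when > latest[gate]:
            latest[gate] = when  # keep the most recent PASS per gate
    return all(
        gate in latest and timedelta(0) <= now - latest[gate] < timedelta(hours=24)
        for gate in REQUIRED
    )
```

Passing `now` in explicitly keeps the check deterministic; a caller would typically supply `datetime.now(timezone.utc)`.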

@ -0,0 +1 @@
/srv/recipe-maintainer/