cc-ci — Co-op Cloud Recipe CI Server (Autonomous Build Plan)
Status: ACTIVE — autonomous loop
Owner agent: Builder (primary) + Adversary (reviewer)
Source brief: brief.md (do not edit; this file supersedes it)
This file's canonical path: /srv/cc-ci/cc-ci-plan/plan.md
Target server: cc-ci (NixOS)
Code/config home: git.autonomic.zone/recipe-maintainers/cc-ci (the CI project repo — distinct from this
/srv/cc-ci/cc-ci-plan/ planning+launch folder)
Last updated: keep current via STATUS.md (see §7)
0. How to read this document
This plan is written to be handed to an autonomous Claude agent running in a sandbox over several days, driving itself in a loop until the CI server is "done" per §2. A second agent (the Adversary) independently tries to disprove every "done" claim. Neither agent is trusted to mark its own work complete.
If you are an agent waking up into this loop for the first time, go straight to §1 Bootstrap.
On every subsequent wake, go to §7 The Loop Protocol and continue from STATUS.md.
The rest of the document (§3–§6) is the technical design. Treat it as the default architecture,
but you are allowed to revise it when reality disagrees — record any deviation in DECISIONS.md
with a one-line rationale.
1. Bootstrap (first wake only)
Do these in order. Each step is idempotent; re-running is safe.
- Verify access. (Full credential map + how each is used is in §1.5; read it first.)
  - ssh cc-ci 'hostname && whoami': you log in as root on cc-ci (NixOS), so there is no separate sudo step. ssh cc-ci is preconfigured to tunnel through the userspace-tailscaled SOCKS proxy (§1.5); if it fails, the proxy/daemon is probably down, so restart it (§1.5) before declaring blocked.
  - ssh cc-ci 'nixos-version': confirm NixOS.
  - Confirm you can reach the Gitea API with the bot creds from .testenv (§1.5): curl -s https://$GITEA_URL/api/v1/version. The bot authenticates with GITEA_USERNAME/GITEA_PASSWORD (basic auth) or a token you mint from them via POST /api/v1/users/<user>/tokens; do not expect a ready-made $GITEA_TOKEN.
  - Confirm the preconfigured test-app DNS (§4.0/§4.4): a random subdomain under the wildcard resolves, e.g. getent hosts probe-$RANDOM.ci.commoninternet.net returns the gateway's IP (not cc-ci's: the gateway TLS-passthroughs to cc-ci, so do not expect cc-ci's address; and use getent, not dig, since this host's resolver is Tailscale-only, see §1.5). Traefik is not up yet: you deploy it at M1 (the real coop-cloud/traefik recipe via abra, wildcard/file-provider mode → the pre-issued cert at /var/lib/ci-certs/live/, no ACME); the DNS record + gateway passthrough + cert are the preconditions, and full end-to-end HTTPS reachability is proven at M1, not now. If the wildcard does not resolve at all, that's a ## Blocked item (operator fixes DNS/gateway).
  - If any check fails, write the failure to STATUS.md under ## Blocked and stop: a human must fix access. Do not try to work around missing access.
- Create the cc-ci repo on git.autonomic.zone if it does not exist. Push an initial skeleton (see §3 layout). The Builder clones to /srv/cc-ci/cc-ci; the Adversary loop keeps its own independent clone at /srv/cc-ci/cc-ci-adv. The repo is the only channel between the two loops (§6.1); loop state lives inside it (STATUS.md, BACKLOG.md, etc.).
- Snapshot the starting environment into cc-ci/docs/baseline.md: current NixOS config on the server (/etc/nixos or existing flake), installed packages, whether Docker/Swarm/abra already exist, DNS that already points at the box. This is the rollback reference.
- Seed the loop state files (§7) if absent: STATUS.md, BACKLOG.md, REVIEW.md, JOURNAL.md, DECISIONS.md. Give BACKLOG.md two H2 sections, ## Build backlog (populated from §5 milestones) and ## Adversary findings (empty), per the single-writer rule in §6.1.
- Commit ("chore: bootstrap cc-ci loop state") and begin the loop at §7.
1.5 Credentials & access — where everything lives and how to use it
The loops run on the sandbox host (not on cc-ci) and reach cc-ci over Tailscale. This section is the authoritative map of what credentials exist, where, and how to use them. Never copy any secret value into the repo, a commit, a log, or the dashboard (§9) — reference locations only.
Provided credentials (already in place)
| What | Where | How to use |
|---|---|---|
| Tailscale auth key (joins cc-ci's tailnet taila4a0bf.ts.net) | /srv/cc-ci/.testenv → TS_AUTH_KEY (Tailscale SaaS key, keyID ends CNTRL) | Used to bring up the userspace tailscaled (below). It's reusable; re-run tailscale up with it if the node drops. |
| cc-ci SSH (root) | private key ~/.ssh/cc-ci-root-ed25519; config Host cc-ci in ~/.ssh/config | Just run ssh cc-ci (logs in as root). The pubkey is already in cc-ci's /root/.ssh/authorized_keys. |
| Gitea bot account | /srv/cc-ci/.testenv → GITEA_USERNAME (autonomic-bot), GITEA_PASSWORD, GITEA_URL (git.autonomic.zone) | Basic-auth to the Gitea API, or mint a scoped token: POST https://$GITEA_URL/api/v1/users/$GITEA_USERNAME/tokens. Used to push the cc-ci project repo, read recipe repos, comment on PRs, and poll for !testme (read-level; the bot does not register webhooks). |
Load them in a shell with: set -a; . /srv/cc-ci/.testenv; set +a (don't echo the values).
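For Python-side tooling (the harness and tests are Python), the same load-without-echo behaviour could look roughly like the sketch below. It assumes .testenv contains plain KEY=VALUE lines; parse_env_text and load_testenv are hypothetical helper names, not part of any existing tool.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blanks, '#' comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_testenv(path: str = "/srv/cc-ci/.testenv") -> list[str]:
    """Export the file's variables into os.environ and return only the key names."""
    values = parse_env_text(Path(path).read_text())
    os.environ.update(values)
    return sorted(values)  # names only, so callers can log them without leaking values
```

Returning only the key names keeps the "don't echo the values" rule easy to follow: a caller can log what was loaded without ever touching a secret value.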
The Tailscale connection (how ssh cc-ci and the proxy work)
cc-ci (cc-nix-test, 100.90.116.4) is on a different tailnet than the sandbox host's default
one, so it is reached via a second, userspace tailscaled — this keeps the host's own tailnet
untouched. State lives in ~/.cc-ci-ts/; it exposes a SOCKS5/HTTP proxy on 127.0.0.1:1055,
which is the only route to that tailnet (userspace networking ⇒ the host OS can't route the tailnet
IPs directly).
It runs as a persistent systemd service (cc-ci-tailscaled.service, enabled, Restart=always,
starts on boot; unit at /etc/systemd/system/cc-ci-tailscaled.service, runs as user notplants).
It reuses the already-authenticated state in ~/.cc-ci-ts/, so it reconnects across reboots/crashes
without the auth key.
- ssh cc-ci works out of the box (its ProxyCommand uses the proxy; logs in as root).
- For HTTP(S) to cc-ci / *.ci.commoninternet.net from the sandbox, go through the proxy, e.g. curl --proxy socks5h://localhost:1055 https://<app>.ci.commoninternet.net.
- If connectivity is down: sudo systemctl restart cc-ci-tailscaled (diagnose with systemctl status cc-ci-tailscaled / journalctl -u cc-ci-tailscaled). A dead proxy is an access failure to recover, not a ## Blocked-and-stop condition, unless the auth key itself is rejected (then re-auth with tailscale --socket=$HOME/.cc-ci-ts/tailscaled.sock up --auth-key="$TS_AUTH_KEY" --hostname=cc-ci-claude-sandbox --accept-routes --accept-dns=false, and if that fails the key is a class-A1 blocker).
- DNS gotcha: this host's /etc/resolv.conf lists only Tailscale resolvers, so direct dig @1.1.1.1 … queries get no answer and look falsely empty. Use getent hosts <name> to resolve from the sandbox. commoninternet.net itself is a normal public zone hosted at Gandi.
Credentials the loop GENERATES itself (do not wait on a human for these)
- Drone RPC secret and webhook HMAC secret: generate (openssl rand -hex 32), store sops-encrypted in secrets/, and wire both ends. Internal shared secrets, not human inputs.
- Gitea OAuth app for Drone: create it under the bot account via the API (POST /api/v1/user/applications/oauth2); capture client id/secret into secrets/.
- cc-ci host age/GPG key for sops: generate on the host (or derive from its SSH host key); add as a sops recipient. Keep a recovery copy of the master age identity off-box if desired.
- Per-recipe app secrets (class-B, §4.4): the harness generates these per run.
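If the internal shared secrets are generated from Python rather than the shell, the standard-library equivalent of openssl rand -hex 32 is short. This is a minimal sketch; the function names and the dictionary keys are hypothetical, and the values are meant to go straight into the sops-encrypted file, never to a log.

```python
import secrets


def generate_shared_secret(num_bytes: int = 32) -> str:
    """Return num_bytes of CSPRNG output as lowercase hex (64 characters for 32 bytes)."""
    return secrets.token_hex(num_bytes)


def generate_internal_secrets() -> dict[str, str]:
    """Create the two internal shared secrets; the key names here are illustrative only."""
    return {
        "drone_rpc_secret": generate_shared_secret(),
        "webhook_hmac_secret": generate_shared_secret(),
    }
```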
Credentials STILL NEEDED from the operator (class-A — block if missing, per §9)
- Wildcard TLS cert: PROVIDED, not a token. The operator has pre-issued the wildcard SAN cert (*.ci.commoninternet.net + ci.commoninternet.net) and placed it on cc-ci at /var/lib/ci-certs/live/{fullchain.pem,privkey.pem} (§4.0). The agent feeds these into the coop-cloud/traefik recipe as its ssl_cert/ssl_key swarm secrets (wildcard/file-provider mode) and runs no ACME for this domain. Do not request or expect a commoninternet.net DNS token; issuance/renewal is handled out-of-band by the operator (LE 90-day cert; next renewal ~2026-08-24). A missing/expired cert is a finding for the operator, not an agent re-issue.
- Registry pull credentials (e.g. Docker Hub): recommended to avoid anonymous pull-rate limits breaking deploys under load. Treat a rate-limit failure traced to this as a finding, then request creds. Store sops-encrypted in secrets/.
- Gitea bot permissions (a grant, not a secret): least privilege, i.e. read, not admin. The bot needs: write on its own recipe-maintainers/cc-ci project repo; read + comment on the recipe repos under test; and org membership in recipe-maintainers (read-level; used both to authorize commenters via the members endpoint and to read members). It does not need repo-admin and does not register webhooks (that's an optional manual admin task, §4.1). If a needed grant is missing, that's a ## Blocked item for the operator.
2. Definition of Done (the loop's exit condition)
The loop terminates only when every item below is true and the Adversary has independently
re-verified each one within the last 24h (logged in REVIEW.md with timestamps and command
output). Partial credit does not count.
- D1: Trigger. Commenting !testme on any open PR in any enrolled recipe repo on git.autonomic.zone starts a CI run for the code at that PR's head commit within 60s. Other comments do not. Re-commenting re-runs.
- D2: Test matrix. For a recipe under test, the run executes, as separate reported stages: new install, upgrade (previous published version → PR version), and backup + restore. All are genuine end-to-end against a really-deployed recipe (real containers, real Traefik routing, real volumes), with no mocks and no stubs.
- D3: Python + Playwright. Tests are Python. Functional assertions that require a browser use Playwright against the live deployed app.
- D4: Recipe-local tests. If the recipe repo contains its own tests/ folder, those tests are also discovered and run as part of the same CI run, with results merged in.
- D5: Per-recipe test tree. The cc-ci repo holds tests/<recipe>/ with the install/upgrade/backup tests as Python files, plus a shared harness. Adding a new recipe is a documented, small, repeatable operation.
- D6: Secrets. App + infra secrets are handled reproducibly (committed encrypted, decrypted on the server), documented, and rotatable. No plaintext secrets in git, logs, or the results UI.
- D7: Results UX. Each run has a stable URL with live, tail-able logs per stage and a final pass/fail; there is an overview page listing recipes with their latest status, with look-and-feel comparable to the YunoHost app CI (ci-apps.yunohost.org). A PR comment links back to its run and reflects the outcome.
- D8: Reproducible server. The entire server (Drone, runner, comment bridge, swarm, Traefik, dashboard, secrets wiring) is declared in the cc-ci repo's NixOS flake and can be rebuilt from scratch onto a blank NixOS host following docs/install.md, verified by the Adversary doing exactly that on a throwaway VM (or documenting why a full from-scratch rebuild was infeasible and what was tested instead).
- D9: Documentation. README.md + docs/ explain architecture, how to enroll a recipe, how to add/run tests locally, how to operate/rotate secrets, and how to debug a failed run. A new engineer can enroll a recipe and get a green run using only the docs.
- D10: Proof (breadth). At least six real recipes spanning the meaningful categories have a full green run triggered by !testme on a real PR, with all three stages (install / upgrade / backup+restore) actually exercised. The set must cover: a stateless/simple app, a single-DB app, a multi-service app, an SSO/identity app, and an object-storage/large-volume app. Target set (all previously verified deployable): hedgedoc (simple), cryptpad (stateful, no external DB), keycloak + authentik (SSO/identity, DB-backed), lasuite-docs and/or lasuite-drive (multi-service + S3/MinIO), matrix-synapse (DB + media store), immich (large volumes + Postgres), bluesky-pds (TLS-passthrough/atproto). Pick six that together satisfy the categories; record the chosen set and per-recipe green-run evidence in REVIEW.md. Any recipe that genuinely cannot be CI'd is a documented finding (in DECISIONS.md) with the reason, not a silent omission. Recipe availability: the testable repos live on the private mirror git.autonomic.zone/recipe-maintainers/<recipe> (already mirrored as of bootstrap: bluesky-pds, cryptpad, keycloak, lasuite-docs, lasuite-meet, matrix-synapse, n8n, custom-html, custom-html-tiny). Any recipe not yet mirrored (e.g. hedgedoc, authentik, immich, lasuite-drive) is pulled from upstream git.coopcloud.tech and created on the mirror via the recipe mirror+PR flow (§4.1), so the target set is not capped by what currently exists. If the chosen simple/stateless app isn't mirrored, custom-html / custom-html-tiny already are.
When all of D1–D10 hold and are Adversary-verified, write ## DONE to STATUS.md with the
evidence links and stop scheduling new iterations.
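The exit condition (every item verified, each within the last 24h, no partial credit) can be expressed as a small check. This is a sketch under one stated assumption: the per-item verification timestamps have already been extracted from REVIEW.md into a dict; how they are parsed is not specified here, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

REQUIRED_ITEMS = [f"D{i}" for i in range(1, 11)]  # D1..D10


def unverified_items(last_verified, now, max_age=timedelta(hours=24)):
    """Return the D-items with no Adversary verification newer than max_age."""
    stale = []
    for item in REQUIRED_ITEMS:
        stamp = last_verified.get(item)
        if stamp is None or now - stamp > max_age:
            stale.append(item)
    return stale


def loop_is_done(last_verified, now):
    """Done only if every item is freshly verified; partial credit does not count."""
    return not unverified_items(last_verified, now)
```

A caller would pass timezone-aware datetimes (for example datetime.now(timezone.utc)) so that the age comparison is unambiguous.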
3. Repository layout (git.autonomic.zone/recipe-maintainers/cc-ci)
cc-ci/
├── README.md
├── flake.nix                     # NixOS host(s) + devshell
├── flake.lock
├── hosts/
│   └── cc-ci/
│       ├── configuration.nix     # the cc-ci machine
│       └── hardware.nix
├── modules/
│   ├── drone.nix                 # Drone server + runner (exec/docker)
│   ├── comment-bridge.nix        # !testme webhook listener service
│   ├── swarm.nix                 # Docker + single-node swarm + `proxy` net; deploys the
│   │                             # coop-cloud/traefik recipe via abra (wildcard/file-provider, §4.2)
│   ├── dashboard.nix             # results overview site
│   └── secrets.nix               # sops-nix / agenix wiring
├── secrets/                      # sops-encrypted (*.enc / *.age); see §4.4
│   └── secrets.yaml
├── bridge/                       # comment-bridge source (small Go/Python service)
├── runner/                       # CI orchestration entrypoint invoked by Drone
│   ├── run_recipe_ci.py          # top-level: deploy→test→teardown for a recipe@ref
│   └── harness/                  # shared pytest fixtures (abra wrappers, app lifecycle)
├── dashboard/                    # results UI generator (reads Drone API → static site)
├── tests/
│   ├── conftest.py               # shared fixtures, recipe selection, teardown guarantees
│   ├── <recipe>/
│   │   ├── test_install.py
│   │   ├── test_upgrade.py
│   │   ├── test_backup.py
│   │   └── playwright/           # e2e flows for this recipe
│   └── _template/                # copy-to-add-a-recipe template
├── docs/
│   ├── install.md                # from-scratch server build (D8)
│   ├── enroll-recipe.md          # how to add a recipe (D5)
│   ├── secrets.md                # secret model + rotation (D6)
│   ├── architecture.md
│   ├── runbook.md                # debugging failed runs
│   └── baseline.md               # bootstrap snapshot
├── STATUS.md  BACKLOG.md  REVIEW.md  JOURNAL.md  DECISIONS.md   # loop state (§7)
└── .drone.yml                    # pipeline for cc-ci's own repo (lint/self-test)
4. Technical design (default architecture)
4.0 Domain model (where things live)
Two DNS zones, deliberately separated — do not conflate them:
- git.autonomic.zone: source of truth for code (unchanged, not ours to reconfigure). The Gitea host: the enrolled recipe repos and the cc-ci config repo live here. The loop reads, comments, and (when enrolling) adds a webhook here, but deploys nothing here. Per §9 this zone is read/comment-only: never push recipe code, never point app DNS at it.
- commoninternet.net: the CI server's own zone; everything CI-facing. A wildcard *.ci.commoninternet.net resolves to a gateway (not cc-ci directly; see Network path below). Under it:
  - Apps under test: each run deploys to a unique subdomain <recipe>-pr<n>-<short-sha>.ci.commoninternet.net, so concurrent runs never collide on a hostname. The subdomain (app, volumes, secrets, Traefik route) is torn down at run end (§4.3).
  - Results dashboard: ci.commoninternet.net, the overview page + per-recipe status badges (§4.5).
  - Webhook bridge: ci.commoninternet.net/hook, the Gitea issue_comment receiver (§4.1).
- Network path (gateway → TLS passthrough → cc-ci). The wildcard record does not point at cc-ci's IP. It points at a gateway that passes TLS through to cc-ci: the gateway routes by SNI and forwards the raw encrypted stream without decrypting it, so TLS still terminates on cc-ci's Traefik. Consequences the agent must respect:
  - dig <sub>.ci.commoninternet.net returns the gateway's IP, not cc-ci's, so do not assert the record points at cc-ci. Reachability is proven end-to-end (an HTTPS request lands on cc-ci), not by comparing A records.
  - The gateway is assumed to passthrough the whole wildcard, so a fresh per-run subdomain needs no gateway change and no cert work (the pre-issued wildcard already covers it); the agent only adds the Traefik router on cc-ci. (If the gateway instead needs per-host config, that's an operator/gateway concern and a ## Blocked item, not something the agent reconfigures: the gateway is not ours, only cc-ci is, per §9.)
  - The gateway is operator-managed and out of scope; the agent configures only cc-ci.
  - Caveat for TLS-passthrough recipes (e.g. bluesky-pds, §2 D10): the default path terminates TLS at cc-ci's Traefik. A recipe that expects to terminate TLS in its own container needs cc-ci's Traefik configured to passthrough that host too (the outer gateway already passes the whole wildcard). Treat this as a per-recipe harness quirk to absorb (§5 M6.5), or pick a non-passthrough recipe for that D10 category and record the swap in DECISIONS.md, not a silent omission.
- Wildcard TLS: operator pre-issues, agent serves it statically (no token in the agent). Routing and certs are separate: the preconfigured wildcard DNS solves routing only; a cert is still needed because the gateway passes TLS through and cc-ci's Traefik terminates it. The cert is pre-provisioned out-of-band so the DNS-editing token never enters the agent/repo. A wildcard SAN cert covering *.ci.commoninternet.net + ci.commoninternet.net (issued via Let's Encrypt DNS-01 against Gandi, by the operator, using a token the agent never sees) lives on cc-ci: /var/lib/ci-certs/live/fullchain.pem (leaf+intermediate) and …/privkey.pem.
  - Traefik is the real coop-cloud/traefik recipe, deployed via abra (for e2e fidelity, see §4.2), run in its wildcard / file-provider mode (WILDCARDS_ENABLED=1 + compose.wildcard.yml). The pre-issued cert is supplied as the recipe's ssl_cert/ssl_key swarm secrets (sourced from the files above); the recipe's file provider then serves it under tls.certificates. No ACME resolver / no DNS provider is enabled; only the cert+key reach cc-ci, never the DNS token. One cert covers every per-run subdomain (matched by SNI), so a new app domain needs no cert work.
  - Renewal is a manual operator task (LE 90-day cert): the operator re-issues out-of-band, then updates the ssl_cert/ssl_key secret (bump its version) and redeploys traefik. The agent must not attempt ACME/DNS-01 for commoninternet.net and must not expect a DNS token; a missing/expired cert is an operator action surfaced as a finding, not something the agent re-issues. (Rationale for choosing a wildcard cert over per-subdomain: a wildcard is reused for every churning run subdomain and sidesteps LE's 50-certs/week-per-domain limit; only DNS-01 can mint a wildcard. We keep that DNS-01 issuance with the operator rather than handing the agent the zone token.)
- Record the live facts in docs/install.md: the zone + DNS provider (Gandi), that the wildcard *.ci.commoninternet.net (and bare ci.commoninternet.net) point at the gateway, that the gateway TLS-passthroughs the wildcard to cc-ci, the gateway's address, the TTL, and that the wildcard cert is pre-issued/operator-renewed at /var/lib/ci-certs/live/ (no DNS token on cc-ci).
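The per-run hostname scheme from this section, <recipe>-pr<n>-<short-sha>.ci.commoninternet.net, can be generated and validated in one place. The sketch below makes two assumptions that the plan does not state: a 7-character short SHA and lower-casing of the inputs. run_app_name and run_domain are hypothetical names.

```python
import re

CI_ZONE = "ci.commoninternet.net"
# A single DNS label: 1-63 chars, letters/digits/hyphens, no leading or trailing hyphen.
_LABEL_RE = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")


def run_app_name(recipe: str, pr: int, sha: str, sha_len: int = 7) -> str:
    """Build the unique per-run label, e.g. hedgedoc-pr12-a1b2c3d."""
    label = f"{recipe.lower()}-pr{pr}-{sha[:sha_len].lower()}"
    if not _LABEL_RE.match(label):
        raise ValueError(f"not a valid DNS label: {label!r}")
    return label


def run_domain(recipe: str, pr: int, sha: str) -> str:
    """Full hostname under the wildcard, covered by the pre-issued cert."""
    return f"{run_app_name(recipe, pr, sha)}.{CI_ZONE}"
```

Because the label stays a single DNS label directly under ci.commoninternet.net, the existing wildcard record and wildcard cert cover it without any per-run DNS or cert work.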
4.1 The !testme trigger path
Gitea does not natively forward PR-comment events to Drone, and Drone's built-in triggers fire on push/PR-open, not on a magic comment. So:
PR comment "!testme"
│ Gitea webhook (issue_comment event) ──► comment-bridge (modules/comment-bridge.nix)
│ • verifies webhook HMAC secret
│ • checks comment body == "!testme" (exact, trimmed)
│ • checks commenter is allowed (org member / collaborator)
│ • resolves PR head repo + SHA via Gitea API
│ • calls Drone API: build for cc-ci pipeline,
│ params RECIPE=<repo> REF=<sha> PR=<n> SRC=<headrepo>
▼
Drone build (cc-ci repo pipeline, parameterized) ──► runner/run_recipe_ci.py
▼
Bridge posts/updates a Gitea PR comment with the run URL and (on completion) pass/fail.
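The first two bridge checks in the diagram (HMAC verification and the exact, trimmed comment match) are small enough to sketch. This assumes Gitea signs the raw request body with HMAC-SHA256 and sends the hex digest in a signature header; confirm that against the installed Gitea version before relying on it. The function names are hypothetical.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Compare the received hex signature with our own HMAC-SHA256 of the raw body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = signature_hex.strip().lower()
    # Constant-time comparison on bytes avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())


def is_testme(comment_body: str) -> bool:
    """True only for a comment that is exactly '!testme' after trimming whitespace."""
    return comment_body.strip() == "!testme"
```

The signature must be computed over the raw bytes as received, before any JSON parsing, since re-serialising the payload can change the bytes and break the comparison.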
- The bridge is a tiny service (Go or Python+FastAPI). Keep it dependency-light; it's a NixOS systemd service behind Traefik at e.g. ci.commoninternet.net/hook (§4.0).
- Trigger: POLLING is primary; webhook is an optional, admin-registered push optimization (SETTLED). Hard constraint: the CI server/bot must run on READ-level access, never repo-admin.
  - Polling (primary, default): the bridge polls the Gitea API for new !testme comments on enrolled repos at ≤60s (satisfies D1). This is outbound (cc-ci → git.autonomic.zone, the reliably-working direction) and needs only read. It is the source of truth for triggering.
  - Webhook (optional): the bridge keeps its /hook endpoint so a Gitea issue_comment webhook, if present, gives lower latency. But the server does NOT self-register webhooks (that needs repo-admin, which we refuse to require). Registration is a manual admin task, documented in docs/enroll-recipe.md (URL https://ci.commoninternet.net/hook, event issue_comment, content-type json, the shared HMAC secret, and the note that the Gitea instance must allow the host). The two paths are mutually exclusive in effect; don't double-fire a comment seen by both.
  - (Webhook delivery on this instance was flaky early on, with last_status: None, so polling being primary is also the robust choice, not just the low-privilege one.)
- Commenter auth via org membership (read-level, no admin). The repo's explicit collaborator list is empty: the bot and the maintainers (trav/notplants) all reach the repo as recipe-maintainers org members/owners, so GET /collaborators/{user} 404s for everyone, and GET /collaborators/{user}/permission would authorize correctly but requires repo-admin, which we refuse. Instead authorize with GET /orgs/recipe-maintainers/members/{user} (204 = member = authorized; 404 = rejected), readable by any org member (read-level), verified to admit trav/notplants/the bot and reject non-members. Note public_members is hidden here, so use the authenticated members endpoint (bot must be an org member, still read-level). Fail-closed on error. Zero-privilege fallback: a configured allowlist of usernames. (Still satisfies §6's non-collaborator-rejection check.)
- Enrollment = adding the recipe to the bridge's poll list + ensuring a tests/<recipe>/ dir exists. The bot needs only read on the recipe repo (+ comment-back to post status). Registering a webhook is optional and operator/admin-side (documented in enroll-recipe.md), never required for CI to work.
- Recipe mirror+PR flow (how a recipe gets a testable PR). Recipe repos under test live on the private mirror git.autonomic.zone/recipe-maintainers/<recipe>, mirrored from the official upstream git.coopcloud.tech. To bring a recipe under CI: abra recipe fetch <recipe> (pulls from upstream into ~/.abra/recipes/<recipe>), then mirror it to the org + open a PR via the recipe mirror+PR procedure; reference implementation: /srv/recipe-maintainer/.claude/commands/recipe-create-pr.md (creates recipe-maintainers/<recipe> if absent, force-syncs main from upstream so the PR diff is clean, pushes a branch, opens the PR). !testme on that PR is what kicks off a run. So a recipe missing from the mirror is not a blocker: mirror it first.
- Decide and record in DECISIONS.md: one shared Gitea org-level webhook vs per-repo webhooks. Org-level is fewer moving parts; per-repo is more explicit. Default: per-repo via enroll script.
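Two rules from the list above lend themselves to a short sketch: the fail-closed mapping of the members-endpoint status code, and the "don't double-fire" rule when polling and a webhook both see the same comment. The names below are hypothetical, and the in-memory set is an assumption for illustration; a real bridge would likely persist the handled ids across restarts.

```python
def commenter_is_authorized(status_code: int) -> bool:
    """Map the org-members endpoint status to a decision, failing closed.

    204 means member (authorized). 404 means not a member. Anything else
    (redirects, 5xx, rate limits) is treated as an error and rejected.
    """
    return status_code == 204


class SeenComments:
    """Remember handled comment ids so polling and webhook never trigger twice."""

    def __init__(self) -> None:
        self._seen: set[int] = set()

    def claim(self, comment_id: int) -> bool:
        """Return True the first time an id is claimed, False on every repeat."""
        if comment_id in self._seen:
            return False
        self._seen.add(comment_id)
        return True
```

A trigger path would call claim(comment_id) before starting a build and skip the comment when it returns False, regardless of which path delivered it.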
4.2 Drone + the test target
- Drone server connects to Gitea via OAuth app (Gitea → Settings → Applications). Runner is the exec runner (or a privileged docker runner) running on cc-ci itself, because tests must drive abra to deploy real recipes onto a real swarm.
- cc-ci doubles as the deploy target: single-node Docker Swarm + abra, with the reverse proxy provided by the real coop-cloud/traefik recipe deployed via abra (not a hand-rolled Traefik; chosen for end-to-end fidelity: test apps route through the exact proxy a real Co-op Cloud host uses, with its web/web-secure entrypoints, the proxy overlay, and the swarm provider). TLS terminates on it using the pre-issued static wildcard cert (§4.0): run the recipe in wildcard/file-provider mode (WILDCARDS_ENABLED=1 + compose.wildcard.yml) and supply the cert as the recipe's ssl_cert/ssl_key swarm secrets from /var/lib/ci-certs/live/. The operator preconfigures the wildcard DNS (→ gateway), the gateway's TLS-passthrough, and the cert itself (§4.4); the agent deploys the traefik recipe + swarm on top, with no ACME and no DNS token on cc-ci. Make the abra app new/deploy traefik steps reproducible (scripted/Nix-invoked) for D8.
- Each CI run gets an isolated app domain <recipe>-pr<n>-<short-sha>.ci.commoninternet.net (§4.0) so concurrent runs don't collide. Teardown removes app, secrets, and volumes.
- Concurrency cap + queue: use Drone natively (SETTLED). Don't let the server fill with simultaneously-deployed apps. Expose a configurable MAX_TESTS mapped to the exec runner's DRONE_RUNNER_CAPACITY (Nix-set on the runner; default low, 1–2, given a single 28 GiB node and heavy recipes like matrix/immich). Drone runs at most MAX_TESTS builds at once and automatically queues excess builds (its native pending-build queue), starting them as slots free. Per-build timeout (repo/runner timeout) guarantees a hung test is killed and frees its slot, so "continue once a current test finishes or times out" is built in. No custom queue needed. Optionally also set concurrency: { limit: <N> } in .drone.yml as a per-pipeline cap.
- One app at a time per run, torn down at run end. A build deploys its recipe, runs the three stages, then undeploys; the server should not accumulate live test apps. Guaranteed teardown + the run-start janitor (§4.3) enforce this even when a build is timed-out/killed (in-process cleanup can't run, so the janitor reaps it).
4.3 The test harness & recipe test contract
runner/run_recipe_ci.py orchestrates per run:
- Fetch recipe at $REF (the PR head) via abra/git.
- Install stage → tests/<recipe>/test_install.py: abra app new, generate secrets, abra app deploy, wait healthy, run Playwright smoke + assertions.
- Upgrade stage → deploy previous published version first, then upgrade to $REF; assert data survives and app still healthy.
- Backup/restore stage → abra app backup, mutate state, abra app restore, assert restored state matches pre-mutation.
- Recipe-local tests (D4) → if <recipe-repo>/tests/ exists, discover & run it in the same live environment; merge results.
- Teardown (always, even on failure) → abra app undeploy, abra app volume remove, abra app secret remove, DNS/route cleanup.
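The "teardown always, even on failure" shape of the steps above could be sketched as follows. The stages are passed in as plain callables so that the control flow is visible without any abra calls; run_stages is a hypothetical name, and marking later stages as skipped after a failure is an assumption, not something the plan prescribes.

```python
from typing import Callable


def run_stages(
    stages: list[tuple[str, Callable[[], None]]],
    teardown: Callable[[], None],
) -> dict[str, str]:
    """Run stages in order, report each separately, and always call teardown."""
    results: dict[str, str] = {}
    failed = False
    try:
        for name, stage in stages:
            if failed:
                results[name] = "skipped"  # assumption: later stages need a healthy app
                continue
            try:
                stage()
                results[name] = "pass"
            except Exception as exc:
                results[name] = f"fail: {exc}"
                failed = True
    finally:
        # Runs on success, on failure, and on unexpected errors in this function.
        # It cannot run if the process is SIGKILLed, which is what the janitor covers.
        teardown()
    return results
```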
Shared fixtures (tests/conftest.py + runner/harness/) wrap abra. Known abra gotchas to bake
in from day one (carried over from prior work, re-verify on the installed abra version):
- abra app undeploy and abra app volume remove do not accept --chaos → never pass it.
- Plumb a timeout kwarg through secret-generate/insert/remove-all calls.
- abra app ls -S -m returns nested {server: {apps: [...]}}; parse the inner structure.
- Pick robust health checks per app (e.g. Keycloak: /realms/master, not /).
The teardown guarantee is sacred: a failed test must never leak a deployed app or volume into the
next run. Implement teardown as a pytest fixture finalizer / try/finally in the orchestrator and
add a janitor pass at run start that nukes any orphaned *-pr* apps older than N hours.
Crucially, the janitor is the backstop for timed-out/killed builds: when Drone hits the
per-build timeout (or a build is cancelled) it may SIGKILL the runner process, so the try/finally
teardown can't run — those orphaned apps/volumes are reaped by the next build's run-start janitor
(and the janitor should run regardless of how the previous build ended). Net effect with the
MAX_TESTS/DRONE_RUNNER_CAPACITY cap (§4.2): at most MAX_TESTS apps are ever live at once, and
each is torn down (or janitor-reaped) so the single node never accumulates deployments.
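The janitor's selection rule (orphaned *-pr* apps older than N hours) is easy to isolate from the actual undeploy calls. The sketch below assumes the caller already has each app's name and a timezone-aware creation time; how the creation time is obtained is not specified in this plan, and orphaned_run_apps is a hypothetical name.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches the per-run naming scheme <recipe>-pr<n>-<short-sha> anywhere in the name.
_RUN_APP_RE = re.compile(r"-pr\d+-[0-9a-f]+")


def orphaned_run_apps(apps: dict, now: datetime, max_age: timedelta) -> list[str]:
    """Return names of per-run apps older than max_age, in sorted order.

    apps maps app name -> creation time. Long-lived apps such as traefik never
    match the -pr<n>- pattern, so they are never selected.
    """
    return sorted(
        name
        for name, created in apps.items()
        if _RUN_APP_RE.search(name) and now - created > max_age
    )
```

Keeping selection separate from deletion makes the rule testable without a swarm, and lets the janitor log what it is about to reap before it does so.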
4.4 Secrets (D6)
There are two distinct classes of secret and they are handled in opposite ways. Do not conflate them.
(A) Infra secrets. All of these end up sops-encrypted in secrets/, are decrypted at activation to a root-owned runtime path (sops-nix defaults to /run/secrets; never the Nix store, which is world-readable), and are never world-readable. But they split into two sub-classes (see §1.5 for the concrete locations/usage), and only the first sub-class blocks:
- (A1) External inputs: provided by the operator, the loop cannot create them. The Tailscale auth key + Gitea bot creds (/srv/cc-ci/.testenv, already provided), the pre-issued wildcard TLS cert at /var/lib/ci-certs/live/ (§4.0; not a DNS token: the agent serves it, never issues it), and registry pull creds (if needed). If one of these is missing or invalid, the loop is blocked: write it to STATUS.md ## Blocked and stop (§9). The agent must not invent or work around an external input it wasn't given, and must not attempt ACME/DNS-01 for commoninternet.net.
- (A2) Internal secrets: the loop generates and manages these itself; never block on them. Drone RPC secret + webhook HMAC (openssl rand), the Gitea OAuth app for Drone (created via the bot API), and the cc-ci host age/GPG key for sops. These are not human inputs; generate, store in secrets/, and wire both ends.
Alongside these, three preconfigured network/cert facts are operator-provided inputs the loop
also depends on (not secrets the agent makes, but class-A in the same "provided, don't improvise"
sense): (1) the wildcard *.ci.commoninternet.net record (and bare ci.commoninternet.net) already
points at the gateway, (2) the gateway TLS-passthroughs that wildcard to cc-ci (SNI-routed,
no decryption — see §4.0 Network path), and (3) the pre-issued wildcard cert is in place at
/var/lib/ci-certs/live/. The operator owns the DNS record, the gateway, and cert issuance/renewal;
everything else on cc-ci is the agent's job — Traefik (pointed at the static cert), swarm,
per-run subdomain routing, and teardown. If the wildcard does not resolve, the gateway doesn't reach
cc-ci, or the cert is missing/expired, that is a ## Blocked condition (operator action), not
something to work around (the gateway and DNS are not ours to reconfigure, per §9).
(B) Recipe app secrets — generated by the test, persisted within the run. These are NOT a blocker and are NOT pre-provisioned by a human. The harness creates them itself for each app under test and is responsible for persisting them across the run so the multi-stage lifecycle works:
- Generate at install: the harness runs abra app secret generate (+ inserts any deterministic test fixtures like an admin password / test user it chooses) when it deploys the app.
- Persist for the run's duration: the same generated secrets must survive across stages (install → upgrade and especially backup → restore) because an app cannot be upgraded or restored against rotated credentials. Persist them in a per-run secret store keyed by the run's unique app name (e.g. <recipe>-pr<n>-<sha>): the live abra/swarm secrets plus a sidecar record the harness writes (e.g. the app's .env + the generated values) to a run-scoped, non-public location on the runner, so any stage can re-read them. They are ephemeral by design.
- Destroy at teardown: the same teardown that removes the app/volumes also runs abra app secret remove (with timeout plumbed) and deletes the per-run sidecar. Nothing generated for a run outlives that run.
- How the harness should "figure out" persistence (acceptance for D6): decide and document one concrete mechanism. The recommended default is "abra/swarm holds the live secrets; the harness keeps a run-scoped sidecar file under a runs/<app-name>/ dir on the runner (mode 600), and reloads from it between stages." Whatever is chosen, it must (1) keep the same values stable across all three stages, (2) isolate concurrent runs from each other, and (3) leave nothing behind.
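The recommended sidecar default could look roughly like the sketch below. It is only an illustration under assumptions: the function names, the secrets.json filename, and the JSON layout are hypothetical, not something the plan prescribes. It shows the three properties asked for: write once with owner-only permissions, reload between stages, and remove everything at teardown.

```python
import json
import os
import shutil
from pathlib import Path


def save_run_secrets(base_dir: Path, app_name: str, secrets: dict) -> Path:
    """Write the run's generated secrets to runs/<app-name>/secrets.json (hypothetical name)."""
    run_dir = base_dir / "runs" / app_name
    run_dir.mkdir(parents=True, exist_ok=True)
    os.chmod(run_dir, 0o700)  # mkdir's mode is masked by umask, so set it explicitly
    path = run_dir / "secrets.json"
    # O_EXCL: a second write for the same app name fails instead of silently
    # replacing the values, which keeps them stable across stages.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(secrets, fh)
    return path


def load_run_secrets(base_dir: Path, app_name: str) -> dict:
    """Re-read the same values in a later stage (upgrade, backup/restore)."""
    with open(base_dir / "runs" / app_name / "secrets.json") as fh:
        return json.load(fh)


def destroy_run_secrets(base_dir: Path, app_name: str) -> None:
    """Teardown: delete the whole per-run directory; safe to call twice."""
    run_dir = base_dir / "runs" / app_name
    if run_dir.exists():
        shutil.rmtree(run_dir)
```

Keying the directory by the unique app name is what isolates concurrent runs in this sketch; the live secrets would still be held by abra/swarm.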
(C) Drone CI tokens: store as Drone org/repo secrets, referenced by the pipeline. Where a value is an external input (A1, e.g. registry creds) it is provided; where it is internal (A2) it is generated — see the (A) split above.
Hard rule across all classes: scrub secrets from logs before they reach the dashboard; the results UI shows sanitized logs only. Add a redaction filter in the log pipeline and an Adversary test that greps published logs and the overview site for known secret patterns and any generated app password.
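A minimal sketch of the redaction filter and the matching leak check might look like this. The regular expression and the [REDACTED] marker are assumptions chosen for illustration; the real pattern list would need to match the token formats actually in use.

```python
import re

REDACTED = "[REDACTED]"

# Hypothetical pattern: KEY=value or KEY: value where the key name hints at a secret.
KEY_VALUE = re.compile(r"(?i)(\w*(?:password|secret|token)\w*)(\s*[=:]\s*)(\S+)")


def redact(line: str, known_values: list[str]) -> str:
    """Scrub known generated values first, then anything that looks like a secret assignment."""
    # Longest first, so a value that contains another one is replaced whole.
    for value in sorted((v for v in known_values if v), key=len, reverse=True):
        line = line.replace(value, REDACTED)
    return KEY_VALUE.sub(lambda m: f"{m.group(1)}{m.group(2)}{REDACTED}", line)


def find_leaks(published_text: str, known_values: list[str]) -> list[str]:
    """Adversary-side check: return every known secret value still present in the text."""
    return [v for v in known_values if v and v in published_text]
```

The harness would pass the run's generated app secrets as known_values, so the exact-value pass catches them even when they appear outside a KEY=value context.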
4.5 Results UX (D7) — YunoHost-CI-like
- Per-run logs: Drone's native UI already gives live, per-stage, tail-able logs and a final status — use it as the canonical run view; the PR comment links to it.
- Overview page: a small generator (dashboard/) polls the Drone API and renders a static page at ci.commoninternet.net (§4.0): a table of enrolled recipes, latest run status badge (pass/fail/running), last-tested version, link to history, mirroring the YunoHost app-list feel. Served by Traefik; regenerated on build-completion webhook or a short timer.
- Provide a status badge endpoint per recipe for embedding in recipe READMEs.
5. Milestones / initial BACKLOG
Work top-down; each milestone ends with an Adversary gate (Adversary must independently
verify the acceptance check before the next milestone starts). Seed BACKLOG.md from this.
- M0: Foundations. Repo created; flake builds; nixos-rebuild (or deploy-rs) applies a no-op-then-base config to cc-ci; sops decrypts a test secret on the host. Accept: ssh cc-ci 'systemctl is-system-running' healthy after a rebuild from the repo.
- M1: Swarm + abra target. Docker + single-node swarm + proxy network; the coop-cloud/traefik recipe deployed via abra (wildcard/file-provider mode, serving the pre-issued cert, see §4.0/§4.2; not a custom Traefik); abra can deploy and tear down a trivial recipe by hand. Accept: a recipe deployed via abra is reachable over HTTPS (valid wildcard cert) on the web-secure entrypoint at *.ci.commoninternet.net, then fully torn down leaving no volumes; the proxy is verifiably the traefik recipe and no DNS/ACME token is present on cc-ci.
- M2: Drone online. Drone server+runner via Nix, OAuth to Gitea; a hello-world .drone.yml in cc-ci runs green; logs visible in Drone UI. Accept: push to cc-ci triggers a visible green Drone build.
- M3: Comment bridge. !testme on a PR triggers a parameterized Drone build; bridge posts a PR comment with the run link; non-!testme comments and non-collaborators are ignored. Accept: live demo on a scratch PR: comment in, build out, link back, auth enforced.
- M4: Harness + install stage. run_recipe_ci.py + conftest; install stage green for one simple recipe end-to-end with a Playwright assertion; guaranteed teardown. Accept: full green install run for recipe #1, no orphaned app/volume afterward.
- M5: Upgrade + backup/restore stages. Add the other two stages for recipe #1. Accept: upgrade preserves data; backup→mutate→restore returns original state.
- M6: Recipe-local tests (D4) + second recipe. Discover/run recipe-repo tests/; enroll a second, DB-backed recipe via the documented flow. Accept: both recipes green; recipe-local tests demonstrably executed and merged.
- M6.5: Breadth ramp. Enroll recipes 3→6 covering the remaining D10 categories, one at a time, each via the documented enroll flow (this is the real test of D5: enrolling recipe N should be template-copy + recipe-specific tests/fixtures, with no harness surgery). Expect per-recipe quirks (multi-service deps, S3/MinIO config, SSO client setup, TLS passthrough, large-volume backups) and absorb them into the shared harness, not one-off per-recipe hacks. When flakiness appears, add real readiness/wait robustness to the harness rather than sprinkling sleeps. Run benchmarks/long deploys sequentially, never in parallel (network contention). Accept: recipes 3–6 each have a full three-stage green run; enrolling N≥3 needed no changes to shared harness code.
- M7: Secrets hardening (D6). Full sops model, rotation doc, log redaction + leak test. Accept: Adversary's secret-grep over published logs finds nothing; rotation doc followed.
- M8: Dashboard (D7). Overview page + badges + PR-comment outcome reflection. Accept: overview matches reality across several runs; outcomes mirrored to PR comments.
- M9: Reproducibility + docs (D8/D9). docs/install.md rebuilds the server from scratch on a blank VM; all docs complete. Accept: Adversary rebuilds from docs onto a throwaway host (or records the tested subset).
- M10: Proof (D10). All six chosen recipes green via real !testme PRs (the breadth set from M6/M6.5 carried through the hardened pipeline), each with install/upgrade/backup-restore exercised and Adversary-verified; flip STATUS.md to DONE.
6. The two agents
Builder (primary)
Implements the backlog top-down. Discipline:
- One backlog item in flight at a time. Small, committed, reversible steps.
- Every change verified against the real system (server, Drone, Gitea) before claiming done, never "should work". Paste the verifying command + output into JOURNAL.md.
- Touch production carefully: cc-ci is the only target; never deploy test apps onto unrelated production servers; never reuse production domains. Idempotent server changes only (via Nix).
- If blocked on access/secrets/external state, write it to STATUS.md ## Blocked and pick up an unblocked item rather than hacking around it.
Adversary (reviewer)
Runs as a separate, independent loop in its own process/sandbox (see §6.1 for how the two loops coordinate). Its job is to disbelieve. It:
- Re-verifies each Definition of Done and milestone-acceptance claim independently, from a cold start (fresh shell, own clone, no cached state), and logs PASS/FAIL + evidence in REVIEW.md.
- Actively tries to break things: comment !testmexyz (should NOT trigger), comment as a non-collaborator (should be rejected), push a PR that fails tests (must report red, not green), kill an app mid-run (teardown must still clean up), grep published logs/dashboard for secrets, run two !testme builds concurrently (no domain/volume/secret collision), confirm the same generated app secrets persist across install→upgrade→backup/restore.
- Files every defect as a BACKLOG.md item tagged [adversary] with repro steps. The Builder may not close an adversary item; only the Adversary closes it after re-test.
- Has veto power over STATUS.md → DONE.
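The first two break-it probes above pin down what the comment bridge must reject. One possible matching rule, sketched under an assumption the plan does not state (the trigger must be a line consisting of exactly !testme; the function name is hypothetical), would be:

```python
def is_testme_trigger(comment_body: str, author_is_collaborator: bool) -> bool:
    """Decide whether a PR comment should start a build.

    Assumed rule: some line of the comment, stripped of surrounding
    whitespace, is exactly "!testme". Non-collaborators never trigger.
    """
    if not author_is_collaborator:
        return False
    return any(line.strip() == "!testme" for line in comment_body.splitlines())
```

An exact-line match rejects !testmexyz without needing a regular expression. Whether a trigger embedded mid-sentence should also count is a design choice left open here.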
6.1 Coordination protocol (two independent loops, one shared repo)
The two loops never talk directly; the git repo is the only coordination medium. Each agent
has its own clone (e.g. Builder in /srv/cc-ci/cc-ci, Adversary in /srv/cc-ci/cc-ci-adv) and
its own pacing. To make concurrent writes conflict-free:
- File ownership (one writer each; the other only reads):
  - Builder owns: all source code/config, STATUS.md, JOURNAL.md, DECISIONS.md.
  - Adversary owns: REVIEW.md.
  - BACKLOG.md is split into two H2 sections: ## Build backlog (Builder-only) and ## Adversary findings (Adversary-only). Each agent edits only its own section, so git merges the two cleanly. Closing an item = checking the box in your own section; the Builder fixes an [adversary] finding and notes the fix in JOURNAL, but only the Adversary ticks it closed after re-test.
- Append-only where possible. JOURNAL.md and REVIEW.md are append-only logs → they never conflict. Prefer appending over rewriting.
- Git discipline (both loops, every write): git pull --rebase before editing, make the smallest change, commit, git push. A rebase conflict should only happen if a rule was broken and an edit landed in the other agent's file/section; re-pull and keep to your own files. Never --force.
- Gate handshake via STATUS.md. When the Builder believes a milestone gate is met, it sets in STATUS.md: Gate: <Mn> — CLAIMED, awaiting Adversary and stops advancing past it. The Adversary, on its next wake, sees the claim, runs the acceptance check cold, and writes the verdict to REVIEW.md (<Mn>: PASS @<ts> with evidence, or FAIL + an [adversary] item). The Builder only proceeds past the gate after seeing PASS in REVIEW.md.
- DONE handshake. Builder may write ## DONE to STATUS.md only when REVIEW.md shows a PASS dated within 24h for every D1–D10. The Adversary can write ## VETO <reason> to REVIEW.md at any time, which forbids DONE until cleared.
- Liveness. If the Adversary sees a gate CLAIMED for too long with no Builder progress, or the Builder sees no Adversary verdict on a standing claim, note it in your own ledger and keep doing independent work; neither loop blocks idle waiting on the other beyond its gate.
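The DONE handshake is mechanical enough to check in code. The sketch below rests on several assumptions the plan leaves open: D-gate verdicts use the same "<gate>: PASS @<ts>" line shape as milestone verdicts, timestamps are ISO 8601, the last verdict for a gate wins (REVIEW.md is append-only), and a veto is cleared by a hypothetical "## VETO CLEARED" line.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed line shape, e.g. "D3: PASS @2026-06-01T11:00:00+00:00 evidence..."
VERDICT = re.compile(r"^(D\d+): (PASS|FAIL)\b(?: @(\S+))?")


def done_allowed(review_text: str, now: datetime) -> bool:
    """True only if every D1..D10 has a latest PASS under 24h old and no veto stands."""
    latest = {}
    veto = False
    for raw in review_text.splitlines():
        line = raw.strip()
        if line.startswith("## VETO CLEARED"):  # hypothetical clearing marker
            veto = False
        elif line.startswith("## VETO"):
            veto = True
        match = VERDICT.match(line)
        if match:
            gate, verdict, ts = match.groups()
            latest[gate] = (verdict, ts)  # later lines override earlier ones
    if veto:
        return False
    for n in range(1, 11):
        verdict, ts = latest.get(f"D{n}", ("FAIL", None))
        if verdict != "PASS" or ts is None:
            return False
        try:
            when = datetime.fromisoformat(ts)
        except ValueError:
            return False  # unparseable timestamp: treat as unverified
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)  # assume UTC when unspecified
        if now - when > timedelta(hours=24):
            return False
    return True
```

The function takes now as a parameter so the freshness rule can be tested without touching the clock.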
(If you are ever forced to run with a single process, the degraded fallback is to alternate
roles per iteration and keep JOURNAL.md and REVIEW.md strictly separate — but two loops is
the intended design.)
7. The Loop Protocol
Both loops run this same shape; state lives in the repo so it survives restarts/compaction. On
every wake, git pull --rebase first, then:
- Orient. Read STATUS.md (phase, in-flight item, gate claims, blockers), BACKLOG.md, and the tail of REVIEW.md. Reconcile with reality via cheap probes (Drone health, last build, git log); never trust the ledger blindly. If it disagrees with the system, fix the ledger first (your own files only, see §6.1).
- Select.
  - Builder: highest-priority open item in ## Build backlog: unresolved [adversary] findings > current milestone's next task > next milestone. Never advance past a milestone gate until REVIEW.md shows its PASS.
  - Adversary: any standing Gate: <Mn> CLAIMED in STATUS.md to verify > re-verify a D1–D10 gate whose last PASS is stale (>24h) > a fresh break-it probe from §6.
- Act. Smallest change that advances the item. Builder verifies against the real system; Adversary verifies from a cold start. Commit with a clear message (author per repo convention).
- Record (your own files only). Builder: append to JOURNAL.md (what you did + verifying command/output + next), update STATUS.md, tick ## Build backlog. Adversary: append PASS/FAIL + evidence to REVIEW.md, add/close items in ## Adversary findings. Then git push.
- Gate handshake (§6.1). Builder, on reaching a milestone, sets Gate: <Mn> CLAIMED, awaiting Adversary in STATUS.md and works on other unblocked items meanwhile. Adversary clears it with a REVIEW.md verdict. No gate is "passed" without a logged PASS.
- Decide continuation. Builder writes ## DONE only when REVIEW.md shows a <24h PASS for every D1–D10 and no standing ## VETO. Otherwise schedule the next wake.
Pacing. Use /loop (self-paced) or ScheduleWakeup. Most waits here are for things the
harness can't notify you about — a Drone build, a nixos-rebuild, a deploy converging — so poll
the specific thing. Three cases:
- Something in flight (build/deploy/nixos-rebuild) → re-check on a short cadence (≈4 min) to stay cache-warm; keep polling it, don't treat it as idle, and don't spin on a minutes-long build.
- Blocked on the other loop: Builder parked at a CLAIMED gate awaiting the Adversary, or Adversary waiting for the Builder to fix an [adversary] finding. You don't need to busy-poll here: the watchdog signals across the handoff. The moment the Builder writes a CLAIMED gate, the watchdog pings the Adversary to verify now; the moment the Adversary updates REVIEW.md (verdict/finding), it pings the Builder to proceed (launch.sh, ~30 s detection). So you may sleep while blocked and trust the ping, but keep a fallback self-poll on a modest cadence (~2–4 min) in case a ping is missed (a dead session is restarted by the watchdog and re-orients from the repo anyway). The goal: a pending handoff resolves in well under a minute, not a whole idle interval.
- Genuinely idle, nothing pending from either loop → sleep ~10–15 min, then re-orient.
Notes: The Adversary may idle freely when nothing is pending — it should NOT pointlessly re-verify or busy-poll to look busy. It gets woken by the watchdog the instant the Builder claims a gate, so "start verifying very soon after the Builder waits" is handled by the signal, not by the Adversary spinning. The Builder should prefer keeping an unblocked backlog item in hand so it's rarely fully blocked on a gate; only hit case 2 when everything is genuinely gated behind the pending verification — and then rely on the watchdog ping (+ fallback poll) rather than a long idle.
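The three pacing cases reduce to a small decision table. In the sketch below the enum and function names are hypothetical, and the exact second counts are just one pick from inside the ranges given above.

```python
from enum import Enum


class LoopState(Enum):
    IN_FLIGHT = "in_flight"              # a build/deploy/nixos-rebuild is running
    BLOCKED_ON_PEER = "blocked_on_peer"  # waiting on the other loop's handoff
    IDLE = "idle"                        # nothing pending from either loop


def classify(something_in_flight: bool, waiting_on_peer: bool) -> LoopState:
    """An in-flight operation takes priority: keep polling it even if also blocked."""
    if something_in_flight:
        return LoopState.IN_FLIGHT
    if waiting_on_peer:
        return LoopState.BLOCKED_ON_PEER
    return LoopState.IDLE


def next_wake_seconds(state: LoopState) -> int:
    """Sleep length before the next self-poll; the watchdog ping may wake the loop sooner."""
    if state is LoopState.IN_FLIGHT:
        return 4 * 60   # short cadence, about 4 min
    if state is LoopState.BLOCKED_ON_PEER:
        return 3 * 60   # fallback self-poll, inside the 2-4 min range
    return 12 * 60      # genuinely idle, inside the 10-15 min range
```

The blocked case returns only the fallback interval; the fast path is still the watchdog ping, which this sketch does not model.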
Anti-drift guards.
- Cap retries: if an approach fails 3× the same way, stop, write the dead-end in DECISIONS.md, and try a different approach or mark blocked. No thrashing.
- Never weaken a test to make it pass. A red test is information; "fix" the recipe/harness or file a finding, do not delete the assertion. (This is the single most important rule; the Adversary watches specifically for tests being softened or skipped.)
- Keep changes reversible; prefer Nix-declared state over imperative server edits so any rebuild reproduces it.
- Don't expand scope beyond §2. New ideas → BACKLOG.md (tagged [idea]), not into this run.
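The retry cap depends on recognising "the same way". One way to make that concrete, as a sketch with hypothetical names, is to count failures per (approach, error signature) pair. How an error is reduced to a signature is an assumption left to the caller.

```python
from collections import Counter


class RetryCap:
    """Counts identical failures and says when an approach must be abandoned."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.failures = Counter()

    def record_failure(self, approach: str, error_signature: str) -> bool:
        """Record one failure; return True once this exact failure has hit the limit."""
        key = (approach, error_signature)
        self.failures[key] += 1
        return self.failures[key] >= self.limit
```

When record_failure returns True, the loop would stop, write the dead-end to DECISIONS.md, and switch approach or mark the item blocked. A different error signature starts its own count, since it is new information rather than thrashing.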
8. Open decisions to settle early (log in DECISIONS.md)
- Deploy mechanism: nixos-rebuild --target-host vs deploy-rs/colmena. (Default: deploy-rs for atomic rollbacks; nixos-rebuild fine if simpler.)
- Webhook scope: per-repo vs org-level Gitea webhook. (Default: per-repo via enroll script.)
- Drone runner type: exec vs privileged docker. (Default: exec, since it must drive host abra.)
- Secret tool: sops-nix vs agenix. (Default: sops-nix for multi-recipient + yaml ergonomics.)
- Reverse proxy / Wildcard TLS: SETTLED: deploy the real coop-cloud/traefik recipe via abra (for e2e fidelity), in wildcard/file-provider mode, serving the operator's pre-issued wildcard cert; no ACME, no token (§4.0/§4.2). Supersedes the original plan's hand-rolled modules/traefik.nix. The operator issued the wildcard SAN cert (*.ci.commoninternet.net + ci.commoninternet.net) via LE DNS-01/Gandi out-of-band into /var/lib/ci-certs/live/; the agent feeds it as the recipe's ssl_cert/ssl_key swarm secrets so the DNS-editing token never reaches cc-ci. Manual renewal ~90 days (next ~2026-08-24): re-issue → update the secret → redeploy.
- Proof recipe set (D10: six, category-spanning). Default candidates, all previously verified deployable: hedgedoc, cryptpad, keycloak, authentik, lasuite-docs/lasuite-drive, matrix-synapse, immich, bluesky-pds. Lock the final six early so M4–M6.5 build toward them. Sequence easy→hard: prove the pipeline on hedgedoc/cryptpad before tackling SSO, S3, media stores, and TLS-passthrough recipes.
Each default stands until the Adversary or reality forces a change; record the change and why.
9. Guardrails / hard rules
- Access boundary: only cc-ci is yours to reconfigure. Recipe repos: read + comment + (when enrolling) add a webhook — nothing else. Never push to a recipe repo's code.
- No secrets in git/logs/UI. Ever. Verified by the Adversary's leak test.
- No mocks for the e2e stages. D2 means real deploys. If something can't be tested for real, it's a finding, not a pass.
- Idempotent + reversible. Anything done to the server must be re-derivable from the repo. Infra bring-up is declarative idempotent reconciliation in Nix, not manual post-steps and not run-once scripts. Each piece (swarm + proxy net, the traefik recipe deploy, Drone, the comment-bridge, the dashboard) is a systemd oneshot that re-runs on every activation/boot and converges to the desired state (inspect → act only if needed → no-op if already correct), like swarm-init. No /var/lib/.bootstrapped-style sentinels (they don't self-heal drift). The goal: a from-scratch install is git clone + nixos-rebuild switch + the operator preconditions; docs/install.md must never accumulate manual post-rebuild steps.
- Stop on missing external infra inputs (class-A1 in §4.4: cc-ci SSH/root access, the Tailscale auth key, Gitea bot creds, the pre-issued wildcard cert at /var/lib/ci-certs/live/, registry creds, and the preconfigured DNS/gateway facts) rather than improvising around them; surface in STATUS.md ## Blocked. Never attempt ACME/DNS-01 for commoninternet.net: the cert is pre-provided and renewed out-of-band by the operator. This does NOT apply to internal infra secrets (class-A2: Drone RPC, webhook HMAC, Gitea OAuth app, host age key; the agent generates these) or to recipe app secrets (class-B): those the test harness generates itself (abra app secret generate + chosen fixtures), persists for the run, and destroys at teardown. A missing app secret is never a blocker; it is something the harness creates. See §4.4.
- Honest reporting. If a stage is skipped or a check failed, say so in STATUS.md/JOURNAL.md with the output. The loop's value depends entirely on the ledgers being true.
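The "inspect → act only if needed → no-op if already correct" shape from the idempotency rule can be sketched as below. In the real setup this logic would sit inside a systemd oneshot declared in Nix; the Python here only illustrates the control flow. The reconcile helper and its names are hypothetical, and the swarm example is an assumption about how the state could be inspected.

```python
import subprocess
from typing import Callable


def reconcile(inspect: Callable[[], str], desired: str, act: Callable[[], None]) -> bool:
    """Converge to the desired state. Returns True if a change was made, False on a no-op."""
    if inspect() == desired:
        return False          # already correct: do nothing, no sentinel file needed
    act()
    if inspect() != desired:  # re-inspect so a failed action is never reported as success
        raise RuntimeError(f"did not converge to {desired!r}")
    return True


def swarm_state() -> str:
    """Ask Docker for the local swarm state (e.g. 'active' or 'inactive')."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{.Swarm.LocalNodeState}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def swarm_init() -> None:
    subprocess.run(["docker", "swarm", "init"], check=True)
```

A call such as reconcile(swarm_state, "active", swarm_init) is safe to run on every activation or boot. Because it inspects live state rather than a marker file, it also repairs drift.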