Bugs found by the live proof, fixed:
- warmsnap: snapshot now swaps a <recipe>/snapshot/ SUBDIR, not the whole
<recipe>/ dir — so the reconciler's sibling last_good file survives a
snapshot swap (was being clobbered).
- warm_reconcile: deploy_version captures abra's stdout (it writes FATA to
stdout) in the error; add wait_undeployed() after every undeploy so
snapshot/restore/redeploy don't race a half-removed swarm stack; the upgrade
deploy is wrapped so a deploy FAILURE (not just unhealthy) also triggers
rollback. (57 unit pass.)
LIVE PROOF on warm keycloak (annotated fake tags via CCCI_SKIP_FETCH):
(a) healthy upgrade 10.7.1->10.7.9: snapshot+deploy+health-pass, last_good
committed=10.7.9, marker realm preserved.
(b) MARQUEE rollback: broken latest 10.7.10 (lint-fail) -> rollback to 10.7.9,
HEALTHY, marker realm INTACT (data preserved through broken-upgrade+restore),
last_good NOT advanced, rollback alert written (attempted=10.7.10,
last_good=10.7.9, recovered=True). keycloak recovered to canonical
10.7.1+26.6.2 healthy.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cc-ci — Co-op Cloud recipe CI server
Comment !testme on a PR in an enrolled Co-op Cloud recipe repo and cc-ci deploys the recipe
at that commit onto a real single-node Docker Swarm, runs install / upgrade / backup-restore tests
(Python + Playwright) end-to-end, and reports a live, tail-able run with pass/fail back to the PR.
This repo declares the entire server as a NixOS flake and holds the test harness, the per-recipe test trees, and the docs to enroll a recipe or rebuild the box from scratch.
Status: under active autonomous construction. See
machine-docs/STATUS.mdfor the live phase andplan.md-driven milestones inmachine-docs/BACKLOG.md. Definition of Done is D1–D10 (see the build plan).
Layout
flake.nix NixOS entry point + devshells (stays at root; build ref #cc-ci)
nix/hosts/cc-ci/ the cc-ci machine config
nix/modules/ drone, comment-bridge, swarm, dashboard, secrets (Nix modules)
secrets/ sops-encrypted infra secrets (cc-ci-secrets submodule)
bridge/ !testme webhook listener source
runner/ run_recipe_ci.py + shared pytest harness
dashboard/ results overview generator
tests/<recipe>/ per-recipe install/upgrade/backup tests + playwright/
docs/ install, enroll-recipe, secrets, architecture, runbook, baseline
All .nix code lives under nix/; flake.nix/flake.lock stay at the repo root so the build
reference (nixos-rebuild switch --flake '…#cc-ci') is unchanged.
Docs
docs/install.md— rebuild the server from scratch (D8)docs/testing.md— test architecture: generic lifecycle suite + layered recipe overlays (override/extend, discovery precedence, custom install-steps hook)docs/enroll-recipe.md— add a recipe under CI (D5)docs/secrets.md— secret model + rotation (D6)docs/architecture.md,docs/runbook.md— design + debugging failed runsdocs/baseline.md— bootstrap snapshot / rollback reference
Linting & formatting
The codebase is kept formatted + lint-clean by a single entrypoint, run from the pinned lint
devshell so local and CI use identical tool versions:
nix develop .#lint --command bash scripts/lint.sh # check-only (what CI runs)
nix develop .#lint --command bash scripts/lint.sh --fix # auto-format + apply fixes
Covers Nix (nixpkgs-fmt · statix · deadnix), Python (ruff lint+format), Shell
(shellcheck · shfmt), and YAML (yamllint). Config lives in ruff.toml / .yamllint.yaml;
tool/strictness choices are in machine-docs/DECISIONS.md. CI enforces it: the lint step in the
.drone.yml push pipeline runs the same command and fails the build on any unclean file, so
keep commits clean (--fix before pushing).
Loop state (autonomous build)
The multi-agent loop state lives under machine-docs/: STATUS.md (phase/blockers),
BACKLOG.md (work + adversary findings), REVIEW.md (independent verification), JOURNAL.md
(build log), DECISIONS.md (architecture choices) — plus the phase-namespaced *-1b.md / *-1c.md
variants. See the build plan for the two-loop Builder/Adversary protocol.