Root cause (instrumented, DECISIONS 2026-05-30): a DB recipe dumps its data in a backupbot pre-hook, but if the DB container cycles mid-dump (intermittent on the loaded CI node — full5/6/7 RED, full8 green; NOT OOM/NOT healthcheck) the dump is truncated/absent and restic snapshots an empty path — abra app backup 'succeeds' yet a later restore silently loses the data (ghost ci_marker). Fix (additive, recipe-scoped via meta like READY_PROBE): recipe_meta may define BACKUP_VERIFY(domain) -> bool, a READ-ONLY post-backup integrity probe. When it returns False the harness re-runs the whole backup (fresh snapshot, re-stabilised db) up to 3x. Recipes without the hook are unaffected. ghost's BACKUP_VERIFY confirms /var/lib/mysql/backup.sql.gz is a valid non-empty gzip. Weakens no assertion — it only retries a flaky CAPTURE so P4 restore is RELIABLY exercised, not luck-dependent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cc-ci — Co-op Cloud recipe CI server
Comment !testme on a PR in an enrolled Co-op Cloud recipe repo and cc-ci deploys the recipe
at that commit onto a real single-node Docker Swarm, runs install / upgrade / backup-restore tests
(Python + Playwright) end-to-end, and reports a live, tail-able run with pass/fail back to the PR.
This repo declares the entire server as a NixOS flake and holds the test harness, the per-recipe test trees, and the docs to enroll a recipe or rebuild the box from scratch.
Status: under active autonomous construction. See
machine-docs/STATUS.mdfor the live phase andplan.md-driven milestones inmachine-docs/BACKLOG.md. Definition of Done is D1–D10 (see the build plan).
Layout
flake.nix NixOS entry point + devshells (stays at root; build ref #cc-ci)
nix/hosts/cc-ci/ the cc-ci machine config
nix/modules/ drone, comment-bridge, swarm, dashboard, secrets (Nix modules)
secrets/ sops-encrypted infra secrets (cc-ci-secrets submodule)
bridge/ !testme webhook listener source
runner/ run_recipe_ci.py + shared pytest harness
dashboard/ results overview generator
tests/<recipe>/ per-recipe install/upgrade/backup tests + playwright/
docs/ install, enroll-recipe, secrets, architecture, runbook, baseline
All .nix code lives under nix/; flake.nix/flake.lock stay at the repo root so the build
reference (nixos-rebuild switch --flake '…#cc-ci') is unchanged.
Docs
docs/install.md— rebuild the server from scratch (D8)docs/testing.md— test architecture: generic lifecycle suite + layered recipe overlays (override/extend, discovery precedence, custom install-steps hook)docs/enroll-recipe.md— add a recipe under CI (D5)docs/secrets.md— secret model + rotation (D6)docs/architecture.md,docs/runbook.md— design + debugging failed runsdocs/baseline.md— bootstrap snapshot / rollback reference
Linting & formatting
The codebase is kept formatted + lint-clean by a single entrypoint, run from the pinned lint
devshell so local and CI use identical tool versions:
nix develop .#lint --command bash scripts/lint.sh # check-only (what CI runs)
nix develop .#lint --command bash scripts/lint.sh --fix # auto-format + apply fixes
Covers Nix (nixpkgs-fmt · statix · deadnix), Python (ruff lint+format), Shell
(shellcheck · shfmt), and YAML (yamllint). Config lives in ruff.toml / .yamllint.yaml;
tool/strictness choices are in machine-docs/DECISIONS.md. CI enforces it: the lint step in the
.drone.yml push pipeline runs the same command and fails the build on any unclean file, so
keep commits clean (--fix before pushing).
Loop state (autonomous build)
The multi-agent loop state lives under machine-docs/: STATUS.md (phase/blockers),
BACKLOG.md (work + adversary findings), REVIEW.md (independent verification), JOURNAL.md
(build log), DECISIONS.md (architecture choices) — plus the phase-namespaced *-1b.md / *-1c.md
variants. See the build plan for the two-loop Builder/Adversary protocol.