Files
cc-ci/.drone.yml
autonomic-bot c0df77d0d9
All checks were successful
continuous-integration/drone/push Build is passing
fix(harness): make concurrent recipe runs safe (per-recipe flock + active-run registry)
capacity=2 went live with three stale capacity=1-era assumptions that corrupted
concurrent runs (immich 229/230 '/pg_backup.sh: No such file'):

- ~/.abra/recipes/<recipe> is ONE shared working tree that fetch_recipe rm-rf's/
  reclones and the upgrade tier git-checkouts mid-run. Same-recipe runs now
  serialise on an exclusive flock (/run/lock/cc-ci-recipe-<recipe>.lock), taken
  in main() BEFORE fetch_recipe and held for the whole run; the kernel releases
  it on any process death, so there is no stale-lock failure mode. Different
  recipes still run in parallel.

- CCCI_JANITOR_MAX_AGE=0 made a starting build reap ANY in-flight run app. Every
  run now registers its app domain + pid in /run/cc-ci-active/<domain> before
  app creation; the janitor checks the owner: alive (pid is a live run_recipe_ci
  process) -> never reaped; dead -> reaped immediately; unknown (pre-registry or
  post-reboot) -> age fallback (default 2h). The MAX_AGE=0 env override is gone
  from .drone.yml.

- .drone.yml: concurrency.limit 1 -> 2 to match DRONE_RUNNER_CAPACITY=2; the
  'safe because capacity=1' comments now describe the flock+registry model.

lint: PASS, unit tests: 138 passed.
2026-06-09 21:56:25 +00:00

74 lines
2.8 KiB
YAML

---
# Self-test pipeline: runs on normal pushes to cc-ci (M2). Sanity-checks the exec runner can drive
# host abra/docker. Recipe CI is the separate `custom`-event pipeline below.
kind: pipeline
type: exec
name: self-test
platform:
os: linux
arch: amd64
trigger:
event:
- push
steps:
# Lint/format gate (Phase 1b, RL1). Runs the exact toolchain from the pinned `lint` devshell
# (flake.nix) via scripts/lint.sh in check mode — FAILS the build on any unclean file so future
# commits stay formatted + lint-clean. HOME=/root so nix reuses root's store/eval cache.
- name: lint
environment:
HOME: /root
commands:
- nix develop .#lint --command bash scripts/lint.sh
- name: hello
commands:
- echo "cc-ci self-test on the exec runner"
- whoami
- abra --version
- docker info --format 'swarm={{.Swarm.LocalNodeState}}'
---
# Recipe-CI pipeline: runs on bridge-triggered builds (event=custom, params RECIPE/REF/PR/SRC set by
# the comment-bridge). Deploys the recipe at the PR head, runs install/upgrade/backup + any
# recipe-local tests via the shared harness, then guarantees teardown (plan §4.2/§4.3).
#
# Resource safety (plan §4.2/§4.3): DRONE_RUNNER_CAPACITY=2 (nix/modules/drone-runner.nix) +
# concurrency.limit=2 below allow two recipe runs in parallel. Concurrent-run safety is enforced by
# the harness, not by serialisation: same-recipe runs serialise on a per-recipe flock
# (lifecycle.acquire_recipe_lock — the shared ~/.abra/recipes/<recipe> checkout is the conflict),
# and every run registers its app domain + pid in /run/cc-ci-active so the run-start janitor only
# reaps orphans whose owning run is DEAD (alive → never touched; unknown → age fallback, default 2h).
kind: pipeline
type: exec
name: recipe-ci
platform:
os: linux
arch: amd64
trigger:
event:
- custom
concurrency:
limit: 2
steps:
- name: ci
environment:
STAGES: install,upgrade,backup,restore,custom
# The exec runner points HOME at a per-build workspace; force it to /root so abra finds its
# server config + recipes under /root/.abra (as the manual M4/M5 runs did). Safe with
# capacity=2: app names are unique per (recipe,pr,ref) and same-recipe runs serialise on the
# per-recipe flock, so concurrent builds never touch the same recipe checkout or app.
HOME: /root
commands:
# RECIPE/REF/PR/SRC (+ CCCI_QUICK for `!testme --quick`) are injected as env vars from the
# build's custom params. CCCI_QUICK=1 makes run_recipe_ci take the opt-in fast lane (WC7);
# absent => full cold (default). run_quick ignores STAGES (always upgrade+custom).
- 'echo "recipe-ci: RECIPE=$RECIPE REF=$REF PR=$PR SRC=$SRC stages=$STAGES quick=${CCCI_QUICK:-0}"'
- cc-ci-run runner/run_recipe_ci.py