- acquire_app_lock(domain): exclusive flock on /run/lock/cc-ci-app-<domain>.lock, taken in deploy_app exactly where register_run_app was (BEFORE app creation); blocks with a log line when another run of the same domain is in flight (double-!testme serialisation). The file object is retained in module-level _held_app_locks so GC can never close the fd and silently release the lock. mtime is touched at acquisition (lock age for the long-held flag).
- janitor(): probes each candidate's lock (discovery unchanged: abra app ls + docker-service sweep vs RUN_APP_RE). Acquirable -> orphan -> teardown_app(verify=False) WHILE HOLDING the probe lock (a new same-domain run blocks until the reap finishes), then unlink before release. Held -> live run -> leave it; held >120min (2x hard deadline) -> warn, never steal. Stale unheld lockfiles with no app are unlinked on sight. Unreadable lockfile -> skip + log.
- unlink/recreate race guard (both sides): after ANY acquisition, verify the locked fd still is the inode the path names (fstat vs stat); a waiter that won a just-unlinked inode retries on the live path, and a probe that won one skips (unlinking now would hit a newer run's file).
- deleted: register_run_app, unregister_run_app, _run_owner_state, _registry_path, ACTIVE_RUN_DIR, CCCI_JANITOR_MAX_AGE + age fallback, _stack_age_seconds, pid-reuse guard. teardown_app no longer unregisters (release is process exit). janitor() takes no args now.
- post-reboot: /run/lock is tmpfs -> lockfiles gone -> probe trivially acquires -> immediate reap (improvement over the old 2h age fallback).
84 lines
3.4 KiB
YAML
---
# Self-test pipeline: runs on normal pushes to cc-ci (M2). Sanity-checks the exec runner can drive
# host abra/docker. Recipe CI is the separate `custom`-event pipeline below.
kind: pipeline
type: exec
name: self-test

platform:
  os: linux
  arch: amd64

trigger:
  event:
    - push

steps:
  # Lint/format gate (Phase 1b, RL1). Runs the exact toolchain from the pinned `lint` devshell
  # (flake.nix) via scripts/lint.sh in check mode; FAILS the build on any unclean file so future
  # commits stay formatted + lint-clean. HOME=/root so nix reuses root's store/eval cache.
  - name: lint
    environment:
      HOME: /root
    commands:
      - nix develop .#lint --command bash scripts/lint.sh

  - name: hello
    commands:
      - echo "cc-ci self-test on the exec runner"
      - whoami
      - abra --version
      - docker info --format 'swarm={{.Swarm.LocalNodeState}}'

---
# Recipe-CI pipeline: runs on bridge-triggered builds (event=custom, params RECIPE/REF/PR/SRC set by
# the comment-bridge). Deploys the recipe at the PR head, runs install/upgrade/backup + any
# recipe-local tests via the shared harness, then guarantees teardown (plan §4.2/§4.3).
#
# Resource safety (plan §4.2/§4.3): DRONE_RUNNER_CAPACITY=2 (nix/modules/drone-runner.nix) +
# concurrency.limit=2 below allow two recipe runs in parallel. Concurrent-run safety is enforced by
# the harness, not by serialisation: every run holds an exclusive flock on its app domain
# (/run/lock/cc-ci-app-<domain>.lock) for its whole process lifetime, the run-start janitor probes
# that lock to reap only orphans (held lock = live run, never touched), and same-recipe runs
# serialise on a per-recipe flock for the shared ~/.abra/recipes/<recipe> checkout
# (lifecycle.acquire_recipe_lock; removed by P3's per-run ABRA_DIR). See docs/concurrency.md.
kind: pipeline
type: exec
name: recipe-ci

platform:
  os: linux
  arch: amd64

trigger:
  event:
    - custom

concurrency:
  limit: 2

steps:
  - name: ci
    environment:
      STAGES: install,upgrade,backup,restore,custom
      # The exec runner points HOME at a per-build workspace; force it to /root so abra finds its
      # server config + recipes under /root/.abra (as the manual M4/M5 runs did). Safe with
      # capacity=2: app names are unique per (recipe,pr,ref) and same-recipe runs serialise on the
      # per-recipe flock, so concurrent builds never touch the same recipe checkout or app.
      HOME: /root
    commands:
      # RECIPE/REF/PR/SRC (+ CCCI_QUICK for `!testme --quick`) are injected as env vars from the
      # build's custom params. CCCI_QUICK=1 makes run_recipe_ci take the opt-in fast lane (WC7);
      # absent => full cold (default). run_quick ignores STAGES (always upgrade+custom).
      - 'echo "recipe-ci: RECIPE=$RECIPE REF=$REF PR=$PR SRC=$SRC stages=$STAGES quick=${CCCI_QUICK:-0}"'
      # P1 lock-lifetime hardening: run the harness in its own session/process group (setsid) and
      # forward a drone cancel (TERM to this step shell) to the WHOLE group, so the harness's
      # SIGTERM handler runs its teardown funnel instead of being leaked (the exec runner kills
      # only the step shell, not the tree). PDEATHSIG inside the harness backstops the case where
      # this shell dies without the trap firing. `wait` propagates the harness exit code.
      - |
        setsid cc-ci-run runner/run_recipe_ci.py &
        PID=$!
        trap 'kill -TERM -- "-$PID" 2>/dev/null' TERM EXIT
        wait "$PID"
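For readers more at home in Python than shell, the session-plus-signal-forwarding pattern used by the final step can be approximated as below. This is only an analogue written for illustration: run_in_own_group is a hypothetical name, and nothing here is taken from cc-ci-run or run_recipe_ci.py. It assumes a POSIX host and that it is called from the main thread, since Python only allows signal handlers to be installed there.

```python
import os
import signal
import subprocess
import sys


def run_in_own_group(argv):
    """Run argv as leader of a new session and forward SIGTERM to its whole group."""
    # start_new_session=True calls setsid() in the child, so its pgid equals its pid.
    child = subprocess.Popen(argv, start_new_session=True)

    def forward_term(signum, frame):
        try:
            os.killpg(child.pid, signal.SIGTERM)  # signal every process in the group
        except ProcessLookupError:
            pass  # the group has already exited

    previous = signal.signal(signal.SIGTERM, forward_term)
    try:
        # wait() resumes after the handler runs, so the child's own SIGTERM
        # handler gets time to finish before we return its exit status.
        return child.wait()
    finally:
        restore = previous if previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGTERM, restore)


if __name__ == "__main__":
    # Usage: python this_file.py <command> [args...]; exits with the child's status.
    sys.exit(run_in_own_group(sys.argv[1:]))
```

Unlike the shell trap, this sketch does not also fire on normal exit, and it has no equivalent of the PDEATHSIG backstop mentioned in the step comment.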