Files
cc-ci/runner
autonomic-bot b302f3ab63
All checks were successful
continuous-integration/drone/push Build is passing
feat(harness): P2 flock-probe janitor — the kernel flock IS the liveness oracle
- acquire_app_lock(domain): exclusive flock on /run/lock/cc-ci-app-<domain>.lock, taken in
  deploy_app exactly where register_run_app was (BEFORE app creation); blocks with a log line
  when another run of the same domain is in flight (double-!testme serialisation). The file
  object is retained in module-level _held_app_locks so GC can never close the fd and silently
  release the lock. mtime is touched at acquisition (lock age for the long-held flag).
- janitor(): probes each candidate's lock (discovery unchanged: abra app ls + docker-service
  sweep vs RUN_APP_RE). Acquirable -> orphan -> teardown_app(verify=False) WHILE HOLDING the
  probe lock (a new same-domain run blocks until the reap finishes), then unlink before release.
  Held -> live run -> leave it; held >120min (2x hard deadline) -> warn, never steal. Stale
  unheld lockfiles with no app are unlinked on sight. Unreadable lockfile -> skip + log.
- unlink/recreate race guard (both sides): after ANY acquisition, verify the locked fd still is
  the inode the path names (fstat vs stat); a waiter that won a just-unlinked inode retries on
  the live path, and a probe that won one skips (unlinking now would hit a newer run's file).
- deleted: register_run_app, unregister_run_app, _run_owner_state, _registry_path,
  ACTIVE_RUN_DIR, CCCI_JANITOR_MAX_AGE + age fallback, _stack_age_seconds, pid-reuse guard.
  teardown_app no longer unregisters (release is process exit). janitor() takes no args now.
- post-reboot: /run/lock is tmpfs -> lockfiles gone -> probe trivially acquires -> immediate
  reap (improvement over the old 2h age fallback).
2026-06-10 04:11:31 +00:00
..