From d3fe9e26bb0fbaedb37383539ba3973bc1c80aff Mon Sep 17 00:00:00 2001
From: autonomic-bot <autonomic-bot@git.autonomic.zone>
Date: Wed, 10 Jun 2026 04:32:54 +0000
Subject: [PATCH] =?UTF-8?q?docs:=20P5=20concurrency=20spec=20rewrite=20?=
 =?UTF-8?q?=E2=80=94=20one=20lock,=20one=20structural=20isolation,=20the?=
 =?UTF-8?q?=20invariant=20chain?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Rewritten to the restructured model: lifetime-hardening guards (PDEATHSIG/SIGTERM/SIGALRM +
setsid/trap), per-run ABRA_DIR isolation (same-recipe runs now parallel), per-app-domain flock
(double-!testme serialisation), flock-probe janitor decision table (incl. the inode-identity
race rows), updated failure-mode table (cancel now tears down via the harness's own funnel;
reboot reaps immediately; 60-min deadline bounds everything), single-knob config table, how to
run tests/concurrency, fresh file/symbol index + deleted-symbol list for grep verification.
Also drops the last stale concurrency.limit mention from the .drone.yml header comment.
---
 .drone.yml          |   4 +-
 docs/concurrency.md | 311 ++++++++++++++++++++++++++------------------
 2 files changed, 188 insertions(+), 127 deletions(-)
diff --git a/.drone.yml b/.drone.yml
index d7c3674..cc709f3 100644
--- a/.drone.yml
+++ b/.drone.yml
@@ -35,8 +35,8 @@ steps:
 # the comment-bridge). Deploys the recipe at the PR head, runs install/upgrade/backup + any
 # recipe-local tests via the shared harness, then guarantees teardown (plan §4.2/§4.3).
 #
-# Resource safety (plan §4.2/§4.3): DRONE_RUNNER_CAPACITY=2 (nix/modules/drone-runner.nix) +
-# concurrency.limit=2 below allow two recipe runs in parallel. Concurrent-run safety is enforced by
+# Resource safety (plan §4.2/§4.3): DRONE_RUNNER_CAPACITY=2 (nix/modules/drone-runner.nix, the
+# single concurrency knob) allows two recipe runs in parallel. Concurrent-run safety is enforced by
 # the harness, not by serialisation: every run holds an exclusive flock on its app domain
 # (/run/lock/cc-ci-app-<domain>.lock) for its whole process lifetime, the run-start janitor probes
 # that lock to reap only orphans (held lock = live run, never touched), and recipe working trees
diff --git a/docs/concurrency.md b/docs/concurrency.md
index 64d4496..06858f1 100644
--- a/docs/concurrency.md
+++ b/docs/concurrency.md
@@ -1,167 +1,228 @@
 # Concurrency: how parallel recipe CI runs stay safe
 
-Spec of the concurrent-run system as of 2026-06-10 (commits `c0df77d`, `68ef0f8`, `e6d55b5`).
-Written for review/restructuring — it documents what IS, including known limitations.
+Spec of the concurrent-run system after the 2026-06-10 restructure (branch
+`restructure/concurrency`; plan: cc-ci-plan `concurrency-restructure-full-plan.md`). The previous
+registry + per-recipe-flock model is documented in this file's git history (`5b65c6c`).
 
 ## 1. Goal and design summary
 
-Two recipe CI builds may run **at the same time** on the single cc-ci host (e.g. immich and
-plausible under active development at once). Safety is enforced by the **harness**, not by
-serialising everything:
+Two recipe CI builds may run **at the same time** on the single cc-ci host. Safety is enforced by
+the **harness**, not by serialising everything, and rests on ONE locking mechanism plus ONE
+structural isolation:
 
 | Rule | Mechanism |
 |---|---|
 | Different recipes run in parallel | nothing blocks them (isolation, §3) |
-| Same-recipe runs serialise | per-recipe `flock` (§4) |
-| A starting run never reaps a live concurrent run's app | active-run registry + three-way janitor (§5) |
-| A crashed run's leftovers still get reaped | registry owner-dead detection, age fallback (§5) |
+| Same-RECIPE runs run in parallel too | per-run `ABRA_DIR` recipe trees (§4) — no shared tree, no lock |
+| Same-DOMAIN runs (double-`!testme` of one PR) serialise | per-app-domain `flock` (§5) |
+| A starting run never reaps a live concurrent run's app | janitor probes the app lock; held = live (§6) |
+| A crashed/canceled/rebooted run's leftovers get reaped | lock auto-released by the kernel → probe acquires → reap (§6) |
 
-There is **no daemon and no shared state service**: both mechanisms are kernel/file primitives
-under `/run`, scoped to the harness process lifetime, so a `SIGKILL`'d run can never leak a stale
-lock or a stale "I'm alive" claim.
+The invariant chain that makes "held lock = live owner" sound:
 
-## 2. Configuration knobs
+```
+lock lifetime ⊆ harness process lifetime ⊆ drone step lifetime ⊆ 60-min hard deadline
+```
 
-| Knob | Where | Current | Meaning |
-|---|---|---|---|
-| `DRONE_RUNNER_CAPACITY` (aka `MAX_TESTS`) | `nix/modules/drone-runner.nix` (`maxTests` let-binding) | `2` | **THE cap.** Max builds the exec runner executes at once; Drone queues the rest in its native pending queue. Change requires `nixos-rebuild switch` on cc-ci. |
-| `concurrency.limit` | `.drone.yml`, `recipe-ci` pipeline | `2` | Server-side cap on concurrent `recipe-ci` pipelines. Kept equal to capacity; redundant belt (the push pipeline shares runner capacity too, so lint builds can interleave). |
-| `CCCI_JANITOR_MAX_AGE` | env, read in `lifecycle.janitor()` | unset → `7200`s | **Age fallback only** — applies solely to apps with no registry entry (§5 case 3). The capacity=1-era `"0"` override in `.drone.yml` is GONE; do not reintroduce it (it made a starting build reap in-flight runs). |
-| `RECIPE_LOCK_DIR` | `lifecycle.py` constant | `/run/lock` | Where per-recipe lock files live. |
-| `ACTIVE_RUN_DIR` | `lifecycle.py` constant | `/run/cc-ci-active` | Where active-run pidfiles live. |
+- **lock ⊆ process**: locks are kernel flocks on fds the process holds (and PEP 446 makes those
+  fds non-inheritable, so abra/docker/pytest children never carry them). The kernel releases them
+  on process death, however it dies. There is no unlock code path and no stale-lock failure mode.
+- **process ⊆ step**: `PR_SET_PDEATHSIG(SIGTERM)` + the `.drone.yml` setsid/trap wrap (§2) — a
+  dead or canceled build cannot leak a running harness.
+- **step ⊆ 60 min**: `signal.alarm(3600)` self-deadline (§2).
 
-Memory budget rationale for capacity=2 (Hetzner cpx22, ~7.6 GiB): a full immich stack measured
-~1 GiB; two concurrent recipes fit. Revert `maxTests` to `"1"` if OOM/disk-I/O contention appears.
+Never steal a held lock; manage the holder's lifetime. There is **no daemon and no shared state
+service** — everything is kernel/file primitives under `/run/lock` and per-run directories.
+
+## 2. Mechanism 0: run-lifetime hardening (`runner/harness/lifetime.py`)
+
+`run_recipe_ci.main()` calls `lifetime.install_lifetime_guards()` before ANY abra call or lock
+acquisition:
+
+1. **`PR_SET_PDEATHSIG(SIGTERM)`** (ctypes prctl, return code checked): if the parent — the drone
+   step shell — dies, the kernel TERMs the harness. A post-prctl `ppid == 1` re-check closes the
+   start race: a harness whose parent died *before* the prctl armed would never get the signal,
+   so it refuses to run orphaned.
+2. **SIGTERM handler**: logs, then raises `SystemExit(143)` so the run's `finally:` teardown
+   funnel executes and the process exits non-zero. Re-entrant signals during teardown are logged
+   and IGNORED (`lifetime.begin_teardown()`, also set at the top of the run's `finally:` blocks)
+   so a second signal can't abort the cleanup the first one asked for.
+3. **`signal.alarm(3600)` hard deadline**: SIGALRM funnels into the same teardown path with a
+   distinct log line (`== run exceeded 60-minute hard deadline — tearing down ==`), exit 142.
+   Recipes keep their own smaller per-tier timeouts; this bounds the whole run. Teardown time
+   after the deadline is deliberately not alarm-bounded — the janitor is the backstop if a
+   teardown wedges and the process is killed harder.
+
+The `.drone.yml` recipe-ci step runs the harness as `setsid cc-ci-run … &` with a
+`trap 'kill -TERM -- "-$PID"' TERM EXIT; wait "$PID"` — a drone **cancel** (TERM to the step
+shell) is forwarded to the harness's whole process group instead of leaking it (the exec runner
+only kills the step shell). PDEATHSIG backstops the no-trap paths.
 
 ## 3. Isolation model: what is shared, what is per-run
 
 Per-run (no conflict possible):
 
-- **App + stack + volumes + secrets.** The run app domain is deterministic and unique per
-  (recipe, pr, ref): `naming.app_domain()` → `<recipe[:4]>-<sha1(recipe|pr|ref)[:6]>.ci.commoninternet.net`.
-  Everything abra creates is namespaced by it. Run apps are recognised by
-  `RUN_APP_RE = ^[a-z0-9]{1,4}-[0-9a-f]{6}\.ci\.commoninternet\.net$`; canonical/warm apps
-  (e.g. `warm-keycloak...`) deliberately do NOT match, so the janitor never touches them.
-- **Drone build workspace.** The exec runner gives each build its own clone under
-  `/var/lib/drone-runner/drone-<id>/` — harness code and test files are per-build.
-- **Run artifacts.** `/var/lib/cc-ci-runs/<build-number>/`.
+- **App + stack + volumes + secrets.** Run app domain = `naming.app_domain()` →
+  `<recipe[:4]>-<sha1(recipe|pr|ref)[:6]>.ci.commoninternet.net`, unique per (recipe, pr, ref);
+  everything abra creates is namespaced by it. Run apps are recognised by
+  `RUN_APP_RE = ^[a-z0-9]{1,4}-[0-9a-f]{6}\.ci\.commoninternet\.net$`; warm/canonical apps
+  (e.g. `warm-keycloak...`) deliberately do NOT match → the janitor never probes them.
+- **Recipe working trees** — `$ABRA_DIR/recipes/<recipe>`, per run (§4). NEW in the restructure.
+- **Drone build workspace** (`/var/lib/drone-runner/drone-<id>/`) and **run artifacts**
+  (`/var/lib/cc-ci-runs/<run-id>/`).
 
-Shared (the two hazards the mechanisms exist for):
+Shared (by design, conflict-free):
 
-- **`~/.abra/recipes/<recipe>`** — ONE working tree per recipe (abra's own layout). The harness
-  `fetch_recipe()` `rm -rf`'s + reclones it at run start, and the upgrade tier `git checkout`s it
-  mid-run for the chaos redeploy. Two same-recipe runs would corrupt each other's deploy tree
-  (observed: immich builds 229/230 deployed a tree missing its config). → per-recipe flock (§4).
-- **`HOME=/root`** — forced in `.drone.yml` so abra finds its server config under `/root/.abra`.
-  Safe *given* the above: app names are unique and same-recipe runs serialise, so no two builds
-  touch the same recipe checkout or app env file.
+- **`/root/.abra/servers`** — app `.env` files, one per domain. The per-run `ABRA_DIR` symlinks
+  `servers/` here, so .env files land in the canonical path: janitor discovery (`abra app ls`)
+  and out-of-run tooling see every app. Per-domain filenames + the app-domain lock prevent write
+  conflicts.
+- **`/root/.abra/catalogue`** — read-mostly, symlinked into each per-run dir.
+- **`HOME=/root`** (forced in `.drone.yml`). Safe: a run no longer reaches anything
+  recipe-mutable under `~/.abra` except through the two symlinks above.
 
-## 4. Mechanism 1: per-recipe flock
+## 4. Mechanism 1: per-run `ABRA_DIR` (replaces the per-recipe flock)
 
-Code: `lifecycle.acquire_recipe_lock(recipe)`; taken in `run_recipe_ci.main()` **before**
-`fetch_recipe()` (the first shared-tree mutation).
+`run_recipe_ci.setup_run_abra_dir()` — called first thing in `main()`, before any abra call —
+builds `<runs_dir>/<run-id>/abra/` (run-id = Drone build number; `manual-<pid>` for hand runs):
 
-- Lock file: `/run/lock/cc-ci-recipe-<recipe>.lock`, exclusive `fcntl.flock`.
-- Non-blocking attempt first; on `BlockingIOError` it logs
-  `== recipe lock: another <recipe> run is in flight — waiting ... ==` and blocks. The waiting
-  run is visibly "stuck" in its drone log at that line — that is by design.
-- The open file object is returned and kept alive (`_recipe_lock = ...  # noqa: F841`) for the
-  **whole process lifetime**. Release is implicit: the kernel drops a flock when the fd closes —
-  including on crash or SIGKILL. **There is no stale-lock failure mode and no unlock code path.**
-- Scope: serialises only runs of the SAME recipe. Different recipes never contend.
+```
+abra/
+  servers/    -> /root/.abra/servers     (symlink; canonical shared .env path)
+  catalogue/  -> /root/.abra/catalogue   (symlink; read-mostly)
+  recipes/    fresh, empty               (THE isolation that matters)
+```
 
-## 5. Mechanism 2: active-run registry + three-way janitor
+and exports it as `$ABRA_DIR`, honoured by the abra CLI itself and by every harness path helper
+(`abra.abra_dir()` / `abra.recipe_dir()`; `generic._recipe_dir`, `prepull_images`,
+`snapshot_recipe_tests`, `warm_reconcile._recipe_dir` all route through the same rule:
+`$ABRA_DIR` if set, else `~/.abra`).
 
-Why: every run starts with `lifecycle.janitor()` (called from both the cold path and the
-warm/quick path in `run_recipe_ci.py`) to reap orphans left by crashed/SIGKILL'd runs (whose
-`finally:` teardown never ran). Under capacity=2 "any run app that isn't mine" may be a LIVE
-concurrent run — age alone can't tell. The registry makes ownership explicit.
+- `fetch_recipe()` is now a plain clone into `$ABRA_DIR/recipes/<recipe>` (PR-head clone+checkout
+  or `abra recipe fetch`); the upgrade tier's mid-run `git checkout`s happen in the run's own
+  tree. Two same-recipe runs can no longer corrupt each other — structurally, with no lock. The
+  old observed failure (immich builds 229/230 deploying a tree missing its config) is impossible.
+- `CCCI_SKIP_FETCH=1` (test/Adversary staging) copies the canonically-staged
+  `~/.abra/recipes/<recipe>` clone into the per-run tree.
+- Out-of-run flows (warm_reconcile's systemd timer, manual abra) set no `ABRA_DIR` and keep using
+  the canonical `/root/.abra` unchanged. In-run flows that touch canonical state on purpose
+  (warm/canonical .env files) go through `servers/` and are unaffected.
+- The per-run dir rides along the existing `/var/lib/cc-ci-runs/<run-id>/` retention. abra
+  auto-clones any recipe it needs to resolve (e.g. during `app ls`) into the per-run `recipes/` —
+  a few seconds of git per run, gone with the run dir.
 
-Registry protocol (all in `lifecycle.py`):
+## 5. Mechanism 2: per-app-domain flock (`lifecycle.acquire_app_lock`)
 
-1. `register_run_app(domain)` — writes `/run/cc-ci-active/<domain>` containing the harness pid.
-   Called inside `deploy_app()` **before** the app is created, so no window exists where a
-   concurrent janitor can see the app without its registration.
-2. `unregister_run_app(domain)` — removes the pidfile. Called at the end of `teardown_app()`
-   (every exit path funnels through teardown) and by the janitor after reaping.
-3. `_run_owner_state(domain)` — classifies the owner:
-   - reads the pid; missing/garbled file → `"unknown"`
-   - `/proc/<pid>/cmdline` gone → `"dead"`
-   - cmdline must contain `run_recipe_ci` → `"alive"`, else `"dead"` (**pid-reuse guard**: a
-     recycled pid won't look like a harness run)
+- Lock file: `/run/lock/cc-ci-app-<domain>.lock` (dir overridable via `CCCI_APP_LOCK_DIR` for the
+  test suite), exclusive `fcntl.flock`, taken in `deploy_app()` **before the app is created** — a
+  concurrent janitor can never see a run app without its held lock.
+- Blocks (with a log line: `== app lock: another run of <domain> is in flight — waiting ==`) when
+  another run of the SAME domain is in flight — the double-`!testme` serialisation point; the
+  waiting run is visibly parked at that line in its drone log, by design.
+- The returned file object is ALSO retained in module-level `_held_app_locks` — if a caller
+  dropped it, GC would close the fd and silently release the lock.
+- mtime is touched at acquisition: lock age feeds the janitor's long-held flag (§6).
+- **Unlink/recreate race guard**: the janitor unlinks reaped lockfiles, so after EVERY
+  acquisition the locked fd is verified to still be the inode the path names
+  (`fstat().st_ino == stat().st_ino`); a waiter that won a just-unlinked inode closes it and
+  retries on the live path. (A lock on an unlinked inode protects nothing: a later opener gets a
+  fresh inode and would acquire "the same" lock.)
+- Release is implicit: process exit (any kind). `teardown_app()` does NOT release or unlink —
+  a clean run's leftover lockfile is unheld and is unlinked on sight by the next janitor sweep.
 
-Janitor decision table (`lifecycle.janitor()`):
+## 6. The flock-probe janitor (`lifecycle.janitor`)
 
-| Owner state | Meaning | Action |
+Runs at every run start (cold + quick paths) and in the warm/upgrade sweeps. Candidate discovery
+is unchanged from the old model: `abra app ls` + a docker-service sweep (catches stacks whose
+`.env` is already gone), both matched against `RUN_APP_RE` — warm/canonical apps never match and
+are never probed.
+
+Decision table (per candidate domain, `_probe_and_reap`):
+
+| Probe (`LOCK_EX\|LOCK_NB`) | Meaning | Action |
 |---|---|---|
-| `alive` | live concurrent run | **never reap** (logs "is a live concurrent run — leaving it") |
-| `dead` | crashed run's definite orphan | reap immediately (`teardown_app(verify=False)`) + unregister |
-| `unknown` | pre-registry app, or post-reboot (`/run` is tmpfs) | age fallback: reap only if stack age ≥ `CCCI_JANITOR_MAX_AGE` (default 2h) |
+| acquires (+ inode identity OK) | nobody holds it → owner died (kernel-guaranteed) | **reap**: `teardown_app(verify=False)` WHILE HOLDING the probe lock, then unlink the lockfile, then release |
+| acquires, inode stale | another janitor reaped + unlinked while we raced | skip (reap already done; unlinking now would hit a newer run's file) |
+| `BlockingIOError` (held) | live concurrent run | leave it; if the lockfile's mtime is more than 120 min old (2× the hard deadline), log `!! lock for <domain> held >120min — possible leaked run; inspect with lslocks`. Flag only, **never steal** |
+| `open()` fails (`OSError`) | garbled/unopenable lockfile | skip + log, never crash |
 
-Candidate discovery is unchanged from before: `abra app ls` matches against `RUN_APP_RE`, plus a
-docker-service sweep that reconstructs domains for stacks whose `.env` was already deleted.
-
-## 6. Where convergence fits (adjacent, landed with this work)
-
-Parallel runs surfaced two swarm-convergence bugs that look like concurrency bugs but aren't —
-documented here because any restructuring must keep them fixed (`services_converged()` in
-`lifecycle.py`):
-
-- **N/N replicas ≠ converged** during a stop-first rolling update: the update is *registered*
-  instantly but the OLD task still shows 1/1 until swarm cycles it (build 238: backupbot exec'd a
-  pre-hook into a container killed seconds later → 409 → empty snapshot). `services_converged()`
-  therefore also inspects each service's `UpdateStatus.State`.
-- **`paused` persists forever**: swarm's default `update-failure-action: pause` sets it on one
-  task flicker and it never clears, even at N/N healthy (build 241 hung 22 min). Only `updating`
-  and `rollback_started` block convergence; `paused`/`rollback_paused`/`completed` are settled —
-  the HTTP-health and tier assertions still gate actual app correctness.
-- `backup_app()` additionally waits (bounded, 300s) for `services_converged()` before
-  `abra app backup create`, as defence in depth for the backupbot race.
+- Reaping under the probe lock closes the janitor-vs-new-run race: a new run of that domain
+  blocks in `acquire_app_lock` until the reap finishes — no window where a fresh app coexists
+  with a half-reaped one.
+- Two racing janitors arbitrate on the flock: one reaps, the other sees "held" and leaves; reaps
+  are idempotent (`teardown_app(verify=False)` tolerates half-gone stacks).
+- After the candidates, a tidy sweep unlinks stale **unheld** `cc-ci-app-*.lock` files with no
+  app behind them (under their own probe lock + identity check), keeping `/run/lock` clean.
+- **Post-reboot**: `/run/lock` is tmpfs → lockfiles gone → every surviving app probes as an
+  orphan → reaped immediately. (Improvement over the old 2-hour age fallback: no age-based
+  reaping remains; lock age only feeds the long-held flag.)
 
 ## 7. Failure-mode guarantees
 
 | Event | Outcome |
 |---|---|
-| Run crashes / SIGKILL mid-run | flock auto-released by kernel; pidfile remains but owner is `dead` → next janitor (any run's start) reaps app + pidfile |
-| Drone build canceled via API | **known gap**: cancel kills the step's `sh` wrapper but can LEAK the python harness child — it keeps running (holding lock + registry) until killed by hand. See §8. |
-| Host reboot | `/run` is tmpfs → locks and registry vanish (correct: no processes survived either). All surviving apps become `unknown` → 2h age fallback governs. |
-| Two same-recipe `!testme`s | second blocks on the flock at run start (before touching the shared tree), runs after the first finishes |
-| Janitor vs. app being created | impossible to mis-reap: registration happens before app creation, and an `alive` owner is never reaped |
-| Pid reuse after crash | cmdline check (`run_recipe_ci`) classifies as `dead`, orphan still reaped |
+| Run crashes / SIGKILL mid-run | flock auto-released by kernel → next janitor probe reaps app + lockfile |
+| Drone build canceled via API | step trap TERMs the harness process group → SIGTERM funnel runs the run's own teardown (exit 143); if anything still leaks, PDEATHSIG + janitor reap (the old "cancel leaks the harness" gap is CLOSED) |
+| Run exceeds 60 min | SIGALRM → distinct log line → own teardown → exit 142 |
+| Host reboot | locks and lockfiles vanish (tmpfs, correct: no owners survived) → all surviving run apps reaped at the next run start, immediately |
+| Two same-recipe `!testme`s (different PRs) | run in parallel — separate domains, separate per-run recipe trees |
+| Double-`!testme` (same PR → same domain) | second blocks on the app lock before creating anything, visibly in its drone log, runs after the first finishes |
+| Janitor vs. app being created | impossible to mis-reap: the lock is held before `app new`, and a held lock is never touched |
+| Janitor unlink vs. blocked waiter | inode identity re-check on every acquisition → waiter retries on the live path |
+| Lock held implausibly long (>120 min) | flagged loudly for a human (`lslocks`), never stolen |
 
-## 8. Known limitations / restructuring candidates
+## 8. Where convergence fits (adjacent; unchanged by the restructure)
 
-1. **Drone cancel leaks the harness process.** The exec runner kills the step shell, not the
-   process tree; the leaked python continues deploying/holding the lock. Fix ideas: run the step
-   under `setsid` + a trap that kills the process group, or have the harness watch
-   `DRONE_BUILD_STATUS`/parent-death (`PR_SET_PDEATHSIG`).
-2. **Head-of-line blocking on same-recipe serialisation.** A run waiting on the recipe flock
-   still occupies one of the 2 runner slots, so two builds of the SAME recipe temporarily starve
-   all other recipes. Alternatives: a Drone-level per-recipe fan-out (one pipeline `concurrency`
-   group per recipe is not natively expressible), or detect-and-requeue in the harness.
-3. **The lock protects harness runs only.** Manual `abra`/`git` activity on
-   `~/.abra/recipes/<recipe>` (operator or another agent) bypasses the flock — the
-   "park the checkout, then hands off" discipline is still required. A restructure could make the
-   harness deploy from a per-run copy of the recipe tree instead of the shared checkout
-   (eliminates the lock entirely, at the cost of diverging from abra's expected layout).
-4. **`HOME=/root` is still shared.** Safe today by argument (§3), not by enforcement. Per-build
-   `ABRA_DIR` with a shared read-only server config would make isolation structural.
-5. **Registry is advisory.** Nothing stops a non-harness actor from creating run-app-shaped
-   stacks the janitor will eventually age-reap; conversely the janitor trusts pidfiles it can
-   parse. Acceptable on a single-purpose CI host.
-6. **Capacity is configured in two places** (`drone-runner.nix` + `.drone.yml`) that must be
-   kept in step by hand.
+Two swarm-convergence pitfalls handled in `services_converged()` look like concurrency bugs but
+aren't; any future work must keep them fixed:
 
-## 9. File / symbol index
+- **N/N replicas ≠ converged** during a stop-first rolling update — `UpdateStatus.State` is also
+  inspected (build 238: backupbot exec'd into a container killed seconds later).
+- **`paused` persists forever** (swarm's default `update-failure-action: pause`): only `updating`
+  and `rollback_started` block convergence; `paused`/`rollback_paused` are settled (build 241).
+- `backup_app()` additionally waits (bounded 300s) for convergence before `backup create`.
+
+## 9. Configuration knobs
+
+| Knob | Where | Current | Meaning |
+|---|---|---|---|
+| `DRONE_RUNNER_CAPACITY` (aka `MAX_TESTS`) | `nix/modules/drone-runner.nix` (`maxTests`) | `2` | **THE single concurrency knob.** Max builds the exec runner executes at once; Drone queues the rest. (The `.drone.yml` `concurrency.limit` duplicate was removed.) Change requires `nixos-rebuild switch`. |
+| `CCCI_APP_LOCK_DIR` | env, read at call time | unset → `/run/lock` | App-domain lockfile dir override — used by `tests/concurrency` to sandbox locks. Never set in production. |
+| hard deadline | `lifetime.HARD_DEADLINE_SECONDS` | 3600 s | the whole-run alarm; long-held flag threshold is 2× this (`LONG_HELD_LOCK_SECONDS`) |
+
+## 10. Testing: `tests/concurrency/`
+
+Real-kernel suite (19 planned cases + companions): helper subprocesses hold REAL flocks and
+install the REAL prctl/signal/alarm guards — flock itself is never mocked; the janitor runs with
+injected candidates + stubbed teardown but probes real locks. **Not part of the default
+`pytest tests/unit` gate** (it spawns processes and sleeps); run it explicitly:
+
+```
+cc-ci-run -m pytest tests/concurrency -q
+```
+
+Covers: kernel auto-release on SIGKILL; LOCK_NB probe semantics; PEP 446 fd non-inheritance;
+same-domain serialisation; orphan reap + unlink; live-run protection; reap-under-probe-lock
+blocking; two-janitor arbitration; reboot-immediate reap; long-held flag; RUN_APP_RE allowlist;
+degrade-on-garbage; PDEATHSIG; ppid start race; deadline + SIGTERM funnels; per-run ABRA_DIR
+construction/export; concurrent same-recipe fetch isolation; symlinked-servers .env canonicality.
+
+## 11. File / symbol index
 
 | What | Where |
 |---|---|
-| `maxTests` / `DRONE_RUNNER_CAPACITY` | `nix/modules/drone-runner.nix` |
-| `concurrency.limit`, `HOME=/root`, env | `.drone.yml` (`recipe-ci` pipeline) |
-| Lock + registry constants & helpers | `runner/harness/lifecycle.py` (top, after `TeardownError`) |
-| `acquire_recipe_lock` call site | `runner/run_recipe_ci.py` `main()`, before `fetch_recipe()` |
-| `register_run_app` call site | `lifecycle.deploy_app()` (before app creation) |
-| `unregister_run_app` call sites | `lifecycle.teardown_app()`, `lifecycle.janitor()` |
-| Janitor + decision table | `lifecycle.janitor()`, `_run_owner_state()` |
-| Run-app naming | `runner/harness/naming.py` (`app_domain`), `RUN_APP_RE` in `lifecycle.py` |
-| Convergence (adjacent) | `lifecycle.services_converged()`, `lifecycle.backup_app()` |
+| lifetime guards (PDEATHSIG, signal funnels, deadline) | `runner/harness/lifetime.py`; installed in `run_recipe_ci.main()` |
+| setsid/trap cancel forwarding | `.drone.yml` (`recipe-ci` step) |
+| `acquire_app_lock`, `_held_app_locks`, `_app_lock_path` | `runner/harness/lifecycle.py` |
+| `acquire_app_lock` call site | `lifecycle.deploy_app()` (before app creation) |
+| janitor + probe (`janitor`, `_probe_and_reap`, `LONG_HELD_LOCK_SECONDS`) | `runner/harness/lifecycle.py` |
+| per-run ABRA_DIR (`setup_run_abra_dir`, `fetch_recipe`) | `runner/run_recipe_ci.py` |
+| path resolution (`abra_dir`, `recipe_dir`) | `runner/harness/abra.py` (used by `generic`, `lifecycle.prepull_images`, `warm_reconcile`) |
+| run-app naming | `runner/harness/naming.py` (`app_domain`), `RUN_APP_RE` in `lifecycle.py` |
+| capacity knob | `nix/modules/drone-runner.nix` (`maxTests`) |
+| convergence (adjacent) | `lifecycle.services_converged()`, `lifecycle.backup_app()` |
+| the test suite | `tests/concurrency/` (`helpers.py` subprocess entrypoints, `concutil.py` probes) |
+
+Deleted in the restructure (grep should find NOTHING): `register_run_app`, `unregister_run_app`,
+`_run_owner_state`, `ACTIVE_RUN_DIR`, `CCCI_JANITOR_MAX_AGE`, `_stack_age_seconds`,
+`acquire_recipe_lock`, `RECIPE_LOCK_DIR`.
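
Illustrative sketch, not part of the patch: the acquisition and probe rules that §5 and §6 of the rewritten `docs/concurrency.md` describe can be approximated as below. The names `acquire_lock`, `probe_and_reap` and `_names_same_inode` are hypothetical stand-ins, not the repository's `acquire_app_lock` / `_probe_and_reap`, and the `reap` callback stands in for `teardown_app(verify=False)`. It assumes Linux `flock` semantics and omits the long-held flag and logging.

```python
import fcntl
import os


def _names_same_inode(f, path):
    """True if the locked fd is still the inode that `path` names."""
    try:
        return os.fstat(f.fileno()).st_ino == os.stat(path).st_ino
    except FileNotFoundError:
        return False  # path was unlinked after we opened it


def acquire_lock(path):
    """Block until we hold an exclusive flock on the live inode at `path`."""
    while True:
        f = open(path, "a")  # creates the lockfile if absent, never truncates
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks while another run holds it
        if _names_same_inode(f, path):
            os.utime(path)  # touch mtime so lock age can be measured later
            return f  # caller must keep this object alive to keep the lock
        f.close()  # we won a just-unlinked inode: retry on the live path


def probe_and_reap(path, reap):
    """Janitor probe: reap only when nobody holds the lock."""
    try:
        f = open(path, "a")
    except OSError:
        return "skipped"  # unopenable lockfile: skip, never crash
    try:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return "live"  # held lock means a live run: never touch it
        if not _names_same_inode(f, path):
            return "skipped"  # another janitor already reaped and unlinked
        reap()  # tear down while still holding the probe lock
        os.unlink(path)  # then unlink the lockfile
        return "reaped"
    finally:
        f.close()  # closing the fd releases the flock
```

Because the reap runs while the probe lock is held, a new run calling `acquire_lock` on the same path would block until the reap finishes, then fail the identity check on the unlinked inode and retry against a fresh file.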