fix(harness): run-keyed run-scoped state files — CONC-A1 (same-domain runs corrupted shared deploy-count)
All checks were successful
continuous-integration/drone/push Build is passing

The four CCCI state files (deploys countfile, opstate, deps, depskip) were keyed
by app domain in shared /tmp. A second run of the same domain executes its main()
preamble + deploy_app's pre-lock _record_deploy BEFORE blocking at the app lock,
so it reset/polluted the live first run's counter (false DG4.1 deploy-count=2,
build 279) and the first run's end-of-run os.remove crashed the second
(FileNotFoundError, build 281). Masked pre-restructure by the end-to-end recipe
flock. Now keyed by run id + harness pid via _run_state_path(); children receive
exact paths via the CCCI_*_FILE env vars, so domain keying was never load-bearing.

tests/concurrency/test_run_state.py: path-invariant cases + a real-process
regression (helpers.py deploy-count-run) reproducing the live interleaving —
verified to FAIL under simulated shared keying. docs/concurrency.md §3 updated.
This commit is contained in:
autonomic-bot
2026-06-10 08:16:09 +00:00
parent e1c4198c08
commit b6e12ef428
4 changed files with 135 additions and 8 deletions

View File

@ -70,6 +70,13 @@ Per-run (no conflict possible):
- **Recipe working trees** — `$ABRA_DIR/recipes/<recipe>`, per run (§4). NEW in the restructure.
- **Drone build workspace** (`/var/lib/drone-runner/drone-<id>/`) and **run artifacts**
(`/var/lib/cc-ci-runs/<run-id>/`).
- **Run-scoped state files** (`/tmp/ccci-{deploys,opstate,deps,depskip}-<run-id>-<pid>…`) —
keyed by run id + harness pid via `run_recipe_ci._run_state_path()`, NEVER by app domain.
A second run of the same domain executes its `main()` preamble before blocking at the app
lock (§5), so domain-keyed files would be reset/removed underneath the live first run
(live finding, M2(c) double-`!testme`: false DG4.1 deploy-count in run 1, countfile
`FileNotFoundError` in run 2). Tier/hook children get the exact paths via the
`CCCI_*_FILE` env vars; removed on normal run exit.
Shared (by design, conflict-free):
@ -205,7 +212,8 @@ Covers: kernel auto-release on SIGKILL; LOCK_NB probe semantics; PEP 446 fd non-
same-domain serialisation; orphan reap + unlink; live-run protection; reap-under-probe-lock
blocking; two-janitor arbitration; reboot-immediate reap; long-held flag; RUN_APP_RE allowlist;
degrade-on-garbage; PDEATHSIG; ppid start race; deadline + SIGTERM funnels; per-run ABRA_DIR
construction/export; concurrent same-recipe fetch isolation; symlinked-servers .env canonicality.
construction/export; concurrent same-recipe fetch isolation; symlinked-servers .env canonicality;
run-keyed (never domain-keyed) run-scoped state files (M2(c) regression, `test_run_state.py`).
## 11. File / symbol index