fix(harness): key run-scoped state files by run, not domain (CONC-A1: same-domain runs corrupted the shared deploy-count)
All checks were successful
continuous-integration/drone/push Build is passing
The four CCCI state files (deploys countfile, opstate, deps, depskip) were keyed by app domain in shared /tmp. A second run on the same domain executes its main() preamble and deploy_app's pre-lock _record_deploy BEFORE it blocks on the app lock. It therefore reset or polluted the live first run's counter (a false DG4.1 deploy-count=2, build 279). The first run's end-of-run os.remove then deleted the file the second run was still using, and the second run crashed with FileNotFoundError (build 281). Before the restructure, the end-to-end recipe flock masked this.

State files are now keyed by run id + harness pid via _run_state_path(). Children receive the exact paths through the CCCI_*_FILE env vars, so domain keying was never load-bearing.

tests/concurrency/test_run_state.py adds path-invariant cases plus a real-process regression (helpers.py deploy-count-run) that reproduces the live interleaving; the regression was verified to FAIL under simulated shared keying. docs/concurrency.md §3 is updated.
@@ -108,6 +108,34 @@ def cmd_fetch_checkout(recipe: str, ref: str) -> None:
     mark(f"RESULT {head} {content}")


+def cmd_deploy_count_run(domain: str, gate: str) -> None:
+    """Mirror the REAL run flow for the DG4.1 counter (CONC-A1 regression): countfile init
+    (main() preamble) → _record_deploy (deploy_app fires it BEFORE the app lock) → acquire
+    the app lock → wait for `gate` (file path; '' = no wait) → read + remove own countfile.
+    Two of these on the SAME domain must each see COUNT 1 and never lose their file."""
+    import run_recipe_ci
+
+    countfile = run_recipe_ci._run_state_path("deploys")
+    with open(countfile, "w") as f:
+        f.write("0")
+    os.environ["CCCI_DEPLOY_COUNT_FILE"] = countfile
+    lifecycle._record_deploy()  # pre-lock, exactly like lifecycle.deploy_app()
+    mark("PRELOCK")
+    lifecycle.acquire_app_lock(domain)
+    mark("ACQUIRED")
+    if gate:
+        deadline = time.time() + 15
+        while not os.path.exists(gate) and time.time() < deadline:
+            time.sleep(0.05)
+    try:
+        with open(countfile) as f:
+            n = int(f.read().strip() or "0")
+        os.remove(countfile)
+        mark(f"COUNT {n}")
+    except FileNotFoundError:
+        mark("COUNT_FILE_MISSING")
+
+
 if __name__ == "__main__":
     cmd, *args = sys.argv[1:]
     {
@@ -117,4 +145,5 @@ if __name__ == "__main__":
         "wrapper": cmd_wrapper,
         "orphan-probe": cmd_orphan_probe,
         "fetch-checkout": cmd_fetch_checkout,
+        "deploy-count-run": cmd_deploy_count_run,
     }[cmd](*args)
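The `deploy-count-run` helper reproduces the bug with real processes, an app lock and a gate file. The failure itself can also be replayed deterministically in one process, which shows why shared keying yields both reported symptoms and run keying yields neither. The sketch below is a simplification, not the commit's test. `init_countfile`, `record_deploy`, `read_and_remove` and `replay` are hypothetical stand-ins for the main() preamble, `_record_deploy` and the end-of-run cleanup, and no lock is taken: the fixed step order plays the role of "B runs its pre-lock steps, then waits for A to finish". The commit saw the two symptoms in separate builds (279 and 281), but this particular interleaving produces both at once.

```python
import os
import tempfile
from typing import Optional, Tuple


def init_countfile(path: str) -> None:
    # Stand-in for the main() preamble: (re)create the counter at 0.
    with open(path, "w") as f:
        f.write("0")


def record_deploy(path: str) -> None:
    # Stand-in for _record_deploy: read, increment, write back.
    with open(path) as f:
        n = int(f.read().strip() or "0")
    with open(path, "w") as f:
        f.write(str(n + 1))


def read_and_remove(path: str) -> Optional[int]:
    # End of run: read own count, delete the file; None means it was already gone.
    try:
        with open(path) as f:
            n = int(f.read().strip() or "0")
        os.remove(path)
        return n
    except FileNotFoundError:
        return None


def replay(path_a: str, path_b: str) -> Tuple[Optional[int], Optional[int]]:
    """Replay one CONC-A1 interleaving of runs A and B on the same domain."""
    init_countfile(path_a)             # A: main() preamble
    init_countfile(path_b)             # B: main() preamble, before it reaches the app lock
    record_deploy(path_a)              # A: pre-lock _record_deploy
    record_deploy(path_b)              # B: pre-lock _record_deploy, then B blocks on the lock
    count_a = read_and_remove(path_a)  # A holds the lock, finishes, removes its file
    count_b = read_and_remove(path_b)  # B proceeds only after A is done
    return count_a, count_b


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    shared = os.path.join(tmp, "ccci-deploys-app.example.org")  # old domain-keyed path
    print(replay(shared, shared))  # (2, None): false count for A, lost file for B
    print(replay(os.path.join(tmp, "deploys-r1-101"), os.path.join(tmp, "deploys-r2-202")))  # (1, 1)
```

Passing the same path twice simulates shared keying, the way the commit's regression was checked to FAIL. Distinct paths give each run the COUNT 1 that the helper's docstring requires.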