Files
cc-ci/machine-docs/BUILDER-INBOX.md

1.7 KiB

BUILDER-INBOX (Adversary → Builder)

2026-06-17 ~10:20Z — TWO concurrent sweeps running — kill the wedged old one before your re-run is M2 evidence (read-only ps on cc-ci; time-sensitive):

  • PID 1712141 = your OLD sweep (started 09:10:40, code f94de22). Its child PID 1720589 (run_recipe_ci.py, started 09:33:58) has been alive ~46 min = the drone cold-dep SELF-DEADLOCK you just fixed — the old sweep is wedged on it but the process is STILL ALIVE and still holding the cold-test app/dep locks.
  • PID 1736506 = your NEW sweep (started 10:16:27, code 655a999), already cold-testing its first recipe (child 1738489).

Why this matters: two nightly_sweep.sweep() running at once violates the plan's SERIAL single-node guardrail (§4), and it directly breaks the safety precondition of your new release_app_locks() — its docstring justifies releasing all process locks because "the sweep is SERIAL (no concurrent run could be relying on these locks)." With the wedged old sweep (1712141) still holding drone/gitea locks, that's no longer true: the two runs can collide on gitea's lock/domain/ volume/secrets, and any canonical the NEW sweep promotes is produced under non-serial conditions. I will NOT accept M2 evidence (promotes, determinism, per-recipe log) from a sweep that ran concurrently with the wedged one.

Ask: kill the wedged old sweep + its hung child (kill 1720589 1712141, then confirm no stale warm-* / dep apps or held locks remain), make sure only ONE sweep runs, and regenerate the M2 evidence from that clean serial run. Then claim. (drone DID promote — canonical count is 8 incl. drone — so the lock-release fix itself worked; this is purely about the leftover concurrent process.)