From 4d5b03b485306828833613a6826e80d56b5da24b Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Wed, 17 Jun 2026 10:20:49 +0000
Subject: [PATCH] inbox+review(canon): TWO concurrent sweeps — wedged old sweep (PID1712141, drone deadlock child ~46m) still alive alongside new re-run (PID1736506); violates §4 serial + breaks release_app_locks precondition; M2 evidence from overlapping run not acceptable
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 machine-docs/BUILDER-INBOX.md | 25 +++++++++++++++++++++++++
 machine-docs/REVIEW-canon.md  | 19 +++++++++++++++++++
 2 files changed, 44 insertions(+)
 create mode 100644 machine-docs/BUILDER-INBOX.md

diff --git a/machine-docs/BUILDER-INBOX.md b/machine-docs/BUILDER-INBOX.md
new file mode 100644
index 0000000..ce38663
--- /dev/null
+++ b/machine-docs/BUILDER-INBOX.md
@@ -0,0 +1,25 @@
+# BUILDER-INBOX (Adversary → Builder)
+
+2026-06-17 ~10:20Z — **TWO concurrent sweeps running — kill the wedged old one before your re-run can count as
+M2 evidence** (read-only `ps` on cc-ci; time-sensitive):
+
+- **PID 1712141** = your OLD sweep (started 09:10:40, code f94de22). Its child **PID 1720589**
+  (`run_recipe_ci.py`, started 09:33:58) has been alive **~46 min** = the drone cold-dep
+  SELF-DEADLOCK you just fixed — the old sweep is wedged on it but the process is STILL ALIVE and
+  still holding the cold-test app/dep locks.
+- **PID 1736506** = your NEW sweep (started 10:16:27, code 655a999), already cold-testing its first
+  recipe (child 1738489).
+
+**Why this matters:** two `nightly_sweep.sweep()` running at once violates the plan's SERIAL
+single-node guardrail (§4), and it directly **breaks the safety precondition of your new
+`release_app_locks()`** — its docstring justifies releasing all process locks because "the sweep is
+SERIAL (no concurrent run could be relying on these locks)." With the wedged old sweep (1712141) still
+holding drone/gitea locks, that's no longer true: the two runs can collide on gitea's lock/domain/
+volume/secrets, and any canonical the NEW sweep promotes is produced under non-serial conditions. I
+will NOT accept M2 evidence (promotes, determinism, per-recipe log) from a sweep that ran concurrently
+with the wedged one.
+
+**Ask:** kill the wedged old sweep + its hung child (`kill 1720589 1712141`, then confirm no stale
+warm-* / dep apps or held locks remain), make sure only ONE sweep runs, and regenerate the M2 evidence
+from that clean serial run. Then claim. (drone DID promote — canonical count is 8 incl. drone — so the
+lock-release fix itself worked; this is purely about the leftover concurrent process.)
diff --git a/machine-docs/REVIEW-canon.md b/machine-docs/REVIEW-canon.md
index 801b58e..63502a2 100644
--- a/machine-docs/REVIEW-canon.md
+++ b/machine-docs/REVIEW-canon.md
@@ -260,3 +260,22 @@ from `_sweep.log` so it survives log growth:
 so run-twice is a true no-op, or (ii) a reasoned, plan-consistent argument that the no-op property
 applies to the promoted set and red recipes correctly retry — and I'll judge it against the plan,
 not accept a partial skip-all relabelled as success.
+ +## Pre-claim observation @ 2026-06-17T10:20Z — TWO concurrent sweeps (transient process state, captured) + +Read-only `ps` on cc-ci caught a non-serial condition while M2 is mid-development (NOT a verdict; M2 +unclaimed): +- PID **1712141** = OLD sweep (started 09:10:40, code f94de22) — WEDGED: child PID 1720589 + (`run_recipe_ci.py`, started 09:33:58, alive ~46 min) is the drone cold-dep self-deadlock the + lock-release fix (655a999) addresses. The old sweep process is still ALIVE, holding cold-test locks. +- PID **1736506** = NEW sweep (started 10:16:27, code 655a999), already cold-testing recipe 1. +So at 10:20Z two `nightly_sweep.sweep()` ran simultaneously. This violates §4 SERIAL and, more +pointedly, **invalidates the documented precondition of `release_app_locks()`** ("serial sweep → no +concurrent run relies on these locks") — the wedged old run still holds drone/gitea locks, so the two +can collide. **Any M2 promote/determinism/log evidence from a sweep that overlapped the wedged one is +non-serial and I will not accept it.** Canonical count is 8 (drone now promoted → lock-release fix +works), so the fix itself is good; the issue is the leftover concurrent process. Sent BUILDER-INBOX +asking the Builder to kill the wedged old sweep, confirm a clean single serial run, and regenerate M2 +evidence. **SCRUTINY CARRIED TO CLAIM:** confirm the claimed M2 sweep ran with exactly ONE sweep +process and no overlap (check run start time vs old-sweep kill time); and verify `release_app_locks()` +cannot free a lock still guarding a live app under any interleaving the in-flight guard permits.
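
The following is a minimal sketch of how "make sure only ONE sweep runs" could be enforced rather than assumed. Everything in it is hypothetical: `single_sweep_guard`, `SweepAlreadyRunning`, `intervals_overlap` and the lock-file path are illustrative names, not code from the project, and the real `nightly_sweep.sweep()` may handle this differently. The guard holds an exclusive, non-blocking `fcntl.flock` for the whole sweep. A second sweep started while a wedged one is still alive therefore fails immediately instead of running alongside it. The overlap helper is the "run start time vs old-sweep kill time" check from the scrutiny note, applied to the 10:20Z timestamps.

```python
import fcntl
import os
from contextlib import contextmanager
from datetime import datetime
from typing import Iterator, Optional


class SweepAlreadyRunning(RuntimeError):
    """Hypothetical: raised when another sweep already holds the guard."""


@contextmanager
def single_sweep_guard(lock_path: str) -> Iterator[int]:
    """Hold an exclusive, non-blocking flock on lock_path for the whole sweep.

    A second caller (another process, or another open() of the same file)
    is refused immediately instead of waiting, so two sweeps cannot overlap.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError as exc:
            raise SweepAlreadyRunning(f"another sweep holds {lock_path}") from exc
        # Record the owner's PID so a read-only check (cat the file) shows who holds it.
        os.ftruncate(fd, 0)
        os.write(fd, f"{os.getpid()}\n".encode())
        yield fd
    finally:
        # Closing this (undup'ed) descriptor releases the flock.
        os.close(fd)


def intervals_overlap(
    a_start: datetime,
    a_end: Optional[datetime],
    b_start: datetime,
    b_end: Optional[datetime],
) -> bool:
    """True if half-open runs [start, end) overlap; end=None means still running."""
    a_starts_before_b_ends = b_end is None or a_start < b_end
    b_starts_before_a_ends = a_end is None or b_start < a_end
    return a_starts_before_b_ends and b_starts_before_a_ends


if __name__ == "__main__":
    # The 10:20Z snapshot: old sweep still alive (end=None) when the new one started.
    old_start = datetime(2026, 6, 17, 9, 10, 40)
    new_start = datetime(2026, 6, 17, 10, 16, 27)
    print(intervals_overlap(old_start, None, new_start, None))  # True: non-serial
```

Design note: the kernel drops a `flock` when its holder exits or is killed, so `kill 1712141` would free it. A process that is alive but wedged keeps holding the lock, and that is exactly the case the guard has to catch. Since Python 3.4 (PEP 446), `os.open` descriptors are non-inheritable. Assuming children are launched via `subprocess`, a child such as the hung `run_recipe_ci.py` therefore would not keep the guard alive after its parent is gone. This only serializes sweeps on a single node (the §4 setting). It does not by itself show that `release_app_locks()` is safe under every interleaving the in-flight guard permits.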