From 08b629f52ac46420117b6732f61549ec0284fc27 Mon Sep 17 00:00:00 2001 From: autonomic-bot Date: Wed, 10 Jun 2026 04:16:41 +0000 Subject: [PATCH] =?UTF-8?q?chore(conc):=20pre-review=20P1+P2=20=E2=80=94?= =?UTF-8?q?=204=20break-it=20concerns=20tested=20+=20refuted=20(not=20a=20?= =?UTF-8?q?verdict)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- REVIEW-conc.md | 42 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/REVIEW-conc.md b/REVIEW-conc.md index 340f876..769ba93 100644 --- a/REVIEW-conc.md +++ b/REVIEW-conc.md @@ -48,3 +48,45 @@ debugging tooling: - Baseline `.drone.yml` on main is still the pre-restructure version (concurrency.limit=2, acquire_recipe_lock / /run/cc-ci-active registry referenced) — confirms P1/P4 edits are branch-only so far. Good. + +## 2026-06-10T04:23Z — early pre-review of P1+P2 (branch @b302f3a, NO gate claimed — NOT a verdict) + +Builder has pushed P1 (b492f99) + P2 (b302f3a) to restructure/concurrency; P3/P4/P5/tests still +pending, so M1 is not claimable and this is NOT a PASS — it's pre-review to front-load the M1 diff +audit and avoid re-doing it under gate time pressure. Read code/diff + git only; did NOT read +JOURNAL (anti-anchoring intact). I actively tried to break the following and each concern was +REFUTED: + +1. **Green-on-red via the .drone.yml EXIT trap** (my lead hypothesis). The wrapper is + `setsid cc-ci-run … & PID=$!; trap 'kill -TERM -- -$PID' TERM EXIT; wait $PID`. I worried the + EXIT trap's final `kill` status would override the harness exit code and mask a failing run. + EMPIRICALLY TESTED (4 bash repros incl. failing harness with a lingering group member that + makes kill succeed=0): bash PRESERVES the pre-trap exit status when the EXIT trap doesn't call + `exit`. Exit code propagates correctly in all cases (RED stays RED, GREEN stays GREEN). Refuted. +2. **P2 unlink/reacquire inode race** (janitor unlinks a reaped orphan's lockfile while a new run + blocks on the old inode). Handled: both acquire_app_lock and _probe_and_reap recheck + `fstat(fd).st_ino == stat(path).st_ino` after acquiring and retry/bail on mismatch — a lock on + an unlinked (anonymous) inode is never treated as authoritative, and the path's lockfile is + never unlinked out from under a newer run. Refuted. +3. **Half-reaped/new-app coexistence.** Reap runs WHILE HOLDING the probe lock; a new same-domain + run blocks in acquire_app_lock until reap completes. The pre-deploy window (lock held, app not + yet created) is covered: the stale-lockfile sweep sees the held lock (BlockingIOError) and + leaves it. Refuted. +4. **Signal mid-normal-teardown aborting cleanup.** begin_teardown() is the FIRST line of BOTH + finally blocks (run_recipe_ci.py:663 run_quick, :1134 main); the _funnel_handler swallows + (logs+returns) any SIGTERM/SIGALRM once tearing_down is set, so a second signal can't abort the + cleanup the first asked for. install_lifetime_guards() is the FIRST statement of main() (:829), + before any abra/lock call, with prctl→ppid==1 recheck in the correct order. Refuted. + +Open items to confirm AT M1 (cold, full suite) — NOT defects, just unverified-until-then: +- `datetime` import removed from lifecycle.py along with _stack_age_seconds — grep for any + remaining datetime use (ruff would catch an undefined name; confirm import truly orphaned). +- `_stack_name` / age-fallback deadcode after the janitor rewrite — confirm no dangling refs. +- Registry-symbol deletion is only PARTIAL on this commit: acquire_recipe_lock still present + (P3 deletes it); register/unregister/_run_owner_state/ACTIVE_RUN_DIR/CCCI_JANITOR_MAX_AGE are + gone — full dangling-ref grep belongs at M1 once P3 lands. +- setsid-fork edge: if `setsid` ever forks (only when it's a pgrp leader; not the case for a + backgrounded job in a non-job-control drone shell), $PID would be the intermediate and the + harness would reparent to ppid==1 and self-abort. Live-verify the trap+cancel path at M2(a). +- begin_teardown is process-global module state (lifetime._state) — fine for one harness process; + the tests/concurrency suite must not import-share it across in-process cases (verify at M1).