chore(conc): pre-review P1+P2 — 4 break-it concerns tested + refuted (not a verdict)
All checks were successful
continuous-integration/drone/push Build is passing
All checks were successful
continuous-integration/drone/push Build is passing
This commit is contained in:
@ -48,3 +48,45 @@ debugging tooling:
|
||||
- Baseline `.drone.yml` on main is still the pre-restructure version (concurrency.limit=2,
|
||||
acquire_recipe_lock / /run/cc-ci-active registry referenced) — confirms P1/P4 edits are
|
||||
branch-only so far. Good.
|
||||
|
||||
## 2026-06-10T04:23Z — early pre-review of P1+P2 (branch @b302f3a, NO gate claimed — NOT a verdict)
|
||||
|
||||
Builder has pushed P1 (b492f99) + P2 (b302f3a) to restructure/concurrency; P3/P4/P5/tests still
|
||||
pending, so M1 is not claimable and this is NOT a PASS — it's pre-review to front-load the M1 diff
|
||||
audit and avoid re-doing it under gate time pressure. Read code/diff + git only; did NOT read
|
||||
JOURNAL (anti-anchoring intact). I actively tried to break the following and each concern was
|
||||
REFUTED:
|
||||
|
||||
1. **Green-on-red via the .drone.yml EXIT trap** (my lead hypothesis). The wrapper is
|
||||
`setsid cc-ci-run … & PID=$!; trap 'kill -TERM -- -$PID' TERM EXIT; wait $PID`. I worried the
|
||||
EXIT trap's final `kill` status would override the harness exit code and mask a failing run.
|
||||
EMPIRICALLY TESTED (4 bash repros incl. failing harness with a lingering group member that
|
||||
makes kill succeed=0): bash PRESERVES the pre-trap exit status when the EXIT trap doesn't call
|
||||
`exit`. Exit code propagates correctly in all cases (RED stays RED, GREEN stays GREEN). Refuted.
|
||||
2. **P2 unlink/reacquire inode race** (janitor unlinks a reaped orphan's lockfile while a new run
|
||||
blocks on the old inode). Handled: both acquire_app_lock and _probe_and_reap recheck
|
||||
`fstat(fd).st_ino == stat(path).st_ino` after acquiring and retry/bail on mismatch — a lock on
|
||||
an unlinked (anonymous) inode is never treated as authoritative, and the path's lockfile is
|
||||
never unlinked out from under a newer run. Refuted.
|
||||
3. **Half-reaped/new-app coexistence.** Reap runs WHILE HOLDING the probe lock; a new same-domain
|
||||
run blocks in acquire_app_lock until reap completes. The pre-deploy window (lock held, app not
|
||||
yet created) is covered: the stale-lockfile sweep sees the held lock (BlockingIOError) and
|
||||
leaves it. Refuted.
|
||||
4. **Signal mid-normal-teardown aborting cleanup.** begin_teardown() is the FIRST line of BOTH
|
||||
finally blocks (run_recipe_ci.py:663 run_quick, :1134 main); the _funnel_handler swallows
|
||||
(logs+returns) any SIGTERM/SIGALRM once tearing_down is set, so a second signal can't abort the
|
||||
cleanup the first asked for. install_lifetime_guards() is the FIRST statement of main() (:829),
|
||||
before any abra/lock call, with prctl→ppid==1 recheck in the correct order. Refuted.
|
||||
|
||||
Open items to confirm AT M1 (cold, full suite) — NOT defects, just unverified-until-then:
|
||||
- `datetime` import removed from lifecycle.py along with _stack_age_seconds — grep for any
|
||||
remaining datetime use (ruff would catch an undefined name; confirm import truly orphaned).
|
||||
- `_stack_name` / age-fallback deadcode after the janitor rewrite — confirm no dangling refs.
|
||||
- Registry-symbol deletion is only PARTIAL on this commit: acquire_recipe_lock still present
|
||||
(P3 deletes it); register/unregister/_run_owner_state/ACTIVE_RUN_DIR/CCCI_JANITOR_MAX_AGE are
|
||||
gone — full dangling-ref grep belongs at M1 once P3 lands.
|
||||
- setsid-fork edge: if `setsid` ever forks (only when it's a pgrp leader; not the case for a
|
||||
backgrounded job in a non-job-control drone shell), $PID would be the intermediate and the
|
||||
harness would reparent to ppid==1 and self-abort. Live-verify the trap+cancel path at M2(a).
|
||||
- begin_teardown is process-global module state (lifetime._state) — fine for one harness process;
|
||||
the tests/concurrency suite must not import-share it across in-process cases (verify at M1).
|
||||
|
||||
Reference in New Issue
Block a user