review(conc): M2(c) FAIL — double-!testme same domain corrupts shared deploy-count file (CONC-A1) + VETO
All checks were successful
continuous-integration/drone/push Build is passing
All checks were successful
continuous-integration/drone/push Build is passing
Builds 279+281 (immich#2, same domain immi-ad3e33) both RED: 279 false DG4.1 'deploy-count 2!=1' from 281's pre-lock _record_deploy polluting the shared /tmp/ccci-deploys-<domain> counter; 281 FileNotFoundError after 279 os.remove'd it. Lock serialisation works (281 logged block+acquire); per-run isolation of the deploy-count file does not (P3 missed it; _record_deploy at lifecycle:250 fires before acquire_app_lock at :254). Control build 275 (isolated) green. Veto DONE until counter keyed per-run + same-domain test + live (c) both-green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
@ -258,3 +258,49 @@ real drone exec shell). main now = d3fe9e2 + this .drone.yml wrapper fix; the fi
|
||||
Open for the formal M2 verdict: re-confirm lint green on the new .drone.yml (yamllint), the push
|
||||
build green, and live (a) cancel-no-leak / (b) parallel both-green / (c) double-!testme blocks /
|
||||
(d) one full green run — cold, once the Builder posts the M2 claim with evidence.
|
||||
|
||||
## M2(c): FAIL @2026-06-10T08:10Z — double-!testme same domain corrupts shared deploy-count → both runs RED + VETO
|
||||
|
||||
Proactive cold break-it probe of the live M2 evidence (M2 not yet formally `claim(conc)`'d — the
|
||||
Builder's JOURNAL shows (c) "triggered" but NOT evidenced as PASS; I went straight to the Drone API
|
||||
to verify the in-flight (c) runs independently, not to the JOURNAL narrative). I found a REAL defect
|
||||
that breaks M2(c). Filed as BACKLOG-conc CONC-A1.
|
||||
|
||||
EVIDENCE (Drone API, recipe-maintainers/cc-ci, cold via /run/secrets/bridge_drone_token — my own
|
||||
access path, not the Builder's word):
|
||||
- (c) = builds **279 + 281**, both `event=custom PR=2 RECIPE=immich REF=a92b28d…` → SAME domain
|
||||
`immi-ad3e33.ci.commoninternet.net`. Both `status=failure` (step `ci` exit_code=1).
|
||||
- 281 (the blocked run): log `== app lock: ... in flight — waiting ==` @2s → `== acquired ==` @194s,
|
||||
which is exactly when 279's process exited (279 finished 05:07:35Z). **Lock serialisation + the
|
||||
visible block line WORK** — that half of (c) is fine.
|
||||
- 279 RED: `!! deploy-count 2 != 1 (DG4.1 violation)`.
|
||||
- 281 RED: `FileNotFoundError: /tmp/ccci-deploys-immi-ad3e33….ci.commoninternet.net` at
|
||||
run_recipe_ci.py:1213.
|
||||
- Control build 275 (isolated immich, same fixed wrapper) → `deploy-count = 1`, GREEN. Confirms the
|
||||
failure is concurrency-specific, NOT a pre-existing immich/wrapper regression.
|
||||
|
||||
ROOT CAUSE (code, confirmed):
|
||||
- DG4.1 counter file is DOMAIN-keyed in shared /tmp, not per-run: `run_recipe_ci.py:930
|
||||
/tmp/ccci-deploys-<domain>`. P3 isolated ABRA_DIR per run but this per-run state file was missed
|
||||
(predates the restructure, ef44d46; the old recipe-flock serialised same-recipe runs end-to-end,
|
||||
masking it).
|
||||
- `deploy_app()` calls `_record_deploy()` (lifecycle.py:250) BEFORE `acquire_app_lock()` (:254,
|
||||
introduced by P2 b302f3a) → the increment races OUTSIDE the lock. 281's single pre-lock
|
||||
`_record_deploy` (@2s) bumps the shared counter 279 is using (→2, false violation), and 279's
|
||||
end-of-run `os.remove(countfile)` (:1215) deletes the file under 281 → FileNotFoundError.
|
||||
- Interleaving is fully reconstructed and self-consistent with the build timestamps (see CONC-A1).
|
||||
|
||||
This is squarely in M2(c) scope: the plan's DoD (c) requires the second run to "block … then RUN"
|
||||
(implicitly green), and the phase's whole premise is "two concurrent !testme don't collide on
|
||||
domain/volume/secrets." This is a domain-keyed-state collision — the restructure's narrower domain
|
||||
lock no longer covers the deploy-count file. M1 (code/suites/lint/diff of d3fe9e2) is unaffected —
|
||||
this is a live concurrency behavior M1's checks could not exercise; the tests/concurrency suite has
|
||||
the matching blind spot (case 4 serialises acquire but never asserts deploy-count isolation across
|
||||
two same-domain runs).
|
||||
|
||||
## VETO — M2 may NOT be marked DONE until CONC-A1 is fixed and I log a fresh (c) PASS
|
||||
Forbidding `## DONE` in STATUS-conc until: (1) deploy-counter keyed per-run; (2) a tests/concurrency
|
||||
case asserts same-domain deploy-count isolation; (3) live (c) re-run shows BOTH builds GREEN with
|
||||
the visible block line and zero leakage; (4) (a),(b),(d) re-confirmed unaffected. Only I clear this.
|
||||
(After this verdict I may consult JOURNAL-conc to contextualise — noting I had NOT read the (c)
|
||||
journal reasoning before forming this FAIL; I verified from the Drone API + code directly.)
|
||||
|
||||
Reference in New Issue
Block a user