review(2): pre-claim recon F2-12 fix e1147b5 — abra -c skips converge monitor BUT harness owns stricter wait_healthy(N/N all svcs)+READY_PROBE(collabora 200, raises on timeout); plausibly not-a-weakening, MUST cold-verify upgrade-GREEN + P7-negative at re-claim; NO verdict yet
This commit is contained in:
@ -879,3 +879,27 @@ TIMEOUT/`DEPLOY_TIMEOUT` covers the python subprocess, but abra's own per-servic
|
|||||||
what emitted `FATA deploy failed`), and/or a post-redeploy collabora-health wait before asserting
|
what emitted `FATA deploy failed`), and/or a post-redeploy collabora-health wait before asserting
|
||||||
reconverge. Anti-anchoring honored: verdict formed from the plan + code + my own run's observable log;
|
reconverge. Anti-anchoring honored: verdict formed from the plan + code + my own run's observable log;
|
||||||
I did NOT read JOURNAL-2 before writing this.
|
I did NOT read JOURNAL-2 before writing this.
|
||||||
|
|
||||||
|
## @2026-05-29 — Pre-claim recon: F2-12 fix e1147b5 (NOT re-claimed yet — no verdict)
|
||||||
|
Builder ACKed F2-12 and pushed fix `e1147b5` ("own convergence wait via abra `-c` + collabora
|
||||||
|
READY_PROBE"), status `cc4af49` = validating multi-run before RE-CLAIM. Read the fix ahead of the
|
||||||
|
re-claim. **The adversarial crux: the upgrade redeploy now passes `abra … -c` (`--no-converge-checks`),
|
||||||
|
which skips abra's own convergence monitor.** Skipping a convergence check is exactly the shape of a
|
||||||
|
P7 weakening — so I scrutinized whether the replacement is genuinely stronger or a green-washing.
|
||||||
|
- **Plausibly NOT a weakening (pending cold proof):** `-c` only skips abra's *post-deploy monitor*;
|
||||||
|
`docker stack deploy` (the real spec apply) still runs. The harness then owns the verification in
|
||||||
|
`generic.perform_upgrade`: `lifecycle.wait_healthy` (= `_wait_services_converged` "every swarm
|
||||||
|
service shows running == configured replicas" + HEALTH_PATH) **then** `lifecycle.wait_ready_probes`
|
||||||
|
(collabora `/hosting/discovery` → 200), bounded by the generous recipe DEPLOY_TIMEOUT. The READY_PROBE
|
||||||
|
loop **raises TimeoutError** if discovery never hits 200 (while/else) → upgrade op fails → tier fails,
|
||||||
|
so it's non-vacuous by construction. HC1 (chaos-version label == PR-head) preserved; chaos_redeploy
|
||||||
|
still bypasses deploy_app so deploy-count stays 1.
|
||||||
|
- **MUST cold-verify at re-claim (cannot fully settle by reading):**
|
||||||
|
1. **Upgrade tier GREEN on MY own cold run** — the F2-12 close condition (repeat-green, not one-off;
|
||||||
|
Builder admits it was 3×green/1×fail before this fix).
|
||||||
|
2. **P7 negative:** confirm `_wait_services_converged` truly fails on a stuck `0/1` service (i.e. `-c`
|
||||||
|
+ owned-wait catches a genuinely broken converge, not just a slow one). I started reading its
|
||||||
|
parser (lifecycle.py ~286–328) — finish that read + ideally observe a broken-upgrade-still-RED.
|
||||||
|
3. deploy-count == 1; clean teardown.
|
||||||
|
F2-12 stays OPEN (Adversary-owned). NO verdict until Q3.2 is re-claimed. Anti-anchoring: not reading
|
||||||
|
JOURNAL before the verdict.
|
||||||
|
|||||||
Reference in New Issue
Block a user