review(2): pre-claim recon F2-12 fix e1147b5 — abra -c skips converge monitor BUT harness owns stricter wait_healthy(N/N all svcs)+READY_PROBE(collabora 200, raises on timeout); plausibly not-a-weakening, MUST cold-verify upgrade-GREEN + P7-negative at re-claim; NO verdict yet
@@ -879,3 +879,27 @@ TIMEOUT/`DEPLOY_TIMEOUT` covers the python subprocess, but abra's own per-servic
what emitted `FATA deploy failed`), and/or a post-redeploy collabora-health wait before asserting
reconverge. Anti-anchoring honored: verdict formed from the plan + code + my own run's observable log;
I did NOT read JOURNAL-2 before writing this.

## @2026-05-29 — Pre-claim recon: F2-12 fix e1147b5 (NOT re-claimed yet — no verdict)
Builder ACKed F2-12 and pushed fix `e1147b5` ("own convergence wait via abra `-c` + collabora
READY_PROBE"), status `cc4af49` = validating multi-run before RE-CLAIM. Read the fix ahead of the
re-claim. **The adversarial crux: the upgrade redeploy now passes `abra … -c` (`--no-converge-checks`),
which skips abra's own convergence monitor.** Skipping a convergence check is exactly the shape of a
P7 weakening, so I scrutinized whether the replacement is genuinely stronger or just green-washing.
- **Plausibly NOT a weakening (pending cold proof):** `-c` only skips abra's *post-deploy monitor*;
  `docker stack deploy` (the real spec apply) still runs. The harness then owns the verification in
  `generic.perform_upgrade`: `lifecycle.wait_healthy` (= `_wait_services_converged` "every swarm
  service shows running == configured replicas" + HEALTH_PATH) **then** `lifecycle.wait_ready_probes`
  (collabora `/hosting/discovery` → 200), bounded by the generous recipe DEPLOY_TIMEOUT. The READY_PROBE
  loop **raises TimeoutError** if discovery never hits 200 (while/else) → upgrade op fails → tier fails,
  so it's non-vacuous by construction. HC1 (chaos-version label == PR-head) preserved; chaos_redeploy
  still bypasses deploy_app so deploy-count stays 1.
- **MUST cold-verify at re-claim (cannot fully settle by reading):**
  1. **Upgrade tier GREEN on MY own cold run** — the F2-12 close condition (repeat-green, not one-off;
     Builder admits it was 3×green/1×fail before this fix).
  2. **P7 negative:** confirm `_wait_services_converged` truly fails on a stuck `0/1` service (i.e.
     `-c` + owned-wait catches a genuinely broken converge, not just a slow one). I started reading its
     parser (lifecycle.py ~286–328) — finish that read + ideally observe a broken-upgrade-still-RED.
  3. deploy-count == 1; clean teardown.
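Sketch of the P7-negative property from item 2, in Python. This is NOT the code in `lifecycle.py` (that read is unfinished); `parse_replicas`, `all_converged`, `wait_converged`, and the `<name> <running>/<configured>` input shape are all assumptions made up for illustration. What it shows is the behaviour the cold run has to confirm: a service stuck at `0/1` must end in a raised error, and an empty listing must not count as converged.

```python
import time
from typing import Callable, Dict, Tuple


def parse_replicas(listing: str) -> Dict[str, Tuple[int, int]]:
    """Parse lines shaped like '<service> <running>/<configured>' (assumed format)."""
    state: Dict[str, Tuple[int, int]] = {}
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # blank or truncated line: nothing to record
        running, _, configured = parts[1].partition("/")
        # int() raises ValueError on a malformed token, so bad input fails loudly
        # instead of being silently read as "converged".
        state[parts[0]] = (int(running), int(configured))
    return state


def all_converged(state: Dict[str, Tuple[int, int]]) -> bool:
    # An empty listing is treated as NOT converged; vacuous truth here would be
    # exactly the kind of hole a P7 check is meant to rule out.
    return bool(state) and all(run == cfg for run, cfg in state.values())


def wait_converged(
    fetch: Callable[[], str],
    timeout: float,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll fetch() until every service is N/N, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    state: Dict[str, Tuple[int, int]] = {}
    while time.monotonic() < deadline:
        state = parse_replicas(fetch())
        if all_converged(state):
            return
        sleep(interval)
    stuck = sorted(name for name, (run, cfg) in state.items() if run != cfg)
    raise TimeoutError(
        f"services not converged after {timeout}s: {stuck or 'no services listed'}"
    )
```

With a `fetch` that always returns a listing containing `app_collabora 0/1` (a hypothetical service name), `wait_converged` raises `TimeoutError` naming that service. That is the observable a broken-upgrade-still-RED run should produce if the real wait behaves the same way.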
F2-12 stays OPEN (Adversary-owned). NO verdict until Q3.2 is re-claimed. Anti-anchoring: not reading
JOURNAL before the verdict.

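For reference, the while/else shape that the "non-vacuous by construction" claim rests on, as a minimal Python sketch. It is not the harness's `lifecycle.wait_ready_probes`; `wait_ready`, `http_status`, and their parameters are hypothetical names, and the only thing taken from the entry above is the contract: poll until the probe returns 200, raise `TimeoutError` otherwise.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str) -> int:
    """Return the HTTP status for url; 4xx/5xx come back as codes, not exceptions."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code


def wait_ready(
    probe: Callable[[], int],
    timeout: float,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll probe() until it returns 200; raise TimeoutError if it never does."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status: Optional[int]
        try:
            status = probe()
        except OSError:
            status = None  # connection refused / DNS failure: simply not ready yet
        if status == 200:
            break  # leaving via break skips the else clause below
        sleep(interval)
    else:
        # Reached only when the loop condition went false without a break,
        # i.e. the deadline passed and no probe ever returned 200.
        raise TimeoutError(f"probe never returned 200 within {timeout}s")
```

A call would look like `wait_ready(lambda: http_status(discovery_url), timeout=deploy_timeout)`, where `discovery_url` and `deploy_timeout` are placeholders for whatever the recipe supplies. The design point is that there is no code path that falls out of the loop quietly: success exits through `break`, and every other exit raises.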