journal: two harness convergence fixes (UpdateStatus settle + paused-is-settled); immich build 245 in flight
@@ -317,3 +317,22 @@ test/plausible-upgrade-base-3.0.1 pin (UPGRADE_BASE_VERSION=3.0.1+v2.0.0). Branc
main push build 236 green. /root/builder-clone fast-forwarded to c828f6c. assistant3 notified via tmux
(plausible !testme unblocked; restore-hook gzip failure is their lane). Next: immich PR #2 !testme
re-triggered alone (checkout parked clean at a92b28d) — polling to verdict.
### Event 2026-06-09 ~23:20 — Two harness convergence fixes landed; immich on run 3 (build 245)
Immich !testme run 2 (build 238) RED but PROGRESS: install/upgrade/custom PASS (checkout race gone),
backup CRASHED: backupbot exec'd the db pre-hook into a container that swarm had killed seconds earlier, because the
chaos redeploy changes the db image (pgvecto.rs→vectorchord pin) and registers a stop-first rolling
update that hadn't STARTED when the N/N convergence check passed (old task still 1/1). → `68ef0f8`
fix(harness): services_converged() also requires swarm UpdateStatus settled + bounded settle-wait in
backup_app(). Run 3 (build 241) then HUNG 22min in the restore tier: the app service's UpdateStatus
was 'paused' (swarm default update-failure-action after one task flicker during restore) — a state
that persists FOREVER; my check treated it as in-flight. Killed 241 (cancel leaks the python child —
killed by hand; immi-ad3e33 undeployed+rm'd, registry entry cleared, zero leakage verified). →
`e6d55b5` fix(harness): only 'updating'/'rollback_started' block convergence; 'paused' + N/N is
settled (health asserts still gate). Both branch builds green (239, 243); main ff'd; builder-clone
updated. **Plausible build 237 (assistant3, head 4cab6b5): install/upgrade/backup PASS — their
gzip/dump-path fix WORKS, marker restore test PASSED; remaining: app 502s after restore
(test_restore_healthy + custom tier) + restore hook needs pg_restore --if-exists; diagnosed +
relayed via tmux.** Concurrency machinery observed working live: parallel immich+plausible runs held
per-recipe locks, registered pidfiles, plausible's teardown unregistered cleanly. Immich run 4 =
build 245 (custom, running) with both fixes live — monitor armed.
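
The two harness fixes amount to one convergence predicate plus a bounded poll. A minimal sketch, assuming the harness has already parsed each service's running/desired task counts and its swarm `UpdateStatus.State` (absent when the service was never updated); `ServiceSnapshot`, `is_converged` and `wait_for_convergence` are hypothetical names, not the code in `68ef0f8`/`e6d55b5`. Only `updating` and `rollback_started` block; `paused` at N/N counts as settled, and the wait gives up at a deadline instead of spinning forever.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Hypothetical: update states meaning swarm is still replacing tasks.
# 'paused' (swarm's default update-failure-action result) persists forever,
# so it must NOT be treated as in-flight or the wait can never finish.
IN_FLIGHT_STATES = frozenset({"updating", "rollback_started"})


@dataclass
class ServiceSnapshot:
    """Hypothetical pre-parsed view of one swarm service."""
    name: str
    running: int                 # tasks currently running (the first N in N/N)
    desired: int                 # replicas requested (the second N)
    update_state: Optional[str]  # UpdateStatus.State; None if never updated


def is_converged(svc: ServiceSnapshot) -> bool:
    """N/N replicas AND no update actively in flight."""
    return svc.running == svc.desired and svc.update_state not in IN_FLIGHT_STATES


def wait_for_convergence(
    fetch: Callable[[], Iterable[ServiceSnapshot]],
    timeout: float = 120.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every service converges; return False once the deadline passes."""
    deadline = clock() + timeout
    while True:
        if all(is_converged(s) for s in fetch()):
            return True
        if clock() >= deadline:
            return False  # bounded: the caller decides whether this is fatal
        sleep(interval)
```

Injecting `clock` and `sleep` keeps the wait testable without real delays. Convergence here says nothing about health: a service sitting at `paused` with N/N can still be serving 502s, which is why the health asserts still have to gate separately.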
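
The concurrency machinery observed in the immich+plausible runs (per-recipe locks, registered pidfiles, teardown that unregisters) can be sketched with `fcntl.flock`. This is a hypothetical, POSIX-only layout (`<state_dir>/<recipe>.lock` and `<state_dir>/<recipe>.pid`) and a hypothetical `recipe_run` helper, not the harness's actual code: a different recipe takes a different lock and runs in parallel, while a second run of the same recipe fails fast with `BlockingIOError`.

```python
import fcntl
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def recipe_run(state_dir: Path, recipe: str) -> Iterator[Path]:
    """Hold an exclusive per-recipe lock and a registered pidfile (hypothetical layout)."""
    state_dir.mkdir(parents=True, exist_ok=True)
    lock_path = state_dir / f"{recipe}.lock"
    pid_path = state_dir / f"{recipe}.pid"
    lock_fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Non-blocking: raises BlockingIOError if another run of this recipe holds it.
        fcntl.flock(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        pid_path.write_text(f"{os.getpid()}\n")  # register this run
        try:
            yield pid_path
        finally:
            pid_path.unlink(missing_ok=True)  # teardown unregisters
    finally:
        os.close(lock_fd)  # closing the descriptor releases the flock
```

Since `os.open` descriptors are non-inheritable by default, the lock dies with the process that took it; a pidfile left behind by a crash, by contrast, needs a liveness check before anything trusts it.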