journal: two harness convergence fixes (UpdateStatus settle + paused-is-settled); immich build 245 in flight

autonomic-bot
2026-06-09 23:08:59 +00:00
parent 1580738c97
commit e3e0a9ee80

@@ -317,3 +317,22 @@ test/plausible-upgrade-base-3.0.1 pin (UPGRADE_BASE_VERSION=3.0.1+v2.0.0). Branc
main push build 236 green. /root/builder-clone fast-forwarded to c828f6c. assistant3 notified via tmux
(plausible !testme unblocked; restore-hook gzip failure is their lane). Next: immich PR #2 !testme
re-triggered alone (checkout parked clean at a92b28d) — polling to verdict.
### Event 2026-06-09 ~23:20 — Two harness convergence fixes landed; immich on run 4 (build 245)
Immich !testme run 2 (build 238) RED but PROGRESS: install/upgrade/custom PASS (checkout race gone),
backup CRASHED — backupbot exec'd the db pre-hook into a container swarm killed seconds earlier: the
chaos redeploy changes the db image (pgvecto.rs→vectorchord pin) and registers a stop-first rolling
update that hadn't STARTED when the N/N convergence check passed (old task still 1/1). → `68ef0f8`
fix(harness): services_converged() also requires swarm UpdateStatus settled + bounded settle-wait in
backup_app(). Run 3 (build 241) then HUNG 22min in the restore tier: the app service's UpdateStatus
was 'paused' (swarm default update-failure-action after one task flicker during restore) — a state
that persists FOREVER; my check treated it as in-flight. Killed 241 (cancel leaks the python child —
killed by hand; immi-ad3e33 undeployed+rm'd, registry entry cleared, zero leakage verified). →
`e6d55b5` fix(harness): only 'updating'/'rollback_started' block convergence; 'paused' + N/N is
settled (health asserts still gate). Both branch builds green (239, 243); main ff'd; builder-clone
updated. **Plausible build 237 (assistant3, head 4cab6b5): install/upgrade/backup PASS — their
gzip/dump-path fix WORKS, marker restore test PASSED; remaining: app 502s after restore
(test_restore_healthy + custom tier) + restore hook needs pg_restore --if-exists; diagnosed +
relayed via tmux.** Concurrency machinery observed working live: parallel immich+plausible runs held
per-recipe locks, registered pidfiles, plausible's teardown unregistered cleanly. Immich run 4 =
build 245 (custom, running) with both fixes live — monitor armed.
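
Sketch of the convergence rule as it stands after `68ef0f8` + `e6d55b5`, for reference. This is not the harness code: `service_converged`, `wait_converged` and the `probe` callback are hypothetical names, and the caller is assumed to supply running/desired task counts plus the swarm `UpdateStatus.State` string (or `None` when the service has never been updated). The timeout and interval defaults are placeholders.

```python
import time
from typing import Callable, Optional, Tuple

# Only these UpdateStatus states mean swarm is still actively replacing tasks.
# 'paused' is deliberately absent: it never clears on its own, so treating it
# as in-flight makes the wait hang (the build 241 failure mode).
IN_FLIGHT_STATES = {"updating", "rollback_started"}


def service_converged(running: int, desired: int, update_state: Optional[str]) -> bool:
    """Hypothetical per-service check: N/N replicas AND no update in flight."""
    if running != desired:
        return False
    # None (no UpdateStatus recorded) counts as settled in this sketch.
    return update_state not in IN_FLIGHT_STATES


def wait_converged(
    probe: Callable[[], Tuple[int, int, Optional[str]]],
    timeout_s: float = 120.0,   # placeholder bound, not the harness value
    interval_s: float = 2.0,    # placeholder poll interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Bounded settle-wait: poll until converged or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        running, desired, state = probe()
        if service_converged(running, desired, state):
            return True
        if clock() >= deadline:
            return False  # caller decides whether to fail or proceed
        sleep(interval_s)
```

With this rule, `service_converged(1, 1, "updating")` is false (the build 238 race: old task still 1/1 while an update is registered), and `service_converged(1, 1, "paused")` is true, leaving the health asserts as the gate.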