From e3e0a9ee80f3b3ef8467cad306cb7bdc8664f34d Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Tue, 9 Jun 2026 23:08:59 +0000
Subject: [PATCH] journal: two harness convergence fixes (UpdateStatus settle + paused-is-settled); immich build 245 in flight

---
 cc-ci-plan/JOURNAL.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/cc-ci-plan/JOURNAL.md b/cc-ci-plan/JOURNAL.md
index 922560f..477d159 100644
--- a/cc-ci-plan/JOURNAL.md
+++ b/cc-ci-plan/JOURNAL.md
@@ -317,3 +317,22 @@ test/plausible-upgrade-base-3.0.1 pin (UPGRADE_BASE_VERSION=3.0.1+v2.0.0). Branc
 main push build 236 green. /root/builder-clone fast-forwarded to c828f6c. assistant3 notified via tmux
 (plausible !testme unblocked; restore-hook gzip failure is their lane). Next: immich PR #2 !testme
 re-triggered alone (checkout parked clean at a92b28d) — polling to verdict.
+
+### Event 2026-06-09 ~23:20 — Two harness convergence fixes landed; immich on run 3 (build 245)
+Immich !testme run 2 (build 238) RED but PROGRESS: install/upgrade/custom PASS (checkout race gone),
+backup CRASHED — backupbot exec'd the db pre-hook into a container swarm killed seconds earlier: the
+chaos redeploy changes the db image (pgvecto.rs→vectorchord pin) and registers a stop-first rolling
+update that hadn't STARTED when the N/N convergence check passed (old task still 1/1). → `68ef0f8`
+fix(harness): services_converged() also requires swarm UpdateStatus settled + bounded settle-wait in
+backup_app(). Run 3 (build 241) then HUNG 22min in the restore tier: the app service's UpdateStatus
+was 'paused' (swarm default update-failure-action after one task flicker during restore) — a state
+that persists FOREVER; my check treated it as in-flight. Killed 241 (cancel leaks the python child —
+killed by hand; immi-ad3e33 undeployed+rm'd, registry entry cleared, zero leakage verified). →
+`e6d55b5` fix(harness): only 'updating'/'rollback_started' block convergence; 'paused' + N/N is
+settled (health asserts still gate). Both branch builds green (239, 243); main ff'd; builder-clone
+updated. **Plausible build 237 (assistant3, head 4cab6b5): install/upgrade/backup PASS — their
+gzip/dump-path fix WORKS, marker restore test PASSED; remaining: app 502s after restore
+(test_restore_healthy + custom tier) + restore hook needs pg_restore --if-exists; diagnosed +
+relayed via tmux.** Concurrency machinery observed working live: parallel immich+plausible runs held
+per-recipe locks, registered pidfiles, plausible's teardown unregistered cleanly. Immich run 4 =
+build 245 (custom, running) with both fixes live — monitor armed.
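
The settle rule from `e6d55b5` can be sketched as a pure predicate. This is a minimal illustration and not the harness code: `is_settled`, `IN_FLIGHT_STATES`, and the three parameters are hypothetical names. It assumes the caller has already read the running and desired task counts and the service's UpdateStatus state string (or `None` when the service has never been updated). It covers only the state check; the bounded settle-wait added in `68ef0f8` is not modelled here.

```python
from typing import Optional

# Assumption: these are the only UpdateStatus states treated as "still moving",
# per the journal entry. Every other state ('paused', 'completed', ...) counts
# as settled as long as the replica counts match.
IN_FLIGHT_STATES = frozenset({"updating", "rollback_started"})


def is_settled(running: int, desired: int, update_state: Optional[str]) -> bool:
    """Return True when a service is N/N and no update is in flight.

    Hypothetical helper: `update_state` is the swarm UpdateStatus state string,
    or None if the service carries no UpdateStatus at all.
    """
    if running != desired:
        # Replica count has not converged yet.
        return False
    # 'paused' persists indefinitely, so it must not block convergence.
    return update_state not in IN_FLIGHT_STATES
```

With this shape, a service at 1/1 with state 'paused' reports settled, while 1/1 with 'updating' does not; health asserts would still need to run separately.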