journal(2): cc-ci VM offline mid discourse full5 — likely OOM on 7-GiB node; polling recovery
@@ -1495,3 +1495,17 @@ full5 fixes (the ones that actually address the timeout):
Cleaned full4's stray state (2 app.1 containers stuck "Removal In Progress" held the discourse_data
volume; cleared after the daemon finished removal; volume rm'd). Node verified clean before launch.
full5: `/root/ccci-discourse-full5.log`, PID 848184, REF 3758522, builder-clone @8dfd8ed.

---
## 2026-05-31T01:38Z — cc-ci VM went OFFLINE mid discourse full5 (likely OOM on 7-GiB node) (Builder)
At the 01:38 poll, `ssh cc-ci` timed out; `ping 100.90.116.4` 100% loss; `tailscale status` shows
`cc-nix-test 100.90.116.4 ... active; relay "nyc"; offline`. My orchestrator host + b1 (hypervisor)
are online — only the cc-ci VM dropped off. Last good state (01:33): discourse app attempt-2 in
"Populating database" (Rails migration), health=starting. Strong hypothesis: the 7-GiB node OOM'd /
thrashed under discourse's migration+asset-precompile (Rails/ember, memory-hungry) co-resident with
the CI infra (traefik/drone/dashboard/bridge/backups) AND a running warm-keycloak+db → tailscaled
starved → VM unresponsive. Tailnet membership intact (node exists, just offline) → recoverable, not a
class-A1 blocker yet. Polling for recovery; if it doesn't come back in ~15-20min it's an operator
reboot (b1 VM) → STATUS Blocked. Root-cause implication regardless: discourse is too heavy for this
node co-resident with warm-keycloak — need to shed memory (stop warm-keycloak before discourse, and/or
mem-limit the discourse build) before re-running, else this recurs.
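
Sketch of the polling rule above (not the actual poller): probe `ssh cc-ci` on an interval and give up after a deadline, at which point the entry would go to STATUS Blocked. The function names, the 60 s interval, and the 20 min default are assumptions for illustration; only the `ssh cc-ci` probe and the ~15-20min window come from this entry.

```python
import subprocess
import time
from typing import Callable


def ssh_probe(host: str = "cc-ci", timeout_s: int = 10) -> bool:
    """Return True if a trivial ssh command succeeds within timeout_s."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes",
             "-o", f"ConnectTimeout={timeout_s}", host, "true"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 5,  # hard stop if ssh itself hangs
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def wait_for_recovery(
    probe: Callable[[], bool],
    deadline_s: float = 20 * 60,   # assumed: upper end of the ~15-20min window
    interval_s: float = 60,        # assumed poll interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the probe succeeds ("recovered") or the deadline passes ("blocked")."""
    start = clock()
    while True:
        if probe():
            return "recovered"
        if clock() - start >= deadline_s:
            return "blocked"
        sleep(interval_s)
```

Usage would be `wait_for_recovery(ssh_probe)`; "blocked" means hand off to the operator for a b1 VM reboot. The clock and sleep are injectable only so the rule can be exercised without waiting.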