journal(2): RESUMED — cc-ci migrated to Hetzner node (still ~8GB); discourse full6 setup + memory-shed
@@ -1509,3 +1509,34 @@ class-A1 blocker yet. Polling for recovery; if it doesn't come back in ~15-20min
reboot (b1 VM) → STATUS Blocked. Root-cause implication regardless: discourse is too heavy for this
node co-resident with warm-keycloak — need to shed memory (stop warm-keycloak before discourse, and/or
mem-limit the discourse build) before re-running, else this recurs.

---
## 2026-05-31T04:2xZ — RESUMED (spend limit lifted): cc-ci now = Hetzner node; discourse full6 setup (Builder)
Woke into the loop after the spend pause. Re-oriented from STATUS-2/REVIEW-2/JOURNAL-2.

**Node migration (prior session, undocumented until now):** `ssh cc-ci` no longer targets the b1-hosted
`cc-nix-test` VM (100.90.116.4; now tailnet-OFFLINE, the 7-GiB node that OOM'd mid discourse full5).
It now targets the new **Hetzner cloud node** `cc-ci` = 100.95.31.88 (public 91.98.47.73), the
`cc-ci-hetzner` host added in commits 4237cc0/a216395 (nixos-infect). Confirmed: hostname `nixos`,
swarm node `cc-ci` Ready/Active/Leader, abra server `default` registered, CI infra stacks
(traefik/drone/dashboard/bridge/backups + warm-keycloak) all redeployed and running. `HCLOUD_TOKEN`
is in `.testenv` (Hetzner access available). **Caveat: the new node is STILL 4 vCPU / ~7.6 GiB RAM**
(MemTotal 7937188 kB, nproc 4): same class as the old node, NOT bigger. So the discourse memory
constraint persists; the migration bought a reachable/declarative node, not more RAM.
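For reference, a minimal sketch of the unit arithmetic behind the RAM figure above. It assumes the `kB` in `/proc/meminfo` means KiB (1024 bytes), which is how the kernel reports it; the helper names are hypothetical and not part of any cc-ci tooling.

```python
def meminfo_kib(meminfo_text: str, field: str) -> int:
    """Return the value of one /proc/meminfo field, in KiB."""
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            # Lines look like "MemTotal:        7937188 kB"
            return int(rest.split()[0])
    raise KeyError(field)


def kib_to_gib(kib: int) -> float:
    """Binary gibibytes (what `free -h` style tools report)."""
    return kib / (1024 * 1024)


def kib_to_gb(kib: int) -> float:
    """Decimal gigabytes (what a marketing-style "8GB" means)."""
    return kib * 1024 / 1e9


# Sample line built from the MemTotal value recorded in this entry.
sample = "MemTotal:        7937188 kB\n"
total = meminfo_kib(sample, "MemTotal")
print(f"{kib_to_gib(total):.2f} GiB / {kib_to_gb(total):.2f} GB")
# prints "7.57 GiB / 8.13 GB"
```

On the node itself the same function could be fed `open("/proc/meminfo").read()` instead of the sample string.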
**Fresh-node state:** root is persistent ext4 (150G, 7% used) but `/root/builder-clone`, the cached
discourse image, and recipe residue were all absent (fresh infect). Re-established builder-clone at
`origin/main` (a216395) via `git clone` (no submodules). abra + cc-ci-run are Nix-provided
(`/run/current-system/sw/bin`). No discourse/ghost stacks/volumes/secrets present → clean slate.

**discourse full6 setup (re-run of the OOM-lost full5, same committed shape):** recipe_meta at main
already carries the full upgrade-to-latest shape: UPGRADE_BASE_VERSION=0.7.0+3.3.1,
COMPOSE_FILE=compose.yml:compose.ccci.yml, CHAOS_BASE_DEPLOY=True, TIMEOUT/DEPLOY_TIMEOUT=3600,
BACKUP_VERIFY probe. compose.ccci.yml (bitnamilegacy re-pin + literal 20m start_period grace on the
0.7.0 base) + install_steps.sh both present and consistent. REF = discourse PR#1 head
3758522cf8702e97e88cd38d47165cf14defe74e (confirmed current via gitea API; branch ci/bitnamilegacy-repin).
**Memory-shed (the full5 root-cause fix):** stopped warm-keycloak (`docker stack rm`); discourse needs
no SSO for STAGES=install,upgrade,backup,restore,custom. Result: available RAM 6.4→**7.0 GiB**, platform
stacks total ~70 MiB (traefik 33 / drone 7 / dashboard 13 / bridge 14 / backups 2). discourse now gets
nearly the whole node instead of competing with keycloak's ~700MB Java process during asset-precompile.
Pre-pulling `bitnamilegacy/discourse:3.3.1` by TAG (full5 fix #1: makes the inline deploy pull a no-op).
Launch on image-ready.
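The memory-shed numbers above can be sanity-checked with a small budget sketch. The per-stack figures and the 6.4/7.0 GiB readings come from this entry; the `required_gib` launch threshold is a hypothetical value chosen only for illustration, since the entry does not record discourse's peak usage.

```python
# Per-stack resident memory in MiB, as recorded in this entry.
PLATFORM_STACKS_MIB = {
    "traefik": 33,
    "drone": 7,
    "dashboard": 13,
    "bridge": 14,
    "backups": 2,
}


def platform_total_mib(stacks: dict) -> int:
    """Sum the memory held by the always-on CI platform stacks."""
    return sum(stacks.values())


def freed_mib(before_gib: float, after_gib: float) -> float:
    """Memory released by a shed step, converted from GiB to MiB."""
    return (after_gib - before_gib) * 1024


def ok_to_launch(available_gib: float, required_gib: float) -> bool:
    """Gate the run on available RAM; required_gib is an assumed threshold."""
    return available_gib >= required_gib


print(platform_total_mib(PLATFORM_STACKS_MIB))  # 69, i.e. the "~70 MiB" above
print(round(freed_mib(6.4, 7.0)))               # 614 MiB released by stopping warm-keycloak
print(ok_to_launch(7.0, required_gib=6.5))      # True under the assumed 6.5 GiB threshold
```

The roughly 614 MiB released is in the same range as the ~700MB attributed to keycloak, so the before/after readings are at least self-consistent.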