diff --git a/machine-docs/JOURNAL-2.md b/machine-docs/JOURNAL-2.md
index fe131e8..fba929f 100644
--- a/machine-docs/JOURNAL-2.md
+++ b/machine-docs/JOURNAL-2.md
@@ -1509,3 +1509,34 @@ class-A1 blocker yet. Polling for recovery; if it doesn't come back in ~15-20min
 reboot (b1 VM) → STATUS Blocked. Root-cause implication regardless: discourse is too heavy for this node
 co-resident with warm-keycloak — need to shed memory (stop warm-keycloak before discourse, and/or
 mem-limit the discourse build) before re-running, else this recurs.
+
+---
+## 2026-05-31T04:2xZ — RESUMED (spend limit lifted): cc-ci now = Hetzner node; discourse full6 setup (Builder)
+Woke into the loop after the spend pause. Re-oriented from STATUS-2/REVIEW-2/JOURNAL-2.
+
+**Node migration (prior session, undocumented until now):** `ssh cc-ci` no longer targets the b1-hosted
+`cc-nix-test` VM (100.90.116.4 — now tailnet-OFFLINE, the 7-GiB node that OOM'd mid discourse full5).
+It now targets the new **Hetzner cloud node** `cc-ci` = 100.95.31.88 (public 91.98.47.73), the
+`cc-ci-hetzner` host added in commits 4237cc0/a216395 (nixos-infect). Confirmed: hostname `nixos`,
+swarm node `cc-ci` Ready/Active/Leader, abra server `default` registered, CI infra stacks
+(traefik/drone/dashboard/bridge/backups + warm-keycloak) all redeployed and running. `HCLOUD_TOKEN`
+is in `.testenv` (Hetzner access available). **Caveat: the new node is STILL 4 vCPU / ~7.7 GiB RAM**
+(MemTotal 7937188 kB, nproc 4) — same class as the old node, NOT bigger. So the discourse memory
+constraint persists; the migration bought a reachable/declarative node, not more RAM.
+
+**Fresh-node state:** root is persistent ext4 (150G, 7% used) but `/root/builder-clone`, the cached
+discourse image, and recipe residue were all absent (fresh infect). Re-established builder-clone at
+`origin/main` (a216395) via `git clone` (no submodules). abra + cc-ci-run are Nix-provided
+(`/run/current-system/sw/bin`). No discourse/ghost stacks/volumes/secrets present → clean slate.
+
+**discourse full6 setup (re-run of the OOM-lost full5, same committed shape):** recipe_meta at main
+already carries the full upgrade-to-latest shape — UPGRADE_BASE_VERSION=0.7.0+3.3.1,
+COMPOSE_FILE=compose.yml:compose.ccci.yml, CHAOS_BASE_DEPLOY=True, TIMEOUT/DEPLOY_TIMEOUT=3600,
+BACKUP_VERIFY probe. compose.ccci.yml (bitnamilegacy re-pin + literal 20m start_period grace on the
+0.7.0 base) + install_steps.sh both present and consistent. REF = discourse PR#1 head
+3758522cf8702e97e88cd38d47165cf14defe74e (confirmed current via gitea API; branch ci/bitnamilegacy-repin).
+**Memory-shed (the full5 root-cause fix):** stopped warm-keycloak (`docker stack rm`) — discourse needs
+no SSO for STAGES=install,upgrade,backup,restore,custom. Result: available RAM 6.4→**7.0 GiB**, platform
+stacks total ~70 MiB (traefik 33 / drone 7 / dashboard 13 / bridge 14 / backups 2). discourse now gets
+nearly the whole node vs competing with keycloak's ~700MB java during asset-precompile. Pre-pulling
+`bitnamilegacy/discourse:3.3.1` by TAG (full5 fix #1: inline deploy pull → no-op). Launch on image-ready.
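
Sketch only, not part of the commit: the memory-shed check above could be turned into a pre-launch gate along these lines. The per-stack MiB figures and the MemTotal value are taken from the entry; the function names, the `MemAvailable` sample value (chosen to equal exactly 7.0 GiB), and any threshold passed to `ready_to_launch` are hypothetical and would need tuning against a real run.

```python
import re

# Per-stack resident memory in MiB, as recorded in the journal entry.
PLATFORM_MIB = {"traefik": 33, "drone": 7, "dashboard": 13, "bridge": 14, "backups": 2}


def parse_meminfo(text):
    """Return {field: kB} for /proc/meminfo-style lines such as 'MemTotal:  7937188 kB'."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"^(\w+):\s+(\d+)\s*kB", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def kb_to_gib(kb):
    """Convert kB (as reported by /proc/meminfo, i.e. KiB) to GiB."""
    return kb / (1024 * 1024)


def ready_to_launch(meminfo_text, min_available_gib):
    """True when MemAvailable meets the (hypothetical) minimum headroom in GiB."""
    info = parse_meminfo(meminfo_text)
    return kb_to_gib(info["MemAvailable"]) >= min_available_gib


# MemTotal is the journal's figure; MemAvailable is an illustrative value (7.0 GiB).
SAMPLE = """MemTotal:        7937188 kB
MemAvailable:    7340032 kB
"""
```

On the node itself this would be called as `ready_to_launch(open("/proc/meminfo").read(), threshold)` before kicking off the discourse run; summing `PLATFORM_MIB.values()` reproduces the ~70 MiB platform total quoted above.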