From 00fca8a33e08ef64ce4108239682f8eeb84da50f Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Thu, 18 Jun 2026 00:14:32 +0000
Subject: [PATCH] journal+status(redfix): M1 gitea app.ini read-only JWT crash CONFIRMED on warm advance (recipe defect); 6/6 classified

---
 machine-docs/JOURNAL-redfix.md | 33 +++++++++++++++++++++++++++++++++
 machine-docs/STATUS-redfix.md  |  2 +-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/machine-docs/JOURNAL-redfix.md b/machine-docs/JOURNAL-redfix.md
index 24c44ad..5a8a701 100644
--- a/machine-docs/JOURNAL-redfix.md
+++ b/machine-docs/JOURNAL-redfix.md
@@ -194,3 +194,36 @@ warm-gitea (docker stack rm; retained data+config volumes → proper idle state)
 cold@3.6.0 (gitea2). Result pending. NOTE: the "already deployed" promote-failure-when-left-deployed
 may be a secondary promote-machinery robustness gap (advance should undeploy-or-chaos an
 already-deployed canonical) — will assess after confirming the primary app.ini crash.
+
+## 2026-06-18T00:14Z — M1: gitea warm advance — app.ini read-only JWT crash CONFIRMED (recipe defect)
+
+After restoring warm-gitea to proper idle state (undeployed, 3.5.3 data+config volumes retained),
+re-ran gitea cold@3.6.0 (gitea2, log /tmp/redfix-gitea2.log). Cold lifecycle ALL PASS
+(install/upgrade/backup/restore/custom — incl. the cold FRESH 3.5.3→3.6.0 upgrade tier). WC5 promote
+advance then crash-loops. Live container logs (warm-gitea_..._app, repeated Failed/exit 1):
+
+    modules/setting/setting.go:105:LoadCommonSettings() [F] Unable to load settings from config:
+    error saving JWT Secret for custom config: failed to save "/etc/gitea/app.ini":
+    open /etc/gitea/app.ini: read-only file system
+
+EXACTLY the canon-documented crash. Mechanism: the recipe mounts app.ini as a docker `config`
+(read-only by design) at /etc/gitea/app.ini (compose `configs: - source: app_ini target:
+/etc/gitea/app.ini`, app.ini.tmpl). gitea 1.24.2 (3.6.0), on the warm REATTACH of the retained
+3.5.3 config volume, decides to (re)generate+SAVE a JWT secret to app.ini → read-only fs → FATA at
+config-load, BEFORE any DB migration (so the 3.5.3 data volume stays intact — confirmed canon).
+
+Why cold passes but warm crashes: the cold fresh deploy + cold chaos-upgrade use freshly-generated
+secrets consistent with a freshly-initialized config, so gitea never needs to rewrite app.ini. The
+warm advance reattaches an OLDER retained config-volume state (seeded under 3.5.3) against the new
+run's secrets/3.6.0 binary → gitea reconciles by trying to persist a JWT secret → read-only crash.
+
+Classification: **genuine RECIPE defect** (gitea 3.6.0/1.24.2 + read-only app.ini docker-config mount
+on the warm-reattach advance), deterministic, reproduced first-hand. NOT a flake, NOT promote
+machinery. Fix approach (M2): recipe PR making app.ini writable on the advance path — e.g. render the
+config into the WRITABLE `config:/etc/gitea` volume via an entrypoint (not a read-only docker config),
+OR ensure the persisted secrets are accepted without rewrite. (Secondary harness option: canonical
+advance falls back to clean re-deploy when in-place config rewrite is impossible — but that loses the
+reattach data-warm property; recipe fix preferred.) Ties to LFS PR #1 (app.ini secret handling).
+
+ACTION NEEDED after run exits: warm-gitea is left crash-looping at 3.6.0 → restore it to 3.5.3
+(redeploy the known-good canonical version) so the canonical is healthy again. Data volume intact.
diff --git a/machine-docs/STATUS-redfix.md b/machine-docs/STATUS-redfix.md
index 21df7be..0384604 100644
--- a/machine-docs/STATUS-redfix.md
+++ b/machine-docs/STATUS-redfix.md
@@ -39,7 +39,7 @@ flake source per phase plan §2.1). Runs execute on cc-ci from `/etc/cc-ci`.
 | mattermost-lts | DONE @00:05Z (`/tmp/redfix-mattermost-lts.log`) | install/upgrade/backup/custom PASS; **restore FAIL** `ci_marker does not exist` — **deterministic in isolation** (not a load race) | recipe `postgres` svc backup labels: backs up hot live PGDATA + dump but has **NO `backupbot.restore.post-hook`** to replay the dump → restore doesn't round-trip postgres. Contrast immich (passes): dump-only `backup.volumes.postgres.path: backup.sql` + `restore.post-hook: /pg_backup.sh restore`. | **genuine RECIPE defect** at latest → recipe PR (adopt immich-style dump+restore-post-hook) |
 | mumble | DONE @00:18Z (`/tmp/redfix-mumble.log`) | **ALL tiers PASS** incl. handshake; no orphans. Canon red under load; canonical written green TODAY | handshake (TLS+ServerSync) not completing within ~60s retry under heavy concurrent sweep load; fine in isolation | **load/timing FLAKE** → harness stabilization (readiness gate / retry). (1 isolation green; will repeat 1-2× before M1 claim) |
 | bluesky-pds | DONE @00:45Z (`/tmp/redfix-bluesky-pds.log` + live diag) | cold lifecycle GREEN; **WC5 promote 000** reproduces (warm /xrpc/_health last status 0). NOT a flake | caddy on-demand TLS (`ask http://app:3000/tls-check`) can't reach app: caddy resolves bare `app` to OTHER stacks' app endpoints on shared `proxy` net (getent app→only 10.10.0.X, never internal 10.0.3.3; proxy has drone/traefik/keycloak/ccci `app` aliases) → no cert → 000. Promote machinery correct (refused to write canonical). | **genuine routing/RECIPE defect** (cross-stack `app`-alias collision on shared proxy) → recipe PR: unique PDS service name/alias. NOT promote-machinery, NOT flake |
-| gitea | running (isolation; warm 3.6.0 advance) | — | — | — |
+| gitea | DONE @00:14Z (`/tmp/redfix-gitea2.log` + live container logs) | cold lifecycle (incl fresh 3.5.3→3.6.0 upgrade) PASS; **warm advance crash-loops** | `LoadCommonSettings() [F] error saving JWT Secret … failed to save "/etc/gitea/app.ini": read-only file system` — gitea 3.6.0/1.24.2 tries to persist a JWT to the read-only app.ini docker-config mount on warm reattach (before DB migration; 3.5.3 data intact). Cold passes (fresh secrets, no rewrite). | **genuine RECIPE defect** (3.6.0 + read-only app.ini config mount on advance) → recipe PR: render app.ini into the writable config volume. (1st gitea run hit a nixenv "already deployed" leftover confound — fixed by undeploying to idle then re-running) |
 | keycloak | DONE @01:05Z (code-verified; no run) | de-enrolled. `canonical_domain("keycloak")` == `WARM_DOMAINS["keycloak"]` == `warm-keycloak.ci.commoninternet.net` EXACTLY (canonical.py:42, warm.py:27,44). Live keycloak 200 /realms/master. | data-warm canonical domain uses same `warm-` scheme as the live-warm OIDC provider → promote would collide with live shared SSO. No collision-free canonical namespace exists. | **HARNESS defect** (warm-domain namespace collision) → fix: collision-free `canonical_domain` for live-warm providers (`warm-canon-`), then enroll keycloak |
 
 Gate: M1 not yet claimed.