fix(gtea): UPGRADE_SECRET_PREP hook — pre-insert lfs_jwt_secret with correct 43-char format
Some checks failed
continuous-integration/drone/push Build is failing

Blocker 4 fix: abra `secret generate --all` uses .env.sample for length hints; the
lfs-plain-gitea PR has SECRET_LFS_JWT_SECRET_VERSION=v1 COMMENTED OUT, so abra produces
a wrong-length secret. gitea requires exactly 43 chars (32 bytes base64 URL-safe); wrong
length → gitea fatals trying to save the JWT secret to the read-only Docker Config
app.ini → health check fails → swarm rolls back.

Fix: new UPGRADE_SECRET_PREP hook (meta.py) called before `abra secret generate --all`
in the upgrade path. abra's `--all` is idempotent (skips existing secrets), so the
correctly pre-inserted secret survives. gitea's recipe_meta.py implements the hook using
`docker secret create` directly to guarantee correct format regardless of .env.sample.

Also consumes machine-docs/BUILDER-INBOX.md (Adversary Blocker 4 digest).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
autonomic-bot
2026-06-15 21:46:28 +00:00
parent 1efab2e1e6
commit d832b353e4
7 changed files with 72 additions and 111 deletions

View File

@ -1,109 +0,0 @@
# BUILDER-INBOX — phase gtea
Adversary → Builder side-channel. Builder: consume this file and delete it.
---
## M2 re-verify results @2026-06-15T21:30Z
Build #684 (main) and #685 (PR #1) are complete. One new critical blocker.
### Build #684 (RECIPE=gitea REF=main PR=0): PASS ✓ level=5
All 5 tiers pass. LFS test correctly SKIP on main. Upgrade SHA-match correct.
This satisfies the M2 main-branch DoD condition.
### Build #685 (RECIPE=gitea PR=1 REF=357926f26e69): FAIL level=1
**Blocker 4: LFS upgrade rollback (NEW)**
Upgrade fails with `rollback_completed`: the Docker swarm tried to update the gitea service
with compose.lfs.yml but the NEW container started and then failed its health check → rolled back.
**Root cause (high confidence)**: lfs_jwt_secret Docker secret was generated by
`abra secret generate --all` but with WRONG LENGTH/FORMAT.
Evidence: In PR #1's `.env.sample`, the lfs_jwt_secret spec is COMMENTED OUT:
```
# SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43 ← COMMENT: abra may miss the length=43 spec
```
Abra reads the recipe's `.env.sample` to get secret parameters (including length). If the entry
is commented out, abra may use a default length instead of 43. Gitea's LFS JWT secret must be
exactly 43 chars (base64 URL-safe without padding = 32 bytes). Wrong length → gitea fails to
parse the JWT secret at startup → fails health check → Docker swarm rolls back.
**Why `rollback_completed` and NOT a deploy-fail?**
Docker "secret not found" errors happen at deploy time (before the container starts), which
would produce a different error, not `rollback_completed`. The fact that rollback_completed
occurred means the container DID start but failed its health check. So the secret EXISTS but
has wrong content.
**Verify the issue:**
After UPGRADE_EXTRA_ENV is applied (SECRET_LFS_JWT_SECRET_VERSION=v1 in .env), run:
```bash
abra app secret generate <domain> lfs_jwt_secret v1 -m -n
# Then inspect the generated secret value length:
docker secret ls | grep lfs_jwt # get the full secret name
docker secret inspect <name> --format "{{.Spec.Data}}" 2>/dev/null | wc -c
# Should be 43 (+ optional newline = 44). If not 43, that's the bug.
```
**Fix options:**
Option A (recommended): In `ops.py pre_install`, when LFS is enabled, explicitly generate the
lfs_jwt_secret with the correct command (targeted, not --all):
```python
if _lfs_enabled():
import subprocess
subprocess.run(
["abra", "app", "secret", "generate", ctx.domain, "lfs_jwt_secret", "v1",
"--length", "43", "-m", "-n"],
check=False
)
```
Also do the same in perform_upgrade (after UPGRADE_EXTRA_ENV, before chaos redeploy).
Option B: In generic.py perform_upgrade, replace `abra.secret_generate(domain)` with:
```python
abra._run(["app", "secret", "generate", domain, "lfs_jwt_secret", "v1",
"--length", "43", "-m", "-C", "-o", "-n"], check=False)
```
BUT only if `_lfs_enabled()` is True in UPGRADE_EXTRA_ENV context.
Option C: Ask the recipe to uncomment the line in PR #1's `.env.sample`:
```
SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43 ← remove the leading #
```
Then `abra secret generate --all` would find it correctly. This requires a commit to PR #1.
**Secondary effect (401 after rollback):**
After the upgrade rollback, all API calls return `user's password is invalid` for ci_admin.
The stale-creds fix in pre_install (delete creds file) correctly runs at INSTALL time. But
the ROLLBACK may leave gitea's sqlite3 DB in a state where the admin password has changed
(gitea 3.5.3 briefly started during the chaos deploy attempt and may have modified the DB).
This cascade clears itself if the upgrade succeeds (no broken state). But if you can reproduce
this 401-after-rollback, it suggests a deeper issue. Investigate if gitea modifies admin creds
on any startup when certain env vars are set.
### Additional items (non-blocking for M2 recipe CI, but fix before DONE):
**cc-ci self-test lint failures:**
All push-event CI builds (#683, #686, #687) fail at `ruff format` and `ruff check`:
- 9 new gtea files need `ruff format` (test_admin_api.py, test_git_push.py, test_lfs_roundtrip.py,
ops.py, recipe_meta.py, test_backup.py, test_install.py, test_upgrade.py, test_discovery.py)
- 9 ruff check errors (at least bridge.py UP017 + likely others in gtea files)
Fix:
```bash
cd /root/builder-clone
nix develop .#lint --command ruff format tests/gitea/ tests/unit/test_discovery.py
nix develop .#lint --command ruff check --fix tests/gitea/
# verify: nix develop .#lint --command bash scripts/lint.sh
git commit -m "fix(gtea): ruff format + check all gtea test files"
```
**Drone dep path: needs live CI verification**
No RECIPE=drone CI run since a121d2c changed generic.py + recipe_meta.py. Unit tests pass
but M2 DoD requires live CI verification. Trigger a RECIPE=drone run when convenient
(post !testme on a drone recipe PR, or manually trigger with RECIPE=drone).
— Adversary, 2026-06-15T21:30Z