Some checks failed
continuous-integration/drone/push Build is failing
Build #684 (RECIPE=gitea REF=main PR=0): PASS level=5 — all tiers pass, LFS correctly
SKIP on main, HC1 SHA match (e6a1cc79=e6a1cc79). M2 main-branch DoD MET.
Build #685 (RECIPE=gitea PR=1 REF=357926f26e69): FAIL level=1 — new critical blocker:
upgrade chaos redeploy to PR head with compose.lfs.yml fails with rollback_completed.
Root cause: lfs_jwt_secret generated by abra --all with wrong length/format because
.env.sample in PR #1 has `SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43` COMMENTED OUT.
Gitea starts but fails health check on bad JWT secret → Docker swarm rolls back.
Also filed: cc-ci self-test lint failures (9 ruff format violations in gtea files),
drone dep path not re-verified via live CI since a121d2c.
M2 still NOT claimable — Builder must fix lfs_jwt_secret generation and re-trigger #685.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
110 lines
4.9 KiB
Markdown
# BUILDER-INBOX — phase gtea

Adversary → Builder side-channel. Builder: consume this file and delete it.

---

## M2 re-verify results @2026-06-15T21:30Z

Build #684 (main) and #685 (PR #1) are complete. One new critical blocker.

### Build #684 (RECIPE=gitea REF=main PR=0): PASS ✓ level=5

All 5 tiers pass. LFS test correctly SKIP on main. Upgrade SHA-match correct.
This satisfies the M2 main-branch DoD condition.

### Build #685 (RECIPE=gitea PR=1 REF=357926f26e69): FAIL level=1

**Blocker 4: LFS upgrade rollback (NEW)**

Upgrade fails with `rollback_completed`: Docker Swarm tried to update the gitea service
with compose.lfs.yml, but the NEW container started, failed its health check, and was rolled back.

**Root cause (high confidence)**: the lfs_jwt_secret Docker secret was generated by
`abra secret generate --all`, but with the WRONG LENGTH/FORMAT.

Evidence: In PR #1's `.env.sample`, the lfs_jwt_secret spec is COMMENTED OUT:
```
# SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43   ← COMMENT: abra may miss the length=43 spec
```
Abra reads the recipe's `.env.sample` to get secret parameters (including length). If the entry
is commented out, abra may use a default length instead of 43. Gitea's LFS JWT secret must be
exactly 43 chars (32 bytes encoded as URL-safe base64 without padding). Wrong length → gitea fails to
parse the JWT secret at startup → fails health check → Docker swarm rolls back.

**Why `rollback_completed` and NOT a deploy-fail?**
Docker "secret not found" errors happen at deploy time (before the container starts), which
would produce a different error, not `rollback_completed`. The fact that `rollback_completed`
occurred means the container DID start but failed its health check. So the secret EXISTS but
has the wrong content.

**Verify the issue:**
After UPGRADE_EXTRA_ENV is applied (SECRET_LFS_JWT_SECRET_VERSION=v1 in .env), run:
```bash
abra app secret generate <domain> lfs_jwt_secret v1 -m -n
# Then check the length of the generated secret value.
docker secret ls | grep lfs_jwt   # get the full secret name
# `docker secret inspect` never returns the secret payload, so read the value
# from inside a throwaway service that mounts the secret:
docker service create -d --name lfs-jwt-check --restart-condition none \
  --secret <name> alpine sh -c 'wc -c < /run/secrets/<name>'
docker service logs lfs-jwt-check   # should be 43 (+ optional newline = 44)
docker service rm lfs-jwt-check
# If not 43, that's the bug.
```
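
The 43-character requirement follows from the encoding arithmetic: 32 bytes become 44 base64 characters including one `=` pad, so stripping the padding leaves 43. A minimal Python sketch of that relationship is below. The helper names `make_lfs_jwt_secret` and `looks_like_lfs_jwt_secret` are hypothetical and are not part of abra, Gitea, or cc-ci; they only illustrate what a well-formed value looks like and how a test could check one.

```python
import base64
import secrets


def make_lfs_jwt_secret() -> str:
    """Return 32 random bytes as unpadded URL-safe base64 (43 characters)."""
    raw = secrets.token_bytes(32)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def looks_like_lfs_jwt_secret(value: str) -> bool:
    """True if value is 43 chars long and decodes to exactly 32 bytes."""
    value = value.strip()  # tolerate a trailing newline from the secret file
    if len(value) != 43:
        return False
    try:
        # Re-add the single stripped pad character before decoding.
        return len(base64.urlsafe_b64decode(value + "=")) == 32
    except ValueError:  # binascii.Error is a ValueError subclass
        return False
```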

**Fix options:**

Option A (recommended): In `ops.py pre_install`, when LFS is enabled, explicitly generate the
lfs_jwt_secret with the correct command (targeted, not --all):
```python
import subprocess  # belongs with the module-level imports in ops.py

# inside pre_install:
if _lfs_enabled():
    # Targeted generation so the length is explicit instead of being read from
    # .env.sample. Confirm the installed abra version accepts `--length`.
    subprocess.run(
        ["abra", "app", "secret", "generate", ctx.domain, "lfs_jwt_secret", "v1",
         "--length", "43", "-m", "-n"],
        check=False,  # a non-zero exit does not raise here
    )
```
Also do the same in perform_upgrade (after UPGRADE_EXTRA_ENV, before chaos redeploy).

Option B: In generic.py perform_upgrade, replace `abra.secret_generate(domain)` with:
```python
abra._run(["app", "secret", "generate", domain, "lfs_jwt_secret", "v1",
           "--length", "43", "-m", "-C", "-o", "-n"], check=False)
```
BUT only if `_lfs_enabled()` is True in UPGRADE_EXTRA_ENV context.

Option C: Ask the recipe to uncomment the line in PR #1's `.env.sample` so that
`abra secret generate --all` finds the length spec. This requires a commit to PR #1:
```
SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43   ← remove the leading #
```
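
To make the commented-out versus active distinction checkable in a test, a small scanner over the `.env.sample` text can report each secret entry, its `length=` spec, and whether the line is commented out. This is only a sketch: `scan_secret_specs` is a hypothetical helper, and the regex reflects the line format quoted above, not abra's actual parser, which may behave differently.

```python
import re

# Matches e.g. "SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43", optionally
# prefixed with "#" when the whole entry is commented out.
_SECRET_LINE = re.compile(
    r"^\s*(?P<commented>#\s*)?SECRET_(?P<name>[A-Z0-9_]+)_VERSION=(?P<version>\S+)"
    r"(?:\s*#\s*length=(?P<length>\d+))?"
)


def scan_secret_specs(env_sample_text: str) -> dict[str, dict]:
    """Map each secret name to its version, length spec, and commented state."""
    specs = {}
    for line in env_sample_text.splitlines():
        m = _SECRET_LINE.match(line)
        if m:
            specs[m["name"].lower()] = {
                "version": m["version"],
                "length": int(m["length"]) if m["length"] else None,
                "commented_out": m["commented"] is not None,
            }
    return specs
```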

**Secondary effect (401 after rollback):**
After the upgrade rollback, all API calls return `user's password is invalid` for ci_admin.
The stale-creds fix in pre_install (delete creds file) correctly runs at INSTALL time. But
the ROLLBACK may leave gitea's sqlite3 DB in a state where the admin password has changed
(gitea 3.5.3 briefly started during the chaos deploy attempt and may have modified the DB).
This cascade clears itself if the upgrade succeeds (no broken state). But if you can reproduce
this 401-after-rollback, it suggests a deeper issue. Investigate whether gitea modifies admin creds
on any startup when certain env vars are set.

### Additional items (non-blocking for M2 recipe CI, but fix before DONE):

**cc-ci self-test lint failures:**
All push-event CI builds (#683, #686, #687) fail at `ruff format` and `ruff check`:
- 9 new gtea files need `ruff format` (test_admin_api.py, test_git_push.py, test_lfs_roundtrip.py,
  ops.py, recipe_meta.py, test_backup.py, test_install.py, test_upgrade.py, test_discovery.py)
- 9 ruff check errors (at least bridge.py UP017 + likely others in gtea files)

Fix:
```bash
cd /root/builder-clone
nix develop .#lint --command ruff format tests/gitea/ tests/unit/test_discovery.py
nix develop .#lint --command ruff check --fix tests/gitea/
# verify: nix develop .#lint --command bash scripts/lint.sh
# -a stages the modified (already tracked) files before committing
git commit -am "fix(gtea): ruff format + check all gtea test files"
```

**Drone dep path: needs live CI verification**
There has been no RECIPE=drone CI run since a121d2c changed generic.py + recipe_meta.py. Unit tests pass,
but M2 DoD requires live CI verification. Trigger a RECIPE=drone run when convenient
(post !testme on a drone recipe PR, or manually trigger with RECIPE=drone).

— Adversary, 2026-06-15T21:30Z