review(gtea): M2 re-verify — #684 PASS, #685 FAIL (LFS upgrade rollback blocker)
Some checks failed
continuous-integration/drone/push Build is failing
Build #684 (RECIPE=gitea REF=main PR=0): PASS level=5 — all tiers pass, LFS correctly
SKIP on main, HC1 SHA match (e6a1cc79=e6a1cc79). M2 main-branch DoD MET.
Build #685 (RECIPE=gitea PR=1 REF=357926f26e69): FAIL level=1 — new critical blocker:
upgrade chaos redeploy to PR head with compose.lfs.yml fails with rollback_completed.
Suspected root cause: lfs_jwt_secret generated by abra --all with wrong length/format because
.env.sample in PR #1 has `SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43` COMMENTED OUT.
Gitea starts but fails health check on bad JWT secret → Docker Swarm rolls back.
Also filed: cc-ci self-test lint failures (9 ruff format violations in gtea files),
drone dep path not re-verified via live CI since a121d2c.
M2 still NOT claimable — Builder must fix lfs_jwt_secret generation and re-trigger #685.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -78,6 +78,71 @@ Not a code bug. Builder should post ONE !testme at a time to avoid concurrency c
The concurrent lock mechanism should prevent partial-state damage, but the stale cred cache
(`/tmp/ccci-gitea-admin-<domain>.json`) persists and causes 401s.

### [critical — M2 blocker] LFS upgrade rollback in build #685 @2026-06-15T21:10Z

Build #685 (RECIPE=gitea, PR=1, REF=357926f26e69): upgrade FAIL with rollback_completed.

Evidence: `abra.secret_generate --all` was called after UPGRADE_EXTRA_ENV applied
SECRET_LFS_JWT_SECRET_VERSION=v1, and lfs_jwt_secret was created as a Docker secret.
rollback_completed means the container started, so this was not a pre-deploy failure;
gitea then failed its health check.

**Root cause hypothesis**: lfs_jwt_secret generated with WRONG FORMAT/LENGTH because the
`.env.sample` in PR #1 (lfs-plain-gitea branch) has the entry COMMENTED OUT:
```
# SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43 ← COMMENTED = abra may miss the length=43 spec
```
vs active entries (uncommented): `SECRET_JWT_SECRET_VERSION=v1 # length=43`

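The difference between active and commented-out entries can be checked mechanically. The following is a minimal sketch, assuming the `SECRET_<NAME>_VERSION=<version> # length=<n>` line shape quoted above; the function name `scan_secret_specs` and the record layout are hypothetical and not part of abra or cc-ci.

```python
import re

# Matches lines such as:
#   SECRET_JWT_SECRET_VERSION=v1 # length=43
#   # SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43
SECRET_LINE = re.compile(
    r"^\s*(?P<commented>#\s*)?"
    r"SECRET_(?P<name>[A-Z0-9_]+)_VERSION=(?P<version>\S+)"
    r"(?:\s*#\s*length=(?P<length>\d+))?"
)


def scan_secret_specs(env_text: str) -> list[dict]:
    """Return one record per SECRET_*_VERSION line, active or commented out."""
    specs = []
    for line in env_text.splitlines():
        match = SECRET_LINE.match(line)
        if match is None:
            continue
        length = match.group("length")
        specs.append(
            {
                "name": match.group("name").lower(),
                "version": match.group("version"),
                "length": int(length) if length else None,
                # A leading '#' means the entry is commented out in the sample.
                "active": match.group("commented") is None,
            }
        )
    return specs


sample = """\
SECRET_JWT_SECRET_VERSION=v1 # length=43
# SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43
"""

for spec in scan_secret_specs(sample):
    state = "active" if spec["active"] else "COMMENTED OUT"
    print(f'{spec["name"]}: length={spec["length"]} ({state})')
```

Run against the PR's `.env.sample`, a check like this would flag any secret whose length spec exists only in a commented-out line.
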
gitea's LFS JWT secret must be exactly 43 chars (URL-safe base64 of 32 bytes, without padding). If abra uses
a different default length, gitea fails to parse the JWT secret and crashes on startup → rollback.

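The 43-character figure follows from the encoding: 32 bytes encode to 44 base64 characters including one `=` pad, so the unpadded form is 43. A minimal sketch of generating and checking such a value, assuming the format stated above (unpadded URL-safe base64 of 32 bytes); both helper names are hypothetical:

```python
import base64
import re
import secrets

# Exactly 43 characters from the URL-safe base64 alphabet, no padding.
_URLSAFE_43 = re.compile(r"[A-Za-z0-9_-]{43}")


def generate_lfs_jwt_secret() -> str:
    """32 random bytes, URL-safe base64, padding stripped: always 43 characters."""
    encoded = base64.urlsafe_b64encode(secrets.token_bytes(32))
    return encoded.rstrip(b"=").decode("ascii")


def is_valid_lfs_jwt_secret(value: str) -> bool:
    """True if value is 43 URL-safe base64 characters that decode to 32 bytes."""
    if _URLSAFE_43.fullmatch(value) is None:
        return False
    # Restore the single '=' that was stripped so the decoder accepts the input.
    return len(base64.urlsafe_b64decode(value + "=")) == 32


secret = generate_lfs_jwt_secret()
print(len(secret))                           # 43
print(is_valid_lfs_jwt_secret(secret))       # True
print(is_valid_lfs_jwt_secret("too-short"))  # False
```

A value containing `+` or `/` (standard base64) is rejected by this check, which matters when comparing against secrets produced by other tools.
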
**Fix options** (Builder to choose):
A. In `ops.py pre_install` (when `_lfs_enabled()`): explicitly generate lfs_jwt_secret with
   correct length: `abra._run(["app", "secret", "generate", domain, "lfs_jwt_secret", "v1", ...])`.
   Do NOT rely on `--all` for this secret because the spec is commented out.
B. In `generic.py` `perform_upgrade` after UPGRADE_EXTRA_ENV: targeted secret generate (not --all).
C. Ask the recipe maintainer to uncomment the `SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43`
   line in PR #1's `.env.sample` (and add a note that it's optional but needed for LFS installs).

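Options A and B both come down to one targeted `abra app secret generate` call instead of `--all`. A rough sketch of that call, using plain `subprocess` as a stand-in for the harness's own `abra._run` wrapper (whose real signature is not shown in this review); the function names are hypothetical, and any arguments elided by `...` above are left out. Note that a targeted call alone does not guarantee a 43-char value: whether abra applies a length spec for this secret still depends on the app's env.

```python
import subprocess


def secret_generate_cmd(domain: str, name: str, version: str) -> list[str]:
    """Build the targeted `abra app secret generate` invocation for one secret."""
    return ["abra", "app", "secret", "generate", domain, name, version]


def generate_single_secret(domain, name, version="v1", runner=subprocess.run):
    """Generate exactly one secret instead of relying on --all.

    `runner` is injectable so the harness (or a test) can substitute its own
    wrapper; by default the command is executed with subprocess.run.
    """
    cmd = secret_generate_cmd(domain, name, version)
    return runner(cmd, check=True, capture_output=True, text=True)


# Show the command that would be executed (nothing is run here).
print(secret_generate_cmd("gitea.example.com", "lfs_jwt_secret", "v1"))
```

Keeping the command builder separate from the runner makes the invocation unit-testable without a live abra install.
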
Debug steps before fixing:
1. After UPGRADE_EXTRA_ENV sets SECRET_LFS_JWT_SECRET_VERSION=v1, run
   `abra app secret generate <domain> lfs_jwt_secret v1` and check the length of the generated
   secret. `docker secret inspect` does not return the secret payload, so measure it from inside
   a container that mounts the secret, e.g. `wc -c < /run/secrets/lfs_jwt_secret` (assuming the
   default mount path and target name).
2. Alternatively: check gitea container logs during the chaos deploy to see the startup error.
3. A correct 43-char URL-safe base64 secret can be produced with
   `openssl rand -base64 32 | tr '+/' '-_' | tr -d '=\n'` (43 chars).

Cascade effects (all from upgrade rollback):
- pre_backup FAIL (401 on API call — stale creds after upgrade chaos)
- pre_restore FAIL (ci-marker not in backed-up snapshot since backup was bad)
- test_restore FAIL (marker not returned — restore didn't revert non-existent change)
- custom tests: test_admin_api/test_git_push/test_lfs_roundtrip all 401 (stale creds)

Secondary mystery: WHY is the ci_admin password invalid (401) after the upgrade rollback? The password
in the sqlite3 DB should be unchanged. Possible: gitea 3.5.3 briefly started during the chaos deploy
and modified the DB before failing its health check. Builder should investigate whether this is a separate
bug or purely a cascade from the upgrade failure.

### [minor — fix before M2 complete] cc-ci self-test lint failures @2026-06-15T21:10Z

Push-event CI builds #683/#686/#687 fail at `scripts/lint.sh` (cc-ci repo's own self-test):
- `ruff format --check` wants to reformat 9 files (all new gtea files + test_discovery.py)
- `ruff check` has 9 errors (bridge.py UP017 + likely others in gtea files)

This does NOT block M2 recipe CI runs (which use custom events). But:
1. The cc-ci repo's self-test should be green (it's the CI server's own code quality check).
2. `ruff format` violations in the new gtea files are Builder code quality debt.

Fix: `cd /root/builder-clone && nix develop .#lint --command ruff format tests/gitea/ tests/unit/test_discovery.py && nix develop .#lint --command ruff check --fix tests/gitea/`
Then commit and push to clear the self-test lint failures.

### [pending — verify before M2 DONE] Drone dep path: no live CI since a121d2c

M2 DoD: "drone CI re-confirmed green (dep path intact)". No RECIPE=drone CI run has run
since a121d2c modified `runner/harness/generic.py` and `tests/gitea/recipe_meta.py`.
Unit tests (test_gitea_dep.py 10/10) still pass.
Builder should trigger a RECIPE=drone run (e.g., post !testme on a drone recipe PR)
to complete the M2 DoD dep-path verification.

### [non-blocking] Stale screenshot in manual runs @2026-06-15T20:32Z

`/var/lib/cc-ci-runs/manual/screenshot.png` mtime = June 13, not from today's M1 run.

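A staleness check like this one can be automated by comparing the artifact's mtime with the run's start time. A minimal sketch, assuming the harness can supply a timezone-aware start timestamp; the helper `is_stale` is hypothetical and not existing cc-ci code:

```python
import os
from datetime import datetime, timezone


def is_stale(path: str, run_started: datetime) -> bool:
    """True if the file was last modified before the run began.

    `run_started` must be timezone-aware so the comparison is unambiguous.
    """
    mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return mtime < run_started
```

For the case above, `is_stale("/var/lib/cc-ci-runs/manual/screenshot.png", run_started)` would return `True` for a June 13 file when `run_started` is the start of the June 15 run.
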