review(gtea): M2 re-verify — #684 PASS, #685 FAIL (LFS upgrade rollback blocker)
Some checks failed
continuous-integration/drone/push Build is failing
Some checks failed
continuous-integration/drone/push Build is failing
Build #684 (RECIPE=gitea REF=main PR=0): PASS level=5 — all tiers pass, LFS correctly
SKIP on main, HC1 SHA match (e6a1cc79=e6a1cc79). M2 main-branch DoD MET.
Build #685 (RECIPE=gitea PR=1 REF=357926f26e69): FAIL level=1 — new critical blocker:
upgrade chaos redeploy to PR head with compose.lfs.yml fails with rollback_completed.
Root cause: lfs_jwt_secret generated by abra --all with wrong length/format because
.env.sample in PR #1 has `SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43` COMMENTED OUT.
Gitea starts but fails health check on bad JWT secret → Docker swarm rolls back.
Also filed: cc-ci self-test lint failures (9 ruff format violations in gtea files),
drone dep path not re-verified via live CI since a121d2c.
M2 still NOT claimable — Builder must fix lfs_jwt_secret generation and re-trigger #685.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@ -173,3 +173,126 @@ Filed as critical M2 blockers in BACKLOG-gtea.md. Builder must fix before M2 can
|
||||
2. Upgrade fails on main branch run 674 (level=1, not level=5)
|
||||
|
||||
Gate M2: **NOT CLAIMED** — Builder must fix and re-trigger CI
|
||||
|
||||
---
|
||||
|
||||
## M2 re-verification @2026-06-15T21:30Z (builds #684 and #685)
|
||||
|
||||
Builder fixed two blockers (commit a121d2c): UPGRADE_EXTRA_ENV for LFS, head_ref SHA fix,
|
||||
stale creds deletion in pre_install. Triggered builds #684 (main) and #685 (PR #1).
|
||||
|
||||
### Build #684 — RECIPE=gitea REF=main PR=0 — **PASS** level=5 ✓
|
||||
|
||||
Full log reviewed from Drone API.
|
||||
|
||||
- lint: pass ✓
|
||||
- install: PASS — generic test_serving + gitea test_install_gitea both PASS ✓
|
||||
- upgrade: PASS — version=3.5.2→3.5.3, HC1: head_ref=e6a1cc79, chaos-version=e6a1cc79 (SHA match) ✓
|
||||
- backup: PASS — restic snapshot 8435c4df, 53 files, marker captured ✓
|
||||
- restore: PASS — pre_restore deleted ci-marker, restore returned it (genuine divergence) ✓
|
||||
- custom: all 4 tests:
|
||||
- test_admin_api: PASS (user+org+token CRUD lifecycle) ✓
|
||||
- test_git_push: PASS (create repo→push→verify via API) ✓
|
||||
- test_health: PASS (root HTTP 200) ✓
|
||||
- test_lfs_roundtrip: SKIP ✓ — correct ("compose.lfs.yml absent in gitea recipe checkout —
|
||||
LFS is not enabled on this branch. This test runs on lfs-plain-gitea (PR #1) and is
|
||||
EXPECTED_NA on main.")
|
||||
- deploy-count=1 (expected 1) ✓
|
||||
- clean_teardown=true, no_secret_leak=true ✓
|
||||
|
||||
**M2 main-branch condition: MET** (build #684, level=5, upgrade SHA-match correct, LFS skip correct)
|
||||
|
||||
Screenshot: PNG file, 36KB, captured at 21:04 (during run #684). Visual content not verified
|
||||
inline (requires file transfer); file is valid PNG with real content. Operator should visually
|
||||
confirm sign-in page is shown.
|
||||
|
||||
### Build #685 — RECIPE=gitea PR=1 REF=357926f26e69 — **FAIL** level=1 ✗
|
||||
|
||||
Full log reviewed from Drone API and results.json.
|
||||
|
||||
- lint: pass ✓
|
||||
- install: PASS (base 3.5.2, no LFS) ✓
|
||||
- upgrade: **FAIL** — `gite-e1cb78.ci.commoninternet.net: upgrade redeploy did NOT converge to
|
||||
the head spec — swarm UpdateStatus='rollback_completed'.`
|
||||
- backup: FAIL (cascade — pre_backup 401: could not ensure ci-marker exists)
|
||||
- restore: FAIL (cascade — ci-marker absent after restore; backup state was bad)
|
||||
- custom: FAIL — test_admin_api, test_git_push, test_lfs_roundtrip all get `401 Unauthorized:
|
||||
user's password is invalid [uid: 1, name: ci_admin]`; test_health: PASS ✓
|
||||
- test_lfs_roundtrip: reaches API call (compose.lfs.yml IS in recipe dir at test time,
|
||||
_lfs_available()=True, LFS test DID run) but hits 401 on repo create — cascade failure
|
||||
|
||||
**Root cause: upgrade chaos redeploy to PR head with compose.lfs.yml fails (rollback_completed)**
|
||||
|
||||
Evidence chain:
|
||||
1. `rollback_completed` in Docker Swarm means the NEW task STARTED but failed its health check.
|
||||
If lfs_jwt_secret did NOT exist as Docker secret, the deploy would fail BEFORE creating the
|
||||
task (Docker reports "secret not found" at deploy time, not as a task health failure). Therefore
|
||||
lfs_jwt_secret WAS generated as a Docker secret.
|
||||
2. `abra.secret_generate(domain)` WAS called (generic.py line 267, new fix in a121d2c) with
|
||||
SECRET_LFS_JWT_SECRET_VERSION=v1 in the .env after UPGRADE_EXTRA_ENV applied.
|
||||
3. The COMPOSE_FILE=compose.yml:compose.sqlite3.yml:compose.lfs.yml was correctly set in .env
|
||||
(confirmed from log: `upgrade-env: COMPOSE_FILE=...`).
|
||||
4. Docker confirmed no lfs secrets at post-run check — expected (clean_teardown=true cleaned them).
|
||||
|
||||
**Most likely root cause: lfs_jwt_secret generated with wrong length/format by abra --all**
|
||||
|
||||
The `.env.sample` in PR #1 (lfs-plain-gitea branch) has the lfs_jwt_secret spec COMMENTED OUT:
|
||||
```
|
||||
# SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43
|
||||
```
|
||||
Compare with active (uncommented) entries:
|
||||
```
|
||||
SECRET_JWT_SECRET_VERSION=v1 # length=43
|
||||
SECRET_INTERNAL_TOKEN_VERSION=v1 # length=105
|
||||
```
|
||||
`abra secret generate --all` reads the recipe's `.env.sample` for secret parameters (including
|
||||
length). If the `SECRET_LFS_JWT_SECRET_VERSION` entry is commented out, abra may use a default
|
||||
length (likely not 43) when generating the Docker secret value. A gitea LFS JWT secret must be
|
||||
a base64 URL-safe string of exactly 43 chars (representing 32 bytes without padding). If abra
|
||||
generates a wrong-length value, gitea fails to parse its JWT secret on startup and crashes before
|
||||
passing the `/api/healthz` health check — causing `rollback_completed`.
|
||||
|
||||
**Secondary mystery: admin password 401 after upgrade rollback**
|
||||
After rollback, gitea 3.5.2 runs again. ci_admin password was written to creds file during
|
||||
pre_install (fresh install, stale file deleted). Yet all API calls return 401 `user's password
|
||||
is invalid`. This cascade is unexplained but consistent with gitea being in a bad state after
|
||||
the rollback (possible: the brief chaos deploy attempt changed state in the sqlite3 DB before
|
||||
the health check failed and Docker rolled back the CONTAINER — not the DATA volume).
|
||||
|
||||
**Files confirmed NOT the issue:**
|
||||
- compose.lfs.yml structure: correct (external secret declared, GITEA_LFS_START_SERVER env set) ✓
|
||||
- app.ini.tmpl: LFS_JWT_SECRET rendered from `{{ secret "lfs_jwt_secret" }}` when
|
||||
GITEA_LFS_START_SERVER=true ✓
|
||||
- UPGRADE_EXTRA_ENV applied correctly (confirmed in log) ✓
|
||||
- HC1 would pass if upgrade converged (SHA logic correct from #684 fix) ✓
|
||||
|
||||
### Additional finding: cc-ci self-test lint failures (non-blocking for M2 recipe CI)
|
||||
|
||||
Push-event builds #683/#686/#687 fail at `scripts/lint.sh`:
|
||||
- `ruff format --check`: 9 files need formatting:
|
||||
`tests/gitea/custom/test_admin_api.py`, `test_git_push.py`, `test_lfs_roundtrip.py`,
|
||||
`tests/gitea/ops.py`, `recipe_meta.py`, `test_backup.py`, `test_install.py`, `test_upgrade.py`,
|
||||
`tests/unit/test_discovery.py`
|
||||
- `ruff check`: 9 errors (at least `bridge/bridge.py:85:36: UP017` + others in gtea files)
|
||||
|
||||
These are the cc-ci REPO'S OWN self-tests, not the recipe CI runs. They do NOT gate M2 recipe
|
||||
CI (which runs via custom events). However, they reflect code quality debt and should be fixed.
|
||||
`ruff format tests/gitea/` and `ruff check --fix tests/gitea/` would address the gtea files.
|
||||
The `bridge.py UP017` may be pre-existing.
|
||||
|
||||
Filed in BACKLOG-gtea.md Adversary findings.
|
||||
|
||||
### Drone dep path: not re-verified via live CI since a121d2c
|
||||
|
||||
M2 DoD: "drone CI re-confirmed green (dep path intact)". No RECIPE=drone custom build has run
|
||||
since commit a121d2c modified generic.py and recipe_meta.py. Unit tests (test_gitea_dep.py 10/10)
|
||||
still pass and cover the dep path code-level. A live RECIPE=drone run is needed to satisfy the
|
||||
full M2 DoD dep-path verification. Filed in BACKLOG as pending.
|
||||
|
||||
## M2 VERDICT: PENDING — new critical blocker in build #685
|
||||
|
||||
1. ✓ M2 main-branch condition MET (build #684, level=5)
|
||||
2. ✗ PR #1 LFS capstone FAIL — upgrade rollback with LFS (build #685, level=1)
|
||||
Root cause: lfs_jwt_secret generated with wrong format/length (commented-out .env.sample spec)
|
||||
|
||||
Gate M2: **NOT CLAIMED** — Builder must fix lfs_jwt_secret generation and re-trigger build #685
|
||||
|
||||
Reference in New Issue
Block a user