Build #695 (RECIPE=gitea PR=1 REF=357926f26e69): level=5/5, test_lfs_roundtrip PASS (18s). Build #692 (RECIPE=drone REF=main): level=5/5, dep path confirmed. All 6 M2 DoD conditions met per Adversary REVIEW-gtea.md @2026-06-15T22:10Z. Phase gtea complete. Gitea enrolled as a fully-tested recipe with LFS PR verified. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
178 lines
11 KiB
Markdown
# BACKLOG — phase gtea (gitea full-test enrollment)

## Build backlog
(Builder-owned — read-only to Adversary)

- [x] 0. Prerequisites verified (timezone, recipe, backup labels)
- [x] 1. Write all gitea test files (recipe_meta.py + ops.py + lifecycle overlays + custom + PARITY.md)
- [x] 2. Run harness locally against cc-ci (install + upgrade + backup + restore + custom) on gitea main
      Run 846690: level=5/5 (all PASS). Fixes: _csrf→user_name selector; cred_url git push;
      auto_init repo; token scopes for gitea 1.22+; NixOS git-lfs deploy.
- [x] 3. Confirm drone CI stays green (dep path unaffected by recipe_meta.py changes)
      Unit tests pass (10/10 gitea dep + 43/43 meta). Drone dep path byte-for-byte unchanged.
- [x] 4. Verify LFS test correctly skips on main (compose.lfs.yml absent)
      SKIPPED with expected message in run 846690. PASS.
- [x] 5. CLAIM M1 — ADVERSARY PASS @2026-06-15T20:32Z (commit a106036)
- [~] 6. Run full harness via real CI / !testme on gitea recipe
      Builds #674/#675 FAILED (blocker: head_ref="main" fails HC1; stale creds).
      FIXED in commit a121d2c. Retriggered as build #681 (RECIPE=gitea REF=main PR=0) @21:00Z
- [~] 7. Run harness on lfs-plain-gitea head → LFS test must go green
      Build #676 FAILED (blocker: LFS not enabled in upgrade chaos redeploy).
      FIXED in commit a121d2c. Retriggered as build #682 (PR=1 REF=357926f2) @21:00Z
- [x] 8. Post !testme on PR #1 so result lands in PR
      DONE (posted 20:34Z, build #676, PENDING; re-triggered as #682)
- [x] 9. CLAIM M2 — ADVERSARY PASS @2026-06-15T22:10Z (commit 90522ee)
      Build #695 (PR=1 LFS): level=5, test_lfs_roundtrip PASS. Build #692 (drone): level=5.
- [x] 10. Write ## DONE — STATUS-gtea.md updated; phase complete.

## Adversary findings
(Adversary-owned — only the Adversary writes this section)

### [critical — M2 blocker] LFS test fails in run 676 @2026-06-15T20:36Z

Drone build 676 (RECIPE=gitea, PR=1, REF=357926f2): all lifecycle stages PASS but
custom FAIL — `test_lfs_roundtrip` fails at `git push` with:
```
batch response: Repository or object not found:
https://ci_admin:<passwd>@gite-e1cb78.ci.commoninternet.net/ci_admin/ci-lfs-test.git/info/lfs/objects/batch
```
Level=3 (install+upgrade+backup_restore pass, functional FAIL).

Diagnosis: gitea ran WITHOUT LFS enabled at server level (`LFS_START_SERVER = false` in app.ini).
`_lfs_available()` returned True (compose.lfs.yml was in the per-run ABRA_DIR at test time —
recipe reflog confirms checkout to 357926f2 at 20:35:58, 38s before the test at 20:36:36).

Root cause under investigation: EXTRA_ENV sets COMPOSE_FILE to include compose.lfs.yml when
`_lfs_enabled()` is True. But the upgrade tier's abra base-deploy internally checks out the
`3.5.2+1.24.2-rootless` tag in the recipe dir (reflog: 20:35:37), removing compose.lfs.yml, then
the harness checks out 357926f2 again at 20:35:58. Depending on WHEN the install deploy runs
relative to these checkouts, COMPOSE_FILE and/or SECRET_LFS_JWT_SECRET_VERSION may not have been
correctly resolved.

Most likely cause: compose.lfs.yml was NOT included in the actual `docker stack deploy` command
(either because EXTRA_ENV was evaluated before compose.lfs.yml existed, or because the lfs_jwt_secret
Docker secret was not generated, since SECRET_LFS_JWT_SECRET_VERSION=v1 only exists in the EXTRA_ENV
dict, not in the .env FILE that `abra secret generate` reads).

Builder must: reproduce locally with RECIPE=gitea, PR=1, REF=357926f2; verify compose.lfs.yml is
in COMPOSE_FILE at deploy time; verify lfs_jwt_secret Docker secret is generated; verify
LFS_START_SERVER=true and LFS_JWT_SECRET=<value> appear in /etc/gitea/app.ini inside the container.
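
The last verification step above could be automated along these lines. This is a minimal sketch, not harness code: `read_ini_keys` and `lfs_problems` are hypothetical names, and fetching the text of `/etc/gitea/app.ini` from the container is left to the caller. The parser ignores section headers on purpose, so it makes no assumption about which section the keys live in.

```python
def read_ini_keys(text: str) -> dict:
    """Collect KEY = VALUE pairs from app.ini text, ignoring section headers."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and [section] headers.
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def lfs_problems(app_ini_text: str) -> list:
    """Return human-readable problems; an empty list means both checks passed."""
    values = read_ini_keys(app_ini_text)
    problems = []
    if values.get("LFS_START_SERVER", "").lower() != "true":
        problems.append("LFS_START_SERVER is not true")
    if not values.get("LFS_JWT_SECRET"):
        problems.append("LFS_JWT_SECRET is missing or empty")
    return problems
```

An empty result from `lfs_problems(text)` would indicate that LFS is enabled at server level in the rendered config; it says nothing about whether the Docker secret itself was generated.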

### [critical — M2 blocker] Upgrade fails on main-branch CI run (run 674) @2026-06-15T20:36Z

Drone build 674 (RECIPE=gitea, PR=0, REF=main): upgrade FAIL with:
"upgrade deployed chaos commit 'e6a1cc79', not the intended PR-head 'main' — the re-checkout
to the code under test failed, so the upgrade is not exercised."
Level=1 (install pass only).

This is the M2 main-branch CI run that must be level=5. With upgrade failing, M2 cannot pass.
Builder must investigate why REF=main doesn't work correctly for the upgrade tier.
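
One plausible reading of the error message is that a deployed commit id was compared against the literal ref name `main`. The sketch below shows a comparison that resolves branch names first. It is an illustration under that assumption, not the fix applied in the harness: `deployed_matches_head` and the `resolve` callback are hypothetical, and in practice `resolve` might wrap `git rev-parse <ref>` in the recipe directory.

```python
from typing import Callable

HEX_DIGITS = set("0123456789abcdef")


def is_commit_sha(ref: str) -> bool:
    """True if ref looks like a full or abbreviated hex commit id."""
    return 7 <= len(ref) <= 40 and set(ref.lower()) <= HEX_DIGITS


def deployed_matches_head(deployed: str, head_ref: str,
                          resolve: Callable[[str], str]) -> bool:
    """Compare the deployed commit with the intended head.

    Branch or tag names are resolved to a commit id first; abbreviated
    ids match when one is a prefix of the other.
    """
    head_sha = head_ref if is_commit_sha(head_ref) else resolve(head_ref)
    a, b = deployed.lower(), head_sha.lower()
    if not a or not b:
        return False
    return a.startswith(b) or b.startswith(a)
```

A branch whose name happens to be all hex digits would be misclassified by `is_commit_sha`; a real implementation would let git decide.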

### [non-blocking — concurrency] Run 675 install failure @2026-06-15T20:36Z

4 !testme comments were posted concurrently → 4 Drone builds triggered simultaneously (674, 675,
676, +). Builds 674 and 675 both have PR=0/REF=main → same app domain → lock contention.
Run 675 started while 674 had the lock → found stale state → ci_admin creds cached but user
gone (409 create path) → 401 on API calls → level=0.

Not a code bug. Builder should post ONE !testme at a time to avoid concurrency collisions.
The concurrent lock mechanism should prevent partial-state damage, but the stale cred cache
(`/tmp/ccci-gitea-admin-<domain>.json`) persists and causes 401s.
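
A possible mitigation for the stale cache is to drop the cached credentials whenever the API answers 401, so the next attempt recreates the admin user. The sketch below only covers the cache handling. The file name pattern is taken from the path above; the function names, the `base_dir` parameter, and the drop-on-401 policy are assumptions for illustration.

```python
import json
from pathlib import Path
from typing import Optional


def cred_cache_path(domain: str, base_dir: str = "/tmp") -> Path:
    """Location of the per-domain admin credential cache."""
    return Path(base_dir) / f"ccci-gitea-admin-{domain}.json"


def load_cached_creds(domain: str, base_dir: str = "/tmp") -> Optional[dict]:
    """Return cached credentials, or None if absent or unreadable."""
    try:
        return json.loads(cred_cache_path(domain, base_dir).read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def invalidate_cached_creds(domain: str, base_dir: str = "/tmp") -> bool:
    """Delete the cache file; intended to be called after a 401 response."""
    try:
        cred_cache_path(domain, base_dir).unlink()
        return True
    except FileNotFoundError:
        return False
```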

### [critical — M2 blocker] LFS upgrade rollback in build #685 @2026-06-15T21:10Z

Build #685 (RECIPE=gitea, PR=1, REF=357926f26e69): upgrade FAIL with rollback_completed.

Evidence: `abra.secret_generate --all` was called (after UPGRADE_EXTRA_ENV applied
SECRET_LFS_JWT_SECRET_VERSION=v1). lfs_jwt_secret was created as a Docker secret (rollback_completed
means container started, not pre-deploy failure). But gitea failed its health check.

**Root cause hypothesis**: lfs_jwt_secret generated with WRONG FORMAT/LENGTH because the
`.env.sample` in PR #1 (lfs-plain-gitea branch) has the entry COMMENTED OUT:
```
# SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43 ← COMMENTED = abra may miss the length=43 spec
```
vs active entries (uncommented): `SECRET_JWT_SECRET_VERSION=v1 # length=43`

gitea's LFS JWT secret must be exactly 43 chars (base64 URL-safe, 32 bytes). If abra uses
a different default length, gitea fails to parse the JWT secret and crashes on startup → rollback.
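
To check the hypothesis against a given `.env.sample`, the secret entries can be listed together with whether each one is active or commented out. This is a rough sketch with a hypothetical `parse_secret_specs` helper. It only mirrors the line format quoted above and does not claim to reproduce how abra itself parses the file.

```python
import re

# Matches e.g. "SECRET_JWT_SECRET_VERSION=v1 # length=43", optionally commented out.
SECRET_LINE = re.compile(
    r"^\s*(?P<commented>#\s*)?SECRET_(?P<name>[A-Z0-9_]+)_VERSION=(?P<version>\S+)"
    r"(?:\s*#\s*length=(?P<length>\d+))?"
)


def parse_secret_specs(env_sample_text: str) -> dict:
    """Map secret name -> {"active", "version", "length"} for each SECRET_* line."""
    specs = {}
    for line in env_sample_text.splitlines():
        m = SECRET_LINE.match(line)
        if m:
            specs[m["name"].lower()] = {
                "active": m["commented"] is None,
                "version": m["version"],
                "length": int(m["length"]) if m["length"] else None,
            }
    return specs
```

For the two lines quoted above, `lfs_jwt_secret` would be reported with `active` set to `False` and `jwt_secret` with `active` set to `True`, both with length 43.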

**Fix options** (Builder to choose):
A. In `ops.py pre_install` (when `_lfs_enabled()`): explicitly generate lfs_jwt_secret with
   correct length: `abra._run(["app", "secret", "generate", domain, "lfs_jwt_secret", "v1", ...])`.
   Do NOT rely on `--all` for this secret because the spec is commented out.
B. In generic.py `perform_upgrade` after UPGRADE_EXTRA_ENV: targeted secret generate (not --all).
C. Ask the recipe maintainer to uncomment the `SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43`
   line in PR #1's `.env.sample` (and add a note that it's optional but needed for LFS installs).

Debug steps before fixing:
1. After UPGRADE_EXTRA_ENV sets SECRET_LFS_JWT_SECRET_VERSION=v1, run:
   `abra app secret generate <domain> lfs_jwt_secret v1` and inspect the length of the generated
   Docker secret (`<stack>_lfs_jwt_secret_v1`). `docker secret inspect` does not return secret
   data, so measure the file mounted in the container (by default under `/run/secrets/`) with `wc -c`.
2. Alternatively: check gitea container logs during the chaos deploy to see the startup error.
3. A correct 43-char URL-safe base64 secret can be produced with:
   `openssl rand -base64 32 | tr '+/' '-_' | tr -d '='` (43 chars).
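
The length arithmetic behind the 43-character requirement can be reproduced in a few lines: 32 random bytes encode to 44 base64 characters including one `=` of padding, which leaves 43 once the padding is stripped. The helper names below are hypothetical, and this is a sketch of the format described above rather than what abra or gitea run internally.

```python
import base64
import re
import secrets

# 43 characters from the URL-safe base64 alphabet, no padding.
_URLSAFE_43 = re.compile(r"[A-Za-z0-9_-]{43}")


def generate_lfs_jwt_secret() -> str:
    """32 random bytes, URL-safe base64, padding stripped (43 chars)."""
    raw = secrets.token_bytes(32)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def looks_like_lfs_jwt_secret(value: str) -> bool:
    """Format check only: right length and alphabet."""
    return _URLSAFE_43.fullmatch(value) is not None
```

`looks_like_lfs_jwt_secret` can be applied to a secret read back from the deployment to confirm or rule out the wrong-length hypothesis.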

Cascade effects (all from upgrade rollback):
- pre_backup FAIL (401 on API call — stale creds after upgrade chaos)
- pre_restore FAIL (ci-marker not in backed-up snapshot since backup was bad)
- test_restore FAIL (marker not returned — restore didn't revert non-existent change)
- custom tests: test_admin_api/test_git_push/test_lfs_roundtrip all 401 (stale creds)

Secondary mystery: WHY is ci_admin password invalid (401) after upgrade rollback? The password
in the sqlite3 DB should be unchanged. Possible: gitea 3.5.3 briefly started during chaos deploy
and modified the DB before failing health check. Builder should investigate if this is a separate
bug or purely cascade from the upgrade failure.

### [minor — fix before M2 complete] cc-ci self-test lint failures @2026-06-15T21:10Z

Push-event CI builds #683/#686/#687 fail at `scripts/lint.sh` (cc-ci repo's own self-test):
- `ruff format --check` wants to reformat 9 files (all new gtea files + test_discovery.py)
- `ruff check` has 9 errors (bridge.py UP017 + likely others in gtea files)

This does NOT block M2 recipe CI runs (which use custom events). But:
1. The cc-ci repo's self-test should be green (it's the CI server's own code quality check).
2. `ruff format` violations in the new gtea files are Builder code quality debt.

Fix: `cd /root/builder-clone && nix develop .#lint --command ruff format tests/gitea/ tests/unit/test_discovery.py && nix develop .#lint --command ruff check --fix tests/gitea/`
Then commit and push to clear the self-test lint failures.

### [pending — verify before M2 DONE] Drone dep path: no live CI since a121d2c

M2 DoD: "drone CI re-confirmed green (dep path intact)". No RECIPE=drone CI run has been executed
since a121d2c modified `runner/harness/generic.py` and `tests/gitea/recipe_meta.py`.
Unit tests (test_gitea_dep.py 10/10) still pass.
Builder should trigger a RECIPE=drone run (e.g., post !testme on a drone recipe PR)
to complete the M2 DoD dep-path verification.

### [critical — FIXED] Build #691 STACK_NAME not in .env @2026-06-15T22:05Z

Build #691 (RECIPE=gitea, PR=1, REF=357926f26e69): FAIL in UPGRADE_SECRET_PREP hook with:
`RuntimeError: UPGRADE_SECRET_PREP: STACK_NAME not found in /root/.abra/servers/default/gite-e1cb78.ci.commoninternet.net.env`

Root cause: d832b35's UPGRADE_SECRET_PREP read STACK_NAME from the app's .env file. But abra
does NOT write STACK_NAME to that file — it derives it from the domain at runtime. The .env
only contains DOMAIN, TYPE, COMPOSE_FILE, and app-specific vars.

Fix: derive STACK_NAME from domain as fallback — `domain.replace(".", "_")` — matching abra's
own derivation (dots replaced by underscores). Applied in commit ad53b5a.

Status: FIXED. Build #695 (retriggered) PASS level=5 with test_lfs_roundtrip PASS. ✓
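
The fallback described in this finding can be sketched as follows. `parse_env` and `stack_name_for` are hypothetical helpers rather than the code from ad53b5a, and the derivation implements only the dots-to-underscores rule stated above.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines from an app .env file, skipping comments and blanks."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            env[key.strip()] = value.strip()
    return env


def stack_name_for(domain: str, env: dict) -> str:
    """Prefer an explicit STACK_NAME; otherwise derive it from the domain."""
    return env.get("STACK_NAME") or domain.replace(".", "_")
```

With an .env that holds only DOMAIN, TYPE, and COMPOSE_FILE, `stack_name_for("gite-e1cb78.ci.commoninternet.net", env)` yields `gite-e1cb78_ci_commoninternet_net` instead of raising.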

### [non-blocking] Stale screenshot in manual runs @2026-06-15T20:32Z

`/var/lib/cc-ci-runs/manual/screenshot.png` mtime = June 13, not from today's M1 run.

Root cause: `screenshot.capture()` (screenshot.py:149) checks `if not os.path.exists(out_path)`
after the SCREENSHOT hook runs. For run_id="manual", `out_path` reuses the same directory
(`/var/lib/cc-ci-runs/manual/screenshot.png`), so if a prior manual run left a file there, the
guard prevents overwriting it. The SCREENSHOT hook (recipe_meta.py) navigates to the login page
but doesn't call `page.screenshot()` itself — that's the harness's job, blocked by the guard.

Impact: results.json shows `"screenshot": "screenshot.png"` (file exists, non-empty) but the
image is from a prior session. Cosmetic only — does not affect verdict (R7).
M2 runs with DRONE_BUILD_NUMBER → unique dir → no issue.

Recommendation: `screenshot.capture()` should always overwrite (remove `if not exists` guard),
or the Builder could add `page.screenshot(path=out_path)` at the end of the SCREENSHOT hook.
No action required for M1/M2 gates. Pre-existing harness limitation, not Builder error.
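
The first recommendation, always overwriting, might look like the sketch below. It is not the actual `screenshot.capture()` from the harness: the signature and the injected `take_screenshot` callback are assumptions, and with Playwright the callback could be `lambda p: page.screenshot(path=p)`. The image is written to a temporary file first, so a failed capture never leaves a half-written `screenshot.png`.

```python
import os
import tempfile
from typing import Callable


def capture(out_path: str, take_screenshot: Callable[[str], None]) -> str:
    """Write a fresh screenshot to out_path, replacing any stale file."""
    out_dir = os.path.dirname(out_path) or "."
    os.makedirs(out_dir, exist_ok=True)
    # Capture into a temp file in the same directory, then swap it into place.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".png")
    os.close(fd)
    try:
        take_screenshot(tmp_path)
        os.replace(tmp_path, out_path)
    finally:
        # Only present if the capture or the replace failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return out_path
```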