Run 674 (main): upgrade FAIL ("not intended PR-head"); run 676 (PR#1 LFS): test_lfs_roundtrip
fails at git-push batch endpoint (LFS not enabled in deployed container). Builder must fix before M2.
BACKLOG — phase gtea (gitea full-test enrollment)
Build backlog
(Builder-owned — read-only to Adversary)
- 0. Prerequisites verified (timezone, recipe, backup labels)
- 1. Write all gitea test files (recipe_meta.py + ops.py + lifecycle overlays + custom + PARITY.md)
- 2. Run harness locally against cc-ci (install + upgrade + backup + restore + custom) on gitea main. Run 846690: level=5/5 (all PASS). Fixes: _csrf→user_name selector; cred_url git push; auto_init repo; token scopes for gitea 1.22+; NixOS git-lfs deploy.
- 3. Confirm drone CI stays green (dep path unaffected by recipe_meta.py changes). Unit tests pass (10/10 gitea dep + 43/43 meta). Drone dep path byte-for-byte unchanged.
- 4. Verify LFS test correctly skips on main (compose.lfs.yml absent). SKIPPED with expected message in run 846690. PASS.
- 5. CLAIM M1: ADVERSARY PASS @2026-06-15T20:32Z (commit a106036)
- 6. Run full harness via real CI / !testme on gitea recipe. IN FLIGHT: Drone build #675 (RECIPE=gitea REF=main PR=0), triggered 20:34Z
- 7. Run harness on lfs-plain-gitea head → LFS test must go green. IN FLIGHT: Drone build #676 (PR=1 REF=357926f2), !testme posted 20:34Z
- 8. Post !testme on PR #1 so result lands in PR. DONE (posted 20:34Z, build #676, PENDING)
- 9. CLAIM M2 (await Adversary PASS)
- 10. Write ## DONE (all Adversary PASSes)
Adversary findings
(Adversary-owned — only the Adversary writes this section)
[critical — M2 blocker] LFS test fails in run 676 @2026-06-15T20:36Z
Drone build 676 (RECIPE=gitea, PR=1, REF=357926f2): all lifecycle stages PASS but
custom FAIL — test_lfs_roundtrip fails at git push with:
batch response: Repository or object not found:
https://ci_admin:<passwd>@gite-e1cb78.ci.commoninternet.net/ci_admin/ci-lfs-test.git/info/lfs/objects/batch
Level=3 (install+upgrade+backup_restore pass, functional FAIL).
Diagnosis: gitea ran WITHOUT LFS enabled at server level (LFS_START_SERVER = false in app.ini).
_lfs_available() returned True (compose.lfs.yml was in the per-run ABRA_DIR at test time —
recipe reflog confirms checkout to 357926f2 at 20:35:58, 38s before the test at 20:36:36).
Root cause under investigation: EXTRA_ENV sets COMPOSE_FILE to include compose.lfs.yml when
_lfs_enabled() is True. But the upgrade tier's abra base-deploy internally checks out the
3.5.2+1.24.2-rootless tag in the recipe dir (reflog: 20:35:37), which removes compose.lfs.yml; the
harness then re-checks out 357926f2 at 20:35:58. Depending on when the install deploy runs relative
to these checkouts, COMPOSE_FILE and/or SECRET_LFS_JWT_SECRET_VERSION may not have been resolved
correctly.
Most likely cause: compose.lfs.yml was NOT included in the actual docker stack deploy command
(either because EXTRA_ENV was evaluated before compose.lfs.yml existed, or because the lfs_jwt_secret
Docker secret was not generated since SECRET_LFS_JWT_SECRET_VERSION=v1 only exists in the EXTRA_ENV
dict, not in the .env FILE that abra secret generate reads).
Builder must: reproduce locally with RECIPE=gitea, PR=1, REF=357926f2; verify compose.lfs.yml is in COMPOSE_FILE at deploy time; verify lfs_jwt_secret Docker secret is generated; verify LFS_START_SERVER=true and LFS_JWT_SECRET= appear in /etc/gitea/app.ini inside the container.
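The last two checks above could be scripted roughly as follows. This is a minimal sketch, not harness code: compose_includes() and lfs_settings() are hypothetical helper names, COMPOSE_FILE is assumed to be colon-separated, and the two LFS keys are assumed to live in the [server] section of app.ini (the usual Gitea layout). Only the literal value "true" is treated as enabled.

```python
import configparser


def compose_includes(compose_file: str, name: str) -> bool:
    """True if `name` is one of the colon-separated entries of COMPOSE_FILE."""
    return name in [part.strip() for part in compose_file.split(":") if part.strip()]


def lfs_settings(app_ini_text: str) -> dict:
    """Report whether LFS looks enabled in the text of a Gitea app.ini."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case (configparser lowercases by default)
    # app.ini may begin with keys outside any section; configparser requires
    # a header, so a placeholder section is prepended.
    parser.read_string("[__root__]\n" + app_ini_text)
    server = parser["server"] if parser.has_section("server") else {}
    start = server.get("LFS_START_SERVER", "").strip().lower()
    secret = server.get("LFS_JWT_SECRET", "").strip()
    return {"lfs_start_server": start == "true", "lfs_jwt_secret_set": bool(secret)}
```

The text passed to lfs_settings() would be the contents of /etc/gitea/app.ini read from inside the running container; how that file is fetched is left to the harness.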
[critical — M2 blocker] Upgrade fails on main-branch CI run (run 674) @2026-06-15T20:36Z
Drone build 674 (RECIPE=gitea, PR=0, REF=main): upgrade FAIL with: "upgrade deployed chaos commit 'e6a1cc79', not the intended PR-head 'main' — the re-checkout to the code under test failed, so the upgrade is not exercised." Level=1 (install pass only).
This is the M2 main-branch CI run that must be level=5. With upgrade failing, M2 cannot pass. Builder must investigate why REF=main doesn't work correctly for the upgrade tier.
[non-blocking — concurrency] Run 675 install failure @2026-06-15T20:36Z
Four !testme comments were posted concurrently, so four Drone builds were triggered simultaneously (674, 675, 676, +). Builds 674 and 675 both have PR=0/REF=main, hence the same app domain and lock contention. Run 675 started while 674 held the lock and found stale state: ci_admin creds were cached but the user was gone (409 create path), so API calls returned 401 and the run ended at level=0.
Not a code bug. Builder should post ONE !testme at a time to avoid concurrency collisions.
The concurrent lock mechanism should prevent partial-state damage, but the stale cred cache
(/tmp/ccci-gitea-admin-<domain>.json) persists and causes 401s.
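One way the stale cache could be made self-healing is to validate cached creds before use and delete the file when they are rejected. The sketch below is illustrative only: load_admin_creds() and the is_valid callback are hypothetical names, and the cache is assumed to be a JSON object. In practice is_valid would make one authenticated API call and return False on a 401.

```python
import contextlib
import json
import os
from typing import Callable, Optional


def load_admin_creds(cache_path: str, is_valid: Callable[[dict], bool]) -> Optional[dict]:
    """Return cached creds if they still work; otherwise drop the cache file."""
    try:
        with open(cache_path, encoding="utf-8") as fh:
            creds = json.load(fh)
    except FileNotFoundError:
        return None  # nothing cached yet
    except json.JSONDecodeError:
        creds = None  # a corrupt cache is treated like a stale one

    if creds is not None and is_valid(creds):
        return creds

    # Stale or corrupt: remove the file so the caller recreates the admin
    # user and writes a fresh cache instead of retrying dead creds.
    with contextlib.suppress(FileNotFoundError):
        os.remove(cache_path)
    return None
```

A None return tells the caller to go through the normal user-creation path, which would avoid the cached-creds-but-no-user state seen in run 675.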
[non-blocking] Stale screenshot in manual runs @2026-06-15T20:32Z
/var/lib/cc-ci-runs/manual/screenshot.png mtime = June 13, not from today's M1 run.
Root cause: screenshot.capture() (screenshot.py:149) checks if not os.path.exists(out_path)
after the SCREENSHOT hook runs. For run_id="manual", out_path always resolves to the same file
(/var/lib/cc-ci-runs/manual/screenshot.png), so if a prior manual run left a file there, the
guard prevents overwriting it. The SCREENSHOT hook (recipe_meta.py) navigates to the login page
but doesn't call page.screenshot() itself; that is the harness's job, and the guard blocks it.
Impact: results.json shows "screenshot": "screenshot.png" (file exists, non-empty) but the
image is from a prior session. Cosmetic only — does not affect verdict (R7).
M2 runs with DRONE_BUILD_NUMBER → unique dir → no issue.
Recommendation: screenshot.capture() should always overwrite (remove the
if not os.path.exists(out_path) guard), or the Builder could add
page.screenshot(path=out_path) at the end of the SCREENSHOT hook.
No action required for M1/M2 gates. Pre-existing harness limitation, not Builder error.
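A third option keeps the guard but clears any leftover file before the hook runs, so the existence check only ever sees output from the current run. The sketch below is a hedged illustration, not the code in screenshot.py: capture_fresh(), run_hook and take_screenshot are hypothetical stand-ins for the harness's own pieces.

```python
import contextlib
import os
from typing import Callable


def capture_fresh(
    out_path: str,
    run_hook: Callable[[], None],
    take_screenshot: Callable[[str], None],
) -> bool:
    """Capture a screenshot that is guaranteed to come from this run."""
    # Remove a file left behind by an earlier run with the same run_id.
    with contextlib.suppress(FileNotFoundError):
        os.remove(out_path)

    run_hook()  # the hook may or may not write out_path itself

    # Same guard as described above; with the stale file gone it now means
    # "the hook produced nothing", not "something was already there".
    if not os.path.exists(out_path):
        take_screenshot(out_path)

    return os.path.exists(out_path) and os.path.getsize(out_path) > 0
```

With a Playwright page, take_screenshot could be as small as lambda p: page.screenshot(path=p).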