cc-ci/machine-docs/JOURNAL-gtea.md
autonomic-bot 89c2d70acf
journal(gtea): Blocker 4 fix + STACK_NAME discovery + ruff cleanup
2026-06-15 21:57:47 +00:00

# JOURNAL — phase gtea (gitea full-test enrollment)
Builder private log. Append-only.
---
## 2026-06-15 — Phase start + initial suite build
### Context read
- Phase plan: /srv/cc-ci/cc-ci-plan/plan-phase-gtea-gitea-fulltests.md
- Reference tests: /srv/cc-ci-orch/references/recipe-maintainer/recipe-info/gitea/tests/
  - health_check.py: checks HTTP 200 from root URL
  - git_push.py: create repo → clone → push → verify via API → delete repo
  - NOTE: These files exist ONLY in the local references directory, NOT in the upstream
    recipe-maintainers/gitea repo (which has no tests/ directory). PARITY.md updated to
    reflect this accurately (references are from recipe-info corpus, not the upstream recipe).
- gitea recipe on cc-ci: compose.yml (backupbot.backup=true), compose.sqlite3.yml
- PR #1 (lfs-plain-gitea → main): adds compose.lfs.yml + LFS_JWT_SECRET in app.ini.tmpl
- Versions in abra release dir: 2.0.0+1.18.0, 2.1.2+1.19.3, 2.6.0+1.21.5, 3.0.0+1.22.2-rootless
- Adversary notes: latest recipe tag is 3.5.3+1.24.2-rootless; LFS PR bumps to 3.6.0
### Design decisions
**LFS dep-vs-recipe-under-test split mechanism:**
- EXTRA_ENV(ctx) checks TWO conditions: (1) compose.lfs.yml exists in $ABRA_DIR/recipes/gitea/,
AND (2) RECIPE=gitea env var is set. Both conditions required.
- Condition (1) ensures LFS is never enabled on main (overlay absent).
- Condition (2) ensures LFS is never enabled when gitea is drone's dep (RECIPE=drone).
- The dep path is thus byte-for-byte identical whether or not compose.lfs.yml exists.
- Decision documented in DECISIONS.md (phase gtea).
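
The two-condition gate above can be sketched as follows. This is a minimal illustration, not the actual recipe_meta.py: the function name `extra_env`, the explicit `abra_dir` and `environ` parameters, and the colon-separated `COMPOSE_FILE` value are assumptions.

```python
import os
from pathlib import Path


def extra_env(abra_dir, environ=None):
    """Hypothetical stand-in for EXTRA_ENV(ctx).

    LFS is enabled only when BOTH conditions hold:
      1. compose.lfs.yml exists in the gitea recipe checkout, and
      2. RECIPE=gitea (gitea is the recipe under test, not a dep).
    """
    environ = os.environ if environ is None else environ
    overlay = Path(abra_dir) / "recipes" / "gitea" / "compose.lfs.yml"

    compose_files = ["compose.yml", "compose.sqlite3.yml"]
    if overlay.is_file() and environ.get("RECIPE") == "gitea":
        compose_files.append("compose.lfs.yml")

    # Assumption: the overlay list is passed as a colon-separated COMPOSE_FILE.
    return {"COMPOSE_FILE": ":".join(compose_files)}
```

With `RECIPE=drone` the result does not depend on whether the overlay file exists, which is the property the dep path relies on.
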
**Admin user management:**
- gitea has no built-in admin user from abra deploy. Admin is created via `gitea admin user create`.
- ops.pre_install creates admin user `ci_admin` with a random 32-char hex password.
- Credentials stored at /tmp/ccci-gitea-admin-{domain}.json (mode 600) for reuse across hook calls.
- All subsequent pre_* hooks read from this file (ops module re-imported per op).
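
A minimal sketch of the credential handoff described above, assuming a JSON layout with `username` and `password` keys (the real ops.py layout is not shown in this journal). The helper names and the `base` parameter are hypothetical.

```python
import json
import os
import secrets
from pathlib import Path


def creds_path(domain, base="/tmp"):
    # Path pattern taken from the journal; `base` is parameterised for testing.
    return Path(base) / f"ccci-gitea-admin-{domain}.json"


def save_admin_creds(domain, base="/tmp"):
    """Create ci_admin credentials and persist them owner-only (mode 600)."""
    # token_hex(16) yields 16 random bytes rendered as 32 hex characters.
    creds = {"username": "ci_admin", "password": secrets.token_hex(16)}
    path = creds_path(domain, base)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    # The mode passed to os.open only applies on creation and is subject to
    # the umask, so set it explicitly as well (Unix only).
    os.fchmod(fd, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(creds, fh)
    return creds


def load_admin_creds(domain, base="/tmp"):
    """Read back the credentials written by an earlier hook call."""
    return json.loads(creds_path(domain, base).read_text())
```

Because each hook re-imports the ops module, nothing held in memory survives between hooks. The file is the only shared state.
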
**Marker repo:**
- Marker = git repo named `ci-marker` owned by `ci_admin`, auto_init=True.
- pre_upgrade/pre_backup: ensure marker exists (idempotent create)
- pre_restore: DELETE the marker repo (diverge from backup state)
- test_upgrade: assert marker survived chaos redeploy
- test_backup: assert marker exists at backup time
- test_restore: assert marker returned (restore reverted deletion)
### Files written
1. tests/gitea/recipe_meta.py — UPDATED (added BACKUP_CAPABLE, READY_PROBE, SCREENSHOT,
LFS-conditional EXTRA_ENV; header updated to dual-role)
2. tests/gitea/ops.py — NEW (admin user + marker repo hooks)
3. tests/gitea/test_install.py — NEW (assert_serving + API + admin auth + Playwright)
4. tests/gitea/test_upgrade.py — NEW (marker survived upgrade)
5. tests/gitea/test_backup.py — NEW (marker captured in backup)
6. tests/gitea/test_restore.py — NEW (marker returned after restore)
7. tests/gitea/custom/test_health.py — NEW (parity: HTTP 200 from root)
8. tests/gitea/custom/test_git_push.py — NEW (parity: create→clone→push→verify→delete)
9. tests/gitea/custom/test_admin_api.py — NEW (beyond-parity: user+org+token CRUD)
10. tests/gitea/custom/test_lfs_roundtrip.py — NEW (LFS capstone; skips on main)
11. tests/gitea/PARITY.md — NEW
### Unit test results after changes
```
tests/unit/test_gitea_dep.py: 10/10 PASSED
tests/unit/test_meta.py: 43/43 PASSED
All unit tests: 269 passed, 1 pre-existing failure (test_warm_reconcile.py - unrelated)
```
### Next: run harness locally (BACKLOG item 2)
---
## 2026-06-15 — Harness run + M1 claim
### Bugs found and fixed during harness run
1. **Playwright `_csrf` selector (test_install.py)**: `input[name='_csrf']` is a hidden field;
`wait_for_selector` defaults to `state='visible'` and times out. Fixed: use `input#user_name`
(the visible username field). Root cause: gitea renders CSRF as `type="hidden"`.
2. **git credential injection (test_git_push.py + test_lfs_roundtrip.py)**: The
`GIT_CONFIG_COUNT/KEY/VALUE` insteadOf rewriting approach silently failed: push exited 0 but
the remote repo remained empty. Fixed: embed credentials directly in the clone URL as
`https://user:pass@host/user/repo.git`. Also switched from empty-repo clone to auto_init=True
(initial commit present) + push via explicit URL `git push cred_url HEAD:refs/heads/main`.
3. **double /api/v1 in LFS restart poll (test_lfs_roundtrip.py)**: `_api()` prepends `/api/v1`;
the health poll used path `/api/v1/version` which produced `/api/v1/api/v1/version` → 404 forever.
Fixed: changed path to `/version`.
4. **Token scope required (test_admin_api.py)**: gitea 1.22+ requires `scopes` in token creation
body. Added `["read:user", "read:organization"]` to satisfy both the creation endpoint and the
subsequent read-back assertions.
5. **git-lfs not installed on cc-ci (Adversary finding)**: Added `git-lfs` to
`nix/hosts/cc-ci-hetzner/configuration.nix` systemPackages. Deployed via
`nixos-rebuild switch --flake '/root/builder-clone?submodules=1#cc-ci' 2>&1`. Note: secrets/
is a git submodule (gitignored but tracked); must use `?submodules=1` in flake URL.
git-lfs 3.6.1 confirmed installed post-deploy.
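
For item 2, the credential-in-URL approach can be sketched as below. The helper names are hypothetical, and percent-encoding the userinfo is an addition for safety: the hex password used here never needs it, but other passwords might.

```python
from urllib.parse import quote


def credentialed_url(host, owner, repo, user, password):
    """Build https://user:pass@host/owner/repo.git with encoded userinfo."""
    # safe="" forces characters such as "@", ":" and "/" to be escaped so
    # they cannot be mistaken for URL delimiters.
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"https://{userinfo}@{host}/{owner}/{repo}.git"


def push_argv(cred_url, branch="main"):
    """argv for an explicit-URL push, as used in the fix."""
    return ["git", "push", cred_url, f"HEAD:refs/heads/{branch}"]
```

Pushing to an explicit URL sidesteps any remote or insteadOf configuration, so a misapplied rewrite cannot silently redirect the push.
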
### Harness results (run 846690)
```
install : PASS
upgrade : PASS
backup : PASS
restore : PASS
custom : PASS (admin_api PASS, git_push PASS, health PASS, lfs_roundtrip SKIPPED ✓)
Level: 5/5
```
LFS test self-skips with expected message: "compose.lfs.yml absent in gitea recipe checkout".
### M1 CLAIMED
Commit chain: 6ac9989 → 74bc5f0 (selector fix → full test suite → all harness fixes → git-lfs NixOS)
Adversary findings from BUILDER-INBOX consumed in 446bafe.
M1 claim commit: see `claim(gtea):` below.
### Next: await Adversary M1 PASS → proceed to BACKLOG items 6-8 (real CI + LFS PR)
---
## 2026-06-15 — M2 builds analysis + fixes
### Adversary inbox consumed @20:50Z
BUILDER-INBOX had two critical M2 blockers:
1. LFS roundtrip FAIL (run 676): LFS not running in upgrade deploy
2. Upgrade FAIL on main (run 674): REF="main" fails HC1 SHA comparison
### Root cause analysis
**Blocker 1 (LFS):**
Recipe checkout timeline in run 676:
- 20:35:35: Initial clone at 357926f2 (compose.lfs.yml present)
- 20:35:37: abra base-deploy checks out 3.5.2+1.24.2-rootless (compose.lfs.yml REMOVED)
- 20:35:58: harness re-checks out 357926f2 for upgrade (compose.lfs.yml RESTORED)
The key: deploy_app calls EXTRA_ENV AFTER abra.recipe_checkout(version). At that point
compose.lfs.yml is absent, so EXTRA_ENV returns the sqlite3-only COMPOSE_FILE and install runs
without LFS. UPGRADE_EXTRA_ENV was undefined for gitea, so COMPOSE_FILE is never updated and the
chaos redeploy also runs without compose.lfs.yml. Meanwhile _lfs_available() checks the disk and
finds compose.lfs.yml (restored at 20:35:58), so the LFS test runs against a server with LFS
off and the batch endpoint returns "not found".
Fix: Added UPGRADE_EXTRA_ENV to recipe_meta.py (returns compose.lfs.yml in COMPOSE_FILE
when present after PR-head checkout) + abra.secret_generate() call in generic.perform_upgrade
when upgrade_env is non-empty (to generate lfs_jwt_secret before chaos redeploy).
**Blocker 2 (REF=main HC1):**
HC1 check: `head_ref.startswith(chaos_commit) or chaos_commit.startswith(head_ref)`
When head_ref="main" and chaos_commit="e6a1cc79": both checks fail.
Fix: always use `lifecycle.recipe_head_commit(recipe)` (git rev-parse HEAD) for head_ref
instead of `ref` directly. After the fetch/checkout, HEAD is at the correct SHA.
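
The HC1 failure is easy to reproduce in isolation. In the sketch below, `hc1_matches` copies the check quoted above. `recipe_head_commit` is a standalone stand-in for `lifecycle.recipe_head_commit` that takes a checkout directory (an assumption), and the 40-character SHA is made-up padding around the observed prefix.

```python
import subprocess


def hc1_matches(head_ref, chaos_commit):
    """Prefix comparison as quoted in the journal."""
    return head_ref.startswith(chaos_commit) or chaos_commit.startswith(head_ref)


def recipe_head_commit(repo_dir):
    """Resolve HEAD to a full SHA (stand-in for lifecycle.recipe_head_commit)."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


# A branch name never shares a prefix with a commit SHA.
print(hc1_matches("main", "e6a1cc79"))      # False

# Hypothetical full SHA: the observed prefix padded with zeros.
full_sha = "e6a1cc79" + "0" * 32
print(hc1_matches(full_sha, "e6a1cc79"))    # True
```

Resolving the ref first means the comparison always runs SHA against SHA, whether the caller passed a branch name, a tag, or a commit.
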
**Blocker 3 (stale creds file, build #675):**
/tmp/ccci-gitea-admin-{domain}.json persists across runs. Fresh install wipes the DB, but
pre_install finds the stale file and returns old credentials → 401 on all API calls.
Fix: pre_install deletes the creds file before calling _ensure_admin.
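
A minimal sketch of the stale-file reset, assuming it runs at the top of pre_install. The helper name and the `base` parameter are hypothetical.

```python
from pathlib import Path


def reset_admin_creds(domain, base="/tmp"):
    """Remove a credentials file left over from a previous run.

    Returns True if a stale file was present, which can be worth logging.
    """
    path = Path(base) / f"ccci-gitea-admin-{domain}.json"
    existed = path.exists()
    # missing_ok=True (Python 3.8+) makes the delete idempotent.
    path.unlink(missing_ok=True)
    return existed
```
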
### Fixes applied (commit a121d2c)
- tests/gitea/ops.py: delete stale creds file in pre_install
- tests/gitea/recipe_meta.py: add UPGRADE_EXTRA_ENV (LFS upgrade trigger)
- runner/harness/generic.py: abra.secret_generate() in upgrade when upgrade_env non-empty
- runner/run_recipe_ci.py: head_ref = recipe_head_commit() always (not ref directly)
Unit tests: 53/53 pass (test_gitea_dep.py 10/10, test_meta.py 43/43)
### CI builds re-triggered
Build #684: RECIPE=gitea REF=main PR=0 (main branch, all tiers)
Build #685: RECIPE=gitea REF=357926f2 PR=1 (LFS PR capstone)
Both running as of 21:04Z.
---
## 2026-06-15 — Blocker 4 fix + ruff cleanup
### BUILDER-INBOX consumption (from Adversary @21:30Z)
Adversary confirmed:
- Build #684 (RECIPE=gitea REF=main PR=0): PASS level=5 — M2 main-branch condition MET
- Build #685 (RECIPE=gitea PR=1 REF=357926f2): FAIL level=1 — new Blocker 4
Blocker 4: lfs_jwt_secret rollback. The secret was created (the stack ended in
rollback_completed, not a pre-deploy failure), but gitea failed its health check. Root cause:
`.env.sample` in the lfs-plain-gitea PR has `# SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43`
COMMENTED OUT, so `abra secret generate --all` uses the wrong default length. gitea requires
exactly 43 chars (32 bytes, base64 URL-safe, unpadded). With the wrong length, gitea tries to
auto-save a JWT secret to app.ini, which is a read-only Docker Config → FATAL
"error saving JWT Secret: failed to save app.ini: read-only file system" → health check fails
→ Docker swarm rollback_completed.
Confirmed via: journalctl -u docker on cc-ci from prior session showed the exact fatal error.
### Fix design
New `UPGRADE_SECRET_PREP(ctx)` hook in meta.py, called BEFORE `abra secret generate --all`
in perform_upgrade(). abra's `--all` is idempotent (skips existing secrets), so our correctly
pre-inserted Docker secret survives the subsequent --all pass.
gitea's UPGRADE_SECRET_PREP uses `docker secret create {STACK_NAME}_lfs_jwt_secret_v1 -`
with a Python-generated 43-char value: `base64.urlsafe_b64encode(os.urandom(32)).rstrip(b"=")`.
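
The length arithmetic can be checked directly: 32 random bytes encode to 44 base64 characters including one `=` of padding, and stripping the padding leaves 43. The sketch below reuses the expression quoted above. The function names are hypothetical, and the argv is only built, not executed.

```python
import base64
import os


def lfs_jwt_secret():
    """Return a 43-char URL-safe base64 string from 32 random bytes."""
    return base64.urlsafe_b64encode(os.urandom(32)).rstrip(b"=").decode("ascii")


def secret_create_argv(stack_name):
    """argv for creating the secret; the value is supplied on stdin."""
    # The trailing "-" tells `docker secret create` to read from stdin.
    return ["docker", "secret", "create", f"{stack_name}_lfs_jwt_secret_v1", "-"]
```

Passing the value on stdin keeps it out of the process argument list.
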
Discovery: abra does NOT store STACK_NAME in the .env file. Docker stack name is derived from
the domain by replacing dots with underscores. Verified from `docker stack ls`:
- drone.ci.commoninternet.net → drone_ci_commoninternet_net
Build #691 failed with "STACK_NAME not found" (tried to read from .env, key absent).
Fixed in ad53b5a: derive STACK_NAME from ctx.domain.replace(".", "_").
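
The derivation is a one-liner. The sketch below covers only the dot replacement observed in `docker stack ls`; whether abra applies further normalisation in other cases is not established here. The function name is hypothetical.

```python
def stack_name_for(domain):
    """Derive the Docker stack name from an app domain (dots to underscores)."""
    return domain.replace(".", "_")


print(stack_name_for("drone.ci.commoninternet.net"))
# drone_ci_commoninternet_net
```
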
### Runs in this session
- Build #691 (PR=1): FAIL — STACK_NAME not found in .env (fixed in ad53b5a)
- Build #692 (RECIPE=drone REF=main): PASS level=5 — dep path confirmed after a121d2c changes
- Build #695 (PR=1, STACK_NAME fix): IN FLIGHT
### Ruff cleanup
All 9 gtea files + test_discovery.py + bridge/bridge.py reformatted/check-fixed.
manifest.py B007 (unused loop variable `path` → `_path`) fixed manually.
scripts/lint.sh: PASS (verified on builder-clone @22:00Z).