cc-ci/machine-docs/JOURNAL-gtea.md
autonomic-bot 89c2d70acf
journal(gtea): Blocker 4 fix + STACK_NAME discovery + ruff cleanup
2026-06-15 21:57:47 +00:00


JOURNAL — phase gtea (gitea full-test enrollment)

Builder private log. Append-only.


2026-06-15 — Phase start + initial suite build

Context read

  • Phase plan: /srv/cc-ci/cc-ci-plan/plan-phase-gtea-gitea-fulltests.md
  • Reference tests: /srv/cc-ci-orch/references/recipe-maintainer/recipe-info/gitea/tests/
    • health_check.py — checks HTTP 200 from root URL
    • git_push.py — create repo → clone → push → verify via API → delete repo
    • NOTE: These files exist ONLY in the local references directory, NOT in the upstream recipe-maintainers/gitea repo (which has no tests/ directory). PARITY.md was updated to say so: the references come from the recipe-info corpus, not the upstream recipe.
  • gitea recipe on cc-ci: compose.yml (backupbot.backup=true), compose.sqlite3.yml
  • PR #1 (lfs-plain-gitea → main): adds compose.lfs.yml + LFS_JWT_SECRET in app.ini.tmpl
  • Versions in abra release dir: 2.0.0+1.18.0, 2.1.2+1.19.3, 2.6.0+1.21.5, 3.0.0+1.22.2-rootless
  • Adversary notes: latest recipe tag is 3.5.3+1.24.2-rootless; LFS PR bumps to 3.6.0

Design decisions

LFS dep-vs-recipe-under-test split mechanism:

  • EXTRA_ENV(ctx) checks TWO conditions: (1) compose.lfs.yml exists in $ABRA_DIR/recipes/gitea/, AND (2) RECIPE=gitea env var is set. Both conditions required.
  • Condition (1) ensures LFS is never enabled on main (overlay absent).
  • Condition (2) ensures LFS is never enabled when gitea is drone's dep (RECIPE=drone).
  • The dep path is thus byte-for-byte identical whether or not compose.lfs.yml exists.
  • Decision documented in DECISIONS.md (phase gtea).
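A minimal sketch of the two-condition gate. It assumes a hypothetical signature that takes the abra directory and an env mapping instead of ctx, and assumes the non-LFS COMPOSE_FILE is compose.yml:compose.sqlite3.yml (colon-separated, as in abra .env files):

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Assumed non-LFS overlay list (the "sqlite3-only" COMPOSE_FILE).
BASE_COMPOSE = ["compose.yml", "compose.sqlite3.yml"]


def lfs_enabled(abra_dir: str, env: Mapping[str, str]) -> bool:
    """Both conditions must hold: overlay on disk AND gitea is the recipe under test."""
    overlay = Path(abra_dir) / "recipes" / "gitea" / "compose.lfs.yml"
    return overlay.is_file() and env.get("RECIPE") == "gitea"


def extra_env(abra_dir: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """EXTRA_ENV-style hook (hypothetical signature) returning the COMPOSE_FILE to deploy."""
    env = os.environ if env is None else env
    files = list(BASE_COMPOSE)
    if lfs_enabled(abra_dir, env):
        files.append("compose.lfs.yml")
    return {"COMPOSE_FILE": ":".join(files)}
```

With RECIPE=drone the result is the base COMPOSE_FILE whether or not the overlay exists, which is the byte-for-byte dep-path guarantee.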

Admin user management:

  • abra deploy does not create a gitea admin user; the admin is created with gitea admin user create.
  • ops.pre_install creates admin user ci_admin with a random 32-char hex password.
  • Credentials stored at /tmp/ccci-gitea-admin-{domain}.json (mode 600) for reuse across hook calls.
  • All subsequent pre_* hooks read the credentials from this file, because the ops module is re-imported for each op and keeps no in-memory state between calls.
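The credential file handling could look roughly like this. The helper names (creds_path, save_creds, load_or_create_creds) and the JSON keys are hypothetical, and the actual gitea admin user create call is left out:

```python
import json
import os
import secrets
from pathlib import Path


def creds_path(domain: str, base: str = "/tmp") -> Path:
    return Path(base) / f"ccci-gitea-admin-{domain}.json"


def save_creds(path: Path, creds: dict) -> None:
    # Create with mode 0o600, then fchmod in case the file already existed
    # with a looser mode (O_CREAT does not change an existing file's mode).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    os.fchmod(fd, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(creds, fh)


def load_or_create_creds(domain: str, base: str = "/tmp") -> dict:
    """The file, not module state, is the source of truth across re-imported hooks."""
    path = creds_path(domain, base)
    if path.exists():
        return json.loads(path.read_text())
    # token_hex(16) yields 32 hex characters.
    creds = {"username": "ci_admin", "password": secrets.token_hex(16)}
    save_creds(path, creds)
    return creds
```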

Marker repo:

  • Marker = git repo named ci-marker owned by ci_admin, auto_init=True.
  • pre_upgrade/pre_backup: ensure marker exists (idempotent create)
  • pre_restore: DELETE the marker repo (diverge from backup state)
  • test_upgrade: assert marker survived chaos redeploy
  • test_backup: assert marker exists at backup time
  • test_restore: assert marker returned (restore reverted deletion)
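A sketch of the marker lifecycle against the Gitea REST API (GET and DELETE /api/v1/repos/{owner}/{repo}, POST /api/v1/user/repos). The injectable Api callable and the helper names are assumptions made so the hook logic can be exercised without a live server:

```python
import base64
import json
import urllib.error
import urllib.request
from typing import Callable, Optional

# (method, path under /api/v1, JSON body or None) -> HTTP status code
Api = Callable[[str, str, Optional[dict]], int]

OWNER, MARKER = "ci_admin", "ci-marker"


def make_api(base_url: str, user: str, password: str) -> Api:
    """Minimal Gitea REST client using HTTP basic auth (stdlib urllib only)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def call(method: str, path: str, body: Optional[dict]) -> int:
        data = None if body is None else json.dumps(body).encode()
        req = urllib.request.Request(
            f"{base_url.rstrip('/')}/api/v1{path}",
            data=data,
            method=method,
            headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        )
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            return err.code

    return call


def marker_exists(api: Api) -> bool:
    return api("GET", f"/repos/{OWNER}/{MARKER}", None) == 200


def ensure_marker(api: Api) -> None:
    """pre_upgrade / pre_backup: idempotent create (409 means it already exists)."""
    if marker_exists(api):
        return
    status = api("POST", "/user/repos", {"name": MARKER, "auto_init": True})
    if status not in (201, 409):
        raise RuntimeError(f"marker create failed: HTTP {status}")


def delete_marker(api: Api) -> None:
    """pre_restore: diverge from the backed-up state (404 means already gone)."""
    status = api("DELETE", f"/repos/{OWNER}/{MARKER}", None)
    if status not in (204, 404):
        raise RuntimeError(f"marker delete failed: HTTP {status}")
```

pre_upgrade and pre_backup would call ensure_marker, pre_restore would call delete_marker, and the three test modules would assert marker_exists.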

Files written

  1. tests/gitea/recipe_meta.py — UPDATED (added BACKUP_CAPABLE, READY_PROBE, SCREENSHOT, LFS-conditional EXTRA_ENV; header updated to dual-role)
  2. tests/gitea/ops.py — NEW (admin user + marker repo hooks)
  3. tests/gitea/test_install.py — NEW (assert_serving + API + admin auth + Playwright)
  4. tests/gitea/test_upgrade.py — NEW (marker survived upgrade)
  5. tests/gitea/test_backup.py — NEW (marker captured in backup)
  6. tests/gitea/test_restore.py — NEW (marker returned after restore)
  7. tests/gitea/custom/test_health.py — NEW (parity: HTTP 200 from root)
  8. tests/gitea/custom/test_git_push.py — NEW (parity: create→clone→push→verify→delete)
  9. tests/gitea/custom/test_admin_api.py — NEW (beyond-parity: user+org+token CRUD)
  10. tests/gitea/custom/test_lfs_roundtrip.py — NEW (LFS capstone; skips on main)
  11. tests/gitea/PARITY.md — NEW

Unit test results after changes

tests/unit/test_gitea_dep.py: 10/10 PASSED
tests/unit/test_meta.py: 43/43 PASSED
All unit tests: 269 passed, 1 pre-existing failure (test_warm_reconcile.py - unrelated)

Next: run harness locally (BACKLOG item 2)


2026-06-15 — Harness run + M1 claim

Bugs found and fixed during harness run

  1. Playwright _csrf selector (test_install.py): gitea renders the CSRF token as an input of type="hidden", and wait_for_selector defaults to state='visible', so waiting on input[name='_csrf'] timed out. Fixed: wait on input#user_name (the visible username field) instead.

  2. git credential injection (test_git_push.py + test_lfs_roundtrip.py): The GIT_CONFIG_COUNT/KEY/VALUE insteadOf rewriting approach silently failed: push exited 0 but the remote repo remained empty. Fixed: embed credentials directly in the clone URL as https://user:pass@host/user/repo.git. Also switched from empty-repo clone to auto_init=True (initial commit present) + push via explicit URL git push cred_url HEAD:refs/heads/main.

  3. double /api/v1 in LFS restart poll (test_lfs_roundtrip.py): _api() prepends /api/v1; the health poll used path /api/v1/version which produced /api/v1/api/v1/version → 404 forever. Fixed: changed path to /version.

  4. Token scope required (test_admin_api.py): gitea 1.22+ requires scopes in token creation body. Added ["read:user", "read:organization"] to satisfy both the creation endpoint and the subsequent read-back assertions.

  5. git-lfs not installed on cc-ci (Adversary finding): Added git-lfs to nix/hosts/cc-ci-hetzner/configuration.nix systemPackages. Deployed via nixos-rebuild switch --flake '/root/builder-clone?submodules=1#cc-ci' 2>&1. Note: secrets/ is a git submodule (gitignored but tracked); must use ?submodules=1 in flake URL. git-lfs 3.6.1 confirmed installed post-deploy.
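Fixes 2 and 3 reduce to two small URL helpers. This is a sketch with hypothetical names; percent-encoding the credentials is an added precaution (the 32-char hex password does not strictly need it):

```python
from urllib.parse import quote, urlsplit, urlunsplit


def cred_url(base_url: str, user: str, password: str, repo: str) -> str:
    """Build https://user:pass@host/user/repo.git with credentials percent-encoded."""
    parts = urlsplit(base_url)
    netloc = f"{quote(user, safe='')}:{quote(password, safe='')}@{parts.netloc}"
    return urlunsplit((parts.scheme, netloc, f"/{user}/{repo}.git", "", ""))


def api_url(base_url: str, path: str) -> str:
    """_api()-style join: callers pass paths relative to /api/v1, e.g. "/version"."""
    if path.startswith("/api/v1"):
        # Passing the prefix again is the double-/api/v1 bug from fix 3.
        raise ValueError(f"path already contains /api/v1: {path}")
    return f"{base_url.rstrip('/')}/api/v1{path}"
```

The push would then be git push <cred_url> HEAD:refs/heads/main, as in fix 2. Rejecting a pre-prefixed path turns the silent 404 loop into an immediate error.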

Harness results (run 846690)

install : PASS
upgrade : PASS
backup  : PASS
restore : PASS
custom  : PASS  (admin_api PASS, git_push PASS, health PASS, lfs_roundtrip SKIPPED ✓)
Level: 5/5

LFS test self-skips with expected message: "compose.lfs.yml absent in gitea recipe checkout".

M1 CLAIMED

Commit chain: 6ac998974bc5f0 (selector fix → full test suite → all harness fixes → git-lfs NixOS). Adversary findings from BUILDER-INBOX consumed in 446bafe. M1 claim commit: see claim(gtea): below.

Next: await Adversary M1 PASS → proceed to BACKLOG items 6-8 (real CI + LFS PR)


2026-06-15 — M2 builds analysis + fixes

Adversary inbox consumed @20:50Z

BUILDER-INBOX had two critical M2 blockers:

  1. LFS roundtrip FAIL (run 676): LFS not running in upgrade deploy
  2. Upgrade FAIL on main (run 674): REF="main" fails HC1 SHA comparison

Root cause analysis

Blocker 1 (LFS): Recipe checkout timeline in run 676:

  • 20:35:35: Initial clone at 357926f2 (compose.lfs.yml present)
  • 20:35:37: abra base-deploy checks out 3.5.2+1.24.2-rootless (compose.lfs.yml REMOVED)
  • 20:35:58: harness re-checks out 357926f2 for upgrade (compose.lfs.yml RESTORED)

The key: deploy_app calls EXTRA_ENV AFTER abra.recipe_checkout(version). At that point compose.lfs.yml is absent, so EXTRA_ENV returns the sqlite3-only COMPOSE_FILE and install runs without LFS. UPGRADE_EXTRA_ENV was undefined for gitea, so COMPOSE_FILE was never updated and the chaos redeploy also ran without compose.lfs.yml. But _lfs_available() checks the disk and finds compose.lfs.yml (restored at 20:35:58), so the test runs against a server with LFS off and the LFS batch endpoint returns "not found".

Fix: added UPGRADE_EXTRA_ENV to recipe_meta.py, which puts compose.lfs.yml into COMPOSE_FILE when the file is present after the PR-head checkout. Also added an abra.secret_generate() call in generic.perform_upgrade whenever upgrade_env is non-empty, so lfs_jwt_secret is generated before the chaos redeploy.
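The upgrade-side fix, sketched with hypothetical signatures (the real hooks take ctx and call abra). The three-file COMPOSE_FILE value is an assumption that extends the sqlite3-only default with the LFS overlay:

```python
from pathlib import Path
from typing import Callable, Dict


def upgrade_extra_env(recipe_dir: str) -> Dict[str, str]:
    """Evaluated after the PR-head checkout, so it sees the overlay if the PR adds it."""
    if (Path(recipe_dir) / "compose.lfs.yml").is_file():
        return {"COMPOSE_FILE": "compose.yml:compose.sqlite3.yml:compose.lfs.yml"}
    return {}


def perform_upgrade(
    upgrade_env: Dict[str, str],
    secret_generate: Callable[[], None],
    redeploy: Callable[[Dict[str, str]], None],
) -> None:
    """Order matters: new secrets (e.g. lfs_jwt_secret) must exist before the redeploy."""
    if upgrade_env:
        secret_generate()
    redeploy(upgrade_env)
```

On main the hook returns an empty dict, so no secret generation happens and the upgrade path is unchanged.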

Blocker 2 (REF=main HC1): the HC1 check is head_ref.startswith(chaos_commit) or chaos_commit.startswith(head_ref). When head_ref="main" and chaos_commit="e6a1cc79", both checks fail. Fix: always use lifecycle.recipe_head_commit(recipe) (git rev-parse HEAD) for head_ref instead of the raw ref. After the fetch/checkout, HEAD is at the correct SHA, so the comparison is always SHA against SHA.
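Why REF=main failed HC1, and the rev-parse fix, as a small sketch. recipe_head_commit here is a stand-in for lifecycle.recipe_head_commit, not its actual implementation:

```python
import subprocess


def hc1_matches(head_ref: str, chaos_commit: str) -> bool:
    # Either side may be an abbreviated SHA, so accept a prefix match in either direction.
    # An empty head_ref would trivially match; rev-parse never returns one on success.
    return head_ref.startswith(chaos_commit) or chaos_commit.startswith(head_ref)


def recipe_head_commit(recipe_dir: str) -> str:
    """Resolve the checked-out HEAD (whatever ref was requested) to a full SHA."""
    result = subprocess.run(
        ["git", "-C", recipe_dir, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

A branch name like "main" shares no prefix with a SHA, so it can never pass; resolving it first makes the check meaningful for branch and SHA refs alike.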

Blocker 3 (stale creds file, build #675): /tmp/ccci-gitea-admin-{domain}.json persists across runs. A fresh install wipes the DB, but pre_install found the stale file and returned the old credentials, which do not exist in the new DB, so every API call returned 401. Fix: pre_install deletes the creds file before calling _ensure_admin.
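The Blocker 3 fix in sketch form, with a hypothetical signature where _ensure_admin is passed in as a callable:

```python
from pathlib import Path
from typing import Callable


def pre_install(creds_file: Path, ensure_admin: Callable[[], dict]) -> dict:
    """A fresh install wipes the DB, so any creds file left by an earlier run is stale."""
    creds_file.unlink(missing_ok=True)  # missing_ok requires Python 3.8+
    return ensure_admin()
```

Deleting unconditionally keeps pre_install correct on a clean host too, since a missing file is not an error.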

Fixes applied (commit a121d2c)

  • tests/gitea/ops.py: delete stale creds file in pre_install
  • tests/gitea/recipe_meta.py: add UPGRADE_EXTRA_ENV (LFS upgrade trigger)
  • runner/harness/generic.py: abra.secret_generate() in upgrade when upgrade_env non-empty
  • runner/run_recipe_ci.py: head_ref = recipe_head_commit() always (not ref directly)

Unit tests: 53/53 pass (test_gitea_dep.py 10/10, test_meta.py 43/43)

CI builds re-triggered

  • Build #684: RECIPE=gitea REF=main PR=0 (main branch, all tiers)
  • Build #685: RECIPE=gitea REF=357926f2 PR=1 (LFS PR capstone)

Both running as of 21:04Z.


2026-06-15 — Blocker 4 fix + ruff cleanup

BUILDER-INBOX consumption (from Adversary @21:30Z)

Adversary confirmed:

  • Build #684 (RECIPE=gitea REF=main PR=0): PASS level=5 — M2 main-branch condition MET
  • Build #685 (RECIPE=gitea PR=1 REF=357926f2): FAIL level=1 — new Blocker 4

Blocker 4: lfs_jwt_secret rollback. The secret was created (the failure was rollback_completed, not a pre-deploy failure), but gitea failed its health check. Root cause: the .env.sample in the lfs-plain-gitea PR has the line # SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43 COMMENTED OUT, so abra generate --all uses the wrong default length. gitea requires exactly 43 chars (32 bytes, URL-safe base64 without padding). With the wrong length, gitea tries to auto-save a new JWT secret to app.ini, which is a read-only Docker config, and exits with FATAL "error saving JWT Secret: failed to save app.ini: read-only file system". The health check then fails and Docker swarm rolls the service back (rollback_completed).

Confirmed via journalctl -u docker on cc-ci: output from a prior session showed the exact fatal error.

Fix design

New UPGRADE_SECRET_PREP(ctx) hook in meta.py, called BEFORE abra secret generate --all in perform_upgrade(). abra's --all is idempotent (skips existing secrets), so our correctly pre-inserted Docker secret survives the subsequent --all pass.

gitea's UPGRADE_SECRET_PREP uses docker secret create {STACK_NAME}_lfs_jwt_secret_v1 - with a Python-generated 43-char value: base64.urlsafe_b64encode(os.urandom(32)).rstrip(b"=").
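A simulation of why the ordering works, with an in-memory dict standing in for Docker's secret store and hypothetical helper names. The secret generator matches the expression above; the default generator passed in models abra's wrong-length fallback:

```python
import base64
import os
from typing import Callable, Dict, List


def lfs_jwt_secret() -> str:
    # 32 random bytes -> 44 base64 chars with one "=" pad -> 43 chars after rstrip.
    return base64.urlsafe_b64encode(os.urandom(32)).rstrip(b"=").decode()


def generate_all(store: Dict[str, str], names: List[str], default: Callable[[], str]) -> None:
    """Models the idempotent --all pass: only missing secrets are created."""
    for name in names:
        if name not in store:
            store[name] = default()


def upgrade_secrets(store: Dict[str, str], names: List[str], default: Callable[[], str]) -> None:
    # UPGRADE_SECRET_PREP runs first and inserts the correctly sized secret...
    store["lfs_jwt_secret_v1"] = lfs_jwt_secret()
    # ...so the later --all pass skips it instead of using the wrong default length.
    generate_all(store, names, default)
```

Any other secrets the recipe declares still come from the --all pass; only the one with a broken length hint is pre-seeded.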

Discovery: abra does NOT store STACK_NAME in the .env file. The Docker stack name is derived from the domain by replacing dots with underscores, as verified with docker stack ls:

  • drone.ci.commoninternet.net → drone_ci_commoninternet_net

Build #691 failed with "STACK_NAME not found" (tried to read from .env, key absent). Fixed in ad53b5a: derive STACK_NAME from ctx.domain.replace(".", "_").
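Stack-name derivation and the docker secret create invocation, sketched with hypothetical helpers. The naming rule matches the verified drone example; abra may treat very long domains differently (for example by truncating), which this sketch does not model:

```python
import subprocess
from typing import List


def stack_name(domain: str) -> str:
    """Stack name as observed in docker stack ls: dots replaced by underscores."""
    return domain.replace(".", "_")


def secret_create_argv(domain: str, secret: str, version: str = "v1") -> List[str]:
    # `docker secret create NAME -` reads the secret value from stdin.
    return ["docker", "secret", "create", f"{stack_name(domain)}_{secret}_{version}", "-"]


def create_secret(domain: str, secret: str, value: str) -> None:
    """Requires a reachable Docker swarm manager."""
    subprocess.run(secret_create_argv(domain, secret), input=value, text=True, check=True)
```

Passing the value on stdin keeps it out of the process argument list.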

Runs in this session

  • Build #691 (PR=1): FAIL — STACK_NAME not found in .env (fixed in ad53b5a)
  • Build #692 (RECIPE=drone REF=main): PASS level=5 — dep path confirmed after a121d2c changes
  • Build #695 (PR=1, STACK_NAME fix): IN FLIGHT

Ruff cleanup

All 9 gtea files + test_discovery.py + bridge/bridge.py reformatted/check-fixed. manifest.py B007 (unused loop variable path_path) fixed manually. scripts/lint.sh: PASS (verified on builder-clone @22:00Z).