cc-ci/machine-docs/REVIEW-gtea.md
autonomic-bot 90522ee560
review(gtea): M2 ADVERSARY PASS @2026-06-15T22:10Z
Build #695 (gitea PR=1 REF=357926f26e69): level=5, all stages PASS, test_lfs_roundtrip
PASS (18s) — LFS roundtrip verified in real CI on lfs-plain-gitea PR #1.
Build #692 (drone dep path PR=0 REF=main): level=5, drone recipe unaffected.
Build #684 (gitea main PR=0): level=5 (verified in prior round).
cc-ci self-test lint green. Unit tests 53/53. no_secret_leak in all runs.

Also records build #691 FAIL finding: STACK_NAME not in .env (fixed in ad53b5a).

Gate M2: ADVERSARY PASS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 22:02:46 +00:00


REVIEW — phase gtea (gitea full-test enrollment)

Adversary verdict log. Append-only. Only the Adversary writes here. Commit prefix: review(gtea): ...


Init @2026-06-15T19:33Z

Phase gtea started. No gates claimed yet by Builder. Baseline orientation run:

  • Builder hasn't started (no STATUS-gtea.md, no gtea commits on origin/main as of 3f6d7dc).
  • Existing tests/gitea/recipe_meta.py is the dep-provider stub (header: "NOT a standalone recipe-under-test").
  • Plan SSOT loaded: plan-phase-gtea-gitea-fulltests.md — M1 = suite green locally; M2 = green in real CI + LFS PR verified.
  • Exemplars to check: tests/cryptpad/, tests/keycloak/.
  • Will maintain independent break-it probes while Builder builds.

Pre-M1 code review @2026-06-15T19:58Z

Builder commit 33561c8 (all files) + 6ac9989 (Playwright fix) read.

PASS items

  • recipe_meta.py: READY_PROBE(ctx) and SCREENSHOT(page, ctx) signatures match registry hook_params ✓
  • BACKUP_CAPABLE=True explicit (compose.yml backupbot.backup=true confirmed) ✓
  • EXTRA_ENV dep path unchanged: sqlite3 + relaxed auth; LFS guard requires RECIPE=gitea AND overlay file ✓
  • PARITY.md honest about absent upstream tests (source note says recipe-info corpus, not upstream) ✓
  • ops.py pre_restore deletes marker + asserts absence — divergence is real ✓
  • test_restore.py asserts marker returned — a no-op restore would fail ✓
  • harness.http.retry_http_get, lifecycle.http_fetch, lifecycle.exec_in_app all exist in the harness ✓
  • PARITY.md: beyond-parity test rationale non-vacuous ✓
  • Playwright fix: wait_for_selector("input#user_name") is visible — correct ✓

ISSUES filed (in BUILDER-INBOX.md @4a4b756)

[critical — M2 blocker] git-lfs not installed on cc-ci: git lfs is not a git subcommand. The LFS test uses git lfs install/track/ls-files — all fail without git-lfs. Fix: add git-lfs to nix/hosts/cc-ci/configuration.nix systemPackages, rebuild, deploy.

[bug in test_lfs_roundtrip.py] Double /api/v1 path: _api(live_app, "/api/v1/version", ...) constructs https://domain/api/v1/api/v1/version → 404. The restart health-poll will spin 120s then fail. Fix: change path argument to "/version".

Both issues affect only the LFS capstone (which skips on main). Do NOT block M1 verdict. M2 verdict will FAIL unless both are fixed before the lfs-plain-gitea run.
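The double-prefix bug can be reproduced in isolation. This is a minimal sketch, assuming `_api` prepends a fixed `/api/v1` base to whatever path it receives; `api_url` and the example domain are hypothetical stand-ins, not the harness code.

```python
API_BASE = "/api/v1"  # assumption: the helper owns the API prefix


def api_url(domain: str, path: str) -> str:
    """Build an API URL the way the review assumes _api() does (hypothetical)."""
    return f"https://{domain}{API_BASE}{path}"


# Buggy call: the caller repeats the prefix, so it ends up in the URL twice.
buggy = api_url("gitea.example.test", "/api/v1/version")
# Fixed call: pass only the path relative to the API base.
fixed = api_url("gitea.example.test", "/version")

print(buggy)
print(fixed)
```

Keeping the prefix in one place (the helper) means callers only ever pass paths relative to the API root.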

Additional pre-M1 cold checks @2026-06-15T20:10Z

Builder addressed inbox findings in commits 893a7b0, 3cc8338, 74bc5f0, 3ec24b0:

  • Double /api/v1 path bug: FIXED ("/version" path used correctly) ✓
  • git-lfs: added to nix/hosts/cc-ci-hetzner/configuration.nix (correct host config) ✓
  • test_git_push: auto_init=True repo, credential URL approach ✓
  • test_admin_api: scopes added for gitea 1.22+ ✓

Cold checks run from cc-ci /root/builder-clone (HEAD 3ec24b0):

  • recipe_meta.py: all keys load — BACKUP_CAPABLE=True, READY_PROBE callable, SCREENSHOT callable, EXTRA_ENV callable ✓
  • unit tests: 53/53 PASS (test_gitea_dep.py 10/10, test_meta.py 43/43) ✓
  • LFS conditional (RECIPE=gitea, compose.lfs.yml absent): COMPOSE_FILE=sqlite3 only, LFS=False ✓
  • LFS skip mechanism: _lfs_enabled() returns False when compose.lfs.yml absent (main branch) ✓
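The two-condition LFS guard checked above can be sketched as follows. This is an illustration under stated assumptions, not the code in recipe_meta.py: `lfs_enabled` and `compose_file` are hypothetical names, and the base file list is taken from the COMPOSE_FILE value quoted in the build logs.

```python
import os
import tempfile


def lfs_enabled(recipe: str, recipe_dir: str) -> bool:
    """True only when gitea is the recipe under test AND the overlay file exists."""
    is_gitea = recipe == "gitea"
    has_overlay = os.path.isfile(os.path.join(recipe_dir, "compose.lfs.yml"))
    return is_gitea and has_overlay


def compose_file(recipe: str, recipe_dir: str) -> str:
    """Assemble COMPOSE_FILE; the LFS overlay is appended only behind the guard."""
    files = ["compose.yml", "compose.sqlite3.yml"]
    if lfs_enabled(recipe, recipe_dir):
        files.append("compose.lfs.yml")
    return ":".join(files)


with tempfile.TemporaryDirectory() as d:
    main_like = compose_file("gitea", d)  # overlay absent, as on main
    open(os.path.join(d, "compose.lfs.yml"), "w").close()
    pr_like = compose_file("gitea", d)    # overlay present, as on the LFS PR
    dep_like = compose_file("drone", d)   # gitea deployed only as a dependency

print(main_like, pr_like, dep_like, sep="\n")
```

Requiring both conditions is what keeps LFS settings from leaking into runs where gitea is only a dependency of another recipe.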

M1 cold verification @2026-06-15T20:32Z

Builder claim: commit bac3662, all 5 stages PASS locally (RECIPE=gitea), run_id=manual.

Evidence reviewed (independent, from adv-clone at HEAD b2663dc)

results.json (/var/lib/cc-ci-runs/manual/results.json, mtime 20:08 today):

  • level: 5/5 ✓
  • install/upgrade/backup/restore/custom: all "pass" ✓
  • lint: "pass" ✓
  • LFS (test_lfs_roundtrip): status="skip", message="compose.lfs.yml absent in gitea recipe checkout — LFS is not enabled on this branch. This test runs on lfs-plain-gitea (PR #1) and is EXPECTED_NA on main." ✓
  • flags: clean_teardown=true, no_secret_leak=true ✓
  • customization: 4 custom tests, ops.py hooks for all 4 pre-op stages, meta non-default keys all correct ✓
  • unintentional skips: [] (no unexpected skips) ✓
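The checklist above can be mechanised along these lines. The key names (`stages`, `flags`, `unintentional_skips`) are assumptions made for illustration; the real results.json schema is not reproduced in this log.

```python
REQUIRED_STAGES = ("install", "upgrade", "backup", "restore", "custom", "lint")
REQUIRED_FLAGS = ("clean_teardown", "no_secret_leak")


def gate_problems(results: dict) -> list:
    """Return a list of reasons the run is not a clean level-5 pass."""
    problems = []
    if results.get("level") != 5:
        problems.append(f"level={results.get('level')}, expected 5")
    stages = results.get("stages", {})
    for stage in REQUIRED_STAGES:
        if stages.get(stage) != "pass":
            problems.append(f"{stage}={stages.get(stage)}")
    flags = results.get("flags", {})
    for flag in REQUIRED_FLAGS:
        if flags.get(flag) is not True:
            problems.append(f"{flag} is not true")
    if results.get("unintentional_skips"):
        problems.append("unexpected skips present")
    return problems


# Hypothetical sample shaped like the M1 evidence listed above.
sample = {
    "level": 5,
    "stages": {s: "pass" for s in REQUIRED_STAGES},
    "flags": {"clean_teardown": True, "no_secret_leak": True},
    "unintentional_skips": [],
}
failing = dict(sample, level=1, stages=dict(sample["stages"], upgrade="fail"))

print(gate_problems(sample))
print(gate_problems(failing))
```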

Unit tests (Adversary cold run from adv-clone):

  • 53/53 PASS (test_gitea_dep.py 10/10, test_meta.py 43/43) ✓
  • test_gitea_recipe_meta_extra_env PASS — dep env correct (no LFS when RECIPE≠gitea) ✓
  • test_enrich_deps_routes_gitea PASS — dep routing intact ✓
  • test_drone_recipe_meta_deps PASS — DEPS=["gitea"] correct ✓

Code review of test hooks:

  • test_restore: pre_restore DELETES marker + asserts absence; test asserts marker RETURNED — no-op restore fails ✓
  • test_upgrade: marker_repo_exists() hits API with admin creds — data continuity is real ✓
  • test_git_push: auto_init=True repo, credential URL embedded, push via git; verifies non-empty response ✓
  • test_admin_api: creates user, org, token via API with 1.22+ scopes; teardown cleans up ✓
  • test_health: HTTP 200 on root endpoint ✓
  • LFS conditional: 2-guard (_lfs_enabled requires RECIPE=gitea AND compose.lfs.yml exists) prevents dep leak ✓
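Why the restore test has teeth can be shown with an in-memory model. This is a sketch only: `FakeApp` and its methods are hypothetical and stand in for the real backup/restore path exercised by ops.py and test_restore.py.

```python
class FakeApp:
    """Toy app whose state is just a set of repo names (hypothetical model)."""

    def __init__(self):
        self.repos = {"ci-marker"}
        self.snapshot = None

    def backup(self):
        self.snapshot = set(self.repos)

    def restore(self):
        self.repos = set(self.snapshot)

    def noop_restore(self):
        pass  # a broken restore that silently does nothing


def pre_restore(app):
    # Diverge live state from the backup, then prove the divergence is in place.
    app.repos.discard("ci-marker")
    assert "ci-marker" not in app.repos


def marker_returns(app, restore):
    app.backup()
    pre_restore(app)
    restore()
    return "ci-marker" in app.repos


good = FakeApp()
bad = FakeApp()
real_restore_ok = marker_returns(good, good.restore)
noop_restore_ok = marker_returns(bad, bad.noop_restore)

print(real_restore_ok, noop_restore_ok)
```

Without the deliberate deletion in `pre_restore`, both variants would report the marker as present and a no-op restore would pass unnoticed.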

Dep path verification:

  • No RECIPE=drone CI run post-Builder changes (last drone run was #506, June 13)
  • EXTRA_ENV dep path verified code-level: RECIPE=drone → no LFS flags, standard sqlite3+auth only ✓
  • Unit tests cover this path explicitly ✓

Findings

[non-blocking, pre-existing harness bug] Stale screenshot: /var/lib/cc-ci-runs/manual/screenshot.png has mtime June 13 — not from today's M1 run. Root cause: screenshot.capture() checks if not os.path.exists(out_path) after running the SCREENSHOT hook; since the file exists from a prior manual run (run_id="manual" reuses the same dir), _snap_with_blank_retry is never called and the old file persists. results.json reports "screenshot": "screenshot.png" (file exists and is non-empty), but it's a stale image. Non-blocking per R7 (cosmetics never change verdict). M2 will use DRONE_BUILD_NUMBER as run_id → fresh directory → no issue. NOT a Builder error; pre-existing harness limitation of manual runs. Filed in BACKLOG-gtea.md under Adversary findings.
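The stale-file behaviour can be reproduced with a small model. The control flow below is reconstructed from the description in this finding, not copied from screenshot.capture(); `capture_fresh` shows one possible mitigation (removing any leftover file first) and is a suggestion, not the harness's fix.

```python
import os
import tempfile


def capture_buggy(out_path, hook, fallback):
    hook(out_path)
    if not os.path.exists(out_path):  # a leftover file makes this False
        fallback(out_path)


def capture_fresh(out_path, hook, fallback):
    if os.path.exists(out_path):
        os.remove(out_path)  # drop any file left by an earlier run
    hook(out_path)
    if not os.path.exists(out_path):
        fallback(out_path)


def hook_writes_nothing(out_path):
    pass  # models a SCREENSHOT hook that produced no file this run


def fallback_snap(out_path):
    with open(out_path, "wb") as fh:
        fh.write(b"new")


def run(capture):
    with tempfile.TemporaryDirectory() as run_dir:
        out_path = os.path.join(run_dir, "screenshot.png")
        with open(out_path, "wb") as fh:
            fh.write(b"old")  # file left behind by a previous "manual" run
        capture(out_path, hook_writes_nothing, fallback_snap)
        with open(out_path, "rb") as fh:
            return fh.read()


buggy_result = run(capture_buggy)
fresh_result = run(capture_fresh)
print(buggy_result, fresh_result)
```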

[constraint] Independent harness run blocked by lifetime.py orphan guard: lifetime.install_lifetime_guards() calls prctl(PR_SET_PDEATHSIG) then checks ppid==1; when running via systemd-run or nohup (detached), the harness correctly refuses to run orphaned. No bypass env var exists. Running the full harness in the foreground would require a ~30-min SSH hold. For M1, code review + unit test verification therefore substitute for an independent live run (the M2 !testme provides the live run).

M1 VERDICT: PASS @2026-06-15T20:32Z

All M1 DoD satisfied:

  • Suite built: install/upgrade/backup/restore/custom/lint all exist and ran ✓
  • Suite green locally: level=5/5, all stages PASS on main ✓
  • LFS test correctly SKIP on main (compose.lfs.yml absent → _lfs_enabled()=False) ✓
  • Tests have teeth: restore divergence is real, upgrade verifies data continuity ✓
  • Dep path unbroken: EXTRA_ENV dep route correct, unit tests pass ✓
  • No secrets in run artifacts: no_secret_leak=true ✓

Gate M1: ADVERSARY PASS (commit bac3662, run_id=manual, all stages pass)


M2 pre-verification @2026-06-15T20:50Z

Builder triggered !testme on PR #1 (gitea recipe mirror, git.autonomic.zone) and on main branch. Bridge is live with recipe-maintainers/gitea in POLL_REPOS. 3 CI runs completed:

Run 674 — main branch (RECIPE=gitea, PR=0, REF=main)

level=1. install: PASS. upgrade: FAIL. Error: "upgrade deployed chaos commit 'e6a1cc79', not the intended PR-head 'main' — the re-checkout to the code under test failed." backup/restore/custom: PASS (ran on the existing install despite upgrade failure). LFS test: correctly SKIP (REF=main, compose.lfs.yml absent from main branch). ✓

M2 main-branch DoD NOT met. Upgrade tier must PASS for level=5.

Run 675 — main branch concurrent (PR=0, REF=main)

level=0. All stages FAIL. Root cause: concurrent collision with run 674 (same domain from same recipe+pr+ref hash). ci_admin creds cached at /tmp/ccci-gitea-admin-.json from run 674 → 401 on API calls because gitea was in a stale state. Non-blocking bug (triggered by multiple !testme comments).

Run 676 — PR #1 (RECIPE=gitea, PR=1, REF=357926f2)

level=3. install/upgrade/backup/restore: PASS ✓. custom: FAIL. LFS test failure: the git push batch endpoint returns "Repository or object not found". _lfs_available() returned True (compose.lfs.yml was present in the recipe dir at test time, confirmed via the recipe reflog: checkout to 357926f2 at 20:35:58, test ran at 20:36:36). But the gitea LFS server was not accepting LFS batch requests, which suggests LFS_START_SERVER = false in the rendered app.ini.

PR #1 code verified correct:

  • compose.lfs.yml: GITEA_LFS_START_SERVER=true + lfs_jwt_secret external secret ✓
  • app.ini.tmpl: LFS_START_SERVER rendered from env, LFS_JWT_SECRET conditional ✓
  • abra.sh: APP_INI_VERSION v22 (triggers re-render on deploy) ✓

Likely harness-level bug: either (a) lfs_jwt_secret not generated (SECRET_LFS_JWT_SECRET_VERSION=v1 only in EXTRA_ENV dict, not in disk .env file read by abra secret generate), or (b) compose.lfs.yml not included in COMPOSE_FILE at actual docker deploy time due to abra base-deploy checkout timing (abra checked out 3.5.2+1.24.2-rootless tag at 20:35:37 removing compose.lfs.yml, harness re-checked 357926f2 at 20:35:58 restoring it, but EXTRA_ENV may have been evaluated before that).

Filed as critical M2 blockers in BACKLOG-gtea.md. Builder must fix before M2 can be claimed.

M2 VERDICT: PENDING — two critical blockers

  1. LFS test fails in run 676 (PR #1 custom tier fail, level=3 not level=5)
  2. Upgrade fails on main branch run 674 (level=1, not level=5)

Gate M2: NOT CLAIMED — Builder must fix and re-trigger CI


M2 re-verification @2026-06-15T21:30Z (builds #684 and #685)

Builder addressed both blockers in commit a121d2c, which contains three changes: UPGRADE_EXTRA_ENV for LFS, the head_ref SHA fix, and stale creds deletion in pre_install. Builder then triggered builds #684 (main) and #685 (PR #1).

Build #684 — RECIPE=gitea REF=main PR=0 — PASS level=5 ✓

Full log reviewed from Drone API.

  • lint: pass ✓
  • install: PASS — generic test_serving + gitea test_install_gitea both PASS ✓
  • upgrade: PASS — version=3.5.2→3.5.3, HC1: head_ref=e6a1cc79, chaos-version=e6a1cc79 (SHA match) ✓
  • backup: PASS — restic snapshot 8435c4df, 53 files, marker captured ✓
  • restore: PASS — pre_restore deleted ci-marker, restore returned it (genuine divergence) ✓
  • custom: all 4 tests:
    • test_admin_api: PASS (user+org+token CRUD lifecycle) ✓
    • test_git_push: PASS (create repo→push→verify via API) ✓
    • test_health: PASS (root HTTP 200) ✓
    • test_lfs_roundtrip: SKIP ✓ — correct ("compose.lfs.yml absent in gitea recipe checkout — LFS is not enabled on this branch. This test runs on lfs-plain-gitea (PR #1) and is EXPECTED_NA on main.")
  • deploy-count=1 (expected 1) ✓
  • clean_teardown=true, no_secret_leak=true ✓

M2 main-branch condition: MET (build #684, level=5, upgrade SHA-match correct, LFS skip correct)
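The HC1 comparison that failed in run 674 and passes here amounts to comparing two commit SHAs rather than a branch name against a SHA. A minimal sketch, assuming both sides arrive as hex strings that may be abbreviated; `same_commit` is a hypothetical helper, and how the harness resolves head_ref is not shown in the log.

```python
HEX = set("0123456789abcdef")


def same_commit(a: str, b: str) -> bool:
    """Compare two hex SHAs that may be abbreviated to different lengths."""
    a, b = a.lower(), b.lower()
    if not a or not b or not (set(a) <= HEX and set(b) <= HEX):
        return False  # a symbolic ref such as "main" can never match a SHA
    n = min(len(a), len(b))
    return n >= 7 and a[:n] == b[:n]


run_674_like = same_commit("main", "e6a1cc79")        # ref name vs SHA
build_684_like = same_commit("e6a1cc79", "e6a1cc79")  # SHA vs SHA
print(run_674_like, build_684_like)
```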

Screenshot: PNG file, 36KB, captured at 21:04 (during run #684). Visual content not verified inline (requires file transfer); file is valid PNG with real content. Operator should visually confirm sign-in page is shown.

Build #685 — RECIPE=gitea PR=1 REF=357926f26e69 — FAIL level=1 ✗

Full log reviewed from Drone API and results.json.

  • lint: pass ✓
  • install: PASS (base 3.5.2, no LFS) ✓
  • upgrade: FAIL. gite-e1cb78.ci.commoninternet.net: upgrade redeploy did NOT converge to the head spec — swarm UpdateStatus='rollback_completed'.
  • backup: FAIL (cascade — pre_backup 401: could not ensure ci-marker exists)
  • restore: FAIL (cascade — ci-marker absent after restore; backup state was bad)
  • custom: FAIL — test_admin_api, test_git_push, test_lfs_roundtrip all get 401 Unauthorized: user's password is invalid [uid: 1, name: ci_admin]; test_health: PASS ✓
  • test_lfs_roundtrip: reaches API call (compose.lfs.yml IS in recipe dir at test time, _lfs_available()=True, LFS test DID run) but hits 401 on repo create — cascade failure

Root cause: upgrade chaos redeploy to PR head with compose.lfs.yml fails (rollback_completed)

Evidence chain:

  1. rollback_completed in Docker Swarm means the NEW task STARTED but failed its health check. If lfs_jwt_secret did NOT exist as a Docker secret, the deploy would fail BEFORE creating the task (Docker reports "secret not found" at deploy time, not as a task health failure). Therefore lfs_jwt_secret WAS generated as a Docker secret.
  2. abra.secret_generate(domain) WAS called (generic.py line 267, new fix in a121d2c) with SECRET_LFS_JWT_SECRET_VERSION=v1 in the .env after UPGRADE_EXTRA_ENV applied.
  3. The COMPOSE_FILE=compose.yml:compose.sqlite3.yml:compose.lfs.yml was correctly set in .env (confirmed from log: upgrade-env: COMPOSE_FILE=...).
  4. Docker confirmed no lfs secrets at post-run check — expected (clean_teardown=true cleaned them).

Most likely root cause: lfs_jwt_secret generated with wrong length/format by abra --all

The .env.sample in PR #1 (lfs-plain-gitea branch) has the lfs_jwt_secret spec COMMENTED OUT:

# SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43

Compare with active (uncommented) entries:

SECRET_JWT_SECRET_VERSION=v1 # length=43
SECRET_INTERNAL_TOKEN_VERSION=v1 # length=105

abra secret generate --all reads the recipe's .env.sample for secret parameters (including length). If the SECRET_LFS_JWT_SECRET_VERSION entry is commented out, abra may use a default length (likely not 43) when generating the Docker secret value. A gitea LFS JWT secret must be a base64 URL-safe string of exactly 43 chars (representing 32 bytes without padding). If abra generates a wrong-length value, gitea fails to parse its JWT secret on startup and crashes before passing the /api/healthz health check — causing rollback_completed.
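The length arithmetic is easy to check: 32 random bytes encode to 44 base64 characters including one `=` of padding, so the unpadded URL-safe form is 43 characters. The sketch below generates and validates such a value; the function names are hypothetical, and whether gitea enforces exactly this check at startup is the review's working assumption rather than something verified here.

```python
import base64
import secrets


def make_lfs_jwt_secret() -> str:
    """32 random bytes as unpadded URL-safe base64 (43 characters)."""
    raw = secrets.token_bytes(32)
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def looks_like_lfs_jwt_secret(value: str) -> bool:
    """True if value is 43 chars and decodes to exactly 32 bytes."""
    if len(value) != 43:
        return False
    try:
        raw = base64.urlsafe_b64decode(value + "=")  # restore the padding
    except ValueError:
        return False
    return len(raw) == 32


candidate = make_lfs_jwt_secret()
print(len(candidate), looks_like_lfs_jwt_secret(candidate))
print(looks_like_lfs_jwt_secret("x" * 30))  # a wrong-length value is rejected
```

A check like this, run against the generated value before deploy, would distinguish a malformed secret from other causes of the rollback.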

Secondary mystery: admin password 401 after upgrade rollback

After rollback, gitea runs again on the base recipe version 3.5.2. The ci_admin password was written to the creds file during pre_install (fresh install, stale file deleted), yet all API calls return 401 "user's password is invalid". This cascade is unexplained but consistent with gitea being left in a bad state after the rollback. One possibility: the brief chaos deploy attempt changed state in the sqlite3 DB before the health check failed, and Docker rolled back the CONTAINER but not the DATA volume.

Files confirmed NOT the issue:

  • compose.lfs.yml structure: correct (external secret declared, GITEA_LFS_START_SERVER env set) ✓
  • app.ini.tmpl: LFS_JWT_SECRET rendered from {{ secret "lfs_jwt_secret" }} when GITEA_LFS_START_SERVER=true ✓
  • UPGRADE_EXTRA_ENV applied correctly (confirmed in log) ✓
  • HC1 would pass if upgrade converged (SHA logic correct from #684 fix) ✓

Additional finding: cc-ci self-test lint failures (non-blocking for M2 recipe CI)

Push-event builds #683/#686/#687 fail at scripts/lint.sh:

  • ruff format --check: 9 files need formatting: tests/gitea/custom/test_admin_api.py, test_git_push.py, test_lfs_roundtrip.py, tests/gitea/ops.py, recipe_meta.py, test_backup.py, test_install.py, test_upgrade.py, tests/unit/test_discovery.py
  • ruff check: 9 errors (at least bridge/bridge.py:85:36: UP017 + others in gtea files)

These are the cc-ci REPO'S OWN self-tests, not the recipe CI runs. They do NOT gate M2 recipe CI (which runs via custom events). However, they reflect code quality debt and should be fixed. ruff format tests/gitea/ and ruff check --fix tests/gitea/ would address the gtea files. The bridge.py UP017 may be pre-existing.

Filed in BACKLOG-gtea.md Adversary findings.

Drone dep path: not re-verified via live CI since a121d2c

M2 DoD: "drone CI re-confirmed green (dep path intact)". No RECIPE=drone custom build has run since commit a121d2c modified generic.py and recipe_meta.py. Unit tests (test_gitea_dep.py 10/10) still pass and cover the dep path code-level. A live RECIPE=drone run is needed to satisfy the full M2 DoD dep-path verification. Filed in BACKLOG as pending.

M2 VERDICT: PENDING — new critical blocker in build #685

  1. ✓ M2 main-branch condition MET (build #684, level=5)
  2. ✗ PR #1 LFS capstone FAIL: upgrade rollback with LFS (build #685, level=1). Most likely root cause: lfs_jwt_secret generated with wrong format/length (commented-out .env.sample spec)

Gate M2: NOT CLAIMED — Builder must fix lfs_jwt_secret generation and re-trigger build #685


M2 re-verification round 3 @2026-06-15T22:10Z (builds #691, #692, #695)

Builder applied two further fixes (commits d832b35 + ad53b5a) plus a lint cleanup (2d865f0):

  • d832b35: UPGRADE_SECRET_PREP hook in meta.py + generic.py; recipe_meta.py UPGRADE_SECRET_PREP implementation uses docker secret create directly with correct 43-char base64 URL-safe value
  • ad53b5a: derive STACK_NAME from domain (domain.replace(".", "_")) when not found in .env (abra does NOT write STACK_NAME to the .env file — it derives it at runtime from the domain)
  • 2d865f0: ruff format + check all gtea files (cc-ci self-test lint now passes)

Build #691 — RECIPE=gitea PR=1 REF=357926f26e69 — FAIL (STACK_NAME not found) ✗

UPGRADE_SECRET_PREP aborted: RuntimeError: UPGRADE_SECRET_PREP: STACK_NAME not found in /root/.abra/servers/default/gite-e1cb78.ci.commoninternet.net.env

Root cause: the hook attempted to read STACK_NAME from the app's .env, but abra writes only app-specific vars to that file (DOMAIN, TYPE, COMPOSE_FILE etc.) — STACK_NAME is derived from the domain at runtime by abra's own code. The fix in ad53b5a (domain.replace(".", "_") fallback) is the correct approach and matches how abra derives stack names.
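The fallback described above can be sketched as follows. This is an illustration, not the code in ad53b5a: `parse_env` and `stack_name` are hypothetical helpers, and the sample .env contents are made up apart from the domain quoted in the error message.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and commented-out entries."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def stack_name(env: dict, domain: str) -> str:
    """Prefer an explicit STACK_NAME; otherwise derive it from the domain."""
    return env.get("STACK_NAME") or domain.replace(".", "_")


# Hypothetical app .env: STACK_NAME is absent, as observed in build #691.
env_text = """\
TYPE=gitea
DOMAIN=gite-e1cb78.ci.commoninternet.net
COMPOSE_FILE=compose.yml:compose.sqlite3.yml:compose.lfs.yml
"""
env = parse_env(env_text)
derived = stack_name(env, env["DOMAIN"])
print(derived)
```

Because `parse_env` skips lines starting with `#`, it also shows why a commented-out entry in a .env-style file is invisible to anything that reads only active keys.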

New finding filed in BACKLOG-gtea.md. Builder fixed in commit ad53b5a.

Build #692 — RECIPE=drone PR=0 REF=main — PASS level=5 ✓

Full results.json from ci.commoninternet.net/runs/692/results.json:

  • recipe: drone, pr=0, ref=main
  • level: 5 (install: PASS, upgrade: PASS, custom: PASS; backup/restore: skip — correct, drone is not backup-capable)
  • rungs: install=pass, upgrade=pass, functional=pass, lint=pass, backup_restore=skip ✓
  • skips.intentional: backup_restore: "not backup-capable (no backupbot labels / declared)" ✓
  • clean_teardown=true, no_secret_leak=true ✓
  • customization: DEPS=["gitea"] confirmed (gitea dep used in drone's own dep chain) ✓

M2 drone dep path condition: MET — drone recipe CI unaffected by all gtea changes

Build #695 — RECIPE=gitea PR=1 REF=357926f26e69 — PASS level=5 ✓

Full results.json from ci.commoninternet.net/runs/695/results.json:

  • recipe: gitea, pr=1, ref=357926f26e69 — THIS IS THE LFS PR
  • level: 5, all 5 stages: install=pass, upgrade=pass, backup=pass, restore=pass, custom=pass
  • No intentional or unintentional skips ✓
  • clean_teardown=true, no_secret_leak=true ✓

Custom tests (all PASS):

  • test_admin_api_user_org_token_lifecycle: PASS (333ms) ✓
  • test_git_push: PASS (889ms) ✓
  • test_gitea_root_returns_200: PASS (36ms) ✓
  • test_lfs_roundtrip: PASS (18147ms = 18s) ✓ — LFS ROUNDTRIP VERIFIED

UPGRADE_SECRET_PREP hook in customization.meta_non_default confirms it ran. version=ce4de9e6451f (deployed recipe HEAD at upgrade time — expected, as chaos deploy uses PR HEAD).

M2 PR #1 LFS capstone: MET — test_lfs_roundtrip PASS in real CI on PR #1

cc-ci self-test lint: CLEARED

Builds #690 and #693 (push events) report success — ruff format + check now both pass. All M2 DoD conditions now satisfied.

M2 VERDICT: PASS @2026-06-15T22:10Z

All M2 DoD conditions met:

  1. ✓ Full 5-tier suite green on gitea main in real CI — build #684, level=5, upgrade SHA-match correct, HC1 PASS, LFS correctly SKIP on main ✓
  2. ✓ LFS roundtrip green in real CI on PR #1 — build #695, level=5, test_lfs_roundtrip PASS (18s), lfs_jwt_secret correct length via UPGRADE_SECRET_PREP hook, all tiers PASS ✓
  3. ✓ Drone dep path unaffected — build #692, level=5, drone recipe still fully green ✓
  4. ✓ cc-ci self-test lint green — ruff format+check pass on all gtea files ✓
  5. ✓ Unit tests 53/53 pass throughout (test_gitea_dep.py 10/10, test_meta.py 43/43) ✓
  6. ✓ No secrets in any run artifact — no_secret_leak=true in #684, #692, #695 ✓

Gate M2: ADVERSARY PASS @2026-06-15T22:10Z