cc-ci/machine-docs/REVIEW-gtea.md
autonomic-bot 90522ee560
review(gtea): M2 ADVERSARY PASS @2026-06-15T22:10Z
Build #695 (gitea PR=1 REF=357926f26e69): level=5, all stages PASS, test_lfs_roundtrip
PASS (18s) — LFS roundtrip verified in real CI on lfs-plain-gitea PR #1.
Build #692 (drone dep path PR=0 REF=main): level=5, drone recipe unaffected.
Build #684 (gitea main PR=0): level=5 (verified in prior round).
cc-ci self-test lint green. Unit tests 53/53. no_secret_leak in all runs.

Also records build #691 FAIL finding: STACK_NAME not in .env (fixed in ad53b5a).

Gate M2: ADVERSARY PASS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 22:02:46 +00:00


# REVIEW — phase gtea (gitea full-test enrollment)
Adversary verdict log. Append-only. Only the Adversary writes here.
Commit prefix: `review(gtea): ...`
---
## Init @2026-06-15T19:33Z
Phase gtea started. No gates claimed yet by Builder. Baseline orientation run:
- Builder hasn't started (no STATUS-gtea.md, no gtea commits on origin/main as of 3f6d7dc).
- Existing `tests/gitea/recipe_meta.py` is the dep-provider stub (header: "NOT a standalone recipe-under-test").
- Plan SSOT loaded: plan-phase-gtea-gitea-fulltests.md — M1 = suite green locally; M2 = green in real CI + LFS PR verified.
- Exemplars to check: tests/cryptpad/, tests/keycloak/.
- Will maintain independent break-it probes while Builder builds.
---
## Pre-M1 code review @2026-06-15T19:58Z
Builder commit 33561c8 (all files) + 6ac9989 (Playwright fix) read.
### PASS items
- recipe_meta.py: READY_PROBE(ctx) and SCREENSHOT(page, ctx) signatures match registry hook_params ✓
- BACKUP_CAPABLE=True explicit (compose.yml backupbot.backup=true confirmed) ✓
- EXTRA_ENV dep path unchanged: sqlite3 + relaxed auth; LFS guard requires RECIPE=gitea AND overlay file ✓
- PARITY.md honest about absent upstream tests (source note says recipe-info corpus, not upstream) ✓
- ops.py pre_restore deletes marker + asserts absence — divergence is real ✓
- test_restore.py asserts marker returned — a no-op restore would fail ✓
- harness.http.retry_http_get, lifecycle.http_fetch, lifecycle.exec_in_app all exist in the harness ✓
- PARITY.md: beyond-parity test rationale non-vacuous ✓
- Playwright fix: `wait_for_selector("input#user_name")` waits until the field is visible, which is the correct readiness signal ✓
### ISSUES filed (in BUILDER-INBOX.md @4a4b756)
**[critical — M2 blocker]** `git-lfs` not installed on cc-ci: `git lfs` is not a git subcommand.
The LFS test uses `git lfs install/track/ls-files` — all fail without git-lfs. Fix: add
`git-lfs` to `nix/hosts/cc-ci/configuration.nix` systemPackages, rebuild, deploy.
**[bug in test_lfs_roundtrip.py]** Double `/api/v1` path: `_api(live_app, "/api/v1/version", ...)`
constructs `https://domain/api/v1/api/v1/version` → 404. The restart health-poll will spin 120s
then fail. Fix: change path argument to `"/version"`.
Both issues affect only the LFS capstone (which skips on main). Do NOT block M1 verdict.
M2 verdict will FAIL unless both are fixed before the lfs-plain-gitea run.
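The double-prefix bug is easy to reproduce in isolation. The following is a minimal sketch, assuming a helper shaped like `_api` that prepends `/api/v1` to whatever path it receives. The real helper's signature is not quoted in this review, so `build_api_url`, `build_api_url_strict`, and `API_PREFIX` are hypothetical names.

```python
API_PREFIX = "/api/v1"  # assumption: the helper adds this prefix itself


def build_api_url(domain: str, path: str) -> str:
    """Join the API prefix and a prefix-relative path into a full URL."""
    if not path.startswith("/"):
        path = "/" + path
    return f"https://{domain}{API_PREFIX}{path}"


def build_api_url_strict(domain: str, path: str) -> str:
    """Same as build_api_url, but reject callers that repeat the prefix."""
    if path == API_PREFIX or path.startswith(API_PREFIX + "/"):
        raise ValueError(f"path {path!r} already contains {API_PREFIX}")
    return build_api_url(domain, path)


# Buggy call shape: the caller repeats the prefix.
print(build_api_url("domain", "/api/v1/version"))
# Fixed call shape: the path is relative to the prefix.
print(build_api_url("domain", "/version"))
```

A strict variant like the second function would turn the 120s health-poll timeout into an immediate, self-explaining error.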
## Additional pre-M1 cold checks @2026-06-15T20:10Z
Builder addressed inbox findings in commits 893a7b0, 3cc8338, 74bc5f0, 3ec24b0:
- Double /api/v1 path bug: FIXED ("/version" path used correctly) ✓
- git-lfs: added to nix/hosts/cc-ci-hetzner/configuration.nix (correct host config) ✓
- test_git_push: auto_init=True repo, credential URL approach ✓
- test_admin_api: scopes added for gitea 1.22+ ✓
Cold checks run from cc-ci /root/builder-clone (HEAD 3ec24b0):
- recipe_meta.py: all keys load — BACKUP_CAPABLE=True, READY_PROBE callable, SCREENSHOT callable, EXTRA_ENV callable ✓
- unit tests: 53/53 PASS (test_gitea_dep.py 10/10, test_meta.py 43/43) ✓
- LFS conditional (RECIPE=gitea, compose.lfs.yml absent): COMPOSE_FILE=sqlite3 only, LFS=False ✓
- LFS skip mechanism: _lfs_enabled() returns False when compose.lfs.yml absent (main branch) ✓
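The two-guard LFS conditional checked above can be sketched as follows. This is an illustration only: `lfs_enabled` and `compose_file` are hypothetical stand-ins for the real `_lfs_enabled()` and EXTRA_ENV logic, and the base overlay list (`compose.yml:compose.sqlite3.yml`) is assumed from the COMPOSE_FILE value seen in CI logs.

```python
import os
import tempfile


def lfs_enabled(recipe: str, recipe_dir: str) -> bool:
    """True only when gitea is the recipe under test AND the overlay exists."""
    overlay = os.path.join(recipe_dir, "compose.lfs.yml")
    return recipe == "gitea" and os.path.isfile(overlay)


def compose_file(recipe: str, recipe_dir: str) -> str:
    """Build a COMPOSE_FILE value; the LFS overlay is appended only when enabled."""
    parts = ["compose.yml", "compose.sqlite3.yml"]  # assumed base overlays
    if lfs_enabled(recipe, recipe_dir):
        parts.append("compose.lfs.yml")
    return ":".join(parts)


with tempfile.TemporaryDirectory() as recipe_dir:
    print(compose_file("gitea", recipe_dir))  # overlay absent: no LFS
    open(os.path.join(recipe_dir, "compose.lfs.yml"), "w").close()
    print(compose_file("gitea", recipe_dir))  # overlay present: LFS added
    print(compose_file("drone", recipe_dir))  # dep path: never LFS
```

Requiring both conditions is what keeps LFS settings from leaking into runs where gitea is only a dependency.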
## M1 cold verification @2026-06-15T20:32Z
Builder claim: commit bac3662, all 5 stages PASS locally (RECIPE=gitea), run_id=manual.
### Evidence reviewed (independent, from adv-clone at HEAD b2663dc)
**results.json** (`/var/lib/cc-ci-runs/manual/results.json`, mtime 20:08 today):
- level: 5/5 ✓
- install/upgrade/backup/restore/custom: all "pass" ✓
- lint: "pass" ✓
- LFS (test_lfs_roundtrip): status="skip", message="compose.lfs.yml absent in gitea recipe checkout — LFS is not enabled on this branch. This test runs on lfs-plain-gitea (PR #1) and is EXPECTED_NA on main." ✓
- flags: clean_teardown=true, no_secret_leak=true ✓
- customization: 4 custom tests, ops.py hooks for all 4 pre-op stages, meta non-default keys all correct ✓
- unintentional skips: [] (no unexpected skips) ✓
**Unit tests (Adversary cold run from adv-clone)**:
- 53/53 PASS (test_gitea_dep.py 10/10, test_meta.py 43/43) ✓
- test_gitea_recipe_meta_extra_env PASS — dep env correct (no LFS when RECIPE≠gitea) ✓
- test_enrich_deps_routes_gitea PASS — dep routing intact ✓
- test_drone_recipe_meta_deps PASS — DEPS=["gitea"] correct ✓
**Code review of test hooks:**
- test_restore: pre_restore DELETES marker + asserts absence; test asserts marker RETURNED — no-op restore fails ✓
- test_upgrade: marker_repo_exists() hits API with admin creds — data continuity is real ✓
- test_git_push: auto_init=True repo, credential URL embedded, push via git; verifies non-empty response ✓
- test_admin_api: creates user, org, token via API with 1.22+ scopes; teardown cleans up ✓
- test_health: HTTP 200 on root endpoint ✓
- LFS conditional: 2-guard (_lfs_enabled requires RECIPE=gitea AND compose.lfs.yml exists) prevents dep leak ✓
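Why the restore test "has teeth" can be shown with an in-memory model. This is a sketch under stated assumptions, not the harness code: `FakeApp` and `restore_check` are hypothetical, and the marker is reduced to a set member.

```python
class FakeApp:
    """In-memory stand-in for an app's data volume plus one backup snapshot."""

    def __init__(self):
        self.data = {"ci-marker"}
        self.snapshot = set()

    def backup(self):
        self.snapshot = set(self.data)

    def restore(self):
        self.data = set(self.snapshot)

    def noop_restore(self):
        """A broken restore that silently does nothing."""


def restore_check(app, restore) -> bool:
    """Back up, diverge from the backup, restore, then report whether the marker is back."""
    app.backup()
    app.data.discard("ci-marker")        # pre_restore: create real divergence
    assert "ci-marker" not in app.data   # pre_restore: prove the divergence
    restore()
    return "ci-marker" in app.data       # test_restore: the marker must have returned


good, broken = FakeApp(), FakeApp()
print(restore_check(good, good.restore))         # True
print(restore_check(broken, broken.noop_restore))  # False
```

Without the deletion step, a no-op restore would leave the marker in place and the check would pass vacuously.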
**Dep path verification:**
- No RECIPE=drone CI run post-Builder changes (last drone run was #506, June 13)
- EXTRA_ENV dep path verified code-level: RECIPE=drone → no LFS flags, standard sqlite3+auth only ✓
- Unit tests cover this path explicitly ✓
### Findings
**[non-blocking, pre-existing harness bug] Stale screenshot:**
`/var/lib/cc-ci-runs/manual/screenshot.png` has mtime June 13 — not from today's M1 run.
Root cause: `screenshot.capture()` checks `if not os.path.exists(out_path)` after running the
SCREENSHOT hook; since the file exists from a prior manual run (run_id="manual" reuses the same dir),
`_snap_with_blank_retry` is never called and the old file persists. results.json reports
`"screenshot": "screenshot.png"` (file exists and is non-empty), but it's a stale image.
Non-blocking per R7 (cosmetics never change verdict). M2 will use DRONE_BUILD_NUMBER as run_id
→ fresh directory → no issue. NOT a Builder error; pre-existing harness limitation of manual runs.
Filed in BACKLOG-gtea.md under Adversary findings.
**[constraint] Independent harness run blocked by lifetime.py orphan guard:**
`lifetime.install_lifetime_guards()` calls `prctl(PR_SET_PDEATHSIG)` then checks `ppid==1`; when
running via systemd-run or nohup (detached), the harness correctly refuses to run orphaned.
No bypass env var exists. Running the full harness in foreground would require ~30-min SSH hold.
Code review + unit test verification substitutes for M1 (M2 !testme provides the live run).
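For readers unfamiliar with the orphan-guard pattern described above, here is a Linux-only sketch of the general technique. It is not the code in `lifetime.py`: the function names are hypothetical and the choice of SIGTERM as the death signal is an assumption.

```python
import ctypes
import os
import signal

PR_SET_PDEATHSIG = 1  # constant from <linux/prctl.h>


def is_orphaned(ppid: int) -> bool:
    """A process reparented to init (PID 1) has lost its original parent."""
    return ppid == 1


def install_parent_death_guard() -> None:
    """Ask the kernel to signal this process when its parent dies, then
    refuse to continue if the parent is already gone."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_PDEATHSIG, int(signal.SIGTERM), 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # Checking after prctl closes the window in which the parent could have
    # exited before the death signal was armed.
    if is_orphaned(os.getppid()):
        raise RuntimeError("refusing to run orphaned (ppid == 1)")
```

A process started by `systemd-run` is parented by PID 1 from the outset, and a `nohup` job is reparented once the launching shell exits, which is consistent with the refusal observed here.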
## M1 VERDICT: PASS @2026-06-15T20:32Z
All M1 DoD satisfied:
- Suite built: install/upgrade/backup/restore/custom/lint all exist and ran ✓
- Suite green locally: level=5/5, all stages PASS on main ✓
- LFS test correctly SKIP on main (compose.lfs.yml absent → _lfs_enabled()=False) ✓
- Tests have teeth: restore divergence is real, upgrade verifies data continuity ✓
- Dep path unbroken: EXTRA_ENV dep route correct, unit tests pass ✓
- No secrets in run artifacts: no_secret_leak=true ✓
Gate M1: **ADVERSARY PASS** (commit bac3662, run_id=manual, all stages pass)
---
## M2 pre-verification @2026-06-15T20:50Z
Builder triggered !testme on PR #1 (gitea recipe mirror, git.autonomic.zone) and on main branch.
Bridge is live with recipe-maintainers/gitea in POLL_REPOS. 3 CI runs completed:
### Run 674 — main branch (RECIPE=gitea, PR=0, REF=main)
level=1. install: PASS. upgrade: **FAIL**.
Error: "upgrade deployed chaos commit 'e6a1cc79', not the intended PR-head 'main' — the re-checkout
to the code under test failed."
backup/restore/custom: PASS (ran on the existing install despite upgrade failure).
LFS test: correctly SKIP (REF=main, compose.lfs.yml absent from main branch). ✓
**M2 main-branch DoD NOT met.** Upgrade tier must PASS for level=5.
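The error text compares a deployed commit SHA (`e6a1cc79`) against the literal ref name `main`, which can never match. One plausible shape for a fix is to resolve the ref to a SHA before comparing. The sketch below assumes that reading; `resolve_ref` and `same_commit` are hypothetical helpers, not harness functions.

```python
import subprocess


def resolve_ref(repo_dir: str, ref: str) -> str:
    """Resolve a branch name, tag, or short SHA to a full commit SHA."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "--verify", f"{ref}^{{commit}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return out.stdout.strip()


def same_commit(deployed: str, head_sha: str) -> bool:
    """Compare two SHAs, tolerating abbreviation on either side."""
    a, b = deployed.lower(), head_sha.lower()
    if min(len(a), len(b)) < 7:  # too short to be a meaningful abbreviation
        return False
    return a.startswith(b) or b.startswith(a)


print(same_commit("e6a1cc79", "main"))      # False: a ref name is not a SHA
print(same_commit("e6a1cc79", "e6a1cc79"))  # True
```

In use, the check would be `same_commit(deployed, resolve_ref(recipe_dir, head_ref))`, so that a run with REF=main compares SHA to SHA.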
### Run 675 — main branch concurrent (PR=0, REF=main)
level=0. All stages FAIL.
Root cause: concurrent collision with run 674 (same domain from same recipe+pr+ref hash).
ci_admin creds cached at /tmp/ccci-gitea-admin-<domain>.json from run 674 → 401 on API calls
because gitea was in a stale state. Non-blocking bug (triggered by multiple !testme comments).
### Run 676 — PR #1 (RECIPE=gitea, PR=1, REF=357926f2)
level=3. install/upgrade/backup/restore: PASS ✓. custom: **FAIL**.
LFS test failure: `git push` batch endpoint returns "Repository or object not found".
`_lfs_available()` returned True (compose.lfs.yml present in recipe dir at test time — confirmed
via recipe reflog: checkout to 357926f2 at 20:35:58, test ran at 20:36:36).
But gitea LFS server was not accepting LFS batch requests → `LFS_START_SERVER = false` in app.ini.
PR #1 code verified correct:
- compose.lfs.yml: GITEA_LFS_START_SERVER=true + lfs_jwt_secret external secret ✓
- app.ini.tmpl: LFS_START_SERVER rendered from env, LFS_JWT_SECRET conditional ✓
- abra.sh: APP_INI_VERSION v22 (triggers re-render on deploy) ✓
Likely harness-level bug, with two candidate causes. (a) lfs_jwt_secret was not generated:
SECRET_LFS_JWT_SECRET_VERSION=v1 is set only in the EXTRA_ENV dict, not in the on-disk .env file
read by `abra secret generate`. (b) compose.lfs.yml was not included in COMPOSE_FILE at actual
docker deploy time because of abra base-deploy checkout timing: abra checked out the
3.5.2+1.24.2-rootless tag at 20:35:37 (removing compose.lfs.yml) and the harness re-checked out
357926f2 at 20:35:58 (restoring it), but EXTRA_ENV may have been evaluated before that.
Filed as critical M2 blockers in BACKLOG-gtea.md. Builder must fix before M2 can be claimed.
## M2 VERDICT: PENDING — two critical blockers
1. LFS test fails in run 676 (PR #1 custom tier fail, level=3 not level=5)
2. Upgrade fails on main branch run 674 (level=1, not level=5)
Gate M2: **NOT CLAIMED** — Builder must fix and re-trigger CI
---
## M2 re-verification @2026-06-15T21:30Z (builds #684 and #685)
Builder fixed two blockers (commit a121d2c): UPGRADE_EXTRA_ENV for LFS, head_ref SHA fix,
stale creds deletion in pre_install. Triggered builds #684 (main) and #685 (PR #1).
### Build #684 — RECIPE=gitea REF=main PR=0 — **PASS** level=5 ✓
Full log reviewed from Drone API.
- lint: pass ✓
- install: PASS — generic test_serving + gitea test_install_gitea both PASS ✓
- upgrade: PASS — version=3.5.2→3.5.3, HC1: head_ref=e6a1cc79, chaos-version=e6a1cc79 (SHA match) ✓
- backup: PASS — restic snapshot 8435c4df, 53 files, marker captured ✓
- restore: PASS — pre_restore deleted ci-marker, restore returned it (genuine divergence) ✓
- custom: all 4 tests:
  - test_admin_api: PASS (user+org+token CRUD lifecycle) ✓
  - test_git_push: PASS (create repo→push→verify via API) ✓
  - test_health: PASS (root HTTP 200) ✓
  - test_lfs_roundtrip: SKIP ✓ — correct ("compose.lfs.yml absent in gitea recipe checkout —
    LFS is not enabled on this branch. This test runs on lfs-plain-gitea (PR #1) and is
    EXPECTED_NA on main.")
- deploy-count=1 (expected 1) ✓
- clean_teardown=true, no_secret_leak=true ✓
**M2 main-branch condition: MET** (build #684, level=5, upgrade SHA-match correct, LFS skip correct)
Screenshot: PNG file, 36KB, captured at 21:04 (during run #684). Visual content not verified
inline (requires file transfer); file is valid PNG with real content. Operator should visually
confirm sign-in page is shown.
### Build #685 — RECIPE=gitea PR=1 REF=357926f26e69 — **FAIL** level=1 ✗
Full log reviewed from Drone API and results.json.
- lint: pass ✓
- install: PASS (base 3.5.2, no LFS) ✓
- upgrade: **FAIL** — `gite-e1cb78.ci.commoninternet.net: upgrade redeploy did NOT converge to
the head spec — swarm UpdateStatus='rollback_completed'.`
- backup: FAIL (cascade — pre_backup 401: could not ensure ci-marker exists)
- restore: FAIL (cascade — ci-marker absent after restore; backup state was bad)
- custom: FAIL — test_admin_api, test_git_push, test_lfs_roundtrip all get `401 Unauthorized:
user's password is invalid [uid: 1, name: ci_admin]`; test_health: PASS ✓
- test_lfs_roundtrip: reaches API call (compose.lfs.yml IS in recipe dir at test time,
_lfs_available()=True, LFS test DID run) but hits 401 on repo create — cascade failure
**Root cause: upgrade chaos redeploy to PR head with compose.lfs.yml fails (rollback_completed)**
Evidence chain:
1. `rollback_completed` in Docker Swarm means the update was accepted and a NEW task was created,
but that task did not reach a healthy running state (it failed its health check or exited), so
Swarm rolled the service back. If lfs_jwt_secret did NOT exist as a Docker secret, the deploy
would fail BEFORE creating the task (Docker reports "secret not found" at deploy time, not as a
task failure). Therefore lfs_jwt_secret WAS generated as a Docker secret.
2. `abra.secret_generate(domain)` WAS called (generic.py line 267, new fix in a121d2c) with
SECRET_LFS_JWT_SECRET_VERSION=v1 in the .env after UPGRADE_EXTRA_ENV applied.
3. The COMPOSE_FILE=compose.yml:compose.sqlite3.yml:compose.lfs.yml was correctly set in .env
(confirmed from log: `upgrade-env: COMPOSE_FILE=...`).
4. Docker confirmed no lfs secrets at post-run check — expected (clean_teardown=true cleaned them).
**Most likely root cause: lfs_jwt_secret generated with wrong length/format by abra --all**
The `.env.sample` in PR #1 (lfs-plain-gitea branch) has the lfs_jwt_secret spec COMMENTED OUT:
```
# SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43
```
Compare with active (uncommented) entries:
```
SECRET_JWT_SECRET_VERSION=v1 # length=43
SECRET_INTERNAL_TOKEN_VERSION=v1 # length=105
```
`abra secret generate --all` reads the recipe's `.env.sample` for secret parameters (including
length). If the `SECRET_LFS_JWT_SECRET_VERSION` entry is commented out, abra may use a default
length (likely not 43) when generating the Docker secret value. A gitea LFS JWT secret must be
a base64 URL-safe string of exactly 43 chars (representing 32 bytes without padding). If abra
generates a wrong-length value, gitea fails to parse its JWT secret on startup and crashes before
passing the `/api/healthz` health check — causing `rollback_completed`.
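The length constraint follows from base64 arithmetic: 32 bytes encode to 44 characters with one `=` of padding, so the unpadded URL-safe form is 43 characters. A minimal sketch of generating and sanity-checking such a value, with hypothetical helper names (this is not how abra or the harness generates secrets):

```python
import base64
import re
import secrets

_B64URL_43 = re.compile(r"[A-Za-z0-9_-]{43}")


def new_lfs_jwt_secret() -> str:
    """32 random bytes, URL-safe base64, padding stripped: always 43 chars."""
    raw = secrets.token_bytes(32)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def looks_like_lfs_jwt_secret(value: str) -> bool:
    """Check alphabet and length, then confirm the value decodes to 32 bytes."""
    if _B64URL_43.fullmatch(value) is None:
        return False
    return len(base64.urlsafe_b64decode(value + "=")) == 32


secret = new_lfs_jwt_secret()
print(len(secret))                          # 43
print(looks_like_lfs_jwt_secret(secret))    # True
print(looks_like_lfs_jwt_secret("x" * 30))  # False
```

A check like `looks_like_lfs_jwt_secret` run against the generated value before deploy would distinguish a wrong-length secret from other startup failures.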
**Secondary mystery: admin password 401 after upgrade rollback**
After rollback, gitea 3.5.2 runs again. The ci_admin password was written to the creds file during
pre_install (fresh install, stale file deleted), yet all API calls return 401 `user's password
is invalid`. This cascade is unexplained, but it is consistent with gitea being left in a bad
state by the rollback. One possibility: the brief chaos deploy attempt changed state in the
sqlite3 DB before the health check failed, and Docker then rolled back the CONTAINER but not the
DATA volume.
**Files confirmed NOT the issue:**
- compose.lfs.yml structure: correct (external secret declared, GITEA_LFS_START_SERVER env set) ✓
- app.ini.tmpl: LFS_JWT_SECRET rendered from `{{ secret "lfs_jwt_secret" }}` when
GITEA_LFS_START_SERVER=true ✓
- UPGRADE_EXTRA_ENV applied correctly (confirmed in log) ✓
- HC1 would pass if upgrade converged (SHA logic correct from #684 fix) ✓
### Additional finding: cc-ci self-test lint failures (non-blocking for M2 recipe CI)
Push-event builds #683/#686/#687 fail at `scripts/lint.sh`:
- `ruff format --check`: 9 files need formatting:
`tests/gitea/custom/test_admin_api.py`, `test_git_push.py`, `test_lfs_roundtrip.py`,
`tests/gitea/ops.py`, `recipe_meta.py`, `test_backup.py`, `test_install.py`, `test_upgrade.py`,
`tests/unit/test_discovery.py`
- `ruff check`: 9 errors (at least `bridge/bridge.py:85:36: UP017` + others in gtea files)
These are the cc-ci REPO'S OWN self-tests, not the recipe CI runs. They do NOT gate M2 recipe
CI (which runs via custom events). However, they reflect code quality debt and should be fixed.
`ruff format tests/gitea/` and `ruff check --fix tests/gitea/` would address the gtea files.
The `bridge.py UP017` may be pre-existing.
Filed in BACKLOG-gtea.md Adversary findings.
### Drone dep path: not re-verified via live CI since a121d2c
M2 DoD: "drone CI re-confirmed green (dep path intact)". No RECIPE=drone custom build has run
since commit a121d2c modified generic.py and recipe_meta.py. Unit tests (test_gitea_dep.py 10/10)
still pass and cover the dep path code-level. A live RECIPE=drone run is needed to satisfy the
full M2 DoD dep-path verification. Filed in BACKLOG as pending.
## M2 VERDICT: PENDING — new critical blocker in build #685
1. ✓ M2 main-branch condition MET (build #684, level=5)
2. ✗ PR #1 LFS capstone FAIL — upgrade rollback with LFS (build #685, level=1)
Root cause: lfs_jwt_secret generated with wrong format/length (commented-out .env.sample spec)
Gate M2: **NOT CLAIMED** — Builder must fix lfs_jwt_secret generation and re-trigger build #685
---
## M2 re-verification round 3 @2026-06-15T22:10Z (builds #691, #692, #695)
Builder applied three further fixes (commits d832b35, ad53b5a, 2d865f0):
- d832b35: `UPGRADE_SECRET_PREP` hook in `meta.py` + `generic.py`; `recipe_meta.py` UPGRADE_SECRET_PREP
implementation uses `docker secret create` directly with correct 43-char base64 URL-safe value
- ad53b5a: derive `STACK_NAME` from domain (`domain.replace(".", "_")`) when not found in .env
(abra does NOT write STACK_NAME to the .env file — it derives it at runtime from the domain)
- 2d865f0: ruff format + check all gtea files (cc-ci self-test lint now passes)
### Build #691 — RECIPE=gitea PR=1 REF=357926f26e69 — FAIL (STACK_NAME not found) ✗
`UPGRADE_SECRET_PREP` aborted: `RuntimeError: UPGRADE_SECRET_PREP: STACK_NAME not found in
/root/.abra/servers/default/gite-e1cb78.ci.commoninternet.net.env`
Root cause: the hook attempted to read STACK_NAME from the app's .env, but abra writes only
app-specific vars to that file (DOMAIN, TYPE, COMPOSE_FILE etc.) — STACK_NAME is derived from
the domain at runtime by abra's own code. The fix in ad53b5a (domain.replace(".", "_") fallback)
is the correct approach and matches how abra derives stack names.
New finding filed in BACKLOG-gtea.md. Builder fixed in commit ad53b5a.
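The fallback described for ad53b5a can be sketched as below. This is an illustration under assumptions: `stack_name` is a hypothetical helper, the `.env` parsing is deliberately simplistic, and only the `domain.replace(".", "_")` rule comes from this review.

```python
def stack_name(domain: str, env_text: str = "") -> str:
    """Return STACK_NAME from .env text if set, else derive it from the domain."""
    for line in env_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and non-assignments
        key, _, value = line.partition("=")
        value = value.strip().strip('"').strip("'")
        if key.strip() == "STACK_NAME" and value:
            return value
    # Fallback: derive the name from the domain, as described above.
    return domain.replace(".", "_")


env = "TYPE=gitea\nDOMAIN=gite-e1cb78.ci.commoninternet.net\n"
print(stack_name("gite-e1cb78.ci.commoninternet.net", env))
# gite-e1cb78_ci_commoninternet_net
```

Preferring an explicit value when present and deriving it otherwise keeps the hook working whether or not a given .env happens to carry STACK_NAME.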
### Build #692 — RECIPE=drone PR=0 REF=main — **PASS** level=5 ✓
Full results.json from ci.commoninternet.net/runs/692/results.json:
- recipe: drone, pr=0, ref=main
- level: 5 (install: PASS, upgrade: PASS, custom: PASS; backup/restore: skip — correct, drone
is not backup-capable)
- rungs: install=pass, upgrade=pass, functional=pass, lint=pass, backup_restore=skip ✓
- skips.intentional: backup_restore: "not backup-capable (no backupbot labels / declared)" ✓
- clean_teardown=true, no_secret_leak=true ✓
- customization: DEPS=["gitea"] confirmed (gitea dep used in drone's own dep chain) ✓
**M2 drone dep path condition: MET** — drone recipe CI unaffected by all gtea changes
### Build #695 — RECIPE=gitea PR=1 REF=357926f26e69 — **PASS** level=5 ✓
Full results.json from ci.commoninternet.net/runs/695/results.json:
- recipe: gitea, pr=1, ref=357926f26e69 — THIS IS THE LFS PR
- level: 5, all 5 stages: install=pass, upgrade=pass, backup=pass, restore=pass, custom=pass
- No intentional or unintentional skips ✓
- clean_teardown=true, no_secret_leak=true ✓
Custom tests (all PASS):
- `test_admin_api_user_org_token_lifecycle`: PASS (333ms) ✓
- `test_git_push`: PASS (889ms) ✓
- `test_gitea_root_returns_200`: PASS (36ms) ✓
- `test_lfs_roundtrip`: **PASS (18147ms = 18s)** ✓ — LFS ROUNDTRIP VERIFIED
UPGRADE_SECRET_PREP hook in customization.meta_non_default confirms it ran.
version=ce4de9e6451f (deployed recipe HEAD at upgrade time — expected, as chaos deploy uses PR HEAD).
**M2 PR #1 LFS capstone: MET** — test_lfs_roundtrip PASS in real CI on PR #1
### cc-ci self-test lint: CLEARED
Builds #690 and #693 (push events) report success — ruff format + check now both pass.
All M2 DoD conditions now satisfied.
## M2 VERDICT: PASS @2026-06-15T22:10Z
All M2 DoD conditions met:
1. ✓ Full 5-tier suite green on gitea main in real CI — build #684, level=5, upgrade SHA-match
correct, HC1 PASS, LFS correctly SKIP on main ✓
2. ✓ LFS roundtrip green in real CI on PR #1 — build #695, level=5, `test_lfs_roundtrip` PASS
(18s), lfs_jwt_secret correct length via UPGRADE_SECRET_PREP hook, all tiers PASS ✓
3. ✓ Drone dep path unaffected — build #692, level=5, drone recipe still fully green ✓
4. ✓ cc-ci self-test lint green — ruff format+check pass on all gtea files ✓
5. ✓ Unit tests 53/53 pass throughout (test_gitea_dep.py 10/10, test_meta.py 43/43) ✓
6. ✓ No secrets in any run artifact — no_secret_leak=true in #684, #692, #695
Gate M2: **ADVERSARY PASS** @2026-06-15T22:10Z