From 7addb9686cbb9beb51a37b9e198c69b2e05712bd Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Wed, 27 May 2026 02:41:25 +0100
Subject: [PATCH] bridge: polling primary + org-membership auth (orchestrator design change)

Polling is now the primary, read-only trigger (always-on thread); the /hook
webhook is an optional admin-registered push optimization deduped by comment
id. Authorize commenters via GET /orgs/{owner}/members/{user} (204,
read-level) + optional allowlist, replacing the admin-requiring
/collaborators permission endpoint. Bot never self-registers webhooks.
Enroll = POLL_REPOS + tests//.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 DECISIONS.md          |  21 +++++
 bridge/bridge.py      | 198 ++++++++++++++++++++++++++++--------------
 docs/enroll-recipe.md |  20 ++++-
 modules/bridge.nix    |   5 ++
 4 files changed, 179 insertions(+), 65 deletions(-)

diff --git a/DECISIONS.md b/DECISIONS.md
index ff846bd..f1361cb 100644
--- a/DECISIONS.md
+++ b/DECISIONS.md
@@ -48,6 +48,27 @@ Architecture decisions and dead-ends. One line of rationale each. (§0, §8)
   wildcard means bumping `SECRET_WILDCARD_*_VERSION` (operator) so the next reconcile re-inserts.
   Documented in docs/secrets.md at M7.
 
+- **Trigger: POLLING primary, webhook optional — SETTLED (orchestrator design change 2026-05-27,
+  supersedes the earlier "keep webhook, do NOT pivot to polling" steer).** Hard constraint: the
+  bot/server runs at **READ level, never repo-admin**, and **never self-registers a webhook**.
+  - **Polling is PRIMARY and the source of truth for D1.** The bridge polls each enrolled repo's
+    open PRs for new `!testme` comments every `POLL_INTERVAL` (30s ≤ 60s). Outbound
+    (cc-ci → git.autonomic.zone, the reliably-working direction), needs only read+comment. On
+    startup the first poll marks pre-existing comments seen so it doesn't fire on old comments.
+  - **Webhook is an OPTIONAL push optimization.** The `/hook` endpoint stays live (HMAC-verified)
+    so an *admin-registered* `issue_comment` webhook lowers latency, but the bridge never registers
+    one. Manual registration is documented in `docs/enroll-recipe.md`. Both paths share an
+    in-memory seen-set keyed by comment id → a comment seen by both fires at most once.
+  - **Commenter authorization via org membership (read-level, no admin).** Allowed iff
+    `GET /orgs/{owner}/members/{user}` → 204 (verified 2026-05-27: admits bot/trav/notplants, 404
+    for a non-member, works with bot read-level basic-auth) **or** the user is in the optional
+    `AUTH_ALLOWLIST`. Replaces the earlier `/collaborators/{user}/permission` check, which needs
+    repo-admin. Fail-closed on any error.
+  - **Enrollment** = add the repo to the bridge `POLL_REPOS` csv + ensure `tests//` exists.
+    No webhook required for CI to work. (Why root cause of the old webhook non-delivery doesn't
+    matter: polling makes it irrelevant; the operator was whitelisting `ci.commoninternet.net` in
+    Gitea's `ALLOWED_HOST_LIST`, but D1 no longer depends on that.)
+
 ## Open (defaults from §8, to confirm as reality lands)
 
 - **Deploy mechanism — SETTLED (M0):** `nixos-rebuild switch --flake /root/cc-ci#cc-ci` run *on
diff --git a/bridge/bridge.py b/bridge/bridge.py
index 7a7711f..6aa1725 100644
--- a/bridge/bridge.py
+++ b/bridge/bridge.py
@@ -1,24 +1,38 @@
 #!/usr/bin/env python3
 """cc-ci comment-bridge (§4.1).
 
-Receives Gitea `issue_comment` webhooks; when a *collaborator* comments exactly `!testme` on an
-open PR, triggers a parameterized Drone build of the cc-ci pipeline for that PR's head commit and
-posts a PR comment linking the run. Everything else is ignored. Python stdlib only.
+When an *authorized* user comments exactly `!testme` on an open PR in an enrolled recipe repo,
+trigger a parameterized Drone build of the cc-ci pipeline for that PR's head commit and post a PR
+comment linking the run. Everything else is ignored.
 
-Config (env):
-  BRIDGE_LISTEN     host:port to bind (default 0.0.0.0:8080)
-  GITEA_API         e.g. https://git.autonomic.zone/api/v1
-  DRONE_URL         e.g. https://drone.ci.commoninternet.net
-  CI_REPO           the pipeline repo, e.g. recipe-maintainers/cc-ci
-  HMAC_FILE         file with the webhook HMAC secret
-  DRONE_TOKEN_FILE  file with the Drone API token
-  GITEA_TOKEN_FILE  file with the Gitea API token
+Trigger paths (§4.1, SETTLED):
+  * POLLING is PRIMARY (always on): the bridge polls each enrolled repo's open PRs for new
+    `!testme` comments every POLL_INTERVAL seconds. This is outbound (cc-ci -> git.autonomic.zone)
+    and needs only READ + comment access — never repo-admin. It is the source of truth for D1.
+  * WEBHOOK is an OPTIONAL push optimization: the `/hook` endpoint stays live so a Gitea
+    `issue_comment` webhook, *if an admin registered one*, lowers latency. The bridge NEVER
+    self-registers a webhook (that needs repo-admin, which we refuse). Manual registration is
+    documented in docs/enroll-recipe.md.
+
+Both paths share an in-memory seen-set keyed by comment id, so a comment seen by both fires at most
+once (no double-trigger). On startup the first poll marks pre-existing comments seen so old comments
+don't re-fire. Python stdlib only.
+
+Authorization: a commenter is allowed iff they are a member of the repo's owning org
+(`GET /orgs/{owner}/members/{user}` -> 204), which is readable by any org member (read-level, no
+admin). An optional AUTH_ALLOWLIST (csv of usernames) is also honored. Fail-closed on any error.
+
+Config (env): BRIDGE_LISTEN, GITEA_API, DRONE_URL, CI_REPO, HMAC_FILE, DRONE_TOKEN_FILE,
+GITEA_TOKEN_FILE, POLL_INTERVAL (default 30), POLL_REPOS (csv of enrolled repos), AUTH_ALLOWLIST
+(csv, optional).
 """
 import hashlib
 import hmac
 import json
 import os
 import sys
+import threading
+import time
 import urllib.error
 import urllib.parse
 import urllib.request
@@ -28,6 +42,7 @@ GITEA_API = os.environ.get("GITEA_API", "https://git.autonomic.zone/api/v1")
 DRONE_URL = os.environ.get("DRONE_URL", "https://drone.ci.commoninternet.net")
 CI_REPO = os.environ.get("CI_REPO", "recipe-maintainers/cc-ci")
 TRIGGER = "!testme"
+ALLOWLIST = {u.strip() for u in os.environ.get("AUTH_ALLOWLIST", "").split(",") if u.strip()}
 
 
 def _read(path):
@@ -39,13 +54,18 @@ HMAC_SECRET = _read(os.environ["HMAC_FILE"]).encode()
 DRONE_TOKEN = _read(os.environ["DRONE_TOKEN_FILE"])
 GITEA_TOKEN = _read(os.environ["GITEA_TOKEN_FILE"])
 
+# Shared dedup across the poll + webhook paths: a comment id triggers at most one run.
+_PROCESSED: set = set()
+_PROCESSED_LOCK = threading.Lock()
+
 
 def log(*a):
     print(*a, file=sys.stderr, flush=True)
 
 
-def _api(url, token, method="GET", data=None):
-    headers = {"Authorization": "token " + token} if token else {}
+def _api(url, token, method="GET", data=None, scheme="token"):
+    # Gitea wants "Authorization: token "; Drone wants "Authorization: Bearer ".
+    headers = {"Authorization": f"{scheme} {token}"} if token else {}
     body = None
     if data is not None:
         body = json.dumps(data).encode()
@@ -57,11 +77,22 @@ def _api(url, token, method="GET", data=None):
             return resp.status, (json.loads(raw) if raw else None)
     except urllib.error.HTTPError as e:
         return e.code, None
+    except (urllib.error.URLError, OSError) as e:
+        log("api error", url, e)
+        return None, None
 
 
-def is_collaborator(full_name, user):
-    # 204 => the user has push access (collaborator or org member with access).
-    status, _ = _api(f"{GITEA_API}/repos/{full_name}/collaborators/{user}", GITEA_TOKEN)
+def is_authorized(full_name, user):
+    """Allowed iff the user is a member of the repo's owning org (read-level membership check) or in
+    the static AUTH_ALLOWLIST. Uses GET /orgs/{owner}/members/{user} (204=member), which any org
+    member can read — no repo-admin needed. Fail-closed: anything other than a clean 204/allowlist
+    hit is rejected."""
+    if not user:
+        return False
+    if user in ALLOWLIST:
+        return True
+    owner = full_name.partition("/")[0]
+    status, _ = _api(f"{GITEA_API}/orgs/{owner}/members/{user}", GITEA_TOKEN)
     return status == 204
 
 
@@ -79,7 +110,7 @@ def trigger_build(recipe, ref, pr, src):
         {"branch": "main", "RECIPE": recipe, "REF": ref, "PR": str(pr), "SRC": src}
     )
     url = f"{DRONE_URL}/api/repos/{CI_REPO}/builds?{q}"
-    status, build = _api(url, DRONE_TOKEN, method="POST")
+    status, build = _api(url, DRONE_TOKEN, method="POST", scheme="Bearer")
     if status in (200, 201) and build:
         return build.get("number")
     log("drone trigger failed", status)
@@ -87,12 +118,52 @@ def trigger_build(recipe, ref, pr, src):
 
 
 def post_comment(owner, repo, number, body):
-    _api(
-        f"{GITEA_API}/repos/{owner}/{repo}/issues/{number}/comments",
-        GITEA_TOKEN,
-        method="POST",
-        data={"body": body},
-    )
+    _api(f"{GITEA_API}/repos/{owner}/{repo}/issues/{number}/comments", GITEA_TOKEN,
+         method="POST", data={"body": body})
+
+
+def list_open_prs(full_name):
+    status, prs = _api(f"{GITEA_API}/repos/{full_name}/pulls?state=open&limit=50", GITEA_TOKEN)
+    return prs if status == 200 and prs else []
+
+
+def list_comments(full_name, number):
+    status, cs = _api(f"{GITEA_API}/repos/{full_name}/issues/{number}/comments", GITEA_TOKEN)
+    return cs if status == 200 and cs else []
+
+
+def _claim(comment_id) -> bool:
+    """Atomically claim a comment id for processing. Returns False if already claimed (dedup)."""
+    if comment_id is None:
+        return True
+    with _PROCESSED_LOCK:
+        if comment_id in _PROCESSED:
+            return False
+        _PROCESSED.add(comment_id)
+        return True
+
+
+def process_testme(full_name, owner, name, number, user, comment_id, source):
+    """Shared by both paths. Dedupes by comment id, checks authorization, resolves the PR head,
+    triggers the build, comments the run link. Returns (run_url|None, reason)."""
+    if not _claim(comment_id):
+        return None, "duplicate"
+    if not is_authorized(full_name, user):
+        log(f"rejected: {user} is not an authorized org member on {full_name}")
+        return None, "not authorized"
+    head = pr_head(owner, name, number)
+    if not head or not head["sha"]:
+        return None, "cannot resolve PR head"
+    num = trigger_build(name, head["sha"], number, head["repo"] or full_name)
+    if not num:
+        post_comment(owner, name, number, "cc-ci: failed to start a CI run (see bridge logs).")
+        return None, "trigger failed"
+    run_url = f"{DRONE_URL}/{CI_REPO}/{num}"
+    post_comment(owner, name, number,
+                 f"cc-ci: started CI run for `{name}` @ `{head['sha'][:8]}` → {run_url}")
+    log(f"[{source}] triggered build {num} for {name}@{head['sha'][:8]} "
+        f"(PR #{number}, comment {comment_id}) by {user}")
+    return run_url, "ok"
 
 
 class Handler(BaseHTTPRequestHandler):
@@ -103,80 +174,81 @@ class Handler(BaseHTTPRequestHandler):
         self.wfile.write(msg.encode())
 
     def do_GET(self):
-        # health endpoint
         if self.path.rstrip("/") in ("/hook/healthz", "/healthz"):
             return self._send(200, "ok")
         return self._send(404, "not found")
 
     def do_POST(self):
+        # Optional push optimization; polling is primary. Deduped against the poller by comment id.
         length = int(self.headers.get("Content-Length", 0))
         body = self.rfile.read(length)
 
-        # 1) verify HMAC (Gitea sends hex sha256 in X-Gitea-Signature)
         sig = self.headers.get("X-Gitea-Signature", "")
         expected = hmac.new(HMAC_SECRET, body, hashlib.sha256).hexdigest()
         if not hmac.compare_digest(sig, expected):
-            log(f"rejected: bad signature event={self.headers.get('X-Gitea-Event')} "
-                f"got={sig[:12]} want={expected[:12]} bodylen={len(body)} seclen={len(HMAC_SECRET)} "
-                f"hub256={(self.headers.get('X-Hub-Signature-256') or '')[:20]}")
+            log(f"rejected: bad signature event={self.headers.get('X-Gitea-Event')}")
             return self._send(401, "bad signature")
-
         if self.headers.get("X-Gitea-Event") != "issue_comment":
             return self._send(204, "ignored")
-
         try:
             payload = json.loads(body)
         except ValueError:
             return self._send(400, "bad json")
 
         action = payload.get("action")
-        comment = (payload.get("comment") or {}).get("body", "")
+        c = payload.get("comment") or {}
         issue = payload.get("issue") or {}
         repo = payload.get("repository") or {}
-        user = (payload.get("comment") or {}).get("user", {}).get("login", "")
-        full_name = repo.get("full_name", "")
-        owner = (repo.get("owner") or {}).get("login", "")
-        name = repo.get("name", "")
-        number = issue.get("number")
-
-        # 2) only a created comment, exactly "!testme", on a PR
-        if action != "created" or comment.strip() != TRIGGER:
+        if action != "created" or (c.get("body") or "").strip() != TRIGGER:
             return self._send(204, "ignored")
         if not issue.get("pull_request"):
             return self._send(204, "not a PR")
 
-        # 3) commenter must be a collaborator / org member with access
-        if not is_collaborator(full_name, user):
-            log(f"rejected: {user} not a collaborator on {full_name}")
-            return self._send(403, "not authorized")
-
-        # 4) resolve PR head (test the code at the PR head commit)
-        head = pr_head(owner, name, number)
-        if not head or not head["sha"]:
-            return self._send(502, "cannot resolve PR head")
-
-        # 5) trigger the parameterized Drone build
-        num = trigger_build(name, head["sha"], number, head["repo"] or full_name)
-        if not num:
-            post_comment(owner, name, number, "cc-ci: failed to start a CI run (see bridge logs).")
-            return self._send(502, "trigger failed")
-
-        run_url = f"{DRONE_URL}/{CI_REPO}/{num}"
-        post_comment(
-            owner, name, number,
-            f"cc-ci: started CI run for `{name}` @ `{head['sha'][:8]}` → {run_url}",
-        )
-        log(f"triggered build {num} for {name}@{head['sha'][:8]} (PR #{number}) by {user}")
+        run_url, reason = process_testme(
+            repo.get("full_name", ""), (repo.get("owner") or {}).get("login", ""),
+            repo.get("name", ""), issue.get("number"),
+            c.get("user", {}).get("login", ""), c.get("id"), "webhook")
+        if not run_url:
+            if reason == "duplicate":
+                return self._send(200, "already handled")
+            return self._send(403 if reason == "not authorized" else 502, reason)
         return self._send(201, run_url)
 
-    def log_message(self, *a):  # quiet default access logging
+    def log_message(self, *a):
         pass
 
 
+def poll_loop():
+    """Primary trigger path. Outbound, read-only. Fires on NEW `!testme` comments only (the first
+    pass marks pre-existing comments seen)."""
+    repos = [r.strip() for r in os.environ.get("POLL_REPOS", CI_REPO).split(",") if r.strip()]
+    interval = int(os.environ.get("POLL_INTERVAL", "30"))
+    first = True
+    log(f"poller (primary) watching {repos} every {interval}s")
+    while True:
+        for full_name in repos:
+            owner, _, name = full_name.partition("/")
+            for pr in list_open_prs(full_name):
+                number = pr.get("number")
+                for c in list_comments(full_name, number):
+                    if (c.get("body") or "").strip() != TRIGGER:
+                        continue
+                    cid = c.get("id")
+                    if first:
+                        _claim(cid)  # mark pre-existing comments seen; don't fire on startup
+                        continue
+                    user = (c.get("user") or {}).get("login", "")
+                    process_testme(full_name, owner, name, number, user, cid, "poll")
+        first = False
+        time.sleep(interval)
+
+
 def main():
+    # Polling is the primary trigger; start it unconditionally.
+    threading.Thread(target=poll_loop, daemon=True).start()
     host, _, port = os.environ.get("BRIDGE_LISTEN", "0.0.0.0:8080").rpartition(":")
     srv = ThreadingHTTPServer((host or "0.0.0.0", int(port)), Handler)
-    log(f"comment-bridge listening on {host or '0.0.0.0'}:{port}")
+    log(f"comment-bridge listening on {host or '0.0.0.0'}:{port} (poll primary + optional webhook)")
     srv.serve_forever()
 
 
diff --git a/docs/enroll-recipe.md b/docs/enroll-recipe.md
index ae2706d..fe5241e 100644
--- a/docs/enroll-recipe.md
+++ b/docs/enroll-recipe.md
@@ -41,11 +41,27 @@ If the recipe's own repo contains `tests/test_*.py`, the runner snapshots them r
 runs them against the **live deployment** as a `recipe-local` stage. Contract: those tests receive
 env `CCCI_BASE_URL` (e.g. `https://.ci.commoninternet.net/`) and `CCCI_APP_DOMAIN`.
 
-## 4. Register the trigger webhook
+## 4. Add the repo to the bridge poll list
+
+The trigger is **polling** (primary): add the repo's full name to the comment-bridge `POLL_REPOS`
+csv (`modules/bridge.nix`) and `nixos-rebuild switch`. The bridge then polls that repo's open PRs
+every 30s and fires a run on a new `!testme` comment from an authorized org member. This needs only
+**read + comment** access — no webhook, no repo-admin.
 
-Add the per-repo Gitea webhook so `!testme` on a PR starts a run (see the bridge / runbook).
 Then `!testme` on a PR runs install/upgrade/backup + any recipe-local tests, and reports back to the PR.
 
+### Optional: lower-latency webhook (admin-registered)
+
+Polling already satisfies D1 (<60s). For lower latency an **admin** may *optionally* register a
+Gitea `issue_comment` webhook (the bot does **not** self-register one — that needs repo-admin):
+
+- URL `https://ci.commoninternet.net/hook`, content-type `application/json`, event `Issue Comment`,
+  secret = the shared webhook HMAC (`secrets/secrets.yaml` → `webhook_hmac`).
+- The Gitea instance must allow the host (admin: add `ci.commoninternet.net` to the
+  `[webhook] ALLOWED_HOST_LIST`).
+
+The webhook and poller are deduped by comment id, so a comment seen by both fires only once.
+
 ## Run locally
 
 ```sh
diff --git a/modules/bridge.nix b/modules/bridge.nix
index 96a999e..b5e2d54 100644
--- a/modules/bridge.nix
+++ b/modules/bridge.nix
@@ -31,6 +31,11 @@ let
       - DRONE_URL=https://drone.ci.commoninternet.net
       - CI_REPO=recipe-maintainers/cc-ci
       - BRIDGE_LISTEN=0.0.0.0:8080
+      # Polling is PRIMARY (outbound, read-only, always on); the /hook webhook is an optional
+      # admin-registered push optimization deduped against the poller (§4.1). Enrollment = add
+      # the repo to POLL_REPOS (csv) + ensure tests// exists.
+      - POLL_INTERVAL=30
+      - POLL_REPOS=recipe-maintainers/cc-ci
       - HMAC_FILE=/run/secrets/webhook_hmac
       - DRONE_TOKEN_FILE=/run/secrets/drone_token
       - GITEA_TOKEN_FILE=/run/secrets/gitea_token
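
Reviewer note: the dedup and first-pass behaviour described in this patch can be exercised without Gitea or Drone. The following is a minimal sketch, not the bridge's code. `SeenSet`, `poll_once` and the `fire` callback are hypothetical names introduced here for illustration; the patch itself uses a module-level `_PROCESSED` set guarded by `_PROCESSED_LOCK`, with `_claim` and `poll_loop`.

```python
import threading


class SeenSet:
    """Thread-safe set of comment ids; claim() succeeds only once per id."""

    def __init__(self):
        self._ids = set()
        self._lock = threading.Lock()

    def claim(self, comment_id):
        # As in the patch's _claim: an id of None cannot be deduped, so it is let through.
        if comment_id is None:
            return True
        with self._lock:
            if comment_id in self._ids:
                return False
            self._ids.add(comment_id)
            return True


def poll_once(seen, comments, first_pass, fire):
    """Handle one poll pass over (comment_id, body) pairs; return the ids that fired."""
    fired = []
    for comment_id, body in comments:
        if body.strip() != "!testme":
            continue
        if first_pass:
            seen.claim(comment_id)  # prime only: pre-existing comments must not fire
            continue
        if seen.claim(comment_id):
            fire(comment_id)
            fired.append(comment_id)
    return fired


seen = SeenSet()
runs = []  # stands in for "a Drone build was triggered"
old = [(1, "!testme"), (2, "looks good")]

# Startup pass: comment 1 is marked seen and nothing fires.
startup = poll_once(seen, old, first_pass=True, fire=runs.append)

# The webhook path claims comment 3 before the poller reaches it.
webhook_won = seen.claim(3)

# Next pass: 1 was primed, 3 belongs to the webhook, only 4 is new.
later = poll_once(seen, old + [(3, "!testme"), (4, " !testme ")],
                  first_pass=False, fire=runs.append)
```

In this sketch the claim happens before the run is triggered, matching the order in `process_testme`. One consequence worth noting is that a comment whose trigger fails stays claimed, so a retry needs a fresh `!testme` comment.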