bridge: polling primary + org-membership auth (orchestrator design change)
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is passing

Polling is now the primary, read-only trigger (always-on thread); the /hook
webhook is an optional admin-registered push optimization deduped by comment id.
Authorize commenters via GET /orgs/{owner}/members/{user} (204, read-level) +
optional allowlist, replacing the admin-requiring /collaborators permission
endpoint. Bot never self-registers webhooks. Enroll = POLL_REPOS + tests/<recipe>/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
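
The interaction between the two trigger paths described in the message above can be illustrated with a small simulation. This is a minimal sketch, not the bridge code itself: `SeenSet` and `poll_once` are hypothetical names, and the `(comment_id, body)` pairs stand in for whatever the Gitea API returns. It shows the first poll seeding the seen-set without firing, and a comment already claimed by the webhook path being skipped by the poller.

```python
import threading


class SeenSet:
    """Thread-safe set of comment ids that have already been claimed."""

    def __init__(self):
        self._ids = set()
        self._lock = threading.Lock()

    def claim(self, comment_id):
        """Return True exactly once per id; later calls for the same id return False."""
        with self._lock:
            if comment_id in self._ids:
                return False
            self._ids.add(comment_id)
            return True


def poll_once(comments, seen, first_pass):
    """Return the ids that should fire on this pass.

    `comments` is a list of (comment_id, body) pairs, a stand-in for an API response.
    On the first pass every matching comment is marked seen but never fired.
    """
    fired = []
    for comment_id, body in comments:
        if body.strip() != "!testme":
            continue
        # claim() runs first, so the first pass still seeds the seen-set.
        if seen.claim(comment_id) and not first_pass:
            fired.append(comment_id)
    return fired


seen = SeenSet()
old = [(1, "!testme"), (2, "looks good")]
startup = poll_once(old, seen, first_pass=True)   # seeds id 1, fires nothing
webhook_won = seen.claim(3)                       # webhook path claims id 3 first
later = poll_once(old + [(3, "!testme"), (4, " !testme\n")], seen, first_pass=False)
print(startup, webhook_won, later)
```

Only comment 4 fires on the later pass: comment 1 was seeded at startup and comment 3 was already claimed by the webhook path.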
2026-05-27 02:41:25 +01:00
parent 25b628e959
commit 7addb9686c
4 changed files with 179 additions and 65 deletions


@@ -48,6 +48,27 @@ Architecture decisions and dead-ends. One line of rationale each. (§0, §8)
 wildcard means bumping `SECRET_WILDCARD_*_VERSION` (operator) so the next reconcile re-inserts.
 Documented in docs/secrets.md at M7.
+- **Trigger: POLLING primary, webhook optional — SETTLED (orchestrator design change 2026-05-27,
+supersedes the earlier "keep webhook, do NOT pivot to polling" steer).** Hard constraint: the
+bot/server runs at **READ level, never repo-admin**, and **never self-registers a webhook**.
+- **Polling is PRIMARY and the source of truth for D1.** The bridge polls each enrolled repo's
+open PRs for new `!testme` comments every `POLL_INTERVAL` (30s ≤ 60s). Outbound
+(cc-ci → git.autonomic.zone, the reliably-working direction), needs only read+comment. On
+startup the first poll marks pre-existing comments seen so it doesn't fire on old comments.
+- **Webhook is an OPTIONAL push optimization.** The `/hook` endpoint stays live (HMAC-verified)
+so an *admin-registered* `issue_comment` webhook lowers latency, but the bridge never registers
+one. Manual registration is documented in `docs/enroll-recipe.md`. Both paths share an
+in-memory seen-set keyed by comment id → a comment seen by both fires at most once.
+- **Commenter authorization via org membership (read-level, no admin).** Allowed iff
+`GET /orgs/{owner}/members/{user}` → 204 (verified 2026-05-27: admits bot/trav/notplants, 404
+for a non-member, works with bot read-level basic-auth) **or** the user is in the optional
+`AUTH_ALLOWLIST`. Replaces the earlier `/collaborators/{user}/permission` check, which needs
+repo-admin. Fail-closed on any error.
+- **Enrollment** = add the repo to the bridge `POLL_REPOS` csv + ensure `tests/<recipe>/` exists.
+No webhook required for CI to work. (Why root cause of the old webhook non-delivery doesn't
+matter: polling makes it irrelevant; the operator was whitelisting `ci.commoninternet.net` in
+Gitea's `ALLOWED_HOST_LIST`, but D1 no longer depends on that.)
 ## Open (defaults from §8, to confirm as reality lands)
 - **Deploy mechanism — SETTLED (M0):** `nixos-rebuild switch --flake /root/cc-ci#cc-ci` run *on

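The authorization rule recorded in the decision above (org membership returning 204, or an allowlist hit, fail-closed otherwise) can be sketched as a pure function. This is an illustrative sketch under stated assumptions: `fetch_status` is a hypothetical injected callable that returns an HTTP status code (or `None` on a network error) for a given API path, so the decision logic can be exercised without a live Gitea instance. The user names are made up.

```python
def is_authorized(user, owner, allowlist, fetch_status):
    """Allow iff `user` is in `allowlist` or the org-membership lookup returns 204.

    `fetch_status(path)` is an assumed helper returning an int status or None.
    Any other outcome, including an exception, is treated as a rejection.
    """
    if not user:
        return False
    if user in allowlist:
        return True  # allowlist hit: no API call needed
    try:
        status = fetch_status(f"/orgs/{owner}/members/{user}")
    except Exception:
        return False  # fail closed on unexpected errors
    return status == 204


# Fake lookup table standing in for the API (hypothetical users).
responses = {
    "/orgs/acme/members/alice": 204,  # member
    "/orgs/acme/members/mallory": 404,  # not a member
}


def fake_fetch(path):
    return responses.get(path)  # None models a network failure


def broken_fetch(path):
    raise OSError("connection refused")


results = {
    "member": is_authorized("alice", "acme", set(), fake_fetch),
    "non_member": is_authorized("mallory", "acme", set(), fake_fetch),
    "network_error": is_authorized("bob", "acme", set(), fake_fetch),
    "allowlisted": is_authorized("bob", "acme", {"bob"}, broken_fetch),
    "exception": is_authorized("carol", "acme", set(), broken_fetch),
    "empty_user": is_authorized("", "acme", {""}, fake_fetch),
}
print(results)
```

Checking the allowlist before the API call means an allowlisted user stays authorized even when the membership endpoint is unreachable, which matches the order used in the diff further down.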

@@ -1,24 +1,38 @@
 #!/usr/bin/env python3
 """cc-ci comment-bridge (§4.1).
-Receives Gitea `issue_comment` webhooks; when a *collaborator* comments exactly `!testme` on an
-open PR, triggers a parameterized Drone build of the cc-ci pipeline for that PR's head commit and
-posts a PR comment linking the run. Everything else is ignored. Python stdlib only.
-Config (env):
-BRIDGE_LISTEN host:port to bind (default 0.0.0.0:8080)
-GITEA_API e.g. https://git.autonomic.zone/api/v1
-DRONE_URL e.g. https://drone.ci.commoninternet.net
-CI_REPO the pipeline repo, e.g. recipe-maintainers/cc-ci
-HMAC_FILE file with the webhook HMAC secret
-DRONE_TOKEN_FILE file with the Drone API token
-GITEA_TOKEN_FILE file with the Gitea API token
+When an *authorized* user comments exactly `!testme` on an open PR in an enrolled recipe repo,
+trigger a parameterized Drone build of the cc-ci pipeline for that PR's head commit and post a PR
+comment linking the run. Everything else is ignored.
+Trigger paths (§4.1, SETTLED):
+* POLLING is PRIMARY (always on): the bridge polls each enrolled repo's open PRs for new
+`!testme` comments every POLL_INTERVAL seconds. This is outbound (cc-ci -> git.autonomic.zone)
+and needs only READ + comment access — never repo-admin. It is the source of truth for D1.
+* WEBHOOK is an OPTIONAL push optimization: the `/hook` endpoint stays live so a Gitea
+`issue_comment` webhook, *if an admin registered one*, lowers latency. The bridge NEVER
+self-registers a webhook (that needs repo-admin, which we refuse). Manual registration is
+documented in docs/enroll-recipe.md.
+Both paths share an in-memory seen-set keyed by comment id, so a comment seen by both fires at most
+once (no double-trigger). On startup the first poll marks pre-existing comments seen so old comments
+don't re-fire. Python stdlib only.
+Authorization: a commenter is allowed iff they are a member of the repo's owning org
+(`GET /orgs/{owner}/members/{user}` -> 204), which is readable by any org member (read-level, no
+admin). An optional AUTH_ALLOWLIST (csv of usernames) is also honored. Fail-closed on any error.
+Config (env): BRIDGE_LISTEN, GITEA_API, DRONE_URL, CI_REPO, HMAC_FILE, DRONE_TOKEN_FILE,
+GITEA_TOKEN_FILE, POLL_INTERVAL (default 30), POLL_REPOS (csv of enrolled repos), AUTH_ALLOWLIST
+(csv, optional).
 """
 import hashlib
 import hmac
 import json
 import os
 import sys
+import threading
+import time
 import urllib.error
 import urllib.parse
 import urllib.request
@@ -28,6 +42,7 @@ GITEA_API = os.environ.get("GITEA_API", "https://git.autonomic.zone/api/v1")
 DRONE_URL = os.environ.get("DRONE_URL", "https://drone.ci.commoninternet.net")
 CI_REPO = os.environ.get("CI_REPO", "recipe-maintainers/cc-ci")
 TRIGGER = "!testme"
+ALLOWLIST = {u.strip() for u in os.environ.get("AUTH_ALLOWLIST", "").split(",") if u.strip()}
 def _read(path):
@@ -39,13 +54,18 @@ HMAC_SECRET = _read(os.environ["HMAC_FILE"]).encode()
 DRONE_TOKEN = _read(os.environ["DRONE_TOKEN_FILE"])
 GITEA_TOKEN = _read(os.environ["GITEA_TOKEN_FILE"])
+# Shared dedup across the poll + webhook paths: a comment id triggers at most one run.
+_PROCESSED: set = set()
+_PROCESSED_LOCK = threading.Lock()
 def log(*a):
     print(*a, file=sys.stderr, flush=True)
-def _api(url, token, method="GET", data=None):
-    headers = {"Authorization": "token " + token} if token else {}
+def _api(url, token, method="GET", data=None, scheme="token"):
+    # Gitea wants "Authorization: token <t>"; Drone wants "Authorization: Bearer <t>".
+    headers = {"Authorization": f"{scheme} {token}"} if token else {}
     body = None
     if data is not None:
         body = json.dumps(data).encode()
@@ -57,11 +77,22 @@ def _api(url, token, method="GET", data=None):
             return resp.status, (json.loads(raw) if raw else None)
     except urllib.error.HTTPError as e:
         return e.code, None
+    except (urllib.error.URLError, OSError) as e:
+        log("api error", url, e)
+        return None, None
-def is_collaborator(full_name, user):
-    # 204 => the user has push access (collaborator or org member with access).
-    status, _ = _api(f"{GITEA_API}/repos/{full_name}/collaborators/{user}", GITEA_TOKEN)
+def is_authorized(full_name, user):
+    """Allowed iff the user is a member of the repo's owning org (read-level membership check) or in
+    the static AUTH_ALLOWLIST. Uses GET /orgs/{owner}/members/{user} (204=member), which any org
+    member can read — no repo-admin needed. Fail-closed: anything other than a clean 204/allowlist
+    hit is rejected."""
+    if not user:
+        return False
+    if user in ALLOWLIST:
+        return True
+    owner = full_name.partition("/")[0]
+    status, _ = _api(f"{GITEA_API}/orgs/{owner}/members/{user}", GITEA_TOKEN)
     return status == 204
@@ -79,7 +110,7 @@ def trigger_build(recipe, ref, pr, src):
         {"branch": "main", "RECIPE": recipe, "REF": ref, "PR": str(pr), "SRC": src}
     )
     url = f"{DRONE_URL}/api/repos/{CI_REPO}/builds?{q}"
-    status, build = _api(url, DRONE_TOKEN, method="POST")
+    status, build = _api(url, DRONE_TOKEN, method="POST", scheme="Bearer")
     if status in (200, 201) and build:
         return build.get("number")
     log("drone trigger failed", status)
@@ -87,12 +118,52 @@ def trigger_build(recipe, ref, pr, src):
 def post_comment(owner, repo, number, body):
-    _api(
-        f"{GITEA_API}/repos/{owner}/{repo}/issues/{number}/comments",
-        GITEA_TOKEN,
-        method="POST",
-        data={"body": body},
-    )
+    _api(f"{GITEA_API}/repos/{owner}/{repo}/issues/{number}/comments", GITEA_TOKEN,
+         method="POST", data={"body": body})
+def list_open_prs(full_name):
+    status, prs = _api(f"{GITEA_API}/repos/{full_name}/pulls?state=open&limit=50", GITEA_TOKEN)
+    return prs if status == 200 and prs else []
+def list_comments(full_name, number):
+    status, cs = _api(f"{GITEA_API}/repos/{full_name}/issues/{number}/comments", GITEA_TOKEN)
+    return cs if status == 200 and cs else []
+def _claim(comment_id) -> bool:
+    """Atomically claim a comment id for processing. Returns False if already claimed (dedup)."""
+    if comment_id is None:
+        return True
+    with _PROCESSED_LOCK:
+        if comment_id in _PROCESSED:
+            return False
+        _PROCESSED.add(comment_id)
+        return True
+def process_testme(full_name, owner, name, number, user, comment_id, source):
+    """Shared by both paths. Dedupes by comment id, checks authorization, resolves the PR head,
+    triggers the build, comments the run link. Returns (run_url|None, reason)."""
+    if not _claim(comment_id):
+        return None, "duplicate"
+    if not is_authorized(full_name, user):
+        log(f"rejected: {user} is not an authorized org member on {full_name}")
+        return None, "not authorized"
+    head = pr_head(owner, name, number)
+    if not head or not head["sha"]:
+        return None, "cannot resolve PR head"
+    num = trigger_build(name, head["sha"], number, head["repo"] or full_name)
+    if not num:
+        post_comment(owner, name, number, "cc-ci: failed to start a CI run (see bridge logs).")
+        return None, "trigger failed"
+    run_url = f"{DRONE_URL}/{CI_REPO}/{num}"
+    post_comment(owner, name, number,
+                 f"cc-ci: started CI run for `{name}` @ `{head['sha'][:8]}` → {run_url}")
+    log(f"[{source}] triggered build {num} for {name}@{head['sha'][:8]} "
+        f"(PR #{number}, comment {comment_id}) by {user}")
+    return run_url, "ok"
 class Handler(BaseHTTPRequestHandler):
@@ -103,80 +174,81 @@ class Handler(BaseHTTPRequestHandler):
         self.wfile.write(msg.encode())
     def do_GET(self):
-        # health endpoint
         if self.path.rstrip("/") in ("/hook/healthz", "/healthz"):
             return self._send(200, "ok")
         return self._send(404, "not found")
     def do_POST(self):
+        # Optional push optimization; polling is primary. Deduped against the poller by comment id.
         length = int(self.headers.get("Content-Length", 0))
         body = self.rfile.read(length)
-        # 1) verify HMAC (Gitea sends hex sha256 in X-Gitea-Signature)
         sig = self.headers.get("X-Gitea-Signature", "")
         expected = hmac.new(HMAC_SECRET, body, hashlib.sha256).hexdigest()
         if not hmac.compare_digest(sig, expected):
-            log(f"rejected: bad signature event={self.headers.get('X-Gitea-Event')} "
-                f"got={sig[:12]} want={expected[:12]} bodylen={len(body)} seclen={len(HMAC_SECRET)} "
-                f"hub256={(self.headers.get('X-Hub-Signature-256') or '')[:20]}")
+            log(f"rejected: bad signature event={self.headers.get('X-Gitea-Event')}")
             return self._send(401, "bad signature")
         if self.headers.get("X-Gitea-Event") != "issue_comment":
             return self._send(204, "ignored")
         try:
             payload = json.loads(body)
         except ValueError:
             return self._send(400, "bad json")
         action = payload.get("action")
-        comment = (payload.get("comment") or {}).get("body", "")
+        c = payload.get("comment") or {}
         issue = payload.get("issue") or {}
         repo = payload.get("repository") or {}
-        user = (payload.get("comment") or {}).get("user", {}).get("login", "")
-        full_name = repo.get("full_name", "")
-        owner = (repo.get("owner") or {}).get("login", "")
-        name = repo.get("name", "")
-        number = issue.get("number")
-        # 2) only a created comment, exactly "!testme", on a PR
-        if action != "created" or comment.strip() != TRIGGER:
+        if action != "created" or (c.get("body") or "").strip() != TRIGGER:
             return self._send(204, "ignored")
         if not issue.get("pull_request"):
             return self._send(204, "not a PR")
-        # 3) commenter must be a collaborator / org member with access
-        if not is_collaborator(full_name, user):
-            log(f"rejected: {user} not a collaborator on {full_name}")
-            return self._send(403, "not authorized")
-        # 4) resolve PR head (test the code at the PR head commit)
-        head = pr_head(owner, name, number)
-        if not head or not head["sha"]:
-            return self._send(502, "cannot resolve PR head")
-        # 5) trigger the parameterized Drone build
-        num = trigger_build(name, head["sha"], number, head["repo"] or full_name)
-        if not num:
-            post_comment(owner, name, number, "cc-ci: failed to start a CI run (see bridge logs).")
-            return self._send(502, "trigger failed")
-        run_url = f"{DRONE_URL}/{CI_REPO}/{num}"
-        post_comment(
-            owner, name, number,
-            f"cc-ci: started CI run for `{name}` @ `{head['sha'][:8]}` → {run_url}",
-        )
-        log(f"triggered build {num} for {name}@{head['sha'][:8]} (PR #{number}) by {user}")
+        run_url, reason = process_testme(
+            repo.get("full_name", ""), (repo.get("owner") or {}).get("login", ""),
+            repo.get("name", ""), issue.get("number"),
+            c.get("user", {}).get("login", ""), c.get("id"), "webhook")
+        if not run_url:
+            if reason == "duplicate":
+                return self._send(200, "already handled")
+            return self._send(403 if reason == "not authorized" else 502, reason)
         return self._send(201, run_url)
-    def log_message(self, *a):  # quiet default access logging
+    def log_message(self, *a):
         pass
+def poll_loop():
+    """Primary trigger path. Outbound, read-only. Fires on NEW `!testme` comments only (the first
+    pass marks pre-existing comments seen)."""
+    repos = [r.strip() for r in os.environ.get("POLL_REPOS", CI_REPO).split(",") if r.strip()]
+    interval = int(os.environ.get("POLL_INTERVAL", "30"))
+    first = True
+    log(f"poller (primary) watching {repos} every {interval}s")
+    while True:
+        for full_name in repos:
+            owner, _, name = full_name.partition("/")
+            for pr in list_open_prs(full_name):
+                number = pr.get("number")
+                for c in list_comments(full_name, number):
+                    if (c.get("body") or "").strip() != TRIGGER:
+                        continue
+                    cid = c.get("id")
+                    if first:
+                        _claim(cid)  # mark pre-existing comments seen; don't fire on startup
+                        continue
+                    user = (c.get("user") or {}).get("login", "")
+                    process_testme(full_name, owner, name, number, user, cid, "poll")
+        first = False
+        time.sleep(interval)
 def main():
+    # Polling is the primary trigger; start it unconditionally.
+    threading.Thread(target=poll_loop, daemon=True).start()
     host, _, port = os.environ.get("BRIDGE_LISTEN", "0.0.0.0:8080").rpartition(":")
     srv = ThreadingHTTPServer((host or "0.0.0.0", int(port)), Handler)
-    log(f"comment-bridge listening on {host or '0.0.0.0'}:{port}")
+    log(f"comment-bridge listening on {host or '0.0.0.0'}:{port} (poll primary + optional webhook)")
     srv.serve_forever()

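The `/hook` handler in the diff above accepts a request only when the `X-Gitea-Signature` header equals the hex HMAC-SHA256 of the raw body. For anyone exercising an admin-registered webhook by hand, the signing side might look like the sketch below. The secret and payload are invented placeholders, and `sign` / `verify` are hypothetical helper names, not functions from the bridge.

```python
import hashlib
import hmac
import json


def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Constant-time comparison of a received signature against the expected one.

    Both sides are encoded to bytes so that a header containing non-ASCII
    characters yields False instead of raising TypeError.
    """
    expected = sign(secret, body)
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii"))


secret = b"example-secret"  # placeholder, not a real secret
payload = {"action": "created", "comment": {"id": 42, "body": "!testme"}}
body = json.dumps(payload).encode()

signature = sign(secret, body)
ok = verify(secret, body, signature)
tampered = verify(secret, body + b" ", signature)
wrong_secret = verify(b"other-secret", body, signature)
print(len(signature), ok, tampered, wrong_secret)
```

The signature must be computed over the exact bytes sent on the wire; re-serializing the JSON on either side would change the digest.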

@@ -41,11 +41,27 @@ If the recipe's own repo contains `tests/test_*.py`, the runner snapshots them r
 runs them against the **live deployment** as a `recipe-local` stage. Contract: those tests receive
 env `CCCI_BASE_URL` (e.g. `https://<app>.ci.commoninternet.net/`) and `CCCI_APP_DOMAIN`.
-## 4. Register the trigger webhook
+## 4. Add the repo to the bridge poll list
+The trigger is **polling** (primary): add the repo's full name to the comment-bridge `POLL_REPOS`
+csv (`modules/bridge.nix`) and `nixos-rebuild switch`. The bridge then polls that repo's open PRs
+every 30s and fires a run on a new `!testme` comment from an authorized org member. This needs only
+**read + comment** access — no webhook, no repo-admin.
-Add the per-repo Gitea webhook so `!testme` on a PR starts a run (see the bridge / runbook). Then
 `!testme` on a PR runs install/upgrade/backup + any recipe-local tests, and reports back to the PR.
+### Optional: lower-latency webhook (admin-registered)
+Polling already satisfies D1 (<60s). For lower latency an **admin** may *optionally* register a
+Gitea `issue_comment` webhook (the bot does **not** self-register one that needs repo-admin):
+- URL `https://ci.commoninternet.net/hook`, content-type `application/json`, event `Issue Comment`,
+secret = the shared webhook HMAC (`secrets/secrets.yaml` `webhook_hmac`).
+- The Gitea instance must allow the host (admin: add `ci.commoninternet.net` to the
+`[webhook] ALLOWED_HOST_LIST`).
+The webhook and poller are deduped by comment id, so a comment seen by both fires only once.
 ## Run locally
 ```sh


@@ -31,6 +31,11 @@ let
 - DRONE_URL=https://drone.ci.commoninternet.net
 - CI_REPO=recipe-maintainers/cc-ci
 - BRIDGE_LISTEN=0.0.0.0:8080
+# Polling is PRIMARY (outbound, read-only, always on); the /hook webhook is an optional
+# admin-registered push optimization deduped against the poller (§4.1). Enrollment = add
+# the repo to POLL_REPOS (csv) + ensure tests/<recipe>/ exists.
+- POLL_INTERVAL=30
+- POLL_REPOS=recipe-maintainers/cc-ci
 - HMAC_FILE=/run/secrets/webhook_hmac
 - DRONE_TOKEN_FILE=/run/secrets/drone_token
 - GITEA_TOKEN_FILE=/run/secrets/gitea_token