resource safety: MAX_TESTS=capacity=1 + per-build 60m timeout (orchestrator design change)
All checks were successful
continuous-integration/drone/push Build is passing
All checks were successful
continuous-integration/drone/push Build is passing
Bound live test apps on the single 28GiB node. DRONE_RUNNER_CAPACITY=1 (MAX_TESTS) caps concurrent builds; Drone auto-queues the rest natively. deploy-drone reconcile sets the cc-ci repo build timeout to 60m (best-effort PATCH, non-fatal) so a hung build is killed and frees its slot. Janitor remains the backstop for SIGKILL'd builds. Verified on host: DRONE_RUNNER_CAPACITY=1; repo timeout=60 via Drone API. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@ -6,6 +6,17 @@
|
||||
# DECISIONS.md "CI engine"). It connects to the server over RPC at drone.ci.commoninternet.net,
|
||||
# sharing DRONE_RPC_SECRET with the server via the sops-rendered EnvironmentFile.
|
||||
{ pkgs, config, lib, ... }:
|
||||
let
|
||||
# MAX_TESTS (plan §4.2/§4.3 resource safety): max CI builds the exec runner runs at once. Drone
|
||||
# queues the rest in its native pending-build queue (no custom queue). THE concurrency cap that
|
||||
# bounds how many test apps can be live at once — kept LOW (1) on this single 28GiB node since
|
||||
# recipes are heavy (immich/matrix large volumes). With capacity=1 there is never a concurrent
|
||||
# in-flight run, so the run-start janitor can safely reap *any* orphan (a SIGKILL'd build runs no
|
||||
# teardown) and the "at most MAX_TESTS apps live" bound holds exactly. Raise to 2 only if the node
|
||||
# is shown to handle two light recipes at once (then the janitor MUST stay age-based to avoid
|
||||
# reaping a concurrent run — see DECISIONS.md "Resource safety").
|
||||
maxTests = "1";
|
||||
in
|
||||
{
|
||||
# Drone ships under the Polyform Small Business license (nixpkgs marks it unfree);
|
||||
# permitted for our internal CI use. Allow only this package.
|
||||
@ -20,7 +31,7 @@
|
||||
environment = {
|
||||
DRONE_RPC_PROTO = "https";
|
||||
DRONE_RPC_HOST = "drone.ci.commoninternet.net";
|
||||
DRONE_RUNNER_CAPACITY = "2"; # concurrency cap (plan §4.2)
|
||||
DRONE_RUNNER_CAPACITY = maxTests; # MAX_TESTS concurrency cap (see let-binding above)
|
||||
DRONE_RUNNER_NAME = "cc-ci-exec";
|
||||
# exec runner needs a writable root for build workspaces
|
||||
DRONE_RUNNER_ROOT = "/var/lib/drone-runner";
|
||||
|
||||
Reference in New Issue
Block a user