plan §4.2/§4.3: MAX_TESTS via DRONE_RUNNER_CAPACITY + native queue/timeout; teardown after each run
Don't overload the single node: cap concurrent test builds at a configurable MAX_TESTS (= DRONE_RUNNER_CAPACITY); Drone natively queues excess builds and times out hung ones, freeing slots — no custom queue. Each run deploys one app then undeploys; the run-start janitor is the backstop for timed-out/killed builds. At most MAX_TESTS apps live at once.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -388,7 +388,18 @@ Bridge posts/updates a Gitea PR comment with the run URL and (on completion) pas
 cc-ci**. Make the `abra app new/deploy traefik` steps reproducible (scripted/Nix-invoked) for D8.
 - Each CI run gets an isolated app domain `<recipe>-pr<n>-<short-sha>.ci.commoninternet.net`
 (§4.0) so concurrent runs don't collide. Teardown removes app, secrets, and volumes.
-- Consider a concurrency cap (1–2 deploys at a time) to avoid resource thrash; document it.
+- **Concurrency cap + queue — use Drone natively (SETTLED).** Don't let the server fill with
+simultaneously-deployed apps. Expose a configurable **`MAX_TESTS`** mapped to the exec runner's
+**`DRONE_RUNNER_CAPACITY`** (Nix-set on the runner; default low — **1–2** given a single 28 GiB
+node and heavy recipes like matrix/immich). Drone runs at most `MAX_TESTS` builds at once and
+**automatically queues** excess builds (its native pending-build queue), starting them as slots
+free. **Per-build timeout** (repo/runner timeout) guarantees a hung test is killed and frees its
+slot — so "continue once a current test finishes *or times out*" is built in. No custom queue
+needed. Optionally also set `concurrency: { limit: <N> }` in `.drone.yml` as a per-pipeline cap.
+- **One app at a time per run, torn down at run end.** A build deploys its recipe, runs the three
+stages, then **undeploys** — the server should not accumulate live test apps. Guaranteed teardown
++ the run-start janitor (§4.3) enforce this even when a build is timed-out/killed (in-process
+cleanup can't run, so the janitor reaps it).
 
 ### 4.3 The test harness & recipe test contract
 
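The per-run domain scheme `<recipe>-pr<n>-<short-sha>.ci.commoninternet.net` in this hunk can be sketched as a small Python helper. This is an illustration only, not code from the commit: the function name `app_domain` and the 7-character short SHA are assumptions, and the plan itself does not say how long `<short-sha>` is.

```python
CI_BASE_DOMAIN = "ci.commoninternet.net"  # base domain as written in the plan text


def app_domain(recipe: str, pr: int, sha: str, short_len: int = 7) -> str:
    """Build `<recipe>-pr<n>-<short-sha>.<base>` for one CI run.

    `short_len` is an assumed default; adjust it to whatever §4.0 settles on.
    """
    if pr <= 0:
        raise ValueError("PR number must be positive")
    if len(sha) < short_len:
        raise ValueError("commit SHA is shorter than the requested short length")
    return f"{recipe}-pr{pr}-{sha[:short_len]}.{CI_BASE_DOMAIN}"
```

Because the PR number and commit SHA are both part of the name, two concurrent runs only share a domain if they test the same recipe, PR, and commit.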
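To make the slot arithmetic behind the `MAX_TESTS` cap concrete, here is a toy Python model of a capacity-limited FIFO queue with a per-build timeout. It is not Drone's scheduler and nothing like it needs to be implemented (the plan relies on Drone's native queue). The name `simulate_slots` is hypothetical, and the model assumes all builds are submitted at time 0.

```python
import heapq


def simulate_slots(durations, max_tests, timeout):
    """Return (start, end) times per build, in submission order.

    Toy model: `max_tests` slots, FIFO queue, every build submitted at t=0.
    A build that would run longer than `timeout` is cut off at `timeout`,
    which is what frees its slot for the next queued build.
    """
    free_at = [0] * max_tests  # time at which each slot next becomes free
    heapq.heapify(free_at)
    schedule = []
    for duration in durations:
        start = heapq.heappop(free_at)        # earliest free slot
        end = start + min(duration, timeout)  # hung builds end at the timeout
        heapq.heappush(free_at, end)
        schedule.append((start, end))
    return schedule
```

With two slots and a 60-minute timeout, a hung second build holds its slot only until minute 60, while the third build starts as soon as the first one finishes.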
@@ -415,6 +426,12 @@ in from day one** (carried over from prior work, re-verify on the installed abra
 The teardown guarantee is sacred: a failed test must never leak a deployed app or volume into the
 next run. Implement teardown as a pytest fixture finalizer / `try/finally` in the orchestrator and
 add a janitor pass at run start that nukes any orphaned `*-pr*` apps older than N hours.
+**Crucially, the janitor is the backstop for timed-out/killed builds:** when Drone hits the
+per-build timeout (or a build is cancelled) it may SIGKILL the runner process, so the `try/finally`
+teardown can't run — those orphaned apps/volumes are reaped by the next build's run-start janitor
+(and the janitor should run regardless of how the previous build ended). Net effect with the
+`MAX_TESTS`/`DRONE_RUNNER_CAPACITY` cap (§4.2): at most `MAX_TESTS` apps are ever live at once, and
+each is torn down (or janitor-reaped) so the single node never accumulates deployments.
 
 ### 4.4 Secrets (D6)
 
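The `try/finally` teardown described in this hunk could look roughly like the following Python sketch. The function name `run_build` and its three callables are hypothetical stand-ins for the real deploy, test-stage, and undeploy steps; the sketch also assumes `undeploy` is safe to call after a partial or failed deploy.

```python
from typing import Callable, Iterable


def run_build(
    deploy: Callable[[], None],
    stages: Iterable[Callable[[], None]],
    undeploy: Callable[[], None],
) -> None:
    """Deploy, run each stage in order, and always attempt teardown.

    `deploy` sits inside the `try` so a half-finished deploy is cleaned up too.
    Exceptions from a stage still propagate after `undeploy` has run.
    """
    try:
        deploy()
        for stage in stages:
            stage()
    finally:
        # Runs on success, on failure, and on ordinary exceptions,
        # but NOT if the process is SIGKILLed: that case is left to the janitor.
        undeploy()
```

This covers failed tests and ordinary errors. As the added lines note, it cannot cover a killed runner process, which is why the run-start janitor exists.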
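The janitor's selection rule (orphaned `*-pr*` apps older than N hours) can be sketched in Python as below. The function name `orphaned_apps` and the `(name, created_at)` input shape are assumptions; how the list of deployed apps and their creation times is obtained is left out, since the plan does not specify it.

```python
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase


def orphaned_apps(apps, max_age_hours, now=None, pattern="*-pr*"):
    """Return names of apps that match `pattern` and are older than the cutoff.

    `apps` is an iterable of (name, created_at) pairs with timezone-aware
    datetimes. Note that "*-pr*" matches any name containing "-pr", so a
    stricter pattern may be wanted in practice.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=max_age_hours)
    return [
        name
        for name, created_at in apps
        if fnmatchcase(name, pattern) and created_at < cutoff
    ]
```

Running this at the start of every build, regardless of how the previous build ended, matches the backstop role described above: anything a killed build left behind is picked up once it passes the age threshold.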