Compare commits
22 Commits
feat/expec
...
restructur
| Author | SHA1 | Date | |
|---|---|---|---|
| | b6e12ef428 | | |
| | e1c4198c08 | | |
| | d3fe9e26bb | | |
| | 84d90fb655 | | |
| | 91d3cc7e99 | | |
| | 17ebdf39ac | | |
| | b302f3ab63 | | |
| | b492f995bd | | |
| | 45afccbef5 | | |
| | 48d03d8405 | | |
| | 5b65c6caa3 | | |
| | 157d06dc77 | | |
| | e6d55b53c7 | | |
| | 79c652ddd3 | | |
| | 68ef0f84fb | | |
| | c828f6cdd0 | | |
| | c0df77d0d9 | | |
| | 9a7772563a | | |
| | 1ba0d961a3 | | |
| | e76d4005ab | | |
| | c32e6105d0 | | |
| | c51cd84159 | | |
40
.drone.yml
@@ -35,10 +35,12 @@ steps:
 # the comment-bridge). Deploys the recipe at the PR head, runs install/upgrade/backup + any
 # recipe-local tests via the shared harness, then guarantees teardown (plan §4.2/§4.3).
 #
-# Resource safety (plan §4.2/§4.3): MAX_TESTS=DRONE_RUNNER_CAPACITY=1 (nix/modules/drone-runner.nix) is
-# the primary concurrency cap; concurrency.limit below is a redundant belt. CCCI_JANITOR_MAX_AGE=0
-# makes the run-start janitor reap ANY orphaned run app before deploying — safe because capacity=1
-# means no concurrent run exists (a SIGKILL'd/timed-out build leaves an orphan with no teardown).
+# Resource safety (plan §4.2/§4.3): DRONE_RUNNER_CAPACITY=2 (nix/modules/drone-runner.nix, the
+# single concurrency knob) allows two recipe runs in parallel. Concurrent-run safety is enforced by
+# the harness, not by serialisation: every run holds an exclusive flock on its app domain
+# (/run/lock/cc-ci-app-<domain>.lock) for its whole process lifetime, the run-start janitor probes
+# that lock to reap only orphans (held lock = live run, never touched), and recipe working trees
+# are per-run ($ABRA_DIR/recipes — no shared checkout, no recipe lock). See docs/concurrency.md.
 kind: pipeline
 type: exec
 name: recipe-ci
@@ -51,21 +53,37 @@ trigger:
   event:
     - custom
 
-concurrency:
-  limit: 1
+# NB deliberately NO `concurrency.limit` here: DRONE_RUNNER_CAPACITY (nix/modules/drone-runner.nix
+# maxTests) is the single concurrency knob (P4 — two knobs in two files drifted).
 
 steps:
   - name: ci
     environment:
       STAGES: install,upgrade,backup,restore,custom
-      CCCI_JANITOR_MAX_AGE: "0"
-      # The exec runner points HOME at a per-build workspace; force it to /root so abra finds its
-      # server config + recipes under /root/.abra (as the manual M4/M5 runs did). Safe: capacity=1
-      # means no concurrent build shares /root/.abra.
+      # The exec runner points HOME at a per-build workspace; force it to /root so abra's server
+      # config is found via the per-run ABRA_DIR's servers/ symlink -> /root/.abra/servers.
+      # Recipe trees are PER-RUN ($ABRA_DIR/recipes, exported by run_recipe_ci before any abra
+      # call), so concurrent builds never share a recipe checkout; app .env files are per-domain
+      # in the shared canonical servers/ path, guarded by the app-domain flock.
       HOME: /root
     commands:
       # RECIPE/REF/PR/SRC (+ CCCI_QUICK for `!testme --quick`) are injected as env vars from the
       # build's custom params. CCCI_QUICK=1 makes run_recipe_ci take the opt-in fast lane (WC7);
       # absent => full cold (default). run_quick ignores STAGES (always upgrade+custom).
       - 'echo "recipe-ci: RECIPE=$RECIPE REF=$REF PR=$PR SRC=$SRC stages=$STAGES quick=${CCCI_QUICK:-0}"'
-      - cc-ci-run runner/run_recipe_ci.py
+      # P1 lock-lifetime hardening: run the harness in its own session/process group (setsid) and
+      # forward a drone cancel (TERM to this step shell) to the WHOLE group, so the harness's
+      # SIGTERM handler runs its teardown funnel instead of being leaked (the exec runner kills
+      # only the step shell, not the tree). PDEATHSIG inside the harness backstops the case where
+      # this shell dies without the trap firing. The harness exit code is captured explicitly and
+      # the traps cleared before exiting: the runner shell is `set -e`, and an EXIT-trap kill of
+      # the already-gone process group returns ESRCH, which otherwise poisons a GREEN run's exit
+      # status to 1 (observed live, build 269: all tiers pass, step exit 1).
+      - |
+        setsid cc-ci-run runner/run_recipe_ci.py &
+        PID=$!
+        trap 'kill -TERM -- "-$PID" 2>/dev/null || true' TERM EXIT
+        rc=0
+        wait "$PID" || rc=$?
+        trap - TERM EXIT
+        exit "$rc"
22
BACKLOG-conc.md
Normal file
@@ -0,0 +1,22 @@
# BACKLOG — sub-phase conc

## Build backlog

- [ ] P1 lock-lifetime hardening: prctl PDEATHSIG + ppid race check + SIGTERM handler →
  teardown funnel + signal.alarm(3600) hard deadline; .drone.yml setsid/trap wrap;
  PEP 446 comment on lock open()
- [ ] P2 flock-probe janitor: acquire_app_lock(domain) at register_run_app's call site;
  janitor probes per-domain lockfiles (acquired→reap under probe lock, held→leave,
  >120min mtime→warn); delete registry symbols
- [ ] P3 per-run ABRA_DIR: /var/lib/cc-ci-runs/<build>/abra with servers+catalogue symlinks,
  fresh recipes/; fetch_recipe = plain clone; delete acquire_recipe_lock; route harness
  recipe paths through ABRA_DIR
- [ ] P4 config cleanup: remove concurrency.limit from .drone.yml; maxTests is the single knob
- [ ] tests/concurrency suite (19 cases, real-kernel flock, explicit invocation only)
- [ ] P5 docs/concurrency.md rewrite to the new model
- [ ] M1 claim (branch complete, both suites + lint green)
- [ ] M2: merge to main after M1 PASS, push build green, live verification a–d

## Adversary findings

(adversary-owned)
24
JOURNAL-conc.md
Normal file
@@ -0,0 +1,24 @@
# JOURNAL — sub-phase conc (Builder, append-only)

## 2026-06-10 — bootstrap

Read concurrency-restructure-full-plan.md (SSOT) + plan.md §6.1/§7/§9. Oriented on the code:

- `runner/harness/lifecycle.py` — recipe flock (l.46), registry (l.65–97), deploy_app
  registration (l.283), teardown unregister (l.723), three-way janitor (l.726).
- `runner/run_recipe_ci.py` — `acquire_recipe_lock` call site (l.843), `fetch_recipe` (l.140,
  rm-rf + reclone of the shared tree), janitor call sites (l.600 quick, l.932 cold).
- `.drone.yml` — recipe-ci step runs `cc-ci-run runner/run_recipe_ci.py` bare (P1 wraps it),
  `concurrency.limit: 2` (P4 removes).
- Greps for P3 fallout: `~/.abra/recipes` referenced in abra.py (recipe_checkout,
  has_lightweight_version_tags, recipe_head_commit, recipe_versions), generic.py:28,
  lifecycle.prepull_images, run_recipe_ci (fetch_recipe, snapshot_recipe_tests, comment),
  warm_reconcile.py:202 (runs OUTSIDE per-run context — keeps default), and
  tests/ghost+discourse install_steps.sh (`${HOME}/.abra/recipes/...` — these run INSIDE a
  run and copy compose.ccci.yml into the deploy tree, so they must resolve the per-run dir).
- `~/.abra/servers/...` paths are unaffected by design (servers/ is symlinked to the canonical
  /root/.abra/servers, so both resolutions land on the same file).

Working setup: state files on main in this clone; code on branch `restructure/concurrency`
via a git worktree at ../cc-ci-conc; test runs on the cc-ci host via /root/builder-clone
(`cc-ci-run -m pytest ...`, `nix develop .#lint`).
32
REVIEW-conc.md
Normal file
@@ -0,0 +1,32 @@
# REVIEW-conc.md — Adversary ledger, concurrency-restructure phase

Append-only. Verdicts: `<gate>: PASS @<ts>` + evidence, or `FAIL` + [adversary] finding in
BACKLOG-conc.md. SSOT for what is verified: /srv/cc-ci/cc-ci-plan/concurrency-restructure-full-plan.md.

## 2026-06-10T04:00Z — Adversary online; baseline pre-read (no gate pending)

Pulled main @5b65c6c. No STATUS-conc.md, no `restructure/concurrency` branch — nothing claimed yet.
Pre-read the CURRENT system (docs/concurrency.md @5b65c6c + lifecycle.py/run_recipe_ci.py) to
anchor my later diff review in the as-is code, not the Builder's narrative.

Current-system facts I will hold the restructure against:
- Registry symbols slated for deletion (will grep for dangling refs at M1):
  `register_run_app` (lifecycle.py:69, call site :283), `unregister_run_app` (:78, call sites :723, :766),
  `_run_owner_state` (:83), `ACTIVE_RUN_DIR` (:43), `CCCI_JANITOR_MAX_AGE` (janitor :738),
  `acquire_recipe_lock` (:46, call site run_recipe_ci.py:843), `RECIPE_LOCK_DIR` (:42).
- Must survive untouched: `RUN_APP_RE` (lifecycle.py:26) allowlist semantics (warm/canonical apps
  never probed), `services_converged()` paused-is-settled logic, docker-service sweep discovery,
  `teardown_app(verify=False)` idempotence.
- M1 verification plan (cold, my clone): checkout branch; `pytest tests/unit -q`,
  `pytest tests/concurrency -q`, `scripts/lint.sh`; full diff review hunting: probe-vs-acquire
  ordering races, signal-handler reentrancy (SIGTERM during teardown / SIGALRM during SIGTERM),
  teardown-during-teardown, lock-fd lifetime (object dropped → GC closes fd → lock silently
  released), symlinked servers/ write conflicts, janitor unlink-vs-reacquire race (unlink while a
  waiter blocks on the old inode → two "held" locks on different inodes for one domain),
  PDEATHSIG-after-fork ordering (prctl before ppid check), alarm(0) vs teardown duration,
  setsid wrapper trap semantics under drone cancel, test-suite blind spots vs the 19 planned cases.
- Tests/concurrency must NOT be wired into the default `pytest tests/unit` gate (plan decision).
- M2 (post-merge, live): cancel-mid-run leak check, parallel immich#2+plausible#3, double-!testme
  same PR blocks visibly, one full green run. NEVER merge/push recipe mirror repos.

No verdict yet — waiting for Builder bootstrap/claim.
19
STATUS-conc.md
Normal file
@@ -0,0 +1,19 @@
# STATUS — sub-phase conc (concurrency restructure)

Plan: /srv/cc-ci/cc-ci-plan/concurrency-restructure-full-plan.md (SSOT for this phase)

## Phase state

- Phase: conc — concurrency restructure (P1–P5 + tests/concurrency)
- Builder branch: `restructure/concurrency` (code lands there; main untouched until M2 merge)
- In flight: P1 (lock-lifetime hardening)
- Gate: none claimed yet

## Gates

- M1 (implementation verified): NOT CLAIMED
- M2 (merged + live-verified): NOT CLAIMED — blocked on M1 PASS

## Blockers

(none)
@@ -64,6 +64,8 @@ def parse_trigger(body):
     if s == f"{TRIGGER} --quick":
         return True, True
     return False, False
 
 
 ALLOWLIST = {u.strip() for u in os.environ.get("AUTH_ALLOWLIST", "").split(",") if u.strip()}
+
+
@@ -167,8 +169,12 @@ def post_commit_status(owner, repo, sha, state, target_url, description=""):
         f"{GITEA_API}/repos/{owner}/{repo}/statuses/{sha}",
         GITEA_TOKEN,
         method="POST",
-        data={"state": state, "target_url": target_url,
-              "description": description, "context": "cc-ci/testme"},
+        data={
+            "state": state,
+            "target_url": target_url,
+            "description": description,
+            "context": "cc-ci/testme",
+        },
     )
 
 
@@ -217,7 +223,9 @@ def result_comment_body(recipe, sha, num, run_url, status):
         if artifact_available(badge_url):
             body += f"\n\n[]({run_url})"
         return f"{body}\n\n{links}"
-    return f"{header} → {run_url}\n\n_(summary card unavailable — see the run for details.)_ {links}"
+    return (
+        f"{header} → {run_url}\n\n_(summary card unavailable — see the run for details.)_ {links}"
+    )
 
 
 def watch_and_reflect(owner, name, number, num, recipe, sha, comment_id, run_url):
@@ -66,8 +66,13 @@ _COLORS = {
 # Level → colour ramp, kept in sync with runner/harness/card.py LEVEL_COLOR (the dashboard is a
 # standalone stdlib service that doesn't import the runner harness, so the small map is duplicated).
 _LEVEL_COLOR = {
-    0: "#e5534b", 1: "#e0823d", 2: "#e0823d", 3: "#d9b343",
-    4: "#a0b93f", 5: "#57ab5a", 6: "#3fb950",
+    0: "#e5534b",
+    1: "#e0823d",
+    2: "#e0823d",
+    3: "#d9b343",
+    4: "#a0b93f",
+    5: "#57ab5a",
+    6: "#3fb950",
 }
 
 
@@ -269,7 +274,11 @@ def _card(r):
             f'<a class="shot" href="{run_url}" title="open run">'
             f'<span class="ph">no screenshot</span>{_level_pill(r["level"])}</a>'
         )
-    cap = f'<div class="cap">{html.escape(r["level_cap_reason"])}</div>' if r["level_cap_reason"] else ""
+    cap = (
+        f'<div class="cap">{html.escape(r["level_cap_reason"])}</div>'
+        if r["level_cap_reason"]
+        else ""
+    )
     return (
         f'<div class="card">{shot}<div class="body">'
         f'<div class="name">{html.escape(r["recipe"])}</div>'
@@ -307,7 +316,11 @@ def render_history(recipe, rows):
     trs = []
     for r in rows:
         color = _COLORS.get(r["status"], "#8b949e")
-        lvl = "—" if r["level"] is None else f'<b style="color:{level_color(r["level"])}">L{int(r["level"])}</b>'
+        lvl = (
+            "—"
+            if r["level"] is None
+            else f'<b style="color:{level_color(r["level"])}">L{int(r["level"])}</b>'
+        )
         shot = f'<a href="/runs/{r["number"]}/summary.png">card</a>' if r["has_screenshot"] else "—"
         trs.append(
             f'<tr><td><a href="{html.escape(r["url"])}">#{r["number"]}</a></td>'
@@ -317,7 +330,7 @@ def render_history(recipe, rows):
         )
     body = "\n".join(trs) or '<tr><td colspan="6">no runs for this recipe yet</td></tr>'
     inner = (
-        f'<h1>{_FLOWER} {html.escape(recipe)} — run history</h1>'
+        f"<h1>{_FLOWER} {html.escape(recipe)} — run history</h1>"
         '<p class="sub"><a href="/">← all recipes</a> · every <code>!testme</code> run, newest first.</p>'
         "<table><thead><tr><th>Run</th><th>Status</th><th>Level</th><th>Version</th>"
         "<th>When</th><th>Card</th></tr></thead><tbody>"
236
docs/concurrency.md
Normal file
@@ -0,0 +1,236 @@
# Concurrency: how parallel recipe CI runs stay safe

Spec of the concurrent-run system after the 2026-06-10 restructure (branch
`restructure/concurrency`; plan: cc-ci-plan `concurrency-restructure-full-plan.md`). The previous
registry + per-recipe-flock model is documented in this file's git history (`5b65c6c`).

## 1. Goal and design summary

Two recipe CI builds may run **at the same time** on the single cc-ci host. Safety is enforced by
the **harness**, not by serialising everything, and rests on ONE locking mechanism plus ONE
structural isolation:

| Rule | Mechanism |
|---|---|
| Different recipes run in parallel | nothing blocks them (isolation, §3) |
| Same-RECIPE runs run in parallel too | per-run `ABRA_DIR` recipe trees (§4) — no shared tree, no lock |
| Same-DOMAIN runs (double-`!testme` of one PR) serialise | per-app-domain `flock` (§5) |
| A starting run never reaps a live concurrent run's app | janitor probes the app lock; held = live (§6) |
| A crashed/canceled/rebooted run's leftovers get reaped | lock auto-released by the kernel → probe acquires → reap (§6) |

The invariant chain that makes "held lock = live owner" sound:

```
lock lifetime ⊆ harness process lifetime ⊆ drone step lifetime ⊆ 60-min hard deadline
```

- **lock ⊆ process**: locks are kernel flocks on fds the process holds (and PEP 446 makes those
  fds non-inheritable, so abra/docker/pytest children never carry them). The kernel releases them
  on process death, however it dies. There is no unlock code path and no stale-lock failure mode.
- **process ⊆ step**: `PR_SET_PDEATHSIG(SIGTERM)` + the `.drone.yml` setsid/trap wrap (§2) — a
  dead or canceled build cannot leak a running harness.
- **step ⊆ 60 min**: `signal.alarm(3600)` self-deadline (§2).

Never steal a held lock; manage the holder's lifetime. There is **no daemon and no shared state
service** — everything is kernel/file primitives under `/run/lock` and per-run directories.

## 2. Mechanism 0: run-lifetime hardening (`runner/harness/lifetime.py`)

`run_recipe_ci.main()` calls `lifetime.install_lifetime_guards()` before ANY abra call or lock
acquisition:

1. **`PR_SET_PDEATHSIG(SIGTERM)`** (ctypes prctl, return code checked): if the parent — the drone
   step shell — dies, the kernel TERMs the harness. A post-prctl `ppid == 1` re-check closes the
   start race: a harness whose parent died *before* the prctl armed would never get the signal,
   so it refuses to run orphaned.
2. **SIGTERM handler**: logs, then raises `SystemExit(143)` so the run's `finally:` teardown
   funnel executes and the process exits non-zero. Re-entrant signals during teardown are logged
   and IGNORED (`lifetime.begin_teardown()`, also set at the top of the run's `finally:` blocks)
   so a second signal can't abort the cleanup the first one asked for.
3. **`signal.alarm(3600)` hard deadline**: SIGALRM funnels into the same teardown path with a
   distinct log line (`== run exceeded 60-minute hard deadline — tearing down ==`), exit 142.
   Recipes keep their own smaller per-tier timeouts; this bounds the whole run. Teardown time
   after the deadline is deliberately not alarm-bounded — the janitor is the backstop if a
   teardown wedges and the process is killed harder.

The `.drone.yml` recipe-ci step runs the harness as `setsid cc-ci-run … &` with a
`trap 'kill -TERM -- "-$PID"' TERM EXIT; wait "$PID"` — a drone **cancel** (TERM to the step
shell) is forwarded to the harness's whole process group instead of leaking it (the exec runner
only kills the step shell). PDEATHSIG backstops the no-trap paths.

## 3. Isolation model: what is shared, what is per-run

Per-run (no conflict possible):

- **App + stack + volumes + secrets.** Run app domain = `naming.app_domain()` →
  `<recipe[:4]>-<sha1(recipe|pr|ref)[:6]>.ci.commoninternet.net`, unique per (recipe, pr, ref);
  everything abra creates is namespaced by it. Run apps are recognised by
  `RUN_APP_RE = ^[a-z0-9]{1,4}-[0-9a-f]{6}\.ci\.commoninternet\.net$`; warm/canonical apps
  (e.g. `warm-keycloak...`) deliberately do NOT match → the janitor never probes them.
- **Recipe working trees** — `$ABRA_DIR/recipes/<recipe>`, per run (§4). NEW in the restructure.
- **Drone build workspace** (`/var/lib/drone-runner/drone-<id>/`) and **run artifacts**
  (`/var/lib/cc-ci-runs/<run-id>/`).
- **Run-scoped state files** (`/tmp/ccci-{deploys,opstate,deps,depskip}-<run-id>-<pid>…`) —
  keyed by run id + harness pid via `run_recipe_ci._run_state_path()`, NEVER by app domain.
  A second run of the same domain executes its `main()` preamble before blocking at the app
  lock (§5), so domain-keyed files would be reset/removed underneath the live first run
  (live finding, M2(c) double-`!testme`: false DG4.1 deploy-count in run 1, countfile
  `FileNotFoundError` in run 2). Tier/hook children get the exact paths via the
  `CCCI_*_FILE` env vars; removed on normal run exit.

Shared (by design, conflict-free):

- **`/root/.abra/servers`** — app `.env` files, one per domain. The per-run `ABRA_DIR` symlinks
  `servers/` here, so .env files land in the canonical path: janitor discovery (`abra app ls`)
  and out-of-run tooling see every app. Per-domain filenames + the app-domain lock prevent write
  conflicts.
- **`/root/.abra/catalogue`** — read-mostly, symlinked into each per-run dir.
- **`HOME=/root`** (forced in `.drone.yml`) — safe: nothing recipe-mutable lives under `~/.abra`
  for a run anymore except through the two symlinks above.

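The run-app naming rule and the allowlist regex from §3 fit together roughly like this. The sketch assumes the hash input is the three fields joined with `|` and UTF-8 encoded, which is only a reading of the shorthand `sha1(recipe|pr|ref)`; the real `naming.app_domain()` may build its input differently.

```python
import hashlib
import re

# Regex quoted from the text: run apps match, warm/canonical apps do not.
RUN_APP_RE = re.compile(r"^[a-z0-9]{1,4}-[0-9a-f]{6}\.ci\.commoninternet\.net$")


def app_domain(recipe, pr, ref):
    """Derive a per-(recipe, pr, ref) domain; the hash input format is an assumption."""
    digest = hashlib.sha1(f"{recipe}|{pr}|{ref}".encode("utf-8")).hexdigest()
    return f"{recipe[:4]}-{digest[:6]}.ci.commoninternet.net"


def is_run_app(domain):
    """True only for domains the janitor is allowed to probe."""
    return RUN_APP_RE.match(domain) is not None
```

Because the domain is a pure function of (recipe, pr, ref), a double `!testme` on one PR lands on the same domain, which is what makes the per-domain lock in §5 the serialisation point.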
## 4. Mechanism 1: per-run `ABRA_DIR` (replaces the per-recipe flock)

`run_recipe_ci.setup_run_abra_dir()` — called first thing in `main()`, before any abra call —
builds `<runs_dir>/<run-id>/abra/` (run-id = Drone build number; `manual-<pid>` for hand runs):

```
abra/
  servers/   -> /root/.abra/servers     (symlink; canonical shared .env path)
  catalogue/ -> /root/.abra/catalogue   (symlink; read-mostly)
  recipes/   fresh, empty               (THE isolation that matters)
```

and exports it as `$ABRA_DIR` — honored by the abra CLI itself and by every harness path helper
(`abra.abra_dir()` / `abra.recipe_dir()`; `generic._recipe_dir`, `prepull_images`,
`snapshot_recipe_tests`, `warm_reconcile._recipe_dir` all route through the same rule:
`$ABRA_DIR` if set, else `~/.abra`).

- `fetch_recipe()` is now a plain clone into `$ABRA_DIR/recipes/<recipe>` (PR-head clone+checkout
  or `abra recipe fetch`); the upgrade tier's mid-run `git checkout`s happen in the run's own
  tree. Two same-recipe runs can no longer corrupt each other — structurally, with no lock. The
  old observed failure (immich builds 229/230 deploying a tree missing its config) is impossible.
- `CCCI_SKIP_FETCH=1` (test/Adversary staging) copies the canonically-staged
  `~/.abra/recipes/<recipe>` clone into the per-run tree.
- Out-of-run flows (warm_reconcile's systemd timer, manual abra) set no `ABRA_DIR` and keep using
  the canonical `/root/.abra` unchanged. In-run flows that touch canonical state on purpose
  (warm/canonical .env files) go through `servers/` and are unaffected.
- The per-run dir rides along the existing `/var/lib/cc-ci-runs/<run-id>/` retention. abra
  auto-clones any recipe it needs to resolve (e.g. during `app ls`) into the per-run `recipes/` —
  a few seconds of git per run, gone with the run dir.

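A minimal sketch of the directory layout and path rule from §4, under stated assumptions: the function signatures and the `canonical` parameter are invented here so the example can run against a temporary directory. The real `setup_run_abra_dir()` and `abra.recipe_dir()` may take different arguments.

```python
import os
from pathlib import Path


def setup_run_abra_dir(runs_dir, run_id, canonical=Path("/root/.abra")):
    """Build <runs_dir>/<run-id>/abra with shared symlinks and a private recipes/."""
    abra = Path(runs_dir) / str(run_id) / "abra"
    # recipes/ is the per-run part: a real directory, never a symlink.
    (abra / "recipes").mkdir(parents=True, exist_ok=True)
    # servers/ and catalogue/ point back at the canonical shared locations.
    for shared in ("servers", "catalogue"):
        link = abra / shared
        if not link.is_symlink():
            link.symlink_to(Path(canonical) / shared, target_is_directory=True)
    # Export before any abra call so the CLI and the helpers agree on the root.
    os.environ["ABRA_DIR"] = str(abra)
    return abra


def recipe_dir(recipe):
    """The single path rule: $ABRA_DIR if set, else ~/.abra."""
    base = os.environ.get("ABRA_DIR") or os.path.expanduser("~/.abra")
    return Path(base) / "recipes" / recipe
```

Routing every helper through one rule is the point of the design: an out-of-run caller that never sets `ABRA_DIR` keeps resolving to the canonical tree without any code change.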
## 5. Mechanism 2: per-app-domain flock (`lifecycle.acquire_app_lock`)

- Lock file: `/run/lock/cc-ci-app-<domain>.lock` (dir overridable via `CCCI_APP_LOCK_DIR` for the
  test suite), exclusive `fcntl.flock`, taken in `deploy_app()` **before the app is created** — a
  concurrent janitor can never see a run app without its held lock.
- Blocks (with a log line: `== app lock: another run of <domain> is in flight — waiting ==`) when
  another run of the SAME domain is in flight — the double-`!testme` serialisation point; the
  waiting run is visibly parked at that line in its drone log, by design.
- The returned file object is ALSO retained in module-level `_held_app_locks` — if a caller
  dropped it, GC would close the fd and silently release the lock.
- mtime is touched at acquisition: lock age feeds the janitor's long-held flag (§6).
- **Unlink/recreate race guard**: the janitor unlinks reaped lockfiles, so after EVERY
  acquisition the locked fd is verified to still be the inode the path names
  (`fstat().st_ino == stat().st_ino`); a waiter that won a just-unlinked inode closes it and
  retries on the live path. (A lock on an unlinked inode protects nothing: a later opener gets a
  fresh inode and would acquire "the same" lock.)
- Release is implicit: process exit (any kind). `teardown_app()` does NOT release or unlink —
  a clean run's leftover lockfile is unheld and is unlinked on sight by the next janitor sweep.

## 6. The flock-probe janitor (`lifecycle.janitor`)

Runs at every run start (cold + quick paths) and in the warm/upgrade sweeps. Candidate discovery
is unchanged from the old model: `abra app ls` + a docker-service sweep (catches stacks whose
`.env` is already gone), both matched against `RUN_APP_RE` — warm/canonical apps never match and
are never probed.

Decision table (per candidate domain, `_probe_and_reap`):

| Probe (`LOCK_EX\|LOCK_NB`) | Meaning | Action |
|---|---|---|
| acquires (+ inode identity OK) | nobody holds it → owner died (kernel-guaranteed) | **reap**: `teardown_app(verify=False)` WHILE HOLDING the probe lock, then unlink the lockfile, then release |
| acquires, inode stale | another janitor reaped + unlinked while we raced | skip (reap already done; unlinking now would hit a newer run's file) |
| `BlockingIOError` (held) | live concurrent run | leave it; if the lockfile's mtime is more than 120 min old (2× the hard deadline): `!! lock for <domain> held >120min — possible leaked run; inspect with lslocks` — flag, **never steal** |
| `open()` fails (`OSError`) | garbled/unopenable lockfile | skip + log, never crash |

- Reaping under the probe lock closes the janitor-vs-new-run race: a new run of that domain
  blocks in `acquire_app_lock` until the reap finishes — no window where a fresh app coexists
  with a half-reaped one.
- Two racing janitors arbitrate on the flock: one reaps, the other sees "held" and leaves; reaps
  are idempotent (`teardown_app(verify=False)` tolerates half-gone stacks).
- After the candidates, a tidy sweep unlinks stale **unheld** `cc-ci-app-*.lock` files with no
  app behind them (under their own probe lock + identity check), keeping `/run/lock` clean.
- **Post-reboot**: `/run/lock` is tmpfs → lockfiles gone → every surviving app probes as an
  orphan → reaped immediately. (Improvement over the old 2-hour age fallback; there IS no
  age-based reaping anymore, and the 120-min threshold above only flags.)

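The decision table in §6 maps onto a single probe function along these lines. This is a sketch under assumptions: the string return values, the `teardown` callback parameter, and the log wording for the skip case are invented for illustration; the real `_probe_and_reap` calls `teardown_app(verify=False)` directly.

```python
import fcntl
import os
import time

LONG_HELD_LOCK_SECONDS = 2 * 3600  # 2x the hard deadline, per the text


def probe_and_reap(path, domain, teardown):
    """Probe one candidate's lockfile and act on the outcome."""
    try:
        f = open(path, "a")  # creates the file if a reboot wiped /run/lock
    except OSError as exc:
        print(f"janitor: cannot open {path}: {exc}; skipping")
        return "skipped"
    with f:  # closing the file is the only release step
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Held means a live run owns it: flag if implausibly old, never steal.
            age = time.time() - os.fstat(f.fileno()).st_mtime
            if age > LONG_HELD_LOCK_SECONDS:
                print(f"!! lock for {domain} held >120min — possible leaked run; "
                      "inspect with lslocks")
            return "live"
        # We hold the lock. Confirm the path still names this inode.
        try:
            current = os.stat(path).st_ino == os.fstat(f.fileno()).st_ino
        except FileNotFoundError:
            current = False
        if not current:
            return "stale-inode"  # another janitor already reaped and unlinked
        teardown(domain)  # reap while still holding the probe lock
        os.unlink(path)   # then unlink; the lock is released on close
        return "reaped"
```

Holding the probe lock across the teardown is what makes a new run of the same domain wait in its own acquisition until the reap has finished.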
## 7. Failure-mode guarantees

| Event | Outcome |
|---|---|
| Run crashes / SIGKILL mid-run | flock auto-released by kernel → next janitor probe reaps app + lockfile |
| Drone build canceled via API | step trap TERMs the harness process group → SIGTERM funnel runs the run's own teardown (exit 143); if anything still leaks, PDEATHSIG + janitor reap (the old "cancel leaks the harness" gap is CLOSED) |
| Run exceeds 60 min | SIGALRM → distinct log line → own teardown → exit 142 |
| Host reboot | locks and lockfiles vanish (tmpfs, correct: no owners survived) → all surviving run apps reaped at the next run start, immediately |
| Two same-recipe `!testme`s (different PRs) | run in parallel — separate domains, separate per-run recipe trees |
| Double-`!testme` (same PR → same domain) | second blocks on the app lock before creating anything, visibly in its drone log, runs after the first finishes |
| Janitor vs. app being created | impossible to mis-reap: the lock is held before `app new`, and a held lock is never touched |
| Janitor unlink vs. blocked waiter | inode identity re-check on every acquisition → waiter retries on the live path |
| Lock held implausibly long (>120 min) | flagged loudly for a human (`lslocks`), never stolen |

## 8. Where convergence fits (adjacent; unchanged by the restructure)

Two swarm-convergence behaviors in `services_converged()` look like concurrency bugs but aren't —
any future work must keep them fixed:

- **N/N replicas ≠ converged** during a stop-first rolling update — `UpdateStatus.State` is also
  inspected (build 238: backupbot exec'd into a container killed seconds later).
- **`paused` persists forever** (swarm's default `update-failure-action`) — only `updating` and
  `rollback_started` block convergence; `paused`/`rollback_paused` are settled (build 241).
- `backup_app()` additionally waits (bounded 300s) for convergence before `backup create`.

|
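
The two rules above can be expressed as a small predicate. This is a hedged sketch, not the body of `services_converged()`: the function name `is_converged` and its parameters are hypothetical, and it assumes the caller has already read the replica counts and the `UpdateStatus.State` string from the service.

```python
from __future__ import annotations

# Only these update states mean "swarm is still changing tasks".
# `paused` and `rollback_paused` never clear on their own, so they count as settled.
BLOCKING_UPDATE_STATES = {"updating", "rollback_started"}


def is_converged(running: int, desired: int, update_state: str | None) -> bool:
    """Hypothetical per-service check combining replica count and update state."""
    if update_state in BLOCKING_UPDATE_STATES:
        # N/N replicas can be observed mid-update (stop-first), so the
        # update state takes priority over the replica count.
        return False
    return running == desired
```

For example, `is_converged(2, 2, "updating")` is false even though the replica count matches, while `is_converged(2, 2, "paused")` is true.
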
## 9. Configuration knobs

| Knob | Where | Current | Meaning |
|---|---|---|---|
| `DRONE_RUNNER_CAPACITY` (aka `MAX_TESTS`) | `nix/modules/drone-runner.nix` (`maxTests`) | `2` | **THE single concurrency knob.** Max builds the exec runner executes at once; Drone queues the rest. (The `.drone.yml` `concurrency.limit` duplicate was removed.) Change requires `nixos-rebuild switch`. |
| `CCCI_APP_LOCK_DIR` | env, read at call time | unset → `/run/lock` | App-domain lockfile dir override, used by `tests/concurrency` to sandbox locks. Never set in production. |
| hard deadline | `lifetime.HARD_DEADLINE_SECONDS` | 3600 s | the whole-run alarm; long-held flag threshold is 2× this (`LONG_HELD_LOCK_SECONDS`) |

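
To make "read at call time" and the 2× relationship concrete, here is a small sketch. It assumes the lockfile naming `cc-ci-app-<domain>.lock` described for `/run/lock`; the function `app_lock_path` is a hypothetical stand-in and may differ from the real `_app_lock_path`.

```python
import os

HARD_DEADLINE_SECONDS = 3600  # whole-run alarm, per the table above
LONG_HELD_LOCK_SECONDS = 2 * HARD_DEADLINE_SECONDS  # flag threshold, never a steal


def app_lock_path(domain: str) -> str:
    """Hypothetical resolver: the env override is looked up on every call,
    so a test can sandbox locks by setting CCCI_APP_LOCK_DIR after import."""
    lock_dir = os.environ.get("CCCI_APP_LOCK_DIR") or "/run/lock"
    return os.path.join(lock_dir, f"cc-ci-app-{domain}.lock")
```

Reading the variable inside the function rather than at module import is what lets a test suite redirect locks without reloading the module.
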
## 10. Testing: `tests/concurrency/`

Real-kernel suite (19 planned cases + companions): helper subprocesses hold REAL flocks and
install the REAL prctl/signal/alarm guards; flock itself is never mocked. The janitor runs with
injected candidates + stubbed teardown but probes real locks. **Not part of the default
`pytest tests/unit` gate** (it spawns processes and sleeps); run it explicitly:

```
cc-ci-run -m pytest tests/concurrency -q
```

Covers: kernel auto-release on SIGKILL; LOCK_NB probe semantics; PEP 446 fd non-inheritance;
same-domain serialisation; orphan reap + unlink; live-run protection; reap-under-probe-lock
blocking; two-janitor arbitration; reboot-immediate reap; long-held flag; RUN_APP_RE allowlist;
degrade-on-garbage; PDEATHSIG; ppid start race; deadline + SIGTERM funnels; per-run ABRA_DIR
construction/export; concurrent same-recipe fetch isolation; symlinked-servers .env canonicality;
run-keyed (never domain-keyed) run-scoped state files (M2(c) regression, `test_run_state.py`).

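
As an illustration of the first covered case (kernel auto-release on SIGKILL), a real-kernel check might look like the sketch below. It is not taken from `tests/concurrency/`: the names `HOLDER`, `lock_is_free`, and `sigkill_releases_lock` are hypothetical, and the sketch assumes Linux `flock(2)` behaviour.

```python
import fcntl
import os
import signal
import subprocess
import sys
import tempfile

# Child program: take a real exclusive flock, announce it, then idle.
HOLDER = """
import fcntl, os, sys, time
fd = os.open(sys.argv[1], os.O_RDWR | os.O_CREAT, 0o600)
fcntl.flock(fd, fcntl.LOCK_EX)
print("held", flush=True)
time.sleep(60)
"""


def lock_is_free(path: str) -> bool:
    """Non-blocking probe: True if nobody else holds the flock right now."""
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
    finally:
        os.close(fd)  # also releases the lock if the probe acquired it


def sigkill_releases_lock() -> bool:
    """Spawn a holder, SIGKILL it, and check the kernel dropped its lock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cc-ci-app-example.lock")
        proc = subprocess.Popen(
            [sys.executable, "-c", HOLDER, path],
            stdout=subprocess.PIPE,
            text=True,
        )
        try:
            # Wait until the child really holds the lock before probing.
            assert proc.stdout.readline().strip() == "held"
            held_while_alive = not lock_is_free(path)
            proc.send_signal(signal.SIGKILL)  # no handler, no teardown runs
            proc.wait(timeout=10)
            free_after_kill = lock_is_free(path)
        finally:
            if proc.poll() is None:
                proc.kill()
                proc.wait()
            proc.stdout.close()
        return held_while_alive and free_after_kill
```

The handshake on the child's `held` line avoids a race where the parent probes before the child has locked; without it the test would need a sleep.
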
## 11. File / symbol index

| What | Where |
|---|---|
| lifetime guards (PDEATHSIG, signal funnels, deadline) | `runner/harness/lifetime.py`; installed in `run_recipe_ci.main()` |
| setsid/trap cancel forwarding | `.drone.yml` (`recipe-ci` step) |
| `acquire_app_lock`, `_held_app_locks`, `_app_lock_path` | `runner/harness/lifecycle.py` |
| `acquire_app_lock` call site | `lifecycle.deploy_app()` (before app creation) |
| janitor + probe (`janitor`, `_probe_and_reap`, `LONG_HELD_LOCK_SECONDS`) | `runner/harness/lifecycle.py` |
| per-run ABRA_DIR (`setup_run_abra_dir`, `fetch_recipe`) | `runner/run_recipe_ci.py` |
| path resolution (`abra_dir`, `recipe_dir`) | `runner/harness/abra.py` (used by `generic`, `lifecycle.prepull_images`, `warm_reconcile`) |
| run-app naming | `runner/harness/naming.py` (`app_domain`), `RUN_APP_RE` in `lifecycle.py` |
| capacity knob | `nix/modules/drone-runner.nix` (`maxTests`) |
| convergence (adjacent) | `lifecycle.services_converged()`, `lifecycle.backup_app()` |
| the test suite | `tests/concurrency/` (`helpers.py` subprocess entrypoints, `concutil.py` probes) |

Deleted in the restructure (grep should find NOTHING): `register_run_app`, `unregister_run_app`,
`_run_owner_state`, `ACTIVE_RUN_DIR`, `CCCI_JANITOR_MAX_AGE`, `_stack_age_seconds`,
`acquire_recipe_lock`, `RECIPE_LOCK_DIR`.

54
flake.nix
@ -31,34 +31,36 @@
|
|||||||
];
|
];
|
||||||
in
|
in
|
||||||
{
|
{
|
||||||
# Canonical live host target: the Hetzner cc-ci server.
|
nixosConfigurations = {
|
||||||
# Use `.#cc-ci` for the current production host.
|
# Canonical live host target: the Hetzner cc-ci server.
|
||||||
nixosConfigurations.cc-ci = nixpkgs.lib.nixosSystem {
|
# Use `.#cc-ci` for the current production host.
|
||||||
inherit system;
|
cc-ci = nixpkgs.lib.nixosSystem {
|
||||||
modules = [
|
inherit system;
|
||||||
sops-nix.nixosModules.sops
|
modules = [
|
||||||
./nix/hosts/cc-ci-hetzner/configuration.nix
|
sops-nix.nixosModules.sops
|
||||||
];
|
./nix/hosts/cc-ci-hetzner/configuration.nix
|
||||||
};
|
];
|
||||||
|
};
|
||||||
|
|
||||||
# Legacy Incus VM host definition retained only for historical comparison and fallback.
|
# Legacy Incus VM host definition retained only for historical comparison and fallback.
|
||||||
# Do NOT use this target on the live Hetzner server.
|
# Do NOT use this target on the live Hetzner server.
|
||||||
nixosConfigurations.cc-ci-incus = nixpkgs.lib.nixosSystem {
|
cc-ci-incus = nixpkgs.lib.nixosSystem {
|
||||||
inherit system;
|
inherit system;
|
||||||
modules = [
|
modules = [
|
||||||
sops-nix.nixosModules.sops
|
sops-nix.nixosModules.sops
|
||||||
./nix/hosts/cc-ci/configuration.nix
|
./nix/hosts/cc-ci/configuration.nix
|
||||||
];
|
];
|
||||||
};
|
};
|
||||||
|
|
||||||
# Explicit alias for the live Hetzner host. Kept alongside `cc-ci` so the intended host target
|
# Explicit alias for the live Hetzner host. Kept alongside `cc-ci` so the intended host
|
||||||
# remains obvious in recovery/migration workflows.
|
# target remains obvious in recovery/migration workflows.
|
||||||
nixosConfigurations.cc-ci-hetzner = nixpkgs.lib.nixosSystem {
|
cc-ci-hetzner = nixpkgs.lib.nixosSystem {
|
||||||
inherit system;
|
inherit system;
|
||||||
modules = [
|
modules = [
|
||||||
sops-nix.nixosModules.sops
|
sops-nix.nixosModules.sops
|
||||||
./nix/hosts/cc-ci-hetzner/configuration.nix
|
./nix/hosts/cc-ci-hetzner/configuration.nix
|
||||||
];
|
];
|
||||||
|
};
|
||||||
};
|
};
|
||||||
|
|
||||||
devShells.${system} = {
|
devShells.${system} = {
|
||||||
|
|||||||
@ -7,7 +7,7 @@
|
|||||||
# git clone --recursive https://git.autonomic.zone/recipe-maintainers/cc-ci.git /etc/cc-ci
|
# git clone --recursive https://git.autonomic.zone/recipe-maintainers/cc-ci.git /etc/cc-ci
|
||||||
# install -m600 <age-private-key> /var/lib/sops-nix/key.txt
|
# install -m600 <age-private-key> /var/lib/sops-nix/key.txt
|
||||||
# nixos-rebuild switch --flake /etc/cc-ci#cc-ci-hetzner
|
# nixos-rebuild switch --flake /etc/cc-ci#cc-ci-hetzner
|
||||||
{ pkgs, lib, ... }:
|
{ pkgs, ... }:
|
||||||
{
|
{
|
||||||
imports = [
|
imports = [
|
||||||
./hardware.nix
|
./hardware.nix
|
||||||
|
|||||||
@ -11,13 +11,17 @@
|
|||||||
{
|
{
|
||||||
imports = [ (modulesPath + "/profiles/qemu-guest.nix") ];
|
imports = [ (modulesPath + "/profiles/qemu-guest.nix") ];
|
||||||
|
|
||||||
boot.loader = {
|
boot = {
|
||||||
efi.efiSysMountPoint = "/boot/efi";
|
loader = {
|
||||||
grub = {
|
efi.efiSysMountPoint = "/boot/efi";
|
||||||
efiSupport = true;
|
grub = {
|
||||||
efiInstallAsRemovable = true;
|
efiSupport = true;
|
||||||
device = "nodev";
|
efiInstallAsRemovable = true;
|
||||||
|
device = "nodev";
|
||||||
|
};
|
||||||
};
|
};
|
||||||
|
initrd.availableKernelModules = [ "ata_piix" "uhci_hcd" "xen_blkfront" "vmw_pvscsi" ];
|
||||||
|
initrd.kernelModules = [ "nvme" ];
|
||||||
};
|
};
|
||||||
|
|
||||||
fileSystems."/boot/efi" = {
|
fileSystems."/boot/efi" = {
|
||||||
@ -25,9 +29,6 @@
|
|||||||
fsType = "vfat";
|
fsType = "vfat";
|
||||||
};
|
};
|
||||||
|
|
||||||
boot.initrd.availableKernelModules = [ "ata_piix" "uhci_hcd" "xen_blkfront" "vmw_pvscsi" ];
|
|
||||||
boot.initrd.kernelModules = [ "nvme" ];
|
|
||||||
|
|
||||||
fileSystems."/" = {
|
fileSystems."/" = {
|
||||||
device = "/dev/sda1";
|
device = "/dev/sda1";
|
||||||
fsType = "ext4";
|
fsType = "ext4";
|
||||||
|
|||||||
@ -8,14 +8,19 @@
|
|||||||
{ pkgs, config, lib, ... }:
|
{ pkgs, config, lib, ... }:
|
||||||
let
|
let
|
||||||
# MAX_TESTS (plan §4.2/§4.3 resource safety): max CI builds the exec runner runs at once. Drone
|
# MAX_TESTS (plan §4.2/§4.3 resource safety): max CI builds the exec runner runs at once. Drone
|
||||||
# queues the rest in its native pending-build queue (no custom queue). THE concurrency cap that
|
# queues the rest in its native pending-build queue (no custom queue). THE SINGLE concurrency
|
||||||
# bounds how many test apps can be live at once — kept LOW (1) on this single 28GiB node since
|
# knob — nothing else caps recipe-ci parallelism (the .drone.yml concurrency.limit was removed:
|
||||||
# recipes are heavy (immich/matrix large volumes). With capacity=1 there is never a concurrent
|
# one knob, one place). Bounds how many test apps can be live at once.
|
||||||
# in-flight run, so the run-start janitor can safely reap *any* orphan (a SIGKILL'd build runs no
|
#
|
||||||
# teardown) and the "at most MAX_TESTS apps live" bound holds exactly. Raise to 2 only if the node
|
# Raised to 2 (operator request 2026-06-09) so two recipes can be tested in parallel (e.g. immich
|
||||||
# is shown to handle two light recipes at once (then the janitor MUST stay age-based to avoid
|
# and plausible under active development at once). Verified safe on the current node (Hetzner cpx22,
|
||||||
# reaping a concurrent run — see DECISIONS.md "Resource safety").
|
# ~7.6 GiB / 4 vCPU — NOTE: smaller than the original 28 GiB this was written for): a full immich CI
|
||||||
maxTests = "1";
|
# stack measured ~1 GiB (server+ML+pg+redis) with multiple GiB free, so two concurrent recipes fit.
|
||||||
|
# Concurrent-run safety is the harness's job at ANY capacity (docs/concurrency.md): per-run
|
||||||
|
# ABRA_DIR recipe trees, per-app-domain flocks, and a flock-probe janitor that reaps a crashed
|
||||||
|
# build's orphan immediately (held lock = live run, never touched). Revert to "1" if OOM /
|
||||||
|
# disk-I/O contention is observed under load.
|
||||||
|
maxTests = "2";
|
||||||
in
|
in
|
||||||
{
|
{
|
||||||
# Drone ships under the Polyform Small Business license (nixpkgs marks it unfree);
|
# Drone ships under the Polyform Small Business license (nixpkgs marks it unfree);
|
||||||
|
|||||||
@ -29,7 +29,7 @@ in
|
|||||||
serviceConfig = {
|
serviceConfig = {
|
||||||
Type = "oneshot";
|
Type = "oneshot";
|
||||||
# A full sweep across several recipes (each a cold deploy/test/teardown) is long; bound it.
|
# A full sweep across several recipes (each a cold deploy/test/teardown) is long; bound it.
|
||||||
TimeoutStartSec = "21600"; # 6h ceiling
|
TimeoutStartSec = "21600"; # 6h ceiling
|
||||||
ExecStart = "${sweep}/bin/cc-ci-nightly-sweep";
|
ExecStart = "${sweep}/bin/cc-ci-nightly-sweep";
|
||||||
};
|
};
|
||||||
};
|
};
|
||||||
@ -39,7 +39,7 @@ in
|
|||||||
wantedBy = [ "timers.target" ];
|
wantedBy = [ "timers.target" ];
|
||||||
timerConfig = {
|
timerConfig = {
|
||||||
OnCalendar = "*-*-* 03:00:00";
|
OnCalendar = "*-*-* 03:00:00";
|
||||||
Persistent = true; # catch up a missed nightly after downtime
|
Persistent = true; # catch up a missed nightly after downtime
|
||||||
RandomizedDelaySec = "600";
|
RandomizedDelaySec = "600";
|
||||||
};
|
};
|
||||||
};
|
};
|
||||||
|
|||||||
@ -3,10 +3,49 @@
|
|||||||
# no secrets — just static files behind traefik + the wildcard TLS (same pattern as dashboard.nix,
|
# no secrets — just static files behind traefik + the wildcard TLS (same pattern as dashboard.nix,
|
||||||
# but a plain nginx:alpine since there's nothing to render server-side). Content is updated by writing
|
# but a plain nginx:alpine since there's nothing to render server-side). Content is updated by writing
|
||||||
# files into /var/lib/cc-ci-reports; nginx serves them live (no redeploy needed).
|
# files into /var/lib/cc-ci-reports; nginx serves them live (no redeploy needed).
|
||||||
|
#
|
||||||
|
# It ALSO serves a same-origin realtime PR-status proxy at /pr/<recipe>/<n>: the report's STATUS
|
||||||
|
# column fetches it client-side to show each PR's live state (open vs. ✓). Same-origin means no
|
||||||
|
# dependency on the Gitea CORS allow-list; the recipe mirrors are public so no token is needed. The
|
||||||
|
# proxy is pinned to recipe-maintainers + a safe recipe-name charset and is read-only (GET/HEAD).
|
||||||
{ pkgs, ... }:
|
{ pkgs, ... }:
|
||||||
let
|
let
|
||||||
reportsDir = "/var/lib/cc-ci-reports";
|
reportsDir = "/var/lib/cc-ci-reports";
|
||||||
|
|
||||||
|
# Custom nginx server: static report files + the /pr/<recipe>/<n> → Gitea-API proxy. Replaces the
|
||||||
|
# stock /etc/nginx/conf.d/default.conf (which the image's nginx.conf includes inside http{}).
|
||||||
|
nginxConf = pkgs.writeText "cc-ci-reports-default.conf" ''
|
||||||
|
server {
|
||||||
|
listen 80;
|
||||||
|
server_name _;
|
||||||
|
root /usr/share/nginx/html;
|
||||||
|
index index.html;
|
||||||
|
|
||||||
|
# Realtime PR-status proxy for the Recipe Report STATUS column.
|
||||||
|
# GET /pr/<recipe>/<n> -> the PUBLIC Gitea PR JSON ({state, merged, ...}). Same-origin from
|
||||||
|
# the browser's view, so no CORS dependency; unauthenticated, since the recipe mirrors are
|
||||||
|
# public. The repo owner is hard-pinned to recipe-maintainers and the recipe name to a
|
||||||
|
# slashless charset, so the proxied path can only ever address recipe-maintainers/<name>/pulls
|
||||||
|
# (it cannot be coerced to another org or path). Only safe read methods are allowed.
|
||||||
|
location ~ ^/pr/([a-z0-9._-]+)/([0-9]+)$ {
|
||||||
|
limit_except GET HEAD { deny all; }
|
||||||
|
resolver 127.0.0.11 ipv6=off valid=30s; # docker embedded DNS (forwards external names)
|
||||||
|
proxy_ssl_server_name on;
|
||||||
|
proxy_set_header Host git.autonomic.zone;
|
||||||
|
proxy_set_header Accept "application/json";
|
||||||
|
proxy_pass https://git.autonomic.zone/api/v1/repos/recipe-maintainers/$1/pulls/$2;
|
||||||
|
proxy_intercept_errors off;
|
||||||
|
proxy_connect_timeout 5s;
|
||||||
|
proxy_read_timeout 10s;
|
||||||
|
add_header Cache-Control "no-store" always; # always fetch live state, never cache in the browser
|
||||||
|
}
|
||||||
|
|
||||||
|
location / {
|
||||||
|
try_files $uri $uri/ =404;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
'';
|
||||||
|
|
||||||
stack = pkgs.writeText "cc-ci-reports-stack.yml" ''
|
stack = pkgs.writeText "cc-ci-reports-stack.yml" ''
|
||||||
version: "3.8"
|
version: "3.8"
|
||||||
services:
|
services:
|
||||||
@ -17,6 +56,10 @@ let
|
|||||||
source: ${reportsDir}
|
source: ${reportsDir}
|
||||||
target: /usr/share/nginx/html
|
target: /usr/share/nginx/html
|
||||||
read_only: true
|
read_only: true
|
||||||
|
- type: bind
|
||||||
|
source: ${nginxConf}
|
||||||
|
target: /etc/nginx/conf.d/default.conf
|
||||||
|
read_only: true
|
||||||
networks:
|
networks:
|
||||||
- proxy
|
- proxy
|
||||||
deploy:
|
deploy:
|
||||||
|
|||||||
@ -10,6 +10,7 @@ Bakes in the known abra gotchas (re-verify per installed abra version, currently
|
|||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
import json
|
import json
|
||||||
|
import os
|
||||||
import subprocess
|
import subprocess
|
||||||
|
|
||||||
ABRA = "abra"
|
ABRA = "abra"
|
||||||
@ -19,6 +20,20 @@ class AbraError(RuntimeError):
|
|||||||
pass
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
def abra_dir() -> str:
|
||||||
|
"""abra's state dir, resolved the same way the abra CLI resolves it: $ABRA_DIR if set, else
|
||||||
|
~/.abra. Inside a CI run, run_recipe_ci exports a PER-RUN $ABRA_DIR (fresh recipes/, shared
|
||||||
|
servers/+catalogue/ symlinks) before any abra call, so every helper here and every abra
|
||||||
|
subprocess agree on the same tree; outside a run (warm_reconcile's systemd timer, manual use)
|
||||||
|
both fall back to the canonical /root/.abra."""
|
||||||
|
return os.environ.get("ABRA_DIR") or os.path.expanduser("~/.abra")
|
||||||
|
|
||||||
|
|
||||||
|
def recipe_dir(recipe: str) -> str:
|
||||||
|
"""The current ABRA_DIR's working tree for a recipe (per-run inside a CI run)."""
|
||||||
|
return os.path.join(abra_dir(), "recipes", recipe)
|
||||||
|
|
||||||
|
|
||||||
def _run_pty(
|
def _run_pty(
|
||||||
args: list[str], timeout: int = 900, check: bool = True
|
args: list[str], timeout: int = 900, check: bool = True
|
||||||
) -> subprocess.CompletedProcess:
|
) -> subprocess.CompletedProcess:
|
||||||
@ -77,9 +92,7 @@ def recipe_checkout(recipe: str, version: str) -> None:
|
|||||||
a chaos (`-C`) deploy ignores ENV VERSION and uses the current checkout — together that silently
|
a chaos (`-C`) deploy ignores ENV VERSION and uses the current checkout — together that silently
|
||||||
deployed LATEST for a 'previous-version' base, making the upgrade a no-op (Adversary F1d-2). With
|
deployed LATEST for a 'previous-version' base, making the upgrade a no-op (Adversary F1d-2). With
|
||||||
this checkout + a non-chaos deploy, a pinned deploy genuinely deploys that version."""
|
this checkout + a non-chaos deploy, a pinned deploy genuinely deploys that version."""
|
||||||
import os
|
path = recipe_dir(recipe)
|
||||||
|
|
||||||
path = os.path.expanduser(f"~/.abra/recipes/{recipe}")
|
|
||||||
# -f (force): the version-pinning checkout must yield the EXACT ref tree. Without it, a cc-ci
|
# -f (force): the version-pinning checkout must yield the EXACT ref tree. Without it, a cc-ci
|
||||||
# install_steps-provided overlay (e.g. discourse's compose.ccci.yml, copied into the pinned base)
|
# install_steps-provided overlay (e.g. discourse's compose.ccci.yml, copied into the pinned base)
|
||||||
# is an UNTRACKED file that collides with the same path TRACKED in a later ref, and
|
# is an UNTRACKED file that collides with the same path TRACKED in a later ref, and
|
||||||
@ -100,9 +113,7 @@ def has_lightweight_version_tags(recipe: str) -> bool:
|
|||||||
'reference not found'.) The caller (deploy_app) uses this to fall back to a chaos base deploy
|
'reference not found'.) The caller (deploy_app) uses this to fall back to a chaos base deploy
|
||||||
(which skips lint and deploys the explicitly-checked-out pinned version — see lifecycle.deploy_app).
|
(which skips lint and deploys the explicitly-checked-out pinned version — see lifecycle.deploy_app).
|
||||||
Read-only: just `git tag` + `cat-file -t`; no fetch/mutation, so it can't trigger abra's revert."""
|
Read-only: just `git tag` + `cat-file -t`; no fetch/mutation, so it can't trigger abra's revert."""
|
||||||
import os
|
path = recipe_dir(recipe)
|
||||||
|
|
||||||
path = os.path.expanduser(f"~/.abra/recipes/{recipe}")
|
|
||||||
tags = subprocess.run(
|
tags = subprocess.run(
|
||||||
["git", "-C", path, "tag", "-l"], capture_output=True, text=True
|
["git", "-C", path, "tag", "-l"], capture_output=True, text=True
|
||||||
).stdout.split()
|
).stdout.split()
|
||||||
@ -168,7 +179,9 @@ def secret_generate(domain: str, timeout: int = 300) -> None:
|
|||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
def deploy(domain: str, chaos: bool = True, timeout: int = 900, no_converge_checks: bool = False) -> None:
|
def deploy(
|
||||||
|
domain: str, chaos: bool = True, timeout: int = 900, no_converge_checks: bool = False
|
||||||
|
) -> None:
|
||||||
args = ["app", "deploy", domain, "-o", "-n"]
|
args = ["app", "deploy", domain, "-o", "-n"]
|
||||||
if chaos:
|
if chaos:
|
||||||
args.append("-C")
|
args.append("-C")
|
||||||
@ -203,7 +216,10 @@ def backup_create(domain: str, timeout: int = 900) -> str:
|
|||||||
# remote and fails "authentication required: Unauthorized". Returns the captured output, whose
|
# remote and fails "authentication required: Unauthorized". Returns the captured output, whose
|
||||||
# restic JSON summary line carries the produced "snapshot_id" (the backup artifact, DG3) — note
|
# restic JSON summary line carries the produced "snapshot_id" (the backup artifact, DG3) — note
|
||||||
# `abra app backup snapshots` needs a TTY and is awkward to script, so we read the create output.
|
# `abra app backup snapshots` needs a TTY and is awkward to script, so we read the create output.
|
||||||
out = _run_pty(["app", "backup", "create", domain, "-n", "-C", "-o"], timeout=timeout).stdout or ""
|
out = (
|
||||||
|
_run_pty(["app", "backup", "create", domain, "-n", "-C", "-o"], timeout=timeout).stdout
|
||||||
|
or ""
|
||||||
|
)
|
||||||
# Echo the backup output (incl. backupbot's pre-hook run / any "Failed to run command" or
|
# Echo the backup output (incl. backupbot's pre-hook run / any "Failed to run command" or
|
||||||
# "Container ... not running" ERROR) into the run log. Backup is otherwise opaque: a pre-hook that
|
# "Container ... not running" ERROR) into the run log. Backup is otherwise opaque: a pre-hook that
|
||||||
# fails to register/run leaves the DB dump out of the snapshot, surfacing only as a downstream
|
# fails to register/run leaves the DB dump out of the snapshot, surfacing only as a downstream
|
||||||
@ -226,9 +242,7 @@ def recipe_head_commit(recipe: str) -> str | None:
|
|||||||
"""The current HEAD commit of the recipe checkout — captured right after fetch (the PR head, or
|
"""The current HEAD commit of the recipe checkout — captured right after fetch (the PR head, or
|
||||||
the catalogue current) so the upgrade tier can re-checkout it for the chaos redeploy after the
|
the catalogue current) so the upgrade tier can re-checkout it for the chaos redeploy after the
|
||||||
prev-tag base deploy reset the working tree (HC1)."""
|
prev-tag base deploy reset the working tree (HC1)."""
|
||||||
import os
|
path = recipe_dir(recipe)
|
||||||
|
|
||||||
path = os.path.expanduser(f"~/.abra/recipes/{recipe}")
|
|
||||||
proc = subprocess.run(["git", "-C", path, "rev-parse", "HEAD"], capture_output=True, text=True)
|
proc = subprocess.run(["git", "-C", path, "rev-parse", "HEAD"], capture_output=True, text=True)
|
||||||
out = proc.stdout.strip()
|
out = proc.stdout.strip()
|
||||||
return out or None
|
return out or None
|
||||||
@ -236,10 +250,7 @@ def recipe_head_commit(recipe: str) -> str | None:
|
|||||||
|
|
||||||
def recipe_versions(recipe: str) -> list[str]:
|
def recipe_versions(recipe: str) -> list[str]:
|
||||||
"""Published versions of a recipe, oldest→newest (from the recipe git tags)."""
|
"""Published versions of a recipe, oldest→newest (from the recipe git tags)."""
|
||||||
import os
|
path = recipe_dir(recipe)
|
||||||
import subprocess
|
|
||||||
|
|
||||||
path = os.path.expanduser(f"~/.abra/recipes/{recipe}")
|
|
||||||
proc = subprocess.run(
|
proc = subprocess.run(
|
||||||
["git", "-C", path, "tag", "--sort=creatordate"], capture_output=True, text=True
|
["git", "-C", path, "tag", "--sort=creatordate"], capture_output=True, text=True
|
||||||
)
|
)
|
||||||
|
|||||||
@ -13,8 +13,15 @@ from __future__ import annotations
|
|||||||
import time
|
import time
|
||||||
|
|
||||||
|
|
||||||
def goto_with_retry(page, url, *, deadline_seconds: int = 120, accept_statuses=(200, 304),
|
def goto_with_retry(
|
||||||
goto_timeout_ms: int = 30_000, wait_until: str = "domcontentloaded"):
|
page,
|
||||||
|
url,
|
||||||
|
*,
|
||||||
|
deadline_seconds: int = 120,
|
||||||
|
accept_statuses=(200, 304),
|
||||||
|
goto_timeout_ms: int = 30_000,
|
||||||
|
wait_until: str = "domcontentloaded",
|
||||||
|
):
|
||||||
"""Poll `page.goto(url)` until status is in `accept_statuses` OR the deadline expires.
|
"""Poll `page.goto(url)` until status is in `accept_statuses` OR the deadline expires.
|
||||||
|
|
||||||
Returns the final Playwright response. Raises AssertionError if the deadline expires without
|
Returns the final Playwright response. Raises AssertionError if the deadline expires without
|
||||||
|
|||||||
@ -55,7 +55,9 @@ def enrolled_recipes() -> list[str]:
|
|||||||
out = []
|
out = []
|
||||||
try:
|
try:
|
||||||
for name in sorted(os.listdir(tests_dir)):
|
for name in sorted(os.listdir(tests_dir)):
|
||||||
if os.path.isfile(os.path.join(tests_dir, name, "recipe_meta.py")) and is_enrolled(name):
|
if os.path.isfile(os.path.join(tests_dir, name, "recipe_meta.py")) and is_enrolled(
|
||||||
|
name
|
||||||
|
):
|
||||||
out.append(name)
|
out.append(name)
|
||||||
except OSError:
|
except OSError:
|
||||||
pass
|
pass
|
||||||
@ -122,11 +124,15 @@ def deploy_canonical(recipe: str, timeout: int = 900) -> None:
|
|||||||
abra.recipe_checkout(recipe, version)
|
abra.recipe_checkout(recipe, version)
|
||||||
r = subprocess.run(
|
r = subprocess.run(
|
||||||
["abra", "app", "deploy", domain, version, "-o", "-n", "-f"],
|
["abra", "app", "deploy", domain, version, "-o", "-n", "-f"],
|
||||||
capture_output=True, text=True, timeout=timeout,
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
timeout=timeout,
|
||||||
)
|
)
|
||||||
if r.returncode != 0:
|
if r.returncode != 0:
|
||||||
raise RuntimeError(f"deploy canonical {domain} {version} failed: "
|
raise RuntimeError(
|
||||||
f"{(r.stderr + ' ' + r.stdout).strip()[:300]}")
|
f"deploy canonical {domain} {version} failed: "
|
||||||
|
f"{(r.stderr + ' ' + r.stdout).strip()[:300]}"
|
||||||
|
)
|
||||||
_set_status(recipe, "warm")
|
_set_status(recipe, "warm")
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@ -79,10 +79,44 @@ def render_badge_svg(label: str, message: str, color: str) -> str:
|
|||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
def level_badge_svg(level: int, cap_reason: str = "") -> str:
|
# Third-segment colours for the level badge: amber = an UNINTENTIONAL skip (a rung skipped but not
|
||||||
"""Per-recipe/-run LEVEL badge: 'cc-ci | level N'. Colour by level (R6)."""
|
# in the recipe's intentional list — likely missing coverage) capped the climb; muted = an
|
||||||
msg = f"level {int(level)}"
|
# INTENTIONAL skip (declared in recipe_meta.EXPECTED_NA — nothing to fix). Font-safe text labels
|
||||||
return render_badge_svg("cc-ci", msg, level_color(level))
|
# (no emoji) so the SVG renders anywhere.
|
||||||
|
GAP_COLOR = "#d29922"
|
||||||
|
EXPECT_COLOR = "#6e7681"
|
||||||
|
|
||||||
|
|
||||||
|
def level_badge_svg(level: int, cap_reason: str = "", cap_skip: str = "") -> str:
|
||||||
|
"""Per-recipe/-run LEVEL badge: 'cc-ci | level N' coloured by level (R6), with a THIRD segment
|
||||||
|
that differentiates *why* the climb stopped when a SKIP capped it (`cap_skip`):
|
||||||
|
- "unintentional" (a rung skipped but not in the recipe's intentional list): amber 'gap?'.
|
||||||
|
- "intentional" (a skip declared in recipe_meta.EXPECTED_NA): muted 'expected'.
|
||||||
|
- "" (clean cap / full climb / a real failure): no third segment (the level + card carry it).
|
||||||
|
The badge never inflates — it only annotates the cap the level already reflects."""
|
||||||
|
label, msg = "cc-ci", f"level {int(level)}"
|
||||||
|
lw, mw = _text_width(label), _text_width(msg)
|
||||||
|
third: tuple[str, str] | None = None
|
||||||
|
if cap_skip == "unintentional":
|
||||||
|
third = ("gap?", GAP_COLOR)
|
||||||
|
elif cap_skip == "intentional":
|
||||||
|
third = ("expected", EXPECT_COLOR)
|
||||||
|
if third is None:
|
||||||
|
return render_badge_svg(label, msg, level_color(level))
|
||||||
|
txt, tcolor = third
|
||||||
|
tw = _text_width(txt)
|
||||||
|
w = lw + mw + tw
|
||||||
|
return (
|
||||||
|
f'<svg xmlns="http://www.w3.org/2000/svg" width="{w}" height="20" role="img" '
|
||||||
|
f'aria-label="{html.escape(label)}: {html.escape(msg)} ({html.escape(txt)})">'
|
||||||
|
f'<rect width="{lw}" height="20" fill="#555"/>'
|
||||||
|
f'<rect x="{lw}" width="{mw}" height="20" fill="{level_color(level)}"/>'
|
||||||
|
f'<rect x="{lw + mw}" width="{tw}" height="20" fill="{tcolor}"/>'
|
||||||
|
f'<g fill="#fff" font-family="Verdana,Geneva,sans-serif" font-size="11">'
|
||||||
|
f'<text x="6" y="14">{html.escape(label)}</text>'
|
||||||
|
f'<text x="{lw + 6}" y="14">{html.escape(msg)}</text>'
|
||||||
|
f'<text x="{lw + mw + 6}" y="14">{html.escape(txt)}</text></g></svg>'
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
def _stage_rows(stages: list[dict]) -> str:
|
def _stage_rows(stages: list[dict]) -> str:
|
||||||
@ -107,6 +141,45 @@ def _stage_rows(stages: list[dict]) -> str:
|
|||||||
return "\n".join(rows) or '<tr><td colspan="3">no stages</td></tr>'
|
return "\n".join(rows) or '<tr><td colspan="3">no stages</td></tr>'
|
||||||
|
|
||||||
|
|
||||||
|
# Friendly rung labels for the skip rows (the four essential rungs).
|
||||||
|
RUNG_LABEL = {
|
||||||
|
"install": "install",
|
||||||
|
"upgrade": "upgrade",
|
||||||
|
"backup_restore": "backup/restore",
|
||||||
|
"functional": "functional",
|
||||||
|
}
|
||||||
|
SKIP_GREEN = (
|
||||||
|
"#57ab5a" # muted green — an intentional skip reads like a pass (but labelled, never inflating)
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _skip_rows(skips: dict) -> str:
|
||||||
|
"""Render SKIPPED rungs as stage-like rows. An intentional (declared) skip looks like a pass row
|
||||||
|
but its status says 'INTENTIONAL SKIP' (muted green) with the declared reason on the line below;
|
||||||
|
an unintentional skip is amber 'UNINTENTIONAL SKIP' with a prompt to add a test or declare it."""
|
||||||
|
rows = []
|
||||||
|
for rung, reason in (skips.get("intentional") or {}).items():
|
||||||
|
rows.append(
|
||||||
|
f'<tr class="stage"><td colspan="2"><span class="mark" style="color:{SKIP_GREEN}">⊘</span>'
|
||||||
|
f"<b>{html.escape(RUNG_LABEL.get(rung, rung))}</b></td>"
|
||||||
|
f'<td class="st" style="color:{SKIP_GREEN}">intentional skip</td></tr>'
|
||||||
|
)
|
||||||
|
rows.append(
|
||||||
|
f'<tr class="skipreason"><td></td><td colspan="2">{html.escape(reason)}</td></tr>'
|
||||||
|
)
|
||||||
|
for rung in skips.get("unintentional") or []:
|
||||||
|
rows.append(
|
||||||
|
f'<tr class="stage"><td colspan="2"><span class="mark" style="color:{GAP_COLOR}">⊘</span>'
|
||||||
|
f"<b>{html.escape(RUNG_LABEL.get(rung, rung))}</b></td>"
|
||||||
|
f'<td class="st" style="color:{GAP_COLOR}">unintentional skip</td></tr>'
|
||||||
|
)
|
||||||
|
rows.append(
|
||||||
|
'<tr class="skipreason"><td></td><td colspan="2">not declared in EXPECTED_NA — add the '
|
||||||
|
"missing test/label, or declare the skip with a reason</td></tr>"
|
||||||
|
)
|
||||||
|
return "\n".join(rows)
|
||||||
|
|
||||||
|
|
||||||
def render_card_html(data: dict, screenshot_rel: str | None = "screenshot.png") -> str:
|
def render_card_html(data: dict, screenshot_rel: str | None = "screenshot.png") -> str:
|
||||||
"""Build the summary-card HTML from a results.json dict. `screenshot_rel` is the relative path to
|
"""Build the summary-card HTML from a results.json dict. `screenshot_rel` is the relative path to
|
||||||
the screenshot PNG (same dir as the card) — omitted from the card if None / absent.
|
the screenshot PNG (same dir as the card) — omitted from the card if None / absent.
|
||||||
@ -116,7 +189,9 @@ def render_card_html(data: dict, screenshot_rel: str | None = "screenshot.png")
|
|||||||
recipe = html.escape(str(data.get("recipe", "?")))
|
recipe = html.escape(str(data.get("recipe", "?")))
|
||||||
version = html.escape(str(data.get("version") or data.get("ref") or ""))
|
version = html.escape(str(data.get("version") or data.get("ref") or ""))
|
||||||
level = int(data.get("level", 0))
|
level = int(data.get("level", 0))
|
||||||
cap = html.escape(str(data.get("level_cap_reason") or ""))
|
cap_reason = str(data.get("level_cap_reason") or "")
|
||||||
|
cap = html.escape(cap_reason)
|
||||||
|
sk = data.get("skips", {}) or {}
|
||||||
color = level_color(level)
|
color = level_color(level)
|
||||||
flags = data.get("flags", {}) or {}
|
flags = data.get("flags", {}) or {}
|
||||||
flag_bits = []
|
flag_bits = []
|
||||||
@ -132,7 +207,7 @@ def render_card_html(data: dict, screenshot_rel: str | None = "screenshot.png")
|
|||||||
if show_shot
|
if show_shot
|
||||||
else '<div class="shot noshot">no screenshot</div>'
|
else '<div class="shot noshot">no screenshot</div>'
|
||||||
)
|
)
|
||||||
rows = _stage_rows(data.get("stages", []))
|
rows = _stage_rows(data.get("stages", [])) + "\n" + _skip_rows(sk)
|
||||||
return f"""<!doctype html><html><head><meta charset="utf-8"><style>
|
return f"""<!doctype html><html><head><meta charset="utf-8"><style>
|
||||||
*{{box-sizing:border-box}}
|
*{{box-sizing:border-box}}
|
||||||
body{{margin:0;font-family:system-ui,-apple-system,Segoe UI,sans-serif;background:#0d1117;color:#c9d1d9}}
|
body{{margin:0;font-family:system-ui,-apple-system,Segoe UI,sans-serif;background:#0d1117;color:#c9d1d9}}
|
||||||
@ -157,6 +232,7 @@ tr.stage td{{padding-top:.5rem;border-bottom:1px solid #30363d}}
|
|||||||
.test .tmark{{width:1.4rem;text-align:center}}
|
.test .tmark{{width:1.4rem;text-align:center}}
|
||||||
.test .tname{{color:#c9d1d9;font-family:ui-monospace,monospace;font-size:.8rem}}
|
.test .tname{{color:#c9d1d9;font-family:ui-monospace,monospace;font-size:.8rem}}
|
||||||
.test .tms{{text-align:right;color:#8b949e;font-size:.74rem;width:5rem}}
|
.test .tms{{text-align:right;color:#8b949e;font-size:.74rem;width:5rem}}
|
||||||
|
tr.skipreason td{{color:#8b949e;font-size:.78rem;font-style:italic;padding-top:0;padding-bottom:.45rem;border-bottom:1px solid #21262d}}
|
||||||
.shot{{width:360px;flex:none;border:1px solid #30363d;border-radius:8px;overflow:hidden;background:#0d1117}}
|
.shot{{width:360px;flex:none;border:1px solid #30363d;border-radius:8px;overflow:hidden;background:#0d1117}}
|
||||||
.shot img{{width:100%;display:block}}
|
.shot img{{width:100%;display:block}}
|
||||||
.shot.noshot{{display:flex;align-items:center;justify-content:center;height:225px;color:#8b949e;font-size:.85rem}}
|
.shot.noshot{{display:flex;align-items:center;justify-content:center;height:225px;color:#8b949e;font-size:.85rem}}
|
||||||
@ -167,7 +243,7 @@ tr.stage td{{padding-top:.5rem;border-bottom:1px solid #30363d}}
|
|||||||
<div class="hd">{FLOWER_SVG}
|
<div class="hd">{FLOWER_SVG}
|
||||||
<div class="title"><h1>{recipe}</h1><span class="ver">{version}</span></div>
|
<div class="title"><h1>{recipe}</h1><span class="ver">{version}</span></div>
|
||||||
<div class="lvl"><span class="num">{level}</span><span class="lbl">level</span></div></div>
|
<div class="lvl"><span class="num">{level}</span><span class="lbl">level</span></div></div>
|
||||||
<div class="cap">{("<b>capped:</b> " + cap) if cap else "<b>full clean climb</b> — top level (6)"}</div>
|
<div class="cap">{("<b>capped:</b> " + cap) if cap else "<b>full clean climb</b> — top level (4)"}</div>
|
||||||
<div class="body"><div class="tbl"><table>{rows}</table></div>{shot_html}</div>
|
<div class="body"><div class="tbl"><table>{rows}</table></div>{shot_html}</div>
|
||||||
<div class="flags">{"".join(flag_bits)}</div>
|
<div class="flags">{"".join(flag_bits)}</div>
|
||||||
</div></body></html>"""
|
</div></body></html>"""
|
||||||
|
|||||||
@ -28,7 +28,7 @@ from __future__ import annotations
|
|||||||
import contextlib
|
import contextlib
|
||||||
import json
|
import json
|
||||||
import os
|
import os
|
||||||
from typing import Iterable
|
from collections.abc import Iterable
|
||||||
|
|
||||||
from . import lifecycle, naming
|
from . import lifecycle, naming
|
||||||
|
|
||||||
@ -36,9 +36,7 @@ from . import lifecycle, naming
|
|||||||
def declared_deps(recipe: str) -> list[str]:
|
def declared_deps(recipe: str) -> list[str]:
|
||||||
"""Read `DEPS` from `tests/<recipe>/recipe_meta.py` — a list of recipe names this recipe needs
|
"""Read `DEPS` from `tests/<recipe>/recipe_meta.py` — a list of recipe names this recipe needs
|
||||||
deployed alongside it. Returns [] if none."""
|
deployed alongside it. Returns [] if none."""
|
||||||
path = os.path.join(
|
path = os.path.join(os.path.dirname(__file__), "..", "..", "tests", recipe, "recipe_meta.py")
|
||||||
os.path.dirname(__file__), "..", "..", "tests", recipe, "recipe_meta.py"
|
|
||||||
)
|
|
||||||
if not os.path.exists(path):
|
if not os.path.exists(path):
|
||||||
return []
|
return []
|
||||||
ns: dict = {}
|
ns: dict = {}
|
||||||
|
|||||||
@ -25,7 +25,7 @@ _BACKUPBOT_RE = re.compile(r"backupbot\.backup\b[^\n]*\btrue\b", re.IGNORECASE)
|
|||||||
|
|
||||||
|
|
||||||
def _recipe_dir(recipe: str) -> str:
|
def _recipe_dir(recipe: str) -> str:
|
||||||
return os.path.expanduser(f"~/.abra/recipes/{recipe}")
|
return abra.recipe_dir(recipe) # the per-run tree inside a CI run ($ABRA_DIR)
|
||||||
|
|
||||||
|
|
||||||
def backup_capable(recipe: str, meta: dict | None = None) -> bool:
|
def backup_capable(recipe: str, meta: dict | None = None) -> bool:
|
||||||
@ -222,7 +222,11 @@ def assert_restore_healthy(domain: str, meta: dict) -> None:
|
|||||||
|
|
||||||
|
|
||||||
def perform_upgrade(
|
def perform_upgrade(
|
||||||
domain: str, recipe: str, head_ref: str | None, deploy_timeout: int = 900, meta: dict | None = None
|
domain: str,
|
||||||
|
recipe: str,
|
||||||
|
head_ref: str | None,
|
||||||
|
deploy_timeout: int = 900,
|
||||||
|
meta: dict | None = None,
|
||||||
) -> dict[str, str | None]:
|
) -> dict[str, str | None]:
|
||||||
"""Perform the UPGRADE op once, in place, to the PR-HEAD code under test (HC1): re-checkout the
|
"""Perform the UPGRADE op once, in place, to the PR-HEAD code under test (HC1): re-checkout the
|
||||||
PR head (the prev-tag base deploy reset the recipe working tree), then `abra app deploy --chaos`
|
PR head (the prev-tag base deploy reset the recipe working tree), then `abra app deploy --chaos`
|
||||||
@ -267,7 +271,9 @@ def perform_upgrade(
|
|||||||
deploy_timeout=int(meta.get("DEPLOY_TIMEOUT", deploy_timeout)),
|
deploy_timeout=int(meta.get("DEPLOY_TIMEOUT", deploy_timeout)),
|
||||||
http_timeout=int(meta.get("HTTP_TIMEOUT", 300)),
|
http_timeout=int(meta.get("HTTP_TIMEOUT", 300)),
|
||||||
)
|
)
|
||||||
lifecycle.wait_ready_probes(meta, domain, timeout=int(meta.get("DEPLOY_TIMEOUT", deploy_timeout)))
|
lifecycle.wait_ready_probes(
|
||||||
|
meta, domain, timeout=int(meta.get("DEPLOY_TIMEOUT", deploy_timeout))
|
||||||
|
)
|
||||||
after = lifecycle.deployed_identity(domain)
|
after = lifecycle.deployed_identity(domain)
|
||||||
# Evidence (HC1): the chaos-version label = the deployed recipe commit; it should match the
|
# Evidence (HC1): the chaos-version label = the deployed recipe commit; it should match the
|
||||||
# PR-head we checked out — proving the upgrade deployed the code under test, not a published tag.
|
# PR-head we checked out — proving the upgrade deployed the code under test, not a published tag.
|
||||||
|
|||||||
@ -73,7 +73,7 @@ def http_post(
|
|||||||
`data` is JSON-encoded if content_type='application/json',
|
`data` is JSON-encoded if content_type='application/json',
|
||||||
form-encoded if 'application/x-www-form-urlencoded' (the OIDC token endpoint form),
|
form-encoded if 'application/x-www-form-urlencoded' (the OIDC token endpoint form),
|
||||||
or sent raw bytes if data is already bytes."""
|
or sent raw bytes if data is already bytes."""
|
||||||
if isinstance(data, (bytes, bytearray)):
|
if isinstance(data, bytes | bytearray):
|
||||||
body: bytes | None = bytes(data)
|
body: bytes | None = bytes(data)
|
||||||
elif content_type == "application/json" and data is not None:
|
elif content_type == "application/json" and data is not None:
|
||||||
body = json.dumps(data).encode()
|
body = json.dumps(data).encode()
|
||||||
@ -107,7 +107,7 @@ def http_request(
|
|||||||
) -> tuple[int, object | None]:
|
) -> tuple[int, object | None]:
|
||||||
"""Arbitrary-method HTTP (PUT/DELETE/PATCH) for parity tests that mutate. Same shape as
|
"""Arbitrary-method HTTP (PUT/DELETE/PATCH) for parity tests that mutate. Same shape as
|
||||||
http_post (returns (status, json_or_None))."""
|
http_post (returns (status, json_or_None))."""
|
||||||
if isinstance(data, (bytes, bytearray)):
|
if isinstance(data, bytes | bytearray):
|
||||||
body: bytes | None = bytes(data)
|
body: bytes | None = bytes(data)
|
||||||
elif content_type == "application/json" and data is not None:
|
elif content_type == "application/json" and data is not None:
|
||||||
body = json.dumps(data).encode()
|
body = json.dumps(data).encode()
|
||||||
@ -142,7 +142,7 @@ def post_with_headers(
|
|||||||
"""Like http_post but ALSO returns the response headers as a dict — for APIs that hand back an
|
"""Like http_post but ALSO returns the response headers as a dict — for APIs that hand back an
|
||||||
auth token in a response header rather than the body (e.g. mattermost login → `Token` header).
|
auth token in a response header rather than the body (e.g. mattermost login → `Token` header).
|
||||||
Returns (status, parsed_json_or_None, response_headers). status=0 + {} on transport failure."""
|
Returns (status, parsed_json_or_None, response_headers). status=0 + {} on transport failure."""
|
||||||
if isinstance(data, (bytes, bytearray)):
|
if isinstance(data, bytes | bytearray):
|
||||||
body: bytes | None = bytes(data)
|
body: bytes | None = bytes(data)
|
||||||
elif content_type == "application/json" and data is not None:
|
elif content_type == "application/json" and data is not None:
|
||||||
body = json.dumps(data).encode()
|
body = json.dumps(data).encode()
|
||||||
@ -252,13 +252,16 @@ def retry_http_post(
|
|||||||
) -> tuple[int, object | None]:
|
) -> tuple[int, object | None]:
|
||||||
"""POST with retry until expect_fn(status, json) is truthy. Defaults to any 2xx."""
|
"""POST with retry until expect_fn(status, json) is truthy. Defaults to any 2xx."""
|
||||||
if expect_fn is None:
|
if expect_fn is None:
|
||||||
|
|
||||||
def expect_fn(s, _j): # noqa: ARG001
|
def expect_fn(s, _j): # noqa: ARG001
|
||||||
return 200 <= s < 300
|
return 200 <= s < 300
|
||||||
|
|
||||||
result: list[tuple[int, object | None]] = [(0, None)]
|
result: list[tuple[int, object | None]] = [(0, None)]
|
||||||
|
|
||||||
def _check():
|
def _check():
|
||||||
s, j = http_post(url, data=data, headers=headers, content_type=content_type, timeout=timeout)
|
s, j = http_post(
|
||||||
|
url, data=data, headers=headers, content_type=content_type, timeout=timeout
|
||||||
|
)
|
||||||
result[0] = (s, j)
|
result[0] = (s, j)
|
||||||
return expect_fn(s, j)
|
return expect_fn(s, j)
|
||||||
|
|
||||||
|
|||||||
@ -5,37 +5,39 @@ YunoHost semantics: **a gap caps the level** — you only earn level L if every
|
|||||||
PASS. The first rung that is not a clean PASS (a real FAIL *or* genuinely N/A for this recipe) stops
|
PASS. The first rung that is not a clean PASS (a real FAIL *or* genuinely N/A for this recipe) stops
|
||||||
the climb; `cap_reason` records why. This is deliberately conservative: presentation must NEVER make
|
the climb; `cap_reason` records why. This is deliberately conservative: presentation must NEVER make
|
||||||
a run look greener than its tests (plan §6 cardinal guardrail), so an N/A rung caps just like a fail
|
a run look greener than its tests (plan §6 cardinal guardrail), so an N/A rung caps just like a fail
|
||||||
(the L5 example in §4.1 — "recipes with no integration surface cap at L4 by definition" — is exactly
|
— with a recorded reason so the level is *fair*, not inflated.
|
||||||
this: N/A caps, with a recorded reason so the level is *fair*, not inflated).
|
|
||||||
|
|
||||||
The ladder (§4.1):
|
The ladder is the FOUR essential rungs every recipe is held to:
|
||||||
L0 — install failed / app never became healthy.
|
L0 — install failed / app never became healthy.
|
||||||
L1 — Installs: deploys + passes health/readiness.
|
L1 — Installs: deploys + passes health/readiness.
|
||||||
L2 — Upgrades: previous published version → PR version, stays healthy, data intact.
|
L2 — Upgrades: previous published version → PR version, stays healthy, data intact.
|
||||||
L3 — Backup/restore: seeded data survives backup → wipe → restore.
|
L3 — Backup/restore: seeded data survives backup → wipe → restore.
|
||||||
L4 — Functional: recipe-specific functional tests pass.
|
L4 — Functional: recipe-specific functional tests pass.
|
||||||
L5 — Integration: SSO/OIDC + cross-app integration tests pass.
|
|
||||||
L6 — Recipe-local: the recipe repo's own tests/ (D4) pass and are merged.
|
Integration (SSO/OIDC + cross-app) and recipe-local (the recipe repo's own tests/) are **OPTIONAL**
|
||||||
|
capabilities — they are NOT part of the level ladder and never cap it. They still run when present
|
||||||
|
(and SSO is still enforced for the run VERDICT via the deps/SSO checks in run_recipe_ci.py), but a
|
||||||
|
recipe without an SSO surface or without repo-local tests is simply not penalised on the level.
|
||||||
|
|
||||||
This module is PURE (no I/O) so it is cheaply unit-testable and the Adversary can re-run the unit
|
This module is PURE (no I/O) so it is cheaply unit-testable and the Adversary can re-run the unit
|
||||||
test cold (`cc-ci-run -m pytest tests/unit/test_level.py -q`). The orchestrator
|
test cold (`cc-ci-run -m pytest tests/unit/test_level.py -q`). The orchestrator
|
||||||
(`run_recipe_ci.py`) is responsible for translating its raw per-tier results + deps/SSO signals into
|
(`run_recipe_ci.py`) is responsible for translating its raw per-tier results into the rung-status
|
||||||
the rung-status dict this function consumes; that mapping is documented in DECISIONS.md (Phase 3).
|
dict this function consumes; that mapping is documented in DECISIONS.md (Phase 3).
|
||||||
|
|
||||||
Rung status vocabulary (each rung ∈ these three):
|
Rung status vocabulary (each rung ∈ these three):
|
||||||
"pass" — the rung was exercised and passed.
|
"pass" — the rung was exercised and passed.
|
||||||
"fail" — the rung was exercised and failed.
|
"fail" — the rung was exercised and failed.
|
||||||
"na" — the rung does not apply to this recipe (e.g. only one published version → no upgrade;
|
"na" — the rung does not apply to this recipe (e.g. only one published version → no upgrade;
|
||||||
not backup-capable; no SSO/integration surface; no recipe-local tests). N/A is NOT a
|
not backup-capable). N/A is NOT a failure, but it DOES cap the climb (with a distinct
|
||||||
failure, but it DOES cap the climb (with a distinct cap_reason) so the level never
|
cap_reason) so the level never overstates what was actually verified.
|
||||||
overstates what was actually verified.
|
|
||||||
"""
|
"""
|
||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
# The climbable rungs in ascending order. install (L1) is the foundation; L0 means install itself
|
# The climbable rungs in ascending order. install (L1) is the foundation; L0 means install itself
|
||||||
# did not pass. Each later rung requires every earlier rung to be a clean PASS.
|
# did not pass. Each later rung requires every earlier rung to be a clean PASS. These four are the
|
||||||
RUNGS = ("install", "upgrade", "backup_restore", "functional", "integration", "recipe_local")
|
# ESSENTIAL rungs — integration/recipe-local are optional and deliberately NOT in this tuple.
|
||||||
|
RUNGS = ("install", "upgrade", "backup_restore", "functional")
|
||||||
|
|
||||||
# Human-readable label per rung level, for cap_reason + the summary card.
|
# Human-readable label per rung level, for cap_reason + the summary card.
|
||||||
RUNG_LABEL = {
|
RUNG_LABEL = {
|
||||||
@ -43,22 +45,20 @@ RUNG_LABEL = {
|
|||||||
2: "upgrade (prev published → PR)",
|
2: "upgrade (prev published → PR)",
|
||||||
3: "backup/restore (data integrity)",
|
3: "backup/restore (data integrity)",
|
||||||
4: "functional (recipe-specific tests)",
|
4: "functional (recipe-specific tests)",
|
||||||
5: "integration (SSO/OIDC + cross-app)",
|
|
||||||
6: "recipe-local (recipe repo tests/)",
|
|
||||||
}
|
}
|
||||||
|
|
||||||
VALID = {"pass", "fail", "na"}
|
VALID = {"pass", "fail", "na"}
|
||||||
|
|
||||||
|
|
||||||
def compute_level(rungs: dict[str, str]) -> tuple[int, str]:
|
def compute_level(rungs: dict[str, str]) -> tuple[int, str]:
|
||||||
"""Map a rung-status dict → (level 0..6, cap_reason).
|
"""Map a rung-status dict → (level 0..4, cap_reason).
|
||||||
|
|
||||||
`rungs` must contain a status in {"pass","fail","na"} for every name in RUNGS. The level is the
|
`rungs` must contain a status in {"pass","fail","na"} for every name in RUNGS. The level is the
|
||||||
highest L such that rungs[1..L] are all "pass"; the first non-"pass" rung caps the climb. L0 is
|
highest L such that rungs[1..L] are all "pass"; the first non-"pass" rung caps the climb. L0 is
|
||||||
returned when the install rung itself is not "pass" (install failed / never healthy).
|
returned when the install rung itself is not "pass" (install failed / never healthy).
|
||||||
|
|
||||||
cap_reason explains where the climb stopped:
|
cap_reason explains where the climb stopped:
|
||||||
- "" (empty) when the recipe earned the top rung (L6, full clean climb).
|
- "" (empty) when the recipe earned the top rung (L4, full clean climb).
|
||||||
- "L<k> <label> FAILED" when a rung was exercised and failed.
|
- "L<k> <label> FAILED" when a rung was exercised and failed.
|
||||||
- "L<k> <label> N/A" when a rung does not apply to this recipe.
|
- "L<k> <label> N/A" when a rung does not apply to this recipe.
|
||||||
Returns the reason for the FIRST rung that stopped the climb (the binding constraint).
|
Returns the reason for the FIRST rung that stopped the climb (the binding constraint).
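The climb rule in this docstring is small enough to sketch end to end. The following is an illustrative reading of the documented behaviour, not the module's actual body: `compute_level_sketch` and `LABELS` are hypothetical names, and the L1 label is a placeholder because the real one lies outside this hunk.

```python
RUNGS = ("install", "upgrade", "backup_restore", "functional")
VALID = {"pass", "fail", "na"}
LABELS = {
    1: "install",  # placeholder: the real L1 label is not visible in this diff
    2: "upgrade (prev published → PR)",
    3: "backup/restore (data integrity)",
    4: "functional (recipe-specific tests)",
}


def compute_level_sketch(rungs: dict[str, str]) -> tuple[int, str]:
    """Walk the rungs in order; the first non-"pass" rung caps the level."""
    for level, name in enumerate(RUNGS, start=1):
        status = rungs[name]
        if status not in VALID:
            raise ValueError(f"bad status for {name}: {status!r}")
        if status != "pass":
            verdict = "FAILED" if status == "fail" else "N/A"
            # The level earned is the last rung that passed cleanly.
            return level - 1, f"L{level} {LABELS[level]} {verdict}"
    return len(RUNGS), ""  # full clean climb


clean = dict.fromkeys(RUNGS, "pass")
full = compute_level_sketch(clean)
capped = compute_level_sketch({**clean, "backup_restore": "na"})
```

Here `full` is the top level with an empty reason, while `capped` stops at L2 even though the functional rung passed, which is the conservative behaviour the module docstring asks for.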
|
||||||
|
|||||||
@ -7,7 +7,8 @@ next run. Callers wrap deploy()/teardown() in try/finally (or a pytest finalizer
|
|||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
import contextlib
|
import contextlib
|
||||||
import datetime
|
import fcntl
|
||||||
|
import glob
|
||||||
import json
|
import json
|
||||||
import os
|
import os
|
||||||
import re
|
import re
|
||||||
@ -17,7 +18,7 @@ import subprocess
|
|||||||
import time
|
import time
|
||||||
import urllib.request
|
import urllib.request
|
||||||
|
|
||||||
from . import abra
|
from . import abra, lifetime
|
||||||
|
|
||||||
GATEWAY_IP = "143.244.213.108" # *.ci.commoninternet.net -> gateway (TLS passthrough to cc-ci)
|
GATEWAY_IP = "143.244.213.108" # *.ci.commoninternet.net -> gateway (TLS passthrough to cc-ci)
|
||||||
# A run app domain is "<recipe[:4]>-<6hex>.ci.commoninternet.net" (see DECISIONS.md). Used by the
|
# A run app domain is "<recipe[:4]>-<6hex>.ci.commoninternet.net" (see DECISIONS.md). Used by the
|
||||||
@ -29,6 +30,68 @@ class TeardownError(RuntimeError):
|
|||||||
pass
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
# --- Concurrent-run safety (capacity=2) -------------------------------------------------------
|
||||||
|
# ONE mechanism, process-lifetime-scoped so SIGKILL can't leak a stale claim: every run holds an
|
||||||
|
# exclusive kernel flock on its app DOMAIN (/run/lock/cc-ci-app-<domain>.lock) for the whole run.
|
||||||
|
# A held lock implies a live owner — the kernel releases a flock when the holding process dies,
|
||||||
|
# however it dies. The janitor probes the lock (LOCK_NB) to tell a live concurrent run (held →
|
||||||
|
# leave it) from a crashed run's orphan (acquirable → reap it); it never inspects pids and never
|
||||||
|
# steals a held lock. Recipe-tree corruption between same-recipe runs is gone structurally (each
|
||||||
|
# run deploys from its own per-run ABRA_DIR — there is no shared recipe tree and no recipe lock),
|
||||||
|
# and same-domain runs (double-!testme of one PR) serialise on this app lock.
|
||||||
|
# See docs/concurrency.md.
|
||||||
|
|
||||||
|
# Acquired app-lock file objects are retained here for the REMAINING PROCESS LIFETIME: if the
|
||||||
|
# caller drops the returned file object, GC would close the fd and silently release the lock —
|
||||||
|
# this list is the lock's owner of record. Never cleared; release is process exit.
|
||||||
|
_held_app_locks: list = []
|
||||||
|
|
||||||
|
|
||||||
|
def _app_lock_dir() -> str:
|
||||||
|
"""The app-domain lockfile dir. /run/lock (tmpfs: a reboot clears locks AND lockfiles, so
|
||||||
|
post-reboot apps probe as orphans and are reaped immediately). Env-overridable so the
|
||||||
|
tests/concurrency suite (and its helper subprocesses) can use a sandbox dir."""
|
||||||
|
return os.environ.get("CCCI_APP_LOCK_DIR", "/run/lock")
|
||||||
|
|
||||||
|
|
||||||
|
def _app_lock_path(domain: str) -> str:
|
||||||
|
return os.path.join(_app_lock_dir(), f"cc-ci-app-{domain}.lock")
|
||||||
|
|
||||||
|
|
||||||
|
def acquire_app_lock(domain: str):
|
||||||
|
"""Take the per-app-domain exclusive lock; blocks (with a log line) if another run of the
|
||||||
|
same domain is in flight (double-!testme serialisation). Returns the open lock file, which is
|
||||||
|
ALSO retained in _held_app_locks so the flock lives exactly as long as the process.
|
||||||
|
|
||||||
|
Unlink/recreate race guard: the janitor unlinks a reaped orphan's lockfile while holding its
|
||||||
|
flock, so a waiter blocked on the OLD inode can win a lock no later opener can observe (a new
|
||||||
|
open() at the path creates a FRESH inode). After every acquisition, verify the locked fd is
|
||||||
|
still the file at the path (st_ino match); if not, drop it and retry on the live path."""
|
||||||
|
path = _app_lock_path(domain)
|
||||||
|
waited = False
|
||||||
|
while True:
|
||||||
|
# PEP 446: the fd is non-inheritable, so subprocess children never carry the lock.
|
||||||
|
f = open(path, "a") # noqa: SIM115 — deliberately held for the rest of the process
|
||||||
|
try:
|
||||||
|
fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
|
||||||
|
except BlockingIOError:
|
||||||
|
if not waited:
|
||||||
|
print(f"== app lock: another run of {domain} is in flight — waiting ==", flush=True)
|
||||||
|
waited = True
|
||||||
|
fcntl.flock(f, fcntl.LOCK_EX)
|
||||||
|
try:
|
||||||
|
if os.fstat(f.fileno()).st_ino == os.stat(path).st_ino:
|
||||||
|
break # we hold the lock on the inode the path names — done
|
||||||
|
except FileNotFoundError:
|
||||||
|
pass
|
||||||
|
f.close() # locked a stale (unlinked) inode — retry on the live path
|
||||||
|
os.utime(f.fileno()) # mtime = acquisition time = lock age (janitor's long-held flag)
|
||||||
|
_held_app_locks.append(f)
|
||||||
|
if waited:
|
||||||
|
print(f"== app lock: acquired {path} ==", flush=True)
|
||||||
|
return f
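The janitor's side of this mechanism, probing a lock without blocking to tell a live run from an orphan, can be sketched as follows. This is a hedged illustration under stated assumptions: `probe_lock` is a hypothetical name (the janitor's real function is not shown in this diff), the demo uses a temporary directory instead of `/run/lock`, and the inode re-check from `acquire_app_lock` is left out for brevity.

```python
import fcntl
import os
import tempfile


def probe_lock(path: str) -> str:
    """Return "live" if some other open file description holds the flock,
    "orphan" if the lock could be taken (meaning no owner is alive)."""
    with open(path, "a") as f:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return "live"
        # Acquirable: nobody holds it. Closing f releases the probe's own lock.
        return "orphan"


with tempfile.TemporaryDirectory() as lock_dir:
    path = os.path.join(lock_dir, "cc-ci-app-demo.lock")
    holder = open(path, "a")  # stands in for a running CI process
    fcntl.flock(holder, fcntl.LOCK_EX)
    while_held = probe_lock(path)
    holder.close()  # closing the fd releases the flock, as process exit would
    after_release = probe_lock(path)
```

Because `flock` locks belong to the open file description, the probe conflicts with the holder even inside one process, which is why the demo needs no subprocess.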
|
||||||
|
|
||||||
|
|
||||||
def _docker_names(kind: str, stack: str) -> list[str]:
|
def _docker_names(kind: str, stack: str) -> list[str]:
|
||||||
"""docker <kind> ls names filtered to a stack (kind: service|volume|secret)."""
|
"""docker <kind> ls names filtered to a stack (kind: service|volume|secret)."""
|
||||||
proc = subprocess.run(
|
proc = subprocess.run(
|
||||||
@ -48,31 +111,6 @@ def _residual(domain: str) -> dict:
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
def _stack_age_seconds(stack: str) -> float | None:
|
|
||||||
"""Age of the stack's oldest service, or None if not present."""
|
|
||||||
svcs = _docker_names("service", stack)
|
|
||||||
if not svcs:
|
|
||||||
return None
|
|
||||||
oldest = None
|
|
||||||
for s in svcs:
|
|
||||||
p = subprocess.run(
|
|
||||||
["docker", "service", "inspect", s, "--format", "{{.CreatedAt}}"],
|
|
||||||
capture_output=True,
|
|
||||||
text=True,
|
|
||||||
)
|
|
||||||
ts = p.stdout.strip()
|
|
||||||
try:
|
|
||||||
# docker emits e.g. 2026-05-27 00:12:33.123 +0000 UTC -> take the leading 19 chars
|
|
||||||
dt = datetime.datetime.strptime(ts[:19], "%Y-%m-%d %H:%M:%S").replace(
|
|
||||||
tzinfo=datetime.UTC
|
|
||||||
)
|
|
||||||
except ValueError:
|
|
||||||
continue
|
|
||||||
age = (datetime.datetime.now(datetime.UTC) - dt).total_seconds()
|
|
||||||
oldest = age if oldest is None else max(oldest, age)
|
|
||||||
return oldest
|
|
||||||
|
|
||||||
|
|
||||||
def _recipe_extra_env(recipe: str, domain: str) -> dict[str, str]:
|
def _recipe_extra_env(recipe: str, domain: str) -> dict[str, str]:
|
||||||
"""Per-recipe extra .env keys, applied at every deploy (install + upgrade's old_app) so a recipe
|
"""Per-recipe extra .env keys, applied at every deploy (install + upgrade's old_app) so a recipe
|
||||||
with multi-domain / config needs is enrolled with NO shared-harness change (D5/M6.5). A recipe
|
with multi-domain / config needs is enrolled with NO shared-harness change (D5/M6.5). A recipe
|
||||||
@ -149,9 +187,9 @@ def prepull_images(recipe: str, domain: str) -> None:
|
|||||||
app-INIT time (slow-init apps like collabora/immich still need their recipe healthcheck/READY_PROBE).
|
app-INIT time (slow-init apps like collabora/immich still need their recipe healthcheck/READY_PROBE).
|
||||||
Best-effort on resolution failure (skip + let the deploy pull as usual); HARD-fails on a real
|
Best-effort on resolution failure (skip + let the deploy pull as usual); HARD-fails on a real
|
||||||
pull error (don't mask it)."""
|
pull error (don't mask it)."""
|
||||||
import os
|
recipe_dir = abra.recipe_dir(recipe) # per-run tree inside a CI run
|
||||||
|
# The app .env lives in the CANONICAL servers path (the per-run ABRA_DIR's servers/ is a
|
||||||
recipe_dir = os.path.expanduser(f"~/.abra/recipes/{recipe}")
|
# symlink to it, so abra and this path agree on the same file).
|
||||||
env_path = os.path.expanduser(f"~/.abra/servers/default/{domain}.env")
|
env_path = os.path.expanduser(f"~/.abra/servers/default/{domain}.env")
|
||||||
if not os.path.isdir(recipe_dir) or not os.path.isfile(env_path):
|
if not os.path.isdir(recipe_dir) or not os.path.isfile(env_path):
|
||||||
print(f" prepull: recipe dir or .env missing for {recipe} — skipping", flush=True)
|
print(f" prepull: recipe dir or .env missing for {recipe} — skipping", flush=True)
|
||||||
@ -161,7 +199,8 @@ def prepull_images(recipe: str, domain: str) -> None:
|
|||||||
# --env-file supplies $VERSION-style interpolation so pinned tags resolve correctly.
|
# --env-file supplies $VERSION-style interpolation so pinned tags resolve correctly.
|
||||||
cf = subprocess.run(
|
cf = subprocess.run(
|
||||||
["bash", "-c", f'set -a; . "{env_path}"; printf "%s" "${{COMPOSE_FILE:-compose.yml}}"'],
|
["bash", "-c", f'set -a; . "{env_path}"; printf "%s" "${{COMPOSE_FILE:-compose.yml}}"'],
|
||||||
capture_output=True, text=True,
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
).stdout.strip()
|
).stdout.strip()
|
||||||
files = [f for f in cf.split(":") if f] or ["compose.yml"]
|
files = [f for f in cf.split(":") if f] or ["compose.yml"]
|
||||||
args = ["docker", "compose", "--env-file", env_path]
|
args = ["docker", "compose", "--env-file", env_path]
|
||||||
@ -209,6 +248,10 @@ def deploy_app(
|
|||||||
past the 900s default. abra's INTERNAL TIMEOUT (recipe's TIMEOUT env, default 300s) is set via
|
past the 900s default. abra's INTERNAL TIMEOUT (recipe's TIMEOUT env, default 300s) is set via
|
||||||
EXTRA_ENV; this is the Python subprocess wrapper's timeout so abra doesn't get SIGKILLed mid-deploy."""
|
EXTRA_ENV; this is the Python subprocess wrapper's timeout so abra doesn't get SIGKILLed mid-deploy."""
|
||||||
_record_deploy()
|
_record_deploy()
|
||||||
|
# Lock BEFORE the app exists: a concurrent run's janitor must never see this app without a
|
||||||
|
# held app lock (it would probe it as an orphan and reap an in-flight deploy). Also the
|
||||||
|
# double-!testme serialisation point: a second run of the same domain blocks here.
|
||||||
|
acquire_app_lock(domain)
|
||||||
abra.app_config_remove(domain) # clear any stale .env from a prior crashed run
|
abra.app_config_remove(domain) # clear any stale .env from a prior crashed run
|
||||||
abra.app_new(recipe, domain, version=version, secrets=secrets)
|
abra.app_new(recipe, domain, version=version, secrets=secrets)
|
||||||
# A pinned version must actually deploy that version: check the recipe out to the tag so the
|
# A pinned version must actually deploy that version: check the recipe out to the tag so the
|
||||||
@ -268,18 +311,22 @@ def _stack_name(domain: str) -> str:
|
|||||||
|
|
||||||
|
|
||||||
def services_converged(domain: str) -> bool:
|
def services_converged(domain: str) -> bool:
|
||||||
"""True when every service in the stack reports replicas N/N (N>0)."""
|
"""True when every service in the stack reports replicas N/N (N>0) AND no service is
|
||||||
|
mid-rolling-update (swarm UpdateStatus settled)."""
|
||||||
stack = _stack_name(domain)
|
stack = _stack_name(domain)
|
||||||
proc = subprocess.run(
|
proc = subprocess.run(
|
||||||
["docker", "stack", "services", stack, "--format", "{{.Replicas}}"],
|
["docker", "stack", "services", stack, "--format", "{{.Name}} {{.Replicas}}"],
|
||||||
capture_output=True,
|
capture_output=True,
|
||||||
text=True,
|
text=True,
|
||||||
)
|
)
|
||||||
rows = [r for r in proc.stdout.split("\n") if r.strip()]
|
rows = [r for r in proc.stdout.split("\n") if r.strip()]
|
||||||
if not rows:
|
if not rows:
|
||||||
return False
|
return False
|
||||||
|
names = []
|
||||||
for r in rows:
|
for r in rows:
|
||||||
cur, _, want = r.partition("/")
|
name, _, replicas = r.partition(" ")
|
||||||
|
names.append(name)
|
||||||
|
cur, _, want = replicas.partition("/")
|
||||||
# A service at its DESIRED replica count is converged — including a `replicas: 0`
|
# A service at its DESIRED replica count is converged — including a `replicas: 0`
|
||||||
# on-demand one-shot (e.g. lasuite-drive's `minio-createbuckets`, which is scaled up
|
# on-demand one-shot (e.g. lasuite-drive's `minio-createbuckets`, which is scaled up
|
||||||
# manually only when buckets need (re)creating), which reports "0/0". The earlier
|
# manually only when buckets need (re)creating), which reports "0/0". The earlier
|
||||||
@ -288,6 +335,34 @@ def services_converged(domain: str) -> bool:
|
|||||||
# still spinning up shows e.g. "0/1" (cur != want) and is correctly not-yet-converged.
|
# still spinning up shows e.g. "0/1" (cur != want) and is correctly not-yet-converged.
|
||||||
if not want or cur != want:
|
if not want or cur != want:
|
||||||
return False
|
return False
|
||||||
|
# N/N alone is NOT convergence during a stop-first rolling update: a chaos redeploy that changes
|
||||||
|
# a non-app service image (e.g. immich's db pin) registers the update immediately, but swarm may
|
||||||
|
# not have cycled that service's task yet — the OLD task still shows 1/1, then dies seconds later
|
||||||
|
# (immich CI 238: backupbot exec'd the db pre-hook into the just-killed container → 409). Require
|
||||||
|
# every service's UpdateStatus to be settled too, so the wait spans the whole rolling update.
|
||||||
|
proc = subprocess.run(
|
||||||
|
[
|
||||||
|
"docker",
|
||||||
|
"service",
|
||||||
|
"inspect",
|
||||||
|
*names,
|
||||||
|
"--format",
|
||||||
|
"{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}",
|
||||||
|
],
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
)
|
||||||
|
if proc.returncode != 0:
|
||||||
|
return False # a service vanished mid-check — not settled
|
||||||
|
for state in proc.stdout.split("\n"):
|
||||||
|
# Only ACTIVE states block convergence. 'paused'/'rollback_paused' are terminal-without-
|
||||||
|
# intervention: swarm's default update-failure-action pauses the update on one task flicker
|
||||||
|
# and the flag then persists FOREVER (immich CI 241: app service 'paused' from a restart
|
||||||
|
# during restore, service back at 1/1 and healthy — the wait hung to its deadline). With
|
||||||
|
# N/N already required above, a paused update is settled for our purposes; the HTTP-health
|
||||||
|
# and tier assertions still gate whether the app actually works.
|
||||||
|
if state.strip() in ("updating", "rollback_started"):
|
||||||
|
return False
|
||||||
return True
|
return True
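The decision logic of `services_converged` can be separated from the two `docker` calls and exercised on plain strings. The sketch below assumes the same output shapes the function formats (`"<name> <cur>/<want>"` per service, one update state per line); `converged` and `ACTIVE_UPDATE_STATES` are hypothetical names introduced for illustration.

```python
ACTIVE_UPDATE_STATES = ("updating", "rollback_started")


def converged(services_output: str, update_states: str) -> bool:
    """Pure version of the check: every service at N/N and no active update."""
    rows = [r for r in services_output.split("\n") if r.strip()]
    if not rows:
        return False  # no services listed: the stack is not up
    for row in rows:
        _name, _, replicas = row.partition(" ")
        cur, _, want = replicas.partition("/")
        if not want or cur != want:
            return False  # e.g. "0/1": still spinning up
    # 'paused' and 'completed' count as settled; only active states block.
    return not any(s.strip() in ACTIVE_UPDATE_STATES for s in update_states.split("\n"))


settled = converged("app 1/1\ndb 1/1", "completed\n")
rolling = converged("app 1/1\ndb 1/1", "\nupdating\n")
one_shot = converged("app 1/1\ncreatebuckets 0/0", "paused\n")
```

Keeping the parsing pure makes the `0/0` one-shot case and the paused-update case cheap to cover in unit tests without a swarm.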
|
||||||
|
|
||||||
|
|
||||||
@@ -415,7 +490,9 @@ def recipe_checkout_ref(recipe: str, ref: str) -> None:
     abra.recipe_checkout(recipe, ref)
 
 
-def chaos_redeploy(domain: str, deploy_timeout: int = 900, no_converge_checks: bool = False) -> None:
+def chaos_redeploy(
+    domain: str, deploy_timeout: int = 900, no_converge_checks: bool = False
+) -> None:
     """In-place `abra app deploy --chaos`: redeploy the running app at the CURRENT recipe checkout
     (HC1: the PR-head code under test). This is the upgrade op, not a fresh install — it does NOT go
     through deploy_app, so the deploy-count guard (DG4.1) is not incremented.
@@ -498,6 +575,16 @@ def wait_ready_probes(meta: dict, domain: str, timeout: int = 600) -> None:
 
 def backup_app(domain: str) -> str:
     """Create a backup; return the abra/restic output (carries the produced snapshot_id)."""
+    # Never back up a stack that is still converging/rolling-updating: backupbot resolves each
+    # service's hook container ONCE up front, so a task that cycles between that lookup and the
+    # pre-hook exec crashes the whole backup with a 409 (immich CI 238). Bounded wait — on timeout
+    # we still attempt the backup and let the tier's assertion deliver the verdict.
+    deadline = time.time() + 300
+    while time.time() < deadline and not services_converged(domain):
+        print(
+            f" backup: {domain} stack not settled yet — waiting before backup create", flush=True
+        )
+        time.sleep(5)
     return abra.backup_create(domain)
 
 
@@ -603,17 +690,84 @@ def teardown_app(domain: str, verify: bool = True) -> None:
         residual = _residual(domain)
         if any(residual.values()):
             raise TeardownError(f"teardown left residual for {domain}: {residual}")
+    # No unregistration step: the app lock releases implicitly at process exit. The clean run's
+    # leftover lockfile (unheld) is unlinked on sight by the next janitor's stale-lockfile sweep.
 
 
-def janitor(max_age_seconds: int | None = None) -> None:
-    """Reap orphaned run apps from crashed/rebooted runs. Matches the real naming scheme and only
-    reaps apps older than max_age_seconds (so concurrent in-flight runs are never killed). Reaps via
-    docker primitives so it works even when the .env is gone (A2/A3). Default 2h, env-overridable
-    via CCCI_JANITOR_MAX_AGE (e.g. 0 to reap all matching orphans immediately)."""
-    import os
-
-    if max_age_seconds is None:
-        max_age_seconds = int(os.environ.get("CCCI_JANITOR_MAX_AGE", "7200"))
+# A lock held longer than 2x the 60-min hard deadline can only be a leaked run (the deadline
+# bounds every healthy run). Flag it for a human — NEVER steal a held lock.
+LONG_HELD_LOCK_SECONDS = 2 * lifetime.HARD_DEADLINE_SECONDS
+
+
+def _probe_and_reap(domain: str) -> None:
+    """Probe one run app's lock; reap iff nobody holds it (kernel-guaranteed orphan).
+
+    Reaping happens WHILE HOLDING the probe lock, closing the janitor-vs-new-run race: a new run
+    of the same domain blocks in acquire_app_lock until the reap finishes, so a fresh app never
+    coexists with a half-reaped one. The lockfile is unlinked before release (still holding the
+    lock); a waiter that blocked on the unlinked inode re-checks identity and retries. Two racing
+    janitors arbitrate on the same flock: one reaps, the other sees 'held' and leaves —
+    teardown_app(verify=False) is idempotent either way."""
+    path = _app_lock_path(domain)
+    try:
+        # PEP 446: non-inheritable fd, same as acquire_app_lock.
+        f = open(path, "a")  # noqa: SIM115 — closed in the finally below, lock released with it
+    except OSError as e:
+        print(f"!! janitor: cannot open lockfile {path} ({e}) — skipping {domain}", flush=True)
+        return
+    try:
+        try:
+            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
+        except BlockingIOError:
+            # Held -> live run. Never steal; flag if it has been held implausibly long.
+            try:
+                held_for = time.time() - os.stat(path).st_mtime
+            except OSError:
+                held_for = 0
+            if held_for > LONG_HELD_LOCK_SECONDS:
+                print(
+                    f"!! lock for {domain} held >{LONG_HELD_LOCK_SECONDS // 60}min — possible "
+                    "leaked run; inspect with lslocks",
+                    flush=True,
+                )
+            else:
+                print(
+                    f" janitor: {domain} lock held — live concurrent run, leaving it", flush=True
+                )
+            return
+        # Acquired — but only the inode the PATH names counts (another janitor may have reaped
+        # and unlinked this inode while we raced; a lock on an unlinked inode protects nothing
+        # and unlinking the path now would delete a NEWER run's lockfile).
+        try:
+            if os.fstat(f.fileno()).st_ino != os.stat(path).st_ino:
+                return
+        except FileNotFoundError:
+            return
+        # Orphan: no live owner (the kernel released the lock when the owner died). Reap while
+        # holding the probe lock, then unlink the lockfile before releasing.
+        print(f" janitor: {domain} lock acquirable — orphan, reaping", flush=True)
+        with contextlib.suppress(Exception):
+            teardown_app(domain, verify=False)
+        with contextlib.suppress(OSError):
+            os.unlink(path)
+    finally:
+        f.close()
+
+
+def janitor() -> None:
+    """Reap orphaned run apps from crashed/rebooted runs; the kernel flock is the only liveness
+    oracle. For every candidate run app, probe its app-domain lock (LOCK_NB):
+
+      acquirable -> nobody holds it -> orphan -> reap under the probe lock + unlink lockfile
+      held       -> live concurrent run -> leave it (warn if held >2x the hard deadline)
+
+    Candidate discovery is unchanged: `abra app ls` + a docker-service sweep (catches stacks
+    whose .env is already gone), both matched against RUN_APP_RE — warm/canonical apps never
+    match and are never probed. Post-reboot, /run/lock (tmpfs) is empty, so every surviving app
+    probes as an orphan and is reaped immediately (no age threshold). Stale lockfiles with no
+    app behind them are unlinked on sight. Degrades safely: an unreadable lockfile/dir is
+    skipped with a log line, never a crash. Reaps via docker primitives so it works even when
+    the .env is gone (A2/A3)."""
     seen = set()
     for app in abra.app_ls():
         name = app.get("appName") or app.get("domain") or ""
@@ -627,9 +781,22 @@ def janitor(max_age_seconds: int | None = None) -> None:
             seen.add(f"{m.group(1)}.ci.commoninternet.net")
 
     for name in seen:
-        stack = _stack_name(name)
-        age = _stack_age_seconds(stack)
-        if age is not None and age < max_age_seconds:
-            continue  # likely a concurrent in-flight run; leave it
-        with contextlib.suppress(Exception):
-            teardown_app(name, verify=False)
+        _probe_and_reap(name)
+
+    # Tidy /run/lock: a clean run's leftover lockfile is unheld and appless — unlink it (under
+    # its own probe lock, with the same identity check as above).
+    with contextlib.suppress(OSError):
+        for path in glob.glob(os.path.join(_app_lock_dir(), "cc-ci-app-*.lock")):
+            domain = os.path.basename(path)[len("cc-ci-app-") : -len(".lock")]
+            if domain in seen:
+                continue  # handled (or deliberately left) above
+            with contextlib.suppress(OSError):
+                f = open(path, "a")  # noqa: SIM115 — closed below, lock released with it
+                try:
+                    fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
+                    if os.fstat(f.fileno()).st_ino == os.stat(path).st_ino:
+                        os.unlink(path)
+                except (BlockingIOError, FileNotFoundError):
+                    pass  # held (live run pre-deploy) or already gone — leave it
+                finally:
+                    f.close()
|||||||
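The flock probe that `_probe_and_reap` relies on can be tried in isolation. The following is a minimal sketch, not the harness code: `probe_lock` and its three return labels are hypothetical names introduced here, and it assumes Linux `flock` semantics, where two separate `open()` calls on the same file conflict even inside one process.

```python
import fcntl
import os
import tempfile


def probe_lock(path: str) -> str:
    """Classify a lockfile as 'held', 'stale', or 'orphan' (hypothetical labels).

    'held'   : another open file description owns the exclusive flock.
    'stale'  : we got the lock, but on an inode the path no longer names.
    'orphan' : we got the lock on the inode the path names; nobody else holds it.
    """
    f = open(path, "a")
    try:
        try:
            # Non-blocking exclusive lock: raises immediately if someone holds it.
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return "held"
        try:
            # Identity check: the fd's inode must still be the one the path names.
            if os.fstat(f.fileno()).st_ino != os.stat(path).st_ino:
                return "stale"
        except FileNotFoundError:
            return "stale"
        return "orphan"
    finally:
        f.close()  # closing the file releases the probe lock


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "cc-ci-app-demo.lock")
    owner = open(lock_path, "a")
    fcntl.flock(owner, fcntl.LOCK_EX)   # simulate a live run holding its lock
    print(probe_lock(lock_path))
    owner.close()                       # simulate the run's process exiting
    print(probe_lock(lock_path))
```

The probe never waits and never takes a lock away from a holder, which is why a held lock can be treated as proof of a live run.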
95
runner/harness/lifetime.py
Normal file
@@ -0,0 +1,95 @@
+"""Run-lifetime hardening (concurrency restructure P1).
+
+The concurrency model's invariant chain is:
+
+    lock lifetime ⊆ harness process lifetime ⊆ drone step lifetime ⊆ 60-min hard deadline
+
+Locks are kernel flocks released on process exit, so the only thing that needs managing is the
+PROCESS lifetime. Three guards, installed at run startup (before any abra call) by
+`install_lifetime_guards()`:
+
+  1. `PR_SET_PDEATHSIG(SIGTERM)`: if the parent (the drone step shell) dies — cancel, runner
+     crash, host shutdown of the step — the kernel delivers SIGTERM to the harness, so a dead
+     build can never leak a running harness that holds locks. Paired with a ppid==1 re-check
+     AFTER the prctl: a parent that died BEFORE the prctl took effect would never trigger the
+     death signal, so a harness that finds itself already reparented refuses to run.
+  2. SIGTERM handler: raise SystemExit so the run's `finally:` teardown funnel executes and the
+     process exits non-zero. Re-entrant deliveries during teardown are logged and IGNORED so a
+     second signal can't abort the cleanup the first one asked for (`begin_teardown()` guards
+     this; the run's own `finally:` blocks also call it so a signal landing mid-normal-teardown
+     can't abort that either).
+  3. `signal.alarm(3600)`: self-imposed hard deadline. SIGALRM funnels into the same teardown
+     path with a distinct log line. Teardown time after the deadline is not alarm-bounded —
+     interrupting a teardown buys nothing; the janitor (flock probe) is the backstop if a
+     teardown wedges and the process is killed harder.
+"""
+
+from __future__ import annotations
+
+import ctypes
+import os
+import signal
+import sys
+
+HARD_DEADLINE_SECONDS = 60 * 60
+
+_PR_SET_PDEATHSIG = 1  # linux/prctl.h
+
+_state = {"tearing_down": False}
+
+
+def begin_teardown() -> None:
+    """Mark the teardown funnel as running. From here on SIGTERM/SIGALRM must NOT raise — it
+    would abort the very cleanup it asks for — so the handlers log and return instead. Called by
+    the handlers themselves before raising, and at the top of the run's `finally:` blocks."""
+    _state["tearing_down"] = True
+
+
+def _funnel_handler(log_line: str, exit_code: int):
+    """A signal handler that routes into the teardown funnel exactly once: log, then raise
+    SystemExit (propagates through the run's try/finally → teardown executes → non-zero exit).
+    While teardown is already running, further signals are logged and swallowed."""
+
+    def handler(signum: int, frame) -> None:  # noqa: ARG001
+        print(log_line, flush=True)
+        if _state["tearing_down"]:
+            print(
+                f"== signal {signum} during teardown — ignored (teardown continues, "
+                "exit stays non-zero) ==",
+                flush=True,
+            )
+            return
+        begin_teardown()
+        raise SystemExit(exit_code)
+
+    return handler
+
+
+def install_lifetime_guards(deadline_seconds: int = HARD_DEADLINE_SECONDS) -> None:
+    """Install all three lifetime guards (see module docstring). Must run at harness startup,
+    before any abra call and before any lock is taken."""
+    libc = ctypes.CDLL("libc.so.6", use_errno=True)
+    if libc.prctl(_PR_SET_PDEATHSIG, signal.SIGTERM, 0, 0, 0) != 0:
+        err = ctypes.get_errno()
+        raise OSError(err, f"prctl(PR_SET_PDEATHSIG, SIGTERM) failed: {os.strerror(err)}")
+    # The prctl is armed now — but only fires for a parent death AFTER this point. If the parent
+    # already died, we are reparented (ppid 1) and would never get the signal: refuse to run, an
+    # orphaned harness would hold locks/apps with nothing managing its lifetime.
+    if os.getppid() == 1:
+        sys.exit("parent died before prctl(PR_SET_PDEATHSIG) — refusing to run orphaned")
+    signal.signal(
+        signal.SIGTERM,
+        _funnel_handler(
+            "== SIGTERM received (drone cancel / parent death) — tearing down ==",
+            128 + signal.SIGTERM,
+        ),
+    )
+    minutes = deadline_seconds // 60
+    signal.signal(
+        signal.SIGALRM,
+        _funnel_handler(
+            f"== run exceeded {minutes}-minute hard deadline — tearing down ==",
+            128 + signal.SIGALRM,
+        ),
+    )
+    signal.alarm(deadline_seconds)
||||||
@@ -2,7 +2,14 @@
 
 Turns a run's per-tier pytest outcomes into a single `results.json` artifact carrying, per the plan:
   { recipe, version, pr, ref, run_id, finished, stages:[{name,status,tests:[{name,status,ms}]}],
-    level, level_cap_reason, rungs, flags:{clean_teardown,no_secret_leak}, screenshot, summary_card }
+    level, level_cap_reason, level_cap_rung, rungs,
+    skips:{intentional:{rung:reason}, unintentional:[rung]},
+    flags:{clean_teardown,no_secret_leak}, screenshot, summary_card }
+
+`skips` splits the N/A (skipped) rungs by a simple rule: a skip is INTENTIONAL iff the recipe lists
+it (with a reason) in `recipe_meta.EXPECTED_NA = {rung: reason}`; any rung skipped but not listed is
+UNINTENTIONAL (a coverage gap to fill or declare). Skips still cap the level either way — the harness
+never claims a rung it did not verify; this only labels *why* a skip happened.
 
 The per-test breakdown comes from JUnit XML emitted by each tier's pytest invocation (`--junitxml`),
 parsed here with the stdlib (no new dep). The integer **level** is computed by harness.level from a
@@ -127,41 +134,24 @@ def collect_stages(records: list[dict]) -> list[dict]:
     return stages
 
 
-def _has_repo_local(records: list[dict]) -> bool:
-    return any(r.get("source") == "repo-local" for r in records)
-
-
-def _repo_local_passed(records: list[dict]) -> bool:
-    repo = [r for r in records if r.get("source") == "repo-local"]
-    return bool(repo) and all(r.get("rc", 1) == 0 for r in repo)
-
-
 def derive_rungs(
     results: dict[str, str],
     *,
     backup_capable: bool,
-    declared: list[str] | None,
-    deps_ready: bool,
-    sso_unverified: bool,
     has_custom: bool,
-    has_repo_local: bool,
-    repo_local_passed: bool,
 ) -> dict[str, str]:
-    """Translate the orchestrator's tier results + deps/SSO signals into the rung-status dict
-    harness.level consumes. Documented in DECISIONS.md (Phase 3). Conservative by design — never
-    reports a rung 'pass' it can't substantiate (cardinal guardrail: presentation never inflates).
+    """Translate the orchestrator's tier results into the rung-status dict harness.level consumes —
+    the FOUR essential rungs only. Conservative by design — never reports a rung 'pass' it can't
+    substantiate (cardinal guardrail: presentation never inflates).
 
       L1 install : install tier pass.
       L2 upgrade : upgrade tier (skip → N/A: only one published version).
       L3 backup/res : backup AND restore tiers pass (N/A if not backup-capable).
-      L4 functional : the recipe-specific functional (non-deps) tests pass — the custom tier, minus
-              its SSO/integration tests. N/A if the recipe has no custom tests at all.
-      L5 integration: SSO/OIDC + cross-app. Applies ONLY if the recipe declares deps (else N/A — the
-              "no integration surface caps at L4" rule, §4.1). pass iff deps wired
-              (deps_ready) and not sso_unverified and the custom tier didn't fail.
-      L6 recipe-loc : the recipe repo's own tests/ (repo-local source) ran and passed (N/A if none).
+      L4 functional : recipe-specific functional tests pass — the custom tier. N/A if none ran.
+
+    Integration (SSO/OIDC) and recipe-local are OPTIONAL and intentionally NOT rungs here — they
+    never cap the level (SSO is still enforced for the run VERDICT in run_recipe_ci.py).
     """
-    declared = declared or []
     rungs: dict[str, str] = {}
     rungs["install"] = level_mod.tier_to_rung(results.get("install"))
     rungs["upgrade"] = level_mod.tier_to_rung(results.get("upgrade"))
@@ -170,36 +160,34 @@ def derive_rungs(
     )
 
     custom = results.get("custom")
-    # Functional rung (L4): the non-deps custom tests.
     if not has_custom or custom == "skip" or custom is None:
         rungs["functional"] = "na"
     elif custom == "fail":
-        # A custom test failed. With declared deps we cannot cheaply tell functional-vs-SSO apart, so
-        # conservatively fail the functional rung (caps at L3) — never inflate.
         rungs["functional"] = "fail"
     else:  # custom == "pass"
         rungs["functional"] = "pass"
-
-    # Integration rung (L5): only recipes with an SSO/integration surface (declared deps) can climb.
-    if not declared:
-        rungs["integration"] = "na"
-    elif sso_unverified or not deps_ready or custom == "fail":
-        # SSO not wired/verified, or a custom test failed → integration not verified.
-        rungs["integration"] = "fail"
-    elif custom == "pass":
-        rungs["integration"] = "pass"
-    else:
-        # declared deps but no custom tests ran — can't claim integration verified
-        rungs["integration"] = "na"
-
-    # Recipe-local rung (L6).
-    if not has_repo_local:
-        rungs["recipe_local"] = "na"
-    else:
-        rungs["recipe_local"] = "pass" if repo_local_passed else "fail"
     return rungs
 
 
+def skips(rungs: dict[str, str], expected_na: dict | None) -> dict:
+    """Split the SKIPPED (N/A) rungs into intentional vs unintentional (operator model).
+
+    A recipe lists the rungs it intentionally skips, each with a reason, in
+    `recipe_meta.EXPECTED_NA = {rung: reason}`. The rule is dead simple: a skipped rung is
+    **intentional** iff it is in that list; any rung that is skipped and NOT in the list is
+    **unintentional** (a coverage gap someone should either fill or declare). N/A still caps the
+    level either way — the harness never claims a rung it did not verify — this only labels *why* a
+    skip happened. Returns:
+      { "intentional": {rung: reason, ...}, # skipped AND declared in EXPECTED_NA
+        "unintentional": [rung, ...] } # skipped but NOT declared
+    """
+    expected = {str(k): str(v) for k, v in (expected_na or {}).items()}
+    na = [r for r, st in rungs.items() if st == "na"]
+    intentional = {r: expected[r] for r in na if r in expected}
+    unintentional = sorted(r for r in na if r not in expected)
+    return {"intentional": intentional, "unintentional": unintentional}
+
+
 def build_results(
     *,
     recipe: str,
@@ -209,30 +197,24 @@ def build_results(
     records: list[dict],
     results: dict[str, str],
     backup_capable: bool,
-    declared: list[str] | None,
-    deps_ready: bool,
-    sso_unverified: bool,
     clean_teardown: bool,
     no_secret_leak: bool,
     finished_ts: float | None,
     screenshot: str | None = None,
     summary_card: str | None = None,
+    expected_na: dict | None = None,
 ) -> dict:
     """Assemble the full results.json dict (no I/O). `finished_ts` is passed in (the orchestrator
-    stamps it) so this stays pure and deterministic for unit tests."""
+    stamps it) so this stays pure and deterministic for unit tests. `expected_na` is the recipe's
+    declared intentional-skip map (recipe_meta.EXPECTED_NA) used to distinguish a deliberate skip from
+    accidentally-missing coverage."""
     stages = collect_stages(records)
     has_custom = any(r["tier"] == "custom" for r in records)
-    rungs = derive_rungs(
-        results,
-        backup_capable=backup_capable,
-        declared=declared,
-        deps_ready=deps_ready,
-        sso_unverified=sso_unverified,
-        has_custom=has_custom,
-        has_repo_local=_has_repo_local(records),
-        repo_local_passed=_repo_local_passed(records),
-    )
+    rungs = derive_rungs(results, backup_capable=backup_capable, has_custom=has_custom)
     lvl, cap_reason = level_mod.compute_level(rungs)
+    # The rung that capped the climb (lowest non-pass), or None on a full climb — lets a consumer
+    # (card/badge) tell whether the cap was an intentional skip, an unintentional one, or a failure.
+    capped = level_mod.RUNGS[lvl] if cap_reason else None
     return {
         "schema": 1,
         "run_id": run_id(),
@@ -243,7 +225,9 @@ def build_results(
         "finished": finished_ts,
         "level": lvl,
         "level_cap_reason": cap_reason,
+        "level_cap_rung": capped,
         "rungs": rungs,
+        "skips": skips(rungs, expected_na),
         "stages": stages,
         "results": results,
         "flags": {
|||||||
@@ -113,7 +113,9 @@ def _assert_undeployed(domain: str) -> None:
         )
 
 
-def snapshot(recipe: str, domain: str, commit: str | None = None, version: str | None = None) -> dict:
+def snapshot(
+    recipe: str, domain: str, commit: str | None = None, version: str | None = None
+) -> dict:
     """Take a last-known-good snapshot of every data volume of <domain>'s stack. The app MUST be
     undeployed. Atomically replaces the prior last-good. Returns the written meta dict."""
     _assert_undeployed(domain)
@@ -169,7 +171,9 @@ def restore(recipe: str, domain: str) -> dict:
     for vol in meta.get("volumes", []):
         tar_path = os.path.join(volumes_dir(recipe), f"{vol}.tar")
         if vol not in current:
-            raise SnapshotError(f"snapshot volume {vol} absent from current stack {sorted(current)}")
+            raise SnapshotError(
+                f"snapshot volume {vol} absent from current stack {sorted(current)}"
+            )
         mp = _volume_mountpoint(vol)
         # Clear the volume contents (incl. dotfiles) without removing the mountpoint itself.
         r = _run(["sh", "-c", f'rm -rf -- "{mp}"/* "{mp}"/.[!.]* "{mp}"/..?* 2>/dev/null; true'])
|
|||||||
@@ -60,14 +60,17 @@ def sweep() -> int:
     for r in recipes:
         print(f"\n===== nightly: full-cold {r} (latest) =====", flush=True)
         env = dict(os.environ, RECIPE=r)
         env.pop("REF", None)  # latest, not a PR head
         env.pop("CCCI_QUICK", None)
         env.pop("MODE", None)
         rc = subprocess.run(
             [sys.executable, os.path.join(_here(), "run_recipe_ci.py")], env=env
         ).returncode
         results[r] = rc
-        print(f"nightly: {r} rc={rc} ({'green→canonical refreshed' if rc == 0 else 'red'})", flush=True)
+        print(
+            f"nightly: {r} rc={rc} ({'green→canonical refreshed' if rc == 0 else 'red'})",
+            flush=True,
+        )
     # WC8 disk hygiene: drop warm data for de-enrolled canonicals; log the disk budget.
     pruned = canonical.prune_stale()
     if pruned:
|
|||||||
@@ -44,17 +44,26 @@ sys.path.insert(0, os.path.join(ROOT, "runner"))
 from harness import (  # noqa: E402
     abra,
     canonical,
-    card as card_mod,
-    deps as deps_mod,
     discovery,
     generic,
     lifecycle,
+    lifetime,
     naming,
-    results as results_mod,
-    screenshot as screenshot_mod,
     warm,
     warmsnap,
 )
+from harness import (  # noqa: E402
+    card as card_mod,
+)
+from harness import (  # noqa: E402
+    deps as deps_mod,
+)
+from harness import (  # noqa: E402
+    results as results_mod,
+)
+from harness import (  # noqa: E402
+    screenshot as screenshot_mod,
+)
 
 ALL_STAGES = ("install", "upgrade", "backup", "restore", "custom")
 
@@ -129,18 +138,73 @@ def _gitea_token() -> str | None:
     return tok or None
 
 
+def _run_state_path(name: str) -> str:
+    """Run-scoped state file in the tempdir, keyed by run id + harness pid — NEVER by app domain.
+    A second run of the SAME domain overlaps this process (its main() preamble executes before it
+    blocks at the app lock inside deploy_app), so domain-keyed files get reset/removed under the
+    live run: M2(c) double-!testme produced a false DG4.1 deploy-count=2 in run 1 and a countfile
+    FileNotFoundError crash in run 2. Children never re-derive these paths — they receive them
+    via the CCCI_*_FILE env vars, so the key only has to be unique per harness process."""
+    rid = results_mod.run_id()
+    return os.path.join(tempfile.gettempdir(), f"ccci-{name}-{rid}-{os.getpid()}")
+
+
+def setup_run_abra_dir() -> str:
+    """P3: build + export this run's PER-RUN ABRA_DIR — structural isolation of recipe trees.
+
+    `<runs_dir>/<run-id>/abra/` with:
+      servers/   -> symlink to the canonical ~/.abra/servers. App .env files land in the shared
+                    canonical path, so janitor discovery (`abra app ls`) and env-based teardown
+                    work unchanged from any process; per-domain filenames + the app-domain lock
+                    prevent write conflicts.
+      catalogue/ -> symlink to the canonical ~/.abra/catalogue (read-mostly).
+      recipes/      fresh + empty — THE isolation that matters: each run clones and git-checkouts
+                    its own recipe trees, so concurrent runs (same recipe included) can never
+                    corrupt each other's deploy tree. Replaces the per-recipe flock.
+    Exported as $ABRA_DIR — honored by the abra CLI and by every harness path helper
+    (abra.abra_dir()) — BEFORE any abra call. Rides along the existing run-dir retention."""
+    canonical = os.path.expanduser("~/.abra")
+    rid = results_mod.run_id()
+    if rid == "manual":
+        rid = f"manual-{os.getpid()}"  # two concurrent hand-runs must not share a tree
+    run_abra_dir = os.path.join(results_mod.runs_dir(), rid, "abra")
+    os.makedirs(os.path.join(run_abra_dir, "recipes"), exist_ok=True)
+    for shared in ("servers", "catalogue"):
+        link = os.path.join(run_abra_dir, shared)
+        if not os.path.islink(link):
+            os.symlink(os.path.join(canonical, shared), link)
+    os.environ["ABRA_DIR"] = run_abra_dir
+    print(
+        f"== per-run ABRA_DIR: {run_abra_dir} (servers/catalogue -> canonical; fresh recipes/) ==",
+        flush=True,
+    )
+    return run_abra_dir
+
+
 def fetch_recipe(recipe: str, ref: str | None, src: str | None) -> None:
-    """Make the recipe available at the code under test. If SRC+REF point at the mirror PR,
+    """Make the recipe available at the code under test in THIS RUN's recipe tree
+    ($ABRA_DIR/recipes/<recipe>): a plain clone — no locking needed, no rm-rf of any shared
+    state (the rm below only clears this run's own leftovers, e.g. a janitor-triggered
+    `abra app ls` auto-clone or a Drone build-number reuse). If SRC+REF point at the mirror PR,
     clone it at that ref; otherwise fetch the catalogue copy. Private mirror repos need the bot
     token — passed via a per-command http.extraHeader (not persisted in .git/config, not printed)."""
-    recipes_dir = os.path.expanduser("~/.abra/recipes")
-    os.makedirs(recipes_dir, exist_ok=True)
-    dest = os.path.join(recipes_dir, recipe)
-    # CCCI_SKIP_FETCH=1: use the local recipe clone as-is (lets a test/Adversary stage a fake/broken
-    # ref — e.g. a simulated broken PR head for the --quick rollback proof — without it being clobbered
-    # by a re-fetch). Never set in production CI.
+    dest = abra.recipe_dir(recipe)
+    os.makedirs(os.path.dirname(dest), exist_ok=True)
+    # CCCI_SKIP_FETCH=1: use the locally STAGED recipe clone as-is (lets a test/Adversary stage a
+    # fake/broken ref — e.g. a simulated broken PR head for the --quick rollback proof — without it
+    # being clobbered by a re-fetch). Staging happens in the canonical ~/.abra/recipes/<recipe>;
+    # copy it into the per-run tree so the rest of the run reads the staged state. Never set in
+    # production CI.
     if os.environ.get("CCCI_SKIP_FETCH") == "1":
-        print(f"[fetch] CCCI_SKIP_FETCH=1 — using local {recipe} recipe clone as-is", flush=True)
+        canonical = os.path.expanduser(f"~/.abra/recipes/{recipe}")
+        subprocess.run(["rm", "-rf", dest], check=False)
+        if os.path.isdir(canonical):
+            shutil.copytree(canonical, dest, symlinks=True)
|
print(
|
||||||
|
f"[fetch] CCCI_SKIP_FETCH=1 — using staged {recipe} clone as-is "
|
||||||
|
f"(copied {canonical} -> per-run tree)",
|
||||||
|
flush=True,
|
||||||
|
)
|
||||||
return
|
return
|
||||||
if src and ref:
|
if src and ref:
|
||||||
url = f"https://git.autonomic.zone/{src}.git"
|
url = f"https://git.autonomic.zone/{src}.git"
|
||||||
@ -169,7 +233,7 @@ def fetch_recipe(recipe: str, ref: str | None, src: str | None) -> None:
|
|||||||
def snapshot_recipe_tests(recipe: str) -> str | None:
|
def snapshot_recipe_tests(recipe: str) -> str | None:
|
||||||
"""Copy the recipe-shipped tests/ to a stable temp dir, immune to abra re-checking-out the
|
"""Copy the recipe-shipped tests/ to a stable temp dir, immune to abra re-checking-out the
|
||||||
recipe to a version tag during the run. Returns the snapshot path, or None if no tests/."""
|
recipe to a version tag during the run. Returns the snapshot path, or None if no tests/."""
|
||||||
src = os.path.expanduser(f"~/.abra/recipes/{recipe}/tests")
|
src = os.path.join(abra.recipe_dir(recipe), "tests")
|
||||||
if not os.path.isdir(src):
|
if not os.path.isdir(src):
|
||||||
return None
|
return None
|
||||||
has_overlay = glob.glob(os.path.join(src, "test_*.py")) or os.path.isfile(
|
has_overlay = glob.glob(os.path.join(src, "test_*.py")) or os.path.isfile(
|
||||||
@ -200,6 +264,7 @@ def _load_meta(recipe: str) -> dict:
|
|||||||
for k in list(meta) + [
|
for k in list(meta) + [
|
||||||
"BACKUP_CAPABLE",
|
"BACKUP_CAPABLE",
|
||||||
"SKIP_GENERIC",
|
"SKIP_GENERIC",
|
||||||
|
"EXPECTED_NA",
|
||||||
"OIDC_AT_INSTALL",
|
"OIDC_AT_INSTALL",
|
||||||
"READY_PROBE",
|
"READY_PROBE",
|
||||||
"UPGRADE_BASE_VERSION",
|
"UPGRADE_BASE_VERSION",
|
||||||
@ -565,15 +630,15 @@ def run_quick(
|
|||||||
flush=True,
|
flush=True,
|
||||||
)
|
)
|
||||||
|
|
||||||
statefile = os.path.join(tempfile.gettempdir(), f"ccci-opstate-{domain}.json")
|
statefile = _run_state_path("opstate") + ".json"
|
||||||
with open(statefile, "w") as f:
|
with open(statefile, "w") as f:
|
||||||
json.dump({}, f)
|
json.dump({}, f)
|
||||||
os.environ["CCCI_OP_STATE_FILE"] = statefile
|
os.environ["CCCI_OP_STATE_FILE"] = statefile
|
||||||
depsfile = os.path.join(tempfile.gettempdir(), f"ccci-deps-{domain}.json")
|
depsfile = _run_state_path("deps") + ".json"
|
||||||
with open(depsfile, "w") as f:
|
with open(depsfile, "w") as f:
|
||||||
json.dump({}, f)
|
json.dump({}, f)
|
||||||
os.environ["CCCI_DEPS_FILE"] = depsfile
|
os.environ["CCCI_DEPS_FILE"] = depsfile
|
||||||
skipfile = os.path.join(tempfile.gettempdir(), f"ccci-depskip-{domain}.txt")
|
skipfile = _run_state_path("depskip") + ".txt"
|
||||||
with contextlib.suppress(OSError):
|
with contextlib.suppress(OSError):
|
||||||
os.remove(skipfile)
|
os.remove(skipfile)
|
||||||
os.environ["CCCI_DEPS_SKIP_REPORT"] = skipfile
|
os.environ["CCCI_DEPS_SKIP_REPORT"] = skipfile
|
||||||
@ -649,6 +714,8 @@ def run_quick(
|
|||||||
results["upgrade"] = "fail"
|
results["upgrade"] = "fail"
|
||||||
results["custom"] = "skip"
|
results["custom"] = "skip"
|
||||||
finally:
|
finally:
|
||||||
|
# Teardown funnel running: further SIGTERM/SIGALRM are logged + ignored (lifetime.py).
|
||||||
|
lifetime.begin_teardown()
|
||||||
# F2-11 skip count (read before deciding pass/fail)
|
# F2-11 skip count (read before deciding pass/fail)
|
||||||
requires_deps_skipped = 0
|
requires_deps_skipped = 0
|
||||||
try:
|
try:
|
||||||
@ -812,6 +879,9 @@ def promote_canonical(recipe: str, head_ref: str | None) -> None:
|
|||||||
|
|
||||||
|
|
||||||
def main() -> int:
|
def main() -> int:
|
||||||
|
# P1 lock-lifetime hardening: PDEATHSIG + SIGTERM/SIGALRM teardown funnel + 60-min hard
|
||||||
|
# deadline, armed before ANY abra call or lock acquisition (see harness/lifetime.py).
|
||||||
|
lifetime.install_lifetime_guards()
|
||||||
recipe = os.environ.get("RECIPE")
|
recipe = os.environ.get("RECIPE")
|
||||||
if not recipe:
|
if not recipe:
|
||||||
print("RECIPE env is required", file=sys.stderr)
|
print("RECIPE env is required", file=sys.stderr)
|
||||||
@ -826,6 +896,10 @@ def main() -> int:
|
|||||||
print(
|
print(
|
||||||
f"== cc-ci run: recipe={recipe} ref={ref} pr={os.environ.get('PR', '0')} stages={sorted(stages)}"
|
f"== cc-ci run: recipe={recipe} ref={ref} pr={os.environ.get('PR', '0')} stages={sorted(stages)}"
|
||||||
)
|
)
|
||||||
|
# Concurrent-run safety is structural: this run's recipe trees live in its own ABRA_DIR
|
||||||
|
# (exported here, before ANY abra call), so no recipe-tree lock exists; same-DOMAIN runs
|
||||||
|
# serialise on the app-domain flock taken in deploy_app (see docs/concurrency.md).
|
||||||
|
setup_run_abra_dir()
|
||||||
fetch_recipe(recipe, ref, src)
|
fetch_recipe(recipe, ref, src)
|
||||||
# The PR-head commit the upgrade tier re-checks out for the chaos redeploy to the code under test
|
# The PR-head commit the upgrade tier re-checks out for the chaos redeploy to the code under test
|
||||||
# (HC1). Prefer the explicit PR head sha ($REF) — robust + exact; fall back to the recipe checkout
|
# (HC1). Prefer the explicit PR head sha ($REF) — robust + exact; fall back to the recipe checkout
|
||||||
@ -864,7 +938,7 @@ def main() -> int:
|
|||||||
hook = discovery.install_steps(recipe, repo_local)
|
hook = discovery.install_steps(recipe, repo_local)
|
||||||
|
|
||||||
# Deploy-count guard (DG4.1): exactly one deploy_app() per run.
|
# Deploy-count guard (DG4.1): exactly one deploy_app() per run.
|
||||||
countfile = os.path.join(tempfile.gettempdir(), f"ccci-deploys-{domain}")
|
countfile = _run_state_path("deploys")
|
||||||
with open(countfile, "w") as f:
|
with open(countfile, "w") as f:
|
||||||
f.write("0")
|
f.write("0")
|
||||||
os.environ["CCCI_DEPLOY_COUNT_FILE"] = countfile
|
os.environ["CCCI_DEPLOY_COUNT_FILE"] = countfile
|
||||||
@ -880,7 +954,7 @@ def main() -> int:
|
|||||||
|
|
||||||
# Run-scoped op state (HC3): the orchestrator records op results (pre-upgrade identity, backup
|
# Run-scoped op state (HC3): the orchestrator records op results (pre-upgrade identity, backup
|
||||||
# snapshot_id) here for the assertion tiers (generic + overlay) to read via generic.op_state().
|
# snapshot_id) here for the assertion tiers (generic + overlay) to read via generic.op_state().
|
||||||
statefile = os.path.join(tempfile.gettempdir(), f"ccci-opstate-{domain}.json")
|
statefile = _run_state_path("opstate") + ".json"
|
||||||
with open(statefile, "w") as f:
|
with open(statefile, "w") as f:
|
||||||
json.dump({}, f)
|
json.dump({}, f)
|
||||||
os.environ["CCCI_OP_STATE_FILE"] = statefile
|
os.environ["CCCI_OP_STATE_FILE"] = statefile
|
||||||
@ -891,12 +965,12 @@ def main() -> int:
|
|||||||
# cannot break the generic-tier signal. The `setup_custom_tests` step deploys each dep + runs
|
# cannot break the generic-tier signal. The `setup_custom_tests` step deploys each dep + runs
|
||||||
# `tests/<recipe>/setup_custom_tests.sh` to wire OIDC env via in-place redeploy.
|
# `tests/<recipe>/setup_custom_tests.sh` to wire OIDC env via in-place redeploy.
|
||||||
# `$CCCI_DEPS_FILE` is written with the full creds dict the hook script needs (jq-readable).
|
# `$CCCI_DEPS_FILE` is written with the full creds dict the hook script needs (jq-readable).
|
||||||
depsfile = os.path.join(tempfile.gettempdir(), f"ccci-deps-{domain}.json")
|
depsfile = _run_state_path("deps") + ".json"
|
||||||
with open(depsfile, "w") as f:
|
with open(depsfile, "w") as f:
|
||||||
json.dump({}, f)
|
json.dump({}, f)
|
||||||
os.environ["CCCI_DEPS_FILE"] = depsfile
|
os.environ["CCCI_DEPS_FILE"] = depsfile
|
||||||
# F2-11: conftest appends the count of requires_deps tests it skips (deps-not-ready) here.
|
# F2-11: conftest appends the count of requires_deps tests it skips (deps-not-ready) here.
|
||||||
skipfile = os.path.join(tempfile.gettempdir(), f"ccci-depskip-{domain}.txt")
|
skipfile = _run_state_path("depskip") + ".txt"
|
||||||
with contextlib.suppress(OSError):
|
with contextlib.suppress(OSError):
|
||||||
os.remove(skipfile)
|
os.remove(skipfile)
|
||||||
os.environ["CCCI_DEPS_SKIP_REPORT"] = skipfile
|
os.environ["CCCI_DEPS_SKIP_REPORT"] = skipfile
|
||||||
@ -1108,6 +1182,9 @@ def main() -> int:
|
|||||||
if op in stages:
|
if op in stages:
|
||||||
results[op] = "skip"
|
results[op] = "skip"
|
||||||
finally:
|
finally:
|
||||||
|
# From here the teardown funnel runs: a SIGTERM/SIGALRM landing now is logged + ignored
|
||||||
|
# (lifetime.py) so a second signal can't abort the cleanup the first one asked for.
|
||||||
|
lifetime.begin_teardown()
|
||||||
# Teardown the recipe under test FIRST, then deps in reverse declaration order.
|
# Teardown the recipe under test FIRST, then deps in reverse declaration order.
|
||||||
# Parent verify=False (Phase 1d): keep as-is so a parent residual doesn't mask a tier
|
# Parent verify=False (Phase 1d): keep as-is so a parent residual doesn't mask a tier
|
||||||
# failure. Dep teardown uses verify=True via teardown_deps (F2-5 fix); failures are
|
# failure. Dep teardown uses verify=True via teardown_deps (F2-5 fix); failures are
|
||||||
@ -1224,7 +1301,6 @@ def main() -> int:
|
|||||||
# a failure here NEVER changes `overall` (R7 — cosmetics never block the pipeline). ----
|
# a failure here NEVER changes `overall` (R7 — cosmetics never block the pipeline). ----
|
||||||
data: dict | None = None
|
data: dict | None = None
|
||||||
try:
|
try:
|
||||||
sso_unverified = sso_dep_unverified(declared, deps_ready, requires_deps_skipped)
|
|
||||||
clean_teardown = (deploy_count == expected_deploy_count) and not dep_teardown_error
|
clean_teardown = (deploy_count == expected_deploy_count) and not dep_teardown_error
|
||||||
data = results_mod.build_results(
|
data = results_mod.build_results(
|
||||||
recipe=recipe,
|
recipe=recipe,
|
||||||
@ -1234,13 +1310,11 @@ def main() -> int:
|
|||||||
records=records,
|
records=records,
|
||||||
results=results,
|
results=results,
|
||||||
backup_capable=backup_cap,
|
backup_capable=backup_cap,
|
||||||
declared=declared,
|
|
||||||
deps_ready=deps_ready,
|
|
||||||
sso_unverified=sso_unverified,
|
|
||||||
clean_teardown=clean_teardown,
|
clean_teardown=clean_teardown,
|
||||||
no_secret_leak=True, # narrowed below by an actual scan of the serialised artifact
|
no_secret_leak=True, # narrowed below by an actual scan of the serialised artifact
|
||||||
screenshot=screenshot_rel, # Phase 3 U1 (R4): relative PNG name iff capture succeeded
|
screenshot=screenshot_rel, # Phase 3 U1 (R4): relative PNG name iff capture succeeded
|
||||||
finished_ts=time.time(),
|
finished_ts=time.time(),
|
||||||
|
expected_na=meta.get("EXPECTED_NA"), # declared intentional-skip map (recipe_meta)
|
||||||
)
|
)
|
||||||
# Real (if narrow) leak check: no known infra-secret value may appear in the artifact (R7).
|
# Real (if narrow) leak check: no known infra-secret value may appear in the artifact (R7).
|
||||||
blob = json.dumps(data)
|
blob = json.dumps(data)
|
||||||
@ -1257,6 +1331,15 @@ def main() -> int:
|
|||||||
f"{' — ' + data['level_cap_reason'] if data['level_cap_reason'] else ''})",
|
f"{' — ' + data['level_cap_reason'] if data['level_cap_reason'] else ''})",
|
||||||
flush=True,
|
flush=True,
|
||||||
)
|
)
|
||||||
|
# Surface UNINTENTIONAL skips in the CI log (non-blocking, R7): a rung that was skipped (N/A)
|
||||||
|
# but is not in the recipe's intentional list — either add the missing coverage or declare it.
|
||||||
|
for rung in data.get("skips", {}).get("unintentional", []):
|
||||||
|
print(
|
||||||
|
f"⚠ coverage: rung '{rung}' was skipped (N/A) but is not declared intentional — add "
|
||||||
|
f"the missing test/label, or list it in tests/{recipe}/recipe_meta.py "
|
||||||
|
f"EXPECTED_NA = {{'{rung}': '<why>'}}.",
|
||||||
|
flush=True,
|
||||||
|
)
|
||||||
except Exception as e: # noqa: BLE001 — results assembly is cosmetic; never fail a run on it (R7)
|
except Exception as e: # noqa: BLE001 — results assembly is cosmetic; never fail a run on it (R7)
|
||||||
print(
|
print(
|
||||||
f"!! results.json assembly failed (non-fatal, verdict unaffected): {_scrub(str(e))}",
|
f"!! results.json assembly failed (non-fatal, verdict unaffected): {_scrub(str(e))}",
|
||||||
@ -1275,8 +1358,21 @@ def main() -> int:
|
|||||||
with open(html_path, "w", encoding="utf-8") as f:
|
with open(html_path, "w", encoding="utf-8") as f:
|
||||||
f.write(card_mod.render_card_html(data, screenshot_rel=data.get("screenshot")))
|
f.write(card_mod.render_card_html(data, screenshot_rel=data.get("screenshot")))
|
||||||
png = card_mod.render_card_png(html_path, os.path.join(run_artifact_dir, "summary.png"))
|
png = card_mod.render_card_png(html_path, os.path.join(run_artifact_dir, "summary.png"))
|
||||||
|
capped = data.get("level_cap_rung")
|
||||||
|
sk = data.get("skips", {})
|
||||||
|
cap_skip = (
|
||||||
|
"intentional"
|
||||||
|
if capped in (sk.get("intentional") or {})
|
||||||
|
else "unintentional"
|
||||||
|
if capped in (sk.get("unintentional") or [])
|
||||||
|
else ""
|
||||||
|
)
|
||||||
with open(os.path.join(run_artifact_dir, "badge.svg"), "w", encoding="utf-8") as f:
|
with open(os.path.join(run_artifact_dir, "badge.svg"), "w", encoding="utf-8") as f:
|
||||||
f.write(card_mod.level_badge_svg(data["level"], data.get("level_cap_reason", "")))
|
f.write(
|
||||||
|
card_mod.level_badge_svg(
|
||||||
|
data["level"], data.get("level_cap_reason", ""), cap_skip
|
||||||
|
)
|
||||||
|
)
|
||||||
print(
|
print(
|
||||||
f"summary card {'rendered ' + png if png else '(PNG render unavailable)'} + "
|
f"summary card {'rendered ' + png if png else '(PNG render unavailable)'} + "
|
||||||
f"badge.svg written into {run_artifact_dir}",
|
f"badge.svg written into {run_artifact_dir}",
|
||||||
|
|||||||
@ -43,11 +43,16 @@ def _traefik_setup(recipe: str, domain: str, version: str) -> None:
|
|||||||
ssl_cert/ssl_key swarm secrets; NO ACME). Uses the proven abra.env_set (newline-safe, unlike the
|
ssl_cert/ssl_key swarm secrets; NO ACME). Uses the proven abra.env_set (newline-safe, unlike the
|
||||||
bash set_env that bit keycloak)."""
|
bash set_env that bit keycloak)."""
|
||||||
cert_dir = "/var/lib/ci-certs/live"
|
cert_dir = "/var/lib/ci-certs/live"
|
||||||
if not (os.path.isfile(f"{cert_dir}/fullchain.pem") and os.path.isfile(f"{cert_dir}/privkey.pem")):
|
if not (
|
||||||
|
os.path.isfile(f"{cert_dir}/fullchain.pem") and os.path.isfile(f"{cert_dir}/privkey.pem")
|
||||||
|
):
|
||||||
raise RuntimeError(f"FATAL: wildcard cert missing at {cert_dir} (sops decrypt broken?)")
|
raise RuntimeError(f"FATAL: wildcard cert missing at {cert_dir} (sops decrypt broken?)")
|
||||||
if not os.path.isfile(env_file(domain)):
|
if not os.path.isfile(env_file(domain)):
|
||||||
_run(["abra", "app", "new", recipe, "-s", "default", "-D", domain, version, "-o", "-n"],
|
_run(
|
||||||
timeout=120, check=True)
|
["abra", "app", "new", recipe, "-s", "default", "-D", domain, version, "-o", "-n"],
|
||||||
|
timeout=120,
|
||||||
|
check=True,
|
||||||
|
)
|
||||||
abra.env_set(domain, "DOMAIN", domain)
|
abra.env_set(domain, "DOMAIN", domain)
|
||||||
abra.env_set(domain, "LETS_ENCRYPT_ENV", "")
|
abra.env_set(domain, "LETS_ENCRYPT_ENV", "")
|
||||||
abra.env_set(domain, "WILDCARDS_ENABLED", "1")
|
abra.env_set(domain, "WILDCARDS_ENABLED", "1")
|
||||||
@ -61,11 +66,39 @@ def _traefik_setup(recipe: str, domain: str, version: str) -> None:
|
|||||||
return any(s.endswith(f"_{name}_v1") for s in have)
|
return any(s.endswith(f"_{name}_v1") for s in have)
|
||||||
|
|
||||||
if not _has("ssl_cert"):
|
if not _has("ssl_cert"):
|
||||||
_run(["abra", "app", "secret", "insert", domain, "ssl_cert", "v1",
|
_run(
|
||||||
f"{cert_dir}/fullchain.pem", "-f", "-n"], timeout=120, check=True)
|
[
|
||||||
|
"abra",
|
||||||
|
"app",
|
||||||
|
"secret",
|
||||||
|
"insert",
|
||||||
|
domain,
|
||||||
|
"ssl_cert",
|
||||||
|
"v1",
|
||||||
|
f"{cert_dir}/fullchain.pem",
|
||||||
|
"-f",
|
||||||
|
"-n",
|
||||||
|
],
|
||||||
|
timeout=120,
|
||||||
|
check=True,
|
||||||
|
)
|
||||||
if not _has("ssl_key"):
|
if not _has("ssl_key"):
|
||||||
_run(["abra", "app", "secret", "insert", domain, "ssl_key", "v1",
|
_run(
|
||||||
f"{cert_dir}/privkey.pem", "-f", "-n"], timeout=120, check=True)
|
[
|
||||||
|
"abra",
|
||||||
|
"app",
|
||||||
|
"secret",
|
||||||
|
"insert",
|
||||||
|
domain,
|
||||||
|
"ssl_key",
|
||||||
|
"v1",
|
||||||
|
f"{cert_dir}/privkey.pem",
|
||||||
|
"-f",
|
||||||
|
"-n",
|
||||||
|
],
|
||||||
|
timeout=120,
|
||||||
|
check=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
SPECS: dict[str, dict] = {
|
SPECS: dict[str, dict] = {
|
||||||
@ -166,7 +199,13 @@ def _run(cmd, timeout=120, check=False):
|
|||||||
|
|
||||||
|
|
||||||
def _recipe_dir(recipe: str) -> str:
|
def _recipe_dir(recipe: str) -> str:
|
||||||
return os.path.expanduser(f"~/.abra/recipes/{recipe}")
|
# Resolve like the abra CLI does: $ABRA_DIR (the per-run tree when imported by a CI run,
|
||||||
|
# e.g. promote_canonical) else the canonical ~/.abra (this module's own systemd-timer runs,
|
||||||
|
# which set no ABRA_DIR). Keeps fetch_recipe (an `abra` subprocess) and the git readers
|
||||||
|
# below pointed at the SAME tree in both contexts.
|
||||||
|
return os.path.join(
|
||||||
|
os.environ.get("ABRA_DIR") or os.path.expanduser("~/.abra"), "recipes", recipe
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
def recipe_tags(recipe: str) -> list[str]:
|
def recipe_tags(recipe: str) -> list[str]:
|
||||||
@ -218,8 +257,17 @@ def health_code(spec: dict) -> int:
|
|||||||
domain = spec.get("health_domain", spec["domain"])
|
domain = spec.get("health_domain", spec["domain"])
|
||||||
r = _run(
|
r = _run(
|
||||||
[
|
[
|
||||||
"curl", "-sk", "-o", "/dev/null", "-w", "%{http_code}", "--max-time", "10",
|
"curl",
|
||||||
"--resolve", f"{domain}:443:127.0.0.1", f"https://{domain}{spec['health_path']}",
|
"-sk",
|
||||||
|
"-o",
|
||||||
|
"/dev/null",
|
||||||
|
"-w",
|
||||||
|
"%{http_code}",
|
||||||
|
"--max-time",
|
||||||
|
"10",
|
||||||
|
"--resolve",
|
||||||
|
f"{domain}:443:127.0.0.1",
|
||||||
|
f"https://{domain}{spec['health_path']}",
|
||||||
],
|
],
|
||||||
timeout=20,
|
timeout=20,
|
||||||
)
|
)
|
||||||
@ -230,7 +278,6 @@ def health_code(spec: dict) -> int:
|
|||||||
|
|
||||||
|
|
||||||
def wait_healthy(spec: dict, timeout: int | None = None) -> bool:
|
def wait_healthy(spec: dict, timeout: int | None = None) -> bool:
|
||||||
domain = spec["domain"]
|
|
||||||
deadline = time.time() + (timeout or spec["health_timeout"])
|
deadline = time.time() + (timeout or spec["health_timeout"])
|
||||||
while time.time() < deadline:
|
while time.time() < deadline:
|
||||||
if health_code(spec) in tuple(spec["health_ok"]):
|
if health_code(spec) in tuple(spec["health_ok"]):
|
||||||
@ -325,15 +372,18 @@ def ensure_server() -> None:
|
|||||||
|
|
||||||
def ensure_app_config(recipe: str, domain: str, version: str) -> None:
|
def ensure_app_config(recipe: str, domain: str, version: str) -> None:
|
||||||
if not os.path.isfile(env_file(domain)):
|
if not os.path.isfile(env_file(domain)):
|
||||||
_run(["abra", "app", "new", recipe, "-s", "default", "-D", domain, version, "-o", "-n"],
|
_run(
|
||||||
timeout=120, check=True)
|
["abra", "app", "new", recipe, "-s", "default", "-D", domain, version, "-o", "-n"],
|
||||||
|
timeout=120,
|
||||||
|
check=True,
|
||||||
|
)
|
||||||
abra.env_set(domain, "DOMAIN", domain)
|
abra.env_set(domain, "DOMAIN", domain)
|
||||||
abra.env_set(domain, "LETS_ENCRYPT_ENV", "")
|
abra.env_set(domain, "LETS_ENCRYPT_ENV", "")
|
||||||
|
|
||||||
|
|
||||||
def ensure_secrets(domain: str) -> None:
|
def ensure_secrets(domain: str) -> None:
|
||||||
stack = lifecycle._stack_name(domain) # noqa: SLF001
|
stack = lifecycle._stack_name(domain) # noqa: SLF001
|
||||||
have = {n for n in lifecycle._docker_names("secret", stack)} # noqa: SLF001
|
have = set(lifecycle._docker_names("secret", stack)) # noqa: SLF001
|
||||||
if not any(n.endswith("_admin_password_v1") for n in have):
|
if not any(n.endswith("_admin_password_v1") for n in have):
|
||||||
abra.secret_generate(domain)
|
abra.secret_generate(domain)
|
||||||
|
|
||||||
@ -393,8 +443,9 @@ def reconcile(app: str) -> str:
|
|||||||
write_alert(app, "held-major", current=current, latest=latest, release_notes=notes[:4000])
|
write_alert(app, "held-major", current=current, latest=latest, release_notes=notes[:4000])
|
||||||
return f"held-major:{current}->{latest}"
|
return f"held-major:{current}->{latest}"
|
||||||
if notes_flag_manual_migration(notes):
|
if notes_flag_manual_migration(notes):
|
||||||
write_alert(app, "held-manual-migration", current=current, latest=latest,
|
write_alert(
|
||||||
release_notes=notes[:4000])
|
app, "held-manual-migration", current=current, latest=latest, release_notes=notes[:4000]
|
||||||
|
)
|
||||||
return f"held-manual-migration:{current}->{latest}"
|
return f"held-manual-migration:{current}->{latest}"
|
||||||
|
|
||||||
# WC1.1 health-gated upgrade with rollback.
|
# WC1.1 health-gated upgrade with rollback.
|
||||||
@ -428,8 +479,14 @@ def reconcile(app: str) -> str:
|
|||||||
warmsnap.restore(recipe, domain)
|
warmsnap.restore(recipe, domain)
|
||||||
deploy_version(recipe, domain, last_good, dt)
|
deploy_version(recipe, domain, last_good, dt)
|
||||||
recovered = wait_healthy(spec)
|
recovered = wait_healthy(spec)
|
||||||
write_alert(app, "rollback", last_good=last_good, attempted=latest, recovered=recovered,
|
write_alert(
|
||||||
release_notes=notes[:2000])
|
app,
|
||||||
|
"rollback",
|
||||||
|
last_good=last_good,
|
||||||
|
attempted=latest,
|
||||||
|
recovered=recovered,
|
||||||
|
release_notes=notes[:2000],
|
||||||
|
)
|
||||||
if not recovered:
|
if not recovered:
|
||||||
raise RuntimeError(f"{app} rollback to {last_good} did not become healthy")
|
raise RuntimeError(f"{app} rollback to {last_good} did not become healthy")
|
||||||
return f"rolled-back:{latest}->{last_good}"
|
return f"rolled-back:{latest}->{last_good}"
|
||||||
|
|||||||
@ -15,7 +15,8 @@ import shlex
|
|||||||
import sys
|
import sys
|
||||||
|
|
||||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
||||||
from harness import http as harness_http, lifecycle # noqa: E402
|
from harness import http as harness_http # noqa: E402
|
||||||
|
from harness import lifecycle  # noqa: E402
|
||||||
|
|
||||||
PDS_HOST_LOCAL = "http://localhost:3000"
|
PDS_HOST_LOCAL = "http://localhost:3000"
|
||||||
_PW = "ccci-P4-marker-pw-2026"
|
_PW = "ccci-P4-marker-pw-2026"
|
||||||
|
|||||||
@ -27,6 +27,7 @@ CRUD). A wedged PDS subsystem fails AT its layer.
|
|||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import contextlib
|
||||||
import os
|
import os
|
||||||
import re
|
import re
|
||||||
import secrets
|
import secrets
|
||||||
@ -35,7 +36,8 @@ import sys
|
|||||||
import uuid
|
import uuid
|
||||||
|
|
||||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
|
||||||
from harness import http as harness_http, lifecycle # noqa: E402
|
from harness import http as harness_http # noqa: E402
|
||||||
|
from harness import lifecycle  # noqa: E402
|
||||||
|
|
||||||
PDS_HOST_LOCAL = "http://localhost:3000"
|
PDS_HOST_LOCAL = "http://localhost:3000"
|
||||||
|
|
||||||
@ -58,14 +60,18 @@ def _goat_admin(domain: str, args: str) -> str:
|
|||||||
return _in_container(domain, cmd)
|
return _in_container(domain, cmd)
|
||||||
|
|
||||||
|
|
||||||
def _xrpc_post(domain: str, nsid: str, data: dict, token: str | None = None) -> tuple[int, dict | None]:
|
def _xrpc_post(
|
||||||
|
domain: str, nsid: str, data: dict, token: str | None = None
|
||||||
|
) -> tuple[int, dict | None]:
|
||||||
headers = {}
|
headers = {}
|
||||||
if token:
|
if token:
|
||||||
headers["Authorization"] = f"Bearer {token}"
|
headers["Authorization"] = f"Bearer {token}"
|
||||||
return harness_http.http_post(f"https://{domain}/xrpc/{nsid}", data=data, headers=headers)
|
return harness_http.http_post(f"https://{domain}/xrpc/{nsid}", data=data, headers=headers)
|
||||||
|
|
||||||
|
|
||||||
def _xrpc_get(domain: str, nsid: str, query: str, token: str | None = None) -> tuple[int, dict | None]:
|
def _xrpc_get(
|
||||||
|
domain: str, nsid: str, query: str, token: str | None = None
|
||||||
|
) -> tuple[int, dict | None]:
|
||||||
headers = {}
|
headers = {}
|
||||||
if token:
|
if token:
|
||||||
headers["Authorization"] = f"Bearer {token}"
|
headers["Authorization"] = f"Bearer {token}"
|
||||||
@ -82,9 +88,9 @@ def test_account_lifecycle_and_post_roundtrip(live_app):
|
|||||||
|
|
||||||
# Step 1: PDS describe via goat — recipe self-identifies as did:web:<domain>
|
# Step 1: PDS describe via goat — recipe self-identifies as did:web:<domain>
|
||||||
out = _in_container(domain, f"goat pds describe {PDS_HOST_LOCAL} 2>&1")
|
out = _in_container(domain, f"goat pds describe {PDS_HOST_LOCAL} 2>&1")
|
||||||
assert f"did:web:{domain}" in out, (
|
assert (
|
||||||
f"goat pds describe did not contain expected DID 'did:web:{domain}'. Output:\n{out[:500]!r}"
|
f"did:web:{domain}" in out
|
||||||
)
|
), f"goat pds describe did not contain expected DID 'did:web:{domain}'. Output:\n{out[:500]!r}"
|
||||||
|
|
||||||
# Step 2: Create account (UUID-suffixed handle = no run-to-run collision)
|
# Step 2: Create account (UUID-suffixed handle = no run-to-run collision)
|
||||||
out = _goat_admin(
|
out = _goat_admin(
|
||||||
@ -127,9 +133,9 @@ def test_account_lifecycle_and_post_roundtrip(live_app):
|
|||||||
assert s == 200, f"createRecord HTTP {s}: {body!r}"
|
assert s == 200, f"createRecord HTTP {s}: {body!r}"
|
||||||
record_uri = (body or {}).get("uri", "")
|
record_uri = (body or {}).get("uri", "")
|
||||||
# URI format: at://<did>/app.bsky.feed.post/<rkey>
|
# URI format: at://<did>/app.bsky.feed.post/<rkey>
|
||||||
assert record_uri.startswith(f"at://{new_did}/app.bsky.feed.post/"), (
|
assert record_uri.startswith(
|
||||||
f"unexpected record uri: {record_uri!r}"
|
f"at://{new_did}/app.bsky.feed.post/"
|
||||||
)
|
), f"unexpected record uri: {record_uri!r}"
|
||||||
rkey = record_uri.rsplit("/", 1)[-1]
|
rkey = record_uri.rsplit("/", 1)[-1]
|
||||||
assert rkey, f"no rkey in uri: {record_uri!r}"
|
assert rkey, f"no rkey in uri: {record_uri!r}"
|
||||||
|
|
||||||
@ -142,15 +148,13 @@ def test_account_lifecycle_and_post_roundtrip(live_app):
|
|||||||
)
|
)
|
||||||
assert s == 200, f"getRecord HTTP {s}: {body!r}"
|
assert s == 200, f"getRecord HTTP {s}: {body!r}"
|
||||||
record_value = (body or {}).get("value", {})
|
record_value = (body or {}).get("value", {})
|
||||||
assert record_value.get("text") == marker, (
|
assert (
|
||||||
f"post text did not round-trip: created={marker!r}, fetched={record_value.get('text')!r}"
|
record_value.get("text") == marker
|
||||||
)
|
), f"post text did not round-trip: created={marker!r}, fetched={record_value.get('text')!r}"
|
||||||
assert record_value.get("$type") == "app.bsky.feed.post"
|
assert record_value.get("$type") == "app.bsky.feed.post"
|
||||||
finally:
|
finally:
|
||||||
# Step 6: Best-effort cleanup. (The per-run domain teardown will discard the volume
|
# Step 6: Best-effort cleanup. (The per-run domain teardown will discard the volume
|
||||||
# too, but we exercise the delete-account path because it's part of §4.3.)
|
# too, but we exercise the delete-account path because it's part of §4.3.)
|
||||||
if cleanup_did:
|
if cleanup_did:
|
||||||
try:
|
with contextlib.suppress(Exception):
|
||||||
_goat_admin(domain, f"account delete {cleanup_did}")
|
_goat_admin(domain, f"account delete {cleanup_did}")
|
||||||
except Exception: # noqa: BLE001
|
|
||||||
pass
|
|
||||||
|
|||||||
@ -26,6 +26,6 @@ def test_describe_server_returns_atproto_envelope(live_app):
|
|||||||
# At least one of these atproto-spec fields must be present
|
# At least one of these atproto-spec fields must be present
|
||||||
expected_any = ("availableUserDomains", "inviteCodeRequired", "links", "did")
|
expected_any = ("availableUserDomains", "inviteCodeRequired", "links", "did")
|
||||||
present = [k for k in expected_any if k in body]
|
present = [k for k in expected_any if k in body]
|
||||||
assert present, (
|
assert (
|
||||||
f"describe-server missing all of {expected_any}; got keys: {sorted(body.keys())[:20]}"
|
present
|
||||||
)
|
), f"describe-server missing all of {expected_any}; got keys: {sorted(body.keys())[:20]}"
|
||||||
|
|||||||
@ -17,6 +17,6 @@ def test_pds_health_returns_version(live_app):
|
|||||||
url = f"https://{live_app}/xrpc/_health"
|
url = f"https://{live_app}/xrpc/_health"
|
||||||
status, body = harness_http.retry_http_get(url, expect_status=200, max_wait=60, interval=3)
|
status, body = harness_http.retry_http_get(url, expect_status=200, max_wait=60, interval=3)
|
||||||
assert status == 200, f"GET {url} HTTP {status} (expected 200)"
|
assert status == 200, f"GET {url} HTTP {status} (expected 200)"
|
||||||
assert isinstance(body, dict) and isinstance(body.get("version"), str) and body["version"], (
|
assert (
|
||||||
f"GET {url} response is not the expected health envelope: {body!r}"
|
isinstance(body, dict) and isinstance(body.get("version"), str) and body["version"]
|
||||||
)
|
), f"GET {url} response is not the expected health envelope: {body!r}"
|
||||||
|
|||||||
@ -30,6 +30,6 @@ def test_get_session_requires_auth(live_app):
|
|||||||
f"body: {body!r}"
|
f"body: {body!r}"
|
||||||
)
|
)
|
||||||
# The XRPC error envelope is JSON with an `error` field per the atproto spec.
|
# The XRPC error envelope is JSON with an `error` field per the atproto spec.
|
||||||
assert isinstance(body, dict) and body.get("error"), (
|
assert isinstance(body, dict) and body.get(
|
||||||
f"expected XRPC JSON error envelope; got: {body!r}"
|
"error"
|
||||||
)
|
), f"expected XRPC JSON error envelope; got: {body!r}"
|
||||||
|
|||||||
@ -22,12 +22,12 @@ echo " bluesky-pds install_steps: generating secp256k1 PLC rotation key..."
|
|||||||
# same shape the PDS expects (32-byte hex). Equivalent for atproto PDS bootstrap.
|
# same shape the PDS expects (32-byte hex). Equivalent for atproto PDS bootstrap.
|
||||||
KEY_HEX=$(cc-ci-run -c 'import secrets; print(secrets.token_bytes(32).hex())')
|
KEY_HEX=$(cc-ci-run -c 'import secrets; print(secrets.token_bytes(32).hex())')
|
||||||
if [ -z "${KEY_HEX}" ] || [ "${#KEY_HEX}" != "64" ]; then
|
if [ -z "${KEY_HEX}" ] || [ "${#KEY_HEX}" != "64" ]; then
|
||||||
echo " install_steps: failed to generate PLC rotation key (KEY_HEX length=${#KEY_HEX})" >&2
|
echo " install_steps: failed to generate PLC rotation key (KEY_HEX length=${#KEY_HEX})" >&2
|
||||||
exit 1
|
exit 1
|
||||||
fi
|
fi
|
||||||
|
|
||||||
# Insert via abra under TTY-wrap (`abra app secret insert` requires a TTY on this version).
|
# Insert via abra under TTY-wrap (`abra app secret insert` requires a TTY on this version).
|
||||||
# We DON'T log the key value — abra also doesn't print it.
|
# We DON'T log the key value — abra also doesn't print it.
|
||||||
script -qec "abra app secret insert ${CCCI_APP_DOMAIN} pds_plc_rotation_key v1 ${KEY_HEX} --no-input" /dev/null \
|
script -qec "abra app secret insert ${CCCI_APP_DOMAIN} pds_plc_rotation_key v1 ${KEY_HEX} --no-input" /dev/null \
|
||||||
>/dev/null 2>&1
|
>/dev/null 2>&1
|
||||||
echo " bluesky-pds install_steps: PLC rotation key inserted (v1)."
|
echo " bluesky-pds install_steps: PLC rotation key inserted (v1)."
|
||||||
|
|||||||
@ -11,6 +11,6 @@ import _p4 # noqa: E402
|
|||||||
|
|
||||||
|
|
||||||
def test_restore_returns_state(live_app):
|
def test_restore_returns_state(live_app):
|
||||||
assert _p4.account_exists(live_app), (
|
assert _p4.account_exists(
|
||||||
"restore did not bring back the seeded marker account (PDS data did not survive restore)"
|
live_app
|
||||||
)
|
), "restore did not bring back the seeded marker account (PDS data did not survive restore)"
|
||||||
|
|||||||
108
tests/concurrency/concutil.py
Normal file
@ -0,0 +1,108 @@
|
|||||||
|
"""Shared utilities for the real-kernel concurrency suite (imported by the test modules; the
|
||||||
|
fixtures in conftest.py wrap these). No flock mocking anywhere — probes use real LOCK_NB."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import contextlib
|
||||||
|
import fcntl
|
||||||
|
import os
|
||||||
|
import signal
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
import time
|
||||||
|
|
||||||
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
||||||
|
from harness import lifecycle # noqa: E402
|
||||||
|
|
||||||
|
HELPERS = os.path.join(os.path.dirname(__file__), "helpers.py")
|
||||||
|
DOMAIN = "test-abc123.ci.commoninternet.net" # matches RUN_APP_RE
|
||||||
|
|
||||||
|
|
||||||
|
class HelperPool:
|
||||||
|
"""Spawns helpers.py subprocesses and GUARANTEES their cleanup (incl. recorded grandchild
|
||||||
|
pids from `hold-with-child`/`wrapper` markers) — no leaked children in the test VM."""
|
||||||
|
|
||||||
|
def __init__(self, out_dir: str):
|
||||||
|
self.out_dir = out_dir
|
||||||
|
self.procs: list[subprocess.Popen] = []
|
||||||
|
self.extra_pids: list[int] = []
|
||||||
|
self._n = 0
|
||||||
|
|
||||||
|
def spawn(self, *args: str, env_extra: dict | None = None) -> tuple[subprocess.Popen, str]:
|
||||||
|
"""Start `helpers.py <args...>`; returns (proc, marker_file)."""
|
||||||
|
self._n += 1
|
||||||
|
out = os.path.join(self.out_dir, f"helper-{self._n}.out")
|
||||||
|
env = dict(os.environ, CCCI_HELPER_OUT=out, **(env_extra or {}))
|
||||||
|
p = subprocess.Popen( # noqa: S603
|
||||||
|
[sys.executable, HELPERS, *args],
|
||||||
|
env=env,
|
||||||
|
stdout=subprocess.DEVNULL,
|
||||||
|
stderr=subprocess.STDOUT,
|
||||||
|
)
|
||||||
|
self.procs.append(p)
|
||||||
|
return p, out
|
||||||
|
|
||||||
|
def track_pid(self, pid: int) -> None:
|
||||||
|
self.extra_pids.append(pid)
|
||||||
|
|
||||||
|
def cleanup(self) -> None:
|
||||||
|
for p in self.procs:
|
||||||
|
if p.poll() is None:
|
||||||
|
p.kill()
|
||||||
|
with contextlib.suppress(subprocess.TimeoutExpired):
|
||||||
|
p.wait(timeout=10)
|
||||||
|
for pid in self.extra_pids:
|
||||||
|
with contextlib.suppress(OSError):
|
||||||
|
os.kill(pid, signal.SIGKILL)
|
||||||
|
|
||||||
|
|
||||||
|
def wait_marker(out: str, token: str, timeout: float = 15.0) -> str | None:
|
||||||
|
"""Poll a helper's marker file for a line containing `token`; returns the line or None."""
|
||||||
|
deadline = time.time() + timeout
|
||||||
|
while time.time() < deadline:
|
||||||
|
try:
|
||||||
|
with open(out) as f:
|
||||||
|
for line in f:
|
||||||
|
if token in line:
|
||||||
|
return line.strip()
|
||||||
|
except OSError:
|
||||||
|
pass
|
||||||
|
time.sleep(0.1)
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def lock_state(domain: str) -> str:
|
||||||
|
"""'held' | 'free' | 'absent' for the domain's lockfile, probed with a REAL LOCK_NB."""
|
||||||
|
path = lifecycle._app_lock_path(domain) # noqa: SLF001
|
||||||
|
if not os.path.exists(path):
|
||||||
|
return "absent"
|
||||||
|
with open(path, "a") as f:
|
||||||
|
try:
|
||||||
|
fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
|
||||||
|
return "free"
|
||||||
|
except BlockingIOError:
|
||||||
|
return "held"
|
||||||
|
|
||||||
|
|
||||||
|
def wait_lock_state(domain: str, want: str, timeout: float = 10.0) -> str:
|
||||||
|
"""Poll until lock_state(domain) == want (kernel release on process death is fast, but give
|
||||||
|
the scheduler room). Returns the final observed state."""
|
||||||
|
deadline = time.time() + timeout
|
||||||
|
state = lock_state(domain)
|
||||||
|
while state != want and time.time() < deadline:
|
||||||
|
time.sleep(0.1)
|
||||||
|
state = lock_state(domain)
|
||||||
|
return state
|
||||||
|
|
||||||
|
|
||||||
|
def pid_alive(pid: int) -> bool:
|
||||||
|
return os.path.exists(f"/proc/{pid}")
|
||||||
|
|
||||||
|
|
||||||
|
def wait_pid_gone(pid: int, timeout: float = 15.0) -> bool:
|
||||||
|
deadline = time.time() + timeout
|
||||||
|
while time.time() < deadline:
|
||||||
|
if not pid_alive(pid):
|
||||||
|
return True
|
||||||
|
time.sleep(0.1)
|
||||||
|
return False
|
||||||
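The `lock_state` probe above can be tried in isolation. The following is a minimal sketch, not the suite's code: `probe`, `HOLDER` and `demo` are hypothetical names, and the lockfile lives in a throwaway temp dir rather than under `CCCI_APP_LOCK_DIR`. It shows the three states a non-blocking `flock` probe can report and that the kernel releases the lock once the holder is killed.

```python
import fcntl
import os
import subprocess
import sys
import tempfile


def probe(path: str) -> str:
    """Return 'absent', 'free' or 'held' for the lockfile at `path`."""
    if not os.path.exists(path):
        return "absent"
    with open(path, "a") as f:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return "held"
        return "free"  # leaving the `with` closes f and drops the probe's own lock


# Hypothetical stand-in for a `hold` helper: take the lock, report, sleep.
HOLDER = (
    "import fcntl, sys, time\n"
    "f = open(sys.argv[1], 'a')\n"
    "fcntl.flock(f, fcntl.LOCK_EX)\n"
    "print('ACQUIRED', flush=True)\n"
    "time.sleep(60)\n"
)


def demo() -> list[str]:
    states = []
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.lock")
        states.append(probe(path))  # no lockfile yet
        holder = subprocess.Popen(
            [sys.executable, "-c", HOLDER, path], stdout=subprocess.PIPE, text=True
        )
        holder.stdout.readline()  # block until the holder reports the lock
        states.append(probe(path))  # a live process holds it
        holder.kill()
        holder.wait()  # process death closes its fds, which releases the flock
        holder.stdout.close()
        states.append(probe(path))  # lockfile remains, nobody holds it
    return states


if __name__ == "__main__":
    print(demo())
```

The third state is the one the janitor cases care about: a lockfile that exists but can be locked means its run is gone.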
34
tests/concurrency/conftest.py
Normal file
@ -0,0 +1,34 @@
|
|||||||
|
"""Fixtures for the real-kernel concurrency suite (concurrency-restructure plan, 19 cases).
|
||||||
|
|
||||||
|
NOT part of the default `pytest tests/unit` gate — run explicitly with `pytest tests/concurrency
|
||||||
|
-q` (docs/concurrency.md). Locks live in a per-test tmp dir (CCCI_APP_LOCK_DIR); helper
|
||||||
|
subprocesses hold REAL flocks / install the REAL prctl+signal guards and are always reaped in
|
||||||
|
fixture finalizers (no leaked children in the test VM).
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
sys.path.insert(0, os.path.dirname(__file__))
|
||||||
|
from concutil import HelperPool # noqa: E402
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.fixture
|
||||||
|
def lock_dir(tmp_path, monkeypatch):
|
||||||
|
"""Sandbox lock dir, exported so BOTH this process's lifecycle calls and helper subprocesses
|
||||||
|
(which inherit os.environ) resolve their lockfiles here — never /run/lock."""
|
||||||
|
d = tmp_path / "locks"
|
||||||
|
d.mkdir()
|
||||||
|
monkeypatch.setenv("CCCI_APP_LOCK_DIR", str(d))
|
||||||
|
return str(d)
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.fixture
|
||||||
|
def pool(tmp_path):
|
||||||
|
hp = HelperPool(str(tmp_path))
|
||||||
|
yield hp
|
||||||
|
hp.cleanup()
|
||||||
149
tests/concurrency/helpers.py
Normal file
@ -0,0 +1,149 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""Subprocess helpers for tests/concurrency — REAL kernel locks and the REAL lifetime guards in
|
||||||
|
separate processes (flock/prctl are never mocked; tests assert on actual kernel behavior).
|
||||||
|
|
||||||
|
Invoked as: python3 helpers.py <command> <args...>
|
||||||
|
|
||||||
|
Env contract (set by the spawning test):
|
||||||
|
CCCI_APP_LOCK_DIR sandbox lock dir (never /run/lock in tests)
|
||||||
|
CCCI_HELPER_OUT marker file this helper APPENDS progress lines to (ACQUIRED/READY/...)
|
||||||
|
|
||||||
|
Commands:
|
||||||
|
hold <domain> acquire the app lock, mark `ACQUIRED <ts>`, sleep forever
|
||||||
|
hold-with-child <domain> acquire the lock, spawn a plain sleeping subprocess child, mark
|
||||||
|
`ACQUIRED <ts>` + `CHILD <pid>` (PEP 446: the child must NOT
|
||||||
|
inherit the lock fd), sleep forever
|
||||||
|
guarded <domain> <deadline> install the REAL lifetime guards (alarm=<deadline>s), acquire the
|
||||||
|
lock, mark `READY`; when the teardown funnel runs (`finally:`),
|
||||||
|
mark `TEARDOWN` before exiting
|
||||||
|
wrapper <domain> spawn `guarded <domain> 3600` as MY child, mark `WRAPPED <pid>`,
|
||||||
|
sleep — the test kills me to prove PDEATHSIG TERMs the child
|
||||||
|
orphan-probe wait (bounded) until reparented (ppid==1), then install the
|
||||||
|
guards; mark `REFUSED` if they exit (expected) or `GUARDS_OK`
|
||||||
|
fetch-checkout <recipe> <ref> run run_recipe_ci.fetch_recipe (the test sets CCCI_SKIP_FETCH=1
|
||||||
|
+ a per-"run" ABRA_DIR), git-checkout <ref>, mark
|
||||||
|
`RESULT <head> <data.txt content>`
deploy-count-run <domain> <gate> init the deploy countfile, record one deploy BEFORE taking the
app lock (mark `PRELOCK`), acquire the lock (mark `ACQUIRED`), wait for <gate>
(file path; '' = no wait), then mark `COUNT <n>` or `COUNT_FILE_MISSING`
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import os
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
import time
|
||||||
|
|
||||||
|
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "..", "runner"))
|
||||||
|
from harness import abra, lifecycle, lifetime # noqa: E402
|
||||||
|
|
||||||
|
OUT = os.environ.get("CCCI_HELPER_OUT")
|
||||||
|
|
||||||
|
|
||||||
|
def mark(line: str) -> None:
|
||||||
|
if OUT:
|
||||||
|
with open(OUT, "a") as f:
|
||||||
|
f.write(line + "\n")
|
||||||
|
f.flush()
|
||||||
|
print(line, flush=True)
|
||||||
|
|
||||||
|
|
||||||
|
def cmd_hold(domain: str) -> None:
|
||||||
|
lifecycle.acquire_app_lock(domain)
|
||||||
|
mark(f"ACQUIRED {time.time()}")
|
||||||
|
time.sleep(3600)
|
||||||
|
|
||||||
|
|
||||||
|
def cmd_hold_with_child(domain: str) -> None:
|
||||||
|
lifecycle.acquire_app_lock(domain)
|
||||||
|
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(3600)"])
|
||||||
|
mark(f"ACQUIRED {time.time()}")
|
||||||
|
mark(f"CHILD {child.pid}")
|
||||||
|
time.sleep(3600)
|
||||||
|
|
||||||
|
|
||||||
|
def cmd_guarded(domain: str, deadline: str) -> None:
|
||||||
|
lifetime.install_lifetime_guards(deadline_seconds=int(deadline))
|
||||||
|
lifecycle.acquire_app_lock(domain)
|
||||||
|
mark("READY")
|
||||||
|
try:
|
||||||
|
time.sleep(3600)
|
||||||
|
finally:
|
||||||
|
mark("TEARDOWN")
|
||||||
|
|
||||||
|
|
||||||
|
def cmd_wrapper(domain: str) -> None:
|
||||||
|
p = subprocess.Popen( # noqa: S603
|
||||||
|
[sys.executable, os.path.abspath(__file__), "guarded", domain, "3600"],
|
||||||
|
env=os.environ.copy(),
|
||||||
|
)
|
||||||
|
mark(f"WRAPPED {p.pid}")
|
||||||
|
time.sleep(3600)
|
||||||
|
|
||||||
|
|
||||||
|
def cmd_orphan_probe() -> None:
|
||||||
|
# Our spawner exits immediately after fork; wait (bounded) until we are reparented so the
|
||||||
|
# prctl is installed with the parent ALREADY dead — the exact race the ppid check closes.
|
||||||
|
for _ in range(200):
|
||||||
|
if os.getppid() == 1:
|
||||||
|
break
|
||||||
|
time.sleep(0.05)
|
||||||
|
else:
|
||||||
|
mark("NEVER_REPARENTED") # e.g. a subreaper environment — test will fail visibly
|
||||||
|
return
|
||||||
|
try:
|
||||||
|
lifetime.install_lifetime_guards()
|
||||||
|
except SystemExit:
|
||||||
|
mark("REFUSED")
|
||||||
|
raise
|
||||||
|
mark("GUARDS_OK")
|
||||||
|
|
||||||
|
|
||||||
|
def cmd_fetch_checkout(recipe: str, ref: str) -> None:
|
||||||
|
import run_recipe_ci
|
||||||
|
|
||||||
|
run_recipe_ci.fetch_recipe(recipe, None, None)
|
||||||
|
abra.recipe_checkout(recipe, ref)
|
||||||
|
head = abra.recipe_head_commit(recipe)
|
||||||
|
with open(os.path.join(abra.recipe_dir(recipe), "data.txt")) as f:
|
||||||
|
content = f.read().strip()
|
||||||
|
mark(f"RESULT {head} {content}")
|
||||||
|
|
||||||
|
|
||||||
|
def cmd_deploy_count_run(domain: str, gate: str) -> None:
|
||||||
|
"""Mirror the REAL run flow for the DG4.1 counter (CONC-A1 regression): countfile init
|
||||||
|
(main() preamble) → _record_deploy (deploy_app fires it BEFORE the app lock) → acquire
|
||||||
|
the app lock → wait for `gate` (file path; '' = no wait) → read + remove own countfile.
|
||||||
|
Two of these on the SAME domain must each see COUNT 1 and never lose their file."""
|
||||||
|
import run_recipe_ci
|
||||||
|
|
||||||
|
countfile = run_recipe_ci._run_state_path("deploys")
|
||||||
|
with open(countfile, "w") as f:
|
||||||
|
f.write("0")
|
||||||
|
os.environ["CCCI_DEPLOY_COUNT_FILE"] = countfile
|
||||||
|
lifecycle._record_deploy() # pre-lock, exactly like lifecycle.deploy_app()
|
||||||
|
mark("PRELOCK")
|
||||||
|
lifecycle.acquire_app_lock(domain)
|
||||||
|
mark("ACQUIRED")
|
||||||
|
if gate:
|
||||||
|
deadline = time.time() + 15
|
||||||
|
while not os.path.exists(gate) and time.time() < deadline:
|
||||||
|
time.sleep(0.05)
|
||||||
|
try:
|
||||||
|
with open(countfile) as f:
|
||||||
|
n = int(f.read().strip() or "0")
|
||||||
|
os.remove(countfile)
|
||||||
|
mark(f"COUNT {n}")
|
||||||
|
except FileNotFoundError:
|
||||||
|
mark("COUNT_FILE_MISSING")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
cmd, *args = sys.argv[1:]
|
||||||
|
{
|
||||||
|
"hold": cmd_hold,
|
||||||
|
"hold-with-child": cmd_hold_with_child,
|
||||||
|
"guarded": cmd_guarded,
|
||||||
|
"wrapper": cmd_wrapper,
|
||||||
|
"orphan-probe": cmd_orphan_probe,
|
||||||
|
"fetch-checkout": cmd_fetch_checkout,
|
||||||
|
"deploy-count-run": cmd_deploy_count_run,
|
||||||
|
}[cmd](*args)
|
||||||
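The `hold-with-child` command relies on PEP 446: file descriptors opened by Python are non-inheritable, so a spawned child does not keep the lock alive after its parent dies. The sketch below illustrates that property under stated assumptions: `HOLDER`, `lock_is_free` and `demo` are hypothetical names, and the holder script is a simplified stand-in for the helper, not a copy of it.

```python
import fcntl
import os
import signal
import subprocess
import sys
import tempfile

# Stand-in holder: take the lock, spawn a plain sleeping child, print the child's pid.
HOLDER = (
    "import fcntl, subprocess, sys, time\n"
    "f = open(sys.argv[1], 'a')\n"
    "fcntl.flock(f, fcntl.LOCK_EX)\n"
    "child = subprocess.Popen(\n"
    "    [sys.executable, '-c', 'import time; time.sleep(60)'],\n"
    "    stdout=subprocess.DEVNULL,\n"
    ")\n"
    "print(child.pid, flush=True)\n"
    "time.sleep(60)\n"
)


def lock_is_free(path: str) -> bool:
    """True if a non-blocking exclusive flock on `path` succeeds."""
    with open(path, "a") as f:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        return True


def demo() -> tuple[bool, bool, bool]:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.lock")
        holder = subprocess.Popen(
            [sys.executable, "-c", HOLDER, path], stdout=subprocess.PIPE, text=True
        )
        child_pid = int(holder.stdout.readline())
        held_before = not lock_is_free(path)
        holder.kill()  # SIGKILL the lock owner; its child keeps running
        holder.wait()
        holder.stdout.close()
        free_after = lock_is_free(path)  # child never inherited the lock fd
        try:
            os.kill(child_pid, 0)  # signal 0 only checks that the pid exists
            child_alive = True
        except ProcessLookupError:
            child_alive = False
        if child_alive:
            os.kill(child_pid, signal.SIGKILL)  # do not leak the sleeper
        return held_before, free_after, child_alive


if __name__ == "__main__":
    print(demo())
```

If the child had inherited the descriptor, the lock would stay held for as long as the child lived, and a probe would mistake an orphaned app for a live run.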
175
tests/concurrency/test_abra_dir.py
Normal file
@ -0,0 +1,175 @@
|
|||||||
|
"""Per-run ABRA_DIR isolation (concurrency-restructure plan, cases 17-19). Real directories,
|
||||||
|
real symlinks, real git — abra itself is replaced by a recording stub where a CLI call is
|
||||||
|
involved (case 17), because these cases test OUR dir/env plumbing, not abra."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import os
|
||||||
|
import stat
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
|
||||||
|
sys.path.insert(0, os.path.dirname(__file__))
|
||||||
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
||||||
|
import run_recipe_ci # noqa: E402
|
||||||
|
from concutil import wait_marker # noqa: E402
|
||||||
|
from harness import abra # noqa: E402
|
||||||
|
|
||||||
|
RECIPE = "fakerecipe"
|
||||||
|
|
||||||
|
|
||||||
|
def _git(cwd, *args):
|
||||||
|
subprocess.run(
|
||||||
|
["git", "-c", "user.email=t@t", "-c", "user.name=t", *args],
|
||||||
|
cwd=cwd,
|
||||||
|
check=True,
|
||||||
|
capture_output=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _make_fake_home(tmp_path):
|
||||||
|
"""A fake $HOME with a canonical ~/.abra: servers/default + catalogue dirs, and a recipe git
|
||||||
|
repo with two tags whose data.txt differs (v1 -> 'one', v2 -> 'two', HEAD at v2)."""
|
||||||
|
home = tmp_path / "home"
|
||||||
|
(home / ".abra" / "servers" / "default").mkdir(parents=True)
|
||||||
|
(home / ".abra" / "catalogue").mkdir(parents=True)
|
||||||
|
repo = home / ".abra" / "recipes" / RECIPE
|
||||||
|
repo.mkdir(parents=True)
|
||||||
|
_git(repo, "init", "-q")
|
||||||
|
(repo / "data.txt").write_text("one\n")
|
||||||
|
_git(repo, "add", "data.txt")
|
||||||
|
_git(repo, "commit", "-qm", "v1")
|
||||||
|
_git(repo, "tag", "v1")
|
||||||
|
(repo / "data.txt").write_text("two\n")
|
||||||
|
_git(repo, "add", "data.txt")
|
||||||
|
_git(repo, "commit", "-qm", "v2")
|
||||||
|
_git(repo, "tag", "v2")
|
||||||
|
return home
|
||||||
|
|
||||||
|
|
||||||
|
def test_17_per_run_dir_built_and_exported_before_abra(tmp_path, monkeypatch):
|
||||||
|
"""Case 17: setup_run_abra_dir builds the per-run dir correctly (servers/catalogue symlinks
|
||||||
|
resolve to the canonical tree, recipes/ empty + writable) and $ABRA_DIR is exported before
|
||||||
|
the first abra call — proven by a stub `abra` on PATH that records the env it saw."""
|
||||||
|
home = _make_fake_home(tmp_path)
|
||||||
|
monkeypatch.setenv("HOME", str(home))
|
||||||
|
monkeypatch.setenv("CCCI_RUNS_DIR", str(tmp_path / "runs"))
|
||||||
|
monkeypatch.setenv("DRONE_BUILD_NUMBER", "777")
|
||||||
|
monkeypatch.setenv("ABRA_DIR", "sentinel-to-be-overwritten") # so monkeypatch restores it
|
||||||
|
|
||||||
|
d = run_recipe_ci.setup_run_abra_dir()
|
||||||
|
assert d == str(tmp_path / "runs" / "777" / "abra")
|
||||||
|
assert os.environ["ABRA_DIR"] == d
|
||||||
|
assert os.readlink(os.path.join(d, "servers")) == str(home / ".abra" / "servers")
|
||||||
|
assert os.readlink(os.path.join(d, "catalogue")) == str(home / ".abra" / "catalogue")
|
||||||
|
# symlinks RESOLVE (targets exist) and recipes/ is empty + writable
|
||||||
|
assert os.path.isdir(os.path.join(d, "servers", "default"))
|
||||||
|
assert os.path.isdir(os.path.join(d, "catalogue"))
|
||||||
|
assert os.listdir(os.path.join(d, "recipes")) == []
|
||||||
|
probe = os.path.join(d, "recipes", ".write-probe")
|
||||||
|
open(probe, "w").close()
|
||||||
|
os.remove(probe)
|
||||||
|
# idempotent re-entry (Drone build-number retry): must not raise on existing symlinks
|
||||||
|
assert run_recipe_ci.setup_run_abra_dir() == d
|
||||||
|
|
||||||
|
# stub abra records $ABRA_DIR at call time; fetch_recipe's catalogue branch invokes it
|
||||||
|
stub_dir = tmp_path / "bin"
|
||||||
|
stub_dir.mkdir()
|
||||||
|
log = tmp_path / "abra-env.log"
|
||||||
|
stub = stub_dir / "abra"
|
||||||
|
stub.write_text(f'#!/bin/sh\necho "$ABRA_DIR" >> {log}\nexit 0\n')
|
||||||
|
stub.chmod(stub.stat().st_mode | stat.S_IEXEC)
|
||||||
|
monkeypatch.setenv("PATH", f"{stub_dir}{os.pathsep}{os.environ['PATH']}")
|
||||||
|
monkeypatch.delenv("CCCI_SKIP_FETCH", raising=False)
|
||||||
|
run_recipe_ci.fetch_recipe(RECIPE, None, None)
|
||||||
|
assert log.read_text().strip() == d, "abra was called without the per-run ABRA_DIR exported"
|
||||||
|
|
||||||
|
|
||||||
|
def test_18_concurrent_same_recipe_fetch_no_cross_talk(tmp_path, monkeypatch, pool):
|
||||||
|
"""Case 18: two CONCURRENT fetch+checkout flows of the SAME recipe into different ABRA_DIRs
|
||||||
|
produce two correct, divergent trees (v1 vs v2) — the old shared-tree corruption scenario,
|
||||||
|
now structurally safe with no lock. The canonical staged clone is untouched."""
|
||||||
|
home = _make_fake_home(tmp_path)
|
||||||
|
canonical_repo = home / ".abra" / "recipes" / RECIPE
|
||||||
|
head_before = subprocess.run(
|
||||||
|
["git", "-C", canonical_repo, "rev-parse", "HEAD"], capture_output=True, text=True
|
||||||
|
).stdout.strip()
|
||||||
|
|
||||||
|
runs = {}
|
||||||
|
for name, ref in (("runA", "v1"), ("runB", "v2")):
|
||||||
|
abra_dir = tmp_path / name / "abra"
|
||||||
|
abra_dir.mkdir(parents=True)
|
||||||
|
_, out = pool.spawn(
|
||||||
|
"fetch-checkout",
|
||||||
|
RECIPE,
|
||||||
|
ref,
|
||||||
|
env_extra={
|
||||||
|
"HOME": str(home),
|
||||||
|
"ABRA_DIR": str(abra_dir),
|
||||||
|
"CCCI_SKIP_FETCH": "1",
|
||||||
|
},
|
||||||
|
)
|
||||||
|
runs[name] = (out, ref, abra_dir)
|
||||||
|
|
||||||
|
expect = {"v1": "one", "v2": "two"}
|
||||||
|
for name, (out, ref, abra_dir) in runs.items():
|
||||||
|
line = wait_marker(out, "RESULT", timeout=30)
|
||||||
|
assert line, f"{name} never produced a RESULT"
|
||||||
|
_, head, content = line.split()
|
||||||
|
assert content == expect[ref], f"{name}@{ref}: tree content {content!r}"
|
||||||
|
tree = abra_dir / "recipes" / RECIPE
|
||||||
|
assert (tree / "data.txt").read_text().strip() == expect[ref]
|
||||||
|
assert (
|
||||||
|
head
|
||||||
|
== subprocess.run(
|
||||||
|
["git", "-C", tree, "rev-parse", "HEAD"], capture_output=True, text=True
|
||||||
|
).stdout.strip()
|
||||||
|
)
|
||||||
|
|
||||||
|
# the two trees genuinely diverge AND the canonical staged clone is untouched
|
||||||
|
a = (runs["runA"][2] / "recipes" / RECIPE / "data.txt").read_text()
|
||||||
|
b = (runs["runB"][2] / "recipes" / RECIPE / "data.txt").read_text()
|
||||||
|
assert a != b
|
||||||
|
head_after = subprocess.run(
|
||||||
|
["git", "-C", canonical_repo, "rev-parse", "HEAD"], capture_output=True, text=True
|
||||||
|
).stdout.strip()
|
||||||
|
assert head_after == head_before, "canonical clone must not be touched by per-run fetches"
|
||||||
|
|
||||||
|
|
||||||
|
def test_19_env_written_through_servers_symlink_lands_canonical(tmp_path, monkeypatch):
|
||||||
|
"""Case 19: an app .env written through the per-run servers/ symlink (what abra does under
|
||||||
|
$ABRA_DIR) lands in the CANONICAL shared path — so janitor discovery and every
|
||||||
|
expanduser('~/.abra/servers/...') reader keep working unchanged."""
|
||||||
|
home = _make_fake_home(tmp_path)
|
||||||
|
monkeypatch.setenv("HOME", str(home))
|
||||||
|
monkeypatch.setenv("CCCI_RUNS_DIR", str(tmp_path / "runs"))
|
||||||
|
monkeypatch.setenv("DRONE_BUILD_NUMBER", "778")
|
||||||
|
monkeypatch.setenv("ABRA_DIR", "sentinel-to-be-overwritten")
|
||||||
|
d = run_recipe_ci.setup_run_abra_dir()
|
||||||
|
|
||||||
|
domain = "test-abc123.ci.commoninternet.net"
|
||||||
|
via_symlink = os.path.join(d, "servers", "default", f"{domain}.env")
|
||||||
|
with open(via_symlink, "w") as f:
|
||||||
|
f.write("TYPE=fakerecipe:1.0.0\nDOMAIN=placeholder\n")
|
||||||
|
|
||||||
|
canonical = home / ".abra" / "servers" / "default" / f"{domain}.env"
|
||||||
|
assert canonical.is_file(), ".env written via the symlink must land in the canonical path"
|
||||||
|
# the canonical-path readers/writers (abra.env_get/env_set use ~/.abra) see the same file
|
||||||
|
assert abra.env_get(domain, "TYPE") == "fakerecipe:1.0.0"
|
||||||
|
abra.env_set(domain, "DOMAIN", domain)
|
||||||
|
with open(via_symlink) as f:
|
||||||
|
assert f"DOMAIN={domain}" in f.read()
|
||||||
|
|
||||||
|
|
||||||
|
def test_18b_run_id_manual_fallback_is_per_process(tmp_path, monkeypatch):
|
||||||
|
"""Companion to case 18: two concurrent MANUAL runs (no DRONE_BUILD_NUMBER) must not share an
|
||||||
|
abra dir either — the manual fallback is pid-suffixed."""
|
||||||
|
home = _make_fake_home(tmp_path)
|
||||||
|
monkeypatch.setenv("HOME", str(home))
|
||||||
|
monkeypatch.setenv("CCCI_RUNS_DIR", str(tmp_path / "runs"))
|
||||||
|
monkeypatch.delenv("DRONE_BUILD_NUMBER", raising=False)
|
||||||
|
monkeypatch.delenv("CCCI_APP_DOMAIN", raising=False)
|
||||||
|
monkeypatch.delenv("CCCI_RUN_ID", raising=False)
|
||||||
|
monkeypatch.setenv("ABRA_DIR", "sentinel-to-be-overwritten")
|
||||||
|
d = run_recipe_ci.setup_run_abra_dir()
|
||||||
|
assert f"manual-{os.getpid()}" in d
|
||||||
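The layout cases 17 and 19 check can be sketched independently of the harness. This is an illustrative assumption of the shape, not `run_recipe_ci.setup_run_abra_dir` itself: `build_run_dir` and `demo` are hypothetical names, and only the properties the tests assert are modelled (shared `servers`/`catalogue` via symlinks, a private empty `recipes/`, idempotent re-entry).

```python
import os
import tempfile


def build_run_dir(canonical: str, runs_root: str, run_id: str) -> str:
    """Create <runs_root>/<run_id>/abra with shared symlinks and a private recipes/."""
    run_dir = os.path.join(runs_root, run_id, "abra")
    os.makedirs(os.path.join(run_dir, "recipes"), exist_ok=True)
    for shared in ("servers", "catalogue"):
        link = os.path.join(run_dir, shared)
        if not os.path.islink(link):  # tolerate re-entry with the same run id
            os.symlink(os.path.join(canonical, shared), link)
    return run_dir


def demo() -> tuple[bool, bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        canonical = os.path.join(tmp, "home", ".abra")
        os.makedirs(os.path.join(canonical, "servers", "default"))
        os.makedirs(os.path.join(canonical, "catalogue"))
        runs_root = os.path.join(tmp, "runs")

        run_dir = build_run_dir(canonical, runs_root, "777")
        idempotent = build_run_dir(canonical, runs_root, "777") == run_dir

        # A write through the per-run symlink lands in the canonical tree.
        via_symlink = os.path.join(run_dir, "servers", "default", "app.env")
        with open(via_symlink, "w") as f:
            f.write("TYPE=fakerecipe:1.0.0\n")
        landed = os.path.isfile(os.path.join(canonical, "servers", "default", "app.env"))

        # The recipes tree is per-run and starts empty.
        private = os.listdir(os.path.join(run_dir, "recipes")) == []
        return idempotent, landed, private


if __name__ == "__main__":
    print(demo())
```

Sharing `servers` by symlink keeps every reader of the canonical path working, while the private `recipes/` is what lets two runs check out different refs of the same recipe without a lock.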
189
tests/concurrency/test_janitor.py
Normal file
@ -0,0 +1,189 @@
|
|||||||
|
"""Janitor / flock-probe semantics (concurrency-restructure plan, cases 5-12).
|
||||||
|
|
||||||
|
The janitor runs IN-PROCESS with its discovery monkeypatched (candidates injected via a stubbed
|
||||||
|
abra.app_ls + empty docker sweep) and teardown_app stubbed to record calls — but the LOCKS are
|
||||||
|
real kernel flocks, held by real helper subprocesses where a live owner is needed."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import threading
|
||||||
|
import time
|
||||||
|
|
||||||
|
sys.path.insert(0, os.path.dirname(__file__))
|
||||||
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
||||||
|
from concutil import DOMAIN, lock_state, wait_marker # noqa: E402
|
||||||
|
from harness import lifecycle # noqa: E402
|
||||||
|
|
||||||
|
|
||||||
|
def _inject_candidates(monkeypatch, domains):
|
||||||
|
"""Point janitor discovery at exactly `domains`: abra lists them, docker sweep is empty.
|
||||||
|
teardown_app is stubbed to a recorder; returns the calls list."""
|
||||||
|
calls = []
|
||||||
|
monkeypatch.setattr(lifecycle.abra, "app_ls", lambda: [{"appName": d} for d in domains])
|
||||||
|
monkeypatch.setattr(lifecycle, "_docker_names", lambda kind, stack: [])
|
||||||
|
monkeypatch.setattr(lifecycle, "teardown_app", lambda d, verify=True: calls.append(d))
|
||||||
|
return calls
|
||||||
|
|
||||||
|
|
||||||
|
def test_5_orphan_reaped_lockfile_unlinked(lock_dir, pool, monkeypatch):
|
||||||
|
"""Case 5: an orphan (lockfile exists, no holder — its run was SIGKILL'd) is reaped exactly
|
||||||
|
once and its lockfile unlinked."""
|
||||||
|
p, out = pool.spawn("hold", DOMAIN)
|
||||||
|
assert wait_marker(out, "ACQUIRED")
|
||||||
|
p.kill()
|
||||||
|
p.wait(timeout=10)
|
||||||
|
calls = _inject_candidates(monkeypatch, [DOMAIN])
|
||||||
|
lifecycle.janitor()
|
||||||
|
assert calls == [DOMAIN], f"teardown calls: {calls} (expected exactly one)"
|
||||||
|
assert lock_state(DOMAIN) == "absent", "reaped orphan's lockfile must be unlinked"
|
||||||
|
|
||||||
|
|
||||||
|
def test_6_live_run_never_reaped(lock_dir, pool, monkeypatch, capsys):
|
||||||
|
"""Case 6: a held lock (live helper) is never reaped and is logged as live."""
|
||||||
|
p, out = pool.spawn("hold", DOMAIN)
|
||||||
|
assert wait_marker(out, "ACQUIRED")
|
||||||
|
calls = _inject_candidates(monkeypatch, [DOMAIN])
|
||||||
|
lifecycle.janitor()
|
||||||
|
assert calls == []
|
||||||
|
assert "live concurrent run" in capsys.readouterr().out
|
||||||
|
assert lock_state(DOMAIN) == "held"
|
||||||
|
|
||||||
|
|
||||||
|
def test_7_new_run_blocks_until_reap_finishes(lock_dir, pool, monkeypatch):
|
||||||
|
"""Case 7: the janitor reaps WHILE HOLDING the probe lock, so a new run of the same domain
|
||||||
|
blocks in acquire_app_lock until the reap completes — no window where a fresh app coexists
|
||||||
|
with a half-reaped one."""
|
||||||
|
# Make an orphan.
|
||||||
|
p, out = pool.spawn("hold", DOMAIN)
|
||||||
|
assert wait_marker(out, "ACQUIRED")
|
||||||
|
p.kill()
|
||||||
|
p.wait(timeout=10)
|
||||||
|
|
||||||
|
state = {"teardown_end": None, "acquirer_out": None}
|
||||||
|
|
||||||
|
def slow_teardown(domain, verify=True):
|
||||||
|
# While the janitor holds the probe lock mid-reap, a new run starts acquiring.
|
||||||
|
_, aout = pool.spawn("hold", DOMAIN)
|
||||||
|
state["acquirer_out"] = aout
|
||||||
|
time.sleep(2.0)
|
||||||
|
state["teardown_end"] = time.time()
|
||||||
|
|
||||||
|
monkeypatch.setattr(lifecycle.abra, "app_ls", lambda: [{"appName": DOMAIN}])
|
||||||
|
monkeypatch.setattr(lifecycle, "_docker_names", lambda kind, stack: [])
|
||||||
|
monkeypatch.setattr(lifecycle, "teardown_app", slow_teardown)
|
||||||
|
lifecycle.janitor()
|
||||||
|
|
||||||
|
line = wait_marker(state["acquirer_out"], "ACQUIRED", timeout=15)
|
||||||
|
assert line, "new run never acquired after the reap"
|
||||||
|
acquired_ts = float(line.split()[1])
|
||||||
|
assert (
|
||||||
|
acquired_ts >= state["teardown_end"]
|
||||||
|
), f"new run acquired at {acquired_ts} BEFORE the reap finished at {state['teardown_end']}"
|
||||||
|
# The new run must hold a lock the next probe can SEE (fresh inode at the path).
|
||||||
|
assert lock_state(DOMAIN) == "held"
|
||||||
|
|
||||||
|
|
||||||
|
def test_8_two_janitors_exactly_one_reaps(lock_dir, pool, monkeypatch):
|
||||||
|
"""Case 8: two concurrent janitors arbitrate on the probe flock — exactly one reaps (the
|
||||||
|
other sees 'held' and leaves). Teardown is slowed so the runs genuinely overlap."""
|
||||||
|
p, out = pool.spawn("hold", DOMAIN)
|
||||||
|
assert wait_marker(out, "ACQUIRED")
|
||||||
|
p.kill()
|
||||||
|
p.wait(timeout=10)
|
||||||
|
|
||||||
|
calls = []
|
||||||
|
calls_lock = threading.Lock()
|
||||||
|
|
||||||
|
def slow_teardown(domain, verify=True):
|
||||||
|
with calls_lock:
|
||||||
|
calls.append(domain)
|
||||||
|
time.sleep(2.0)
|
||||||
|
|
||||||
|
monkeypatch.setattr(lifecycle.abra, "app_ls", lambda: [{"appName": DOMAIN}])
|
||||||
|
monkeypatch.setattr(lifecycle, "_docker_names", lambda kind, stack: [])
|
||||||
|
monkeypatch.setattr(lifecycle, "teardown_app", slow_teardown)
|
||||||
|
|
||||||
|
barrier = threading.Barrier(2)
|
||||||
|
|
||||||
|
def run_janitor():
|
||||||
|
barrier.wait()
|
||||||
|
lifecycle.janitor()
|
||||||
|
|
||||||
|
t1, t2 = threading.Thread(target=run_janitor), threading.Thread(target=run_janitor)
|
||||||
|
t1.start(), t2.start()
|
||||||
|
t1.join(timeout=30), t2.join(timeout=30)
|
||||||
|
assert calls == [DOMAIN], f"expected exactly one reap, got {calls}"
|
||||||
|
assert lock_state(DOMAIN) == "absent"
|
||||||
|
|
||||||
|
|
||||||
|
def test_9_reboot_lockfile_absent_reaped_immediately(lock_dir, monkeypatch):
|
||||||
|
"""Case 9: post-reboot simulation — the app exists but its lockfile is gone (/run/lock is
|
||||||
|
tmpfs). The probe trivially acquires -> immediate reap, NO age threshold (improvement over
|
||||||
|
the old 2h fallback)."""
|
||||||
|
assert lock_state(DOMAIN) == "absent"
|
||||||
|
calls = _inject_candidates(monkeypatch, [DOMAIN])
|
||||||
|
t0 = time.time()
|
||||||
|
lifecycle.janitor()
|
||||||
|
assert calls == [DOMAIN]
|
||||||
|
assert time.time() - t0 < 5, "reap must be immediate (no age wait)"
|
||||||
|
|
||||||
|
|
||||||
|
def test_10_long_held_lock_flagged_never_stolen(lock_dir, pool, monkeypatch, capsys):
    """Case 10: a lock held with mtime older than 120min is flagged as a possible leaked run —
    and NOT reaped (never steal a held lock)."""
    p, out = pool.spawn("hold", DOMAIN)
    assert wait_marker(out, "ACQUIRED")
    path = lifecycle._app_lock_path(DOMAIN)  # noqa: SLF001
    backdate = time.time() - (130 * 60)
    os.utime(path, (backdate, backdate))
    calls = _inject_candidates(monkeypatch, [DOMAIN])
    lifecycle.janitor()
    assert calls == []
    out_text = capsys.readouterr().out
    assert "possible leaked run" in out_text and "lslocks" in out_text
    assert lock_state(DOMAIN) == "held"


def test_11_warm_canonical_names_never_probed(lock_dir, monkeypatch):
    """Case 11: RUN_APP_RE allowlist — warm/canonical-shaped names never become candidates, so
    they are never probed (no lockfile is even created for them) and never reaped."""
    warmish = [
        "warm-keycloak.ci.commoninternet.net",
        "keycloak.ci.commoninternet.net",
        "warm-hedgedoc.ci.commoninternet.net",
        "drone.ci.commoninternet.net",
    ]
    calls = []
    monkeypatch.setattr(lifecycle.abra, "app_ls", lambda: [{"appName": d} for d in warmish])
    monkeypatch.setattr(
        lifecycle,
        "_docker_names",
        lambda kind, stack: ["warm-keycloak_ci_commoninternet_net_app"]
        if kind == "service"
        else [],
    )
    monkeypatch.setattr(lifecycle, "teardown_app", lambda d, verify=True: calls.append(d))
    lifecycle.janitor()
    assert calls == []
    lockdir = os.environ["CCCI_APP_LOCK_DIR"]
    assert [
        f for f in os.listdir(lockdir) if f.startswith("cc-ci-app-")
    ] == [], "janitor must not create lockfiles for non-run-app names"


def test_12_degrades_safely_on_bad_lockfile_and_missing_dir(lock_dir, monkeypatch, capsys):
    """Case 12: a garbled/unopenable lockfile (here: a DIRECTORY at the lockfile path) is skipped
    with a log line; a missing lock dir doesn't crash the janitor either. Never a crash."""
    path = lifecycle._app_lock_path(DOMAIN)  # noqa: SLF001
    os.makedirs(path)  # open(path, "a") -> IsADirectoryError (an OSError)
    calls = _inject_candidates(monkeypatch, [DOMAIN])
    lifecycle.janitor()  # must not raise
    assert calls == []
    assert "skipping" in capsys.readouterr().out

    os.rmdir(path)
    monkeypatch.setenv("CCCI_APP_LOCK_DIR", os.path.join(os.environ["CCCI_APP_LOCK_DIR"], "gone"))
    lifecycle.janitor()  # missing dir: probe open fails -> skip; tidy glob -> empty. No crash.
    assert calls == []
82
tests/concurrency/test_lifetime.py
Normal file
@ -0,0 +1,82 @@
"""Lifetime hardening (concurrency-restructure plan, cases 13-16): the REAL prctl/signal/alarm
guards installed by helper subprocesses; tests assert teardown ran, exit was non-zero, and the
lock was released."""

from __future__ import annotations

import os
import signal
import sys

sys.path.insert(0, os.path.dirname(__file__))
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
from concutil import (  # noqa: E402
    DOMAIN,
    wait_lock_state,
    wait_marker,
    wait_pid_gone,
)


def test_13_pdeathsig_parent_kill_terms_harness(lock_dir, pool):
    """Case 13: wrapper-parent spawns a guarded harness-child; the parent is SIGKILL'd (the
    harness gets no courtesy signal) -> the kernel's PDEATHSIG TERMs the child, its teardown
    funnel runs, it exits, and the lock is released."""
    p, out = pool.spawn("wrapper", DOMAIN)
    line = wait_marker(out, "WRAPPED")
    assert line, "wrapper never spawned its child"
    child_pid = int(line.split()[1])
    pool.track_pid(child_pid)
    assert wait_marker(out, "READY"), "guarded child never got ready"

    p.kill()  # parent dies WITHOUT signalling the child — only PDEATHSIG can save us
    p.wait(timeout=10)
    assert wait_pid_gone(child_pid), "guarded child must exit on parent death (PDEATHSIG)"
    assert wait_marker(out, "TEARDOWN", timeout=5), "teardown funnel did not run"
    assert wait_lock_state(DOMAIN, "free") == "free"


def test_14_already_orphaned_helper_refuses_to_run(lock_dir, pool):
    """Case 14 (ppid race): a helper whose parent died BEFORE the prctl was armed (it starts
    already reparented to pid 1) must refuse to run — PDEATHSIG would never fire for it."""
    # Spawn an intermediate parent that forks orphan-probe and exits immediately.
    import subprocess

    out = os.path.join(pool.out_dir, "orphan.out")
    intermediate = (
        "import subprocess, sys, os; "
        "subprocess.Popen([sys.executable, os.environ['CCCI_HELPERS'], 'orphan-probe']); "
    )
    env = dict(
        os.environ,
        CCCI_HELPER_OUT=out,
        CCCI_HELPERS=os.path.join(os.path.dirname(__file__), "helpers.py"),
    )
    subprocess.run([sys.executable, "-c", intermediate], env=env, timeout=15, check=True)
    line = wait_marker(out, "REFUSED", timeout=20)
    assert line, "orphaned helper did not refuse to run (or never reparented to pid 1)"


def test_15_deadline_alarm_fires_teardown_and_releases(lock_dir, pool):
    """Case 15: the self-deadline (alarm). A guarded helper with a 2s deadline tears down via
    the funnel (finally: ran), exits NON-zero, and its lock is released."""
    p, out = pool.spawn("guarded", DOMAIN, "2")
    assert wait_marker(out, "READY")
    rc = p.wait(timeout=20)
    assert rc != 0, f"deadline exit must be non-zero (got {rc})"
    assert rc == 128 + signal.SIGALRM, f"expected 142 (128+SIGALRM), got {rc}"
    assert wait_marker(out, "TEARDOWN", timeout=5), "teardown funnel did not run on deadline"
    assert wait_lock_state(DOMAIN, "free") == "free"


def test_16_sigterm_runs_teardown_funnel_and_releases(lock_dir, pool):
    """Case 16: SIGTERM (drone cancel path) -> the finally: teardown funnel runs, exit is
    non-zero, lock released."""
    p, out = pool.spawn("guarded", DOMAIN, "3600")
    assert wait_marker(out, "READY")
    p.send_signal(signal.SIGTERM)
    rc = p.wait(timeout=20)
    assert rc != 0, f"SIGTERM exit must be non-zero (got {rc})"
    assert rc == 128 + signal.SIGTERM, f"expected 143 (128+SIGTERM), got {rc}"
    assert wait_marker(out, "TEARDOWN", timeout=5), "teardown funnel did not run on SIGTERM"
    assert wait_lock_state(DOMAIN, "free") == "free"
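Cases 15 and 16 rely on a guarded process that turns SIGALRM and SIGTERM into a normal Python exit, so its `finally:` teardown still runs and the exit code is 128 plus the signal number. The project's `helpers.py` is not part of this diff, so the following is only a minimal sketch of that contract under stated assumptions: `CHILD_SRC` and `run_guarded` are hypothetical names, and the child's `time.sleep` stands in for a real run.

```python
import signal
import subprocess
import sys

# Hypothetical guarded child (NOT the project's helpers.py): same observable contract only.
CHILD_SRC = r'''
import signal, sys, time

def _funnel(signum, _frame):
    # Convert the signal into SystemExit so enclosing `finally:` blocks still run.
    raise SystemExit(128 + signum)

signal.signal(signal.SIGTERM, _funnel)
signal.signal(signal.SIGALRM, _funnel)
signal.alarm(int(sys.argv[1]))  # self-deadline, in seconds
try:
    print("READY", flush=True)
    time.sleep(3600)  # stand-in for the real run
finally:
    print("TEARDOWN", flush=True)
'''


def run_guarded(deadline_s: int, send_sigterm: bool = False) -> tuple[int, str]:
    """Spawn the guarded child; return (exit code, captured stdout)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SRC, str(deadline_s)],
        stdout=subprocess.PIPE,
        text=True,
    )
    assert proc.stdout is not None
    first = proc.stdout.readline()  # blocks until the child has printed READY
    if send_sigterm:
        proc.send_signal(signal.SIGTERM)
    rest = proc.stdout.read()  # returns once the child exits and closes stdout
    return proc.wait(timeout=30), first + rest
```

Because the handlers are installed before `READY` is printed, a signal sent after that marker always reaches the funnel rather than the default disposition.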
85
tests/concurrency/test_locks.py
Normal file
@ -0,0 +1,85 @@
"""Lock fundamentals (concurrency-restructure plan, cases 1-4). Real kernel flocks held by real
subprocesses — nothing mocked."""

from __future__ import annotations

import fcntl
import os
import sys
import time

sys.path.insert(0, os.path.dirname(__file__))
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
from concutil import (  # noqa: E402
    DOMAIN,
    lock_state,
    wait_lock_state,
    wait_marker,
)
from harness import lifecycle  # noqa: E402


def test_1_sigkill_releases_lock(lock_dir, pool):
    """Case 1: acquire -> holder SIGKILL'd -> lock immediately acquirable (kernel auto-release).
    The exact property the old pidfile registry approximated with /proc checks."""
    p, out = pool.spawn("hold", DOMAIN)
    assert wait_marker(out, "ACQUIRED"), "holder never acquired"
    assert lock_state(DOMAIN) == "held"
    p.kill()
    p.wait(timeout=10)
    assert wait_lock_state(DOMAIN, "free") == "free"


def test_2_nb_probe_held_vs_unheld(lock_dir, pool):
    """Case 2: LOCK_NB probe raises BlockingIOError against a held lock; succeeds when unheld."""
    p, out = pool.spawn("hold", DOMAIN)
    assert wait_marker(out, "ACQUIRED")
    path = lifecycle._app_lock_path(DOMAIN)  # noqa: SLF001
    with open(path, "a") as f:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            raise AssertionError("LOCK_NB succeeded against a held lock")
        except BlockingIOError:
            pass
    p.kill()
    p.wait(timeout=10)
    assert wait_lock_state(DOMAIN, "free") == "free"
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)  # must not raise now


def test_3_lock_fd_not_inherited_by_children(lock_dir, pool):
    """Case 3 (PEP 446): the holder spawns a subprocess child, the holder dies, the child lives —
    and the lock is STILL released (the child never inherited the lock fd). This is what makes
    'held lock == live HARNESS owner' sound even though runs spawn abra/docker/pytest children."""
    p, out = pool.spawn("hold-with-child", DOMAIN)
    assert wait_marker(out, "ACQUIRED")
    child_line = wait_marker(out, "CHILD")
    assert child_line, "holder never reported its child pid"
    child_pid = int(child_line.split()[1])
    pool.track_pid(child_pid)
    p.kill()
    p.wait(timeout=10)
    assert os.path.exists(f"/proc/{child_pid}"), "child should outlive the holder"
    assert (
        wait_lock_state(DOMAIN, "free") == "free"
    ), "lock must release on holder death even with a live child (PEP 446 non-inheritable fd)"


def test_4_second_acquire_blocks_until_first_exits(lock_dir, pool):
    """Case 4: a second same-domain acquire blocks until the first holder exits — the
    double-!testme serialisation property."""
    p1, out1 = pool.spawn("hold", DOMAIN)
    assert wait_marker(out1, "ACQUIRED")
    p2, out2 = pool.spawn("hold", DOMAIN)
    # p2 must NOT acquire while p1 holds.
    time.sleep(1.5)
    assert wait_marker(out2, "ACQUIRED", timeout=0.1) is None, "second acquire did not block"
    t_kill = time.time()
    p1.kill()
    p1.wait(timeout=10)
    line = wait_marker(out2, "ACQUIRED", timeout=15)
    assert line, "second acquire never completed after first holder exited"
    acquired_ts = float(line.split()[1])
    assert acquired_ts >= t_kill - 0.05, "second holder acquired before the first exited"
    assert lock_state(DOMAIN) == "held"
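The probe that cases 1 and 2 exercise can be illustrated in isolation. The real `lock_state` helper in `concutil` and the janitor's probe are not shown in this diff, so `probe_lock` below is a hypothetical stand-in: it assumes only that a lock is an exclusive `flock` on a per-domain file, and reports whether some other open file description currently holds it.

```python
import fcntl
import os
import tempfile


def probe_lock(path: str) -> str:
    """Return "held" if another open file description holds the flock, else "free".

    Hypothetical helper for illustration; not the project's lock_state().
    """
    with open(path, "a") as f:
        try:
            # Non-blocking attempt: fails at once instead of queueing behind a holder.
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return "held"
        # We took the lock only to test it; closing f on exit from `with` drops it again.
        return "free"
```

Since a `flock` belongs to the open file description, the kernel drops it when the holder's last descriptor closes, which includes process death by SIGKILL. That is why a probe like this needs no pidfile or `/proc` check.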
79
tests/concurrency/test_run_state.py
Normal file
@ -0,0 +1,79 @@
"""Run-scoped state files — M2(c) live-verify regression (not one of the 19 plan cases).

The four CCCI state files (deploys countfile, opstate, deps, depskip) must be keyed by
run id + harness pid, NEVER by app domain: a second run of the SAME domain executes its
main() preamble (state-file init, deploy_app's _record_deploy) BEFORE it blocks at the
app lock, so domain-keyed files in the shared tempdir get reset/removed underneath the
live first run. Observed live (builds 279/281): false DG4.1 deploy-count=2 in run 1,
countfile FileNotFoundError crash in run 2. Children never re-derive these paths — they
receive them via the CCCI_*_FILE env vars, so per-process uniqueness is sufficient.
"""

from __future__ import annotations

import os
import sys
import tempfile

sys.path.insert(0, os.path.dirname(__file__))
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
import run_recipe_ci  # noqa: E402
from concutil import wait_marker  # noqa: E402

DOMAIN = "fake-abc123.ci.commoninternet.net"


def test_20_state_paths_keyed_by_run_and_pid_never_by_domain(monkeypatch):
    domain = "immi-ad3e33.ci.commoninternet.net"
    monkeypatch.setenv("CCCI_APP_DOMAIN", domain)

    monkeypatch.setenv("DRONE_BUILD_NUMBER", "279")
    p279 = run_recipe_ci._run_state_path("deploys")
    monkeypatch.setenv("DRONE_BUILD_NUMBER", "281")
    p281 = run_recipe_ci._run_state_path("deploys")

    # the double-!testme invariant: two runs (same domain) share NO state file
    assert p279 != p281
    # keyed by run id + pid, under the tempdir
    base = os.path.basename(p279)
    assert base == f"ccci-deploys-279-{os.getpid()}"
    assert os.path.dirname(p279) == tempfile.gettempdir()
    # the app domain must not appear in the path at all
    assert domain not in p279 and domain not in p281


def test_20c_same_domain_runs_each_keep_their_own_count(tmp_path, lock_dir, pool):
    """The live CONC-A1 interleaving, with REAL processes + the REAL lock and counter code:
    run A holds the app lock; run B (same domain) fires its pre-lock _record_deploy and
    blocks; A then reads its counter — must still be 1 (not polluted by B) — and removes
    its own file; B acquires and must find ITS file intact (no FileNotFoundError)."""
    gate = tmp_path / "gate"
    env_a = {"TMPDIR": str(tmp_path), "DRONE_BUILD_NUMBER": "9001"}
    env_b = {"TMPDIR": str(tmp_path), "DRONE_BUILD_NUMBER": "9002"}

    pa, out_a = pool.spawn("deploy-count-run", DOMAIN, str(gate), env_extra=env_a)
    assert wait_marker(out_a, "ACQUIRED")
    pb, out_b = pool.spawn("deploy-count-run", DOMAIN, "", env_extra=env_b)
    # B's main()-preamble + pre-lock increment have fired; B is now blocked on the app lock
    assert wait_marker(out_b, "PRELOCK")
    assert wait_marker(out_b, "ACQUIRED", timeout=1.0) is None  # still serialised behind A

    gate.touch()  # let A read its counter only AFTER B's pre-lock work landed
    line_a = wait_marker(out_a, "COUNT")
    assert line_a is not None and line_a.strip() == "COUNT 1", line_a  # not 2: B didn't pollute A
    pa.wait(timeout=15)

    line_b = wait_marker(out_b, "COUNT")
    assert (
        line_b is not None and line_b.strip() == "COUNT 1"
    ), line_b  # B's file survived A's remove
    pb.wait(timeout=15)


def test_20b_manual_runs_distinct_via_pid(monkeypatch):
    # no DRONE_BUILD_NUMBER and no domain/run-id env → run_id() falls back to "manual";
    # the pid suffix still separates two concurrent hand-runs of the same domain.
    for var in ("DRONE_BUILD_NUMBER", "CCCI_APP_DOMAIN", "CCCI_RUN_ID"):
        monkeypatch.delenv(var, raising=False)
    p = run_recipe_ci._run_state_path("opstate")
    assert os.path.basename(p) == f"ccci-opstate-manual-{os.getpid()}"
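The path shape these tests assert (`ccci-<kind>-<run id>-<pid>` under the temp directory) can be sketched as follows. This is a simplified assumption, not the real `run_recipe_ci._run_state_path`: the actual `run_id()` is not shown in this diff and may consult other variables such as `CCCI_RUN_ID`, whereas this sketch reads only `DRONE_BUILD_NUMBER` and falls back to `"manual"`.

```python
import os
import tempfile


def run_state_path(kind: str) -> str:
    """Hypothetical stand-in: a state-file path keyed by run id + pid, never by app domain."""
    # Assumption: the run id is the Drone build number, or "manual" for hand-runs.
    run_id = os.environ.get("DRONE_BUILD_NUMBER") or "manual"
    # The pid suffix separates two concurrent runs that happen to share a run id.
    return os.path.join(tempfile.gettempdir(), f"ccci-{kind}-{run_id}-{os.getpid()}")
```

Keying on the process rather than the domain means a second run's pre-lock preamble can create, reset, or remove only its own files, even while it is queued behind the first run on the same domain's lock.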
@ -13,7 +13,8 @@ import sys
|
|||||||
import pytest
|
import pytest
|
||||||
|
|
||||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "runner"))
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "runner"))
|
||||||
from harness import deps as deps_mod, lifecycle, naming # noqa: E402
|
from harness import deps as deps_mod # noqa: E402
|
||||||
|
from harness import lifecycle, naming
|
||||||
|
|
||||||
|
|
||||||
def _short(s: str, n: int = 8) -> str:
|
def _short(s: str, n: int = 8) -> str:
|
||||||
|
|||||||
@ -26,6 +26,7 @@ Transient `net::ERR_NETWORK_CHANGED` is handled by the shared `goto_with_retry`
|
|||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import contextlib
|
||||||
import os
|
import os
|
||||||
import sys
|
import sys
|
||||||
import uuid
|
import uuid
|
||||||
@ -39,7 +40,11 @@ def _open_pad(ctx, url):
|
|||||||
bar once CryptPad has created/loaded the fragment-keyed pad (`#/2/pad/edit/<key>/`)."""
|
bar once CryptPad has created/loaded the fragment-keyed pad (`#/2/pad/edit/<key>/`)."""
|
||||||
page = ctx.new_page()
|
page = ctx.new_page()
|
||||||
harness_browser.goto_with_retry(
|
harness_browser.goto_with_retry(
|
||||||
page, url, accept_statuses=(200,), goto_timeout_ms=60_000, wait_until="load",
|
page,
|
||||||
|
url,
|
||||||
|
accept_statuses=(200,),
|
||||||
|
goto_timeout_ms=60_000,
|
||||||
|
wait_until="load",
|
||||||
deadline_seconds=150,
|
deadline_seconds=150,
|
||||||
)
|
)
|
||||||
pad_url = url
|
pad_url = url
|
||||||
@ -53,13 +58,15 @@ def _open_pad(ctx, url):
|
|||||||
pad_url = page.url
|
pad_url = page.url
|
||||||
break
|
break
|
||||||
if i == 40:
|
if i == 40:
|
||||||
try:
|
with contextlib.suppress(Exception): # best-effort unstick
|
||||||
harness_browser.goto_with_retry(
|
harness_browser.goto_with_retry(
|
||||||
page, url, accept_statuses=(200,), goto_timeout_ms=60_000,
|
page,
|
||||||
wait_until="load", deadline_seconds=120,
|
url,
|
||||||
|
accept_statuses=(200,),
|
||||||
|
goto_timeout_ms=60_000,
|
||||||
|
wait_until="load",
|
||||||
|
deadline_seconds=120,
|
||||||
)
|
)
|
||||||
except Exception: # noqa: BLE001 — best-effort unstick
|
|
||||||
pass
|
|
||||||
return page, pad_url
|
return page, pad_url
|
||||||
|
|
||||||
|
|
||||||
@ -74,18 +81,22 @@ def _ckeditor_frame(page, deadline_polls=90, reload_at=22, reload_url=None):
|
|||||||
if "ckeditor-inner" in f.url:
|
if "ckeditor-inner" in f.url:
|
||||||
return f
|
return f
|
||||||
if i == reload_at and reload_url is not None:
|
if i == reload_at and reload_url is not None:
|
||||||
try:
|
with contextlib.suppress(Exception): # reload is a best-effort unstick
|
||||||
harness_browser.goto_with_retry(
|
harness_browser.goto_with_retry(
|
||||||
page, reload_url, accept_statuses=(200,), goto_timeout_ms=60_000,
|
page,
|
||||||
wait_until="load", deadline_seconds=120,
|
reload_url,
|
||||||
|
accept_statuses=(200,),
|
||||||
|
goto_timeout_ms=60_000,
|
||||||
|
wait_until="load",
|
||||||
|
deadline_seconds=120,
|
||||||
)
|
)
|
||||||
except Exception: # noqa: BLE001 — reload is a best-effort unstick
|
|
||||||
pass
|
|
||||||
page.wait_for_timeout(2000)
|
page.wait_for_timeout(2000)
|
||||||
return None
|
return None
|
||||||
|
|
||||||
|
|
||||||
def _poll_any_frame_for_text(page, needle, deadline_polls=120, reload_at=(20, 45, 75, 100), reload_url=None):
|
def _poll_any_frame_for_text(
|
||||||
|
page, needle, deadline_polls=120, reload_at=(20, 45, 75, 100), reload_url=None
|
||||||
|
):
|
||||||
"""Robust read-back (F2-13): poll EVERY frame's body text for `needle`, returning True as soon as
|
"""Robust read-back (F2-13): poll EVERY frame's body text for `needle`, returning True as soon as
|
||||||
it appears. The fresh cold-cache read-back context's deeply-nested CKEditor frame is slow/flaky to
|
it appears. The fresh cold-cache read-back context's deeply-nested CKEditor frame is slow/flaky to
|
||||||
*attach* by URL (the prior `_ckeditor_frame` wait timed out on the Adversary's cold run), but the
|
*attach* by URL (the prior `_ckeditor_frame` wait timed out on the Adversary's cold run), but the
|
||||||
@ -101,13 +112,15 @@ def _poll_any_frame_for_text(page, needle, deadline_polls=120, reload_at=(20, 45
|
|||||||
except Exception: # noqa: BLE001 — frame not ready / detached; keep polling
|
except Exception: # noqa: BLE001 — frame not ready / detached; keep polling
|
||||||
pass
|
pass
|
||||||
if reload_url and i in reload_at:
|
if reload_url and i in reload_at:
|
||||||
try:
|
with contextlib.suppress(Exception): # best-effort unstick
|
||||||
harness_browser.goto_with_retry(
|
harness_browser.goto_with_retry(
|
||||||
page, reload_url, accept_statuses=(200,), goto_timeout_ms=60_000,
|
page,
|
||||||
wait_until="load", deadline_seconds=120,
|
reload_url,
|
||||||
|
accept_statuses=(200,),
|
||||||
|
goto_timeout_ms=60_000,
|
||||||
|
wait_until="load",
|
||||||
|
deadline_seconds=120,
|
||||||
)
|
)
|
||||||
except Exception: # noqa: BLE001 — best-effort unstick
|
|
||||||
pass
|
|
||||||
page.wait_for_timeout(2000)
|
page.wait_for_timeout(2000)
|
||||||
return False
|
return False
|
||||||
|
|
||||||
@ -137,9 +150,9 @@ def test_cryptpad_pad_content_survives_fresh_session(live_app):
|
|||||||
# --- session 1: create the pad + write the marker ---
|
# --- session 1: create the pad + write the marker ---
|
||||||
ctx1 = browser.new_context(ignore_https_errors=True)
|
ctx1 = browser.new_context(ignore_https_errors=True)
|
||||||
page, pad_url = _open_pad(ctx1, f"https://{live_app}/pad/")
|
page, pad_url = _open_pad(ctx1, f"https://{live_app}/pad/")
|
||||||
assert "#/2/pad/edit/" in pad_url, (
|
assert (
|
||||||
f"CryptPad did not create a fragment-keyed pad URL; got {pad_url!r}"
|
"#/2/pad/edit/" in pad_url
|
||||||
)
|
), f"CryptPad did not create a fragment-keyed pad URL; got {pad_url!r}"
|
||||||
ck = _ckeditor_frame(page, reload_url=pad_url)
|
ck = _ckeditor_frame(page, reload_url=pad_url)
|
||||||
assert ck is not None, "CKEditor content frame never attached (pad editor not ready)"
|
assert ck is not None, "CKEditor content frame never attached (pad editor not ready)"
|
||||||
_dismiss_store_modal(page)
|
_dismiss_store_modal(page)
|
||||||
@ -148,9 +161,9 @@ def test_cryptpad_pad_content_survives_fresh_session(live_app):
|
|||||||
page.wait_for_timeout(1000)
|
page.wait_for_timeout(1000)
|
||||||
body.type(marker, delay=40)
|
body.type(marker, delay=40)
|
||||||
page.wait_for_timeout(12000) # let CryptPad encrypt + sync the update to the server
|
page.wait_for_timeout(12000) # let CryptPad encrypt + sync the update to the server
|
||||||
assert marker in ck.locator("body").inner_text(), (
|
assert (
|
||||||
"marker not present in the editor after typing — type did not land"
|
marker in ck.locator("body").inner_text()
|
||||||
)
|
), "marker not present in the editor after typing — type did not land"
|
||||||
ctx1.close()
|
ctx1.close()
|
||||||
|
|
||||||
# --- session 2: FRESH context (no shared storage/localStorage) reads the pad back by URL.
|
# --- session 2: FRESH context (no shared storage/localStorage) reads the pad back by URL.
|
||||||
|
|||||||
@ -51,9 +51,9 @@ def test_cryptpad_spa_renders_with_no_console_errors(live_app):
|
|||||||
title = (page.title() or "").lower()
|
title = (page.title() or "").lower()
|
||||||
body = page.content()
|
body = page.content()
|
||||||
blower = body.lower()
|
blower = body.lower()
|
||||||
assert "cryptpad" in title or "cryptpad" in blower, (
|
assert (
|
||||||
f"CryptPad SPA does not carry brand. title={title!r}, body excerpt: {body[:200]!r}"
|
"cryptpad" in title or "cryptpad" in blower
|
||||||
)
|
), f"CryptPad SPA does not carry brand. title={title!r}, body excerpt: {body[:200]!r}"
|
||||||
|
|
||||||
# Canonical CryptPad asset references in the rendered DOM
|
# Canonical CryptPad asset references in the rendered DOM
|
||||||
canonical = ("/customize/", "/components/", "main.js", "/api/broadcast")
|
canonical = ("/customize/", "/components/", "main.js", "/api/broadcast")
|
||||||
|
|||||||
@ -8,7 +8,8 @@ import os
|
|||||||
import sys
|
import sys
|
||||||
|
|
||||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
||||||
from harness import browser as harness_browser, generic, lifecycle # noqa: E402
|
from harness import browser as harness_browser # noqa: E402
|
||||||
|
from harness import generic, lifecycle
|
||||||
|
|
||||||
|
|
||||||
def test_serving_and_content(live_app, meta):
|
def test_serving_and_content(live_app, meta):
|
||||||
|
|||||||
@ -20,7 +20,9 @@ def test_backup_captures_state(live_app):
|
|||||||
Since custom-html-bkp-bad has no ops.py::pre_backup to seed the marker, this file does NOT
|
Since custom-html-bkp-bad has no ops.py::pre_backup to seed the marker, this file does NOT
|
||||||
exist at backup time — exec_in_app returns empty or raises → assertion fails → backup tier RED.
|
exist at backup time — exec_in_app returns empty or raises → assertion fails → backup tier RED.
|
||||||
This models a recipe that declares backup capability but omits the data-seeding hook."""
|
This models a recipe that declares backup capability but omits the data-seeding hook."""
|
||||||
result = lifecycle.exec_in_app(live_app, ["sh", "-c", f"cat {MARKER_PATH} 2>/dev/null || echo MISSING"]).strip()
|
result = lifecycle.exec_in_app(
|
||||||
|
live_app, ["sh", "-c", f"cat {MARKER_PATH} 2>/dev/null || echo MISSING"]
|
||||||
|
).strip()
|
||||||
assert result == "original", (
|
assert result == "original", (
|
||||||
f"backup did not capture the expected marker at {MARKER_PATH}: got {result!r}. "
|
f"backup did not capture the expected marker at {MARKER_PATH}: got {result!r}. "
|
||||||
"Expected 'original' (seeded by pre_backup). If the marker is 'MISSING', the pre_backup "
|
"Expected 'original' (seeded by pre_backup). If the marker is 'MISSING', the pre_backup "
|
||||||
|
|||||||
87
tests/custom-html-tiny/functional/test_serves_content.py
Normal file
@ -0,0 +1,87 @@
"""custom-html-tiny — recipe-specific functional test (static-web-server).

Proves the deployed static-web-server is *actually serving files from its `content` volume* with real
file-server semantics, not merely returning 200 from a Traefik fallback or a generic stub:

  1. exact-byte round-trip — write a uniquely-named file with random content into the served volume,
     fetch it over HTTPS, and assert the bytes come back verbatim. Non-vacuous: the content is random
     per run, so only a server that reads this file off the volume can pass.
  2. real 404 — a random non-existent path returns 404, proving directory/file semantics (a
     200-everything stub or mis-routed host would not 404).

The recipe's image (joseluisq/static-web-server) is shell-less (scratch-based) and its content volume
is seeded via the install_steps.sh host-mountpoint mechanism — so this test writes its probe file the
same way (resolve the swarm volume's mountpoint with `docker volume inspect`, write directly) rather
than `docker exec`-ing in a container that has no shell.

Runs in the custom tier against the shared post-install deployment (the `live_app` fixture is its
per-run domain). Mirrors install_steps.sh: the app's content volume is named `<stack>_content`, where
`stack` is the domain with dots replaced by underscores; HTTP_SUBDIR is empty, so the volume root is
served at `/`.
"""

from __future__ import annotations

import contextlib
import os
import ssl
import subprocess
import urllib.error
import urllib.request
import uuid


def _served_dir(domain: str) -> str:
    """Host mountpoint of the app's served `content` volume (same naming as install_steps.sh)."""
    vol = f"{domain.replace('.', '_')}_content"
    out = subprocess.run(
        ["docker", "volume", "inspect", vol, "--format", "{{.Mountpoint}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    mountpoint = out.stdout.strip()
    assert mountpoint, f"could not resolve mountpoint for volume {vol!r}"
    return mountpoint


def _get(url: str) -> tuple[int, bytes]:
    """GET the URL; return (status, body). A 4xx/5xx is returned, not raised (we assert on the code).
    TLS verification is relaxed: the served wildcard cert is validated separately by the infra check;
    here we care only about the app's response."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=20, context=ctx) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as e:
        return e.code, e.read()


def test_static_file_roundtrip_and_404(live_app):
    """Write a random file into the served volume → fetch it → bytes match; and a missing path 404s."""
    served = _served_dir(live_app)
    token = uuid.uuid4().hex
    name = f"ccci-probe-{token}.txt"
    body = f"cc-ci-functional-{token}\n".encode()
    path = os.path.join(served, name)
    with open(path, "wb") as fh:
        fh.write(body)
    try:
        status, got = _get(f"https://{live_app}/{name}")
        assert status == 200, f"served probe file returned {status} (expected 200)"
        assert got == body, (
            f"content round-trip mismatch: served {got!r}, wrote {body!r} "
            "(static-web-server not serving the content volume?)"
        )

        # A random non-existent path must 404 — proves real static-file semantics, distinguishing a
        # working server from a 200-everything stub or a mis-routed Traefik fallback.
        miss_status, _ = _get(f"https://{live_app}/ccci-missing-{uuid.uuid4().hex}.txt")
        assert (
            miss_status == 404
        ), f"missing path returned {miss_status} (expected 404 — generic 200-returner / mis-route?)"
    finally:
        with contextlib.suppress(OSError):
            os.remove(path)
@ -3,3 +3,14 @@
|
|||||||
# (DG5) is detected quickly instead of waiting the default 300s HTTP timeout.
|
# (DG5) is detected quickly instead of waiting the default 300s HTTP timeout.
|
||||||
DEPLOY_TIMEOUT = 120
|
DEPLOY_TIMEOUT = 120
|
||||||
HTTP_TIMEOUT = 90
|
HTTP_TIMEOUT = 90
|
||||||
|
|
||||||
|
# Rungs this recipe INTENTIONALLY skips, each with a reason. Any essential rung skipped (N/A) and NOT
|
||||||
|
# listed here is reported as an *unintentional* skip (a coverage gap to fill or declare). A skip still
|
||||||
|
# caps the level either way — the harness never claims a rung it did not verify; this only records
|
||||||
|
# that the skip is deliberate. (The level ladder is the four essential rungs install/upgrade/
|
||||||
|
# backup_restore/functional; integration + recipe-local are optional and not leveled.)
|
||||||
|
# custom-html-tiny is a stateless static-web-server, so it has no backup surface:
|
||||||
|
EXPECTED_NA = {
|
||||||
|
"backup_restore": "stateless static file server: serves an ephemeral content volume seeded at "
|
||||||
|
"deploy, with no persistent/user data to back up or restore (no backupbot.backup label)",
|
||||||
|
}
|
||||||
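The comment above describes how a skipped rung is reported: declared in `EXPECTED_NA` means intentional, otherwise it is flagged as a coverage gap. The harness code that consumes `EXPECTED_NA` is not part of this diff, so the following is only a sketch of that rule. `ESSENTIAL_RUNGS` and `classify_skips` are hypothetical names; the four rung names are taken from the comment.

```python
# Hypothetical sketch; the real harness logic is not shown in this diff.
ESSENTIAL_RUNGS = ("install", "upgrade", "backup_restore", "functional")


def classify_skips(skipped: set[str], expected_na: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Split skipped essential rungs into (intentional with reason, unintentional)."""
    intentional = {r: expected_na[r] for r in ESSENTIAL_RUNGS if r in skipped and r in expected_na}
    unintentional = [r for r in ESSENTIAL_RUNGS if r in skipped and r not in expected_na]
    return intentional, unintentional
```

Either way the skipped rung still caps the reported level; the classification changes only how the skip is labelled.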
|
|||||||
@ -15,7 +15,8 @@ import sys
|
|||||||
import uuid
|
import uuid
|
||||||
|
|
||||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
|
||||||
from harness import http as harness_http, lifecycle # noqa: E402
|
from harness import http as harness_http # noqa: E402
|
||||||
|
from harness import lifecycle
|
||||||
|
|
||||||
|
|
||||||
def test_content_roundtrip(live_app):
|
def test_content_roundtrip(live_app):
|
||||||
|
|||||||
@ -53,9 +53,9 @@ def test_content_type_html_and_txt(live_app):
|
|||||||
ct_txt = h_txt.get("content-type", "")
|
ct_txt = h_txt.get("content-type", "")
|
||||||
|
|
||||||
# nginx default: "text/html" for .html and "text/plain" for .txt (may include "; charset=utf-8")
|
# nginx default: "text/html" for .html and "text/plain" for .txt (may include "; charset=utf-8")
|
||||||
assert ct_html.startswith("text/html"), (
|
assert ct_html.startswith(
|
||||||
f"{html_name} Content-Type={ct_html!r}, expected text/html (nginx MIME config broken?)"
|
"text/html"
|
||||||
)
|
), f"{html_name} Content-Type={ct_html!r}, expected text/html (nginx MIME config broken?)"
|
||||||
assert ct_txt.startswith("text/plain"), (
|
assert ct_txt.startswith(
|
||||||
f"{txt_name} Content-Type={ct_txt!r}, expected text/plain (nginx MIME config broken?)"
|
"text/plain"
|
||||||
)
|
), f"{txt_name} Content-Type={ct_txt!r}, expected text/plain (nginx MIME config broken?)"
|
||||||
|
|||||||
@ -9,7 +9,8 @@ import os
|
|||||||
import sys
|
import sys
|
||||||
|
|
||||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
||||||
from harness import browser as harness_browser, generic # noqa: E402
|
from harness import browser as harness_browser # noqa: E402
|
||||||
|
from harness import generic
|
||||||
|
|
||||||
|
|
||||||
def test_serving_and_content(live_app, meta):
|
def test_serving_and_content(live_app, meta):
|
||||||
|
|||||||
@ -53,7 +53,7 @@ def mint_admin(domain: str) -> tuple[str, str]:
|
|||||||
cmd = (
|
cmd = (
|
||||||
"cd /opt/bitnami/discourse && "
|
"cd /opt/bitnami/discourse && "
|
||||||
"RUBY=$(command -v ruby || echo /opt/bitnami/ruby/bin/ruby) && "
|
"RUBY=$(command -v ruby || echo /opt/bitnami/ruby/bin/ruby) && "
|
||||||
f"RAILS_ENV=production \"$RUBY\" bin/rails runner \"{_BOOTSTRAP_RB}\""
|
f'RAILS_ENV=production "$RUBY" bin/rails runner "{_BOOTSTRAP_RB}"'
|
||||||
)
|
)
|
||||||
out = lifecycle.exec_in_app(domain, ["bash", "-c", cmd], service="app", timeout=240)
|
out = lifecycle.exec_in_app(domain, ["bash", "-c", cmd], service="app", timeout=240)
|
||||||
key = user = None
|
key = user = None
|
||||||
@ -63,9 +63,9 @@ def mint_admin(domain: str) -> tuple[str, str]:
|
|||||||
key = line.split("=", 1)[1].strip()
|
key = line.split("=", 1)[1].strip()
|
||||||
elif line.startswith("CCCI_API_USER="):
|
elif line.startswith("CCCI_API_USER="):
|
||||||
user = line.split("=", 1)[1].strip()
|
user = line.split("=", 1)[1].strip()
|
||||||
assert key and user, (
|
assert (
|
||||||
f"could not bootstrap discourse admin/API key; rails output tail:\n{out[-1000:]}"
|
key and user
|
||||||
)
|
), f"could not bootstrap discourse admin/API key; rails output tail:\n{out[-1000:]}"
|
||||||
return key, user
|
return key, user
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@ -48,21 +48,23 @@ def test_create_topic_roundtrip(live_app):
|
|||||||
headers=hdrs,
|
headers=hdrs,
|
||||||
timeout=60,
|
timeout=60,
|
||||||
)
|
)
|
||||||
assert status in (200, 201) and isinstance(body, dict), (
|
assert status in (200, 201) and isinstance(
|
||||||
f"create topic failed: HTTP {status}, body={body!r}"
|
body, dict
|
||||||
)
|
), f"create topic failed: HTTP {status}, body={body!r}"
|
||||||
topic_id = body.get("topic_id")
|
topic_id = body.get("topic_id")
|
||||||
assert topic_id, f"create topic returned no topic_id: {body!r}"
|
assert topic_id, f"create topic returned no topic_id: {body!r}"
|
||||||
|
|
||||||
# 4) Read the topic back and assert title + first-post body round-trip.
|
# 4) Read the topic back and assert title + first-post body round-trip.
|
||||||
status, got = harness_http.http_get(f"{base}/t/{topic_id}.json", headers=hdrs, timeout=30)
|
status, got = harness_http.http_get(f"{base}/t/{topic_id}.json", headers=hdrs, timeout=30)
|
||||||
assert status == 200 and isinstance(got, dict), f"read topic failed: HTTP {status}, body={got!r}"
|
assert status == 200 and isinstance(
|
||||||
assert got.get("title") == title, (
|
got, dict
|
||||||
f"topic title did not round-trip: sent {title!r}, got {got.get('title')!r}"
|
), f"read topic failed: HTTP {status}, body={got!r}"
|
||||||
)
|
assert (
|
||||||
|
got.get("title") == title
|
||||||
|
), f"topic title did not round-trip: sent {title!r}, got {got.get('title')!r}"
|
||||||
posts = (got.get("post_stream") or {}).get("posts") or []
|
posts = (got.get("post_stream") or {}).get("posts") or []
|
||||||
assert posts, f"topic has no posts on read-back: {got!r}"
|
assert posts, f"topic has no posts on read-back: {got!r}"
|
||||||
first_cooked = posts[0].get("cooked", "")
|
first_cooked = posts[0].get("cooked", "")
|
||||||
assert marker in first_cooked, (
|
assert (
|
||||||
f"topic body did not round-trip: marker {marker!r} not in first post {first_cooked!r}"
|
marker in first_cooked
|
||||||
)
|
), f"topic body did not round-trip: marker {marker!r} not in first post {first_cooked!r}"
|
||||||
|
|||||||
@ -20,12 +20,12 @@ def test_site_json_has_discourse_config(live_app):
|
|||||||
status, body = harness_http.retry_http_get(
|
status, body = harness_http.retry_http_get(
|
||||||
f"https://{live_app}/site.json", expect_status=200, max_wait=120, interval=5
|
f"https://{live_app}/site.json", expect_status=200, max_wait=120, interval=5
|
||||||
)
|
)
|
||||||
assert status == 200 and isinstance(body, dict), (
|
assert status == 200 and isinstance(
|
||||||
f"GET /site.json failed: HTTP {status}, body type={type(body).__name__}"
|
body, dict
|
||||||
)
|
), f"GET /site.json failed: HTTP {status}, body type={type(body).__name__}"
|
||||||
# /site.json carries Discourse-specific structure — `categories` (a list) and `groups` are always
|
# /site.json carries Discourse-specific structure — `categories` (a list) and `groups` are always
|
||||||
# present in a booted Discourse. A non-Discourse 200 (placeholder page) would not parse to this.
|
# present in a booted Discourse. A non-Discourse 200 (placeholder page) would not parse to this.
|
||||||
assert "categories" in body, f"/site.json missing 'categories' key: keys={list(body)[:20]}"
|
assert "categories" in body, f"/site.json missing 'categories' key: keys={list(body)[:20]}"
|
||||||
assert isinstance(body["categories"], list), (
|
assert isinstance(
|
||||||
f"/site.json 'categories' not a list: {type(body['categories']).__name__}"
|
body["categories"], list
|
||||||
)
|
), f"/site.json 'categories' not a list: {type(body['categories']).__name__}"
|
||||||
|
|||||||
@ -15,7 +15,9 @@ set -euo pipefail
|
|||||||
|
|
||||||
: "${CCCI_RECIPE:?missing CCCI_RECIPE}"
|
: "${CCCI_RECIPE:?missing CCCI_RECIPE}"
|
||||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
RECIPE_DIR="${HOME}/.abra/recipes/${CCCI_RECIPE}"
|
# Resolve the recipe tree the way abra does: $ABRA_DIR (the per-run tree inside a CI run) else
|
||||||
|
# the canonical ~/.abra — the overlay must land in the tree this run actually deploys from.
|
||||||
|
RECIPE_DIR="${ABRA_DIR:-${HOME}/.abra}/recipes/${CCCI_RECIPE}"
|
||||||
|
|
||||||
if [ ! -d "$RECIPE_DIR" ]; then
|
if [ ! -d "$RECIPE_DIR" ]; then
|
||||||
echo " discourse install_steps: recipe dir $RECIPE_DIR missing — cannot provide compose.ccci.yml" >&2
|
echo " discourse install_steps: recipe dir $RECIPE_DIR missing — cannot provide compose.ccci.yml" >&2
|
||||||
|
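Reviewer note: the new `RECIPE_DIR` line relies on the shell expansion `${ABRA_DIR:-...}`, which falls back when `ABRA_DIR` is unset or empty. A minimal Python sketch of the same resolution, assuming a hypothetical function name `resolve_recipe_dir` and an explicit `env` mapping so it can be exercised without touching the real environment:

```python
import os
from pathlib import Path


def resolve_recipe_dir(recipe, env=None):
    """Mirror ${ABRA_DIR:-$HOME/.abra}/recipes/<recipe> from the shell script.

    Hypothetical helper for illustration; not part of the harness.
    """
    env = os.environ if env is None else env
    # `or` treats an empty ABRA_DIR like an unset one, matching the ':-' form.
    abra_dir = env.get("ABRA_DIR") or os.path.join(env["HOME"], ".abra")
    return Path(abra_dir) / "recipes" / recipe
```

Inside a CI run this would resolve to the per-run tree, and outside one to the canonical `~/.abra` tree, which is the behaviour the diff comment describes.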
|||||||
@ -15,8 +15,7 @@ from harness import lifecycle # noqa: E402
|
|||||||
|
|
||||||
def _psql(domain, sql):
|
def _psql(domain, sql):
|
||||||
cmd = (
|
cmd = (
|
||||||
'PGPASSWORD=$(cat /run/secrets/db_password) '
|
"PGPASSWORD=$(cat /run/secrets/db_password) " f'psql -U discourse -d discourse -tAc "{sql}"'
|
||||||
f'psql -U discourse -d discourse -tAc "{sql}"'
|
|
||||||
)
|
)
|
||||||
return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
|
return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
|
||||||
|
|
||||||
@ -42,6 +41,7 @@ def pre_backup(domain, meta):
|
|||||||
def pre_restore(domain, meta):
|
def pre_restore(domain, meta):
|
||||||
# diverge from the backup so a successful restore is observable
|
# diverge from the backup so a successful restore is observable
|
||||||
_psql(domain, "DROP TABLE IF EXISTS ci_marker;")
|
_psql(domain, "DROP TABLE IF EXISTS ci_marker;")
|
||||||
assert _psql(domain, "SELECT to_regclass('public.ci_marker');") in ("", "NULL"), (
|
assert _psql(domain, "SELECT to_regclass('public.ci_marker');") in (
|
||||||
"drop did not take"
|
"",
|
||||||
)
|
"NULL",
|
||||||
|
), "drop did not take"
|
||||||
|
|||||||
@ -6,7 +6,9 @@
|
|||||||
# app is actually serving (the canonical "is discourse up" signal — NOT "/", which may redirect to setup).
|
# app is actually serving (the canonical "is discourse up" signal — NOT "/", which may redirect to setup).
|
||||||
HEALTH_PATH = "/srv/status"
|
HEALTH_PATH = "/srv/status"
|
||||||
HEALTH_OK = (200,)
|
HEALTH_OK = (200,)
|
||||||
DEPLOY_TIMEOUT = 3600 # slow Rails cold boot (15-25min) on the 7-GiB single node; bumped 2400→3600 for
|
DEPLOY_TIMEOUT = (
|
||||||
|
3600 # slow Rails cold boot (15-25min) on the 7-GiB single node; bumped 2400→3600 for
|
||||||
|
)
|
||||||
# headroom after full4's base deploy timed out at 2400s (RAM/CPU-constrained boot + image re-pull).
|
# headroom after full4's base deploy timed out at 2400s (RAM/CPU-constrained boot + image re-pull).
|
||||||
HTTP_TIMEOUT = 1200
|
HTTP_TIMEOUT = 1200
|
||||||
|
|
||||||
@ -59,7 +61,11 @@ def BACKUP_VERIFY(domain):
|
|||||||
try:
|
try:
|
||||||
out = lifecycle.exec_in_app(
|
out = lifecycle.exec_in_app(
|
||||||
domain,
|
domain,
|
||||||
["sh", "-c", "gzip -t /var/lib/postgresql/data/backup.sql && wc -c < /var/lib/postgresql/data/backup.sql"],
|
[
|
||||||
|
"sh",
|
||||||
|
"-c",
|
||||||
|
"gzip -t /var/lib/postgresql/data/backup.sql && wc -c < /var/lib/postgresql/data/backup.sql",
|
||||||
|
],
|
||||||
service="db",
|
service="db",
|
||||||
timeout=60,
|
timeout=60,
|
||||||
).strip()
|
).strip()
|
||||||
|
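Reviewer note: the `BACKUP_VERIFY` hunk above runs `gzip -t` plus `wc -c` inside the `db` container. For readers who want to reproduce the check on a copied-out dump, here is a rough local equivalent using only the standard library. The function name `gzip_ok` is hypothetical and the sketch reads a local file; it does not call `lifecycle.exec_in_app`.

```python
import gzip
import os
import zlib


def gzip_ok(path):
    """Return (is_valid, size_in_bytes) for a gzip file on the local disk.

    Hypothetical helper: decompresses the whole stream to validate it and
    reports the compressed size, similar in spirit to `gzip -t` and `wc -c`.
    """
    try:
        with gzip.open(path, "rb") as fh:
            # Read in 1 MiB chunks so large dumps are not held in memory.
            while fh.read(1 << 20):
                pass
    except (OSError, EOFError, zlib.error):
        # BadGzipFile is an OSError; truncation raises EOFError.
        return False, 0
    return True, os.path.getsize(path)
```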
|||||||
@ -14,13 +14,12 @@ from harness import lifecycle # noqa: E402
|
|||||||
|
|
||||||
def _psql(domain, sql):
|
def _psql(domain, sql):
|
||||||
cmd = (
|
cmd = (
|
||||||
'PGPASSWORD=$(cat /run/secrets/db_password) '
|
"PGPASSWORD=$(cat /run/secrets/db_password) " f'psql -U discourse -d discourse -tAc "{sql}"'
|
||||||
f'psql -U discourse -d discourse -tAc "{sql}"'
|
|
||||||
)
|
)
|
||||||
return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
|
return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
|
||||||
|
|
||||||
|
|
||||||
def test_backup_captures_state(live_app):
|
def test_backup_captures_state(live_app):
|
||||||
assert _psql(live_app, "SELECT v FROM ci_marker;") == "original", (
|
assert (
|
||||||
"the seeded discourse postgres state was not present at backup time"
|
_psql(live_app, "SELECT v FROM ci_marker;") == "original"
|
||||||
)
|
), "the seeded discourse postgres state was not present at backup time"
|
||||||
|
|||||||
@ -14,13 +14,12 @@ from harness import lifecycle # noqa: E402
|
|||||||
|
|
||||||
def _psql(domain, sql):
|
def _psql(domain, sql):
|
||||||
cmd = (
|
cmd = (
|
||||||
'PGPASSWORD=$(cat /run/secrets/db_password) '
|
"PGPASSWORD=$(cat /run/secrets/db_password) " f'psql -U discourse -d discourse -tAc "{sql}"'
|
||||||
f'psql -U discourse -d discourse -tAc "{sql}"'
|
|
||||||
)
|
)
|
||||||
return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
|
return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
|
||||||
|
|
||||||
|
|
||||||
def test_restore_returns_state(live_app):
|
def test_restore_returns_state(live_app):
|
||||||
assert _psql(live_app, "SELECT v FROM ci_marker;") == "original", (
|
assert (
|
||||||
"restore did not return the pre-mutation discourse postgres state (data-integrity failure)"
|
_psql(live_app, "SELECT v FROM ci_marker;") == "original"
|
||||||
)
|
), "restore did not return the pre-mutation discourse postgres state (data-integrity failure)"
|
||||||
|
|||||||
@ -93,9 +93,10 @@ class GhostAdmin:
|
|||||||
status, body = self.req(
|
status, body = self.req(
|
||||||
"POST", "/session/", {"username": ADMIN_EMAIL, "password": ADMIN_PW}
|
"POST", "/session/", {"username": ADMIN_EMAIL, "password": ADMIN_PW}
|
||||||
)
|
)
|
||||||
assert status in (200, 201), (
|
assert status in (
|
||||||
f"ghost admin session login failed: HTTP {status}, body={body!r}"
|
200,
|
||||||
)
|
201,
|
||||||
|
), f"ghost admin session login failed: HTTP {status}, body={body!r}"
|
||||||
|
|
||||||
def create_post(self, title: str, html: str) -> dict:
|
def create_post(self, title: str, html: str) -> dict:
|
||||||
status, body = self.req(
|
status, body = self.req(
|
||||||
|
|||||||
@ -53,13 +53,15 @@ def test_ghost_admin_route_is_wired(live_app):
|
|||||||
return None
|
return None
|
||||||
|
|
||||||
status_body = harness_http.assert_converges(
|
status_body = harness_http.assert_converges(
|
||||||
_ready, f"GET {url} returns Ghost admin (200) or setup redirect (302)",
|
_ready,
|
||||||
max_wait=60, interval=3,
|
f"GET {url} returns Ghost admin (200) or setup redirect (302)",
|
||||||
|
max_wait=60,
|
||||||
|
interval=3,
|
||||||
)
|
)
|
||||||
status, body = status_body
|
status, body = status_body
|
||||||
assert status in (200, 302), f"unexpected status: {status}"
|
assert status in (200, 302), f"unexpected status: {status}"
|
||||||
if status == 200:
|
if status == 200:
|
||||||
# The admin SPA references /ghost-assets/ or contains "ghost" in title/body
|
# The admin SPA references /ghost-assets/ or contains "ghost" in title/body
|
||||||
assert "ghost" in body.lower(), (
|
assert (
|
||||||
f"GET {url} 200 but body has no Ghost markers: {body[:200]!r}"
|
"ghost" in body.lower()
|
||||||
)
|
), f"GET {url} 200 but body has no Ghost markers: {body[:200]!r}"
|
||||||
|
|||||||
@ -35,10 +35,10 @@ def test_content_api_settings_endpoint(live_app):
|
|||||||
assert body is not None, f"GET {url} returned non-JSON body"
|
assert body is not None, f"GET {url} returned non-JSON body"
|
||||||
# On success: {"settings": {...}}. On error: {"errors": [...]}. Either shape is valid.
|
# On success: {"settings": {...}}. On error: {"errors": [...]}. Either shape is valid.
|
||||||
if status == 200:
|
if status == 200:
|
||||||
assert isinstance(body, dict) and "settings" in body, (
|
assert (
|
||||||
f"200 response missing 'settings' envelope: {body!r}"
|
isinstance(body, dict) and "settings" in body
|
||||||
)
|
), f"200 response missing 'settings' envelope: {body!r}"
|
||||||
else:
|
else:
|
||||||
assert isinstance(body, dict) and ("errors" in body or "message" in body or body), (
|
assert isinstance(body, dict) and (
|
||||||
f"error response not a proper Ghost error envelope: {body!r}"
|
"errors" in body or "message" in body or body
|
||||||
)
|
), f"error response not a proper Ghost error envelope: {body!r}"
|
||||||
|
|||||||
@ -43,17 +43,17 @@ def test_create_post_roundtrip(live_app):
|
|||||||
title = f"ccci-marker-{uniq}"
|
title = f"ccci-marker-{uniq}"
|
||||||
marker = f"ccci-body-marker-{uniq}-roundtrip"
|
marker = f"ccci-body-marker-{uniq}-roundtrip"
|
||||||
created = admin.create_post(title, f"<p>{marker}</p>")
|
created = admin.create_post(title, f"<p>{marker}</p>")
|
||||||
assert created.get("title") == title, (
|
assert (
|
||||||
f"created post title mismatch: sent {title!r}, got {created.get('title')!r}"
|
created.get("title") == title
|
||||||
)
|
), f"created post title mismatch: sent {title!r}, got {created.get('title')!r}"
|
||||||
|
|
||||||
# 4) Read it back by id and assert the post survived the round-trip (title always returned;
|
# 4) Read it back by id and assert the post survived the round-trip (title always returned;
|
||||||
# html returned because we requested ?formats=html).
|
# html returned because we requested ?formats=html).
|
||||||
got = admin.get_post(created["id"])
|
got = admin.get_post(created["id"])
|
||||||
assert got.get("title") == title, (
|
assert (
|
||||||
f"post title did not round-trip: sent {title!r}, got {got.get('title')!r}"
|
got.get("title") == title
|
||||||
)
|
), f"post title did not round-trip: sent {title!r}, got {got.get('title')!r}"
|
||||||
html = got.get("html") or ""
|
html = got.get("html") or ""
|
||||||
assert marker in html, (
|
assert (
|
||||||
f"post body did not round-trip: marker {marker!r} not in read-back html {html!r}"
|
marker in html
|
||||||
)
|
), f"post body did not round-trip: marker {marker!r} not in read-back html {html!r}"
|
||||||
|
|||||||
@ -15,7 +15,9 @@ set -euo pipefail
|
|||||||
|
|
||||||
: "${CCCI_RECIPE:?missing CCCI_RECIPE}"
|
: "${CCCI_RECIPE:?missing CCCI_RECIPE}"
|
||||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
RECIPE_DIR="${HOME}/.abra/recipes/${CCCI_RECIPE}"
|
# Resolve the recipe tree the way abra does: $ABRA_DIR (the per-run tree inside a CI run) else
|
||||||
|
# the canonical ~/.abra — the overlay must land in the tree this run actually deploys from.
|
||||||
|
RECIPE_DIR="${ABRA_DIR:-${HOME}/.abra}/recipes/${CCCI_RECIPE}"
|
||||||
|
|
||||||
if [ ! -d "$RECIPE_DIR" ]; then
|
if [ ! -d "$RECIPE_DIR" ]; then
|
||||||
echo " ghost install_steps: recipe dir $RECIPE_DIR missing — cannot provide compose.ccci.yml" >&2
|
echo " ghost install_steps: recipe dir $RECIPE_DIR missing — cannot provide compose.ccci.yml" >&2
|
||||||
|
|||||||
@ -22,10 +22,7 @@ from harness import lifecycle # noqa: E402
|
|||||||
|
|
||||||
|
|
||||||
def _mysql(domain, sql):
|
def _mysql(domain, sql):
|
||||||
cmd = (
|
cmd = 'MYSQL_PWD="$(cat /run/secrets/db_password)" ' f'mysql -u root -N -s ghost -e "{sql}"'
|
||||||
'MYSQL_PWD="$(cat /run/secrets/db_password)" '
|
|
||||||
f'mysql -u root -N -s ghost -e "{sql}"'
|
|
||||||
)
|
|
||||||
return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
|
return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@ -63,7 +63,11 @@ def BACKUP_VERIFY(domain):
|
|||||||
try:
|
try:
|
||||||
out = lifecycle.exec_in_app(
|
out = lifecycle.exec_in_app(
|
||||||
domain,
|
domain,
|
||||||
["sh", "-c", "gzip -t /var/lib/mysql/backup.sql.gz && wc -c < /var/lib/mysql/backup.sql.gz"],
|
[
|
||||||
|
"sh",
|
||||||
|
"-c",
|
||||||
|
"gzip -t /var/lib/mysql/backup.sql.gz && wc -c < /var/lib/mysql/backup.sql.gz",
|
||||||
|
],
|
||||||
service="db",
|
service="db",
|
||||||
timeout=60,
|
timeout=60,
|
||||||
).strip()
|
).strip()
|
||||||
|
|||||||
@ -15,14 +15,11 @@ from harness import lifecycle # noqa: E402
|
|||||||
|
|
||||||
|
|
||||||
def _mysql(domain, sql):
|
def _mysql(domain, sql):
|
||||||
cmd = (
|
cmd = 'MYSQL_PWD="$(cat /run/secrets/db_password)" ' f'mysql -u root -N -s ghost -e "{sql}"'
|
||||||
'MYSQL_PWD="$(cat /run/secrets/db_password)" '
|
|
||||||
f'mysql -u root -N -s ghost -e "{sql}"'
|
|
||||||
)
|
|
||||||
return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
|
return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
|
||||||
|
|
||||||
|
|
||||||
def test_backup_captures_state(live_app):
|
def test_backup_captures_state(live_app):
|
||||||
assert _mysql(live_app, "SELECT v FROM ci_marker;") == "original", (
|
assert (
|
||||||
"the seeded ghost MySQL marker was not present at backup time"
|
_mysql(live_app, "SELECT v FROM ci_marker;") == "original"
|
||||||
)
|
), "the seeded ghost MySQL marker was not present at backup time"
|
||||||
|
|||||||
@ -22,10 +22,7 @@ from harness import lifecycle # noqa: E402
|
|||||||
|
|
||||||
|
|
||||||
def _mysql(domain, sql):
|
def _mysql(domain, sql):
|
||||||
cmd = (
|
cmd = 'MYSQL_PWD="$(cat /run/secrets/db_password)" ' f'mysql -u root -N -s ghost -e "{sql}"'
|
||||||
'MYSQL_PWD="$(cat /run/secrets/db_password)" '
|
|
||||||
f'mysql -u root -N -s ghost -e "{sql}"'
|
|
||||||
)
|
|
||||||
return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
|
return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@ -14,14 +14,11 @@ from harness import lifecycle # noqa: E402
|
|||||||
|
|
||||||
|
|
||||||
def _mysql(domain, sql):
|
def _mysql(domain, sql):
|
||||||
cmd = (
|
cmd = 'MYSQL_PWD="$(cat /run/secrets/db_password)" ' f'mysql -u root -N -s ghost -e "{sql}"'
|
||||||
'MYSQL_PWD="$(cat /run/secrets/db_password)" '
|
|
||||||
f'mysql -u root -N -s ghost -e "{sql}"'
|
|
||||||
)
|
|
||||||
return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
|
return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
|
||||||
|
|
||||||
|
|
||||||
def test_upgrade_preserves_state(live_app):
|
def test_upgrade_preserves_state(live_app):
|
||||||
assert _mysql(live_app, "SELECT v FROM ci_marker;") == "upgrade-survives", (
|
assert (
|
||||||
"the seeded ghost MySQL marker did not survive the upgrade redeploy (data loss on upgrade)"
|
_mysql(live_app, "SELECT v FROM ci_marker;") == "upgrade-survives"
|
||||||
)
|
), "the seeded ghost MySQL marker did not survive the upgrade redeploy (data loss on upgrade)"
|
||||||
|
|||||||
@ -14,7 +14,6 @@ import urllib.request
|
|||||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
|
||||||
from harness import http as harness_http # noqa: E402
|
from harness import http as harness_http # noqa: E402
|
||||||
|
|
||||||
|
|
||||||
_CTX = ssl.create_default_context()
|
_CTX = ssl.create_default_context()
|
||||||
_CTX.check_hostname = False
|
_CTX.check_hostname = False
|
||||||
_CTX.verify_mode = ssl.CERT_NONE
|
_CTX.verify_mode = ssl.CERT_NONE
|
||||||
|
|||||||
@ -15,7 +15,5 @@ from harness import http as harness_http # noqa: E402
|
|||||||
def test_hedgedoc_root_serves(live_app):
|
def test_hedgedoc_root_serves(live_app):
|
||||||
"""GET / → 200 or 302 (login/new redirect)."""
|
"""GET / → 200 or 302 (login/new redirect)."""
|
||||||
url = f"https://{live_app}/"
|
url = f"https://{live_app}/"
|
||||||
status, _ = harness_http.retry_http_get(
|
status, _ = harness_http.retry_http_get(url, expect_status=(200, 302), max_wait=90, interval=5)
|
||||||
url, expect_status=(200, 302), max_wait=90, interval=5
|
|
||||||
)
|
|
||||||
assert status in (200, 302), f"GET {url} HTTP {status} (expected 200 or 302)"
|
assert status in (200, 302), f"GET {url} HTTP {status} (expected 200 or 302)"
|
||||||
|
|||||||
@ -111,13 +111,13 @@ def test_immich_processes_uploaded_asset_metadata_and_statistics(live_app):
|
|||||||
if exif and exif.get("exifImageWidth"):
|
if exif and exif.get("exifImageWidth"):
|
||||||
break
|
break
|
||||||
time.sleep(5)
|
time.sleep(5)
|
||||||
assert exif and exif.get("exifImageWidth") == 1 and exif.get("exifImageHeight") == 1, (
|
assert (
|
||||||
f"immich metadata-extraction did not populate the 1x1 PNG dimensions in exifInfo: {exif!r}"
|
exif and exif.get("exifImageWidth") == 1 and exif.get("exifImageHeight") == 1
|
||||||
)
|
), f"immich metadata-extraction did not populate the 1x1 PNG dimensions in exifInfo: {exif!r}"
|
||||||
|
|
||||||
# the asset is catalogued into the owner's library statistics (list-back in aggregate)
|
# the asset is catalogued into the owner's library statistics (list-back in aggregate)
|
||||||
sst, stats = harness_http.http_request("GET", f"{base}/api/assets/statistics", headers=auth)
|
sst, stats = harness_http.http_request("GET", f"{base}/api/assets/statistics", headers=auth)
|
||||||
assert sst == 200 and isinstance(stats, dict), f"statistics HTTP {sst}: {stats!r}"
|
assert sst == 200 and isinstance(stats, dict), f"statistics HTTP {sst}: {stats!r}"
|
||||||
assert stats.get("images", 0) >= 1 and stats.get("total", 0) >= 1, (
|
assert (
|
||||||
f"uploaded asset not reflected in library statistics: {stats!r}"
|
stats.get("images", 0) >= 1 and stats.get("total", 0) >= 1
|
||||||
)
|
), f"uploaded asset not reflected in library statistics: {stats!r}"
|
||||||
|
|||||||
@ -121,6 +121,6 @@ def test_immich_upload_asset_readback_and_thumbnail(live_app):
|
|||||||
if thumb == 200:
|
if thumb == 200:
|
||||||
break
|
break
|
||||||
time.sleep(5)
|
time.sleep(5)
|
||||||
assert thumb == 200, (
|
assert (
|
||||||
f"immich did not generate a thumbnail/derivative for the uploaded asset (last HTTP {thumb})"
|
thumb == 200
|
||||||
)
|
), f"immich did not generate a thumbnail/derivative for the uploaded asset (last HTTP {thumb})"
|
||||||
|
|||||||
@ -16,5 +16,11 @@ from harness import http as harness_http # noqa: E402
|
|||||||
|
|
||||||
def test_immich_returns_200(live_app):
|
def test_immich_returns_200(live_app):
|
||||||
url = f"https://{live_app}/"
|
url = f"https://{live_app}/"
|
||||||
status, _ = harness_http.retry_http_get(url, expect_status=(200, 301, 302), max_wait=60, interval=3)
|
status, _ = harness_http.retry_http_get(
|
||||||
assert status in (200, 301, 302), f"immich at {url} returned HTTP {status} (expected 200/301/302)"
|
url, expect_status=(200, 301, 302), max_wait=60, interval=3
|
||||||
|
)
|
||||||
|
assert status in (
|
||||||
|
200,
|
||||||
|
301,
|
||||||
|
302,
|
||||||
|
), f"immich at {url} returned HTTP {status} (expected 200/301/302)"
|
||||||
|
|||||||
@ -35,4 +35,7 @@ def pre_backup(domain, meta):
|
|||||||
|
|
||||||
def pre_restore(domain, meta):
|
def pre_restore(domain, meta):
|
||||||
_psql(domain, "DROP TABLE ci_marker;")
|
_psql(domain, "DROP TABLE ci_marker;")
|
||||||
assert _psql(domain, "SELECT to_regclass('public.ci_marker');") in ("", "NULL"), "drop did not take"
|
assert _psql(domain, "SELECT to_regclass('public.ci_marker');") in (
|
||||||
|
"",
|
||||||
|
"NULL",
|
||||||
|
), "drop did not take"
|
||||||
|
|||||||
@ -14,4 +14,6 @@ def _psql(domain, sql):
|
|||||||
|
|
||||||
|
|
||||||
def test_backup_captures_state(live_app):
|
def test_backup_captures_state(live_app):
|
||||||
assert _psql(live_app, "SELECT v FROM ci_marker;") == "original", "seeded postgres state not present at backup time"
|
assert (
|
||||||
|
_psql(live_app, "SELECT v FROM ci_marker;") == "original"
|
||||||
|
), "seeded postgres state not present at backup time"
|
||||||
|
|||||||
@ -7,7 +7,8 @@ import os
|
|||||||
import sys
|
import sys
|
||||||
|
|
||||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
||||||
from harness import browser as harness_browser, generic, lifecycle # noqa: E402
|
from harness import browser as harness_browser # noqa: E402
|
||||||
|
from harness import generic, lifecycle
|
||||||
|
|
||||||
|
|
||||||
def test_serving_and_frontend(live_app, meta):
|
def test_serving_and_frontend(live_app, meta):
|
||||||
@ -25,7 +26,11 @@ def test_serving_and_frontend(live_app, meta):
|
|||||||
resp = harness_browser.goto_with_retry(
|
resp = harness_browser.goto_with_retry(
|
||||||
page, url, accept_statuses=(200, 301, 302), goto_timeout_ms=60_000
|
page, url, accept_statuses=(200, 301, 302), goto_timeout_ms=60_000
|
||||||
)
|
)
|
||||||
assert resp is not None and resp.status in (200, 301, 302), f"page status {resp and resp.status}"
|
assert resp is not None and resp.status in (
|
||||||
|
200,
|
||||||
|
301,
|
||||||
|
302,
|
||||||
|
), f"page status {resp and resp.status}"
|
||||||
assert "<html" in page.content().lower(), "no HTML served by the immich frontend"
|
assert "<html" in page.content().lower(), "no HTML served by the immich frontend"
|
||||||
finally:
|
finally:
|
||||||
browser.close()
|
browser.close()
|
||||||
|
|||||||
@ -14,4 +14,6 @@ def _psql(domain, sql):
|
|||||||
|
|
||||||
|
|
||||||
def test_restore_returns_state(live_app):
|
def test_restore_returns_state(live_app):
|
||||||
assert _psql(live_app, "SELECT v FROM ci_marker;") == "original", "restore did not return the pre-mutation postgres state"
|
assert (
|
||||||
|
_psql(live_app, "SELECT v FROM ci_marker;") == "original"
|
||||||
|
), "restore did not return the pre-mutation postgres state"
|
||||||
|
|||||||
@ -14,4 +14,6 @@ def _psql(domain, sql):
|
|||||||
|
|
||||||
|
|
||||||
def test_upgrade_preserves_data(live_app):
|
def test_upgrade_preserves_data(live_app):
|
||||||
assert _psql(live_app, "SELECT v FROM ci_marker;") == "upgrade-survives", "postgres data did not survive the upgrade"
|
assert (
|
||||||
|
_psql(live_app, "SELECT v FROM ci_marker;") == "upgrade-survives"
|
||||||
|
), "postgres data did not survive the upgrade"
|
||||||
|
|||||||
@ -120,9 +120,9 @@ def test_create_confidential_client_and_obtain_token(live_app):
|
|||||||
"clientId": client_id,
|
"clientId": client_id,
|
||||||
"enabled": True,
|
"enabled": True,
|
||||||
"secret": client_secret,
|
"secret": client_secret,
|
||||||
"publicClient": False, # confidential client
|
"publicClient": False, # confidential client
|
||||||
"serviceAccountsEnabled": True, # required for client_credentials grant
|
"serviceAccountsEnabled": True, # required for client_credentials grant
|
||||||
"standardFlowEnabled": False, # not needed for service-account-only client
|
"standardFlowEnabled": False, # not needed for service-account-only client
|
||||||
"directAccessGrantsEnabled": False,
|
"directAccessGrantsEnabled": False,
|
||||||
"protocol": "openid-connect",
|
"protocol": "openid-connect",
|
||||||
}
|
}
|
||||||
@ -144,25 +144,25 @@ def test_create_confidential_client_and_obtain_token(live_app):
|
|||||||
|
|
||||||
# Use the client to obtain its own token (client_credentials grant)
|
# Use the client to obtain its own token (client_credentials grant)
|
||||||
tok_status, tok_resp = _client_credentials_token(live_app, client_id, client_secret)
|
tok_status, tok_resp = _client_credentials_token(live_app, client_id, client_secret)
|
||||||
assert tok_status == 200, (
|
assert (
|
||||||
f"client_credentials token returned HTTP {tok_status}: {tok_resp!r}"
|
tok_status == 200
|
||||||
)
|
), f"client_credentials token returned HTTP {tok_status}: {tok_resp!r}"
|
||||||
access_token = tok_resp.get("access_token") if isinstance(tok_resp, dict) else None
|
access_token = tok_resp.get("access_token") if isinstance(tok_resp, dict) else None
|
||||||
assert isinstance(access_token, str) and access_token.count(".") == 2, (
|
assert (
|
||||||
f"client_credentials access_token not a JWT: {access_token!r}"
|
isinstance(access_token, str) and access_token.count(".") == 2
|
||||||
)
|
), f"client_credentials access_token not a JWT: {access_token!r}"
|
||||||
|
|
||||||
# Decode the JWT payload; assert azp matches the new client
|
# Decode the JWT payload; assert azp matches the new client
|
||||||
payload = json.loads(_b64url_decode(access_token.split(".")[1]))
|
payload = json.loads(_b64url_decode(access_token.split(".")[1]))
|
||||||
assert payload.get("azp") == client_id, (
|
assert (
|
||||||
f"client_credentials JWT azp={payload.get('azp')!r} != client_id={client_id!r}"
|
payload.get("azp") == client_id
|
||||||
)
|
), f"client_credentials JWT azp={payload.get('azp')!r} != client_id={client_id!r}"
|
||||||
# Service-account token does NOT carry a session-scoped user (azp + clientId differ from
|
# Service-account token does NOT carry a session-scoped user (azp + clientId differ from
|
||||||
# admin-cli token). The presence of azp + iss == per-run-domain proves the issuance flow.
|
# admin-cli token). The presence of azp + iss == per-run-domain proves the issuance flow.
|
||||||
expected_iss = f"https://{live_app}/realms/master"
|
expected_iss = f"https://{live_app}/realms/master"
|
||||||
assert payload.get("iss") == expected_iss, (
|
assert (
|
||||||
f"JWT iss={payload.get('iss')!r} != {expected_iss!r}"
|
payload.get("iss") == expected_iss
|
||||||
)
|
), f"JWT iss={payload.get('iss')!r} != {expected_iss!r}"
|
||||||
finally:
|
finally:
|
||||||
# Idempotent cleanup
|
# Idempotent cleanup
|
||||||
if cleanup_id:
|
if cleanup_id:
|
||||||
|
|||||||
@ -43,22 +43,20 @@ def test_password_grant_issues_valid_jwt(live_app):
|
|||||||
token = kc_admin.admin_token(live_app, password)
|
token = kc_admin.admin_token(live_app, password)
|
||||||
|
|
||||||
# Shape: a JWT is exactly 3 base64url segments
|
# Shape: a JWT is exactly 3 base64url segments
|
||||||
assert isinstance(token, str) and token.count(".") == 2, (
|
assert (
|
||||||
f"access_token does not look like a JWT (no 3 segments): len={len(token) if token else 0}"
|
isinstance(token, str) and token.count(".") == 2
|
||||||
)
|
), f"access_token does not look like a JWT (no 3 segments): len={len(token) if token else 0}"
|
||||||
|
|
||||||
payload = _decode_jwt_payload(token)
|
payload = _decode_jwt_payload(token)
|
||||||
|
|
||||||
# iss = the issuer URL, must be the per-run domain's /realms/master endpoint
|
# iss = the issuer URL, must be the per-run domain's /realms/master endpoint
|
||||||
expected_iss = f"https://{live_app}/realms/master"
|
expected_iss = f"https://{live_app}/realms/master"
|
||||||
assert payload.get("iss") == expected_iss, (
|
assert (
|
||||||
f"JWT iss claim {payload.get('iss')!r} != {expected_iss!r}"
|
payload.get("iss") == expected_iss
|
||||||
)
|
), f"JWT iss claim {payload.get('iss')!r} != {expected_iss!r}"
|
||||||
|
|
||||||
# azp = authorized party (which client requested this token)
|
# azp = authorized party (which client requested this token)
|
||||||
assert payload.get("azp") == "admin-cli", (
|
assert payload.get("azp") == "admin-cli", f"JWT azp claim {payload.get('azp')!r} != 'admin-cli'"
|
||||||
f"JWT azp claim {payload.get('azp')!r} != 'admin-cli'"
|
|
||||||
)
|
|
||||||
|
|
||||||
# typ = token type
|
# typ = token type
|
||||||
assert payload.get("typ") == "Bearer", f"JWT typ claim {payload.get('typ')!r} != 'Bearer'"
|
assert payload.get("typ") == "Bearer", f"JWT typ claim {payload.get('typ')!r} != 'Bearer'"
|
||||||
@ -70,6 +68,6 @@ def test_password_grant_issues_valid_jwt(live_app):
|
|||||||
|
|
||||||
# iat (issued at) is also a standard claim
|
# iat (issued at) is also a standard claim
|
||||||
iat = payload.get("iat")
|
iat = payload.get("iat")
|
||||||
assert isinstance(iat, int) and iat <= time.time() + 60, (
|
assert (
|
||||||
f"JWT iat {iat!r} not a reasonable past timestamp"
|
isinstance(iat, int) and iat <= time.time() + 60
|
||||||
)
|
), f"JWT iat {iat!r} not a reasonable past timestamp"
|
||||||
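The assertions in this hunk call `_decode_jwt_payload`, whose body the diff does not show. A minimal sketch of what such a helper might look like, assuming it only inspects the payload and does not verify the signature. The name `decode_jwt_payload`, the `_b64url` helper, and the hand-built token below are hypothetical and exist only for illustration:

```python
import base64
import json
import time


def decode_jwt_payload(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT without verifying the signature."""
    segments = token.split(".")
    if len(segments) != 3:
        raise ValueError(f"expected 3 segments, got {len(segments)}")
    payload_b64 = segments[1]
    # base64url encoding drops '=' padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (illustration only)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical token built by hand; the third segment is a dummy signature.
claims = {
    "iss": "https://app.example.test/realms/master",
    "azp": "admin-cli",
    "typ": "Bearer",
    "iat": int(time.time()),
}
token = ".".join([_b64url({"alg": "none"}), _b64url(claims), "sig"])

payload = decode_jwt_payload(token)
```

Because the signature is never checked, a helper like this is only suitable for shape and claim assertions in tests, not for trusting a token.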
|
|||||||
@ -2,5 +2,7 @@
|
|||||||
# conftest — enrolling this recipe needs NO change to runner/harness code (D5).
|
# conftest — enrolling this recipe needs NO change to runner/harness code (D5).
|
||||||
HEALTH_PATH = "/realms/master" # 200 JSON once keycloak is up (not "/", which redirects)
|
HEALTH_PATH = "/realms/master" # 200 JSON once keycloak is up (not "/", which redirects)
|
||||||
HEALTH_OK = (200,)
|
HEALTH_OK = (200,)
|
||||||
DEPLOY_TIMEOUT = 900 # JVM + DB migration are slow on a 2-vCPU VM; observed 502 fallback up to ~10min
|
DEPLOY_TIMEOUT = (
|
||||||
|
900 # JVM + DB migration are slow on a 2-vCPU VM; observed 502 fallback up to ~10min
|
||||||
|
)
|
||||||
HTTP_TIMEOUT = 900
|
HTTP_TIMEOUT = 900
|
||||||
|
|||||||
@ -8,7 +8,8 @@ import os
|
|||||||
import sys
|
import sys
|
||||||
|
|
||||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
||||||
from harness import browser as harness_browser, generic, lifecycle # noqa: E402
|
from harness import browser as harness_browser # noqa: E402
|
||||||
|
from harness import generic, lifecycle
|
||||||
|
|
||||||
|
|
||||||
def test_serving_and_admin_console(live_app, meta):
|
def test_serving_and_admin_console(live_app, meta):
|
||||||
|
|||||||
@ -28,9 +28,7 @@ def test_users_me_requires_auth(live_app):
|
|||||||
url = f"https://{live_app}/api/v1.0/users/me/"
|
url = f"https://{live_app}/api/v1.0/users/me/"
|
||||||
# Retry until the route answers 401 or 403: either status shows the route exists and auth is
|
# Retry until the route answers 401 or 403: either status shows the route exists and auth is
|
||||||
# required. Reject 200 (anonymous access) and 5xx (broken backend).
|
# required. Reject 200 (anonymous access) and 5xx (broken backend).
|
||||||
status, _ = harness_http.retry_http_get(
|
status, _ = harness_http.retry_http_get(url, expect_status=(401, 403), max_wait=60, interval=3)
|
||||||
url, expect_status=(401, 403), max_wait=60, interval=3
|
|
||||||
)
|
|
||||||
assert status in (401, 403), (
|
assert status in (401, 403), (
|
||||||
f"GET {url} returned {status}, expected 401 (auth required). "
|
f"GET {url} returned {status}, expected 401 (auth required). "
|
||||||
f"200 = anonymous access leaked; 404 = route missing; 5xx = backend broken."
|
f"200 = anonymous access leaked; 404 = route missing; 5xx = backend broken."
|
||||||
|
|||||||
@ -27,7 +27,8 @@ import uuid
|
|||||||
import pytest
|
import pytest
|
||||||
|
|
||||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
|
||||||
from harness import http as harness_http, sso # noqa: E402
|
from harness import http as harness_http # noqa: E402
|
||||||
|
from harness import sso
|
||||||
|
|
||||||
|
|
||||||
@pytest.mark.requires_deps
|
@pytest.mark.requires_deps
|
||||||
@ -36,13 +37,15 @@ def test_create_doc_and_read_back(live_app, deps_creds):
|
|||||||
kc = deps_creds["keycloak"]
|
kc = deps_creds["keycloak"]
|
||||||
|
|
||||||
# Obtain a JWT via OIDC password grant
|
# Obtain a JWT via OIDC password grant
|
||||||
access_token = sso.oidc_password_grant({
|
access_token = sso.oidc_password_grant(
|
||||||
"client_id": kc["client_id"],
|
{
|
||||||
"client_secret": kc["client_secret"],
|
"client_id": kc["client_id"],
|
||||||
"user": kc["user"],
|
"client_secret": kc["client_secret"],
|
||||||
"password": kc["password"],
|
"user": kc["user"],
|
||||||
"token_url": kc["token_url"],
|
"password": kc["password"],
|
||||||
})
|
"token_url": kc["token_url"],
|
||||||
|
}
|
||||||
|
)
|
||||||
auth = {"Authorization": f"Bearer {access_token}"}
|
auth = {"Authorization": f"Bearer {access_token}"}
|
||||||
|
|
||||||
# Create a doc with a unique title
|
# Create a doc with a unique title
|
||||||
@ -56,9 +59,9 @@ def test_create_doc_and_read_back(live_app, deps_creds):
|
|||||||
assert isinstance(body, dict), f"unexpected response shape: {body!r}"
|
assert isinstance(body, dict), f"unexpected response shape: {body!r}"
|
||||||
doc_id = body.get("id")
|
doc_id = body.get("id")
|
||||||
assert doc_id, f"created doc has no id: {body!r}"
|
assert doc_id, f"created doc has no id: {body!r}"
|
||||||
assert body.get("title") == title, (
|
assert (
|
||||||
f"created doc title mismatch: created={title!r}, response={body.get('title')!r}"
|
body.get("title") == title
|
||||||
)
|
), f"created doc title mismatch: created={title!r}, response={body.get('title')!r}"
|
||||||
|
|
||||||
# Fetch it back via the dedicated GET endpoint
|
# Fetch it back via the dedicated GET endpoint
|
||||||
s, fetched = harness_http.http_get(
|
s, fetched = harness_http.http_get(
|
||||||
@ -66,9 +69,10 @@ def test_create_doc_and_read_back(live_app, deps_creds):
|
|||||||
)
|
)
|
||||||
assert s == 200, f"GET /api/v1.0/documents/{doc_id}/ HTTP {s}: {fetched!r}"
|
assert s == 200, f"GET /api/v1.0/documents/{doc_id}/ HTTP {s}: {fetched!r}"
|
||||||
assert isinstance(fetched, dict), f"unexpected GET response: {fetched!r}"
|
assert isinstance(fetched, dict), f"unexpected GET response: {fetched!r}"
|
||||||
assert fetched.get("id") in (doc_id, str(doc_id)), (
|
assert fetched.get("id") in (
|
||||||
f"fetched id mismatch: created={doc_id!r}, fetched={fetched.get('id')!r}"
|
doc_id,
|
||||||
)
|
str(doc_id),
|
||||||
assert fetched.get("title") == title, (
|
), f"fetched id mismatch: created={doc_id!r}, fetched={fetched.get('id')!r}"
|
||||||
f"fetched title mismatch: created={title!r}, fetched={fetched.get('title')!r}"
|
assert (
|
||||||
)
|
fetched.get("title") == title
|
||||||
|
), f"fetched title mismatch: created={title!r}, fetched={fetched.get('title')!r}"
|
||||||
|
|||||||
@ -22,7 +22,11 @@ def test_lasuite_docs_returns_200(live_app):
|
|||||||
url = f"https://{live_app}/"
|
url = f"https://{live_app}/"
|
||||||
# accept 200 (frontend SPA shell) — lasuite-docs serves the SPA at root unauthenticated;
|
# accept 200 (frontend SPA shell) — lasuite-docs serves the SPA at root unauthenticated;
|
||||||
# the SPA itself bootstraps via /api/v1.0/users/me/ which requires OIDC (separate test).
|
# the SPA itself bootstraps via /api/v1.0/users/me/ which requires OIDC (separate test).
|
||||||
status, _ = harness_http.retry_http_get(url, expect_status=(200, 301, 302), max_wait=60, interval=3)
|
status, _ = harness_http.retry_http_get(
|
||||||
assert status in (200, 301, 302), (
|
url, expect_status=(200, 301, 302), max_wait=60, interval=3
|
||||||
f"lasuite-docs at {url} returned HTTP {status} (expected 200/301/302)"
|
|
||||||
)
|
)
|
||||||
|
assert status in (
|
||||||
|
200,
|
||||||
|
301,
|
||||||
|
302,
|
||||||
|
), f"lasuite-docs at {url} returned HTTP {status} (expected 200/301/302)"
|
||||||
|
|||||||
@ -25,7 +25,8 @@ import urllib.request
|
|||||||
import pytest
|
import pytest
|
||||||
|
|
||||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
|
||||||
from harness import http as harness_http, sso # noqa: E402
|
from harness import http as harness_http # noqa: E402
|
||||||
|
from harness import sso
|
||||||
|
|
||||||
_CTX = ssl.create_default_context()
|
_CTX = ssl.create_default_context()
|
||||||
_CTX.check_hostname = False
|
_CTX.check_hostname = False
|
||||||
@ -61,9 +62,9 @@ def test_oidc_login_via_keycloak(live_app, deps_creds):
|
|||||||
# 302 redirect. Both are valid "auth-required" indicators — accept either, but if a
|
# 302 redirect. Both are valid "auth-required" indicators — accept either, but if a
|
||||||
# redirect is returned it must point at the dep keycloak realm.
|
# redirect is returned it must point at the dep keycloak realm.
|
||||||
if status in (301, 302, 303, 307, 308):
|
if status in (301, 302, 303, 307, 308):
|
||||||
assert expected_prefix in (redirect or ""), (
|
assert expected_prefix in (
|
||||||
f"Docs redirected to {redirect!r}, expected to start with {expected_prefix!r}"
|
redirect or ""
|
||||||
)
|
), f"Docs redirected to {redirect!r}, expected to start with {expected_prefix!r}"
|
||||||
else:
|
else:
|
||||||
assert status in (401, 403), (
|
assert status in (401, 403), (
|
||||||
f"GET /api/v1.0/users/me/ unauth: HTTP {status}; expected redirect to keycloak "
|
f"GET /api/v1.0/users/me/ unauth: HTTP {status}; expected redirect to keycloak "
|
||||||
@ -88,6 +89,6 @@ def test_oidc_login_via_keycloak(live_app, deps_creds):
|
|||||||
)
|
)
|
||||||
assert status == 200, f"GET /api/v1.0/users/me/ with token HTTP {status}: {body!r}"
|
assert status == 200, f"GET /api/v1.0/users/me/ with token HTTP {status}: {body!r}"
|
||||||
assert isinstance(body, dict), f"unexpected response: {body!r}"
|
assert isinstance(body, dict), f"unexpected response: {body!r}"
|
||||||
assert body.get("email") == kc["email"], (
|
assert (
|
||||||
f"unexpected user email: got {body.get('email')!r}, expected {kc['email']!r}"
|
body.get("email") == kc["email"]
|
||||||
)
|
), f"unexpected user email: got {body.get('email')!r}, expected {kc['email']!r}"
|
||||||
|
|||||||
@ -42,9 +42,9 @@ def test_oidc_password_grant_against_dep_keycloak(live_app, deps_creds):
|
|||||||
# Sanity-check the creds shape — orchestrator-written
|
# Sanity-check the creds shape — orchestrator-written
|
||||||
assert kc["domain"]
|
assert kc["domain"]
|
||||||
# WC1: realm is per-run namespaced "<parent>-<6hex>" so concurrent dependents never collide.
|
# WC1: realm is per-run namespaced "<parent>-<6hex>" so concurrent dependents never collide.
|
||||||
assert re.fullmatch(r"lasuite-docs-[0-9a-f]{6}", kc["realm"]), (
|
assert re.fullmatch(
|
||||||
f"realm {kc['realm']!r} not the per-run namespaced form lasuite-docs-<6hex>"
|
r"lasuite-docs-[0-9a-f]{6}", kc["realm"]
|
||||||
)
|
), f"realm {kc['realm']!r} not the per-run namespaced form lasuite-docs-<6hex>"
|
||||||
assert kc["client_id"] == "lasuite-docs"
|
assert kc["client_id"] == "lasuite-docs"
|
||||||
assert isinstance(kc["client_secret"], str) and len(kc["client_secret"]) >= 16
|
assert isinstance(kc["client_secret"], str) and len(kc["client_secret"]) >= 16
|
||||||
assert isinstance(kc["password"], str) and len(kc["password"]) >= 16
|
assert isinstance(kc["password"], str) and len(kc["password"]) >= 16
|
||||||
@ -74,16 +74,14 @@ def test_oidc_password_grant_against_dep_keycloak(live_app, deps_creds):
|
|||||||
|
|
||||||
# Password grant → real JWT
|
# Password grant → real JWT
|
||||||
token = sso.oidc_password_grant(creds)
|
token = sso.oidc_password_grant(creds)
|
||||||
assert isinstance(token, str) and token.count(".") == 2, (
|
assert isinstance(token, str) and token.count(".") == 2, f"access_token is not a JWT: {token!r}"
|
||||||
f"access_token is not a JWT: {token!r}"
|
|
||||||
)
|
|
||||||
payload = json.loads(_b64url_decode(token.split(".")[1]))
|
payload = json.loads(_b64url_decode(token.split(".")[1]))
|
||||||
assert payload.get("iss") == expected_iss, f"JWT iss={payload.get('iss')!r} != {expected_iss!r}"
|
assert payload.get("iss") == expected_iss, f"JWT iss={payload.get('iss')!r} != {expected_iss!r}"
|
||||||
assert payload.get("azp") == kc["client_id"], (
|
assert (
|
||||||
f"JWT azp={payload.get('azp')!r} != {kc['client_id']!r}"
|
payload.get("azp") == kc["client_id"]
|
||||||
)
|
), f"JWT azp={payload.get('azp')!r} != {kc['client_id']!r}"
|
||||||
assert payload.get("typ") == "Bearer", f"JWT typ={payload.get('typ')!r} != 'Bearer'"
|
assert payload.get("typ") == "Bearer", f"JWT typ={payload.get('typ')!r} != 'Bearer'"
|
||||||
exp = payload.get("exp")
|
exp = payload.get("exp")
|
||||||
assert isinstance(exp, int) and exp > time.time(), (
|
assert (
|
||||||
f"JWT exp={exp!r} not a future timestamp (now={time.time():.0f})"
|
isinstance(exp, int) and exp > time.time()
|
||||||
)
|
), f"JWT exp={exp!r} not a future timestamp (now={time.time():.0f})"
|
||||||
|
|||||||
@ -21,15 +21,24 @@ set -euo pipefail
|
|||||||
|
|
||||||
: "${CCCI_APP_DOMAIN:?missing}"
|
: "${CCCI_APP_DOMAIN:?missing}"
|
||||||
: "${CCCI_DEPS_FILE:?missing}"
|
: "${CCCI_DEPS_FILE:?missing}"
|
||||||
test -s "$CCCI_DEPS_FILE" || { echo " setup_custom_tests: deps file empty"; exit 1; }
|
test -s "$CCCI_DEPS_FILE" || {
|
||||||
|
echo " setup_custom_tests: deps file empty"
|
||||||
|
exit 1
|
||||||
|
}
|
||||||
|
|
||||||
# Read keycloak dep info via jq
|
# Read keycloak dep info via jq
|
||||||
KC_DOMAIN=$(jq -r '.keycloak.domain' "$CCCI_DEPS_FILE")
|
KC_DOMAIN=$(jq -r '.keycloak.domain' "$CCCI_DEPS_FILE")
|
||||||
KC_REALM=$( jq -r '.keycloak.realm' "$CCCI_DEPS_FILE")
|
KC_REALM=$(jq -r '.keycloak.realm' "$CCCI_DEPS_FILE")
|
||||||
KC_CLIENT=$(jq -r '.keycloak.client_id' "$CCCI_DEPS_FILE")
|
KC_CLIENT=$(jq -r '.keycloak.client_id' "$CCCI_DEPS_FILE")
|
||||||
KC_SECRET=$(jq -r '.keycloak.client_secret' "$CCCI_DEPS_FILE")
|
KC_SECRET=$(jq -r '.keycloak.client_secret' "$CCCI_DEPS_FILE")
|
||||||
[ -n "$KC_DOMAIN" ] && [ "$KC_DOMAIN" != "null" ] || { echo " setup_custom_tests: no keycloak.domain in deps"; exit 1; }
|
if [ -z "$KC_DOMAIN" ] || [ "$KC_DOMAIN" = "null" ]; then
|
||||||
[ -n "$KC_SECRET" ] && [ "$KC_SECRET" != "null" ] || { echo " setup_custom_tests: no keycloak.client_secret"; exit 1; }
|
echo " setup_custom_tests: no keycloak.domain in deps"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
if [ -z "$KC_SECRET" ] || [ "$KC_SECRET" = "null" ]; then
|
||||||
|
echo " setup_custom_tests: no keycloak.client_secret"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
echo " lasuite-docs setup_custom_tests: wiring OIDC against keycloak dep ${KC_DOMAIN}"
|
echo " lasuite-docs setup_custom_tests: wiring OIDC against keycloak dep ${KC_DOMAIN}"
|
||||||
|
|
||||||
@ -39,12 +48,15 @@ echo " lasuite-docs setup_custom_tests: wiring OIDC against keycloak dep ${KC_D
|
|||||||
# update SECRET_OIDC_RPCS_VERSION in the .env to point at the new one.
|
# update SECRET_OIDC_RPCS_VERSION in the .env to point at the new one.
|
||||||
ENV_PATH="$HOME/.abra/servers/default/${CCCI_APP_DOMAIN}.env"
|
ENV_PATH="$HOME/.abra/servers/default/${CCCI_APP_DOMAIN}.env"
|
||||||
CUR_VER=$(grep -E '^\s*SECRET_OIDC_RPCS_VERSION=' "$ENV_PATH" | tail -1 | cut -d= -f2 | tr -d '"\r' || echo "v1")
|
CUR_VER=$(grep -E '^\s*SECRET_OIDC_RPCS_VERSION=' "$ENV_PATH" | tail -1 | cut -d= -f2 | tr -d '"\r' || echo "v1")
|
||||||
NEW_NUM=$(( ${CUR_VER#v} + 1 ))
|
NEW_NUM=$((${CUR_VER#v} + 1))
|
||||||
NEW_VER="v${NEW_NUM}"
|
NEW_VER="v${NEW_NUM}"
|
||||||
|
|
||||||
INSERT_LOG=$(abra app secret insert $CCCI_APP_DOMAIN oidc_rpcs $NEW_VER $KC_SECRET --no-input -C -o 2>&1) \
|
INSERT_LOG=$(abra app secret insert "$CCCI_APP_DOMAIN" oidc_rpcs "$NEW_VER" "$KC_SECRET" --no-input -C -o 2>&1) ||
|
||||||
|| INSERT_LOG=$(script -qec "abra app secret insert $CCCI_APP_DOMAIN oidc_rpcs $NEW_VER $KC_SECRET --no-input -C -o" /dev/null 2>&1) \
|
INSERT_LOG=$(script -qec "abra app secret insert $CCCI_APP_DOMAIN oidc_rpcs $NEW_VER $KC_SECRET --no-input -C -o" /dev/null 2>&1) ||
|
||||||
|| { echo " setup_custom_tests: abra app secret insert oidc_rpcs@$NEW_VER failed: $INSERT_LOG"; exit 1; }
|
{
|
||||||
|
echo " setup_custom_tests: abra app secret insert oidc_rpcs@$NEW_VER failed: $INSERT_LOG"
|
||||||
|
exit 1
|
||||||
|
}
|
||||||
# Repoint the env var to the new version
|
# Repoint the env var to the new version
|
||||||
sed -i "s|^\s*SECRET_OIDC_RPCS_VERSION=.*|SECRET_OIDC_RPCS_VERSION=$NEW_VER|" "$ENV_PATH"
|
sed -i "s|^\s*SECRET_OIDC_RPCS_VERSION=.*|SECRET_OIDC_RPCS_VERSION=$NEW_VER|" "$ENV_PATH"
|
||||||
echo " setup_custom_tests: oidc_rpcs secret inserted at $NEW_VER (was $CUR_VER)"
|
echo " setup_custom_tests: oidc_rpcs secret inserted at $NEW_VER (was $CUR_VER)"
|
||||||
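The version bump in this hunk (read the last `SECRET_OIDC_RPCS_VERSION` value, strip the `v`, add one) can be sketched in Python as follows. This is an illustrative equivalent of the shell pipeline, not code from the repository; `next_secret_version` is a hypothetical name, and the fallback to `v1` mirrors the `|| echo "v1"` branch:

```python
import re


def next_secret_version(env_text: str, default: str = "v1") -> str:
    """Return the successor of the last SECRET_OIDC_RPCS_VERSION value, e.g. v1 -> v2."""
    matches = re.findall(
        r"^\s*SECRET_OIDC_RPCS_VERSION=(.*)$", env_text, flags=re.MULTILINE
    )
    # Like `tail -1 | tr -d '"\r'`: take the last match and drop quotes and stray CRs.
    current = matches[-1].strip().strip('"') if matches else default
    number = int(current[1:] if current.startswith("v") else current)
    return f"v{number + 1}"


# Hypothetical .env content for illustration.
sample_env = 'TIMEOUT=900\nSECRET_OIDC_RPCS_VERSION=v1\n'
new_version = next_secret_version(sample_env)
```

Unlike the shell version, this sketch raises `ValueError` when the value is not numeric after the `v`, which makes a malformed `.env` fail loudly.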
@ -52,25 +64,25 @@ echo " setup_custom_tests: oidc_rpcs secret inserted at $NEW_VER (was $CUR_VER)
|
|||||||
# 2) Write OIDC env vars to the app's .env (names per lasuite-docs's .env.sample).
|
# 2) Write OIDC env vars to the app's .env (names per lasuite-docs's .env.sample).
|
||||||
# Ensure the file ends with a newline FIRST so our appends don't concatenate onto the last line
|
# Ensure the file ends with a newline FIRST so our appends don't concatenate onto the last line
|
||||||
# (we saw `TIMEOUT=900OIDC_REALM=...` malformed by a missing-trailing-newline file).
|
# (we saw `TIMEOUT=900OIDC_REALM=...` malformed by a missing-trailing-newline file).
|
||||||
[ -z "$(tail -c1 "$ENV_PATH" 2>/dev/null)" ] || printf '\n' >> "$ENV_PATH"
|
[ -z "$(tail -c1 "$ENV_PATH" 2>/dev/null)" ] || printf '\n' >>"$ENV_PATH"
|
||||||
write_env () {
|
write_env() {
|
||||||
local key="$1" val="$2"
|
local key="$1" val="$2"
|
||||||
# remove any existing key (commented or live) then append the live key=val
|
# remove any existing key (commented or live) then append the live key=val
|
||||||
sed -i "/^\s*#\?\s*${key}=/d" "$ENV_PATH"
|
sed -i "/^\s*#\?\s*${key}=/d" "$ENV_PATH"
|
||||||
# Re-ensure trailing newline after each delete (sed may leave the file without one)
|
# Re-ensure trailing newline after each delete (sed may leave the file without one)
|
||||||
[ -z "$(tail -c1 "$ENV_PATH" 2>/dev/null)" ] || printf '\n' >> "$ENV_PATH"
|
[ -z "$(tail -c1 "$ENV_PATH" 2>/dev/null)" ] || printf '\n' >>"$ENV_PATH"
|
||||||
printf '%s=%s\n' "$key" "$val" >> "$ENV_PATH"
|
printf '%s=%s\n' "$key" "$val" >>"$ENV_PATH"
|
||||||
}
|
}
|
||||||
write_env OIDC_REALM "$KC_REALM"
|
write_env OIDC_REALM "$KC_REALM"
|
||||||
write_env OIDC_OP_DISCOVERY_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/.well-known/openid-configuration"
|
write_env OIDC_OP_DISCOVERY_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/.well-known/openid-configuration"
|
||||||
write_env OIDC_OP_AUTHORIZATION_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/auth"
|
write_env OIDC_OP_AUTHORIZATION_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/auth"
|
||||||
write_env OIDC_OP_TOKEN_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/token"
|
write_env OIDC_OP_TOKEN_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/token"
|
||||||
write_env OIDC_OP_USER_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/userinfo"
|
write_env OIDC_OP_USER_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/userinfo"
|
||||||
write_env OIDC_OP_LOGOUT_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/logout"
|
write_env OIDC_OP_LOGOUT_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/logout"
|
||||||
write_env OIDC_OP_JWKS_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/certs"
|
write_env OIDC_OP_JWKS_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/certs"
|
||||||
write_env OIDC_RP_CLIENT_ID "$KC_CLIENT"
|
write_env OIDC_RP_CLIENT_ID "$KC_CLIENT"
|
||||||
write_env OIDC_RP_SIGN_ALGO "RS256"
|
write_env OIDC_RP_SIGN_ALGO "RS256"
|
||||||
write_env OIDC_RP_SCOPES "openid email profile"
|
write_env OIDC_RP_SCOPES "openid email profile"
|
||||||
|
|
||||||
# 3) Trigger an in-place redeploy so the env update takes effect. --force re-deploys even when
|
# 3) Trigger an in-place redeploy so the env update takes effect. --force re-deploys even when
|
||||||
# the recipe hasn't changed; --chaos avoids the chaos prompt; --no-input non-interactive.
|
# the recipe hasn't changed; --chaos avoids the chaos prompt; --no-input non-interactive.
|
||||||
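The `write_env` helper in the hunk above deletes any existing line for a key (live or commented out), makes sure the file ends with a newline, and then appends `key=val`. A rough Python sketch of the same idea, assuming the `.env` file is small enough to rewrite whole; the function and the sample file are illustrative, not part of the harness:

```python
import re
import tempfile
from pathlib import Path


def write_env(env_path: Path, key: str, val: str) -> None:
    """Drop any existing `key=` line (live or commented), then append key=val."""
    pattern = re.compile(rf"^\s*#?\s*{re.escape(key)}=")
    text = env_path.read_text() if env_path.exists() else ""
    # splitlines() tolerates a missing trailing newline, so the appended key
    # can never be glued onto the previous line.
    kept = [line for line in text.splitlines() if not pattern.match(line)]
    kept.append(f"{key}={val}")
    env_path.write_text("\n".join(kept) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    env = Path(tmp) / "app.env"
    env.write_text("#OIDC_REALM=old\nTIMEOUT=900")  # note: no trailing newline
    write_env(env, "OIDC_REALM", "lasuite-docs-abc123")
    result = env.read_text()
```

Rewriting the file in one pass sidesteps the `TIMEOUT=900OIDC_REALM=...` concatenation described in the script's comment, at the cost of not being an in-place edit like `sed -i`.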
|
|||||||
@ -10,7 +10,8 @@ import os
|
|||||||
import sys
|
import sys
|
||||||
|
|
||||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
||||||
from harness import browser as harness_browser, generic, lifecycle # noqa: E402
|
from harness import browser as harness_browser # noqa: E402
|
||||||
|
from harness import generic, lifecycle
|
||||||
|
|
||||||
|
|
||||||
def test_serving_and_frontend(live_app, meta):
|
def test_serving_and_frontend(live_app, meta):
|
||||||
|
|||||||
@ -25,6 +25,8 @@ def test_lasuite_drive_returns_200(live_app):
|
|||||||
status, _ = harness_http.retry_http_get(
|
status, _ = harness_http.retry_http_get(
|
||||||
url, expect_status=(200, 301, 302), max_wait=60, interval=3
|
url, expect_status=(200, 301, 302), max_wait=60, interval=3
|
||||||
)
|
)
|
||||||
assert status in (200, 301, 302), (
|
assert status in (
|
||||||
f"lasuite-drive at {url} returned HTTP {status} (expected 200/301/302)"
|
200,
|
||||||
)
|
301,
|
||||||
|
302,
|
||||||
|
), f"lasuite-drive at {url} returned HTTP {status} (expected 200/301/302)"
|
||||||
|
|||||||
@ -29,8 +29,8 @@ BUCKET = "drive-media-storage"
|
|||||||
def _mc(domain: str, script: str) -> str:
|
def _mc(domain: str, script: str) -> str:
|
||||||
"""Run an `mc` shell script inside the minio container (root creds from /run/secrets)."""
|
"""Run an `mc` shell script inside the minio container (root creds from /run/secrets)."""
|
||||||
prelude = (
|
prelude = (
|
||||||
'set -e; '
|
"set -e; "
|
||||||
'U=$(cat /run/secrets/minio_ru); P=$(cat /run/secrets/minio_rp); '
|
"U=$(cat /run/secrets/minio_ru); P=$(cat /run/secrets/minio_rp); "
|
||||||
'mc alias set ccci http://localhost:9000 "$U" "$P" >/dev/null 2>&1; '
|
'mc alias set ccci http://localhost:9000 "$U" "$P" >/dev/null 2>&1; '
|
||||||
)
|
)
|
||||||
return lifecycle.exec_in_app(domain, ["sh", "-c", prelude + script], service="minio")
|
return lifecycle.exec_in_app(domain, ["sh", "-c", prelude + script], service="minio")
|
||||||
@ -49,13 +49,13 @@ def test_minio_bucket_present_and_object_roundtrip(live_app):
|
|||||||
domain,
|
domain,
|
||||||
# upload via stdin; list the object; read it back (tagged); then delete.
|
# upload via stdin; list the object; read it back (tagged); then delete.
|
||||||
f'printf %s "{marker}" | mc pipe ccci/{BUCKET}/{key} >/dev/null 2>&1; '
|
f'printf %s "{marker}" | mc pipe ccci/{BUCKET}/{key} >/dev/null 2>&1; '
|
||||||
f'mc ls ccci/{BUCKET}/{key}; '
|
f"mc ls ccci/{BUCKET}/{key}; "
|
||||||
f'echo "READBACK:$(mc cat ccci/{BUCKET}/{key})"; '
|
f'echo "READBACK:$(mc cat ccci/{BUCKET}/{key})"; '
|
||||||
f'mc rm ccci/{BUCKET}/{key} >/dev/null 2>&1',
|
f"mc rm ccci/{BUCKET}/{key} >/dev/null 2>&1",
|
||||||
)
|
)
|
||||||
|
|
||||||
# The object was listed (its key appears) and its content round-tripped intact.
|
# The object was listed (its key appears) and its content round-tripped intact.
|
||||||
assert f"{marker}.txt" in out, f"uploaded object not listed in bucket: {out!r}"
|
assert f"{marker}.txt" in out, f"uploaded object not listed in bucket: {out!r}"
|
||||||
assert f"READBACK:{marker}" in out, (
|
assert (
|
||||||
f"object content did not round-trip through MinIO; got: {out!r}"
|
f"READBACK:{marker}" in out
|
||||||
)
|
), f"object content did not round-trip through MinIO; got: {out!r}"
|
||||||
|
|||||||
@ -46,9 +46,9 @@ def test_oidc_password_grant_against_dep_keycloak(live_app, deps_creds):
|
|||||||
|
|
||||||
# Creds shape. WC1: realm is per-run namespaced "<parent>-<6hex>"; client_id stays the parent.
|
# Creds shape. WC1: realm is per-run namespaced "<parent>-<6hex>"; client_id stays the parent.
|
||||||
assert kc["domain"]
|
assert kc["domain"]
|
||||||
assert re.fullmatch(r"lasuite-drive-[0-9a-f]{6}", kc["realm"]), (
|
assert re.fullmatch(
|
||||||
f"realm {kc['realm']!r} not the per-run namespaced form lasuite-drive-<6hex>"
|
r"lasuite-drive-[0-9a-f]{6}", kc["realm"]
|
||||||
)
|
), f"realm {kc['realm']!r} not the per-run namespaced form lasuite-drive-<6hex>"
|
||||||
assert kc["client_id"] == "lasuite-drive"
|
assert kc["client_id"] == "lasuite-drive"
|
||||||
assert isinstance(kc["client_secret"], str) and len(kc["client_secret"]) >= 16
|
assert isinstance(kc["client_secret"], str) and len(kc["client_secret"]) >= 16
|
||||||
assert isinstance(kc["password"], str) and len(kc["password"]) >= 16
|
assert isinstance(kc["password"], str) and len(kc["password"]) >= 16
|
||||||
@ -77,16 +77,14 @@ def test_oidc_password_grant_against_dep_keycloak(live_app, deps_creds):
|
|||||||
|
|
||||||
# Password grant → real JWT
|
# Password grant → real JWT
|
||||||
token = sso.oidc_password_grant(creds)
|
token = sso.oidc_password_grant(creds)
|
||||||
assert isinstance(token, str) and token.count(".") == 2, (
|
assert isinstance(token, str) and token.count(".") == 2, f"access_token is not a JWT: {token!r}"
|
||||||
f"access_token is not a JWT: {token!r}"
|
|
||||||
)
|
|
||||||
payload = json.loads(_b64url_decode(token.split(".")[1]))
|
payload = json.loads(_b64url_decode(token.split(".")[1]))
|
||||||
assert payload.get("iss") == expected_iss, f"JWT iss={payload.get('iss')!r} != {expected_iss!r}"
|
assert payload.get("iss") == expected_iss, f"JWT iss={payload.get('iss')!r} != {expected_iss!r}"
|
||||||
assert payload.get("azp") == kc["client_id"], (
|
assert (
|
||||||
f"JWT azp={payload.get('azp')!r} != {kc['client_id']!r}"
|
payload.get("azp") == kc["client_id"]
|
||||||
)
|
), f"JWT azp={payload.get('azp')!r} != {kc['client_id']!r}"
|
||||||
assert payload.get("typ") == "Bearer", f"JWT typ={payload.get('typ')!r} != 'Bearer'"
|
assert payload.get("typ") == "Bearer", f"JWT typ={payload.get('typ')!r} != 'Bearer'"
|
||||||
exp = payload.get("exp")
|
exp = payload.get("exp")
|
||||||
assert isinstance(exp, int) and exp > time.time(), (
|
assert (
|
||||||
f"JWT exp={exp!r} not a future timestamp (now={time.time():.0f})"
|
isinstance(exp, int) and exp > time.time()
|
||||||
)
|
), f"JWT exp={exp!r} not a future timestamp (now={time.time():.0f})"
|
||||||
|
|||||||
@ -28,7 +28,7 @@ if [ -z "${CCCI_DEPS_FILE:-}" ] || [ ! -s "${CCCI_DEPS_FILE}" ]; then
|
|||||||
exit 0
|
exit 0
|
||||||
fi
|
fi
|
||||||
KC_DOMAIN=$(jq -r '.keycloak.domain // empty' "$CCCI_DEPS_FILE")
|
KC_DOMAIN=$(jq -r '.keycloak.domain // empty' "$CCCI_DEPS_FILE")
|
||||||
KC_REALM=$( jq -r '.keycloak.realm // empty' "$CCCI_DEPS_FILE")
|
KC_REALM=$(jq -r '.keycloak.realm // empty' "$CCCI_DEPS_FILE")
|
||||||
KC_CLIENT=$(jq -r '.keycloak.client_id // empty' "$CCCI_DEPS_FILE")
|
KC_CLIENT=$(jq -r '.keycloak.client_id // empty' "$CCCI_DEPS_FILE")
|
||||||
KC_SECRET=$(jq -r '.keycloak.client_secret // empty' "$CCCI_DEPS_FILE")
|
KC_SECRET=$(jq -r '.keycloak.client_secret // empty' "$CCCI_DEPS_FILE")
|
||||||
if [ -z "$KC_DOMAIN" ] || [ -z "$KC_SECRET" ]; then
|
if [ -z "$KC_DOMAIN" ] || [ -z "$KC_SECRET" ]; then
|
||||||
@ -43,35 +43,38 @@ echo " lasuite-drive install_steps: wiring OIDC at install against keycloak ${K
|
|||||||
# point SECRET_OIDC_RPCS_VERSION at it. (The app is not deployed yet — a swarm secret can be created
|
# point SECRET_OIDC_RPCS_VERSION at it. (The app is not deployed yet — a swarm secret can be created
|
||||||
# independently of a running stack — so the single deploy below picks up v2.)
|
# independently of a running stack — so the single deploy below picks up v2.)
|
||||||
CUR_VER=$(grep -E '^\s*SECRET_OIDC_RPCS_VERSION=' "$ENV_PATH" | tail -1 | cut -d= -f2 | tr -d '"\r' || echo "v1")
|
CUR_VER=$(grep -E '^\s*SECRET_OIDC_RPCS_VERSION=' "$ENV_PATH" | tail -1 | cut -d= -f2 | tr -d '"\r' || echo "v1")
|
||||||
NEW_NUM=$(( ${CUR_VER#v} + 1 ))
|
NEW_NUM=$((${CUR_VER#v} + 1))
|
||||||
NEW_VER="v${NEW_NUM}"
|
NEW_VER="v${NEW_NUM}"
|
||||||
INSERT_LOG=$(abra app secret insert "$CCCI_APP_DOMAIN" oidc_rpcs "$NEW_VER" "$KC_SECRET" --no-input -C -o 2>&1) \
|
INSERT_LOG=$(abra app secret insert "$CCCI_APP_DOMAIN" oidc_rpcs "$NEW_VER" "$KC_SECRET" --no-input -C -o 2>&1) ||
|
||||||
|| INSERT_LOG=$(script -qec "abra app secret insert $CCCI_APP_DOMAIN oidc_rpcs $NEW_VER $KC_SECRET --no-input -C -o" /dev/null 2>&1) \
|
INSERT_LOG=$(script -qec "abra app secret insert $CCCI_APP_DOMAIN oidc_rpcs $NEW_VER $KC_SECRET --no-input -C -o" /dev/null 2>&1) ||
|
||||||
|| { echo " install_steps: abra app secret insert oidc_rpcs@$NEW_VER failed: $INSERT_LOG"; exit 1; }
|
{
|
||||||
|
echo " install_steps: abra app secret insert oidc_rpcs@$NEW_VER failed: $INSERT_LOG"
|
||||||
|
exit 1
|
||||||
|
}
|
||||||
sed -i "s|^\s*SECRET_OIDC_RPCS_VERSION=.*|SECRET_OIDC_RPCS_VERSION=$NEW_VER|" "$ENV_PATH"
|
sed -i "s|^\s*SECRET_OIDC_RPCS_VERSION=.*|SECRET_OIDC_RPCS_VERSION=$NEW_VER|" "$ENV_PATH"
|
||||||
echo " install_steps: oidc_rpcs secret inserted at $NEW_VER (was $CUR_VER)"
|
echo " install_steps: oidc_rpcs secret inserted at $NEW_VER (was $CUR_VER)"
|
||||||
|
|
||||||
# 2) Write the OIDC env vars (explicit endpoints — deterministic, no reliance on ${AUTH_DOMAIN}
|
# 2) Write the OIDC env vars (explicit endpoints — deterministic, no reliance on ${AUTH_DOMAIN}
|
||||||
# expansion). Mirrors the recipe-maintainer impress/La Suite OIDC env contract.
|
# expansion). Mirrors the recipe-maintainer impress/La Suite OIDC env contract.
|
||||||
write_env () {
|
write_env() {
|
||||||
local key="$1" val="$2"
|
local key="$1" val="$2"
|
||||||
sed -i "/^\s*#\?\s*${key}=/d" "$ENV_PATH"
|
sed -i "/^\s*#\?\s*${key}=/d" "$ENV_PATH"
|
||||||
[ -z "$(tail -c1 "$ENV_PATH" 2>/dev/null)" ] || printf '\n' >> "$ENV_PATH"
|
[ -z "$(tail -c1 "$ENV_PATH" 2>/dev/null)" ] || printf '\n' >>"$ENV_PATH"
|
||||||
printf '%s=%s\n' "$key" "$val" >> "$ENV_PATH"
|
printf '%s=%s\n' "$key" "$val" >>"$ENV_PATH"
|
||||||
}
|
}
|
||||||
write_env AUTH_DOMAIN "$KC_DOMAIN"
|
write_env AUTH_DOMAIN "$KC_DOMAIN"
|
||||||
write_env OIDC_REALM "$KC_REALM"
|
write_env OIDC_REALM "$KC_REALM"
|
||||||
write_env OIDC_OP_JWKS_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/certs"
|
write_env OIDC_OP_JWKS_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/certs"
|
||||||
write_env OIDC_OP_AUTHORIZATION_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/auth"
|
write_env OIDC_OP_AUTHORIZATION_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/auth"
|
||||||
write_env OIDC_OP_TOKEN_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/token"
|
write_env OIDC_OP_TOKEN_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/token"
|
||||||
write_env OIDC_OP_USER_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/userinfo"
|
write_env OIDC_OP_USER_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/userinfo"
|
||||||
write_env OIDC_OP_LOGOUT_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/logout"
|
write_env OIDC_OP_LOGOUT_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/logout"
|
||||||
write_env OIDC_RP_CLIENT_ID "$KC_CLIENT"
|
write_env OIDC_RP_CLIENT_ID "$KC_CLIENT"
|
||||||
write_env OIDC_RP_SIGN_ALGO "RS256"
|
write_env OIDC_RP_SIGN_ALGO "RS256"
|
||||||
write_env OIDC_RP_SCOPES "openid email profile"
|
write_env OIDC_RP_SCOPES "openid email profile"
|
||||||
write_env OIDC_REDIRECT_ALLOWED_HOSTS "[\"https://${KC_DOMAIN}\", \"https://${CCCI_APP_DOMAIN}\"]"
|
write_env OIDC_REDIRECT_ALLOWED_HOSTS "[\"https://${KC_DOMAIN}\", \"https://${CCCI_APP_DOMAIN}\"]"
|
||||||
# The recipe default acr_values=eidas1 is FranceConnect-specific; keycloak can't satisfy it and it
|
# The recipe default acr_values=eidas1 is FranceConnect-specific; keycloak can't satisfy it and it
|
||||||
# would break the interactive auth flow. Clear it so the keycloak OIDC client works.
|
# would break the interactive auth flow. Clear it so the keycloak OIDC client works.
|
||||||
write_env OIDC_AUTH_REQUEST_EXTRA_PARAMS "{}"
|
write_env OIDC_AUTH_REQUEST_EXTRA_PARAMS "{}"
|
||||||
|
|
||||||
echo " lasuite-drive install_steps: OIDC env wired into .env (deploy will pick it up, no reconverge)"
|
echo " lasuite-drive install_steps: OIDC env wired into .env (deploy will pick it up, no reconverge)"
|
||||||
|
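The `write_env` helper in the hunk above deletes any existing (or commented-out) assignment of a key and then appends the new one, so repeated calls leave one line per key. A minimal sketch of that behaviour against a throwaway file follows. It assumes GNU sed (for `\s`, `\?` and `-i`); the temp-file `ENV_PATH`, the `example.test` hostnames and the `KEEP` key are illustrative assumptions, not values from the recipe.

```shell
#!/usr/bin/env bash
# Sketch only: exercises a write_env-style upsert against a throwaway file.
# Assumes GNU sed; ENV_PATH is a temp file here, not the recipe's real .env.
set -eu

ENV_PATH="$(mktemp)"
# Seed: a commented-out default, a stale value, and no trailing newline.
printf '#OIDC_REALM=default\nAUTH_DOMAIN=old.example.test\nKEEP=1' >"$ENV_PATH"

write_env() {
  local key="$1" val="$2"
  # Drop any existing (possibly commented-out) assignment of this key.
  sed -i "/^\s*#\?\s*${key}=/d" "$ENV_PATH"
  # Make sure the file ends with a newline before appending.
  [ -z "$(tail -c1 "$ENV_PATH" 2>/dev/null)" ] || printf '\n' >>"$ENV_PATH"
  printf '%s=%s\n' "$key" "$val" >>"$ENV_PATH"
}

write_env AUTH_DOMAIN "kc.example.test"
write_env OIDC_REALM "ci"
write_env AUTH_DOMAIN "kc.example.test" # repeat call must not duplicate the key

cat "$ENV_PATH"
```

Unrelated keys such as `KEEP` are left alone. Note that the key is interpolated into the sed pattern unescaped, which is fine for plain `A-Z_` names like the ones used here but would need escaping for keys containing regex metacharacters.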
|||||||
@ -29,7 +29,7 @@ docker service scale --detach "${STACK}_minio-createbuckets=1" >/dev/null 2>&1 |
|
|||||||
for i in $(seq 1 30); do
|
for i in $(seq 1 30); do
|
||||||
MC_CID=$(docker ps -q -f "name=${STACK}_minio.1" | head -1)
|
MC_CID=$(docker ps -q -f "name=${STACK}_minio.1" | head -1)
|
||||||
if [ -n "$MC_CID" ] && docker exec "$MC_CID" sh -c \
|
if [ -n "$MC_CID" ] && docker exec "$MC_CID" sh -c \
|
||||||
'mc alias set _c http://localhost:9000 "$(cat /run/secrets/minio_ru)" "$(cat /run/secrets/minio_rp)" >/dev/null 2>&1 && mc ls _c/drive-media-storage >/dev/null 2>&1'; then
|
'mc alias set _c http://localhost:9000 "$(cat /run/secrets/minio_ru)" "$(cat /run/secrets/minio_rp)" >/dev/null 2>&1 && mc ls _c/drive-media-storage >/dev/null 2>&1'; then
|
||||||
echo " setup: bucket drive-media-storage present after ${i} poll(s)"
|
echo " setup: bucket drive-media-storage present after ${i} poll(s)"
|
||||||
break
|
break
|
||||||
fi
|
fi
|
||||||
|
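The hunk above polls up to 30 times for the `drive-media-storage` bucket and breaks out on the first success. The same bounded-poll pattern can be sketched generically as below. `poll_until`, the `probe` stand-in and the sleep between attempts are hypothetical: the visible hunk does not show how the real loop waits or what it does when all attempts fail.

```shell
#!/usr/bin/env bash
# Sketch only: a generic bounded poll in the style of the bucket check.
# poll_until is a hypothetical helper, not part of the recipe or the harness.
set -eu

# poll_until <max_attempts> <sleep_seconds> <command...>
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
poll_until() {
  local max="$1" delay="$2" i
  shift 2
  for i in $(seq 1 "$max"); do
    if "$@" >/dev/null 2>&1; then
      echo "  ready after ${i} poll(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "  not ready after ${max} poll(s)" >&2
  return 1
}

# Stand-in probe: fails twice, then succeeds on the third call.
ATTEMPTS=0
probe() {
  ATTEMPTS=$((ATTEMPTS + 1))
  [ "$ATTEMPTS" -ge 3 ]
}

poll_until 5 0 probe

# A probe that never succeeds exhausts its attempts and reports a timeout.
if poll_until 2 0 false; then
  RESULT=ready
else
  RESULT=timeout
fi
echo "  second poll result: ${RESULT}"
```

Returning a non-zero status on exhaustion lets the caller decide whether a missing resource is fatal, rather than falling through the loop silently.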
|||||||
@ -10,7 +10,8 @@ import os
|
|||||||
import sys
|
import sys
|
||||||
|
|
||||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
||||||
from harness import browser as harness_browser, generic, lifecycle # noqa: E402
|
from harness import browser as harness_browser # noqa: E402
|
||||||
|
from harness import generic, lifecycle
|
||||||
|
|
||||||
|
|
||||||
def test_serving_and_frontend(live_app, meta):
|
def test_serving_and_frontend(live_app, meta):
|
||||||
|
|||||||
@ -21,6 +21,8 @@ def test_lasuite_meet_returns_200(live_app):
|
|||||||
status, _ = harness_http.retry_http_get(
|
status, _ = harness_http.retry_http_get(
|
||||||
url, expect_status=(200, 301, 302), max_wait=60, interval=3
|
url, expect_status=(200, 301, 302), max_wait=60, interval=3
|
||||||
)
|
)
|
||||||
assert status in (200, 301, 302), (
|
assert status in (
|
||||||
f"lasuite-meet at {url} returned HTTP {status} (expected 200/301/302)"
|
200,
|
||||||
)
|
301,
|
||||||
|
302,
|
||||||
|
), f"lasuite-meet at {url} returned HTTP {status} (expected 200/301/302)"
|
||||||
|
|||||||
Some files were not shown because too many files have changed in this diff.