M4: harness + green install stage (custom-html + Playwright); guaranteed teardown; M4 CLAIMED
All checks were successful
continuous-integration/drone/push Build is passing

run_recipe_ci.py + conftest + abra/lifecycle wrappers + Nix python/playwright env.
deploy_app forces LETS_ENCRYPT_ENV='' (addresses A1). Short per-run domain scheme
for the 64-char swarm name limit. 2 passed; teardown leaves zero orphans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 00:23:52 +01:00
parent 796b642519
commit 38a145fd9c
13 changed files with 447 additions and 6 deletions


@ -39,8 +39,11 @@ Two single-writer sections (§6.1): Builder edits only `## Build backlog`; Adver
- [ ] Gate: M3 — live demo on scratch PR; auth enforced
### M4 — Harness + install stage
- [ ] run_recipe_ci.py + conftest; install stage for recipe #1 + Playwright assertion; teardown
- [ ] Gate: M4 — green install run, no orphaned app/volume
- [x] run_recipe_ci.py + conftest + harness (abra wrappers, lifecycle) + Nix python/playwright env
(cc-ci-run); install stage for recipe #1 (custom-html) + Playwright assertion; guaranteed teardown
- [x] Gate: M4 — green install run, no orphaned app/volume → CLAIMED 2026-05-27, awaiting Adversary.
Repro: `cd /root/cc-ci && RECIPE=custom-html PR=0 REF=m4demo cc-ci-run runner/run_recipe_ci.py`
→ 2 passed (http 200 + playwright); teardown leaves services/volumes/secrets/containers/env = 0.
### M5 — Upgrade + backup/restore stages
- [ ] Add upgrade + backup/restore stages for recipe #1


@ -91,6 +91,15 @@ Architecture decisions and dead-ends. One line of rationale each. (§0, §8)
cryptpad (stateful no-DB), keycloak (SSO/DB), matrix-synapse (DB+media), lasuite-docs (multi+S3),
bluesky-pds (TLS-passthrough), covering all five categories. Confirm during M4-M6.5.
- **Per-run app domain scheme — adapted (M4, deviates from plan §4.0).** Plan §4.0 wanted
`<recipe>-pr<n>-<short-sha>.ci.commoninternet.net`, but Docker swarm config/secret names
(`<stackname>_<resource>_<version>`) must be ≤ 64 chars and abra derives `<stackname>` from the
domain (dots→`_`, hyphens kept). `.ci.commoninternet.net` alone is 22 chars, so long recipe names
+ config names overflow 64 (hit with `custom-html-pr0-m4demo…_nginx_default_conf_v6` = 66). New
scheme: **`<recipe[:4]>-<6hex(recipe|pr|ref)>.ci.commoninternet.net`** (e.g. `cust-e084bd`) — short,
unique per run, collision-safe across recipes (full recipe in the hash). Human-readable recipe/PR/
ref context lives in the Drone build params + the PR comment, not the (ephemeral) domain.
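A minimal sketch of the length arithmetic behind this decision, assuming (as described above) that abra turns dots into `_`, keeps hyphens, and names swarm configs `<stackname>_<resource>_<version>`. The helper names are hypothetical, not code from the repo.

```python
import hashlib

SUFFIX = ".ci.commoninternet.net"
SWARM_NAME_MAX = 64  # Docker swarm config/secret name limit


def per_run_domain(recipe: str, pr: str, ref: str) -> str:
    """<recipe[:4]>-<6hex(recipe|pr|ref)> plus the CI suffix."""
    tag = "".join(c for c in recipe if c.isalnum())[:4].lower() or "local"
    digest = hashlib.sha1(f"{recipe}|{pr}|{ref}".encode()).hexdigest()[:6]
    return f"{tag}-{digest}{SUFFIX}"


def stack_name(domain: str) -> str:
    # abra's derivation: dots become underscores, hyphens are kept
    return domain.replace(".", "_")


def swarm_resource_name(domain: str, resource: str, version: str) -> str:
    return f"{stack_name(domain)}_{resource}_{version}"


def fits_swarm_limit(domain: str, resource: str, version: str) -> bool:
    return len(swarm_resource_name(domain, resource, version)) <= SWARM_NAME_MAX
```

For a `cust-<6hex>` domain the stack name is 33 characters, so an `nginx_default_conf` config at `v6` comes to 55, leaving 9 characters of headroom under the limit.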
## Risks
- **Disk — RESOLVED 2026-05-26.** Original 8.9 GiB root had only ~3.8 GiB free *and* a hard


@ -311,3 +311,32 @@ Recorded in STATUS ## Blocked with operator options (whitelist host, or I pivot
**Plan:** surface to operator; meanwhile proceed to M4 (harness + install stage) which doesn't depend
on the webhook (dev recipe-CI builds triggerable directly via the Drone API). Revisit M3 gate once the
host is whitelisted or via the polling fallback.
## 2026-05-27 — M4: harness + install stage green (custom-html), guaranteed teardown
**Built the harness:** `runner/harness/abra.py` (abra wrappers w/ gotchas: no --chaos on
undeploy/volume-remove, `-n` everywhere, parse `app ls -S -m` nested {server:{apps}}, timeouts),
`runner/harness/lifecycle.py` (deploy_app forcing `LETS_ENCRYPT_ENV=""` [A1], wait_healthy =
services-converged + HTTPS, teardown_app = undeploy+volume+secret+env-config, janitor for orphans),
`tests/conftest.py` (`deployed_app` session fixture with finalizer teardown; short unique domain),
`tests/custom-html/test_install.py` (HTTP 200 + Playwright/Chromium content assertion),
`runner/run_recipe_ci.py` (orchestrator: fetch recipe@REF, run stage pytest), `modules/harness.nix`
(`cc-ci-run` = Nix python3+pytest+playwright with PLAYWRIGHT_BROWSERS_PATH from nixpkgs).
**Bugs fixed en route (3):**
1. Swarm config name > 64 chars (long domain) → switched to short `<recipe[:4]>-<6hex>` domain
scheme (DECISIONS.md).
2. `services_converged` used wrong stack name (replaced hyphens) → abra keeps hyphens, only dots→_.
3. `http_get` connected to the gateway IP (drops SNI, gateway routes by SNI) → use the real URL
(resolves to gateway on cc-ci, correct SNI). Also teardown now removes the app .env config.
**Green run + teardown (commands + output):**
- `RECIPE=custom-html PR=0 REF=m4demo cc-ci-run runner/run_recipe_ci.py` →
`tests/custom-html/test_install.py::test_http_reachable PASSED`,
`::test_playwright_page PASSED` — **2 passed in 57.99s**.
- Leak check after: services 0 / volumes 0 / secrets 0 / containers 0 / env config removed. Clean.
**A1 addressed:** deploy_app forces `LETS_ENCRYPT_ENV=""` (no ACME) on every deploy. M4 CLAIMED.
**M3 still blocked** (Gitea webhook delivery — operator); no response yet. Next: M5 (upgrade +
backup/restore for custom-html), then wire the parameterized Drone pipeline (API-triggerable).
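The leak check above can be scripted. A hedged sketch, assuming every swarm resource the app owns carries a `<stackname>_` prefix (services, volumes, secrets and configs by swarm naming convention; task containers as `<stack>_<service>.<slot>.<id>`) and that the app config lives at `~/.abra/servers/<server>/<domain>.env`. `count_leftovers`, `docker_listings` and `env_config_left` are hypothetical helpers.

```python
import os
import subprocess

# Name-only listings of each resource kind, via docker CLI Go-template output.
LIST_COMMANDS = {
    "services": ["docker", "service", "ls", "--format", "{{.Name}}"],
    "volumes": ["docker", "volume", "ls", "--format", "{{.Name}}"],
    "secrets": ["docker", "secret", "ls", "--format", "{{.Name}}"],
    "configs": ["docker", "config", "ls", "--format", "{{.Name}}"],
    "containers": ["docker", "ps", "-a", "--format", "{{.Names}}"],
}


def count_leftovers(stack: str, listings: dict[str, str]) -> dict[str, int]:
    """Count listed names that belong to `stack` (they start with '<stack>_')."""
    prefix = f"{stack}_"
    return {
        kind: sum(1 for name in out.splitlines() if name.strip().startswith(prefix))
        for kind, out in listings.items()
    }


def docker_listings() -> dict[str, str]:
    """Run every listing command on the swarm manager (requires the docker CLI)."""
    return {
        kind: subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        for kind, cmd in LIST_COMMANDS.items()
    }


def env_config_left(domain: str, server: str = "default") -> bool:
    """True if abra's per-app .env config file still exists."""
    return os.path.exists(os.path.expanduser(f"~/.abra/servers/{server}/{domain}.env"))
```

A clean teardown shows up as every count in `count_leftovers(stack, docker_listings())` being zero and `env_config_left(domain)` returning false.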


@ -1,8 +1,9 @@
# STATUS — cc-ci Builder
**Phase:** M2 complete & CLAIMED → starting M3 (comment bridge). M0+M1 PASS (Adversary). M2 awaiting verdict.
**In-flight:** M3 — comment-bridge service (!testme webhook → Drone build trigger).
**Last updated:** 2026-05-26 (M2 claimed, green build #1)
**Phase:** M4 complete & CLAIMED. M0/M1/M2 PASS. M3 gate BLOCKED (Gitea webhook delivery; operator).
M4 awaiting verdict. Next: M5 (upgrade + backup/restore for custom-html).
**In-flight:** M5 — add upgrade + backup/restore stages for recipe #1.
**Last updated:** 2026-05-27 (M4 claimed; install stage green)
## Gates
- **Gate: M0 — CLAIMED, awaiting Adversary** (2026-05-26). Evidence: flake rebuilds cc-ci from repo
@ -45,7 +46,9 @@
the M1 manual custom-html deploy; `scripts/deploy-drone.sh` will too). Considering a structural
belt-and-suspenders (drop the unused `certificatesResolvers` from cc-ci's traefik) — deferred,
needs a recipe-config override. Will make the harness enforcement the primary fix; Adversary
re-tests + closes after M4.
re-tests + closes after M4. **Now enforced**: `harness.lifecycle.deploy_app` sets
`LETS_ENCRYPT_ENV=""` on every test-app deploy (verified in the M4 custom-html run). Adversary can
re-test + close A1.
## Notes
- **Disk RESOLVED:** operator grew the VM 8.9→**28 GiB** (22 GiB free) on 2026-05-26. Inodes


@ -13,6 +13,7 @@
../../modules/drone.nix
../../modules/drone-runner.nix
../../modules/bridge.nix
../../modules/harness.nix
];
# --- Tailscale (ACCESS-CRITICAL: do not break, this is the only route in) ---

modules/harness.nix Normal file

@ -0,0 +1,20 @@
# CI harness runtime (M4): a reproducible Python env with pytest + Playwright and the
# Nix-provided browsers, exposed as `cc-ci-run` on the host so the Drone exec pipeline (and
# manual dev) can run the harness with `cc-ci-run runner/run_recipe_ci.py`. Playwright on NixOS
# needs the browsers from nixpkgs (not a downloaded copy) via PLAYWRIGHT_BROWSERS_PATH.
{ pkgs, ... }:
let
  pyEnv = pkgs.python3.withPackages (ps: with ps; [ pytest playwright ]);
  ccciRun = pkgs.writeShellApplication {
    name = "cc-ci-run";
    runtimeInputs = [ pyEnv pkgs.abra pkgs.docker pkgs.git pkgs.coreutils ];
    text = ''
      export PLAYWRIGHT_BROWSERS_PATH=${pkgs.playwright-driver.browsers}
      export PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1
      exec ${pyEnv}/bin/python3 "$@"
    '';
  };
in
{
  environment.systemPackages = [ ccciRun ];
}
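`cc-ci-run` only exports environment variables, so a cheap preflight can catch a broken browser path before a Playwright test times out. A sketch, assuming the nixpkgs browsers output contains `chromium-<revision>` directories the way Playwright's own download cache does; `browsers_path_ok` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Optional


def browsers_path_ok(env: Optional[dict[str, str]] = None) -> tuple[bool, str]:
    """Check the Playwright env that cc-ci-run is expected to set up.

    Returns (ok, detail): detail is the newest chromium* entry on success,
    otherwise a human-readable reason for the failure.
    """
    env = dict(os.environ) if env is None else env
    path = env.get("PLAYWRIGHT_BROWSERS_PATH")
    if not path:
        return False, "PLAYWRIGHT_BROWSERS_PATH is not set"
    if env.get("PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD") != "1":
        return False, "PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD is not 1"
    root = Path(path)
    if not root.is_dir():
        return False, f"{path} is not a directory"
    chromium = sorted(p.name for p in root.iterdir() if p.name.startswith("chromium"))
    if not chromium:
        return False, f"no chromium-* build under {path}"
    return True, chromium[-1]
```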

runner/harness/abra.py Normal file

@ -0,0 +1,109 @@
"""Thin, robust wrappers around the `abra` CLI for the CI harness (plan §4.3).
Bakes in the known abra gotchas (re-verify per installed abra version, currently 0.13.0-beta):
- `abra app undeploy` / `abra app volume remove` do NOT accept `--chaos` → never pass it.
- plumb a `timeout` through secret generate/insert/remove calls.
- `abra app ls -S -m` returns nested {server: {apps: [...]}} — parse the inner structure.
- run non-interactively with `-n` (`--no-input`) everywhere.
"""
from __future__ import annotations
import json
import subprocess
from typing import Optional
ABRA = "abra"

class AbraError(RuntimeError):
    pass

def _run(args: list[str], timeout: int = 300, check: bool = True) -> subprocess.CompletedProcess:
    proc = subprocess.run(
        [ABRA, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if check and proc.returncode != 0:
        raise AbraError(f"abra {' '.join(args)} failed ({proc.returncode}):\n{proc.stdout}\n{proc.stderr}")
    return proc

def app_new(recipe: str, domain: str, server: str = "default", version: Optional[str] = None,
            secrets: bool = False) -> None:
    args = ["app", "new", recipe]
    if version:
        args.append(version)
    args += ["-s", server, "-D", domain, "-n"]
    if secrets:
        args.append("-S")
    _run(args)

def env_set(domain: str, key: str, value: str) -> None:
    """Set a key in the app's .env (abra has no setter; edit the file directly)."""
    import os
    import re
    path = os.path.expanduser(f"~/.abra/servers/default/{domain}.env")
    with open(path) as fh:
        lines = fh.read().splitlines()
    out, seen = [], False
    pat = re.compile(rf"^\s*#?\s*{re.escape(key)}=")
    for ln in lines:
        if pat.match(ln):
            out.append(f"{key}={value}")
            seen = True
        else:
            out.append(ln)
    if not seen:
        out.append(f"{key}={value}")
    with open(path, "w") as fh:
        fh.write("\n".join(out) + "\n")

def secret_generate(domain: str, timeout: int = 300) -> None:
    _run(["app", "secret", "generate", domain, "--all", "-n"], timeout=timeout, check=False)

def deploy(domain: str, chaos: bool = True, timeout: int = 900) -> None:
    args = ["app", "deploy", domain, "-n"]
    if chaos:
        args.append("-C")
    _run(args, timeout=timeout)

def undeploy(domain: str, timeout: int = 600) -> None:
    # NB: no --chaos here (unsupported).
    _run(["app", "undeploy", domain, "-n"], timeout=timeout, check=False)

def volume_remove(domain: str, timeout: int = 300) -> None:
    # NB: no --chaos here (unsupported); -f to skip prompts.
    _run(["app", "volume", "remove", domain, "-f", "-n"], timeout=timeout, check=False)

def secret_remove_all(domain: str, timeout: int = 300) -> None:
    _run(["app", "secret", "remove", domain, "--all", "-n"], timeout=timeout, check=False)

def app_config_remove(domain: str, server: str = "default") -> None:
    """Delete the app's .env config so a re-run can recreate it (teardown completeness)."""
    import os
    path = os.path.expanduser(f"~/.abra/servers/{server}/{domain}.env")
    try:
        os.remove(path)
    except FileNotFoundError:
        pass

def app_ls(server: str = "default") -> list[dict]:
    """Parse `abra app ls -S -m` nested {server: {apps: [...]}} structure."""
    proc = _run(["app", "ls", "-S", "-m", "-n"], check=False)
    try:
        data = json.loads(proc.stdout)
    except (ValueError, json.JSONDecodeError):
        return []
    node = data.get(server) or {}
    return node.get("apps", []) if isinstance(node, dict) else []
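Splitting the line rewrite out of `env_set` makes the A1 override testable without abra or a real `.env` file. A sketch: `set_env_key` is a hypothetical pure helper using the same regex, and the sample `.env` text in the checks is illustrative. Because the pattern also matches a commented-out `#KEY=` line, a sample default such as `#LETS_ENCRYPT_ENV=production` is turned into an active `LETS_ENCRYPT_ENV=`.

```python
import re


def set_env_key(text: str, key: str, value: str) -> str:
    """Pure form of env_set: rewrite any KEY= line (commented or not), else append one."""
    pat = re.compile(rf"^\s*#?\s*{re.escape(key)}=")
    out, seen = [], False
    for ln in text.splitlines():
        if pat.match(ln):
            out.append(f"{key}={value}")  # also un-comments a "#KEY=..." default
            seen = True
        else:
            out.append(ln)
    if not seen:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

`env_set` then reduces to reading the file, calling the helper, and writing the result back.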

runner/harness/lifecycle.py Normal file

@ -0,0 +1,104 @@
"""App lifecycle for the CI harness: deploy, wait-healthy, teardown, janitor (plan §4.3).
The teardown guarantee is sacred: a failed test must never leak an app/volume/secret into the
next run. Callers wrap deploy()/teardown() in try/finally (or a pytest finalizer).
"""
from __future__ import annotations
import ssl
import subprocess
import time
import urllib.error
import urllib.request
from . import abra

GATEWAY_IP = "143.244.213.108"  # *.ci.commoninternet.net -> gateway (TLS passthrough to cc-ci)

def deploy_app(recipe: str, domain: str, version: str | None = None, secrets: bool = True) -> None:
    """Create + configure + deploy an app. Forces LETS_ENCRYPT_ENV='' so traefik serves the
    wildcard cert via the file provider and NEVER attempts ACME (adversary finding A1)."""
    abra.app_config_remove(domain)  # clear any stale .env from a prior crashed run
    abra.app_new(recipe, domain, version=version, secrets=secrets)
    abra.env_set(domain, "LETS_ENCRYPT_ENV", "")
    if secrets:
        abra.secret_generate(domain)
    abra.deploy(domain)

def _stack_name(domain: str) -> str:
    # abra derives the swarm stack name from the domain by replacing dots with underscores
    # and KEEPING hyphens (e.g. custom-html-x.ci.commoninternet.net -> custom-html-x_ci_...).
    return domain.replace(".", "_")

def services_converged(domain: str) -> bool:
    """True when every service in the stack reports replicas N/N (N>0)."""
    stack = _stack_name(domain)
    proc = subprocess.run(
        ["docker", "stack", "services", stack, "--format", "{{.Replicas}}"],
        capture_output=True, text=True,
    )
    rows = [r for r in proc.stdout.split("\n") if r.strip()]
    if not rows:
        return False
    for r in rows:
        # "1/1", or "1/1 (max 1 per node)" when a per-node replica limit is set.
        cur, _, rest = r.strip().partition("/")
        want = rest.split()[0] if rest.split() else ""
        if not want or cur != want or want == "0":
            return False
    return True

def http_get(domain: str, path: str = "/", timeout: int = 15) -> int:
    """HTTPS GET the app by its real hostname. On cc-ci the *.ci.commoninternet.net wildcard
    resolves (public DNS) to the gateway, which SNI-passthroughs to cc-ci's traefik, so using
    the real URL keeps SNI correct (connecting to the bare IP would drop SNI and fail to route)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    req = urllib.request.Request(f"https://{domain}{path}", method="GET")
    try:
        with urllib.request.urlopen(req, timeout=timeout, context=ctx) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code
    except Exception:
        return 0

def wait_healthy(domain: str, ok_codes=(200, 301, 302), deploy_timeout: int = 600,
                 http_timeout: int = 300) -> None:
    """Wait for stack services converged, then for the app to answer over HTTPS."""
    deadline = time.time() + deploy_timeout
    while time.time() < deadline:
        if services_converged(domain):
            break
        time.sleep(5)
    else:
        raise TimeoutError(f"{domain}: services did not converge in {deploy_timeout}s")
    deadline = time.time() + http_timeout
    last = 0
    while time.time() < deadline:
        last = http_get(domain)
        if last in ok_codes:
            return
        time.sleep(5)
    raise TimeoutError(f"{domain}: not healthy over HTTPS (last status {last})")

def teardown_app(domain: str) -> None:
    """Idempotent, best-effort full teardown. Never raises (finalizer-safe)."""
    # abra._run(check=False) still raises on a timeout (subprocess.TimeoutExpired) or a
    # missing binary, so guard each step and keep going: later steps must still run.
    for step in (abra.undeploy, abra.volume_remove, abra.secret_remove_all, abra.app_config_remove):
        try:
            step(domain)
        except Exception as exc:
            print(f"teardown: {step.__name__}({domain}) failed: {exc!r}")

def janitor(max_age_hours: int = 6) -> None:
    """Remove orphaned *-pr* apps left by crashed runs (best-effort).
    NB: max_age_hours is not enforced yet, and the `-pr` match predates the short
    `<recipe[:4]>-<6hex>` domain scheme, so it does not match current per-run domains."""
    for app in abra.app_ls():
        name = app.get("appName") or app.get("domain") or ""
        if "-pr" in name and ".ci.commoninternet.net" in name:
            # best-effort; deployed-status/age detail varies by abra version
            teardown_app(name)
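Both loops in `wait_healthy` follow the same poll-until-deadline pattern. A sketch of factoring it out with `time.monotonic()`, which, unlike `time.time()`, cannot jump when the wall clock is adjusted mid-run; `poll_until` is a hypothetical helper.

```python
import time
from typing import Callable


def poll_until(predicate: Callable[[], bool], timeout: float, interval: float = 5.0) -> bool:
    """Call predicate() every `interval` seconds until it is True or `timeout` elapses.

    The predicate is always tried at least once; the final sleep is clipped so the
    function never overshoots the deadline by a whole interval.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

`wait_healthy` could then read `if not poll_until(lambda: services_converged(domain), deploy_timeout): raise TimeoutError(...)`, and likewise for the HTTPS probe.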

runner/run_recipe_ci.py Normal file

@ -0,0 +1,80 @@
#!/usr/bin/env python3
"""Top-level CI orchestrator (plan §4.3), invoked by the Drone pipeline (or by hand).
Reads the run parameters from env (set by the comment-bridge via Drone build params):
RECIPE recipe name (e.g. custom-html) [required]
REF PR head commit sha [optional; recorded, used for fetch]
PR PR number [optional, default 0]
SRC head repo full_name on the mirror [optional]
STAGES comma list: install,upgrade,backup [optional, default install]
It fetches the recipe at REF, then runs the requested per-stage pytest files under
tests/<recipe>/. Teardown is guaranteed by the conftest fixture finalizer.
Run env (python with pytest+playwright, PLAYWRIGHT_BROWSERS_PATH) is provided by `cc-ci-run`
(modules/harness.nix); invoke as: cc-ci-run runner/run_recipe_ci.py
"""
from __future__ import annotations
import os
import subprocess
import sys
ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
STAGE_FILES = {
    "install": "test_install.py",
    "upgrade": "test_upgrade.py",
    "backup": "test_backup.py",
}

def fetch_recipe(recipe: str, ref: str | None, src: str | None) -> None:
    """Make the recipe available at the code under test. If SRC+REF point at the mirror PR,
    clone it at that ref; otherwise fetch the catalogue copy."""
    recipes_dir = os.path.expanduser("~/.abra/recipes")
    os.makedirs(recipes_dir, exist_ok=True)
    dest = os.path.join(recipes_dir, recipe)
    if src and ref:
        url = f"https://git.autonomic.zone/{src}.git"
        subprocess.run(["rm", "-rf", dest], check=False)
        subprocess.run(["git", "clone", "--quiet", url, dest], check=True)
        subprocess.run(["git", "-C", dest, "checkout", "--quiet", ref], check=True)
    else:
        subprocess.run(["abra", "recipe", "fetch", recipe, "-n"], check=True)

def main() -> int:
    recipe = os.environ.get("RECIPE")
    if not recipe:
        print("RECIPE env is required", file=sys.stderr)
        return 2
    ref = os.environ.get("REF") or None
    src = os.environ.get("SRC") or None
    stages = [s.strip() for s in os.environ.get("STAGES", "install").split(",") if s.strip()]
    print(f"== cc-ci run: recipe={recipe} ref={ref} pr={os.environ.get('PR', '0')} stages={stages}")
    fetch_recipe(recipe, ref, src)
    test_dir = os.path.join(ROOT, "tests", recipe)
    targets = []
    for stage in stages:
        fname = STAGE_FILES.get(stage)
        if not fname:
            print(f"unknown stage {stage}", file=sys.stderr)
            return 2
        path = os.path.join(test_dir, fname)
        if os.path.exists(path):
            targets.append(path)
        else:
            print(f" (skip {stage}: {path} not present)")
    # also discover recipe-local tests later (D4); install stage first (M4)
    if not targets:
        print("no stage test files found", file=sys.stderr)
        return 1
    rc = subprocess.call([sys.executable, "-m", "pytest", "-v", "-rA", *targets], cwd=ROOT)
    return rc

if __name__ == "__main__":
    raise SystemExit(main())
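The stage-to-file mapping in `main()` can be pulled into a pure function so it is testable without cloning a recipe. A sketch with hypothetical names; unlike `main()`, which stops at the first unknown stage, it reports every unknown stage at once, and it takes an injectable `exists` check instead of touching the filesystem directly.

```python
import os
from typing import Callable

STAGE_FILES = {"install": "test_install.py", "upgrade": "test_upgrade.py", "backup": "test_backup.py"}


def resolve_targets(stages_env: str, test_dir: str,
                    exists: Callable[[str], bool] = os.path.exists) -> tuple[list[str], list[str]]:
    """Map a STAGES comma list to test files.

    Returns (targets, skipped): files that exist and files that do not.
    Raises ValueError naming every stage missing from STAGE_FILES.
    """
    stages = [s.strip() for s in stages_env.split(",") if s.strip()]
    unknown = [s for s in stages if s not in STAGE_FILES]
    if unknown:
        raise ValueError(f"unknown stage(s): {', '.join(unknown)}")
    targets: list[str] = []
    skipped: list[str] = []
    for stage in stages:
        path = os.path.join(test_dir, STAGE_FILES[stage])
        (targets if exists(path) else skipped).append(path)
    return targets, skipped
```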

tests/conftest.py Normal file

@ -0,0 +1,53 @@
"""Shared pytest fixtures for recipe CI (plan §4.3).
A run is parameterized by env: RECIPE, REF (PR head sha), PR, SRC (head repo). The harness
computes a unique app domain per run so concurrent runs never collide, and GUARANTEES teardown
(undeploy + volume + secret + app .env removal) via a finalizer, even on failure.
"""
from __future__ import annotations
import hashlib
import os
import sys
import time
import pytest
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "runner"))
from harness import lifecycle  # noqa: E402

def _short(s: str, n: int = 8) -> str:
    return "".join(c for c in s if c.isalnum())[:n] or "local"

@pytest.fixture(scope="session")
def recipe() -> str:
    return os.environ.get("RECIPE", "custom-html")

@pytest.fixture(scope="session")
def app_domain(recipe) -> str:
    # Docker swarm config/secret names = <stackname>_<res>_<ver> must be <= 64 chars, and
    # stackname is the sanitized domain. ".ci.commoninternet.net" alone is 22 chars, so the
    # subdomain label must stay short. Use <recipe[:4]>-<6hex(recipe|pr|ref)>: unique per run,
    # collision-safe across recipes (full recipe in the hash), readable context lives in the
    # Drone build params + PR comment. (Deviation from plan §4.0 long name; see DECISIONS.md.)
    pr = os.environ.get("PR", "0")
    ref = os.environ.get("REF", "local" + str(int(time.time())))
    tag = _short(recipe, 4).lower()
    h = hashlib.sha1(f"{recipe}|{pr}|{ref}".encode()).hexdigest()[:6]
    return f"{tag}-{h}.ci.commoninternet.net"

@pytest.fixture(scope="session")
def deployed_app(recipe, app_domain):
    """Install stage: deploy the recipe and wait until healthy; tear down at session end."""
    version = os.environ.get("VERSION") or None
    lifecycle.janitor()  # sweep orphans from crashed runs first
    try:
        lifecycle.deploy_app(recipe, app_domain, version=version, secrets=True)
        lifecycle.wait_healthy(app_domain)
        yield app_domain
    finally:
        lifecycle.teardown_app(app_domain)
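The teardown guarantee in `deployed_app` rests on the `try`/`finally` around `yield`: the `finally` runs whether the deploy fails, `wait_healthy` times out, or a test fails. The same shape as a plain context manager, with deploy and teardown injected so it runs without a swarm; `deployed` is a hypothetical helper, not part of the harness.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def deployed(domain: str, deploy: Callable[[str], None],
             teardown: Callable[[str], None]) -> Iterator[str]:
    """Deploy, hand the domain to the caller, and tear down no matter what happens."""
    try:
        deploy(domain)   # may raise, e.g. TimeoutError from a health wait
        yield domain     # the test body runs here; its exceptions propagate through
    finally:
        teardown(domain)  # runs on success, on deploy failure, and on test failure
```

Note that if `deploy` raises, the body never runs but teardown still does, which is exactly the case a half-created app would otherwise leak.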

tests/custom-html/test_install.py Normal file

@ -0,0 +1,30 @@
"""custom-html — install stage (recipe #1, simple/stateless). D2 install + D3 Playwright."""
import sys
import os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
from harness import lifecycle  # noqa: E402

def test_http_reachable(deployed_app):
    """The deployed app answers 200 over real HTTPS through the gateway."""
    status = lifecycle.http_get(deployed_app, "/")
    assert status == 200, f"expected 200 from {deployed_app}, got {status}"

def test_playwright_page(deployed_app):
    """A real browser (Playwright/Chromium) loads the live app and sees served content."""
    from playwright.sync_api import sync_playwright
    url = f"https://{deployed_app}/"
    with sync_playwright() as p:
        browser = p.chromium.launch(args=["--no-sandbox"])
        try:
            ctx = browser.new_context(ignore_https_errors=True)
            page = ctx.new_page()
            resp = page.goto(url, wait_until="load", timeout=30000)
            assert resp is not None and resp.status == 200, f"page status {resp and resp.status}"
            body = page.content()
            assert "nginx" in body.lower() or "<html" in body.lower(), "no served HTML content"
        finally:
            browser.close()