Enrolling a recipe under cc-ci (D5)
Adding a recipe is a small, repeatable, no-harness-surgery operation:
1. Make the recipe available on the mirror
Recipes under test live on the private mirror git.autonomic.zone/recipe-maintainers/<recipe>,
synced from upstream git.coopcloud.tech. If the recipe is not mirrored yet, mirror it (abra fetch, then
push to the org); see the recipe mirror+PR flow (plan §4.1). A recipe may ship its own tests/ dir in its
repo; those tests are discovered and, once the recipe is approved under HC2, run against the live app
(D4, see step 3 below).
2. Add the per-recipe test tree in this repo
```
tests/<recipe>/
├── recipe_meta.py     # optional per-recipe harness config (see below)
├── install_steps.sh   # optional custom install-steps hook (pre-deploy setup)
├── ops.py             # optional pre-op seed hooks (pre_install/pre_upgrade/pre_backup/pre_restore)
├── test_install.py    # optional install overlay (runs ADDITIVELY alongside generic)
├── test_upgrade.py    # optional upgrade overlay (runs ADDITIVELY alongside generic)
├── test_backup.py     # optional backup overlay (runs ADDITIVELY alongside generic)
└── test_restore.py    # optional restore overlay (runs ADDITIVELY alongside generic)
```
A recipe is testable with ZERO config: with no overlay files, the generic lifecycle suite
runs (install/upgrade/backup/restore) against a single shared deployment — see docs/testing.md for
the full model (deploy-once, additive generic+overlay, the chaos PR-head upgrade, the HC2 repo-local
allowlist, the install-steps hook). The per-recipe dir only holds the bits where the recipe needs
more than the generic.
To add recipe-specific coverage, drop a tests/<recipe>/test_<op>.py overlay. It runs ALONGSIDE the
generic suite for that op (HC3 additive, Phase 1e), so the generic floor is never silently dropped.
Overlays are assertion-only against the shared live deployment (the live_app fixture): they never
perform the op or deploy/tear down, because the orchestrator owns those steps. If an overlay needs to
SEED pre-op state (data-continuity markers, the backup→restore divergence), put pre_<op>(domain, meta)
callables in tests/<recipe>/ops.py; the orchestrator runs them BEFORE the op. Start by copying an
existing recipe: tests/custom-html/ (simple/volume marker), tests/keycloak/ (admin API), or
tests/matrix-synapse/ (db-service psql marker). Do not edit the shared tests/conftest.py or
runner/harness/ to add a recipe; set per-recipe knobs in recipe_meta.py instead:
```python
# tests/<recipe>/recipe_meta.py: every knob is optional
HEALTH_PATH = "/realms/master"  # path that returns a healthy status (default "/")
HEALTH_OK = (200,)              # acceptable status codes (default 200/301/302)
DEPLOY_TIMEOUT = 600            # seconds for services to converge (default 600)
HTTP_TIMEOUT = 600              # seconds for the app to answer (default 300)
BACKUP_CAPABLE = True           # override backup-capability auto-detect (default: scan compose)
EXTRA_ENV = {"KEY": "value"}    # or EXTRA_ENV(domain) -> dict; extra .env keys set at deploy
SKIP_GENERIC = ["upgrade"]      # per-recipe opt-out from the generic floor for the listed ops
                                # ("all"/"*" = every op); rarely needed, generic is the floor
```
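The knobs above are plain module-level names, so resolving them is mostly attribute lookup with defaults. The following is a minimal sketch, assuming the defaults listed in the knob comments and using hypothetical helper names (load_meta, extra_env, health_ok, suites_for_op); the real runner/harness code may be structured differently. It also illustrates the HC3 additive rule: the generic suite runs for an op unless SKIP_GENERIC lists that op (or "all"/"*"), and an overlay, when present, runs in addition.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List

# Defaults as documented in the recipe_meta.py knob comments.
DEFAULTS: Dict[str, Any] = {
    "HEALTH_PATH": "/",
    "HEALTH_OK": (200, 301, 302),
    "DEPLOY_TIMEOUT": 600,
    "HTTP_TIMEOUT": 300,
    "EXTRA_ENV": {},
    "SKIP_GENERIC": [],
}


def load_meta(meta: Any) -> Dict[str, Any]:
    """Hypothetical: overlay a recipe_meta module (or None) on the defaults."""
    resolved = {name: getattr(meta, name, default) for name, default in DEFAULTS.items()}
    # None here stands for "auto-detect backup capability by scanning compose".
    resolved["BACKUP_CAPABLE"] = getattr(meta, "BACKUP_CAPABLE", None)
    return resolved


def extra_env(meta: Dict[str, Any], domain: str) -> Dict[str, str]:
    """EXTRA_ENV is either a dict or a callable taking the run's domain."""
    value = meta["EXTRA_ENV"]
    # Copy so callers never mutate the shared default or the recipe's own dict.
    return dict(value(domain) if callable(value) else value)


def health_ok(meta: Dict[str, Any], status: int) -> bool:
    """True if an HTTP status from HEALTH_PATH counts as healthy."""
    return status in meta["HEALTH_OK"]


def suites_for_op(meta: Dict[str, Any], op: str, overlay_ops: Iterable[str]) -> List[str]:
    """Generic floor unless opted out, plus the recipe overlay (additive, HC3)."""
    skip = set(meta["SKIP_GENERIC"])
    suites = []
    if not (skip & {op, "all", "*"}):
        suites.append(f"generic:{op}")
    if op in set(overlay_ops):
        suites.append(f"overlay:test_{op}.py")
    return suites


# Example: a recipe_meta that sets only a few knobs (EXAMPLE_HOST is a made-up key).
example = load_meta(SimpleNamespace(
    HEALTH_PATH="/realms/master",
    SKIP_GENERIC=["upgrade"],
    EXTRA_ENV=lambda domain: {"EXAMPLE_HOST": domain},
))
```

With no recipe_meta.py at all (load_meta(None)), every op gets exactly the generic suite, which is the zero-config case.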
Useful harness.lifecycle helpers for overlays: http_get, http_fetch, http_body, and exec_in_app (use
exec_in_app for volume/DB data markers; it is hardened with a returncode check and retry). The lifecycle
ops themselves are orchestrator-owned, so you never call them from an overlay. The harness forces
LETS_ENCRYPT_ENV="" (no ACME), uses a unique short domain per run, and guarantees teardown.
3. Recipe-local tests (D4) — default-deny (HC2)
If the recipe's own repo contains tests/test_*.py, install_steps.sh, or ops.py, the runner snapshots
them right after fetch. Per Phase 1e HC2, however, it executes them only for recipes on the cc-ci
approval allowlist tests/repo-local-approved.txt, which is empty by default (default-deny). Because
PR-author code runs on the CI host with /run/secrets/* present, adding a recipe to the allowlist is a
deliberate cc-ci-maintainer act, done in a cc-ci PR after reviewing that recipe's repo-local tests.
Without approval, only the cc-ci overlays in this repo and the generic floor run. Approved recipe-local
files receive the environment variables CCCI_BASE_URL (e.g. https://<app>.ci.commoninternet.net/) and
CCCI_APP_DOMAIN.
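The HC2 gate reduces to a membership check against tests/repo-local-approved.txt. A minimal sketch, assuming the file lists one recipe name per line with '#' comments and blank lines ignored, and assuming CCCI_APP_DOMAIN is the bare app host from which CCCI_BASE_URL is derived; the file format and the helper names (parse_allowlist, load_allowlist, repo_local_env) are assumptions, not the runner's actual code.

```python
from pathlib import Path
from typing import Dict, Optional, Set


def parse_allowlist(text: str) -> Set[str]:
    """Assumed format: one recipe name per line; '#' starts a comment."""
    names = set()
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            names.add(name)
    return names


def load_allowlist(path: Path) -> Set[str]:
    """A missing or empty file approves nothing (default-deny)."""
    if not path.is_file():
        return set()
    return parse_allowlist(path.read_text(encoding="utf-8"))


def repo_local_env(recipe: str, app_domain: str, allowlist: Set[str]) -> Optional[Dict[str, str]]:
    """Env for approved recipe-local tests, or None if they must not run."""
    if recipe not in allowlist:
        return None
    return {
        "CCCI_BASE_URL": f"https://{app_domain}/",
        "CCCI_APP_DOMAIN": app_domain,
    }
```

Returning None rather than an empty dict keeps "not approved" distinct from "approved, no extra env".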
4. Add the repo to the bridge poll list
The trigger is polling (primary): add the repo's full name to the comment bridge's POLL_REPOS
comma-separated list (nix/modules/bridge.nix), then run nixos-rebuild switch. The bridge polls that
repo's open PRs every 30s and fires a run when an authorized org member posts a new !testme comment.
This needs only read + comment access: no webhook, no repo-admin.
!testme on a PR runs install/upgrade/backup plus any approved recipe-local tests, and reports the
results back to the PR.
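The polling trigger comes down to two small checks: reading POLL_REPOS and deciding whether a comment is a valid trigger. A minimal sketch with hypothetical helper names; the exact trigger rule used here (the comment's first word is !testme) is an assumption, since the bridge may match the command differently, and tracking which comments are new is left out.

```python
from typing import List, Set


def parse_poll_repos(csv: str) -> List[str]:
    """Split the POLL_REPOS comma-separated value into 'owner/name' entries."""
    return [repo.strip() for repo in csv.split(",") if repo.strip()]


def is_testme_trigger(body: str, author: str, authorized: Set[str]) -> bool:
    """Assumed rule: first word is !testme and the author is an authorized org member."""
    words = body.split()
    return bool(words) and words[0] == "!testme" and author in authorized
```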
Optional: lower-latency webhook (admin-registered)
Polling already satisfies D1 (<60s). For lower latency an admin may optionally register a
Gitea issue_comment webhook (the bot does not self-register one — that needs repo-admin):
- URL https://ci.commoninternet.net/hook, content type application/json, event Issue Comment,
  secret = the shared webhook HMAC (secrets/secrets.yaml → webhook_hmac).
- The Gitea instance must allow the host (admin: add ci.commoninternet.net to the [webhook]
  ALLOWED_HOST_LIST).
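Gitea signs each webhook delivery with the configured secret and sends the hex HMAC-SHA256 of the raw request body in the X-Gitea-Signature header. The sketch below shows how an endpoint such as /hook could check it; the function names are hypothetical, and this is not necessarily how the cc-ci bridge implements the check.

```python
import hashlib
import hmac


def gitea_signature(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body, keyed with the webhook secret."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_gitea_webhook(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Constant-time comparison against the X-Gitea-Signature header value."""
    expected = gitea_signature(secret, body).encode("ascii")
    received = signature_header.strip().lower().encode("utf-8")
    return hmac.compare_digest(expected, received)
```

Verify against the raw body bytes before parsing the JSON; re-serializing a parsed payload can change the bytes and break the match.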
The webhook and poller are deduped by comment id, so a comment seen by both fires only once.
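Deduplicating by comment id needs only a shared set of seen ids guarded by a lock, because the webhook handler and the poll loop may run concurrently. A minimal in-memory sketch; the class name and the fire callback are hypothetical. An in-memory set forgets everything on restart, so a bridge that must survive restarts would need to persist the seen ids.

```python
import threading
from typing import Callable, Set


class CommentDeduper:
    """Fire at most once per comment id, whichever path (webhook or poller) sees it first."""

    def __init__(self, fire: Callable[[int], None]) -> None:
        self._fire = fire
        self._seen: Set[int] = set()
        self._lock = threading.Lock()

    def handle(self, comment_id: int) -> bool:
        """Return True if this call fired a run, False if the id was already seen."""
        with self._lock:
            if comment_id in self._seen:
                return False
            self._seen.add(comment_id)
        # Fire outside the lock so a slow run start does not block the other path.
        self._fire(comment_id)
        return True
```

Both the webhook handler and the poll loop would call handle() with the comment's id; only the first call for a given id starts a run.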
Run locally
```sh
RECIPE=<recipe> PR=<n> REF=<sha-or-branch> SRC=recipe-maintainers/<recipe> \
  STAGES=install,upgrade,backup cc-ci-run runner/run_recipe_ci.py
```