cc-ci/tests/discourse/recipe_meta.py
autonomic-bot 8cd72fd78d
All checks were successful
continuous-integration/drone/push Build is passing
feat(harness): P2 — delete legacy customization keys & paths (rcust)
a) compose.ccci.yml is FIRST-CLASS: the harness auto-copies
   tests/<recipe>/compose.ccci.yml into the run's recipe checkout (ABRA_DIR-aware,
   lifecycle.provide_ccci_overlay) and auto-chaoses the pinned base deploy on its
   presence (kills the R7 implicit coupling). ghost/discourse install_steps.sh
   (copy-only boilerplate) deleted; CHAOS_BASE_DEPLOY removed from both metas + the
   registry.

b) install-time deps wiring is the ONLY mode: deps with DEPS provision BEFORE the
   single deploy; legacy post-deploy provisioning + the setup_custom_tests.sh
   invocation machinery deleted. lasuite-docs migrated to install_steps.sh OIDC
   wiring (same env names/values as the old hook — only the timing moved);
   lasuite-drive's remaining post-deploy MinIO bucket one-shot moved to ops.py
   pre_install; both setup_custom_tests.sh files deleted; OIDC_AT_INSTALL removed
   from drive/meet metas + the registry.

c) SKIP_GENERIC meta key deleted (zero users). Env form CCCI_SKIP_GENERIC* stays
   as the documented dev-only escape hatch; when active in a drone CI run the
   orchestrator prints a loud !! warning (manifest embedding lands in P5).

d) conftest cleanup: dead pre-deploy-once fixtures deployed/deployed_app deleted
   (zero users), app_domain + _short + _wait_healthy dropped (only users were the
   deleted fixtures); deps_apps+deps_creds consolidated into ONE deps fixture
   (entries expose .domain etc. as attributes; dict access intact); the 6 lasuite
   test files renamed deps_creds->deps (fixture name only — assertions and flows
   byte-identical). requires_deps marker + F2-11 skip-report plumbing unchanged.

Registry is now exactly the 14 final keys; docs §4 table regenerated. Stale
setup_custom_tests/OIDC_AT_INSTALL prose in docstrings/comments/assert MESSAGES
updated (no assert logic or expected value touched).

Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 175 passed; scripts/lint.sh -> PASS.
2026-06-10 17:01:33 +00:00

75 lines
4.9 KiB
Python

# Per-recipe harness config for discourse (Phase 2 Q4.6 — forum; postgres + redis + sidekiq).
#
# Discourse (bitnamilegacy/discourse) is a slow-booting Rails app: the recipe healthcheck polls
# /srv/status, and a cold first boot (DB migrate + asset precompile) regularly takes 15-25 min on
# cc-ci's single node, so the deploy/HTTP timeouts are generous. /srv/status returns 200 only once the
# app is actually serving (the canonical "is discourse up" signal — NOT "/", which may redirect to setup).
HEALTH_PATH = "/srv/status"
HEALTH_OK = (200,)
# Slow Rails cold boot (15-25min) on the 7-GiB single node; bumped 2400→3600 for headroom after
# full4's base deploy timed out at 2400s (RAM/CPU-constrained boot + image re-pull).
DEPLOY_TIMEOUT = 3600
HTTP_TIMEOUT = 1200
# Slow-cold-boot handling: the recipe-PR (recipe-maintainers/discourse#1) bumps the app healthcheck
# `start_period` to a LITERAL 20m for the HEAD. discourse's 15-25min Rails cold boot (DB migrate +
# asset precompile) exceeds the published 5m start_period → swarm would kill the still-booting app.
# start_period CANNOT be an env var (abra validates the literal compose 'duration' BEFORE substitution
# → `FATA ...Does not match format 'duration'`; Adversary-reproduced, REVIEW-2 4b862f6), so a literal
# recipe-PR bump is the only §9-compliant way to widen it. start_period is grace-only (a healthy check
# still marks healthy immediately → fast hosts unaffected). Precedent: lasuite-drive collabora PR.
# TIMEOUT (abra's internal convergence wait) is raised to outlast the boot.
#
# UPGRADE-tier BASE (compose.ccci.yml + UPGRADE_BASE_VERSION): upgrade-to-latest must ALWAYS run
# (plan-ccci-compose-overlay-policy.md §1). The from-version is the latest published 0.7.0+3.3.1
# (UPGRADE_BASE_VERSION below; the PR head is 0.7.0-based, so 0.7.0 is the true predecessor — not the
# default [-2]=0.6.3). The published 0.7.0 has TWO blockers, both resolved by the policy-blessed
# minimal base overlay compose.ccci.yml (see its header), neither weakening a test:
#   (1) it pins the Docker-Hub-removed `bitnami/discourse:3.3.1` (404) → overlay re-pins app+sidekiq
#       to `bitnamilegacy/discourse:3.3.1` (namespace-only, identical image), the same re-pin the PR
#       makes;
#   (2) its 5m start_period is too tight for the 15-25min Rails boot → overlay widens it to 20m
#       (grace).
# The harness auto-provides the overlay to the checkout and auto-chaoses the base deploy
# (first-class compose.ccci.yml, rcust P2a); it persists across the head checkout (idempotent — the
# PR head already re-pins + ships 20m).
# Upgrade crossover: 0.7.0 (re-pinned base) → PR head; full assertions run on the HEAD. The 0.7.0
# *custom* tests are not separately run (custom tier runs once, on the head — policy §1 allows skip+record).
UPGRADE_BASE_VERSION = "0.7.0+3.3.1"
EXTRA_ENV = {
    # abra's internal convergence wait; matches DEPLOY_TIMEOUT (slow Rails boot headroom)
    "TIMEOUT": "3600",
    "COMPOSE_FILE": "compose.yml:compose.ccci.yml",
}


def BACKUP_VERIFY(domain):
    """Post-backup integrity check (Q4.6, same race ghost F2-14b hit).

    The recipe's backupbot db pre-hook (`/pg_backup.sh backup`) dumps the discourse postgres DB to
    `/var/lib/postgresql/data/backup.sql` (gzip), then restic captures that path. On the loaded
    single CI node the db container is cycled by the immediately-preceding UPGRADE tier (chaos
    redeploy), and at backup time the pre-hook's pg_dump can race that cycle: the dump is
    truncated/never written, restic snapshots an empty/absent path, and a later restore reimports
    nothing → the seeded ci_marker is lost (P4 RED; observed full1/full2 WITH upgrade, vs full3
    WITHOUT upgrade green). Proven first-hand: the pre-hook itself succeeds on a stable db (manual
    exec → valid 922KB dump), so the failure is the cycle race, not the script.

    This probe proves the dump completed: backup.sql exists, is a VALID gzip, non-empty. False →
    the harness re-runs the WHOLE backup with a re-stabilised db (run_recipe_ci _perform_op, caps
    at 3 then proceeds: a persistent failure still surfaces RED at restore, so it weakens no
    assertion; it only retries a flaky CAPTURE). READ-ONLY.
    """
    # recipe_meta.py is exec()'d into a bare namespace (no __file__); runner/ is already on sys.path
    # and `harness` importable, so import directly (ghost F2-14b shipped broken by computing a path
    # here).
    from harness import lifecycle

    try:
        out = lifecycle.exec_in_app(
            domain,
            [
                "sh",
                "-c",
                "gzip -t /var/lib/postgresql/data/backup.sql"
                " && wc -c < /var/lib/postgresql/data/backup.sql",
            ],
            service="db",
            timeout=60,
        ).strip()
    except Exception:  # noqa: BLE001
        # exec fails if the db is mid-cycle: treat as not-yet-captured
        return False
    return out.isdigit() and int(out) > 0
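
For readers who want to try the probe's logic away from the cluster, the following is a minimal local sketch of the same idea (`gzip -t` plus a non-zero `wc -c`) using only the Python standard library. It is not part of the harness: `dump_looks_complete` is a hypothetical name introduced here for illustration, and it checks a file on the local filesystem rather than exec'ing into the db container.

```python
import gzip
import os
import tempfile
import zlib


def dump_looks_complete(path):
    """Return True if `path` exists, is non-empty, and reads as gzip to EOF without error.

    Hypothetical local stand-in for `gzip -t FILE && wc -c < FILE`; not harness code.
    """
    try:
        if os.path.getsize(path) == 0:
            return False
        with gzip.open(path, "rb") as fh:
            # Read to EOF in chunks so a truncated stream or a bad checksum raises.
            while fh.read(1 << 20):
                pass
    except (OSError, EOFError, zlib.error):
        # Missing file, not gzip, truncated, or corrupt: treat as "not captured".
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "backup.sql")
        with gzip.open(good, "wb") as fh:
            fh.write(b"-- pretend SQL dump\n" * 1000)

        # Simulate the race: keep only the first half of the compressed bytes.
        with open(good, "rb") as fh:
            data = fh.read()
        truncated = os.path.join(tmp, "truncated.sql")
        with open(truncated, "wb") as fh:
            fh.write(data[: len(data) // 2])

        print(dump_looks_complete(good))
        print(dump_looks_complete(truncated))
        print(dump_looks_complete(os.path.join(tmp, "missing.sql")))
```

Like the shell probe, the sketch measures the size of the compressed file, so it says nothing about whether the SQL inside is semantically complete; it only rules out the absent, empty, and truncated cases described in the docstring.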