diff --git a/machine-docs/DECISIONS.md b/machine-docs/DECISIONS.md
index 74ecf0b..6c4375e 100644
--- a/machine-docs/DECISIONS.md
+++ b/machine-docs/DECISIONS.md
@@ -1033,3 +1033,24 @@ Orchestrator policy (plan.md §9 + cc-ci-plan/plan-prefer-env-over-compose-overl
 
 Follow-ups (F2-14 / sub-plan E1-E6, DONE veto'd until cleared): ghost start_period overlay →
 APP_START_PERIOD env PR (E1); mumble host-ports overlay → justify-as-last-resort or migrate (E4).
+
+## 2026-05-30 — FINDING: abra rejects env-interpolated healthcheck start_period → literal recipe-PR bump (§9)
+
+While migrating discourse off its cc-ci compose overlay per plan §9 (prefer an upstream env-var
+recipe-PR over a cc-ci `compose.*.yml`), discovered abra CANNOT env-interpolate the healthcheck
+`start_period` field: both `start_period: ${APP_START_PERIOD:-5m}` and the quoted
+`start_period: "${APP_START_PERIOD:-5m}"` FATA at `abra app new` with
+`services.app.healthcheck.start_period Does not match format 'duration'`. abra validates the compose
+schema's duration format on the LITERAL template string before any `.env` substitution, and NO recipe
+in the catalogue env-interpolates start_period (grep confirmed empty).
+
+**Consequence for §9 pt1:** "expose the cc-ci-tuned value as an env var" is NOT achievable for
+`start_period` specifically. The §9-compliant alternative is a **LITERAL bump in the upstream
+recipe-PR** — still NOT a cc-ci compose overlay (the change lives in the recipe everyone runs), and a
+larger start_period is strictly safer for all users (it only widens the startup failure-grace; a
+healthy check still marks healthy immediately, so fast hosts are unaffected). Precedent: the
+sub-plan's own lasuite-drive collabora "start_period [KEYSTONE]" recipe-PR.
+
+- **discourse**: recipe-PR `recipe-maintainers/discourse#1` sets `start_period: 20m` (covers the
+  15-25min Rails first-boot; default was 5m). cc-ci recipe_meta no longer sets APP_START_PERIOD.
+- **ghost (E1)**: must use the SAME literal-bump approach, NOT an env var (same abra limitation).
diff --git a/tests/discourse/recipe_meta.py b/tests/discourse/recipe_meta.py
index 8503a85..2505c45 100644
--- a/tests/discourse/recipe_meta.py
+++ b/tests/discourse/recipe_meta.py
@@ -9,16 +9,20 @@ HEALTH_OK = (200,)
 DEPLOY_TIMEOUT = 2400  # slow Rails cold boot (15-25min); matches the EXTRA_ENV TIMEOUT below
 HTTP_TIMEOUT = 1200
 
-# Slow-cold-boot handling via env, NOT a cc-ci compose overlay (plan.md §9 anti-drift guardrail):
-# discourse's 15-25min Rails cold boot exceeds the recipe healthcheck's default start_period (5m) +
-# grace, so swarm would kill the still-booting app and the deploy never converges. Rather than fork
-# the recipe with a compose.*.yml overlay (which drifts from what ships), the recipe-PR
-# (recipe-maintainers/discourse#1) parameterizes the app healthcheck as
-# `start_period: ${APP_START_PERIOD:-5m}` (default unchanged for real users); cc-ci just sets a larger
-# value here. TIMEOUT (abra's internal convergence wait) is raised to outlast the boot.
+# Slow-cold-boot handling via a LITERAL recipe-PR start_period bump, NOT a cc-ci compose overlay
+# (plan.md §9 anti-drift guardrail). discourse's 15-25min Rails cold boot exceeds the recipe
+# healthcheck's default start_period (5m) + grace, so swarm would kill the still-booting app and the
+# deploy never converges. §9 pt1 prefers exposing such a value as an env var — but abra REJECTS
+# env-interpolation in healthcheck `start_period` (`FATA ...Does not match format 'duration'` for both
+# `${VAR}` and quoted `"${VAR:-5m}"`; it validates the literal compose duration before substitution,
+# and no catalogue recipe env-interpolates start_period). So the §9-compliant fix is a LITERAL bump in
+# the recipe-PR (recipe-maintainers/discourse#1): `start_period: 20m` on the app healthcheck — a change
+# to the recipe EVERYONE runs (not a cc-ci fork), and strictly safer (start_period only widens the
+# startup grace; a healthy check still marks healthy immediately, so fast hosts are unaffected).
+# Precedent: the lasuite-drive collabora start_period recipe-PR. (See DECISIONS.md 2026-05-30.)
+# TIMEOUT (abra's internal convergence wait) is raised to outlast the boot.
 EXTRA_ENV = {
     "TIMEOUT": "2400",
-    "APP_START_PERIOD": "1200s",
 }
 
 # Upgrade tier — N/A (declared NOT-TESTABLE under cc-ci; Adversary §7.1 sign-off GRANTED, REVIEW-2