status(2): discourse install timed out at 1800s (slow Rails boot, not image) — needs ghost-style healthcheck start_period overlay; teardown clean; image re-pin proven

2026-05-30 11:30:22 +01:00
parent 2ff24ae573
commit 0f597f2e3d


@@ -72,12 +72,17 @@ tree must carry:
 verified first-hand) + bumps version 0.7.0→0.8.0. Mirror created + populated (was 404). **Validation run
 IN FLIGHT** on cc-ci `/root/ccci-builder-clone`: `RECIPE=discourse PR=1 REF=7b7ddd70... STAGES=install,custom`
 → log `/root/ccci-discourse-pr1.log`, stack `disc-a622df`. **Image re-pin CONFIRMED WORKING** (db+redis+
-sidekiq 1/1, no pull error); app 0/1 in slow Rails cold boot (~20min in, run ALIVE; one early exit137
-healthcheck-restart). **NEXT:** when install green → author ≥2 functional incl §4.3 create-topic (admin
-API) → full lifecycle run → CLAIM Q4.6. If it times out on the Rails-boot healthcheck → add a ghost-style
-start_period overlay (compose.ccci-health.yml + install_steps + COMPOSE_FILE + CHAOS_BASE_DEPLOY).
-**POLL with `ssh -T` (no PTY)** — a PTY echo-storms/truncates output. **THEN:** plausible Q4.7b recipe-PR
-(fix `entrypoint.clickhouse.sh` wget restart-storm) → plausible-full green → CLAIM Q4.7.
+sidekiq 1/1, no pull error). **RESULT @2026-05-30: install TIMED OUT at 1800s** (`deploy/readiness
+failed ... timed out after 1800 seconds`; install:fail, custom:skip) — Rails first-boot exceeded the
+deploy timeout; stack auto-torn-down clean. Image re-pin is PROVEN (not the blocker). **NEXT (clear
+path):** add a ghost-style healthcheck start_period overlay to `tests/discourse/` — mirror
+`tests/ghost/`: `compose.ccci-health.yml` raising the app healthcheck `start_period` to ~1200s +
+`install_steps.sh` copying it + `recipe_meta.py` `COMPOSE_FILE=compose.yml:compose.ccci-health.yml` +
+`CHAOS_BASE_DEPLOY=True`, and bump `DEPLOY_TIMEOUT`/`EXTRA_ENV TIMEOUT` to ~2400. Then re-run
+`RECIPE=discourse PR=1 REF=7b7ddd70bc753608d086884b8de1ad3c327d9ac5 SRC=recipe-maintainers/discourse`.
+On install green → author ≥2 functional incl §4.3 create-topic (admin API) → full lifecycle → CLAIM Q4.6.
+**POLL with `ssh -T` (no PTY).** **THEN:** plausible Q4.7b recipe-PR (fix `entrypoint.clickhouse.sh`
+wget restart-storm) → plausible-full green → CLAIM Q4.7.
 - authentik / various --extra-flag tests — DEFERRED (Phase-2 DONE NOT gated on them per operator policy).
 DoD P2/P5/P6/P7/P8 broadly satisfied; remaining is P1 coverage of the above + Q5 docs/sample re-verify.
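
The overlay settings named in the NEXT step could be sketched roughly as below. This is a minimal illustration, not the actual `tests/discourse/recipe_meta.py`: the module layout, the `START_PERIOD_S` constant, the dict shape of `EXTRA_ENV`, and the `overlay_is_consistent` helper are all assumptions made for the example. Only the setting names and the approximate values (~1200s, ~2400) come from the note above.

```python
# Hypothetical sketch of the settings the note proposes for tests/discourse/recipe_meta.py.
# The real file's structure is not shown in the source; treat every shape here as an assumption.

START_PERIOD_S = 1200  # assumed constant: the "~1200s" start_period for compose.ccci-health.yml
DEPLOY_TIMEOUT = 2400  # "~2400", raised from the 1800 s deploy timeout that the install hit

# Base compose file first, health overlay second, joined with ":" as written in the note.
COMPOSE_FILE = "compose.yml:compose.ccci-health.yml"
CHAOS_BASE_DEPLOY = True

# Assumed to be a mapping of extra environment variables; the note only says "EXTRA_ENV TIMEOUT".
EXTRA_ENV = {"TIMEOUT": str(DEPLOY_TIMEOUT)}


def overlay_is_consistent() -> bool:
    """Return True if the overlay follows the base file and the timeout outlasts the grace period."""
    files = COMPOSE_FILE.split(":")
    return (
        files[0] == "compose.yml"
        and "compose.ccci-health.yml" in files[1:]
        and DEPLOY_TIMEOUT > START_PERIOD_S
    )
```

The check encodes the reasoning behind the two bumps: the healthcheck grace period has to cover the slow Rails first boot, and the deploy timeout has to exceed that grace period or the run would still fail at readiness.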