cc-ci/machine-docs/STATUS-prevb.md
2026-06-17 00:37:23 +00:00

# STATUS — phase `prevb` (dynamic upgrade base + per-recipe `previous/`)
SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase-prevb-previous-dynamic-base.md`.
State files: this + BACKLOG-prevb.md, REVIEW-prevb.md (Adversary), JOURNAL-prevb.md. DECISIONS.md shared.
## Phase
Started 2026-06-17. Gates: **M1** (implemented + green locally), **M2** (proven in real CI + spot-check).
## Now
- **Gate: M1 CLAIMED, awaiting Adversary.** (claim commit below.)
## Gate: M1 — CLAIMED @2026-06-17T00:40Z (HEAD e1b32ea)
**WHAT (DoD §4 M1):** dynamic upgrade-base resolution (last-green → main-tip → skip); `previous/`
discovery + base-only application + version-guard/stale-flag; environmental overlay separated from
version-specific config; `UPGRADE_BASE_VERSION` removed from discourse; discourse migrated; unit tests
for the new surface; discourse upgrade tier GREEN locally with proof the head ran the real official
image (`discourse/discourse:3.5.3`, NOT bitnamilegacy) and no `sidekiq` service post-deploy.
**WHERE (commit e1b32ea on origin/main):**
- `runner/run_recipe_ci.py`: `BasePlan` + `resolve_upgrade_base(stages, meta, recipe, head_ref)`
(override → last-green via `canonical.read_registry` → main-tip via `lifecycle.recipe_branch_commit`
→ skip); wired in `main()` (deploy `base_ref`/`apply_previous`, gate upgrade tier on `base_plan.runs`).
- `runner/harness/lifecycle.py`: `previous_*` surface (`has_previous`, `previous_target_version`,
`previous_status`, `provide/remove_previous_overlay`, `compose_file_add/remove`),
`recipe_branch_commit`, `stack_service_names`, `compose_services`, `prune_orphan_services`;
`deploy_app` `base_ref`/`apply_previous` paths.
- `runner/harness/generic.py` `perform_upgrade`: strip `previous/` overlay + COMPOSE_FILE entry before
head redeploy; `prune_orphan_services` after convergence (reconcile stack to head compose).
- `tests/discourse/compose.ccci.yml`: ENVIRONMENTAL-only (`app.deploy.update_config.order: stop-first`;
bitnamilegacy pins + `sidekiq` removed). `tests/discourse/recipe_meta.py`: `UPGRADE_BASE_VERSION` removed.
- `tests/discourse/test_upgrade.py`: asserts head image == official 3.5.3 (not bitnamilegacy) + no sidekiq.
- Unit: `tests/unit/test_upgrade_base.py` (resolver matrix), `tests/unit/test_previous.py` (previous/ +
COMPOSE_FILE layering).
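
The fallback order listed for `resolve_upgrade_base` (override, then last-green, then main-tip, then skip) can be illustrated with a minimal sketch. This is not the code in `runner/run_recipe_ci.py`: `BasePlanSketch`, `resolve_base_sketch`, and the injected lookup callables are hypothetical names, assumed here so the example runs on its own without the registry or a git checkout.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class BasePlanSketch:
    """Hypothetical stand-in for the runner's BasePlan."""
    kind: str            # "ref" or "skip"
    ref: Optional[str]   # commit/ref to deploy as the upgrade base
    reason: str          # provenance, echoed in the log line

    @property
    def runs(self) -> bool:
        # The upgrade tier only runs when a base was actually resolved.
        return self.kind == "ref"


def resolve_base_sketch(
    override: Optional[str],
    last_green: Callable[[], Optional[str]],
    main_tip: Callable[[], Optional[str]],
) -> BasePlanSketch:
    """Walk override -> last-green -> main-tip -> skip; the first hit wins."""
    if override:
        return BasePlanSketch("ref", override, "override")
    candidates = (
        ("last-green", last_green),
        ("target-branch (main) tip", main_tip),
    )
    for reason, lookup in candidates:
        ref = lookup()  # lookups are lazy: later ones are skipped on a hit
        if ref:
            return BasePlanSketch("ref", ref, reason)
    return BasePlanSketch("skip", None, "no base available")


# No override and no last-green record: fall through to the main tip.
plan = resolve_base_sketch(None, lambda: None, lambda: "f87c612d71b4")
print(f"upgrade base: kind={plan.kind} ref={plan.ref} ({plan.reason})")
```

Passing the lookups as callables keeps the resolver a pure function, which is one plausible way to cover the whole matrix in unit tests without network or registry access.
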
**HOW to verify (cold, from a fresh clone at e1b32ea):**
1. Unit (prevb surface): `cc-ci-run -m pytest tests/unit/test_upgrade_base.py tests/unit/test_previous.py tests/unit/test_meta.py -q`.
2. e2e: `RECIPE=discourse SRC=recipe-maintainers/discourse REF=ae5a81802b4d1d6cd1b449ac46cfa16d80730aaa PR=4 STAGES=install,upgrade cc-ci-run runner/run_recipe_ci.py` (HOME=/root).
3. Inspect: `grep -vE '^\s*#' tests/discourse/compose.ccci.yml` (env-only); `grep UPGRADE_BASE_VERSION tests/discourse/recipe_meta.py` (none).
**EXPECTED:**
- Unit: all pass (38 across the 2 prevb files; test_meta clean). NOTE scope: the prevb surface is green;
the FULL `tests/unit/` suite has **1 PRE-EXISTING unrelated fail**
`test_warm_reconcile.py::test_traefik_spec_is_stateless_with_setup` (KeyError 'health_domain') — which
fails identically at gtea-DONE (778720c) and was not touched by prevb (pxgate 0e9fd38 refactored the
spec without updating the test). Out of scope for prevb; flagged to the operator/next phase.
- e2e log shows: `upgrade base: kind=ref ref=f87c612d71b4 (target-branch (main) tip)`; base = main-tip
chaos deploy; `prune-orphans: removed 'sidekiq'`;
`upgrade→PR-head: head_ref=ae5a8180 chaos-version=ae5a8180+U version=0.8.1+3.5.0→1.0.0+3.5.3`;
RUN SUMMARY `deploy-count = 1 (expect 1)`, `install : pass`, `upgrade : pass`; both
`tests/discourse/test_upgrade.py` asserts PASS (app image official 3.5.3 not bitnamilegacy; no sidekiq);
teardown leaves no stacks/volumes/secrets. (Level caps at 2/5 because only install,upgrade ran — not a fail.)
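
The `prune-orphans: removed 'sidekiq'` line follows from reconciling the running stack against the head compose: anything still running that the head no longer declares is an orphan. A minimal sketch of that set difference, assuming swarm service names of the form `<stack>_<service>` and a hypothetical stack name `discourse`; `orphan_services` is an illustrative helper, not the signature of `prune_orphan_services`, and it only computes the names rather than removing anything.

```python
def orphan_services(stack: str, running: set[str], declared: set[str]) -> list[str]:
    """Return short names of services running in the stack but absent from the head compose.

    `running` holds full swarm service names ("<stack>_<service>");
    `declared` holds the short service names from the head's compose file.
    """
    prefix = f"{stack}_"
    short = {name[len(prefix):] for name in running if name.startswith(prefix)}
    return sorted(short - declared)


# The base (main) deployed app + sidekiq; the PR head declares no sidekiq.
running = {"discourse_app", "discourse_sidekiq"}
head_declared = {"app"}
for name in orphan_services("discourse", running, head_declared):
    print(f"prune-orphans: removed {name!r}")
```
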
**TEETH (where a broken head still goes RED — for the Adversary's break-it probe):** the upgrade tier
gates on the REAL head deploy — `assert_upgrade_converged` (rejects silent swarm rollback/pause) +
`wait_healthy` on HEALTH_PATH + HC1 `chaos-version`==head commit + the discourse image/sidekiq asserts.
Base resolution/`prune`/`previous` never deploy the head's code, so a deliberately-broken head cannot be
papered over: it won't converge/serve → RED. `previous/` is base-only (stripped before the head redeploy,
proven by `remove_previous_overlay` + COMPOSE_FILE strip in `perform_upgrade`); discourse ships no `previous/`.
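
The base-only property rests on the overlay being added to `COMPOSE_FILE` for the base deploy and stripped again before the head redeploy. A minimal sketch of that add/remove pair on a colon-separated `COMPOSE_FILE` value; the function names and the overlay file name `compose.previous.yml` are assumptions for illustration, not the actual `compose_file_add/remove` helpers in `lifecycle.py`.

```python
def compose_file_add_sketch(compose_file: str, entry: str) -> str:
    """Append `entry` unless already present, keeping the existing order."""
    parts = [p for p in compose_file.split(":") if p]
    if entry not in parts:
        parts.append(entry)
    return ":".join(parts)


def compose_file_remove_sketch(compose_file: str, entry: str) -> str:
    """Drop every occurrence of `entry` from a colon-separated COMPOSE_FILE value."""
    return ":".join(p for p in compose_file.split(":") if p and p != entry)


# Base deploy layers the overlay on top; the head redeploy must not see it.
env_only = "compose.yml:compose.ccci.yml"
base_value = compose_file_add_sketch(env_only, "compose.previous.yml")
head_value = compose_file_remove_sketch(base_value, "compose.previous.yml")
print(base_value)
print(head_value)
```

Making both operations idempotent means a repeated strip before the head redeploy is harmless, and the head value returns to exactly the environmental-only layering.
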
## Ground-truth facts (verified 2026-06-17, recorded for Adversary)
- `recipe-maintainers/discourse` PR **#4** (`discourse-official-image` `ae5a8180` → `main` `f87c612d`), open.
- **main** (`compose.yml`): `app`/`sidekiq` image = `bitnamilegacy/discourse:3.5.0`; `app` healthcheck
`start_period: 20m`; `app.deploy.update_config.order: start-first`; `sidekiq` service present.
- **PR #4 head**: `app.image = discourse/discourse:3.5.3` (official), **`sidekiq` service deleted**,
loadbalancer port 3000→80, official-image entrypoint wrappers added. (PR `.diff` confirms both.)
- Published tags max = `0.7.0+3.3.1`; main (3.5.0) is AHEAD of all tags → main-tip is a branch ref, not a tag.
- Current `tests/discourse/compose.ccci.yml` re-pins `app`+`sidekiq` to `bitnamilegacy/discourse:3.3.1`,
re-adds `sidekiq`, sets `start_period:20m`, `order:stop-first` — applied to ALL deploys via
`EXTRA_ENV.COMPOSE_FILE` → forces the PR head back to bitnamilegacy:3.3.1 + sidekiq (the bug).
- Note vs plan §3 prose: main is `bitnamilegacy:3.5.0`, not `3.3.1` (main advanced); thesis unchanged —
the base (last-green/main, bitnamilegacy 3.5.0) deploys clean, NO `previous/` needed for discourse.
## Blocked
(none)