claim(prevb): M1 — dynamic base + previous/ + discourse migration; discourse upgrade GREEN locally (head=official 3.5.3, sidekiq pruned)
All checks were successful
continuous-integration/drone/push Build is passing

autonomic-bot
2026-06-17 00:37:23 +00:00
parent e1b32ea650
commit bb79e9140e
3 changed files with 81 additions and 18 deletions

@@ -4,22 +4,17 @@ SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase-prevb-previous-dynamic-base.md`.
## Build backlog
### M1 — implemented + green locally
- [ ] B1. Dynamic upgrade-base resolution: last-green (warm canonical registry version) → fallback
target-branch (`main`) tip → else skip (declared reason). Replace the static
`previous_version(vers[-2])` default in `run_recipe_ci.upgrade_base`. Wire into `main()` deploy.
- [ ] B2. `tests/<recipe>/previous/` mechanism: discovery, declared-target-version marker, base-only
application (added to base deploy's COMPOSE_FILE), head exclusion (never applied to PR head),
version-guard + stale-flag on mismatch.
- [ ] B3. Discourse migration: shrink `compose.ccci.yml` to environmental-only
(`order: stop-first`), delete bitnamilegacy image pins + sidekiq block; remove
`UPGRADE_BASE_VERSION` from `tests/discourse/recipe_meta.py`. (Expect NO `previous/`.)
- [ ] B4. Unit tests for the new surface: base resolution (last-green / main-tip / skip), `previous/`
match / skip / stale, environmental-vs-version overlay layering. Update `test_upgrade_base.py`
to the new resolver API without weakening coverage.
- [ ] B5. Discourse upgrade tier GREEN locally: base (bitnamilegacy:3.5.0) → head; assert deployed
`app` image == `discourse/discourse:3.5.3` (NOT bitnamilegacy) and no `sidekiq` service post-deploy.
- [ ] B6. CLAIM M1 (clean tree + STATUS verification block).
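
The B1 resolution order can be sketched as a pure function. This is a minimal sketch, not the `resolve_upgrade_base`/`BasePlan` code in `runner/run_recipe_ci.py`. It assumes the registry lookup and `recipe_branch_commit` have already run and their results are passed in as plain values (`last_green`, `main_tip`). Every name below (`BasePlanSketch`, `resolve_base_sketch`, and the `kind` strings other than `ref`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BasePlanSketch:
    """Hypothetical stand-in for BasePlan: which base the upgrade tier deploys first."""
    kind: str               # "override", "version", "ref" or "skip" (hypothetical labels)
    value: Optional[str]    # version string or commit SHA; None when skipped
    reason: str             # provenance, suitable for the run log

    @property
    def runs(self) -> bool:
        # The upgrade tier is gated on having resolved some base.
        return self.kind != "skip"


def resolve_base_sketch(override: Optional[str],
                        last_green: Optional[str],
                        main_tip: Optional[str],
                        branch: str = "main") -> BasePlanSketch:
    """Order: explicit override -> last-green -> target-branch tip -> skip."""
    if override:
        return BasePlanSketch("override", override, "explicit override")
    if last_green:
        return BasePlanSketch("version", last_green, "last-green (canonical registry)")
    if main_tip:
        return BasePlanSketch("ref", main_tip, f"target-branch ({branch}) tip")
    return BasePlanSketch("skip", None, "no last-green version and no target-branch tip")
```

With no last-green version, `resolve_base_sketch(None, None, "f87c612d71b4")` yields the `ref` plan, which is the path the discourse e2e took. A static `vers[-2]` default has no such fallback order, which is why it can pick a tag older than `main`.
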
### M1 — implemented + green locally [CLAIMED @2026-06-17T00:40Z, awaiting Adversary]
- [x] B1. Dynamic upgrade-base resolution (last-green → main-tip → skip): `resolve_upgrade_base`/`BasePlan`.
- [x] B2. `tests/<recipe>/previous/` mechanism: discovery, VERSION marker, base-only application,
head exclusion (stripped before head redeploy), version-guard + stale-flag. Unit-tested.
- [x] B3. Discourse migration: `compose.ccci.yml` environmental-only (`order: stop-first`); bitnamilegacy
pins + sidekiq removed; `UPGRADE_BASE_VERSION` removed. No `previous/` (base deploys clean).
- [x] B4. Unit tests: resolver matrix + `previous/` apply/skip/stale + COMPOSE_FILE layering.
- [x] B5. Discourse upgrade tier GREEN locally (run-prevb-disc2): app image official 3.5.3 (not
bitnamilegacy), no sidekiq (pruned), version 0.8.1+3.5.0→1.0.0+3.5.3, install+upgrade pass.
(Found+fixed: docker stack deploy no-prune left sidekiq orphaned → `prune_orphan_services`.)
- [x] B6. CLAIM M1 (clean tree + STATUS WHAT/HOW/EXPECTED/WHERE/TEETH).
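
The match/stale decision behind B2 reduces to comparing the `previous/VERSION` marker with the resolved base version. This is a minimal sketch under several assumptions: the marker is a single-line `VERSION` file, and the base version is already known as a string. The `"none"`/`"skip"` outcomes and the function name are hypothetical and are not taken from `lifecycle.previous_status`.

```python
from pathlib import Path
from typing import Optional


def previous_status_sketch(recipe_tests_dir: Path, base_version: Optional[str]) -> str:
    """Classify tests/<recipe>/previous/ against the resolved base version.

    "none"  : no previous/ directory, so nothing to apply
    "skip"  : base version unknown, so the guard cannot be checked
    "match" : marker equals the base version; apply to the base deploy only
    "stale" : marker mismatch; do not apply, flag it for a human
    """
    prev = recipe_tests_dir / "previous"
    if not prev.is_dir():
        return "none"
    if base_version is None:
        return "skip"
    marker = prev / "VERSION"
    declared = marker.read_text().strip() if marker.is_file() else ""
    return "match" if declared == base_version else "stale"
```

Only `"match"` would lead to the overlay being added to the base deploy's COMPOSE_FILE. A `"stale"` result leaves the overlay unapplied and flagged, and none of these outcomes affects the head deploy.
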
### M2 — proven in real CI + spot-check
- [ ] B7. discourse PR #4 `!testme` GREEN in real CI; head ran `discourse/discourse:3.5.3`, migration exercised.

@@ -50,3 +50,23 @@ B5 e2e launched on cc-ci (/root/prevb-deploy @ bb2e3c6), STAGES=install,upgrade,
(bitnamilegacy:3.5.0), env overlay provided. Base now in slow Rails cold boot (15-25min). Polling ~5min.
(lint rung fail R011 = recipe-level, a rung not a gate; prepull skipped on the known sidekiq-depends-on
config rc=15 — non-fatal.)
## 2026-06-17T00:40Z — M1 GREEN locally; claiming
discourse install,upgrade e2e GREEN (2nd run, after the prune fix). Evidence in run-prevb-disc2.log on
cc-ci /root/prevb-deploy. The dynamic main-tip base worked on the first try (kind=ref f87c612d). That was crucial,
because main (0.8.1+3.5.0) is AHEAD of the newest published tag (0.7.0+3.3.1), so the OLD vers[-2]
default (=0.6.3) would have been the wrong predecessor entirely. The upgrade moved
0.8.1+3.5.0 (bitnamilegacy, main-tip) → 1.0.0+3.5.3 (official, PR head), chaos-version=ae5a8180+U.
**The one real bug found+fixed (WHY):** first run, `test_head_runs_official_image` PASSED (head app =
official 3.5.3 — the leak is gone) but `test_sidekiq_service_dropped` FAILED: `docker stack deploy`
(what `abra app deploy` runs) only adds/updates services by default (no `--prune`); it does NOT remove
services the new compose dropped, so the base's sidekiq was left orphaned on the old image. This is a swarm
mechanic, not a head-deploy failure, but
it means the deployed stack didn't faithfully reflect the head. Fix = `prune_orphan_services` in
perform_upgrade: reconcile the live stack to the head compose's `config --services` set (remove orphans).
Faithful (deployed stack == head), no-op when service sets match / compose unresolvable, weakens nothing.
Decided to CLAIM with the e2e green + image/sidekiq proof and leave the deliberately-broken-head teeth
probe to the Adversary's cold acceptance (its explicit M1 check; I can't push a broken commit to the
recipe mirror per guardrails). STATUS spells out where the teeth hold so they know where to probe.
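
The reconciliation in `prune_orphan_services` is a set difference between the stack's live services and the head compose's `config --services` output. This is a minimal sketch that assumes swarm's `<stack>_<service>` naming and takes both lists as inputs. `orphan_services`, `prune_orphans_sketch`, the `remove` callback, and the stack name `disc` used below are hypothetical, and the real helper's shell-outs are not reproduced.

```python
from typing import Callable, Iterable, List, Optional


def orphan_services(stack: str, live: Iterable[str],
                    compose_services: Optional[Iterable[str]]) -> List[str]:
    """Live swarm services of `stack` that the head compose no longer declares.

    If the head compose could not be resolved (None or empty), return nothing:
    never prune against an unknown target.
    """
    wanted = set(compose_services or ())
    if not wanted:
        return []
    prefix = f"{stack}_"
    return sorted(
        name for name in live
        if name.startswith(prefix) and name[len(prefix):] not in wanted
    )


def prune_orphans_sketch(stack: str, live: Iterable[str],
                         compose_services: Optional[Iterable[str]],
                         remove: Callable[[str], None]) -> List[str]:
    """Remove each orphan via `remove` (e.g. a wrapper around `docker service rm`)."""
    orphans = orphan_services(stack, live, compose_services)
    for name in orphans:
        remove(name)
    return orphans
```

Returning nothing when the compose set is empty mirrors the "no-op when compose unresolvable" rule above. Docker's own `docker stack deploy --prune` removes services no longer referenced by the compose file, and is the built-in alternative when the deploy path can pass that flag.
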

@@ -7,8 +7,56 @@ State files: this + BACKLOG-prevb.md, REVIEW-prevb.md (Adversary), JOURNAL-prevb
Started 2026-06-17. Gates: **M1** (implemented + green locally), **M2** (proven in real CI + spot-check).
## Now
- In flight: M1 implementation (dynamic base resolution + `previous/` mechanism + discourse migration + unit tests).
- No gate CLAIMED yet.
- **Gate: M1 CLAIMED, awaiting Adversary.** (claim commit below.)
## Gate: M1 — CLAIMED @2026-06-17T00:40Z (HEAD e1b32ea)
**WHAT (DoD §4 M1):** dynamic upgrade-base resolution (last-green → main-tip → skip); `previous/`
discovery + base-only application + version-guard/stale-flag; environmental overlay separated from
version-specific config; `UPGRADE_BASE_VERSION` removed from discourse; discourse migrated; unit tests
for the new surface; discourse upgrade tier GREEN locally with proof the head ran the real official
image (`discourse/discourse:3.5.3`, NOT bitnamilegacy) and no `sidekiq` service post-deploy.
**WHERE (commit e1b32ea on origin/main):**
- `runner/run_recipe_ci.py`: `BasePlan` + `resolve_upgrade_base(stages, meta, recipe, head_ref)`
(override → last-green via `canonical.read_registry` → main-tip via `lifecycle.recipe_branch_commit`
→ skip); wired in `main()` (deploy `base_ref`/`apply_previous`, gate upgrade tier on `base_plan.runs`).
- `runner/harness/lifecycle.py`: `previous_*` surface (`has_previous`, `previous_target_version`,
`previous_status`, `provide/remove_previous_overlay`, `compose_file_add/remove`),
`recipe_branch_commit`, `stack_service_names`, `compose_services`, `prune_orphan_services`;
`deploy_app` `base_ref`/`apply_previous` paths.
- `runner/harness/generic.py` `perform_upgrade`: strip `previous/` overlay + COMPOSE_FILE entry before
head redeploy; `prune_orphan_services` after convergence (reconcile stack to head compose).
- `tests/discourse/compose.ccci.yml`: ENVIRONMENTAL-only (`app.deploy.update_config.order: stop-first`;
bitnamilegacy pins + `sidekiq` removed). `tests/discourse/recipe_meta.py`: `UPGRADE_BASE_VERSION` removed.
- `tests/discourse/test_upgrade.py`: asserts head image == official 3.5.3 (not bitnamilegacy) + no sidekiq.
- Unit: `tests/unit/test_upgrade_base.py` (resolver matrix), `tests/unit/test_previous.py` (previous/ +
COMPOSE_FILE layering).
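
The COMPOSE_FILE layering exercised by `test_previous.py` can be illustrated with two string helpers. This sketch assumes COMPOSE_FILE is a colon-separated list in which later files override earlier ones, which is Docker Compose's behavior on Linux. The helper names and the example file names (`compose.previous.yml` in particular) are hypothetical and are not the `compose_file_add/remove` code in `lifecycle.py`.

```python
def add_compose_entry(value: str, entry: str) -> str:
    """Append `entry` to a colon-separated COMPOSE_FILE value, once."""
    parts = [p for p in value.split(":") if p]
    if entry not in parts:
        parts.append(entry)  # last file wins when Compose merges overrides
    return ":".join(parts)


def remove_compose_entry(value: str, entry: str) -> str:
    """Drop every occurrence of `entry`, preserving the order of the rest."""
    return ":".join(p for p in value.split(":") if p and p != entry)
```

Appending the `previous/` overlay last lets it override the environmental overlay for the base deploy only. Removing the same entry before the head redeploy restores the env-only stack, which corresponds to the COMPOSE_FILE strip in `perform_upgrade`.
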
**HOW to verify (cold, from a fresh clone at e1b32ea):**
1. Unit (prevb surface): `cc-ci-run -m pytest tests/unit/test_upgrade_base.py tests/unit/test_previous.py tests/unit/test_meta.py -q`.
2. e2e: `RECIPE=discourse SRC=recipe-maintainers/discourse REF=ae5a81802b4d1d6cd1b449ac46cfa16d80730aaa PR=4 STAGES=install,upgrade cc-ci-run runner/run_recipe_ci.py` (HOME=/root).
3. Inspect: `grep -vE '^\s*#' tests/discourse/compose.ccci.yml` (env-only); `grep UPGRADE_BASE_VERSION tests/discourse/recipe_meta.py` (none).
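
Step 3 can also be scripted. This is a minimal sketch that approximates "environmental-only" as "no uncommented `image:` line and no mention of `sidekiq`". The heuristic and the function names are hypothetical, and a stricter check (for example, parsing the YAML) may be preferable.

```python
import re
from typing import List

_COMMENT = re.compile(r"^\s*#")
_IMAGE_KEY = re.compile(r"\s*image\s*:")


def env_only_violations(compose_text: str) -> List[str]:
    """Non-comment lines that look version-specific (image pins or a sidekiq service)."""
    bad = []
    for line in compose_text.splitlines():
        if not line.strip() or _COMMENT.match(line):
            continue  # blank or commented-out lines cannot pin anything
        if _IMAGE_KEY.match(line) or "sidekiq" in line:
            bad.append(line.strip())
    return bad


def mentions_upgrade_base(meta_text: str) -> bool:
    """True if recipe_meta.py text still references UPGRADE_BASE_VERSION."""
    return "UPGRADE_BASE_VERSION" in meta_text
```

Run against the two files from step 3 at e1b32ea, the first check should return an empty list and the second should return `False`.
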
**EXPECTED:**
- Unit: all pass (38 across the 2 prevb files; test_meta clean). NOTE scope: the prevb surface is green;
the FULL `tests/unit/` suite has **1 PRE-EXISTING unrelated fail**
`test_warm_reconcile.py::test_traefik_spec_is_stateless_with_setup` (KeyError 'health_domain') — which
fails identically at gtea-DONE (778720c) and was not touched by prevb (pxgate 0e9fd38 refactored the
spec without updating the test). Out of scope for prevb; flagged to the operator/next phase.
- e2e log shows: `upgrade base: kind=ref ref=f87c612d71b4 (target-branch (main) tip)`; base = main-tip
chaos deploy; `prune-orphans: removed 'sidekiq'`;
`upgrade→PR-head: head_ref=ae5a8180 chaos-version=ae5a8180+U version=0.8.1+3.5.0→1.0.0+3.5.3`;
RUN SUMMARY `deploy-count = 1 (expect 1)`, `install : pass`, `upgrade : pass`; both
`tests/discourse/test_upgrade.py` asserts PASS (app image official 3.5.3 not bitnamilegacy; no sidekiq);
teardown leaves no stacks/volumes/secrets. (Level caps at 2/5 because only install,upgrade ran — not a fail.)
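
The log expectations above can be checked mechanically. This sketch assumes the markers appear in run-prevb-disc2.log as quoted here, with whitespace in the RUN SUMMARY possibly padded. The pattern table and `missing_expectations` are hypothetical helpers, not part of the runner.

```python
import re
from typing import Dict, List

# Markers quoted in EXPECTED; whitespace around ":" and "=" is loosened.
EXPECTED_PATTERNS: Dict[str, str] = {
    "base kind=ref": r"upgrade base: kind=ref ref=[0-9a-f]+",
    "sidekiq pruned": r"prune-orphans: removed 'sidekiq'",
    "single deploy": r"deploy-count\s*=\s*1\s*\(expect 1\)",
    "install pass": r"install\s*:\s*pass",
    "upgrade pass": r"upgrade\s*:\s*pass",
}


def missing_expectations(log_text: str) -> List[str]:
    """Names of expected markers not found anywhere in the log, in table order."""
    return [name for name, pattern in EXPECTED_PATTERNS.items()
            if not re.search(pattern, log_text)]
```
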
**TEETH (where a broken head still goes RED — for the Adversary's break-it probe):** the upgrade tier
gates on the REAL head deploy — `assert_upgrade_converged` (rejects silent swarm rollback/pause) +
`wait_healthy` on HEALTH_PATH + HC1 `chaos-version`==head commit + the discourse image/sidekiq asserts.
Base resolution/`prune`/`previous` never deploy the head's code, so a deliberately-broken head cannot be
papered over: it won't converge/serve → RED. `previous/` is base-only (stripped before the head redeploy,
proven by `remove_previous_overlay` + COMPOSE_FILE strip in `perform_upgrade`); discourse ships no `previous/`.
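
The convergence half of the teeth can be sketched as a verdict over the app service's swarm `UpdateStatus.State` plus the HC1 comparison. This minimal sketch assumes the state string has already been read (for instance via `docker service inspect`). It also assumes the `chaos-version` label is a short SHA optionally followed by a `+` suffix, as in `ae5a8180+U`. `upgrade_verdict` is hypothetical and omits `wait_healthy` and the image/sidekiq asserts.

```python
from typing import Optional


def upgrade_verdict(update_state: Optional[str], chaos_version: str, head_ref: str) -> str:
    """GREEN only if swarm finished the update and the deployed label names the head commit."""
    # Docker reports "updating", "paused", "completed", "rollback_started",
    # "rollback_paused" or "rollback_completed". A silent rollback or a pause
    # must not count as success, so anything but "completed" (or a missing
    # status) is RED in this sketch.
    if update_state != "completed":
        return f"RED: swarm update state {update_state!r}"
    # HC1: the short SHA before any "+" suffix must prefix the full head commit.
    deployed = chaos_version.split("+", 1)[0]
    if not deployed or not head_ref.startswith(deployed):
        return f"RED: chaos-version {chaos_version!r} is not head {head_ref[:8]}"
    return "GREEN"
```
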
## Ground-truth facts (verified 2026-06-17, recorded for Adversary)
- `recipe-maintainers/discourse` PR **#4** (`discourse-official-image` `ae5a8180` → `main` `f87c612d`), open.