cc-ci/machine-docs/STATUS-prevb.md
2026-06-17 00:37:23 +00:00

STATUS — phase prevb (dynamic upgrade base + per-recipe previous/)

SSOT: /srv/cc-ci/cc-ci-plan/plan-phase-prevb-previous-dynamic-base.md. State files: this + BACKLOG-prevb.md, REVIEW-prevb.md (Adversary), JOURNAL-prevb.md. DECISIONS.md shared.

Phase

Started 2026-06-17. Gates: M1 (implemented + green locally), M2 (proven in real CI + spot-check).

Now

  • Gate: M1 CLAIMED, awaiting Adversary. (claim commit below.)

Gate: M1 — CLAIMED @2026-06-17T00:40Z (HEAD e1b32ea)

WHAT (DoD §4 M1): dynamic upgrade-base resolution (last-green → main-tip → skip); previous/ discovery + base-only application + version-guard/stale-flag; environmental overlay separated from version-specific config; UPGRADE_BASE_VERSION removed from discourse; discourse migrated; unit tests for the new surface; discourse upgrade tier GREEN locally with proof the head ran the real official image (discourse/discourse:3.5.3, NOT bitnamilegacy) and no sidekiq service post-deploy.

WHERE (commit e1b32ea on origin/main):

  • runner/run_recipe_ci.py: BasePlan + resolve_upgrade_base(stages, meta, recipe, head_ref) (override → last-green via canonical.read_registry → main-tip via lifecycle.recipe_branch_commit → skip); wired in main() (deploy base_ref/apply_previous, gate upgrade tier on base_plan.runs).
  • runner/harness/lifecycle.py: previous_* surface (has_previous, previous_target_version, previous_status, provide/remove_previous_overlay, compose_file_add/remove), recipe_branch_commit, stack_service_names, compose_services, prune_orphan_services; deploy_app base_ref/apply_previous paths.
  • runner/harness/generic.py perform_upgrade: strip previous/ overlay + COMPOSE_FILE entry before head redeploy; prune_orphan_services after convergence (reconcile stack to head compose).
  • tests/discourse/compose.ccci.yml: ENVIRONMENTAL-only (app.deploy.update_config.order: stop-first; bitnamilegacy pins + sidekiq removed). tests/discourse/recipe_meta.py: UPGRADE_BASE_VERSION removed.
  • tests/discourse/test_upgrade.py: asserts head image == official 3.5.3 (not bitnamilegacy) + no sidekiq.
  • Unit: tests/unit/test_upgrade_base.py (resolver matrix), tests/unit/test_previous.py (previous/ + COMPOSE_FILE layering).
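
The resolution order named above (override → last-green → main-tip → skip) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in runner/run_recipe_ci.py: the class BasePlanSketch, its kind/ref/reason fields, the "skip" kind value, and the two lookup callables (standing in for canonical.read_registry and lifecycle.recipe_branch_commit) are hypothetical. Only the fallback order, the `runs` gate and the `kind=ref` value come from this document.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class BasePlanSketch:
    """Hypothetical stand-in for BasePlan; field names are assumptions."""
    kind: str             # "ref" when a base was found, "skip" otherwise
    ref: Optional[str]    # commit or ref to deploy as the upgrade base
    reason: str           # which source produced the decision

    @property
    def runs(self) -> bool:
        # The upgrade tier only runs when there is a base to upgrade from.
        return self.kind == "ref"


def resolve_base_sketch(
    override: Optional[str],
    last_green: Callable[[], Optional[str]],
    main_tip: Callable[[], Optional[str]],
) -> BasePlanSketch:
    """Try each source in priority order; the first non-empty ref wins."""
    if override:
        return BasePlanSketch("ref", override, "override")
    candidates = (("last-green", last_green), ("main-tip", main_tip))
    for reason, lookup in candidates:
        ref = lookup()
        if ref:
            return BasePlanSketch("ref", ref, reason)
    # Nothing to upgrade from: the upgrade tier is skipped, not failed.
    return BasePlanSketch("skip", None, "no base available")


# No override and no last-green entry, so the target-branch tip is used.
plan = resolve_base_sketch(None, lambda: None, lambda: "f87c612d71b4")
print(plan.kind, plan.ref, plan.reason, plan.runs)
```

Passing the lookups as callables keeps the later sources lazy: main-tip is only queried when last-green yields nothing.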

HOW to verify (cold, from a fresh clone at e1b32ea):

  1. Unit (prevb surface): cc-ci-run -m pytest tests/unit/test_upgrade_base.py tests/unit/test_previous.py tests/unit/test_meta.py -q.
  2. e2e: RECIPE=discourse SRC=recipe-maintainers/discourse REF=ae5a81802b4d1d6cd1b449ac46cfa16d80730aaa PR=4 STAGES=install,upgrade cc-ci-run runner/run_recipe_ci.py (HOME=/root).
  3. Inspect: grep -vE '^\s*#' tests/discourse/compose.ccci.yml (env-only); grep UPGRADE_BASE_VERSION tests/discourse/recipe_meta.py (none).

EXPECTED:

  • Unit: all pass (38 across the 2 prevb files; test_meta clean). NOTE scope: the prevb surface is green, but the FULL tests/unit/ suite has 1 PRE-EXISTING unrelated failure: test_warm_reconcile.py::test_traefik_spec_is_stateless_with_setup (KeyError 'health_domain'). It fails identically at gtea-DONE (778720c) and was not touched by prevb (pxgate 0e9fd38 refactored the spec without updating the test). Out of scope for prevb; flagged to the operator/next phase.
  • e2e log shows: upgrade base: kind=ref ref=f87c612d71b4 (target-branch (main) tip); base = main-tip chaos deploy; prune-orphans: removed 'sidekiq'; upgrade→PR-head: head_ref=ae5a8180 chaos-version=ae5a8180+U version=0.8.1+3.5.0→1.0.0+3.5.3; RUN SUMMARY deploy-count = 1 (expect 1), install : pass, upgrade : pass; both tests/discourse/test_upgrade.py asserts PASS (app image official 3.5.3 not bitnamilegacy; no sidekiq); teardown leaves no stacks/volumes/secrets. (Level caps at 2/5 because only install,upgrade ran — not a fail.)
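
The prune-orphans line in the expected log reflects reconciling the running stack to the head compose. One way to picture that step is a set difference between the services the stack currently runs and the services the head compose defines. The sketch below is an assumption about the shape of that logic, not the body of prune_orphan_services: the function name and the sample service lists are hypothetical, and it only reports what would be removed.

```python
from typing import Iterable, List


def orphan_services_sketch(deployed: Iterable[str], head: Iterable[str]) -> List[str]:
    """Return deployed services the head compose no longer defines, sorted."""
    return sorted(set(deployed) - set(head))


# Illustrative data only: the base ran app + sidekiq, the head keeps just app.
deployed = ["app", "sidekiq"]
head_services = ["app"]

for name in orphan_services_sketch(deployed, head_services):
    # Reporting only; the real step would remove the service from the stack.
    print(f"prune-orphans: removed {name!r}")
```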

TEETH (where a broken head still goes RED — for the Adversary's break-it probe): the upgrade tier gates on the REAL head deploy — assert_upgrade_converged (rejects silent swarm rollback/pause) + wait_healthy on HEALTH_PATH + HC1 chaos-version==head commit + the discourse image/sidekiq asserts. Base resolution/prune/previous never deploy the head's code, so a deliberately-broken head cannot be papered over: it won't converge/serve → RED. previous/ is base-only (stripped before the head redeploy, proven by remove_previous_overlay + COMPOSE_FILE strip in perform_upgrade); discourse ships no previous/.
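
The base-only property of previous/ rests on adding the overlay to COMPOSE_FILE for the base deploy and stripping it before the head redeploy. A minimal sketch of that layering is below, assuming COMPOSE_FILE is a colon-separated list of compose files. The helper names mirror compose_file_add/remove but are hypothetical re-implementations, and the overlay path previous/compose.previous.yml is invented for illustration.

```python
def compose_file_add_sketch(value: str, entry: str) -> str:
    """Append entry to a colon-separated COMPOSE_FILE value, without duplicates."""
    parts = [p for p in value.split(":") if p]
    if entry not in parts:
        parts.append(entry)
    return ":".join(parts)


def compose_file_remove_sketch(value: str, entry: str) -> str:
    """Drop every occurrence of entry from a colon-separated COMPOSE_FILE value."""
    return ":".join(p for p in value.split(":") if p and p != entry)


# Hypothetical overlay path, used only for this example.
OVERLAY = "previous/compose.previous.yml"

base_value = compose_file_add_sketch("compose.yml:compose.ccci.yml", OVERLAY)
head_value = compose_file_remove_sketch(base_value, OVERLAY)

print(base_value)  # base deploy layers the overlay last
print(head_value)  # head redeploy sees the original list again
```

Making both helpers idempotent means a repeated add or a strip on a recipe with no previous/ leaves the value unchanged.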

Ground-truth facts (verified 2026-06-17, recorded for Adversary)

  • recipe-maintainers/discourse PR #4 (discourse-official-image ae5a8180 → main f87c612d), open.
    • main (compose.yml): app/sidekiq image = bitnamilegacy/discourse:3.5.0; app healthcheck start_period: 20m; app.deploy.update_config.order: start-first; sidekiq service present.
    • PR #4 head: app.image = discourse/discourse:3.5.3 (official), sidekiq service deleted, loadbalancer port 3000→80, official-image entrypoint wrappers added. (PR .diff confirms both.)
    • Published tags max = 0.7.0+3.3.1; main (3.5.0) is AHEAD of all tags → main-tip is a branch ref, not a tag.
  • Pre-fix tests/discourse/compose.ccci.yml (before e1b32ea) re-pinned app+sidekiq to bitnamilegacy/discourse:3.3.1, re-added sidekiq, and set start_period:20m and order:stop-first. It was applied to ALL deploys via EXTRA_ENV.COMPOSE_FILE, which forced the PR head back to bitnamilegacy:3.3.1 + sidekiq (the bug).
  • Note vs plan §3 prose: main is bitnamilegacy:3.5.0, not 3.3.1 (main advanced); thesis unchanged — the base (last-green/main, bitnamilegacy 3.5.0) deploys clean, NO previous/ needed for discourse.

Blocked

(none)