STATUS — phase prevb (dynamic upgrade base + per-recipe previous/)
SSOT: /srv/cc-ci/cc-ci-plan/plan-phase-prevb-previous-dynamic-base.md.
State files: this + BACKLOG-prevb.md, REVIEW-prevb.md (Adversary), JOURNAL-prevb.md. DECISIONS.md shared.
Phase
Started 2026-06-17. Gates: M1 (implemented + green locally), M2 (proven in real CI + spot-check).
Now
- M1: PASS @2026-06-17T01:03Z (`dbc7a3b`), no VETO.
- Gate: M2 CLAIMED, awaiting Adversary (claim commit below).
Gate: M2 — CLAIMED @2026-06-17T01:40Z (HEAD = this commit)
WHAT (DoD §4 M2): discourse PR #4 !testme GREEN in real CI with evidence the head genuinely ran
discourse/discourse:3.5.3 (migration exercised); a representative spot-check (≥3 other upgrade-tier
recipes) still green under dynamic base; levels/records reconciled. (M1 already PASS.)
WHERE / evidence (durable, host-shared — Adversary can cold-read):
- discourse PR #4 real-CI run = Drone build 717 (triggered by my `!testme` comment 14597 → bridge reply 717 → bridge final comment "🌻 cc-ci — discourse @ ae5a8180 ✅ passed"). Artifacts at `/var/lib/cc-ci-runs/717/`: `results.json` (level 4/5), `junit/` (10 suites), badge/summary/screenshot.
- Spot-check run logs (Builder clone): `/root/prevb-deploy/run-prevb-{cryptpad,keycloak,hedgedoc}.log`.
- Code under test = cc-ci@main (Drone builds main; prevb code through HEAD).
HOW to verify (cold):
- Re-read 717 junit cold: every suite must show `failures="0" errors="0"`: `for f in /var/lib/cc-ci-runs/717/junit/*.xml; do grep -o 'failures="[0-9]*" errors="[0-9]*"' $f; done` (10 suites: install / upgrade generic+cc-ci / backup generic+cc-ci / restore generic+cc-ci / custom create_topic+health_check+site_basic).
- Head-image proof in `upgrade__cc-ci__test_upgrade.xml`: `test_head_runs_official_image_not_bitnamilegacy` and `test_sidekiq_service_dropped_by_head` both present, no `<failure>`.
- (Optional, strongest) re-trigger: comment `!testme` on discourse PR #4 → a fresh Drone build → ✅ passed.
- Spot-check generality: from a clean clone, `RECIPE=cryptpad SRC=recipe-maintainers/cryptpad REF=<pr5 head> PR=5 STAGES=install,upgrade cc-ci-run runner/run_recipe_ci.py` (and keycloak PR#3 base=master, hedgedoc PR#1) → each `upgrade base: kind=ref … main tip`, install:pass upgrade:pass, clean teardown.
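
The junit re-read above can also be scripted instead of grepped. This is a minimal Python sketch, assuming each file's root element is either a `<testsuite>` or a `<testsuites>` wrapper whose suites carry `failures` and `errors` attributes (the usual junit layout). The helper names are illustrative and not part of cc-ci.

```python
import glob
import xml.etree.ElementTree as ET


def junit_totals(xml_path):
    """Return (failures, errors) summed over every <testsuite> in one junit file."""
    root = ET.parse(xml_path).getroot()
    # The root is either a single <testsuite> or a <testsuites> wrapper.
    suites = [root] if root.tag == "testsuite" else root.iter("testsuite")
    failures = errors = 0
    for suite in suites:
        failures += int(suite.get("failures", 0))
        errors += int(suite.get("errors", 0))
    return failures, errors


def non_green_suites(junit_dir):
    """Map file path -> (failures, errors) for every junit file that is not clean."""
    bad = {}
    for path in sorted(glob.glob(f"{junit_dir}/*.xml")):
        totals = junit_totals(path)
        if totals != (0, 0):
            bad[path] = totals
    return bad
```

For run 717 the expected result of `non_green_suites("/var/lib/cc-ci-runs/717/junit")` is an empty dict. An empty or missing directory also yields an empty dict, so the file count (10 suites) should be checked separately.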
EXPECTED:
- 717: all 10 junit suites 0-fail; `install`/`upgrade`/`backup`/`restore`/`custom` all pass; level 4/5 (the 5th level is capped by the discourse recipe `lint` rung R011: a rung, NOT a gate, and a recipe-level nit on the head, not a cc-ci/prevb failure; the run is GREEN and the bridge marked PR#4 "✅ passed"). Head app image = official `discourse/discourse:3.5.3` (not bitnamilegacy); no `sidekiq`.
- Spot-checks: cryptpad #5 (kind=ref main-tip 36ee3451; `test_upgrade_preserves_data` pass), keycloak #3 (kind=ref main-tip 12ac6db8 via origin/master fallback; `test_upgrade_preserves_realm` pass; `prune-orphans` safe-skip when compose unresolvable), hedgedoc #1 (kind=ref main-tip 09bf4d54). All install:pass upgrade:pass deploy-count=1, clean teardown.
- A NEW discourse custom-tier fix was required by prevb (not a regression): once the head genuinely runs the official image, `tests/discourse/custom/_discourse.py::mint_admin` (which hardcoded the bitnamilegacy path) had to become image-agnostic (b66abc4): `/var/www/discourse` + DB-password re-export from the secret. Verified green in 717's custom tier. No test weakened.
SCOPE (next phase, not M2): full all-recipe regression = phase regall. Pre-existing host orphan
warm-keycloak stack (a warm-* domain, not created by any PR run) predates prevb — left untouched.
Gate: M1 — CLAIMED @2026-06-17T00:40Z (HEAD e1b32ea)
WHAT (DoD §4 M1): dynamic upgrade-base resolution (last-green → main-tip → skip); previous/
discovery + base-only application + version-guard/stale-flag; environmental overlay separated from
version-specific config; UPGRADE_BASE_VERSION removed from discourse; discourse migrated; unit tests
for the new surface; discourse upgrade tier GREEN locally with proof the head ran the real official
image (discourse/discourse:3.5.3, NOT bitnamilegacy) and no sidekiq service post-deploy.
WHERE (commit e1b32ea on origin/main):
- `runner/run_recipe_ci.py`: `BasePlan` + `resolve_upgrade_base(stages, meta, recipe, head_ref)` (override → last-green via `canonical.read_registry` → main-tip via `lifecycle.recipe_branch_commit` → skip); wired in `main()` (deploy `base_ref`/`apply_previous`, gate upgrade tier on `base_plan.runs`).
- `runner/harness/lifecycle.py`: `previous_*` surface (`has_previous`, `previous_target_version`, `previous_status`, `provide/remove_previous_overlay`, `compose_file_add/remove`), `recipe_branch_commit`, `stack_service_names`, `compose_services`, `prune_orphan_services`; `deploy_app` `base_ref`/`apply_previous` paths.
- `runner/harness/generic.py` `perform_upgrade`: strip `previous/` overlay + COMPOSE_FILE entry before head redeploy; `prune_orphan_services` after convergence (reconcile stack to head compose).
- `tests/discourse/compose.ccci.yml`: ENVIRONMENTAL-only (`app.deploy.update_config.order: stop-first`; bitnamilegacy pins + `sidekiq` removed).
- `tests/discourse/recipe_meta.py`: `UPGRADE_BASE_VERSION` removed.
- `tests/discourse/test_upgrade.py`: asserts head image == official 3.5.3 (not bitnamilegacy) + no sidekiq.
- Unit: `tests/unit/test_upgrade_base.py` (resolver matrix), `tests/unit/test_previous.py` (previous/ + COMPOSE_FILE layering).
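
The resolution order named above (override → last-green → main-tip → skip) can be illustrated with a small sketch. It is not the code in `runner/run_recipe_ci.py`: the real `resolve_upgrade_base` takes `(stages, meta, recipe, head_ref)`, whereas this version takes the override and two lookup callables directly. `BasePlanSketch`, its fields, and `resolve_base_sketch` are hypothetical names; only the `kind=ref` value and the `runs` gate are taken from this file.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BasePlanSketch:
    kind: str            # "ref" when a base was found, "skip" otherwise
    ref: Optional[str]   # commit or ref to deploy as the upgrade base
    source: str          # which rule produced the ref (hypothetical field)

    @property
    def runs(self) -> bool:
        # The upgrade tier only runs when there is a base to upgrade from.
        return self.kind == "ref"


def resolve_base_sketch(
    override: Optional[str],
    last_green: Callable[[], Optional[str]],
    main_tip: Callable[[], Optional[str]],
) -> BasePlanSketch:
    """Try each source in priority order; the first non-empty ref wins."""
    if override:
        return BasePlanSketch("ref", override, "override")
    # Lookups are callables so a later source is only queried if needed.
    for source, lookup in (("last-green", last_green), ("main-tip", main_tip)):
        ref = lookup()
        if ref:
            return BasePlanSketch("ref", ref, source)
    return BasePlanSketch("skip", None, "no base found")
```

With no override and no last-green record, `resolve_base_sketch(None, lambda: None, lambda: "f87c612d")` falls through to the main-tip rule, which is the case the discourse e2e run exercises.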
HOW to verify (cold, from a fresh clone at e1b32ea):
- Unit (prevb surface): `cc-ci-run -m pytest tests/unit/test_upgrade_base.py tests/unit/test_previous.py tests/unit/test_meta.py -q`.
- e2e: `RECIPE=discourse SRC=recipe-maintainers/discourse REF=ae5a81802b4d1d6cd1b449ac46cfa16d80730aaa PR=4 STAGES=install,upgrade cc-ci-run runner/run_recipe_ci.py` (HOME=/root).
- Inspect: `grep -vE '^\s*#' tests/discourse/compose.ccci.yml` (env-only); `grep UPGRADE_BASE_VERSION tests/discourse/recipe_meta.py` (none).
EXPECTED:
- Unit: all pass (38 across the 2 prevb files; test_meta clean). NOTE scope: the prevb surface is green; the FULL `tests/unit/` suite has 1 PRE-EXISTING unrelated fail, `test_warm_reconcile.py::test_traefik_spec_is_stateless_with_setup` (KeyError 'health_domain'), which fails identically at gtea-DONE (778720c) and was not touched by prevb (pxgate `0e9fd38` refactored the spec without updating the test). Out of scope for prevb; flagged to the operator/next phase.
- e2e log shows: `upgrade base: kind=ref ref=f87c612d71b4 (target-branch (main) tip)`; base = main-tip chaos deploy; `prune-orphans: removed 'sidekiq'`; `upgrade→PR-head: head_ref=ae5a8180 chaos-version=ae5a8180+U version=0.8.1+3.5.0→1.0.0+3.5.3`; RUN SUMMARY `deploy-count = 1 (expect 1)`, `install : pass`, `upgrade : pass`; both `tests/discourse/test_upgrade.py` asserts PASS (app image official 3.5.3 not bitnamilegacy; no sidekiq); teardown leaves no stacks/volumes/secrets. (Level caps at 2/5 because only install,upgrade ran; not a fail.)
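
The `prune-orphans: removed 'sidekiq'` line reflects reconciling the deployed stack to the head compose file: a service that runs in the stack but is no longer declared by the head is an orphan. A hedged sketch of that comparison follows. `orphan_services_sketch` is a hypothetical name, and passing `None` for an unresolvable compose models the safe-skip behaviour noted for keycloak. How the real `prune_orphan_services` gathers its inputs is not shown here.

```python
from typing import Iterable, List, Optional


def orphan_services_sketch(
    deployed: Iterable[str],
    head_services: Optional[Iterable[str]],
) -> List[str]:
    """Return services present in the stack but absent from the head compose.

    `deployed` holds short service names (assumption: any stack prefix has
    already been stripped). `head_services` is None when the head compose
    could not be resolved; in that case nothing is pruned.
    """
    if head_services is None:
        return []  # safe-skip: never remove services on incomplete information
    return sorted(set(deployed) - set(head_services))
```

For the discourse upgrade, a stack running `app`, `db`, and `sidekiq` compared against a head that declares only `app` and `db` would yield `["sidekiq"]` (the `db` name is illustrative).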
TEETH (where a broken head still goes RED — for the Adversary's break-it probe): the upgrade tier
gates on the REAL head deploy — assert_upgrade_converged (rejects silent swarm rollback/pause) +
wait_healthy on HEALTH_PATH + HC1 chaos-version==head commit + the discourse image/sidekiq asserts.
Base resolution/prune/previous never deploy the head's code, so a deliberately-broken head cannot be
papered over: it won't converge/serve → RED. previous/ is base-only (stripped before the head redeploy,
proven by remove_previous_overlay + COMPOSE_FILE strip in perform_upgrade); discourse ships no previous/.
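
The base-only property of previous/ rests on removing its overlay entry from COMPOSE_FILE before the head redeploy. A minimal sketch of that list manipulation, assuming COMPOSE_FILE is a `:`-separated list of compose files. The function names mirror `compose_file_add/remove` but carry a `_sketch` suffix because the real signatures are not shown in this file, and `previous/overlay.yml` is a made-up path.

```python
def compose_file_add_sketch(compose_file: str, entry: str, sep: str = ":") -> str:
    """Append `entry` to a COMPOSE_FILE-style list unless it is already there."""
    parts = [p for p in compose_file.split(sep) if p]
    if entry not in parts:
        parts.append(entry)
    return sep.join(parts)


def compose_file_remove_sketch(compose_file: str, entry: str, sep: str = ":") -> str:
    """Drop every occurrence of `entry`; order of the other files is preserved."""
    return sep.join(p for p in compose_file.split(sep) if p and p != entry)


# Base deploy layers the (hypothetical) overlay on top of the usual files ...
base_env = compose_file_add_sketch("compose.yml:compose.ccci.yml", "previous/overlay.yml")
# ... and the head redeploy sees the list with the overlay stripped again.
head_env = compose_file_remove_sketch(base_env, "previous/overlay.yml")
```

Removal is idempotent, so stripping the entry for a recipe that ships no previous/ (as discourse does) leaves COMPOSE_FILE unchanged.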
Ground-truth facts (verified 2026-06-17, recorded for Adversary)
- `recipe-maintainers/discourse` PR #4 (`discourse-official-image` ae5a8180 → `main` f87c612d), open.
  - main (`compose.yml`): `app`/`sidekiq` image = `bitnamilegacy/discourse:3.5.0`; `app` healthcheck `start_period: 20m`; `app.deploy.update_config.order: start-first`; `sidekiq` service present.
  - PR #4 head: `app.image = discourse/discourse:3.5.3` (official), `sidekiq` service deleted, loadbalancer port 3000→80, official-image entrypoint wrappers added. (PR `.diff` confirms both.)
  - Published tags max = `0.7.0+3.3.1`; main (3.5.0) is AHEAD of all tags → main-tip is a branch ref, not a tag.
- Current `tests/discourse/compose.ccci.yml` re-pins `app`+`sidekiq` to `bitnamilegacy/discourse:3.3.1`, re-adds `sidekiq`, sets `start_period:20m`, `order:stop-first`; it is applied to ALL deploys via `EXTRA_ENV.COMPOSE_FILE` → forces the PR head back to bitnamilegacy:3.3.1 + sidekiq (the bug).
- Note vs plan §3 prose: main is `bitnamilegacy:3.5.0`, not `3.3.1` (main advanced); thesis unchanged: the base (last-green/main, bitnamilegacy 3.5.0) deploys clean, NO `previous/` needed for discourse.
Blocked
(none)