# STATUS — phase `prevb` (dynamic upgrade base + per-recipe `previous/`)

SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase-prevb-previous-dynamic-base.md`.
State files: this + BACKLOG-prevb.md, REVIEW-prevb.md (Adversary), JOURNAL-prevb.md. DECISIONS.md shared.

## Phase

Started 2026-06-17. Gates: **M1** (implemented + green locally), **M2** (proven in real CI + spot-check).

## DONE

Phase `prevb` COMPLETE @2026-06-17 — every Definition-of-Done item (plan §6) Adversary-verified with a
fresh PASS, no VETO:
- **M1: PASS** @2026-06-17T01:03Z (REVIEW-prevb, dbc7a3b) — dynamic base (last-green→main-tip→skip),
  `previous/` base-only mechanism + version-guard/stale-flag, environmental vs version-specific overlay
  split, discourse migrated off the static base + leaky overlay, unit tests; TEETH confirmed (deliberately
  broken head → upgrade RED — base/prune/previous cannot mask it).
- **M2: PASS** @2026-06-17T01:58Z (REVIEW-prevb, 1c3ba71) — discourse PR#4 `!testme` GREEN in REAL CI
  (Drone **717**, bridge "✅ passed"; all 5 tiers junit 0-fail) with live-swarm-image proof the head ran
  the official `discourse/discourse:3.5.3` (NOT bitnamilegacy) and `sidekiq` dropped → the official-image
  migration is finally tested; 3 spot-checks green under dynamic base (cryptpad#5 data-continuity,
  keycloak#3 realm-continuity via origin/master fallback, hedgedoc#1) incl. the Adversary's own cryptpad
  re-run; public surface secret-clean; levels/records reconciled; nothing merged on any mirror.

Notable in-scope fix surfaced by the phase: once the head genuinely ran the official image,
`tests/discourse/custom/_discourse.py::mint_admin` (hardcoded the bitnamilegacy container path) had to
become image-agnostic (b66abc4) — a direct, correct consequence of stopping the overlay from reverting
the head. No test weakened.

Carried forward (not prevb blockers): full all-recipe regression → phase `regall`; pre-existing unrelated
red `test_warm_reconcile::test_traefik_spec_is_stateless_with_setup` [F-prevb-A]; low-sev mint_admin
ApiKey-in-RAW-log [F-prevb-C] filed in DEFERRED.md.

## Gate: M2 — CLAIMED @2026-06-17T01:40Z (HEAD = this commit)

**WHAT (DoD §4 M2):** discourse PR #4 `!testme` GREEN in real CI with evidence the head genuinely ran
`discourse/discourse:3.5.3` (migration exercised); a representative spot-check (≥3 other upgrade-tier
recipes) still green under dynamic base; levels/records reconciled. (M1 already PASS.)

**WHERE / evidence (durable, host-shared — Adversary can cold-read):**
- **discourse PR #4 real-CI run = Drone build 717** (triggered by my `!testme` comment 14597 → bridge
  reply 717 → bridge final comment "🌻 cc-ci — discourse @ ae5a8180 ✅ **passed**"). Artifacts at
  `/var/lib/cc-ci-runs/717/`: `results.json` (**level 4/5**), `junit/` (10 suites), badge/summary/screenshot.
- Spot-check run logs (Builder clone): `/root/prevb-deploy/run-prevb-{cryptpad,keycloak,hedgedoc}.log`.
- Code under test = cc-ci@main (Drone builds main; prevb code through HEAD).

**HOW to verify (cold):**
1. Re-read 717 junit cold: every suite `failures="0" errors="0"` —
   `for f in /var/lib/cc-ci-runs/717/junit/*.xml; do grep -o 'failures="[0-9]*" errors="[0-9]*"' "$f"; done`
   (10 suites: install / upgrade generic+cc-ci / backup generic+cc-ci / restore generic+cc-ci /
   custom create_topic+health_check+site_basic).
2. Head-image proof in `upgrade__cc-ci__test_upgrade.xml`: `test_head_runs_official_image_not_bitnamilegacy`
   + `test_sidekiq_service_dropped_by_head` both present, no `<failure>`.
3. (Optional, strongest) re-trigger: comment `!testme` on discourse PR #4 → a fresh Drone build → ✅ passed.
4. Spot-check generality: from a clean clone, `RECIPE=cryptpad SRC=recipe-maintainers/cryptpad REF=<pr5 head> PR=5 STAGES=install,upgrade cc-ci-run runner/run_recipe_ci.py`
   (and keycloak PR#3 base=master, hedgedoc PR#1) → each `upgrade base: kind=ref … main tip`, install:pass upgrade:pass, clean teardown.

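
Step 1 of the list above can also be run as a small script rather than a grep loop. This is a minimal sketch, assuming the files are standard JUnit XML in which each `<testsuite>` element carries `failures` and `errors` attributes; the helper names `junit_totals` and `all_green` are hypothetical and not part of cc-ci.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def junit_totals(junit_dir):
    """Return {file name: (failures, errors)} for every *.xml in junit_dir."""
    totals = {}
    for path in sorted(Path(junit_dir).glob("*.xml")):
        root = ET.parse(path).getroot()
        # iter() also yields the root itself, so this covers both a bare
        # <testsuite> root and a <testsuites> wrapper around one or more suites.
        suites = list(root.iter("testsuite"))
        failures = sum(int(s.get("failures", 0)) for s in suites)
        errors = sum(int(s.get("errors", 0)) for s in suites)
        totals[path.name] = (failures, errors)
    return totals


def all_green(totals):
    """True only if at least one file was read and every count is zero."""
    return bool(totals) and all(f == 0 and e == 0 for f, e in totals.values())
```

Calling `all_green(junit_totals("/var/lib/cc-ci-runs/717/junit"))` on the host would correspond to the cold re-read of build 717. An empty directory is treated as a failure so that a missing artifact cannot pass silently.
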

**EXPECTED:**
- 717: all 10 junit suites 0-fail; `install/upgrade/backup/restore/custom` all pass; level 4/5
  (the 5th level is capped by the discourse *recipe* `lint` rung R011 — a rung, NOT a gate, and a
  recipe-level nit on the head, not a cc-ci/prevb failure; the run is GREEN and the bridge marked PR#4
  "✅ passed"). Head app image = official `discourse/discourse:3.5.3` (not bitnamilegacy); no `sidekiq`.
- Spot-checks: cryptpad #5 (kind=ref main-tip 36ee3451; `test_upgrade_preserves_data` pass),
  keycloak #3 (kind=ref main-tip 12ac6db8 via origin/master fallback; `test_upgrade_preserves_realm` pass;
  `prune-orphans` safe-skip when compose unresolvable), hedgedoc #1 (kind=ref main-tip 09bf4d54). All
  install:pass upgrade:pass deploy-count=1, clean teardown.
- **A NEW discourse custom-tier fix was required by prevb** (not a regression): once the head genuinely
  runs the official image, `tests/discourse/custom/_discourse.py::mint_admin` (hardcoded the bitnamilegacy
  path) had to become image-agnostic (b66abc4) — `/var/www/discourse` + DB-password re-export from the
  secret. Verified green in 717's custom tier. No test weakened.

**SCOPE (next phase, not M2):** full all-recipe regression = phase `regall`. Pre-existing host orphan
`warm-keycloak` stack (a warm-* domain, not created by any PR run) predates prevb — left untouched.

## Gate: M1 — CLAIMED @2026-06-17T00:40Z (HEAD e1b32ea)

**WHAT (DoD §4 M1):** dynamic upgrade-base resolution (last-green → main-tip → skip); `previous/`
discovery + base-only application + version-guard/stale-flag; environmental overlay separated from
version-specific config; `UPGRADE_BASE_VERSION` removed from discourse; discourse migrated; unit tests
for the new surface; discourse upgrade tier GREEN locally with proof the head ran the real official
image (`discourse/discourse:3.5.3`, NOT bitnamilegacy) and no `sidekiq` service post-deploy.

**WHERE (commit e1b32ea on origin/main):**
- `runner/run_recipe_ci.py`: `BasePlan` + `resolve_upgrade_base(stages, meta, recipe, head_ref)`
  (override → last-green via `canonical.read_registry` → main-tip via `lifecycle.recipe_branch_commit`
  → skip); wired in `main()` (deploy `base_ref`/`apply_previous`, gate upgrade tier on `base_plan.runs`).
- `runner/harness/lifecycle.py`: `previous_*` surface (`has_previous`, `previous_target_version`,
  `previous_status`, `provide/remove_previous_overlay`, `compose_file_add/remove`),
  `recipe_branch_commit`, `stack_service_names`, `compose_services`, `prune_orphan_services`;
  `deploy_app` `base_ref`/`apply_previous` paths.
- `runner/harness/generic.py` `perform_upgrade`: strip `previous/` overlay + COMPOSE_FILE entry before
  head redeploy; `prune_orphan_services` after convergence (reconcile stack to head compose).
- `tests/discourse/compose.ccci.yml`: ENVIRONMENTAL-only (`app.deploy.update_config.order: stop-first`;
  bitnamilegacy pins + `sidekiq` removed). `tests/discourse/recipe_meta.py`: `UPGRADE_BASE_VERSION` removed.
- `tests/discourse/test_upgrade.py`: asserts head image == official 3.5.3 (not bitnamilegacy) + no sidekiq.
- Unit: `tests/unit/test_upgrade_base.py` (resolver matrix), `tests/unit/test_previous.py` (previous/ +
  COMPOSE_FILE layering).

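
The resolution order listed for `resolve_upgrade_base` (override → last-green → main-tip → skip) amounts to a first-match fallback chain. The sketch below is illustrative only and is not the cc-ci code: this `BasePlan` is a simplified stand-in, the `"skip"` kind label is an assumption, and `pick_upgrade_base` is a hypothetical reduction that takes the three candidate refs as plain arguments instead of reading the registry and the branch tip itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BasePlan:
    """Simplified stand-in: which ref the upgrade tier starts from, if any."""
    kind: str            # "ref", or "skip" (the "skip" label is assumed)
    ref: Optional[str]   # commit to deploy as the base, None when skipping
    reason: str          # human-readable source of the choice

    @property
    def runs(self) -> bool:
        # The upgrade tier only runs when there is a base to upgrade from.
        return self.kind == "ref"


def pick_upgrade_base(override: Optional[str],
                      last_green: Optional[str],
                      main_tip: Optional[str]) -> BasePlan:
    """Return the first available base in priority order, else a skip plan."""
    candidates = [
        (override, "explicit override"),
        (last_green, "last-green"),
        (main_tip, "target-branch tip"),
    ]
    for ref, reason in candidates:
        if ref:
            return BasePlan("ref", ref, reason)
    return BasePlan("skip", None, "no base could be resolved")
```

With no override and no last-green record, `pick_upgrade_base(None, None, "f87c612d71b4")` returns a `kind="ref"` plan on the main tip, and with all three inputs empty the plan's `runs` is `False`, so the upgrade tier would be gated off.
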

**HOW to verify (cold, from a fresh clone at e1b32ea):**
1. Unit (prevb surface): `cc-ci-run -m pytest tests/unit/test_upgrade_base.py tests/unit/test_previous.py tests/unit/test_meta.py -q`.
2. e2e: `RECIPE=discourse SRC=recipe-maintainers/discourse REF=ae5a81802b4d1d6cd1b449ac46cfa16d80730aaa PR=4 STAGES=install,upgrade cc-ci-run runner/run_recipe_ci.py` (HOME=/root).
3. Inspect: `grep -vE '^\s*#' tests/discourse/compose.ccci.yml` (env-only); `grep UPGRADE_BASE_VERSION tests/discourse/recipe_meta.py` (none).

**EXPECTED:**
- Unit: all pass (38 across the 2 prevb files; test_meta clean). NOTE scope: the prevb surface is green;
  the FULL `tests/unit/` suite has **1 PRE-EXISTING unrelated fail** —
  `test_warm_reconcile.py::test_traefik_spec_is_stateless_with_setup` (KeyError 'health_domain') — which
  fails identically at gtea-DONE (778720c) and was not touched by prevb (pxgate 0e9fd38 refactored the
  spec without updating the test). Out of scope for prevb; flagged to the operator/next phase.
- e2e log shows: `upgrade base: kind=ref ref=f87c612d71b4 (target-branch (main) tip)`; base = main-tip
  chaos deploy; `prune-orphans: removed 'sidekiq'`;
  `upgrade→PR-head: head_ref=ae5a8180 chaos-version=ae5a8180+U version=0.8.1+3.5.0→1.0.0+3.5.3`;
  RUN SUMMARY `deploy-count = 1 (expect 1)`, `install : pass`, `upgrade : pass`; both
  `tests/discourse/test_upgrade.py` asserts PASS (app image official 3.5.3 not bitnamilegacy; no sidekiq);
  teardown leaves no stacks/volumes/secrets. (Level caps at 2/5 because only install,upgrade ran — not a fail.)

**TEETH (where a broken head still goes RED — for the Adversary's break-it probe):** the upgrade tier
gates on the REAL head deploy — `assert_upgrade_converged` (rejects silent swarm rollback/pause) +
`wait_healthy` on HEALTH_PATH + HC1 `chaos-version`==head commit + the discourse image/sidekiq asserts.
Base resolution/`prune`/`previous` never deploy the head's code, so a deliberately-broken head cannot be
papered over: it won't converge/serve → RED. `previous/` is base-only (stripped before the head redeploy,
proven by `remove_previous_overlay` + COMPOSE_FILE strip in `perform_upgrade`); discourse ships no `previous/`.

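
The base-only property of `previous/` rests on adding an overlay entry to `COMPOSE_FILE` for the base deploy and stripping it again before the head redeploy. A minimal sketch of that layering follows, assuming `COMPOSE_FILE` is a colon-separated list of compose files. The helpers `add_entry` and `remove_entry`, the overlay path, and the starting value are hypothetical and are not the signatures of `compose_file_add/remove` in `lifecycle.py`.

```python
def add_entry(compose_file: str, entry: str) -> str:
    """Append entry to a colon-separated COMPOSE_FILE value, without duplicates."""
    parts = [p for p in compose_file.split(":") if p]
    if entry not in parts:
        parts.append(entry)
    return ":".join(parts)


def remove_entry(compose_file: str, entry: str) -> str:
    """Drop every occurrence of entry from a colon-separated COMPOSE_FILE value."""
    return ":".join(p for p in compose_file.split(":") if p and p != entry)


# Hypothetical overlay path and starting value, used only for illustration.
OVERLAY = "previous/compose.previous.yml"
START = "compose.yml:compose.ccci.yml"

base_value = add_entry(START, OVERLAY)          # value used for the base deploy
head_value = remove_entry(base_value, OVERLAY)  # value used for the head redeploy
```

After the round trip `head_value` equals `START` again, which is the point of the strip: the head is deployed with none of the base-only overlay left in its compose layering.
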
## Ground-truth facts (verified 2026-06-17, recorded for Adversary)

- `recipe-maintainers/discourse` PR **#4** (`discourse-official-image` `ae5a8180` → `main` `f87c612d`), open.
- **main** (`compose.yml`): `app`/`sidekiq` image = `bitnamilegacy/discourse:3.5.0`; `app` healthcheck
  `start_period: 20m`; `app.deploy.update_config.order: start-first`; `sidekiq` service present.
- **PR #4 head**: `app.image = discourse/discourse:3.5.3` (official), **`sidekiq` service deleted**,
  loadbalancer port 3000→80, official-image entrypoint wrappers added. (PR `.diff` confirms both.)
- Published tags max = `0.7.0+3.3.1`; main (3.5.0) is AHEAD of all tags → main-tip is a branch ref, not a tag.
- Current `tests/discourse/compose.ccci.yml` re-pins `app`+`sidekiq` to `bitnamilegacy/discourse:3.3.1`,
  re-adds `sidekiq`, sets `start_period:20m`, `order:stop-first` — applied to ALL deploys via
  `EXTRA_ENV.COMPOSE_FILE` → forces the PR head back to bitnamilegacy:3.3.1 + sidekiq (the bug).
- Note vs plan §3 prose: main is `bitnamilegacy:3.5.0`, not `3.3.1` (main advanced); thesis unchanged —
  the base (last-green/main, bitnamilegacy 3.5.0) deploys clean, NO `previous/` needed for discourse.

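
Because main defines `sidekiq` and the PR #4 head deletes it, the service is left running after the upgrade unless the stack is reconciled to the head compose. The idea behind orphan pruning can be sketched as a set difference. This is an assumption-laden illustration, not the `prune_orphan_services` implementation: `orphan_services` is a hypothetical helper, and short service names are used where a real stack would likely carry stack-prefixed names.

```python
from typing import Iterable, List, Optional


def orphan_services(running: Iterable[str],
                    head_services: Optional[Iterable[str]]) -> List[str]:
    """Services still in the stack that the head compose no longer defines."""
    if head_services is None:
        # Head compose could not be resolved: prune nothing rather than guess.
        return []
    return sorted(set(running) - set(head_services))


# Only the two services named in the facts above; a real stack has more.
after_base_deploy = ["app", "sidekiq"]
head_compose = ["app"]
to_remove = orphan_services(after_base_deploy, head_compose)
```

Here `to_remove` is `['sidekiq']`. Passing `None` for the head services returns an empty list, mirroring the safe-skip behaviour recorded for the keycloak spot-check when the compose was unresolvable.
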
## Blocked

(none)