Files
cc-ci/machine-docs/REVIEW-prevb.md

125 lines
10 KiB
Markdown

# REVIEW — phase `prevb` (Adversary verdicts)
Append-only. Gates this phase: **M1** (implemented + green locally), **M2** (proven in real CI + spot-check).
SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase-prevb-previous-dynamic-base.md`.
## Status
- 2026-06-16T23:57Z — Adversary live for `prevb`. No Builder claim yet (no STATUS-prevb.md, no `claim(`).
Cold-start recon done: baseline mechanism understood —
- base resolution: `run_recipe_ci.upgrade_base``meta.UPGRADE_BASE_VERSION or lifecycle.previous_version` (`vers[-2]`); discourse pins `0.7.0+3.3.1`.
- overlay `tests/discourse/compose.ccci.yml` applied to ALL deploys via `EXTRA_ENV.COMPOSE_FILE`; fuses environmental (start_period 20m, order stop-first) + version-specific (bitnamilegacy image pin + sidekiq block) — the bug.
- existing unit tests to watch for weakening: `tests/unit/test_upgrade_base.py`, `tests/unit/test_meta.py`.
Idle until a gate is CLAIMED.
- 2026-06-17T00:12Z — Independently cold-verified the Builder's STATUS ground-truth facts via gitea API
(NOT trusting STATUS): PR #4 head `ae5a81802b4d1d6cd1b449ac46cfa16d80730aaa` `compose.yml`
`app.image = discourse/discourse:3.5.3`, **no `sidekiq` service**; `.diff` shows
`-bitnamilegacy/discourse:3.5.0``+discourse/discourse:3.5.3` + full `sidekiq:` block removed.
main → `app`+`sidekiq` = `bitnamilegacy/discourse:3.5.0`, sidekiq present, base `f87c612d`.
Facts CONFIRMED. (Caution noted: gitea `raw?ref=<shortsha>` silently falls back to default branch —
must use the FULL sha when cold-verifying head content.) Foundation for "discourse needs no previous/" holds.
## Pre-review (M1 code, gate NOT yet CLAIMED — preliminary recon, not a verdict)
2026-06-17T00:30Z — studied the M1 `feat` commit bb2e3c6 (code/diff only, NOT JOURNAL). Design looks sound:
- `resolve_upgrade_base` → BasePlan(kind, version, ref, reason): override → last-green (`canonical.read_registry`)
→ main-tip (`recipe_branch_commit`) → skip. `.runs` gates the upgrade tier. head_ref = `recipe_head_commit`.
- `previous/` surface (lifecycle): `has_previous`, `previous_target_version` (VERSION marker), `previous_status`
(version-guarded apply/stale), provide/remove overlay, compose_file add/remove. Base-only; **stripped before
head redeploy** (`generic.perform_upgrade``remove_previous_overlay` + COMPOSE_FILE strip). Good teeth.
- discourse migrated: `compose.ccci.yml` now ENVIRONMENTAL-ONLY (`order: stop-first`); bitnamilegacy pins +
sidekiq + UPGRADE_BASE_VERSION **removed**. `test_upgrade.py` asserts running `app` image == official
`discourse/discourse:3.5.3` (not bitnamilegacy) + sidekiq gone; resolves as the upgrade-tier overlay
(`resolve_overlay_op``test_{op}.py`), run as its own pytest → rc!=0 fails the tier. Real teeth confirmed.
- **Unit tests run cold (nix pytest): 63 passed** (test_upgrade_base + test_previous + test_meta). Matrix
EXPANDED, not weakened (override-wins / last-green-primary / main-tip-fallback / head==main-tip skip / no-pred skip).
STILL REQUIRED for the formal M1 PASS (needs the Builder's e2e claim + my cold acceptance run):
(a) discourse upgrade tier GREEN locally with proof the head ran real 3.5.3 (not bitnamilegacy) + no sidekiq;
(b) BREAK-IT: a deliberately-broken head still fails the upgrade tier (base resolution didn't paper over it);
(c) base falls back to main when last-green absent (unit-covered; e2e desirable);
(d) `previous/` ignored for the head (code-confirmed; e2e desirable).
## Adversary findings (pre-review notes)
- [F-prevb-A] (PRE-EXISTING, NOT a prevb regression; INFO) `tests/unit/test_warm_reconcile.py::
test_traefik_spec_is_stateless_with_setup` is RED on main — `KeyError: 'health_domain'`. Fails identically at
the gtea-DONE commit 778720c (verified by checkout), and the prevb feat never touched warm_reconcile — the
`pxgate-M1` traefik-probe change (0e9fd38) refactored the spec without updating this test. Out of prevb scope,
but it means the FULL `tests/unit/` suite is NOT all-green (283 pass / 1 fail). Flagging so "unit green" claims
are scoped honestly. Not an M1 blocker.
- [F-prevb-B] (NIT) old `test_expected_na_other_rung_does_not_suppress` was dropped in the rewrite; the behavior
(an EXPECTED_NA for a non-upgrade rung must not suppress the base) is preserved via `.get("upgrade")` but no
longer has a dedicated test. Low risk; consider re-adding one line of coverage.
## M1 cold acceptance — IN FLIGHT (2026-06-17T00:42Z)
Gate M1 CLAIMED @00:40Z (code commit e1b32ea; claim commit bb79e91 = machine-docs only). Cold-verifying from a
FRESH clone on cc-ci (`/root/cc-ci-adv-prevb` @ bb79e91), not the Builder's tree.
Done so far (cold):
- prevb unit surface: **64 passed** (`test_upgrade_base`+`test_previous`+`test_meta`) via nix pytest.
- statics: `compose.ccci.yml` env-only (`order: stop-first`); discourse `recipe_meta.py` has NO `UPGRADE_BASE_VERSION` assignment.
- `prune_orphan_services` reviewed: removes only services NOT in the head compose → cannot mask the prevb bug
(if overlay leaked sidekiq into the head compose it'd be in `defined` → not pruned → test RED). Teeth preserved.
- e2e launched (`RECIPE=discourse SRC=recipe-maintainers/discourse REF=ae5a8180… PR=4 STAGES=install,upgrade`),
run `manual-1344943`. Early log CONFIRMS `upgrade base: kind=ref ref=f87c612d71b4 (target-branch (main) tip)`
→ base = main-tip chaos deploy (matches claim). Base deploy (main-tip, has the known sidekiq depends_on bug)
in progress; observed a non-fatal `lint rung: fail R011` on the base — watching whether it blocks.
- CONCURRENCY observed: a Builder keycloak spot-check (PR#3) runs simultaneously in `/root/prevb-deploy`. My
discourse run's janitor saw the keycloak lock and LEFT IT (`live concurrent run, leaving it`) — per-run
ABRA_DIR isolation holding. Watching for memory-pressure false-failures on the shared 7GB node.
UPDATE 2026-06-17T01:00Z (post-reboot, cold re-check of completed run):
- e2e `manual-1344943` COMPLETED **GREEN** (read full log /root/cc-ci-adv-prevb-e2e.log): `upgrade base:
kind=ref ref=f87c612d71b4 (target-branch (main) tip)`; `upgrade→PR-head head_ref=ae5a8180`;
generic `test_upgrade_reconverges` PASSED; discourse `test_head_runs_official_image_not_bitnamilegacy`
PASSED + `test_sidekiq_service_dropped_by_head` PASSED; RUN SUMMARY deploy-count=1 (expect 1),
install:pass upgrade:pass, level=2/5. Matches STATUS EXPECTED exactly.
- TEARDOWN clean: `docker stack ls` shows NO discourse stack; no discourse secrets/volumes. (warm-keycloak
stack present = Builder's concurrent spot-check, not mine.)
- BREAK-IT: my first probe (`manual-1357729`, broken-head ref 94ebaaa = head image
`discourse/discourse:99.99.99-adversary-broken`) was SIGTERM-killed mid-base-deploy by MY reboot — INCOMPLETE.
RE-LAUNCHED as `manual-1360025` (same broken head, base resolving to main-tip f87c612d as expected). In flight.
STILL TO CONFIRM: break-it `manual-1360025` → upgrade tier RED (broken head not papered over).
## Verdicts
### M1: PASS @2026-06-17T01:03Z (code commit e1b32ea / claim bb79e91)
Cold-verified from a fresh clone on cc-ci (`/root/cc-ci-adv-prevb`), independent of the Builder's tree.
Every M1 DoD item (plan §4) re-executed and confirmed:
1. **Dynamic base resolution (last-green → main-tip → skip).** e2e `manual-1344943` log: `upgrade base:
kind=ref ref=f87c612d71b4 (target-branch (main) tip)` — correctly falls back to main-tip (discourse has
NO last-green warm canonical and its only published tag is 0.7.0, behind main). Unit matrix re-run cold
(nix pytest, **64 passed**): override-wins / last-green-primary / main-tip-fallback / head==main-tip skip /
no-predecessor skip. Matrix EXPANDED vs old `upgrade_base`, not weakened.
2. **`previous/` surface** (discovery + base-only application + version-guard/stale-flag): unit-covered
(`test_previous`), code-confirmed base-only (stripped before head redeploy via `perform_upgrade` →
`remove_previous_overlay` + COMPOSE_FILE strip). discourse ships NO `previous/` (base deploys clean) —
matches plan §3 thesis.
3. **Environmental vs version-specific separated.** `tests/discourse/compose.ccci.yml` is env-only
(`app.deploy.update_config.order: stop-first`); bitnamilegacy image pins + `sidekiq` block removed;
`UPGRADE_BASE_VERSION` removed from `recipe_meta.py` (grep: none). Verified statically in cold clone.
4. **discourse migrated** — confirmed via #3 + e2e behaviour.
5. **discourse upgrade tier GREEN locally w/ proof head ran the REAL official image.** e2e `manual-1344943`:
generic `test_upgrade_reconverges` PASSED; discourse `test_head_runs_official_image_not_bitnamilegacy`
PASSED + `test_sidekiq_service_dropped_by_head` PASSED; RUN SUMMARY deploy-count=1 (expect 1),
install:pass, upgrade:pass, level=2/5. `upgrade→PR-head head_ref=ae5a8180 version=0.8.1+3.5.0→1.0.0+3.5.3`.
6. **TEETH — deliberately-broken head still goes RED (base resolution did NOT paper it over).** Break-it
probe `manual-1360025`: broken-head commit `94ebaaa` sets head `app.image =
discourse/discourse:99.99.99-adversary-broken`. Base resolved to main-tip f87c612d (same as GREEN run),
**install:pass**, then the HEAD redeploy failed: `prepull: docker pull
discourse/discourse:99.99.99-adversary-broken failed — manifest unknown` → **upgrade:fail (level 1/5)**.
Proves the head's real (broken) image is what gets deployed; base/prune/previous machinery cannot mask a
broken head.
7. **Clean teardown** after BOTH the GREEN run and the broken/failed run: `docker stack ls` / `secret ls` /
`volume ls` show NO discourse stack, secrets, or volumes. (warm-keycloak stack present = Builder's
concurrent spot-check, not discourse.)
8. **No test weakened.** F-prevb-B addressed — `test_expected_na_other_rung_does_not_suppress_upgrade`
re-added (commit e1b32ea), present in cold clone. Net coverage up (+ resolver matrix + previous/ layering).
SCOPE CAVEAT (not an M1 blocker): the FULL `tests/unit/` suite has 1 PRE-EXISTING unrelated red —
`test_warm_reconcile.py::test_traefik_spec_is_stateless_with_setup` (KeyError 'health_domain'), failing
identically at gtea-DONE 778720c, untouched by prevb (see [F-prevb-A]). prevb's own surface is all-green.
(JOURNAL not consulted before this verdict, per anti-anchoring. M1 stands on the plan, the code/diff, the
STATUS verification info, and my own cold re-runs.)
## Open VETOes
(none)