cc-ci/machine-docs/REVIEW-prevb.md
2026-06-17 01:00:44 +00:00

REVIEW — phase prevb (Adversary verdicts)

Append-only. Gates this phase: M1 (implemented + green locally), M2 (proven in real CI + spot-check). SSOT: /srv/cc-ci/cc-ci-plan/plan-phase-prevb-previous-dynamic-base.md.

Status

  • 2026-06-16T23:57Z — Adversary live for prevb. No Builder claim yet (no STATUS-prevb.md, no claim). Cold-start recon done; baseline mechanism understood:
    • base resolution: run_recipe_ci.upgrade_base → meta.UPGRADE_BASE_VERSION or lifecycle.previous_version (vers[-2]); discourse pins 0.7.0+3.3.1.
    • overlay tests/discourse/compose.ccci.yml applied to ALL deploys via EXTRA_ENV.COMPOSE_FILE; fuses environmental (start_period 20m, order stop-first) + version-specific (bitnamilegacy image pin + sidekiq block) — the bug.
    • existing unit tests to watch for weakening: tests/unit/test_upgrade_base.py, tests/unit/test_meta.py. Idle until a gate is CLAIMED.
  • 2026-06-17T00:12Z — Independently cold-verified the Builder's STATUS ground-truth facts via gitea API (NOT trusting STATUS). PR #4 head ae5a81802b4d1d6cd1b449ac46cfa16d80730aaa: compose.yml → app.image = discourse/discourse:3.5.3, no sidekiq service; .diff shows -bitnamilegacy/discourse:3.5.0 → +discourse/discourse:3.5.3 and the full sidekiq: block removed. main → app+sidekiq = bitnamilegacy/discourse:3.5.0, sidekiq present, base f87c612d. Facts CONFIRMED. (Caution noted: gitea raw?ref=<shortsha> silently falls back to the default branch, so cold-verifying head content must use the FULL sha.) Foundation for "discourse needs no previous/" holds.
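
The short-sha caution above can be turned into a small guard that runs before any ref is handed to a raw-content fetch. This is a minimal sketch, not part of the cc-ci codebase: `require_full_sha` is a hypothetical helper name, and it assumes SHA-1 object names (40 hex characters).

```python
import re

# Assumption: the repository uses SHA-1 object names (40 lowercase hex chars).
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def require_full_sha(ref: str) -> str:
    """Return ref unchanged if it is a full commit sha, otherwise raise.

    Hypothetical helper: rejects abbreviated shas and branch names so a
    raw?ref=... lookup cannot silently resolve to the default branch.
    """
    if not FULL_SHA.fullmatch(ref):
        raise ValueError(f"refusing abbreviated or non-sha ref: {ref!r}")
    return ref
```

With this in place, `require_full_sha("ae5a8180")` raises, while the full PR #4 head sha passes through unchanged.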

Pre-review (M1 code, gate NOT yet CLAIMED — preliminary recon, not a verdict)

2026-06-17T00:30Z — studied the M1 feat commit bb2e3c6 (code/diff only, NOT JOURNAL). Design looks sound:

  • resolve_upgrade_base → BasePlan(kind, version, ref, reason): override → last-green (canonical.read_registry) → main-tip (recipe_branch_commit) → skip. .runs gates the upgrade tier. head_ref = recipe_head_commit.
  • previous/ surface (lifecycle): has_previous, previous_target_version (VERSION marker), previous_status (version-guarded apply/stale), provide/remove overlay, compose_file add/remove. Base-only; stripped before head redeploy (generic.perform_upgrade → remove_previous_overlay + COMPOSE_FILE strip). Good teeth.
  • discourse migrated: compose.ccci.yml now ENVIRONMENTAL-ONLY (order: stop-first); bitnamilegacy pins + sidekiq + UPGRADE_BASE_VERSION removed. test_upgrade.py asserts running app image == official discourse/discourse:3.5.3 (not bitnamilegacy) + sidekiq gone; resolves as the upgrade-tier overlay (resolve_overlay_op → test_{op}.py), run as its own pytest → rc!=0 fails the tier. Real teeth confirmed.
  • Unit tests run cold (nix pytest): 63 passed (test_upgrade_base + test_previous + test_meta). Matrix EXPANDED, not weakened (override-wins / last-green-primary / main-tip-fallback / head==main-tip skip / no-pred skip).
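
The precedence described in the bullets above (override → last-green → main-tip → skip) can be sketched as follows. This is an illustration under assumptions, not the code in bb2e3c6: the real resolve_upgrade_base reads the registry and branch tip itself, whereas this sketch takes them as plain parameters, and the `kind` label `"override"` and the reason strings other than the main-tip one are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BasePlan:
    kind: str                # "override", "ref", or "skip" ("override" is an assumed label)
    version: Optional[str]   # set only for an explicit version override
    ref: Optional[str]       # commit to deploy as the upgrade base
    reason: str

    @property
    def runs(self) -> bool:
        # The upgrade tier is gated on this: a skipped plan means no base deploy.
        return self.kind != "skip"


def resolve_upgrade_base(
    override: Optional[str],
    last_green: Optional[str],
    main_tip: Optional[str],
    head_ref: str,
) -> BasePlan:
    """Pick the upgrade base by precedence: override, last-green, main tip, skip."""
    if override:
        return BasePlan("override", override, None, "explicit override")
    if last_green:
        return BasePlan("ref", None, last_green, "last-green")
    if main_tip is None:
        return BasePlan("skip", None, None, "no predecessor")
    if main_tip == head_ref:
        return BasePlan("skip", None, None, "head == main tip")
    return BasePlan("ref", None, main_tip, "target-branch (main) tip")
```

The five branches correspond one-to-one to the unit-test matrix named above: override-wins, last-green-primary, main-tip-fallback, head==main-tip skip, and no-pred skip.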

STILL REQUIRED for the formal M1 PASS (needs the Builder's e2e claim + my cold acceptance run): (a) discourse upgrade tier GREEN locally with proof the head ran real 3.5.3 (not bitnamilegacy) + no sidekiq; (b) BREAK-IT: a deliberately-broken head still fails the upgrade tier (base resolution didn't paper over it); (c) base falls back to main when last-green absent (unit-covered; e2e desirable); (d) previous/ ignored for the head (code-confirmed; e2e desirable).

Adversary findings (pre-review notes)

  • [F-prevb-A] (PRE-EXISTING, NOT a prevb regression; INFO) tests/unit/test_warm_reconcile.py::test_traefik_spec_is_stateless_with_setup is RED on main with KeyError: 'health_domain'. Fails identically at the gtea-DONE commit 778720c (verified by checkout), and the prevb feat never touched warm_reconcile; the pxgate-M1 traefik-probe change (0e9fd38) refactored the spec without updating this test. Out of prevb scope, but it means the FULL tests/unit/ suite is NOT all-green (283 pass / 1 fail). Flagging so "unit green" claims are scoped honestly. Not an M1 blocker.
  • [F-prevb-B] (NIT) old test_expected_na_other_rung_does_not_suppress was dropped in the rewrite; the behavior (an EXPECTED_NA for a non-upgrade rung must not suppress the base) is preserved via .get("upgrade") but no longer has a dedicated test. Low risk; consider re-adding one line of coverage.
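
For F-prevb-B, the dropped coverage could look roughly like the sketch below. It assumes EXPECTED_NA is a mapping from rung name to a reason string, which is only inferred from the `.get("upgrade")` lookup; `upgrade_base_suppressed` and the `"backup"` rung name are hypothetical stand-ins, not identifiers from the repository.

```python
def upgrade_base_suppressed(expected_na: dict) -> bool:
    """Hypothetical stand-in for the production check.

    Only an entry keyed "upgrade" may suppress the upgrade base;
    entries for any other rung are ignored.
    """
    return expected_na.get("upgrade") is not None


def test_expected_na_other_rung_does_not_suppress():
    # An EXPECTED_NA for a non-upgrade rung must leave the base in place.
    assert not upgrade_base_suppressed({"backup": "not applicable"})
    # An EXPECTED_NA for the upgrade rung itself does suppress it.
    assert upgrade_base_suppressed({"upgrade": "not applicable"})
```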

M1 cold acceptance — IN FLIGHT (2026-06-17T00:42Z)

Gate M1 CLAIMED @00:40Z (code commit e1b32ea; claim commit bb79e91 = machine-docs only). Cold-verifying from a FRESH clone on cc-ci (/root/cc-ci-adv-prevb @ bb79e91), not the Builder's tree. Done so far (cold):

  • prevb unit surface: 64 passed (test_upgrade_base+test_previous+test_meta) via nix pytest.
  • statics: compose.ccci.yml env-only (order: stop-first); discourse recipe_meta.py has NO UPGRADE_BASE_VERSION assignment.
  • prune_orphan_services reviewed: removes only services NOT in the head compose → cannot mask the prevb bug (if overlay leaked sidekiq into the head compose it'd be in defined → not pruned → test RED). Teeth preserved.
  • e2e launched (RECIPE=discourse SRC=recipe-maintainers/discourse REF=ae5a8180… PR=4 STAGES=install,upgrade), run manual-1344943. Early log CONFIRMS upgrade base: kind=ref ref=f87c612d71b4 (target-branch (main) tip) → base = main-tip chaos deploy (matches claim). Base deploy (main-tip, has the known sidekiq depends_on bug) in progress; observed a non-fatal lint rung: fail R011 on the base — watching whether it blocks.
  • CONCURRENCY observed: a Builder keycloak spot-check (PR#3) runs simultaneously in /root/prevb-deploy. My discourse run's janitor saw the keycloak lock and LEFT IT (live concurrent run, leaving it); per-run ABRA_DIR isolation holding. Watching for memory-pressure false-failures on the shared 7GB node.

UPDATE 2026-06-17T01:00Z (post-reboot, cold re-check of completed run):

  • e2e manual-1344943 COMPLETED GREEN (read full log /root/cc-ci-adv-prevb-e2e.log): upgrade base: kind=ref ref=f87c612d71b4 (target-branch (main) tip); upgrade→PR-head head_ref=ae5a8180; generic test_upgrade_reconverges PASSED; discourse test_head_runs_official_image_not_bitnamilegacy PASSED + test_sidekiq_service_dropped_by_head PASSED; RUN SUMMARY deploy-count=1 (expect 1), install:pass upgrade:pass, level=2/5. Matches STATUS EXPECTED exactly.
  • TEARDOWN clean: docker stack ls shows NO discourse stack; no discourse secrets/volumes. (warm-keycloak stack present = Builder's concurrent spot-check, not mine.)
  • BREAK-IT: my first probe (manual-1357729, broken-head ref 94ebaaa = head image discourse/discourse:99.99.99-adversary-broken) was SIGTERM-killed mid-base-deploy by MY reboot — INCOMPLETE. RE-LAUNCHED as manual-1360025 (same broken head, base resolving to main-tip f87c612d as expected). In flight. STILL TO CONFIRM: break-it manual-1360025 → upgrade tier RED (broken head not papered over).
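
The prune_orphan_services reasoning in the list above (only services absent from the head compose are removed, so a leaked overlay service survives and the test goes RED) can be illustrated with plain set arithmetic. This is a sketch of the argument, not the actual implementation; `services_to_prune` is a hypothetical name.

```python
def services_to_prune(running: set, defined: set) -> set:
    """Services that are running but not defined in the head compose."""
    return running - defined


# Clean head: sidekiq is not defined, so it is pruned after the upgrade.
assert services_to_prune({"app", "sidekiq"}, {"app"}) == {"sidekiq"}

# Leaked overlay: sidekiq ends up in the head compose, so it is NOT pruned,
# stays running, and a "sidekiq dropped by head" assertion would fail.
assert services_to_prune({"app", "sidekiq"}, {"app", "sidekiq"}) == set()
```

Because pruning never touches a defined service, it cannot hide an overlay that leaks into the head deploy.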

Verdicts

(none yet — M1 cold acceptance in flight)

Open VETOes

(none)