REVIEW — phase prevb (Adversary verdicts)
Append-only. Gates this phase: M1 (implemented + green locally), M2 (proven in real CI + spot-check).
SSOT: /srv/cc-ci/cc-ci-plan/plan-phase-prevb-previous-dynamic-base.md.
Status
- 2026-06-16T23:57Z — Adversary live for prevb. No Builder claim yet (no STATUS-prevb.md, no `claim(`). Cold-start recon done; baseline mechanism understood:
  - base resolution: `run_recipe_ci.upgrade_base` → `meta.UPGRADE_BASE_VERSION` or `lifecycle.previous_version` (`vers[-2]`); discourse pins `0.7.0+3.3.1`.
  - overlay `tests/discourse/compose.ccci.yml` applied to ALL deploys via `EXTRA_ENV.COMPOSE_FILE`; fuses environmental (start_period 20m, order stop-first) + version-specific (bitnamilegacy image pin + sidekiq block) — the bug.
  - existing unit tests to watch for weakening: `tests/unit/test_upgrade_base.py`, `tests/unit/test_meta.py`.

  Idle until a gate is CLAIMED.
- 2026-06-17T00:12Z — Independently cold-verified the Builder's STATUS ground-truth facts via gitea API (NOT trusting STATUS): PR #4 head `ae5a81802b4d1d6cd1b449ac46cfa16d80730aaa` `compose.yml` → `app.image = discourse/discourse:3.5.3`, no `sidekiq` service; `.diff` shows `-bitnamilegacy/discourse:3.5.0` → `+discourse/discourse:3.5.3` + full `sidekiq:` block removed. main → `app` + `sidekiq` = `bitnamilegacy/discourse:3.5.0`, sidekiq present, base `f87c612d`. Facts CONFIRMED. (Caution noted: gitea `raw?ref=<shortsha>` silently falls back to default branch — must use the FULL sha when cold-verifying head content.) Foundation for "discourse needs no previous/" holds.
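The short-sha caution above can be enforced mechanically before any raw-content request is made. A minimal sketch, assuming a hypothetical helper `require_full_sha` (not part of cc-ci) that rejects anything other than a 40-character hex commit id:

```python
import re

# Hypothetical guard (not from the cc-ci codebase): accept only a full
# 40-hex-character commit sha, so an abbreviated ref is never sent to an
# endpoint that may silently fall back to the default branch.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def require_full_sha(ref: str) -> str:
    """Return ref unchanged if it is a full sha, else raise ValueError."""
    if not _FULL_SHA.fullmatch(ref):
        raise ValueError(f"refusing abbreviated or non-sha ref: {ref!r}")
    return ref
```

Calling it on the PR #4 head sha returns the sha unchanged; calling it on `ae5a8180` raises, which is the behaviour wanted for cold verification.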
Pre-review (M1 code, gate NOT yet CLAIMED — preliminary recon, not a verdict)
2026-06-17T00:30Z — studied the M1 feat commit bb2e3c6 (code/diff only, NOT JOURNAL). Design looks sound:
- `resolve_upgrade_base` → BasePlan(kind, version, ref, reason): override → last-green (`canonical.read_registry`) → main-tip (`recipe_branch_commit`) → skip. `.runs` gates the upgrade tier. head_ref = `recipe_head_commit`.
- `previous/` surface (lifecycle): `has_previous`, `previous_target_version` (VERSION marker), `previous_status` (version-guarded apply/stale), provide/remove overlay, compose_file add/remove. Base-only; stripped before head redeploy (`generic.perform_upgrade` → `remove_previous_overlay` + COMPOSE_FILE strip). Good teeth.
- discourse migrated: `compose.ccci.yml` now ENVIRONMENTAL-ONLY (order: stop-first); bitnamilegacy pins + sidekiq + UPGRADE_BASE_VERSION removed. `test_upgrade.py` asserts running `app` image == official `discourse/discourse:3.5.3` (not bitnamilegacy) + sidekiq gone; resolves as the upgrade-tier overlay (`resolve_overlay_op` → `test_{op}.py`), run as its own pytest → rc!=0 fails the tier. Real teeth confirmed.
- Unit tests run cold (nix pytest): 63 passed (test_upgrade_base + test_previous + test_meta). Matrix EXPANDED, not weakened (override-wins / last-green-primary / main-tip-fallback / head==main-tip skip / no-pred skip).
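The resolution order noted above (override, then last-green, then main-tip, then skip) can be sketched as a first-match chain. This is an illustration, not the Builder's code: the `BasePlan` field names come from the notes, while the function name `resolve_upgrade_base_sketch`, its parameters, the `"version"` and `"skip"` kind labels, and the reason strings other than the main-tip one are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BasePlan:
    # Field names follow the review notes; the kind labels are assumed.
    kind: str                 # "version", "ref", or "skip"
    version: Optional[str]
    ref: Optional[str]
    reason: str

    @property
    def runs(self) -> bool:
        # The upgrade tier only runs when there is a base to deploy.
        return self.kind != "skip"


def resolve_upgrade_base_sketch(
    override: Optional[str],
    last_green_ref: Optional[str],
    main_tip_ref: Optional[str],
    head_ref: str,
) -> BasePlan:
    """Hypothetical stand-in for the resolver: the first match wins."""
    if override:
        return BasePlan("version", override, None, "explicit override")
    if last_green_ref:
        return BasePlan("ref", None, last_green_ref, "last-green canonical")
    if main_tip_ref and main_tip_ref != head_ref:
        return BasePlan("ref", None, main_tip_ref, "target-branch (main) tip")
    if main_tip_ref:
        # Upgrading a commit onto itself proves nothing, so skip.
        return BasePlan("skip", None, None, "head == main tip")
    return BasePlan("skip", None, None, "no predecessor")
```

Each branch corresponds to one row of the unit matrix listed above, which is why a weakened matrix would show up as a missing branch rather than a changed assertion.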
STILL REQUIRED for the formal M1 PASS (needs the Builder's e2e claim + my cold acceptance run):
(a) discourse upgrade tier GREEN locally with proof the head ran real 3.5.3 (not bitnamilegacy) + no sidekiq;
(b) BREAK-IT: a deliberately-broken head still fails the upgrade tier (base resolution didn't paper over it);
(c) base falls back to main when last-green absent (unit-covered; e2e desirable);
(d) previous/ ignored for the head (code-confirmed; e2e desirable).
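Item (d) depends on the overlay entry being removed from the compose-file list before the head is redeployed. A minimal sketch of that strip, assuming COMPOSE_FILE is a colon-separated list of paths; the helper name `strip_compose_file` and the overlay filename used in the usage note are hypothetical.

```python
def strip_compose_file(value: str, overlay: str, sep: str = ":") -> str:
    """Remove one overlay path from a COMPOSE_FILE-style list.

    Hypothetical helper: empty segments are dropped and all other
    entries keep their original order.
    """
    return sep.join(p for p in value.split(sep) if p and p != overlay)
```

With `"compose.yml:compose.ccci.yml:previous/compose.yml"` as input and `"previous/compose.yml"` as the overlay, the result keeps only the first two entries, so the head deploy never sees the base-only file.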
Adversary findings (pre-review notes)
- [F-prevb-A] (PRE-EXISTING, NOT a prevb regression; INFO) `tests/unit/test_warm_reconcile.py::test_traefik_spec_is_stateless_with_setup` is RED on main — `KeyError: 'health_domain'`. Fails identically at the gtea-DONE commit `778720c` (verified by checkout), and the prevb feat never touched warm_reconcile — the `pxgate-M1` traefik-probe change (`0e9fd38`) refactored the spec without updating this test. Out of prevb scope, but it means the FULL `tests/unit/` suite is NOT all-green (283 pass / 1 fail). Flagging so "unit green" claims are scoped honestly. Not an M1 blocker.
- [F-prevb-B] (NIT) old `test_expected_na_other_rung_does_not_suppress` was dropped in the rewrite; the behavior (an EXPECTED_NA for a non-upgrade rung must not suppress the base) is preserved via `.get("upgrade")` but no longer has a dedicated test. Low risk; consider re-adding one line of coverage.
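The coverage F-prevb-B asks for could be as small as the following. This is a sketch only: `upgrade_expected_na` is a hypothetical stand-in for whatever the resolver actually reads, and the shape of the EXPECTED_NA mapping (rung name to reason string) is an assumption.

```python
from typing import Mapping


def upgrade_expected_na(expected_na: Mapping[str, str]) -> bool:
    """Hypothetical helper: only an entry for the upgrade rung counts."""
    return bool(expected_na.get("upgrade"))


def test_expected_na_other_rung_does_not_suppress():
    # An EXPECTED_NA for a non-upgrade rung must leave the base in place.
    assert upgrade_expected_na({"backup": "no backup support"}) is False
    # An entry for the upgrade rung itself is what suppresses it.
    assert upgrade_expected_na({"upgrade": "single published version"}) is True
```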
M1 cold acceptance — IN FLIGHT (2026-06-17T00:42Z)
Gate M1 CLAIMED @00:40Z (code commit e1b32ea; claim commit bb79e91 = machine-docs only). Cold-verifying from a
FRESH clone on cc-ci (/root/cc-ci-adv-prevb @ bb79e91), not the Builder's tree.
Done so far (cold):
- prevb unit surface: 64 passed (`test_upgrade_base` + `test_previous` + `test_meta`) via nix pytest.
- statics: `compose.ccci.yml` env-only (order: stop-first); discourse `recipe_meta.py` has NO `UPGRADE_BASE_VERSION` assignment.
- `prune_orphan_services` reviewed: removes only services NOT in the head compose → cannot mask the prevb bug (if overlay leaked sidekiq into the head compose it'd be in `defined` → not pruned → test RED). Teeth preserved.
- e2e launched (`RECIPE=discourse SRC=recipe-maintainers/discourse REF=ae5a8180… PR=4 STAGES=install,upgrade`), run `manual-1344943`. Early log CONFIRMS `upgrade base: kind=ref ref=f87c612d71b4 (target-branch (main) tip)` → base = main-tip chaos deploy (matches claim). Base deploy (main-tip, has the known sidekiq depends_on bug) in progress; observed a non-fatal `lint rung: fail R011` on the base — watching whether it blocks.
- CONCURRENCY observed: a Builder keycloak spot-check (PR#3) runs simultaneously in `/root/prevb-deploy`. My discourse run's janitor saw the keycloak lock and LEFT IT (`live concurrent run, leaving it`) — per-run ABRA_DIR isolation holding. Watching for memory-pressure false-failures on the shared 7GB node.

UPDATE 2026-06-17T01:00Z (post-reboot, cold re-check of completed run):
- e2e `manual-1344943` COMPLETED GREEN (read full log /root/cc-ci-adv-prevb-e2e.log): `upgrade base: kind=ref ref=f87c612d71b4 (target-branch (main) tip)`; `upgrade→PR-head head_ref=ae5a8180`; generic `test_upgrade_reconverges` PASSED; discourse `test_head_runs_official_image_not_bitnamilegacy` PASSED + `test_sidekiq_service_dropped_by_head` PASSED; RUN SUMMARY deploy-count=1 (expect 1), install:pass upgrade:pass, level=2/5. Matches STATUS EXPECTED exactly.
- TEARDOWN clean: `docker stack ls` shows NO discourse stack; no discourse secrets/volumes. (warm-keycloak stack present = Builder's concurrent spot-check, not mine.)
- BREAK-IT: my first probe (`manual-1357729`, broken-head ref 94ebaaa = head image `discourse/discourse:99.99.99-adversary-broken`) was SIGTERM-killed mid-base-deploy by MY reboot — INCOMPLETE. RE-LAUNCHED as `manual-1360025` (same broken head, base resolving to main-tip f87c612d as expected). In flight.

STILL TO CONFIRM: break-it `manual-1360025` → upgrade tier RED (broken head not papered over).
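The argument for why `prune_orphan_services` cannot hide a leaked overlay reduces to a set difference. A minimal sketch, assuming the prune candidates are exactly "running minus defined"; the function name `orphan_services` is illustrative and is not the cc-ci implementation.

```python
def orphan_services(running: set[str], defined: set[str]) -> set[str]:
    """Services present in the stack but absent from the head compose."""
    return running - defined


# Healthy head: sidekiq is left over from the base and is not defined by
# the head compose, so it is an orphan and gets pruned.
assert orphan_services({"app", "sidekiq"}, {"app"}) == {"sidekiq"}

# Leaked overlay: sidekiq is now "defined" by the head compose, so it is
# not pruned and the sidekiq-dropped test still sees it and goes RED.
assert orphan_services({"app", "sidekiq"}, {"app", "sidekiq"}) == set()
```

Pruning can only remove what the head itself no longer declares, so it narrows the stack towards the head's real shape and never away from it.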
Verdicts
M1: PASS @2026-06-17T01:03Z (code commit e1b32ea / claim bb79e91)
Cold-verified from a fresh clone on cc-ci (/root/cc-ci-adv-prevb), independent of the Builder's tree.
Every M1 DoD item (plan §4) re-executed and confirmed:
- Dynamic base resolution (last-green → main-tip → skip). e2e `manual-1344943` log: `upgrade base: kind=ref ref=f87c612d71b4 (target-branch (main) tip)` — correctly falls back to main-tip (discourse has NO last-green warm canonical and its only published tag is 0.7.0, behind main). Unit matrix re-run cold (nix pytest, 64 passed): override-wins / last-green-primary / main-tip-fallback / head==main-tip skip / no-predecessor skip. Matrix EXPANDED vs old `upgrade_base`, not weakened.
- `previous/` surface (discovery + base-only application + version-guard/stale-flag): unit-covered (`test_previous`), code-confirmed base-only (stripped before head redeploy via `perform_upgrade` → `remove_previous_overlay` + COMPOSE_FILE strip). discourse ships NO `previous/` (base deploys clean) — matches plan §3 thesis.
- Environmental vs version-specific separated. `tests/discourse/compose.ccci.yml` is env-only (`app.deploy.update_config.order: stop-first`); bitnamilegacy image pins + `sidekiq` block removed; `UPGRADE_BASE_VERSION` removed from `recipe_meta.py` (grep: none). Verified statically in cold clone.
- discourse migrated — confirmed via #3 + e2e behaviour.
- discourse upgrade tier GREEN locally w/ proof head ran the REAL official image. e2e `manual-1344943`: generic `test_upgrade_reconverges` PASSED; discourse `test_head_runs_official_image_not_bitnamilegacy` PASSED + `test_sidekiq_service_dropped_by_head` PASSED; RUN SUMMARY deploy-count=1 (expect 1), install:pass, upgrade:pass, level=2/5. `upgrade→PR-head head_ref=ae5a8180 version=0.8.1+3.5.0→1.0.0+3.5.3`.
- TEETH — deliberately-broken head still goes RED (base resolution did NOT paper it over). Break-it probe `manual-1360025`: broken-head commit `94ebaaa` sets head `app.image = discourse/discourse:99.99.99-adversary-broken`. Base resolved to main-tip f87c612d (same as GREEN run), install:pass, then the HEAD redeploy failed: `prepull: docker pull discourse/discourse:99.99.99-adversary-broken failed — manifest unknown` → upgrade:fail (level 1/5). Proves the head's real (broken) image is what gets deployed; base/prune/previous machinery cannot mask a broken head.
- Clean teardown after BOTH the GREEN run and the broken/failed run: `docker stack ls` / `secret ls` / `volume ls` show NO discourse stack, secrets, or volumes. (warm-keycloak stack present = Builder's concurrent spot-check, not discourse.)
- No test weakened. F-prevb-B addressed — `test_expected_na_other_rung_does_not_suppress_upgrade` re-added (commit `e1b32ea`), present in cold clone. Net coverage up (+ resolver matrix + previous/ layering).
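The image assertions cited in this verdict are only meaningful if they read the live service spec rather than the compose file. A minimal sketch of such a check, assuming the standard `docker service inspect --format` template for the container image; the function names `live_service_image` and `is_official_head_image` are hypothetical and are not the recipe's test helpers.

```python
import subprocess


def live_service_image(service: str) -> str:
    """Ask the swarm which image a service is actually configured to run."""
    out = subprocess.run(
        ["docker", "service", "inspect", "--format",
         "{{.Spec.TaskTemplate.ContainerSpec.Image}}", service],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def is_official_head_image(image: str, expected_prefix: str) -> bool:
    """True when the image is the expected official one, not bitnamilegacy."""
    return image.startswith(expected_prefix) and "bitnamilegacy" not in image
```

A prefix match is used because swarm may append a digest to the tag; both the base image `bitnamilegacy/discourse:3.5.0` and the break-it image `discourse/discourse:99.99.99-adversary-broken` fail the check against `discourse/discourse:3.5.3`.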
SCOPE CAVEAT (not an M1 blocker): the FULL tests/unit/ suite has 1 PRE-EXISTING unrelated red —
test_warm_reconcile.py::test_traefik_spec_is_stateless_with_setup (KeyError 'health_domain'), failing
identically at gtea-DONE 778720c, untouched by prevb (see [F-prevb-A]). prevb's own surface is all-green.
(JOURNAL not consulted before this verdict, per anti-anchoring. M1 stands on the plan, the code/diff, the STATUS verification info, and my own cold re-runs.)
M2 cold acceptance — IN FLIGHT (2026-06-17T01:45Z)
Gate M2 CLAIMED @01:40Z (HEAD 71399f6). Cold-verifying independently (gitea API + host artifacts + own re-run).
CONFIRMED so far:
- discourse PR#4 !testme GREEN in REAL CI — verified via gitea API (NOT trusting STATUS): `!testme` comment @01:27:09Z → bridge reply @01:27:25Z `🌻 cc-ci — discourse @ ae5a8180 ✅ **passed**` → Drone 717. (Teeth of the signal: an EARLIER !testme @22:34 → run 700 → `❌ failure` — !testme genuinely CAN go RED; 717's pass is meaningful, not a rubber-stamp. 700 failed pre-mint_admin-fix.)
- Drone 717 junit cold-read: all 10 suites errors=0 failures=0 (install/upgrade ×2/backup ×2/restore ×2/custom create_topic+health_check+site_basic). results.json: level=4, results{install,upgrade,backup,restore,custom}=all pass; clean_teardown=true; no_secret_leak=true; ref=ae5a8180 (real PR head).
- Head genuinely ran official 3.5.3 — REAL TEETH: `tests/discourse/test_upgrade.py` asserts via `lifecycle.deployed_identity` (= `docker service inspect <stack>_app …ContainerSpec.Image` — the LIVE running swarm image, not a compose grep) that image startswith `discourse/discourse:3.5.3` & no bitnamilegacy; + `stack_service_names` (= `docker stack services`) that sidekiq is gone. Both PASS in 717.
- lint R011 is a level-cap RUNG, NOT a gate (verified in code): `run_recipe_ci.py:770` `passed = warm_ok and bool(results) and all(v!='fail' for v in results.values()) and not sso_unverified` — covers only the 5 functional tiers, NOT lint. So R011 caps level at 4/5 but cannot turn !testme RED. (R011 = "all services have images" on the official-image head + "invalid reference format" warns — a RECIPE-head lint nit, not a prevb/cc-ci failure; candidate PR comment, not a blocker.)
- Secret-leak (independent scan of the PUBLIC surface): dashboard index (lists 717), results.json (all 11 test `message` fields empty on PASS), summary.html, junit, lint.txt — NO secret/password/token values. `no_secret_leak` flag scans results.json vs `/run/secrets/*` (infra secrets). NOTE [F-prevb-C, INFO]: `mint_admin` prints the minted plaintext discourse ApiKey to stdout → it lands in the Drone RAW build log (access-controlled, 401 w/o token — NOT the public dashboard). Pre-existing behavior (prevb only made the path image-agnostic, b66abc4; the `.key` print predates prevb). Not a public-surface leak; low severity.
- Spot-checks (cold-read Builder logs + dynamic-base confirmed): cryptpad#5 base=ref 36ee3451 (main tip; =PR#5's real base sha, gitea-confirmed), keycloak#3 base=ref 12ac6db8 (main tip via master fallback), hedgedoc#1 base=ref 09bf4d54 (main tip). All install:pass upgrade:pass deploy-count=1; cryptpad `test_upgrade_preserves_data` PASS, keycloak `test_upgrade_preserves_realm` PASS. No leftover stacks (only infra + pre-existing warm-keycloak orphan).
- INDEPENDENT re-run in flight: re-executing cryptpad#5 (REF=9c18c176) from MY cold clone @71399f6 (normal fetch, not the Builder's tree) to confirm dynamic-base generality isn't tree/env-specific.

STILL TO CONFIRM: my cryptpad re-run resolves base=main-tip 36ee3451, install+upgrade pass, clean teardown.
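The claim that lint R011 caps the level without gating the run follows from the shape of the pass expression quoted from `run_recipe_ci.py`. A small sketch that reuses that boolean shape; the wrapper `run_passed`, its defaults, and the separate `lint_failed` variable are assumptions for illustration.

```python
def run_passed(results: dict, warm_ok: bool = True,
               sso_unverified: bool = False) -> bool:
    # Same boolean shape as the expression quoted in the review.
    return (warm_ok and bool(results)
            and all(v != "fail" for v in results.values())
            and not sso_unverified)


tiers = {"install": "pass", "upgrade": "pass", "backup": "pass",
         "restore": "pass", "custom": "pass"}
lint_failed = True  # tracked outside `results` in this sketch (assumption)

# Lint never enters `results`, so a lint failure cannot flip the verdict.
assert run_passed(tiers) is True
# A failing functional tier does flip it.
assert run_passed({**tiers, "upgrade": "fail"}) is False
# An empty result set is never a pass.
assert run_passed({}) is False
```

As long as lint is recorded outside the `results` mapping, the only ways to turn the run RED are a failing functional tier, a cold warm check, or an unverified SSO flag.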
Open VETOes
(none)