198 lines
17 KiB
Markdown
198 lines
17 KiB
Markdown
# REVIEW — phase `prevb` (Adversary verdicts)
|
||
|
||
Append-only. Gates this phase: **M1** (implemented + green locally), **M2** (proven in real CI + spot-check).
|
||
SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase-prevb-previous-dynamic-base.md`.
|
||
|
||
## Status
|
||
- 2026-06-16T23:57Z — Adversary live for `prevb`. No Builder claim yet (no STATUS-prevb.md, no `claim(`).
|
||
Cold-start recon done: baseline mechanism understood —
|
||
- base resolution: `run_recipe_ci.upgrade_base` → `meta.UPGRADE_BASE_VERSION or lifecycle.previous_version` (`vers[-2]`); discourse pins `0.7.0+3.3.1`.
|
||
- overlay `tests/discourse/compose.ccci.yml` applied to ALL deploys via `EXTRA_ENV.COMPOSE_FILE`; fuses environmental (start_period 20m, order stop-first) + version-specific (bitnamilegacy image pin + sidekiq block) — the bug.
|
||
- existing unit tests to watch for weakening: `tests/unit/test_upgrade_base.py`, `tests/unit/test_meta.py`.
|
||
Idle until a gate is CLAIMED.
|
||
- 2026-06-17T00:12Z — Independently cold-verified the Builder's STATUS ground-truth facts via gitea API
|
||
(NOT trusting STATUS): PR #4 head `ae5a81802b4d1d6cd1b449ac46cfa16d80730aaa` `compose.yml` →
|
||
`app.image = discourse/discourse:3.5.3`, **no `sidekiq` service**; `.diff` shows
|
||
`-bitnamilegacy/discourse:3.5.0`→`+discourse/discourse:3.5.3` + full `sidekiq:` block removed.
|
||
main → `app`+`sidekiq` = `bitnamilegacy/discourse:3.5.0`, sidekiq present, base `f87c612d`.
|
||
Facts CONFIRMED. (Caution noted: gitea `raw?ref=<shortsha>` silently falls back to default branch —
|
||
must use the FULL sha when cold-verifying head content.) Foundation for "discourse needs no previous/" holds.
|
||
|
||
## Pre-review (M1 code, gate NOT yet CLAIMED — preliminary recon, not a verdict)
|
||
2026-06-17T00:30Z — studied the M1 `feat` commit bb2e3c6 (code/diff only, NOT JOURNAL). Design looks sound:
|
||
- `resolve_upgrade_base` → BasePlan(kind, version, ref, reason): override → last-green (`canonical.read_registry`)
|
||
→ main-tip (`recipe_branch_commit`) → skip. `.runs` gates the upgrade tier. head_ref = `recipe_head_commit`.
|
||
- `previous/` surface (lifecycle): `has_previous`, `previous_target_version` (VERSION marker), `previous_status`
|
||
(version-guarded apply/stale), provide/remove overlay, compose_file add/remove. Base-only; **stripped before
|
||
head redeploy** (`generic.perform_upgrade` → `remove_previous_overlay` + COMPOSE_FILE strip). Good teeth.
|
||
- discourse migrated: `compose.ccci.yml` now ENVIRONMENTAL-ONLY (`order: stop-first`); bitnamilegacy pins +
|
||
sidekiq + UPGRADE_BASE_VERSION **removed**. `test_upgrade.py` asserts running `app` image == official
|
||
`discourse/discourse:3.5.3` (not bitnamilegacy) + sidekiq gone; resolves as the upgrade-tier overlay
|
||
(`resolve_overlay_op`→`test_{op}.py`), run as its own pytest → rc!=0 fails the tier. Real teeth confirmed.
|
||
- **Unit tests run cold (nix pytest): 63 passed** (test_upgrade_base + test_previous + test_meta). Matrix
|
||
EXPANDED, not weakened (override-wins / last-green-primary / main-tip-fallback / head==main-tip skip / no-pred skip).
|
||
|
||
STILL REQUIRED for the formal M1 PASS (needs the Builder's e2e claim + my cold acceptance run):
|
||
(a) discourse upgrade tier GREEN locally with proof the head ran real 3.5.3 (not bitnamilegacy) + no sidekiq;
|
||
(b) BREAK-IT: a deliberately-broken head still fails the upgrade tier (base resolution didn't paper over it);
|
||
(c) base falls back to main when last-green absent (unit-covered; e2e desirable);
|
||
(d) `previous/` ignored for the head (code-confirmed; e2e desirable).
|
||
|
||
## Adversary findings (pre-review notes)
|
||
- [F-prevb-A] (PRE-EXISTING, NOT a prevb regression; INFO) `tests/unit/test_warm_reconcile.py::
|
||
test_traefik_spec_is_stateless_with_setup` is RED on main — `KeyError: 'health_domain'`. Fails identically at
|
||
the gtea-DONE commit 778720c (verified by checkout), and the prevb feat never touched warm_reconcile — the
|
||
`pxgate-M1` traefik-probe change (0e9fd38) refactored the spec without updating this test. Out of prevb scope,
|
||
but it means the FULL `tests/unit/` suite is NOT all-green (283 pass / 1 fail). Flagging so "unit green" claims
|
||
are scoped honestly. Not an M1 blocker.
|
||
- [F-prevb-B] (NIT) old `test_expected_na_other_rung_does_not_suppress` was dropped in the rewrite; the behavior
|
||
(an EXPECTED_NA for a non-upgrade rung must not suppress the base) is preserved via `.get("upgrade")` but no
|
||
longer has a dedicated test. Low risk; consider re-adding one line of coverage.
|
||
|
||
## M1 cold acceptance — IN FLIGHT (2026-06-17T00:42Z)
|
||
Gate M1 CLAIMED @00:40Z (code commit e1b32ea; claim commit bb79e91 = machine-docs only). Cold-verifying from a
|
||
FRESH clone on cc-ci (`/root/cc-ci-adv-prevb` @ bb79e91), not the Builder's tree.
|
||
Done so far (cold):
|
||
- prevb unit surface: **64 passed** (`test_upgrade_base`+`test_previous`+`test_meta`) via nix pytest.
|
||
- statics: `compose.ccci.yml` env-only (`order: stop-first`); discourse `recipe_meta.py` has NO `UPGRADE_BASE_VERSION` assignment.
|
||
- `prune_orphan_services` reviewed: removes only services NOT in the head compose → cannot mask the prevb bug
|
||
(if overlay leaked sidekiq into the head compose it'd be in `defined` → not pruned → test RED). Teeth preserved.
|
||
- e2e launched (`RECIPE=discourse SRC=recipe-maintainers/discourse REF=ae5a8180… PR=4 STAGES=install,upgrade`),
|
||
run `manual-1344943`. Early log CONFIRMS `upgrade base: kind=ref ref=f87c612d71b4 (target-branch (main) tip)`
|
||
→ base = main-tip chaos deploy (matches claim). Base deploy (main-tip, has the known sidekiq depends_on bug)
|
||
in progress; observed a non-fatal `lint rung: fail R011` on the base — watching whether it blocks.
|
||
- CONCURRENCY observed: a Builder keycloak spot-check (PR#3) runs simultaneously in `/root/prevb-deploy`. My
|
||
discourse run's janitor saw the keycloak lock and LEFT IT (`live concurrent run, leaving it`) — per-run
|
||
ABRA_DIR isolation holding. Watching for memory-pressure false-failures on the shared 7GB node.
|
||
UPDATE 2026-06-17T01:00Z (post-reboot, cold re-check of completed run):
|
||
- e2e `manual-1344943` COMPLETED **GREEN** (read full log /root/cc-ci-adv-prevb-e2e.log): `upgrade base:
|
||
kind=ref ref=f87c612d71b4 (target-branch (main) tip)`; `upgrade→PR-head head_ref=ae5a8180`;
|
||
generic `test_upgrade_reconverges` PASSED; discourse `test_head_runs_official_image_not_bitnamilegacy`
|
||
PASSED + `test_sidekiq_service_dropped_by_head` PASSED; RUN SUMMARY deploy-count=1 (expect 1),
|
||
install:pass upgrade:pass, level=2/5. Matches STATUS EXPECTED exactly.
|
||
- TEARDOWN clean: `docker stack ls` shows NO discourse stack; no discourse secrets/volumes. (warm-keycloak
|
||
stack present = Builder's concurrent spot-check, not mine.)
|
||
- BREAK-IT: my first probe (`manual-1357729`, broken-head ref 94ebaaa = head image
|
||
`discourse/discourse:99.99.99-adversary-broken`) was SIGTERM-killed mid-base-deploy by MY reboot — INCOMPLETE.
|
||
RE-LAUNCHED as `manual-1360025` (same broken head, base resolving to main-tip f87c612d as expected). In flight.
|
||
STILL TO CONFIRM: break-it `manual-1360025` → upgrade tier RED (broken head not papered over).
|
||
|
||
## Verdicts
|
||
|
||
### M1: PASS @2026-06-17T01:03Z (code commit e1b32ea / claim bb79e91)
|
||
Cold-verified from a fresh clone on cc-ci (`/root/cc-ci-adv-prevb`), independent of the Builder's tree.
|
||
Every M1 DoD item (plan §4) re-executed and confirmed:
|
||
|
||
1. **Dynamic base resolution (last-green → main-tip → skip).** e2e `manual-1344943` log: `upgrade base:
|
||
kind=ref ref=f87c612d71b4 (target-branch (main) tip)` — correctly falls back to main-tip (discourse has
|
||
NO last-green warm canonical and its only published tag is 0.7.0, behind main). Unit matrix re-run cold
|
||
(nix pytest, **64 passed**): override-wins / last-green-primary / main-tip-fallback / head==main-tip skip /
|
||
no-predecessor skip. Matrix EXPANDED vs old `upgrade_base`, not weakened.
|
||
2. **`previous/` surface** (discovery + base-only application + version-guard/stale-flag): unit-covered
|
||
(`test_previous`), code-confirmed base-only (stripped before head redeploy via `perform_upgrade` →
|
||
`remove_previous_overlay` + COMPOSE_FILE strip). discourse ships NO `previous/` (base deploys clean) —
|
||
matches plan §3 thesis.
|
||
3. **Environmental vs version-specific separated.** `tests/discourse/compose.ccci.yml` is env-only
|
||
(`app.deploy.update_config.order: stop-first`); bitnamilegacy image pins + `sidekiq` block removed;
|
||
`UPGRADE_BASE_VERSION` removed from `recipe_meta.py` (grep: none). Verified statically in cold clone.
|
||
4. **discourse migrated** — confirmed via #3 + e2e behaviour.
|
||
5. **discourse upgrade tier GREEN locally w/ proof head ran the REAL official image.** e2e `manual-1344943`:
|
||
generic `test_upgrade_reconverges` PASSED; discourse `test_head_runs_official_image_not_bitnamilegacy`
|
||
PASSED + `test_sidekiq_service_dropped_by_head` PASSED; RUN SUMMARY deploy-count=1 (expect 1),
|
||
install:pass, upgrade:pass, level=2/5. `upgrade→PR-head head_ref=ae5a8180 version=0.8.1+3.5.0→1.0.0+3.5.3`.
|
||
6. **TEETH — deliberately-broken head still goes RED (base resolution did NOT paper it over).** Break-it
|
||
probe `manual-1360025`: broken-head commit `94ebaaa` sets head `app.image =
|
||
discourse/discourse:99.99.99-adversary-broken`. Base resolved to main-tip f87c612d (same as GREEN run),
|
||
**install:pass**, then the HEAD redeploy failed: `prepull: docker pull
|
||
discourse/discourse:99.99.99-adversary-broken failed — manifest unknown` → **upgrade:fail (level 1/5)**.
|
||
Proves the head's real (broken) image is what gets deployed; base/prune/previous machinery cannot mask a
|
||
broken head.
|
||
7. **Clean teardown** after BOTH the GREEN run and the broken/failed run: `docker stack ls` / `secret ls` /
|
||
`volume ls` show NO discourse stack, secrets, or volumes. (warm-keycloak stack present = Builder's
|
||
concurrent spot-check, not discourse.)
|
||
8. **No test weakened.** F-prevb-B addressed — `test_expected_na_other_rung_does_not_suppress_upgrade`
|
||
re-added (commit e1b32ea), present in cold clone. Net coverage up (+ resolver matrix + previous/ layering).
|
||
|
||
SCOPE CAVEAT (not an M1 blocker): the FULL `tests/unit/` suite has 1 PRE-EXISTING unrelated red —
|
||
`test_warm_reconcile.py::test_traefik_spec_is_stateless_with_setup` (KeyError 'health_domain'), failing
|
||
identically at gtea-DONE 778720c, untouched by prevb (see [F-prevb-A]). prevb's own surface is all-green.
|
||
|
||
(JOURNAL not consulted before this verdict, per anti-anchoring. M1 stands on the plan, the code/diff, the
|
||
STATUS verification info, and my own cold re-runs.)
|
||
|
||
## M2 cold acceptance — IN FLIGHT (2026-06-17T01:45Z)
|
||
Gate M2 CLAIMED @01:40Z (HEAD 71399f6). Cold-verifying independently (gitea API + host artifacts + own re-run).
|
||
CONFIRMED so far:
|
||
- **discourse PR#4 !testme GREEN in REAL CI** — verified via gitea API (NOT trusting STATUS): `!testme`
|
||
comment @01:27:09Z → bridge reply @01:27:25Z `🌻 cc-ci — discourse @ ae5a8180 ✅ **passed**` → Drone 717.
|
||
(Teeth of the signal: an EARLIER !testme @22:34 → run 700 → `❌ failure` — !testme genuinely CAN go RED;
|
||
717's pass is meaningful, not a rubber-stamp. 700 failed pre-mint_admin-fix.)
|
||
- **Drone 717 junit cold-read**: all 10 suites errors=0 failures=0 (install/upgrade ×2/backup ×2/restore
|
||
×2/custom create_topic+health_check+site_basic). results.json: level=4, results{install,upgrade,backup,
|
||
restore,custom}=all pass; clean_teardown=true; no_secret_leak=true; ref=ae5a8180 (real PR head).
|
||
- **Head genuinely ran official 3.5.3 — REAL TEETH**: `tests/discourse/test_upgrade.py` asserts via
|
||
`lifecycle.deployed_identity` (= `docker service inspect <stack>_app …ContainerSpec.Image` — the LIVE
|
||
running swarm image, not a compose grep) that image startswith `discourse/discourse:3.5.3` & no
|
||
bitnamilegacy; + `stack_service_names` (= `docker stack services`) that sidekiq is gone. Both PASS in 717.
|
||
- **lint R011 is a level-cap RUNG, NOT a gate** (verified in code): `run_recipe_ci.py:770` `passed =
|
||
warm_ok and bool(results) and all(v!='fail' for v in results.values()) and not sso_unverified` — covers
|
||
only the 5 functional tiers, NOT lint. So R011 caps level at 4/5 but cannot turn !testme RED. (R011 =
|
||
"all services have images" on the official-image head + "invalid reference format" warns — a RECIPE-head
|
||
lint nit, not a prevb/cc-ci failure; candidate PR comment, not a blocker.)
|
||
- **Secret-leak (independent scan of the PUBLIC surface)**: dashboard index (lists 717), results.json (all
|
||
11 test `message` fields empty on PASS), summary.html, junit, lint.txt — NO secret/password/token values.
|
||
`no_secret_leak` flag scans results.json vs `/run/secrets/*` (infra secrets). NOTE [F-prevb-C, INFO]:
|
||
`mint_admin` prints the minted plaintext discourse ApiKey to stdout → it lands in the Drone RAW build log
|
||
(access-controlled, 401 w/o token — NOT the public dashboard). Pre-existing behavior (prevb only made the
|
||
path image-agnostic, b66abc4; the `.key` print predates prevb). Not a public-surface leak; low severity.
|
||
- **Spot-checks (cold-read Builder logs + dynamic-base confirmed)**: cryptpad#5 base=ref 36ee3451 (main tip;
|
||
=PR#5's real base sha, gitea-confirmed), keycloak#3 base=ref 12ac6db8 (main tip via master fallback),
|
||
hedgedoc#1 base=ref 09bf4d54 (main tip). All install:pass upgrade:pass deploy-count=1; cryptpad
|
||
`test_upgrade_preserves_data` PASS, keycloak `test_upgrade_preserves_realm` PASS. No leftover stacks
|
||
(only infra + pre-existing warm-keycloak orphan).
|
||
- **INDEPENDENT re-run in flight**: re-executing cryptpad#5 (REF=9c18c176) from MY cold clone @71399f6
|
||
(normal fetch, not the Builder's tree) to confirm dynamic-base generality isn't tree/env-specific.
|
||
STILL TO CONFIRM: my cryptpad re-run resolves base=main-tip 36ee3451, install+upgrade pass, clean teardown.
|
||
→ CONFIRMED @01:58Z: my cold-clone (@71399f6, normal fetch) cryptpad#5 re-run: `upgrade base: kind=ref
|
||
ref=36ee3451a354 (target-branch (main) tip)`; install:pass upgrade:pass deploy-count=1;
|
||
`tests/cryptpad/test_upgrade.py::test_upgrade_preserves_data` PASSED; NO leftover cryptpad stack
|
||
(clean teardown). Dynamic base generality is NOT tree/env-specific — reproduced from my own clone.
|
||
|
||
## Verdicts (cont.)
|
||
|
||
### M2: PASS @2026-06-17T01:58Z (code/claim commit 71399f6)
|
||
Cold-verified independently of the Builder's tree — gitea API for the real-CI verdict, host-shared Drone
|
||
artifacts read cold, code-read for the gating logic, + my OWN spot-check re-run. Every M2 DoD item (plan §4):
|
||
|
||
1. **discourse PR#4 `!testme` GREEN in real CI** — gitea API (not STATUS): `!testme` @01:27:09Z → bridge
|
||
`🌻 cc-ci — discourse @ ae5a8180 ✅ passed` @01:27:25Z → Drone 717. Meaningful (earlier !testme @22:34
|
||
→ run 700 → `❌ failure` pre-fix; !testme genuinely can go RED).
|
||
2. **Head genuinely ran official `discourse/discourse:3.5.3` (migration exercised) — REAL TEETH.** 717 junit
|
||
`upgrade__cc-ci__test_upgrade.xml`: `test_head_runs_official_image_not_bitnamilegacy` +
|
||
`test_sidekiq_service_dropped_by_head` both PASS, asserting against the LIVE swarm service
|
||
(`docker service inspect …ContainerSpec.Image` / `docker stack services`) — not a compose grep. Image is
|
||
official 3.5.3 (not bitnamilegacy), sidekiq gone → the official-image migration the PR claims was tested.
|
||
3. **All tiers GREEN.** 717: 10 junit suites errors=0 failures=0; results{install,upgrade,backup,restore,
|
||
custom}=pass; level 4/5. The only non-pass is the `lint` rung (R011) — code-verified NON-GATING
|
||
(`run_recipe_ci.py:770` `passed` covers only the 5 functional results, not lint) → caps level, can't turn
|
||
the verdict RED. R011 ("all services have images" + "invalid reference format") is a RECIPE-head lint nit
|
||
(candidate PR comment per guardrail), not a prevb/cc-ci defect.
|
||
4. **Spot-check ≥3 recipes green under dynamic base.** cryptpad#5 (base=main-tip 36ee3451), keycloak#3
|
||
(base=main-tip 12ac6db8 via master fallback; prune-orphans safe-skip), hedgedoc#1 (base=main-tip
|
||
09bf4d54) — all install:pass upgrade:pass deploy-count=1, data-preservation tests pass, no leftover
|
||
stacks. PLUS my OWN cold re-run of cryptpad#5 reproduced base=main-tip + green + clean teardown.
|
||
5. **Secrets — independent scan of the PUBLIC surface clean.** dashboard index, results.json (all test
|
||
`message` empty on PASS), summary.html, junit, lint.txt — no secret values; `clean_teardown=true`,
|
||
`no_secret_leak=true`. [F-prevb-C, INFO/pre-existing]: `mint_admin` prints the minted plaintext discourse
|
||
ApiKey → it reaches only the access-controlled Drone RAW log (401 w/o token), NOT the public dashboard;
|
||
prevb only made the path image-agnostic (the print predates prevb). Low severity, not a blocker.
|
||
6. **Levels/records reconciled** — results.json levels correctly derived (discourse 4/5 lint-capped,
|
||
cryptpad 2/5 install+upgrade-only); PR runs don't promote last-green (correct — nothing merged).
|
||
|
||
Nothing merged on any mirror (verified: PRs #4/#5 still open). No test weakened. M1 already PASS @01:03Z.
|
||
**Both milestones now have fresh Adversary PASSes → no VETO; the Builder may write `## DONE`.**
|
||
(JOURNAL not consulted before this verdict, per anti-anchoring.)
|
||
|
||
## Open VETOes
|
||
(none)
|