cc-ci-orchestrator/cc-ci-plan/plan-phase-samever-older-base-fallback.md
autonomic-bot 03292c6f57 plan(samever): frame the same-version gap as the nightly cold-sweep STEADY STATE (operator insight)
The nightly cold-on-latest run promotes canonical->LATEST, so every subsequent
nightly (until a new version ships) finds canonical==latest==version-under-test
-> base==head -> same-version no-op. This is the common path, not a rare edge.
M2 now proves the fix on that nightly scenario (canonical already==latest ->
step back to previous published).
2026-06-17 03:56:09 +00:00


Phase samever — step back to an older base when last-green == head (no same-version upgrade)

Mission (operator-specified 2026-06-17): close a gap in the prevb dynamic upgrade-base resolver. When the resolved last-green (warm-canonical) base version equals the PR head version, the upgrade tier currently deploys the same version as base and head — a vacuous, non-upgrade "upgrade." Instead of skipping the tier, step back to a genuinely older base: the newest published version strictly older than the head version. The upgrade must always cross a real version delta when an older version exists. (This is design A; design B "canonical history" is deferred — see cc-ci-plan/IDEAS.md.)

State files: STATUS-samever.md, BACKLOG-samever.md, REVIEW-samever.md, JOURNAL-samever.md. DECISIONS.md shared.

1. Background / root cause

resolve_upgrade_base (runner/run_recipe_ci.py:111) resolves the upgrade base. Its two paths are guarded unequally:

  • ref (main-tip) path — guarded: if main_tip == head_ref → skip "head == main tip (no predecessor delta)".
  • version (last-green canonical) path — NOT guarded: it returns BasePlan("version", rec["version"], …) without checking that the canonical version differs from the head. The resolver isn't even given the head's version (only head_ref, a commit), so it currently can't compare.

When does the canonical equal the head version? The canonical advances only on a GREEN + COLD + LATEST run of a WARM_CANONICAL-enrolled recipe (should_promote_canonical = is_enrolled and overall==0 and not quick and not ref; a PR !testme carries ref so it NEVER promotes). So the canonical is always the latest-published version that last passed a cold sweep.
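The promotion rule quoted above can be written out as a small predicate. This is a minimal sketch that only restates the expression from the paragraph; the function signature and the argument types are assumptions, not the runner's actual code.

```python
from typing import Optional


def should_promote_canonical(
    is_enrolled: bool, overall: int, quick: bool, ref: Optional[str]
) -> bool:
    """Restates: is_enrolled and overall==0 and not quick and not ref.

    Argument names follow the expression in the text; the types are assumed.
    """
    return bool(is_enrolled and overall == 0 and not quick and not ref)


# A green, cold, latest run of an enrolled recipe promotes the canonical.
nightly = should_promote_canonical(True, 0, False, None)
# A PR !testme run carries a ref, so it never promotes.
pr_run = should_promote_canonical(True, 0, False, "some-pr-ref")
```

Because a PR run always carries a ref, only cold-on-latest runs can move the canonical, which is why the canonical tracks the latest published version that passed.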

This is the STEADY STATE of the nightly sweep, not a rare edge (operator insight 2026-06-17). The nightly cold-on-latest run is exactly what promotes the canonical to LATEST. Once one green nightly run promotes canonical → vX, every subsequent nightly run (until a new version ships) finds canonical == latest == the version under test, so its upgrade tier resolves base == head: a same-version no-op, which means effectively no upgrade test from the second night on. (The non-version-bump PR is the same collision, just less common.) The resolver must therefore treat the step-back as the norm. With the fix, the second nightly run tests vX-1 → vX, where vX-1 is the newest published version older than vX: a repeat of the first night's upgrade, but a real one rather than a vacuous one.
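The steady-state collision can be illustrated with a toy simulation. It is a sketch under simplifying assumptions: every nightly run is green, no new version ships, and the unguarded version path simply returns the canonical as base. The function name `nightly_bases` is hypothetical.

```python
from typing import List, Tuple


def nightly_bases(canonical: str, latest: str, nights: int) -> List[Tuple[str, str]]:
    """Return the (base, head) pair each consecutive nightly run would test.

    Models the unguarded version path: base is whatever the canonical is.
    Assumes every run is green, so each run promotes canonical to LATEST.
    """
    pairs = []
    for _ in range(nights):
        pairs.append((canonical, latest))  # base = canonical, head = latest
        canonical = latest  # green cold-on-latest run promotes the canonical
    return pairs


pairs = nightly_bases("1.0.0", "1.1.0", 3)
# Nights where base == head are the same-version no-ops described above.
no_op_nights = [i + 1 for i, (base, head) in enumerate(pairs) if base == head]
```

Under these assumptions only the first night crosses a version delta; every later night collides until a new version is published.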

2. Design (A)

Give the resolver the head version (read the coop-cloud.*.version label from the head's compose.yml — the head checkout already exists), and extend the chain:

  1. explicit UPGRADE_BASE_VERSION override → use it (unchanged).
  2. last-green canonical, IF its version ≠ head version → use it (kind="version"; the green-verified primary).
  3. last-green canonical version == head version → do NOT skip. Step back: from the recipe's published version tags (warm_reconcile.recipe_tags + the existing version-ordering used by latest_version), pick the newest published version STRICTLY OLDER than the head version and use it (kind="version"). previous/ still applies, version-guarded against whichever base version is chosen.
  4. no canonical at all → existing main-tip ref path (use if main_tip ≠ head_ref, else skip) — unchanged.
  5. only if no older published version exists (genuinely the first version / no predecessor) → skip with a declared reason ("base == head and no older published predecessor").
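The chain above can be sketched as a pure function. This is an illustration, not the resolver in runner/run_recipe_ci.py: `Plan`, `resolve_base`, its parameters, and most reason strings are hypothetical stand-ins, and `demo_key` is a placeholder ordering for plain "X.Y.Z" strings only. The real change should pass in the existing coop-cloud ordering used by latest_version instead.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Plan:
    # Hypothetical stand-in for BasePlan; the real fields are not shown in the plan.
    kind: str  # "version", "ref", or "skip"
    value: Optional[str]
    reason: str


def resolve_base(
    head_version: str,
    head_ref: str,
    override: Optional[str],
    canonical: Optional[str],
    main_tip: str,
    published: Sequence[str],
    sort_key: Callable[[str], object],
) -> Plan:
    # 1. Explicit override wins (unchanged).
    if override:
        return Plan("version", override, "explicit UPGRADE_BASE_VERSION override")
    # 4. No canonical at all: existing main-tip ref path (unchanged).
    if canonical is None:
        if main_tip != head_ref:
            return Plan("ref", main_tip, "main tip")
        return Plan("skip", None, "head == main tip (no predecessor delta)")
    # 2. Canonical differs from head: the green-verified primary.
    if canonical != head_version:
        return Plan("version", canonical, "last-green canonical")
    # 3. Canonical == head: step back to the newest strictly older published tag.
    older = [tag for tag in published if sort_key(tag) < sort_key(head_version)]
    if older:
        return Plan("version", max(older, key=sort_key), "stepped back from canonical == head")
    # 5. Genuinely no predecessor: declared skip.
    return Plan("skip", None, "base == head and no older published predecessor")


def demo_key(version: str) -> tuple:
    # Placeholder ordering for the demo only; NOT the coop-cloud ordering.
    return tuple(int(part) for part in version.split("."))


tags = ["1.0.0", "1.1.0", "1.2.0"]
second_night = resolve_base("1.2.0", "abc123", None, "1.2.0", "def456", tags, demo_key)
```

Passing the ordering in as `sort_key` keeps the sketch in line with the constraint that the resolver reuses the existing version ordering rather than hand-rolling one. In the demo, `second_night` resolves to the "1.1.0" tag instead of repeating "1.2.0".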

Constraints:

  • "Strictly older" — exclude any tag equal to the head version; reuse the existing coop-cloud version ordering, do not hand-roll semver. If the head version isn't in the published tag list (a brand-new version above all tags), the canonical-≠-head branch already handles it — the step-back only triggers when canonical == head.
  • Preserve the F1d-2 protections: the chosen older base must actually deploy that pinned version (checkout the tag so the on-disk tree matches), never LATEST.
  • Pure resolver change where possible; keep the ref and skip paths' behavior identical for all other cases (don't perturb discourse #4 or any version-bump PR).
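Design A depends on the resolver learning the head version from the coop-cloud.*.version label in the head's compose.yml. A rough sketch of that read follows. It assumes the label appears on a single line in the form `coop-cloud.<name>.version=<version>`, which is an assumption about the file layout, and it uses a regular expression rather than a YAML parser to stay dependency-free. The function name `read_head_version` is hypothetical.

```python
import re
from typing import Optional

# Assumed label shape: coop-cloud.<anything without dots>.version=<version>
_VERSION_LABEL = re.compile(r"coop-cloud\.[^.\s]+\.version=([^\s\"']+)")


def read_head_version(compose_text: str) -> Optional[str]:
    """Return the version from the first matching label, or None if absent."""
    match = _VERSION_LABEL.search(compose_text)
    return match.group(1) if match else None


# Illustrative compose fragment; the service name and version are made up.
sample = '''
services:
  app:
    deploy:
      labels:
        - "coop-cloud.${STACK_NAME}.version=1.2.0+3.4.5"
'''
head_version = read_head_version(sample)
```

Returning None when no label is found leaves the caller to decide how to proceed; the plan does not specify that case, so the sketch does not pick a behavior for it.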

3. Gates

M1 — implemented + unit-tested. Resolver reads the head version and implements the chain above. Unit tests (extend tests/unit/test_upgrade_base.py): canonical==head → resolves to the newest-older published version (assert it's strictly older); canonical≠head → uses canonical (unchanged); no older-published → declared skip with the new reason; head-version parsed from compose; version ordering picks the correct strictly-older tag; override + ref + existing-skip paths unchanged. Adversary cold-verifies from a clean checkout: a same-version PR now upgrades from a real older version (base version < head version, evidenced), not a no-op and not a skip; teeth (a broken head still RED); the version-bump path (canonical→head) is untouched.

M2 — proven in real CI. Demonstrate on the realistic trigger — the nightly steady state: a cold-on-latest run of an enrolled recipe whose canonical already == latest (i.e. simulate the second consecutive nightly with no new version — seed/point the canonical at LATEST, then run cold-on-latest). Show its upgrade tier steps back to the previous published version (evidence base_version < latest, a genuine delta — not a same-version no-op, not a skip). Also cover the PR form (a non-version-bump PR where head version == canonical) the same way. Confirm a normal version-bump PR — re-run discourse #4 or equivalent — is UNAFFECTED (canonical, which is older, → head). Spot-check ≥1 other enrolled recipe. Fresh Adversary PASS on both milestones → ## DONE.

4. Guardrails

  • Never a same-version no-op, and never a needless skip when an older base exists. Skip only when there is genuinely no older published predecessor.
  • The base must be strictly older than the head version.
  • Don't regress the version-bump path — the common upgrade-PR case (canonical → head) must behave exactly as before; discourse #4 must still test the official-image migration.
  • Never weaken a test; minimal, well-scoped resolver change; previous/ stays the last resort.
  • Commit author autonomic-bot <autonomic-bot@noreply.git.autonomic.zone>; push every commit; abra over a pseudo-TTY. Recipe mirrors PR-only; never merge.

5. Definition of Done

The resolver steps back to the newest-published-version-older-than-head whenever the last-green canonical equals the head version — never a same-version no-op, never a needless skip when an older base exists; unit-tested; proven in real CI on a same-version scenario with evidence of a real base<head delta, and the version-bump path (discourse #4) confirmed unaffected; M1 + M2 fresh Adversary PASSes in REVIEW-samever.md. Design B (canonical history) recorded in cc-ci-plan/IDEAS.md, out of scope here.