# JOURNAL — phase `prevb` (Builder reasoning; append-only)
## 2026-06-17 — Bootstrap + recon
Read the SSOT (plan-phase-prevb), plan.md §6.1/§7/§9, and the Adversary's REVIEW-prevb (live, idle, awaiting the M1 claim).

**Mapped the harness upgrade flow** (`runner/run_recipe_ci.py`, `harness/lifecycle.py`,
`harness/generic.py`, `harness/meta.py`, `harness/canonical.py`):
- Base decision: `upgrade_base(stages, meta, recipe)` → `None` if upgrade∉stages or EXPECTED_NA[upgrade],
  else `meta.UPGRADE_BASE_VERSION or lifecycle.previous_version(recipe)` (= `recipe_versions[-2]`).
  `base = prev or target`; `prev` also gates whether the upgrade tier runs.
- Deploy: `deploy_app(version=base)` → pinned `recipe_checkout(version)` + (auto-chaos if overlay/lightweight tag);
  `version=None` → chaos deploy of the current (head) checkout.
- Overlay `compose.ccci.yml`: copied into the checkout (`provide_ccci_overlay`), referenced by
  `EXTRA_ENV.COMPOSE_FILE`, persists untracked across the head re-checkout → applies to ALL deploys.
- Upgrade op (`generic.perform_upgrade`): `recipe_checkout_ref(head_ref)` then chaos redeploy; the
  ccci overlay persists → leaks version-specific pins onto the head. **That is the bug.**
- Last-green source: `canonical.read_registry(recipe)` → `{version, commit, status}` (promoted only on
  GREEN LATEST cold runs for `WARM_CANONICAL` recipes). No separate "last-green" file.

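Sketch of the pre-change base decision as mapped above. Illustrative only, not the harness code: the flat parameter list, the `None` result when fewer than two versions exist, and the version strings are all assumptions made for the example.

```python
from typing import Optional, Sequence


def previous_version(recipe_versions: Sequence[str]) -> Optional[str]:
    """Second-newest published version (recipe_versions[-2]).

    Returning None for lists shorter than two is an assumption of this sketch.
    """
    return recipe_versions[-2] if len(recipe_versions) >= 2 else None


def upgrade_base(
    stages: Sequence[str],
    upgrade_expected_na: bool,
    override_version: Optional[str],
    recipe_versions: Sequence[str],
) -> Optional[str]:
    """Old default: explicit override if set, else the positional predecessor."""
    if "upgrade" not in stages or upgrade_expected_na:
        return None  # no upgrade tier, so no base is needed
    return override_version or previous_version(recipe_versions)


# Hypothetical published versions, oldest first.
versions = ["1.0.0", "1.1.0", "1.2.0"]
prev = upgrade_base(["install", "upgrade"], False, None, versions)
base = prev or versions[-1]  # mirrors `base = prev or target`
```

The old default is purely positional in the published-version list, so it knows nothing about what the target branch currently contains.
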
**Ground-truth discourse facts** (gitea API, verified; see STATUS for the table). Key correction vs
plan §3 prose: main is `bitnamilegacy/discourse:3.5.0` (not 3.3.1; main has advanced). Thesis holds: base
(last-green/main = bitnamilegacy 3.5.0, deployable) → head (PR #4 = official discourse/discourse:3.5.3,
sidekiq dropped). So discourse needs NO `previous/`; the env overlay shrinks to `order: stop-first`.

**Design decisions (WHY):**
- *Resolution order* last-green → main-tip → skip. main-tip = the recipe's `main` branch HEAD = the true
  predecessor the PR merges onto (more faithful than the old `vers[-2]`, which could span 2 version jumps).
  This intentionally changes EVERY recipe's default base from `vers[-2]` to main-tip: plan-mandated, not a
  regression; the M2 spot-check validates that representative recipes still go green.
- *Keep `UPGRADE_BASE_VERSION` as an optional explicit override* (still wins when set), but remove it from
  discourse and make the DEFAULT dynamic. Rationale: fully deleting the meta field would break `plausible`
  (its meta sets it) and the documented "PR adds a version above newest tag" escape hatch, with no deploy
  test to cover either; too risky against the guardrail "don't regress other recipes". The plan's
  "UPGRADE_BASE_VERSION removed" wording applies in the discourse-migration context; the normal/discourse
  path is now hardcode-free. Recorded in DECISIONS.
- *`previous/` scoped to last-green (published-version) base only*: version-guarded by a declared target;
  on a main-tip base or version mismatch it is skipped + flagged stale. Discourse ships none (base deploys clean).

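Sketch of the resolution order decided above (explicit override, then last-green, then main-tip, then skip). This is a guess at the shape, not the actual `resolve_upgrade_base`/`BasePlan` code: the field names, the `"skip"` kind, the `why` strings, and all example values are hypothetical. Only the `kind=version` / `kind=ref` labels and the `{version, commit, status}` registry shape come from the notes.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BasePlan:
    """Hypothetical result type: which base to deploy before the upgrade."""
    kind: str                      # "version", "ref", or "skip"
    version: Optional[str] = None  # set for a published-version base
    ref: Optional[str] = None      # commit to check out, when known
    why: str = ""


def resolve_upgrade_base(
    override_version: Optional[str],
    last_green: Optional[Mapping[str, str]],  # registry record or None
    main_tip: Optional[str],                  # target-branch HEAD commit or None
) -> BasePlan:
    # 1. An explicit UPGRADE_BASE_VERSION still wins when set.
    if override_version:
        return BasePlan("version", version=override_version, why="explicit override")
    # 2. Last-green: the promoted canonical record, if one exists.
    if last_green and last_green.get("version"):
        return BasePlan(
            "version",
            version=last_green["version"],
            ref=last_green.get("commit"),
            why="last-green",
        )
    # 3. Main-tip: what the PR actually merges onto.
    if main_tip:
        return BasePlan("ref", ref=main_tip, why="target-branch tip")
    # 4. Nothing resolvable: skip the upgrade tier.
    return BasePlan("skip", why="no base resolvable")


plan = resolve_upgrade_base(None, None, "0123abcd")  # hypothetical commit
```

With no override and no registry record, the plan falls through to `kind="ref"`, which is the case a recipe without a warm canonical would hit.
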
## 2026-06-17T00:30Z — M1 code done (unit+lint green); discourse e2e launched
Implemented B1–B4 (commit bb2e3c6): `resolve_upgrade_base`/`BasePlan`, `deploy_app` `base_ref`+`apply_previous`,
`previous/` surface in lifecycle, `generic.perform_upgrade` strip, discourse migration, unit tests.
Unit: 88 relevant pass (full suite 283 pass; 1 PRE-EXISTING unrelated fail,
`test_warm_reconcile::test_traefik_spec_is_stateless_with_setup` KeyError 'health_domain', which fails on
clean HEAD, not mine; flagged for Adversary). Lint PASS.

B5 e2e launched on cc-ci (/root/prevb-deploy @ bb2e3c6), STAGES=install,upgrade, discourse PR#4
(REF=ae5a8180, SRC=recipe-maintainers/discourse). First log lines confirm the core mechanism:
`== upgrade base: kind=ref ref=f87c612d71b4 (target-branch (main) tip)` → base = main-tip chaos deploy
(bitnamilegacy:3.5.0), env overlay provided. Base now in slow Rails cold boot (15-25min). Polling every ~5min.
(lint rung fail R011 = recipe-level, a rung not a gate; prepull skipped on the known sidekiq-depends-on
config rc=15, non-fatal.)

## 2026-06-17T00:40Z — M1 GREEN locally; claiming
discourse install,upgrade e2e GREEN (2nd run, after the prune fix). Evidence in run-prevb-disc2.log on
cc-ci /root/prevb-deploy. The dynamic main-tip base worked first try (kind=ref f87c612d). That is crucial,
because main (0.8.1+3.5.0) is AHEAD of the newest published tag (0.7.0+3.3.1), so the OLD `vers[-2]`
default (=0.6.3) would have been the wrong predecessor entirely. The upgrade moved
0.8.1+3.5.0 (bitnamilegacy, main-tip) → 1.0.0+3.5.3 (official, PR head), chaos-version=ae5a8180+U.

**The one real bug found+fixed (WHY):** first run, `test_head_runs_official_image` PASSED (head app =
official 3.5.3, so the leak is gone) but `test_sidekiq_service_dropped` FAILED: `docker stack deploy`
(what `abra app deploy` runs) only adds/updates services; it does NOT prune ones the new compose dropped,
so the base's sidekiq was left orphaned on the old image. This is a swarm mechanic, not a head-deploy
failure, but it means the deployed stack didn't faithfully reflect the head. Fix = `prune_orphan_services` in
`perform_upgrade`: reconcile the live stack to the head compose's `config --services` set (remove orphans).
Faithful (deployed stack == head), no-op when service sets match / compose unresolvable, weakens nothing.

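Sketch of the orphan computation behind the prune fix. Not the actual `prune_orphan_services`: the function name, the stack name, and the `web`/`db` service names are hypothetical. It relies on the swarm convention that a stack's services are named `<stack>_<service>`, and it treats an unresolved or empty head service set as "remove nothing".

```python
from typing import Iterable, Optional, Set


def orphan_services(
    stack: str,
    live_services: Iterable[str],
    head_services: Optional[Iterable[str]],
) -> Set[str]:
    """Live swarm services of `stack` that the head compose no longer declares.

    live_services: full service names currently deployed (e.g. as listed by
                   `docker stack services`).
    head_services: short names from the head compose's `config --services`,
                   or None when that could not be resolved.
    """
    if head_services is None:
        return set()  # compose unresolvable: remove nothing
    wanted_short = set(head_services)
    if not wanted_short:
        return set()  # an empty set is treated as unresolved, never as "remove all"
    prefix = f"{stack}_"
    wanted = {prefix + name for name in wanted_short}
    return {
        svc for svc in live_services
        if svc.startswith(prefix) and svc not in wanted
    }


# Hypothetical stack: the base had sidekiq, the head compose dropped it.
live = ["disc_web", "disc_db", "disc_sidekiq"]
to_remove = orphan_services("disc", live, ["web", "db"])
```

Only names in `to_remove` would then be removed; when the service sets match, the result is empty and the step is a no-op.
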
Decided to CLAIM with the e2e green + image/sidekiq proof and leave the deliberately-broken-head teeth
probe to the Adversary's cold acceptance (its explicit M1 check; I can't push a broken commit to the
recipe mirror per guardrails). STATUS spells out where the teeth hold so it knows where to probe.

## 2026-06-17T00:45Z — M2-prep spot-checks (3 green) while M1 under Adversary review
Ran 2 more recipes through the new dynamic base (de-risks the global resolver change; toward B8):
- **cryptpad #5** (install,upgrade): kind=ref main-tip 36ee3451; install+upgrade PASS incl
  `test_upgrade_preserves_data` (data survived); deploy-count=1; clean teardown.
- **keycloak #3** (install,upgrade): base branch is **master** → kind=ref main-tip 12ac6db8 via the
  origin/main→origin/master fallback in `recipe_branch_commit` (VALIDATES that path); install+upgrade
  PASS incl `test_upgrade_preserves_realm`; SSO/DEPS path exercised; deploy-count=1; clean teardown.
  Note: `prune-orphans` SAFE-SKIPPED ("head compose services unresolved — removes nothing"): keycloak's
  `config --services` returned non-zero in that context; the defensive guard correctly removed nothing
  (service set unchanged base→head anyway). Confirms prune never false-fails when compose is unresolvable.

So 3/3 current recipes resolve to main-tip (kind=ref) and pass. No warm canonicals exist on the host
(`find /var/lib/ci-warm -name canonical.json` empty), so last-green (kind=version) isn't exercised in e2e
yet (it IS unit-tested). For M2 I may seed/use a warm canonical to e2e the last-green path. Pre-existing
orphan `warm-keycloak_...` stack on the host (no registry record): NOT from prevb; left untouched.

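Sketch of the origin/main→origin/master fallback that the keycloak run exercised. Not the real `recipe_branch_commit`: the function name and the mapping-based input are assumptions for illustration (the mapping stands in for whatever the harness reads from git). The keycloak commit is the one logged above; the other commit strings are placeholders.

```python
from typing import Mapping, Optional, Sequence, Tuple


def pick_target_branch(
    remote_heads: Mapping[str, str],
    candidates: Sequence[str] = ("origin/main", "origin/master"),
) -> Optional[Tuple[str, str]]:
    """Return (branch, commit) for the first candidate branch that exists.

    remote_heads maps remote branch names to their HEAD commits.
    None means no candidate exists, so no main-tip base can be resolved.
    """
    for name in candidates:
        commit = remote_heads.get(name)
        if commit:
            return name, commit
    return None


# keycloak-like repo: only `master` exists, so the fallback is taken.
keycloak_like = pick_target_branch({"origin/master": "12ac6db8"})
```

A `None` result here would correspond to the "skip" end of the resolution order.
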
Stopping new e2e launches now: the Adversary is running its own discourse cold-acceptance on the shared
7GB node; piling on risks a memory-pressure false-failure in its run. Parking at M1 gate.