# JOURNAL — phase `prevb` (Builder reasoning; append-only)

## 2026-06-17 — Bootstrap + recon

Read SSOT (plan-phase-prevb), plan.md §6.1/§7/§9, Adversary's REVIEW-prevb (live, idle awaiting M1 claim).

**Mapped the harness upgrade flow** (`runner/run_recipe_ci.py`, `harness/lifecycle.py`, `harness/generic.py`, `harness/meta.py`, `harness/canonical.py`):

- Base decision: `upgrade_base(stages, meta, recipe)` → `None` if upgrade∉stages or EXPECTED_NA[upgrade], else `meta.UPGRADE_BASE_VERSION or lifecycle.previous_version(recipe)` (= `recipe_versions[-2]`). `base = prev or target`; `prev` also gates whether the upgrade tier runs.
- Deploy: `deploy_app(version=base)` → pinned `recipe_checkout(version)` + (auto-chaos if overlay/lightweight tag); `version=None` → chaos deploy of the current (head) checkout.
- Overlay `compose.ccci.yml`: copied into the checkout (`provide_ccci_overlay`), referenced by `EXTRA_ENV.COMPOSE_FILE`, persists untracked across the head re-checkout → applies to ALL deploys.
- Upgrade op (`generic.perform_upgrade`): `recipe_checkout_ref(head_ref)` then chaos redeploy; the ccci overlay persists → leaks version-specific pins onto the head. **That is the bug.**
- Last-green source: `canonical.read_registry(recipe)` → `{version, commit, status}` (promoted only on GREEN LATEST cold runs for `WARM_CANONICAL` recipes). No separate "last-green" file.

**Ground-truth discourse facts** (gitea API, verified — see STATUS for the table). Key correction vs plan §3 prose: main is `bitnamilegacy/discourse:3.5.0` (not 3.3.1 — main advanced). Thesis holds: base (last-green/main = bitnamilegacy 3.5.0, deployable) → head (PR #4 = official discourse/discourse:3.5.3, sidekiq dropped). So discourse needs NO `previous/`; the env overlay shrinks to `order: stop-first`.

**Design decisions (WHY):**

- *Resolution order* last-green → main-tip → skip.
  main-tip = the recipe's `main` branch HEAD = the true predecessor the PR merges onto (more faithful than the old `vers[-2]`, which could span 2 version jumps). This intentionally changes EVERY recipe's default base from `vers[-2]` to main-tip — plan-mandated, not a regression; M2 spot-check validates representative recipes still go green.
- *Keep `UPGRADE_BASE_VERSION` as an optional explicit override* (still wins when set), but remove it from discourse and make the DEFAULT dynamic. Rationale: fully deleting the meta field would break `plausible` (its meta sets it) and the documented "PR adds a version above newest tag" escape hatch, without a deploy test — risk vs guardrail "don't regress other recipes". The plan's "UPGRADE_BASE_VERSION removed" is in the discourse-migration context; the normal/discourse path is now hardcode-free. Recorded in DECISIONS.
- *`previous/` scoped to last-green (published-version) base only* — version-guarded by a declared target; on a main-tip base or version mismatch it is skipped + flagged stale. Discourse ships none (base deploys clean).

## 2026-06-17T00:30Z — M1 code done (unit+lint green); discourse e2e launched

Implemented B1–B4 (commit bb2e3c6): resolve_upgrade_base/BasePlan, deploy_app base_ref+apply_previous, previous/ surface in lifecycle, generic.perform_upgrade strip, discourse migration, unit tests.

Unit: 88 relevant pass (full suite 283 pass; 1 PRE-EXISTING unrelated fail `test_warm_reconcile::test_traefik_spec_is_stateless_with_setup` KeyError 'health_domain' — fails on clean HEAD, not mine; flagged for Adversary). Lint PASS.

B5 e2e launched on cc-ci (/root/prevb-deploy @ bb2e3c6), STAGES=install,upgrade, discourse PR#4 (REF=ae5a8180, SRC=recipe-maintainers/discourse). First log lines confirm the core mechanism: `== upgrade base: kind=ref ref=f87c612d71b4 (target-branch (main) tip)` → base = main-tip chaos deploy (bitnamilegacy:3.5.0), env overlay provided. Base now in slow Rails cold boot (15-25min). Polling ~5min.
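
The resolution order from the design decisions (explicit override first, then last-green, then main-tip, else skip) can be sketched roughly as below. This is a minimal illustration and not the harness code: the `BasePlan` fields, the function name `sketch_resolve_base`, and its three arguments are assumptions made for the example, and the stage / EXPECTED_NA gating is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BasePlan:
    """Hypothetical stand-in for the harness's BasePlan; real fields may differ."""
    kind: str    # "version" = published version, "ref" = git commit
    value: str   # the version string or the commit hash
    reason: str  # human-readable source, as echoed in the run log


def sketch_resolve_base(
    override: Optional[str],
    last_green: Optional[str],
    main_tip: Optional[str],
) -> Optional[BasePlan]:
    """Pick the upgrade base: override > last-green > main-tip > skip (None)."""
    if override:
        # An explicit UPGRADE_BASE_VERSION still wins when set.
        return BasePlan("version", override, "explicit UPGRADE_BASE_VERSION override")
    if last_green:
        # Version recorded by the canonical registry after a green run.
        return BasePlan("version", last_green, "last-green canonical")
    if main_tip:
        # Tip commit of the branch the PR merges onto.
        return BasePlan("ref", main_tip, "target-branch (main) tip")
    # Nothing to upgrade from: the upgrade tier is skipped.
    return None
```

With no override and no warm canonical on the host, a call such as `sketch_resolve_base(None, None, "f87c612d71b4")` yields a `kind="ref"` plan, which matches the `kind=ref` log line quoted above.
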
(lint rung fail R011 = recipe-level, a rung not a gate; prepull skipped on the known sidekiq-depends-on config rc=15 — non-fatal.)

## 2026-06-17T00:40Z — M1 GREEN locally; claiming

discourse install,upgrade e2e GREEN (2nd run, after the prune fix). Evidence in run-prevb-disc2.log on cc-ci /root/prevb-deploy. The dynamic main-tip base worked first try (kind=ref f87c612d) — crucial, because main (0.8.1+3.5.0) is AHEAD of the newest published tag (0.7.0+3.3.1), so the OLD vers[-2] default (=0.6.3) would have been the wrong predecessor entirely. The upgrade moved 0.8.1+3.5.0 (bitnamilegacy, main-tip) → 1.0.0+3.5.3 (official, PR head), chaos-version=ae5a8180+U.

**The one real bug found+fixed (WHY):** first run, `test_head_runs_official_image` PASSED (head app = official 3.5.3 — the leak is gone) but `test_sidekiq_service_dropped` FAILED: `docker stack deploy` (what `abra app deploy` runs) only adds/updates services, it does NOT prune ones the new compose dropped, so the base's sidekiq orphaned on the old image. This is a swarm mechanic, not a head-deploy failure, but it means the deployed stack didn't faithfully reflect the head. Fix = `prune_orphan_services` in perform_upgrade: reconcile the live stack to the head compose's `config --services` set (remove orphans). Faithful (deployed stack == head), no-op when service sets match / compose unresolvable, weakens nothing.

Decided to CLAIM with the e2e green + image/sidekiq proof and leave the deliberately-broken-head teeth probe to the Adversary's cold acceptance (its explicit M1 check; I can't push a broken commit to the recipe mirror per guardrails). STATUS spells out where the teeth hold so they know where to probe.
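
The reconcile step behind the prune fix comes down to a set difference with a defensive guard. The sketch below shows only that decision logic, under stated assumptions: the helper name `orphan_services` and its inputs (short service names from the live stack and from the head compose) are hypothetical, and the actual removal of services is not shown.

```python
from typing import Iterable, List, Optional


def orphan_services(
    live: Iterable[str],
    head: Optional[Iterable[str]],
) -> List[str]:
    """Return live services that the head compose no longer declares.

    `head` is None (or empty) when the head compose's service list could
    not be resolved; in that case nothing is reported as an orphan, so the
    caller removes nothing.
    """
    if head is None:
        return []
    head_set = set(head)
    if not head_set:
        # Treat an empty result as "unresolved" rather than "remove all".
        return []
    return sorted(set(live) - head_set)


# Example mirroring the discourse case: the base ran sidekiq, the head drops it.
print(orphan_services(["app", "db", "sidekiq"], ["app", "db"]))
```

When the two service sets match, the result is an empty list and the step is a no-op, which is the behaviour the entry above describes.
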
## 2026-06-17T00:45Z — M2-prep spot-checks (3 green) while M1 under Adversary review

Ran 2 more recipes through the new dynamic base (de-risks the global resolver change; toward B8):

- **cryptpad #5** (install,upgrade): kind=ref main-tip 36ee3451; install+upgrade PASS incl `test_upgrade_preserves_data` (data survived); deploy-count=1; clean teardown.
- **keycloak #3** (install,upgrade): base branch is **master** → kind=ref main-tip 12ac6db8 via the origin/main→origin/master fallback in `recipe_branch_commit` (VALIDATES that path); install+upgrade PASS incl `test_upgrade_preserves_realm`; SSO/DEPS path exercised; deploy-count=1; clean teardown.
  Note: `prune-orphans` SAFE-SKIPPED ("head compose services unresolved — removes nothing") — keycloak's `config --services` returned non-zero in that context; the defensive guard correctly removed nothing (service set unchanged base→head anyway). Confirms prune never false-fails when compose is unresolvable.

So 3/3 current recipes resolve to main-tip (kind=ref) and pass — no warm canonicals exist on the host (`find /var/lib/ci-warm -name canonical.json` empty), so last-green (kind=version) isn't exercised in e2e yet (it IS unit-tested). For M2 I may seed/use a warm canonical to e2e the last-green path.

Pre-existing orphan `warm-keycloak_...` stack on the host (no registry record) — NOT from prevb; left untouched.

Stopping new e2e launches now — the Adversary is running its own discourse cold-acceptance on the shared 7GB node; piling on risks a memory-pressure false-failure in its run. Parking at M1 gate.

## 2026-06-17T01:05Z — M1 PASS; starting M2

Adversary M1 PASS (dbc7a3b), all 8 DoD cold-verified incl. teeth: break-it probe with head image `discourse/discourse:99.99.99-adversary-broken` → `manifest unknown` at prepull → upgrade:fail (level 1/5), base still resolved to main-tip — proves base/prune/previous can't paper over a broken head. No VETO.

Note for record: the Adversary attributed the lingering `warm-keycloak_...` stack to "Builder's concurrent spot-check". It's actually a PRE-EXISTING orphan (a warm- domain, created only by the canonical/warm system, not by a normal cold PR run) — my keycloak spot-check used a per-run `keycloak-pr3-*` domain and tore down clean (verified "no leftover keycloak run-stacks"). Not a prevb leak; pre-existing cruft.

M2 plan: B7 = discourse PR#4 !testme GREEN in real CI (Drone). Infra confirmed healthy: ccci-bridge_app 1/1 (polls POLL_REPOS incl. discourse every 30s), drone_...app 1/1, Drone healthz 200; Drone builds cc-ci@main (= my prevb code). Before posting !testme publicly on PR#4, running the FULL pipeline locally first (STAGES=install,upgrade,backup,restore,custom) to de-risk backup/restore/custom under the new model (my local runs so far were install,upgrade only). If a non-prevb tier fails I fix/triage first, then !testme.

## 2026-06-17T01:30Z — All 5 discourse tiers green locally; posting !testme (B7)

Full local run (run-prevb-disc-full) found ONE failure: custom `test_create_topic_roundtrip` — `mint_admin` hardcoded the bitnamilegacy path `/opt/bitnami/discourse` (404 on the official head). This is a DIRECT consequence of prevb working (the head is now genuinely official, not overlay-reverted to bitnamilegacy). Fixed `_discourse.py::mint_admin` image-agnostic (b66abc4): detect /var/www/discourse (official) vs /opt/bitnami/discourse (legacy); on official re-export DISCOURSE_DB_PASSWORD from /run/secrets/db_password (entrypoint exports it only for boot) and run bin/rails as root (official image USER is empty → exec=root; verified it works). Re-run (install,upgrade,custom) → custom PASS (all 3 custom tests green).

Tier status (across run-prevb-disc-full + run-prevb-disc-custom): install✓ upgrade✓ backup✓ restore✓ custom✓. So the real-CI !testme full pipeline should be green.
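
The image-agnostic detection in the `mint_admin` fix can be pictured roughly as follows. This is a sketch under assumptions rather than the code in `_discourse.py`: the helper names, the injected `path_exists` probe (standing in for a check run inside the container), and the order in which the two roots are tried are all hypothetical. Only the two paths, the secret file, and the environment variable name come from the entry above.

```python
from typing import Callable, Optional

OFFICIAL_ROOT = "/var/www/discourse"    # official discourse/discourse image
LEGACY_ROOT = "/opt/bitnami/discourse"  # bitnamilegacy image


def detect_discourse_root(path_exists: Callable[[str], bool]) -> Optional[str]:
    """Return the app root present in the container, or None if neither exists."""
    for root in (OFFICIAL_ROOT, LEGACY_ROOT):
        if path_exists(root):
            return root
    return None


def shell_prefix(root: str) -> str:
    """Build the shell prefix to run before a bin/rails command in `root`."""
    if root == OFFICIAL_ROOT:
        # The official entrypoint exports the DB password only for boot,
        # so an exec'd shell has to re-export it from the secret file.
        return (
            'export DISCOURSE_DB_PASSWORD="$(cat /run/secrets/db_password)"; '
            f"cd {root}"
        )
    return f"cd {root}"


# Fake probe for illustration: pretend only the official root exists.
root = detect_discourse_root(lambda p: p == OFFICIAL_ROOT)
print(shell_prefix(root))
```

Injecting the probe keeps the decision logic testable without a running container; the real helper presumably performs the check with an exec into the app service.
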

Posting !testme on discourse PR#4 as autonomic-bot (authorized org member) → bridge (polls every 30s) triggers a Drone build of cc-ci@main (= prevb code).

## 2026-06-17T01:33Z — B7 DONE: discourse PR#4 !testme GREEN in real CI (Drone 717)

Posted !testme as autonomic-bot (comment 14597); bridge replied in ~16s (build 717), bridge final comment "✅ passed" @01:32:55Z.

Run 717 junit (cold-readable at /var/lib/cc-ci-runs/717/junit/): ALL 10 suites failures=0 errors=0 — install / upgrade(generic+cc-ci) / backup(generic+cc-ci) / restore(generic+cc-ci) / custom(create_topic+health_check+site_basic). upgrade__cc-ci proves test_head_runs_official_image_not_bitnamilegacy + test_sidekiq_service_dropped_by_head PASS. Clean teardown (no discourse stacks).

This is the M2 headline: the migration is REALLY tested in real CI.

Launching hedgedoc #1 as the 3rd spot-check (cryptpad #5 + keycloak #3 already green). Then reconcile + claim M2.

## 2026-06-17T01:40Z — hedgedoc spot-check green; CLAIMING M2

hedgedoc #1 (3rd spot-check): kind=ref main-tip 09bf4d54; install:pass upgrade:pass; clean teardown. 3 spot-checks now green under dynamic base (cryptpad/keycloak/hedgedoc), all main-tip — no regression.

discourse 717 results.json level=4/5. The 5th level is capped by the discourse *recipe* lint rung (R011) — a rung not a gate, and a recipe-level nit on the PR head (not cc-ci/prevb). The run is GREEN (bridge "✅ passed", all 5 functional tiers junit 0-fail). Not fixing the recipe's R011 here (recipe defect → not our test to weaken; out of prevb scope).

Records reconciliation: 717's artifacts (results.json/junit/badge/summary/screenshot) are durable at /var/lib/cc-ci-runs/717/ (host-shared, Adversary-readable); the bridge mirrored the outcome to PR#4. No warm canonicals to reconcile (none exist). Pre-existing warm-keycloak orphan left untouched (not prevb).

Claiming M2.
Adversary cold-verifies (re-read 717 junit / re-trigger !testme / re-run a spot-check); then I write `## DONE` once REVIEW-prevb shows fresh M1+M2 PASS with no VETO.

## 2026-06-17T01:58Z — M2 PASS → `## DONE`

Adversary M2 PASS (1c3ba71): all 6 M2 DoD items cold-verified incl. its own independent cryptpad#5 re-run; discourse 717 real-CI GREEN with live-swarm-image teeth (official 3.5.3, sidekiq gone); lint R011 code-verified non-gating; public surface secret-clean; nothing merged.

Both M1(01:03Z)+M2(01:58Z) fresh PASS, no VETO. DONE handshake satisfied → wrote `## DONE` to STATUS-prevb. Phase prevb complete. Stopping loop.