JOURNAL — phase settings (WHY / reasoning; Adversary does not read before verdict)

2026-06-17 — bootstrap + M1 design

Phase: server-level settings.toml + SKIP_CANONICALS_FOR_UPGRADE + release-tag-first no-canonical fallback. Plan: /srv/cc-ci/cc-ci-plan/plan-phase-settings-ci-server-config.md.

Why a new harness/settings.py (not extending an env-var module)

Checked for an existing cc-ci config mechanism first (plan §2.A "extend rather than spawn a parallel one"). The server config today is scattered ad-hoc env reads (os.environ.get for MAX_TESTS, CCCI_RUNS_DIR, CCCI_REPO, STAGES, CCCI_QUICK, …); there is no central config module or class to extend (grep for tomllib|settings\.toml|class Settings → none). So a small dedicated loader IS the minimal, extensible home, rather than threading through yet another env var. Parsing uses stdlib tomllib (the server runs py3.12, confirmed). One [upgrade] table with one key for now; _SCHEMA is the single source of defaults and validation, so adding a key or table later is a one-line change.

Settings file path: /etc/cc-ci/settings.toml (override with $CCCI_SETTINGS)

The settings file must be readable from BOTH execution contexts: the nightly sweep sets CCCI_REPO=/etc/cc-ci and cd's there, and the Drone recipe-CI runner, although it runs from its own checkout, reads an absolute host path exactly as the sweep does. /etc/cc-ci is a git checkout kept current by git pull + nixos-rebuild on deploy; an untracked settings.toml there survives pulls (git pull never deletes untracked files) and sits next to the tracked settings.toml.example. Chose this over /srv/cc-ci/settings.toml (the plan's suggestion) because /srv/cc-ci is the orchestrator path and is therefore ambiguous on the server, whereas /etc/cc-ci is unambiguous and discoverable. If the file or its directory is absent, the loader falls back to defaults without error.
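A sketch of the path lookup described above, assuming the override is a plain environment variable read at load time. settings_path() and read_settings_text() are hypothetical helper names, not the real loader's API.

```python
import os
from pathlib import Path
from typing import Optional

DEFAULT_SETTINGS_PATH = Path("/etc/cc-ci/settings.toml")


def settings_path() -> Path:
    """Return the settings path; a non-empty $CCCI_SETTINGS overrides the default."""
    override = os.environ.get("CCCI_SETTINGS")
    # An empty value is treated the same as unset.
    return Path(override) if override else DEFAULT_SETTINGS_PATH


def read_settings_text(path: Path) -> Optional[str]:
    """Return the file's text, or None when the file or its directory is missing."""
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError:
        # Absent file (or absent /etc/cc-ci) is not an error: caller uses defaults.
        return None
```

Because both the sweep and the Drone runner resolve the same absolute path, neither needs to know where the other's checkout lives.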

Why the canonical-present path (incl. samever step-back) is byte-for-byte unchanged

Guardrail §4: the default (false) must be a no-op for current behavior. Structure: if rec and rec.version and not flag: → the entire existing prevb/samever block runs verbatim (canonical ≠ head → canonical; canonical == head → step back to an older tag, else skip). Only when no canonical is in play (rec falsy, OR flag true) do we enter the new _no_canonical_base. So with flag false and a canonical present, nothing changes. The step-back's "no older predecessor → skip" is preserved (NOT routed to main-tip), which is correct: routing it to main-tip could reintroduce exactly the same-version no-op that samever exists to prevent. The plan's §2.C "unified chain ... (==head)" is satisfied because the step-back already uses the same release-tag helper as step 1; I deliberately did NOT add a main-tip tail to the step-back skip, to keep samever's guarantee intact. This is the one place where a literal reading of §2.C ("==head → ... → main-tip → skip") and the §4 no-op guardrail plus samever's intent point in slightly different directions; I chose the conservative path that preserves both samever and the no-op guardrail. If the Adversary reads §2.C literally and wants the step-back-no-older case to fall through to main-tip, that is a one-line change, but I believe it would be a regression (a vacuous upgrade), so the choice is recorded here.
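The branch structure above, reduced to a pure function so the guardrail is easy to see. CanonicalRecord, BasePlan, resolve_base and the injected older_tag / no_canonical_base callables are hypothetical stand-ins for this sketch; the real resolve_upgrade_base signature is not recorded in this journal.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CanonicalRecord:  # hypothetical stand-in for a warm canonical.json record
    version: Optional[str]


@dataclass
class BasePlan:  # hypothetical: the resolved upgrade base
    kind: str  # "canonical", "release-tag", "main-tip" or "skip"
    version: Optional[str]
    reason: str


def resolve_base(
    rec: Optional[CanonicalRecord],
    head_version: Optional[str],
    skip_canonicals: bool,
    older_tag: Callable[[Optional[str]], Optional[str]],
    no_canonical_base: Callable[[Optional[str]], BasePlan],
) -> BasePlan:
    if rec and rec.version and not skip_canonicals:
        # Existing prevb/samever path: untouched when the flag is false.
        if rec.version != head_version:
            return BasePlan("canonical", rec.version, "last-green canonical")
        # canonical == head: step back to an older release tag...
        prev = older_tag(head_version)
        if prev:
            return BasePlan("release-tag", prev, "step-back: older release tag")
        # ...but never fall through to main-tip (that would be a vacuous upgrade).
        return BasePlan("skip", None, "canonical == head and no older tag")
    # No canonical in play (no record, or flag true): the new fallback chain.
    return no_canonical_base(head_version)
```

Injecting the two helpers keeps the sketch testable without git or a warm registry; it also makes the "one-line change" visible: the final skip return in the samever branch is the line that would change.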

Why _no_canonical_base guards on head_version before calling recipe_tags

newest_older_version(tags, None) returns None, but evaluating recipe_tags(recipe) eagerly would still shell out to git -C <per-run recipe dir> tag when head_version is None (e.g. callers or tests that don't pass it). Guarding the call with "if head_version else None" avoids a needless, possibly failing git call and preserves the prevb behavior for callers that pass no head_version (→ main-tip).
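A sketch of the guard, assuming tags are plain dotted numeric versions (the real tag format and helper signatures may differ). older_release_tag() and its list_tags parameter are hypothetical; list_tags exists only so the no-git path can be exercised without a repository.

```python
import subprocess
from typing import Callable, Optional


def _vkey(tag: str) -> tuple[int, ...]:
    # Assumption: tags look like "3.6.0"; a real helper may parse richer formats.
    return tuple(int(part) for part in tag.split("."))


def newest_older_version(tags: list[str], head_version: Optional[str]) -> Optional[str]:
    """Newest tag strictly older than head_version, or None."""
    if head_version is None:
        return None
    older = [t for t in tags if _vkey(t) < _vkey(head_version)]
    return max(older, key=_vkey, default=None)


def recipe_tags(recipe_dir: str) -> list[str]:
    # Shells out to git; raises CalledProcessError if git fails (e.g. not a repo).
    out = subprocess.run(
        ["git", "-C", recipe_dir, "tag"], check=True, capture_output=True, text=True
    ).stdout
    return out.split()


def older_release_tag(
    recipe_dir: str,
    head_version: Optional[str],
    list_tags: Callable[[str], list[str]] = recipe_tags,
) -> Optional[str]:
    # The conditional expression only evaluates list_tags(...) when head_version
    # is truthy, so a missing head_version never triggers the git call.
    return newest_older_version(list_tags(recipe_dir), head_version) if head_version else None
```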

Why wrong-type raises but malformed/absent doesn't

Plan M1 asks for both "malformed file handled" (graceful) AND "wrong type errors clearly". Reconciled as follows: absent / unreadable / TOML-syntax-error → WARN + all defaults (a broken file degrades to today's behavior and cannot crash CI). A syntactically valid file with a known key of the wrong type → TypeError (a typo'd value should be loud, not silently mis-parsed). The bool-is-int-subclass case is handled: 1/0 for a bool key is rejected, not coerced.

Pre-existing, OUT OF SCOPE: dashboard lint drift on main

scripts/lint.sh reports that dashboard/dashboard.py and tests/unit/test_dashboard.py would be reformatted by the pinned ruff. This drift is already present at HEAD f68f1c5 (confirmed by running git show HEAD:... through the pinned ruff) and is NOT in my diff. Not touched by this phase (narrow scope); recorded in DECISIONS as an observation. My 5 phase files are format-clean and ruff check clean.

Verification (commands + output)

  • nix shell nixpkgs#python311Packages.pytest -c pytest tests/unit/test_upgrade_base.py tests/unit/test_settings.py -q → 32 passed.
  • full unit suite: pytest tests/unit/ -q → 315 passed.
  • ruff check runner/ tests/unit/ bridge/ dashboard/ → All checks passed.
  • ruff format --check (pinned) on my 5 files → all formatted.

2026-06-17 — M2 prep (read-only; not advancing past M1 gate)

Server canonical registry (/var/lib/ci-warm/<recipe>/canonical.json, status all idle):

  • WITH canonical (16): cryptpad, custom-html, custom-html-tiny, drone, ghost, gitea, hedgedoc, immich, lasuite-docs, lasuite-drive, lasuite-meet, mailu, matrix-synapse, n8n, plausible, uptime-kuma.
  • warm dir but NO canonical.json (candidates for M2 evidence (a) "recipe without a canonical → newest release tag < head"): keycloak, alerts, traefik.

M2 plan (after M1 PASS):

  • (a) pick a no-canonical recipe WITH published release tags (keycloak has many) → show resolve_upgrade_base returns a release-tag base, not raw main-tip. Likely via a harness dry-run / targeted invocation on the server reading the live settings (absent file → default false).
  • (b) drop a scratch /etc/cc-ci/settings.toml with skip_canonicals_for_upgrade = true, show a canonical-bearing recipe (e.g. gitea/ghost) now resolves to the release-tag base (canonical bypassed), then remove the scratch file → restore default false.
  • Deploy: ensure /etc/cc-ci is at the phase commit (git pull); settings.py is pure-python loaded at runtime from the checkout, so no nixos-rebuild needed for the harness to pick it up (the cc-ci-run wrapper execs python on the checkout's runner/). Confirm on server.

2026-06-17 — M1 PASS + M2 verified live, claimed

M1 Adversary cold-PASS (REVIEW-settings.md @17:00Z, no VETO). Advanced to M2. Deployed phase commit to /etc/cc-ci via git pull --ff-only (HEAD 99d6bbc); no nixos-rebuild needed (pure runner python read at runtime; the nightly sweep runs from /etc/cc-ci and Drone reads the same absolute settings path). Added scripts/show-upgrade-base.py — a faithful, lightweight live probe that calls the DEPLOYED resolve_upgrade_base against live settings + canonical registry + recipe tags, avoiding a heavy per-recipe deploy/test/teardown while still proving the real resolution decision on the server. Chose this over full cc-ci-run runner/run_recipe_ci.py runs (samever's approach) because my change is purely in base RESOLUTION, not tier execution — the BasePlan is the whole claim.

Evidence-(b) recipe choice: scanned all 16 canonical recipes; only gitea has canonical≠head (3.5.3 vs 3.6.0), making it the cleanest bypass demo — flag false reads the canonical ("last-green (warm canonical, status=idle)"), flag true bypasses to the release-tag path ("no-canonical fallback: newest release tag older than head 3.6.0..."). The resolved version is 3.5.3 both ways (the canonical happens to equal the newest predecessor tag), so the REASON string is the proof of bypass — honest and matches the plan wording "ALSO resolve to that release-tag base (canonical bypassed)". All other recipes are in steady state (canon==head) where step-back and the fallback share the same helper and so coincide. Server restored to steady state (settings.toml absent → false).