Commit Graph

11 Commits

Author SHA1 Message Date
32f00717ac fix(2w): W0.9 WC1.1 hardening (proven live: healthy upgrade + marquee rollback)
Bugs found by the live proof, fixed:
- warmsnap: snapshot now swaps a <recipe>/snapshot/ SUBDIR, not the whole
  <recipe>/ dir — so the reconciler's sibling last_good file survives a
  snapshot swap (was being clobbered).
- warm_reconcile: deploy_version captures abra's stdout (it writes FATA to
  stdout) in the error; add wait_undeployed() after every undeploy so
  snapshot/restore/redeploy don't race a half-removed swarm stack; the upgrade
  deploy is wrapped so a deploy FAILURE (not just unhealthy) also triggers
  rollback. (57 unit pass.)

LIVE PROOF on warm keycloak (annotated fake tags via CCCI_SKIP_FETCH):
(a) healthy upgrade 10.7.1->10.7.9: snapshot+deploy+health-pass, last_good
    committed=10.7.9, marker realm preserved.
(b) MARQUEE rollback: broken latest 10.7.10 (lint-fail) -> rollback to 10.7.9,
    HEALTHY, marker realm INTACT (data preserved through broken-upgrade+restore),
    last_good NOT advanced, rollback alert written (attempted=10.7.10,
    last_good=10.7.9, recovered=True). keycloak recovered to canonical
    10.7.1+26.6.2 healthy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 01:21:05 +01:00
a044abb298 feat(2w): W0.6 unpinned warm reconciler + WC1.2 safety gate + WC1.1 scaffold
runner/warm_reconcile.py (python, packaged into nix store, replaces bash
reconcile): UNPIN keycloak (deploy latest published version TAG; recipe fetched
at runtime -> D8 closure byte-identical). WC1.2 pre-deploy safety gate (runs
FIRST): major recipe/app-version bump OR releaseNotes manual-migration marker
-> hold-on-current + alert sentinel (no deploy churn). WC1.1 health-gated
upgrade-with-rollback: record last-good -> [keycloak: undeploy->warmsnap.snapshot
->deploy latest] -> health-gate -> commit-or-(restore+redeploy-prior+alert).
Alerts = /var/lib/ci-warm/alerts/*.json (Builder loop relays). current version
read from abra TYPE=<recipe>:<version>. CCCI_SKIP_FETCH test hook.
+8 unit tests for the version gate (56 unit pass).

Proven on cc-ci: nixos-rebuild switch -> warm-keycloak.service runs the python
reconciler -> noop-healthy (system 0-failed, /realms/master=200). WC1.2 holds
proven live: MAJOR bump -> held-major (keycloak untouched); minor+manual-
migration notes -> held-manual-migration (alert carries notes); no deploy churn.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 00:42:02 +01:00
4cc1e15a53 feat(2w): W0.5 WC3 snapshot/restore helper (warmsnap.py)
runner/harness/warmsnap.py: raw per-volume tar of an app's stack volumes while
UNDEPLOYED, under /var/lib/ci-warm/<recipe>/ (meta.json + volumes/<vol>.tar);
one last-good, atomic dir swap; restore clears+untars each volume back. Asserts
undeployed (consistency). Reused by WC1.1 (pre-upgrade keycloak snapshot) + WC5.
+5 unit tests (48 unit pass).

LIVE round-trip PROVEN on warm keycloak: create marker realm -> undeploy ->
snapshot (mariadb+providers vols) -> deploy -> delete marker (mutate DB) ->
undeploy -> restore -> deploy -> marker realm BACK; keycloak healthy. WC3 core.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 00:12:46 +01:00
74bf8c1723 feat(2w): W0.1 keycloak realm lifecycle primitives (WC1)
sso.py: list_realms, delete_keycloak_realm (idempotent, refuses master),
realms_to_reap (pure, concurrency-safe predicate), reap_orphaned_realms.
The per-run realm is the isolation unit on a shared live-warm keycloak;
orphans (crashed runs) reaped by hex not mapping to a live app stack.
+8 unit tests (tests/unit/test_warm_realm.py); 43 unit pass on cc-ci.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 23:16:48 +01:00
5b34496557 fix(2): F2-11 — SSO-dep deps-not-ready SKIP no longer yields GREEN !testme
When a DEPS-declaring recipe's setup_custom_tests fails, its @requires_deps (SSO/OIDC)
tests skip; a skip-only pytest file exits 0 so the run previously reported overall=0
(GREEN) while the only SSO test never ran (violates P7). Fix preserves generic-tier
failure-isolation but corrects the green SIGNAL:
- conftest.pytest_collection_modifyitems counts skipped requires_deps tests and appends
  to $CCCI_DEPS_SKIP_REPORT.
- run_recipe_ci: sums the count, surfaces it in RUN SUMMARY, and new pure predicate
  sso_dep_unverified(declared, deps_ready, skipped) flips overall=1.
- 7 new unit tests (tests/unit/test_f211_sso_skip.py).

Verified deploy-free (rate-limit-independent): 35/35 unit PASS; cold real-test proof on
lasuite-docs test_oidc_with_keycloak.py -> 1 skipped + skip-report==1 -> orchestrator
would set overall=1. Full e2e deferred until Docker Hub rate limit lifts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 21:25:27 +01:00
4d6b040ba7 feat(2): Q2.3 — dep resolver + SSO-setup harness primitives
- runner/harness/deps.py: dep resolver primitive (Phase 2 §4.2 / Q2.3).
  - declared_deps(recipe) reads DEPS list from tests/<recipe>/recipe_meta.py
  - dep_domain(parent, pr, ref, dep) — per-run domain per (parent, dep) pair
    so two recipes' deps of the same kind don't collide on a host
  - deploy_deps / teardown_deps — sequential deploy + reverse-order teardown
  - read/write of run-scoped $CCCI_DEPS_FILE
- runner/harness/sso.py: SSO-setup / OIDC-flow primitive (Phase 2 §4.2 / Q2.3).
  - setup_keycloak_realm: idempotent realm + confidential OIDC client +
    test user with generated 25-char alphanumeric password (class-B per §4.4-B);
    returns SsoCreds dict with discovery_url, token_url, all identifiers.
  - oidc_password_grant: exercises the password-grant OIDC flow; returns
    access_token (a JWT) or raises.
  - assert_discovery_endpoint: GET /.well-known/openid-configuration; asserts
    issuer matches the per-run provider domain+realm.
- runner/run_recipe_ci.py: wired in dep deploy BEFORE recipe-under-test, dep
  teardown LAST in finally (reverse order). DG4.1 deploy-count guard now
  expects 1 + len(deps_state) — accommodates declared deps without breaking
  the no-extra-deploys invariant.
- tests/conftest.py: deps_apps fixture reads $CCCI_DEPS_FILE -> dict mapping
  dep_recipe -> dep_domain.
- tests/unit/test_deps.py: 7 unit tests covering declared_deps parsing,
  per-(parent,dep) domain distinctness, run-state JSON write/load, env-var
  no-op semantics. 28/28 unit tests PASS on cc-ci.

Smoke test confirmed deploy_count == expected (1) when no deps declared
(custom-html install run, log /root/ccci-q2-deps-smoke.log).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 07:41:56 +01:00
5741e8838f fix(2): F2-1 — test_custom_tests_repo_local_gated uses synthetic recipe (Adversary cold)
The Phase-1e HC2 test asserted custom_tests('custom-html', repo-local) == [] when only the
repo-local dir was set + custom-html had no cc-ci-side functional tests. Phase-2 commit bec9265
added 4 legitimate non-lifecycle test_*.py files under tests/custom-html/{functional,playwright}/
which custom_tests() now correctly returns — breaking the == [] assertion.

The custom_tests behavior is correct; the test fixture was using the real recipe name. Fix: switch
to a synthetic recipe + monkeypatch cc_ci_dir (same pattern already used in the Phase-2 sibling
test_discovery_phase2.py). 5-line change, no behavior change.

Cold-verifiable on cc-ci: cc-ci-run -m pytest tests/unit -v -> 21 passed in 5.38s
(Adversary's F2-1 repro now PASSes; no other regression).

Also: tests/n8n/PARITY.md drafted for the in-flight Q1.2 work (n8n parity port).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 06:32:47 +01:00
0d0fc6c4bc feat(2): Q0.1/Q0.2 — harness.http + discovery recurses functional/playwright (Phase 2)
- runner/harness/http.py: canonical Phase-2 recipe-test HTTP API (vendored from
  recipe-maintainer/utils/tests/helpers.py): http_get/http_post, retry variants,
  wait_for_http, assert_converges. JSON-parsing, header support, form/JSON POST
  bodies, transport-failure -> status=0. Self-contained (cc-ci does not import
  recipe-maintainer at runtime per DECISIONS Phase 2).
- harness.discovery.custom_tests now also recurses into
  tests/<recipe>/{functional,playwright}/test_*.py (Phase 2 §4.1 layout) while
  excluding lifecycle test_<op>.py names and honoring the HC2 repo-local gate.
- Unit tests:
    tests/unit/test_http.py — in-process http.server fixture; deterministic
    proofs of parsing/retry/convergence semantics, no network egress.
    tests/unit/test_discovery_phase2.py — functional/+playwright/ recursion
    + HC2 gate still applies to subdirs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 04:36:49 +01:00
b7e6cbd7be feat(1e): HC3 additive generic + op/assertion split (orchestrator owns the op)
- orchestrator: per mutating tier, run optional pre-op seed hook (ops.py pre_<op>) → perform the op
  ONCE (harness-owned) → run generic assertion (unless opted out) AND overlay assertion, both against
  the shared post-op deployment. Op results passed op→assertion via run-scoped CCCI_OP_STATE_FILE.
- opt-out: CCCI_SKIP_GENERIC / CCCI_SKIP_GENERIC_<OP> / recipe_meta.SKIP_GENERIC (declarative).
- generic.py: split do_* into op primitives (perform_upgrade/backup/restore) + assertions
  (assert_upgraded/backup_artifact/restore_healthy) reading op_state(); deployed_identity now returns
  {version,image,chaos} (chaos label ready for HC1).
- generic test_<op>.py + all 6 recipe overlays migrated to assertion-only; pre-op seeding moved to
  per-recipe ops.py (pre_upgrade/pre_backup/pre_restore). install overlays unchanged (no op).
- deploy-count stays 1 (op primitives never call deploy_app). lint PASS; 8 unit tests PASS on cc-ci.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 03:12:04 +01:00
d38a695fa3 feat(1e): HC2 repo-local approval allowlist (default-deny) + discovery gate
- tests/repo-local-approved.txt (empty ⇒ default-deny); CCCI_REPO_LOCAL_APPROVED_FILE override.
- discovery: repo_local_approved()/_gated() centralize the gate; resolve_overlay_op + generic_op
  (HC3 additive split); custom_tests/install_steps/pre_op_hook all honor the gate.
- unit tests rewritten for approved-vs-not + the generic floor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 02:55:58 +01:00
81e26a1bdc fix(1d): F1d-2 — pinned base deploys the pinned version; upgrade is non-vacuous
- deploy_app: checkout the pinned tag + deploy NON-chaos when a version is pinned (chaos only for
  version=None / PR-head). Was always -C, which ignored the pin and deployed LATEST -> upgrade no-op.
- do_upgrade: assert the deployment actually MOVED (coop-cloud version label and/or image changed)
  via lifecycle.deployed_identity -> a vacuous no-op upgrade can no longer pass (DG2).
- G2: migrate custom-html overlays to the assertion-only contract (override + extend-by-composition
  + data-continuity; split backup/restore). tests/unit/test_discovery.py proves precedence (5/5).

Probe (Adversary's F1d-2 test): hedgedoc deploy-prev=1.10.7 -> upgrade=1.10.8, CHANGED=True.
hedgedoc full generic lifecycle green (install/upgrade/backup/restore, deploy-count=1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 00:02:59 +01:00