Commit Graph

21 Commits

Author SHA1 Message Date
4d6b040ba7 feat(2): Q2.3 — dep resolver + SSO-setup harness primitives
- runner/harness/deps.py: dep resolver primitive (Phase 2 §4.2 / Q2.3).
  - declared_deps(recipe) reads DEPS list from tests/<recipe>/recipe_meta.py
  - dep_domain(parent, pr, ref, dep) — per-run domain per (parent, dep) pair
    so two recipes' deps of the same kind don't collide on a host
  - deploy_deps / teardown_deps — sequential deploy + reverse-order teardown
  - read/write of run-scoped $CCCI_DEPS_FILE
- runner/harness/sso.py: SSO-setup / OIDC-flow primitive (Phase 2 §4.2 / Q2.3).
  - setup_keycloak_realm: idempotent realm + confidential OIDC client +
    test user with generated 25-char alphanumeric password (class-B per §4.4-B);
    returns SsoCreds dict with discovery_url, token_url, all identifiers.
  - oidc_password_grant: exercises the password-grant OIDC flow; returns
    access_token (a JWT) or raises.
  - assert_discovery_endpoint: GET /.well-known/openid-configuration; asserts
    issuer matches the per-run provider domain+realm.
- runner/run_recipe_ci.py: wired in dep deploy BEFORE recipe-under-test, dep
  teardown LAST in finally (reverse order). DG4.1 deploy-count guard now
  expects 1 + len(deps_state) — accommodates declared deps without breaking
  the no-extra-deploys invariant.
- tests/conftest.py: deps_apps fixture reads $CCCI_DEPS_FILE -> dict mapping
  dep_recipe -> dep_domain.
- tests/unit/test_deps.py: 7 unit tests covering declared_deps parsing,
  per-(parent,dep) domain distinctness, run-state JSON write/load, env-var
  no-op semantics. 28/28 unit tests PASS on cc-ci.

Smoke test confirmed deploy_count == expected (1) when no deps declared
(custom-html install run, log /root/ccci-q2-deps-smoke.log).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 07:41:56 +01:00
0d0fc6c4bc feat(2): Q0.1/Q0.2 — harness.http + discovery recurses functional/playwright (Phase 2)
- runner/harness/http.py: canonical Phase-2 recipe-test HTTP API (vendored from
  recipe-maintainer/utils/tests/helpers.py): http_get/http_post, retry variants,
  wait_for_http, assert_converges. JSON-parsing, header support, form/JSON POST
  bodies, transport-failure -> status=0. Self-contained (cc-ci does not import
  recipe-maintainer at runtime per DECISIONS Phase 2).
- harness.discovery.custom_tests now also recurses into
  tests/<recipe>/{functional,playwright}/test_*.py (Phase 2 §4.1 layout) while
  excluding lifecycle test_<op>.py names and honoring the HC2 repo-local gate.
- Unit tests:
    tests/unit/test_http.py — in-process http.server fixture; deterministic
    proofs of parsing/retry/convergence semantics, no network egress.
    tests/unit/test_discovery_phase2.py — functional/+playwright/ recursion
    + HC2 gate still applies to subdirs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 04:36:49 +01:00
6eabfdc0fb fix(1e): F1e-1 exec_in_app race + HC1 head_ref/move hardening
F1e-1 (Adversary): exec_in_app silently returned '' on a failed docker exec, flipping a healthy
recipe RED under opt-out (post-backup container cycle, no readiness buffer). Now polls (re-resolve
container + re-exec) until rc==0 or 90s, then RAISES — never masks an exec failure as empty data.
No assertion weakened. Verified: opt-out install,backup,restore on custom-html now PASS.

HC1: head_ref = ref or recipe_head_commit (prefer explicit PR head sha $REF — robust, no git race;
production !testme always sets REF). assert_upgraded, when head_ref known, REQUIRES the deployed
chaos-version commit to MATCH head_ref (direct + non-vacuous proof the PR-head code was deployed; a
stale prev-checkout chaos redeploy fails). Falls back to version/image/chaos move check otherwise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 03:41:42 +01:00
b7e6cbd7be feat(1e): HC3 additive generic + op/assertion split (orchestrator owns the op)
- orchestrator: per mutating tier, run optional pre-op seed hook (ops.py pre_<op>) → perform the op
  ONCE (harness-owned) → run generic assertion (unless opted out) AND overlay assertion, both against
  the shared post-op deployment. Op results passed op→assertion via run-scoped CCCI_OP_STATE_FILE.
- opt-out: CCCI_SKIP_GENERIC / CCCI_SKIP_GENERIC_<OP> / recipe_meta.SKIP_GENERIC (declarative).
- generic.py: split do_* into op primitives (perform_upgrade/backup/restore) + assertions
  (assert_upgraded/backup_artifact/restore_healthy) reading op_state(); deployed_identity now returns
  {version,image,chaos} (chaos label ready for HC1).
- generic test_<op>.py + all 6 recipe overlays migrated to assertion-only; pre-op seeding moved to
  per-recipe ops.py (pre_upgrade/pre_backup/pre_restore). install overlays unchanged (no op).
- deploy-count stays 1 (op primitives never call deploy_app). lint PASS; 8 unit tests PASS on cc-ci.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 03:12:04 +01:00
d38a695fa3 feat(1e): HC2 repo-local approval allowlist (default-deny) + discovery gate
- tests/repo-local-approved.txt (empty ⇒ default-deny); CCCI_REPO_LOCAL_APPROVED_FILE override.
- discovery: repo_local_approved()/_gated() centralize the gate; resolve_overlay_op + generic_op
  (HC3 additive split); custom_tests/install_steps/pre_op_hook all honor the gate.
- unit tests rewritten for approved-vs-not + the generic floor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 02:55:58 +01:00
feb6f80d50 fix(1d): bounded retry in _app_container (backup briefly cycles the app container)
abra app backup create (backup-bot-two) stops/cycles the app container, so a mutate exec_in_app
right after backup hit an empty docker ps and raised. _app_container now polls (no bare sleep) for
the container to reappear within a timeout. Recipe-agnostic harness robustness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 00:06:28 +01:00
81e26a1bdc fix(1d): F1d-2 — pinned base deploys the pinned version; upgrade is non-vacuous
- deploy_app: checkout the pinned tag + deploy NON-chaos when a version is pinned (chaos only for
  version=None / PR-head). Was always -C, which ignored the pin and deployed LATEST -> upgrade no-op.
- do_upgrade: assert the deployment actually MOVED (coop-cloud version label and/or image changed)
  via lifecycle.deployed_identity -> a vacuous no-op upgrade can no longer pass (DG2).
- G2: migrate custom-html overlays to the assertion-only contract (override + extend-by-composition
  + data-continuity; split backup/restore). tests/unit/test_discovery.py proves precedence (5/5).

Probe (Adversary's F1d-2 test): hedgedoc deploy-prev=1.10.7 -> upgrade=1.10.8, CHANGED=True.
hedgedoc full generic lifecycle green (install/upgrade/backup/restore, deploy-count=1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 00:02:59 +01:00
6c5d8f28ea fix(1d): G1 backup/restore + F1d-1 cert-check reframe
- backup artifact: read snapshot_id from 'abra app backup create' output (snapshots needs a TTY);
  generic.parse_snapshot_id + do_backup assert it
- restore serving race: lifecycle.http_fetch (one request -> status+body, never raises) +
  assert_serving is now a bounded poll (settles a post-op reconverge, no bare sleep); drop wait_serving
- F1d-1 (Adversary, low): reframe served_cert/assert_serving honestly as an INFRA TLS sanity check
  (catches a lapsed/mis-rotated wildcard cert), NOT app-vs-fallback (Traefik serves the wildcard
  zone-wide); the genuine serving proof is services_converged + non-404 status. Awaiting re-test.

DG1 Adversary PASS @ef44d46. G1 full-lifecycle re-verification in flight.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 23:39:45 +01:00
ef44d4658b feat(1d): G0 — generic install + deploy-once orchestrator (DG1 green on hedgedoc)
- harness/generic.py: recipe-agnostic assert_serving (converged + real HTTP, 404-excluded +
  not Traefik 404 body + CA-verified trusted wildcard cert), op helpers, backup_capable detect
- harness/discovery.py: per-op overlay resolution (repo-local > cc-ci > generic), custom + hook
- tests/_generic/: assertion-only tiers (install/upgrade/backup/restore) on the shared deployment
- run_recipe_ci.py: deploy-ONCE orchestrator, per-op summary, deploy-count guard (DG4.1)
- conftest live_app fixture; lifecycle deploy-count + install-steps hook + pin DOMAIN to run domain

DG1 cold-verified green on hedgedoc (pure generic, deploy-count=1, clean teardown). G0 CLAIMED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 23:27:55 +01:00
2cede01ed7 style(1b): auto-format + lint-clean the whole codebase (RL1)
Mechanical, semantics-preserving cleanup so the codebase passes the new lint stage:
- ruff format: all 32 Python files (wraps long signatures, normalizes quotes/blank lines).
- nixpkgs-fmt: modules/drone-runner.nix.
- shfmt (-i 2 -ci): scripts/*.sh.

Lint fixes (reviewed, behavior-preserving — no test weakened):
- ruff SIM105: try/except-pass -> contextlib.suppress (abra.py app_config rm; lifecycle.py janitor).
- ruff SIM115: open().read() -> with open() (run_recipe_ci.py redaction-values + gitea-token).
- statix: merge repeated sops `secrets.*` keys into one `secrets = { ... }` (comments kept);
  empty fn pattern `{ ... }:` -> `_:` (packages.nix).
- deadnix: drop unused lambda args (flake `self`; configuration.nix `lib`; overlay `final` -> `_`).

Verified on cc-ci: `scripts/lint.sh` -> lint: PASS; nixosConfigurations.cc-ci evaluates;
all Python byte-compiles. The deployed bridge/dashboard/runner source changes hash (reformat),
so cc-ci will be rebuilt to the new closure in W2 before the cold D1-D10 re-verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 20:52:05 +01:00
575efb5054 fix: abra app upgrade -c (no-converge-checks) — abra false-fails slow heavy rolling upgrades
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is passing
Diagnosed via instrumented diag: lasuite-docs upgrade reported 'FATA deploy failed' while all 9
services converged 1/1 — abra's convergence poll gives up too early on the slow stop-first roll
(pulling new images). Disable abra's check; the harness wait_healthy + data-survival assertion is
the real, more-patient gate (a genuine failure still fails the test: app never gets healthy).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 11:34:59 +01:00
4d5f7e25c6 fix: abra app upgrade -o (offline) — was 401'ing fetching tags from the private mirror origin
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is passing
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 08:31:40 +01:00
ebb4c0cbca M6.5: enroll cryptpad (recipe #3, stateful/no-DB) + generic per-recipe EXTRA_ENV
All checks were successful
continuous-integration/drone/push Build is passing
Adds a shared-harness EXTRA_ENV mechanism (recipe_meta.py dict or domain-callable),
applied in deploy_app at every deploy path — no per-recipe harness surgery (D5).
cryptpad uses it for its required distinct SANDBOX_DOMAIN. Tests assert data
survival via a marker file in the backed-up cryptpad_data volume (exec_in_app,
since cryptpad data isn't HTTP-served).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 04:41:44 +01:00
7aa0346902 harness: backup/restore pass -C -o; catalogue fetch re-clones clean
Some checks failed
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is failing
Two fixes surfaced by the first real recipe-ci run through Drone:
- abra app backup/restore now pass -C -o (current checkout, no remote fetch) like
  every other recipe-touching call — without -o they fetch recipe tags from the
  (private) remote and fail 'authentication required: Unauthorized'.
- fetch_recipe's catalogue path rm's the recipe dir first so a leftover private-mirror
  remote from a prior SRC+REF run can't poison version resolution / backup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 03:05:03 +01:00
25b628e959 harness: app_new uses chaos only when no version (version => clean tag checkout)
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is passing
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 02:05:54 +01:00
9b33fdf6e6 M6: D4 recipe-local discovery + recipe #2 (keycloak, DB-backed) enrolled; M6 CLAIMED
All checks were successful
continuous-integration/drone/push Build is passing
D4 snapshots recipe-shipped tests/ and runs them against the live app. abra -C -o
everywhere + token clone for private mirror PRs. keycloak install green with no
harness surgery (D5). docs/enroll-recipe.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 01:48:06 +01:00
7fc26fae68 M6 (part 1): per-recipe meta + D4 recipe-local discovery + shared naming helper
All checks were successful
continuous-integration/drone/push Build is passing
Recipe-agnostic harness (no surgery to enroll a recipe): recipe_meta.py for
health path/codes/timeouts; run_recipe_local discovers + runs recipe-shipped
tests/ against the live app. install non-regressed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 01:16:29 +01:00
b7a2d70380 harness: fix A2 (janitor real-name + docker reap + age gate) and A3 (verified teardown)
All checks were successful
continuous-integration/drone/push Build is passing
teardown_app now docker-stack-rm fallback, removes .env only after stack gone,
retries volume rm, and verifies no residual (raises TeardownError). janitor matches
the real <recipe[:4]>-<6hex> scheme + reaps env-less orphans via docker. Verified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 01:05:18 +01:00
7eb0dd3c77 M5: upgrade + backup/restore stages green (custom-html); backup-bot-two oneshot
All checks were successful
continuous-integration/drone/push Build is passing
3-stage run green (install/upgrade/backup), clean teardown. backupbot deployed
via reconcile oneshot; PTY (script) for abra backup/restore; -m for secret generate
(no value leak). M5 CLAIMED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 00:53:16 +01:00
38a145fd9c M4: harness + green install stage (custom-html + Playwright); guaranteed teardown; M4 CLAIMED
All checks were successful
continuous-integration/drone/push Build is passing
run_recipe_ci.py + conftest + abra/lifecycle wrappers + Nix python/playwright env.
deploy_app forces LETS_ENCRYPT_ENV='' (addresses A1). Short per-run domain scheme
for the 64-char swarm name limit. 2 passed; teardown leaves zero orphans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 00:23:55 +01:00
c21cce51b9 chore: bootstrap cc-ci loop state
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 21:07:31 +01:00