Commit Graph

24 Commits

Author SHA1 Message Date
be4f451d3a fix(flake): make Hetzner the canonical cc-ci host target
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone Build is passing
2026-06-01 13:57:45 +00:00
91a69b8971 feat(3 U5.1+U5.2): per-recipe latest-level badge endpoint /badge/<recipe>.svg (R6, level-coloured, status fallback) + complete docs/results-ux.md §3-5 (card/screenshot/PR-comment/badge-embedding, R8); +2 badge unit tests
Some checks failed
continuous-integration/drone/push Build is failing
2026-05-31 10:04:14 +00:00
490813c3d1 docs(3 R8): results-ux.md — level ladder + rung-mapping reference (stable section)
R8 doc seeded with the SETTLED, Adversary-fuzzed level ladder + tier->rung translation + results.json
schema + invariant flags. Card/screenshot/PR-comment/badge sections stubbed (filled as U1-U5 wire +
serve their artifacts). Does not advance past the U0 gate; pure documentation of settled design.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 06:29:12 +00:00
edf34e3e53 claim(2b): deploy budget confirmed minimal+enforced (1+N_cold_deps); B1-B4 claimed
Phase 2b confirm-and-document outcome: per-recipe test-sequence deploy budget is
already minimal — `deploys == 1 (base, shared by all 5 tiers) + N_cold_deps` — and
tighter than plan B1's nominal `1+1(upgrade)+N` because the upgrade is an in-place
chaos redeploy of the prev-version base, not a separate deploy. Enforced as a hard
failure by DG4.1 (expected = 1 + deps_deployed_count, run_recipe_ci.py:1005-1010).
No redundant deploy found; none removed (none existed).

- docs/perf/deploys.md: the budget record (B4), names the out-of-budget WC5 reseed
- STATUS-2b.md: B1-B4 claim with WHAT/HOW/EXPECTED/WHERE for cold verify
- JOURNAL-2b.md / BACKLOG-2b.md / DECISIONS.md: reasoning + settled note
- consume machine-docs/BUILDER-INBOX.md (Adversary heads-up processed)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 05:35:46 +00:00
18577336f0 docs(2): Q5.1 — enroll-recipe.md §2.4 non-HTTP/multi-service/host-dependent recipes + mumble/mailu examples
Documents the Phase-2 Q4 patterns proven this session: EXTRA_ENV callable, READY_PROBE (HTTP+TCP),
CHAOS_BASE_DEPLOY, recipe_checkout -f, install_steps overlay-drop; non-HTTP protocol tests (mumble
host-ports + _mumble_proto), in-container functional tests (mailu flask/sendmail/doveadm under
TLS_FLAVOR=notls), and P4-N/A when a recipe ships no backupbot label. Worked-example pointers to
tests/mumble + tests/mailu.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 22:33:43 +01:00
de6103d41d claim(2pc): PC1 conservative prune deployed+verified; PC2/PC3 local-store cache confirmed
ci-docker-prune (gated surgical prune) live on cc-ci: old autoPrune --all gone, new timer
enabled (daily), no-ops below 80% disk keeping the local image cache, never --all/--volumes.
Daemon stays PAT-authenticated (nptest2); /var/lib/docker retained across rebuild. PC3 proof:
redis:7-alpine deploy->teardown(service rm, image retained)->redeploy = "Image is up to date",
no layer re-download (cold 5303ms -> warm 674ms). Docs: runbook "Image cache & prune policy",
warm.md, DECISIONS Phase-2pc, IDEAS (registry pull-through cache deferred + revisit trigger).
Gate 2pc CLAIMED, awaiting Adversary cold-verify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:42:36 +01:00
40b03a9bf1 claim(2w): WC8 + WC9 (FINAL gates) — resource-safety consolidation + stale-warm prune + docs/warm.md + --quick rollback proof
WC8: canonical.prune_stale (drop de-enrolled warm data + volumes) wired into the
nightly sweep + df log; consolidated evidence (DRONE_RUNNER_CAPACITY=MAX_TESTS
serialize; autoPrune drops --volumes so warm vols survive; cold teardown sacred;
warm excluded from D8 — no nix source ref). +1 unit (72 pass). WC9: docs/warm.md
documents the full warm/quick model; --quick rollback proof already proven live
(W2 FAIL restores exact known-good; WC4 PASS byte-identical snapshot). On PASS,
all WC1-WC9 (incl WC1.1/WC1.2) verified → DONE.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 04:43:34 +01:00
41ede13042 feat(2): refactor — SSO-dep plan refinement (deps AFTER generic + setup_custom_tests + failure isolation)
Per operator-2026-05-28 SSO-dep plan (plan-sso-dep-testing.md). Substantial orchestrator
restructuring:

NEW LIFECYCLE ORDER:
  1. Recipe deploy ALONE (no deps).
  2. install / upgrade / backup / restore — recipe-only generic tiers.
  3. setup_custom_tests step (NEW):
     a. Deploy each declared dep + provision realm/client/test-user via harness.sso.
     b. Write $CCCI_DEPS_FILE in dict shape {dep_recipe: {domain, realm, client_id, client_secret,
        admin_user, admin_password, discovery_url, token_url, ...}}.
     c. Run tests/<recipe>/setup_custom_tests.sh hook (jq-readable; wires OIDC env via abra
        secret insert + .env edits + in-place 'abra app deploy --force --chaos').
  4. CUSTOM tier with deps-ready flag; @pytest.mark.requires_deps tests skip with
     'deps-not-ready: <reason>' when setup_custom_tests fails. NON-deps custom tests still run
     normally — FAILURE ISOLATION (a DoD item per plan).
  5. Teardown: recipe first, deps in reverse declaration order.

Harness changes:
- runner/run_recipe_ci.py: deps deploy moves from BEFORE recipe deploy to AFTER restore tier.
  Adds _enrich_deps_with_sso() + _run_setup_custom_tests_hook(). DG4.1 generalised to
  'one abra app new per app' (recipe + each dep); in-place redeploys (\--force) don't count.
- runner/harness/deps.py: write_run_state + load_run_state accept dict OR list shape;
  deps_as_dict() coerces either to a recipe→entry map.
- runner/harness/sso.py: admin_password_inside() public re-export.
- tests/conftest.py: deps_creds fixture (full creds dict); deps_apps fixture flattens to
  recipe→domain string. pytest_collection_modifyitems hook skips
  \@pytest.mark.requires_deps tests when CCCI_DEPS_READY=0.
  pytest_configure registers the marker.

Recipe content:
- tests/lasuite-docs/setup_custom_tests.sh: NEW hook reads $CCCI_DEPS_FILE via jq;
  inserts oidc_rpcs secret at BUMPED version (v1→v2) since abra app new -S generates v1 first
  and Swarm forbids overwriting; updates SECRET_OIDC_RPCS_VERSION in .env; writes 9 OIDC env
  vars (REALM/DISCOVERY/AUTH/TOKEN/USERINFO/LOGOUT/JWKS/CLIENT_ID/SCOPES); ensures trailing
  newline on .env so writes don't concatenate (caught a 'TIMEOUT=900OIDC_REALM=...' bug);
  triggers in-place 'abra app deploy --force --chaos --no-input'.
- tests/lasuite-docs/functional/test_oidc_with_keycloak.py: refactored to consume deps_creds
  fixture (no longer calls setup_keycloak_realm itself — the orchestrator does it in
  setup_custom_tests). Marked \@pytest.mark.requires_deps.

Cold-verifiable on cc-ci (log /root/ccci-refactor-lasuite-r5.log):
  RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py
  install: PASS, custom: 3 PASS incl. test_oidc_password_grant_against_dep_keycloak.
  deploy-count = 2 (expect 2) — DG4.1 generalised holds.
  Smoke regression: RECIPE=custom-html STAGES=install,custom → 5 PASS, deploy-count=1.

Closes DEFERRED.md #5 (lasuite-docs OIDC parity ports via this plan).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 19:11:42 +01:00
b2151af532 docs(2): Q5.1 partial — enroll-recipe.md Phase-2 contract
Adds:
- §2 layout: PARITY.md / functional/ / playwright/ subdirs (Phase 2 §4.1)
- §2.1 Phase-2 contract: parity port + ≥2 specific functional tests + Playwright;
  custom-tier discovery from functional/ + playwright/; SOURCE comment audit
- §2.2 DEPS = [...] declaration; orchestrator dep deploy order; deps_apps fixture;
  expected deploy-count = 1 + len(DEPS); F2-5 verify=True teardown
- §2.3 harness.sso primitives (setup_keycloak_realm, oidc_password_grant,
  assert_discovery_endpoint); F2-7 note that setup is keycloak-specific
- Worked example: lasuite-docs full Phase-2 layout (DEPS + functional/ + lifecycle overlays)
  and the !testme flow walked through end-to-end
- Updated 'Run locally' to include restore + custom stages

A new engineer can add a recipe's full Phase-2 suite from the docs alone (P8).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 09:50:13 +01:00
74725610ab fix(1e): HC1 upgrade/restore tier calls now pass head_ref (multi-line edit miss)
Earlier perl substitution missed the multi-line upgrade and restore run_lifecycle_tier calls (still
passed `target` = VERSION env, None for !testme runs), so perform_upgrade got head_ref=None for
upgrade tier → re-checkout skipped → chaos redeploy of leftover prev checkout (vacuous prev→prev that
'passed' via the chaos-label move fallback).

Verified e2e on hedgedoc (install,upgrade; commit pending push):
  upgrade→PR-head: head_ref=09bf4d54 chaos-version=09bf4d54 version=3.0.9+1.10.7→3.0.10+1.10.8
deploy-count=1, install/upgrade=pass, clean teardown. The chaos-version label deterministically
matches head_ref — direct proof PR-head code was deployed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 04:04:13 +01:00
b756e72cc2 docs(1d): DG8 — docs/testing.md (generic suite + overlay convention + install-steps hook); update enroll-recipe.md to the deploy-once contract; README pointer
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 01:35:16 +01:00
992d87cfcd refactor(1b): RL6 — move Builder protocol files into machine-docs/ (README stays root)
git mv STATUS*/BACKLOG*/JOURNAL*/DECISIONS.md -> machine-docs/. README.md kept at root (operator
decision). Updated in-repo refs: README (status line + lint section + Loop-state section) and
docs/install.md -> machine-docs/...

Safe to move now: launch.sh already has resolve_state() (prefers machine-docs/ else root) used by
every STATUS/REVIEW read, and the running watchdog (pid 133191) was restarted AFTER that update, so
it is location-agnostic. scripts/lint.sh -> lint: PASS post-move. Adversary moves its own REVIEW*.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 22:35:30 +01:00
433ec9de30 refactor(1b): RL5 — consolidate Nix code under nix/ (modules->nix/modules, hosts->nix/hosts)
flake.nix/flake.lock STAY at root so the build ref #cc-ci is unchanged; only flake's internal
configuration.nix path updated. Root-relative refs inside moved modules re-based ../X -> ../../X
(secrets/bridge/dashboard); configuration.nix's ../../modules imports unchanged (both dirs under nix/).
Living docs (README, architecture/install/secrets/enroll) + .drone.yml comment updated to nix/...;
append-only history logs left as-is. DECISIONS.md records RL5 + the deferred-coordinated RL6.

Verified on cc-ci: nixos-rebuild build 'path:#cc-ci' -> toplevel 8i3jcad9 (BYTE-IDENTICAL to the
pre-move build — store derivations are content-addressed on file contents, module .nix not in the
runtime closure); scripts/lint.sh -> lint: PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 21:19:09 +01:00
6276bfd3a8 1c/ADV-1c-1: architecture.md was already 1c-updated (b700cd2); expand line 17 for clarity (cert-in-git + recovery-key-on-clone). Pls re-verify HEAD
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 20:24:07 +01:00
b700cd2fda 1c/C7: docs — secrets.md + architecture.md updated to the 1c model (cc-ci-secrets submodule, cert-in-git, bootstrap age key, Drone-token injection, verified D8)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 19:52:03 +01:00
70f108d2fa 1c/W4 DONE: genuine throwaway-VM live rebuild (single switch, 0 failed, byte-identical, TLS leaf==git cert); Gate W4 CLAIMED + install.md updated
All checks were successful
continuous-integration/drone/push Build is passing
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 18:37:02 +01:00
9b58fd0dfb M9/D9: add architecture.md + runbook.md — docs set complete
All checks were successful
continuous-integration/drone/push Build is passing
architecture.md: components, the !testme flow, network/TLS, resource safety, enrollment.
runbook.md: where to look, common failure modes (timeout/rate-limit/auth/skip/health/data), orphan
cleanup, re-trigger, cancel. Completes the D9 doc set (README+install+enroll+secrets+arch+runbook).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 10:34:37 +01:00
fc07d15800 M7/D6: secrets rotation doc + log redaction filter
All checks were successful
continuous-integration/drone/push Build is passing
docs/secrets.md documents the 3 secret classes (A1 external, A2 internal-generated, B recipe-app),
the sops-nix decryption chain, and rotation procedures for each (cert version bump, sops re-encrypt +
swarm-secret version bump, recipe-app ephemeral). run_recipe_ci streams each stage's output through a
redaction filter that masks any /run/secrets/* value (>=8 chars) before it reaches Drone logs —
belt-and-suspenders over 'harness never prints secrets + abra doesn't echo'. Live streaming + exit
code preserved (locally tested). Recipe-ci clones cc-ci fresh per build, so this applies next run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 07:44:53 +01:00
7addb9686c bridge: polling primary + org-membership auth (orchestrator design change)
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is passing
Polling is now the primary, read-only trigger (always-on thread); the /hook
webhook is an optional admin-registered push optimization deduped by comment id.
Authorize commenters via GET /orgs/{owner}/members/{user} (204, read-level) +
optional allowlist, replacing the admin-requiring /collaborators permission
endpoint. Bot never self-registers webhooks. Enroll = POLL_REPOS + tests/<recipe>/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 02:41:25 +01:00
9b33fdf6e6 M6: D4 recipe-local discovery + recipe #2 (keycloak, DB-backed) enrolled; M6 CLAIMED
All checks were successful
continuous-integration/drone/push Build is passing
D4 snapshots recipe-shipped tests/ and runs them against the live app. abra -C -o
everywhere + token clone for private mirror PRs. keycloak install green with no
harness surgery (D5). docs/enroll-recipe.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 01:48:06 +01:00
e251a1177c M2 GATE: green build via push (Drone + exec runner); OAuth bootstrap script + docs
All checks were successful
continuous-integration/drone/push Build is passing
Build #1 success (clone+hello on exec runner). Drone<->Gitea OAuth scripted as
one-time bootstrap-drone-oauth.sh. M2 claimed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 23:08:38 +01:00
a385148af9 M2: Drone server + exec runner up; infra as idempotent-reconcile oneshots
Convert proxy+drone bring-up to writeShellApplication systemd oneshots that
reconcile every activation (orchestrator steer). pkgs.abra overlay. Runner
connected via RPC (polling, capacity=2). install.md = clone + nixos-rebuild switch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 22:59:59 +01:00
12f86fd3fb M1: proxy via real coop-cloud/traefik (abra, wildcard/no-ACME); recipe deploy+teardown; M1 CLAIMED
Orchestrator decision: deploy canonical coop-cloud traefik via abra instead of a
hand-rolled module. abra packaged in Nix (pinned). custom-html deployed over HTTPS
(200) via the gateway and torn down clean. docs/install.md seeded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 22:21:12 +01:00
c21cce51b9 chore: bootstrap cc-ci loop state
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 21:07:31 +01:00