cc-ci-orchestrator

Author	SHA1	Message	Date
autonomic-bot	330378d30d	ideas: fail-fast on crash-looping deploy + don't let one wedged run starve the CI queue After a live incident: plausible build 220 (ClickHouse exit-1 crash-loop) held the single serial runner for its full 1200s DEPLOY_TIMEOUT, starving immich PR-2's queued builds for ~12min until manually torn down. Logs the two fixes (fail-fast on crash-loop; head-of-line blocking on the serial runner) + the interim mitigations (step-2b dev loop for debugging; SIGINT to free a wedged run). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 16:29:30 +00:00
autonomic-bot	77ba7ee075	guardrail: upgrader never modifies cc-ci tests/harness unless --with-tests Absolute, mode-gated rule reinforced in /recipe-upgrade (Guardrails + the new step-2b direct-deploy loop where the upgrader has cc-ci host access) and noted as the interim safeguard in IDEAS.md until the deploy loop moves to isolated infra. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 15:32:50 +00:00
autonomic-bot	98276124e5	ideas: isolate the upgrader's direct deploy onto separate infra (can't tamper with tests) The step-2b direct deploy-and-inspect runs on the cc-ci server's own swarm today, so the upgrader holds write access to the host that owns the tests + CI verdict — a trust hole (could hack the tests). Parked idea: a dedicated throwaway test server with scoped creds, so the upgrader can deploy+inspect but not modify the gate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 15:31:20 +00:00
autonomic-bot	2f9d7df78f	ideas: package cc-ci itself as a Co-op Cloud recipe (parked, not implementing) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 00:43:44 +00:00
autonomic-bot	e85e16318c	Phase 2b narrowed to "confirm minimal deploys"; perf ideas moved to IDEAS Operator (2026-05-30): the real deploy-speed bottleneck was hardware (cc-ci VM was 2 vCPU on a 4-core host + disk-I/O-bound; RAM fine), now fixed directly (bumped to 4 vCPU, made cc-nix-test the only running VM on b1). The 2b software micro-optimizations are judged unlikely to help, so: - IDEAS.md: parked the whole empirical-perf program (instrumentation, baseline, attribution) + the optimization menu (image cache/prepull, readiness tuning, warm-SSO start/stop, runner caching, concurrency sizing, resources, secret overhead) under "Phase-2b empirical performance work", revisit only if measurement later proves a specific software bottleneck. - plan-phase2b: reduced to ONE goal — confirm (and fix if needed) that the per-recipe test sequence already uses the minimum deploys (1 base shared by install+functional+backup/restore, +1 for the upgrade tier, +1 per dep), enforced by the existing DG4.1 deploy-count check, WITHOUT weakening any test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 05:07:49 +01:00
autonomic-bot	294a8a1a9e	rename the opt-in heavy-tests flag: --extra-tests -> --extra (operator 2026-05-29) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 10:36:04 +01:00
autonomic-bot	f7971d949d	2pc: drop the pull-through registry cache — single host makes it marginal; keep PC1 prune-policy only Operator (2026-05-29): on one host Docker's local image store already IS the cache; the churn was over-pruning, not a missing cache. So 2pc = conservative prune policy + confirm local-store retention + daemon auth (PC1-3). Registry pull-through cache deferred to IDEAS with a concrete revisit condition (multi-node, or measured cold-deploy bottleneck on recreate-surviving storage). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 09:24:56 +01:00
autonomic-bot	9f99b134cd	ideas: ALT infra-app model — traefik/keycloak/drone as normal coop-cloud abra deployments, maintainer-updated outside Nix (parked, operator-flagged) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 00:12:15 +01:00
autonomic-bot	36a6c9872a	orchestrator: reboot-resilience + session auto-resume + full session plan/tooling Reboot survival for the Pi orchestrator host: - systemd unit cc-ci-plan/systemd/cc-ci-loops.service (installed + enabled): on boot records the reboot, starts loops+watchdog (RESUME_PHASE=1), and resumes the orchestrator session. - reboot-log.sh: boot_id-gated reboot record -> REBOOTS.md (manual restarts don't count). - launch-orchestrator.sh: injects an AGENTS.md startup nudge so an auto-resumed orchestrator announces itself (PushNotification) + reports reboots. - AGENTS.md: on-startup notify routine documented. Plans/tooling accumulated this session: - plan-phase1d (generic suite), 1e (harness corrections), phase4 (final review), sso-dep-testing, orchestrator-migration (parked), test-e2e-testme-acceptance. - launch.sh: 1d/1e/2/2b/3/4 phase sequence, machine-docs-aware state resolution, limit-stall re-nudge, INBOX side-channel detection. - plan.md §6.1/§7: artifact-layer isolation, INBOX, 5-min long-run polling, DEFERRED. - prompts: isolation discipline + INBOX + pacing. - .gitignore: harden (.sops/, cc-ci-secrets/, .claude/, .tmp.). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-28 20:28:10 +01:00
autonomic-bot	8c4efe3c88	Add cc-ci-plan/IDEAS.md: deferred-ideas backlog; park optional webhook self-registration First item: later, for environments where the CI server has repo-admin, consider an opt-in (off-by-default) feature to auto-register + idempotently reconcile the issue_comment webhook — preserving the read-only/polling default. Parked, out of current scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 02:42:34 +01:00

10 Commits
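
Commit 36a6c9872a describes reboot-log.sh as a "boot_id-gated reboot record -> REBOOTS.md (manual restarts don't count)". The gating idea can be sketched as follows. This is a minimal Python illustration, not the repository's shell script: the function name `record_reboot`, the state-file layout, and the log-line format are all hypothetical. On Linux the current boot ID would typically be read from `/proc/sys/kernel/random/boot_id`; here it is passed in as a plain string so the logic stays testable.

```python
from datetime import datetime, timezone
from pathlib import Path


def record_reboot(current_boot_id: str, state_file: Path, log_file: Path) -> bool:
    """Append a reboot entry only when the boot ID differs from the last one seen.

    Returns True if a new reboot was recorded, False otherwise.
    Assumption: a missing state file is treated as a first boot and is recorded.
    """
    previous = state_file.read_text().strip() if state_file.exists() else ""
    if current_boot_id == previous:
        # Same boot ID: the service was restarted manually, the host did not reboot.
        return False
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with log_file.open("a") as fh:
        # Hypothetical line format; the real REBOOTS.md layout is not shown in the log.
        fh.write(f"- {stamp} boot_id={current_boot_id}\n")
    # Remember this boot so later restarts within the same boot are ignored.
    state_file.write_text(current_boot_id + "\n")
    return True
```

Calling the function twice with the same boot ID records one entry. A second entry appears only after the ID changes, which is how a service restart is told apart from a real reboot.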