cc-ci-orchestrator/cc-ci-plan/README.md
autonomic-bot 36a6c9872a orchestrator: reboot-resilience + session auto-resume + full session plan/tooling
Reboot survival for the Pi orchestrator host:
- systemd unit cc-ci-plan/systemd/cc-ci-loops.service (installed + enabled): on boot
  records the reboot, starts loops+watchdog (RESUME_PHASE=1), and resumes the
  orchestrator session.
- reboot-log.sh: boot_id-gated reboot record -> REBOOTS.md (manual restarts don't count).
- launch-orchestrator.sh: injects an AGENTS.md startup nudge so an auto-resumed
  orchestrator announces itself (PushNotification) + reports reboots.
- AGENTS.md: on-startup notify routine documented.

Plans/tooling accumulated this session:
- plan-phase1d (generic suite), 1e (harness corrections), phase4 (final review),
  sso-dep-testing, orchestrator-migration (parked), test-e2e-testme-acceptance.
- launch.sh: 1d/1e/2/2b/3/4 phase sequence, machine-docs-aware state resolution,
  limit-stall re-nudge, INBOX side-channel detection.
- plan.md §6.1/§7: artifact-layer isolation, INBOX, 5-min long-run polling, DEFERRED.
- prompts: isolation discipline + INBOX + pacing.
- .gitignore: harden (.sops/, cc-ci-secrets/, .claude/, *.tmp.*).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 20:28:10 +01:00


# cc-ci-plan
Self-contained handoff package for building the **cc-ci** Co-op Cloud recipe CI server with two
autonomous Claude loops (a Builder and an adversarial Reviewer) running over days.
## Start here
1. Read **`plan.md`** — the full plan and single source of truth (mission, Definition of Done,
architecture, milestones, the two-agent coordination protocol, loop discipline).
2. Read **`kickoff.md`** — how to launch and supervise the loops.
3. Run **`./launch.sh start`** to bring up both loops + the watchdog.
## Files
| File | Purpose |
|---|---|
| `plan.md` | The Phase-1 plan (build the CI server). Agents treat it as their single source of truth. |
| `plan-phase1c-full-reproducibility.md` | **Phase 1c** (runs first): make the VM fully reproducible from git (all secrets incl. the wildcard cert in sops, in a separate private `cc-ci-secrets` repo as a flake input; base stays well-parameterized) and do the **genuine throwaway-VM live rebuild** to close D8 honestly (the "infeasible by design" was overstated). |
| `plan-phase1b-review-lint.md` | **Phase 1b** (after 1c): deterministic linting/formatting in CI + a white-box review checklist (real tests, DRY harness, idempotent Nix, no footguns/secrets), ending in a full cold re-verification of all D1-D10, now covering 1c's refactor. |
| `plan-phase1d-generic-test-suite.md` | **Phase 1d** (after 1b, before 2): a **generic install/upgrade/backup/restore** suite that runs on *any* recipe with zero config, with a recipe's own `test_<op>.py` **overriding or extending** the generic (Builder's call) and **reusing the generic's deployment — no redeploy**, plus optional custom install-steps; recipes needing special setup fail the generic form gracefully. The test-architecture foundation Phase 2 builds on. |
| `plan-phase1e-harness-corrections.md` | **Phase 1e** (after 1d, before 2): three operator-review corrections to the shared generic harness — (HC1) upgrade goes previous-release → **PR head** via `deploy --chaos`; (HC2) **repo-local PR code runs only for approved recipes** (default = cc-ci overlays + generic only); (HC3) the **generic runs by default** alongside an overlay, skipped only via explicit opt-out. |
| `plan-phase2-recipe-tests.md` | **Phase 2** (after Phase 1e): build on the corrected generic suite — author the recipe overlays (port recipe-maintainer tests as `test_*.py`) + define custom install steps where a recipe fails generically. |
| `plan-phase2b-test-performance.md` | **Phase 2b** (after Phase 2, before Phase 3): empirically measure where test time goes and reduce it (image cache, readiness tuning, dedup deploys, warm infra, concurrency) — no weakened tests. |
| `plan-phase3-results-ux.md` | **Phase 3** (after Phase 2b): beautiful YunoHost-style results — per-run **level**, image-forward PR comment (badge + summary card + app screenshot), polished dashboard. |
| `IDEAS.md` | Deferred/future ideas, parked out of current scope. |
| `brief.md` | The original one-page brief (context only; `plan.md` supersedes it). |
| `kickoff.md` | Launch & supervision guide. |
| `launch.sh` | Starts both loops + a watchdog; restarts dead loops; stops on `## DONE`. |
| `prompts/builder.md` | Builder loop prompt (fed to `claude` by the script). |
| `prompts/adversary.md` | Adversary loop prompt. |
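
The Phase 1d and 1e rows describe how a recipe's own `test_<op>.py` combines with the generic suite. The sketch below is one possible way to model that selection, not the harness's actual code: `plan_tests`, `OPS`, `generic_opt_out`, and the `generic:`/`overlay:` labels are all hypothetical names chosen for illustration. It covers only the HC3 rule (the generic runs by default and is skipped only on explicit opt-out) and does not model HC2 approval.

```python
from pathlib import Path
import tempfile

# The four operations the generic suite covers (per the Phase 1d row).
OPS = ("install", "upgrade", "backup", "restore")


def plan_tests(recipe_dir: Path, generic_opt_out: frozenset = frozenset()) -> dict:
    """Return, per operation, the ordered list of test sources to run.

    Hypothetical model: the generic test runs unless the operation is listed
    in `generic_opt_out`; a recipe overlay `test_<op>.py` is added if present.
    """
    plan = {}
    for op in OPS:
        sources = []
        if op not in generic_opt_out:
            sources.append(f"generic:{op}")
        overlay = recipe_dir / f"test_{op}.py"
        if overlay.is_file():
            sources.append(f"overlay:{overlay.name}")
        plan[op] = sources
    return plan


def demo() -> dict:
    """Build a throwaway recipe dir with one overlay and plan its tests."""
    with tempfile.TemporaryDirectory() as tmp:
        recipe = Path(tmp)
        (recipe / "test_upgrade.py").write_text("")
        return plan_tests(recipe, generic_opt_out=frozenset({"backup"}))
```

In this model a recipe with no overlays and no opt-outs gets the generic test for every operation, which matches the "zero config" goal stated for Phase 1d.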
## Before launching
- Set the org in `plan.md` (`git.autonomic.zone/recipe-maintainers/cc-ci`) and lock the six proof recipes (§8).
- Ensure the launching shell has: SSH+sudo to `cc-ci`, the Gitea token, `git.autonomic.zone` access.
- Preconfigure test-app DNS + TLS (plan §4.0): point a wildcard `*.ci.commoninternet.net` record at a gateway that TLS-passthroughs to cc-ci, and **pre-issue the wildcard cert** (`*.ci.commoninternet.net` + `ci.commoninternet.net`, via Gandi DNS-01) into `/var/lib/ci-certs/live/` on cc-ci. The agent handles everything else on cc-ci (Traefik file provider → that cert, swarm, routing) and does **no ACME**; renewal (~90 days) is an out-of-band operator task, so the DNS token never goes to the agent.
- `export CC_CI_REPO=https://git.autonomic.zone/recipe-maintainers/cc-ci.git` so the watchdog can detect `## DONE`.
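
The last bullet ties `CC_CI_REPO` to the watchdog's `## DONE` detection. A minimal sketch of those two checks follows, assuming the marker is a Markdown heading on a line of its own. Which file in the repo carries the marker, and how `launch.sh` actually fetches it, are not stated here, so the helpers take plain text and an environment mapping; `has_done_marker` and `repo_url_from_env` are hypothetical names.

```python
import os
import re

# Matches a heading line that is exactly "## DONE" (trailing blanks allowed).
DONE_HEADING = re.compile(r"^## DONE[ \t]*$", re.MULTILINE)


def has_done_marker(text: str) -> bool:
    """True if `text` contains a `## DONE` heading on a line of its own."""
    return DONE_HEADING.search(text) is not None


def repo_url_from_env(env=os.environ) -> str:
    """Return CC_CI_REPO, failing loudly if the launching shell did not export it."""
    url = env.get("CC_CI_REPO", "").strip()
    if not url:
        raise RuntimeError("CC_CI_REPO is not set; export it before launching")
    return url
```

Anchoring the pattern to a whole line avoids a false stop when `## DONE` merely appears inside a sentence or a longer heading.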
## What "done" means
The loops stop only when all of `plan.md` §2 (D1-D10) hold **and** the Adversary has independently
re-verified each within 24h. The watchdog then tears the loops down automatically.
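
The stop condition can be read as a predicate over the ten criteria. The sketch below is an illustration under stated assumptions, not the watchdog's implementation: `CriterionStatus`, `is_done`, and the status mapping are hypothetical, and "within 24h" is interpreted as "the Adversary's latest verification is no more than 24 hours before the moment of the check".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

REVERIFY_WINDOW = timedelta(hours=24)
CRITERIA = tuple(f"D{i}" for i in range(1, 11))  # "D1" .. "D10"


@dataclass
class CriterionStatus:
    holds: bool                                 # criterion currently satisfied
    adversary_verified_at: Optional[datetime]   # last independent re-verification


def is_done(status: dict, now: datetime) -> bool:
    """True only if every criterion holds and was re-verified within the window."""
    for name in CRITERIA:
        entry = status.get(name)
        if entry is None or not entry.holds:
            return False
        verified = entry.adversary_verified_at
        if verified is None or now - verified > REVERIFY_WINDOW:
            return False
    return True


def example_status(now: datetime) -> dict:
    """All ten criteria holding, each verified one hour before `now`."""
    fresh = now - timedelta(hours=1)
    return {name: CriterionStatus(True, fresh) for name in CRITERIA}
```

A single stale or missing verification is enough to keep the loops running, which reflects the "hold **and** re-verified" wording above.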