STATUS — cc-ci Builder
Phase: M0/M1/M2/M4/M5 PASS; M3 PASS (Adversary-verified); M6 CLAIMED (awaiting Adversary).
Bridge→Drone→harness integration DONE (recipe-ci pipeline). M6.5 underway: keycloak full 3-stage
GREEN through Drone (build #39). Next: enroll recipes 3–6 (remaining D10 categories), M7, M8.
In-flight: M6.5 gate CLAIMED — all 6 D10 recipes full 3-stage green (host + canonical Drone):
custom-html, keycloak(#39), cryptpad(#46), matrix-synapse(#51), lasuite-docs(#57), n8n(#63 in flight).
bluesky-pds (TLS-passthrough) swapped → n8n per DECISIONS (caddy self-ACME vs no-ACME design).
M6.5 PASS + M7/D6 PASS (Adversary). M8/D7 CLAIMED — dashboard overview+badges LIVE +
PR-comment outcome reflection (bridge edits comment to ✅/❌; verified). Remaining for DONE: M9
docs/reproducibility (D8 from-scratch rebuild + D9 docs) and M10/D10 — the six recipes green via
real !testme PRs (currently proven via API-trigger; the Adversary-flagged gap). M10 = enroll
recipes in the bridge POLL_REPOS + open recipe-mirror PRs + !testme each.
Last updated: 2026-05-27 (M6.5 CLAIMED — 6/6 recipes 3-stage green across all D10 categories)
Near-complete (2026-05-27)
Feature-complete except the 6th D10 recipe. Verified/claimed: M0–M6 PASS, M6.5 PASS, M7/D6 PASS,
M8/D7 CLAIMED. M9/D9 docs complete (architecture+runbook added). M10: 5/6 recipes green via real
!testme (custom-html/keycloak/matrix-synapse/n8n/cryptpad). DONE is gated on: (1) operator
Docker Hub registry creds → lasuite-docs 6th green (A1 blocker, notified; retries halted); (2)
Adversary verification of M8/M9 + D8 from-scratch rebuild + the D10 runs. No unblocked Builder
implementation remains — awaiting operator creds + Adversary. On each wake: check .testenv/sops for
creds + rate-limit reset → if available, wire creds (or quota-retry) + run lasuite; else idle.
Gate: M6.5 — CLAIMED, awaiting Adversary (2026-05-27)
All 6 D10 recipes have a full install/upgrade/backup green run, each verified on host AND via the
canonical Drone recipe-ci pipeline (build #s above), each with clean teardown (0 orphans). Categories:
custom-html=simple, keycloak=SSO/identity+DB, cryptpad=stateful/no-DB, matrix-synapse=DB+media/
large-volume, lasuite-docs=multi-service+S3/MinIO/object-storage, n8n=workflow automation. D5 held:
each recipe enrolled via tests/<recipe>/ + recipe_meta.py (EXTRA_ENV for cryptpad SANDBOX_DOMAIN
/ lasuite TIMEOUT) only — no shared runner/harness changes per recipe. Repro: trigger a custom
Drone build with RECIPE= (or cc-ci-run runner/run_recipe_ci.py with RECIPE/STAGES on host).
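The per-recipe enrollment described above can be pictured with a small sketch. It assumes recipe_meta.py exposes a plain EXTRA_ENV dict that the shared runner overlays onto its base deploy environment. The helper name build_deploy_env, the base variables, and the placeholder values are hypothetical and not taken from the repo. The forced empty LETS_ENCRYPT_ENV mirrors the enforcement noted under Tracking (A1).

```python
"""Sketch: overlay a recipe's EXTRA_ENV onto a shared base environment."""

# Hypothetical stand-in for what tests/cryptpad/recipe_meta.py might export.
# The value is a placeholder, not the real configuration.
CRYPTPAD_EXTRA_ENV = {"SANDBOX_DOMAIN": "sandbox.example.invalid"}


def build_deploy_env(base_env, extra_env=None):
    """Return a new env dict: shared defaults, then recipe overrides."""
    merged = dict(base_env)          # copy so the shared defaults stay untouched
    merged.update(extra_env or {})   # recipe-specific keys win over defaults
    merged["LETS_ENCRYPT_ENV"] = ""  # always forced empty for test apps (no ACME)
    return merged


if __name__ == "__main__":
    base = {"DOMAIN": "app.example.invalid"}
    print(build_deploy_env(base, CRYPTPAD_EXTRA_ENV))
```

Keeping the overlay in one shared function is what lets a new recipe be enrolled by adding data only, with no per-recipe change to the runner.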
Gates
- Gate: M0 — CLAIMED, awaiting Adversary (2026-05-26). Evidence: flake rebuilds cc-ci from repo
(switch --flake /root/cc-ci#cc-ci, gen healthy, no failed units); sops-nix decrypts
/run/secrets/test_secret (0400 root, value = generated cc-ci-m0-…). Repro: clone repo, sync to
host, nixos-rebuild switch --flake .#cc-ci, then systemctl is-system-running + check the secret.
Per §6.1 I will NOT advance past this gate to M2; M1 work proceeds as independent unblocked work.
→ M0 PASS logged by Adversary in REVIEW.md @2026-05-26T21:35Z (cold verify, leak probe clean).
- Gate: M1 — CLAIMED, awaiting Adversary (2026-05-26). Evidence: Docker single-node swarm +
proxy overlay; real coop-cloud/traefik via abra (wildcard/file-provider, no ACME); custom-html
deployed by hand → HTTP 200 over HTTPS via gateway at cchtml1.ci.commoninternet.net with the
wildcard cert; torn down clean (services/volumes/secrets/containers all 0). Repro:
scripts/deploy-proxy.sh + abra app new/deploy/undeploy. Starting M2 as independent work; will not
flip M2's gate until M1 shows PASS. → M1 PASS @2026-05-26T22:20Z.
- Gate: M2 — CLAIMED, awaiting Adversary (2026-05-26). Evidence: Drone server (coop-cloud recipe,
reconcile oneshot, Gitea SSO) healthz 200 via gateway; exec runner polling (capacity=2). cc-ci repo
activated (push webhook). Pushing .drone.yml triggered build #1 → success (clone + hello exec
steps, exit 0; ran abra/docker on the host). Repro: nixos-rebuild switch + one-time
scripts/bootstrap-drone-oauth.sh. Starting M3 as independent work; won't flip M3 gate until M2 PASS.
- Gate: M3 — CLAIMED, awaiting Adversary (2026-05-27). Trigger redesigned per orchestrator
(plan §4.1): polling is PRIMARY (outbound, read-only, ≤30s), webhook optional/admin-registered;
commenter auth via org membership (GET /orgs/{owner}/members/{user} → 204, read-level) + optional
allowlist — NOT the admin-requiring /collaborators/{user}/permission. Evidence: posted !testme on
PR #1 (by bot, an org member) → poller fired in 6s → Drone build #26 for head d397720a → bridge
posted the run-link comment back. Auth endpoint verified read-level: bot/trav/notplants → 204,
non-member → 404. The old webhook-delivery blocker is moot (polling doesn't need the Gitea
ALLOWED_HOST_LIST whitelist). Won't advance past this gate until REVIEW shows PASS; doing the
bridge→Drone integration as independent work meanwhile.
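The M3 commenter-auth rule can be sketched as below. This is a minimal illustration, not the bridge's code: the function names are hypothetical, the HTTP call is injected as a callable so the logic can be exercised without a Gitea server, and it assumes both that the trigger must be the whole comment and that the optional allowlist further restricts org members (the source does not say whether the allowlist narrows or widens access).

```python
"""Sketch: gate a `!testme` comment on org membership (204 = member)."""

TRIGGER = "!testme"
MEMBER_STATUS = 204  # status the gate evidence reports for org members


def is_trigger(comment_body):
    """Assumption: the comment must be exactly the trigger, ignoring whitespace."""
    return comment_body.strip() == TRIGGER


def is_authorized(user, owner, membership_status, allowlist=None):
    """membership_status(owner, user) returns the HTTP status of
    GET /orgs/{owner}/members/{user}."""
    if allowlist is not None and user not in allowlist:
        return False  # assumption: allowlist narrows, never widens
    return membership_status(owner, user) == MEMBER_STATUS


def should_run(comment_body, user, owner, membership_status, allowlist=None):
    """Combine trigger detection and authorization into one decision."""
    return is_trigger(comment_body) and is_authorized(
        user, owner, membership_status, allowlist
    )


if __name__ == "__main__":
    members = {"bot", "trav", "notplants"}

    def fake_status(owner, user):
        # Stand-in for the real HTTP request.
        return 204 if user in members else 404

    print(should_run("!testme", "bot", "some-org", fake_status))
    print(should_run("!testme", "stranger", "some-org", fake_status))
```

Injecting the status lookup keeps the decision logic independent of the transport, which suits a poller that already holds a read-level API client.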
Resource safety (plan §4.2/§4.3 — orchestrator change 2026-05-27)
- MAX_TESTS = DRONE_RUNNER_CAPACITY = 1 (modules/drone-runner.nix): ≤1 build at once, Drone
auto-queues the rest natively. Verified DRONE_RUNNER_CAPACITY=1 on the runner.
- Per-build timeout = 60m (modules/drone.nix, reconciled best-effort, non-fatal): a hung build is
cancelled → frees its slot. Verified Drone repo timeout: 60.
- Janitor backstop for SIGKILL'd builds (reaps orphaned run apps at run-start). At capacity=1
the recipe-CI pipeline will set CCCI_JANITOR_MAX_AGE=0 (safe — no concurrent runs). See DECISIONS.
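The janitor's selection rule can be sketched as follows, using the run-app name pattern recorded under Tracking (A2). The function name, the (domain, start time) input shape, and treating CCCI_JANITOR_MAX_AGE as seconds are assumptions made for illustration; the real janitor may gather and represent this data differently.

```python
"""Sketch: pick orphaned run apps by name pattern and age."""
import os
import re

# Hashed run-app naming scheme quoted in the A2 tracking entry.
RUN_APP_RE = re.compile(r"^[a-z0-9]{1,4}-[0-9a-f]{6}\.ci\.commoninternet\.net$")


def orphans_to_reap(apps, now, max_age_s):
    """apps: iterable of (domain, started_at_epoch). Return domains to reap.

    Only names matching the run-app scheme are candidates, so long-lived
    infrastructure apps are never selected.
    """
    return [
        domain
        for domain, started_at in apps
        if RUN_APP_RE.match(domain) and now - started_at >= max_age_s
    ]


if __name__ == "__main__":
    # Assumption: the variable holds an age threshold in seconds.
    max_age_s = int(os.environ.get("CCCI_JANITOR_MAX_AGE", "0"))
    apps = [
        ("kc-a1b2c3.ci.commoninternet.net", 1000.0),
        ("cchtml1.ci.commoninternet.net", 0.0),
    ]
    print(orphans_to_reap(apps, now=1000.0, max_age_s=max_age_s))
```

With a threshold of 0 every matching app is reaped at run-start, which is only safe because the capacity cap guarantees no other run is live.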
Blocked
- Docker Hub anonymous pull rate limit — registry pull creds needed (A1, operator). During the
D10 real-!testme breadth runs, lasuite-docs (heaviest: 9 images) hit "toomanyrequests:
unauthenticated pull rate limit" on its upgrade stage (redis:8.2.6 task Rejected "No such image" →
couldn't pull). Confirmed: docker pull redis:8.2.6 on the node → rate-limited. This is the plan's
flagged A1 input (§1.5/§4.4: "registry pull creds … rate-limit failure traced to this is a
finding, then request creds"). Operator action: provide Docker Hub pull creds (store
sops-encrypted in secrets/, wire into the docker daemon / swarm). NOT globally blocking: 5/6
recipes already green via real !testme (custom-html/keycloak/matrix-synapse/n8n/cryptpad);
lasuite-docs install+backup green too — only its upgrade (most pulls) is gated. Contributing
factor: my mid-breadth docker image prune -af evicted cached images → forced re-pulls → tipped the
limit (see DECISIONS). The anonymous limit resets in ~hours, so a retry may also pass without
creds, but creds are the durable fix. Working M9 (docs) meanwhile.
- (M3 webhook blocker previously here — cleared by the polling-primary redesign; polling is
read-only/outbound and needs no Gitea ALLOWED_HOST_LIST whitelist.)
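The diagnosis path in the rate-limit entry (a task rejected with "No such image" that turned out to be a pull rate limit) suggests a small log classifier. The sketch below is hypothetical: the function name and return labels are invented for illustration, and the matched strings are only the two messages quoted above.

```python
"""Sketch: classify an image-pull failure from log text."""
import re

# Matches the Docker Hub message quoted in the blocker entry.
RATE_LIMIT_RE = re.compile(r"toomanyrequests|pull rate limit", re.IGNORECASE)


def classify_pull_failure(log_text):
    """Return a coarse label for a failed pull.

    "No such image" alone is ambiguous: in the incident above it was a
    symptom of rate limiting, so it maps to a label asking for a direct pull.
    """
    if RATE_LIMIT_RE.search(log_text):
        return "rate-limited"
    if "No such image" in log_text:
        return "needs-pull-check"
    return "other"


if __name__ == "__main__":
    print(classify_pull_failure("toomanyrequests: unauthenticated pull rate limit"))
    print(classify_pull_failure('task Rejected "No such image: redis:8.2.6"'))
```

A "needs-pull-check" result would be followed by a manual pull of the named image on the node, as was done to confirm the finding.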
Tracking (adversary findings I must address)
- [adversary] A4 — concurrent same-recipe runs collide on shared ~/.abra/recipes/<recipe>. Root
cause the finding names ("no Drone concurrency cap — runner capacity=2") is now eliminated:
MAX_TESTS = DRONE_RUNNER_CAPACITY = 1 (resource-safety change). With ≤1 build at a time there is
no concurrent run on this single node, so the shared-recipe-dir race cannot occur. Builder side
addressed via the concurrency cap (per plan §4.2 "concurrency cap 1–2"); Adversary to
re-test/close. (Per-run ABRA_DIR/HOME isolation would be belt-and-suspenders but is unnecessary at
capacity=1.)
- [adversary] A2 — janitor -pr filter dead. Already fixed in code: lifecycle.RUN_APP_RE =
^[a-z0-9]{1,4}-[0-9a-f]{6}\.ci\.commoninternet\.net$ (the hashed scheme), plus a stack-name regex
for .env-less orphans, gated on age. Awaiting Adversary kill-probe re-test.
- [adversary] A3 — teardown unverified; .env removed before confirmed undeploy. Already fixed:
lifecycle.teardown_app undeploys → docker stack rm fallback if services remain → removes
volumes/secrets while .env exists → drops .env LAST → then _residual() check raises TeardownError
if anything is left. Awaiting Adversary kill-mid-run re-test.
- [adversary] A1 — no-ACME hazard for test apps. Acknowledged (valid). The harness (M4) MUST
force LETS_ENCRYPT_ENV="" on every test-app deploy (already done in scripts/deploy-proxy.sh and
the M1 manual custom-html deploy; scripts/deploy-drone.sh will too). Considering a structural
belt-and-suspenders (drop the unused certificatesResolvers from cc-ci's traefik) — deferred, needs
a recipe-config override. Will make the harness enforcement the primary fix; Adversary re-tests +
closes after M4. → Now enforced: harness.lifecycle.deploy_app sets LETS_ENCRYPT_ENV="" on every
test-app deploy (verified in the M4 custom-html run). Adversary can re-test + close A1.
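The teardown ordering described for A3 can be sketched as below. Only the names teardown_app and TeardownError come from the entry above; the backend interface and FakeBackend are hypothetical stand-ins that record call order so the sequence can be checked without Docker or abra.

```python
"""Sketch: teardown ordering with a final residual check."""


class TeardownError(RuntimeError):
    """Raised when resources are still present after teardown."""


def teardown_app(backend, app):
    """Tear down `app`, removing its .env only after everything else."""
    backend.undeploy(app)
    if backend.services(app):      # undeploy did not clear the stack
        backend.stack_rm(app)      # fallback, e.g. `docker stack rm`
    backend.remove_volumes(app)    # these steps still need the .env
    backend.remove_secrets(app)
    backend.remove_env(app)        # dropped last
    leftover = backend.residual(app)
    if leftover:
        raise TeardownError(f"{app}: residual resources: {leftover}")


class FakeBackend:
    """In-memory stand-in that records the order of calls."""

    def __init__(self, stuck_services=False, leftover=()):
        self.calls = []
        self.stuck_services = stuck_services
        self.leftover = list(leftover)

    def undeploy(self, app):
        self.calls.append("undeploy")

    def services(self, app):
        return ["web"] if self.stuck_services else []

    def stack_rm(self, app):
        self.calls.append("stack_rm")
        self.stuck_services = False

    def remove_volumes(self, app):
        self.calls.append("remove_volumes")

    def remove_secrets(self, app):
        self.calls.append("remove_secrets")

    def remove_env(self, app):
        self.calls.append("remove_env")

    def residual(self, app):
        return self.leftover


if __name__ == "__main__":
    backend = FakeBackend(stuck_services=True)
    teardown_app(backend, "kc-a1b2c3.ci.commoninternet.net")
    print(backend.calls)
```

Removing the .env last matters because the volume and secret cleanup needs the app's configuration, and the residual check turns a silent leak into a hard failure.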
Notes
- Disk RESOLVED: operator grew the VM 8.9→28 GiB (22 GiB free) on 2026-05-26. Inodes 1.78M total / 1.21M free (was ~6k free — old 8.9 GiB fs had only 586k inodes, which the flake's nixpkgs fetch exhausted). Both byte + inode pressure gone.
- M0 base config: flake at repo root pins nixpkgs to the exact rev cc-ci ran (50ab793) → first
rebuild is no-op-then-base. Deployed via nixos-rebuild switch --flake /root/cc-ci#cc-ci, run as a
detached transient systemd unit (survives ssh-over-tailscale drops). Gen 3 current, healthy.
- Open warning: incus module enables systemd.network while we set networking.useDHCP=true
(scripted dhcpcd) — Nix warns both may manage interfaces. Inherited from baseline, networking is
up; clean up later (pick networkd OR scripting). Tracked, non-blocking.
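The disk note above separates byte pressure from inode pressure, and the earlier failure came from inodes running out. A minimal sketch of a check covering both follows. The function names and thresholds are hypothetical, and nothing in the source says the repo runs such a check.

```python
"""Sketch: report byte and inode headroom for a filesystem."""
import os


def fs_headroom(path="/"):
    """Read free/total bytes and inodes for the filesystem holding `path`."""
    st = os.statvfs(path)
    return {
        "bytes_free": st.f_bavail * st.f_frsize,
        "bytes_total": st.f_blocks * st.f_frsize,
        "inodes_free": st.f_favail,
        "inodes_total": st.f_files,
    }


def under_pressure(headroom, min_bytes_free, min_inodes_free):
    """True if either free bytes or free inodes fall below the thresholds."""
    return (
        headroom["bytes_free"] < min_bytes_free
        or headroom["inodes_free"] < min_inodes_free
    )


if __name__ == "__main__":
    h = fs_headroom("/")
    # Thresholds are illustrative placeholders.
    print(h, under_pressure(h, min_bytes_free=2 * 2**30, min_inodes_free=50_000))
```

Checking inodes alongside bytes would have flagged the old filesystem, which still had free space when its inodes were nearly exhausted.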