cc-ci/STATUS.md
autonomic-bot f16708155c
STATUS: M3 webhook being whitelisted operator-side; keep webhook, polling reverted
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 02:02:57 +01:00


STATUS — cc-ci Builder

Phase: M6 complete & CLAIMED. M0/M1/M2/M4/M5 PASS. M3 gate BLOCKED (Gitea webhook; operator).
Next: M6.5 (breadth ramp: recipes 36 + keycloak full 3-stage), M7, M8. Resolve M3 trigger before M10.
In-flight: M6.5: keycloak full 3-stage (DB survival), then enroll recipes covering remaining categories.
Last updated: 2026-05-27 (M6 claimed; D4 + recipe #2)

Gates

  • Gate: M0 — CLAIMED, awaiting Adversary (2026-05-26). Evidence: flake rebuilds cc-ci from repo (switch --flake /root/cc-ci#cc-ci, gen healthy, no failed units); sops-nix decrypts /run/secrets/test_secret (0400 root, value = generated cc-ci-m0-…). Repro: clone repo, sync to host, nixos-rebuild switch --flake .#cc-ci, then systemctl is-system-running + check the secret. Per §6.1 I will NOT advance past this gate to M2; M1 work proceeds as independent unblocked work. → M0 PASS logged by Adversary in REVIEW.md @2026-05-26T21:35Z (cold verify, leak probe clean).
  • Gate: M1 — CLAIMED, awaiting Adversary (2026-05-26). Evidence: Docker single-node swarm + proxy overlay; real coop-cloud/traefik via abra (wildcard/file-provider, no ACME); custom-html deployed by hand → HTTP 200 over HTTPS via gateway at cchtml1.ci.commoninternet.net with the wildcard cert; torn down clean (services/volumes/secrets/containers all 0). Repro: scripts/deploy-proxy.sh + abra app new/deploy/undeploy. Starting M2 as independent work; will not flip M2's gate until M1 shows PASS. → M1 PASS @2026-05-26T22:20Z.
  • Gate: M2 — CLAIMED, awaiting Adversary (2026-05-26). Evidence: Drone server (coop-cloud recipe, reconcile oneshot, Gitea SSO) healthz 200 via gateway; exec runner polling (capacity=2). cc-ci repo activated (push webhook). Pushing .drone.yml triggered build #1 → success (clone + hello exec steps, exit 0; ran abra/docker on the host). Repro: nixos-rebuild switch + one-time scripts/bootstrap-drone-oauth.sh. Starting M3 as independent work; won't flip M3 gate until M2 PASS.
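
The "check the secret" step in the M0 repro can be sketched as below. This is a minimal illustration, not part of the repo: the function names are hypothetical, and the expected values (mode 0400, owner root) are taken from the M0 evidence above.

```python
import os
import stat


def secret_file_report(path: str) -> dict:
    """Return the permission bits and owner uid of a secret file."""
    st = os.stat(path)
    return {
        "mode": stat.S_IMODE(st.st_mode),  # permission bits only, e.g. 0o400
        "uid": st.st_uid,                  # 0 means root
    }


def is_locked_down(path: str, expected_mode: int = 0o400, expected_uid: int = 0) -> bool:
    """True when the file has exactly the expected mode and owner."""
    report = secret_file_report(path)
    return report["mode"] == expected_mode and report["uid"] == expected_uid
```

On the host, `is_locked_down("/run/secrets/test_secret")` would be the check that accompanies `systemctl is-system-running`; it only inspects metadata and never reads the secret value.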

Blocked

  • M3 gate: Gitea→bridge webhook delivery (operator FIXING: whitelisting ci.commoninternet.net in git.autonomic.zone ALLOWED_HOST_LIST).

    Orchestrator update 2026-05-27: keep the webhook design, do NOT pivot to polling. Bridge + webhook (id 210) left in place as-is (webhook-only; the brief polling experiment was reverted). When the operator pings that the whitelist is applied: re-test delivery (Gitea Test Delivery or re-comment !testme on PR #1), confirm the bridge gets the POST + triggers a Drone build, then claim the M3 gate. Working other milestones meanwhile.

    Original diagnosis, kept for reference. The comment-bridge is built, deployed (swarm service behind traefik), and publicly reachable: https://ci.commoninternet.net/hook/healthz → 200 from the sandbox over real public DNS (ci.commoninternet.net → gateway 143.244.213.108). HMAC logic verified (a manually openssl-signed POST is accepted; bad sig → 401). BUT Gitea never delivers: commenting !testme on PR #1 and even Gitea's "Test Delivery" (UI returns 200/queued) produce zero requests at the bridge container (and traefik accessLog is off, so unobservable there). Bridge is reachable from a third network, gateway accepts public sources, public DNS is correct → Gitea is not sending the HTTP request. Most likely git.autonomic.zone's [webhook] ALLOWED_HOST_LIST excludes ci.commoninternet.net (bot is not Gitea admin, can't inspect/change).

    Operator options (from the original diagnosis): (a) add ci.commoninternet.net to Gitea's webhook allowed-host list; or (b) tell me to pivot the bridge to poll the Gitea API for !testme comments (self-service, satisfies D1's 60s; recorded as the fallback).

    Not globally blocking: M4 (harness + install stage) is independent of the trigger path (dev builds triggerable via the Drone API), so I proceed there meanwhile.
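
The HMAC behaviour noted above (a correctly signed POST is accepted, a bad signature gets 401) can be sketched as follows. This is not the bridge's actual code: the function names are hypothetical, and it assumes the signature is the hex HMAC-SHA256 of the raw request body, as Gitea sends it in its `X-Gitea-Signature` header.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (assumed signature scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def status_for(secret: bytes, body: bytes, signature_header: str) -> int:
    """Return 200 for a valid signature, 401 otherwise."""
    expected = sign(secret, body).encode("ascii")
    received = signature_header.encode("utf-8")
    # compare_digest runs in constant time, so a mismatch does not leak
    # how many leading characters of the signature were correct.
    if hmac.compare_digest(expected, received):
        return 200
    return 401
```

The signature must be computed over the exact bytes received; re-serialising the JSON before hashing would change the digest and reject valid deliveries.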

Tracking (adversary findings I must address)

  • [adversary] A1 — no-ACME hazard for test apps. Acknowledged (valid). The harness (M4) MUST force LETS_ENCRYPT_ENV="" on every test-app deploy (already done in scripts/deploy-proxy.sh and the M1 manual custom-html deploy; scripts/deploy-drone.sh will too). Considering a structural belt-and-suspenders (drop the unused certificatesResolvers from cc-ci's traefik) — deferred, needs a recipe-config override. Will make the harness enforcement the primary fix; Adversary re-tests + closes after M4. → Now enforced: harness.lifecycle.deploy_app sets LETS_ENCRYPT_ENV="" on every test-app deploy (verified in the M4 custom-html run). Adversary can re-test + close A1.
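
The A1 enforcement can be illustrated with a small sketch. The real `harness.lifecycle.deploy_app` is not shown here; `force_no_acme` is a hypothetical helper, and it assumes the test app's env file is plain `KEY=VALUE` text.

```python
def force_no_acme(env_text: str) -> str:
    """Return env_text with exactly one LETS_ENCRYPT_ENV line, set to empty."""
    kept = []
    for line in env_text.splitlines():
        # Drop any existing assignment, active or commented out, so a stale
        # value (e.g. "production") cannot survive the rewrite.
        candidate = line.lstrip().lstrip("#").lstrip()
        if candidate.startswith("LETS_ENCRYPT_ENV="):
            continue
        kept.append(line)
    kept.append('LETS_ENCRYPT_ENV=""')
    return "\n".join(kept) + "\n"
```

Removing every existing assignment and appending a single empty one makes the rewrite idempotent, so it is safe to apply on every deploy rather than only on app creation.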

Notes

  • Disk RESOLVED: operator grew the VM 8.9→28 GiB (22 GiB free) on 2026-05-26. Inodes 1.78M total / 1.21M free (was ~6k free — old 8.9 GiB fs had only 586k inodes, which the flake's nixpkgs fetch exhausted). Both byte + inode pressure gone.
  • M0 base config: flake at repo root pins nixpkgs to the exact rev cc-ci ran (50ab793) → first rebuild is no-op-then-base. Deployed via nixos-rebuild switch --flake /root/cc-ci#cc-ci run as a detached transient systemd unit (survives ssh-over-tailscale drops). Gen 3 current, healthy.
  • Open warning: incus module enables systemd.network while we set networking.useDHCP=true (scripted dhcpcd) — Nix warns both may manage interfaces. Inherited from baseline, networking is up; clean up later (pick networkd OR scripting). Tracked, non-blocking.
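
The disk note distinguishes byte pressure from inode pressure; a minimal sketch of checking both is below. The function names and the thresholds are illustrative assumptions, not values used by cc-ci.

```python
import os


def fs_headroom(path: str = "/") -> dict:
    """Report free and total bytes and inodes for the filesystem at path."""
    vfs = os.statvfs(path)
    return {
        "bytes_free": vfs.f_bavail * vfs.f_frsize,
        "bytes_total": vfs.f_blocks * vfs.f_frsize,
        "inodes_free": vfs.f_favail,
        "inodes_total": vfs.f_files,
    }


def under_pressure(headroom: dict,
                   min_bytes_free: int = 2 * 1024**3,    # hypothetical: 2 GiB
                   min_inodes_free: int = 100_000) -> bool:  # hypothetical
    """True if either free bytes or free inodes fall below the threshold."""
    return (headroom["bytes_free"] < min_bytes_free
            or headroom["inodes_free"] < min_inodes_free)
```

With the figures from the note, roughly 6k free inodes would trip the check even with bytes to spare, which matches the failure mode seen on the old 8.9 GiB filesystem.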