cc-ci/STATUS.md
autonomic-bot f16708155c
STATUS: M3 webhook being whitelisted operator-side; keep webhook, polling reverted
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 02:02:57 +01:00

# STATUS — cc-ci Builder
**Phase:** M6 complete & CLAIMED. M0/M1/M2/M4/M5 PASS. M3 gate BLOCKED (Gitea webhook; operator).
Next: M6.5 (breadth ramp — recipes 36 + keycloak full 3-stage), M7, M8. Resolve M3 trigger before M10.
**In-flight:** M6.5 — keycloak full 3-stage (DB survival), then enroll recipes covering remaining categories.
**Last updated:** 2026-05-27 (M6 claimed; D4 + recipe #2)
## Gates
- **Gate: M0 — CLAIMED, awaiting Adversary** (2026-05-26). Evidence: flake rebuilds cc-ci from repo
(`switch --flake /root/cc-ci#cc-ci`, gen healthy, no failed units); sops-nix decrypts
`/run/secrets/test_secret` (0400 root, value = generated `cc-ci-m0-…`). Repro: clone repo, sync to
host, `nixos-rebuild switch --flake .#cc-ci`, then `systemctl is-system-running` + check the secret.
Per §6.1 I will NOT advance past this gate to M2; M1 work proceeds as independent unblocked work.
**M0 PASS** logged by Adversary in REVIEW.md @2026-05-26T21:35Z (cold verify, leak probe clean).
- **Gate: M1 — CLAIMED, awaiting Adversary** (2026-05-26). Evidence: Docker single-node swarm +
`proxy` overlay; real coop-cloud/traefik via abra (wildcard/file-provider, no ACME); custom-html
deployed by hand → HTTP 200 over HTTPS via gateway at cchtml1.ci.commoninternet.net with the
wildcard cert; torn down clean (services/volumes/secrets/containers all 0). Repro:
`scripts/deploy-proxy.sh` + `abra app new/deploy/undeploy`. Starting M2 as independent work; will
not flip M2's gate until M1 shows PASS. → **M1 PASS** @2026-05-26T22:20Z.
- **Gate: M2 — CLAIMED, awaiting Adversary** (2026-05-26). Evidence: Drone server (coop-cloud recipe,
reconcile oneshot, Gitea SSO) healthz 200 via gateway; exec runner polling (capacity=2). cc-ci repo
activated (push webhook). Pushing `.drone.yml` triggered build #1 → **success** (clone + hello exec
steps, exit 0; ran abra/docker on the host). Repro: `nixos-rebuild switch` + one-time
`scripts/bootstrap-drone-oauth.sh`. Starting M3 as independent work; won't flip M3 gate until M2 PASS.
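
The "check the secret" step in the M0 repro can be scripted. The following is a minimal sketch, not code from this repo: `check_secret` and its parameters are hypothetical names, and it assumes only what the M0 evidence states (mode 0400, owner root, value starting with `cc-ci-m0-`). It never prints the secret value.

```python
import os
import stat


def check_secret(path, expected_mode=0o400, expected_uid=0,
                 expected_prefix="cc-ci-m0-"):
    """Return a list of problems with a decrypted secret file (empty = OK).

    Hypothetical helper: the defaults mirror the M0 evidence line
    (0400, root-owned, generated value with a known prefix).
    """
    problems = []
    st = os.stat(path)  # follows symlinks, so a symlinked secret path works
    mode = stat.S_IMODE(st.st_mode)
    if mode != expected_mode:
        problems.append(f"mode is {mode:04o}, expected {expected_mode:04o}")
    if st.st_uid != expected_uid:
        problems.append(f"owner uid is {st.st_uid}, expected {expected_uid}")
    with open(path, encoding="utf-8") as fh:
        value = fh.read().strip()
    # Only the prefix is checked; the value itself is never logged.
    if not value.startswith(expected_prefix):
        problems.append("value does not start with the expected prefix")
    return problems
```

On the host this would be called as `check_secret("/run/secrets/test_secret")` by root, alongside `systemctl is-system-running`.
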
## Blocked
- **M3 gate — Gitea→bridge webhook delivery (operator FIXING: whitelisting ci.commoninternet.net in
git.autonomic.zone `ALLOWED_HOST_LIST`).** Orchestrator update 2026-05-27: **keep the webhook
design, do NOT pivot to polling.** Bridge + webhook (id 210) left in place as-is (webhook-only;
the brief polling experiment was reverted). When the operator pings that the whitelist is applied:
re-test delivery (Gitea Test Delivery or re-comment `!testme` on PR #1), confirm the bridge gets
the POST + triggers a Drone build, then claim the M3 gate. Working other milestones meanwhile.
Original diagnosis below for reference.
The comment-bridge is built, deployed (swarm service behind traefik), and **publicly reachable**:
`https://ci.commoninternet.net/hook/healthz` → 200 from the sandbox over *real public DNS*
(ci.commoninternet.net → gateway 143.244.213.108). HMAC logic verified (a manually openssl-signed
POST is accepted; bad sig → 401). BUT Gitea never delivers: commenting `!testme` on PR #1 and even
Gitea's "Test Delivery" (UI returns 200/queued) produce **zero** requests at the bridge container
(and traefik accessLog is off, so unobservable there). Bridge is reachable from a 3rd network, gateway
accepts public sources, public DNS is correct → Gitea is not *sending* the HTTP request. Most likely
git.autonomic.zone's `[webhook] ALLOWED_HOST_LIST` excludes `ci.commoninternet.net` (bot is not Gitea
admin, can't inspect/change). **Operator options:** (a) add `ci.commoninternet.net` to Gitea's webhook
allowed-host list; or (b) tell me to pivot the bridge to **poll** the Gitea API for `!testme` comments
(self-service, satisfies D1's 60s; recorded as the fallback). **Not globally blocking** — M4 (harness +
install stage) is independent of the trigger path (dev builds triggerable via the Drone API), so I
proceed there meanwhile.
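
For reference when re-testing delivery, the HMAC check described above (signed POST accepted, bad signature → 401) can be sketched as follows. This is an illustration, not the bridge's actual code: it assumes the signature is the hex HMAC-SHA256 of the raw request body, as Gitea sends in its `X-Gitea-Signature` header, and the function names are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed signature scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature_header: str) -> int:
    """Return the HTTP status the bridge would answer with: 200 or 401."""
    expected = sign(secret, body).encode()
    received = signature_header.strip().lower().encode()
    # compare_digest avoids leaking the match position through timing.
    if hmac.compare_digest(expected, received):
        return 200
    return 401
```

A manual test signature for the same body can be produced with `openssl dgst -sha256 -hmac "$SECRET"`, which should match `sign()` for identical bytes.
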
## Tracking (adversary findings I must address)
- **[adversary] A1 — no-ACME hazard for test apps.** Acknowledged (valid). The harness (M4) MUST
force `LETS_ENCRYPT_ENV=""` on every test-app deploy (already done in `scripts/deploy-proxy.sh` and
the M1 manual custom-html deploy; `scripts/deploy-drone.sh` will too). Considering a structural
belt-and-suspenders (drop the unused `certificatesResolvers` from cc-ci's traefik) — deferred,
needs a recipe-config override. Will make the harness enforcement the primary fix; Adversary
re-tests + closes after M4. → **Now enforced**: `harness.lifecycle.deploy_app` sets
`LETS_ENCRYPT_ENV=""` on every test-app deploy (verified in the M4 custom-html run). Adversary can
re-test + close A1.
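
One way the A1 enforcement could look is sketched below. The internals of `harness.lifecycle.deploy_app` are not shown in this file, so this is an assumption: it rewrites an app's env-file text so that exactly one active `LETS_ENCRYPT_ENV=""` line remains before deploy. Both function names are hypothetical.

```python
def force_no_acme(env_text: str) -> str:
    """Return env-file text with a single active line LETS_ENCRYPT_ENV=""."""
    kept = []
    for line in env_text.splitlines():
        # Drop every active assignment; commented-out lines are left alone.
        if line.strip().startswith("LETS_ENCRYPT_ENV="):
            continue
        kept.append(line)
    kept.append('LETS_ENCRYPT_ENV=""')
    return "\n".join(kept) + "\n"


def acme_disabled(env_text: str) -> bool:
    """True only if the variable is set and every assignment is empty."""
    values = [
        line.strip().split("=", 1)[1]
        for line in env_text.splitlines()
        if line.strip().startswith("LETS_ENCRYPT_ENV=")
    ]
    return bool(values) and all(v in ('""', "''", "") for v in values)
```

Running `acme_disabled` on the final env text right before deploy gives the Adversary a cheap assertion to re-test against.
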
## Notes
- **Disk RESOLVED:** operator grew the VM 8.9→**28 GiB** (22 GiB free) on 2026-05-26. Inodes
1.78M total / 1.21M free (was ~6k free — old 8.9 GiB fs had only 586k inodes, which the flake's
nixpkgs fetch exhausted). Both byte + inode pressure gone.
- M0 base config: flake at repo root pins nixpkgs to the exact rev cc-ci ran (50ab793) → first
rebuild is no-op-then-base. Deployed via `nixos-rebuild switch --flake /root/cc-ci#cc-ci` run as
a detached transient systemd unit (survives ssh-over-tailscale drops). Gen 3 current, healthy.
- Open warning: incus module enables `systemd.network` while we set `networking.useDHCP=true`
(scripted dhcpcd) — Nix warns both may manage interfaces. Inherited from baseline, networking is
up; clean up later (pick networkd OR scripting). Tracked, non-blocking.
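
Since the earlier disk failure was inode exhaustion rather than byte exhaustion, a pre-build check should look at both. A minimal sketch using `os.statvfs` follows; the function name and thresholds are hypothetical, not values used by cc-ci.

```python
import os


def disk_pressure(path="/", min_free_bytes=1 << 30, min_free_inodes=100_000):
    """Report free bytes and free inodes for the filesystem holding `path`.

    Thresholds are illustrative defaults. Note that some filesystems
    report 0 total inodes, in which case the inode check is meaningless.
    """
    st = os.statvfs(path)
    free_bytes = st.f_bavail * st.f_frsize  # space available to non-root
    free_inodes = st.f_favail               # inodes available to non-root
    return {
        "free_bytes": free_bytes,
        "free_inodes": free_inodes,
        "total_inodes": st.f_files,
        "bytes_ok": free_bytes >= min_free_bytes,
        "inodes_ok": free_inodes >= min_free_inodes,
    }
```

The same numbers are visible by hand with `df -h` and `df -i`.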