# STATUS — cc-ci Builder

**Phase:** M6 complete & CLAIMED. M0/M1/M2/M4/M5 PASS. M3 gate BLOCKED (Gitea webhook; operator).
Next: M6.5 (breadth ramp — recipes 3–6 + keycloak full 3-stage), M7, M8. Resolve M3 trigger before M10.
**In-flight:** M6.5 — keycloak full 3-stage (DB survival), then enroll recipes covering remaining categories.
**Last updated:** 2026-05-27 (M6 claimed; D4 + recipe #2)

## Gates
- **Gate: M0 — CLAIMED, awaiting Adversary** (2026-05-26). Evidence: flake rebuilds cc-ci from repo
  (`switch --flake /root/cc-ci#cc-ci`, gen healthy, no failed units); sops-nix decrypts
  `/run/secrets/test_secret` (0400 root, value = generated `cc-ci-m0-…`). Repro: clone repo, sync to
  host, `nixos-rebuild switch --flake .#cc-ci`, then `systemctl is-system-running` + check the secret.
  Per §6.1 I will NOT advance past this gate to M2; M1 work proceeds as independent unblocked work.
  → **M0 PASS** logged by Adversary in REVIEW.md @2026-05-26T21:35Z (cold verify, leak probe clean).
- **Gate: M1 — CLAIMED, awaiting Adversary** (2026-05-26). Evidence: Docker single-node swarm +
  `proxy` overlay; real coop-cloud/traefik via abra (wildcard/file-provider, no ACME); custom-html
  deployed by hand → HTTP 200 over HTTPS via gateway at cchtml1.ci.commoninternet.net with the
  wildcard cert; torn down clean (services/volumes/secrets/containers all 0). Repro:
  `scripts/deploy-proxy.sh` + `abra app new/deploy/undeploy`. Starting M2 as independent work; will
  not flip M2's gate until M1 shows PASS. → **M1 PASS** @2026-05-26T22:20Z.
- **Gate: M2 — CLAIMED, awaiting Adversary** (2026-05-26). Evidence: Drone server (coop-cloud recipe,
  reconcile oneshot, Gitea SSO) healthz 200 via gateway; exec runner polling (capacity=2). cc-ci repo
  activated (push webhook). Pushing `.drone.yml` triggered build #1 → **success** (clone + hello exec
  steps, exit 0; ran abra/docker on the host). Repro: `nixos-rebuild switch` + one-time
  `scripts/bootstrap-drone-oauth.sh`. Starting M3 as independent work; won't flip M3 gate until M2 PASS.

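
The "check the secret" step in the M0 repro could be scripted roughly as below. This is a minimal sketch, not part of the repo: `check_secret` and its parameters are hypothetical names, and it only checks the three facts recorded in the M0 evidence (mode 0400, owner root, value prefix). It never returns or prints the secret value, so it stays compatible with the leak probe.

```python
import os
import stat


def check_secret(path: str, expected_prefix: str, expected_uid: int = 0) -> list[str]:
    """Return a list of problems; an empty list means the secret looks right.

    Hypothetical helper: the path, prefix and uid are supplied by the caller.
    The secret value itself is never included in the returned messages.
    """
    problems = []
    st = os.stat(path)

    # Permission bits only (strip the file-type bits).
    mode = stat.S_IMODE(st.st_mode)
    if mode != 0o400:
        problems.append(f"mode is {mode:04o}, expected 0400")

    # uid 0 is root; overridable so the check can run unprivileged in tests.
    if st.st_uid != expected_uid:
        problems.append(f"owner uid is {st.st_uid}, expected {expected_uid}")

    with open(path, encoding="utf-8") as fh:
        value = fh.read().strip()
    if not value.startswith(expected_prefix):
        problems.append("value does not start with the expected prefix")

    return problems
```

On the host this would be called as `check_secret("/run/secrets/test_secret", "cc-ci-m0-")` by root, after `systemctl is-system-running` reports a healthy system.
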
## Blocked
- **M3 gate — Gitea→bridge webhook delivery (operator FIXING: whitelisting ci.commoninternet.net in
  git.autonomic.zone `ALLOWED_HOST_LIST`).** Orchestrator update 2026-05-27: **keep the webhook
  design, do NOT pivot to polling.** Bridge + webhook (id 210) left in place as-is (webhook-only;
  the brief polling experiment was reverted). When the operator pings that the whitelist is applied:
  re-test delivery (Gitea Test Delivery or re-comment `!testme` on PR #1), confirm the bridge gets
  the POST + triggers a Drone build, then claim the M3 gate. Working other milestones meanwhile.
  Original diagnosis below for reference.
  The comment-bridge is built, deployed (swarm service behind traefik), and **publicly reachable**:
  `https://ci.commoninternet.net/hook/healthz` → 200 from the sandbox over *real public DNS*
  (ci.commoninternet.net → gateway 143.244.213.108). HMAC logic verified (a manually openssl-signed
  POST is accepted; bad sig → 401). BUT Gitea never delivers: commenting `!testme` on PR #1 and even
  Gitea's "Test Delivery" (UI returns 200/queued) produce **zero** requests at the bridge container
  (and traefik accessLog is off, so unobservable there). Bridge is reachable from a 3rd network, gateway
  accepts public sources, public DNS is correct → Gitea is not *sending* the HTTP request. Most likely
  git.autonomic.zone's `[webhook] ALLOWED_HOST_LIST` excludes `ci.commoninternet.net` (bot is not Gitea
  admin, can't inspect/change). **Operator options:** (a) add `ci.commoninternet.net` to Gitea's webhook
  allowed-host list; or (b) tell me to pivot the bridge to **poll** the Gitea API for `!testme` comments
  (self-service, satisfies D1's 60s; recorded as the fallback). **Not globally blocking** — M4 (harness +
  install stage) is independent of the trigger path (dev builds triggerable via the Drone API), so I
  proceed there meanwhile.

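
For whoever re-tests delivery, the signature check exercised by the manual openssl-signed POST can be sketched as follows. This is not the bridge's actual code. It assumes the signature is the hex-encoded HMAC-SHA256 of the raw request body, keyed with the webhook secret, which is my understanding of what Gitea sends in its `X-Gitea-Signature` header. The function names and the 200/401 mapping in `status_for` are illustrative.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed webhook signature scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time comparison of the presented signature with the expected one."""
    expected = sign(secret, body).encode("ascii")
    presented = signature_hex.strip().lower().encode("utf-8")
    return hmac.compare_digest(expected, presented)


def status_for(secret: bytes, body: bytes, signature_hex: str) -> int:
    """Illustrative mapping to the statuses noted above: good sig 200, bad sig 401."""
    return 200 if verify(secret, body, signature_hex) else 401
```

Comparing with `hmac.compare_digest` rather than `==` avoids leaking how many leading characters of a forged signature were correct.
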
## Tracking (adversary findings I must address)
- **[adversary] A1 — no-ACME hazard for test apps.** Acknowledged (valid). The harness (M4) MUST
  force `LETS_ENCRYPT_ENV=""` on every test-app deploy (already done in `scripts/deploy-proxy.sh` and
  the M1 manual custom-html deploy; `scripts/deploy-drone.sh` will too). Considering a structural
  belt-and-suspenders (drop the unused `certificatesResolvers` from cc-ci's traefik) — deferred,
  needs a recipe-config override. Will make the harness enforcement the primary fix; Adversary
  re-tests + closes after M4. → **Now enforced**: `harness.lifecycle.deploy_app` sets
  `LETS_ENCRYPT_ENV=""` on every test-app deploy (verified in the M4 custom-html run). Adversary can
  re-test + close A1.

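
One way the A1 enforcement could look is sketched below. This is not the body of `harness.lifecycle.deploy_app`; `force_no_acme` is a hypothetical helper, and it assumes the test app's settings live in a dotenv-style text file that the harness rewrites before deploying. It drops every existing `LETS_ENCRYPT_ENV` assignment (commented out or not) and appends a single empty one, so the result is the same however the recipe template set it.

```python
import re

# Matches active or commented-out assignments, e.g.
#   LETS_ENCRYPT_ENV=production
#   #LETS_ENCRYPT_ENV=staging
_LE_LINE = re.compile(r"^\s*#?\s*LETS_ENCRYPT_ENV\s*=.*$")


def force_no_acme(env_text: str) -> str:
    """Return env_text with exactly one LETS_ENCRYPT_ENV line, set to empty.

    Hypothetical helper: assumes a dotenv-style app config, one KEY=VALUE per line.
    """
    kept = [line for line in env_text.splitlines() if not _LE_LINE.match(line)]
    kept.append('LETS_ENCRYPT_ENV=""')
    return "\n".join(kept) + "\n"
```

The function is idempotent, so running it on every deploy is safe, and the Adversary's re-test reduces to checking that the deployed config contains the empty assignment and no other.
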
## Notes
- **Disk RESOLVED:** operator grew the VM 8.9→**28 GiB** (22 GiB free) on 2026-05-26. Inodes
  1.78M total / 1.21M free (was ~6k free — old 8.9 GiB fs had only 586k inodes, which the flake's
  nixpkgs fetch exhausted). Both byte + inode pressure gone.
- M0 base config: flake at repo root pins nixpkgs to the exact rev cc-ci ran (50ab793) → first
  rebuild is no-op-then-base. Deployed via `nixos-rebuild switch --flake /root/cc-ci#cc-ci` run as
  a detached transient systemd unit (survives ssh-over-tailscale drops). Gen 3 current, healthy.
- Open warning: incus module enables `systemd.network` while we set `networking.useDHCP=true`
  (scripted dhcpcd) — Nix warns both may manage interfaces. Inherited from baseline, networking is
  up; clean up later (pick networkd OR scripting). Tracked, non-blocking.
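
Since the disk incident was inode exhaustion rather than byte exhaustion, a pre-build check should look at both. A minimal sketch using `os.statvfs` follows. The function names and the thresholds are hypothetical, not values taken from the harness.

```python
import os


def fs_headroom(path: str = "/") -> dict:
    """Free/total bytes and inodes for the filesystem containing path."""
    st = os.statvfs(path)
    return {
        "bytes_free": st.f_bavail * st.f_frsize,   # available to unprivileged users
        "bytes_total": st.f_blocks * st.f_frsize,
        "inodes_free": st.f_favail,
        "inodes_total": st.f_files,
    }


def low_on_space(path: str = "/",
                 min_bytes: int = 2 * 1024**3,     # hypothetical threshold: 2 GiB
                 min_inodes: int = 100_000) -> bool:  # hypothetical threshold
    """True if either bytes or inodes fall below the given floor."""
    h = fs_headroom(path)
    # Some filesystems report 0 total inodes; skip the inode check there.
    inode_low = h["inodes_total"] > 0 and h["inodes_free"] < min_inodes
    return h["bytes_free"] < min_bytes or inode_low
```

With the old 8.9 GiB filesystem at roughly 6k free inodes, the inode branch alone would have flagged the problem even while free bytes still looked acceptable.
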