REVIEW — cc-ci Adversary (append-only)
This file is owned by the Adversary loop (§6.1). The Builder seeds this stub at bootstrap and
does not edit it afterward. Adversary appends milestone/D-item verdicts (<id>: PASS @<ts> +
evidence, or FAIL + a finding in BACKLOG.md ## Adversary findings), and may write ## VETO.
M0 — Foundations: PASS @2026-05-26T21:35Z
Verified cold (fresh shell, own clone /srv/cc-ci/cc-ci-adv, isolated host build dir
/root/cc-ci-advverify, no reuse of Builder's /root/cc-ci).
Acceptance — "systemctl is-system-running healthy after a rebuild from the repo" + Builder's
sops claim:
- Repo rebuilds cc-ci: synced M0 commit `deb4a0f` (git-archive, no .git) to host, ran `nixos-rebuild build --flake .#cc-ci` → BUILD EXIT 0, produced `…-nixos-system-nixos-24.11.20250630.50ab793`. Current HEAD also builds clean.
- System health: `systemctl is-system-running` → `running`; `systemctl --failed` → 0 units.
- sops decrypt: `/run/secrets/test_secret` present, mode `400 root:root`, 41 bytes, value begins `cc-c…` (matches claimed generated `cc-ci-m0-…`). `secrets/secrets.yaml` is genuinely encrypted (2× `ENC[…]` + sops metadata block).
- D6 leak probe (early): the decrypted plaintext value appears 0 times across all git history (`git grep -F` over `git rev-list --all`) and 0× in plaintext in `secrets.yaml`. No leak.
Note (not a finding; context for the M1 gate): the running system is already ahead of M0 — its
closure includes docker, unit-swarm-init, and traefik units (traefik.yml,
traefik-stack.yml, unit-traefik-deploy) that are not yet committed (HEAD ab839ae is
swarm-only, no traefik). Expected mid-M1 churn, but the Traefik config must be committed to the
repo before M1 is claimed or it fails D8 reproducibility — will check at the M1 gate.
M1 — Swarm + abra target: PASS @2026-05-26T22:20Z
Verified cold from own clone; deployed my own probe recipe via abra (not trusting the Builder's
hand-test). Acceptance "a recipe deployed via abra is reachable over HTTPS at
*.ci.commoninternet.net, then fully torn down leaving no volumes" + orchestrator's M1 checklist
(a–d).
- (a) Real coop-cloud/traefik recipe (not hand-rolled): `docker service ls` → `traefik_…_app` (`traefik:v3.6.15`) + `…_socket-proxy` (lscr.io socket-proxy), the canonical recipe layout, deployed via abra (`scripts/deploy-proxy.sh`). `modules/traefik.nix` is deleted.
- (b) Wildcard on web-secure + proxy overlay: static `traefik.yml` has `web-secure: :443` (web→web-secure 301 redirect, verified live). File provider `/etc/traefik/file-provider.yml`: `tls.certificates: [{certFile:/run/secrets/ssl_cert, keyFile:/run/secrets/ssl_key}]`; swarm secrets `…_ssl_cert_v1`/`…_ssl_key_v1` mounted (2909 B / 227 B = the pre-issued cert). My probe app `advm1probe_…_app` was attached to the `proxy` overlay.
- E2E (cold deploy): `abra app new custom-html -D advm1probe.ci.commoninternet.net` (forced `LETS_ENCRYPT_ENV=""`) → `deploy succeeded 🟢`. Via SOCKS proxy: HTTP 200; served cert `subject: CN=*.ci.commoninternet.net`, SAN-matched, `SSL certificate verify ok`, issuer LE E8, i.e. the pre-issued wildcard, NOT a per-host ACME cert.
- (c) No Gandi/DNS token, no ACME credential: repo (all history) clean; on host the only gandi/dns-challenge strings are commented-out recipe-template options (`#GANDI_…`, `#SECRET_GANDIV5_…`) holding no value. Active traefik env = `LETS_ENCRYPT_ENV=` (empty), `WILDCARDS_ENABLED=1`, `compose.wildcard.yml`. `staging`/`production` certResolvers are defined in traefik.yml (stock template) but referenced by no router; both acme.json are 0 bytes; 0 ACME lines in traefik logs. No ACME ever fires. (Hardening risk filed; see findings.)
- (d) Manual renewal documented: DECISIONS.md (operator re-issues at same paths, then `abra app secret rm … ssl_cert` + re-insert at bumped version); install.md: "Renewed out-of-band; never ACME here."
- Teardown: `abra app undeploy` + `volume remove` → post-teardown services/containers/volumes/secrets for the probe all 0. Also independently confirmed the Builder's `cchtml1` test left 0 runtime resources (only its inert `.env` config file remains, harmless).
Verdict: M1 PASS. Not a hard fail on (c) — no token/credential exists and no ACME fires — but
the inert ACME resolvers + test-app default LETS_ENCRYPT_ENV=production are a latent hazard that
goes live when the harness deploys apps; filed as [adversary] for M4.
M2 — Drone online: PASS @2026-05-26T23:32Z
Verified cold from own clone. Acceptance: "push to cc-ci triggers a visible green Drone build."
- Drone server healthy: `https://drone.ci.commoninternet.net/healthz` → HTTP 200 via gateway. Exec runner (`drone-runner-exec.service`) active, `polling the remote server capacity=2 type=exec`.
- Repo wired: in Drone's DB the `recipe-maintainers/cc-ci` repo is `repo_active=1`, `repo_config=.drone.yml`. Gitea↔Drone OAuth proven by the in-pipeline `clone` step succeeding against the private repo (build can't clone without working OAuth/repo token).
- Push→green, independently triggered: I pushed my own commit `91a8e8d` (a REVIEW.md change) → Drone created build #4, `build_event=push`, `build_trigger=@hook` (Gitea webhook), and it ran `success`: stage `self-test` exit 0, steps `clone` + `hello` both exit 0. Builds #1–#3 (Builder commits) likewise all `success` via `@hook`. (My earlier M0/M1 review pushes predate the `.drone.yml`, so correctly produced no builds.)
- Visible logs (D7 precondition): `logs` table holds per-step log blobs for every build; Drone UI/API serve them. Full D7 UX is M8.
Verdict: M2 PASS. No new findings.
M3 — Comment bridge: PRE-CLAIM PROGRESS (not yet PASS) @2026-05-26T23:48Z
M3 is Blocked in STATUS (Gitea not delivering webhooks), so not a gate verdict yet. But the
bridge is deployed and I independently hammered its auth/filter logic — the part I can verify
regardless of the delivery leg (and which survives a pivot to API polling). Probes were live POSTs
to https://ci.commoninternet.net/hook via the SOCKS proxy, with HMAC signatures I computed from
the on-host secret (read with root; value never printed/committed):
| probe | expect | got |
|---|---|---|
| no `X-Gitea-Signature` | 401 | 401 |
| bad signature | 401 | 401 |
| valid sig, event=`ping` (not `issue_comment`) | 204 | 204 |
| valid sig, `!testmexyz` on a real PR | 204 (no trigger) | 204 |
| valid sig, `!testme` but issue is not a PR | 204 | 204 |
| valid sig, `!testme` on PR, `action=edited` | 204 | 204 |
| valid sig, `!testme` on real PR, non-collaborator | 403 | 403 |
So: HMAC fail-closed + timing-safe (compare_digest, verified before body parse), !testmexyz
correctly ignored (exact trimmed match), non-PR ignored, and a non-collaborator is rejected (403;
collaborator status re-checked via Gitea API, not trusted from the signed payload). Source review
of bridge/bridge.py found no auth bypass.
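The two properties probed above (fail-closed, timing-safe signature verification on the raw body, and an exact trimmed command match) can be illustrated with a minimal sketch. This is not the code in `bridge/bridge.py`; the function names `signature_ok` and `is_trigger` are hypothetical, and it assumes the `X-Gitea-Signature` header carries the hex HMAC-SHA256 of the raw request body.

```python
import hashlib
import hmac
from typing import Optional

TRIGGER = "!testme"  # the exact command the probes above exercised


def signature_ok(secret: bytes, raw_body: bytes, header_sig: Optional[str]) -> bool:
    """Fail closed: a missing, empty, or mismatching signature is rejected.

    Assumption: the header holds the hex HMAC-SHA256 of the raw body.
    Call this on the raw bytes, before any JSON parsing.
    """
    if not header_sig:
        return False
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in time independent of where the inputs differ.
    return hmac.compare_digest(expected.encode(), header_sig.strip().encode())


def is_trigger(comment_body: str) -> bool:
    """Exact match after trimming whitespace, so '!testmexyz' is ignored."""
    return comment_body.strip() == TRIGGER
```

A caller would answer 401 when `signature_ok` is false and only then parse the payload, which matches the ordering noted above.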
Blocker independently corroborated (operator-side): the bridge hook is registered + active on
recipe-maintainers/cc-ci (id 210, events [issue_comment] → ci.commoninternet.net/hook), and
the bot is not a Gitea site-admin (GET /admin/hooks → 403) nor org owner, so it genuinely cannot
inspect/change Gitea's [webhook] ALLOWED_HOST_LIST. Endorse STATUS ## Blocked: needs operator
allowlisting or the documented poll-the-API fallback.
Still UNVERIFIED for an M3 PASS: (1) the positive path — a valid collaborator !testme actually
starts a build + posts the PR comment end-to-end; (2) real Gitea→bridge delivery (or the polling
pivot). Will complete both when M3 is claimed.
Noted for M7 (not a finding yet): the Drone-managed Gitea webhook (id 209) carries its webhook
secret as a ?secret= query param in the hook URL (Drone default; admin-only in Gitea, not in cc-ci
git / CI logs / dashboard). Will adjudicate against D6 at M7.
M4 — Harness + install stage: VERIFICATION IN PROGRESS (no verdict yet) @2026-05-27T00:35Z
M4 is CLAIMED. Code review done; runtime checks so far:
- A1 CLOSED (see BACKLOG): harness forces `LETS_ENCRYPT_ENV=""` every deploy; live app `cust-c95a69` served the wildcard cert, 0 ACME lines, no certresolver.
- Happy-path teardown works: a prior run's app `cust-e084bd` was fully torn down (gone), so it is not an orphan; earlier ambiguity was a run cycling apps.
- Two teardown-robustness defects filed (A2, A3): janitor's `-pr` filter is dead code under the `cust-<hex>` naming (no crash-orphan reaping); teardown is best-effort/unverified and deletes the `.env` even on failed undeploy (silent orphan, run still green).
- Deferred to next idle tick (a Builder harness run is active now; sequential-only): my own cold install run (green install + Playwright + clean teardown verification) and the §6 kill-mid-run probe to test A3 empirically. Verdict (PASS/FAIL) follows that.
M4 — Harness + install stage: PASS @2026-05-27T01:05Z
Verified by my own cold harness run (RECIPE=custom-html REF=advcold… cc-ci-run runner/run_recipe_ci.py, app cust-cfeb6a, isolated from a Builder run that happened to run
concurrently as cust-3c1970 — no collision, distinct domains/volumes/secrets):
- Install stage green: `test_install.py` → 2 passed (27s): `test_http_reachable` (HTTPS 200 via gateway) + `test_playwright_page` (real Chromium loads the live app, status 200, served HTML).
- Guaranteed teardown: after the run, `cust-cfeb6a` left 0 services / volumes / secrets / containers / `.env`; fully clean. Infra (traefik/drone/bridge/backups) untouched.
- A1 closed (no-ACME enforced). Open robustness findings A2 (dead `-pr` janitor) + A3 (unverified best-effort teardown) concern the crash path (finalizer-skipped), not this happy-path run; they don't block M4's literal acceptance but must be resolved before DONE (D2 teardown guarantee). Kill-mid-run probe to substantiate A2/A3 deferred until the host is idle.
Verdict: M4 PASS.
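The 0-remnant check used for this verdict, and the verification that finding A3 says teardown currently lacks, can be sketched as a pure function over a resource inventory. This is a hedged illustration, not harness code: `find_remnants` and `assert_clean_teardown` are hypothetical names, the inventory is assumed to have been collected separately (e.g. from Docker listings), and matching by name prefix is an assumption about how the app's resources are named.

```python
from typing import Dict, List


def find_remnants(app_name: str, inventory: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per resource kind, the names that still belong to app_name.

    Assumption: every resource of an app is named with the app name as prefix.
    """
    leftovers: Dict[str, List[str]] = {}
    for kind, names in inventory.items():
        hits = [name for name in names if name.startswith(app_name)]
        if hits:
            leftovers[kind] = hits
    return leftovers


def assert_clean_teardown(app_name: str, inventory: Dict[str, List[str]]) -> None:
    """Fail the run loudly instead of treating teardown as best-effort."""
    leftovers = find_remnants(app_name, inventory)
    if leftovers:
        raise AssertionError(f"teardown left remnants for {app_name}: {leftovers}")
```

Calling `assert_clean_teardown` after undeploy would turn a silent orphan into a red run, which is the behavior A3 asks for.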
M5 — Upgrade + backup/restore stages: PASS @2026-05-27T01:05Z
Same cold run, stages 2 and 3 — both genuine end-to-end (no mocks; assertions reviewed in source and not softened):
- Upgrade green: `test_upgrade.py` → 1 passed (41s). Deploys the previous published version (`previous_version=recipe_versions[-2]`), writes a marker into the volume-backed html dir, upgrades to latest (`abra upgrade`), then asserts HTTP 200 and the marker survives: a real version change with data persistence across the volume (`cust-…_content`), not a no-op.
- Backup/restore green: `test_backup.py` → 1 passed (37s). Writes `original`, `abra backup`, mutates to `mutated` (asserted), `abra restore`, then asserts the served content is back to `original` ("restore did not return the pre-mutation state"). Real backup→mutate→restore cycle via backup-bot-two.
- Teardown clean (same `cust-cfeb6a` 0-remnant check above covers all three stages; same domain reused per stage).
Verdict: M5 PASS.
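The backup→mutate→restore cycle described above has a simple shape that is worth stating explicitly, because the intermediate assertion on `mutated` is what keeps the test from passing on a no-op restore. The sketch below is not `test_backup.py`: `InMemoryApp` is a hypothetical stand-in for the deployed app and its backup facility, and `check_backup_restore` is an illustrative name.

```python
class InMemoryApp:
    """Hypothetical stand-in for a deployed app with backup/restore."""

    def __init__(self) -> None:
        self.content = ""
        self._snapshot = None

    def write(self, value: str) -> None:
        self.content = value

    def read(self) -> str:
        return self.content

    def backup(self) -> None:
        self._snapshot = self.content

    def restore(self) -> None:
        if self._snapshot is None:
            raise RuntimeError("no backup taken")
        self.content = self._snapshot


def check_backup_restore(app) -> None:
    """Write, back up, mutate (asserted), restore, then assert the original is back."""
    app.write("original")
    app.backup()
    app.write("mutated")
    # Without this check, a restore that does nothing could not be told apart
    # from a mutation that never happened.
    assert app.read() == "mutated", "mutation did not take effect"
    app.restore()
    assert app.read() == "original", "restore did not return the pre-mutation state"
```

Against a real app, `write`/`read` would go through the served content and `backup`/`restore` through the backup tooling; only the ordering of steps and assertions is the point here.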
M6 — Recipe-local tests + second recipe: VERIFICATION IN PROGRESS (no verdict yet) @2026-05-27T01:48Z
M6 CLAIMED. Host has been continuously busy (Builder M6.5 ramp), so deploy-based checks are deferred to an idle window; static + evidence review so far:
- custom-html 3-stage: already verified cold by me (see M5 PASS) — green + clean teardown.
- D4 recipe-local discovery, code genuine: `run_recipe_ci.snapshot_recipe_tests` copies the recipe-shipped `tests/` before abra re-checkouts to a version tag, then `run_recipe_local` deploys the app and runs those tests against the LIVE app via `CCCI_BASE_URL`/`CCCI_APP_DOMAIN`, merged as a separate stage with guaranteed teardown. Demo branch `recipe-maintainers/custom-html @ ci/d4-recipe-local` confirmed to ship `tests/test_recipe_local.py` (Gitea API). Will run it cold to confirm the stage executes+passes.
- keycloak (#2) install, test genuine: `/realms/master` 200 health + real Playwright admin console login (waits for the username field). `recipe_meta.py` (HEALTH_PATH/timeouts) confirms D5 "no harness surgery". Empirical keycloak reproduction deferred (heavy deploy; idle window).
- Filed [adversary] A4 (concurrency): same-recipe concurrent runs share `~/.abra/recipes/<recipe>` with no isolation/lock/concurrency-cap, a collision vector for the §6 concurrency check; to confirm empirically.
Pending for idle host: cold D4 run, keycloak reproduce, A2/A3 kill-probe re-test, A4 concurrency test.
D6/M7 — preliminary leak scan of published Drone logs (PASS so far; M7 not yet claimed) @2026-05-27T02:05Z
Host-safe probe while the host was busy. Pulled Drone's database.sqlite, dumped all 42 logs
rows (~25.5k chars of published per-step build output), scanned:
- Known infra secrets, 0 leaks: webhook HMAC (64), drone token (32), gitea token (40) each appear 0× in the logs (exact `grep -F`).
- No value patterns: 0 matches for `password|secret|token = <value>`.
- The only long hex/base64 hits are git commit SHAs in `git clone`/`merge` output; benign.

Caveat: current Drone logs are hello-world + self-test; the full M7/D6 test must also cover app-generated secrets (e.g. keycloak DB passwords) in recipe-run logs AND the dashboard (M8). This is a clean baseline, not the final D6 verdict. (DB copy was scanned off-box and deleted; no secret value printed or committed.)
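The three scan passes above (exact known-secret counts, `key = value` patterns, and long hex/base64 blobs for manual triage) can be sketched as follows. This is an illustration of the approach, not the script that was run: `scan_logs`, both regular expressions, and their length thresholds are assumptions. It reports only counts and line numbers, so the scan itself never echoes a secret value.

```python
import re
from typing import Dict, List

# Assumed pattern for "password|secret|token = <value>" style lines.
VALUE_PATTERN = re.compile(r"(?i)\b(password|secret|token)\s*=\s*\S+")
# Assumed thresholds: 32+ hex chars, or 40+ base64-alphabet chars.
LONG_BLOB = re.compile(r"[0-9a-fA-F]{32,}|[A-Za-z0-9+/]{40,}={0,2}")


def scan_logs(log_text: str, known_secrets: Dict[str, str]) -> dict:
    """Scan published log text for leaks without printing any secret value."""
    value_lines: List[int] = []
    blob_lines: List[int] = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        if VALUE_PATTERN.search(line):
            value_lines.append(lineno)
        if LONG_BLOB.search(line):
            blob_lines.append(lineno)  # needs human triage (often git SHAs)
    return {
        # Exact substring counts, equivalent in spirit to `grep -F`.
        "known_secret_hits": {
            name: log_text.count(value) for name, value in known_secrets.items()
        },
        "value_pattern_lines": value_lines,
        "long_blob_lines": blob_lines,
    }
```

Long-blob hits are deliberately reported rather than failed on, since commit SHAs and store paths trip the same pattern and have to be judged by context.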
M3 — Comment bridge: PASS @2026-05-27T03:13Z
Verified cold against the NEW design (orchestrator change: polling-PRIMARY + org-membership auth;
webhook now optional). Re-reviewed bridge/bridge.py (256 lines) — sound — then live-probed the
running bridge + Drone:
- `!testme` triggers a run ≤60s: I posted `!testme` (comment 13708) on PR #1 at epoch 1779847690 → bridge `[poll] triggered build 35` → Drone build 35 created at 1779847702 = 12s latency. (Build is `failure` only because `RECIPE=cc-ci` has no `tests/cc-ci/`; the trigger + event=custom recipe-CI pipeline fired correctly, so the integration is live.)
- Re-commenting re-runs: my new comment 13708 → build 35, distinct from the earlier comment 13705 → build 26. Distinct comment ids each fire once (dedup via `_claim`).
- Other comments do NOT trigger: I posted `!testmexyz` → no build created, no bridge trigger log. Exact trimmed match enforced.
- Auth enforced (org-membership, fail-closed): `GET /orgs/recipe-maintainers/members/<u>`: autonomic-bot & notplants → 204 (allowed), `definitely-not-a-member-zzz9` → 404 (rejected). `is_authorized` returns True only on 204/allowlist; anything else (incl. errors) → False.
- Link back: bridge posted run-link comment 13706 ("cc-ci: started CI run … → drone…/recip…").
- Concurrency cap live: runner `capacity=1` (`DRONE_RUNNER_CAPACITY=1`) + pipeline `concurrency:limit:1`, so recipe-CI builds serialize.
Verdict: M3 PASS. (Polling is outbound read+comment only — no repo-admin; webhook optional.) Note: full bridge→3-stage-recipe-CI E2E on a real recipe PR is the Builder's in-flight integration item / D10 — build 35 shows the pipeline wiring works; green-on-a-real-recipe is M10.
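The fail-closed authorization and once-per-comment dedup verified above can be summarized in a small sketch. It is not the bridge's implementation: the `fetch_status` callable stands in for the HTTP call to `GET /orgs/recipe-maintainers/members/<u>`, and `ClaimStore` is a hypothetical in-memory analogue of the bridge's `_claim`, whose real storage is not described here.

```python
from typing import Callable, Iterable, Set


def is_authorized(
    user: str,
    fetch_status: Callable[[str], int],
    allowlist: Iterable[str] = (),
) -> bool:
    """True only for an allowlisted user or a 204 membership response.

    Any other status, and any exception while asking, denies (fail-closed).
    """
    if user in set(allowlist):
        return True
    try:
        return fetch_status(user) == 204
    except Exception:
        return False


class ClaimStore:
    """Hypothetical dedup store: each comment id can be claimed exactly once."""

    def __init__(self) -> None:
        self._seen: Set[int] = set()

    def claim(self, comment_id: int) -> bool:
        if comment_id in self._seen:
            return False
        self._seen.add(comment_id)
        return True
```

With this shape, a poll loop would trigger a build only when `is_trigger`-style matching, `is_authorized`, and `claim` all succeed, so a repeated poll over the same comment cannot start a second build while a new comment id can.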
D6 — leak scan extended to recipe-CI build logs (still clean) @2026-05-27T04:05Z
Followup to the earlier hello-world scan: scanned the logs of all 7 event=custom recipe-CI builds
(~26.7k chars — these ran real abra app deploy + abra app secret generate, so generated app
secrets could surface here). Result: 0 password|secret = <value> patterns, 0 "secret
generated/inserted" value lines (abra doesn't echo secret values), and every long hex/base64 hit is
benign — Nix store paths, git SHAs, Drone workspace dir names (<rand16>/drone/src), pytest
tracebacks. No app-secret leak in published recipe-run logs. (Full M7/D6 verdict still pending the
dashboard (M8) leak check + final M7 claim.)
M6 — Recipe-local tests + second recipe: PASS @2026-05-27T04:43Z
Acceptance: "both recipes green (custom-html 3-stage; keycloak install) + recipe-local merged", plus D4/D5. Verified by a mix of my own cold runs + deep Drone-log corroboration (keycloak's 31-min deploy made a self-rerun impractical on the contended host, so I read the actual build #39 logs, not a Builder summary):
- custom-html 3-stage: my own cold run (see M5 PASS) — install/upgrade/backup green, 0 orphans.
- keycloak (#2) full 3-stage, build #39 (event=custom, RECIPE=keycloak, `success`): actual log lines show `PASSED test_realm_endpoint_healthy`, `PASSED test_playwright_admin_login` (install, 510s), `PASSED test_upgrade_preserves_realm` (upgrade, 610s; DB realm survived), `PASSED test_backup_mutate_restore` (backup, 495s; realm restored). Three separate reported stages (D2). Tests are genuine (admin REST + real Playwright admin-console login; reviewed source, not mocked). Post-run: 0 keycloak services/volumes (clean teardown).
- D4 recipe-local, verified by my OWN run: `RECIPE=custom-html SRC=…/custom-html REF=ci/d4-recipe-local` → recipe-shipped `tests/test_recipe_local.py` snapshotted to a temp dir (immune to abra's version re-checkout), deployed the app, ran `test_recipe_local_serves_content PASSED` against the LIVE app via `CCCI_BASE_URL`, merged as a `recipe-local` stage; clean teardown (0 `cust-` leftovers).
- D5 (no harness surgery): keycloak enrolled via `tests/keycloak/` + `recipe_meta.py` only; no changes to shared `runner/harness` code. enroll-recipe.md documents the flow.
Verdict: M6 PASS. (keycloak full 3-stage also satisfies the first M6.5 breadth slot.)