status(2): Docker Hub rate-limit RESOLVED — declarative sops auth + swarm pulls authenticate (3 conditions); DECISIONS recorded
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -521,3 +521,39 @@ readiness wait still gates real liveness. Safe for all currently-green recipes (
all N/N with N>0; the `0/0` case did not previously occur). Buckets/migrations that the one-shot
performs are run on-demand in the recipe's `setup_custom_tests.sh` (post-deploy), not relied upon for
generic-install convergence (the SPA at `/` serves 200 without them).

## 2026-05-28 — Docker Hub auth: declarative config.json via sops (rate-limit fix) — SETTLED

**Context.** Heavy Phase-2 recipe deploys exhausted Docker Hub's anonymous pull rate limit
(100/6h per shared IP `68.14.43.142`), so `toomanyrequests` errors blocked all new deploys. The operator
provided a read-only Docker Hub PAT (Class A1 registry creds, plan §1.5): `DOCKERHUB_USERNAME=nptest2`
+ `DOCKERHUB_TOKEN` in `/srv/cc-ci/.testenv`. Authenticated pulls = 200/6h **per-account**.

**Decision.** Wire it declaratively (survives a 1c NixOS rebuild), not just an imperative login:
- **Secret:** `secrets/secrets.yaml` (cc-ci-secrets submodule, commit `cdd5e0a`) gains key
`dockerhub_auth` = `base64("nptest2:<PAT>")`, i.e. the exact `auth` field that Docker's `config.json`
expects, so the nix template is a pure render (no runtime base64). The value is sops-encrypted to the host+master age
recipients (edited on cc-ci using its ssh-host-key→age identity via `nix shell nixpkgs#sops`;
plaintext shredded; the PAT was never committed in plaintext nor exposed in process args/logs).
- **Render:** `nix/modules/secrets.nix` adds `sops.secrets.dockerhub_auth` + a
`sops.templates."docker-config.json"` that renders `/root/.docker/config.json` (0600, root) at
activation. That path becomes a symlink to `/run/secrets/rendered/docker-config.json`.
- **Why /root:** the drone exec runner runs pipelines as `User=root` (drone-runner.nix), and manual
deploys ssh in as root, so `/root/.docker/config.json` covers both the `!testme` CI path and
manual ops. Single config, single user.
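
As a rough illustration of the **Secret** and **Render** bullets, the sketch below builds the `auth` value as `base64("user:token")` and shows the shape of a minimal `config.json` that carries it. The registry key `https://index.docker.io/v1/` is the one `docker login` normally writes for Docker Hub; that the nix template uses exactly this key is an assumption, and the token shown is a placeholder, not the real PAT.

```python
import base64
import json


def dockerhub_auth_field(username: str, token: str) -> str:
    """Return base64("username:token"), the form Docker stores in the `auth` field."""
    raw = f"{username}:{token}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def render_docker_config(auth: str) -> str:
    """Render a minimal config.json body around a precomputed `auth` value.

    Assumption: the Docker Hub entry is keyed by https://index.docker.io/v1/.
    """
    config = {"auths": {"https://index.docker.io/v1/": {"auth": auth}}}
    return json.dumps(config, indent=2)


# Placeholder credentials for illustration only.
auth = dockerhub_auth_field("nptest2", "example-token")
print(render_docker_config(auth))
```

Because the stored secret is already the finished `auth` string, the render step only has to substitute it into the JSON, which matches the "pure render (no runtime base64)" intent above.
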

**Swarm-propagation question: RESOLVED empirically (no `--with-registry-auth` / pre-pull needed).**
The operator/Adversary flagged that a node-level `docker login` may NOT propagate to swarm SERVICE-task
pulls. Tested on cc-ci with the authenticated config.json in place:
- Account ratelimit baseline 197/200 (source = account hash `b662dd8b-…`, not the IP).
- Deployed **uncached** `n8nio/n8n:2.20.6` via abra (`RECIPE=n8n STAGES=install`). The swarm service
task pulled it to `1/1 Running` with **no `toomanyrequests`**.
- Account counter dropped 197 → 196 (manager manifest resolution) → **195** (agent layer-manifest
pull), source still the account hash. So abra's `docker stack deploy` propagates the cred to the
swarm task pull on this single-node swarm, and the pull is billed to the account, not the anon IP.
- Corroborating: the earlier lasuite-drive deploy resolved **12** images with no `toomanyrequests`
while anon budget was ≤4. That is impossible anonymously, so manager resolution is authenticated too.
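
The counter readings above can be interpreted with a small helper like the following sketch. It assumes the numbers were read from Docker Hub's documented rate-limit response headers (`ratelimit-limit`, `ratelimit-remaining`, `docker-ratelimit-source`); the entry does not say how they were collected, and the function names here are hypothetical. The helper treats a source that parses as an IP address as anonymous billing and anything else as account billing.

```python
import ipaddress


def split_count(value):
    """Split a header value such as '197;w=21600' into (count, window_seconds)."""
    count, _, window = value.partition(";w=")
    return int(count), (int(window) if window else None)


def parse_ratelimit(headers):
    """Summarise Docker Hub rate-limit headers from a registry response."""
    h = {name.lower(): value.strip() for name, value in headers.items()}
    limit, window = split_count(h["ratelimit-limit"])
    remaining, _ = split_count(h["ratelimit-remaining"])
    source = h.get("docker-ratelimit-source", "")
    if not source:
        billed_to = "unknown"
    else:
        try:
            ipaddress.ip_address(source)
            billed_to = "ip"        # anonymous pulls are counted per IP
        except ValueError:
            billed_to = "account"   # authenticated pulls are counted per account
    return {
        "limit": limit,
        "remaining": remaining,
        "window_s": window,
        "source": source,
        "billed_to": billed_to,
    }


# Sample values taken from the readings above; the account hash stays truncated.
baseline = parse_ratelimit({
    "RateLimit-Limit": "200;w=21600",
    "RateLimit-Remaining": "197;w=21600",
    "Docker-RateLimit-Source": "b662dd8b-…",
})
print(baseline["remaining"], baseline["billed_to"])
```

The window of 21600 seconds corresponds to the 6h period quoted in the context section.
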

So: declarative root `config.json` is sufficient end-to-end here; `--with-registry-auth` is not
required (abra/SDK attaches the registry auth itself). **Caveat (Phase 2b):** 200/6h may still be tight for a full ~18-recipe
sweep; the permanent structural fix is a registry pull-through cache authenticated with this same PAT.
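
To make the caveat concrete, here is a back-of-the-envelope sketch. It assumes every uncached image costs two counted requests, as observed once for the n8n deploy (197 → 195); whether that generalises to other images is not established here. The per-recipe image counts are hypothetical scenarios, not measurements.

```python
def pulls_needed(images_per_recipe, pulls_per_image=2):
    """Total counted requests if every image in every recipe is uncached."""
    return sum(images_per_recipe) * pulls_per_image


def fits_budget(images_per_recipe, budget=200, pulls_per_image=2):
    """True if the whole sweep fits inside one rate-limit window."""
    return pulls_needed(images_per_recipe, pulls_per_image) <= budget


# Hypothetical heavy sweep: 18 recipes, each as large as lasuite-drive (12 images).
heavy = [12] * 18
print(pulls_needed(heavy), fits_budget(heavy))   # 432 False

# Hypothetical light sweep: 18 recipes with a single uncached image each.
light = [1] * 18
print(pulls_needed(light), fits_budget(light))   # 36 True
```

Under these assumptions the sweep fits only while recipes average at most about five uncached images each (18 × 5 × 2 = 180), which is why a pull-through cache is the more durable fix.
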