status(2): Docker Hub rate-limit RESOLVED — declarative sops auth + swarm pulls authenticate (3 conditions); DECISIONS recorded

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 22:13:25 +01:00
parent 5e14963d51
commit 7a337f5d69
2 changed files with 66 additions and 20 deletions

@@ -521,3 +521,39 @@ readiness wait still gates real liveness. Safe for all currently-green recipes (
all N/N with N>0; the `0/0` case did not previously occur). Buckets/migrations that the one-shot
performs are run on-demand in the recipe's `setup_custom_tests.sh` (post-deploy), not relied upon for
generic-install convergence (the SPA at `/` serves 200 without them).
## 2026-05-28 — Docker Hub auth: declarative config.json via sops (rate-limit fix) — SETTLED
**Context.** Heavy Phase-2 recipe deploys exhausted Docker Hub's anonymous pull rate limit
(100/6h per shared IP `68.14.43.142`) → `toomanyrequests` blocked all new deploys. Operator
provided a read-only Docker Hub PAT (Class A1 registry creds, plan §1.5): `DOCKERHUB_USERNAME=nptest2`
+ `DOCKERHUB_TOKEN` in `/srv/cc-ci/.testenv`. Authenticated pulls = 200/6h **per-account**.
**Decision.** Wire it declaratively (survives a 1c NixOS rebuild), not just an imperative login:
- **Secret:** `secrets/secrets.yaml` (cc-ci-secrets submodule, commit `cdd5e0a`) gains key
`dockerhub_auth` = `base64("nptest2:<PAT>")` — i.e. the exact `auth` field docker config.json
wants, so the nix template is a pure render (no runtime base64). sops-encrypted to host+master age
recipients (edited on cc-ci using its ssh-host-key→age identity via `nix shell nixpkgs#sops`;
plaintext shredded; PAT never committed in plaintext or exposed in process args/logs).
- **Render:** `nix/modules/secrets.nix` adds `sops.secrets.dockerhub_auth` + a
`sops.templates."docker-config.json"` that renders `/root/.docker/config.json` (0600, root) at
activation. That path is a symlink to `/run/secrets/rendered/docker-config.json`.
- **Why /root:** the drone exec runner runs pipelines as `User=root` (drone-runner.nix), and manual
deploys ssh in as root — so `/root/.docker/config.json` covers both the `!testme` CI path and
manual ops. Single config, single user.
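
As a rough illustration of what the `dockerhub_auth` value and the rendered file contain, the sketch below builds both in plain shell. The helper names `make_auth` and `render_config` and the credentials are placeholders, not part of this repo; the real render is done by the sops template at activation. The `https://index.docker.io/v1/` key is assumed here as the conventional Docker Hub entry in `config.json`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: build the `auth` value docker expects, base64("user:token").
make_auth() {
  # printf avoids the trailing newline that echo would add; tr strips any
  # line wrapping that a base64 implementation may insert.
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Hypothetical helper: print a minimal config.json body for Docker Hub.
render_config() {
  printf '{\n  "auths": {\n    "https://index.docker.io/v1/": {\n      "auth": "%s"\n    }\n  }\n}\n' "$1"
}

# Placeholder credentials, NOT the real account or PAT.
auth="$(make_auth "exampleuser" "example-token")"
render_config "$auth"
```

Precomputing the base64 string once and storing it in sops is what lets the nix template stay a pure string substitution, as the Secret bullet describes.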
**Swarm-propagation question — RESOLVED empirically (no `--with-registry-auth` / pre-pull needed).**
The operator/Adversary flagged that a node-level `docker login` may NOT propagate to swarm
SERVICE-task pulls. Tested on cc-ci with the authenticated config.json in place:
- Account ratelimit baseline 197/200 (source = account hash `b662dd8b-…`, not the IP).
- Deployed **uncached** `n8nio/n8n:2.20.6` via abra (`RECIPE=n8n STAGES=install`). The swarm service
task pulled it to `1/1 Running` with **no `toomanyrequests`**.
- Account counter dropped 197 → 196 (manager manifest resolution) → **195** (agent layer-manifest
pull), source still the account hash. So abra's `docker stack deploy` propagates the cred to the
swarm task pull on this single-node swarm — billed to the account, not the anon IP.
- Corroborating: the earlier lasuite-drive deploy resolved **12** images with no `toomanyrequests`
while anon budget was ≤4 — impossible anonymously → manager resolution is authenticated too.
So: declarative root `config.json` is sufficient end-to-end here; `--with-registry-auth` is not
required (abra/SDK attaches the credential). **Caveat (Phase 2b):** 200/6h may still be tight for a
full ~18-recipe sweep; the permanent structural fix is a registry pull-through cache authenticated
with this same PAT.
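
The account-counter readings above come from Docker Hub's rate-limit response headers. A minimal sketch of reading them follows, assuming the headers are named `ratelimit-limit`, `ratelimit-remaining` and `docker-ratelimit-source` as in Docker's published rate-limit check. The `parse_ratelimit` and `sample_headers` names and the canned values are illustrative only; the source value is a placeholder, not the real account hash.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read HTTP response headers on stdin and print
# "<remaining>/<limit> source=<source>".
parse_ratelimit() {
  awk '
    {
      sub(/\r$/, "")                      # drop the CR of HTTP line endings
      i = index($0, ":")
      if (i == 0) next                    # status line or blank line
      name  = tolower(substr($0, 1, i - 1))
      value = substr($0, i + 1)
      sub(/^ +/, "", value)
    }
    name == "ratelimit-limit"         { split(value, a, ";"); limit = a[1] }
    name == "ratelimit-remaining"     { split(value, a, ";"); remaining = a[1] }
    name == "docker-ratelimit-source" { source = value }
    END { printf "%s/%s source=%s\n", remaining, limit, source }
  '
}

# Canned headers for an offline run; values are illustrative.
sample_headers() {
  printf 'HTTP/1.1 200 OK\r\n'
  printf 'ratelimit-limit: 200;w=21600\r\n'
  printf 'ratelimit-remaining: 197;w=21600\r\n'
  printf 'docker-ratelimit-source: example-account-uuid\r\n'
  printf '\r\n'
}

sample_headers | parse_ratelimit
```

Against the live registry, the same function could be fed the output of a `curl --head` request for a manifest, sent with a bearer token obtained using the PAT. A source value that is an account identifier rather than an IP address is what indicates the pull was billed to the account.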