The TCP READY_PROBE proves 64738 is listening, but the murmur control channel needs more warmup to
complete a full TLS+ServerSync handshake; under concurrent sweep load that exceeded the 60s budget
(green in isolation, red under load). Longer budget absorbs the delay; assertions unchanged (a dead
server still fails after all retries).
canonical_domain() routes any recipe in warm.WARM_DOMAINS (keycloak) to a distinct warm-canon-<recipe>
domain so the data-warm canonical promote can never collide with the live-warm OIDC provider at
warm-keycloak. keycloak WARM_CANONICAL=True (full canonical coverage without risking live SSO).
history_for() now enumerates run dirs' results.json, groups by recipe, sorts
newest-first by finished timestamp (mixed numeric+named ids — timestamp is the
only correct key), caps at HISTORY_CAP=30, skips malformed/empty/no-recipe dirs.
Overview + badges + /runs + security guards + stdlib-only unchanged.
Local verify: 13/13 unit tests; full-fixture vs 308 real results.json →
bluesky-pds=8 in exact ts order, plausible capped 30 newest, edge dirs skipped.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
plausible's canonical is established at 3.1.0+v2.0.0 (latest), so the dynamic resolver no longer
needs the explicit pin: a same-version head steps back to newest-older = 3.0.1+v2.0.0 (NOT the
broken 3.0.0). Verifying live before stripping the key globally (§2.G gate).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
keycloak is the always-on shared OIDC dep provider at warm-keycloak.ci..., the SAME stable domain a
data-warm canonical would use → the sweep's promote would collide with the live provider that
lasuite-*/drone depend on. keycloak is kept current by roll_warm_infra (WC1.1) instead.
WARM_CANONICAL=False; exception recorded in DECISIONS. Enrolled set now 20.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
WARM_CANONICAL=True added to every recipe in cc-ci-plan/used-recipes.md (20 weekly +
uptime-kuma external). enrolled_recipes() now returns all 21. Test fixtures
(custom-html-*-bad, concurrency, regression) intentionally left unenrolled.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- should_promote_canonical gains a `tagged` requirement (canon §2.A): a green cold
latest run promotes only when the tested head version is a published release tag;
an untagged main commit never becomes a canonical.
- warm_reconcile.is_released_version(recipe, version): release-tag membership (exact or
by version_key). Caller computes `tagged` so the gate stays pure.
- unit tests: untagged -> no promote; is_released_version cases.
- drive-by (pre-existing reds, unrelated to canon, now green): test_warm_reconcile
traefik assertion was stale vs the phase-pxgate spec (probes /api/version, no
health_domain); meta.py UPGRADE_BASE_VERSION KEYS help synced to the prevb doc text.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
resolve_upgrade_base now reads the head's published version (abra.head_compose_version,
the coop-cloud.<stack>.version label) and, when the last-green warm-canonical version
equals it, steps back to the newest published version strictly older than head instead
of deploying a same-version no-op. warm_reconcile gains version_key + newest_older_version
(single coop-cloud ordering source; sort_versions refactored onto version_key, no behavior
change). Skip only when no older published predecessor exists. Step-back returns kind=version
so it inherits F1d-2 pinned-tag checkout. Extends tests/unit/test_upgrade_base.py (13 pass).
The custom tier runs on the PR head — now genuinely the official discourse/discourse image (prevb
stopped the overlay reverting it to bitnamilegacy). mint_admin hardcoded /opt/bitnami/discourse (404 on
official) → create-topic roundtrip failed. Detect /var/www/discourse, re-export DISCOURSE_DB_PASSWORD
from /run/secrets (entrypoint exports it only for boot), run bin/rails; keep bitnami fallback.
docker stack deploy doesn't prune services the head compose dropped (discourse PR#4 drops sidekiq),
leaving them orphaned on the base image. perform_upgrade now reconciles the live stack to the head
compose service set (lifecycle.prune_orphan_services). Makes the deployed stack faithfully reflect
the head — no test weakened. No-op when service sets match / compose unresolvable.
abra does NOT write STACK_NAME to the app's .env file — it derives it at runtime
by replacing dots with underscores (e.g. gite-e1cb78.ci.commoninternet.net →
gite-e1cb78_ci_commoninternet_net). Build #691 failed with 'STACK_NAME not found'
because the env file read was looking for a key that doesn't exist.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Blocker 4 fix: abra `secret generate --all` uses .env.sample for length hints; the
lfs-plain-gitea PR has SECRET_LFS_JWT_SECRET_VERSION=v1 COMMENTED OUT, so abra produces
a wrong-length secret. gitea requires exactly 43 chars (32 bytes base64 URL-safe); wrong
length → gitea fatals trying to save the JWT secret to the read-only Docker Config
app.ini → health check fails → swarm rolls back.
Fix: new UPGRADE_SECRET_PREP hook (meta.py) called before `abra secret generate --all`
in the upgrade path. abra's `--all` is idempotent (skips existing secrets), so the
correctly pre-inserted secret survives. gitea's recipe_meta.py implements the hook using
`docker secret create` directly to guarantee correct format regardless of .env.sample.
Also consumes machine-docs/BUILDER-INBOX.md (Adversary Blocker 4 digest).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Blocker 1 (LFS roundtrip fails on PR #1):
- Add UPGRADE_EXTRA_ENV to gitea recipe_meta.py — after PR-head checkout
(compose.lfs.yml now in ABRA_DIR), add compose.lfs.yml to COMPOSE_FILE
and set SECRET_LFS_JWT_SECRET_VERSION=v1 so the upgrade chaos redeploy
actually runs with LFS enabled. Without this, the base install checks out
the 3.5.x tag (compose.lfs.yml removed), EXTRA_ENV sees no LFS, and the
upgrade chaos redeploy inherits the no-LFS .env — so the LFS test runs
(compose.lfs.yml is restored by recipe_checkout_ref) but LFS is off.
- Add abra.secret_generate(domain) in generic.perform_upgrade when
upgrade_env is non-empty — generates lfs_jwt_secret before chaos redeploy.
Blocker 2 (REF=main upgrade fails HC1):
- Always use recipe_head_commit (git rev-parse HEAD) for head_ref instead
of using ref directly. When ref="main" (a branch name), the HC1 commit
check "head_ref.startswith(chaos_commit)" always fails since "main" ≠ SHA.
recipe_head_commit returns the actual SHA after the fetch/checkout.
Side-fix (stale creds — build #675):
- ops.py pre_install: delete the per-domain creds file before calling
_ensure_admin. A fresh install wipes gitea's DB; any creds file from a
prior run on the same domain is stale and causes 401s in all API calls.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Gitea 1.22+ (including 1.24.2 on cc-ci) requires explicit scopes
when creating API tokens. Add read:user + read:organization to satisfy
the token creation endpoint and the read-back assertions that follow.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Empty-repo HTTPS push with git clone exits 0 but silently fails (remote
branch creation on an empty clone is unreliable). Fix:
- Create repo with auto_init=True + default_branch=main (initial commit present)
- Clone into a non-existing subdir (git clone must target non-existing path)
- Push via explicit cred_url (bypasses remote config; no tracking needed)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- test_git_push.py + test_lfs_roundtrip.py: use cred_url (https://user:pass@host/...)
instead of GIT_CONFIG_COUNT insteadOf rewriting, which silently failed to
propagate credentials to the push step (repo remained empty after push exit 0).
Also add GIT_SSL_NO_VERIFY=true and GIT_TERMINAL_PROMPT=0.
- test_lfs_roundtrip.py: fix restart health-poll path /api/v1/version → /version
(_api() already prepends /api/v1; double prefix produced 404 and a 120s timeout).
- nix/hosts/cc-ci/configuration.nix: add git-lfs to systemPackages (required for
the LFS capstone test on the lfs-plain-gitea PR branch).
Adversary pre-M1 findings: Issue 1 (git-lfs absent) + Issue 2 (double path) both fixed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_csrf is a hidden field; wait_for_selector defaults to state=visible
and times out. Switch to the visible username input which proves the
login form rendered.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When _enrich_deps_with_sso raises after deploy_deps succeeds (e.g., gitea API
call fails), deps_state stays {} and the finally block's `if deps_state:` guard
skips teardown, orphaning the dep at its deterministic domain.
Fix: add an `else` branch after the `if deps_state:` block that reads
$CCCI_DEPS_FILE (the legacy-list written by deploy_deps) and calls
teardown_deps on the cold entries so no dep is left running.
Unit tests: test_load_run_state_provides_fallback_for_enrichment_failure and
test_fallback_skips_warm_entries verify the data-flow that the fallback relies on.
19/19 unit tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
test_scm_configured.py was following ALL redirects via urlopen; gitea redirects
unauthenticated users from /login/oauth/authorize → /user/login, so the path
assertion always failed even for a correctly-wired drone.
Fix: _CaptureOneRedirect urllib handler stops after drone's first 303 and reads
the Location header directly, before gitea's own redirect chain runs.
- Consume BUILDER-INBOX.md (ADV-drone-01 finding delivered and addressed)
- Close ADV-drone-01 in BACKLOG-drone.md
- Update test_gitea_dep.py terminology: "location_url" not "final_url"
- All 10 unit tests pass
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
level.py: RUNGS += lint; statuses {pass,fail,skip,unver}; compute_level = max passed
rung with all below pass-or-skip (fail/unver block); cap_reason/capped DELETED.
harness/lint.py: lint executor — pristine scratch clone of the per-run tree at the
exact tested ref (mirror-origin + untracked-overlay pollution solved by context, no
rule filtered), PTY via script -qec, 60s hard budget, lint.txt artifact, table-parse
classifier (rc only signals FATA), unver on any non-run (never silent pass).
results.py: derive_rungs classifies every N/A source (structural/declared → skip,
else unver), lint rung + synthetic lint stage + lint block in results.json, schema 2,
cap fields removed. run_recipe_ci.py: lint call before tiers (double-wrapped,
verdict-neutral), badge = level only. card/dashboard: 0-5 ramp, cap line → 'level N
of {4|5}', unverified rows, badge number+colour only, lint.txt servable, old schema-1
artifacts render untouched. Unit suite rewritten: 245 passed on cc-ci venv.
The P2b port of setup_custom_tests.sh -> ops.py::pre_install made the 90s bucket-poll timeout a
fatal AssertionError; the original shell hook fell through on timeout BY DESIGN (best-effort) and
the custom-tier MinIO storage test is the real gate for a genuinely missing bucket. Live evidence:
in both M2 sweep failures the bucket landed just after the window and every later tier including
the custom MinIO test passed. Warn loudly + continue, exactly the old semantics.
Adversary-approved fix-forward (REVIEW-rcust 57c66ad, scoped to this raise).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Adversary heads-up (inbox 2026-06-10T19:06Z): meta values are repo-public by construction, but
the manifest lands on the dashboard — a field literally named SECRET_KEY_BASE showing a value
(plausible's committed CI dummy) is needless secret-scan noise. Mask values whose key NAME is
secret-shaped (SECRET|PASSWORD|TOKEN|CREDENTIAL|word-segment KEY), top-level and nested dict
keys; the key name stays visible. Unit test pins redacted vs passthrough (KEYCLOAK_URL).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>