- set_env: ensure trailing newline before append (keycloak .env.sample ends
with a newline-less #COMPOSE_FILE comment, so a bare append glued DOMAIN onto
it -> DOMAIN unset -> KC_HOSTNAME=https:// -> crash-loop). Same bite fixed in
backupbot.nix.
- converge skips the (forced) redeploy when keycloak already serves 200, so an
activation/boot is a true no-op (no JVM-restart blip) and only redeploys when
down/crash-looping. Health-wait extended to 15min.
Verified on cc-ci: nixos-rebuild switch -> warm-keycloak.service active,
'no-op converge', system running (0 failed), /realms/master=200.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
nix/modules/warm-keycloak.nix: idempotent systemd oneshot (like deploy-proxy)
that converges a live-warm shared keycloak at warm-keycloak.ci.commoninternet.net
pinned to 10.7.1+26.6.2, secrets generated only-if-missing (never
rotate a live provider), waits /realms/master=200. Re-warmable from scratch
(D8/WC8). Wired into hosts/cc-ci/configuration.nix.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sso.py: list_realms, delete_keycloak_realm (idempotent, refuses master),
realms_to_reap (pure, concurrency-safe predicate), reap_orphaned_realms.
The per-run realm is the isolation unit on a shared live-warm keycloak;
orphans (crashed runs) reaped by hex not mapping to a live app stack.
+8 unit tests (tests/unit/test_warm_realm.py); 43 unit pass on cc-ci.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
In-flight Q3.2 iteration (NOT yet live-verified — needs a lasuite-drive deploy
once the warm keycloak from Phase 2w is available). Phase 2 paused here per
operator interjection of Phase 2w; state preserved.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When a DEPS-declaring recipe's setup_custom_tests fails, its @requires_deps (SSO/OIDC)
tests skip; a skip-only pytest file exits 0 so the run previously reported overall=0
(GREEN) while the only SSO test never ran (violates P7). Fix preserves generic-tier
failure-isolation but corrects the green SIGNAL:
- conftest.pytest_collection_modifyitems counts skipped requires_deps tests and appends
to $CCCI_DEPS_SKIP_REPORT.
- run_recipe_ci: sums the count, surfaces it in RUN SUMMARY, and new pure predicate
sso_dep_unverified(declared, deps_ready, skipped) flips overall=1.
- 7 new unit tests (tests/unit/test_f211_sso_skip.py).
Verified deploy-free (rate-limit-independent): 35/35 unit PASS; cold real-test proof on
lasuite-docs test_oidc_with_keycloak.py -> 1 skipped + skip-report==1 -> orchestrator
would set overall=1. Full e2e deferred until Docker Hub rate limit lifts.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per orchestrator's SSO-dep plan + the refactor in 41ede13, DEFERRED.md entry #5 (lasuite-docs
OIDC parity ports + create-a-doc) closes by execution.
- tests/lasuite-docs/functional/test_oidc_login.py: parity port of recipe-maintainer
oidc_login.py. Anonymous GET /api/v1.0/users/me/ → 302 to keycloak realm OR 401/403;
password-grant token → 200 with user.email matching the provisioned test user.
- tests/lasuite-docs/functional/test_create_doc.py: plan §4.3 prescribed create-an-object +
read-it-back. POST /api/v1.0/documents/ with OIDC Bearer → captured id; GET
/api/v1.0/documents/<id>/ → asserts id+title round-trip.
Both marked \@pytest.mark.requires_deps; skipped with 'deps-not-ready' if setup_custom_tests
fails (failure isolation per plan-sso-dep-testing.md §4).
Cold-verifiable: ssh cc-ci 'RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py'
install: 2 PASS; custom: 5 PASS incl. test_oidc_login_via_keycloak +
test_create_doc_and_read_back; deploy-count=2 (recipe + keycloak dep).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tests/plausible/recipe_meta.py + tests/plausible/functional/test_health_check.py drafted with
EXTRA_ENV setting required Phoenix vars (DISABLE_AUTH, DISABLE_REGISTRATION, SECRET_KEY_BASE).
Stack converges 1/1 but the served app returns HTTP 500 from / for the full 600s HTTP_TIMEOUT
window — config-class failure, not a deploy-timing issue. Diagnosing needs live container-log
inspection + iterative env tuning, more debug cycles than fit autonomous mode. Committing the
draft + a DEFERRED.md entry; operator can iterate when they want.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Harness change (small, surgical):
- runner/harness/lifecycle.deploy_app gains a deploy_timeout param (default 900s); passes
through to abra.deploy(timeout=...). For heavy recipes (ghost, matrix-synapse, lasuite-meet),
the orchestrator + dep resolver now read recipe_meta.DEPLOY_TIMEOUT and pass it so the Python
subprocess wrapping abra deploy doesn't SIGKILL it before the recipe's INTERNAL TIMEOUT
(via EXTRA_ENV) finishes swarm convergence.
- runner/run_recipe_ci.py + runner/harness/deps.py: thread recipe_meta.DEPLOY_TIMEOUT into
the per-recipe deploy_app call.
Q4.4 ghost enrollment:
- recipe_meta.py: HEALTH_PATH=/, DEPLOY_TIMEOUT=1200 (subprocess), EXTRA_ENV={TIMEOUT: 1200}
(recipe internal). Ghost cold-start with theme + DB migration runs ~12-15min on cc-ci.
- functional/test_health_check.py: GET / returns 200 (themed site).
- functional/test_content_api.py: GET /ghost/api/content/settings/ returns 200 (settings JSON)
or 401/403 (Ghost error envelope) — distinguishes ghost-server up + JSON API working from
static fallback.
- functional/test_admin_redirect.py: GET /ghost/ returns 200 or 302 + Ghost branding;
proves admin route is wired through nginx proxy.
- PARITY.md: recipe-maintainer corpus has no ghost tests/, Phase-2 health_check is the
parity baseline; create-a-post deeper test deferred (DEFERRED.md, --extra-tests linked).
Cold-verifiable (log /root/ccci-q44-ghost-r3.log):
RECIPE=ghost STAGES=install,custom cc-ci-run runner/run_recipe_ci.py
install + 3 functional tests PASS, deploy-count=1. 28/28 unit tests still PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per orchestrator note: my prior append (commit 650ab47) accidentally landed under the
'## Closed deferrals' header instead of '## Open deferrals'. All 5 entries (lasuite-docs OIDC
parity, cryptpad create-a-pad, uptime-kuma create-a-monitor, ghost create-a-post, authentik
enrollment) are still OPEN (unchecked boxes) — section relocation only, no content change.
'## Closed deferrals' restored to its (none yet) placeholder.
Per orchestrator note: machine-docs/DEFERRED.md is now the single canonical registry for any
deliberately-deferred work. Every entry MUST carry a specific RE-ENTRY TRIGGER. The orchestrator
seeded 4 matrix-synapse entries; this commit migrates the other Phase-2 deferrals I'd buried
in JOURNAL/PARITY/DECISIONS:
- lasuite-docs OIDC parity ports + create-a-doc (re-entry: before any Q3 gate claim — Adversary
already flagged this in Q3/Q4 checkpoint).
- cryptpad create-a-pad + content round-trip Playwright (re-entry: Adversary F2-9 conditional —
MUST lift before Phase-2 DONE; Q5.2 cold-sample must include).
- uptime-kuma create-a-monitor via Socket.IO (re-entry: --extra-tests flag OR another recipe
needing Socket.IO).
- ghost create-a-post round-trip (re-entry: --extra-tests flag OR Q4 deeper-test pass before
Phase-2 DONE).
- Q2.2 authentik enrollment + setup_authentik_realm backend (re-entry: when cryptpad oidc_login
parity lifts — uses authentik — OR Phase-2 DONE review).
All linked to IDEAS.md --extra-tests flag where relevant. Phase-4 cleanup pass MUST review this
file per plan.md §6.1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per REVIEW-2 ## Q3/Q4 partial checkpoint, F2-8: 'goat CLI in container / account state cleanup'
was the §7.1-prohibited 'needs X' excuse class (same shape as F2-4). The recipe-maintainer
corpus literally calls the goat CLI via abra app run — it works fine.
Added tests/bluesky-pds/functional/test_account_and_post.py:
- goat pds describe → assert did:web:<live_app> in output (PDS self-identifies correctly).
- goat pds admin account create with UUID-suffixed handle + email + per-run password (class-B);
parse new account's did:plc:<id>.
- POST /xrpc/com.atproto.server.createSession with the new handle+password → accessJwt.
- POST /xrpc/com.atproto.repo.createRecord (collection=app.bsky.feed.post) with a UUID-marker
text → returns at://<did>/app.bsky.feed.post/<rkey>.
- GET /xrpc/com.atproto.repo.getRecord with that rkey → assert value.text == marker (round-trip).
- Best-effort goat account delete cleanup in finally.
This is the §4.3 prescribed test in full (create account + create post + fetch back + delete).
Cold-verifiable: ssh cc-ci 'RECIPE=bluesky-pds STAGES=install,custom cc-ci-run runner/run_recipe_ci.py'
install + 4 functional tests (health_check + describe_server + session_auth + account_and_post)
all PASS, deploy-count=1.
PARITY.md updated to show goat_account.py as ported.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code-only commit. The Phase-2 functional tests + PARITY.md are written and locally consistent,
but the e2e cold-verify on cc-ci is BLOCKED by abra deploy timing out (900s) on the
matrix-synapse stack. The deploy hits the orchestrator's wait_healthy timeout — synapse +
postgres-autoupgrade are too slow on this host (28GB disk, 3.5GB RAM, single node).
Even after pruning Docker images (freed disk from 90% → 55% used), the deploy still times out.
Root cause appears to be CPU/IO-bound startup on this host rather than disk space.
What's landed (code-only):
- tests/matrix-synapse/PARITY.md: parity table; the 3 recipe-maintainer shell-script tests
(compress_state / test_complexity_limit / test_purge) deferred with technical rationale
(operational regressions against persistent state — incompatible with the ephemeral per-run
model). Phase-2 health_check added (the corpus has no health_check.py).
- tests/matrix-synapse/functional/test_health_check.py: GET /_matrix/client/versions → 200 + JSON.
- tests/matrix-synapse/functional/test_federation_version.py: GET /_matrix/federation/v1/version
→ 200, asserts server.name='Synapse' + non-empty server.version (plan §4.3 prescribed).
- tests/matrix-synapse/functional/test_register_and_message.py: plan §4.3 prescribed test —
registers two users via the public client API (m.login.dummy UIAA flow), logs in, creates a
private_chat room, invites + joins user_b, sends an m.room.message with a uuid marker, reads
the room's messages, asserts the marker appears in user_b's view. Non-vacuous full client-API
roundtrip.
- tests/matrix-synapse/recipe_meta.py: EXTRA_ENV adds ENABLE_REGISTRATION=true (lets the test
use public client registration; admin endpoints aren't routed publicly by this recipe) and
TIMEOUT=900 (overrides the recipe's default 300s abra-deploy convergence timeout).
**Cold-verify status: BLOCKED on cc-ci host capacity for matrix-synapse deploys** — needs
operator review (more disk / RAM / a heavier-recipe sequencing strategy). Filed in JOURNAL-2 +
PushNotification.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Q3.1 lasuite-docs: parity + 2 specific (oidc_with_keycloak + auth_required); deeper oidc_login
+ upload_conversion + create-a-doc need lasuite-docs OIDC env wiring (install_steps.sh). Tracked.
Q3.4 cryptpad: parity + 2 specific (spa_assets + Playwright render); §4.3-prescribed create-pad
deeper test deferred with technical rationale (version-specific UI selectors). DECISIONS.md
Phase-2 Q3.4 section logs the deferral for Adversary sign-off per §7.1.
Both meet the ≥2 specific floor; both have open follow-ups documented for the Q3 gate (and/or
Q5 catch-up).
Initial Q3.4 (commit 0fb1458) shipped two tests that failed cold:
- test_api_config.py — /api/config endpoint doesn't exist in this cryptpad version
(only / and /cryptpad_websocket per the recipe's nginx.conf.tmpl). REMOVED.
- test_pad_create.py — attempted to detect client-side-encryption key fragment after
navigating to /pad/. CryptPad's pad-creation flow is version-specific; this release
(10.6.0+5.7.0) does NOT auto-inject a fragment on /pad/ visit, and the UI selector for
the 'new pad' launcher varies across versions. Deeper test deferred.
Revised:
- tests/cryptpad/functional/test_spa_assets.py: GETs /, asserts CryptPad branding in HTML
AND at least one of CryptPad's canonical asset paths (/customize/, /components/, main.js,
/api/broadcast). Non-vacuous: catches the wedged-cryptpad-server-fallback-page case.
- tests/cryptpad/playwright/test_pad_create.py: NOW asserts SPA renders + JS bundle loads
+ no console errors (filtered for 401/403/favicon). Documents the create-pad deeper test
as deferred in-file. The maximal testable subset per §7.1 is what's shipped here.
- PARITY.md updated: deeper create-pad test in 'Deferred' with technical rationale (CryptPad
version-specific pad-init flow) for Adversary sign-off per §7.1.
Cold-verifiable on cc-ci (log /root/ccci-q34-cryptpad-r4.log):
RECIPE=cryptpad STAGES=install,custom cc-ci-run runner/run_recipe_ci.py
install + custom both PASS; deploy-count=1; 5 assertions all PASS (2 lifecycle install
+ 3 custom-tier: parity health_check, recipe-specific spa_assets, Playwright SPA render).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>