Commit Graph

190 Commits

Author SHA1 Message Date
b29bb3f804 feat(samever): step back to older base when last-green canonical == head version
resolve_upgrade_base now reads the head's published version (abra.head_compose_version,
the coop-cloud.<stack>.version label) and, when the last-green warm-canonical version
equals it, steps back to the newest published version strictly older than head instead
of deploying a same-version no-op. warm_reconcile gains version_key + newest_older_version
(single coop-cloud ordering source; sort_versions refactored onto version_key, no behavior
change). Skip only when no older published predecessor exists. Step-back returns kind=version
so it inherits F1d-2 pinned-tag checkout. Extends tests/unit/test_upgrade_base.py (13 pass).
2026-06-17 04:24:14 +00:00
b66abc4978 fix(prevb): discourse custom mint_admin image-agnostic (official /var/www/discourse + DB-password re-export; bitnami fallback)
All checks were successful
continuous-integration/drone/push Build is passing
The custom tier runs on the PR head — now genuinely the official discourse/discourse image (prevb
stopped the overlay reverting it to bitnamilegacy). mint_admin hardcoded /opt/bitnami/discourse (404 on
official) → create-topic roundtrip failed. Detect /var/www/discourse, re-export DISCOURSE_DB_PASSWORD
from /run/secrets (entrypoint exports it only for boot), run bin/rails; keep bitnami fallback.
2026-06-17 01:20:41 +00:00
e1b32ea650 fix(prevb): prune orphan services on upgrade redeploy (head's dropped services); re-add EXPECTED_NA-other-rung test; consume Adversary inbox
All checks were successful
continuous-integration/drone/push Build is passing
docker stack deploy doesn't prune services the head compose dropped (discourse PR#4 drops sidekiq),
leaving them orphaned on the base image. perform_upgrade now reconciles the live stack to the head
compose service set (lifecycle.prune_orphan_services). Makes the deployed stack faithfully reflect
the head — no test weakened. No-op when service sets match / compose unresolvable.
2026-06-17 00:29:00 +00:00
bb2e3c6b2c feat(prevb): dynamic upgrade base (last-green→main→skip) + per-recipe previous/ overlay; migrate discourse off static base + leaky overlay
All checks were successful
continuous-integration/drone/push Build is passing
- resolve_upgrade_base: BasePlan(kind=version|ref|skip); last-green (warm canonical) primary,
  main-tip fallback, declared skip else. UPGRADE_BASE_VERSION retained as optional override.
- deploy_app: base_ref path (chaos-deploy a main-tip/last-green commit) + apply_previous wiring.
- lifecycle: previous/ surface (has_previous, previous_target_version, previous_status decision,
  provide/remove overlay, compose_file add/remove, recipe_branch_commit, stack_service_names).
- generic.perform_upgrade: strip previous/ overlay + COMPOSE_FILE entry before head redeploy.
- discourse: compose.ccci.yml now environmental-only (order: stop-first); removed bitnamilegacy
  pins + sidekiq + UPGRADE_BASE_VERSION; test_upgrade.py asserts head image == official 3.5.3 + no sidekiq.
- unit tests: resolve_upgrade_base matrix + previous/ apply/skip/stale + COMPOSE_FILE layering.
2026-06-17 00:15:06 +00:00
ad53b5a620 fix(gtea): derive STACK_NAME from domain (dots→underscores) in UPGRADE_SECRET_PREP
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is passing
abra does NOT write STACK_NAME to the app's .env file — it derives it at runtime
by replacing dots with underscores (e.g. gite-e1cb78.ci.commoninternet.net →
gite-e1cb78_ci_commoninternet_net). Build #691 failed with 'STACK_NAME not found'
because the env file read was looking for a key that doesn't exist.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 21:56:44 +00:00
2d865f06cb fix(gtea): ruff format + check all gtea files and bridge.py
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is passing
Clears cc-ci self-test lint failures:
- ruff format: 9 files reformatted (all gtea test files + test_discovery.py)
- ruff check --fix: bridge.py UP017 (datetime.UTC alias) + 6 gtea check errors
- manifest.py B007: rename unused loop variable path → _path (no auto-fix available)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 21:52:01 +00:00
d832b353e4 fix(gtea): UPGRADE_SECRET_PREP hook — pre-insert lfs_jwt_secret with correct 43-char format
Some checks failed
continuous-integration/drone/push Build is failing
Blocker 4 fix: abra `secret generate --all` uses .env.sample for length hints; the
lfs-plain-gitea PR has SECRET_LFS_JWT_SECRET_VERSION=v1 COMMENTED OUT, so abra produces
a wrong-length secret. gitea requires exactly 43 chars (32 bytes base64 URL-safe); wrong
length → gitea fatals trying to save the JWT secret to the read-only Docker Config
app.ini → health check fails → swarm rolls back.

Fix: new UPGRADE_SECRET_PREP hook (meta.py) called before `abra secret generate --all`
in the upgrade path. abra's `--all` is idempotent (skips existing secrets), so the
correctly pre-inserted secret survives. gitea's recipe_meta.py implements the hook using
`docker secret create` directly to guarantee correct format regardless of .env.sample.

Also consumes machine-docs/BUILDER-INBOX.md (Adversary Blocker 4 digest).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 21:46:28 +00:00
a121d2c069 fix(gtea): fix M2 blockers — LFS upgrade and REF=main HC1
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone Build is failing
Blocker 1 (LFS roundtrip fails on PR #1):
- Add UPGRADE_EXTRA_ENV to gitea recipe_meta.py — after PR-head checkout
  (compose.lfs.yml now in ABRA_DIR), add compose.lfs.yml to COMPOSE_FILE
  and set SECRET_LFS_JWT_SECRET_VERSION=v1 so the upgrade chaos redeploy
  actually runs with LFS enabled. Without this, the base install checks out
  the 3.5.x tag (compose.lfs.yml removed), EXTRA_ENV sees no LFS, and the
  upgrade chaos redeploy inherits the no-LFS .env — so the LFS test runs
  (compose.lfs.yml is restored by recipe_checkout_ref) but LFS is off.
- Add abra.secret_generate(domain) in generic.perform_upgrade when
  upgrade_env is non-empty — generates lfs_jwt_secret before chaos redeploy.

Blocker 2 (REF=main upgrade fails HC1):
- Always use recipe_head_commit (git rev-parse HEAD) for head_ref instead
  of using ref directly. When ref="main" (a branch name), the HC1 commit
  check "head_ref.startswith(chaos_commit)" always fails since "main" ≠ SHA.
  recipe_head_commit returns the actual SHA after the fetch/checkout.

Side-fix (stale creds — build #675):
- ops.py pre_install: delete the per-domain creds file before calling
  _ensure_admin. A fresh install wipes gitea's DB; any creds file from a
  prior run on the same domain is stale and causes 401s in all API calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 21:01:21 +00:00
74bc5f0106 fix(gtea): test_admin_api: add token scopes for gitea 1.22+
Some checks failed
continuous-integration/drone/push Build is failing
Gitea 1.22+ (including 1.24.2 on cc-ci) requires explicit scopes
when creating API tokens. Add read:user + read:organization to satisfy
the token creation endpoint and the read-back assertions that follow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 20:06:42 +00:00
3cc8338a78 fix(gtea): test_git_push: auto_init repo + direct URL push
Some checks failed
continuous-integration/drone/push Build is failing
Empty-repo HTTPS push with git clone exits 0 but silently fails (remote
branch creation on an empty clone is unreliable). Fix:
- Create repo with auto_init=True + default_branch=main (initial commit present)
- Clone into a non-existing subdir (git clone must target non-existing path)
- Push via explicit cred_url (bypasses remote config; no tracking needed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 20:04:48 +00:00
893a7b0eb4 fix(gtea): embed git credentials in URL; fix double /api/v1 path; add git-lfs
Some checks failed
continuous-integration/drone/push Build is failing
- test_git_push.py + test_lfs_roundtrip.py: use cred_url (https://user:pass@host/...)
  instead of GIT_CONFIG_COUNT insteadOf rewriting, which silently failed to
  propagate credentials to the push step (repo remained empty after push exit 0).
  Also add GIT_SSL_NO_VERIFY=true and GIT_TERMINAL_PROMPT=0.
- test_lfs_roundtrip.py: fix restart health-poll path /api/v1/version → /version
  (_api() already prepends /api/v1; double prefix produced 404 and a 120s timeout).
- nix/hosts/cc-ci/configuration.nix: add git-lfs to systemPackages (required for
  the LFS capstone test on the lfs-plain-gitea PR branch).

Adversary pre-M1 findings: Issue 1 (git-lfs absent) + Issue 2 (double path) both fixed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 20:01:31 +00:00
6ac9989140 fix(gtea): wait for visible input#user_name on gitea login page
Some checks failed
continuous-integration/drone/push Build is failing
_csrf is a hidden field; wait_for_selector defaults to state=visible
and times out. Switch to the visible username input which proves the
login form rendered.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 19:56:25 +00:00
33561c8609 feat(gtea): build full gitea test suite (M1 build — all files)
Some checks failed
continuous-integration/drone/push Build is failing
- tests/gitea/recipe_meta.py: updated from dep-provider stub to dual-role (dep + recipe-under-test).
  Adds BACKUP_CAPABLE=True, READY_PROBE (/api/v1/version), SCREENSHOT (sign-in page), LFS-
  conditional EXTRA_ENV (compose.lfs.yml + GITEA_LFS_START_SERVER only when RECIPE=gitea AND
  overlay present — dep path unchanged). All existing dep keys preserved; 10/10 dep unit tests pass.

- tests/gitea/ops.py: NEW — admin user creation via gitea CLI (ci_admin, creds in /tmp per-domain
  file), marker repo lifecycle (pre_install/pre_upgrade/pre_backup create; pre_restore deletes to
  diverge from backup state).

- tests/gitea/test_{install,upgrade,backup,restore}.py: NEW — lifecycle overlays. Install checks
  API + admin auth + Playwright sign-in. Upgrade/backup/restore assert marker repo continuity.

- tests/gitea/custom/: NEW — test_health.py (parity: HTTP 200 root), test_git_push.py (parity:
  create→clone→push→verify→delete), test_admin_api.py (beyond-parity: user+org+token CRUD),
  test_lfs_roundtrip.py (LFS OID round-trip + JWT stability; skips on main, runs on PR #1 head).

- tests/gitea/PARITY.md: NEW — mapping table, source note (recipe-info corpus not upstream repo),
  beyond-parity rationale, backup/restore real-tier note, DB choice, dep-split mechanism, LFS skip.

- machine-docs/STATUS-gtea.md: NEW — phase status (building M1).
- machine-docs/BACKLOG-gtea.md: merged with Adversary init.
- machine-docs/JOURNAL-gtea.md: Builder log with design decisions + unit test results.
- machine-docs/REVIEW-gtea.md: kept Adversary init content.
- machine-docs/DECISIONS.md: appended gtea section (LFS split, admin mgmt, marker design).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 19:50:08 +00:00
d44f799de9 fix(cfold): wait for ghost db in entrypoint
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone Build is passing
2026-06-13 03:58:59 +00:00
ee6b613ff3 fix(cfold): delay ghost app retry during db crossover
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone Build is failing
2026-06-13 02:18:17 +00:00
23f1861b7a fix(bridge): ignore pre-start trigger comments
Some checks failed
continuous-integration/drone/push Build is failing
2026-06-13 00:27:22 +00:00
44e02425ab feat(cfold): canonicalize custom test layout
Some checks failed
continuous-integration/drone/push Build is failing
2026-06-12 16:08:18 +00:00
1be74fb9e1 fix(lint): F821 undefined 'e' in test_scm_configured; shfmt/ruff auto-fixes
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is passing
- test_scm_configured.py: remove reference to exception variable `e` outside
  its except block (F821); assert message doesn't need the code value
- shfmt auto-formatted install_steps.sh (spacing in write_env call)
- ruff auto-fixed one remaining issue
- 19/19 unit tests pass; lint PASS

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-11 22:17:19 +00:00
0aa46dbe72 fix(drone-dep): ADV-drone-02 — teardown fallback when SSO enrichment fails after deploy
Some checks failed
continuous-integration/drone/push Build is failing
When _enrich_deps_with_sso raises after deploy_deps succeeds (e.g., gitea API
call fails), deps_state stays {} and the finally block's `if deps_state:` guard
skips teardown, orphaning the dep at its deterministic domain.

Fix: add an `else` branch after the `if deps_state:` block that reads
$CCCI_DEPS_FILE (the legacy-list written by deploy_deps) and calls
teardown_deps on the cold entries so no dep is left running.

Unit tests: test_load_run_state_provides_fallback_for_enrichment_failure and
test_fallback_skips_warm_entries verify the data-flow that the fallback relies on.
19/19 unit tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-11 22:03:29 +00:00
7e7e84df34 fix(drone): ADV-drone-01 — no-follow redirect pattern in SCM test
Some checks failed
continuous-integration/drone/push Build is failing
test_scm_configured.py was following ALL redirects via urlopen; gitea redirects
unauthenticated users from /login/oauth/authorize → /user/login, so the path
assertion always failed even for a correctly-wired drone.

Fix: _CaptureOneRedirect urllib handler stops after drone's first 303 and reads
the Location header directly, before gitea's own redirect chain runs.

- Consume BUILDER-INBOX.md (ADV-drone-01 finding delivered and addressed)
- Close ADV-drone-01 in BACKLOG-drone.md
- Update test_gitea_dep.py terminology: "location_url" not "final_url"
- All 10 unit tests pass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-11 21:48:36 +00:00
51c3280163 feat(drone): enroll drone + gitea SCM dep (M1 implementation)
Some checks failed
continuous-integration/drone/push Build is failing
- tests/gitea/recipe_meta.py: gitea as install-time dep provider; sqlite3
  overlay EXTRA_ENV, health path /api/healthz, relaxed access for CI use
- tests/drone/recipe_meta.py: DEPS=["gitea"]; health /healthz; 600s timeout
- tests/drone/install_steps.sh: wires GITEA_CLIENT_ID + GITEA_DOMAIN +
  client_secret Docker secret + DRONE_USER_CREATE before single drone deploy
- tests/drone/functional/test_scm_configured.py: Playwright-free SCM test —
  follows /login redirect, asserts final URL is gitea dep's OAuth2 authorize
  endpoint with matching client_id (per Adversary pre-probe REVIEW-drone.md)
- tests/drone/PARITY.md: backup structural-skip justified (no backupbot labels)
- runner/harness/sso.py: setup_gitea_oauth() — creates gitea admin user via
  CLI + OAuth2 app via API, returns {admin_user, admin_password, client_id,
  client_secret} for install_steps.sh consumption
- runner/run_recipe_ci.py: _enrich_deps_with_sso now handles gitea dep (calls
  setup_gitea_oauth; keycloak path unchanged)
- tests/unit/test_gitea_dep.py: unit tests for gitea dep path — meta loading,
  SSO routing, SCM redirect assertion logic (parametrized)
- machine-docs: STATUS/JOURNAL/BACKLOG-drone.md phase state files initialized

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-11 21:31:43 +00:00
b17b6f1232 claim(mailu): M2 — DEFERRED closed; PARITY.md updated with dual-volume evidence; operator summary written; PR#3 open for merge; awaiting Adversary fresh re-trigger
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone Build is passing
2026-06-11 21:03:51 +00:00
b9352e8313 fix(mailu): extend backup/restore seed to cover /mail Maildir volume (ADV-mailu-01)
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone Build is passing
2026-06-11 20:56:00 +00:00
1fbc4e0b15 fix(mailu): fix _mailu import path in ops.py+overlays (functional/ subdir)
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone Build is passing
2026-06-11 20:44:40 +00:00
4b5051f003 feat(mailu): add ops.py + backup/restore tests + update PARITY.md (P4 now covered via PR#3)
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone Build is failing
2026-06-11 20:41:33 +00:00
9afdf3de5a claim(kuma): M2 — build #462 LEVEL 5 PASS (flake #2); DEFERRED closed; PARITY updated
Some checks failed
continuous-integration/drone/push Build is failing
Second drone run #462: uptime-kuma@eb4521cc (PR #3) = LEVEL 5.
test_monitor_wizard [pass] in both #460 + #462 — flake check complete.
DEFERRED.md "uptime-kuma create-a-monitor" closed with build+commit pointers.
PARITY.md: new row for tests/uptime-kuma/playwright/test_monitor_wizard.py.
M1 Adversary PASS @2026-06-11T18:26Z (REVIEW-kuma.md).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-11 18:32:16 +00:00
8da59cff22 feat(kuma): implement wizard+monitor Playwright test (tests/uptime-kuma/playwright/)
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone Build is passing
Phase kuma M1 impl: resolves the 2026-05-28 DEFERRED uptime-kuma create-a-monitor item.

Approach: Playwright (option b) — python-socketio not in cc-ci Nix env; Playwright
handles Socket.IO transparently via the real browser. Selectors confirmed in 2.2.1
compiled bundle (data-cy setup wizard + data-testid monitor form/status badge).

Test flow (test_monitor_wizard_and_probe):
1. Setup wizard: admin create via data-cy form → auto-login → /dashboard
2. Create self-probe monitor (https://{live_app}/) → wait ≤90s for "Up" badge
3. Heartbeat table row check: isFirstBeat=important, row has real datetime stamp
4. Negative: dead-port monitor (http://127.0.0.1:19999/dead) → wait ≤60s for "Down"

All waits are bounded poll with page.wait_for_function/wait_for_url/wait_for_selector.
Admin password: 64-char UUID hex, never printed/logged.

Also: DECISIONS.md records Playwright choice; phase state files bootstrapped.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-11 18:15:13 +00:00
0cc31a507e fix(dstamp): discourse upgrade stop-first overlay (stop 2x-memory start-first OOM→spurious swarm rollback) + harness assert_upgrade_converged (detect rollback/pause → honest upgrade failure, HC1 unweakened). Root cause: failure_action:rollback reverted chaos-version label, masked by start-first+wait_healthy
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 17:07:38 +00:00
e9745c8c74 feat(bsky): EXPECTED_NA['upgrade'] suppresses the upgrade-tier base deploy — single deploy = PR head; bluesky-pds declares it (no deployable base: every published tag pins the republished moving :0.4). upgrade_base() extracted pure + 6 unit tests; meta-key doc regenerated. 253 unit tests + repo lint PASS
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is passing
2026-06-11 11:51:12 +00:00
68c3486216 fix(lvl5): lint executor PR-path — abra lint selects+checks out the repo DEFAULT BRANCH; scratch clone of a detached per-run tree has none (FATA, live 400-402), and a stale default would be silently linted instead of the PR head. Force local main AT the tested ref + repoint origin to the scratch itself (offline tag fetch, no drift). Regression test with detached two-commit source proves exact-ref content is linted. 247 unit tests green; real-abra detached-source smoke pass.
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is passing
2026-06-11 10:56:56 +00:00
1d3b61c6c2 fix(lvl5): lint table parser — abra renders HEAVY box verticals (┃ U+2503); accept both; meta registry EXPECTED_NA/BACKUP_CAPABLE wording → regenerated doc table
Some checks failed
continuous-integration/drone/push Build is failing
Found by real-abra smoke on cc-ci: hedgedoc clean → pass; +lightweight tag →
fail R014. Full suite 246 passed on cc-ci venv.
2026-06-11 07:49:29 +00:00
e219a7891d feat(lvl5): P1 — 5-rung ladder (L5=abra recipe lint) + de-capped level semantics
All checks were successful
continuous-integration/drone/push Build is passing
level.py: RUNGS += lint; statuses {pass,fail,skip,unver}; compute_level = max passed
rung with all below pass-or-skip (fail/unver block); cap_reason/capped DELETED.
harness/lint.py: lint executor — pristine scratch clone of the per-run tree at the
exact tested ref (mirror-origin + untracked-overlay pollution solved by context, no
rule filtered), PTY via script -qec, 60s hard budget, lint.txt artifact, table-parse
classifier (rc only signals FATA), unver on any non-run (never silent pass).
results.py: derive_rungs classifies every N/A source (structural/declared → skip,
else unver), lint rung + synthetic lint stage + lint block in results.json, schema 2,
cap fields removed. run_recipe_ci.py: lint call before tiers (double-wrapped,
verdict-neutral), badge = level only. card/dashboard: 0-5 ramp, cap line → 'level N
of {4|5}', unverified rows, badge number+colour only, lint.txt servable, old schema-1
artifacts render untouched. Unit suite rewritten: 245 passed on cc-ci venv.
2026-06-11 07:42:30 +00:00
3c33129ebd fix(shot): mattermost hook v2 — interstitial appears on ANY first-visit route incl /login (proven byte-identical PNG); click 'View in Browser' best-effort then settle; unit test covers click + no-interstitial fallback; 207 pass, lint PASS
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 06:45:43 +00:00
7ad7d1f20d fix(shot): A1 — blank-retry keeps the LARGER frame (retry snapped to temp path, os.replace only if >= first; worse late frame discarded + temp cleaned); regression test [9999,4801]->9999; 207 unit tests pass, lint PASS
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 06:24:01 +00:00
80e5713c5c feat(shot): mattermost-lts SCREENSHOT hook → /login (default lands the desktop-or-browser interstitial; watch-list wants the real sign-in form) + public screenshot.settle() for hooks; unit test via real loader; 206 unit tests pass, lint PASS
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 06:19:39 +00:00
b98a471dac fix(shot): plausible SECRET_KEY_BASE 62→68 chars — Phoenix cookie store requires >=64 bytes, so EVERY HTML render 500'd (the real cause of screenshot:null on all runs; /api/* unaffected which is why tiers passed). Default capture now lands the real registration page; verified: shot-fix-plausible run install=pass, screenshot.png 64132B real form, no hook needed
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is passing
2026-06-11 05:55:43 +00:00
ce50f641cc feat(shot): harness default capture fix — bounded networkidle settle after domcontentloaded + blank-frame retry (≤60s wait budget, R7 best-effort preserved); 6 unit tests; lint PASS, 205 unit tests pass via cc-ci-run
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 01:31:03 +00:00
be2026aafb fix(harness): services_converged — a replica deficit explained entirely by Complete tasks is converged (triggered one-shot, rcust M2 lasuite-drive root cause)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 00:26:53 +00:00
1357544301 fix(tests): restore best-effort semantics of lasuite-drive pre_install bucket trigger (rcust M2 regression)
All checks were successful
continuous-integration/drone/push Build is passing
The P2b port of setup_custom_tests.sh -> ops.py::pre_install made the 90s bucket-poll timeout a
fatal AssertionError; the original shell hook fell through on timeout BY DESIGN (best-effort) and
the custom-tier MinIO storage test is the real gate for a genuinely missing bucket. Live evidence:
in both M2 sweep failures the bucket landed just after the window and every later tier including
the custom MinIO test passed. Warn loudly + continue, exactly the old semantics.

Adversary-approved fix-forward (REVIEW-rcust 57c66ad, scoped to this raise).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 20:53:31 +00:00
858e0f582f fix(harness): redact secret-named meta values in the customization manifest (rcust)
All checks were successful
continuous-integration/drone/push Build is passing
Adversary heads-up (inbox 2026-06-10T19:06Z): meta values are repo-public by construction, but
the manifest lands on the dashboard — a field literally named SECRET_KEY_BASE showing a value
(plausible's committed CI dummy) is needless secret-scan noise. Mask values whose key NAME is
secret-shaped (SECRET|PASSWORD|TOKEN|CREDENTIAL|word-segment KEY), top-level and nested dict
keys; the key name stays visible. Unit test pins redacted vs passthrough (KEYCLOAK_URL).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 19:09:09 +00:00
68954be53e feat(harness): P5 — customization manifest (rcust)
All checks were successful
continuous-integration/drone/push Build is passing
One block at run start answering "what does this recipe customize?" across every surface
(non-default recipe_meta keys, ops.py pre-ops, install_steps.sh, compose.ccci.yml, lifecycle
overlays by source, custom-test counts, active CCCI_SKIP_GENERIC* env overrides — !!-flagged when
riding a CI run, P2c), printed to the run log and embedded verbatim in results.json under
"customization". Pure presentation — building/printing it never influences a verdict; the
manifest honors the HC2 repo-local gate so it never advertises code the run will not execute.

Unit tests: synthetic recipe exercising every surface -> complete + deterministic + JSON-clean;
HC2 invisibility; env-override flagging; render golden lines; build_results threads the dict
verbatim (key always present, None when absent).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 18:57:26 +00:00
29a28e2028 feat(harness): P4 — custom-test ergonomics (rcust)
All checks were successful
continuous-integration/drone/push Build is passing
Placement RULE: discovery.custom_tests covers ONLY functional/ + playwright/ —
the top-level test_*.py glob for recipe dirs is removed (top level is reserved
for lifecycle overlays; zero in-repo users of top-level custom tests, verified
by sweep). Lifecycle-name exclusion inside the subdirs stays as the double-run
safety net. HC2 default-deny unchanged (repo-local custom now pinned via
functional/ in the gate test).

New conftest fixture op_state: parses $CCCI_OP_STATE_FILE (op context: versions,
artifact paths), skipping with a clear reason when unset/absent/unparseable —
overlay tests read op facts from the fixture instead of hand-parsing env (zero
existing hand-parsers found; the fixture is the documented path forward). deps
fixture landed in P2d.

Unit tests: placement-rule discovery tests (top-level custom NOT discovered;
functional/playwright are; misfiled lifecycle names excluded), op_state fixture
contract (reads file / skips without env / skips on missing file), deps fixture
attribute sugar.

Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 184 passed; scripts/lint.sh -> PASS.
2026-06-10 17:14:21 +00:00
fd02d9f4b8 feat(harness): P3 — uniform ctx hook convention (rcust)
All checks were successful
continuous-integration/drone/push Build is passing
harness.meta.HookCtx (frozen): .domain, .base_url, .meta (RecipeMeta), .deps
(provisioned dep creds from $CCCI_DEPS_FILE or None), .op (current lifecycle op
or None); built via meta.hook_ctx() at each hook call site.

All recipe callables now take ctx: EXTRA_ENV(ctx), UPGRADE_EXTRA_ENV(ctx),
READY_PROBE(ctx), BACKUP_VERIFY(ctx), SCREENSHOT(page, ctx), ops.py pre_<op>(ctx).
Dict-valued EXTRA_ENV/UPGRADE_EXTRA_ENV unchanged (only the callable signature
moved). Call sites converted: deploy_app env shaping, perform_upgrade,
wait_ready_probes (gains op=), _perform_op BACKUP_VERIFY, screenshot.capture,
_run_pre_hook.

Legacy signatures fail FAST with a clear migration message: the registry carries
hook_params per hook key, enforced at meta.load() (MetaError names the old vs new
signature); ops.py pre-op hooks get the same check at the orchestrator call site
(meta.check_hook_signature) — no silent TypeError mid-run.

Migrated every in-repo user mechanically (17 ops.py files; cryptpad/lasuite-*/
mailu EXTRA_ENV; mumble+lasuite-drive READY_PROBE; ghost/discourse BACKUP_VERIFY)
— seeded values, probes and assertions byte-identical (domain -> ctx.domain;
keycloak pre_restore's meta arg -> ctx.meta).

Unit tests: hook_ctx field contract, ctx.deps from the run deps file, legacy-
signature MetaError (READY_PROBE/EXTRA_ENV/SCREENSHOT + pre-op checker), ctx
signatures accepted. Docs table regenerated (signature docs in key docs).

Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 180 passed; scripts/lint.sh -> PASS.
2026-06-10 17:10:26 +00:00
8cd72fd78d feat(harness): P2 — delete legacy customization keys & paths (rcust)
All checks were successful
continuous-integration/drone/push Build is passing
a) compose.ccci.yml is FIRST-CLASS: the harness auto-copies tests/<recipe>/
   compose.ccci.yml into the run's recipe checkout (ABRA_DIR-aware, lifecycle.
   provide_ccci_overlay) and auto-chaoses the pinned base deploy on its presence
   (kills the R7 implicit coupling). ghost/discourse install_steps.sh (copy-only
   boilerplate) deleted; CHAOS_BASE_DEPLOY removed from both metas + the registry.

b) install-time deps wiring is the ONLY mode: deps with DEPS provision BEFORE the
   single deploy; legacy post-deploy provisioning + the setup_custom_tests.sh
   invocation machinery deleted. lasuite-docs migrated to install_steps.sh OIDC
   wiring (same env names/values as the old hook — only the timing moved);
   lasuite-drive's remaining post-deploy MinIO bucket one-shot moved to ops.py
   pre_install; both setup_custom_tests.sh files deleted; OIDC_AT_INSTALL removed
   from drive/meet metas + the registry.

c) SKIP_GENERIC meta key deleted (zero users). Env form CCCI_SKIP_GENERIC* stays
   as the documented dev-only escape hatch; when active in a drone CI run the
   orchestrator prints a loud !! warning (manifest embedding lands in P5).

d) conftest cleanup: dead pre-deploy-once fixtures deployed/deployed_app deleted
   (zero users), app_domain + _short + _wait_healthy dropped (only users were the
   deleted fixtures); deps_apps+deps_creds consolidated into ONE deps fixture
   (entries expose .domain etc. as attributes; dict access intact); the 6 lasuite
   test files renamed deps_creds->deps (fixture name only — assertions and flows
   byte-identical). requires_deps marker + F2-11 skip-report plumbing unchanged.

Registry is now exactly the 14 final keys; docs §4 table regenerated. Stale
setup_custom_tests/OIDC_AT_INSTALL prose in docstrings/comments/assert MESSAGES
updated (no assert logic or expected value touched).

Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 175 passed; scripts/lint.sh -> PASS.
2026-06-10 17:01:33 +00:00
472a68b32c feat(harness): P1 — single registry-backed meta loader (rcust)
All checks were successful
continuous-integration/drone/push Build is passing
One loader: runner/harness/meta.py::load(recipe) -> RecipeMeta (frozen dataclass,
attribute access), backed by the declarative KEYS registry (14 final keys + 3
P2-deprecated). The ONLY exec() of tests/<recipe>/recipe_meta.py. Validation per
the locked decision: unknown ALL-CAPS top-level name or type mismatch = MetaError
(hard error at load); underscore-prefixed names recipe-private; callables only on
hook-typed keys.

Migrated all six legacy loaders (spec §4 L1–L6):
- run_recipe_ci.py::_load_meta deleted; orchestrator loads once, passes meta down
- tests/conftest.py::_recipe_meta deleted; meta fixture returns full RecipeMeta (R3)
- lifecycle.py::_recipe_extra_env/_recipe_meta_flag deleted; deploy_app takes meta
- deps.py::declared_deps deleted; callers read meta.DEPS
- canonical.py::is_enrolled reads through meta.load()
- screenshot.py now actually receives SCREENSHOT through the orchestrator path (R2
  fix; proven by unit test through the real load path)

Mumble private constants underscore-prefixed (_WELCOME_TEXT_MARKER/_MAX_USERS) +
importers fixed. New tests/unit/test_meta.py (all-recipes-load-clean typo gate,
MetaError cases, spec §2 baseline defaults, underscore exemption, doc sync). Docs
§4 key table now GENERATED from the registry (scripts/gen-meta-docs.py); drift
fails CI.

Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 175 passed; scripts/lint.sh -> PASS.
2026-06-10 16:46:58 +00:00
b6e12ef428 fix(harness): run-keyed run-scoped state files — CONC-A1 (same-domain runs corrupted shared deploy-count)
All checks were successful
continuous-integration/drone/push Build is passing
The four CCCI state files (deploys countfile, opstate, deps, depskip) were keyed
by app domain in shared /tmp. A second run of the same domain executes its main()
preamble + deploy_app's pre-lock _record_deploy BEFORE blocking at the app lock,
so it reset/polluted the live first run's counter (false DG4.1 deploy-count=2,
build 279) and the first run's end-of-run os.remove crashed the second
(FileNotFoundError, build 281). Masked pre-restructure by the end-to-end recipe
flock. Now keyed by run id + harness pid via _run_state_path(); children receive
exact paths via the CCCI_*_FILE env vars, so domain keying was never load-bearing.

tests/concurrency/test_run_state.py: path-invariant cases + a real-process
regression (helpers.py deploy-count-run) reproducing the live interleaving —
verified to FAIL under simulated shared keying. docs/concurrency.md §3 updated.
2026-06-10 08:16:09 +00:00
84d90fb655 test(concurrency): real-kernel suite for the restructured model — 20 tests, 19 plan cases
All checks were successful
continuous-integration/drone/push Build is passing
tests/concurrency/ — NOT in the default `pytest tests/unit` gate; run explicitly with
`pytest tests/concurrency -q`. flock/prctl/alarm are never mocked: helper subprocesses
(helpers.py) hold real locks and install the real lifetime guards; locks live in a per-test
tmp dir via CCCI_APP_LOCK_DIR; every helper (and recorded grandchild) is reaped by fixture
cleanup.

- test_locks.py (cases 1-4): SIGKILL auto-release; LOCK_NB held/unheld semantics; PEP 446
  fd-not-inherited (holder's child survives, lock still releases); same-domain second acquire
  blocks until first holder exits.
- test_janitor.py (cases 5-12): orphan reaped once + lockfile unlinked; live holder never
  reaped + logged; new-run acquire blocks until a slow reap completes (reap-under-probe-lock);
  two overlapping janitors -> exactly one reaps (flock arbitration); reboot sim (no lockfile)
  reaps immediately with no age wait; >120min-held lock flagged 'possible leaked run' and NOT
  stolen; warm/canonical names never probed (no lockfile even created); directory-as-lockfile
  and missing lock dir degrade to skip+log, never crash.
- test_lifetime.py (cases 13-16): PDEATHSIG (wrapper parent SIGKILL'd -> guarded child TERM'd,
  teardown marker, lock released); already-orphaned helper REFUSES to run (ppid race); 2s
  deadline alarm -> teardown + exit 142 + lock released; SIGTERM -> teardown + exit 143 +
  lock released.
- test_abra_dir.py (cases 17-19 + 18b): per-run dir built + $ABRA_DIR exported before the
  first abra call (recording stub abra on PATH); two CONCURRENT same-recipe fetch+checkout
  flows into different ABRA_DIRs -> divergent correct trees, canonical staged clone untouched;
  .env written through the servers/ symlink lands in the canonical path (env_get/env_set
  agree); manual runs get pid-suffixed dirs.

On cc-ci: pytest tests/concurrency -q -> 20 passed; tests/unit -> 138 passed; lint PASS.
2026-06-10 04:29:36 +00:00
17ebdf39ac feat(harness): P3 per-run ABRA_DIR — structural recipe-tree isolation, recipe flock deleted
All checks were successful
continuous-integration/drone/push Build is passing
- run_recipe_ci.setup_run_abra_dir(): builds <runs_dir>/<run-id>/abra with servers/ and
  catalogue/ symlinked to the canonical ~/.abra (app .env files keep landing in the shared
  canonical path, so janitor discovery and env-based teardown are unchanged; per-domain
  filenames + the P2 app-domain lock prevent write conflicts) and a FRESH empty recipes/ —
  each run clones + checkouts its own recipe trees. Exported as $ABRA_DIR (honored by the
  abra CLI, verified on-host) before ANY abra call. Manual runs get manual-<pid> isolation.
- fetch_recipe(): plain clone into $ABRA_DIR/recipes/<recipe> — no shared-tree rm-rf, no lock.
  CCCI_SKIP_FETCH=1 now copies the canonically-staged clone into the per-run tree (same staging
  workflow, run reads staged state).
- abra.abra_dir()/recipe_dir(): single resolution rule ($ABRA_DIR else ~/.abra), used by
  recipe_checkout, has_lightweight_version_tags, recipe_head_commit, recipe_versions,
  generic._recipe_dir, lifecycle.prepull_images, snapshot_recipe_tests, and
  warm_reconcile._recipe_dir (which keeps the canonical default for its own systemd runs but
  follows the per-run tree when imported by promote_canonical inside a run).
- deleted: lifecycle.acquire_recipe_lock, RECIPE_LOCK_DIR, the main() call site and the
  must-lock-before-fetch ordering rule.
- tests/{ghost,discourse}/install_steps.sh: RECIPE_DIR resolves ${ABRA_DIR:-$HOME/.abra} so the
  compose.ccci.yml overlay lands in the tree the run actually deploys from (mechanical path fix
  required by per-run trees; no assertion/gate touched — see DECISIONS.md).
- .drone.yml comments updated (HOME=/root rationale now via the servers symlink).
2026-06-10 04:18:33 +00:00
79c652ddd3 test(plausible): psql -q in _register_site — -t does not suppress command tags
All checks were successful
continuous-integration/drone/push Build is passing
psql -tAc still prints INSERT/CREATE command tags (e.g. "INSERT 0 1"), so
_register_site asserted out == site against "INSERT 0 1\nsite" and both
event-tracking roundtrip tests failed on their very first run (build 237 —
the custom tier had never executed before; install always failed earlier).
-q suppresses the tags; verified against the recipe db container.
2026-06-09 22:50:55 +00:00
c828f6cdd0 Merge remote-tracking branch 'origin/test/plausible-upgrade-base-3.0.1'
Some checks failed
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is failing
2026-06-09 21:57:39 +00:00