cc-ci

Author	SHA1	Message	Date
autonomic-bot	8cd72fd78d	feat(harness): P2 — delete legacy customization keys & paths (rcust) a) compose.ccci.yml is FIRST-CLASS: the harness auto-copies tests/<recipe>/compose.ccci.yml into the run's recipe checkout (ABRA_DIR-aware, lifecycle.provide_ccci_overlay) and auto-chaoses the pinned base deploy on its presence (kills the R7 implicit coupling). ghost/discourse install_steps.sh (copy-only boilerplate) deleted; CHAOS_BASE_DEPLOY removed from both metas + the registry. b) install-time deps wiring is the ONLY mode: deps with DEPS provision BEFORE the single deploy; legacy post-deploy provisioning + the setup_custom_tests.sh invocation machinery deleted. lasuite-docs migrated to install_steps.sh OIDC wiring (same env names/values as the old hook — only the timing moved); lasuite-drive's remaining post-deploy MinIO bucket one-shot moved to ops.py pre_install; both setup_custom_tests.sh files deleted; OIDC_AT_INSTALL removed from drive/meet metas + the registry. c) SKIP_GENERIC meta key deleted (zero users). Env form CCCI_SKIP_GENERIC* stays as the documented dev-only escape hatch; when active in a drone CI run the orchestrator prints a loud !! warning (manifest embedding lands in P5). d) conftest cleanup: dead pre-deploy-once fixtures deployed/deployed_app deleted (zero users), app_domain + _short + _wait_healthy dropped (only users were the deleted fixtures); deps_apps+deps_creds consolidated into ONE deps fixture (entries expose .domain etc. as attributes; dict access intact); the 6 lasuite test files renamed deps_creds->deps (fixture name only — assertions and flows byte-identical). requires_deps marker + F2-11 skip-report plumbing unchanged. Registry is now exactly the 14 final keys; docs §4 table regenerated. Stale setup_custom_tests/OIDC_AT_INSTALL prose in docstrings/comments/assert MESSAGES updated (no assert logic or expected value touched).
Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 175 passed; scripts/lint.sh -> PASS.	2026-06-10 17:01:33 +00:00
autonomic-bot	472a68b32c	feat(harness): P1 — single registry-backed meta loader (rcust) One loader: runner/harness/meta.py::load(recipe) -> RecipeMeta (frozen dataclass, attribute access), backed by the declarative KEYS registry (14 final keys + 3 P2-deprecated). The ONLY exec() of tests/<recipe>/recipe_meta.py. Validation per the locked decision: unknown ALL-CAPS top-level name or type mismatch = MetaError (hard error at load); underscore-prefixed names recipe-private; callables only on hook-typed keys. Migrated all six legacy loaders (spec §4 L1–L6): - run_recipe_ci.py::_load_meta deleted; orchestrator loads once, passes meta down - tests/conftest.py::_recipe_meta deleted; meta fixture returns full RecipeMeta (R3) - lifecycle.py::_recipe_extra_env/_recipe_meta_flag deleted; deploy_app takes meta - deps.py::declared_deps deleted; callers read meta.DEPS - canonical.py::is_enrolled reads through meta.load() - screenshot.py now actually receives SCREENSHOT through the orchestrator path (R2 fix; proven by unit test through the real load path) Mumble private constants underscore-prefixed (_WELCOME_TEXT_MARKER/_MAX_USERS) + importers fixed. New tests/unit/test_meta.py (all-recipes-load-clean typo gate, MetaError cases, spec §2 baseline defaults, underscore exemption, doc sync). Docs §4 key table now GENERATED from the registry (scripts/gen-meta-docs.py); drift fails CI. Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 175 passed; scripts/lint.sh -> PASS.	2026-06-10 16:46:58 +00:00
autonomic-bot	b6e12ef428	fix(harness): run-keyed run-scoped state files — CONC-A1 (same-domain runs corrupted shared deploy-count) The four CCCI state files (deploys countfile, opstate, deps, depskip) were keyed by app domain in shared /tmp. A second run of the same domain executes its main() preamble + deploy_app's pre-lock _record_deploy BEFORE blocking at the app lock, so it reset/polluted the live first run's counter (false DG4.1 deploy-count=2, build 279) and the first run's end-of-run os.remove crashed the second (FileNotFoundError, build 281). Masked pre-restructure by the end-to-end recipe flock. Now keyed by run id + harness pid via _run_state_path(); children receive exact paths via the CCCI_*_FILE env vars, so domain keying was never load-bearing. tests/concurrency/test_run_state.py: path-invariant cases + a real-process regression (helpers.py deploy-count-run) reproducing the live interleaving — verified to FAIL under simulated shared keying. docs/concurrency.md §3 updated.	2026-06-10 08:16:09 +00:00
autonomic-bot	84d90fb655	test(concurrency): real-kernel suite for the restructured model — 20 tests, 19 plan cases tests/concurrency/ — NOT in the default `pytest tests/unit` gate; run explicitly with `pytest tests/concurrency -q`. flock/prctl/alarm are never mocked: helper subprocesses (helpers.py) hold real locks and install the real lifetime guards; locks live in a per-test tmp dir via CCCI_APP_LOCK_DIR; every helper (and recorded grandchild) is reaped by fixture cleanup. - test_locks.py (cases 1-4): SIGKILL auto-release; LOCK_NB held/unheld semantics; PEP 446 fd-not-inherited (holder's child survives, lock still releases); same-domain second acquire blocks until first holder exits. - test_janitor.py (cases 5-12): orphan reaped once + lockfile unlinked; live holder never reaped + logged; new-run acquire blocks until a slow reap completes (reap-under-probe-lock); two overlapping janitors -> exactly one reaps (flock arbitration); reboot sim (no lockfile) reaps immediately with no age wait; >120min-held lock flagged 'possible leaked run' and NOT stolen; warm/canonical names never probed (no lockfile even created); directory-as-lockfile and missing lock dir degrade to skip+log, never crash. - test_lifetime.py (cases 13-16): PDEATHSIG (wrapper parent SIGKILL'd -> guarded child TERM'd, teardown marker, lock released); already-orphaned helper REFUSES to run (ppid race); 2s deadline alarm -> teardown + exit 142 + lock released; SIGTERM -> teardown + exit 143 + lock released. - test_abra_dir.py (cases 17-19 + 18b): per-run dir built + $ABRA_DIR exported before the first abra call (recording stub abra on PATH); two CONCURRENT same-recipe fetch+checkout flows into different ABRA_DIRs -> divergent correct trees, canonical staged clone untouched; .env written through the servers/ symlink lands in the canonical path (env_get/env_set agree); manual runs get pid-suffixed dirs.
On cc-ci: pytest tests/concurrency -q -> 20 passed; tests/unit -> 138 passed; lint PASS.	2026-06-10 04:29:36 +00:00
autonomic-bot	17ebdf39ac	feat(harness): P3 per-run ABRA_DIR — structural recipe-tree isolation, recipe flock deleted - run_recipe_ci.setup_run_abra_dir(): builds <runs_dir>/<run-id>/abra with servers/ and catalogue/ symlinked to the canonical ~/.abra (app .env files keep landing in the shared canonical path, so janitor discovery and env-based teardown are unchanged; per-domain filenames + the P2 app-domain lock prevent write conflicts) and a FRESH empty recipes/ — each run clones + checkouts its own recipe trees. Exported as $ABRA_DIR (honored by the abra CLI, verified on-host) before ANY abra call. Manual runs get manual-<pid> isolation. - fetch_recipe(): plain clone into $ABRA_DIR/recipes/<recipe> — no shared-tree rm-rf, no lock. CCCI_SKIP_FETCH=1 now copies the canonically-staged clone into the per-run tree (same staging workflow, run reads staged state). - abra.abra_dir()/recipe_dir(): single resolution rule ($ABRA_DIR else ~/.abra), used by recipe_checkout, has_lightweight_version_tags, recipe_head_commit, recipe_versions, generic._recipe_dir, lifecycle.prepull_images, snapshot_recipe_tests, and warm_reconcile._recipe_dir (which keeps the canonical default for its own systemd runs but follows the per-run tree when imported by promote_canonical inside a run). - deleted: lifecycle.acquire_recipe_lock, RECIPE_LOCK_DIR, the main() call site and the must-lock-before-fetch ordering rule. - tests/{ghost,discourse}/install_steps.sh: RECIPE_DIR resolves ${ABRA_DIR:-$HOME/.abra} so the compose.ccci.yml overlay lands in the tree the run actually deploys from (mechanical path fix required by per-run trees; no assertion/gate touched — see DECISIONS.md). - .drone.yml comments updated (HOME=/root rationale now via the servers symlink).	2026-06-10 04:18:33 +00:00
autonomic-bot	79c652ddd3	test(plausible): psql -q in _register_site — -t does not suppress command tags psql -tAc still prints INSERT/CREATE command tags (e.g. "INSERT 0 1"), so _register_site asserted out == site against "INSERT 0 1\nsite" and both event-tracking roundtrip tests failed on their very first run (build 237 — the custom tier had never executed before; install always failed earlier). -q suppresses the tags; verified against the recipe db container.	2026-06-09 22:50:55 +00:00
autonomic-bot	c828f6cdd0	Merge remote-tracking branch 'origin/test/plausible-upgrade-base-3.0.1'	2026-06-09 21:57:39 +00:00
autonomic-bot	9a7772563a	style: repo-wide lint pass — make the lint gate green again Push builds have been RED on the lint step since ~build 209 from accumulated formatting drift. This is the mechanical cleanup: ruff format + ruff --fix (UP038 isinstance unions, SIM105 contextlib.suppress, UP031 f-strings, SIM115 tempfile context manager), shfmt -i 2 -ci, nixpkgs-fmt/statix/deadnix (merged attrsets, dropped unused lib args), yamllint, and shell quoting fixes in tests/lasuite-docs/setup_custom_tests.sh. No behaviour changes intended; lint: PASS, unit tests: 138 passed.	2026-06-09 21:56:15 +00:00
autonomic-bot	1ba0d961a3	test(plausible): pin UPGRADE_BASE_VERSION to 3.0.1+v2.0.0 (newest published) The harness default base (recipe_versions[-2]) resolves to 3.0.0+v2.0.0 for the open 3.1.0 upgrade PR. That release predates x86_64 support in the clickhouse entrypoint (added 3.0.1): on this amd64 host it downloads clickhouse-backup-linux-x86_64.tar.gz — a deterministic HTTP 404 — and with set -e + a silenced wget the container exits 1 before logging anything, crash-looping until the deploy times out. The base therefore can never converge, regardless of the PR content (the published tag is immutable). This is exactly the case the harness documents for UPGRADE_BASE_VERSION: a PR adding its version ABOVE the newest published tag, where the true predecessor is [-1] (3.0.1+v2.0.0), not [-2]. The upgrade tier then tests the real operator path 3.0.1 -> 3.1.0. Pairs with recipe-maintainers/plausible#3 (its !testme can only go green once this lands).	2026-06-09 19:24:21 +00:00
autonomic-bot	c51cd84159	feat(harness): intentional skips + custom-html-tiny functional test; 4-rung ladder (#6) Declare intentional skips + custom-html-tiny functional test; 4-rung level ladder - recipe_meta.EXPECTED_NA = {rung: reason} lists intentionally-skipped rungs; any essential rung skipped and not listed is unintentional. Skips still cap the level (never inflate). results.json: skips:{intentional,unintentional} + level_cap_rung. - Level ladder = the four essential rungs (install, upgrade, backup/restore, functional; top = L4). integration & recipe-local are optional, not leveled (SSO still enforced for the run verdict, unchanged). - Card shows skipped rungs as INTENTIONAL SKIP (green, reason below) / UNINTENTIONAL SKIP (amber); level badge gains an expected/gap? third segment. - custom-html-tiny: functional serve test (exact-byte round-trip + 404); declares backup_restore intentionally skipped (stateless static server). Independently verified by the adversary: 138 unit tests pass cold; live full-stage run on custom-html-tiny green (upgrade tier ran; level 2; correct skips/badge); clean teardown.	2026-06-09 03:12:11 +00:00
autonomic-bot	31b71f9949	fix(regression): correct bad-backup SHA to b6fe99de (has .env.sample)	2026-06-02 02:15:58 +00:00
autonomic-bot	9449b22f24	fix(regression): separate recipe for bad-restore (custom-html-rst-bad) Having test_backup.py in custom-html-bkp-bad caused both bad-backup and bad-restore to fail at the backup tier. Create custom-html-rst-bad with its own cc-ci test dir that has ops.py+test_restore.py but NO test_backup.py, so: - backup: only generic test_backup_artifact → PASS (snapshot exists) - restore: pre_restore writes 'mutated', marker stays 'mutated' after restore → FAIL Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-02 02:15:03 +00:00
autonomic-bot	74364d0a46	fix(regression): bad-restore uses custom-html-bkp-bad + ops.py+test_restore.py backup-bot-two ignores backupbot.backup.path labels and always backs up the full volume, making path-based restore-RED infeasible. New approach: custom-html-bkp-bad has no pre_backup → marker never seeded → backup snapshot has no ci-marker.txt. pre_restore writes 'mutated'. After restore: marker is MISSING or 'mutated' → test_restore_returns_state FAILS. upgrade=skip (no version tags) is acceptable since passing_tiers_before=[install,backup]. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-02 02:12:28 +00:00
autonomic-bot	c7ede9cfbb	fix(regression): add test_backup.py for bad-backup canary — assertion-level failure No ops.py::pre_backup for custom-html-bkp-bad → ci-marker.txt never seeded. test_backup_captures_state asserts marker=='original' → MISSING → FAIL → backup=RED. This works regardless of backupbot label behavior. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-02 02:09:29 +00:00
autonomic-bot	3b7267cbee	fix(regression): use custom-html-bkp-bad recipe for bad-backup canary backup-bot-two ignores nonexistent backup paths and backs up the whole volume, making the bad-path approach unreliable. New approach: - Create recipe-maintainers/custom-html-bkp-bad on Gitea (custom-html without backupbot.backup=true label) — SHA 4e584063a99a - Add tests/custom-html-bkp-bad/recipe_meta.py with BACKUP_CAPABLE=True so the harness runs the backup tier despite auto-detect returning False - Without a labeled container, backup-bot-two produces no snapshot → parse_snapshot_id=None → test_backup_artifact fails → backup=RED ✓ Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-02 02:07:06 +00:00
autonomic-bot	090724ec80	fix(regression): correct SHAs for bad-backup/bad-restore (A-reg-3) + consume inbox Both compose.yml uploads had empty files due to a bash encoding bug. Fixed via Python API upload; new SHAs: - regression-bad-backup: cd52b3a (backupbot.backup.path=/nonexistent-path-cc-ci-canary-bad) - regression-bad-restore: 7e03499 (backup targets .backup-data subdir + command creates it) Adversary confirmed bad-install ✓ and bad-upgrade ✓ from run artifacts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-02 02:00:51 +00:00
autonomic-bot	cf405b4195	feat(regression): add 4 per-tier RED canaries (DoD#4) + canary_fast marker Four new per-tier RED canaries prove the server catches failure at every lifecycle tier: - bad-install: custom-html-tiny @ regression-bad-image (4ae88661) nonexistent image → prepull fails → install=fail STAGES=install → no prev-version lookup → chaos deploy of HEAD - bad-upgrade: same branch + SHA, STAGES=install,upgrade install uses prev-version (good image) → PASS upgrade chaos checks out HEAD (bad image) → prepull fails → FAIL - bad-backup: custom-html @ regression-bad-backup (e1e3c5fc) backupbot.backup.path=/nonexistent-path-cc-ci-canary-bad abra app backup create fails → backup=fail - bad-restore: custom-html @ regression-bad-restore (5a481cc1) backup targets .backup-data/ subdir (not where ci-marker.txt lives) backup succeeds; restore puts .backup-data back but NOT the marker marker stays "mutated" → test_restore_returns_state FAILS → restore=fail Each test asserts: rc!=0, failing_tier="fail", prior tiers="pass". Adds @pytest.mark.canary_fast for the fast subset. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-02 01:49:28 +00:00
autonomic-bot	a2a6eea757	fix(regression): fix relative import (A-reg-1) + consume inbox - tests/regression/test_canaries.py: replace `from .conftest import ...` (relative import fails when not a package) with sys.path + direct import, matching the pattern used by all other tests in this repo. - Delete machine-docs/BUILDER-INBOX.md (Adversary inbox consumed). - Update STATUS-regression.md + JOURNAL-regression.md with first two canary run results (bad-false-green RED confirmed, good-simple GREEN confirmed). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-02 01:37:31 +00:00
autonomic-bot	fd3db37c49	feat(regression): add tests/regression/ E2E canary suite Three canaries (@pytest.mark.canary) drive the real cold CI lifecycle: - good-simple: custom-html-tiny @ main (435df8fc) — fast signal, expects GREEN - good-significant: lasuite-docs @ main (290a8ad7) — multi-service, expects GREEN - bad-false-green: custom-html @ v5-stale-docroot (71e7326a) — expects RED Semantic teeth: beyond exit-code, each test asserts that specific named tests ran in results.json stages (test_serving, test_serving_and_frontend, test_content_type). If an assertion is removed, the named test disappears → regression test fails. Includes conftest (run_recipe_ci helper + stage_has_{passing,failing}_test), README (cadence policy, how to run, how to add), and phase state files. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-02 01:25:55 +00:00
autonomic-bot	242d56b56e	claim(mirror): Ph1+Ph2+Ph3 complete — mirrors created, hedgedoc tests, 9 recipes enrolled Phase 1: Create 3 missing Gitea mirrors (lasuite-drive, mailu, mumble) via API + force-sync upstream main (f4135d78, 23309a1a, 9fa5e949). All 3 return 200/empty=false from Gitea API. Phase 2: Author tests/hedgedoc/ (uptime-kuma template) — recipe_meta.py, functional/test_health_check.py (GET / → 200/302), functional/test_branding.py (brand markers), PARITY.md. Generic tiers cover install/upgrade/backup baseline. Phase 3: Enroll 9 unenrolled recipes in nix/modules/bridge.nix POLL_REPOS: bluesky-pds, discourse, ghost, immich, lasuite-drive, mailu, mattermost-lts, mumble, plausible. Final POLL_REPOS: 20 entries (cc-ci + 19 recipes). Gate Ph4 CLAIMED: operator must run `nixos-rebuild switch --flake .#cc-ci` on cc-ci after Adversary-verifies Ph1+Ph2+Ph3. See STATUS-mirror.md for exact repro.	2026-06-02 00:25:12 +00:00
autonomic-bot	7225138f30	fix(tests): keep La Suite OIDC secret inserts offline	2026-06-01 13:57:15 +00:00
autonomic-bot	91a69b8971	feat(3 U5.1+U5.2): per-recipe latest-level badge endpoint /badge/<recipe>.svg (R6, level-coloured, status fallback) + complete docs/results-ux.md §3-5 (card/screenshot/PR-comment/badge-embedding, R8); +2 badge unit tests	2026-05-31 10:04:14 +00:00
autonomic-bot	e1d837ee97	feat(3 U4): YunoHost-style dashboard grid — per-recipe level badge + status + version + app screenshot thumbnail + per-recipe /recipe/<name> history; reads results.json artifacts (R5); 9 dashboard unit tests	2026-05-31 09:52:06 +00:00
autonomic-bot	9a47aa28e3	feat(3 U3): YunoHost-style PR comment (🌻 + level badge + summary card images, linked) updated in place per PR; text fallback; bridge tests + dashboard do_HEAD	2026-05-31 07:46:00 +00:00
autonomic-bot	7217e0c98c	feat(3 U2-scaffold): summary card + level/status SVG badge renderers (offline; pure) harness/card.py: render_badge_svg/level_badge_svg (shields-style SVG, colour-by-level, R6) + render_card_html (recipe+version, level badge, per-stage/per-test ✔/✘ table, embedded screenshot, invariant flags — REPORTS results.json verbatim, never recomputes; cardinal no-inflation guardrail) + render_card_png (best-effort Playwright HTML->PNG, R7). 8 pure unit tests. Orchestrator wiring + stable-URL serving + live PNG demo come after U0 PASSes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 06:11:47 +00:00
autonomic-bot	daa7edd3a7	feat(3 U1-scaffold): app screenshot capture module (offline; not yet wired) harness/screenshot.py: best-effort Playwright capture of the live app (reuses harness browser). Default = landing page (credential-free, secret-safe R7); recipes needing post-login opt into a recipe-meta SCREENSHOT hook responsible for avoiding secret pages. Every failure swallowed -> None (cosmetics never block, R7). Pure helpers unit-tested. Orchestrator wiring + live demo come after U0 PASSes (avoid deploy contention with the Adversary's cold U0 re-runs). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 06:05:39 +00:00
autonomic-bot	52e5d210d8	feat(3 U0.2+U0.3): per-test results + results.json with computed level harness/results.py: JUnit-XML parsing (stdlib) → per-stage/per-test rows; derive_rungs (documented tier+deps/SSO → rung mapping); build_results assembles results.json {recipe,version,pr,ref,run_id, stages[],level,level_cap_reason,rungs,flags{clean_teardown,no_secret_leak},screenshot,summary_card}; write_results (atomic). run_recipe_ci.py: tiers emit --junitxml + append {tier,source,file,rc,junit} records; main() assembles+writes results.json wrapped so a failure NEVER changes the verdict (R7), incl. a narrow leak-scan of the serialised artifact. 17 new unit tests (test_results.py). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 05:55:58 +00:00
autonomic-bot	9773e3ff63	feat(3 U0.1): pure level() ladder mapper (L0-L6, gap-caps) + unit tests Phase-3 R1 foundation. harness.level.compute_level(rungs)->(level,cap_reason) with YunoHost gap-caps semantics: level = highest rung 1..L all clean PASS; first non-PASS (FAIL or N/A) caps, recorded in cap_reason. N/A caps like fail but distinctly (L5 'no integration surface' example). Helpers backup_restore_status + tier_to_rung. 16 unit tests incl U0 gate cases (L4-pass, L2-cap). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 05:46:23 +00:00
autonomic-bot	470afbff98	fix(discourse F2-15): add N/A PARITY.md (P2 §4.1) — parity genuinely N/A (no upstream corpus); documents functional tests + P4 integrity	2026-05-31 05:24:19 +00:00
autonomic-bot	4bf9e1d43d	feat(mumble F2-14c): drop cc-ci compose.host-ports.yml fork; deploy 0.2.0 base minimally, add native host-ports on upgrade-to-latest via new UPGRADE_EXTRA_ENV harness hook + COMPOSE_FILE-aware READY_PROBE/install skip	2026-05-31 05:07:55 +00:00
autonomic-bot	588a08773b	fix(discourse): send capitalised topic title so Discourse title_prettify is a no-op (was 'ccci'->'Ccci' mismatch)	2026-05-31 04:46:48 +00:00
autonomic-bot	1f92776052	fix(discourse): enable allow_uncategorized_topics in admin bootstrap so create-topic POST succeeds (Discourse 3.x 422 'Category cant be blank')	2026-05-31 04:41:03 +00:00
autonomic-bot	8dfd8ed3b3	fix(2): discourse — revert non-working depends_on override (additive map-merge can't remove bad key); keep image warm-cache + 3600s timeout The depends_on:[app] override in `04cc44c` does NOT make compose valid: docker normalizes short-form depends_on to a map and merges additively, so {discourse}+{app}={discourse,app} keeps the invalid 'discourse' key (config --images still rc=15). Reverted to keep the overlay minimal (re-pin + grace only). Prepull-skip is harmless because bitnamilegacy/discourse:3.3.1 is warm in the node image cache → inline pull is a no-op. Timeout headroom (3600s) retained in recipe_meta. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 01:25:47 +00:00
autonomic-bot	04cc44c15e	fix(2): discourse base-deploy timeout — prepull-enable (sidekiq depends_on app, valid compose) + 3600s timeout full4 base deploy timed out at 2400s on the 7-GiB single node. Root causes: (1) sidekiq.depends_on referenced undefined service 'discourse' (main svc is 'app') → abra config --images rc=15 → prepull SKIPPED → 2.4GB image pulled inline during deploy, eating convergence budget. Overlay now overrides sidekiq.depends_on:[app] (swarm ignores depends_on → no-op at runtime, masks nothing) so prepull resolves+pre-pulls images on both base+head deploys. (2) bumped DEPLOY_TIMEOUT/TIMEOUT 2400→3600 for headroom on the RAM/CPU-constrained Rails cold boot. Also pre-cached bitnamilegacy/discourse:3.3.1 by tag on cc-ci (was dangling <none>). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 01:23:38 +00:00
autonomic-bot	8d689d6c32	fix(2): discourse — mint_admin ruby PATH (bash -c + discover) + BACKUP_VERIFY for post-upgrade backup race	2026-05-31 00:28:21 +00:00
autonomic-bot	3a612fc733	fix(2): ghost BACKUP_VERIFY — drop __file__ (recipe_meta is exec'd, no __file__); import harness directly full9: backup tier FAILed with NameError('__file__' not defined) — recipe_meta.py is exec()'d into a bare namespace so __file__ is undefined. The harness already has runner/ on sys.path + harness imported, so import lifecycle directly. (restore PASSED on full9 — the data-integrity fix works; this just fixes the verify probe crashing the backup tier.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 21:49:08 +00:00
autonomic-bot	68a7c79668	fix(2): ghost F2-14b — harness BACKUP_VERIFY hook + retry; close the backup-capture race Root cause (instrumented, DECISIONS 2026-05-30): a DB recipe dumps its data in a backupbot pre-hook, but if the DB container cycles mid-dump (intermittent on the loaded CI node — full5/6/7 RED, full8 green; NOT OOM/NOT healthcheck) the dump is truncated/absent and restic snapshots an empty path — abra app backup 'succeeds' yet a later restore silently loses the data (ghost ci_marker). Fix (additive, recipe-scoped via meta like READY_PROBE): recipe_meta may define BACKUP_VERIFY(domain) -> bool, a READ-ONLY post-backup integrity probe. When it returns False the harness re-runs the whole backup (fresh snapshot, re-stabilised db) up to 3x. Recipes without the hook are unaffected. ghost's BACKUP_VERIFY confirms /var/lib/mysql/backup.sql.gz is a valid non-empty gzip. Weakens no assertion — it only retries a flaky CAPTURE so P4 restore is RELIABLY exercised, not luck-dependent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 21:30:25 +00:00
autonomic-bot	4a160f6121	fix(2): ghost F2-14b — bump DEPLOY_TIMEOUT/TIMEOUT 1200→2400s for slow mysql cold-init + migration full4 timed out: abra deploy killed at 1200s while the app was at the near-final email_recipients migration tables (still 0/1). Wall-time = mysql fresh-dir init (~6min, app crash-loops on ECONNREFUSED until DB ready — no migration progress lost) + ~9-15min schema migration (round-trip-bound, slower under host load). Not a test weakening — bounded wait (matches discourse), a genuine hang still fails. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 19:54:20 +00:00
autonomic-bot	845b86c868	feat(2): discourse Q4.6 — upgrade-to-latest 0.7.0 base-repin+grace overlay (compose.ccci.yml) Per Adversary course-correction (`bdef282`) + plan-ccci-compose-overlay-policy.md §1: upgrade-to-latest is MANDATORY. The 0.7.0+3.3.1 from-version pins the Docker-Hub-removed bitnami/discourse:3.3.1 (404) and ships a too-tight 5m start_period for the 15-25min Rails cold boot. Minimal base overlay compose.ccci.yml re-pins app+sidekiq to bitnamilegacy/discourse:3.3.1 (namespace-only, identical image — same re-pin the PR head makes) + widens start_period to 20m (grace-only). install_steps.sh provides it; CHAOS_BASE_DEPLOY skips the clean-tree gate; UPGRADE_BASE_VERSION=0.7.0+3.3.1 sets the true predecessor. Neither change weakens a test. Run shape returns to STAGES=install,upgrade,backup,restore,custom. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 19:29:41 +00:00
autonomic-bot	3ca45c7308	fix(2): ghost F2-14b — add db start_period grace to base overlay Run #2 base deploy: fresh mysql:8.0 init on the loaded cc-ci host (load ~8) took >6min (InnoDB ~90s + system-tables + root-pw apply, starved by the app crash-loop churn), exceeding the recipe's 1m db start_period (+6min retry grace) → swarm killed mysql mid-init (exit 137 unhealthy) → corrupt InnoDB redo logs → permanent deadlock (same signature as run #1's stale vol). Widen db healthcheck start_period to 15m (matches app) so the slow first-boot finishes before the healthcheck can fail it. Grace-only, masks no defect; bites base+head (published recipe ships db start_period 1m everywhere) so overlay covers both. Torn down corrupt vol. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 17:58:30 +01:00
autonomic-bot	7feeadd0ec	feat(2): ghost F2-14b — upgrade-to-latest base-grace overlay (compose.ccci.yml) Course correction (REVIEW-2 `bdef282`) mandates upgrade-to-latest; harness base-deploys prev published version 1.1.1+6-alpine which predates the recipe-PR 15m start_period bump (ships 1m) → would deadlock on the ~6-9min fresh-DB migration (swarm kill mid-migration → held migrations_lock). Policy-blessed minimal base overlay: compose.ccci.yml re-applies the 15m app-healthcheck start_period grace to the BASE so the from-version is deployable; install_steps.sh provides it; CHAOS_BASE_DEPLOY skips clean-tree on the untracked overlay; persists across head checkout (idempotent — PR head ships 15m). Grace-only, no test weakened. Prior corrupt mysql vol (stale, interrupted init) torn down. Next: full run incl upgrade. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 17:49:05 +01:00
autonomic-bot	0f2cc2d704	feat(2): ghost F2-14b overlay migration — start_period bump moved to recipe-PR (ghost#1 head ae43ffe, literal 15m on app healthcheck); DELETE cc-ci compose.ccci-health.yml + install_steps.sh + COMPOSE_FILE/CHAOS_BASE_DEPLOY. Anti-drift (plan §9): recipe-as-tested == recipe-as-published. env-var start_period impossible (abra pre-subst duration validation, Adversary-reproduced `4b862f6`). Next: run ghost on ae43ffe head.	2026-05-30 17:20:20 +01:00
autonomic-bot	fb20321bd9	feat(2): discourse start_period via literal recipe-PR bump (abra can't env-interpolate start_period) abra rejects env-interpolation in healthcheck start_period (FATA 'Does not match format duration' for both ${VAR} and quoted forms — validates the literal compose duration before .env substitution). So §9 pt1's env-var route is impossible for this field; the §9-compliant fix is a LITERAL start_period:20m bump in the recipe-PR (recipe everyone runs, not a cc-ci overlay; strictly safer). Remove APP_START_PERIOD from recipe_meta EXTRA_ENV; record the finding in DECISIONS (ghost E1 must use the same approach); STATUS-2 → new PR head 7a2e0e0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 16:24:45 +01:00
autonomic-bot	c346b9763b	feat(2): discourse Q4.6 policy-compliant shape (plan §9) — env-var start_period, delete cc-ci overlay, upgrade N/A Migrate discourse off the cc-ci compose overlay per plan §9 / plan-prefer-env-over-compose-overlay.md: - recipe_meta: drop UPGRADE_BASE_VERSION + COMPOSE_FILE + CHAOS_BASE_DEPLOY; set APP_START_PERIOD=1200s via EXTRA_ENV (the recipe-PR exposes start_period: ${APP_START_PERIOD:-5m}); declare upgrade tier N/A (both published prev bases pin removed bitnami images; Adversary §7.1 granted, REVIEW-2 `efe3790`). - delete tests/discourse/compose.ccci-health.yml + install_steps.sh (existed only to copy the overlay). - DECISIONS.md + STATUS-2 record the §9 guardrail + discourse shape (upgrade N/A, env start_period, pg_backup restore-hook recipe-PR = 5th data-loss recipe cc-ci caught). recipe-PR head now 8b8df17 (start_period env var added). Not a claim — run STAGES=install,backup,restore,custom next. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 15:47:28 +01:00
autonomic-bot	a750937fb0	feat(2): discourse Q4.6 honest upgrade crossover — UPGRADE_BASE_VERSION override (base-on-[-1]) + uniform bitnamilegacy image overlay Implements the real 0.7.0+3.3.1 -> 0.8.0+3.3.1 upgrade crossover instead of a §7.1 skip-with-sign-off (Adversary leans DENY on the deferral; agreed): - recipe_meta UPGRADE_BASE_VERSION=0.7.0+3.3.1 + generic support in run_recipe_ci (prev = meta override or previous_version). Harness default [-2]=0.6.3+3.1.2 is a hollow base (img 3.1.2 != head 3.3.1); [-1]=0.7.0+3.3.1 is the PR's true predecessor and shares head's servable 3.3.1 image. - compose.ccci-health.yml re-pins services.{app,sidekiq}.image to bitnamilegacy/discourse:3.3.1 so the 0.7.0 base (compose pins 404 bitnami:3.3.1) is servable; idempotent on the head (PR already bitnamilegacy). Consumes Adversary BUILDER-INBOX (deleted), leaves ADVERSARY-INBOX ack; STATUS-2 discourse section updated. Full lifecycle run launching next. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 14:20:06 +01:00
autonomic-bot	d822550c7d	feat(2): discourse P3 functional tests — §4.3 create-topic round-trip + site.json config + admin-bootstrap helper _discourse.py: bootstrap an admin (recipe seeds none) + mint an ApiKey via rails runner in the app container (class-B run-scoped). test_create_topic.py: POST /posts.json (unique marker) -> GET /t/<id>.json title+cooked round-trip. test_site_basic.py: GET /site.json asserts discourse categories config. Meets P3 (>=2 functional beyond health).	2026-05-30 12:52:30 +01:00
autonomic-bot	0e3049b677	fix(2): discourse health overlay add version 3.8 (lint R011/R012 version-mismatch FATA vs compose.yml 3.8)	2026-05-30 12:09:51 +01:00
autonomic-bot	b2ed6cf989	fix(2): discourse recipe_meta — wire COMPOSE_FILE+CHAOS_BASE_DEPLOY+TIMEOUT 2400 (the overlay's missing half; prior commit `a432058` only added the files)	2026-05-30 11:49:51 +01:00
autonomic-bot	a432058aca	fix(2): discourse healthcheck start_period overlay (slow Rails boot) + CHAOS_BASE_DEPLOY + TIMEOUT 2400 Install timed out at 1800s: discourse's 15-25min Rails cold boot overran both the deploy timeout and the recipe healthcheck start_period:5m (swarm killed the booting app). Add compose.ccci-health.yml (app healthcheck start_period 1200s) via install_steps.sh + recipe_meta COMPOSE_FILE + CHAOS_BASE_DEPLOY, bump DEPLOY_TIMEOUT/TIMEOUT to 2400. Image re-pin (bitnamilegacy) already proven working. NO test weakened.	2026-05-30 11:48:18 +01:00
autonomic-bot	13da216f8d	fix(2): ghost healthcheck start_period overlay — fixes fresh-migration lock deadlock Root cause: Ghost's fresh-DB first boot runs a ~6-9min schema migration (round-trip-bound, not CPU); the recipe healthcheck start_period:1m (~6min grace) kills the still-migrating task, leaving a stale migrations_lock → every later task deadlocks (MigrationsAreLockedError). Hit on both 2- and 4-vCPU. Fix (cc-ci deploy overlay, NOT a recipe/test change): compose.ccci-health.yml raises app healthcheck start_period to 900s, wired via recipe_meta COMPOSE_FILE + install_steps.sh (+ CHAOS_BASE_DEPLOY for the untracked overlay). No assertion weakened. Budget 1200s = migration + convergence. Only the install tier needs it (upgrade redeploys on the populated DB → fast boot).	2026-05-30 05:23:47 +01:00

147 Commits