Commit Graph

147 Commits

Author SHA1 Message Date
8cd72fd78d feat(harness): P2 — delete legacy customization keys & paths (rcust)
All checks were successful
continuous-integration/drone/push Build is passing
a) compose.ccci.yml is FIRST-CLASS: the harness auto-copies tests/<recipe>/
   compose.ccci.yml into the run's recipe checkout (ABRA_DIR-aware, lifecycle.
   provide_ccci_overlay) and auto-chaoses the pinned base deploy on its presence
   (kills the R7 implicit coupling). ghost/discourse install_steps.sh (copy-only
   boilerplate) deleted; CHAOS_BASE_DEPLOY removed from both metas + the registry.

b) install-time deps wiring is the ONLY mode: deps with DEPS provision BEFORE the
   single deploy; legacy post-deploy provisioning + the setup_custom_tests.sh
   invocation machinery deleted. lasuite-docs migrated to install_steps.sh OIDC
   wiring (same env names/values as the old hook — only the timing moved);
   lasuite-drive's remaining post-deploy MinIO bucket one-shot moved to ops.py
   pre_install; both setup_custom_tests.sh files deleted; OIDC_AT_INSTALL removed
   from drive/meet metas + the registry.

c) SKIP_GENERIC meta key deleted (zero users). Env form CCCI_SKIP_GENERIC* stays
   as the documented dev-only escape hatch; when active in a drone CI run the
   orchestrator prints a loud !! warning (manifest embedding lands in P5).

d) conftest cleanup: dead pre-deploy-once fixtures deployed/deployed_app deleted
   (zero users), app_domain + _short + _wait_healthy dropped (only users were the
   deleted fixtures); deps_apps+deps_creds consolidated into ONE deps fixture
   (entries expose .domain etc. as attributes; dict access intact); the 6 lasuite
   test files renamed deps_creds->deps (fixture name only — assertions and flows
   byte-identical). requires_deps marker + F2-11 skip-report plumbing unchanged.

Registry is now exactly the 14 final keys; docs §4 table regenerated. Stale
setup_custom_tests/OIDC_AT_INSTALL prose in docstrings/comments/assert MESSAGES
updated (no assert logic or expected value touched).

Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 175 passed; scripts/lint.sh -> PASS.
2026-06-10 17:01:33 +00:00
472a68b32c feat(harness): P1 — single registry-backed meta loader (rcust)
All checks were successful
continuous-integration/drone/push Build is passing
One loader: runner/harness/meta.py::load(recipe) -> RecipeMeta (frozen dataclass,
attribute access), backed by the declarative KEYS registry (14 final keys + 3
P2-deprecated). The ONLY exec() of tests/<recipe>/recipe_meta.py. Validation per
the locked decision: unknown ALL-CAPS top-level name or type mismatch = MetaError
(hard error at load); underscore-prefixed names recipe-private; callables only on
hook-typed keys.

Migrated all six legacy loaders (spec §4 L1–L6):
- run_recipe_ci.py::_load_meta deleted; orchestrator loads once, passes meta down
- tests/conftest.py::_recipe_meta deleted; meta fixture returns full RecipeMeta (R3)
- lifecycle.py::_recipe_extra_env/_recipe_meta_flag deleted; deploy_app takes meta
- deps.py::declared_deps deleted; callers read meta.DEPS
- canonical.py::is_enrolled reads through meta.load()
- screenshot.py now actually receives SCREENSHOT through the orchestrator path (R2
  fix; proven by unit test through the real load path)

Mumble private constants underscore-prefixed (_WELCOME_TEXT_MARKER/_MAX_USERS) +
importers fixed. New tests/unit/test_meta.py (all-recipes-load-clean typo gate,
MetaError cases, spec §2 baseline defaults, underscore exemption, doc sync). Docs
§4 key table now GENERATED from the registry (scripts/gen-meta-docs.py); drift
fails CI.

Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 175 passed; scripts/lint.sh -> PASS.
2026-06-10 16:46:58 +00:00
b6e12ef428 fix(harness): run-keyed run-scoped state files — CONC-A1 (same-domain runs corrupted shared deploy-count)
All checks were successful
continuous-integration/drone/push Build is passing
The four CCCI state files (deploys countfile, opstate, deps, depskip) were keyed
by app domain in shared /tmp. A second run of the same domain executes its main()
preamble + deploy_app's pre-lock _record_deploy BEFORE blocking at the app lock,
so it reset/polluted the live first run's counter (false DG4.1 deploy-count=2,
build 279) and the first run's end-of-run os.remove crashed the second
(FileNotFoundError, build 281). Masked pre-restructure by the end-to-end recipe
flock. Now keyed by run id + harness pid via _run_state_path(); children receive
exact paths via the CCCI_*_FILE env vars, so domain keying was never load-bearing.

tests/concurrency/test_run_state.py: path-invariant cases + a real-process
regression (helpers.py deploy-count-run) reproducing the live interleaving —
verified to FAIL under simulated shared keying. docs/concurrency.md §3 updated.
2026-06-10 08:16:09 +00:00
84d90fb655 test(concurrency): real-kernel suite for the restructured model — 20 tests, 19 plan cases
All checks were successful
continuous-integration/drone/push Build is passing
tests/concurrency/ — NOT in the default `pytest tests/unit` gate; run explicitly with
`pytest tests/concurrency -q`. flock/prctl/alarm are never mocked: helper subprocesses
(helpers.py) hold real locks and install the real lifetime guards; locks live in a per-test
tmp dir via CCCI_APP_LOCK_DIR; every helper (and recorded grandchild) is reaped by fixture
cleanup.

- test_locks.py (cases 1-4): SIGKILL auto-release; LOCK_NB held/unheld semantics; PEP 446
  fd-not-inherited (holder's child survives, lock still releases); same-domain second acquire
  blocks until first holder exits.
- test_janitor.py (cases 5-12): orphan reaped once + lockfile unlinked; live holder never
  reaped + logged; new-run acquire blocks until a slow reap completes (reap-under-probe-lock);
  two overlapping janitors -> exactly one reaps (flock arbitration); reboot sim (no lockfile)
  reaps immediately with no age wait; >120min-held lock flagged 'possible leaked run' and NOT
  stolen; warm/canonical names never probed (no lockfile even created); directory-as-lockfile
  and missing lock dir degrade to skip+log, never crash.
- test_lifetime.py (cases 13-16): PDEATHSIG (wrapper parent SIGKILL'd -> guarded child TERM'd,
  teardown marker, lock released); already-orphaned helper REFUSES to run (ppid race); 2s
  deadline alarm -> teardown + exit 142 + lock released; SIGTERM -> teardown + exit 143 +
  lock released.
- test_abra_dir.py (cases 17-19 + 18b): per-run dir built + $ABRA_DIR exported before the
  first abra call (recording stub abra on PATH); two CONCURRENT same-recipe fetch+checkout
  flows into different ABRA_DIRs -> divergent correct trees, canonical staged clone untouched;
  .env written through the servers/ symlink lands in the canonical path (env_get/env_set
  agree); manual runs get pid-suffixed dirs.

On cc-ci: pytest tests/concurrency -q -> 20 passed; tests/unit -> 138 passed; lint PASS.
2026-06-10 04:29:36 +00:00
17ebdf39ac feat(harness): P3 per-run ABRA_DIR — structural recipe-tree isolation, recipe flock deleted
All checks were successful
continuous-integration/drone/push Build is passing
- run_recipe_ci.setup_run_abra_dir(): builds <runs_dir>/<run-id>/abra with servers/ and
  catalogue/ symlinked to the canonical ~/.abra (app .env files keep landing in the shared
  canonical path, so janitor discovery and env-based teardown are unchanged; per-domain
  filenames + the P2 app-domain lock prevent write conflicts) and a FRESH empty recipes/ —
  each run clones + checkouts its own recipe trees. Exported as $ABRA_DIR (honored by the
  abra CLI, verified on-host) before ANY abra call. Manual runs get manual-<pid> isolation.
- fetch_recipe(): plain clone into $ABRA_DIR/recipes/<recipe> — no shared-tree rm-rf, no lock.
  CCCI_SKIP_FETCH=1 now copies the canonically-staged clone into the per-run tree (same staging
  workflow, run reads staged state).
- abra.abra_dir()/recipe_dir(): single resolution rule ($ABRA_DIR else ~/.abra), used by
  recipe_checkout, has_lightweight_version_tags, recipe_head_commit, recipe_versions,
  generic._recipe_dir, lifecycle.prepull_images, snapshot_recipe_tests, and
  warm_reconcile._recipe_dir (which keeps the canonical default for its own systemd runs but
  follows the per-run tree when imported by promote_canonical inside a run).
- deleted: lifecycle.acquire_recipe_lock, RECIPE_LOCK_DIR, the main() call site and the
  must-lock-before-fetch ordering rule.
- tests/{ghost,discourse}/install_steps.sh: RECIPE_DIR resolves ${ABRA_DIR:-$HOME/.abra} so the
  compose.ccci.yml overlay lands in the tree the run actually deploys from (mechanical path fix
  required by per-run trees; no assertion/gate touched — see DECISIONS.md).
- .drone.yml comments updated (HOME=/root rationale now via the servers symlink).
2026-06-10 04:18:33 +00:00
79c652ddd3 test(plausible): psql -q in _register_site — -t does not suppress command tags
All checks were successful
continuous-integration/drone/push Build is passing
psql -tAc still prints INSERT/CREATE command tags (e.g. "INSERT 0 1"), so
_register_site asserted out == site against "INSERT 0 1\nsite" and both
event-tracking roundtrip tests failed on their very first run (build 237 —
the custom tier had never executed before; install always failed earlier).
-q suppresses the tags; verified against the recipe db container.
2026-06-09 22:50:55 +00:00
c828f6cdd0 Merge remote-tracking branch 'origin/test/plausible-upgrade-base-3.0.1'
Some checks failed
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is failing
2026-06-09 21:57:39 +00:00
9a7772563a style: repo-wide lint pass — make the lint gate green again
Push builds have been RED on the lint step since ~build 209 from accumulated
formatting drift. This is the mechanical cleanup: ruff format + ruff --fix
(UP038 isinstance unions, SIM105 contextlib.suppress, UP031 f-strings, SIM115
tempfile context manager), shfmt -i 2 -ci, nixpkgs-fmt/statix/deadnix (merged
attrsets, dropped unused lib args), yamllint, and shell quoting fixes in
tests/lasuite-docs/setup_custom_tests.sh. No behaviour changes intended;
lint: PASS, unit tests: 138 passed.
2026-06-09 21:56:15 +00:00
1ba0d961a3 test(plausible): pin UPGRADE_BASE_VERSION to 3.0.1+v2.0.0 (newest published)
Some checks failed
continuous-integration/drone/push Build is failing
The harness default base (recipe_versions[-2]) resolves to 3.0.0+v2.0.0 for
the open 3.1.0 upgrade PR. That release predates x86_64 support in the
clickhouse entrypoint (added 3.0.1): on this amd64 host it downloads
clickhouse-backup-linux-x86_64.tar.gz — a deterministic HTTP 404 — and with
set -e + a silenced wget the container exits 1 before logging anything,
crash-looping until the deploy times out. The base therefore can never
converge, regardless of the PR content (the published tag is immutable).

This is exactly the case the harness documents for UPGRADE_BASE_VERSION:
a PR adding its version ABOVE the newest published tag, where the true
predecessor is [-1] (3.0.1+v2.0.0), not [-2]. The upgrade tier then tests
the real operator path 3.0.1 -> 3.1.0.

Pairs with recipe-maintainers/plausible#3 (its !testme can only go green
once this lands).
2026-06-09 19:24:21 +00:00
c51cd84159 feat(harness): intentional skips + custom-html-tiny functional test; 4-rung ladder (#6)
Some checks failed
continuous-integration/drone/push Build is failing
Declare intentional skips + custom-html-tiny functional test; 4-rung level ladder

- recipe_meta.EXPECTED_NA = {rung: reason} lists intentionally-skipped rungs; any
  essential rung skipped and not listed is unintentional. Skips still cap the level
  (never inflate). results.json: skips:{intentional,unintentional} + level_cap_rung.
- Level ladder = the four essential rungs (install, upgrade, backup/restore,
  functional; top = L4). integration & recipe-local are optional, not leveled
  (SSO still enforced for the run verdict, unchanged).
- Card shows skipped rungs as INTENTIONAL SKIP (green, reason below) / UNINTENTIONAL
  SKIP (amber); level badge gains an expected/gap? third segment.
- custom-html-tiny: functional serve test (exact-byte round-trip + 404); declares
  backup_restore intentionally skipped (stateless static server).

Independently verified by the adversary: 138 unit tests pass cold; live full-stage
run on custom-html-tiny green (upgrade tier ran; level 2; correct skips/badge);
clean teardown.
2026-06-09 03:12:11 +00:00
31b71f9949 fix(regression): correct bad-backup SHA to b6fe99de (has .env.sample)
Some checks failed
continuous-integration/drone/push Build is failing
2026-06-02 02:15:58 +00:00
9449b22f24 fix(regression): separate recipe for bad-restore (custom-html-rst-bad)
Some checks failed
continuous-integration/drone/push Build is failing
Having test_backup.py in custom-html-bkp-bad caused both bad-backup and bad-restore
to fail at the backup tier. Create custom-html-rst-bad with its own cc-ci test dir
that has ops.py+test_restore.py but NO test_backup.py, so:
- backup: only generic test_backup_artifact → PASS (snapshot exists)
- restore: pre_restore writes 'mutated', marker stays 'mutated' after restore → FAIL

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-02 02:15:03 +00:00
74364d0a46 fix(regression): bad-restore uses custom-html-bkp-bad + ops.py+test_restore.py
Some checks failed
continuous-integration/drone/push Build is failing
backup-bot-two ignores backupbot.backup.path labels and always backs up
the full volume, making path-based restore-RED infeasible.

New approach: custom-html-bkp-bad has no pre_backup → marker never seeded
→ backup snapshot has no ci-marker.txt. pre_restore writes 'mutated'.
After restore: marker is MISSING or 'mutated' → test_restore_returns_state FAILS.
upgrade=skip (no version tags) is acceptable since passing_tiers_before=[install,backup].

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-02 02:12:28 +00:00
c7ede9cfbb fix(regression): add test_backup.py for bad-backup canary — assertion-level failure
Some checks failed
continuous-integration/drone/push Build is failing
No ops.py::pre_backup for custom-html-bkp-bad → ci-marker.txt never seeded.
test_backup_captures_state asserts marker=='original' → MISSING → FAIL → backup=RED.
This works regardless of backupbot label behavior.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-02 02:09:29 +00:00
3b7267cbee fix(regression): use custom-html-bkp-bad recipe for bad-backup canary
Some checks failed
continuous-integration/drone/push Build is failing
backupbot-two ignores nonexistent backup paths and backs up the whole
volume, making the bad-path approach unreliable. New approach:
- Create recipe-maintainers/custom-html-bkp-bad on Gitea (custom-html
  without backupbot.backup=true label) — SHA 4e584063a99a
- Add tests/custom-html-bkp-bad/recipe_meta.py with BACKUP_CAPABLE=True
  so the harness runs the backup tier despite auto-detect returning False
- Without a labeled container, backup-bot-two produces no snapshot →
  parse_snapshot_id=None → test_backup_artifact fails → backup=RED ✓

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-02 02:07:06 +00:00
090724ec80 fix(regression): correct SHAs for bad-backup/bad-restore (A-reg-3) + consume inbox
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone Build is passing
Both compose.yml uploads had empty files due to a bash encoding bug.
Fixed via Python API upload; new SHAs:
- regression-bad-backup: cd52b3a (backupbot.backup.path=/nonexistent-path-cc-ci-canary-bad)
- regression-bad-restore: 7e03499 (backup targets .backup-data subdir + command creates it)

Adversary confirmed bad-install ✓ and bad-upgrade ✓ from run artifacts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-02 02:00:51 +00:00
cf405b4195 feat(regression): add 4 per-tier RED canaries (DoD#4) + canary_fast marker
Some checks failed
continuous-integration/drone/push Build is failing
Four new per-tier RED canaries prove the server catches failure at every
lifecycle tier:

- bad-install: custom-html-tiny @ regression-bad-image (4ae88661)
  nonexistent image → prepull fails → install=fail
  STAGES=install → no prev-version lookup → chaos deploy of HEAD

- bad-upgrade: same branch + SHA, STAGES=install,upgrade
  install uses prev-version (good image) → PASS
  upgrade chaos checks out HEAD (bad image) → prepull fails → FAIL

- bad-backup: custom-html @ regression-bad-backup (e1e3c5fc)
  backupbot.backup.path=/nonexistent-path-cc-ci-canary-bad
  abra app backup create fails → backup=fail

- bad-restore: custom-html @ regression-bad-restore (5a481cc1)
  backup targets .backup-data/ subdir (not where ci-marker.txt lives)
  backup succeeds; restore puts .backup-data back but NOT the marker
  marker stays "mutated" → test_restore_returns_state FAILS → restore=fail

Each test asserts: rc!=0, failing_tier="fail", prior tiers="pass".
Adds @pytest.mark.canary_fast for the fast subset.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-02 01:49:28 +00:00
a2a6eea757 fix(regression): fix relative import (A-reg-1) + consume inbox
Some checks failed
continuous-integration/drone/push Build is failing
- tests/regression/test_canaries.py: replace `from .conftest import ...`
  (relative import fails when not a package) with sys.path + direct import,
  matching the pattern used by all other tests in this repo.
- Delete machine-docs/BUILDER-INBOX.md (Adversary inbox consumed).
- Update STATUS-regression.md + JOURNAL-regression.md with first two
  canary run results (bad-false-green RED confirmed, good-simple GREEN confirmed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-02 01:37:31 +00:00
fd3db37c49 feat(regression): add tests/regression/ E2E canary suite
Some checks failed
continuous-integration/drone/push Build is failing
Three canaries (@pytest.mark.canary) drive the real cold CI lifecycle:
- good-simple: custom-html-tiny @ main (435df8fc) — fast signal, expects GREEN
- good-significant: lasuite-docs @ main (290a8ad7) — multi-service, expects GREEN
- bad-false-green: custom-html @ v5-stale-docroot (71e7326a) — expects RED

Semantic teeth: beyond exit-code, each test asserts that specific named tests
ran in results.json stages (test_serving, test_serving_and_frontend, test_content_type).
If an assertion is removed, the named test disappears → regression test fails.

Includes conftest (run_recipe_ci helper + stage_has_{passing,failing}_test),
README (cadence policy, how to run, how to add), and phase state files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-02 01:25:55 +00:00
242d56b56e claim(mirror): Ph1+Ph2+Ph3 complete — mirrors created, hedgedoc tests, 9 recipes enrolled
Some checks failed
continuous-integration/drone/push Build is failing
Phase 1: Create 3 missing Gitea mirrors (lasuite-drive, mailu, mumble) via API + force-sync
  upstream main (f4135d78, 23309a1a, 9fa5e949). All 3 return 200/empty=false from Gitea API.

Phase 2: Author tests/hedgedoc/ (uptime-kuma template) — recipe_meta.py, functional/
  test_health_check.py (GET / → 200/302), functional/test_branding.py (brand markers),
  PARITY.md. Generic tiers cover install/upgrade/backup baseline.

Phase 3: Enroll 9 unenrolled recipes in nix/modules/bridge.nix POLL_REPOS:
  bluesky-pds, discourse, ghost, immich, lasuite-drive, mailu, mattermost-lts, mumble, plausible.
  Final POLL_REPOS: 20 entries (cc-ci + 19 recipes).

Gate Ph4 CLAIMED: operator must run `nixos-rebuild switch --flake .#cc-ci` on cc-ci after
Adversary-verifies Ph1+Ph2+Ph3. See STATUS-mirror.md for exact repro.
2026-06-02 00:25:12 +00:00
7225138f30 fix(tests): keep La Suite OIDC secret inserts offline
Some checks failed
continuous-integration/drone/push Build is failing
2026-06-01 13:57:15 +00:00
91a69b8971 feat(3 U5.1+U5.2): per-recipe latest-level badge endpoint /badge/<recipe>.svg (R6, level-coloured, status fallback) + complete docs/results-ux.md §3-5 (card/screenshot/PR-comment/badge-embedding, R8); +2 badge unit tests
Some checks failed
continuous-integration/drone/push Build is failing
2026-05-31 10:04:14 +00:00
e1d837ee97 feat(3 U4): YunoHost-style dashboard grid — per-recipe level badge + status + version + app screenshot thumbnail + per-recipe /recipe/<name> history; reads results.json artifacts (R5); 9 dashboard unit tests
Some checks failed
continuous-integration/drone/push Build is failing
2026-05-31 09:52:06 +00:00
9a47aa28e3 feat(3 U3): YunoHost-style PR comment (🌻 + level badge + summary card images, linked) updated in place per PR; text fallback; bridge tests + dashboard do_HEAD 2026-05-31 07:46:00 +00:00
7217e0c98c feat(3 U2-scaffold): summary card + level/status SVG badge renderers (offline; pure)
harness/card.py: render_badge_svg/level_badge_svg (shields-style SVG, colour-by-level, R6) +
render_card_html (recipe+version, level badge, per-stage/per-test ✔/✘ table, embedded screenshot,
invariant flags — REPORTS results.json verbatim, never recomputes; cardinal no-inflation guardrail)
+ render_card_png (best-effort Playwright HTML->PNG, R7). 8 pure unit tests. Orchestrator wiring +
stable-URL serving + live PNG demo come after U0 PASSes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 06:11:47 +00:00
daa7edd3a7 feat(3 U1-scaffold): app screenshot capture module (offline; not yet wired)
harness/screenshot.py: best-effort Playwright capture of the live app (reuses harness browser).
Default = landing page (credential-free, secret-safe R7); recipes needing post-login opt into a
recipe-meta SCREENSHOT hook responsible for avoiding secret pages. Every failure swallowed -> None
(cosmetics never block, R7). Pure helpers unit-tested. Orchestrator wiring + live demo come after U0
PASSes (avoid deploy contention with the Adversary's cold U0 re-runs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 06:05:39 +00:00
52e5d210d8 feat(3 U0.2+U0.3): per-test results + results.json with computed level
harness/results.py: JUnit-XML parsing (stdlib) → per-stage/per-test rows; derive_rungs (documented
tier+deps/SSO → rung mapping); build_results assembles results.json {recipe,version,pr,ref,run_id,
stages[],level,level_cap_reason,rungs,flags{clean_teardown,no_secret_leak},screenshot,summary_card};
write_results (atomic). run_recipe_ci.py: tiers emit --junitxml + append {tier,source,file,rc,junit}
records; main() assembles+writes results.json wrapped so a failure NEVER changes the verdict (R7),
incl. a narrow leak-scan of the serialised artifact. 17 new unit tests (test_results.py).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 05:55:58 +00:00
9773e3ff63 feat(3 U0.1): pure level() ladder mapper (L0-L6, gap-caps) + unit tests
Phase-3 R1 foundation. harness.level.compute_level(rungs)->(level,cap_reason) with YunoHost
gap-caps semantics: level = highest rung 1..L all clean PASS; first non-PASS (FAIL or N/A) caps,
recorded in cap_reason. N/A caps like fail but distinctly (L5 'no integration surface' example).
Helpers backup_restore_status + tier_to_rung. 16 unit tests incl U0 gate cases (L4-pass, L2-cap).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 05:46:23 +00:00
470afbff98 fix(discourse F2-15): add N/A PARITY.md (P2 §4.1) — parity genuinely N/A (no upstream corpus); documents functional tests + P4 integrity 2026-05-31 05:24:19 +00:00
4bf9e1d43d feat(mumble F2-14c): drop cc-ci compose.host-ports.yml fork; deploy 0.2.0 base minimally, add native host-ports on upgrade-to-latest via new UPGRADE_EXTRA_ENV harness hook + COMPOSE_FILE-aware READY_PROBE/install skip 2026-05-31 05:07:55 +00:00
588a08773b fix(discourse): send capitalised topic title so Discourse title_prettify is a no-op (was 'ccci'->'Ccci' mismatch) 2026-05-31 04:46:48 +00:00
1f92776052 fix(discourse): enable allow_uncategorized_topics in admin bootstrap so create-topic POST succeeds (Discourse 3.x 422 'Category cant be blank') 2026-05-31 04:41:03 +00:00
8dfd8ed3b3 fix(2): discourse — revert non-working depends_on override (additive map-merge can't remove bad key); keep image warm-cache + 3600s timeout
The depends_on:[app] override in 04cc44c does NOT make compose valid: docker normalizes short-form
depends_on to a map and merges additively, so {discourse}+{app}={discourse,app} keeps the invalid
'discourse' key (config --images still rc=15). Reverted to keep the overlay minimal (re-pin + grace
only). Prepull-skip is harmless because bitnamilegacy/discourse:3.3.1 is warm in the node image cache
→ inline pull is a no-op. Timeout headroom (3600s) retained in recipe_meta.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 01:25:47 +00:00
04cc44c15e fix(2): discourse base-deploy timeout — prepull-enable (sidekiq depends_on app, valid compose) + 3600s timeout
full4 base deploy timed out at 2400s on the 7-GiB single node. Root causes:
(1) sidekiq.depends_on referenced undefined service 'discourse' (main svc is 'app') → abra config
    --images rc=15 → prepull SKIPPED → 2.4GB image pulled inline during deploy, eating convergence
    budget. Overlay now overrides sidekiq.depends_on:[app] (swarm ignores depends_on → no-op at
    runtime, masks nothing) so prepull resolves+pre-pulls images on both base+head deploys.
(2) bumped DEPLOY_TIMEOUT/TIMEOUT 2400→3600 for headroom on the RAM/CPU-constrained Rails cold boot.
Also pre-cached bitnamilegacy/discourse:3.3.1 by tag on cc-ci (was dangling <none>).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 01:23:38 +00:00
8d689d6c32 fix(2): discourse — mint_admin ruby PATH (bash -c + discover) + BACKUP_VERIFY for post-upgrade backup race 2026-05-31 00:28:21 +00:00
3a612fc733 fix(2): ghost BACKUP_VERIFY — drop __file__ (recipe_meta is exec'd, no __file__); import harness directly
full9: backup tier FAILed with NameError('__file__' not defined) — recipe_meta.py is exec()'d into a
bare namespace so __file__ is undefined. The harness already has runner/ on sys.path + harness imported,
so import lifecycle directly. (restore PASSED on full9 — the data-integrity fix works; this just fixes
the verify probe crashing the backup tier.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 21:49:08 +00:00
68a7c79668 fix(2): ghost F2-14b — harness BACKUP_VERIFY hook + retry; close the backup-capture race
Root cause (instrumented, DECISIONS 2026-05-30): a DB recipe dumps its data in a backupbot pre-hook,
but if the DB container cycles mid-dump (intermittent on the loaded CI node — full5/6/7 RED, full8
green; NOT OOM/NOT healthcheck) the dump is truncated/absent and restic snapshots an empty path —
abra app backup 'succeeds' yet a later restore silently loses the data (ghost ci_marker).

Fix (additive, recipe-scoped via meta like READY_PROBE): recipe_meta may define BACKUP_VERIFY(domain)
-> bool, a READ-ONLY post-backup integrity probe. When it returns False the harness re-runs the whole
backup (fresh snapshot, re-stabilised db) up to 3x. Recipes without the hook are unaffected. ghost's
BACKUP_VERIFY confirms /var/lib/mysql/backup.sql.gz is a valid non-empty gzip. Weakens no assertion —
it only retries a flaky CAPTURE so P4 restore is RELIABLY exercised, not luck-dependent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 21:30:25 +00:00
4a160f6121 fix(2): ghost F2-14b — bump DEPLOY_TIMEOUT/TIMEOUT 1200→2400s for slow mysql cold-init + migration
full4 timed out: abra deploy killed at 1200s while the app was at the near-final email_recipients
migration tables (still 0/1). Wall-time = mysql fresh-dir init (~6min, app crash-loops on ECONNREFUSED
until DB ready — no migration progress lost) + ~9-15min schema migration (round-trip-bound, slower
under host load). Not a test weakening — bounded wait (matches discourse), a genuine hang still fails.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:54:20 +00:00
845b86c868 feat(2): discourse Q4.6 — upgrade-to-latest 0.7.0 base-repin+grace overlay (compose.ccci.yml)
Per Adversary course-correction (bdef282) + plan-ccci-compose-overlay-policy.md §1: upgrade-to-latest
is MANDATORY. The 0.7.0+3.3.1 from-version pins the Docker-Hub-removed bitnami/discourse:3.3.1 (404)
and ships a too-tight 5m start_period for the 15-25min Rails cold boot. Minimal base overlay
compose.ccci.yml re-pins app+sidekiq to bitnamilegacy/discourse:3.3.1 (namespace-only, identical
image — same re-pin the PR head makes) + widens start_period to 20m (grace-only). install_steps.sh
provides it; CHAOS_BASE_DEPLOY skips the clean-tree gate; UPGRADE_BASE_VERSION=0.7.0+3.3.1 sets the
true predecessor. Neither change weakens a test. Run shape returns to STAGES=install,upgrade,backup,
restore,custom.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:29:41 +00:00
3ca45c7308 fix(2): ghost F2-14b — add db start_period grace to base overlay
Run #2 base deploy: fresh mysql:8.0 init on the loaded cc-ci host (load ~8) took >6min
(InnoDB ~90s + system-tables + root-pw apply, starved by the app crash-loop churn), exceeding
the recipe's 1m db start_period (+6min retry grace) → swarm killed mysql mid-init (exit 137
unhealthy) → corrupt InnoDB redo logs → permanent deadlock (same signature as run #1's stale
vol). Widen db healthcheck start_period to 15m (matches app) so the slow first-boot finishes
before the healthcheck can fail it. Grace-only, masks no defect; bites base+head (published
recipe ships db start_period 1m everywhere) so overlay covers both. Torn down corrupt vol.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 17:58:30 +01:00
7feeadd0ec feat(2): ghost F2-14b — upgrade-to-latest base-grace overlay (compose.ccci.yml)
Course correction (REVIEW-2 bdef282) mandates upgrade-to-latest; harness base-deploys
prev published version 1.1.1+6-alpine which predates the recipe-PR 15m start_period bump
(ships 1m) → would deadlock on the ~6-9min fresh-DB migration (swarm kill mid-migration →
held migrations_lock). Policy-blessed minimal base overlay: compose.ccci.yml re-applies the
15m app-healthcheck start_period grace to the BASE so the from-version is deployable;
install_steps.sh provides it; CHAOS_BASE_DEPLOY skips clean-tree on the untracked overlay;
persists across head checkout (idempotent — PR head ships 15m). Grace-only, no test weakened.
Prior corrupt mysql vol (stale, interrupted init) torn down. Next: full run incl upgrade.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 17:49:05 +01:00
0f2cc2d704 feat(2): ghost F2-14b overlay migration — start_period bump moved to recipe-PR (ghost#1 head ae43ffe, literal 15m on app healthcheck); DELETE cc-ci compose.ccci-health.yml + install_steps.sh + COMPOSE_FILE/CHAOS_BASE_DEPLOY. Anti-drift (plan §9): recipe-as-tested == recipe-as-published. env-var start_period impossible (abra pre-subst duration validation, Adversary-reproduced 4b862f6). Next: run ghost on ae43ffe head. 2026-05-30 17:20:20 +01:00
fb20321bd9 feat(2): discourse start_period via literal recipe-PR bump (abra can't env-interpolate start_period)
abra rejects env-interpolation in healthcheck start_period (FATA 'Does not match
format duration' for both ${VAR} and quoted forms — validates the literal compose
duration before .env substitution). So §9 pt1's env-var route is impossible for
this field; the §9-compliant fix is a LITERAL start_period:20m bump in the
recipe-PR (recipe everyone runs, not a cc-ci overlay; strictly safer). Remove
APP_START_PERIOD from recipe_meta EXTRA_ENV; record the finding in DECISIONS
(ghost E1 must use the same approach); STATUS-2 → new PR head 7a2e0e0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 16:24:45 +01:00
c346b9763b feat(2): discourse Q4.6 policy-compliant shape (plan §9) — env-var start_period, delete cc-ci overlay, upgrade N/A
Migrate discourse off the cc-ci compose overlay per plan §9 / plan-prefer-env-over-compose-overlay.md:
- recipe_meta: drop UPGRADE_BASE_VERSION + COMPOSE_FILE + CHAOS_BASE_DEPLOY; set APP_START_PERIOD=1200s
  via EXTRA_ENV (the recipe-PR exposes start_period: ${APP_START_PERIOD:-5m}); declare upgrade tier N/A
  (both published prev bases pin removed bitnami images; Adversary §7.1 granted, REVIEW-2 efe3790).
- delete tests/discourse/compose.ccci-health.yml + install_steps.sh (existed only to copy the overlay).
- DECISIONS.md + STATUS-2 record the §9 guardrail + discourse shape (upgrade N/A, env start_period,
  pg_backup restore-hook recipe-PR = 5th data-loss recipe cc-ci caught).
recipe-PR head now 8b8df17 (start_period env var added). Not a claim — run STAGES=install,backup,restore,custom next.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 15:47:28 +01:00
a750937fb0 feat(2): discourse Q4.6 honest upgrade crossover — UPGRADE_BASE_VERSION override (base-on-[-1]) + uniform bitnamilegacy image overlay
Implements the real 0.7.0+3.3.1 -> 0.8.0+3.3.1 upgrade crossover instead of a
§7.1 skip-with-sign-off (Adversary leans DENY on the deferral; agreed):
- recipe_meta UPGRADE_BASE_VERSION=0.7.0+3.3.1 + generic support in
  run_recipe_ci (prev = meta override or previous_version). Harness default
  [-2]=0.6.3+3.1.2 is a hollow base (img 3.1.2 != head 3.3.1); [-1]=0.7.0+3.3.1
  is the PR's true predecessor and shares head's servable 3.3.1 image.
- compose.ccci-health.yml re-pins services.{app,sidekiq}.image to
  bitnamilegacy/discourse:3.3.1 so the 0.7.0 base (compose pins 404 bitnami:3.3.1)
  is servable; idempotent on the head (PR already bitnamilegacy).
Consumes Adversary BUILDER-INBOX (deleted), leaves ADVERSARY-INBOX ack; STATUS-2
discourse section updated. Full lifecycle run launching next.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 14:20:06 +01:00
d822550c7d feat(2): discourse P3 functional tests — §4.3 create-topic round-trip + site.json config + admin-bootstrap helper
_discourse.py: bootstrap an admin (recipe seeds none) + mint an ApiKey via rails runner in the app
container (class-B run-scoped). test_create_topic.py: POST /posts.json (unique marker) -> GET
/t/<id>.json title+cooked round-trip. test_site_basic.py: GET /site.json asserts discourse categories
config. Meets P3 (>=2 functional beyond health).
2026-05-30 12:52:30 +01:00
0e3049b677 fix(2): discourse health overlay add version 3.8 (lint R011/R012 version-mismatch FATA vs compose.yml 3.8) 2026-05-30 12:09:51 +01:00
b2ed6cf989 fix(2): discourse recipe_meta — wire COMPOSE_FILE+CHAOS_BASE_DEPLOY+TIMEOUT 2400 (the overlay's missing half; prior commit a432058 only added the files) 2026-05-30 11:49:51 +01:00
a432058aca fix(2): discourse healthcheck start_period overlay (slow Rails boot) + CHAOS_BASE_DEPLOY + TIMEOUT 2400
Install timed out at 1800s: discourse's 15-25min Rails cold boot overran both the deploy timeout and
the recipe healthcheck start_period:5m (swarm killed the booting app). Add compose.ccci-health.yml
(app healthcheck start_period 1200s) via install_steps.sh + recipe_meta COMPOSE_FILE + CHAOS_BASE_DEPLOY,
bump DEPLOY_TIMEOUT/TIMEOUT to 2400. Image re-pin (bitnamilegacy) already proven working. NO test weakened.
2026-05-30 11:48:18 +01:00
13da216f8d fix(2): ghost healthcheck start_period overlay — fixes fresh-migration lock deadlock
Root cause: Ghost's fresh-DB first boot runs a ~6-9min schema migration (round-trip-bound, not CPU);
the recipe healthcheck start_period:1m (~6min grace) kills the still-migrating task, leaving a stale
migrations_lock → every later task deadlocks (MigrationsAreLockedError). Hit on both 2- and 4-vCPU.
Fix (cc-ci deploy overlay, NOT a recipe/test change): compose.ccci-health.yml raises app healthcheck
start_period to 900s, wired via recipe_meta COMPOSE_FILE + install_steps.sh (+ CHAOS_BASE_DEPLOY for
the untracked overlay). No assertion weakened. Budget 1200s = migration + convergence. Only the
install tier needs it (upgrade redeploys on the populated DB → fast boot).
2026-05-30 05:23:47 +01:00