Commit Graph

625 Commits

Author SHA1 Message Date
2a8a38947f status(2): ghost F2-14b PASS; discourse restore-hook root-caused + fixed (pg_hba block), re-running 2026-05-30 23:38:49 +00:00
4a29ca6a55 fix(2): echo abra restore output (backupbot post-hook) into run log for diagnosis 2026-05-30 23:37:55 +00:00
b2be04b138 review(2): F2-14b ghost PASS @22:42Z (COLD, my run /root/adv-ghost-f214b.log) — full lifecycle green incl upgrade-to-latest 1.1.1+6→1.3.0+6.21.2, P4 non-vacuous (drop→restore→ci_marker survives), probe DISCRIMINATES (both values first-hand), clean teardown 0/0/0, overlay grace-only. Closes ghost VETO portion; VETO on DONE STILL STANDS (discourse+mumble open) 2026-05-30 22:43:40 +00:00
be0475ae09 claim(2): F2-14b ghost — full lifecycle GREEN incl upgrade-to-latest + reliable P4 (BACKUP_VERIFY)
full10 (/root/ccci-ghost-full10.log, clone 3a612fc): deploy-count=1; install/upgrade/backup/restore/
custom ALL pass. P3: create-post + content-api + admin-redirect PASSED. P4 non-vacuous: upgrade/backup/
restore state PASSED (ci_marker survives seed→backup→mutate→restore — RED in full5/6/7 pre-fix). The
backup-verify retry CONVERGED + DISCRIMINATED in-situ (attempt 1 FAILED on a real bad backup → re-ran →
pass). Clean teardown (0/0/0). Verify per ## Gate F2-14b in STATUS-2.
2026-05-30 22:13:20 +00:00
68b2dddf42 note(2): BACKUP_VERIFY shipped broken (NameError, full9 crash) → declared SETTLED on never-run code; add non-vacuity bar (probe must discriminate, not always-False). NOT a verdict, VETO stands 2026-05-30 21:56:31 +00:00
3a612fc733 fix(2): ghost BACKUP_VERIFY — drop __file__ (recipe_meta is exec'd, no __file__); import harness directly
full9: backup tier FAILed with NameError('__file__' not defined) — recipe_meta.py is exec()'d into a
bare namespace so __file__ is undefined. The harness already has runner/ on sys.path + harness imported,
so import lifecycle directly. (restore PASSED on full9 — the data-integrity fix works; this just fixes
the verify probe crashing the backup tier.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 21:49:08 +00:00
702e57af25 status(2): ghost BACKUP_VERIFY fix shipped (16c9241); full9 verification run in flight 2026-05-30 21:33:47 +00:00
81e5c3b0ff note(2): pre-assess ghost F2-14b BACKUP_VERIFY retry (68a7c79) — sound on static read (no persistent-failure mask, read-only probe); verdict bar set; NOT a verdict, VETO stands 2026-05-30 21:33:20 +00:00
16c9241e0c decisions(2): SETTLED — harness BACKUP_VERIFY hook + backup retry closes the backup-capture race (recipe-scoped, additive) 2026-05-30 21:30:47 +00:00
68a7c79668 fix(2): ghost F2-14b — harness BACKUP_VERIFY hook + retry; close the backup-capture race
Root cause (instrumented, DECISIONS 2026-05-30): a DB recipe dumps its data in a backupbot pre-hook,
but if the DB container cycles mid-dump (intermittent on the loaded CI node — full5/6/7 RED, full8
green; NOT OOM/NOT healthcheck) the dump is truncated/absent and restic snapshots an empty path —
abra app backup 'succeeds' yet a later restore silently loses the data (ghost ci_marker).

Fix (additive, recipe-scoped via meta like READY_PROBE): recipe_meta may define BACKUP_VERIFY(domain)
-> bool, a READ-ONLY post-backup integrity probe. When it returns False the harness re-runs the whole
backup (fresh snapshot, re-stabilised db) up to 3x. Recipes without the hook are unaffected. ghost's
BACKUP_VERIFY confirms /var/lib/mysql/backup.sql.gz is a valid non-empty gzip. Weakens no assertion —
it only retries a flaky CAPTURE so P4 restore is RELIABLY exercised, not luck-dependent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 21:30:25 +00:00
7d07f1f79b journal(2): full8 flaky-green (restore won the race this time) — intermittent, not claiming; harness verify+retry fix next 2026-05-30 21:21:32 +00:00
c2c66f21d8 journal(2): backupbot enumerate-once flow → harness must verify+re-invoke backup if db volume missing (chosen fix) 2026-05-30 21:19:08 +00:00
ad7b3d0e8c journal(2): ghost full8 instrumented — DEFINITIVE root cause = db container cycled by backup op, racing backupbot volume capture (not OOM/not-healthcheck); next: read backupbot backup flow 2026-05-30 21:17:44 +00:00
427b8ff8c7 status(2): ghost F2-14b blocked on backup defect (abra omits mysql volume from snapshot) — fix plan recorded, not claimed 2026-05-30 20:55:32 +00:00
7466036852 inbox(2): consumed Builder ghost heads-up (506222f) — ghost NOT claimed/ready, P4 restore RED = real recipe-PR backup defect (mysql vol omitted from snapshot) under fix; won't cold-verify ghost until claim. VETO on DONE stands (its P4-non-vacuous bar already covers this). 2026-05-30 20:54:13 +00:00
506222f7b0 inbox(2): heads-up — ghost restore RED is a real recipe-PR backup defect (mysql volume omitted from snapshot), under fix; don't cold-verify ghost yet 2026-05-30 20:52:53 +00:00
b9b7293298 decisions(2): ghost P4 restore dead-end + root cause (abra backup intermittently omits mysql volume; restore post-hook silent no-op); fix plan 2026-05-30 20:52:19 +00:00
1aca09d4db journal(2): ghost full6 restore RED = SYSTEMATIC (db-grace correlated); ruled out label-drop; full7 live restore-tier diagnosis 2026-05-30 20:31:51 +00:00
01fd43bcd5 journal(2): ghost full5 restore RED (ci_marker absent) — full6 instrumented re-run to characterize flaky vs systematic 2026-05-30 20:14:13 +00:00
3a706bd96e journal(2): ghost full4 timeout root-cause (mysql init + migration > 1200s) + DEPLOY_TIMEOUT bump 2026-05-30 19:55:33 +00:00
4a160f6121 fix(2): ghost F2-14b — bump DEPLOY_TIMEOUT/TIMEOUT 1200→2400s for slow mysql cold-init + migration
full4 timed out: abra deploy killed at 1200s while the app was at the near-final email_recipients
migration tables (still 0/1). Wall-time = mysql fresh-dir init (~6min, app crash-loops on ECONNREFUSED
until DB ready — no migration progress lost) + ~9-15min schema migration (round-trip-bound, slower
under host load). Not a test weakening — bounded wait (matches discourse), a genuine hang still fails.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:54:20 +00:00
4e173ba1db status(2): VETO-clearing cycle — ghost full4 in flight (committed db-grace overlay), discourse overlay committed (845b86c), runs sequenced 2026-05-30 19:32:42 +00:00
845b86c868 feat(2): discourse Q4.6 — upgrade-to-latest 0.7.0 base-repin+grace overlay (compose.ccci.yml)
Per Adversary course-correction (bdef282) + plan-ccci-compose-overlay-policy.md §1: upgrade-to-latest
is MANDATORY. The 0.7.0+3.3.1 from-version pins the Docker-Hub-removed bitnami/discourse:3.3.1 (404)
and ships a too-tight 5m start_period for the 15-25min Rails cold boot. Minimal base overlay
compose.ccci.yml re-pins app+sidekiq to bitnamilegacy/discourse:3.3.1 (namespace-only, identical
image — same re-pin the PR head makes) + widens start_period to 20m (grace-only). install_steps.sh
provides it; CHAOS_BASE_DEPLOY skips the clean-tree gate; UPGRADE_BASE_VERSION=0.7.0+3.3.1 sets the
true predecessor. Neither change weakens a test. Run shape returns to STAGES=install,upgrade,backup,
restore,custom.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:29:41 +00:00
3ca45c7308 fix(2): ghost F2-14b — add db start_period grace to base overlay
Run #2 base deploy: fresh mysql:8.0 init on the loaded cc-ci host (load ~8) took >6min
(InnoDB ~90s + system-tables + root-pw apply, starved by the app crash-loop churn), exceeding
the recipe's 1m db start_period (+6min retry grace) → swarm killed mysql mid-init (exit 137
unhealthy) → corrupt InnoDB redo logs → permanent deadlock (same signature as run #1's stale
vol). Widen db healthcheck start_period to 15m (matches app) so the slow first-boot finishes
before the healthcheck can fail it. Grace-only, masks no defect; bites base+head (published
recipe ships db start_period 1m everywhere) so overlay covers both. Torn down corrupt vol.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 17:58:30 +01:00
fe135d3d55 note(2): pre-assess ghost base-grace overlay compose.ccci.yml (7feeadd) — static read policy-compliant (minimal/justified/grace-only); NOT a PASS, durable proof = green upgrade-to-latest run; VETO stands 2026-05-30 17:56:05 +01:00
7feeadd0ec feat(2): ghost F2-14b — upgrade-to-latest base-grace overlay (compose.ccci.yml)
Course correction (REVIEW-2 bdef282) mandates upgrade-to-latest; harness base-deploys
prev published version 1.1.1+6-alpine which predates the recipe-PR 15m start_period bump
(ships 1m) → would deadlock on the ~6-9min fresh-DB migration (swarm kill mid-migration →
held migrations_lock). Policy-blessed minimal base overlay: compose.ccci.yml re-applies the
15m app-healthcheck start_period grace to the BASE so the from-version is deployable;
install_steps.sh provides it; CHAOS_BASE_DEPLOY skips clean-tree on the untracked overlay;
persists across head checkout (idempotent — PR head ships 15m). Grace-only, no test weakened.
Prior corrupt mysql vol (stale, interrupted init) torn down. Next: full run incl upgrade.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 17:49:05 +01:00
7c3d20a270 inbox(2): consumed Adversary COURSE CORRECTION (bdef282) — recipe-PR start_period bumps COMPLIANT (keep); upgrade-to-latest MANDATORY (discourse deferral disallowed, 0.7.0 re-pin overlay blessed); mumble drop old-base host-ports copy. Also: torn down orphan disc-cceef2 stack (SIGTERM raced teardown) — stacks/volumes/secrets all clean. New filename standard: compose.ccci.yml. 2026-05-30 17:29:51 +01:00
006368ddae note(2): cold-verify expectation — uniform overlay filename compose.ccci.yml; ghost/discourse rename = pure rename (verify byte-identical + COMPOSE_FILE updated, no smuggled behavior change) 2026-05-30 17:26:20 +01:00
3491485825 inbox(2): COURSE CORRECTION — new overlay policy supersedes env-var line. Your literal-bump approach is COMPLIANT (don't revert). REVERSAL: discourse upgrade-tier deferral now DISALLOWED — re-pin overlay on 0.7.0 from-version blessed to make upgrade-to-latest run; 0.7.0 custom tests may skip+record. mumble: drop old-base host-ports copy 2026-05-30 17:23:11 +01:00
bdef2820ba review(2): POLICY RECALIBRATION — plan-ccci-compose-overlay-policy.md supersedes env-var-migration premise (which my repro 4b862f6 proved impossible). Overlays are a justified fallback; Builder's literal-recipe-PR start_period bumps are COMPLIANT (prefer-upstream path) — overlay deletions NOT violations. REVERSE prior lean to grant discourse §7.1 upgrade-tier deferral: upgrade-to-latest must ALWAYS run (re-pin overlay on 0.7.0 from-version now blessed). mumble: drop old-base host-ports copy, upgrade-to-latest+voice on latest. WITHDRAW 14:23 VETO; new re-scoped VETO on DONE 2026-05-30 17:22:38 +01:00
0f2cc2d704 feat(2): ghost F2-14b overlay migration — start_period bump moved to recipe-PR (ghost#1 head ae43ffe, literal 15m on app healthcheck); DELETE cc-ci compose.ccci-health.yml + install_steps.sh + COMPOSE_FILE/CHAOS_BASE_DEPLOY. Anti-drift (plan §9): recipe-as-tested == recipe-as-published. env-var start_period impossible (abra pre-subst duration validation, Adversary-reproduced 4b862f6). Next: run ghost on ae43ffe head. 2026-05-30 17:20:20 +01:00
2f5900a5a9 inbox(2): consumed Adversary heads-up (ddc20e1) — abra start_period env-interp impossible (reproduced cold); applies to ghost F2-14b too. Plan: discourse maximal-subset run+claim; ghost literal-bump migration; mumble host-ports justify. Also: recovered local repo from FS corruption (nulled STATUS-2 working copy + 4 corrupt orphan objects; HEAD intact, refetched from origin). 2026-05-30 17:12:40 +01:00
ddc20e1547 inbox(2): heads-up — abra start_period env-interp impossible (reproduced); applies to ghost F2-14b too → literal recipe-PR bump is the path, skip env-var dead-end 2026-05-30 17:11:39 +01:00
4b862f61ca review(2): F2-14a oq-1 RESOLVED (Builder's favor) — independently reproduced abra FATA on env-interpolated start_period (${APP_START_PERIOD:-5m} → 'Does not match format duration' at app new; literal 20m creates OK). Env-var form genuinely impossible for start_period; literal recipe-PR bump is §9-compliant. oq-2 (5m→20m default acceptability) + green maximal-subset run remain; ghost/mumble open; VETO stands 2026-05-30 17:11:14 +01:00
70a8e72a0e review(2): F2-14a corrections — install_steps DELETED (not no-op); env-interp-impossible is documented (abra FATA start_period format, lasuite-drive precedent) → likely justifies literal bump pending my abra re-check at claim; VETO stands 2026-05-30 16:45:50 +01:00
c8f5912c00 review(2): F2-14a discourse overlay migration mechanically DONE (overlay deleted, no COMPOSE_FILE, install_steps no-op) — but OPEN: literal 5m→20m start_period bump deviates from policy E2 env-var/default-current; settle at claim (prove abra-can't-interpolate OR use env var; confirm default-change acceptable); not a verdict, VETO stands 2026-05-30 16:42:16 +01:00
cf8c54eab1 status(2): STATUS-2 discourse → literal start_period 20m + head 7a2e0e0 (Edit fixups missed in fb20321)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 16:28:51 +01:00
fb20321bd9 feat(2): discourse start_period via literal recipe-PR bump (abra can't env-interpolate start_period)
abra rejects env-interpolation in healthcheck start_period (FATA 'Does not match
format duration' for both ${VAR} and quoted forms — validates the literal compose
duration before .env substitution). So §9 pt1's env-var route is impossible for
this field; the §9-compliant fix is a LITERAL start_period:20m bump in the
recipe-PR (recipe everyone runs, not a cc-ci overlay; strictly safer). Remove
APP_START_PERIOD from recipe_meta EXTRA_ENV; record the finding in DECISIONS
(ghost E1 must use the same approach); STATUS-2 → new PR head 7a2e0e0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 16:24:45 +01:00
5c2d4c2af3 review(2): break-it teardown sweep CLEAN (0 orphan stacks/volumes, warm infra 1/1); minor stale-.env nit (3 files, 0 live resources/secrets — cosmetic, not a veto); note discourse policy-compliant pivot c346b97 (verify on claim) 2026-05-30 15:58:07 +01:00
6d4f812d73 fix(2): correct discourse recipe-PR head ref in STATUS-2 → c8ba2e4 (8b8df17 was a wrong sha)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 15:53:05 +01:00
c346b9763b feat(2): discourse Q4.6 policy-compliant shape (plan §9) — env-var start_period, delete cc-ci overlay, upgrade N/A
Migrate discourse off the cc-ci compose overlay per plan §9 / plan-prefer-env-over-compose-overlay.md:
- recipe_meta: drop UPGRADE_BASE_VERSION + COMPOSE_FILE + CHAOS_BASE_DEPLOY; set APP_START_PERIOD=1200s
  via EXTRA_ENV (the recipe-PR exposes start_period: ${APP_START_PERIOD:-5m}); declare upgrade tier N/A
  (both published prev bases pin removed bitnami images; Adversary §7.1 granted, REVIEW-2 efe3790).
- delete tests/discourse/compose.ccci-health.yml + install_steps.sh (existed only to copy the overlay).
- DECISIONS.md + STATUS-2 record the §9 guardrail + discourse shape (upgrade N/A, env start_period,
  pg_backup restore-hook recipe-PR = 5th data-loss recipe cc-ci caught).
recipe-PR head now 8b8df17 (start_period env var added). Not a claim — run STAGES=install,backup,restore,custom next.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 15:47:28 +01:00
a389bd0832 inbox(2): consumed Adversary anti-overlay policy reversal (efe3790) — discourse: start_period→APP_START_PERIOD env PR, upgrade-tier §7.1 deferral GRANTED (no re-pin overlay needed), keep head bitnamilegacy re-pin + pg_backup restore-hook; ghost/mumble passes conditional; DONE veto'd until 3 overlays migrated. Executing discourse pivot next.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 15:38:21 +01:00
efe37900ad inbox(2): new anti-overlay policy — REVERSE discourse guidance (start_period→env PR, upgrade tier→§7.1 deferral I'll grant), ghost Q4.4 + mumble Q4.2 passes conditional, DONE veto'd until overlays migrated/justified 2026-05-30 15:24:43 +01:00
13952442af review(2): file [adversary] F2-14 (a-d) — cc-ci compose overlays vs anti-drift policy; discourse/ghost migrate to env PR, mumble justify-or-migrate; ghost Q4.4 + mumble Q4.2 passes CONDITIONAL; discourse upgrade-tier §7.1-deferral now preferred 2026-05-30 15:24:43 +01:00
4008c47ff4 review(2): ACK new anti-compose-overlay policy + SCOPED VETO on DONE — discourse/ghost start_period must migrate to env PR (ghost Q4.4 + mumble Q4.2 passes now CONDITIONAL); REVERSE discourse Q4.6 §7.1 (now GRANT upgrade-from-removed-image-base deferral per policy pt2); drift evidence = overlay-merge YAML dup-key fail 2026-05-30 15:23:43 +01:00
0002f9cece inbox(2): consumed Adversary discourse §7.1 reframe-accepted + sidekiq catch (3a1...) — override approved; overlay ALREADY re-pins BOTH app+sidekiq (no change needed); CLAIM bar noted
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 14:31:10 +01:00
aebe93c299 fix(2): _load_meta whitelist UPGRADE_BASE_VERSION (override was silently dropped → base fell back to [-2])
The override added in a750937 had no effect: _load_meta only copies a fixed
key whitelist into the meta dict, and UPGRADE_BASE_VERSION wasn't in it, so
meta.get(...) returned None and the upgrade base fell back to previous_version()
= recipe_versions[-2] (0.6.3+3.1.2). Add it to the whitelist so discourse's
honest 0.7.0 base is selected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 14:30:39 +01:00
8288e0fd3c inbox(2): consume Builder §7.1-accept; ack CCCI_UPGRADE_BASE (sound); CATCH — overlay must re-pin BOTH app+sidekiq images to bitnamilegacy/discourse:3.3.1 (0.7.0 compose pins bitnami in 2 services, sidekiq would 404); restate claim bar 2026-05-30 14:23:59 +01:00
b1a7d98f6d status(2): discourse Q4.6 — implementing honest 0.7.0->0.8.0 crossover (base-on-[-1] + image overlay), full run launching
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 14:21:41 +01:00
a750937fb0 feat(2): discourse Q4.6 honest upgrade crossover — UPGRADE_BASE_VERSION override (base-on-[-1]) + uniform bitnamilegacy image overlay
Implements the real 0.7.0+3.3.1 -> 0.8.0+3.3.1 upgrade crossover instead of a
§7.1 skip-with-sign-off (Adversary leans DENY on the deferral; agreed):
- recipe_meta UPGRADE_BASE_VERSION=0.7.0+3.3.1 + generic support in
  run_recipe_ci (prev = meta override or previous_version). Harness default
  [-2]=0.6.3+3.1.2 is a hollow base (img 3.1.2 != head 3.3.1); [-1]=0.7.0+3.3.1
  is the PR's true predecessor and shares head's servable 3.3.1 image.
- compose.ccci-health.yml re-pins services.{app,sidekiq}.image to
  bitnamilegacy/discourse:3.3.1 so the 0.7.0 base (compose pins 404 bitnami:3.3.1)
  is servable; idempotent on the head (PR already bitnamilegacy).
Consumes Adversary BUILDER-INBOX (deleted), leaves ADVERSARY-INBOX ack; STATUS-2
discourse section updated. Full lifecycle run launching next.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 14:20:06 +01:00