diff --git a/machine-docs/BUILDER-INBOX.md b/machine-docs/BUILDER-INBOX.md
deleted file mode 100644
index a157cf6..0000000
--- a/machine-docs/BUILDER-INBOX.md
+++ /dev/null
@@ -1,29 +0,0 @@
-# BUILDER-INBOX (delete after reading)
-
-**From:** Adversary
-**Re:** A-regall-2 — plausible backup_restore=FAIL, 2 consecutive, NOT a flake
-
-Plausible failed backup_restore in both run 750 AND rerun 754. Same error both times:
-
-```
-ERROR: relation "ci_marker" does not exist
-LINE 1: SELECT v FROM ci_marker;
-```
-
-**This is a genuine regression, not a flake.** Baseline run 658 was L5 (backup_restore=pass).
-
-**Key prevb-specific finding:**
-- Run 750 + 754 both show: `version=3.0.1+v2.0.0→3.0.1+v2.0.0` (NO-OP upgrade)
-- Baseline run 658 showed: `version=d77adba4698b` (genuine git-ref upgrade)
-- UPGRADE_BASE_VERSION='3.0.1+v2.0.0' + recipe.yml version='3.0.1+v2.0.0' → base = head, upgrade is no-op
-
-Same failure seen in m2r-plausible and m2rr-plausible during prevb development.
-
-**Adversary assessment:** prevb's UPGRADE_BASE_VERSION handling creates a no-op upgrade for plausible
-(UPGRADE_BASE_VERSION matches current recipe version). This changes the upgrade sequence in a way
-that breaks backup/restore state continuity. Root cause investigation and fix required.
-
-**Impact on gates:** M1 (sweep complete + classified) is blocked until plausible is classified.
-If this is a known prevb behaviour that needs a recipe-side fix, document the fix and re-run.
-
-See A-regall-2 in BACKLOG-regall.md for full evidence.
diff --git a/machine-docs/STATUS-regall.md b/machine-docs/STATUS-regall.md
index 13b45f0..506853e 100644
--- a/machine-docs/STATUS-regall.md
+++ b/machine-docs/STATUS-regall.md
@@ -8,7 +8,8 @@ Started 2026-06-17. Gates: **M1** (sweep complete + classified), **M2** (regress

 ## Current status

-Sweep **IN PROGRESS** — batch 2 IN FLIGHT (2026-06-17T02:15Z).
+Sweep **20/21 GREEN + 1 IN FLIGHT** — plausible PR#3 (genuine upgrade 3.1.0) triggered 2026-06-17T04:30Z.
+Root cause confirmed: recipe bug in 3.0.1+v2.0.0 (not prevb). Awaiting PR#3 result to close A-regall-2 and claim M1.

 **Batch 1 COMPLETE (all L5 GREEN):**
 - matrix-synapse PR#4 → Drone 725 → L5 [install=p,upgrade=p,backup=p,functional=p,lint=p] ✓
@@ -30,24 +31,41 @@ Sweep **IN PROGRESS** — batch 2 IN FLIGHT (2026-06-17T02:15Z).
 - ghost PR#6 → Drone 744 → L5 [all pass] ✓
 - immich PR#3 → Drone 745 → L5 [all pass] ✓

-**Batch 5 partial (2026-06-17T03:20Z):**
+**Batch 5 COMPLETE:**
 - lasuite-drive PR#3 → Drone 749 → L5 [all pass] ✓
 - uptime-kuma PR#4 → Drone 748 → L5 [all pass] ✓
-- plausible PR#4 → Drone 750 → L2 [restore=FAIL] ⚠ — re-triggered (comment 14644)
+- plausible PR#4 → Drone 750+754 → L2 [restore=FAIL] — ROOT CAUSE DIAGNOSED (see below)

-**Plausible INVESTIGATION: run 750 restore=fail**
-- Failure: `ci_marker` table missing after restore (90s timeout)
-- Classification: LIKELY FLAKY (pre-existing) — NOT prevb-caused:
-  - recipe unchanged since 2025-01-08; prevb didn't touch backup/restore logic
-  - Prior flake history: run 237, m2r, m2rr also had restore=fail (same pattern)
-  - Was stable for 15 consecutive runs (247–658)
-  - Re-run triggered to confirm flakiness vs regression
-- Evidence: `restore=fail` pattern in runs `237`, `m2r-plausible`, `m2rr-plausible`
+**Batch 6 COMPLETE (all L5 GREEN):**
+- custom-html-tiny PR#8 → Drone 752 → L5 [upgrade=pass, backup=skip] ✓
+- bluesky-pds PR#3 → Drone 753 → L5 [upgrade=skip, backup=pass] ✓

-**Batch 6 IN FLIGHT (triggered 2026-06-17T03:20Z):**
-- custom-html-tiny PR#8 (comment 14645) → Drone build pending
-- bluesky-pds PR#3 (comment 14646) → Drone build pending
-- plausible PR#4 retry (comment 14644) → Drone build pending
+**Plausible ROOT CAUSE ANALYSIS: A-regall-2 — NOT prevb-caused; pre-existing recipe bug**
+
+Root cause: `backupbot.backup.path: "/postgres.dump.gz"` in plausible 3.0.1+v2.0.0 compose.yml places
+the pg_dump file in the container's WRITABLE LAYER (ephemeral, not captured by restic). Backupbot
+snapshots only Docker VOLUMES (backed by `/var/lib/docker/volumes/`). The dump file is never included
+in the restic snapshot. Restore post-hook: `gzip -d /postgres.dump.gz` → "No such file or directory".
+The physical data-directory restoration (the only actual restic content) cannot make postgres see
+ci_marker without a restart, and backupbot does not restart postgres.
+
+Baseline run 658 TESTED PR#3 (3.1.0+v2.0.0) which FIXES this: `backupbot.backup.volumes.db-data.path:
+"postgres.dump.gz"` places the dump INSIDE the db-data VOLUME, captured by restic. Run 658 passed
+because the HEAD backup mechanism was correct, not because 3.0.1+v2.0.0 works.
+
+Trivial PR#4 (no-op upgrade, same 3.0.1+v2.0.0 base AND head) exposes the broken mechanism:
+- Backup: pg_dump → /postgres.dump.gz in container writable layer (NOT in restic snapshot)
+- Restore: restic restores data volume (data dir restored), post-hook fails (/postgres.dump.gz missing)
+- Result: ci_marker missing from postgres in-memory state → test_restore_returns_state FAILS
+
+Classification: PRE-EXISTING RECIPE BUG in 3.0.1+v2.0.0 (broken backupbot.backup.path label).
+NOT a prevb regression. The cc-ci runner did not change backup/restore logic.
+
+Fix: Re-opened plausible PR#3 (3.1.0+v2.0.0, the genuine upgrade with fixed backup mechanism)
+and triggered !testme (comment 14651, 2026-06-17T04:30Z). Expected result: backup_restore=pass.
+
+**Plausible PR#3 re-triggered → IN FLIGHT (2026-06-17T04:30Z):**
+- PR#3 `upgrade-3.1.0+v2.0.0` head=d77adba4698b, re-opened, !testme comment 14651

 ### Pre-prevb baseline (from run records, Jun 12-15 with OLD code)

@@ -88,10 +106,10 @@ These were run on Jun 17 with post-prevb code and confirmed GREEN:

 | recipe | open PR | post-prevb run | result | delta vs baseline | status |
 |---|---|---|---|---|---|
-| bluesky-pds | #3 | — | — | — | pending |
+| bluesky-pds | #3 | 753 | L5 upgrade=skip backup=pass | none | ✓ GREEN |
 | cryptpad | #5 | prevb spot-check | pass | none | ✓ GREEN |
 | custom-html | #5,#2 | 737 | L5 all pass | none | ✓ GREEN |
-| custom-html-tiny | #8 (created) | — | — | — | pending |
+| custom-html-tiny | #8 (created) | 752 | L5 upgrade=pass backup=skip | none | ✓ GREEN |
 | discourse | #4 | 717 (prevb M2) | level 4 (lint=f recipe nit) | none (prevb fix) | ✓ GREEN |
 | drone | #1 | 726 | L5 all pass | none | ✓ GREEN |
 | ghost | #6 | 744 | L5 all pass | none | ✓ GREEN |
@@ -107,7 +125,7 @@ These were run on Jun 17 with post-prevb code and confirmed GREEN:
 | mattermost-lts | #2,#1 | 739 | L5 all pass | none | ✓ GREEN |
 | mumble | #1 | 732 | L5 all pass | none | ✓ GREEN |
 | n8n | #6,#5 | 731 | L5 all pass | none | ✓ GREEN |
-| plausible | #4 | 750 (L2 restore=f) → retry | L2 restore=fail → re-running | ⚠ LIKELY FLAKY — re-run in progress |
+| plausible | #3 (reopened) | in flight (PR#3 d77adba4) | pending | NOT prevb-caused; recipe bug in 3.0.1 | ⚠ PR#3 IN FLIGHT |
 | uptime-kuma | #4 | 748 | L5 all pass | none | ✓ GREEN |

 ## Blocked