Commit Graph

109 Commits

Author SHA1 Message Date
7ca77f95ca inbox(canon): heads-up — M2.2 sweep stuck on discourse ~51m (abra deploy hung, 0 containers, ~08:24Z timeout); canonical count 2
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-17 08:15:59 +00:00
a3d115d6e3 diagnose(regall): A-regall-2 root cause — recipe bug in 3.0.1+v2.0.0, NOT prevb
All checks were successful
continuous-integration/drone/push Build is passing
backupbot.backup.path: "/postgres.dump.gz" places dump in container writable
layer (not a volume), so restic never captures it. Restore post-hook fails
with "No such file or directory". PR#3 (3.1.0+v2.0.0) fixes this with
backupbot.backup.volumes.db-data.path. Baseline run 658 tested PR#3 (working
mechanism), not 3.0.1+v2.0.0 (broken). Re-opened PR#3 + !testme triggered
(comment 14651) to demonstrate backup_restore=pass. BUILDER-INBOX consumed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-17 02:58:06 +00:00
3edd0713d2 review(regall): A-regall-2 CONFIRMED — plausible backup_restore=fail 2/2 (genuine regression)
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is passing
Runs 750 and 754 both fail: ci_marker absent after restore.
No-op upgrade (3.0.1+v2.0.0→3.0.1+v2.0.0) via UPGRADE_BASE_VERSION path is prevb-specific.
Baseline run 658 had genuine git-ref upgrade and passed L5.

Builder-INBOX written. M1 blocked pending plausible fix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-17 02:34:04 +00:00
7c6134a773 fix(regall): correct mailu baseline upgrade=pass (A-regall-1); consume Adversary inbox; batch 2 in flight
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-17 02:05:42 +00:00
4ad3c9d907 review(regall): BP-1 baseline verified (A-regall-1: mailu upgrade=pass not skip); BP-2 upgrade-base=main-tip confirmed; batch-1 all L5
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-17 02:04:48 +00:00
e1b32ea650 fix(prevb): prune orphan services on upgrade redeploy (head's dropped services); re-add EXPECTED_NA-other-rung test; consume Adversary inbox
All checks were successful
continuous-integration/drone/push Build is passing
docker stack deploy doesn't prune services the head compose dropped (discourse PR#4 drops sidekiq),
leaving them orphaned on the base image. perform_upgrade now reconciles the live stack to the head
compose service set (lifecycle.prune_orphan_services). Makes the deployed stack faithfully reflect
the head — no test weakened. No-op when service sets match / compose unresolvable.
2026-06-17 00:29:00 +00:00
7f3e7c26f6 recon(prevb): M1 code pre-review (sound; 63 prevb unit tests pass cold) + builder heads-up (pre-existing red test)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-17 00:27:06 +00:00
d832b353e4 fix(gtea): UPGRADE_SECRET_PREP hook — pre-insert lfs_jwt_secret with correct 43-char format
Some checks failed
continuous-integration/drone/push Build is failing
Blocker 4 fix: abra `secret generate --all` uses .env.sample for length hints; the
lfs-plain-gitea PR has SECRET_LFS_JWT_SECRET_VERSION=v1 COMMENTED OUT, so abra produces
a wrong-length secret. gitea requires exactly 43 chars (32 bytes base64 URL-safe); wrong
length → gitea fatals trying to save the JWT secret to the read-only Docker Config
app.ini → health check fails → swarm rolls back.

Fix: new UPGRADE_SECRET_PREP hook (meta.py) called before `abra secret generate --all`
in the upgrade path. abra's `--all` is idempotent (skips existing secrets), so the
correctly pre-inserted secret survives. gitea's recipe_meta.py implements the hook using
`docker secret create` directly to guarantee correct format regardless of .env.sample.

Also consumes machine-docs/BUILDER-INBOX.md (Adversary Blocker 4 digest).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 21:46:28 +00:00
1efab2e1e6 review(gtea): M2 re-verify — #684 PASS, #685 FAIL (LFS upgrade rollback blocker)
Some checks failed
continuous-integration/drone/push Build is failing
Build #684 (RECIPE=gitea REF=main PR=0): PASS level=5 — all tiers pass, LFS correctly
SKIP on main, HC1 SHA match (e6a1cc79=e6a1cc79). M2 main-branch DoD MET.

Build #685 (RECIPE=gitea PR=1 REF=357926f26e69): FAIL level=1 — new critical blocker:
upgrade chaos redeploy to PR head with compose.lfs.yml fails with rollback_completed.
Root cause: lfs_jwt_secret generated by abra --all with wrong length/format because
.env.sample in PR #1 has `SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43` COMMENTED OUT.
Gitea starts but fails health check on bad JWT secret → Docker swarm rolls back.

Also filed: cc-ci self-test lint failures (9 ruff format violations in gtea files),
drone dep path not re-verified via live CI since a121d2c.

M2 still NOT claimable — Builder must fix lfs_jwt_secret generation and re-trigger #685.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 21:30:42 +00:00
a121d2c069 fix(gtea): fix M2 blockers — LFS upgrade and REF=main HC1
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone Build is failing
Blocker 1 (LFS roundtrip fails on PR #1):
- Add UPGRADE_EXTRA_ENV to gitea recipe_meta.py — after PR-head checkout
  (compose.lfs.yml now in ABRA_DIR), add compose.lfs.yml to COMPOSE_FILE
  and set SECRET_LFS_JWT_SECRET_VERSION=v1 so the upgrade chaos redeploy
  actually runs with LFS enabled. Without this, the base install checks out
  the 3.5.x tag (compose.lfs.yml removed), EXTRA_ENV sees no LFS, and the
  upgrade chaos redeploy inherits the no-LFS .env — so the LFS test runs
  (compose.lfs.yml is restored by recipe_checkout_ref) but LFS is off.
- Add abra.secret_generate(domain) in generic.perform_upgrade when
  upgrade_env is non-empty — generates lfs_jwt_secret before chaos redeploy.

Blocker 2 (REF=main upgrade fails HC1):
- Always use recipe_head_commit (git rev-parse HEAD) for head_ref instead
  of using ref directly. When ref="main" (a branch name), the HC1 commit
  check "head_ref.startswith(chaos_commit)" always fails since "main" ≠ SHA.
  recipe_head_commit returns the actual SHA after the fetch/checkout.

Side-fix (stale creds — build #675):
- ops.py pre_install: delete the per-domain creds file before calling
  _ensure_admin. A fresh install wipes gitea's DB; any creds file from a
  prior run on the same domain is stale and causes 401s in all API calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 21:01:21 +00:00
05bf5d5264 review(gtea): file M2 blockers to Builder-INBOX — LFS deploy + upgrade-REF=main
Some checks failed
continuous-integration/drone/push Build is failing
Two critical issues prevent M2: (1) lfs_jwt_secret not generated via disk .env → LFS disabled in
container; (2) upgrade tier fails when REF=main. Details + fix hints in BUILDER-INBOX.md.
2026-06-15 20:53:34 +00:00
446bafe408 inbox(gtea): consume BUILDER-INBOX (Adversary pre-M1 findings addressed)
Some checks failed
continuous-integration/drone/push Build is failing
Both issues fixed in 893a7b0:
- Issue 1 (git-lfs missing): added to nix/hosts/cc-ci/configuration.nix systemPackages
- Issue 2 (double /api/v1): fixed path in test_lfs_roundtrip.py restart poll

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 20:01:50 +00:00
4a4b75661e inbox(gtea): heads-up to Builder — git-lfs absent on cc-ci (M2 blocker) + double /api/v1 bug in LFS test
Some checks failed
continuous-integration/drone/push Build is failing
2026-06-15 19:58:17 +00:00
d12d8a12ca inbox(poe2e): consume BUILDER-INBOX; take JOURNAL ownership (baseline preserved); set up STATUS/BACKLOG; heads-up to Adversary
Some checks failed
continuous-integration/drone/push Build is failing
2026-06-13 19:30:10 +00:00
62efd76bc1 chore(poe2e): init Adversary phase files — D5 baseline snapshot, awaiting Builder
Some checks failed
continuous-integration/drone/push Build is failing
2026-06-13 19:27:09 +00:00
b97d1e5345 inbox: remove orphan pxgate cold-boot note (phase already DONE; loops stopped) — evidence in orchestrator JOURNAL
Some checks failed
continuous-integration/drone/push Build is failing
2026-06-13 13:52:55 +00:00
f09b7bf21f inbox(pxgate): cold-boot proof PASSED — deploy-proxy active 11s before dashboard on real reboot
Some checks failed
continuous-integration/drone/push Build is failing
2026-06-13 13:52:13 +00:00
162f731e91 status(pxgate): ## DONE — M1+M2 PASS, cycle broken, cold-boot sim confirms no deadlock
Some checks failed
continuous-integration/drone/push Build is failing
M2 verified: nixos-rebuild @13:43Z deployed /api/version probe; deploy-proxy
active(exited) in 279ms (nixos-rebuild) and 17ms (cold-boot sim) — no alert, no
deadlock. All 9 services 1/1. Running server unaffected. Adversary PASS @13:44Z.
BUILDER-INBOX consumed.
2026-06-13 13:47:42 +00:00
927cbfa747 inbox(pxgate): orchestrator completed M2 nixos-rebuild — deploy-proxy on /api/version, cycle broken
Some checks failed
continuous-integration/drone/push Build is failing
2026-06-13 13:45:39 +00:00
0e9fd388d2 claim(pxgate-M1): change traefik health probe to /api/version (A1 cycle fix)
Some checks failed
continuous-integration/drone/push Build is failing
Break the deploy-proxy ↔ dashboard health-gate circular dependency (Adversary A1, pvfix):

- runner/warm_reconcile.py: remove health_domain override (was ci.commoninternet.net,
  the dashboard). Change health_path from / to /api/version. The probe now uses
  traefik.ci.commoninternet.net/api/version — traefik's own API, no backend/dashboard dep.
- nix/modules/proxy.nix: update comment to reflect new health probe.
- machine-docs/DECISIONS.md: pxgate fix logged (supersedes pvfix manual workaround).
- machine-docs/DEFERRED.md: 2026-06-13 circular-dependency entry closed.
- Consumed BUILDER-INBOX.md (Adversary orientation msg).

Controlled reproduction (dashboard swarm scaled to 0):
  OLD probe (ci.commoninternet.net): HTTP 404  ← gate would loop → timeout
  NEW probe (traefik.../api/version): HTTP 200  ← passes immediately
Stale false-alarm alert 20260613T054428Z-traefik-unhealthy-on-latest.json cleared on host.
No After=deploy-proxy consumers changed (ordering preserved).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-13 12:46:34 +00:00
c798292598 chore(pxgate): BUILDER-INBOX — orientation done, live bug proven
Some checks failed
continuous-integration/drone/push Build is failing
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-13 12:43:32 +00:00
f73bcf225e inbox(cf55): consume adversary launcher mismatch note
Some checks failed
continuous-integration/drone/push Build is failing
2026-06-13 04:13:36 +00:00
d1fc6b9747 review(cf55): record launcher mismatch blocker
Some checks failed
continuous-integration/drone/push Build is failing
2026-06-13 04:12:38 +00:00
87928a9096 status(cfold): seed phase state and consume inbox
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-12 15:57:50 +00:00
8fba68e27c review(cfold): record cold pre-claim audit
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-12 15:57:02 +00:00
87566b1c95 review(cfold): note missing phase status file
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-12 15:55:55 +00:00
7e7e84df34 fix(drone): ADV-drone-01 — no-follow redirect pattern in SCM test
Some checks failed
continuous-integration/drone/push Build is failing
test_scm_configured.py was following ALL redirects via urlopen; gitea redirects
unauthenticated users from /login/oauth/authorize → /user/login, so the path
assertion always failed even for a correctly-wired drone.

Fix: _CaptureOneRedirect urllib handler stops after drone's first 303 and reads
the Location header directly, before gitea's own redirect chain runs.

- Consume BUILDER-INBOX.md (ADV-drone-01 finding delivered and addressed)
- Close ADV-drone-01 in BACKLOG-drone.md
- Update test_gitea_dep.py terminology: "location_url" not "final_url"
- All 10 unit tests pass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-11 21:48:36 +00:00
d20bffd597 review(drone): BUILDER-INBOX — ADV-drone-01 critical, fix before M1 claim
Some checks failed
continuous-integration/drone/push Build is failing
2026-06-11 21:43:40 +00:00
89dec5188f inbox(rcust): consumed 01:12Z be2026a-cleared note; bluesky-pds filed in DEFERRED.md as non-rcust upstream image breakage (justified M2 exclusion, A/B-proven harness-neutral)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 01:00:32 +00:00
24a203a098 review(rcust): be2026a fix-forward CLEARED (all 3 conditions met, independently verified) + ACCEPT L5≡L4+OIDC-pass equivalence — lasuite-* L5 baselines stale (c51cd84 4-rung predates rcust, git-proven), rcust innocent, OIDC coverage preserved. Consumed 01:10Z inbox. M2 still open: bluesky upstream-breakage note, drone-path runs, zero-leak, my sample re-check
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 00:59:29 +00:00
914c1663b5 inbox(rcust): consumed 00:31Z conditional APPROVE — merging be2026a, post-merge lasuite-drive re-run queued behind discourse A/B pair
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 00:33:07 +00:00
a531746e53 review(rcust): APPROVE fix-forward be2026a (services_converged completed-one-shot rule) — cold-verified diff+7 tests+199 unit+lint on fresh checkout, no false-green path (HTTP floor + minio custom test independent); conditional on post-merge lasuite-drive L5 + merged-diff==branch-diff + discourse PR=2 A/B cold re-check. Consumed 00:40Z inbox
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 00:31:54 +00:00
1ec0e772e8 inbox(rcust): consumed 23:53Z asks — lasuite-drive proof RUNNING, discourse same-ref 2x2 queued (new-main PR=2 + old-main PR=2 @7ae7b0f); m2b-discourse HC1 facts pinned (re-checkout persisted, eb96de94=base tag, sidekiq line benign); bluesky-pds = upstream image breakage (MODULE_NOT_FOUND x3, harness-neutral)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-11 00:06:13 +00:00
40b59b356b review(rcust): M2 proof-run cold analysis — 3/6 (immich/mattermost/plausible) reproduce baseline L4 at baseline ref on merged main (restructure innocent); discourse L4->L1 upgrade-HC1 at baseline ref UNexplained (A/B was at wrong ref) + lasuite-drive needs fresh L5 post-fix-forward; M2 OPEN
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 23:54:36 +00:00
efd7efc32b inbox(rcust): consumed 20:53Z approval — fix-forward pushed as 57c66ad; proof re-run at baseline REF queued behind tests 2+3
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:53:52 +00:00
57c66add51 review(rcust): APPROVE lasuite-drive pre_install fix-forward (scoped to line-54 bucket-poll raise→best-effort; verified old=best-effort, custom MinIO test is real gate, no coverage loss); conditioned on L5 re-run + my diff re-verify. Auditing other shell->python hook ports for same drift
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:52:53 +00:00
b9abf48116 inbox(rcust): consumed 20:33Z ACK — ref-mismatch independently confirmed; tests 2+3 concurred; proceeding
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:34:36 +00:00
4cb1f57e2c inbox(rcust): consumed Builder 20:35Z ref-mismatch heads-up + ACK — independently confirmed sweep ran default-branch heads (7d53d4ec/da159375) != baseline PR refs; concur tests 2+3 separate harness×content; will run own cold A/B at claim
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:33:56 +00:00
41033b4500 inbox(rcust): consumed 20:15Z follow-up — restore cluster confirmed pre-existing, VETO threat withdrawn; proceeding to satisfy the 4 M2 PASS conditions (re-runs at baseline, canary+zero-leak, log sample, !testme x2)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:19:12 +00:00
a7a558ada3 note(rcust): M2 follow-up — confirmed restore cluster is the PRE-EXISTING truncated-dump race (documented in discourse BACKUP_VERIFY docstring on pre-merge 49fb818); VETO-threat withdrawn; stated M2 PASS conditions (re-runs at baseline + spot-checks)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:18:26 +00:00
37dcfab07d inbox(rcust): consumed Adversary 20:13Z restore-cluster heads-up — ACK: serial re-runs of all 6 already in flight (/root/m2-rerun-logs/, results m2rr-*); will ALSO run immich on OLD main (pre-merge c2508c7) serially in the same env as the requested A/B regardless of re-run outcome; no M2 claim until both legs are documented in STATUS
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:18:13 +00:00
ffc88848f3 note(rcust): M2 heads-up — restore-failure cluster (discourse/immich/plausible/mattermost ci_marker-missing) blocks M2 PASS; evidence says infra/pre-existing not restructure (restore orchestration unchanged, no BACKUP_VERIFY correlation, peers pass); suggest A/B vs old main (NOT a verdict)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 20:17:14 +00:00
8984b57b35 status(rcust): P6 complete (da558ca) + Adversary inbox consumed — manifest redaction landed (858e0f5); M1 prep starting
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 19:10:00 +00:00
5ccc0d1c34 note(rcust): interim pre-review of frozen P5 (68954be) — cold unit 191 + lint PASS reproduced; manifest exposes NO generated/real secrets (HC2-honoring, pure presentation); one non-blocking heads-up re plausible SECRET_KEY_BASE public-dummy on dashboard (NOT an M1 verdict)
All checks were successful
continuous-integration/drone/push Build is passing
2026-06-10 19:07:24 +00:00
0684576d74 chore(conc): consume BUILDER-INBOX (ML-flake context on (c) round-2; concur — will re-trigger (c) clean after 290/291 terminal)
Some checks reported errors
continuous-integration/drone/push Build is passing
continuous-integration/drone Build was killed
2026-06-10 08:45:14 +00:00
fa9a89bcf8 review(conc): live (c) round-2 — serialization confirmed via lslocks; delay is immich-ML healthcheck flake, not the restructure; veto unchanged
All checks were successful
continuous-integration/drone/push Build is passing
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 08:44:30 +00:00
090724ec80 fix(regression): correct SHAs for bad-backup/bad-restore (A-reg-3) + consume inbox
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone Build is passing
Both compose.yml uploads had empty files due to a bash encoding bug.
Fixed via Python API upload; new SHAs:
- regression-bad-backup: cd52b3a (backupbot.backup.path=/nonexistent-path-cc-ci-canary-bad)
- regression-bad-restore: 7e03499 (backup targets .backup-data subdir + command creates it)

Adversary confirmed bad-install ✓ and bad-upgrade ✓ from run artifacts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-02 02:00:51 +00:00
3859cd7f40 review(regression): A-reg-3 — bad-backup/bad-restore compose.yml empty (wrong tier fails); bad-install/bad-upgrade PASS cold-verified
Some checks failed
continuous-integration/drone/push Build is failing
2026-06-02 01:59:50 +00:00
a2a6eea757 fix(regression): fix relative import (A-reg-1) + consume inbox
Some checks failed
continuous-integration/drone/push Build is failing
- tests/regression/test_canaries.py: replace `from .conftest import ...`
  (relative import fails when not a package) with sys.path + direct import,
  matching the pattern used by all other tests in this repo.
- Delete machine-docs/BUILDER-INBOX.md (Adversary inbox consumed).
- Update STATUS-regression.md + JOURNAL-regression.md with first two
  canary run results (bad-false-green RED confirmed, good-simple GREEN confirmed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-02 01:37:31 +00:00
464760ebb7 review(regression): D-initial FAIL — A-reg-1 relative import (suite won't collect), A-reg-2 plan gap (4 per-tier RED canaries missing)
Some checks failed
continuous-integration/drone/push Build is failing
2026-06-02 01:34:56 +00:00