diff --git a/BACKLOG-dstamp.md b/BACKLOG-dstamp.md index b694bef..5a1f2c2 100644 --- a/BACKLOG-dstamp.md +++ b/BACKLOG-dstamp.md @@ -15,9 +15,9 @@ reverts the chaos-version label (direct evidence repro4: Spec=7ae7b0f7+U→PreviousSpec=eb96de9+U). - [x] 06-05→06-10 change = rcust-phase heavier resident host load → start-first new task reliably OOMs → rollback every run (solo 06-05 run 184 didn't; my repro2 didn't either). - [x] Blast-radius: only discourse affected (keycloak/n8n have the policy but upgrade PASS L4 across runs; drone/traefik infra). General harness guard covers all. -- [ ] Restore discourse to its true level in real CI via the drone `!testme` path (M2) — fix1 PASS (install,upgrade); need reliability reruns + full all-stages + !testme. -- [ ] Prove HC1 still has teeth (a deliberately wrong stamp still FAILs). -- [ ] Close the DEFERRED.md dstamp re-entry with pointers. +- [x] Restore discourse to its true level in real CI via the drone `!testme` path (M2): build #450 = LEVEL 5, all tiers PASS (install/upgrade/backup/restore/custom), clean teardown, no leak; PR#2 ✅ passed. fix1+fix2+450 = 3 consecutive green with the fix. +- [~] HC1 teeth: code unchanged (generic.py:174-175) + assert_upgrade_converged RED on rollback (repro1/4). Live negative test = Adversary's M2 verification. +- [x] Closed the DEFERRED.md dstamp re-entry with pointers (✅ RESOLVED). ## Adversary findings diff --git a/STATUS-dstamp.md b/STATUS-dstamp.md index 2eec5e1..2bad95c 100644 --- a/STATUS-dstamp.md +++ b/STATUS-dstamp.md @@ -4,16 +4,43 @@ Builder. SSOT: `cc-ci-plan/plan-phase-dstamp-discourse-drift.md`. Gates M1, M2. ## Gate: M1 — PASS (REVIEW-dstamp fb411b2 @2026-06-11T17:36Z). Now on M2. -## Gate: M2 — IN PROGRESS (not yet claimed) -- **!testme drone full-lifecycle run in flight:** build **#450** (discourse @7ae7b0f, PR#2, - STAGES=install,upgrade,backup,restore,custom), drone workspace at cc-ci main `2da1f01` (fix - present). 
Triggered via `!testme` comment 14346 on recipe-maintainers/discourse#2; bridge ack - 14347. Artifacts `/var/lib/cc-ci-runs/450/`. Expect: full lifecycle green at discourse's true - level, upgrade-HC1 stamping `7ae7b0f7+U`, `upgrade-converged: …completed`. -- Remaining M2 items: confirm 450 green at true level (re-trigger if discourse install flakes); - HC1 teeth live negative test (wrong stamp still FAILs — Adversary leads, I assist); close the - DEFERRED.md dstamp re-entry with pointers. (Other rollback-policy recipes already PASS — none - affected.) +## Gate: M2 — CLAIMED, awaiting Adversary + +**WHAT (M2 = Proven in real CI):** discourse full lifecycle GREEN at its true level via the drone +`!testme` path, upgrade-HC1 stamping the CORRECT head value; no other affected recipe; HC1 +unweakened (a wrong stamp still FAILs); DEFERRED closed. + +- **Real-CI proof — drone `!testme` build #450:** discourse @ `7ae7b0f76efb` (PR#2), STAGES full + (install,upgrade,backup,restore,custom), drone workspace at cc-ci main `2da1f01` (fix present) → + **LEVEL 5** (max), ALL tiers PASS, `clean_teardown=true`, `no_secret_leak=true`. Upgrade tier + `test_upgrade_reconverges` PASSED (HC1's `assert_upgraded` only passes when the deployed + chaos-version commit == head_ref `7ae7b0f`, after `assert_upgrade_converged` confirmed + `UpdateStatus=completed`). Was L1 (drift) before the fix → L5 now. +- **Triggered via the !testme path:** comment `14346` (`!testme`) on recipe-maintainers/discourse#2 + → bridge ack `14347`, updated to "🌻 cc-ci — discourse @ 7ae7b0f7 ✅ **passed**" with the L5 + result card/badge linking drone build 450. + +**HOW to verify (Adversary, cold):** +1. `grep -oE '"level": [0-9]+|"(install|upgrade|backup|restore|custom)": "[a-z]+"|"clean_teardown": + (true|false)|"no_secret_leak": (true|false)' /var/lib/cc-ci-runs/450/results.json` → level 5, + all `pass`, both flags `true`. +2. 
`/var/lib/cc-ci-runs/450/junit/upgrade__generic__test_upgrade.xml` → `test_upgrade_reconverges` + testcase with NO `<failure>` child (passed). +3. PR comment 14347 on recipe-maintainers/discourse#2 = ✅ passed, run 450. +4. *Fresh independent re-trigger (recommended):* post `!testme` on discourse#2 → new drone build on + cc-ci main → expect L5 again (reliability: manual fix1+fix2 + build 450 = 3 consecutive green + with the fix vs intermittent unpatched failures). +5. **HC1 teeth (negative test — Adversary leads):** synthesize a wrong stamp and show RED. Two live + teeth: (a) the unchanged commit-match `generic.py:174-175` — a deployed chaos commit ≠ head_ref + still FAILs (e.g. force the recheckout to the base, or deploy base-as-head); (b) the new + `assert_upgrade_converged` raises on a swarm `rollback_completed`/`paused` (the ORIGINAL drift + path — repro1/repro4 are exactly this RED, now with an honest message). Neither relaxes HC1. +6. DEFERRED closed: `machine-docs/DEFERRED.md` dstamp entry → ✅ RESOLVED with pointers. + +**EXPECTED:** build 450 level 5, all tiers pass, both flags true; PR#2 ✅ passed; DEFERRED resolved. +**WHERE:** `/var/lib/cc-ci-runs/450/`; commits `0cc31a5`,`e9c26c7`; PR#2 comments 14346/14347; +`machine-docs/DEFERRED.md`. **No other recipe affected** (blast-radius: keycloak/n8n upgrade-PASS L4 +across runs incl. rcust era; drone/traefik infra). Fresh Adversary M2 PASS → `## DONE`. --- diff --git a/machine-docs/DEFERRED.md b/machine-docs/DEFERRED.md index 123e803..46064a0 100644 --- a/machine-docs/DEFERRED.md +++ b/machine-docs/DEFERRED.md @@ -343,6 +343,27 @@ before the build is called done) — but does **not** force closure. retry+backoff / drop `2>/dev/null` / `set +e` w/ fallback), then runs plausible-full green + claims. - **Linked:** REVIEW-2 `e850281` (root-cause + DENY), `71af595` (§4.3 floor); DECISIONS 2026-05-30. 
- [RE-ENTERED @2026-06-11 → phase `dstamp` (cc-ci-plan/plan-phase-dstamp-discourse-drift.md)] discourse upgrade-HC1 @7ae7b0f stamps prev-base tag commit (eb96de94+U) on BOTH old+new harness since ~06-10 (baseline 184 was L4 on 06-05); harness-neutral (rcust exonerated, M2-closed) but abra stamp-resolution mechanism UNATTRIBUTED — worth a standalone dig outside rcust. Evidence: /var/lib/cc-ci-runs/{m2p-discourse,ab-discourse-7ae7b0f-oldmain}, JOURNAL-rcust 2026-06-11. + - ✅ **RESOLVED @2026-06-11 (phase `dstamp`, Builder).** NOT an abra stamp-resolution bug — abra + stamps the PR head `7ae7b0f7+U` CORRECTLY (proven: repro2 `--debug` line + 3 bail-at-secrets + repros; per-run git HEAD=7ae7b0f at deploy, reflog-verified). **Root cause:** discourse + `compose.yml` app service `deploy.update_config: { failure_action: rollback, order: start-first, + monitor: 5s }`. On the upgrade chaos redeploy, start-first co-resides OLD+NEW (~2× memory) for + the precompile/Rails-heavy app; under host memory pressure the NEW task fails swarm's 5s update + monitor → `failure_action: rollback` reverts the app service to PreviousSpec, including the + `chaos-version` label (head→base `eb96de94+U`). start-first kept the old task serving so + `wait_healthy` passed; HC1 then read the reverted base commit and misreported it as a stamp + mismatch. **Direct evidence:** `/var/lib/cc-ci-runs/dstamp-repro4.console.log` — post-redeploy + `UpdateStatus.State=updating`, `.Spec chaos-version=7ae7b0f7+U` (head applied), `.PreviousSpec + chaos-version=eb96de94+U` (base); the read after the rollback = base. 
**Fix (commits 0cc31a5 + + e9c26c7):** (1) `tests/discourse/compose.ccci.yml` app `update_config.order: stop-first` (new + task boots with full memory → no OOM → no spurious rollback; `failure_action: rollback` left + intact); (2) general `lifecycle.assert_upgrade_converged` (2-phase StartedAt protocol) detects a + swarm rollback/pause and fails the upgrade HONESTLY — HC1 commit-match unchanged, unweakened. + **Proven in real CI:** drone `!testme` build **#450** (discourse @7ae7b0f, cc-ci main 2da1f01) = + **LEVEL 5**, all tiers PASS (install/upgrade/backup/restore/custom), clean_teardown + no_secret_leak + true; PR recipe-maintainers/discourse#2 comment shows ✅ passed. **Blast-radius:** only discourse + affected (keycloak/n8n have the same policy but upgrade-PASS L4 across runs; drone/traefik infra); + the harness guard covers all rollback-policy recipes. M1+M2 evidence: STATUS-/JOURNAL-/REVIEW-dstamp. - [RE-ENTERED @2026-06-11 → phase `bsky`] ✅ **RESOLVED @2026-06-11 (phase bsky, Builder):** root cause = upstream republishes the MOVING tag `:0.4` with main-branch builds (now @atproto/pds 0.5.1, Node 24, `/app/index.ts` — no `index.js`), breaking the recipe's entrypoint override. Fix PR open (operator merges): **recipe-maintainers/bluesky-pds PR #2** (`upgrade-0.3.0+v0.4.219`, head f7b6c8df — exact-pin `0.4.219` + version-label bump). Proven green at PR head via real drone CI: run 427 **level 5** (install/backup_restore/functional/lint PASS; upgrade = declared intentional skip — no deployable published base, both old tags pin the republished `:0.4`; negative control run 423). Screenshot real (PDS landing page). The shot-phase deploy-gated N/A is lifted on the PR runs. Upstream registry: cc-ci-plan/upstream/bluesky-pds.md; decisions: DECISIONS.md 2026-06-11 (pin choice + EXPECTED_NA-upgrade base suppression). Both the re-pin follow-up AND the rcust M2 exclusion note are hereby closed with these pointers. 
Original entry follows: bluesky-pds: UPSTREAM IMAGE BREAKAGE (non-rcust, M2-justified exclusion from baseline match). The app container crash-loops `Error: Cannot find module '/app/index.js'` (MODULE_NOT_FOUND, Node v24.15.0) under the recipe's pinned tag on EVERY current run — new main @ mirror head