claim(cfold): claim M2 full sweep green
Some checks failed
continuous-integration/drone/push Build is failing

This commit is contained in:
autonomic-bot
2026-06-13 04:04:14 +00:00
parent d44f799de9
commit abe5e33dde
2 changed files with 133 additions and 30 deletions

@@ -405,3 +405,83 @@ comment-bridge listening on 0.0.0.0:8080 (poll primary + optional webhook)
This fix addresses the replay hole exposed during cfold's Ghost retrigger. It does not change the cfold
bottom line: Ghost's upgrade tier remains the lone M2 blocker, while custom discovery continues to pass.
## 2026-06-13 — Ghost upgrade blocker fixed in cc-ci; same-ref real CI rerun now green
I stayed on the Ghost blocker until I had proof from a real `!testme` run on the same ref, since M2 could not
be claimed while Ghost remained the only non-green recipe in the sweep.
Focused investigation sequence:
- Repros run against the preserved pre-fix code reproduced the original failure mode: during the base->head
crossover, the new Ghost app task could start before the replacement mysql service was usable, then exited
on `ENOTFOUND` / `ECONNREFUSED` against `${STACK_NAME}_db`, which made swarm pause the update before the
head spec settled.
- My first attempt (`restart_policy.delay`) was insufficient because swarm paused the update on the first
failed new task before any retry delay could matter.
- My second attempt (wrapping Ghost in `command: sh -ec ...`) proved the DB wait idea but regressed the
base install: it bypassed Ghost's normal docker-entrypoint first-boot path, so the default `source`
theme was never seeded and `/` stayed 500 (`The currently active theme "source" is missing`).
- Final fix: move the DB wait into the app `entrypoint`, then exec the normal
`/abra-entrypoint.sh node current/index.js` path. That preserved both the first-boot seeding behavior
and the upgrade crossover guard.
The finished overlay in `tests/ghost/compose.ccci.yml` now does three things and nothing more:
1. keep the existing 15m app healthcheck grace,
2. keep the existing 15m db healthcheck grace,
3. wait for the DB TCP socket before entering the normal Ghost entrypoint on the base->head crossover.
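Item 3 follows a common wait-then-`exec` pattern. A minimal sketch of that pattern, assuming bash with `/dev/tcp` support and coreutils `timeout` (the Ghost image's actual shell may differ), could look like this; `wait_for_tcp`, `wait_then_exec`, and the retry defaults are hypothetical, not what `tests/ghost/compose.ccci.yml` contains:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a "wait for the DB socket, then hand off" entrypoint helper.
# Assumes bash with /dev/tcp support and coreutils `timeout`; the real overlay's
# shell, DB host/port and retry budget are not shown in this log.

# wait_for_tcp HOST PORT [ATTEMPTS] [INTERVAL_SECONDS]
# Returns 0 as soon as a TCP connect to HOST:PORT succeeds, 1 after ATTEMPTS failures.
wait_for_tcp() {
  local host=$1 port=$2 attempts=${3:-60} interval=${4:-2} i
  for ((i = 1; i <= attempts; i++)); do
    # Redirecting to /dev/tcp/HOST/PORT makes bash resolve HOST and connect();
    # DNS or connect failures (the shell-level analogue of ENOTFOUND/ECONNREFUSED)
    # simply return non-zero. `timeout` bounds a hanging connect.
    if timeout 2 bash -c ': >"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
      return 0
    fi
    # Skip the pointless sleep after the final failed attempt.
    (( i < attempts )) && sleep "$interval"
  done
  return 1
}

# wait_then_exec HOST PORT ATTEMPTS INTERVAL CMD [ARGS...]
# Waits for the socket, then exec()s the original entrypoint command unchanged,
# so whatever it does on first boot still runs.
wait_then_exec() {
  local host=$1 port=$2 attempts=$3 interval=$4
  shift 4
  if ! wait_for_tcp "$host" "$port" "$attempts" "$interval"; then
    echo "wait_then_exec: ${host}:${port} not reachable after ${attempts} attempts" >&2
    return 1
  fi
  exec "$@"
}
```

An overlay entrypoint might then call something like `wait_then_exec "${STACK_NAME}_db" 3306 60 2 /abra-entrypoint.sh node current/index.js`, where port `3306` assumes the MySQL default. Because `exec` replaces the wrapper shell with the original command line, the normal boot path is left intact, which is the property the `command: sh -ec ...` attempt lost.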
Verification:
```bash
$ ssh cc-ci 'jq -r ".results, .stages" /var/lib/cc-ci-runs/ghost-repro-cfold-3/results.json'
{
"install": "pass",
"upgrade": "pass"
}
[
{"name":"install","status":"pass",...},
{"name":"upgrade","status":"pass",...},
{"name":"lint","status":"pass",...}
]
$ ssh cc-ci 'tok=$(cat /run/secrets/bridge_drone_token); curl -fsS -H "Authorization: Bearer $tok" https://drone.ci.commoninternet.net/api/repos/recipe-maintainers/cc-ci/builds/585 | jq -r "[.number,.status,.after,.params.RECIPE,.params.PR,.params.REF] | @tsv"'
585 success d44f799de945d0775933aad58726d46509154a64 ghost 5 d42d0f7c7cf9946077a583ffa3f7c96abfe94a77
$ ssh cc-ci 'jq -r "{level,recipe,ref,results,stages:(.stages|map({name,status}))}" /var/lib/cc-ci-runs/585/results.json'
{
"level": 5,
"recipe": "ghost",
"ref": "d42d0f7c7cf9",
"results": {
"backup": "pass",
"custom": "pass",
"install": "pass",
"restore": "pass",
"upgrade": "pass"
},
"stages": [
{"name":"install","status":"pass"},
{"name":"upgrade","status":"pass"},
{"name":"backup","status":"pass"},
{"name":"restore","status":"pass"},
{"name":"custom","status":"pass"},
{"name":"lint","status":"pass"}
]
}
$ ssh cc-ci 'printf "ghost custom junit="; ls /var/lib/cc-ci-runs/585/junit/custom__cc-ci__*.xml | wc -l; printf " ghost upgrade junit="; ls /var/lib/cc-ci-runs/585/junit/upgrade*.xml | wc -l'
ghost custom junit=4
ghost upgrade junit=2
$ ssh cc-ci 'printf "live_pr_apps="; docker stack ls --format "{{.Name}}" | grep -c -- "-pr" || true'
live_pr_apps=0
```
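The teardown check relies on a subtle `grep` detail: `grep -c` still prints `0` when nothing matches but exits with status 1, hence the `|| true`. A small sketch packaging the same check as a reusable assertion; `count_pr_stacks` and `assert_no_pr_stacks` are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Hypothetical helpers mirroring the live_pr_apps teardown check.
# Input on stdin: stack names, one per line, as printed by
#   docker stack ls --format '{{.Name}}'

# Print how many names contain "-pr".
count_pr_stacks() {
  # grep -c prints the count (including 0) but exits 1 when nothing matched;
  # `|| true` stops that "0" from being treated as a command failure.
  grep -c -- '-pr' || true
}

# Print the same key=value line as the one-liner, and fail if any PR stack is live.
assert_no_pr_stacks() {
  local n
  n=$(count_pr_stacks)
  printf 'live_pr_apps=%s\n' "$n"
  [ "$n" -eq 0 ]
}
```

For example, `ssh cc-ci 'docker stack ls --format "{{.Name}}"' | assert_no_pr_stacks`. Like the original one-liner, this is a substring match, so a non-PR stack whose name happens to contain `-pr` (say, a `-prod` suffix) would also be counted.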
Outcome:
- Ghost is no longer the M2 blocker.
- The real PR-triggered build (`585`) on the same Ghost ref that previously failed (`d42d0f7c`) is now L5.
- The custom tier remained intact throughout: still 4 canonical custom JUnit files on the green run.
- With Ghost green and teardown clean, the cfold phase is ready for a formal M2 claim.

@@ -53,16 +53,18 @@ Adversary verdict:
---
## M2 — IN PROGRESS
## M2 — CLAIMED, awaiting Adversary
Gate: M2 — CLAIMED, awaiting Adversary
Current work item:
- full real-CI `!testme` sweep evidence is mostly assembled; one recipe (`ghost`) remains non-green for
a cfold-neutral upgrade regression on the recipe/environment side
- fresh follow-up probes now show the Ghost upgrade failure is not confined to PR #4 / PR #5: a reopened
PR #3 at ref `720faa0b` also re-failed twice post-cfold (`568`, `569`) with the same shape
- the Ghost duplicate-trigger side issue is now root-caused in the bridge source: reopened PRs can replay
old pre-bridge-start `!testme` comments that were never seen during startup because the PR was closed
at that time; the bridge fix is now pushed and live on `cc-ci` (image tag `eb32876581d9`)
- full real-CI `!testme` sweep is now green across the enrolled recipe set, including the formerly blocking
Ghost PR head
- Ghost's upgrade blocker was fixed in cc-ci via the `tests/ghost/compose.ccci.yml` overlay: the app now
waits in its entrypoint for the replacement DB socket before starting during the base->head crossover,
while preserving Ghost's normal `/abra-entrypoint.sh node current/index.js` boot path
- bridge replay-guard fix remains live on `cc-ci` (image tag `eb32876581d9`); the Ghost duplicate-trigger
side issue is separately closed and no longer affects the cfold sweep result
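The replay hole described above (old `!testme` comments becoming visible only after a PR is reopened) suggests a guard keyed on comment age and identity. The bridge fix in image `eb32876581d9` is not shown in this log, so the following is only one plausible shape, sketched in bash under the assumption that the poller can list comments as `ID<TAB>CREATED_AT<TAB>BODY` lines with whole-second UTC timestamps; `filter_testme_triggers` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical replay guard, NOT the comment-bridge's actual fix.
# Input on stdin: "ID<TAB>CREATED_AT<TAB>BODY" per comment, CREATED_AT like
# 2026-06-13T00:07:50Z (whole seconds, UTC). Prints IDs that should trigger a build.
# Usage: filter_testme_triggers BRIDGE_START_TIMESTAMP < comments.tsv
filter_testme_triggers() {
  # Strip non-digits so timestamps compare as integers (YYYYMMDDhhmmss).
  local start=${1//[!0-9]/}
  local -A seen=()   # comment ids already emitted in this pass (bash 4+)
  local id created body
  while IFS=$'\t' read -r id created body; do
    created=${created//[!0-9]/}
    # Only well-formed trigger comments are considered.
    [[ -n $id && -n $created && $body == *'!testme'* ]] || continue
    # Predates this bridge process: never replay it, even if it only just became visible.
    (( 10#$created < 10#$start )) && continue
    # Same comment returned twice by the poller: trigger once.
    [[ -n ${seen[$id]:-} ]] && continue
    seen[$id]=1
    printf '%s\n' "$id"
  done
}
```

Dropping everything older than the bridge start is deliberately conservative: a `!testme` posted while the bridge was down is ignored and has to be re-posted, which is cheaper than silently replaying stale triggers the way the reopened PR #3 did.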
### M2 baseline matrix (built from live PR heads + fresh post-cfold evidence)
@@ -74,7 +76,7 @@ Current work item:
| custom-html-tiny | PR #7 `526502ba` | 5 | 1 | build `510` -> L5 |
| discourse | PR #2 `b7d8a244` | 5 | 3 | build `521` -> L5 |
| drone | PR #1 `049438e1` | 5 | 1 | build `506` -> L5 |
| ghost | PR #3 `720faa0b` | 5 | 4 | build `568` -> L1 (upgrade fail) |
| ghost | PR #5 `d42d0f7c` | 5 | 4 | build `585` -> L5 |
| hedgedoc | PR #1 `441c411c` | 5 | 2 | build `555` -> L5 |
| immich | PR #2 `17f1649c` | 5 | 3 | build `522` -> L5 |
| keycloak | PR #3 `bfe0d16f` | 5 | 3 | build `553` -> L5 |
@@ -89,29 +91,25 @@ Current work item:
| plausible | PR #3 `709a294d` | 5 | 2 | build `530` -> L5 |
| uptime-kuma | PR #3 `b0ce7942` | 5 | 4 | build `531` -> L5 |
### Ghost deviation (blocking a formal M2 claim)
### Ghost closure
`ghost` is the only recipe still preventing an M2 claim.
`ghost` was the final M2 blocker and is now green on the real `!testme` path.
- Current upgrade PR heads and fresh post-cfold outcomes are all red with the same stage shape:
- PR #3 `720faa0b`: builds `568` and `569` -> L1; install/backup/restore/custom/lint pass, upgrade fail
- PR #4 `d88f5801`: build `557` -> L1; install/backup/restore/custom pass, upgrade fail
- PR #5 `d42d0f7c`: build `559` -> L1; install/backup/restore/custom/lint pass, upgrade fail
- Focused artifact audit still confirms the strongest same-ref comparison explicitly:
historical build `185` (`d42d0f7c7cf9`) had `upgrade=pass`, while fresh build `559` on that same ref
has `upgrade=fail` with the canonical `custom` stage still green.
- The fresh PR #3 rerun adds a second previously-green Ghost upgrade head that now fails the same way,
so the blocker is broader than a single Ghost branch and still points away from cfold itself.
- Side observation from the PR #3 retrigger: a single `!testme` comment at `2026-06-13T00:07:50Z` spawned
three new Ghost runs (`568`, `569`, `570`). All three are now red with the same upgrade-only
failure.
- Root cause of the triple-trigger: bridge logs show those three runs were tied to three distinct comment
ids on the reopened PR (`14029`, `14032`, `14497`), not one comment processed three times. The poller
replayed two historical `!testme` comments that predated the current bridge process because PR #3 was
closed during bridge startup and only became visible to the poller after reopen.
- Conclusion so far: Ghost's current failure is not caused by the `custom/` folder migration; the custom
tier still discovers and passes all 4 canonical custom tests, and the regression reproduces across
multiple Ghost PR heads as an upgrade convergence failure.
- Same-ref comparison on `d42d0f7c7cf9`, before and after the fix:
- build `559` on `d42d0f7c7cf9` -> L1; install/backup/restore/custom/lint pass, upgrade fail
- build `585` on `d42d0f7c7cf9` -> L5; install/upgrade/backup/restore/custom/lint pass
- Root cause of the upgrade failure: during the base->head crossover, Ghost's app task started before the
replacement DB service was accepting connections, so the new task exited on `ENOTFOUND`/`ECONNREFUSED`
against `${STACK_NAME}_db` and swarm paused the update before the head spec could settle.
- Fix landed in `cc-ci` commit `d44f799` (`fix(cfold): wait for ghost db in entrypoint`):
`tests/ghost/compose.ccci.yml` now keeps the existing 15m app/db healthcheck grace and wraps the app
`entrypoint` with a tiny TCP wait that execs the normal `/abra-entrypoint.sh node current/index.js`
path only after the DB socket is reachable.
- Focused same-code-path repro after the fix:
- `/var/lib/cc-ci-runs/ghost-repro-cfold-3/results.json` -> `install=pass`, `upgrade=pass`
- log `/root/ghost-repro-cfold-3.log` includes
`upgrade-converged: ghos-ce3c44_ci_commoninternet_net_app swarm UpdateStatus=completed`
and `upgrade->PR-head: head_ref=d42d0f7c chaos-version=d42d0f7c+U version=1.2.0+6.21.2-alpine->1.4.0+6.44.0-alpine`
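Both the failure mode and the repro log hinge on swarm's rolling-update state: a failed new task paused the update (swarm's default `update_config.failure_action` is `pause`), while the fixed run logged `UpdateStatus=completed`. A sketch of a poll that distinguishes those outcomes via `docker service inspect`; the helper names, retry budget and verdict labels are hypothetical, and this is not the cc-ci harness's own convergence check:

```bash
#!/usr/bin/env bash
# Hypothetical convergence check for a swarm service upgrade.
# Swarm records rolling-update progress in .UpdateStatus.State, e.g.
# updating, paused, completed, rollback_started, rollback_paused, rollback_completed.

# Map a raw state to a verdict: converged, blocked, pending or unknown:<state>.
classify_update_state() {
  case $1 in
    completed) echo converged ;;
    paused | rollback_*) echo blocked ;;   # update halted or reverted: head spec did not land
    updating | '') echo pending ;;         # still rolling, or no update recorded (or inspect failed)
    *) echo "unknown:$1" ;;
  esac
}

# wait_for_update SERVICE [ATTEMPTS] [INTERVAL]: 0 once converged, 1 if blocked or timed out.
wait_for_update() {
  local svc=$1 attempts=${2:-90} interval=${3:-10} state i
  for ((i = 1; i <= attempts; i++)); do
    # UpdateStatus is absent until the service has been updated, hence the {{if}} guard.
    state=$(docker service inspect --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}' "$svc" 2>/dev/null)
    case $(classify_update_state "$state") in
      converged) return 0 ;;
      blocked) echo "wait_for_update: $svc update is '$state'" >&2; return 1 ;;
    esac
    # pending or unknown: keep polling.
    sleep "$interval"
  done
  echo "wait_for_update: $svc not converged after $attempts polls" >&2
  return 1
}
```

Treating `paused` as an immediate failure rather than waiting it out matches the observation above that no retry delay could help once swarm had paused the update.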
### Fresh Adversary state
@@ -119,6 +117,31 @@ Current work item:
- `REVIEW-cfold.md` 2026-06-13T00:23:55Z: cold M2 artifact/teardown audit only, no new finding, no M2
claim pending; zero leaked live `-pr` stacks confirmed.
WHAT:
- M2 is now met: the full real-CI `!testme` recipe sweep is green, the formerly blocking Ghost recipe is
green again on the same PR head that previously failed, custom-tier coverage remains intact, and there
are zero leaked live `-pr` stacks.
HOW:
- `ssh cc-ci 'tok=$(cat /run/secrets/bridge_drone_token); curl -fsS -H "Authorization: Bearer $tok" https://drone.ci.commoninternet.net/api/repos/recipe-maintainers/cc-ci/builds/585 | jq -r "[.number,.status,.after,.params.RECIPE,.params.PR,.params.REF] | @tsv"'`
- `ssh cc-ci 'jq -r "{level,recipe,ref,results,stages:(.stages|map({name,status}))}" /var/lib/cc-ci-runs/585/results.json'`
- `ssh cc-ci 'printf "ghost custom junit="; ls /var/lib/cc-ci-runs/585/junit/custom__cc-ci__*.xml | wc -l; printf " ghost upgrade junit="; ls /var/lib/cc-ci-runs/585/junit/upgrade*.xml | wc -l'`
- `ssh cc-ci 'printf "live_pr_apps="; docker stack ls --format "{{.Name}}" | grep -c -- "-pr" || true'`
- `ssh cc-ci 'jq -r ".results, .stages" /var/lib/cc-ci-runs/ghost-repro-cfold-3/results.json'`
EXPECTED:
- Drone build query returns build `585`, status `success`, `after=d44f799de945d0775933aad58726d46509154a64`, recipe `ghost`, PR `5`, ref `d42d0f7c7cf9946077a583ffa3f7c96abfe94a77`
- `results.json` for build `585` shows `level: 5` and `results.install=pass`, `results.upgrade=pass`, `results.backup=pass`, `results.restore=pass`, `results.custom=pass`; stages include `install`, `upgrade`, `backup`, `restore`, `custom`, `lint` all `pass`
- JUnit counts for build `585`: `ghost custom junit=4`, `ghost upgrade junit=2`
- Teardown check returns `live_pr_apps=0`
- Focused repro `ghost-repro-cfold-3` shows `install=pass`, `upgrade=pass`
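The first EXPECTED item can be checked mechanically against the tab-separated row that the Drone query prints. A minimal bash sketch, assuming the row has exactly the six non-empty fields selected by the `jq` filter (tab counts as IFS whitespace, so empty fields would collapse); `check_build_row` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare a Drone build row (number, status, after, RECIPE, PR, REF)
# field by field against an expected row; both are tab-separated strings.
# Prints one "mismatch" line per differing field to stderr; returns 1 if any differ.
check_build_row() {
  local -a names=(number status after recipe pr ref) got want
  local i rc=0
  # Assumes no empty fields: consecutive tabs would collapse under IFS=$'\t'.
  IFS=$'\t' read -r -a got <<<"$1"
  IFS=$'\t' read -r -a want <<<"$2"
  for i in "${!names[@]}"; do
    if [[ ${got[i]:-} != "${want[i]:-}" ]]; then
      echo "mismatch ${names[i]}: got '${got[i]:-}' want '${want[i]:-}'" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example, with values copied from the EXPECTED list:
# expected=$'585\tsuccess\td44f799de945d0775933aad58726d46509154a64\tghost\t5\td42d0f7c7cf9946077a583ffa3f7c96abfe94a77'
# check_build_row "$row_from_drone_query" "$expected" && echo "build 585 row OK"
```

Reporting every mismatched field, rather than stopping at the first, makes a partially wrong row (for example the right build number with the wrong `after` commit) obvious in one run.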
WHERE:
- Fix commit: `d44f799` (`fix(cfold): wait for ghost db in entrypoint`)
- Ghost overlay: `tests/ghost/compose.ccci.yml`
- Real CI proof: `/var/lib/cc-ci-runs/585/results.json`, `/var/lib/cc-ci-runs/585/junit/`
- Focused repro proof: `/var/lib/cc-ci-runs/ghost-repro-cfold-3/results.json`, `/root/ghost-repro-cfold-3.log`
---
## Baseline (pre-cfold) — custom test count per recipe