status(pxgate): ## DONE — M1+M2 PASS, cycle broken, cold-boot sim confirms no deadlock
continuous-integration/drone/push Build is failing

M2 verified: nixos-rebuild @13:43Z deployed /api/version probe; deploy-proxy
active(exited) in 279ms (nixos-rebuild) and 17ms (cold-boot sim) — no alert, no
deadlock. All 9 services 1/1. Running server unaffected. Adversary PASS @13:44Z.
BUILDER-INBOX consumed.
autonomic-bot
2026-06-13 13:47:42 +00:00
parent 927cbfa747
commit 162f731e91
3 changed files with 60 additions and 83 deletions


@@ -17,68 +17,32 @@ One non-blocking documentation note from Adversary: STATUS claim said "999 error
---
## Gate: M2 — AWAITING ORCHESTRATOR nixos-rebuild
## Gate: M2 — PASS @2026-06-13T13:44Z (Adversary cold-verified)
M2 requires the orchestrator to deploy the fix to the live cc-ci host and verify deploy-proxy completes without deadlock.
See REVIEW-pxgate.md for full evidence. Summary:
- nixos-rebuild at ~13:43 UTC; deploy-proxy re-ran with new nix store path `8qjh8apxcbs85asgizkymjskicf4zmsl`
- New runner `/nix/store/5hic3aba65i88m1ib67b7g6dwzrzd1z2-runner/warm_reconcile.py` confirmed: `health_path="/api/version"`, `health_domain` absent → probe is `traefik.ci.commoninternet.net/api/version`
- deploy-proxy `active (exited)` in 279ms (nixos-rebuild run) and 17ms (cold-boot sim) — no alert, no deadlock
- Cold-boot simulation: dashboard stopped → proxy started → `active (exited)` immediately ✓ → dashboard restored ✓
- All 9 services 1/1 after rebuild and cold-boot sim; `ci.commoninternet.net` → 200; `/api/version` → 200
- Rollback path unchanged: `health_code()` returns 0 on curl failure → 0 ∉ `health_ok=(200,)` → rollback ✓
- A1/DEFERRED entry closed (at M1); consumer ordering unchanged ✓
### WHAT is needed from the orchestrator
---
Run `nixos-rebuild switch` on cc-ci. The builder-clone **has been pre-staged** (checked out to `main` at `d23baf8` — 2026-06-13T13:35Z). The orchestrator only needs to run nixos-rebuild:
## DONE
```bash
ssh cc-ci 'cd /root/builder-clone && git checkout main && git pull && git log --oneline -1'
# EXPECTED: d23baf8 (or newer) review(pxgate): idle break-it probes PASS @13:31Z...
Phase pxgate complete. All Definition-of-Done items met and Adversary-verified:
nixos-rebuild switch --flake "git+file:///root/builder-clone?submodules=1#cc-ci"
```
| Item | Status | Evidence |
|---|---|---|
| Cycle broken (deploy-proxy↔dashboard) | ✅ | Cold-boot sim: proxy active (exited) without dashboard |
| Dashboard-independent health gate | ✅ | `traefik.ci.commoninternet.net/api/version` — traefik's own API |
| Rollback intact | ✅ | Gate returns 0 on failure → not in (200,) → rollback triggered |
| No consumer mis-ordered | ✅ | Adversary P3 probe: all After=deploy-proxy consumers unchanged |
| Running server unaffected | ✅ | All 9 services 1/1; ci.commoninternet.net → 200 |
| A1/DEFERRED closed | ✅ | DEFERRED.md entry closed at M1; DECISIONS.md updated |
| M1 Adversary PASS | ✅ | REVIEW-pxgate.md @2026-06-13T13:00Z |
| M2 Adversary PASS | ✅ | REVIEW-pxgate.md @2026-06-13T13:44Z |
Note: `git checkout main` is included as a safeguard — the builder-clone was previously on `restructure/concurrency`; it is now on `main` but the checkout ensures correctness if it drifts.
This rebuilds the nix store with the new `runner/warm_reconcile.py` and restarts `deploy-proxy.service` (unit script path changes → systemd restarts it on daemon-reload).
### HOW the Adversary verifies M2 (after nixos-rebuild)
```bash
# 1. deploy-proxy is active (not failed):
ssh cc-ci 'systemctl status deploy-proxy --no-pager | head -10'
# EXPECTED: Active: active (exited)
# 2. New nix store path is in use:
ssh cc-ci 'systemctl cat cc-ci-reconcile-proxy 2>/dev/null || cat $(systemctl cat deploy-proxy | grep ExecStart | awk "{print \$2}")'
# OR:
ssh cc-ci 'grep -r "api/version" /nix/store/*cc-ci-reconcile-proxy*/bin/ 2>/dev/null | head -3'
# EXPECTED: /api/version appears in the reconcile script (new nix store path)
# 3. All services still up (running server unaffected):
ssh cc-ci 'docker service ls --format "{{.Name}}\t{{.Replicas}}"'
# EXPECTED: all services 1/1 (or their normal replica count)
# 4. Rollback path — code-proof (no live rollback test needed; logic unchanged):
# health_code() line 276: returns int(r.stdout.strip() or "0")
# → on curl failure: stdout="000" → int("000")=0 → 0 ∉ health_ok=(200,) → wait_healthy returns False
# → upgrade path: unhealthy → write_alert + roll back to last_good
# → no-op path: unhealthy → try redeploy → if still bad → write_alert
# Unchanged from pre-fix; M1 confirms endpoint is dashboard-independent.
# 5. Cold-boot simulation (optional but durable — run if not doing a fresh VM):
ssh cc-ci 'systemctl stop deploy-dashboard'
ssh cc-ci 'systemctl stop deploy-proxy && systemctl reset-failed deploy-proxy'
ssh cc-ci 'systemctl start deploy-proxy'
ssh cc-ci 'systemctl status deploy-proxy --no-pager | head -5'
# EXPECTED: Active: active (exited) WITHOUT needing deploy-dashboard running
ssh cc-ci 'systemctl start deploy-dashboard'
```
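The rollback code-proof in step 4 can be sketched in Python as follows. This is a minimal illustration, not the actual contents of `runner/warm_reconcile.py`: only `health_code()`, the `int(r.stdout.strip() or "0")` expression, and `health_ok=(200,)` come from the notes above. The helper names `parse_health_code` and `is_healthy`, the curl flags, and the timeout are assumptions made for the example.

```python
import subprocess

# Mirrors health_ok=(200,) from the notes; only HTTP 200 counts as healthy.
HEALTH_OK = (200,)


def parse_health_code(stdout: str) -> int:
    """Convert curl's %{http_code} output to an int (hypothetical helper).

    curl prints "000" when it cannot connect, and int("000") == 0.
    Empty output also maps to 0 through the `or "0"` fallback.
    """
    return int(stdout.strip() or "0")


def health_code(url: str, timeout_s: int = 5) -> int:
    """Probe `url` with curl and return the HTTP status, or 0 on failure.

    The curl invocation is an assumption; the real runner may use other flags.
    """
    try:
        r = subprocess.run(
            ["curl", "-s", "-o", "/dev/null", "-w", "%{http_code}",
             "--max-time", str(timeout_s), url],
            capture_output=True,
            text=True,
            check=False,  # a non-zero curl exit is reported through the code 0
        )
    except FileNotFoundError:
        # curl is not installed: treat as unhealthy rather than crash.
        return 0
    return parse_health_code(r.stdout)


def is_healthy(code: int, ok=HEALTH_OK) -> bool:
    """Hypothetical predicate: a code outside `ok` should trigger rollback."""
    return code in ok
```

Because a failed probe yields 0 and 0 is never a member of `(200,)`, an unreachable endpoint is always treated as unhealthy, which is the property the rollback path relies on.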
### EXPECTED M2 outcomes
| Check | Expected |
|---|---|
| deploy-proxy after nixos-rebuild | `active (exited)` |
| `/api/version` in nix store reconcile script | present |
| All services 1/1 | yes |
| Cold-boot sim (proxy starts without dashboard) | `active (exited)` |
| Running server unaffected | all routes return expected codes |
### WHERE
Fix commit: `0e9fd38` (on origin/main). nixos-rebuild command: `nixos-rebuild switch --flake "git+file:///root/builder-clone?submodules=1#cc-ci"` (pull main first).
Fix commit: `0e9fd38` (`claim(pxgate-M1): change traefik health probe to /api/version`). Live since nixos-rebuild @2026-06-13T13:43Z.