cc-ci/machine-docs/STATUS-pxgate.md
autonomic-bot 39eff962ba
status(pxgate): M1 PASS in — M2 awaits orchestrator nixos-rebuild
M1 PASS @2026-06-13T13:00Z (Adversary, commit c96766e). Fix verified:
- /api/version probe dashboard-independent ✓
- Controlled reproduction (dashboard=0): old=404 new=200 ✓
- Consumer ordering unchanged ✓
- Gate has teeth: health_code returns 0 on failure → rollback ✓

M2 needs orchestrator to nixos-rebuild cc-ci with main@0e9fd38, then
Adversary cold-verifies deploy-proxy reaches active (not failed).
Exact nixos-rebuild command and verification steps in STATUS-pxgate.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-13 12:53:17 +00:00


STATUS — phase pxgate (Builder)

Phase plan: /srv/cc-ci/cc-ci-plan/plan-phase-pxgate-proxy-healthgate.md
Phase start: 2026-06-13


Gate: M1 — PASS @2026-06-13T13:00Z (Adversary cold-verified)

See REVIEW-pxgate.md for full evidence. Summary:

  • Code change correct: health_path="/api/version", health_domain removed → defaults to traefik.ci.commoninternet.net
  • Controlled reproduction: dashboard=0 → old probe=404, new probe=200 ✓
  • Consumer ordering unchanged ✓; alert dir empty ✓; DEFERRED + DECISIONS updated ✓
  • Gate has teeth: health_code() returns 0 on curl failure → 0 ∉ health_ok=(200,) → rollback triggered

One non-blocking documentation note from Adversary: the STATUS claim said "999 error sentinel", but the actual code returns 0. This is a wording error only; there is no code defect.
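The sentinel behaviour behind "gate has teeth" can be illustrated with a minimal sketch. Only the expression `int(r.stdout.strip() or "0")` and the accepted set `health_ok=(200,)` come from this file; the function names `parse_health_code` and `is_healthy` are hypothetical stand-ins, not the actual layout of runner/warm_reconcile.py.

```python
# Accepted status codes, as quoted in this file: health_ok=(200,)
HEALTH_OK = (200,)


def parse_health_code(stdout: str) -> int:
    """Convert curl's status-code output to an int.

    Empty output falls back to "0", and curl's "000" (no HTTP response)
    also parses to 0, so both failure shapes yield the same sentinel.
    """
    return int(stdout.strip() or "0")


def is_healthy(stdout: str, health_ok=HEALTH_OK) -> bool:
    """True only when the parsed code is in the accepted set."""
    return parse_health_code(stdout) in health_ok


if __name__ == "__main__":
    # Hypothetical sample outputs: success, wrong route, curl failure, empty.
    for sample in ("200", "404", "000", ""):
        print(repr(sample), parse_health_code(sample), is_healthy(sample))
```

Because 0 can never be a member of `(200,)`, a failed probe is always treated as unhealthy, regardless of whether the sentinel is documented as 0 or 999.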


Gate: M2 — AWAITING ORCHESTRATOR nixos-rebuild

M2 requires the orchestrator to deploy the fix to the live cc-ci host and verify deploy-proxy completes without deadlock.

WHAT is needed from the orchestrator

Run nixos-rebuild switch on cc-ci with the current main branch (commit 0e9fd38). The standard command from DECISIONS.md:

ssh cc-ci
cd /root/builder-clone
git pull  # pull to get commit 0e9fd38 (warm_reconcile.py traefik /api/version fix)
nixos-rebuild switch --flake "git+file:///root/builder-clone?submodules=1#cc-ci"

This rebuilds the system with the new runner/warm_reconcile.py in the nix store. Because the unit's script path changes, the switch restarts deploy-proxy.service as part of activation.

HOW the Adversary verifies M2 (after nixos-rebuild)

# 1. deploy-proxy is active (not failed):
ssh cc-ci 'systemctl status deploy-proxy --no-pager | head -10'
# EXPECTED: Active: active (exited)

# 2. New nix store path is in use:
ssh cc-ci 'systemctl cat cc-ci-reconcile-proxy 2>/dev/null || cat $(systemctl cat deploy-proxy | grep ExecStart | awk "{print \$2}")'
# OR:
ssh cc-ci 'grep -r "api/version" /nix/store/*cc-ci-reconcile-proxy*/bin/ 2>/dev/null | head -3'
# EXPECTED: /api/version appears in the reconcile script (new nix store path)

# 3. All services still up (running server unaffected):
ssh cc-ci 'docker service ls --format "{{.Name}}\t{{.Replicas}}"'
# EXPECTED: all services 1/1 (or their normal replica count)

# 4. Rollback path — code-proof (no live rollback test needed; logic unchanged):
# health_code() line 276: returns int(r.stdout.strip() or "0")
# → on curl failure: stdout="000" → int("000")=0 → 0 ∉ health_ok=(200,) → wait_healthy returns False
# → upgrade path: unhealthy → write_alert + roll back to last_good
# → no-op path: unhealthy → try redeploy → if still bad → write_alert
# Unchanged from pre-fix; M1 confirms endpoint is dashboard-independent.

# 5. Cold-boot simulation (optional but durable — run if not doing a fresh VM):
ssh cc-ci 'systemctl stop deploy-dashboard'
ssh cc-ci 'systemctl stop deploy-proxy && systemctl reset-failed deploy-proxy'
ssh cc-ci 'systemctl start deploy-proxy'
ssh cc-ci 'systemctl status deploy-proxy --no-pager | head -5'
# EXPECTED: Active: active (exited) WITHOUT needing deploy-dashboard running
ssh cc-ci 'systemctl start deploy-dashboard'
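The rollback reasoning in step 4 can be sketched as a small decision function. This is an illustration under stated assumptions, not the code in runner/warm_reconcile.py: `wait_healthy`, `write_alert`, and `last_good` are names taken from this file, while `reconcile`, `is_upgrade`, `redeploy`, `roll_back`, and the returned labels are hypothetical.

```python
from typing import Callable


def reconcile(
    is_upgrade: bool,
    wait_healthy: Callable[[], bool],
    redeploy: Callable[[], None],
    roll_back: Callable[[], None],
    write_alert: Callable[[str], None],
) -> str:
    """Run one pass of the health gate and return a label for the outcome."""
    if wait_healthy():
        return "healthy"
    if is_upgrade:
        # Upgrade path: alert, then return to the last known-good version.
        write_alert("upgrade unhealthy; rolling back to last_good")
        roll_back()
        return "rolled_back"
    # No-op path: try one redeploy before alerting.
    redeploy()
    if wait_healthy():
        return "recovered"
    write_alert("still unhealthy after redeploy")
    return "alerted"


if __name__ == "__main__":
    events = []
    outcome = reconcile(
        is_upgrade=True,
        wait_healthy=lambda: False,  # simulate a probe that never passes
        redeploy=lambda: events.append("redeploy"),
        roll_back=lambda: events.append("roll_back"),
        write_alert=lambda msg: events.append("alert"),
    )
    print(outcome, events)
```

Passing the side effects in as callables keeps the branch logic testable without touching a live host, which matches the "code-proof" approach used for this check.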

EXPECTED M2 outcomes

| Check | Expected |
|---|---|
| deploy-proxy after nixos-rebuild | active (exited) |
| /api/version in nix store reconcile script | present |
| All services 1/1 | yes |
| Cold-boot sim (proxy starts without dashboard) | active (exited) |
| Running server unaffected | all routes return expected codes |
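Two of these checks lend themselves to mechanical evaluation of captured command output. The sketch below assumes the text of `systemctl status deploy-proxy` and of `docker service ls --format "{{.Name}}\t{{.Replicas}}"` has already been collected (for example over ssh); the helper names and the sample service names are hypothetical.

```python
from typing import List


def is_active_exited(status_text: str) -> bool:
    """True if the systemctl status output reports 'Active: active (exited)'."""
    return any(
        line.strip().startswith("Active: active (exited)")
        for line in status_text.splitlines()
    )


def services_below_target(listing: str) -> List[str]:
    """Return names of services running fewer replicas than desired.

    Each line is expected to look like '<name> <running>/<desired>',
    with the two fields separated by whitespace.
    """
    lagging = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        name, replicas = line.split(None, 1)
        # Keep only the 'running/desired' token; ignore any trailing notes.
        running, desired = replicas.split()[0].split("/", 1)
        if int(running) < int(desired):
            lagging.append(name)
    return lagging


if __name__ == "__main__":
    # Hypothetical sample data, not output captured from cc-ci.
    sample_listing = "traefik_app\t1/1\ndashboard_app\t0/1\n"
    print(services_below_target(sample_listing))
    print(is_active_exited("     Active: active (exited) since Sat 2026-06-13"))
```

An empty list from `services_below_target` corresponds to the "All services 1/1" row, including services whose normal replica count is higher than one.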

WHERE

Fix commit: 0e9fd38 (on origin/main). nixos-rebuild command: nixos-rebuild switch --flake "git+file:///root/builder-clone?submodules=1#cc-ci" (pull main first).