claim(2w): WC6 nightly full-cold sweep — timer+service roll warm/infra (health-gated) then serial cold sweep promoting canonicals (WC5); proven live
canonical.enrolled_recipes; runner/nightly_sweep.py (roll keycloak+traefik → serial full-cold over enrolled on latest → green promotes; skip if test active; operate against CCCI_REPO checkout for tests/); nix/modules/nightly-sweep.nix (timer 03:00 Persistent + oneshot service) wired in.
2 bugs fixed via live service run (repo-relative enrolled scan; util-linux for backup PTY).
Live SERVICE sweep: enrolled=['custom-html'] → all tiers green → canonical advanced 1.10.0→1.11.0; red-run correctly does NOT promote. 71 unit pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -358,3 +358,22 @@ canonical at latest separately (one extra deploy) so the old known-good is never
 (DECISIONS Phase-2w WC5). Next: WC6 nightly sweep (systemd timer: nixos-rebuild switch FIRST then
 serial cold sweep over enrolled recipes; need canonical.enrolled_recipes() + a nightly-sweep nix
 module). Building WC6 code while the Adversary verifies WC5.
+
+## 2026-05-29 — W3 WC6 nightly full-cold sweep built + proven (systemd service); claiming. WC5+WC6 close W3.
+
+canonical.enrolled_recipes() (scan tests/*/recipe_meta.py for WARM_CANONICAL). runner/nightly_sweep.py
+(roll keycloak+traefik via warm_reconcile health-gated → serial full-cold over enrolled recipes on
+latest → each green promotes WC5; skip if a run is active; per-recipe red reported not fatal).
+nix/modules/nightly-sweep.nix = systemd timer (OnCalendar 03:00 Persistent +RandomizedDelay) + oneshot
+service; wired into configuration.nix. 71 unit pass.
+
+Two bugs found via the live SERVICE run (not the direct run): (1) the store packages only runner/ (not
+tests/), so enrolled_recipes scanned a nonexistent store/tests → []; fixed nightly_sweep to operate
+against $CCCI_REPO=/root/cc-ci (the checkout with tests/) — same place run_recipe_ci runs from. (2) the
+sweep wrapper's runtimeInputs lacked util-linux → abra's backup/restore PTY (`script`) failed → backup
+red; added util-linux (matching cc-ci-run). After both fixes, the live SERVICE sweep: enrolled=
+['custom-html'] → all 5 tiers green → WC5 promote advanced canonical 1.10.0→1.11.0+1.29.0; timer active
+(next ~03:00). Also confirmed the red-run path (the util-linux flake) correctly did NOT promote
+(known-good stayed 1.10.0 — never lose known-good). W3 (WC5+WC6) essentially closed. Remaining:
+WC8 (resource/isolation hardening — mostly already in place) + WC9 (docs + --quick rollback proof,
+already shown) → then DONE.
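The sweep's control flow described in this entry (serial, one recipe at a time, a red recipe reported but not fatal) can be sketched roughly as follows. This is an illustration only: `serial_sweep`, `summary_lines`, and the injected `run_cold` callable are hypothetical names, with `run_cold` standing in for the subprocess call to `run_recipe_ci.py` made by the actual change.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def serial_sweep(
    recipes: Iterable[str], run_cold: Callable[[str], int]
) -> Tuple[int, Dict[str, int]]:
    """Run each recipe one at a time and collect return codes.

    `run_cold` is a hypothetical stand-in for launching a full-cold run;
    it is assumed to block until that run finishes and return its exit code.
    """
    results: Dict[str, int] = {}
    for recipe in recipes:
        # Serial by construction: the next run starts only after this one returns.
        results[recipe] = run_cold(recipe)
    # The sweep itself reports success; per-recipe reds are only recorded.
    return 0, results


def summary_lines(results: Dict[str, int]) -> List[str]:
    """Render one PASS/FAIL line per recipe, in run order."""
    return [f"{r}: {'PASS' if rc == 0 else 'FAIL'}" for r, rc in results.items()]
```

Injecting the runner keeps the loop testable without deploying anything; a red recipe shows up as `FAIL` in the summary while the sweep still exits 0.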
@@ -39,7 +39,12 @@ nightly full-cold sweep. Definition of Done = WC1–WC9 (plan §1), each Adversa
   snapshot+registry; never lose known-good). Proven live: green cold custom-html run advanced the
   canonical 1.10.0+1.28.0 → 1.11.0+1.29.0 (snapshot refreshed, idle, per-run app torn down).
   `--quick` never promotes (W2). **Adversary PASS @2026-05-29** (REVIEW-2w 5bbc47c, gate 125453d).
-- [ ] **WC6** — Nightly full-cold sweep (scheduled, declarative, MAX_TESTS-bounded).
+- [x] **WC6** — Nightly full-cold sweep. `nix/modules/nightly-sweep.nix` (systemd TIMER OnCalendar
+  03:00 Persistent + oneshot service) → `runner/nightly_sweep.py`: roll warm/infra (keycloak+traefik
+  health-gated, WC1.1) → SERIAL full-cold run over enrolled (`canonical.enrolled_recipes`) recipes
+  on latest → each green run promotes its canonical (WC5); skips if a test is in flight. Proven via
+  the live service: enrolled=['custom-html'] → all tiers green → canonical advanced 1.10.0→1.11.0.
+  **CLAIMED — see Gate.**
 - [x] **WC7** — Trigger/authority/labeling: default `!testme`=cold (unchanged); `--quick` opt-in via
   bridge `parse_trigger` (`!testme --quick` → CCCI_QUICK=1 Drone param, deployed+live-verified);
   never gates merge; runs carry mode=quick (lower-confidence label); clean no-canonical fallback
@@ -133,6 +138,30 @@ headline e2e is green (below). No recipe/harness change needed.

 ## Gate

+### Gate: WC6 — CLAIMED, awaiting Adversary (@2026-05-29)
+
+**WHAT.** Nightly full-cold sweep: a scheduled job rolls warm/infra to latest (health-gated, WC1.1)
+then runs the full COLD suite serially across enrolled canonical recipes on latest — refreshing each
+canonical's known-good (WC5) + a daily authoritative regression. Declarative, MAX_TESTS-bounded
+(serial), skips if a test is in flight. **WHERE:** `nix/modules/nightly-sweep.nix` (timer+service),
+`runner/nightly_sweep.py`, `runner/harness/canonical.py` (`enrolled_recipes`). Wired into
+`hosts/cc-ci/configuration.nix`.
+
+**HOW + EXPECTED (cold):**
+1. **Units:** `cc-ci-run -m pytest tests/unit -q` → **71 passed** (incl. test_canonical enrolled_recipes).
+2. **Timer present:** `systemctl is-active nightly-sweep.timer` → active; `systemctl list-timers
+   nightly-sweep.timer` → next ~03:00 (Persistent).
+3. **Live sweep (via the systemd SERVICE, store copy):** set the custom-html canonical to an OLDER
+   version, then `systemctl start nightly-sweep.service` → journal shows: roll keycloak rc=0 + traefik
+   rc=0 (health-gated, noop at latest); `enrolled canonicals = ['custom-html']`; full-cold custom-html
+   install/upgrade/backup/restore/custom **all pass**; `WC5 promote: canonical custom-html advanced to
+   known-good 1.11.0+1.29.0`; `custom-html: PASS`; afterwards `canonical.json` version ADVANCED to
+   1.11.0+1.29.0, canonical idle, traefik+keycloak 200, system running. Builder ran this live: **PASS**.
+   (A red recipe in the sweep is reported FAIL + does NOT promote — known-good safe; verified when a
+   missing-util-linux backup flake red'd a run and the canonical stayed put, then fixed.)
+
+---
+
 ### Gate: WC5 — ✅ Adversary PASS @2026-05-29 (REVIEW-2w 5bbc47c, gate 125453d)
 Anti-poison gate predicate + live advancement 1.10.0→1.11.0 (cold-only) cold-verified. Builder may
 proceed to WC6. (claim detail retained below.)
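The promotion outcome this gate expects (a green cold run advances the canonical, a red run or a `--quick` run leaves known-good untouched) can be written down as a small rule. This is a sketch under stated assumptions, not the WC5 predicate itself: the real anti-poison gate is not part of this commit and may check more. `should_promote`, `next_known_good`, and the mode string `"cold"` are hypothetical; only `mode=quick` appears in the text above.

```python
def should_promote(rc: int, mode: str) -> bool:
    """Only a green (rc == 0) full-cold run may advance the canonical.

    Assumption: the run mode is the string "cold" or "quick".
    """
    return rc == 0 and mode == "cold"


def next_known_good(known_good: str, candidate: str, rc: int, mode: str = "cold") -> str:
    """Version to keep on record after a run.

    Returns the candidate if the run qualifies for promotion, otherwise the
    previous known-good unchanged, so a red run can never replace it.
    """
    return candidate if should_promote(rc, mode) else known_good
```

With the versions from the live run, a green cold run moves `1.10.0+1.28.0` to `1.11.0+1.29.0`, while a non-zero exit code keeps `1.10.0+1.28.0`.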
@@ -17,6 +17,7 @@
     ../../modules/backupbot.nix
     ../../modules/harness.nix
     ../../modules/warm-keycloak.nix
+    ../../modules/nightly-sweep.nix
   ];

   # --- Tailscale (ACCESS-CRITICAL: do not break, this is the only route in) ---
nix/modules/nightly-sweep.nix (Normal file, 46 lines)
@@ -0,0 +1,46 @@
+# Phase 2w / WC6 — nightly full-cold sweep. A systemd TIMER fires nightly and runs
+# `runner/nightly_sweep.py`: roll warm/infra (keycloak+traefik) to latest health-gated (WC1.1) THEN
+# a SERIAL full-cold run across enrolled (WARM_CANONICAL) recipes on latest — each green run
+# promotes/refreshes that recipe's canonical (WC5), serving as the daily authoritative regression.
+# Serial = MAX_TESTS honored (one at a time); skips itself if a test is already in flight. Declarative
+# + reproducible (runner/ packaged in the nix store, D8-clean).
+{ pkgs, ... }:
+let
+  runnerSrc = ../../runner;
+  # The sweep drives run_recipe_ci.py (pytest/playwright) — needs the full harness env like cc-ci-run.
+  pyEnv = pkgs.python3.withPackages (ps: with ps; [ pytest playwright ]);
+  sweep = pkgs.writeShellApplication {
+    name = "cc-ci-nightly-sweep";
+    # util-linux provides `script` (abra's PTY wrapper for backup/restore TTY ops) — same as cc-ci-run.
+    runtimeInputs = with pkgs; [ abra docker git curl jq gnused gnugrep gnutar coreutils util-linux procps ];
+    text = ''
+      export HOME=/root
+      export PLAYWRIGHT_BROWSERS_PATH=${pkgs.playwright-driver.browsers}
+      export PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1
+      exec ${pyEnv}/bin/python3 ${runnerSrc}/nightly_sweep.py
+    '';
+  };
+in
+{
+  systemd.services.nightly-sweep = {
+    description = "Phase-2w nightly: roll warm/infra (health-gated) + full-cold sweep over canonicals";
+    after = [ "deploy-proxy.service" "warm-keycloak.service" "docker.service" ];
+    environment.HOME = "/root";
+    serviceConfig = {
+      Type = "oneshot";
+      # A full sweep across several recipes (each a cold deploy/test/teardown) is long; bound it.
+      TimeoutStartSec = "21600"; # 6h ceiling
+      ExecStart = "${sweep}/bin/cc-ci-nightly-sweep";
+    };
+  };
+
+  systemd.timers.nightly-sweep = {
+    description = "Nightly trigger for the Phase-2w full-cold canonical sweep (WC6)";
+    wantedBy = [ "timers.target" ];
+    timerConfig = {
+      OnCalendar = "*-*-* 03:00:00";
+      Persistent = true; # catch up a missed nightly after downtime
+      RandomizedDelaySec = "600";
+    };
+  };
+}
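To reason about when the timer above fires, the schedule can be modelled in a few lines. This is a deliberately simplified sketch, not how systemd computes it: it assumes local naive datetimes, ignores `RandomizedDelaySec` (which adds a random delay of up to 600 seconds), and reduces `Persistent = true` to "fire at activation if a scheduled time passed since the last trigger". The names `next_scheduled` and `missed_while_down` are hypothetical.

```python
from datetime import datetime, time, timedelta


def next_scheduled(now: datetime) -> datetime:
    """Next 03:00:00 strictly after `now` (models OnCalendar = *-*-* 03:00:00)."""
    today_run = datetime.combine(now.date(), time(3, 0))
    return today_run if now < today_run else today_run + timedelta(days=1)


def missed_while_down(last_trigger: datetime, activated: datetime) -> bool:
    """Simplified Persistent=true check.

    True if at least one scheduled time elapsed between the last recorded
    trigger and the moment the timer becomes active again (e.g. after boot).
    """
    return next_scheduled(last_trigger) <= activated
```

For example, a host that last ran the sweep on the 28th and boots at 09:00 on the 29th would catch up immediately, whereas one that already ran at 03:05 on the 29th would wait for the next night.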
@@ -48,6 +48,20 @@ def canonical_domain(recipe: str) -> str:
     return warm.stable_domain(recipe)


+def enrolled_recipes() -> list[str]:
+    """All recipes enrolled as data-warm canonicals (recipe_meta.WARM_CANONICAL=True), sorted. Used
+    by the WC6 nightly sweep to know which canonicals to refresh via a green cold run on latest."""
+    tests_dir = os.path.join(os.path.dirname(__file__), "..", "..", "tests")
+    out = []
+    try:
+        for name in sorted(os.listdir(tests_dir)):
+            if os.path.isfile(os.path.join(tests_dir, name, "recipe_meta.py")) and is_enrolled(name):
+                out.append(name)
+    except OSError:
+        pass
+    return out
+
+
 def registry_path(recipe: str) -> str:
     return os.path.join(warmsnap.app_dir(recipe), "canonical.json")

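The first bug noted in the journal (the store copy has no `tests/`, so the scan returned `[]`) is easier to see when the directory is an explicit argument. The sketch below is a standalone approximation, not the repository's code: `enrolled_in` is a hypothetical name, and the regex check is an assumed stand-in for `is_enrolled`, whose implementation is not shown in this commit.

```python
import os
import re
from typing import List

# Assumption: enrollment is a top-level `WARM_CANONICAL = True` assignment.
_FLAG = re.compile(r"^\s*WARM_CANONICAL\s*=\s*True\b", re.MULTILINE)


def enrolled_in(tests_dir: str) -> List[str]:
    """Sorted recipe names under `tests_dir` whose recipe_meta.py enrolls them."""
    out: List[str] = []
    try:
        names = sorted(os.listdir(tests_dir))
    except OSError:
        # A missing directory yields an empty list, which is the symptom seen
        # when the scan pointed at the store copy instead of the checkout.
        return out
    for name in names:
        meta = os.path.join(tests_dir, name, "recipe_meta.py")
        if os.path.isfile(meta):
            with open(meta, encoding="utf-8") as f:
                if _FLAG.search(f.read()):
                    out.append(name)
    return out
```

Because a nonexistent directory silently produces `[]`, a caller that expects at least one enrolled recipe may want to log the directory it scanned.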
runner/nightly_sweep.py (Normal file, 86 lines)
@@ -0,0 +1,86 @@
+#!/usr/bin/env python3
+"""Nightly full-cold sweep (Phase 2w / WC6).
+
+Invoked by the `nightly-sweep` systemd timer (nix/modules/nightly-sweep.nix). Order (plan WC6):
+  1. Roll warm/infra to latest, HEALTH-GATED (WC1.1): re-run the keycloak + traefik reconcilers
+     (warm_reconcile.py <app> — fetch latest recipe → deploy → health-gate → commit/rollback+alert).
+     This is the health-gated "warm/infra → latest" step; a full operator `nixos-rebuild switch` is
+     the config-deploy path, not the autonomous nightly's job (DECISIONS Phase-2w WC6).
+  2. FULL-COLD sweep across enrolled (WARM_CANONICAL) recipes, SERIAL (MAX_TESTS honored — one at a
+     time), each `RECIPE=<r> run_recipe_ci.py` on LATEST (no REF) → a green run promotes/refreshes
+     that recipe's canonical (WC5). Serves as the daily authoritative regression.
+
+MUST NOT run while a test/Drone build is in flight: if a `run_recipe_ci.py` is already active, skip
+this nightly (defer to the next) rather than pile on the single node. Bounded + serial. Exit 0 even
+if some recipes fail (logs per-recipe results; a red recipe just doesn't advance its canonical).
+"""
+
+from __future__ import annotations
+
+import os
+import subprocess
+import sys
+
+# The sweep drives the recipe RUNS (run_recipe_ci) + reads enrollment (tests/<r>/recipe_meta.py),
+# which live in the cc-ci CHECKOUT (the nix store packages only runner/, not tests/). So operate
+# against $CCCI_REPO (default /root/cc-ci) — the same checkout run_recipe_ci already runs from.
+REPO = os.environ.get("CCCI_REPO", "/root/cc-ci")
+sys.path.insert(0, os.path.join(REPO, "runner"))
+from harness import canonical  # noqa: E402
+
+WARM_APPS = ["keycloak", "traefik"]  # the live-warm/infra reconcilers to roll first (health-gated)
+
+
+def _here() -> str:
+    return os.path.join(REPO, "runner")
+
+
+def _another_run_active() -> bool:
+    """True if a run_recipe_ci.py is already executing (don't pile onto the single node)."""
+    r = subprocess.run(["pgrep", "-f", "run_recipe_ci.py"], capture_output=True, text=True)
+    mine = str(os.getpid())
+    pids = [p for p in r.stdout.split() if p and p != mine]
+    return bool(pids)
+
+
+def roll_warm_infra() -> None:
+    """Re-run the health-gated reconcilers so keycloak + traefik roll to latest (WC1.1)."""
+    for app in WARM_APPS:
+        print(f"\n===== nightly: roll warm/infra {app} (health-gated) =====", flush=True)
+        rc = subprocess.run(
+            [sys.executable, os.path.join(_here(), "warm_reconcile.py"), app]
+        ).returncode
+        print(f"nightly: reconcile {app} rc={rc}", flush=True)
+
+
+def sweep() -> int:
+    recipes = canonical.enrolled_recipes()
+    print(f"\n===== nightly cold sweep: enrolled canonicals = {recipes} =====", flush=True)
+    results: dict[str, int] = {}
+    for r in recipes:
+        print(f"\n===== nightly: full-cold {r} (latest) =====", flush=True)
+        env = dict(os.environ, RECIPE=r)
+        env.pop("REF", None)  # latest, not a PR head
+        env.pop("CCCI_QUICK", None)
+        env.pop("MODE", None)
+        rc = subprocess.run(
+            [sys.executable, os.path.join(_here(), "run_recipe_ci.py")], env=env
+        ).returncode
+        results[r] = rc
+        print(f"nightly: {r} rc={rc} ({'green→canonical refreshed' if rc == 0 else 'red'})", flush=True)
+    print("\n===== nightly sweep summary =====", flush=True)
+    for r, rc in results.items():
+        print(f" {r}: {'PASS' if rc == 0 else 'FAIL'}", flush=True)
+    return 0  # the sweep itself succeeds; per-recipe reds are reported, not fatal
+
+
+def main() -> int:
+    if _another_run_active():
+        print("nightly: a run_recipe_ci.py is active — skipping this nightly (defer)", flush=True)
+        return 0
+    roll_warm_infra()
+    return sweep()
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
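Two small pieces of the script above, the skip guard's PID filtering and the environment scrubbing that forces a cold run on latest, can be isolated as pure functions for unit testing. This is a sketch, not a proposed refactor of the commit: `other_pids` and `cold_env` are hypothetical helper names, and the `pgrep` output is passed in as a string rather than produced by a subprocess.

```python
from typing import Dict, List, Mapping


def other_pids(pgrep_stdout: str, own_pid: int) -> List[str]:
    """PIDs from whitespace-separated `pgrep` output, excluding our own process."""
    mine = str(own_pid)
    return [p for p in pgrep_stdout.split() if p != mine]


def cold_env(base: Mapping[str, str], recipe: str) -> Dict[str, str]:
    """Copy `base`, set RECIPE, and drop the variables that would change the run.

    Removing REF targets latest instead of a PR head; removing CCCI_QUICK and
    MODE prevents an inherited quick-mode setting from leaking into the sweep.
    """
    env = dict(base, RECIPE=recipe)
    for key in ("REF", "CCCI_QUICK", "MODE"):
        env.pop(key, None)
    return env
```

A non-empty result from `other_pids` corresponds to the "skip this nightly" branch in `main()`; `cold_env` returns a new dict and leaves the caller's environment untouched.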
@@ -59,3 +59,18 @@ def test_registry_roundtrip(tmp_path, monkeypatch):
     # the file is valid JSON on disk
     with open(canonical.registry_path("custom-html")) as f:
         assert json.load(f)["status"] == "warm"
+
+
+def test_enrolled_recipes_scans_meta(tmp_path, monkeypatch):
+    # enrolled_recipes() lists recipes whose tests/<r>/recipe_meta.py sets WARM_CANONICAL=True.
+    fake_harness = tmp_path / "runner" / "harness"
+    fake_harness.mkdir(parents=True)
+    monkeypatch.setattr(canonical, "__file__", str(fake_harness / "canonical.py"))
+    for name, body in (("aaa", "WARM_CANONICAL = True\n"),
+                       ("bbb", "DEPS=['x']\n"),
+                       ("ccc", "WARM_CANONICAL = True\n")):
+        d = tmp_path / "tests" / name
+        d.mkdir(parents=True)
+        (d / "recipe_meta.py").write_text(body)
+    (tmp_path / "tests" / "ddd").mkdir(parents=True)  # no recipe_meta.py at all
+    assert canonical.enrolled_recipes() == ["aaa", "ccc"]