claim(2w): WC6 nightly full-cold sweep — timer+service roll warm/infra (health-gated) then serial cold sweep promoting canonicals (WC5); proven live
canonical.enrolled_recipes; runner/nightly_sweep.py (roll keycloak+traefik → serial full-cold over enrolled on latest → green promotes; skip if test active; operate against CCCI_REPO checkout for tests/); nix/modules/nightly-sweep.nix (timer 03:00 Persistent + oneshot service) wired in.

2 bugs fixed via live service run (repo-relative enrolled scan; util-linux for backup PTY).

Live SERVICE sweep: enrolled=['custom-html'] → all tiers green → canonical advanced 1.10.0→1.11.0; red-run correctly does NOT promote. 71 unit pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
```diff
@@ -358,3 +358,22 @@ canonical at latest separately (one extra deploy) so the old known-good is never
 (DECISIONS Phase-2w WC5). Next: WC6 nightly sweep (systemd timer: nixos-rebuild switch FIRST then
 serial cold sweep over enrolled recipes; need canonical.enrolled_recipes() + a nightly-sweep nix
 module). Building WC6 code while the Adversary verifies WC5.
+
+## 2026-05-29 — W3 WC6 nightly full-cold sweep built + proven (systemd service); claiming. WC5+WC6 close W3.
+
+canonical.enrolled_recipes() (scan tests/*/recipe_meta.py for WARM_CANONICAL). runner/nightly_sweep.py
+(roll keycloak+traefik via warm_reconcile health-gated → serial full-cold over enrolled recipes on
+latest → each green promotes WC5; skip if a run is active; per-recipe red reported not fatal).
+nix/modules/nightly-sweep.nix = systemd timer (OnCalendar 03:00 Persistent +RandomizedDelay) + oneshot
+service; wired into configuration.nix. 71 unit pass.
+
+Two bugs found via the live SERVICE run (not the direct run): (1) the store packages only runner/ (not
+tests/), so enrolled_recipes scanned a nonexistent store/tests → []; fixed nightly_sweep to operate
+against $CCCI_REPO=/root/cc-ci (the checkout with tests/) — same place run_recipe_ci runs from. (2) the
+sweep wrapper's runtimeInputs lacked util-linux → abra's backup/restore PTY (`script`) failed → backup
+red; added util-linux (matching cc-ci-run). After both fixes, the live SERVICE sweep: enrolled=
+['custom-html'] → all 5 tiers green → WC5 promote advanced canonical 1.10.0→1.11.0+1.29.0; timer active
+(next ~03:00). Also confirmed the red-run path (the util-linux flake) correctly did NOT promote
+(known-good stayed 1.10.0 — never lose known-good). W3 (WC5+WC6) essentially closed. Remaining:
+WC8 (resource/isolation hardening — mostly already in place) + WC9 (docs + --quick rollback proof,
+already shown) → then DONE.
```
```diff
@@ -39,7 +39,12 @@ nightly full-cold sweep. Definition of Done = WC1–WC9 (plan §1), each Adversa
 snapshot+registry; never lose known-good). Proven live: green cold custom-html run advanced the
 canonical 1.10.0+1.28.0 → 1.11.0+1.29.0 (snapshot refreshed, idle, per-run app torn down).
 `--quick` never promotes (W2). **Adversary PASS @2026-05-29** (REVIEW-2w 5bbc47c, gate 125453d).
-- [ ] **WC6** — Nightly full-cold sweep (scheduled, declarative, MAX_TESTS-bounded).
+- [x] **WC6** — Nightly full-cold sweep. `nix/modules/nightly-sweep.nix` (systemd TIMER OnCalendar
+03:00 Persistent + oneshot service) → `runner/nightly_sweep.py`: roll warm/infra (keycloak+traefik
+health-gated, WC1.1) → SERIAL full-cold run over enrolled (`canonical.enrolled_recipes`) recipes
+on latest → each green run promotes its canonical (WC5); skips if a test is in flight. Proven via
+the live service: enrolled=['custom-html'] → all tiers green → canonical advanced 1.10.0→1.11.0.
+**CLAIMED — see Gate.**
 - [x] **WC7** — Trigger/authority/labeling: default `!testme`=cold (unchanged); `--quick` opt-in via
 bridge `parse_trigger` (`!testme --quick` → CCCI_QUICK=1 Drone param, deployed+live-verified);
 never gates merge; runs carry mode=quick (lower-confidence label); clean no-canonical fallback
```
```diff
@@ -133,6 +138,30 @@ headline e2e is green (below). No recipe/harness change needed.
 
 ## Gate
 
+### Gate: WC6 — CLAIMED, awaiting Adversary (@2026-05-29)
+
+**WHAT.** Nightly full-cold sweep: a scheduled job rolls warm/infra to latest (health-gated, WC1.1)
+then runs the full COLD suite serially across enrolled canonical recipes on latest — refreshing each
+canonical's known-good (WC5) + a daily authoritative regression. Declarative, MAX_TESTS-bounded
+(serial), skips if a test is in flight. **WHERE:** `nix/modules/nightly-sweep.nix` (timer+service),
+`runner/nightly_sweep.py`, `runner/harness/canonical.py` (`enrolled_recipes`). Wired into
+`hosts/cc-ci/configuration.nix`.
+
+**HOW + EXPECTED (cold):**
+
+1. **Units:** `cc-ci-run -m pytest tests/unit -q` → **71 passed** (incl. test_canonical enrolled_recipes).
+2. **Timer present:** `systemctl is-active nightly-sweep.timer` → active; `systemctl list-timers
+nightly-sweep.timer` → next ~03:00 (Persistent).
+3. **Live sweep (via the systemd SERVICE, store copy):** set the custom-html canonical to an OLDER
+version, then `systemctl start nightly-sweep.service` → journal shows: roll keycloak rc=0 + traefik
+rc=0 (health-gated, noop at latest); `enrolled canonicals = ['custom-html']`; full-cold custom-html
+install/upgrade/backup/restore/custom **all pass**; `WC5 promote: canonical custom-html advanced to
+known-good 1.11.0+1.29.0`; `custom-html: PASS`; afterwards `canonical.json` version ADVANCED to
+1.11.0+1.29.0, canonical idle, traefik+keycloak 200, system running. Builder ran this live: **PASS**.
+(A red recipe in the sweep is reported FAIL + does NOT promote — known-good safe; verified when a
+missing-util-linux backup flake red'd a run and the canonical stayed put, then fixed.)
+
+---
+
 ### Gate: WC5 — ✅ Adversary PASS @2026-05-29 (REVIEW-2w 5bbc47c, gate 125453d)
 Anti-poison gate predicate + live advancement 1.10.0→1.11.0 (cold-only) cold-verified. Builder may
 proceed to WC6. (claim detail retained below.)
```
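Step 3's expected journal lines can be checked mechanically instead of by eye. This is a minimal sketch that assumes two things: the journal text comes from something like `journalctl -u nightly-sweep.service --since today`, and exactly one recipe is enrolled, as in the live run. `missing_markers` is a hypothetical helper. The marker strings are copied from the `print` calls in `runner/nightly_sweep.py` and from the promote line quoted in the Gate text, so they would need to follow any change to those messages.

```python
def missing_markers(journal: str, recipe: str) -> list[str]:
    """Return the expected WC6 journal markers that do not appear in `journal`.

    Assumes a single enrolled recipe, so the enrolled list prints as ['<recipe>'].
    """
    expected = [
        "nightly: reconcile keycloak rc=0",
        "nightly: reconcile traefik rc=0",
        f"enrolled canonicals = [{recipe!r}]",      # repr matches Python's list printing
        f"WC5 promote: canonical {recipe} advanced to",
        f"{recipe}: PASS",                            # from the sweep summary
    ]
    return [marker for marker in expected if marker not in journal]
```

An empty result corresponds to the PASS case. According to the Gate text, a red run prints `custom-html: FAIL` and emits no promote line, so both of those markers come back missing.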
```diff
@@ -17,6 +17,7 @@
     ../../modules/backupbot.nix
     ../../modules/harness.nix
     ../../modules/warm-keycloak.nix
+    ../../modules/nightly-sweep.nix
   ];
 
   # --- Tailscale (ACCESS-CRITICAL: do not break, this is the only route in) ---
```
nix/modules/nightly-sweep.nix (new file, +46 lines)

```nix
# Phase 2w / WC6 — nightly full-cold sweep. A systemd TIMER fires nightly and runs
# `runner/nightly_sweep.py`: roll warm/infra (keycloak+traefik) to latest health-gated (WC1.1) THEN
# a SERIAL full-cold run across enrolled (WARM_CANONICAL) recipes on latest — each green run
# promotes/refreshes that recipe's canonical (WC5), serving as the daily authoritative regression.
# Serial = MAX_TESTS honored (one at a time); skips itself if a test is already in flight. Declarative
# + reproducible (runner/ packaged in the nix store, D8-clean).
{ pkgs, ... }:
let
  runnerSrc = ../../runner;
  # The sweep drives run_recipe_ci.py (pytest/playwright) — needs the full harness env like cc-ci-run.
  pyEnv = pkgs.python3.withPackages (ps: with ps; [ pytest playwright ]);
  sweep = pkgs.writeShellApplication {
    name = "cc-ci-nightly-sweep";
    # util-linux provides `script` (abra's PTY wrapper for backup/restore TTY ops) — same as cc-ci-run.
    runtimeInputs = with pkgs; [ abra docker git curl jq gnused gnugrep gnutar coreutils util-linux procps ];
    text = ''
      export HOME=/root
      export PLAYWRIGHT_BROWSERS_PATH=${pkgs.playwright-driver.browsers}
      export PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1
      exec ${pyEnv}/bin/python3 ${runnerSrc}/nightly_sweep.py
    '';
  };
in
{
  systemd.services.nightly-sweep = {
    description = "Phase-2w nightly: roll warm/infra (health-gated) + full-cold sweep over canonicals";
    after = [ "deploy-proxy.service" "warm-keycloak.service" "docker.service" ];
    environment.HOME = "/root";
    serviceConfig = {
      Type = "oneshot";
      # A full sweep across several recipes (each a cold deploy/test/teardown) is long; bound it.
      TimeoutStartSec = "21600"; # 6h ceiling
      ExecStart = "${sweep}/bin/cc-ci-nightly-sweep";
    };
  };

  systemd.timers.nightly-sweep = {
    description = "Nightly trigger for the Phase-2w full-cold canonical sweep (WC6)";
    wantedBy = [ "timers.target" ];
    timerConfig = {
      OnCalendar = "*-*-* 03:00:00";
      Persistent = true; # catch up a missed nightly after downtime
      RandomizedDelaySec = "600";
    };
  };
}
```
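The three `timerConfig` settings interact in a way that is easy to misread. Below is a small Python model of their documented meaning:

- `OnCalendar` gives a daily 03:00 elapse.
- `RandomizedDelaySec` adds a uniform delay between 0 and 600 s.
- `Persistent` fires once at activation if an elapse was missed while the timer was inactive.

The model assumes naive local times (no DST handling) and ignores systemd's default `AccuracySec=1min` coalescing. It is an approximation, not how systemd computes elapse times, and every name in it is illustrative.

```python
from datetime import datetime, timedelta

DAILY_AT = (3, 0)                           # OnCalendar = "*-*-* 03:00:00" (local time)
RANDOMIZED_DELAY = timedelta(seconds=600)   # RandomizedDelaySec = "600"


def next_elapse(after: datetime) -> datetime:
    """Next 03:00:00 strictly after `after` (the calendar event, before any random delay)."""
    candidate = after.replace(hour=DAILY_AT[0], minute=DAILY_AT[1], second=0, microsecond=0)
    if candidate <= after:
        candidate += timedelta(days=1)
    return candidate


def fire_window(now: datetime) -> tuple[datetime, datetime]:
    """Earliest and latest start: a uniform delay in [0, RandomizedDelaySec] is added."""
    start = next_elapse(now)
    return start, start + RANDOMIZED_DELAY


def missed_while_down(last_trigger: datetime, activated: datetime) -> bool:
    """Persistent=true model: catch up if an elapse fell between the last trigger and activation."""
    return next_elapse(last_trigger) <= activated


print(fire_window(datetime(2026, 5, 29, 14, 0)))
print(missed_while_down(datetime(2026, 5, 28, 3, 4), datetime(2026, 5, 29, 9, 0)))
```

On the host itself, `systemd-analyze calendar '*-*-* 03:00:00'` prints systemd's own next-elapse computation for comparison.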
```diff
@@ -48,6 +48,20 @@ def canonical_domain(recipe: str) -> str:
     return warm.stable_domain(recipe)
 
 
+def enrolled_recipes() -> list[str]:
+    """All recipes enrolled as data-warm canonicals (recipe_meta.WARM_CANONICAL=True), sorted. Used
+    by the WC6 nightly sweep to know which canonicals to refresh via a green cold run on latest."""
+    tests_dir = os.path.join(os.path.dirname(__file__), "..", "..", "tests")
+    out = []
+    try:
+        for name in sorted(os.listdir(tests_dir)):
+            if os.path.isfile(os.path.join(tests_dir, name, "recipe_meta.py")) and is_enrolled(name):
+                out.append(name)
+    except OSError:
+        pass
+    return out
+
+
 def registry_path(recipe: str) -> str:
     return os.path.join(warmsnap.app_dir(recipe), "canonical.json")
 
```
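This hunk calls `is_enrolled(name)`, whose body is not part of the diff. One way such a check could work without importing or executing recipe code is to parse `recipe_meta.py` with `ast` and look for a top-level `WARM_CANONICAL = True`. The sketch below does that. It is hypothetical (`meta_enrolls` and `scan_enrolled` are illustrative names) and not the repository's implementation. A broken or side-effecting meta file simply reads as not enrolled.

```python
import ast
from pathlib import Path


def meta_enrolls(meta_path: Path) -> bool:
    """True if the last top-level WARM_CANONICAL assignment in the file is the literal True."""
    try:
        tree = ast.parse(meta_path.read_text(), filename=str(meta_path))
    except (OSError, SyntaxError):
        return False  # unreadable or unparsable meta: treat as not enrolled
    enrolled = False
    for node in tree.body:
        if isinstance(node, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == "WARM_CANONICAL" for t in node.targets
        ):
            # The last assignment wins, as it would at import time.
            enrolled = isinstance(node.value, ast.Constant) and node.value.value is True
    return enrolled


def scan_enrolled(tests_dir: Path) -> list[str]:
    """Sorted names of tests/<r>/ directories whose recipe_meta.py enrolls them."""
    if not tests_dir.is_dir():
        return []
    return sorted(
        d.name
        for d in tests_dir.iterdir()
        if (d / "recipe_meta.py").is_file() and meta_enrolls(d / "recipe_meta.py")
    )
```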
runner/nightly_sweep.py (new file, +86 lines)

```python
#!/usr/bin/env python3
"""Nightly full-cold sweep (Phase 2w / WC6).

Invoked by the `nightly-sweep` systemd timer (nix/modules/nightly-sweep.nix). Order (plan WC6):
1. Roll warm/infra to latest, HEALTH-GATED (WC1.1): re-run the keycloak + traefik reconcilers
   (warm_reconcile.py <app> — fetch latest recipe → deploy → health-gate → commit/rollback+alert).
   This is the health-gated "warm/infra → latest" step; a full operator `nixos-rebuild switch` is
   the config-deploy path, not the autonomous nightly's job (DECISIONS Phase-2w WC6).
2. FULL-COLD sweep across enrolled (WARM_CANONICAL) recipes, SERIAL (MAX_TESTS honored — one at a
   time), each `RECIPE=<r> run_recipe_ci.py` on LATEST (no REF) → a green run promotes/refreshes
   that recipe's canonical (WC5). Serves as the daily authoritative regression.

MUST NOT run while a test/Drone build is in flight: if a `run_recipe_ci.py` is already active, skip
this nightly (defer to the next) rather than pile on the single node. Bounded + serial. Exit 0 even
if some recipes fail (logs per-recipe results; a red recipe just doesn't advance its canonical).
"""

from __future__ import annotations

import os
import subprocess
import sys

# The sweep drives the recipe RUNS (run_recipe_ci) + reads enrollment (tests/<r>/recipe_meta.py),
# which live in the cc-ci CHECKOUT (the nix store packages only runner/, not tests/). So operate
# against $CCCI_REPO (default /root/cc-ci) — the same checkout run_recipe_ci already runs from.
REPO = os.environ.get("CCCI_REPO", "/root/cc-ci")
sys.path.insert(0, os.path.join(REPO, "runner"))
from harness import canonical  # noqa: E402

WARM_APPS = ["keycloak", "traefik"]  # the live-warm/infra reconcilers to roll first (health-gated)


def _here() -> str:
    return os.path.join(REPO, "runner")


def _another_run_active() -> bool:
    """True if a run_recipe_ci.py is already executing (don't pile onto the single node)."""
    r = subprocess.run(["pgrep", "-f", "run_recipe_ci.py"], capture_output=True, text=True)
    mine = str(os.getpid())
    pids = [p for p in r.stdout.split() if p and p != mine]
    return bool(pids)


def roll_warm_infra() -> None:
    """Re-run the health-gated reconcilers so keycloak + traefik roll to latest (WC1.1)."""
    for app in WARM_APPS:
        print(f"\n===== nightly: roll warm/infra {app} (health-gated) =====", flush=True)
        rc = subprocess.run(
            [sys.executable, os.path.join(_here(), "warm_reconcile.py"), app]
        ).returncode
        print(f"nightly: reconcile {app} rc={rc}", flush=True)


def sweep() -> int:
    recipes = canonical.enrolled_recipes()
    print(f"\n===== nightly cold sweep: enrolled canonicals = {recipes} =====", flush=True)
    results: dict[str, int] = {}
    for r in recipes:
        print(f"\n===== nightly: full-cold {r} (latest) =====", flush=True)
        env = dict(os.environ, RECIPE=r)
        env.pop("REF", None)  # latest, not a PR head
        env.pop("CCCI_QUICK", None)
        env.pop("MODE", None)
        rc = subprocess.run(
            [sys.executable, os.path.join(_here(), "run_recipe_ci.py")], env=env
        ).returncode
        results[r] = rc
        print(f"nightly: {r} rc={rc} ({'green→canonical refreshed' if rc == 0 else 'red'})", flush=True)
    print("\n===== nightly sweep summary =====", flush=True)
    for r, rc in results.items():
        print(f"  {r}: {'PASS' if rc == 0 else 'FAIL'}", flush=True)
    return 0  # the sweep itself succeeds; per-recipe reds are reported, not fatal


def main() -> int:
    if _another_run_active():
        print("nightly: a run_recipe_ci.py is active — skipping this nightly (defer)", flush=True)
        return 0
    roll_warm_infra()
    return sweep()


if __name__ == "__main__":
    raise SystemExit(main())
```
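`_another_run_active()` decides by `pgrep -f run_recipe_ci.py`, which matches any process whose full command line contains that string, so an editor or `tail` open on the file would count as a run. A different technique, and not what this commit does, is an advisory `flock(2)` lock: each run holds it for its whole duration, and the sweep only probes it. The sketch below is minimal and makes a few assumptions:

- The lock path and both function names are hypothetical.
- `run_recipe_ci.py` would have to be changed to take the lock.
- It relies on Linux `flock` semantics, under which separate open file descriptions conflict even within one process.

```python
import fcntl
import os

LOCK_PATH = "/run/cc-ci-recipe-run.lock"  # hypothetical lock location


def hold_run_lock(path: str = LOCK_PATH):
    """Open and exclusively flock `path`; the caller keeps the file open for the whole run."""
    f = open(path, "a")
    fcntl.flock(f, fcntl.LOCK_EX)  # blocks until any other holder releases
    return f


def another_run_active(path: str = LOCK_PATH) -> bool:
    """Probe: True if another open file description currently holds the run lock."""
    with open(path, "a") as f:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        fcntl.flock(f, fcntl.LOCK_UN)  # we only probed; release immediately
        return False
```

The kernel releases a `flock` lock when the holder's file is closed, which includes a crash, so no stale state is left behind. The probe is still a point-in-time check, just like the `pgrep` version.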
```diff
@@ -59,3 +59,18 @@ def test_registry_roundtrip(tmp_path, monkeypatch):
     # the file is valid JSON on disk
     with open(canonical.registry_path("custom-html")) as f:
         assert json.load(f)["status"] == "warm"
+
+
+def test_enrolled_recipes_scans_meta(tmp_path, monkeypatch):
+    # enrolled_recipes() lists recipes whose tests/<r>/recipe_meta.py sets WARM_CANONICAL=True.
+    fake_harness = tmp_path / "runner" / "harness"
+    fake_harness.mkdir(parents=True)
+    monkeypatch.setattr(canonical, "__file__", str(fake_harness / "canonical.py"))
+    for name, body in (("aaa", "WARM_CANONICAL = True\n"),
+                       ("bbb", "DEPS=['x']\n"),
+                       ("ccc", "WARM_CANONICAL = True\n")):
+        d = tmp_path / "tests" / name
+        d.mkdir(parents=True)
+        (d / "recipe_meta.py").write_text(body)
+    (tmp_path / "tests" / "ddd").mkdir(parents=True)  # no recipe_meta.py at all
+    assert canonical.enrolled_recipes() == ["aaa", "ccc"]
```
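`sweep()` builds each child's environment inline. Factoring that logic into a pure function would let the "latest, cold, never quick" contract be unit-tested in the same pytest style as `test_enrolled_recipes_scans_meta`, without importing `nightly_sweep` (which touches `$CCCI_REPO` at import time). The sketch below is hypothetical: `cold_env` and the test name are illustrative, and the commit keeps this logic inside `sweep()`.

```python
from collections.abc import Mapping

# Variables that would turn a nightly run into a PR-head or --quick run.
_SCRUBBED = ("REF", "CCCI_QUICK", "MODE")


def cold_env(base: Mapping[str, str], recipe: str) -> dict[str, str]:
    """Environment for one nightly child run: RECIPE set, PR/quick selectors removed."""
    env = dict(base, RECIPE=recipe)  # copy, so the caller's mapping is untouched
    for key in _SCRUBBED:
        env.pop(key, None)
    return env


def test_cold_env_scrubs_pr_and_quick_selectors():
    base = {"PATH": "/bin", "REF": "abc123", "CCCI_QUICK": "1", "MODE": "quick", "RECIPE": "old"}
    env = cold_env(base, "custom-html")
    assert env == {"PATH": "/bin", "RECIPE": "custom-html"}
    assert base["REF"] == "abc123"
```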