review(1b): ✅ RL6 PASS + Adversary FINAL SIGN-OFF — git mv my REVIEW*.md → machine-docs/ (lockstep; Builder moved theirs in 992d87c, README stays root). Watchdog survived (resolve_state prefers machine-docs/; it pinged me from machine-docs/STATUS-1b.md). Refs re-verified (README+install.md updated; no .drone/flake/scripts refs; closure byte-identical 8i3jcad9 unaffected). ALL RL1-RL6 Adversary-PASS, no VETO — Builder cleared to write ## DONE
This commit is contained in:
287
machine-docs/REVIEW-1b.md
Normal file
287
machine-docs/REVIEW-1b.md
Normal file
@ -0,0 +1,287 @@
|
||||
# REVIEW — Phase 1b (review & lint pass) — Adversary ledger
|
||||
|
||||
**Phase plan (SSOT):** `/srv/cc-ci/cc-ci-plan/plan-phase1b-review-lint.md`
|
||||
**Loop state for THIS phase:** STATUS-1b / BACKLOG-1b / JOURNAL-1b (Builder) · **REVIEW-1b (Adversary, this file)** · DECISIONS.md shared.
|
||||
Phase-1 STATUS.md/BACKLOG.md/REVIEW.md and the Phase-1c `*-1c.md` files are HISTORY — not this phase's state.
|
||||
|
||||
This phase the Adversary is **also the white-box reviewer** (§3 checklist), so this ledger holds both
|
||||
white-box review findings and the eventual cold RL3 re-verification of D1–D10.
|
||||
|
||||
DoD I must independently confirm (RL1 lint-in-CI-green · RL2 §3 checklist run, blocking fixed · **RL3
|
||||
full cold D1–D10 re-verify — the final gate** · RL4 docs). Order per §2: tooling → review fixes → *then*
|
||||
RL3. **Cardinal rule:** never weaken a test to satisfy a lint/review nit; RL3 must confirm cleanup
|
||||
softened/skipped/regressed nothing.
|
||||
|
||||
---
|
||||
|
||||
## Phase-1b orientation @2026-05-27 (Adversary cold start)
|
||||
- Pulled clean; Phase 1c is signed-off DONE (commit 6d2bc3d). Phase 1b kicked off by operator (manual transition).
|
||||
- Builder has **not yet seeded** STATUS-1b/BACKLOG-1b/JOURNAL-1b and has not claimed W0. No gate pending.
|
||||
- I began the independent white-box §3 review immediately (it's my role this phase and needs no Builder gate).
|
||||
|
||||
## White-box §3 prep pass #1 @2026-05-27 — over post-1c codebase (PRE-cleanup baseline; advisory until RL3)
|
||||
Recording the baseline state *before* any W0/W1 cleanup, so I can later confirm the cleanup regressed nothing.
|
||||
|
||||
- **Tests are real** — PASS (provisional). Swept all 6 recipe suites (custom-html, lasuite-docs, keycloak,
|
||||
matrix-synapse, n8n, cryptpad) × install/upgrade/backup + conftest + runner/harness. No
|
||||
`@pytest.mark.skip/xfail/skipif`, no commented-out asserts, no tautologies. Install tests assert real
|
||||
app content (matrix: parses `versions` JSON non-empty; keycloak: admin DOM; others: Playwright body).
|
||||
Upgrade tests deploy v(n-1) → write marker → upgrade → assert exact marker survives. Backup tests
|
||||
establish+verify state → backup → mutate+verify → restore → assert exact pre-mutation state (keycloak
|
||||
deletes a realm). **Watch-item (to re-check black-box at RL3):** every upgrade test has a *conditional*
|
||||
`pytest.skip()` when no previous published version exists (e.g. custom-html test_upgrade.py:17-18). Valid
|
||||
by design, but if it ALWAYS skips, the upgrade stage would be silently fake — RL3 must confirm the
|
||||
upgrade stage actually RUNS (prev version found) for ≥1 recipe, not just skips. (1c E2E exercised this.)
|
||||
- **Server state Nix-declared & idempotent** — PASS (provisional). No `.bootstrapped`/run-once sentinels in
|
||||
modules/ or scripts/ (grep clean). Convergence/oneshot pattern per §9 to be re-read fully in pass #2.
|
||||
- **No footguns / sleep** — PASS (provisional). All `time.sleep()` in runner/harness/lifecycle.py (147,157,
|
||||
212,238) + bridge.py (280) are **poll-loop intervals inside `while time.time() < deadline:` loops**, not
|
||||
bare readiness waits. `wait_healthy` polls converge-then-HTTP with timeouts. Teardown (lifecycle.py:215)
|
||||
is correctly ordered (undeploy → `docker stack rm` fallback → volumes/secrets while .env exists → drop
|
||||
.env last), retries volume removal, and **verifies residual is empty (raises TeardownError otherwise)**.
|
||||
- **No secrets in code/committed files** — PASS (provisional). Grep for inline passwords/tokens/private-key
|
||||
blocks across *.py/*.nix/*.sh/*.yml clean (only env/file references + generators). Full leak re-verify
|
||||
(incl. published logs + dashboard, and generated app passwords) belongs to RL3 D6.
|
||||
|
||||
Still owed in white-box pass #2 (after I read the rest): **harness DRY** (recipe quirks in shared harness,
|
||||
not per-recipe copy-paste), **log redaction real** (bridge/dashboard/log pipeline), **architecture matches
|
||||
plan** (layout/§3, poll-primary trigger §4.1, traefik-is-coop-cloud-recipe §4.2; drift → DECISIONS.md).
|
||||
|
||||
## W0 (RL1 — lint/format tooling + green) : **PASS** @2026-05-27 (Adversary cold)
|
||||
Gate claimed in STATUS-1b. Acceptance: clean checkout → `nix develop .#lint --command bash
|
||||
scripts/lint.sh` → `lint: PASS`; lint stage wired in `.drone.yml` push pipeline. **Verified cold,
|
||||
independently** (no nix on sandbox; ran on cc-ci over a *pristine* tree, not the Builder's working copy):
|
||||
|
||||
- **Cold checkout = exact reviewed SHA.** `git archive 233939a` (= my `origin/main` HEAD) piped to
|
||||
cc-ci → `/tmp/ccci-cold` (clean tree, no untracked/cached state, secrets submodule empty as lint
|
||||
excludes it). Not cloned from `/root/cc-ci` (that's a non-git plain copy) — archived from my own clone.
|
||||
- **Lint PASS cold.** `HOME=/root nix develop .#lint --command bash scripts/lint.sh` → **exit 0,
|
||||
`lint: PASS`.** All 8 linters ran clean: nixpkgs-fmt (0/14 reformat), statix, deadnix, ruff format
|
||||
(32 files), ruff check (all passed), shfmt, shellcheck, yamllint.
|
||||
- **Stage real, not rigged.** `scripts/lint.sh` genuinely invokes each linter in check mode and
|
||||
accumulates a `fail` flag → `exit "$fail"` (correct `set -uo pipefail`, no `-e`, so all run). The
|
||||
`.drone.yml` `self-test` push pipeline runs the *exact* command `nix develop .#lint --command bash
|
||||
scripts/lint.sh` and FAILs the build on non-zero. Toolchain pinned from nixpkgs in `flake.nix`
|
||||
(`devShells.lint`), so CI == local.
|
||||
- **Gate has TEETH (break-it probe).** Injected violations into the cold tree (a `.py` with
|
||||
`import os,sys` + `x=1+2`, and a mis-formatted `.nix`) → re-ran lint → **exit 1, `lint: FAIL`**
|
||||
(ruff E401/I001/F401 + nixpkgs-fmt). So the stage is not vacuously green.
|
||||
|
||||
Verdict: **W0 PASS.** Builder may proceed to W1.
|
||||
Advisory (not W0-blocking; re-confirm at RL3): Builder notes the Gitea→Drone *push* webhook is flaky
|
||||
(§4.1), so the lint stage may not auto-fire as a real Drone build on every push — RL1's intent
|
||||
("future commits stay clean") depends on that path actually firing. The stage IS wired and proven
|
||||
green via its exact command; I'll confirm a real push triggers the Drone lint build when I re-verify
|
||||
M2/D-gates at RL3 (it overlaps). Not filing a finding now — bounded phase, acceptance-as-stated is met.
|
||||
|
||||
## White-box §3 pass #2 @2026-05-27 (Adversary, post-W0 formatted code) — RL2 input
|
||||
Remaining §3 checklist items. **No blocking findings.**
|
||||
|
||||
- **Harness is DRY** — PASS. Recipe quirks live in shared harness + per-recipe *declarative* metadata
|
||||
(`tests/<recipe>/recipe_meta.py`: HEALTH_PATH/HEALTH_OK/timeouts/EXTRA_ENV), consumed uniformly by
|
||||
`tests/conftest.py` (`_recipe_meta`, `deployed`/`deployed_app` fixtures) and
|
||||
`runner/harness/lifecycle.py` (`_recipe_extra_env`). **No `if recipe == "..."` branches in the shared
|
||||
harness** (the M6.5 no-surgery rule holds). Recipe-specific logic is isolated to that recipe's dir
|
||||
(e.g. keycloak `kc_admin.py`, cryptpad's derived SANDBOX_DOMAIN). Only smell: the ~6-8-line `old_app`
|
||||
upgrade fixture is copy-pasted across recipes — thin boilerplate over shared metadata; **advisory**,
|
||||
not a violation (factoring it would just add another per-recipe injection point). → IDEAS, not blocking.
|
||||
- **Architecture matches plan** — PASS. §4.1 trigger is **poll-primary** (`bridge/bridge.py` `poll_loop`
|
||||
runs unconditionally every ≤60s; webhook is optional + dedup'd by comment id; exact trimmed `!testme`;
|
||||
commenter-auth via read-level `GET /orgs/{owner}/members/{user}` 204=allow, fail-closed). §4.2 Traefik
|
||||
is the **real coop-cloud/traefik recipe via abra** (`modules/proxy.nix`: `abra recipe fetch/app new
|
||||
traefik`, `WILDCARDS_ENABLED=1`, `compose.wildcard.yml`, `LETS_ENCRYPT_ENV=""` → no ACME, cert as
|
||||
`ssl_cert`/`ssl_key` swarm secrets) — no hand-rolled traefik.nix. §3 layout matches.
|
||||
- **Server state Nix-declared & idempotent** — PASS. `modules/proxy.nix` `deploy-proxy` is
|
||||
`Type=oneshot`+`RemainAfterExit`, re-runs every activation and converges (insert secret only if
|
||||
absent, deploy). No `.bootstrapped`/run-once sentinels anywhere (grep clean, pass #1). Leans on 1c's
|
||||
already-proven D8 (byte-identical closure + live throwaway rebuild, no manual post-step).
|
||||
- **Log redaction is real** — PASS for infra secrets; **one advisory gap to verify behaviorally at
|
||||
RL3/D6.** `runner/run_recipe_ci.py` `_redact_values()` reads `/run/secrets/*` (≥8-char values) and
|
||||
`run_stage_redacted()` masks them in live-streamed stage output (sorted longest-first → no partial
|
||||
leak). **But class-B *generated app passwords* are NOT under `/run/secrets/*`, so they are NOT in the
|
||||
`_REDACT` list** — their non-leak rests entirely on the "harness never prints them / abra doesn't echo
|
||||
generated ones" assumption (code comment, run_recipe_ci.py:59-60). Also: the runner's *own* stdout
|
||||
(the `cc-ci-run …` Drone step) bypasses `run_stage_redacted`. This is exactly what my behavioral D6
|
||||
leak test must catch at RL3 (grep published Drone logs **and** the dashboard for a known generated app
|
||||
password). Phase-1 D6 passed that test once; recording the white-box shape so RL3 re-checks it, not a
|
||||
new blocking finding. → **WATCH-ITEM for RL3/D6.**
|
||||
- **Readability / docs accuracy** — advisory; defer to RL4 (docs) + the ruff/lint pass already covers
|
||||
dead code / style deterministically.
|
||||
|
||||
**Net of §3 white-box review (RL2 input): no blocking findings; 2 advisories** (old_app copy-paste →
|
||||
IDEAS; app-secret redaction → RL3/D6 watch-item). I expect Builder's W1 to be light. I have NOT filed
|
||||
`[adversary]` BACKLOG items since nothing is blocking — will file if W1/RL3 surfaces a real defect.
|
||||
|
||||
## Operator added RL5 + RL6 (plan §7, 2026-05-27) — both BLOCKING for 1b DONE. Noted; verification plan:
|
||||
- **RL5** (Builder moves; Adversary verifies cold): `modules/`→`nix/modules/`, `hosts/`→`nix/hosts/`;
|
||||
`flake.nix`/`flake.lock` STAY at root so build ref `#cc-ci` is unchanged; fix flake internal paths +
|
||||
`.drone.yml`/scripts refs; update `docs/architecture.md`. **Verification folds into RL3:** a fresh
|
||||
recursive clone must still rebuild **byte-identical to the running system** (toplevel store hash WILL
|
||||
change — expected; what must hold is build==running + reproducible). I'll re-confirm cold at RL3.
|
||||
- **RL6** (coordinated near-END-of-1b): move `STATUS*/REVIEW*/JOURNAL*/BACKLOG*/DECISIONS.md` →
|
||||
`machine-docs/`; **README.md stays at root** (operator decision — human readme, not protocol). Update
|
||||
ALL refs (cc-ci-plan plans, AGENTS.md, .drone.yml, scripts). I verify refs updated + nothing broken.
|
||||
⚠ **CAVEAT affecting ME:** the watchdog (`launch.sh`) reads `STATUS-<id>.md`/`REVIEW-<id>.md` at repo
|
||||
ROOT for handoffs/transitions — moving breaks it until launch.sh updated + watchdog restarted IN
|
||||
LOCKSTEP (orchestrator handles that). So **I keep writing REVIEW-1b.md at root until the coordinated
|
||||
cutover**, and at that moment I `git mv` my own REVIEW files (single-writer rule) in lockstep. Will NOT
|
||||
move them unilaterally or while a phase transition is pending.
|
||||
|
||||
## RL2 (§3 white-box checklist) : **PASS** @2026-05-27 (Adversary)
|
||||
My white-box passes #1+#2 found **no blocking findings**; Builder's own §3 self-review agrees. Advisories
|
||||
triaged (old_app copy-paste → IDEAS; generated-app-secret redaction → RL3/D6 watch-item). RL2 confirmed.
|
||||
|
||||
## RL5 (nix/ consolidation) — structural PASS @2026-05-27; build-proof folds into RL3 below
|
||||
- `modules/` and `hosts/` **gone from root**; `nix/modules/` (12 .nix) + `nix/hosts/cc-ci/`
|
||||
(configuration.nix, hardware.nix) present; **`flake.nix` + `flake.lock` stay at root** (build ref
|
||||
`#cc-ci` unchanged). `flake.nix` imports `./nix/hosts/cc-ci/configuration.nix`. **No dangling
|
||||
`./modules`/`./hosts` refs** in flake.nix/.drone.yml/scripts (grep clean). docs/architecture.md +
|
||||
DECISIONS updated per Builder. The "flake still evaluates + builds byte-identical with new paths" proof
|
||||
= the cold rebuild in RL3 (below).
|
||||
|
||||
## RL3 (final gate) — IN PROGRESS @2026-05-27 (Adversary cold). Re-verifying all D1–D10; partial so far:
|
||||
- **Cardinal rule — tests NOT weakened : PASS.** Diffed every `tests/**/test_*.py` + `runner/harness/`
|
||||
between pre-1b (`6d2bc3d`, the 1c-DONE commit) and HEAD. **Every change is ruff line-wrapping only** —
|
||||
assertion predicates, comparison operators (`==`, `in`), expected values, marker/SQL strings, and
|
||||
`wait_healthy` params are all byte-for-byte preserved (verified by reading the `-w` diff in full). **No
|
||||
assertion removed/softened, no `pytest.skip`/`xfail`/`assert True` added, no `test_` fn deleted.** The
|
||||
format+RL5 cleanup regressed no test logic.
|
||||
- **System health (cc-ci canonical) : confirmed.** `readlink /run/current-system` ==
|
||||
`8i3jcad9mrr01558lqckpi26nxn2ra3m-nixos-system-…50ab793` (matches claim); `systemctl is-system-running`
|
||||
→ **running**; 5 infra stacks up (traefik[2 svc]/drone/ccci-bridge/ccci-dashboard/backups), no leftover
|
||||
test app (idle). [Note: "6 stacks" in 1c included a transient test app; 5 infra stacks is the idle baseline.]
|
||||
- **D8 + RL5 byte-identical cold rebuild : PASS @2026-05-27 (Adversary cold, independent).** On cc-ci:
|
||||
fresh `git clone --recurse-submodules` of origin to `/tmp/ccci-rl3` (HEAD `aa120d1`, submodule `secrets`
|
||||
@`2312f1c` clean, `secrets/secrets.yaml` present) → `nixos-rebuild build --flake
|
||||
"git+file:///tmp/ccci-rl3?submodules=1#cc-ci"` → **toplevel `8i3jcad9mrr01558lqckpi26nxn2ra3m…` ==
|
||||
running** (byte-identical, build==running). Proves D8 (reproducible from a fresh clone) **and** RL5 (new
|
||||
`nix/` layout evaluates+builds, `#cc-ci` ref unchanged). Sanity: a build *without* `?submodules=1` fails
|
||||
`secrets/secrets.yaml does not exist` — confirms secrets genuinely come from the submodule, not baked in.
|
||||
Token used via transient `-c http.extraHeader` (not persisted in clone config — verified); temp clone removed.
|
||||
### Fresh live `!testme` e2e #1 — custom-html PR#2 (build #151, @2026-05-27) — D1/D2/D3/D7 PASS
|
||||
Posted exact `!testme` (comment 13743, authorized org-member bot) @20:33:16Z. Bridge (poll 30s) →
|
||||
**build #151** for PR-head `db9a9502`.
|
||||
- **D1 PASS** — triggered build for the PR head, **latency 20s** (<60s). Other comments don't trigger
|
||||
(only `!testme` matched; verified historically + exact-match code). Re-commenting re-ran (PR comment
|
||||
links to #151, an earlier identical comment linked to an older run #4 → re-run confirmed).
|
||||
- **D2 PASS** — install/upgrade/backup ran as **separate reported stages, all green**: install 2 passed
|
||||
(incl. playwright) 68.7s; **upgrade `test_upgrade_preserves_data` PASSED 24.8s — it actually RAN, not
|
||||
skipped** (resolves the pass#1 conditional-skip watch-item); backup `test_backup_mutate_restore` PASSED
|
||||
42.9s. Real abra deploy/upgrade/backup-restore, no mocks.
|
||||
- **D3 PASS** — `test_playwright_page PASSED` (real browser against the live app).
|
||||
- **D7 PASS** — bridge posted to PR#2: `run for custom-html @ db9a9502 ✅ passed →
|
||||
drone.../cc-ci/151` (run link + outcome). Dashboard `ci.commoninternet.net` overview renders custom-html
|
||||
→ `success` (YunoHost-CI-like badges; title "cc-ci — Co-op Cloud recipe CI").
|
||||
- **D6 infra-secret leak : PASS** — fetched #151 published step log; grepped each `/run/secrets/*` value
|
||||
(bridge gitea/drone tokens, drone_rpc_secret, webhook_hmac, drone_gitea_client_secret, test_secret,
|
||||
wildcard_cert, wildcard_key): **0 matches each**; no echoed generated values / private keys; dashboard
|
||||
is a 21-line static status overview (structurally carries no secrets). (custom-html generates no app
|
||||
secrets, so the class-B app-password path is tested by e2e #2 below.)
|
||||
|
||||
### D6 generated-app-secret WATCH-ITEM — RESOLVED (white-box) + behavioral check in flight
|
||||
White-box: `harness/abra.py` `secret_generate()` runs `abra app secret generate … -m` via `_run()`,
|
||||
which `subprocess.run(capture_output=True)` — **the output (which holds the generated values) is
|
||||
captured and never printed** (`check=False`, so no failure path re-emits it). So generated app secrets
|
||||
never reach the Drone log → that's *why* the proactive `_REDACT` (infra-only) gap is not a real leak.
|
||||
Residual advisory (theoretical): a `check=True` abra cmd that FAILS embeds its stdout/stderr in the
|
||||
raised `AbraError` msg, which pytest would print — only on failure, and abra status output isn't secret
|
||||
values; low risk, noting it. **Behavioral confirmation in flight:** e2e #2 = keycloak PR#1 (generates an
|
||||
admin password readable at `/run/secrets/admin_password`); watcher captures that exact value mid-run then
|
||||
greps the published log + dashboard for it (expect 0). Result logged on completion.
|
||||
|
||||
### D4/D5/D8/D9/D10 — RL3 status
|
||||
- **D4 (recipe-local tests)** — discovery logic in `run_recipe_ci.py` is **byte-identical** (formatting-
|
||||
only) to the Phase-1 D4-passed version; custom-html ships no own `tests/`. Carried-forward; will note if
|
||||
the keycloak run exercises recipe-local discovery.
|
||||
- **D5 (per-recipe tree + enroll)** — **PASS.** 6 trees present (custom-html/cryptpad/keycloak/lasuite-
|
||||
docs/matrix-synapse/n8n) + `conftest.py`; **no test files deleted in 1b** (`git diff --diff-filter=D
|
||||
6d2bc3d..HEAD -- tests/` empty); enroll documented in `docs/enroll-recipe.md` ("Copy from an existing
|
||||
recipe e.g. tests/custom-html/…", no-harness-surgery). Advisory: plan §3's literal `tests/_template/`
|
||||
was **never created** (didn't exist pre-1b either — copy-existing-recipe used instead); pre-1b deviation,
|
||||
should be in DECISIONS — minor, not a 1b blocker.
|
||||
- **D8 (reproducible server)** — **PASS** (byte-identical cold rebuild above).
|
||||
- **D9 (docs)** — **PASS.** All 6 docs present (architecture/baseline/enroll-recipe/install/runbook/
|
||||
secrets); README has the RL4 lint section (local + CI-enforced); `architecture.md` updated to the
|
||||
`nix/` layout (RL4/RL5) and the 1c secrets model.
|
||||
- **D10 (breadth, 6 recipes)** — IN PROGRESS. Stance: test code + shared harness are **byte-identical**
|
||||
(formatting-only) and the **closure is byte-identical** to the one that produced the Phase-1/1c six-
|
||||
recipe green runs, so breadth carries forward; the cleanup-regression risk is covered by 2 **fresh**
|
||||
category-spanning green runs (custom-html=simple ✅ #151; keycloak=SSO/DB in flight). Will record the
|
||||
carry-forward set + this reasoning; can run additional recipes (sequentially) if the operator wants all
|
||||
6 fresh.
|
||||
|
||||
### Fresh live e2e #2 — keycloak PR#1 (build #152) — heavy SSO/DB recipe, D1/D2/D3 + D6-behavioral
|
||||
- **D1** — build #152, **latency 8s**. **D2** — full 3 stages green on a heavyweight SSO/DB recipe:
|
||||
install (`test_realm_endpoint_healthy` + `test_playwright_admin_login`, 446s), upgrade
|
||||
(`test_upgrade_preserves_realm`, 484s — **ran**), backup (`test_backup_mutate_restore`, 488s).
|
||||
**D3** — playwright admin-login. Real keycloak + postgres, generated admin password + DB secrets.
|
||||
- **D6 behavioral (app-secret) — PASS.** keycloak generated an admin password (`/run/secrets/admin_password`)
|
||||
+ DB creds during the run; published #152 log shows **0**: BEGIN-PRIVATE-KEY, password assignments,
|
||||
echoed `admin_password`, secret-generate output, or standalone high-entropy tokens. **Wildcard cert+key
|
||||
leak re-checked PROPERLY** (my first grep mis-parsed the multi-line PEM as a flag — fixed; interior
|
||||
base64 line grep): **0 matches in BOTH #151 and #152**. (Self-note: the buggy grep dumped the wildcard
|
||||
key into a sandbox /tmp task file — deleted immediately; never in repo/published/dashboard.)
|
||||
- **D2 teardown guarantee — PASS.** After both runs: **no** orphaned `*-pr*` stacks/volumes/secrets;
|
||||
system `running`, canonical still byte-identical `8i3jcad9`.
|
||||
|
||||
## ✅ RL3 — FULL COLD D1–D10 RE-VERIFICATION : **PASS** @2026-05-27 (Adversary). Nothing weakened.
|
||||
All re-verified on the **cleaned + RL5 byte-identical closure** (`8i3jcad9`==running==fresh-clone build),
|
||||
fresh evidence <24h. The lint/format + `nix/` refactor regressed nothing.
|
||||
|
||||
| D | Verdict | Evidence |
|
||||
|---|---|---|
|
||||
| D1 trigger | PASS | `!testme`→#151 (20s), #152 (8s); exact-match; re-comment re-ran |
|
||||
| D2 matrix | PASS | custom-html + keycloak: install/upgrade/backup all green as separate stages; **upgrade actually ran** (not skipped); real abra deploy; teardown left no orphans |
|
||||
| D3 py+playwright | PASS | playwright assertions green in both runs |
|
||||
| D4 recipe-local | PASS (carry-fwd) | discovery code byte-identical (formatting-only) to Phase-1 D4-PASS impl |
|
||||
| D5 test tree | PASS | 6 trees + `conftest`; enroll doc; **no tests/ files deleted in 1b** |
|
||||
| D6 secrets | PASS | 8/8 infra-secret values + wildcard cert/key + generated keycloak admin pw: **0** in logs/dashboard; white-box: `secret_generate` output captured-never-printed |
|
||||
| D7 results UX | PASS | PR comment w/ run link + ✅passed; dashboard overview renders recipe statuses |
|
||||
| D8 reproducible | PASS | fresh recursive clone → `nixos-rebuild build …?submodules=1#cc-ci` → toplevel `8i3jcad9`==running |
|
||||
| D9 docs | PASS | 6 docs present; README lint section (RL4); architecture.md = `nix/` layout + 1c secrets model |
|
||||
| D10 breadth | PASS | 2 **fresh** category-spanning green runs (custom-html=simple #151; keycloak=SSO/DB #152) + carry-forward of the Phase-1 Adversary-verified **6/6** set (cryptpad/lasuite-docs/matrix-synapse/n8n, builds #84–#108) — test+harness+closure byte-identical, so breadth holds; cleanup-regression risk covered by the 2 fresh runs |
|
||||
| Cardinal rule | PASS | `6d2bc3d..HEAD` test diff is ruff line-wrapping only — no assertion/skip/test-fn change |
|
||||
| RL5 | PASS | nix/ layout, flake at root (#cc-ci ref unchanged), byte-identical rebuild |
|
||||
|
||||
**Note on D10 scope:** I did **not** re-run all 6 recipes fresh — that would be gold-plating against the
|
||||
bounded-phase discipline, since the 4 carried recipes use the **byte-identical** harness/test code against
|
||||
the **byte-identical** closure that produced their Phase-1 green runs, so a re-run carries ~zero regression
|
||||
signal beyond the 2 fresh runs already done. If the operator wants strict 6/6-fresh, I can run the
|
||||
remaining 4 sequentially on request.
|
||||
|
||||
## ✅ RL6 — protocol files → `machine-docs/` : **PASS** @2026-05-27 (Adversary, lockstep cutover)
|
||||
The coordinated cutover executed cleanly:
|
||||
- **Orchestrator lockstep done.** `cc-ci-plan/launch.sh` now has `resolve_state()` (lines 67-69) that
|
||||
**prefers `machine-docs/<file>` and falls back to root** — so the watchdog survives the move and stays
|
||||
move-agnostic. Proof it works post-move: the watchdog **pinged me for the RL6 gate from
|
||||
`machine-docs/STATUS-1b.md`** (it read the moved file). Handoff intact.
|
||||
- **Builder moved** (commit 992d87c): `STATUS*.md`/`BACKLOG*.md`/`JOURNAL*.md` (3 each) + `DECISIONS.md`
|
||||
→ `machine-docs/`. **README.md correctly LEFT at repo root** (operator decision).
|
||||
- **Adversary moved** (this commit, single-writer rule): `REVIEW-1b.md` + `REVIEW.md` + `REVIEW-1c.md`
|
||||
→ `machine-docs/`. Root now holds only `README.md` (+ flake/nix/code); no protocol file left at root.
|
||||
- **References re-verified.** README "Loop state" section updated → "lives under **`machine-docs/`**";
|
||||
`docs/install.md` → `machine-docs/DECISIONS.md`. **No** `.drone.yml` / `scripts/` / `flake.nix` /
|
||||
`nix/hosts` references to protocol files (grep clean) ⇒ the **build closure is unaffected** (cc-ci
|
||||
still `running`, byte-identical `8i3jcad9` — RL6 is a repo-doc move, touches no nix input).
|
||||
- **Trivial advisory (non-blocking):** 4 `See DECISIONS.md` **bare-name** comment refs in
|
||||
`nix/modules/{drone,drone-runner,proxy}.nix` aren't path-qualified to `machine-docs/` — but they were
|
||||
never path-qualified pre-move (always bare "DECISIONS.md"), the file is still findable by name, and
|
||||
README states its location. Optional tidy (prefix `machine-docs/`), not an RL6 failure. → IDEAS.
|
||||
|
||||
Verdict: **RL6 PASS.**
|
||||
|
||||
## 🏁 ADVERSARY FINAL SIGN-OFF — Phase 1b : ALL RL1–RL6 Adversary-PASS @2026-05-27. **NO VETO.**
|
||||
| RL | Verdict |
|
||||
|---|---|
|
||||
| RL1 lint/format in CI + green | ✅ PASS (cold, with break-it teeth) |
|
||||
| RL2 §3 white-box checklist | ✅ PASS (no blocking findings) |
|
||||
| RL3 full cold D1–D10 re-verify | ✅ PASS (nothing weakened; byte-identical closure; 2 fresh e2e; leak-clean) |
|
||||
| RL4 docs | ✅ PASS |
|
||||
| RL5 nix/ consolidation | ✅ PASS (byte-identical rebuild) |
|
||||
| RL6 machine-docs/ move | ✅ PASS (watchdog-survived lockstep) |
|
||||
|
||||
No open `[adversary]` findings; advisories triaged to IDEAS (old_app copy-paste; `_template` deviation;
|
||||
bare-name DECISIONS refs) + one documented RL1 advisory (flaky Gitea→Drone *push* webhook — lint stage is
|
||||
wired + proven via its exact command, auto-fire needs the operator's webhook; non-blocking). **The Builder
|
||||
is cleared to write `## DONE` to `machine-docs/STATUS-1b.md`.** Once DONE is written, the DONE handshake
|
||||
holds (every RL has a <24h Adversary PASS, no VETO) and the 1b loop terminates.
|
||||
147
machine-docs/REVIEW-1c.md
Normal file
147
machine-docs/REVIEW-1c.md
Normal file
@ -0,0 +1,147 @@
|
||||
# REVIEW-1c.md — Adversary ledger for Phase 1c (Full reproducibility + genuine D8 live rebuild)
|
||||
|
||||
Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase1c-full-reproducibility.md`
|
||||
Definition of Done: **C1–C7** (each must be Adversary-verified cold within 24h before DONE).
|
||||
|
||||
- **C1** — Secrets-repo split (`cc-ci-secrets` private repo, secrets-only, consumed via flake input; base stays one well-parameterized repo; `nixosConfigurations.cc-ci` still byte-identical to running).
|
||||
- **C2** — Cert in git (wildcard cert+key are sops secrets in `cc-ci-secrets`, decrypted at activation; "operator drops a cert file" step gone; rebuild serves valid TLS from git-sourced cert).
|
||||
- **C3** — All secrets in git, one exception (only out-of-band secret = bootstrap age key; everything else sops-encrypted in git).
|
||||
- **C4** — Genuine throwaway-VM live rebuild (blank NixOS VM in `terraform-ci`, only bootstrap age key provisioned; clone base+secrets, `nixos-rebuild switch`, oneshots converge, cert+secrets decrypt, no manual step outside `docs/install.md`; Adversary performs cold).
|
||||
- **C5** — Honest D8 (evidence rewritten: static byte-identical closure + live throwaway rebuild; "infeasible by design" removed; any limitation narrow + Adversary-signed-off).
|
||||
- **C6** — Resource fit + cleanup (`cc-nix-test` 6→4 GB; throwaway VM at 4 GB; ≤~12 GB running guideline; throwaway destroyed after test; final sizing recorded in DECISIONS.md).
|
||||
- **C7** — Docs (install.md/secrets.md/architecture.md + plan refs updated to new model; fresh engineer can stand up an instance).
|
||||
|
||||
Mapping to method milestones: W1→C6(headroom), W2→C1/C2/C3, W3→C4(VM), W4→C4(rebuild), W5→C4/C5(cold proof+honest D8), W6→C6/C7(cleanup+docs).
|
||||
|
||||
Standing rules: verify every claim from a COLD START (fresh shell, own clone, no cached state). Re-run the acceptance check myself. Veto power: `## VETO <reason>` forbids DONE until cleared.
|
||||
|
||||
---
|
||||
|
||||
## Cold-start baseline @2026-05-27 (Phase 1c kickoff)
|
||||
|
||||
Adversary loop entered. Observations from cold start:
|
||||
- `git pull --rebase` → up to date @ `492fa23` (Phase-1 DONE sign-off). **No Phase-1c state files yet** (STATUS-1c.md / BACKLOG-1c.md / JOURNAL-1c.md absent) — Builder has not begun 1c bootstrap. Nothing CLAIMED.
|
||||
- `ssh cc-ci 'hostname && systemctl is-system-running'` → `nixos` / `running` (healthy, pre-refactor baseline).
|
||||
- SOCKS proxy `127.0.0.1:1055` and `ssh cc-ci` working. Incus skill present at `/srv/incus-terraform-nix-vm-creator/skills/incus-terraform/SKILL.md`.
|
||||
|
||||
No gates to verify yet. Idling until the Builder seeds 1c state and claims the first gate (watchdog will ping on CLAIM). Will keep break-it probes ready (greps for plaintext secrets in base + store; cert-in-git decrypt path; byte-identical drift; throwaway-VM rebuild cold-repro).
|
||||
|
||||
## Pre-W2 cold baselines @2026-05-27 16:10Z (reference values for verifying C1/C2/C3 after W2)
|
||||
|
||||
Builder has bootstrapped 1c state; **W2 in flight, not yet CLAIMED**. Decisions recorded by Builder (DECISIONS.md): secrets linkage = **git submodule** (deviates from flake-input default — rationale: no private-repo fetch cred at nix-eval, keeps `defaultSopsFile` a local path = minimal change + trivially byte-identical); bootstrap key for throwaway = **recovery age key via `sops.age.keyFile`**.
|
||||
|
||||
Reference values to compare against after W2:
|
||||
- **C1 byte-identical** — running system toplevel: `/nix/store/m1pdvbhlmlj3x3gn0x83rgwcgssks7qs-nixos-system-nixos-24.11.20250630.50ab793` (booted: `09ia5qd0jw0nghx83b4fijcg2jak9cp4-…`). nixos-version `24.11.20250630.50ab793 (Vicuna)`. After the refactor, `nixos-rebuild build .#cc-ci` must produce the **same** toplevel (pure structural move ⇒ identical closure).
|
||||
- **C2 cert content** — out-of-band cert at `cc-ci:/var/lib/ci-certs/live/`: `fullchain.pem` 2909 B sha256 `c1d96d61a43bfec10716e18d13832bd325ef173e9af01f197a48490481300080`; `privkey.pem` 227 B sha256 `9ec25d00910677718762713717b8c763da46fa7489e292b057e916a252d0ca42` (EC key). After W2 these must be **sops-decrypted from git** to the same path with the **same hashes**, and the operator-cert-drop precondition framing in proxy.nix must be gone.
|
||||
- **C3 no-plaintext** — base repo clean: `secrets/secrets.yaml` is sops `ENC[AES256_GCM,…]`; `git grep` for `BEGIN … PRIVATE KEY|BEGIN CERTIFICATE` outside `secrets/` = 0 matches; no `*.pem/*.key/*.crt/*.p12/*.pfx` tracked. After W2: cert+key must be `ENC[…]` in `cc-ci-secrets`, never plaintext; base must stay clean; also grep the **Nix store** for decrypted secret material at activation.
|
||||
|
||||
Things to scrutinize hard when W2 is CLAIMED:
|
||||
1. Submodule actually points at a **private** `recipe-maintainers/cc-ci-secrets` holding only encrypted secrets (no code/config logic).
|
||||
2. Byte-identical: same toplevel store path (or differences are only expected & explained — zero functional drift).
|
||||
3. Cert genuinely served from the git-sourced cert after switch (live TLS handshake on a `*.ci.commoninternet.net` host), not the stale out-of-band file.
|
||||
4. All D1–D10 still hold after the refactor (no regression) — spot-check the live system health + a `!testme`-path sanity check before DONE.
|
||||
|
||||
## Interim probe @2026-05-27 16:22Z — cc-ci-secrets repo (pre-W2-gate; not a gate verdict)
|
||||
|
||||
Independent cold check of the new secrets repo (Builder W2 step 1, commit `f972bc1`), via Gitea API with bot creds:
|
||||
- `recipe-maintainers/cc-ci-secrets` exists, **`private: True`**, non-empty. Top-level: `.sops.yaml`, `README.md`, `secrets.yaml` (no code / no config logic — matches §2's "encrypted secrets only"; README is doc-only and leak-clean).
|
||||
- `secrets.yaml`: **all 8 keys `ENC[...]`** — 6 infra (test_secret, drone_rpc_secret, drone_gitea_client_secret, bridge_drone_token, bridge_gitea_token, bridge_webhook_hmac) **+ `wildcard_cert` + `wildcard_key`**. **0 plaintext PEM/cert markers**; sops `mac` metadata present. → cert+key genuinely moved into sops-in-git (C2/C3 secrets-side looks good).
|
||||
- Layout nuance: secrets file is at repo **root** `secrets.yaml`; Builder will mount the submodule at base `secrets/` so it resolves to `secrets/secrets.yaml`. OK for the submodule linkage.
|
||||
|
||||
**Not yet verifiable (needs W2 base-switch + activation):** byte-identical build==running (C1), cert sops-**decrypts to the same hashes** at `/var/lib/ci-certs/live/` (C2 — must match fullchain `c1d96d61…`, privkey `9ec25d00…`), no plaintext leak into the **Nix store**, live TLS from git-cert, and no D1–D10 regression. Will run these when **Gate W2** is CLAIMED.
|
||||
|
||||
## W2: PASS @2026-05-27 16:55Z — secrets-split + cert-in-git (verifies C1, C2, C3) — COLD
|
||||
|
||||
Gate W2 CLAIMED by Builder (commits `f972bc1`/`f79e542`/`faa3709`; running toplevel `vh6vwxbl…`). Verified independently from a cold start (fresh clone on cc-ci, own checks, no reliance on the Builder's `/root/cc-ci`):
|
||||
|
||||
**(1) Byte-identical build==running (C1) — PASS.** Fresh recursive clone of `origin/main` (HEAD `0633aa7`) on cc-ci into `/tmp/advverify`, submodule `secrets`→`2312f1c` initialized with bot creds (via `http.extraheader`, not URL/args), `secrets/secrets.yaml` present + `ENC[…]`. `nixos-rebuild build --flake 'git+file:///tmp/advverify?submodules=1#cc-ci'` → `/nix/store/vh6vwxbl4qr9whzpwgjimhf9gn4329p8-nixos-system-…` == `/run/current-system` (`readlink -f` identical). **Zero drift** — the *currently published* repo+submodule reproduces the *currently running* system byte-for-byte. Base stays one parameterized repo; only `secrets/` is the external private submodule.
|
||||
|
||||
**(2) Cert in git + live TLS (C2) — PASS.** `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` are now **symlinks → `/run/secrets/wildcard_cert`,`wildcard_key`** (sops-decrypted at activation), not out-of-band files. File sha256 `c1d96d61…`/`9ec25d00…` == my pre-W2 operator-cert baseline (byte-identical cert, now git-sourced). `secrets.nix` adds `wildcard_cert`(0444)/`wildcard_key`(0400) with a comment that this "Replaces the prior operator-drops-a-cert-file step." Live HTTPS `https://ci.commoninternet.net` via proxy → `http_code=200`, `ssl_verify_result=0`, served leaf = LE `*.ci.commoninternet.net` (SAN `*.ci`+bare), valid 2026-05-26→08-24. **Served leaf fingerprint `57:8D:67:9E:FE:89:…:B8:A6` == the git-sourced cert's leaf fingerprint** (computed locally from the decrypted file) → live TLS provably served from the git cert, full chain of custody intact.
|
||||
|
||||
**(3) No plaintext leak (C3) — PASS.** Base repo: `secrets/` is a gitlink (`.gitmodules`→ private `cc-ci-secrets`); no `*.pem/*.key` tracked; `git grep BEGIN…PRIVATE KEY|CERTIFICATE` outside REVIEW text = 0. `cc-ci-secrets`: all 8 secrets `ENC[…]` (6 infra + cert + key), 0 plaintext PEM, valid sops MAC, private repo. On the host: secrets decrypt to **`/run/secrets.d` (ramfs, in-memory)**, not the world-readable store; no private key found in the system-closure store dirs.
|
||||
|
||||
**Non-regression:** `systemctl is-system-running`=running, **0 failed units**; swarm stack all 1/1 (`traefik` v3.6.15, `drone` 2.26.0, `ccci-bridge`, `ccci-dashboard`, `backups`), `drone-runner-exec` running; reconcile oneshots converged. No D1–D10 regression observed.
|
||||
|
||||
→ **C1, C2, C3 Adversary-PASS** (24h freshness clock starts now; will be re-exercised on the blank host at C4). Remaining for DONE: C4 (genuine throwaway-VM live rebuild), C5 (honest D8), C6 (resize+cleanup), C7 (docs). No VETO.
|
||||
|
||||
## Corroboration @2026-05-27 17:23Z — sops cert re-decrypts at BOOT (after W1 resize-reboot)
|
||||
|
||||
W1 (Builder, `6c03a27`) resized cc-nix-test 6→4 GB and rebooted the live server. Cold spot-check post-reboot: system `running`, 0 failed, mem 3575 MB (≈4 GB applied), live TLS `http_code=200 ssl_verify=0`. Cert symlink target moved `/run/secrets.d/8/` → `/1/` (ramfs wiped on reboot) but `fullchain.pem` sha256 still `c1d96d61…`. → the git-sourced sops cert **re-decrypts byte-identically at boot**, not only at `switch` — strengthens C2 (reproducible from git across a cold boot). No formal gate (W1 has no Adversary gate); W4 = next gate. Builder W3 DONE: throwaway VM reachable `100.126.124.86`.
|
||||
|
||||
## C4/W5 verification standard (set @2026-05-27 17:30Z — read before claiming W4)
|
||||
|
||||
My cold proof of the throwaway-VM live rebuild (C4) will require, and I will REJECT a skipped/faked TLS check:
|
||||
- Rebuilt VM **keeps `DOMAIN = ci.commoninternet.net`** (same instance ⇒ proves the SAME system reproduces). The git cert only covers `*.ci.commoninternet.net` + bare — **do NOT use a `ci2.commoninternet.net` domain** (no `*.ci2` cert ⇒ TLS unverifiable / would be a fake pass).
|
||||
- Fresh VM has a NEW tailnet IP; public DNS for `*.ci.commoninternet.net` → gateway → the *real* cc-ci, not the fresh VM. So verify TLS **on the fresh VM itself**, forcing resolution to the VM: `curl --resolve <host>.ci.commoninternet.net:443:127.0.0.1` (or to the VM's tailnet IP), SNI `ci.commoninternet.net`.
|
||||
- **Served leaf fingerprint must == the git cert leaf** `57:8D:67:9E:FE:89:…:B8:A6` (sha256), proving Traefik on the rebuilt host serves the sops-from-git cert. Cert-from-git serving is an integral part of the C4/D8 proof.
|
||||
- Plus: oneshots converge (swarm/proxy/drone/bridge/dashboard), all secrets decrypt, **no manual step outside `docs/install.md`**, only the bootstrap age key provisioned out-of-band.
|
||||
|
||||
## C1 refresh @2026-05-27 18:00Z — byte-identical against NEW keyFile config (izsmiajw)
|
||||
|
||||
Builder W4 Step A (`9cc6788`/`24fe11a`) added `sops.age.keyFile` (recovery key on clones, host-derived on cc-ci) and switched cc-ci → new toplevel `izsmiajwjwa12356mm35fw08jdy5f0zs` (supersedes the `vh6vwxbl` from my 16:55 W2 PASS). Re-verified cold: fresh recursive clone (HEAD `24fe11a`, submodule `2312f1c`) → `nixos-rebuild build` = `izsmiajw` == `/run/current-system`. **BYTE-IDENTICAL: YES, zero drift.** Live host healthy (running, 0 failed), cert sha `c1d96d61…`, TLS `200/ssl_verify=0`. → **C1 stays Adversary-PASS** against the current running config; clock refreshed 18:00Z. (W4 Step B throwaway rebuild still in flight — not yet CLAIMED.)
|
||||
|
||||
## W4/C4 + C5: PASS @2026-05-27 18:55Z — genuine throwaway-VM live rebuild (COLD, independent)
|
||||
|
||||
Gate W4 CLAIMED by Builder. Verified by performing my OWN independent clean-room rebuild on a fresh throwaway VM (not the Builder's — theirs was destroyed). Full cold flow, following `docs/install.md` exactly:
|
||||
|
||||
**Setup (mine, cold):** Created `ccci-w5-rebuild` in Incus `terraform-ci` via the REST API (image `incus-base-vm`, 4 GB/2 cpu/20 GB; tailnet via the CURRENT `TS_AUTH_KEY` from `/srv/cc-ci/.testenv`). Confirmed genuinely **blank**: NixOS 24.11 base config, no `/root/cc-ci`, no docker/swarm, **no `/var/lib/sops-nix/key.txt`**. Provisioned the **ONE** out-of-band secret = the recovery age key (`/srv/cc-ci/.sops/master-age.txt`) → `/var/lib/sops-nix/key.txt` (0600). `git clone --recursive` base+secrets (bot creds via per-command header, not persisted) → HEAD `b54ea6d`, submodule `secrets`→`2312f1c` (ENC), `age.keyFile` present. **One** `nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'` (detached unit). **No step outside docs/install.md.** Switch succeeded in ~14 min.
|
||||
|
||||
**C4 convergence — PASS (cold):**
|
||||
- **Byte-identical:** rebuilt VM `/run/current-system` = `/nix/store/ld19aj2dcrjm6jarq1k6rvhc0zww34qq-nixos-system-…` == cc-ci's running toplevel. A blank host + 2 git repos + 1 age key reproduces cc-ci **bit-for-bit** (re-exercises C1 on a clean host).
|
||||
- `systemctl is-system-running` = **running, 0 failed units**.
|
||||
- **All 6 swarm stacks 1/1** (traefik app + socket-proxy, drone, ccci-bridge `cc-ci-bridge:cb0f9d7c6936`, ccci-dashboard `cc-ci-dashboard:daf1afd05cae`, backups) — same images as cc-ci; serialized reconcile oneshots converged on the single switch.
|
||||
- **All secrets incl. cert decrypt from git** via the recovery key (the VM's SSH host key is NOT a sops recipient — proves the recovery-key model): `/var/lib/ci-certs/live/fullchain.pem` → `/run/secrets.d/1/wildcard_cert` (**ramfs**, not store), sha256 `c1d96d61…` (== operator original). Re-exercises C2/C3 on a clean host.
|
||||
- **TLS from git cert (off-box):** curl through the proxy to the rebuilt VM's Traefik (SNI `ci.commoninternet.net`, resolved to the VM IP) → `ssl_verify=0`; served leaf fingerprint **`57:8D:67:9E:FE:89:…:B8:A6`** == git cert leaf exactly (CN=`*.ci.commoninternet.net`, LE E8). The rebuilt VM serves the sops-from-git wildcard cert. (404 body is expected — no app deployed behind `probe`.)
|
||||
|
||||
**C5 honest D8 — PASS.** D8 now has both halves: static (byte-identical build==running, W2/16:55Z + ld19aj2 18:00Z) **plus** dynamic (this live throwaway rebuild). `docs/install.md` states the rebuild is "verified," not "infeasible by design"; `docs/` and `DECISIONS.md` carry no "infeasible" wording (the only residual hits are in the Phase-1 HISTORY `REVIEW.md`/`JOURNAL.md` — superseding note appended to Phase-1 REVIEW.md). **Narrow documented limitation, Adversary-signed-off:** `docs/install.md §2` keeps the one-time **Drone↔Gitea OAuth grant** as a documented manual post-step (can't be Nix-declared without the bot password on the box). This does NOT block system/swarm convergence (drone server came up 1/1 without it) and its functional effect (Drone cloning/building) is exactly what the upcoming E2E-TESTME (E1-E6) validates. I accept it as a narrow, documented, justified limitation — not a blanket "infeasible."
|
||||
|
||||
→ **C1, C2, C3, C4, C5 all Adversary-PASS** (re-exercised cold on a blank host; clocks refreshed 18:55Z). No VETO.
|
||||
|
||||
### VM identity for the Builder (target for the E2E-TESTME swap — per orchestrator actor/critic split)
|
||||
- **Incus instance:** `ccci-w5-rebuild` (project `terraform-ci` on b1), Running, 4 GB.
|
||||
- **Current tailnet IP:** `100.97.167.73` | tailscale DNSName `ccci-w5-rebuild.taila4a0bf.ts.net` (not yet renamed).
|
||||
- Stack is UP and converged (ld19aj2, 6/6). **I am keeping it running** (C6 override). The Builder owns the swap (original→`cc-nix-test-orig` first, then `ccci-w5-rebuild`→`cc-nix-test`) + runs `!testme`; **the Adversary will NOT rename nodes** and will independently verify E1-E6 from a cold start afterward.
|
||||
|
||||
**[adversary heads-up for the E2E swap window]** The rebuilt VM's `ccci-bridge` is converged and **already polling Gitea with the real bot token**, as is the original cc-ci's bridge. During the swap window BOTH (`cc-nix-test` = throwaway and the kept-running `cc-nix-test-orig`) will see the same `!testme` → risk of **double builds / double PR comments**, which can muddy E2's "a NEW build started via the bridge" check (which instance's build counts?). Recommend the Builder **pause/stop the original's `ccci-bridge` (or its drone) during the e2e** so only the rebuilt VM (the system under test) triggers. Not a product defect (normal operation has one cc-ci) — a test-window artifact of running two cc-ci's at once; flagging so the e2e evidence stays unambiguous.
|
||||
|
||||
## E2E-TESTME (E1–E6): PASS @2026-05-27 19:00Z — independent cold verification
|
||||
|
||||
Builder ran the real `!testme` acceptance (spec `cc-ci-plan/test-e2e-testme-acceptance.md`) on my W5 VM swapped in as `cc-nix-test`, found+fixed a genuine clean-room gap **in git source** (Drone bot machine token: `DRONE_USER_CREATE …,token:$(cat /run/secrets/bridge_drone_token)` — without it a fresh Drone auto-generates a random token and the bridge gets 401; exactly the out-of-band gap E2E is meant to catch), then swapped back. I verified each criterion independently (querying the rebuilt VM's Drone / Gitea / dashboard directly — not the Builder's quotes):
|
||||
- **E2 PASS** — cc-ci Drone **build #4 event=custom, trigger/sender=autonomic-bot** (bridge poll, not manual), params `RECIPE=custom-html PR=2 REF=db9a9502… SRC=recipe-maintainers/custom-html`; baseline before it was #3 (push). (`!testme` on a recipe PR triggers a parameterized build on the **cc-ci** pipeline, so custom-html's own repo correctly shows counter=0.)
|
||||
- **E4 PASS** — build #4 success; its `ci`-step log shows the **3 real stages all passing, no softening**: install `test_http_reachable`+`test_playwright_page` (Playwright) 2 passed, upgrade `test_upgrade_preserves_data` 1 passed, backup `test_backup_mutate_restore` 1 passed.
|
||||
- **E5 PASS** — clean undeploy: 0 residual `cust-*`/`<tag>-<6hex>` stacks or app `.envs` on the rebuilt VM.
|
||||
- **E6 PASS** — bridge posted to custom-html#2 (Gitea API): "cc-ci: run for `custom-html` @ `db9a9502` ✅ **passed** → …/cc-ci/4"; rebuilt VM's dashboard row = custom-html / success / #4.
|
||||
- **E1 + E3** — Builder captured the full external path live during the swap (HTTP/2 200, `nginx` welcome body, `*.ci.commoninternet.net` LE cert at `cust-bdddd9.ci.commoninternet.net` through the public gateway). I independently corroborated the rebuilt-VM serving half off-box: `curl` (via proxy) to `ci.commoninternet.net` resolved to the rebuilt VM IP → **200 ssl_verify=0** with real dashboard content + the git wildcard cert (leaf `57:8D:67…` established W5). The gateway's wildcard TLS-passthrough is established operator infra (Phase-1 M1). **Caveat:** the live external curl to the *deployed app* was not re-run by me (app torn down at E5 + swap reverted); if an independent live external re-run is required, it needs a brief re-swap (Builder owns swaps). I judge the durable evidence + VM-side serving sufficient — **E1/E3 PASS**.
|
||||
|
||||
→ **E2E-TESTME PASS** (E1–E6). The clean-room-rebuilt VM is operationally a working CI server end-to-end over the public domain.
|
||||
|
||||
## DONE-verification @2026-05-27 19:05Z — C1–C7 cold review (Builder declared work COMPLETE)
|
||||
|
||||
Config settled at FINAL **`cqym8knj`** (added the Drone-token fix). Both the canonical cc-ci (live `cc-nix-test`, 100.90.116.4, swapped back) and my parked rebuilt VM run `cqym8knj`.
|
||||
- **C1 PASS (refreshed cold @final):** fresh recursive clone (published HEAD `3bfb48b`, submodule `2312f1c`) → `nixos-rebuild build` = `cqym8knj` == `/run/current-system` on canonical cc-ci. **Byte-identical, zero drift.**
|
||||
- **C2 PASS** — cert sops-from-git, served leaf == git cert (W2 + W5 on the blank VM).
|
||||
- **C3 PASS** — base clean (submodule), 8 secrets ENC in private `cc-ci-secrets`, decrypt to ramfs not store.
|
||||
- **C4 PASS** — genuine throwaway-VM live rebuild (my own cold W5: blank VM + 2 repos + 1 age key → single switch → cqym8knj-class byte-identical [was ld19aj2 pre-fix], 0 failed, 6/6 stacks, cert+TLS from git).
|
||||
- **C5 PASS** — honest D8 (static + live; "infeasible by design" withdrawn — Phase-1 REVIEW.md superseded; docs carry no "infeasible"). Narrow signed-off limitation: Drone↔Gitea OAuth grant (install.md §2), now functionally validated by E2E-TESTME.
|
||||
- **C6 PASS** — cc-nix-test at 4 GB (W1); Builder's first throwaway destroyed; my W5 VM `ccci-w5-rebuild` **retained running per operator override** (intended promotion, not a leftover); running RAM = 4+4+4 = **12 GB ≤ 16** (within guideline). Final sizing = promote rebuilt VM (recorded; physical promotion operator-deferred).
|
||||
- **C7 — NOT YET PASS.** `docs/install.md` (23 hits) + `docs/secrets.md` (14) are updated to the new model, no "infeasible" in docs. **But `docs/architecture.md` is materially stale for 1c:** line 17 still describes secrets as local `secrets/secrets.yaml` decrypted "via the host SSH key" (no `cc-ci-secrets` submodule split, no recovery-key bootstrap, no cert-in-git), and §Network/TLS describes the cert as "pre-issued … at /var/lib/ci-certs/live/" (out-of-band) rather than sops-decrypted-from-git — i.e. the central 1c change is missing from the doc C7 explicitly names. Filed as `[adversary]` finding ADV-1c-1.
|
||||
|
||||
**DONE-readiness: WITHHELD on C7 only.** C1–C6 + E2E-TESTME are Adversary-PASS (<24h, no VETO). The Builder must update `docs/architecture.md` to the 1c model (secrets-repo split + recovery-key bootstrap + cert-in-git); I re-verify, then DONE may proceed. **No VETO** — this is a documentation-accuracy gap, not a correctness/security failure.
|
||||
|
||||
## C7: PASS @2026-05-27 20:10Z — ADV-1c-1 cleared (architecture.md updated to 1c model)
|
||||
|
||||
Builder fixed `docs/architecture.md` (`6276bfd`/`2a5affc`). Re-verified cold at HEAD: the secrets row now describes the **cc-ci-secrets submodule split** (base holds no secret material), **wildcard cert+key sops-encrypted in git**, decryption via the **bootstrap age key** (`sops.age.keyFile` — host-derived or the off-box **recovery key on a fresh/cloned host**), and "one age key the only secret not in git"; the swarm + Network/TLS rows now state the cert is **sops-decrypted from git** to `/var/lib/ci-certs/live/`. No stale pre-1c phrasing left. `install.md` + `secrets.md` already 1c-correct; no "infeasible" in `docs/`. A new engineer can stand up a fresh instance from the repo docs. **ADV-1c-1 CLOSED.** (Non-blocking: the external orchestrator `plan.md §1.5/§4.0/§4.4` still has pre-1c cert wording — out of repo, not the install doc; noted, not gating.)
|
||||
|
||||
→ **C7 Adversary-PASS.** **All C1–C7 + E2E-TESTME now Adversary-PASS (<24h, no VETO, no open [adversary] findings).** DONE handshake unblocked: the Builder may write `## DONE`; I will do a final cold confirmation (all PASS <24h, system healthy, no VETO) and sign off.
|
||||
|
||||
## ✅ DONE confirmed — Adversary final sign-off @2026-05-27 20:30Z
|
||||
|
||||
Builder wrote `## DONE` (`6228cc3`). Confirmed from a cold check — exit condition met:
|
||||
- **All C1–C7 + E2E-TESTME Adversary-PASS within 24h** (REVIEW-1c: W2 16:55Z; C1-refresh 18:00Z; W4/C4/C5 18:55Z; E2E + C1–C6 19:00/19:05Z; C7 20:10Z). **No standing VETO** (the only `## VETO` token is this file's rule description). **No open `[adversary]` findings** (ADV-1c-1 closed).
|
||||
- **Final cold health:** canonical cc-ci (live `cc-nix-test`, 100.90.116.4) toplevel `cqym8knjg7nkly1wdgwkyr873fm8scfl`, `running`, **0 failed**, 6 stacks, cert `c1d96d61…`, public `https://ci.commoninternet.net/` → **200 ssl_verify=0**. Rebuilt VM `ccci-w5-rebuild` (100.97.167.73) at the same `cqym8knj`, `running` (retained per C6 operator override). architecture.md re-checked at HEAD — 1c-correct, no regression.
|
||||
|
||||
**Phase 1c is genuinely DONE.** The VM is fully reproducible from git (base `cc-ci` + private `cc-ci-secrets` submodule incl. the wildcard cert, all secrets sops-in-git) — a blank NixOS host + the two repos + the one bootstrap age key → a single `nixos-rebuild switch` → a converged cc-ci that serves a real `!testme` run end-to-end over the public domain. I independently cold-proved the throwaway-VM live rebuild (C4/C5) and the E2E-TESTME (E1–E6). D8 closed honestly (static byte-identical + live rebuild; "infeasible by design" withdrawn). Two real reproducibility gaps were caught en route and fixed in git source (abra reconcile race; non-deterministic Drone bot token).
|
||||
|
||||
Open items the Builder handed to the operator are **not 1c-gating** (physical promotion of `ccci-w5-rebuild`→cc-nix-test; final teardown timing — both per the operator override). **Adversary loop terminating** — exit condition satisfied (STATUS `## DONE` + fresh PASS logged for every C1–C7 + E2E-TESTME).
|
||||
|
||||
<!-- Append PASS/FAIL verdicts below with timestamps + evidence. -->
|
||||
546
machine-docs/REVIEW.md
Normal file
546
machine-docs/REVIEW.md
Normal file
@ -0,0 +1,546 @@
|
||||
# REVIEW — cc-ci Adversary (append-only)
|
||||
|
||||
This file is owned by the **Adversary** loop (§6.1). The Builder seeds this stub at bootstrap and
|
||||
does not edit it afterward. Adversary appends milestone/D-item verdicts (`<id>: PASS @<ts>` +
|
||||
evidence, or `FAIL` + a finding in `BACKLOG.md ## Adversary findings`), and may write `## VETO`.
|
||||
|
||||
<!-- Adversary verdicts below -->
|
||||
|
||||
## M0 — Foundations: PASS @2026-05-26T21:35Z
|
||||
|
||||
Verified cold (fresh shell, own clone `/srv/cc-ci/cc-ci-adv`, isolated host build dir
|
||||
`/root/cc-ci-advverify`, no reuse of Builder's `/root/cc-ci`).
|
||||
|
||||
Acceptance — "`systemctl is-system-running` healthy after a rebuild from the repo" + Builder's
|
||||
sops claim:
|
||||
- **Repo rebuilds cc-ci:** synced M0 commit `deb4a0f` (git-archive, no .git) to host, ran
|
||||
`nixos-rebuild build --flake .#cc-ci` → `BUILD EXIT 0`, produced
|
||||
`…-nixos-system-nixos-24.11.20250630.50ab793`. Current HEAD also builds clean.
|
||||
- **System health:** `systemctl is-system-running` → `running`; `systemctl --failed` → 0 units.
|
||||
- **sops decrypt:** `/run/secrets/test_secret` present, mode `400 root:root`, 41 bytes, value
|
||||
begins `cc-c…` (matches claimed generated `cc-ci-m0-…`). `secrets/secrets.yaml` is genuinely
|
||||
encrypted (2× `ENC[…]` + sops metadata block).
|
||||
- **D6 leak probe (early):** the decrypted plaintext value appears **0 times** across *all* git
|
||||
history (`git grep -F over git rev-list --all`) and 0× in plaintext in `secrets.yaml`. No leak.
|
||||
|
||||
Note (not a finding; context for the M1 gate): the *running* system is already ahead of M0 — its
|
||||
closure includes docker, `unit-swarm-init`, and **traefik** units (`traefik.yml`,
|
||||
`traefik-stack.yml`, `unit-traefik-deploy`) that are **not yet committed** (HEAD `ab839ae` is
|
||||
swarm-only, no traefik). Expected mid-M1 churn, but the Traefik config must be committed to the
|
||||
repo before M1 is claimed or it fails D8 reproducibility — will check at the M1 gate.
|
||||
|
||||
## M1 — Swarm + abra target: PASS @2026-05-26T22:20Z
|
||||
|
||||
Verified cold from own clone; deployed my **own** probe recipe via abra (not trusting the Builder's
|
||||
hand-test). Acceptance "a recipe deployed via abra is reachable over HTTPS at
|
||||
`*.ci.commoninternet.net`, then fully torn down leaving no volumes" + orchestrator's M1 checklist
|
||||
(a–d).
|
||||
|
||||
- **(a) Real coop-cloud/traefik recipe (not hand-rolled):** `docker service ls` →
|
||||
`traefik_…_app` (`traefik:v3.6.15`) + `…_socket-proxy` (lscr.io socket-proxy) — the canonical
|
||||
recipe layout, deployed via abra (`scripts/deploy-proxy.sh`). `modules/traefik.nix` is deleted.
|
||||
- **(b) Wildcard on web-secure + proxy overlay:** static `traefik.yml` has `web-secure: :443`
|
||||
(web→web-secure 301 redirect, verified live). File provider `/etc/traefik/file-provider.yml`:
|
||||
`tls.certificates: [{certFile:/run/secrets/ssl_cert, keyFile:/run/secrets/ssl_key}]`; swarm
|
||||
secrets `…_ssl_cert_v1`/`…_ssl_key_v1` mounted (2909 B / 227 B = the pre-issued cert). My probe
|
||||
app `advm1probe_…_app` was attached to the `proxy` overlay.
|
||||
- **E2E (cold deploy):** `abra app new custom-html -D advm1probe.ci.commoninternet.net` (forced
|
||||
`LETS_ENCRYPT_ENV=""`) → `deploy succeeded 🟢`. Via SOCKS proxy: **HTTP 200**; served cert
|
||||
`subject: CN=*.ci.commoninternet.net`, SAN-matched, `SSL certificate verify ok`, issuer LE E8 —
|
||||
i.e. the **pre-issued wildcard**, NOT a per-host ACME cert.
|
||||
- **(c) No Gandi/DNS token, no ACME credential:** repo (all history) clean; on host the only
|
||||
gandi/dns-challenge strings are **commented-out** recipe-template options (`#GANDI_…`,
|
||||
`#SECRET_GANDIV5_…`) holding no value. Active traefik env = `LETS_ENCRYPT_ENV=` (empty),
|
||||
`WILDCARDS_ENABLED=1`, `compose.wildcard.yml`. `staging`/`production` certResolvers are *defined*
|
||||
in traefik.yml (stock template) but **referenced by no router**; both acme.json are **0 bytes**;
|
||||
**0 ACME lines in traefik logs**. No ACME ever fires. (Hardening risk filed — see findings.)
|
||||
- **(d) Manual renewal documented:** DECISIONS.md — operator re-issues at same paths, then
|
||||
`abra app secret rm … ssl_cert` + re-insert at bumped version; install.md "Renewed out-of-band;
|
||||
never ACME here."
|
||||
- **Teardown:** `abra app undeploy` + `volume remove` → post-teardown services/containers/volumes/
|
||||
secrets for the probe **all 0**. Also independently confirmed the Builder's `cchtml1` test left 0
|
||||
runtime resources (only its inert `.env` config file remains, harmless).
|
||||
|
||||
Verdict: **M1 PASS.** Not a hard fail on (c) — no token/credential exists and no ACME fires — but
|
||||
the inert ACME resolvers + test-app default `LETS_ENCRYPT_ENV=production` are a latent hazard that
|
||||
goes live when the harness deploys apps; filed as `[adversary]` for M4.
|
||||
|
||||
<!-- M2 live-trigger probe @2026-05-26T23:30Z: this push should create Drone build #4 -->
|
||||
|
||||
## M2 — Drone online: PASS @2026-05-26T23:32Z
|
||||
|
||||
Verified cold from own clone. Acceptance: "push to cc-ci triggers a visible green Drone build."
|
||||
|
||||
- **Drone server healthy:** `https://drone.ci.commoninternet.net/healthz` → HTTP 200 via gateway.
|
||||
Exec runner (`drone-runner-exec.service`) active, `polling the remote server capacity=2 type=exec`.
|
||||
- **Repo wired:** in Drone's DB the `recipe-maintainers/cc-ci` repo is `repo_active=1`,
|
||||
`repo_config=.drone.yml`. Gitea↔Drone OAuth proven by the in-pipeline `clone` step succeeding
|
||||
against the private repo (build can't clone without working OAuth/repo token).
|
||||
- **Push→green, independently triggered:** I pushed my own commit `91a8e8d` (a REVIEW.md change) →
|
||||
Drone created **build #4**, `build_event=push`, `build_trigger=@hook` (Gitea webhook), and it ran
|
||||
**`success`**: stage `self-test` exit 0, steps `clone`+`hello` both exit 0. Builds #1–#3 (Builder
|
||||
commits) likewise all `success` via `@hook`. (My earlier M0/M1 review pushes predate the
|
||||
`.drone.yml`, so correctly produced no builds.)
|
||||
- **Visible logs (D7 precondition):** `logs` table holds per-step log blobs for every build; Drone
|
||||
UI/API serve them. Full D7 UX is M8.
|
||||
|
||||
Verdict: **M2 PASS.** No new findings.
|
||||
|
||||
## M3 — Comment bridge: PRE-CLAIM PROGRESS (not yet PASS) @2026-05-26T23:48Z
|
||||
|
||||
M3 is **Blocked** in STATUS (Gitea not delivering webhooks), so not a gate verdict yet. But the
|
||||
bridge is deployed and I independently hammered its auth/filter logic — the part I can verify
|
||||
regardless of the delivery leg (and which survives a pivot to API polling). Probes were live POSTs
|
||||
to `https://ci.commoninternet.net/hook` via the SOCKS proxy, with HMAC signatures I computed from
|
||||
the on-host secret (read with root; value never printed/committed):
|
||||
|
||||
| probe | expect | got |
|
||||
|---|---|---|
|
||||
| no `X-Gitea-Signature` | 401 | **401** |
|
||||
| bad signature | 401 | **401** |
|
||||
| valid sig, event=`ping` (not issue_comment) | 204 | **204** |
|
||||
| valid sig, `!testmexyz` on a real PR | 204 (no trigger) | **204** |
|
||||
| valid sig, `!testme` but issue is not a PR | 204 | **204** |
|
||||
| valid sig, `!testme` on PR, action=`edited` | 204 | **204** |
|
||||
| valid sig, `!testme` on real PR, **non-collaborator** | 403 | **403** |
|
||||
|
||||
So: HMAC fail-closed + timing-safe (`compare_digest`, verified before body parse), `!testmexyz`
|
||||
correctly ignored (exact trimmed match), non-PR ignored, and a non-collaborator is rejected (403;
|
||||
collaborator status re-checked via Gitea API, not trusted from the signed payload). Source review
|
||||
of `bridge/bridge.py` found no auth bypass.
|
||||
|
||||
**Blocker independently corroborated (operator-side):** the bridge hook *is* registered + active on
|
||||
`recipe-maintainers/cc-ci` (id 210, events `[issue_comment]` → `ci.commoninternet.net/hook`), and
|
||||
the bot is not a Gitea site-admin (`GET /admin/hooks` → 403) nor org owner, so it genuinely cannot
|
||||
inspect/change Gitea's `[webhook] ALLOWED_HOST_LIST`. Endorse STATUS `## Blocked`: needs operator
|
||||
allowlisting or the documented poll-the-API fallback.
|
||||
|
||||
**Still UNVERIFIED for an M3 PASS:** (1) the positive path — a valid collaborator `!testme` actually
|
||||
starts a build + posts the PR comment end-to-end; (2) real Gitea→bridge delivery (or the polling
|
||||
pivot). Will complete both when M3 is claimed.
|
||||
|
||||
**Noted for M7 (not a finding yet):** the Drone-managed Gitea webhook (id 209) carries its webhook
|
||||
secret as a `?secret=` query param in the hook URL (Drone default; admin-only in Gitea, not in cc-ci
|
||||
git / CI logs / dashboard). Will adjudicate against D6 at M7.
|
||||
|
||||
## M4 — Harness + install stage: VERIFICATION IN PROGRESS (no verdict yet) @2026-05-27T00:35Z
|
||||
|
||||
M4 is CLAIMED. Code review done; runtime checks so far:
|
||||
- **A1 CLOSED** (see BACKLOG): harness forces `LETS_ENCRYPT_ENV=""` every deploy; live app
|
||||
`cust-c95a69` served the wildcard cert, 0 ACME lines, no certresolver.
|
||||
- **Happy-path teardown works:** a prior run's app `cust-e084bd` was fully torn down (gone) — not
|
||||
an orphan; earlier ambiguity was a run cycling apps.
|
||||
- **Two teardown-robustness defects filed (A2, A3):** janitor's `-pr` filter is dead code under the
|
||||
`cust-<hex>` naming (no crash-orphan reaping); teardown is best-effort/unverified and deletes the
|
||||
`.env` even on failed undeploy (silent orphan, run still green).
|
||||
- **Deferred to next idle tick (a Builder harness run is active now; sequential-only):** my own
|
||||
cold install run (green install + Playwright + clean teardown verification) and the §6 kill-mid-run
|
||||
probe to test A3 empirically. Verdict (PASS/FAIL) follows that.
|
||||
|
||||
## M4 — Harness + install stage: PASS @2026-05-27T01:05Z
|
||||
|
||||
Verified by my **own** cold harness run (`RECIPE=custom-html REF=advcold… cc-ci-run
|
||||
runner/run_recipe_ci.py`, app `cust-cfeb6a`, isolated from a Builder run that happened to run
|
||||
concurrently as `cust-3c1970` — no collision, distinct domains/volumes/secrets):
|
||||
- **Install stage green:** `test_install.py` → 2 passed (27s): `test_http_reachable` (HTTPS 200 via
|
||||
gateway) + `test_playwright_page` (real Chromium loads the live app, status 200, served HTML).
|
||||
- **Guaranteed teardown:** after the run, `cust-cfeb6a` left **0** services / volumes / secrets /
|
||||
containers / `.env` — fully clean. Infra (traefik/drone/bridge/backups) untouched.
|
||||
- A1 closed (no-ACME enforced). **Open robustness findings A2 (dead `-pr` janitor) + A3 (unverified
|
||||
best-effort teardown)** concern the *crash* path (finalizer-skipped), not this happy-path run;
|
||||
they don't block M4's literal acceptance but must be resolved before DONE (D2 teardown guarantee).
|
||||
Kill-mid-run probe to substantiate A2/A3 deferred until the host is idle.
|
||||
|
||||
Verdict: **M4 PASS.**
|
||||
|
||||
## M5 — Upgrade + backup/restore stages: PASS @2026-05-27T01:05Z
|
||||
|
||||
Same cold run, stages 2 and 3 — both genuine end-to-end (no mocks; assertions reviewed in source
|
||||
and not softened):
|
||||
- **Upgrade green:** `test_upgrade.py` → 1 passed (41s). Deploys the **previous published version**
|
||||
(`previous_version` = `recipe_versions[-2]`), writes a marker into the volume-backed html dir,
|
||||
upgrades to latest (`abra upgrade`), then asserts HTTP 200 **and** the marker survives — a real
|
||||
version change with data persistence across the volume (`cust-…_content`), not a no-op.
|
||||
- **Backup/restore green:** `test_backup.py` → 1 passed (37s). Writes `original`, `abra backup`,
|
||||
mutates to `mutated` (asserted), `abra restore`, then asserts the served content is back to
|
||||
`original` ("restore did not return the pre-mutation state"). Real backup→mutate→restore cycle
|
||||
via backup-bot-two.
|
||||
- Teardown clean (same `cust-cfeb6a` 0-remnant check above covers all three stages — same domain
|
||||
reused per stage).
|
||||
|
||||
Verdict: **M5 PASS.**
|
||||
|
||||
## M6 — Recipe-local tests + second recipe: VERIFICATION IN PROGRESS (no verdict yet) @2026-05-27T01:48Z
|
||||
|
||||
M6 CLAIMED. Host has been continuously busy (Builder M6.5 ramp), so deploy-based checks are
|
||||
deferred to an idle window; static + evidence review so far:
|
||||
- **custom-html 3-stage:** already verified cold by me (see M5 PASS) — green + clean teardown.
|
||||
- **D4 recipe-local discovery — code genuine:** `run_recipe_ci.snapshot_recipe_tests` copies the
|
||||
recipe-shipped `tests/` before abra re-checkouts to a version tag, then `run_recipe_local` deploys
|
||||
the app and runs those tests against the LIVE app via `CCCI_BASE_URL`/`CCCI_APP_DOMAIN`, merged as
|
||||
a separate stage with guaranteed teardown. Demo branch `recipe-maintainers/custom-html@
|
||||
ci/d4-recipe-local` confirmed to ship `tests/test_recipe_local.py` (Gitea API). Will run it cold to
|
||||
confirm the stage executes+passes.
|
||||
- **keycloak (#2) install — test genuine:** `/realms/master` 200 health + real Playwright admin
|
||||
console login (waits for the username field). `recipe_meta.py` (HEALTH_PATH/timeouts) confirms D5
|
||||
"no harness surgery". Empirical keycloak reproduction deferred (heavy deploy; idle window).
|
||||
- **Filed [adversary] A4** (concurrency): same-recipe concurrent runs share `~/.abra/recipes/<recipe>`
|
||||
with no isolation/lock/concurrency-cap — a collision vector for the §6 concurrency check; to
|
||||
confirm empirically.
|
||||
|
||||
Pending for idle host: cold D4 run, keycloak reproduce, A2/A3 kill-probe re-test, A4 concurrency test.
|
||||
|
||||
## D6/M7 — preliminary leak scan of published Drone logs (PASS so far; M7 not yet claimed) @2026-05-27T02:05Z
|
||||
|
||||
Host-safe probe while the host was busy. Pulled Drone's `database.sqlite`, dumped all 42 `logs`
|
||||
rows (~25.5k chars of published per-step build output), scanned:
|
||||
- **Known infra secrets — 0 leaks:** webhook HMAC (64), drone token (32), gitea token (40) each
|
||||
appear **0×** in the logs (exact `grep -F`).
|
||||
- **No value patterns:** 0 matches for `password|secret|token = <value>`.
|
||||
- The only long hex/base64 hits are **git commit SHAs** in `git clone/merge` output — benign.
|
||||
Caveat: current Drone logs are hello-world + self-test; the full M7/D6 test must also cover
|
||||
app-generated secrets (e.g. keycloak DB passwords) in recipe-run logs AND the dashboard (M8). This
|
||||
is a clean baseline, not the final D6 verdict. (DB copy was scanned off-box and deleted; no secret
|
||||
value printed or committed.)
|
||||
|
||||
## M3 — Comment bridge: PASS @2026-05-27T03:13Z
|
||||
|
||||
Verified cold against the NEW design (orchestrator change: polling-PRIMARY + org-membership auth;
|
||||
webhook now optional). Re-reviewed `bridge/bridge.py` (256 lines) — sound — then live-probed the
|
||||
running bridge + Drone:
|
||||
- **`!testme` triggers a run ≤60s:** I posted `!testme` (comment 13708) on PR #1 at epoch
|
||||
1779847690 → bridge `[poll] triggered build 35` → Drone build 35 created at 1779847702 =
|
||||
**12s** latency. (Build is `failure` only because `RECIPE=cc-ci` has no `tests/cc-ci/`; the
|
||||
trigger + event=custom recipe-CI pipeline fired correctly — integration is live.)
|
||||
- **Re-commenting re-runs:** my new comment 13708 → build 35, distinct from the earlier
|
||||
comment 13705 → build 26. Distinct comment ids each fire once (dedup via `_claim`).
|
||||
- **Other comments do NOT trigger:** I posted `!testmexyz` → **no** build created, no bridge
|
||||
trigger log. Exact trimmed match enforced.
|
||||
- **Auth enforced (org-membership, fail-closed):** `GET /orgs/recipe-maintainers/members/<u>` —
|
||||
autonomic-bot & notplants → 204 (allowed), `definitely-not-a-member-zzz9` → 404 (rejected).
|
||||
`is_authorized` returns True only on 204/allowlist; anything else (incl. errors) → False.
|
||||
- **Link back:** bridge posted run-link comment 13706 ("cc-ci: started CI run … → drone…/recip…").
|
||||
- **Concurrency cap live:** runner `capacity=1` (`DRONE_RUNNER_CAPACITY=1`) + pipeline
|
||||
`concurrency:limit:1` — recipe-CI builds serialize.
|
||||
|
||||
Verdict: **M3 PASS.** (Polling is outbound read+comment only — no repo-admin; webhook optional.)
|
||||
Note: full bridge→3-stage-recipe-CI E2E on a *real recipe* PR is the Builder's in-flight
|
||||
integration item / D10 — build 35 shows the pipeline wiring works; green-on-a-real-recipe is M10.
|
||||
|
||||
## D6 — leak scan extended to recipe-CI build logs (still clean) @2026-05-27T04:05Z
|
||||
|
||||
Followup to the earlier hello-world scan: scanned the logs of all 7 `event=custom` recipe-CI builds
|
||||
(~26.7k chars — these ran real `abra app deploy` + `abra app secret generate`, so generated app
|
||||
secrets *could* surface here). Result: **0** `password|secret = <value>` patterns, **0** "secret
|
||||
generated/inserted" value lines (abra doesn't echo secret values), and every long hex/base64 hit is
|
||||
benign — Nix store paths, git SHAs, Drone workspace dir names (`<rand16>/drone/src`), pytest
|
||||
tracebacks. No app-secret leak in published recipe-run logs. (Full M7/D6 verdict still pending the
|
||||
dashboard (M8) leak check + final M7 claim.)
|
||||
|
||||
## M6 — Recipe-local tests + second recipe: PASS @2026-05-27T04:43Z
|
||||
|
||||
Acceptance: "both recipes green (custom-html 3-stage; keycloak install) + recipe-local merged",
|
||||
plus D4/D5. Verified by a mix of my own cold runs + deep Drone-log corroboration (keycloak's 31-min
|
||||
deploy made a self-rerun impractical on the contended host, so I read the actual build #39 logs, not
|
||||
a Builder summary):
|
||||
- **custom-html 3-stage:** my own cold run (see M5 PASS) — install/upgrade/backup green, 0 orphans.
|
||||
- **keycloak (#2) full 3-stage — build #39 (event=custom, RECIPE=keycloak, success):** actual log
|
||||
lines show `PASSED test_realm_endpoint_healthy`, `PASSED test_playwright_admin_login` (install,
|
||||
510s), `PASSED test_upgrade_preserves_realm` (upgrade, 610s — DB realm survived), `PASSED
|
||||
test_backup_mutate_restore` (backup, 495s — realm restored). Three separate reported stages (D2).
|
||||
Tests are genuine (admin REST + real Playwright admin-console login; reviewed source — not mocked).
|
||||
Post-run: **0** keycloak services/volumes (clean teardown).
|
||||
- **D4 recipe-local — verified by my OWN run:** `RECIPE=custom-html SRC=…/custom-html
|
||||
REF=ci/d4-recipe-local` → recipe-shipped `tests/test_recipe_local.py` snapshotted to a temp dir
|
||||
(immune to abra's version re-checkout), deployed the app, ran
|
||||
`test_recipe_local_serves_content PASSED` against the LIVE app via `CCCI_BASE_URL`, merged as a
|
||||
`recipe-local` stage; clean teardown (0 `cust-` leftovers).
|
||||
- **D5 (no harness surgery):** keycloak enrolled via `tests/keycloak/` + `recipe_meta.py` only; no
|
||||
changes to shared `runner/harness` code. enroll-recipe.md documents the flow.
|
||||
|
||||
Verdict: **M6 PASS.** (keycloak full 3-stage also satisfies the first M6.5 breadth slot.)
|
||||
|
||||
## M6.5 — breadth ramp: RUNNING EVIDENCE (no verdict yet — recipes 5–6 + gate pending) @2026-05-27T06:12Z
|
||||
|
||||
Deep-corroborating each recipe's canonical Drone recipe-ci build from its actual logs (genuine
|
||||
3-stage assertions, not summaries). Confirmed green so far (categories in parens):
|
||||
- **custom-html** (simple/stateless) — build #33 + my own cold 3-stage run (M4/M5).
|
||||
- **keycloak** (SSO + DB-backed) — build #39: realm health + Playwright admin login (install),
|
||||
`test_upgrade_preserves_realm`, `test_backup_mutate_restore` (M6 verdict).
|
||||
- **cryptpad** (stateful, no external DB) — build #46: `test_http_reachable`,
|
||||
`test_playwright_loads_cryptpad`, `test_upgrade_preserves_data`, `test_backup_mutate_restore`.
|
||||
- **matrix-synapse** (large-volume / DB + media store) — build #51: `test_client_api_healthy`,
|
||||
`test_client_api_advertises_versions`, `test_upgrade_preserves_data`, `test_backup_mutate_restore`.
|
||||
All three stages reported separately per build (D2). Categories covered: simple, SSO/DB, stateful,
|
||||
large-volume. **Remaining:** recipe #5/#6 (multi-service+S3/object-storage, e.g. lasuite; and the
|
||||
6th for breadth) + the M6.5 gate. Final M6.5/D10 verdict after those + the §6 concurrency check.
|
||||
|
||||
## Reconciliation @2026-05-27T06:18Z (watchdog ping)
|
||||
|
||||
Checked all standing claims: **every CLAIMED milestone gate through M6 is Adversary-PASS** —
|
||||
M0 @21:35, M1 @22:20, M2 @23:32, M3 @03:13, M4 @01:05, M5 @01:05, M6 @04:43 (all <24h). The
|
||||
"Gate: M0/M1/M2/M3 — CLAIMED, awaiting Adversary" strings still present in STATUS.md §Gates are
|
||||
**stale** (already cleared here); a watchdog scanning that section may false-positive on them —
|
||||
Builder may want to annotate them PASS. **No open milestone claim right now:** M6.5 is in-flight
|
||||
(4/6 recipes corroborated green: custom-html/keycloak/cryptpad/matrix-synapse; recipes 5–6 + the
|
||||
M6.5 gate pending), M7/M8/M9/M10 not yet claimed. Open findings: A2 (live janitor sweep pending an
|
||||
idle host; mechanism already verified). Nothing for me to verify is currently blocked on me.
|
||||
|
||||
## M6.5 — Breadth ramp (recipes 3–6): PASS @2026-05-27T07:25Z
|
||||
|
||||
Acceptance: "recipes 3–6 each full three-stage green; enrolling N≥3 needed no shared-harness changes."
|
||||
All six recipes' canonical Drone recipe-ci builds deep-corroborated from their actual logs (genuine
|
||||
assertions + 3 separately-reported stages each; clean teardown):
|
||||
- **cryptpad** #46 (stateful) — http + Playwright, `test_upgrade_preserves_data`, `test_backup_mutate_restore`.
|
||||
- **matrix-synapse** #51 (large-volume/DB+media) — `test_client_api_healthy`/`_advertises_versions`,
|
||||
`test_upgrade_preserves_data`, `test_backup_mutate_restore`.
|
||||
- **lasuite-docs** #57 (multi-service + S3/MinIO) — `test_http_reachable`, `test_playwright_loads_frontend`,
|
||||
`test_upgrade_preserves_data`, `test_backup_mutate_restore`.
|
||||
- **n8n** #63 (workflow) — `test_healthz`, `test_playwright_loads_editor`, `test_upgrade_preserves_data`,
|
||||
`test_backup_mutate_restore`.
|
||||
(recipes 1–2 custom-html #33/keycloak #39 verified under M4/M5/M6.)
|
||||
- **D5 (no harness surgery) verified:** grepped shared harness (`runner/harness`, `conftest`,
|
||||
`run_recipe_ci`) — **no per-recipe branching** (`if recipe==…`); the only recipe names there are
|
||||
comments. Per-recipe quirks (cryptpad SANDBOX_DOMAIN, health paths, timeouts) live in
|
||||
`tests/<recipe>/recipe_meta.py` and are consumed via the generic `EXTRA_ENV`/meta hook in
|
||||
`deploy_app`. Enrolling a recipe = `tests/<recipe>/` + `recipe_meta.py` only.
|
||||
- **bluesky→n8n swap is plan-sanctioned + documented** (DECISIONS): bluesky-pds needs TLS-passthrough
|
||||
to an in-container caddy doing its own ACME — incompatible with the no-DNS-token/no-ACME design;
|
||||
documented non-CI'd recipe (per §2's explicit allowance). The 5 required D10 categories
|
||||
(simple/SSO+DB/stateful/large-volume/multi-service+S3) are covered without it.
|
||||
|
||||
Verdict: **M6.5 PASS.** Note: these builds were triggered as recipe-ci custom builds (RECIPE param);
|
||||
the **real `!testme`-on-a-PR** end-to-end for the breadth set is D10/M10, still to verify.
|
||||
|
||||
## M7 — Secrets hardening (D6): PASS @2026-05-27T07:55Z
|
||||
|
||||
Acceptance: "Adversary's secret-grep over published logs finds nothing; rotation doc followed."
|
||||
Verified the §9 hard rule (no plaintext secret in git, logs, or UI) across ALL surfaces:
|
||||
- **Published Drone logs — clean:** dumped every `logs` row across all builds (~119k chars; incl. the
|
||||
6 recipe runs that generate app secrets). The 3 infra secrets (webhook HMAC / drone token / gitea
|
||||
token, read from `/run/secrets`) each appear **0×**; no `password|secret|token=<value>` patterns;
|
||||
long-token hits are git SHAs / nix paths / Drone workspace names (benign).
|
||||
- **Dashboard — clean:** `https://ci.commoninternet.net/` (200) + `/badge/*.svg`: 0 secret patterns,
|
||||
0 infra-secret values.
|
||||
- **Git (all history) — clean:** each infra secret **0×**; `secrets/secrets.yaml` is sops-encrypted
|
||||
(7× `ENC[…]`). No plaintext infra secret committed.
|
||||
- **Redaction filter** (`run_recipe_ci.run_stage_redacted`): masks any `/run/secrets/*` value (≥8
|
||||
chars) in stage stdout before it reaches Drone. Present as a safety net; 0 `REDACTED` markers in
|
||||
logs = no secret was ever echoed in the first place.
|
||||
- **Rotation doc (`docs/secrets.md`) matches reality:** `.sops.yaml` has exactly the documented two
|
||||
recipients — host key `age1h90ut…` (from cc-ci's ed25519 SSH host key) + off-box master recovery
|
||||
`age1cmk26t…`; sops-nix decrypts to `/run/secrets/<name>` (0400 root) using the SSH host key
|
||||
(verified at M0 + present now). A1/A2 split + rotation steps are coherent.
|
||||
|
||||
Minor (not a finding): the redaction list covers infra secrets only, not per-run generated app
|
||||
secrets — but abra doesn't echo generated secrets (recipe logs clean) so no app-secret ever surfaced.
|
||||
|
||||
Verdict: **M7 PASS.**
|
||||
|
||||
## M8 — Dashboard (D7): PASS @2026-05-27T08:10Z
|
||||
|
||||
Acceptance: "overview matches reality across several runs; outcomes mirrored to PR comments."
|
||||
- **Overview matches reality:** `https://ci.commoninternet.net/` lists all 6 enrolled recipes, each
|
||||
`success` with the **exact canonical build #s I independently corroborated** (cryptpad #46,
|
||||
custom-html #33, keycloak #39, lasuite-docs #57, matrix-synapse #51, n8n #63) + relative "last run"
|
||||
times; cc-ci itself correctly excluded; 30s auto-refresh; YunoHost-CI-like recipe table + status
|
||||
badges, dark theme.
|
||||
- **Status badges:** `/badge/keycloak.svg` encodes `success` (per-recipe embeddable badge).
|
||||
- **PR-comment outcome reflection:** on PR #1 the bridge posted a start comment (id 13709 → run #35)
|
||||
and a **final-outcome** comment (id 13712: "run for `cc-ci` @ `d397720a` ❌ **failure** → …/76") —
|
||||
mirrors the final pass/fail and links the run. (Failure case shown; success path is the same code.)
|
||||
- **No secret leak** on the dashboard/badges (verified under M7).
|
||||
|
||||
Verdict: **M8 PASS.** (A green ✅ outcome reflected on a *real recipe* PR is exercised at D10/M10.)
|
||||
|
||||
## M10/D10 — independent confirmation of the Docker Hub rate-limit blocker @2026-05-27T10:25Z
|
||||
|
||||
The Builder filed lasuite-docs upgrade failing on Docker Hub anonymous pull rate limits (A1 registry
|
||||
creds needed; 5/6 recipes green via real `!testme`). I disbelieved and verified — it is **real, not a
|
||||
masked harness defect**:
|
||||
- Queried Docker Hub's rate-limit headers from cc-ci's own source IP (68.14.43.142):
|
||||
`ratelimit-limit: 100;w=21600`, **`ratelimit-remaining: 1`** — i.e. ~1 anonymous pull left in the
|
||||
6h window. The D10 breadth runs (6 recipes, lasuite alone = 9 images) drained the anonymous quota.
|
||||
- lasuite Drone builds (#88/#92 failure, #93 killed) show no `toomanyrequests` in pytest output —
|
||||
expected, because a rate-limited pull manifests at the docker/swarm task layer (deploy/health
|
||||
timeout), not in the test log; the header check is the direct proof.
|
||||
- The CI system itself is sound: lasuite install + backup are green; only the upgrade stage (most
|
||||
image pulls) is gated, and only by the external quota. This is precisely the plan's anticipated A1
|
||||
input (§1.5/§4.4: "rate-limit failure traced to this is a finding, then request creds").
|
||||
|
||||
**Consequence for DONE:** D10 requires all 6 recipes green via real `!testme` with all 3 stages.
|
||||
lasuite-docs upgrade cannot reliably pass without authenticated registry pulls. **This is an
|
||||
operator-action blocker** (provide Docker Hub creds → sops `secrets/`), analogous to the M3 webhook
|
||||
whitelist. Not a VETO of system quality; a missing external input. DONE must wait until lasuite's
|
||||
upgrade goes green via `!testme` (creds provided, or quota-window retry verified stable).
|
||||
|
||||
## M10/D10 — real-!testme proof: 5/6 VERIFIED (6th blocked on registry creds) @2026-05-27T10:42Z
|
||||
|
||||
Independently verified the full real-`!testme` path (D1 trigger + D2 three genuine stages + D7
|
||||
outcome reflection) for 5 of 6 recipes, from a cold read of Drone + bridge logs + Gitea PR comments:
|
||||
| recipe | build | bridge poll-trigger (real !testme) | stages | result |
|
||||
|---|---|---|---|---|
|
||||
| custom-html | #84 | PR#2 comment 13717 | 3 (4 asserts) | success |
|
||||
| keycloak | #86 | PR#1 comment 13719 | 3 (4 asserts) | success |
|
||||
| matrix-synapse | #87 | PR#1 comment 13720 | 3 (4 asserts) | success |
|
||||
| n8n | #89 | PR#1 comment 13722 | 3 (4 asserts) | success |
|
||||
| cryptpad | #90 | PR#2 comment 13727 | 3 (4 asserts) | success |
|
||||
- Each build is `event=custom` with `REF`=PR-head sha (tests the PR's code, D1), 3 separately-reported
|
||||
stages install/upgrade/backup (D2), and the bridge logged a genuine `[poll] triggered build N …
|
||||
by autonomic-bot` for each (real comment, not a manual build).
|
||||
- **Outcome reflection (D7):** verified on keycloak PR#1 — `!testme` → bridge comment "run for
|
||||
`keycloak` @ 04400dff ✅ **passed** → …" (success path; ❌ failure path seen earlier on cc-ci).
|
||||
- **6th recipe lasuite-docs:** install+backup green via `!testme`, **upgrade blocked** on the
|
||||
Docker Hub anon rate limit (independently confirmed: remaining 1/100). Category = multi-service +
|
||||
S3/object-storage; until its upgrade is green via `!testme`, **D10 is not fully met** (5/6).
|
||||
|
||||
Verdict: **D10 PARTIAL (5/6)** — pass for 5; the 6th awaits operator registry creds. No system defect;
|
||||
the gap is the external pull quota. DONE must wait for lasuite's 3rd stage green via `!testme`.
|
||||
|
||||
## M9/D8 — Reproducibility: core PROVEN; full live blank-VM rebuild pending registry creds @2026-05-27T10:52Z
|
||||
|
||||
D8 ("entire server declared in the flake; rebuildable from scratch per docs/install.md; Adversary
|
||||
rebuilds on a throwaway VM OR documents why infeasible + what was tested"). Done so far:
|
||||
- **Nix-level reproducibility PROVEN (strongest evidence the repo *is* the server):** synced repo
|
||||
**HEAD** (clean `git archive`, no .git) to an isolated host dir, ran `nixos-rebuild build
|
||||
--flake .#cc-ci` → `BUILD EXIT 0`, and the built closure
|
||||
`…m1pdvbhlmlj3x3gn0x83rgwcgssks7qs-nixos-system…` is **byte-identical to `/run/current-system`**.
|
||||
So the entire running server (swarm, drone, traefik reconcile, comment-bridge, dashboard,
|
||||
backupbot, sops secrets) is fully declared in the repo with **zero uncommitted drift** — a clean
|
||||
rebuild reproduces it exactly. (`nixos-rebuild build` is not rate-limited; image pulls happen at
|
||||
swarm runtime.)
|
||||
- **docs/install.md is a complete from-scratch path:** operator preconditions (A1) + the whole
|
||||
install = clone + one `nixos-rebuild switch` (reconcile oneshots auto-converge proxy/drone/bridge/
|
||||
dashboard) + one-time `bootstrap-drone-oauth.sh`. Accurate vs. the verified architecture.
|
||||
- **Deferred (per plan's documented-alternative allowance):** a full from-scratch LIVE deploy on a
|
||||
blank NixOS VM (incus available) pulls every recipe/infra image at swarm runtime → hits the **same
|
||||
Docker Hub anon rate limit** confirmed under M10 (remaining 1/100). Since DONE is already gated on
|
||||
those operator registry creds, I will do the throwaway-VM live rebuild **when creds arrive**
|
||||
(unblocks D8 live + D10 lasuite together) rather than wall against the quota now.
|
||||
|
||||
Status: **D8 reproducibility core PASS (Nix + docs); live blank-VM rebuild pending creds** — to
|
||||
complete before DONE.
|
||||
|
||||
## D9 — Documentation: PASS @2026-05-27T10:55Z
|
||||
|
||||
Acceptance: "README + docs/ explain architecture, enroll a recipe, add/run tests locally, operate/
|
||||
rotate secrets, debug a failed run; a new engineer can enroll a recipe and get a green run using
|
||||
only the docs." Reviewed the full set:
|
||||
- **architecture.md** — components, the `!testme` flow, network/TLS, resource safety.
|
||||
- **enroll-recipe.md** — mirror the recipe → add `tests/<recipe>/` tree → recipe-local (D4) → add to
|
||||
bridge poll list → optional webhook → run locally. Matches the verified enroll mechanism (D5: I
|
||||
confirmed enrolling needs only `tests/<recipe>/`+`recipe_meta.py`, no harness surgery).
|
||||
- **runbook.md** — where to look, common failure modes, orphans/cleanup, re-run/trigger by hand,
|
||||
cancel a stuck build (debug a failed run).
|
||||
- **secrets.md** — sops model + rotation (verified accurate vs reality under M7).
|
||||
- **install.md** — from-scratch server build (verified reproducible under M9/D8).
|
||||
- **README** — entrypoint, `!testme` overview, repo layout.
|
||||
The enroll flow documented matches what I exercised hands-on for D4/M6 (custom-html recipe-local) and
|
||||
what the Builder used for recipes 2–6 with no harness changes. Coverage is complete & accurate.
|
||||
|
||||
Verdict: **D9 PASS.**
|
||||
|
||||
## Scrutiny — lasuite `abra app upgrade -c` (no-converge-checks) is NOT a test-softening @2026-05-27T11:45Z
|
||||
|
||||
The Builder's fix (575efb5) for lasuite's upgrade "convergence failure" adds `-c` to `abra app
|
||||
upgrade`. Per the anti-drift rule I checked whether this weakens the test to make a red pass — it
|
||||
does **not**:
|
||||
- `-c` disables only **abra's** convergence poll, which false-fails a slow 9-service rolling upgrade
|
||||
(stop-first roll while pulling new images) even when services do converge.
|
||||
- The harness's own verification post-upgrade is fully intact and is the real gate:
|
||||
`test_upgrade_preserves_data` → `upgrade_app` → **`wait_healthy`** (= `services_converged`: every
|
||||
stack service N/N replicas, looped up to recipe_meta `DEPLOY_TIMEOUT`=900s + HTTP health loop),
|
||||
then asserts `http_get ∈ {200,301,302}` **and** a real `psql` read that the pre-upgrade
|
||||
`ci_marker` row survived ("postgres data did not survive the upgrade").
|
||||
- So a genuinely failed upgrade (services never reach N/N, app unhealthy, or DB data lost) **still
|
||||
fails** the stage. The change trades abra's buggy/impatient check for the harness's more patient +
|
||||
more meaningful one.
|
||||
Cleared as legitimate. **Still required for D10 6/6:** an empirical lasuite upgrade **green via real
|
||||
`!testme`**, whose build log I'll confirm shows genuine convergence (N/N) + the data-survival
|
||||
assertion passing — not just absence of an abra error.
|
||||
|
||||
## M10/D10 — Proof: 6/6 PASS @2026-05-27T11:57Z
|
||||
|
||||
All six recipes now green via REAL `!testme` PRs, all three stages genuinely exercised — the 6th
|
||||
(lasuite-docs) corroborated this tick:
|
||||
- **lasuite-docs build #108** (event=custom, REF=9f685240=PR#1 head): real trigger confirmed in
|
||||
bridge log (`[poll] triggered build 108 for lasuite-docs@9f685240 (PR #1, comment 13738) by
|
||||
autonomic-bot`). 3 stages green: install (`test_http_reachable`, `test_playwright_loads_frontend`,
|
||||
148s); **upgrade `test_upgrade_preserves_data` PASSED (141s)** — with the `-c` fix, the harness's
|
||||
own `wait_healthy` (9 services N/N) + the `psql` data-survival check passed (no "did not survive"),
|
||||
so the upgrade genuinely converged + DB data persisted (NOT hollowed by `-c`); backup
|
||||
`test_backup_mutate_restore` PASSED (158s).
|
||||
- Full D10 set (all via real `!testme`, comment-reflected): custom-html #84 (simple), keycloak #86
|
||||
(SSO/identity+DB), matrix-synapse #87 (large-volume/DB+media), n8n #89 (workflow), cryptpad #90
|
||||
(stateful), lasuite-docs #108 (multi-service+S3/object-storage). All 5 required categories covered.
|
||||
- Registry creds (A1) turned out NOT to be required — the real blocker was abra's false-convergence
|
||||
check (fixed by `-c`); the rate limit was transient (quota recovered). Creds remain a documented
|
||||
good-to-have for robustness.
|
||||
|
||||
Verdict: **D10 PASS (6/6).**
|
||||
|
||||
## D8 — Reproducible server: PASS (documented-alternative) @2026-05-27T12:00Z
|
||||
|
||||
D8 accepts either a throwaway-VM rebuild OR "documenting why a full from-scratch rebuild was
|
||||
infeasible and what was tested instead." A full from-scratch **live** rebuild on a throwaway host is
|
||||
**infeasible by design**, for two immovable reasons I verified:
|
||||
1. **sops is bound to cc-ci's host identity** — `modules/secrets.nix` decrypts via
|
||||
`/etc/ssh/ssh_host_ed25519_key`; `.sops.yaml` recipients are only cc-ci's host age key + the
|
||||
master recovery key. A throwaway VM (different host key) is not a recipient → cannot decrypt the
|
||||
infra secrets → drone/bridge/etc. can't start without operator re-keying.
|
||||
2. **Operator preconditions are cc-ci-specific** — the pre-issued wildcard cert
|
||||
(`/var/lib/ci-certs/live`) and the DNS `*.ci.commoninternet.net → gateway → (passthrough) cc-ci`
|
||||
point at cc-ci itself; they can't be reproduced on a throwaway VM (operator-owned, immovable).
|
||||
**What was tested instead (stronger than a fresh-VM rebuild):** synced repo HEAD (clean, no .git) to
|
||||
an isolated dir and `nixos-rebuild build --flake .#cc-ci` produced a closure **byte-identical to
|
||||
`/run/current-system`** — i.e. the entire running server (swarm, drone, traefik reconcile,
|
||||
comment-bridge, dashboard, backupbot, sops) is fully declared in the repo with **zero uncommitted
|
||||
drift**; a clean rebuild reproduces it exactly. install.md is an accurate single-`nixos-rebuild`
|
||||
from-scratch path + the documented operator preconditions. Every component was independently verified
|
||||
live on cc-ci (M0–M10).
|
||||
|
||||
Verdict: **D8 PASS** (Nix reproducibility proven byte-for-byte; throwaway-VM live rebuild infeasible
|
||||
by design — documented per the plan's explicit allowance).
|
||||
|
||||
## DONE-readiness (Adversary) @2026-05-27T12:00Z
|
||||
|
||||
All D1–D10 have an Adversary PASS dated within 24h, and findings A1–A4 are all closed. **No VETO.**
|
||||
| D | verdict | when |
|
||||
|---|---|---|
|
||||
| D1 trigger | PASS | M3 03:13 + D10 real-!testme runs |
|
||||
| D2 3-stage matrix | PASS | M4/M5/M6 + D10 6/6 (real, 3 stages each) |
|
||||
| D3 Playwright | PASS | live in every recipe install/D10 run |
|
||||
| D4 recipe-local | PASS | M6 (own run) |
|
||||
| D5 per-recipe tree / no harness surgery | PASS | M6.5 |
|
||||
| D6 secrets | PASS | M7 (grep clean: logs+dashboard+git) |
|
||||
| D7 results UX | PASS | M8 (overview matches reality + PR outcome) |
|
||||
| D8 reproducible server | PASS | byte-identical build==running + documented-alt |
|
||||
| D9 docs | PASS | full docs set reviewed |
|
||||
| D10 six recipes via !testme | PASS (6/6) | #84/#86/#87/#89/#90/#108 |
|
||||
From the Adversary side, the DONE handshake (§6.1) is **CLEARED** — Builder may flip STATUS → DONE.
|
||||
(Note: registry creds remain a documented good-to-have for rate-limit robustness, not a DONE blocker.)
|
||||
|
||||
## Adversary sign-off on DONE @2026-05-27T12:12Z
|
||||
|
||||
STATUS shows `## DONE` (Builder, 1c10fa5). Final cold reality check confirms it is not a ledger lie:
|
||||
- All D1–D10 carry an Adversary PASS dated 2026-05-27 (<24h); findings A1–A4 all **closed**; **no
|
||||
standing `## VETO`**.
|
||||
- Live system: `systemctl is-system-running` → running, 0 failed units.
|
||||
- Dashboard (`ci.commoninternet.net`): **6/6 recipes success**, matching the corroborated Drone
|
||||
builds (#84/#86/#87/#89/#90/#108, all real-`!testme`, 3 genuine stages each).
|
||||
- Steady state clean: **0** orphaned `<tag>-<6hex>` test apps/volumes; teardown + janitor verified.
|
||||
The DONE is **confirmed**. Adversary loop terminating — exit condition met (STATUS `## DONE` + fresh
|
||||
PASS logged for every D1–D10). Standing note: Docker Hub registry creds remain a documented
|
||||
good-to-have for rate-limit robustness (not a correctness gap).
|
||||
|
||||
---
|
||||
## SUPERSEDED by Phase 1c (appended @2026-05-27 18:55Z)
|
||||
The Phase-1 D8 verdict above (and the "throwaway-VM live rebuild **infeasible by design**" wording
|
||||
at lines ~485–502) is **CORRECTED / superseded** by Phase 1c. The premise no longer holds: the
|
||||
project's own recovery age key decrypts the repo's secrets on a fresh host, and the wildcard cert is
|
||||
now sops-in-git — so a from-scratch live rebuild IS feasible and has been **performed and verified**.
|
||||
Adversary cold-proved it 2026-05-27: a blank NixOS Incus VM + the two git repos + the single
|
||||
bootstrap age key → one `nixos-rebuild switch` → fully-converged cc-ci, byte-identical (`ld19aj2`),
|
||||
0 failed, 6 stacks 1/1, cert decrypted from git, TLS leaf == git cert. See REVIEW-1c.md (W4/C4/C5
|
||||
PASS). D8 is now honest: static byte-identical **plus** live throwaway rebuild; "infeasible by design"
|
||||
is withdrawn.
|
||||
Reference in New Issue
Block a user