fix(1d): G1 backup/restore + F1d-1 cert-check reframe

- backup artifact: read snapshot_id from 'abra app backup create' output ('abra app backup snapshots' needs a TTY); generic.parse_snapshot_id extracts it and do_backup asserts it
- restore serving race: lifecycle.http_fetch (one request -> status+body, never raises); assert_serving is now a bounded poll (settles a post-op reconverge, no bare sleep); drop wait_serving
- F1d-1 (Adversary, low): reframe served_cert/assert_serving honestly as an INFRA TLS sanity check (catches a lapsed/mis-rotated wildcard cert), NOT app-vs-fallback (Traefik serves the wildcard zone-wide); the genuine serving proof is services_converged + non-404 status. Awaiting re-test.

DG1 Adversary PASS @ef44d46. G1 full-lifecycle re-verification in flight.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -66,3 +66,36 @@ tests/_generic/test_install.py::test_serving PASSED
$ docker stack ls | grep hedg -> (none — clean teardown)
```
Lint+format clean (`ruff check`/`ruff format --check` via `nix develop .#lint`). Claiming the G0 gate.

## 2026-05-27: G0/DG1 PASS; F1d-1 fixed; G1 backup+restore fixes

**Adversary verdict: DG1 PASS @2026-05-27** (cold, own clone @ef44d46). G0 cleared.

**Correcting an overstatement (Adversary finding F1d-1, valid):** my earlier G0 wording claimed the
CA-verified cert check distinguishes "the app vs a Traefik default-cert fallback." It does NOT.
Traefik's file provider serves the pre-issued **wildcard** cert for the WHOLE `*.ci.commoninternet.net`
zone, so ANY in-zone subdomain verifies (even one with nothing deployed), and the self-signed default
cert is never served in-zone. The genuine app-vs-fallback proof is `services_converged` (the app's OWN
service replicas at N/N) plus a non-404 status in HEALTH_OK (Traefik's unmatched-router fallback is a 404).

Fix applied (no code behavior change to the load-bearing checks; honesty/scope only):

- `generic.served_cert` + `assert_serving` docstrings/comments reframed: the cert check is an INFRA
  TLS sanity check (catches a lapsed/mis-rotated wildcard cert; see plan §4.0, renewal), explicitly NOT
  an app-vs-fallback check. Kept because it CAN fail (cert expiry/untrust), unlike the old
  openssl-missing no-op it replaced.
- Assertion message reworded ("served wildcard cert is not trusted/valid", not "...not the default").

Noted for the Adversary to re-test and close F1d-1 (theirs to tick).

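A minimal Python sketch of the app-vs-fallback proof described above. It assumes replica counts come from `docker service ls --format '{{.Replicas}}'`, which prints fields like `1/1`. The contents of `HEALTH_OK` here are a hypothetical stand-in: the only property taken from the text is that 404 is excluded. `services_converged` and `really_serving` are illustrative names, not the suite's actual code. The TLS cert deliberately plays no part in the decision.

```python
import re
from typing import List

# Hypothetical stand-in for the suite's HEALTH_OK set (its real contents are
# not shown). The property the text relies on: 404 is NOT in it, because
# Traefik answers requests that match no router with a 404.
HEALTH_OK = frozenset({200, 301, 302, 303, 307, 308})

# `docker service ls --format '{{.Replicas}}'` yields fields such as "1/1" or
# "0/1", sometimes followed by a note like "(max 1 per node)".
_REPLICAS = re.compile(r"^\s*(\d+)/(\d+)")


def services_converged(replicas: List[str]) -> bool:
    """True only if the app has services and every one is running N/N (N >= 1)."""
    if not replicas:
        return False  # no services at all never counts as converged
    for field in replicas:
        match = _REPLICAS.match(field)
        if match is None:
            return False
        running, desired = int(match.group(1)), int(match.group(2))
        if desired == 0 or running != desired:
            return False
    return True


def really_serving(replicas: List[str], status: int) -> bool:
    """App-vs-fallback proof: the app's OWN replicas are up AND the response is healthy.

    The TLS cert is not consulted: the zone-wide wildcard would verify for
    any in-zone host, deployed or not.
    """
    return services_converged(replicas) and status != 404 and status in HEALTH_OK
```

For example, `really_serving(["1/1"], 404)` is False. A converged service that Traefik still answers with its unmatched-router 404 is not counted as serving.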
**G1: DG2 (upgrade) + DG3 (backup/restore) on hedgedoc (backup-capable, ≥2 tags: 3.0.9→3.0.10):**

Two real bugs found and fixed via live runs:

1. *Backup artifact check.* `abra app backup snapshots` needs a TTY (`FATA the input device is not a
   TTY`), but `abra app backup create` already emits the restic JSON summary, including the produced
   `"snapshot_id"` (rc 0, "backup finished"). Verified raw on a live custom-html:
   `"snapshot_id": "d85bf492…"`. Fix: `backup_create` returns its output; `generic.parse_snapshot_id`
   regex-extracts the id; `do_backup` asserts it. (Dropped the TTY-bound `snapshots` listing.)
2. *Restore serving race.* `assert_serving` made TWO requests (`http_get`, then `http_body`); post-restore
   the app flapped between them, so `http_body` raised an unhandled `HTTPError` 404. Fix: the new
   `lifecycle.http_fetch` returns `(status, body)` from ONE request and never raises; `assert_serving` now
   BOUNDED-POLLS converged + serving (status and body from that single request), so a post-op reconverge
   settles while a persistent failure still fails within HTTP_TIMEOUT (no bare sleep). `do_upgrade` and
   `do_restore` call it (dropped the redundant `wait_serving`).

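The backup-artifact fix (item 1) can be sketched as a regex over the `abra app backup create` output. This is an illustration under stated assumptions, not the suite's code. It assumes the restic summary carries the id as a lowercase-hex JSON string (`"snapshot_id": "…"`, with or without a space after the colon). It also assumes one backup run yields one summary. The `do_backup` shown here takes a hypothetical zero-argument `backup_create` callable that returns the command's output.

```python
import re
from typing import Callable, Optional

# Matches the JSON field restic prints in its backup summary, tolerating
# optional whitespace around the colon. Snapshot IDs are lowercase hex
# (8-char short form up to the full 64-char form).
_SNAPSHOT_ID = re.compile(r'"snapshot_id"\s*:\s*"([0-9a-f]{8,64})"')


def parse_snapshot_id(output: str) -> Optional[str]:
    """Return the snapshot_id from `abra app backup create` output, or None.

    Assumes a single backup run emits one summary, so the first match is taken.
    """
    match = _SNAPSHOT_ID.search(output)
    return match.group(1) if match else None


def do_backup(backup_create: Callable[[], str]) -> str:
    """Run a backup and assert it produced a snapshot; no TTY-bound listing needed."""
    output = backup_create()
    snapshot_id = parse_snapshot_id(output)
    assert snapshot_id, f"backup produced no snapshot_id; output was:\n{output}"
    return snapshot_id
```

A side benefit of reading the id from `create`'s own output is that the asserted artifact is tied to this run. A snapshot listing could instead match an older, pre-existing snapshot.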
Re-running the full hedgedoc install→upgrade→backup→restore lifecycle to confirm all-green before claiming G1.

@@ -12,8 +12,9 @@ every recipe gets a generic lifecycle suite for free; recipe-specific tests laye
per-recipe overlay authoring is Phase 2.

## Definition of Done (Phase 1d): DG1–DG8, each Adversary cold-verified in REVIEW-1d

- [ ] **DG1**: Generic INSTALL test (recipe-agnostic): app new→deploy→converged→really serving
- [x] **DG1**: Generic INSTALL test (recipe-agnostic): app new→deploy→converged→really serving
  (real HTTP(S), not Traefik fallback). Green on a simple recipe with no cc-ci/repo-local tests.
  **Adversary PASS @2026-05-27** (cold, hedgedoc, deploy-count=1, clean teardown).
- [ ] **DG2**: Generic UPGRADE: previous/pinned → upgrade to target; reconverge + still serving.
- [ ] **DG3**: Generic BACKUP+RESTORE for backup-capable recipes; clean N/A (skip) otherwise.
- [ ] **DG4**: Layering (override-or-extend; generic is the default); discovery + cc-ci/repo-local
@@ -34,20 +35,21 @@ per-recipe overlay authoring is Phase 2.
- **G4**: `!testme` e2e + per-op reporting + docs + cold verify. *Accept: DG6, DG7, DG8 → DONE.*

## In flight

**G1: generic upgrade + backup/restore (next).** G0 code is in place and DG1 is green; while the
Adversary verifies G0, I'll build/prove the generic upgrade tier (previous→target in place) and the
backup/restore tiers gated on backup-capability (hedgedoc & custom-html are both backup-capable).

**G1: generic upgrade + backup/restore.** Verifying the full generic lifecycle on hedgedoc
(install→upgrade→backup→restore). DG2 (upgrade) is already green. Fixed two real bugs: the backup
artifact is now read from the `snapshot_id` in `abra app backup create`'s output, since `snapshots`
needs a TTY; the restore serving race is fixed with a single-request `http_fetch` + bounded-poll
`assert_serving`. Re-running to confirm all-green, then claiming G1.

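A minimal sketch of the restore-race fix, using only the standard library. It rests on several assumptions. Plain `urllib` stands in for whatever HTTP client the suite uses. `converged` and `ok` are hypothetical callables, for example a replica check and a "status in HEALTH_OK, not 404" predicate. `deadline_s` plays the role of HTTP_TIMEOUT. The points it illustrates come from the text. Each probe is ONE request that yields status and body together. Errors come back as values instead of exceptions. The only sleep is the short gap between bounded attempts.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

Result = Tuple[Optional[int], str]


def http_fetch(url: str, timeout: float = 10.0) -> Result:
    """One GET -> (status, body). Never raises.

    HTTP error statuses (e.g. 404) still return their status and body;
    transport failures (DNS, refused, TLS, timeout) return (None, reason).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:  # must precede URLError (its parent)
        try:
            body = err.read().decode("utf-8", "replace")
        except Exception:  # body unreadable; the status is still meaningful
            body = ""
        return err.code, body
    except (urllib.error.URLError, http.client.HTTPException, OSError) as err:
        return None, str(err)


def assert_serving(
    url: str,
    converged: Callable[[], bool],
    ok: Callable[[Optional[int], str], bool],
    deadline_s: float,
    interval_s: float = 2.0,
    fetch: Callable[[str], Result] = http_fetch,
) -> None:
    """Bounded poll: pass once the stack is converged AND a single fetch is ok.

    A transient flap after upgrade/restore just costs another attempt; a
    persistent failure still fails once deadline_s has elapsed.
    """
    end = time.monotonic() + deadline_s
    last: Result = (None, "not converged")
    while True:
        if converged():
            last = fetch(url)  # status and body come from the same response
            if ok(*last):
                return
        remaining = end - time.monotonic()
        if remaining <= 0:
            status, body = last
            raise AssertionError(
                f"not serving after {deadline_s}s: last status={status}, body={body[:200]!r}"
            )
        time.sleep(min(interval_s, remaining))  # gap between attempts, never past the deadline
```

Injecting `fetch` keeps the poll testable without a live app. Because status and body come from one response, the old failure mode disappears: a 200 on the first request followed by a 404 on the second can no longer happen within a single probe.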
**F1d-1 (Adversary, low/DG7): FIXED in code, awaiting Adversary re-test+close.** The cert check is
reframed honestly as an INFRA TLS sanity check (catches a lapsed/mis-rotated wildcard cert), NOT an
app-vs-fallback check; the genuine serving proof is `services_converged` + non-404 status. See
JOURNAL-1d and the generic.py docstrings.

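The reframed cert check can be sketched with Python's `ssl` module. `served_cert` and `assert_infra_tls_ok` are illustrative names. `MIN_DAYS_LEFT` is a hypothetical renewal margin, since the real threshold from plan §4.0 is not given here. The default context requires a CA-trusted chain and a matching hostname. As a result, an untrusted, expired, or mis-rotated cert fails the handshake, and one that is about to lapse fails the expiry check. A pass means only that the zone's TLS is healthy, NOT that this app is the one answering.

```python
import socket
import ssl
import time
from typing import Optional

MIN_DAYS_LEFT = 7.0  # hypothetical renewal margin, in days


def days_until_expiry(cert: dict, now: Optional[float] = None) -> float:
    """Days until the cert's notAfter (negative once it has lapsed)."""
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    current = time.time() if now is None else now
    return (expires - current) / 86400


def served_cert(host: str, port: int = 443, timeout: float = 10.0) -> dict:
    """TLS handshake with CA and hostname verification; returns the peer cert.

    ssl.create_default_context() sets CERT_REQUIRED and check_hostname, so an
    untrusted, expired, or wrong-name cert raises SSLCertVerificationError here.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def assert_infra_tls_ok(host: str, min_days: float = MIN_DAYS_LEFT) -> None:
    """INFRA sanity check only: any in-zone host gets the wildcard, deployed or not."""
    try:
        cert = served_cert(host)
    except ssl.SSLCertVerificationError as err:
        raise AssertionError(f"served wildcard cert is not trusted/valid: {err}") from err
    left = days_until_expiry(cert)
    assert left >= min_days, f"served wildcard cert for {host} expires in {left:.1f} days"
```

Splitting out `days_until_expiry` keeps the date arithmetic testable offline, while the handshake itself needs a live host.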
## Gate

**Gate: G0 CLAIMED, awaiting Adversary (DG1).** Generic INSTALL tier is green on **hedgedoc**, a
simple recipe with NO cc-ci/repo-local tests (pure generic), asserting it ACTUALLY serves (services
converged + real HTTP in HEALTH_OK [404 excluded] + not Traefik's 404 body + a CA-verified trusted
wildcard cert, not the default), with **deploy-count = 1** (DG4.1 one-deploy) and clean teardown
(no residual stack). Evidence in JOURNAL-1d (commands + output). custom-html-tiny was rejected as the
demo recipe: it's a static-web-server with an empty content volume, so it genuinely returns 404 with zero config.

To reproduce (cold): on cc-ci, run
`cd /root/cc-ci && RECIPE=hedgedoc STAGES=install HOME=/root CCCI_JANITOR_MAX_AGE=0 cc-ci-run runner/run_recipe_ci.py`
→ install: pass, deploy-count=1.

**G0/DG1: Adversary PASS @2026-05-27.** Cleared past G0. Generic INSTALL green on hedgedoc (pure
generic, deploy-count=1, clean teardown). Next gate: G1 (DG2+DG3), claimed once the full hedgedoc
lifecycle is confirmed all-green.

Design (DECISIONS.md Phase 1d): tier model with the lifecycle OP owned by the shared harness (test
files = assertions only); override precedence repo-local > cc-ci > generic + extend-by-composition;