# JOURNAL — Phase 1d (append-only)
## 2026-05-27 — Bootstrap Phase 1d
Read SSOT `plan-phase1d-generic-test-suite.md` + plan.md §6.1/§7/§9. Studied the post-1b codebase:
`runner/run_recipe_ci.py` (per-stage pytest, currently deploy-per-stage), `tests/conftest.py`
(fixtures `deployed_app`/`deployed`/`old_app` each deploy+teardown), `runner/harness/{lifecycle,abra,naming}.py`,
and existing recipe tests (custom-html/keycloak/etc.).

Access re-verified (bootstrap, new phase):
```
$ ssh cc-ci 'hostname && whoami && nixos-version'
nixos / root / 24.11.20250630.50ab793 (Vicuna)
$ ssh cc-ci 'abra --version' -> abra version 0.13.0-beta-06a57de
$ ssh cc-ci 'docker stack ls' -> traefik, drone, ccci-bridge, ccci-dashboard, backups all up
$ ssh cc-ci 'grep -ri backupbot ~/.abra/recipes/custom-html/'
compose.yml: backupbot.backup=true ; backupbot.backup.path=/usr/share/nginx/html
$ curl -u bot ... /repos/recipe-maintainers/custom-html-tiny -> 200 (mirrored)
```
So: backup-capability is detectable by scanning compose for `backupbot.backup`; custom-html-tiny is
mirrored and has NO cc-ci tests dir → it's the DG1 pure-generic target.

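A minimal sketch of the compose scan described above, assuming the check only needs the text of `compose.yml`. The regex and its strictness (matching the enabling label but not sub-keys such as `backupbot.backup.path`) are assumptions for illustration; the real `backup_capable` in `runner/harness/generic.py` may simply look for the substring.

```python
import re

# Assumption: a recipe counts as backup-capable only if the enabling label is
# set to true, in either `key=value` or `key: "value"` form. Sub-keys such as
# `backupbot.backup.path=...` do not match because "." is not "=" or ":".
_BACKUP_LABEL = re.compile(
    r"""backupbot\.backup["']?\s*[=:]\s*["']?true\b""",
    re.IGNORECASE,
)


def backup_capable(compose_text: str) -> bool:
    """Return True if the compose text carries an enabling backupbot label."""
    return _BACKUP_LABEL.search(compose_text) is not None
```

Called on the label seen in the custom-html recipe (`backupbot.backup=true`), this returns `True`; a compose file that only sets `backupbot.backup.path` returns `False`.
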
**Design recorded in DECISIONS.md (Phase 1d section).** Key calls: tier model with the lifecycle OP
owned by the shared harness (test files = assertions only); OVERRIDE precedence repo-local > cc-ci >
generic with extend-by-composition; deploy-ONCE with a deploy-count guard; base version = previous
(when upgrade runs) else target; backup-capability auto-detect; install-steps shell hook.

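The override precedence above (repo-local > cc-ci > generic) can be sketched as a first-match lookup. The function name, the three directory arguments, and the `test_<op>.py` naming are assumptions for illustration, not the actual `discovery.py` API.

```python
from pathlib import Path
from typing import Optional


def resolve_overlay(op: str, repo_local: Path, cc_ci: Path, generic: Path) -> Optional[Path]:
    """Return the highest-precedence test file for one lifecycle op.

    Precedence is repo-local, then cc-ci, then the generic fallback.
    Returns None when no location provides a file for this op.
    """
    filename = f"test_{op}.py"  # assumed naming, mirrors tests/_generic/test_install.py
    for root in (repo_local, cc_ci, generic):
        candidate = root / filename
        if candidate.is_file():
            return candidate
    return None
```

A recipe with no repo-local or cc-ci tests falls through to the generic file, which is the pure-generic case DG1 exercises.
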
Seeded STATUS-1d / BACKLOG-1d / JOURNAL-1d. Next: implement G0 (generic.py + discovery.py +
tests/_generic/ + deploy-once orchestrator), then verify generic install green on custom-html-tiny.
## 2026-05-27 — G0 generic install + deploy-once orchestrator: DG1 GREEN
Built the G0 machinery and proved DG1 end-to-end on the real server:
- `runner/harness/generic.py`: `assert_serving` (services converged + real HTTP in HEALTH_OK [excludes
  404] + not Traefik's 404 body + **CA-verified TLS cert is the trusted wildcard**), op helpers
  (`do_upgrade`/`do_backup`/`do_restore`), `backup_capable` (scan compose for `backupbot.backup`).
- `runner/harness/discovery.py`: per-op overlay resolution (repo-local > cc-ci > generic), custom
  test discovery (both locations, additive), install-steps hook discovery.
- `tests/_generic/test_{install,upgrade,backup,restore}.py`: assertion-only tiers using `live_app`.
- `runner/run_recipe_ci.py`: deploy-ONCE orchestrator: base version (prev if upgrade+exists else
  target), tiers run against the shared deployment, one teardown in finally, deploy-count guard +
  per-op summary.
- `tests/conftest.py`: `live_app` fixture (reads CCCI_APP_DOMAIN; tiers never deploy).
- `lifecycle.deploy_app`: deploy-count recorder + install-steps hook + **pin DOMAIN to the run
  domain** (fixes recipes whose .env.sample uses `{{ .Domain }}`, which this abra leaves unexpanded).

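The deploy-once flow in the list above can be sketched as follows. This is an illustration under stated assumptions, not the code in `runner/run_recipe_ci.py`: `DeployCounter`, `pick_base_version`, and `run_deploy_once` are hypothetical names, and tiers are modelled as plain callables returning pass/fail.

```python
from typing import Callable, Dict, List, Optional


class DeployCounter:
    """Shared recorder: every deploy, wherever it is triggered, bumps the count."""

    def __init__(self) -> None:
        self.count = 0

    def record(self) -> None:
        self.count += 1


def pick_base_version(stages: List[str], previous: Optional[str], target: str) -> str:
    """Deploy the previous version only if an upgrade tier runs and one exists."""
    if "upgrade" in stages and previous:
        return previous
    return target


def run_deploy_once(
    stages: List[str],
    deploy: Callable[[], None],
    teardown: Callable[[], None],
    tiers: Dict[str, Callable[[], bool]],
    counter: DeployCounter,
) -> Dict[str, str]:
    """Deploy once, run each tier against that deployment, tear down once."""
    results: Dict[str, str] = {}
    try:
        deploy()  # expected to call counter.record() internally
        for stage in stages:
            results[stage] = "pass" if tiers[stage]() else "fail"
    finally:
        teardown()  # single teardown, even if deploy or a tier raised
    # Guard: a tier that deployed on its own would push the count past 1.
    if counter.count != 1:
        raise AssertionError(f"deploy-count = {counter.count} (expect 1)")
    return results
```

Keeping the counter outside the orchestrator is what gives the guard teeth: it fails when any tier or fixture deploys behind the orchestrator's back.
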
**Two real generic bugs found+fixed via live runs (not "should work"):**
1. custom-html-tiny deploy failed: `DOMAIN={{ .Domain }}` not auto-filled by `abra app new -D` on
   0.13.0-beta → `can't evaluate field Domain`. Fix: `env_set(domain,"DOMAIN",domain)` in deploy_app.
2. `served_cert_subject` used `openssl s_client`, but **openssl is not on the host** (`cc-ci-run`
   runtimeInputs has no openssl) → it silently returned None → the "not default cert" check was a
   no-op (a DG7 can't-fail smell). Replaced with a pure-Python **CA-verified handshake** (`ssl`):
   a publicly-trusted LE wildcard verifies + matches hostname; Traefik's self-signed default fails
   verification → a genuine assertion. Verified the verify path on the host:
   `ssl.create_default_context()` against ci.commoninternet.net → VERIFIED, CN=*.ci.commoninternet.net,
   SAN=[*.ci.commoninternet.net, ci.commoninternet.net].

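A sketch of the pure-Python CA-verified handshake from item 2, using only the standard `ssl` and `socket` modules. The function names here are hypothetical and the real `generic.served_cert` may be shaped differently. Verification failures propagate as exceptions in this sketch, so a caller cannot mistake a failed check for a pass.

```python
import socket
import ssl
from typing import Optional


def verifying_context() -> ssl.SSLContext:
    """Default client context: verifies the chain against system CAs and the hostname."""
    return ssl.create_default_context()


def served_cert(host: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Handshake with full verification and return the decoded peer certificate.

    Raises ssl.SSLCertVerificationError for an untrusted, expired, or
    hostname-mismatched certificate, and OSError for connection failures.
    """
    ctx = verifying_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def subject_cn(cert: dict) -> Optional[str]:
    """Extract the subject commonName from a getpeercert() dict, if present."""
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None
```

Because `ssl.create_default_context()` sets `verify_mode` to `CERT_REQUIRED` and enables `check_hostname`, a self-signed default certificate fails the handshake instead of yielding an empty result.
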
**DG1 evidence (cc-ci, final code):** custom-html-tiny is a static-web-server with an empty content
volume → genuinely serves 404 zero-config (not a serving demo), so picked **hedgedoc** (simple
category, NO cc-ci/repo-local tests → pure generic; backup-capable bonus):
```
$ RECIPE=hedgedoc STAGES=install cc-ci-run runner/run_recipe_ci.py
===== TIER: install (generic: tests/_generic/test_install.py) =====
tests/_generic/test_install.py::test_serving PASSED
===== RUN SUMMARY ===== deploy-count = 1 (expect 1) install : pass
$ docker stack ls | grep hedg -> (none; clean teardown)
```
Lint+format clean (`ruff check`/`ruff format --check` via `nix develop .#lint`). Claiming the G0 gate.
## 2026-05-27 — G0/DG1 PASS; F1d-1 fixed; G1 backup+restore fixes
**Adversary verdict: DG1 PASS @2026-05-27** (cold, own clone @ef44d46). G0 cleared.

**Correcting an overstatement (Adversary finding F1d-1, valid):** my earlier G0 wording claimed the
CA-verified cert check distinguishes "the app vs a Traefik default-cert fallback." It does NOT:
Traefik's file provider serves the pre-issued **wildcard** for the WHOLE `*.ci.commoninternet.net`
zone, so ANY in-zone subdomain (even a non-deployed one) verifies; the self-signed default cert is
never served in-zone. The genuine app-vs-fallback proof is `services_converged` (the app's OWN
service replicas N/N) + a non-404 status in HEALTH_OK (Traefik's unmatched-router fallback = 404).
Fix applied (no code behavior change to the load-bearing checks; honesty/scope only):
- `generic.served_cert` + `assert_serving` docstrings/comments reframed: the cert check is an INFRA
  TLS sanity check (catches a lapsed/mis-rotated wildcard cert; plan §4.0 renewal), explicitly NOT
  an app-vs-fallback check. Kept because it CAN fail (cert expiry/untrust), unlike the old
  openssl-missing no-op it replaced.
- Assertion message reworded ("served wildcard cert is not trusted/valid", not "...not the default").

Noted for the Adversary to re-test + close F1d-1 (theirs to tick).

**G1: DG2 (upgrade) + DG3 (backup/restore) on hedgedoc (backup-capable, ≥2 tags 3.0.9→3.0.10):**
Two real bugs found+fixed via live runs:
1. *backup artifact check.* `abra app backup snapshots` needs a TTY (`FATA the input device is not a
   TTY`), but `abra app backup create` already emits the restic JSON summary with the produced
   `"snapshot_id"` (rc 0, "backup finished"). Verified raw on a live custom-html:
   `"snapshot_id": "d85bf492…"`. Fix: `backup_create` returns its output; `generic.parse_snapshot_id`
   regex-extracts the id; `do_backup` asserts it. (Dropped the TTY-bound `snapshots` listing.)
2. *restore serving race.* `assert_serving` made TWO requests (http_get then http_body); post-restore
   the app flapped between them → `http_body` raised an unhandled `HTTPError 404`. Fix: new
   `lifecycle.http_fetch` returns (status, body) in ONE request, never raising; `assert_serving` now
   BOUNDED-POLLS converged + serving (status+body from one request) so a post-op reconverge settles
   while a persistent failure still fails within HTTP_TIMEOUT (no bare sleep). `do_upgrade`/`do_restore`
   call it (dropped the redundant `wait_serving`).
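
The snapshot-id extraction from item 1 can be sketched as a single regex over the captured command output. The exact pattern used by `generic.parse_snapshot_id` is not shown in this journal, so the one below is an assumption, and the sample output in the usage note is made up for illustration.

```python
import re
from typing import Optional

# Assumption: the id appears as a JSON string field of lowercase hex digits.
_SNAPSHOT_ID = re.compile(r'"snapshot_id"\s*:\s*"([0-9a-f]+)"')


def parse_snapshot_id(output: str) -> Optional[str]:
    """Return the snapshot id found in backup output, or None if absent.

    Takes the last match, on the assumption that the summary line closes the output.
    """
    matches = _SNAPSHOT_ID.findall(output)
    return matches[-1] if matches else None
```

For a made-up output line such as `{"message_type":"summary","snapshot_id":"0123abcd"}` this returns `"0123abcd"`; `do_backup` can then assert the result is not `None`, which fails loudly when no snapshot was produced.
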

Re-running full hedgedoc install→upgrade→backup→restore to confirm all-green before claiming G1.
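
For reference, the restore-race fix (one request yielding status and body, wrapped in a bounded poll) might look like the sketch below. It is not the actual `lifecycle.http_fetch` or `assert_serving`: the `HEALTH_OK` membership, the `0` status used for "no response", and the injected `sleep`/`clock` parameters are all assumptions made for illustration and testability.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable, Tuple

# Assumed membership; the journal only states that HEALTH_OK excludes 404.
HEALTH_OK = frozenset({200, 301, 302, 401, 403})


def http_fetch(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """One request -> (status, body). Errors become a status, never an exception."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry a status and body
        return err.code, err.read().decode("utf-8", "replace")
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return 0, ""  # assumed sentinel for "no HTTP response at all"


def wait_until_serving(
    fetch: Callable[[], Tuple[int, str]],
    timeout_s: float,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, str]:
    """Poll until a single fetch reports a healthy status, or fail at the deadline."""
    deadline = clock() + timeout_s
    while True:
        status, body = fetch()  # status and body come from the same request
        if status in HEALTH_OK:
            return status, body
        if clock() >= deadline:
            raise AssertionError(f"not serving within {timeout_s}s (last status {status})")
        sleep(interval_s)
```

A typical call would be `wait_until_serving(lambda: http_fetch(url), timeout_s=...)`. A transient 404 during reconvergence is retried, while a persistent 404 still fails once the deadline passes.
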