cc-ci/machine-docs/JOURNAL-1d.md
autonomic-bot ef44d4658b feat(1d): G0 — generic install + deploy-once orchestrator (DG1 green on hedgedoc)
- harness/generic.py: recipe-agnostic assert_serving (converged + real HTTP, 404-excluded +
  not Traefik 404 body + CA-verified trusted wildcard cert), op helpers, backup_capable detect
- harness/discovery.py: per-op overlay resolution (repo-local > cc-ci > generic), custom + hook
- tests/_generic/: assertion-only tiers (install/upgrade/backup/restore) on the shared deployment
- run_recipe_ci.py: deploy-ONCE orchestrator, per-op summary, deploy-count guard (DG4.1)
- conftest live_app fixture; lifecycle deploy-count + install-steps hook + pin DOMAIN to run domain

DG1 cold-verified green on hedgedoc (pure generic, deploy-count=1, clean teardown). G0 CLAIMED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 23:27:55 +01:00


JOURNAL — Phase 1d (append-only)

2026-05-27 — Bootstrap Phase 1d

Read SSOT plan-phase1d-generic-test-suite.md + plan.md §6.1/§7/§9. Studied the post-1b codebase: runner/run_recipe_ci.py (per-stage pytest, currently deploy-per-stage), tests/conftest.py (fixtures deployed_app/deployed/old_app each deploy+teardown), runner/harness/{lifecycle,abra,naming}.py, and existing recipe tests (custom-html/keycloak/etc.).

Access re-verified (bootstrap, new phase):

$ ssh cc-ci 'hostname && whoami && nixos-version'
nixos / root / 24.11.20250630.50ab793 (Vicuna)
$ ssh cc-ci 'abra --version'        -> abra version 0.13.0-beta-06a57de
$ ssh cc-ci 'docker stack ls'       -> traefik, drone, ccci-bridge, ccci-dashboard, backups all up
$ ssh cc-ci 'grep -ri backupbot ~/.abra/recipes/custom-html/'
  compose.yml: backupbot.backup=true ; backupbot.backup.path=/usr/share/nginx/html
$ curl -u bot ... /repos/recipe-maintainers/custom-html-tiny  -> 200 (mirrored)

So: backup-capability is detectable by scanning compose for backupbot.backup; custom-html-tiny is mirrored and has NO cc-ci tests dir → it's the DG1 pure-generic target.
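A minimal sketch of that detection, assuming a plain text scan of the recipe's compose files is enough. The function name matches the helper mentioned in this journal, but the signature, the `compose*.yml` glob, and the accepted label spellings are assumptions for illustration, not the harness's actual code:

```python
from pathlib import Path


def backup_capable(recipe_dir: Path) -> bool:
    """Return True if any compose*.yml under recipe_dir enables backupbot.backup.

    Assumption: the label is written as `backupbot.backup=true` or
    `backupbot.backup: "true"`; quotes and spaces are ignored.
    """
    for compose in sorted(recipe_dir.glob("compose*.yml")):
        for line in compose.read_text(encoding="utf-8").splitlines():
            text = line.split("#", 1)[0]  # ignore trailing YAML comments
            compact = (
                text.replace(" ", "").replace('"', "").replace("'", "").lower()
            )
            if "backupbot.backup=true" in compact or "backupbot.backup:true" in compact:
                return True
    return False
```

A label such as `backupbot.backup.path=...` on its own does not count, since only the enabling label is matched.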

Design recorded in DECISIONS.md (Phase 1d section). Key calls: tier model with the lifecycle OP owned by the shared harness (test files = assertions only); OVERRIDE precedence repo-local > cc-ci > generic with extend-by-composition; deploy-ONCE with a deploy-count guard; base version = previous (when upgrade runs) else target; backup-capability auto-detect; install-steps shell hook.
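The precedence and base-version rules above can be sketched as follows. This is an illustration only: `resolve_op_test`, `base_version`, the `test_<op>.py` naming for overlay roots, and the idea of passing roots as a tuple are assumptions, not the contents of discovery.py:

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_op_test(op: str, roots: Sequence[Optional[Path]]) -> Path:
    """Return the first existing test_<op>.py, searching roots in precedence order.

    Callers pass roots as (repo_local, cc_ci, generic); None means "absent".
    """
    name = f"test_{op}.py"
    for root in roots:
        if root is not None and (root / name).is_file():
            return root / name
    raise FileNotFoundError(f"no {name} in any overlay root")


def base_version(target: str, previous: Optional[str], upgrade_runs: bool) -> str:
    """Deploy the previous version only when an upgrade tier runs and one exists."""
    if upgrade_runs and previous:
        return previous
    return target
```

With only the generic root populated, every op resolves to the generic tier, which is the "pure generic" case DG1 is meant to exercise.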

Seeded STATUS-1d / BACKLOG-1d / JOURNAL-1d. Next: implement G0 (generic.py + discovery.py + tests/_generic/ + deploy-once orchestrator), then verify generic install green on custom-html-tiny.

2026-05-27 — G0 generic install + deploy-once orchestrator: DG1 GREEN

Built the G0 machinery and proved DG1 end-to-end on the real server:

  • runner/harness/generic.py: assert_serving (services converged + real HTTP in HEALTH_OK [excludes 404] + not Traefik's 404 body + CA-verified TLS cert is the trusted wildcard), op helpers (do_upgrade/do_backup/do_restore), backup_capable (scan compose for backupbot.backup).
  • runner/harness/discovery.py — per-op overlay resolution (repo-local > cc-ci > generic), custom test discovery (both locations, additive), install-steps hook discovery.
  • tests/_generic/test_{install,upgrade,backup,restore}.py — assertion-only tiers using live_app.
  • runner/run_recipe_ci.py — deploy-ONCE orchestrator: base version (prev if upgrade+exists else target), tiers run against the shared deployment, one teardown in finally, deploy-count guard + per-op summary.
  • tests/conftest.py: live_app fixture (reads CCCI_APP_DOMAIN; tiers never deploy).
  • lifecycle.deploy_app — deploy-count recorder + install-steps hook + pin DOMAIN to the run domain (fixes recipes whose .env.sample uses {{ .Domain }}, which this abra leaves unexpanded).
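The deploy-ONCE shape described in the list above (single deploy, tiers against the shared deployment, one teardown in finally, deploy-count guard) could look roughly like this. All names here (`DeployCounter`, `run_tiers`, the callables passed in) are hypothetical stand-ins, not the real run_recipe_ci.py API:

```python
from typing import Callable, Dict, Sequence


class DeployCounter:
    """Counts deploys; the deploy callable is expected to call record()."""

    def __init__(self) -> None:
        self.count = 0

    def record(self) -> None:
        self.count += 1


def run_tiers(
    tiers: Sequence[str],
    deploy: Callable[[], None],
    teardown: Callable[[], None],
    run_tier: Callable[[str], bool],
    counter: DeployCounter,
    expected_deploys: int = 1,
) -> Dict[str, str]:
    """Deploy once, run every tier on the shared deployment, always tear down."""
    summary: Dict[str, str] = {}
    try:
        deploy()
        for tier in tiers:
            summary[tier] = "pass" if run_tier(tier) else "fail"
    finally:
        teardown()  # runs even if deploy or a tier raised
    if counter.count != expected_deploys:
        # Guard: a tier that quietly redeploys breaks the deploy-once contract.
        raise RuntimeError(
            f"deploy-count = {counter.count} (expect {expected_deploys})"
        )
    return summary
```

Keeping the counter outside the tiers means a tier cannot satisfy the guard by construction; it only passes if nothing but the orchestrator deployed.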

Two real generic bugs found+fixed via live runs (not "should work"):

  1. custom-html-tiny deploy failed: abra app new -D on 0.13.0-beta does not auto-fill DOMAIN={{ .Domain }}, so the deploy errors with "can't evaluate field Domain". Fix: env_set(domain,"DOMAIN",domain) in deploy_app.
  2. served_cert_subject used openssl s_client, but openssl is not on the host (cc-ci-run runtimeInputs has no openssl) → it silently returned None → the "not default cert" check was a no-op (a DG7 can't-fail smell). Replaced with a pure-Python CA-verified handshake (ssl): a publicly-trusted LE wildcard verifies + matches hostname; Traefik's self-signed default fails verification → a genuine assertion. Verified the verify path on the host: ssl.create_default_context() against ci.commoninternet.net → VERIFIED, CN=*.ci.commoninternet.net, SAN=[*.ci.commoninternet.net, ci.commoninternet.net].
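For reference, a pure-Python CA-verified handshake of the kind described in item 2 can be sketched with the standard ssl module. The function names `fetch_verified_cert` and `cert_names` are illustrative assumptions; the harness's real helper may be structured differently:

```python
import socket
import ssl
from typing import List, Optional, Tuple


def fetch_verified_cert(host: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Handshake with full CA + hostname verification and return the peer cert.

    Raises ssl.SSLCertVerificationError for an untrusted or mismatched cert
    (e.g. a self-signed default), so a failure cannot pass silently.
    """
    ctx = ssl.create_default_context()  # CERT_REQUIRED and check_hostname=True
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def cert_names(cert: dict) -> Tuple[Optional[str], List[str]]:
    """Extract (commonName, DNS SANs) from the dict returned by getpeercert()."""
    cn = None
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                cn = value
    sans = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    return cn, sans
```

Letting the verification error propagate, rather than mapping it to None, is what turns the check into one that can actually fail.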

DG1 evidence (cc-ci, final code). custom-html-tiny is a static-web-server with an empty content volume, so with zero config it genuinely serves 404 and cannot demonstrate a passing serving check. Picked hedgedoc instead (simple category, NO cc-ci/repo-local tests → pure generic; backup-capable as a bonus):

$ RECIPE=hedgedoc STAGES=install cc-ci-run runner/run_recipe_ci.py
===== TIER: install (generic: tests/_generic/test_install.py) =====
tests/_generic/test_install.py::test_serving PASSED
===== RUN SUMMARY =====   deploy-count = 1 (expect 1)   install : pass
$ docker stack ls | grep hedg   -> (none — clean teardown)
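Why a zero-config 404 cannot count as "serving" is easiest to see in a sketch of the HTTP part of the check. The status set below is an assumption (the journal only states that HEALTH_OK excludes 404), and `is_serving` is a hypothetical name; the body string is the plain-text 404 that Traefik returns when no router matches:

```python
# Assumption: the real HEALTH_OK set is not shown in this journal; only the
# fact that it excludes 404 is taken from it.
HEALTH_OK = frozenset({200, 301, 302, 303, 307, 308, 401, 403})

# Traefik's default response body when no router matches the request.
TRAEFIK_404_BODY = "404 page not found"


def is_serving(status: int, body: str) -> bool:
    """True only for an accepted status that is not Traefik's fallback page."""
    if body.strip() == TRAEFIK_404_BODY:
        return False  # the proxy answered, not the app
    return status in HEALTH_OK
```

Under this rule an app that legitimately answers 404 on `/` fails just like an unrouted one, which is why hedgedoc was the better DG1 target.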

Lint+format clean (ruff check/ruff format --check via nix develop .#lint). Claiming the G0 gate.