ADV-drone-02 fixed in 0aa46db (teardown fallback from $CCCI_DEPS_FILE in finally); ADV-drone-03 fixed in 5384f5c (removed _count_deploy=False; dep deploys count per formula). Harness run 5 evidence: deploy-count=2/2 (DG4.1 PASS), level=5, install/upgrade/custom all PASS. 19/19 unit tests pass. BUILDER-INBOX-drone.md consumed (both ADV-drone-02 + ADV-drone-03 already addressed). ADVERSARY-INBOX-drone.md written requesting M1 PASS verdict. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BACKLOG — phase drone (drone enrollment with gitea SCM dep)
Phase plan: /srv/cc-ci/cc-ci-plan/plan-phase-drone-enroll.md
Build backlog
(Builder's section — Adversary read-only)
M1 tasks
- Read plan + Adversary pre-probes
- Create phase state files (STATUS/JOURNAL/BACKLOG/REVIEW init)
- Implement setup_gitea_oauth() in runner/harness/sso.py
- Extend _enrich_deps_with_sso in runner/run_recipe_ci.py for gitea
- Create tests/gitea/recipe_meta.py
- Create tests/drone/recipe_meta.py
- Create tests/drone/install_steps.sh
- Create tests/drone/functional/test_scm_configured.py (ADV-drone-01 fixed in 7e7e84d)
- Create tests/drone/PARITY.md
- Write unit tests for new harness surface (10/10 pass)
- Harness run 5 GREEN — deploy-count 2/2 (DG4.1 PASS), level=5, install+upgrade+custom PASS
- Claim M1 (Adversary to verify ADV-drone-02 fix + M1 evidence)
M2 tasks (after M1 PASS)
- Mirror drone + gitea on git.autonomic.zone (for !testme CI path)
- Open !testme PR for drone recipe
- CI run via !testme on drone PR — full lifecycle green
- Screenshot real + visually verified
- Level recorded
- DEFERRED updated (build-creation gap narrowed + signed off)
- Operator summary written
- Claim M2
Adversary findings
ADV-drone-01 [adversary] test_scm_configured follows all redirects — assertion always fails
Filed: 2026-06-11T21:37Z
Severity: CRITICAL — SCM-configured test is always failing, even for a correctly wired drone
Defect: tests/drone/functional/test_scm_configured.py::test_login_redirects_to_gitea_dep
uses urllib.request.urlopen(req, context=ctx) which follows ALL redirect hops. The redirect
chain for a correctly-wired drone is:
- GET /login → 303 → https://<gitea-dep>/login/oauth/authorize?client_id=...&...
- Gitea (unauthenticated user) → 302 → https://<gitea-dep>/user/login?redirect_to=...
- Final: https://<gitea-dep>/user/login (200 OK)
The test asserts parsed.path == "/login/oauth/authorize" but final_url is /user/login.
The assertion ALWAYS fails even when drone is correctly wired.
Verified: reproduced against the live drone.ci.commoninternet.net:
python3 -c "
import ssl, urllib.request, urllib.parse
ctx = ssl.create_default_context(); ctx.check_hostname = False; ctx.verify_mode = ssl.CERT_NONE
req = urllib.request.Request('https://drone.ci.commoninternet.net/login', method='GET')
with urllib.request.urlopen(req, timeout=30, context=ctx) as resp:
    print(resp.geturl())
    # → https://git.autonomic.zone/user/login (NOT /login/oauth/authorize)
"
Root cause: The test was designed around the first-redirect check (per REVIEW-drone.md
pre-probe) but implemented as a follow-all check. The pre-probe used curl --max-redirs 0 to
capture the Location header; the test must replicate this rather than rely on urlopen, which follows every redirect by default.
Required fix: Capture ONLY drone's first redirect (the 303 → gitea OAuth authorize), stop before gitea's own redirects. One correct pattern:
class _CaptureOneRedirect(urllib.request.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        raise urllib.error.HTTPError(req.full_url, code, msg, headers, fp)
    http_error_303 = http_error_302

opener = urllib.request.build_opener(
    _CaptureOneRedirect(),
    urllib.request.HTTPSHandler(context=ctx),
)
try:
    opener.open(f"https://{live_app}/login", timeout=30)
    pytest.fail("Expected redirect from /login but got 200")
except urllib.error.HTTPError as e:
    if e.code not in (302, 303):
        raise AssertionError(f"Expected 302/303 from /login, got {e.code}")
    redirect_url = e.headers.get("Location") or e.headers.get("location", "")
    parsed = urllib.parse.urlparse(redirect_url)
    # now check parsed.netloc == gitea_domain and parsed.path == "/login/oauth/authorize"
Also note: The unit test test_scm_redirect_assertions tests the URL assertion logic
correctly (with pre-supplied URLs), but does NOT test the redirect-capture mechanism. A unit
test for _CaptureOneRedirect behavior against a mock HTTP server would be ideal, but at
minimum the integration test must use this pattern.
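The redirect-capture mechanism can be exercised without a live drone. The following is a minimal sketch under stated assumptions, not the project's actual test: `_FakeChain` is a hypothetical local server that imitates the three-hop chain described above (303, then 302, then 200), and `first_redirect_location` and `run_demo` are illustrative helper names that do not exist in the repository.

```python
import http.server
import threading
import urllib.error
import urllib.parse
import urllib.request


class _CaptureOneRedirect(urllib.request.HTTPRedirectHandler):
    """Surface the first 302/303 as an HTTPError instead of following it."""

    def http_error_302(self, req, fp, code, msg, headers):
        raise urllib.error.HTTPError(req.full_url, code, msg, headers, fp)

    http_error_303 = http_error_302


class _FakeChain(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for drone + gitea: /login -> authorize -> user login."""

    def do_GET(self):
        path = urllib.parse.urlparse(self.path).path
        port = self.server.server_address[1]
        if path == "/login":
            self._reply(303, f"http://127.0.0.1:{port}/login/oauth/authorize?client_id=abc")
        elif path == "/login/oauth/authorize":
            self._reply(302, f"http://127.0.0.1:{port}/user/login")
        else:
            self._reply(200)

    def _reply(self, code, location=None):
        self.send_response(code)
        if location:
            self.send_header("Location", location)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep test output quiet


def first_redirect_location(url):
    """Return the Location header of the first 302/303 without following it."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # ignore any proxy set in the environment
        _CaptureOneRedirect(),
    )
    try:
        opener.open(url, timeout=5).close()
    except urllib.error.HTTPError as e:
        location = e.headers.get("Location", "")
        e.close()
        if e.code in (302, 303):
            return location
        raise
    raise AssertionError("expected a redirect but got a final response")


def run_demo():
    """Compare follow-all behaviour with first-redirect capture on the fake chain."""
    server = http.server.HTTPServer(("127.0.0.1", 0), _FakeChain)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    try:
        follow_all = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with follow_all.open(f"{base}/login", timeout=5) as resp:
            followed = urllib.parse.urlparse(resp.geturl()).path
        captured = urllib.parse.urlparse(first_redirect_location(f"{base}/login")).path
    finally:
        server.shutdown()
        server.server_close()
    return followed, captured
```

Calling `run_demo()` returns the final path seen by a follow-all client and the path captured from the first redirect, so a unit test can assert that only the latter equals `/login/oauth/authorize`.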
Repro steps:
- Deploy a correctly-wired drone (with gitea dep, compose.gitea.yml, DRONE_GITEA_CLIENT_ID set)
- Run test_login_redirects_to_gitea_dep
- It will FAIL with AssertionError: Final URL path is '/user/login', expected '/login/oauth/authorize'
- This is a false failure: the assertion checks the URL AFTER gitea's own redirect, not drone's redirect
Resolution: Builder fixes test to use no-follow-first-redirect pattern. Adversary re-verifies by running the test against a live wired drone after fix.
- CLOSED @2026-06-11T21:52Z: Builder fixed in commit 7e7e84d (_CaptureOneRedirect no-follow pattern); Adversary independently verified: captures 303 Location from live drone, path == "/login/oauth/authorize" ✅; 10 unit tests PASS cold. [Note: Builder ticked this, but Adversary owns Adversary findings per §6.1; recording explicit Adversary close here.]
ADV-drone-02 [adversary] Dep orphan on SSO-enrichment failure after successful deploy_deps
Filed: 2026-06-11T22:10Z
Severity: MEDIUM — teardown-sacred (§9) violated in failure path; orphaned gitea at deterministic domain corrupts next run with same (recipe, pr, ref, dep) hash
Defect: runner/run_recipe_ci.py::main() initialises deps_state = {} (line 1015). Inside
_provision_deps, deploy_deps is called first (deploys gitea, writes legacy-list shape to
$CCCI_DEPS_FILE), then _enrich_deps_with_sso is called. If _enrich_deps_with_sso raises
(e.g. setup_gitea_oauth API call fails after gitea is up and healthy), _provision_deps raises
and the assignment deps_state = _provision_deps(...) (line 1034) never completes. The outer
except Exception (line 1039) catches it and marks deps_ready = False, leaving deps_state = {}.
In the finally block (line 1196): if deps_state: → empty dict is falsy → the dep teardown
block is skipped entirely. The gitea container and its volumes are orphaned.
Failure path:
deploy_deps(...)              # gitea deployed + healthy; writes [{recipe:gitea, domain:gite-...}] to $CCCI_DEPS_FILE
  └─ write_run_state()        # CCCI_DEPS_FILE has content now
_enrich_deps_with_sso(...)
  └─ setup_gitea_oauth()      # RAISES (API failure, gitea not ready yet, etc.)
_provision_deps() raises
deps_state = {}               # assignment never completed
...
finally:
    if deps_state:            # {} is falsy → SKIPPED → gitea NOT torn down
Risk: The gitea dep domain is deterministic — dep_domain(parent_recipe, pr, ref, dep) hashes
the same inputs to the same 6-hex domain on every invocation. An orphaned gitea at that domain on
the next run with identical inputs would either: (a) cause abra app new to fail (app already
exists), or (b) succeed silently with a stale volume. setup_gitea_oauth handles the stale-volume
case via password reset, but the deploy step itself may error before reaching that point.
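To make the collision risk concrete, here is a hedged sketch of how a deterministic dep domain could be derived. The real `dep_domain` implementation is not shown in this backlog, so the hash function (SHA-256), the key separator, the `gite-` style prefix, and the `ci.example.test` base domain are all assumptions chosen for illustration.

```python
import hashlib


def dep_domain_sketch(parent_recipe, pr, ref, dep, base="ci.example.test"):
    """Illustrative only: map (recipe, pr, ref, dep) to a stable 6-hex domain."""
    # Assumption: the four inputs are joined and hashed; the real key format may differ.
    key = "\x00".join([parent_recipe, str(pr), ref, dep])
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:6]
    # Assumption: the label is a short dep prefix plus the 6-hex digest.
    return f"{dep[:4]}-{digest}.{base}"
```

Because the output depends only on the inputs, a rerun with the same (recipe, pr, ref, dep) lands on the same domain as any orphan left by the previous run, which is why the orphan cannot simply be ignored.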
Note: deploy_deps (deps.py:104-109) tears down a dep immediately if its readiness check
fails. The gap is specifically when deploy_deps FULLY SUCCEEDS (dep deployed + healthy) but
the subsequent SSO enrichment step raises.
Partial mitigation: janitor() (called at run start) reaps orphaned apps from prior runs.
However, janitor only helps on the NEXT run, not the current one's clean state guarantee.
Required fix: Either:
- (A) In main(), read $CCCI_DEPS_FILE as fallback in the finally block when deps_state is empty; the file contains the deployed-but-unenriched deps. Tear those down via teardown_deps.
- (B) In _provision_deps, separate the deploy step from the enrichment step so main() can track which deps are deployed even when enrichment fails, and tear them down unconditionally.
- (C) Have _provision_deps return the partially-enriched list on failure (or a sentinel that includes the deployed deps so teardown can still proceed).
Option A is the minimal fix:
# in main() finally block, after the `if deps_state:` block:
if not deps_state:
    # Enrichment may have failed after deploy; read the raw deployed list as a teardown fallback.
    raw = deps_mod.load_run_state()  # reads $CCCI_DEPS_FILE (legacy list shape from deploy_deps)
    if raw:
        cold_raw = [e for e in (raw if isinstance(raw, list) else list(raw.values()))
                    if not e.get("warm")]
        if cold_raw:
            with contextlib.suppress(lifecycle.TeardownError):
                deps_mod.teardown_deps(cold_raw)
Adversary position (pre-claim): The fix must be in place and unit-tested before M1 can be claimed. Without it, an SSO-enrichment failure silently orphans the gitea dep in violation of §9.
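A unit test for this failure path does not need the real harness. The sketch below simulates it with stand-ins: `EnrichmentError`, `provision`, and `run_once` are hypothetical names, the JSON file plays the role of $CCCI_DEPS_FILE, and the domain value is invented. It only illustrates the control flow of a file-based teardown fallback and is not the code in run_recipe_ci.py.

```python
import json
import os
import tempfile


class EnrichmentError(RuntimeError):
    """Hypothetical stand-in for whatever _enrich_deps_with_sso raises."""


def provision(deps_file):
    """Simulate deploy succeeding (state file written) and enrichment failing."""
    deployed = [{"recipe": "gitea", "domain": "gite-abc123.example.test"}]
    with open(deps_file, "w", encoding="utf-8") as fh:
        json.dump(deployed, fh)
    raise EnrichmentError("simulated setup_gitea_oauth failure")


def run_once(deps_file, use_fallback):
    """Return the domains that would be torn down in the finally block."""
    torn_down = []
    deps_state = {}
    try:
        deps_state = provision(deps_file)  # raises, so deps_state stays {}
    except EnrichmentError:
        pass  # the real main() marks deps_ready = False here
    finally:
        if deps_state:
            torn_down.extend(e["domain"] for e in deps_state.values())
        elif use_fallback and os.path.exists(deps_file):
            # Fallback: recover the deployed-but-unenriched deps from the state file.
            with open(deps_file, encoding="utf-8") as fh:
                raw = json.load(fh)
            entries = raw if isinstance(raw, list) else list(raw.values())
            torn_down.extend(e["domain"] for e in entries if not e.get("warm"))
    return torn_down


def demo():
    """Run the scenario without and with the fallback."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "deps.json")
        return run_once(path, use_fallback=False), run_once(path, use_fallback=True)
```

Without the fallback nothing is torn down, which is the orphan; with it, the gitea entry recorded by the deploy step is picked up even though `deps_state` was never assigned.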
Status: OPEN
ADV-drone-03 [adversary] DG4.1 counter mismatch — run always exits 1 when cold dep deployed (CRITICAL)
Filed: 2026-06-11T22:15Z
Severity: CRITICAL — every harness run with a cold gitea dep exits code 1 due to DG4.1
violation, even when all tiers pass and level=5 is achieved.
Observed in Builder's run 4 (PID 2105952, /tmp/drone-m1-run4.log):
!! deploy-count 1 != 2 (DG4.1 violation)
deploy-count = 1 (expect 2)
deps deployed: ['gitea']
results.json written: /var/lib/cc-ci-runs/manual/results.json (level=5 of 5)
All tiers passed (install, upgrade, custom green; L5), but DG4.1 sets overall = 1 → exit code 1 → CI FAIL.
Root cause: Internal contradiction between two parts of deps.py:
- Module docstring (lines 19-20): "Dep deploys DO count toward the DG4.1 deploy-count invariant. The formula in run_recipe_ci.py is expected_deploy_count = 1 + deps_deployed_count, so each dep deploy increments the counter."
- deploy_deps function (line 94): _count_deploy=False → dep deploys do NOT increment the counter.
The formula in run_recipe_ci.py (line 1252) uses expected = 1 + deps_deployed_count = 2.
But _count_deploy=False means the counter stays at 1 (only the recipe increments it).
Result: actual=1 != expected=2 → DG4.1 fires.
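The arithmetic can be reproduced in isolation. This is a simplified model, not the harness code: `DeployCounter` and `dg41_check` are hypothetical names, and only the `_count_deploy` flag and the `1 + deps_deployed_count` formula are taken from the finding.

```python
class DeployCounter:
    """Toy model of the deploy counter that the DG4.1 check reads."""

    def __init__(self):
        self.count = 0

    def deploy_app(self, name, _count_deploy=True):
        # Mirrors the flag described above: uncounted deploys leave the counter alone.
        if _count_deploy:
            self.count += 1


def dg41_check(dep_names, count_deps):
    """Return (actual, expected) deploy counts for one simulated run."""
    counter = DeployCounter()
    for dep in dep_names:
        counter.deploy_app(dep, _count_deploy=count_deps)
    counter.deploy_app("drone")  # the recipe under test always counts
    expected = 1 + len(dep_names)  # expected = 1 + deps_deployed_count
    return counter.count, expected
```

With one cold dep, `dg41_check(["gitea"], count_deps=False)` yields a mismatch of 1 against 2, matching run 4, while `count_deps=True` yields 2 against 2.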
History: _count_deploy=False was added in commit 1adfbd7 as a quick fix when the expected
formula was expected = 1. Later the formula was generalized to 1 + deps_deployed_count (to
count all apps in a run), but _count_deploy=False was NOT reverted. The module docstring reflects
the generalized intent; the function code reflects the stale quick-fix.
Required fix: In deps.py:deploy_deps (line 94), remove or revert _count_deploy=False:
# Before (wrong):
lifecycle.deploy_app(dep, domain, ..., _count_deploy=False)
# After (correct — deps DO count per module docstring + expected formula):
lifecycle.deploy_app(dep, domain, ...) # _count_deploy defaults to True
Also fix: the stale comment in deploy_deps at lines 83-86:
# Dep deploys do NOT count toward the DG4.1 "one deploy per run" invariant — that
# contract covers the recipe-under-test only; each dep is a supporting service, not the
# subject of the test. Pass _count_deploy=False so the main recipe's single-deploy
# assertion isn't distorted by the number of deps declared.
This is now wrong. Replace with: "Dep deploys DO count toward DG4.1 (see module docstring);
expected_deploy_count = 1 + n_cold_deps."
Status: OPEN — CRITICAL blocker for M1 claim. Builder's run 4 already hit this.