Expose subprocess PID as public property on SubprocessCLITransport #1001

@cameronalizadeh-distyl

Summary

Request a public pid (and ideally process) property on SubprocessCLITransport so external orchestrators can manage the CLI subprocess without reaching into private attributes.

Why this needs a tracking issue

The request is currently fragmented across multiple artifacts in different repos, including the original request in anthropic-sdk-python#1370 and duplicate PRs, with no canonical place to track demand or coordinate resolution.

This issue is a canonical home for the request. The two open PRs in this repo (#819, #995) both close it; maintainers can pick one and close the other.

Use case

Distyl.ai operates a coding-agent platform that runs the Claude Code CLI under a per-PID credential proxy. HTTP traffic from the agent subprocess is intercepted by a local mitmdump that identifies the calling process via kernel-attested PID and injects the right user's credentials. The subprocess never sees real secrets.

To register credentials with the proxy, we need the CLI subprocess PID. Today we reach for it like this (paraphrased from our runtime):

transport = getattr(self._sdk_client, "_transport", None)
process = getattr(transport, "_process", None) if transport else None
pid = getattr(process, "pid", None) if process else None

Three levels of getattr against private attributes, defensively wrapped. Brittle to any internal refactor. A public property would let us write pid = self._sdk_client.transport.pid and remove the defensive scaffolding.
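For illustration, a minimal sketch of the shape we have in mind. TransportSketch is a hypothetical stand-in, not the SDK's class, and it assumes only what the snippet above already relies on: the transport keeps its process handle in _process, and that handle has a pid attribute.

```python
from typing import Any, Optional


class TransportSketch:
    """Hypothetical stand-in for SubprocessCLITransport (not the SDK's class)."""

    def __init__(self) -> None:
        # Assumption: a real transport sets this when it spawns the CLI
        # and leaves it as None while disconnected.
        self._process: Optional[Any] = None

    @property
    def process(self) -> Optional[Any]:
        """The underlying process handle, or None when not connected."""
        return self._process

    @property
    def pid(self) -> Optional[int]:
        """PID of the CLI subprocess, or None when not connected."""
        return self._process.pid if self._process is not None else None
```

Returning None while disconnected (rather than raising) mirrors what our getattr chain does today, so callers could switch over without changing their error handling. Whether the real properties should raise instead is a maintainer call.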

The original requester (anthropic-sdk-python#1370) needed it for cancel-cleanup; we need it for credential injection. Both are valid orchestrator concerns the SDK currently makes harder than necessary.

Related ask: bounded wait + SIGKILL fallback in close()

#1370 also requested that SubprocessCLITransport.close() add a bounded wait after terminate() with a SIGKILL fallback. The current close() does await self._process.wait() after terminate(), but inside with suppress(Exception) and without a timeout — meaning if the CLI process ignores SIGTERM (stuck network call, hung hook, etc.), close() blocks indefinitely.

This is the unresolved half of the original architectural concern, and it surfaces in our use case as a hard constraint: we keep the CLI subprocess alive across conversation turns. In an earlier SDK version, the original per-query connect() / disconnect() pattern hit a cleanup race: disconnect() returned before the subprocess had fully exited, and the next query's --resume collided with the still-running prior subprocess (failing with "Command failed with exit code 1"). The current close() awaits self._process.wait(), which may have addressed that specific race, but the unbounded-wait concern from #1370 still applies if SIGTERM is ignored.

Worth resolving alongside the PID exposure since they touch the same code path and serve the same architectural improvement (clean subprocess lifecycle for external owners).
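A rough sketch of the bounded wait with a SIGKILL fallback, to make the ask concrete. This is not the SDK's close(): close_with_deadline and grace_seconds are hypothetical names, and the sketch uses plain asyncio and assumes a process object with the asyncio.subprocess.Process interface (terminate(), kill(), wait(), returncode). The SDK's actual process type and async backend may differ.

```python
import asyncio
from contextlib import suppress
from typing import Any, Optional


async def close_with_deadline(process: Any, grace_seconds: float = 5.0) -> Optional[int]:
    """Terminate `process`; escalate to kill() if it outlives the grace period."""
    if process.returncode is not None:
        return process.returncode  # already exited, nothing to do

    with suppress(ProcessLookupError):
        process.terminate()  # SIGTERM on POSIX

    try:
        # Bounded wait: give the process a chance to exit cleanly.
        await asyncio.wait_for(process.wait(), timeout=grace_seconds)
    except asyncio.TimeoutError:
        # SIGTERM was ignored or handled too slowly; force the exit.
        with suppress(ProcessLookupError):
            process.kill()  # SIGKILL on POSIX
        await process.wait()  # reap the process so no zombie is left behind

    return process.returncode
```

The final wait() after kill() is left unbounded on purpose in this sketch, since SIGKILL cannot be ignored. A stricter version could bound that too and surface an error if the process is stuck in uninterruptible I/O.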

Suggested resolution

  1. Merge one of #819 ("feat(transport): expose pid and process as public properties") or #995 ("feat(transport): expose subprocess PID for external cleanup"), or a maintainer-led merge of the two, to expose pid and ideally process.
  2. Tackle the bounded-wait + SIGKILL fallback in close() as a follow-up, or fold it into the same PR.
  3. Happy to close this issue once a PID-exposing PR lands. Flagging #1370 and the duplicate PRs for cleanup as well.

Happy to test against a release candidate.

Labels: enhancement
