14 changes: 14 additions & 0 deletions sdk/agentserver/azure-ai-agentserver-invocations/CHANGELOG.md
@@ -4,6 +4,20 @@

### Features Added

- `invocations_ws` (WebSocket) protocol support on `InvocationAgentServerHost`.
Register a handler with the new `@app.ws_handler` decorator to host a
full-duplex WebSocket endpoint at `/invocations_ws` on the same host that
serves `POST /invocations`. The SDK calls `await websocket.accept()` before
invoking the handler, runs WebSocket Ping/Pong keep-alive in the background
(default 30 s; configurable via the new `ws_ping_interval` constructor
argument), closes the connection cleanly on handler return, and maps
uncaught exceptions to RFC 6455 close code `1011`. Each connection emits a
structured close-event log line carrying `ws.session_id`, `ws.close_code`,
and `ws.duration_ms`, and the same fields are recorded as OpenTelemetry
span attributes. `/readiness`, OTEL export, graceful shutdown, and the
`x-platform-server` identity header continue to be inherited from
`azure-ai-agentserver-core`.

### Breaking Changes

### Bugs Fixed
60 changes: 59 additions & 1 deletion sdk/agentserver/azure-ai-agentserver-invocations/README.md
@@ -1,6 +1,9 @@
# Azure AI Agent Server Invocations client library for Python

The `azure-ai-agentserver-invocations` package provides the invocation protocol endpoints for Azure AI Hosted Agent containers. It plugs into the [`azure-ai-agentserver-core`](https://pypi.org/project/azure-ai-agentserver-core/) host framework and adds the full invocation lifecycle: `POST /invocations`, `GET /invocations/{id}`, `POST /invocations/{id}/cancel`, and `GET /invocations/docs/openapi.json`.
The `azure-ai-agentserver-invocations` package provides the invocation protocol endpoints for Azure AI Hosted Agent containers. It plugs into the [`azure-ai-agentserver-core`](https://pypi.org/project/azure-ai-agentserver-core/) host framework and supports two transports on the same host:

- **HTTP** (`invocations` protocol) — `POST /invocations`, `GET /invocations/{id}`, `POST /invocations/{id}/cancel`, `GET /invocations/docs/openapi.json`.
- **WebSocket** (`invocations_ws` protocol) — full-duplex streaming at `/invocations_ws`, registered with `@app.ws_handler`.

## Getting started

@@ -25,6 +28,7 @@ This automatically installs `azure-ai-agentserver-core` as a dependency.
- `@app.invoke_handler` — **Required.** Handles `POST /invocations`.
- `@app.get_invocation_handler` — Optional. Handles `GET /invocations/{id}`.
- `@app.cancel_invocation_handler` — Optional. Handles `POST /invocations/{id}/cancel`.
- `@app.ws_handler` — Optional. Handles WebSocket connections at `/invocations_ws`.

### Protocol endpoints

@@ -34,6 +38,7 @@ This automatically installs `azure-ai-agentserver-core` as a dependency.
| `GET` | `/invocations/{invocation_id}` | No | Retrieve invocation status or result |
| `POST` | `/invocations/{invocation_id}/cancel` | No | Cancel a running invocation |
| `GET` | `/invocations/docs/openapi.json` | No | Serve the agent's OpenAPI 3.x spec |
| `WS` | `/invocations_ws` | No | Full-duplex WebSocket transport (`invocations_ws` protocol) |

### Request and response headers

@@ -182,6 +187,57 @@ app = InvocationAgentServerHost(openapi_spec={
})
```

## WebSocket protocol (`invocations_ws`)

The same `InvocationAgentServerHost` object also exposes a WebSocket transport at `/invocations_ws`. Container authors do not need to install or import a second package; registering a handler with `@app.ws_handler` is the only step. A multi-protocol agent shares one host, one session, and one process.

### Quick start

```python
from azure.ai.agentserver.invocations import InvocationAgentServerHost
from starlette.requests import Request
from starlette.responses import JSONResponse, Response
from starlette.websockets import WebSocket

app = InvocationAgentServerHost()


@app.invoke_handler # POST /invocations (HTTP)
async def invoke(request: Request) -> Response:
payload = await request.json()
return JSONResponse({"echo": payload})


@app.ws_handler # /invocations_ws (WebSocket)
async def ws(websocket: WebSocket) -> None:
async for message in websocket.iter_text():
await websocket.send_text(message)


app.run()
```

### What the SDK does for `@app.ws_handler`

- Registers `/invocations_ws` on the same Starlette host as `/invocations` and `/readiness`.
- Calls `await websocket.accept()` before invoking your handler.
- Runs WebSocket Ping/Pong keep-alive in the background (default 30 s, configurable via `InvocationAgentServerHost(ws_ping_interval=...)`; set `ws_ping_interval=0` to disable). Frames are sent at the WebSocket protocol layer (RFC 6455 opcode `0x9`/`0xA`) by the underlying Hypercorn server, which keeps the connection alive across the ~4 minute idle timeouts of Azure APIM and Azure Load Balancer without any extra application traffic.
- Closes the connection cleanly on handler return (close code `1000`) or maps an uncaught handler exception to close code `1011`.
- Emits a structured close-event log line carrying `ws.session_id`, `ws.close_code`, and `ws.duration_ms`. The same fields are recorded as OpenTelemetry span attributes so the connection lifetime is visible end-to-end.
- Inherits `/readiness`, OpenTelemetry export, graceful shutdown, and the `x-platform-server` identity header from `azure-ai-agentserver-core`.
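
The close-code behavior in the list above can be sketched in isolation. The snippet below is an illustrative mapping of handler outcome to RFC 6455 close codes that mirrors the documented behavior; it is not the SDK's internal implementation:

```python
import asyncio

# RFC 6455 close codes, as described for the invocations_ws host.
CLOSE_NORMAL = 1000          # handler returned cleanly
CLOSE_INTERNAL_ERROR = 1011  # handler raised an unhandled exception


async def resolve_close_code(handler, websocket):
    """Run a WS handler and pick the close code the server would send."""
    try:
        await handler(websocket)
    except Exception:
        return CLOSE_INTERNAL_ERROR
    return CLOSE_NORMAL


async def demo():
    async def ok(ws):
        pass  # clean return -> 1000

    async def boom(ws):
        raise RuntimeError("unhandled")  # uncaught -> 1011

    return await resolve_close_code(ok, None), await resolve_close_code(boom, None)
```

Note that `1012` (service restart) is reserved for graceful shutdown, which the host drives itself rather than the handler.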

### Handler signature

The handler receives a Starlette [`WebSocket`][starlette-ws] and returns `None`. The full WebSocket API — `iter_text`, `iter_bytes`, `iter_json`, `send_text`, `send_bytes`, `send_json`, `close`, `headers`, `query_params`, `client`, `state` — is available, so application protocols on top of `invocations_ws` are entirely under your control.

[starlette-ws]: https://www.starlette.io/websockets/
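
Since the payload format over `/invocations_ws` is unconstrained, one common pattern is a small JSON envelope. The `type`/`data` shape below is purely illustrative and not part of the SDK:

```python
import json


def encode_event(event_type: str, data: object) -> str:
    """Serialize one application-level event as a JSON text frame."""
    return json.dumps({"type": event_type, "data": data})


def decode_event(text: str) -> tuple[str, object]:
    """Parse a text frame produced by encode_event."""
    msg = json.loads(text)
    return msg["type"], msg["data"]
```

Inside an `@app.ws_handler`, helpers like these would wrap `websocket.iter_text()` on the receive side and `websocket.send_text(...)` on the send side.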

### Reference: configuration

| Constructor argument | Default | Description |
|---|---|---|
| `ws_ping_interval` | `30.0` (seconds) | WebSocket protocol Ping interval. `0` disables keep-alive. Negative or non-finite values are rejected. |

## Troubleshooting

### Reporting issues
@@ -196,6 +252,8 @@ Visit the [Samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/
|---|---|
| [simple_invoke_agent](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-invocations/samples/simple_invoke_agent/) | Minimal synchronous request-response |
| [async_invoke_agent](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-invocations/samples/async_invoke_agent/) | Long-running operations with polling and cancellation |
| [ws_invoke_agent](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-invocations/samples/ws_invoke_agent/) | Combined `POST /invocations` (HTTP) and `/invocations_ws` (WebSocket) host |
| [ws_bidirectional_streaming_agent](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-invocations/samples/ws_bidirectional_streaming_agent/) | Full-duplex `/invocations_ws` agent: server-pushed heartbeats + concurrent token streams + mid-flight cancel |

## Contributing

@@ -19,3 +19,31 @@ class InvocationConstants:
ATTR_SPAN_SESSION_ID = "azure.ai.agentserver.invocations.session_id"
ATTR_SPAN_ERROR_CODE = "azure.ai.agentserver.invocations.error.code"
ATTR_SPAN_ERROR_MESSAGE = "azure.ai.agentserver.invocations.error.message"


class InvocationsWSConstants:
"""invocations_ws (WebSocket) protocol constants.

Route, span attribute keys, and ping/pong defaults for the
WebSocket endpoint hosted alongside the HTTP invocations protocol.
"""

# Route
ROUTE_PATH = "/invocations_ws"

# Default WebSocket Ping interval in seconds.
# Azure APIM and Azure Load Balancer drop idle WebSocket connections
# after ~4 minutes; 30 s gives a comfortable safety margin.
DEFAULT_PING_INTERVAL_S = 30.0

# Close codes (RFC 6455)
CLOSE_NORMAL = 1000 # handler returned cleanly
CLOSE_INTERNAL_ERROR = 1011 # handler raised an unhandled exception
CLOSE_SERVICE_RESTART = 1012 # graceful shutdown drained the connection

# Span attribute keys
ATTR_SPAN_SESSION_ID = "ws.session_id"
ATTR_SPAN_CLOSE_CODE = "ws.close_code"
ATTR_SPAN_DURATION_MS = "ws.duration_ms"
ATTR_SPAN_ERROR_CODE = "azure.ai.agentserver.invocations_ws.error.code"
ATTR_SPAN_ERROR_MESSAGE = "azure.ai.agentserver.invocations_ws.error.message"
@@ -31,6 +31,7 @@
)

from ._constants import InvocationConstants
from ._invocation_ws import _WSHandlerMixin

logger = logging.getLogger("azure.ai.agentserver")

@@ -93,13 +94,18 @@ def _sanitize_id(value: str, fallback: str) -> str:
return value


class InvocationAgentServerHost(AgentServerHost):
class InvocationAgentServerHost(_WSHandlerMixin, AgentServerHost):
"""Invocation protocol host for Azure AI Hosted Agents.

A :class:`~azure.ai.agentserver.core.AgentServerHost` subclass that adds
the invocation protocol endpoints. Use the decorator methods to wire
handler functions to the endpoints.

The same host object also exposes the ``invocations_ws`` (WebSocket)
transport at ``/invocations_ws``; register a handler with the
:meth:`ws_handler` decorator. Multi-protocol agents share a single
host, session, and process.

For multi-protocol agents, compose via cooperative inheritance::

class MyHost(InvocationAgentServerHost, ResponsesAgentServerHost):
@@ -108,18 +114,29 @@ class MyHost(InvocationAgentServerHost, ResponsesAgentServerHost):
Usage::

from azure.ai.agentserver.invocations import InvocationAgentServerHost
from starlette.websockets import WebSocket

app = InvocationAgentServerHost()

@app.invoke_handler
@app.invoke_handler # POST /invocations
async def handle(request):
return JSONResponse({"ok": True})

@app.ws_handler # /invocations_ws
async def ws(websocket: WebSocket) -> None:
async for message in websocket.iter_text():
await websocket.send_text(message)

app.run()

:param openapi_spec: Optional OpenAPI spec dict. When provided, the spec
is served at ``GET /invocations/docs/openapi.json``.
:type openapi_spec: Optional[dict[str, Any]]
:param ws_ping_interval: Seconds between WebSocket protocol Ping frames
on ``/invocations_ws``. ``None`` (default) selects 30 s; ``0``
disables keep-alive. Configured on the underlying Hypercorn
server so the framing is opcode 0x9 / 0xA, not application JSON.
:type ws_ping_interval: Optional[float]
"""

_INSTRUMENTATION_SCOPE = "Azure.AI.AgentServer.Invocations"
@@ -128,15 +145,19 @@ def __init__(
self,
*,
openapi_spec: Optional[dict[str, Any]] = None,
ws_ping_interval: Optional[float] = None,
**kwargs: Any,
) -> None:
self._invoke_fn: Optional[Callable] = None
self._get_invocation_fn: Optional[Callable] = None
self._cancel_invocation_fn: Optional[Callable] = None
self._openapi_spec = openapi_spec

# Initialise WS handler slots (raises ValueError on a bad interval).
self._init_ws_state(ws_ping_interval)

# Build invocation routes and pass to parent via routes kwarg
invocation_routes = [
invocation_routes: list[Any] = [
Route(
"/invocations/docs/openapi.json",
self._get_openapi_spec_endpoint,
@@ -161,6 +182,7 @@ def __init__(
methods=["POST"],
name="cancel_invocation",
),
self._build_ws_route(self._ws_endpoint),
]

# Merge with any routes from sibling mixins via cooperative init
@@ -169,10 +191,53 @@

# --- Invocations startup configuration logging ---
logger.info(
"Invocations protocol: openapi_spec_configured=%s",
"Invocations protocol: openapi_spec_configured=%s, "
"ws_ping_interval=%s",
self._openapi_spec is not None,
(
"disabled"
if self._ws_ping_interval == 0
else f"{self._ws_ping_interval}s"
),
)

# ------------------------------------------------------------------
# Hypercorn server config (WebSocket Ping/Pong keep-alive)
# ------------------------------------------------------------------

def _build_hypercorn_config(self, host: str, port: int) -> object:
"""Extend the base Hypercorn config with the WebSocket Ping interval.

Hypercorn sends WS protocol Ping frames every
``websocket_ping_interval`` seconds on every active WebSocket
connection — exactly the keep-alive the ``invocations_ws`` spec
requires. ``ws_ping_interval=0`` leaves the default
``None`` (disabled).

:param host: Network interface to bind.
:type host: str
:param port: Port to bind.
:type port: int
:return: The configured Hypercorn config.
:rtype: hypercorn.config.Config
"""
config = super()._build_hypercorn_config(host, port)
if self._ws_ping_interval and self._ws_ping_interval > 0:
try:
# ``websocket_ping_interval`` is a float-or-None on
# Hypercorn ≥0.14; assigning a positive float enables
# protocol-level Ping frames.
config.websocket_ping_interval = self._ws_ping_interval # type: ignore[attr-defined]
except Exception: # pylint: disable=broad-exception-caught
# Hypercorn <0.14 does not support per-server WS ping —
# leave the default and warn so operators can upgrade.
logger.warning(
"Hypercorn does not support websocket_ping_interval; "
"WebSocket keep-alive will be best-effort.",
exc_info=True,
)
return config

# ------------------------------------------------------------------
# Handler decorators
# ------------------------------------------------------------------