Python: Add auto_retry.py sample for rate limit handling #4223
Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>
@copilot this is not what I asked. The first version, the class-based one, should not be a wrapper; it should be a class decorator that can take any [...]
Updated the sample so approach 1 is now a class decorator ( |
eavanvalkenburg left a comment
@copilot close, some minor comments
@@ -0,0 +1,238 @@
# Copyright (c) Microsoft. All rights reserved.
we need the header defining the extra dependency here
Added the PEP 723 script header with explicit dependencies at the top of auto_retry.py (including tenacity) in commit 23f98b7.
@with_rate_limit_retry()
class RetryingAzureOpenAIChatClient(AzureOpenAIChatClient):
do we need this, can't we just do client = with_rate_limit_retry(AzureOpenAIChatClient(...))?
Updated approach 1 to use instance wrapping directly (client = with_rate_limit_retry(AzureOpenAIChatClient(...))) and removed the extra decorated subclass in commit 23f98b7.
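The instance-wrapping form discussed in this thread could look roughly like the sketch below. It uses only the standard library: a hand-written backoff loop stands in for tenacity, and FakeChatClient and RateLimitError are hypothetical stand-ins for the real client and the provider's 429 exception. The keyword parameters retry_attempts and base_delay are assumptions made for the example, not the signature used in the PR.

```python
import asyncio


class RateLimitError(Exception):
    """Stand-in for the provider's HTTP 429 exception (hypothetical)."""


class FakeChatClient:
    """Hypothetical client that fails twice with a 429 before succeeding."""

    def __init__(self) -> None:
        self.calls = 0

    async def get_response(self, prompt: str) -> str:
        self.calls += 1
        if self.calls < 3:
            raise RateLimitError("429 Too Many Requests")
        return f"echo: {prompt}"


def with_rate_limit_retry(client, *, retry_attempts: int = 3, base_delay: float = 0.01):
    """Patch get_response on this one instance and return the same object."""
    original_get_response = client.get_response  # bound method, captured once

    async def get_response_with_retry(*args, **kwargs):
        for attempt in range(1, retry_attempts + 1):
            try:
                return await original_get_response(*args, **kwargs)
            except RateLimitError:
                if attempt == retry_attempts:
                    raise  # out of attempts: surface the original error
                # Exponential backoff: base_delay, 2x, 4x, ...
                await asyncio.sleep(base_delay * 2 ** (attempt - 1))

    # The instance attribute shadows the class method for this object only.
    client.get_response = get_response_with_retry
    return client


client = with_rate_limit_retry(FakeChatClient())
result = asyncio.run(client.get_response("hello"))
```

Because only the instance is patched, other instances of the same client class keep their original behavior.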
Pull request overview
Adds a new Python sample demonstrating practical retry patterns for rate-limited model calls (HTTP 429) using tenacity, aligned with the existing samples structure and intended to be runnable via PEP 723 tooling.
Changes:
- Added auto_retry.py sample showing (1) an instance-based client wrapper around get_response() and (2) chat middleware-based retries (class + function styles).
- Included PEP 723 inline script metadata to declare additional dependency on tenacity.
@retry(
    stop=stop_after_attempt(retry_attempts),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    retry=retry_if_exception_type(RateLimitError),
    reraise=True,
    before_sleep=before_sleep_log(logger, logging.WARNING),
)
async def _with_retry():
    return await original_get_response(*args, **kwargs)
This @retry(...) decorator is created inside get_response_with_retry(), so a new decorated callable (and inner coroutine) gets constructed on every call. Consider moving the retry wrapper construction outside the per-call path (or using AsyncRetrying) to avoid unnecessary overhead.
- @retry(
-     stop=stop_after_attempt(retry_attempts),
-     wait=wait_exponential(multiplier=1, min=4, max=10),
-     retry=retry_if_exception_type(RateLimitError),
-     reraise=True,
-     before_sleep=before_sleep_log(logger, logging.WARNING),
- )
- async def _with_retry():
-     return await original_get_response(*args, **kwargs)
+ async def _with_retry():
+     async for attempt in AsyncRetrying(
+         stop=stop_after_attempt(retry_attempts),
+         wait=wait_exponential(multiplier=1, min=4, max=10),
+         retry=retry_if_exception_type(RateLimitError),
+         reraise=True,
+         before_sleep=before_sleep_log(logger, logging.WARNING),
+     ):
+         with attempt:
+             return await original_get_response(*args, **kwargs)
Switched the class-decorator retry path from constructing a per-call @retry wrapper to AsyncRetrying iteration inside _with_retry, as suggested, in commit 01374e6.
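The overhead concern raised in this thread can be made concrete with a small standard-library experiment. Here simple_retry is a hypothetical, simplified stand-in for a retry decorator such as tenacity's @retry, and the counter exists only to show when decoration happens. The sketch illustrates the reviewer's first suggestion (building the wrapper outside the per-call path). It does not reproduce the AsyncRetrying change that was committed.

```python
import asyncio
import functools

wrappers_built = 0  # how many decorated callables have been constructed


class RateLimitError(Exception):
    """Stand-in for the provider's HTTP 429 exception (hypothetical)."""


def simple_retry(attempts: int, base_delay: float = 0.01):
    """Hypothetical, simplified stand-in for a retry decorator."""

    def decorate(fn):
        global wrappers_built
        wrappers_built += 1  # construction cost is paid here, once per decoration

        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return await fn(*args, **kwargs)
                except RateLimitError:
                    if attempt == attempts:
                        raise
                    await asyncio.sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorate


async def original_get_response(value: int) -> int:
    return value * 2


async def get_response_per_call(value: int) -> int:
    # Pattern flagged in the review: the decorator runs on every request.
    @simple_retry(attempts=3)
    async def _with_retry():
        return await original_get_response(value)

    return await _with_retry()


# Alternative: decorate once, outside the request path, and reuse the wrapper.
get_response_shared = simple_retry(attempts=3)(original_get_response)


async def count_constructions(call, requests: int = 5) -> int:
    """Return how many wrappers were built while serving `requests` calls."""
    before = wrappers_built
    for i in range(requests):
        await call(i)
    return wrappers_built - before


built_per_call = asyncio.run(count_constructions(get_response_per_call))
built_shared = asyncio.run(count_constructions(get_response_shared))
```

For a sample the difference is small in absolute terms, but the shared form also makes the retry policy visible in one place instead of being rebuilt inside each call.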
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "agent-framework[azure]",
The PEP 723 dependency string agent-framework[azure] appears to reference an azure extra that isn't defined by this repo’s published agent-framework package metadata (there’s no [project.optional-dependencies] in python/pyproject.toml). This will cause uv run/pip to fail with “extra not provided”. Use agent-framework (no extra) or a concrete package that actually exists (e.g., agent-framework-core) instead.
- #     "agent-framework[azure]",
+ #     "agent-framework",
Updated the PEP 723 dependency from agent-framework[azure] to agent-framework in commit 01374e6.
Updated approach 1 back to a class-decorator pattern ( |
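For comparison with the instance-wrapping form, a class-decorator version might be sketched as follows, again with the standard library only. The stream keyword, FakeChatClient, and the decorator's parameters are assumptions for illustration, and the real sample's signature may differ. The sketch shows the two behaviors described for this approach: non-streaming calls are retried on RateLimitError, while streaming calls are passed through unchanged.

```python
import asyncio


class RateLimitError(Exception):
    """Stand-in for the provider's HTTP 429 exception (hypothetical)."""


def with_rate_limit_retry(*, retry_attempts: int = 3, base_delay: float = 0.01):
    """Class decorator factory that patches get_response on the decorated class."""

    def decorate(cls):
        original_get_response = cls.get_response  # plain function on the class

        async def get_response(self, *args, stream: bool = False, **kwargs):
            if stream:
                # Streaming is passed through untouched in this sketch.
                return await original_get_response(self, *args, stream=True, **kwargs)
            for attempt in range(1, retry_attempts + 1):
                try:
                    return await original_get_response(self, *args, stream=False, **kwargs)
                except RateLimitError:
                    if attempt == retry_attempts:
                        raise
                    await asyncio.sleep(base_delay * 2 ** (attempt - 1))

        cls.get_response = get_response
        return cls

    return decorate


@with_rate_limit_retry(retry_attempts=3)
class FakeChatClient:
    """Hypothetical client that is rate-limited on its first two calls."""

    def __init__(self) -> None:
        self.calls = 0

    async def get_response(self, prompt: str, *, stream: bool = False):
        self.calls += 1
        if self.calls < 3:
            raise RateLimitError("429 Too Many Requests")
        return list(prompt) if stream else f"echo: {prompt}"


async def demo():
    retried = FakeChatClient()
    text = await retried.get_response("hi")  # retried until the third call

    streaming = FakeChatClient()
    try:
        await streaming.get_response("hi", stream=True)
        stream_failed_fast = False
    except RateLimitError:
        stream_failed_fast = True  # no retry on the streaming path
    return text, retried.calls, stream_failed_fast, streaming.calls


text, retried_calls, stream_failed_fast, streaming_calls = asyncio.run(demo())
```

Patching the class affects every instance, which is the main practical difference from wrapping a single client object.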
Motivation and Context
Every model inference API is rate-limited. Without built-in retry support in the framework, every consumer must write similar boilerplate. This sample provides practical retry patterns using tenacity so developers can focus on agent logic.
Description
Adds python/samples/02-agents/auto_retry.py demonstrating two approaches to automatic retry on RateLimitError (HTTP 429), with updates from review feedback:
Approach 1 – Class decorator
- Uses a class decorator (with_rate_limit_retry) that can be applied to any SupportsChatGetResponse-compatible client type and patches get_response() with retry behavior for non-streaming requests.
- Streaming calls are passed through unchanged (with a note that streaming retry requires more delicate handling).
- The retry flow in this path uses AsyncRetrying to avoid constructing a new decorated callable on each request.
Approach 2 – Chat middleware (two styles)
- RateLimitRetryMiddleware(ChatMiddleware) wraps await call_next() in AsyncRetrying.
- @chat_middleware decorated function wraps call_next() with a tenacity @retry-decorated inner async function.
Also adds the required PEP 723 inline script metadata header at the top of the sample to declare extra dependencies, using:
- agent-framework
- tenacity
Validation performed for this sample:
- python -m ruff check samples/02-agents/auto_retry.py
- python -m py_compile python/samples/02-agents/auto_retry.py
Contribution Checklist