
Python: Add auto_retry.py sample for rate limit handling #4223

Open
Copilot wants to merge 5 commits into main from copilot/add-rate-limiting-handling

Conversation

Contributor

Copilot AI commented Feb 24, 2026

Motivation and Context

Every model inference API is rate-limited. Without built-in retry support in the framework, every consumer must write similar boilerplate. This sample provides practical retry patterns using tenacity so developers can focus on agent logic.

Description

Adds python/samples/02-agents/auto_retry.py demonstrating two approaches to automatic retry on RateLimitError (HTTP 429), with updates from review feedback:

  1. Approach 1 – Class decorator
    Uses a class decorator (with_rate_limit_retry) that can be applied to any SupportsChatGetResponse-compatible client type and patches get_response() with retry behavior for non-streaming requests.
    Streaming calls are passed through unchanged (with a note that streaming retry requires more delicate handling).

    The retry flow in this path uses AsyncRetrying to avoid constructing a new decorated callable on each request.

  2. Approach 2 – Chat middleware (two styles)

    • Class-based: a RateLimitRetryMiddleware(ChatMiddleware) that wraps await call_next() in AsyncRetrying.
    • Function-based: a @chat_middleware-decorated function that wraps call_next() with a tenacity @retry-decorated inner async function.
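The control flow shared by both approaches (call, catch the 429 error, back off, retry, reraise on exhaustion) can be sketched without the framework or tenacity installed. The following is a minimal illustration of the class-decorator pattern only; FakeClient and this RateLimitError are hypothetical stand-ins for the real agent-framework/openai types, and the hand-rolled backoff loop stands in for tenacity's AsyncRetrying:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class RateLimitError(Exception):
    """Stand-in for openai.RateLimitError (HTTP 429)."""


def with_rate_limit_retry(client, retry_attempts=3, base_delay=0.01):
    """Patch client.get_response() with exponential-backoff retry.

    Mirrors the sample's shape; the actual sample uses tenacity's
    AsyncRetrying rather than this hand-rolled loop.
    """
    original_get_response = client.get_response

    async def get_response_with_retry(*args, **kwargs):
        for attempt in range(1, retry_attempts + 1):
            try:
                return await original_get_response(*args, **kwargs)
            except RateLimitError:
                if attempt == retry_attempts:
                    raise  # equivalent of tenacity's reraise=True
                wait = base_delay * 2 ** (attempt - 1)
                logger.warning(
                    "Rate limited on attempt %d; retrying in %.2fs", attempt, wait
                )
                await asyncio.sleep(wait)

    client.get_response = get_response_with_retry
    return client


class FakeClient:
    """Hypothetical client that fails twice with 429 before succeeding."""

    def __init__(self):
        self.calls = 0

    async def get_response(self, prompt):
        self.calls += 1
        if self.calls < 3:
            raise RateLimitError("429 Too Many Requests")
        return f"answer to {prompt!r}"


client = with_rate_limit_retry(FakeClient())
result = asyncio.run(client.get_response("hello"))
print(result)  # prints: answer to 'hello' (after two retried 429s)
```

Because the patch replaces the bound method on the instance, the same helper works on any object exposing an async get_response, which is the property the SupportsChatGetResponse protocol captures in the real sample.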

Also adds the required PEP 723 inline script metadata header at the top of the sample to declare extra dependencies, using:

  • agent-framework
  • tenacity
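For reference, a PEP 723 inline script metadata header declaring those two dependencies looks like the following (a sketch of the format; the exact contents of the merged sample may differ):

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "agent-framework",
#     "tenacity",
# ]
# ///
```

Tools such as `uv run auto_retry.py` read this comment block and install the listed dependencies before executing the script.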

Validation performed for this sample:

  • python -m ruff check samples/02-agents/auto_retry.py
  • python -m py_compile python/samples/02-agents/auto_retry.py

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.
Original prompt

This section details on the original issue you should resolve

<issue_title>Python: Add typical rate limiting handling for model clients</issue_title>
<issue_description>Every model inference API is rate limited these days, so any practical use of the method AzureOpenAIChatClient.create_agent needs to deal with retry logic, leading to boilerplate code.

I would like Agent Framework to solve this typical pattern, helping the developer focus on the rest. A simple implementation using the tenacity library would be as follows (but streaming needs more delicate handling):

```python
import logging

from agent_framework.azure import AzureOpenAIChatClient  # import path may vary by framework version
from openai import RateLimitError
from tenacity import (
    RetryCallState,
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)
from typing_extensions import override

logger = logging.getLogger(__name__)


class AzureOpenAIChatClientWithRetry(AzureOpenAIChatClient):
    """Azure OpenAI Chat Client with built-in retry logic for handling rate limits."""

    retry_attempts = 3
    """Number of retry attempts for rate limit errors."""

    @staticmethod
    def _before_sleep_log(retry_state: RetryCallState) -> None:
        """Log when rate limiting is reached and retry is about to sleep."""
        attempt_number = retry_state.attempt_number
        wait_time = retry_state.next_action.sleep if retry_state.next_action else 0
        logger.warning(
            "Rate limiting reached. Attempt %d failed. Retrying in %.2f seconds...",
            attempt_number,
            wait_time,
        )

    @override
    @retry(
        stop=stop_after_attempt(retry_attempts),
        wait=wait_exponential(multiplier=1, min=4, max=10),
        retry=retry_if_exception_type(RateLimitError),
        reraise=True,
        before_sleep=_before_sleep_log,
    )
    def get_response(self, *args, **kwargs):
        """Get response with retry on rate limit errors (429 status code only)."""
        return super().get_response(*args, **kwargs)
```
</issue_description>

<agent_instructions>Create a new sample in python/samples/02-agents called `auto_retry.py` and build a retry setup in two ways: 1) the way outlined in this issue, with a class wrapper that can take any ChatClient and wraps get_response with the `@retry` decorator, and 2) with a chat middleware that wraps the call to `call_next`</agent_instructions>

## Comments on the Issue (you are @copilot in this section)

<comments>
<comment_new><author>@eavanvalkenburg</author><body>
Given how simple this is to write, I don't think we need to do this uniformly, would you be willing to create a sample out of this @santiagxf ?</body></comment_new>
<comment_new><author>@eavanvalkenburg</author><body>
The reason we do not want to include this directly in is because there is no single standard (even though tenacity does come close) and so we won't want to get into a situation that means we have to support a number of variants, while the base implementation with any of them is quite simple.</body></comment_new>
</comments>


Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add rate limiting handling for model clients" to "Python: Add auto_retry.py sample for rate limit handling" on Feb 24, 2026
@eavanvalkenburg
Member

@copilot this is not what I asked. The first version, the class-based one, should not be a wrapper; it should be a class decorator that can take any SupportsGetResponse object and then wrap the get_response method within it with the retry decorator from tenacity, not what you have now.

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 25, 2026

@copilot this is not what I asked. The first version, a class based one, should not be a wrapped, it should be a class decorator that can take any SupportsGetResponse object and then w...

Updated the sample so approach 1 is now a class decorator (with_rate_limit_retry) that can be applied to any SupportsChatGetResponse client type and patches get_response with tenacity retry logic (instead of overriding in a subclass wrapper). Addressed in commit da740f3.

Member

@eavanvalkenburg eavanvalkenburg left a comment


@copilot close, some minor comments

@@ -0,0 +1,238 @@
# Copyright (c) Microsoft. All rights reserved.

Member


we need the header defining the extra dependency here

Contributor Author


Added the PEP 723 script header with explicit dependencies at the top of auto_retry.py (including tenacity) in commit 23f98b7.



@with_rate_limit_retry()
class RetryingAzureOpenAIChatClient(AzureOpenAIChatClient):
Member


do we need this, can't we just do client = with_rate_limit_retry(AzureOpenAIChatClient(...))?

Contributor Author


Updated approach 1 to use instance wrapping directly (client = with_rate_limit_retry(AzureOpenAIChatClient(...))) and removed the extra decorated subclass in commit 23f98b7.

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>
@eavanvalkenburg eavanvalkenburg marked this pull request as ready for review February 25, 2026 10:44
Copilot AI review requested due to automatic review settings February 25, 2026 10:44
Contributor

Copilot AI left a comment


Pull request overview

Adds a new Python sample demonstrating practical retry patterns for rate-limited model calls (HTTP 429) using tenacity, aligned with the existing samples structure and intended to be runnable via PEP 723 tooling.

Changes:

  • Added auto_retry.py sample showing (1) an instance-based client wrapper around get_response() and (2) chat middleware-based retries (class + function styles).
  • Included PEP 723 inline script metadata to declare additional dependency on tenacity.

Comment on lines 84 to 92
@retry(
    stop=stop_after_attempt(retry_attempts),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    retry=retry_if_exception_type(RateLimitError),
    reraise=True,
    before_sleep=before_sleep_log(logger, logging.WARNING),
)
async def _with_retry():
    return await original_get_response(*args, **kwargs)

Copilot AI Feb 25, 2026


This @retry(...) decorator is created inside get_response_with_retry(), so a new decorated callable (and inner coroutine) gets constructed on every call. Consider moving the retry wrapper construction outside the per-call path (or using AsyncRetrying) to avoid unnecessary overhead.

Suggested change
@retry(
    stop=stop_after_attempt(retry_attempts),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    retry=retry_if_exception_type(RateLimitError),
    reraise=True,
    before_sleep=before_sleep_log(logger, logging.WARNING),
)
async def _with_retry():
    return await original_get_response(*args, **kwargs)
async def _with_retry():
    async for attempt in AsyncRetrying(
        stop=stop_after_attempt(retry_attempts),
        wait=wait_exponential(multiplier=1, min=4, max=10),
        retry=retry_if_exception_type(RateLimitError),
        reraise=True,
        before_sleep=before_sleep_log(logger, logging.WARNING),
    ):
        with attempt:
            return await original_get_response(*args, **kwargs)

Contributor Author


Switched the class-decorator retry path from constructing a per-call @retry wrapper to AsyncRetrying iteration inside _with_retry, as suggested, in commit 01374e6.

# /// script
# requires-python = ">=3.10"
# dependencies = [
# "agent-framework[azure]",

Copilot AI Feb 25, 2026


The PEP 723 dependency string agent-framework[azure] appears to reference an azure extra that isn't defined by this repo’s published agent-framework package metadata (there’s no [project.optional-dependencies] in python/pyproject.toml). This will cause uv run/pip to fail with “extra not provided”. Use agent-framework (no extra) or a concrete package that actually exists (e.g., agent-framework-core) instead.

Suggested change
# "agent-framework[azure]",
# "agent-framework",

Contributor Author


Updated the PEP 723 dependency from agent-framework[azure] to agent-framework in commit 01374e6.

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 25, 2026

@copilot this is not what I asked. The first version, a class based one, should not be a wrapped, it should be a class decorator that can take any SupportsGetResponse object and then w...

Updated approach 1 back to a class-decorator pattern (with_rate_limit_retry) that patches get_response on a SupportsChatGetResponse client type in commit 01374e6.


Successfully merging this pull request may close these issues.

Python: Add typical rate limiting handling for model clients

5 participants