
Python: Add auto_retry.py sample for rate limit handling #4223

Open
Copilot wants to merge 5 commits into main from copilot/add-rate-limiting-handling

Conversation

Contributor

Copilot AI commented Feb 24, 2026

Motivation and Context

Every model inference API is rate-limited. Without built-in retry support in the framework, every consumer must write similar boilerplate. This sample provides practical retry patterns using tenacity so developers can focus on agent logic.

Description

Adds python/samples/02-agents/auto_retry.py demonstrating two approaches to automatic retry on RateLimitError (HTTP 429), with updates from review feedback:

  1. Approach 1 – Class decorator
    Uses a class decorator (with_rate_limit_retry) that can be applied to any SupportsChatGetResponse-compatible client type and patches get_response() with retry behavior for non-streaming requests.
    Streaming calls are passed through unchanged (with a note that streaming retry requires more delicate handling).

    The retry flow in this path uses AsyncRetrying to avoid constructing a new decorated callable on each request.

  2. Approach 2 – Chat middleware (two styles)

    • Class-based: a RateLimitRetryMiddleware(ChatMiddleware) that wraps await call_next() in AsyncRetrying.
    • Function-based: a @chat_middleware-decorated function that wraps call_next() with a tenacity @retry-decorated inner async function.
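The control flow shared by both approaches (call, catch the 429 error, back off, retry, reraise on exhaustion) can be sketched without the framework or tenacity installed. The following is a minimal illustration of the class-decorator pattern only; FakeClient and this RateLimitError are hypothetical stand-ins for the real agent-framework/openai types, and the hand-rolled backoff loop stands in for tenacity's AsyncRetrying:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class RateLimitError(Exception):
    """Stand-in for openai.RateLimitError (HTTP 429)."""


def with_rate_limit_retry(client, retry_attempts=3, base_delay=0.01):
    """Patch client.get_response() with exponential-backoff retry.

    Mirrors the sample's shape; the actual sample uses tenacity's
    AsyncRetrying rather than this hand-rolled loop.
    """
    original_get_response = client.get_response

    async def get_response_with_retry(*args, **kwargs):
        for attempt in range(1, retry_attempts + 1):
            try:
                return await original_get_response(*args, **kwargs)
            except RateLimitError:
                if attempt == retry_attempts:
                    raise  # equivalent of tenacity's reraise=True
                wait = base_delay * 2 ** (attempt - 1)
                logger.warning(
                    "Rate limited on attempt %d; retrying in %.2fs", attempt, wait
                )
                await asyncio.sleep(wait)

    client.get_response = get_response_with_retry
    return client


class FakeClient:
    """Hypothetical client that fails twice with 429 before succeeding."""

    def __init__(self):
        self.calls = 0

    async def get_response(self, prompt):
        self.calls += 1
        if self.calls < 3:
            raise RateLimitError("429 Too Many Requests")
        return f"answer to {prompt!r}"


client = with_rate_limit_retry(FakeClient())
result = asyncio.run(client.get_response("hello"))
print(result)  # prints: answer to 'hello' (after two retried 429s)
```

Because the patch replaces the bound method on the instance, the same helper works on any object exposing an async get_response, which is the property the SupportsChatGetResponse protocol captures in the real sample.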

Also adds the required PEP 723 inline script metadata header at the top of the sample to declare extra dependencies, using:

  • agent-framework
  • tenacity
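For reference, a PEP 723 inline script metadata header declaring those two dependencies looks like the following (a sketch of the format; the exact contents of the merged sample may differ):

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "agent-framework",
#     "tenacity",
# ]
# ///
```

Tools such as `uv run auto_retry.py` read this comment block and install the listed dependencies before executing the script.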

Validation performed for this sample:

  • python -m ruff check samples/02-agents/auto_retry.py
  • python -m py_compile python/samples/02-agents/auto_retry.py

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.
Original prompt

This section details on the original issue you should resolve

<issue_title>Python: Add typical rate limiting handling for model clients</issue_title>
<issue_description>Every model inference API is rate limited these days, so any practical use of the method AzureOpenAIChatClient.create_agent needs to deal with retry logic, leading to boilerplate code.

I would like Agent Framework to solve this typical pattern, helping the developer focus on the rest. A simple implementation using the tenacity library would be as follows (but streaming needs more delicate handling):

```python
import logging

from agent_framework.azure import AzureOpenAIChatClient  # import path may vary by framework version
from openai import RateLimitError
from tenacity import (
    RetryCallState,
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)
from typing_extensions import override

logger = logging.getLogger(__name__)


class AzureOpenAIChatClientWithRetry(AzureOpenAIChatClient):
    """Azure OpenAI Chat Client with built-in retry logic for handling rate limits."""

    retry_attempts = 3
    """Number of retry attempts for rate limit errors."""

    @staticmethod
    def _before_sleep_log(retry_state: RetryCallState) -> None:
        """Log when rate limiting is reached and retry is about to sleep."""
        attempt_number = retry_state.attempt_number
        wait_time = retry_state.next_action.sleep if retry_state.next_action else 0
        logger.warning(
            "Rate limiting reached. Attempt %d failed. Retrying in %.2f seconds...",
            attempt_number,
            wait_time,
        )

    @override
    @retry(
        stop=stop_after_attempt(retry_attempts),
        wait=wait_exponential(multiplier=1, min=4, max=10),
        retry=retry_if_exception_type(RateLimitError),
        reraise=True,
        before_sleep=_before_sleep_log,
    )
    def get_response(self, *args, **kwargs):
        """Get response with retry on rate limit errors (429 status code only)."""
        return super().get_response(*args, **kwargs)
```
</issue_description>

<agent_instructions>Create a new sample in python/samples/02-agents called `auto_retry.py` and build a retry setup in two ways: 1) the way outlined in this issue, with a class wrapper that can take any ChatClient and wraps get_response with the `@retry` decorator, and 2) with a chat middleware that wraps the call to `call_next`</agent_instructions>

## Comments on the Issue (you are @copilot in this section)

<comments>
<comment_new><author>@eavanvalkenburg</author><body>
Given how simple this is to write, I don't think we need to do this uniformly, would you be willing to create a sample out of this @santiagxf ?</body></comment_new>
<comment_new><author>@eavanvalkenburg</author><body>
The reason we do not want to include this directly in is because there is no single standard (even though tenacity does come close) and so we won't want to get into a situation that means we have to support a number of variants, while the base implementation with any of them is quite simple.</body></comment_new>
</comments>


Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add rate limiting handling for model clients" to "Python: Add auto_retry.py sample for rate limit handling" on Feb 24, 2026
@eavanvalkenburg
Member

@copilot this is not what I asked. The first version, the class-based one, should not be a wrapper; it should be a class decorator that can take any SupportsGetResponse object and then wrap the get_response method within it with the retry decorator from tenacity, not what you have now.

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 25, 2026

@copilot this is not what I asked. The first version, a class based one, should not be a wrapped, it should be a class decorator that can take any SupportsGetResponse object and then w...

Updated the sample so approach 1 is now a class decorator (with_rate_limit_retry) that can be applied to any SupportsChatGetResponse client type and patches get_response with tenacity retry logic (instead of overriding in a subclass wrapper). Addressed in commit da740f3.

Member

@eavanvalkenburg eavanvalkenburg left a comment


@copilot close, some minor comments

@@ -0,0 +1,238 @@
# Copyright (c) Microsoft. All rights reserved.

Member


we need the header defining the extra dependency here

Contributor Author


Added the PEP 723 script header with explicit dependencies at the top of auto_retry.py (including tenacity) in commit 23f98b7.



@with_rate_limit_retry()
class RetryingAzureOpenAIChatClient(AzureOpenAIChatClient):
Member


do we need this, can't we just do client = with_rate_limit_retry(AzureOpenAIChatClient(...))?

Contributor Author


Updated approach 1 to use instance wrapping directly (client = with_rate_limit_retry(AzureOpenAIChatClient(...))) and removed the extra decorated subclass in commit 23f98b7.

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>
@eavanvalkenburg eavanvalkenburg marked this pull request as ready for review February 25, 2026 10:44
Copilot AI review requested due to automatic review settings February 25, 2026 10:44
Contributor

Copilot AI left a comment


Pull request overview

Adds a new Python sample demonstrating practical retry patterns for rate-limited model calls (HTTP 429) using tenacity, aligned with the existing samples structure and intended to be runnable via PEP 723 tooling.

Changes:

  • Added auto_retry.py sample showing (1) an instance-based client wrapper around get_response() and (2) chat middleware-based retries (class + function styles).
  • Included PEP 723 inline script metadata to declare additional dependency on tenacity.

Comment on lines 84 to 92
@retry(
    stop=stop_after_attempt(retry_attempts),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    retry=retry_if_exception_type(RateLimitError),
    reraise=True,
    before_sleep=before_sleep_log(logger, logging.WARNING),
)
async def _with_retry():
    return await original_get_response(*args, **kwargs)

Copilot AI Feb 25, 2026


This @retry(...) decorator is created inside get_response_with_retry(), so a new decorated callable (and inner coroutine) gets constructed on every call. Consider moving the retry wrapper construction outside the per-call path (or using AsyncRetrying) to avoid unnecessary overhead.

Suggested change
@retry(
    stop=stop_after_attempt(retry_attempts),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    retry=retry_if_exception_type(RateLimitError),
    reraise=True,
    before_sleep=before_sleep_log(logger, logging.WARNING),
)
async def _with_retry():
    return await original_get_response(*args, **kwargs)
async def _with_retry():
    async for attempt in AsyncRetrying(
        stop=stop_after_attempt(retry_attempts),
        wait=wait_exponential(multiplier=1, min=4, max=10),
        retry=retry_if_exception_type(RateLimitError),
        reraise=True,
        before_sleep=before_sleep_log(logger, logging.WARNING),
    ):
        with attempt:
            return await original_get_response(*args, **kwargs)

Contributor Author


Switched the class-decorator retry path from constructing a per-call @retry wrapper to AsyncRetrying iteration inside _with_retry, as suggested, in commit 01374e6.

# /// script
# requires-python = ">=3.10"
# dependencies = [
# "agent-framework[azure]",

Copilot AI Feb 25, 2026


The PEP 723 dependency string agent-framework[azure] appears to reference an azure extra that isn't defined by this repo’s published agent-framework package metadata (there’s no [project.optional-dependencies] in python/pyproject.toml). This will cause uv run/pip to fail with “extra not provided”. Use agent-framework (no extra) or a concrete package that actually exists (e.g., agent-framework-core) instead.

Suggested change
# "agent-framework[azure]",
# "agent-framework",

Contributor Author


Updated the PEP 723 dependency from agent-framework[azure] to agent-framework in commit 01374e6.

Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 25, 2026

@copilot this is not what I asked. The first version, a class based one, should not be a wrapped, it should be a class decorator that can take any SupportsGetResponse object and then w...

Updated approach 1 back to a class-decorator pattern (with_rate_limit_retry) that patches get_response on a SupportsChatGetResponse client type in commit 01374e6.


Successfully merging this pull request may close these issues.

Python: Add typical rate limiting handling for model clients

5 participants