fix: retry transient network errors during OAuth token refresh#987
Conversation
Add ConnectionError, ConnectTimeout, and ReadTimeout to the backoff retry decorator in _make_handled_request so transient network failures during OAuth token refresh are automatically retried. When retries are exhausted, wrap the exception in AirbyteTracedException with FailureType.transient_error and a clear user-facing message instead of exposing raw Python exception class names. Also wrap generic exceptions in _make_handled_request with AirbyteTracedException and FailureType.system_error for consistent structured error reporting. Co-Authored-By: bot_apk <apk@cognition.ai>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksTesting This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1776051256-fix-oauth-connection-error-retry#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1776051256-fix-oauth-connection-error-retryPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
The test_cloud_read_with_private_endpoint test asserted the old generic Exception message. Updated to match the new AirbyteTracedException message from _make_handled_request. Co-Authored-By: bot_apk <apk@cognition.ai>
There was a problem hiding this comment.
Pull request overview
This PR improves OAuth token refresh robustness in the CDK by retrying transient network failures and ensuring token refresh failures are surfaced via structured AirbyteTracedExceptions with appropriate FailureTypes.
Changes:
- Retry OAuth token refresh on transient
requestsnetwork exceptions (ConnectionError,ConnectTimeout,ReadTimeout) using exponential backoff. - Wrap exhausted-retry network failures as
AirbyteTracedExceptionwithFailureType.transient_error, and wrap unexpected failures asFailureType.system_error. - Add/adjust unit tests to validate retry + error wrapping behavior and update connector builder test expectations for the new error message.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py |
Expands backoff retryable exceptions for token refresh and standardizes exception wrapping via AirbyteTracedException. |
unit_tests/sources/declarative/auth/test_oauth.py |
Adds tests covering retry behavior and correct failure-type classification during OAuth refresh. |
unit_tests/connector_builder/test_connector_builder_handler.py |
Updates expected error substring to match the new structured OAuth refresh failure message. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return DeclarativeOauth2Authenticator( | ||
| token_refresh_endpoint="{{ config['refresh_endpoint'] }}", | ||
| client_id="{{ config['client_id'] }}", | ||
| client_secret="{{ config['client_secret'] }}", | ||
| refresh_token="{{ parameters['refresh_token'] }}", | ||
| config=config, | ||
| token_expiry_date="{{ config['token_expiry_date'] }}", | ||
| parameters=parameters, |
There was a problem hiding this comment.
Good point about potential test order-dependency from shared mutable config/parameters dicts. That said, this follows the same pattern used by all other test classes in this file (e.g., TestDeclarativeOauth2Authenticator, TestSingleUseRefreshTokenOauth2Authenticator) which also reference the module-level dicts directly. The new tests don't mutate config or parameters, so they're safe from ordering issues in practice. Using deepcopy would be a nice defensive improvement but would be better done as a broader cleanup across the entire test module rather than just in this class.
| raise AirbyteTracedException( | ||
| message="OAuth access token refresh request failed.", | ||
| internal_message=f"Unexpected error during OAuth token refresh: {e}", | ||
| failure_type=FailureType.system_error, | ||
| ) from e |
There was a problem hiding this comment.
Valid observation — the docstring's Raises: section should be updated to reflect AirbyteTracedException instead of the generic Exception. This is a minor documentation fix that can be addressed in a follow-up.
Summary
The CDK's OAuth token refresh mechanism (
_make_handled_requestinabstract_oauth.py) did not retry on transient network errors likeConnectionError,ConnectTimeout, orReadTimeout. The@backoff.on_exceptiondecorator only retried onDefaultBackoffException(raised for HTTP 429/5xx responses). Transient network errors — such asConnectionResetErrorduring SSL handshake — fell through the exception handler and were re-raised without retry, producing a generic message like:This was classified as
system_errorinstead oftransient_error, which could prevent automatic retry at the platform level and cause unnecessary oncall alerts.Changes
Added transient network exceptions to the backoff retry decorator —
requests.exceptions.ConnectionError,ConnectTimeout, andReadTimeoutare now retried with exponential backoff (up to 300s), consistent withTRANSIENT_EXCEPTIONSinrate_limiting.pywhich already handles these for regular HTTP stream requests.Wrapped exhausted-retry network errors in
AirbyteTracedException— When retries are exhausted,refresh_access_tokencatches the transient exception and wraps it inAirbyteTracedExceptionwithFailureType.transient_errorand a clear user-facing message:"OAuth access token refresh request failed due to a network error."Wrapped generic exceptions with
AirbyteTracedException— The catch-allexcept Exceptionblock in_make_handled_requestnow raisesAirbyteTracedExceptionwithFailureType.system_errorinstead of a bareException, ensuring consistent structured error reporting.Resolves https://github.com/airbytehq/airbyte-internal-issues/issues/16179
Review & Testing Checklist for Human
refresh_access_tokenwrapper correctly catches only transient network exceptions (not allRequestExceptionsubclasses)except Exception→AirbyteTracedException(system_error)change in_make_handled_requestdoesn't alter behavior for existing exception types that should propagate differentlysource-hubspot) and verify token refresh still works normallyConnectionErrorduring token refresh to confirm retry + proper error messageNotes
source-hubspotrate_limiting.pyline 19-25 which definesTRANSIENT_EXCEPTIONSincludingConnectionErrorfor regular HTTP stream requestsLink to Devin session: https://app.devin.ai/sessions/c8765b364f2d49c6a3d6e94361bba420