azure: support managed identity and fix a goroutine hang issue (#3094) #4337

Open
ti-chi-bot wants to merge 1 commit into pingcap:release-8.5 from ti-chi-bot:cherry-pick-3094-to-release-8.5

Conversation

@ti-chi-bot
Member

This is an automated cherry-pick of #3094

What problem does this PR solve?

Issue Number: close #3093

TiCDC writes directly to Azure Blob Storage for the azblob sink. This PR adds Azure Managed Identity / Workload Identity token auth support and fixes a potential goroutine hang in cloud storage uploads.

What is changed and how it works?

  • Support Azure token-based auth (Managed Identity / Workload Identity) for azblob sinks.
  • Add timeout/cancellation handling for streaming/multipart uploads to avoid stuck goroutines.

Check List

Tests

  • Unit test
  • Manual test (E2E on Azure)

Manual test steps:

  1. Deploy a TiCDC and TiDB cluster on Azure with Managed Identity / Workload Identity enabled, and configure an azblob sink pointing to Azure Blob Storage.
  2. Run a changefeed and generate writes.
  3. Verify objects are uploaded to the target container/prefix and the changefeed continues running over time.

Questions

Will it cause performance regression or break compatibility?

No expected regression; existing SAS-based configs remain supported.

Do you need to update user documentation, design documentation or monitoring documentation?

N/A

Release note

Support Azure Managed Identity authentication for azblob sinks and avoid potential hangs during cloud storage uploads.

@ti-chi-bot ti-chi-bot added the do-not-merge/hold, lgtm, release-note, size/XXL, and type/cherry-pick-for-release-8.5 labels on Mar 3, 2026
@ti-chi-bot

ti-chi-bot bot commented Mar 3, 2026

This cherry pick PR is for a release branch and has not yet been approved by triage owners.
Adding the do-not-merge/cherry-pick-not-approved label.

To merge this cherry pick:

  1. It must first be LGTMed and approved by the reviewers.
  2. For pull requests to TiDB-x branches, it must have no failed tests.
  3. AFTER it has lgtm and approved labels, please wait for the cherry-pick merging approval from triage owners.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot
Member Author

@wlwilliamx This PR has conflicts, so I have put it on hold.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

@ti-chi-bot

ti-chi-bot bot commented Mar 3, 2026

@ti-chi-bot: If you want to know how to resolve the conflicts, please read the guide in the TiDB Dev Guide.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the reliability and security of TiCDC's integration with Azure Blob Storage. It introduces modern authentication mechanisms for Azure sinks, allowing for more secure and manageable deployments. Additionally, it addresses a critical issue where cloud storage upload operations could hang indefinitely, by implementing robust timeout and cancellation logic, thereby enhancing the overall stability of data replication.

Highlights

  • Azure Managed Identity Support: Added support for Azure Managed Identity and Workload Identity token-based authentication for azblob sinks, enhancing security and integration with Azure services.
  • Goroutine Hang Fix: Implemented timeout and cancellation handling for streaming and multipart cloud storage uploads to prevent potential goroutine hangs during network stalls or prolonged operations.
  • Dependency Updates: Updated several Go dependencies, including Azure SDKs, goccy/go-json, pierrec/lz4/v4, and stretchr/testify, to newer versions.

Changelog
  • go.mod
    • Updated github.com/Azure/azure-sdk-for-go/sdk/azcore from v1.16.0 to v1.20.0.
    • Updated github.com/goccy/go-json from v0.10.2 to v0.10.4.
    • Updated github.com/pierrec/lz4/v4 from v4.1.18 to v4.1.21.
    • Updated github.com/stretchr/testify from v1.10.0 to v1.11.1.
    • Updated github.com/Azure/azure-sdk-for-go/sdk/azidentity from v1.7.0 to v1.10.1.
    • Updated github.com/AzureAD/microsoft-authentication-library-for-go from v1.2.2 to v1.4.2.
    • Updated github.com/google/flatbuffers from v2.0.8+incompatible to v24.3.25+incompatible.
    • Updated github.com/hamba/avro/v2 from v2.22.2 to v2.27.0.
    • Updated github.com/klauspost/cpuid/v2 from v2.2.7 to v2.2.9.
    • Added github.com/spf13/afero v1.14.0 as an indirect dependency.
    • Removed github.com/xitongsys/parquet-go indirect dependency.
    • Adjusted various pingcap and golang.org/x module versions, and included a gRPC downgrade in the replace directives.
  • pkg/util/external_storage.go
    • Refactored error messages in getExternalStorage for broader applicability.
    • Introduced withTimeoutIfNoDeadline to conditionally apply timeouts to contexts lacking a deadline.
    • Modified the Create method of extStorageWithTimeout to wrap ExternalFileWriter with custom timeout and cancellation logic.
    • Added writerWithCancelAndTimeout struct with Write and Close methods that enforce timeouts and ensure cancellation of the Create() context for multipart uploads.
  • pkg/util/external_storage_test.go
    • Added blockingCtxWriter and blockingCreateCtxWriter to simulate blocking I/O operations for testing purposes.
    • Introduced mockCreateExternalStorage to facilitate testing of the Create method's context handling.
    • Added TestExtStorageCreateWriterWriteTimeout to verify that streaming writes respect default timeouts.
    • Added TestExtStorageCreateMultipartWriteCancelsCreateCtxOnTimeout to confirm that multipart writes cancel the Create() context upon timeout.
@coderabbitai
Contributor

coderabbitai bot commented Mar 3, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for Azure Managed Identity authentication for azblob sinks and fixes a potential goroutine hang during cloud storage uploads by adding timeout and cancellation handling. The changes to pkg/util/external_storage.go are well-implemented and include corresponding unit tests. However, the go.mod file contains multiple unresolved merge conflicts which are critical and must be fixed before this PR can be merged.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@wlwilliamx wlwilliamx force-pushed the cherry-pick-3094-to-release-8.5 branch from e3b6dd2 to 6866a01 Compare March 6, 2026 10:03
@ti-chi-bot ti-chi-bot bot added the size/L label and removed the size/XXL label on Mar 6, 2026
@ti-chi-bot

ti-chi-bot bot commented Mar 6, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wlwilliamx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Mar 6, 2026
@wlwilliamx
Collaborator

/test all

@wlwilliamx
Collaborator

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold label on Mar 6, 2026
@ti-chi-bot

ti-chi-bot bot commented Mar 6, 2026

@ti-chi-bot: The following tests failed. Say /retest to rerun all failed tests, or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cdc-storage-integration-heavy | 6866a01 | link | true | /test pull-cdc-storage-integration-heavy |
| pull-cdc-pulsar-integration-light | 6866a01 | link | false | /test pull-cdc-pulsar-integration-light |
| pull-cdc-kafka-integration-heavy | 6866a01 | link | true | /test pull-cdc-kafka-integration-heavy |
| pull-cdc-storage-integration-light | 6866a01 | link | true | /test pull-cdc-storage-integration-light |
| pull-cdc-kafka-integration-light | 6866a01 | link | true | /test pull-cdc-kafka-integration-light |
| pull-cdc-mysql-integration-light | 6866a01 | link | true | /test pull-cdc-mysql-integration-light |
| pull-cdc-mysql-integration-heavy | 6866a01 | link | true | /test pull-cdc-mysql-integration-heavy |

Labels

approved, do-not-merge/cherry-pick-not-approved, lgtm, release-note, size/L, type/cherry-pick-for-release-8.5

2 participants