[release/10.0] Fix KeyRingProvider thread pool starvation on cold start #66737
Open
DeagleGross wants to merge 2 commits into
* fix thread starvation
Contributor
Pull request overview
Backport to release/10.0 of #66683, which fixes a thread-pool starvation hang in KeyRingProvider on cold start. The previous async-refresh path scheduled refresh work onto TaskScheduler.Default and then blocked every caller on Task.Wait(); with a constrained pool and concurrent first-time Protect/Unprotect calls, all workers blocked waiting for a worker that didn't exist, freezing the app until the runtime's hill-climber injected more threads (~118 s in the reported repro).
Changes:
- On true cold start (no _cacheableKeyRing yet and no in-flight task), the first thread to acquire _cacheableKeyRingLockObj now runs GetCacheableKeyRing inline; other callers park on the lock and re-check the cache on entry.
- Removes the Debug.Assert(!forceRefresh, ...) on the stale-ring early-return and instead guards that branch with !forceRefresh, so forced refreshes never consume a stale ring (preserved behavior, now expressed in non-debug code).
- Adds a regression test asserting that on cold start the refresh delegate runs on the calling thread (encoding the invariant that prevents starvation, since timing assertions would be flaky).
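The cold-start behavior and the calling-thread invariant from the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `ColdStartCache`, its fields, and the demo in `Main` are hypothetical names, and a `string` stands in for the key ring.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the cold-start path; not the real KeyRingProvider.
public sealed class ColdStartCache
{
    private readonly object _lock = new object();
    private readonly Func<string> _load;
    private volatile string _cached; // null until the first load completes

    public ColdStartCache(Func<string> load) => _load = load;

    public string Get()
    {
        var cached = _cached;
        if (cached != null) return cached; // warm path: no lock taken

        lock (_lock)
        {
            // Callers that parked on the lock re-check the cache on entry.
            if (_cached != null) return _cached;

            // Cold start: load inline on the thread holding the lock,
            // so no extra thread-pool worker is needed to make progress.
            _cached = _load();
            return _cached;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        int loads = 0;
        int loaderThreadId = -1;
        var cache = new ColdStartCache(() =>
        {
            Interlocked.Increment(ref loads);
            loaderThreadId = Environment.CurrentManagedThreadId;
            Thread.Sleep(50); // simulate a slow load
            return "key-ring";
        });

        var callerIds = new int[8];
        var threads = new Thread[8];
        for (int i = 0; i < threads.Length; i++)
        {
            int slot = i;
            threads[i] = new Thread(() =>
            {
                callerIds[slot] = Environment.CurrentManagedThreadId;
                cache.Get();
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();

        // The loader ran once, and on one of the calling threads.
        Console.WriteLine($"loads = {loads}"); // loads = 1
        Console.WriteLine($"ran on a caller thread = {Array.IndexOf(callerIds, loaderThreadId) >= 0}"); // True
    }
}
```

Checking the thread identity rather than elapsed time mirrors the reasoning in the last bullet: the assertion is deterministic, whereas a timing threshold would depend on machine load.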
Show a summary per file
| File | Description |
|---|---|
| src/DataProtection/DataProtection/src/KeyManagement/KeyRingProvider.cs | Splits cold-start from stale-cache refresh in GetCurrentKeyRingCoreNew; cold start runs synchronously under lock; tightens the stale-ring early-return guard. |
| src/DataProtection/DataProtection/test/Microsoft.AspNetCore.DataProtection.Tests/KeyManagement/KeyRingProviderTests.cs | Adds regression test verifying the cold-start refresh executes on the calling thread. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 0
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
halter73 approved these changes on May 19, 2026
Backport of #66683 to release/10.0
KeyRingProvider.GetCurrentKeyRingCoreNew handles two states with one mechanism:
- Stale cache: the refresh is scheduled on TaskScheduler.Default; every caller takes the early-return and immediately gets the stale ring. Nobody blocks.
- Cold start: there is no stale ring to return, so every caller ends up in existingTask.Wait(), pinning a thread-pool thread on a task that needs a free thread-pool thread to run. With a constrained pool (e.g. ThreadPool.SetMaxThreads(16, …) and 16 concurrent Protect calls, exactly the issue's repro), every worker is parked waiting for a worker that doesn't exist. The runtime's hill climber eventually injects extra threads (~118 s in the report) and the app recovers, but during the freeze nothing makes progress.
The fix splits out the cold-start case (no stale cacheableKeyRing yet) and performs the synchronous load on the first thread to acquire the lock. Other callers wait on the lock, as in the old implementation.
Related #54675
Fixes #66380
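For contrast, the blocking shape described above looks roughly like the sketch below. It is an illustration only, not the code removed from KeyRingProvider; `LoadKeyRing`, `GetBlocking`, and `GetInline` are hypothetical names. A single call from the main thread completes normally; the stall arises only when every available pool worker is sitting in the same `Wait()`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Program
{
    // Hypothetical stand-in for the expensive key-ring load.
    private static string LoadKeyRing() => "key-ring";

    // Starvation-prone shape: the load is queued to the thread pool and the
    // caller blocks until a pool worker picks it up. If all N workers are
    // callers blocked here, the queued load needs an (N+1)th worker.
    private static string GetBlocking()
    {
        Task<string> refresh = Task.Factory.StartNew<string>(
            LoadKeyRing,
            CancellationToken.None,
            TaskCreationOptions.None,
            TaskScheduler.Default);
        refresh.Wait(); // caller is pinned until the queued load has run
        return refresh.Result;
    }

    // Inline shape: the caller does the work itself, so progress never
    // depends on a free pool worker.
    private static string GetInline() => LoadKeyRing();

    public static void Main()
    {
        Console.WriteLine(GetBlocking()); // key-ring
        Console.WriteLine(GetInline());   // key-ring
    }
}
```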
Customer Impact
A customer running ASP.NET Core on .NET 10 reported (#66380) that their app freezes on startup when several authenticated requests arrive concurrently. The app handles the first ~10 requests and then completely freezes for ~118 seconds with every thread-pool thread stuck inside KeyRingProvider.GetCurrentKeyRingCoreNew, blocked in Task.Wait().
Regression?
Risk
There is a workaround that customers can apply today without waiting for this fix:
Verification
Packaging changes reviewed?
When servicing release/2.3