Skip to content

[release/10.0] Fix KeyRingProvider thread pool starvation on cold start#66737

Open
DeagleGross wants to merge 2 commits into
dotnet:release/10.0from
DeagleGross:dmkorolev/rel-10/keyringprovider-threadpool-starvation
Open

[release/10.0] Fix KeyRingProvider thread pool starvation on cold start#66737
DeagleGross wants to merge 2 commits into
dotnet:release/10.0from
DeagleGross:dmkorolev/rel-10/keyringprovider-threadpool-starvation

Conversation

@DeagleGross
Copy link
Copy Markdown
Member

Backport of #66683 to release/10.0

KeyRingProvider.GetCurrentKeyRingCoreNew handles two states with one mechanism:

  • State A — stale ring exists. The cached ring expired but a previous value is still in the field. Refresh work is dispatched onto TaskScheduler.Default; every caller takes the early-return and immediately gets the stale ring. Nobody blocks.
  • State B — no ring at all (cold start). Same dispatch path runs, but now there is no stale ring to fall back on, so every caller falls through to existingTask.Wait() — pinning a thread-pool thread on a task that needs a free thread-pool thread to run. With a constrained pool (e.g. ThreadPool.SetMaxThreads(16, …) and 16 concurrent Protect calls — exactly the issue's repro), every worker is parked waiting for a worker that doesn't exist. The runtime's hill climber eventually injects extra threads (~118 s in the report) and the app recovers, but during the freeze nothing makes progress.

Fix is to split up cold-start (no stale cacheableKeyRing yet) and do the synchronous load on the first thread acquiring lock. Others will be waiting on lock as in the old implementation.

Related #54675
Fixes #66380

Customer Impact

A customer running ASP.NET Core on .NET 10 reported (#66380) that their app freezes on startup when several authenticated requests arrive concurrently. The app handles the first ~10 requests and then completely freezes for ~118 seconds with every thread‑pool thread stuck inside KeyRingProvider.GetCurrentKeyRingCoreNew, blocked in Task.Wait().

Regression?

  • Yes
  • No

Risk

There is a workaround that customers can apply today without waiting for this fix:

AppContext.SetSwitch("Microsoft.AspNetCore.DataProtection.KeyManagement.DisableAsyncKeyRingUpdate", true);
  • High
  • Medium
  • Low

Verification

  • Manual (required)
  • Automated

Packaging changes reviewed?

  • Yes
  • No
  • N/A

When servicing release/2.3

  • Make necessary changes in eng/PatchConfig.props

@DeagleGross DeagleGross self-assigned this May 19, 2026
Copilot AI review requested due to automatic review settings May 19, 2026 10:50
@DeagleGross DeagleGross added the area-dataprotection Includes: DataProtection label May 19, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Backport to release/10.0 of #66683, which fixes a thread-pool starvation hang in KeyRingProvider on cold start. The previous async-refresh path scheduled refresh work onto TaskScheduler.Default and then blocked every caller on Task.Wait(); with a constrained pool and concurrent first-time Protect/Unprotect calls, all workers blocked waiting for a worker that didn't exist, freezing the app until the runtime's hill-climber injected more threads (~118 s in the reported repro).

Changes:

  • On true cold start (no _cacheableKeyRing yet and no in-flight task), the first thread to acquire _cacheableKeyRingLockObj now runs GetCacheableKeyRing inline; other callers park on the lock and re-check the cache on entry.
  • Removes the Debug.Assert(!forceRefresh, ...) on the stale-ring early-return and instead guards that branch with !forceRefresh, so forced refreshes never consume a stale ring (preserved behavior, now expressed in non-debug code).
  • Adds a regression test asserting that on cold start the refresh delegate runs on the calling thread (encoding the invariant that prevents starvation, since timing assertions would be flaky).
Show a summary per file
File Description
src/DataProtection/DataProtection/src/KeyManagement/KeyRingProvider.cs Splits cold-start from stale-cache refresh in GetCurrentKeyRingCoreNew; cold start runs synchronously under lock; tightens the stale-ring early-return guard.
src/DataProtection/DataProtection/test/Microsoft.AspNetCore.DataProtection.Tests/KeyManagement/KeyRingProviderTests.cs Adds regression test verifying the cold-start refresh executes on the calling thread.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 0

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-dataprotection Includes: DataProtection

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants