[release/9.0] Fix KeyRingProvider thread pool starvation on cold start #66736

Open: DeagleGross wants to merge 2 commits into dotnet:release/9.0 from DeagleGross:dmkorolev/rel-9/keyringprovider-threadpool-starvation

@DeagleGross
Member

Backport of #66683 to release/9.0

KeyRingProvider.GetCurrentKeyRingCoreNew handles two states with one mechanism:

  • State A — stale ring exists. The cached ring expired but a previous value is still in the field. Refresh work is dispatched onto TaskScheduler.Default; every caller takes the early-return and immediately gets the stale ring. Nobody blocks.
  • State B — no ring at all (cold start). Same dispatch path runs, but now there is no stale ring to fall back on, so every caller falls through to existingTask.Wait() — pinning a thread-pool thread on a task that needs a free thread-pool thread to run. With a constrained pool (e.g. ThreadPool.SetMaxThreads(16, …) and 16 concurrent Protect calls — exactly the issue's repro), every worker is parked waiting for a worker that doesn't exist. The runtime's hill climber eventually injects extra threads (~118 s in the report) and the app recovers, but during the freeze nothing makes progress.
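The two states can be illustrated with a deliberately simplified model of the pre-fix flow. This is a sketch only, not the actual KeyRingProvider code: the class `PreFixRingCache`, its fields, and the `MarkExpired` helper are hypothetical names, and a plain string stands in for the key ring.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical model of the pre-fix behavior; names are illustrative only.
public sealed class PreFixRingCache
{
    private readonly object _lock = new object();
    private readonly Func<string> _load;   // stands in for the expensive key ring load
    private string? _ring;                 // null until the first load completes
    private bool _expired;                 // true once the cached ring needs a refresh
    private Task<string>? _refreshTask;    // in-flight refresh shared by all callers

    public PreFixRingCache(Func<string> load) => _load = load;

    public void MarkExpired() { lock (_lock) { _expired = true; } }

    public string Get()
    {
        Task<string> task;
        lock (_lock)
        {
            if (_ring != null && !_expired) return _ring;   // fresh ring: fast path

            if (_refreshTask == null)
            {
                // One mechanism for both states: always dispatch to the thread pool.
                _refreshTask = Task.Factory.StartNew(() =>
                {
                    string fresh = _load();
                    lock (_lock) { _ring = fresh; _expired = false; _refreshTask = null; }
                    return fresh;
                }, CancellationToken.None, TaskCreationOptions.None, TaskScheduler.Default);
            }

            if (_ring != null) return _ring;   // State A: hand out the stale ring, nobody blocks
            task = _refreshTask;
        }

        // State B: no ring to fall back on, so the caller blocks on the pool task.
        // If every pool thread sits here, the task may have no free thread to run on.
        task.Wait();
        return task.Result;
    }
}
```

Called from a single non-pool thread this model completes normally; the starvation described above only appears when the blocked callers are themselves the thread-pool workers that the refresh task needs.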

The fix splits out the cold-start case (no stale cacheableKeyRing yet): the first thread to acquire the lock performs the load synchronously, and the other callers wait on the lock, as in the old implementation.
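A minimal sketch of that split, under the same simplifying assumptions as the description: `PostFixRingCache`, its fields, and `MarkExpired` are hypothetical names, a string stands in for the key ring, and the real change in KeyRingProvider.cs may be structured differently.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical model of the post-fix behavior; names are illustrative only.
public sealed class PostFixRingCache
{
    private readonly object _lock = new object();
    private readonly Func<string> _load;   // stands in for the expensive key ring load
    private string? _ring;                 // null until the first load completes
    private bool _expired;                 // true once the cached ring needs a refresh
    private Task? _refreshTask;            // in-flight background refresh, if any

    public PostFixRingCache(Func<string> load) => _load = load;

    public void MarkExpired() { lock (_lock) { _expired = true; } }

    public string Get()
    {
        lock (_lock)
        {
            if (_ring != null && !_expired) return _ring;   // fresh ring: fast path

            if (_ring == null)
            {
                // Cold start: load inline on the calling thread while holding the lock.
                // Concurrent callers queue on the lock instead of on a pool task.
                _ring = _load();
                _expired = false;
                return _ring;
            }

            // Stale ring: refresh in the background and return the stale value.
            if (_refreshTask == null)
            {
                _refreshTask = Task.Factory.StartNew(() => Refresh(),
                    CancellationToken.None, TaskCreationOptions.None, TaskScheduler.Default);
            }
            return _ring;
        }
    }

    private void Refresh()
    {
        string fresh = _load();
        lock (_lock) { _ring = fresh; _expired = false; _refreshTask = null; }
    }
}
```

In this model the cold-start load never depends on a free thread-pool thread, so a saturated pool can no longer deadlock on it; the cost is that the first caller holds the lock for the duration of the load.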

Related #54675
Fixes #66380

Customer Impact

A customer running ASP.NET Core on .NET 10 reported (#66380) that their app freezes on startup when several authenticated requests arrive concurrently. The app handles the first ~10 requests and then completely freezes for ~118 seconds with every thread‑pool thread stuck inside KeyRingProvider.GetCurrentKeyRingCoreNew, blocked in Task.Wait().

Regression?

  • Yes
  • No

Risk

There is a workaround that customers can apply today without waiting for this fix:

AppContext.SetSwitch("Microsoft.AspNetCore.DataProtection.KeyManagement.DisableAsyncKeyRingUpdate", true);

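As a sketch of how the workaround might be applied, the helper below sets the switch and reads it back to confirm it took effect. The switch name is copied from the line above; the `DataProtectionWorkaround` class is a hypothetical wrapper, and calling it as early as possible (for example at the top of Program.cs, before any Protect or Unprotect call) is an assumption, since this description does not state when the switch is read.

```csharp
using System;

// Hypothetical helper; only AppContext.SetSwitch/TryGetSwitch are real APIs.
public static class DataProtectionWorkaround
{
    // Switch name as given in the PR description.
    public const string SwitchName =
        "Microsoft.AspNetCore.DataProtection.KeyManagement.DisableAsyncKeyRingUpdate";

    // Sets the switch and returns true if it reads back as enabled.
    public static bool Apply()
    {
        AppContext.SetSwitch(SwitchName, true);
        return AppContext.TryGetSwitch(SwitchName, out bool enabled) && enabled;
    }
}
```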
  • High
  • Medium
  • Low

Verification

  • Manual (required)
  • Automated

Packaging changes reviewed?

  • Yes
  • No
  • N/A

When servicing release/2.3

  • Make necessary changes in eng/PatchConfig.props

@DeagleGross DeagleGross self-assigned this May 19, 2026
Copilot AI review requested due to automatic review settings May 19, 2026 10:49
@DeagleGross DeagleGross added the area-dataprotection Includes: DataProtection label May 19, 2026
@dotnet-policy-service dotnet-policy-service Bot added this to the 9.0.x milestone May 19, 2026
@dotnet-policy-service
Contributor

Hi @DeagleGross. If this is not a tell-mode PR, please make sure to follow the instructions laid out in the servicing process document.
Otherwise, please add tell-mode label.

Contributor

Copilot AI left a comment


Pull request overview

This PR backports a fix to the DataProtection KeyRingProvider cache-refresh logic to prevent thread-pool starvation on cold start when there is no cached key ring yet (startup / first requests).

Changes:

  • Detect true cold-start (no cached key ring) and perform the initial key ring load synchronously on the calling thread while holding the lock, avoiding TaskScheduler.Default + Task.Wait() starvation.
  • Preserve the existing “stale ring exists” behavior by continuing to dispatch refresh asynchronously so callers can return the stale ring without blocking.
  • Add a regression unit test that asserts cold-start refresh work runs on the calling thread (invariant preventing starvation).

| File | Description |
| --- | --- |
| src/DataProtection/DataProtection/src/KeyManagement/KeyRingProvider.cs | Splits cold-start vs stale-refresh behavior so cold start loads inline under lock, avoiding thread-pool starvation. |
| src/DataProtection/DataProtection/test/Microsoft.AspNetCore.DataProtection.Tests/KeyManagement/KeyRingProviderTests.cs | Adds a regression test asserting cold-start refresh runs on the calling thread. |

Copilot's findings


  • Files reviewed: 2/2 changed files
  • Comments generated: 0

Comment thread src/DataProtection/DataProtection/src/KeyManagement/KeyRingProvider.cs Outdated
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Labels

area-dataprotection Includes: DataProtection


3 participants