Skip to content

Conversation

@felixweilbach
Copy link

Summary:
ThreadPool gets stored in a static variable here extension/threadpool/threadpool.cpp:146

This means the destructor of ThreadPool will be run when the process exits or a DLL containing this code unloads.

While working with ExecuTorch I experienced a deadlock during unloading our DLL (which contained ExecuTorch) at runtime. This was caused by the pthreadpool_destroy function pthreadpool/src/windows.c:366 waiting forever on the worker threads.

Why this is happening exactly is unclear to me. It is likely a race condition inside Windows Parallel Loader (https://blogs.blackberry.com/en/2017/10/windows-10-parallel-loading-breakdown) as I could see its functions in the stack trace of the stuck worker threads after they returned from their main function.

The issue was mitigated on my side by calling executorch::extension::threadpool::get_threadpool()->_unsafe_reset_threadpool(0); before unloading the DLL.

This is just a workaround. I think a proper fix would be to rework the ThreadPool singleton and allow for explicit termination of it.

Differential Revision: D89889628

@pytorch-bot
Copy link

pytorch-bot bot commented Dec 30, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16416

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 2 New Failures, 1 Unrelated Failure

As of commit 2d43467 with merge base daf93a1 (image):

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 30, 2025
@meta-codesync
Copy link

meta-codesync bot commented Dec 30, 2025

@felixweilbach has exported this pull request. If you are a Meta employee, you can view the originating Diff in D89889628.

@github-actions
Copy link

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Summary:

ThreadPool gets stored in a static variable here extension/threadpool/threadpool.cpp:146

This means the destructor of ThreadPool will be run when the process exits or a DLL containing this code unloads.

While working with ExecuTorch I experienced a deadlock during unloading our DLL (which contained ExecuTorch) at runtime. This was caused by the pthreadpool_destroy function pthreadpool/src/windows.c:366 waiting forever on the worker threads.

Why this is happening exactly is unclear to me. It is likely a race condition inside Windows Parallel Loader (https://blogs.blackberry.com/en/2017/10/windows-10-parallel-loading-breakdown) as I could see its functions in the stack trace of the stuck worker threads after they returned from their main function.

The issue was mitigated on my side by calling `executorch::extension::threadpool::get_threadpool()->_unsafe_reset_threadpool(0);` before unloading the DLL.

This is just a workaround. I think a proper fix would be to rework the ThreadPool singleton and allow for explicit termination of it.

Differential Revision: D89889628
Copy link
Contributor

@kimishpatel kimishpatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review automatically exported from Phabricator review in Meta.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants