Description
A .NET 10 process on Linux can hang permanently when the DOTNET_PerfMapEnabled code path is active (env var or runtime-enabled via the Diagnostics Server IPC, e.g. dotnet-trace --enable-perfmap). The hang is a three-way deadlock between the GC suspension machinery, CodeFragmentHeap::m_Lock, and PerfMap::s_csPerfMap, triggered while a virtual-call-stub resolve worker is generating a new resolve stub.
The two locks are taken in the wrong order with respect to GC-mode handling:
CodeFragmentHeap::m_Lock is constructed with CRST_UNSAFE_ANYMODE, so acquiring it does not toggle the calling thread to preemptive GC.
PerfMap::s_csPerfMap is constructed with CRST_DEFAULT (no flags), so acquiring it does toggle a cooperative thread to preemptive for the duration of the acquire. The switch back to cooperative goes through Thread::RareDisablePreemptiveGC, which blocks while a GC suspension is in progress.
The result: a cooperative thread holding m_Lock calls into PerfMap and gets stuck inside RareDisablePreemptiveGC waiting for the GC to finish. The GC cannot finish, because every other thread that needs to allocate a stub is queued behind the held m_Lock in cooperative mode and never reaches a safe point.
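The interaction between the two lock flags can be modelled in a few lines. This is a simplified sketch, not runtime code: the kModel* names and their numeric values are hypothetical stand-ins for the real Crst flags, and the model assumes only the behaviour described above (a default-flagged lock toggles a cooperative caller to preemptive and back, while CRST_UNSAFE_ANYMODE leaves the GC mode alone).

```cpp
#include <cstdint>

// Hypothetical stand-ins for the runtime's Crst flags. The numeric values
// are illustrative and are not copied from the runtime sources.
enum ModelCrstFlags : std::uint32_t {
    kModelCrstDefault       = 0x0,
    kModelCrstUnsafeAnyMode = 0x1,
};

// True when acquiring a lock with `flags` moves a cooperative caller to
// preemptive mode for the wait and back to cooperative afterwards. The
// return to cooperative is the step that can block on a GC suspension.
constexpr bool AcquireTogglesGcMode(std::uint32_t flags) {
    return (flags & kModelCrstUnsafeAnyMode) == 0;
}

// True when a cooperative thread that holds `outer` and then takes `inner`
// can stall a GC: the holder parks on the suspension while still owning
// `outer`, and waiters on `outer` stay cooperative while blocked, so they
// never reach a safe point.
constexpr bool NestedAcquireCanStallGc(std::uint32_t outer, std::uint32_t inner) {
    return !AcquireTogglesGcMode(outer) && AcquireTogglesGcMode(inner);
}
```

In this model, NestedAcquireCanStallGc(kModelCrstUnsafeAnyMode, kModelCrstDefault) is true, which corresponds to taking s_csPerfMap while holding m_Lock. The same call with two default-flagged locks is false, because waiters on the outer lock would be preemptive while blocked and the GC would not need to wait for them.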
Reproduction Steps
Likely synthetic repro recipe:
- Run any .NET 10 Linux x64 workload with PerfMap enabled (DOTNET_PerfMapEnabled=1, or attach with dotnet-trace collect -p <pid> --enable-perfmap).
- Drive heavy virtual-call-stub creation (large numbers of polymorphic interface call sites being warmed up concurrently) while also driving allocation that forces frequent GCs.
- Eventually a cooperative thread is suspended inside PerfMap::LogStubs -> s_csPerfMap while holding CodeFragmentHeap::m_Lock, and the process deadlocks.
Crash dumps are available to debug this directly, though.
Expected behavior
PerfMap logging on a stub-allocation path must not be able to deadlock with the GC.
I'm not sure what the right fix is. CodeFragmentHeap::m_Lock correctly leaves us in coop mode, because it was meant to guard a quick call, whereas the work behind PerfMap::s_csPerfMap is heavyweight, meaning we should let the GC run. Logging stubs to PerfMap from this path makes ResolveWorkerAsmStub a more heavyweight function, which may need to swap to preemptive mode; alternatively, the calls into PerfMap may need to be lighter weight. Or maybe I'm overthinking it.
Actual behavior
The GC thread is suspending the world and waiting on a cooperative thread to reach a safe point.
Thread 60 is preemptive and was in the middle of acquiring s_csPerfMap from inside PerfMap::LogStubs. The default-flagged Crst toggled it to preemptive for the acquire, and it now sits indefinitely in RareDisablePreemptiveGC because the GC is already suspending the runtime:
02 libcoreclr!GCEvent::Impl::Wait+0xd2 unix/events.cpp:179
03 libcoreclr!Thread::RareDisablePreemptiveGC+0x14e threadsuspend.cpp:2223
04 libcoreclr!CrstBase::AcquireLock+0xc crst.h:174
05 libcoreclr!CrstBase::CrstHolder::CrstHolder+0xc crosscomp.h:349
06 libcoreclr!PerfMap::LogStubs+0x19a perfmap.cpp:462
07 libcoreclr!CodeFragmentHeap::RealAllocAlignedMem+0x122
08 libcoreclr!VirtualCallStubManager::GenerateResolveStub+0xb6
09 libcoreclr!VirtualCallStubManager::ResolveWorker+0x8a8
0a libcoreclr!VSD_ResolveWorker+0x2e7
0b libcoreclr!ResolveWorkerAsmStub+0x71
Thread 60 is still holding CodeFragmentHeap::m_Lock from frame 07.
Thread 61 is cooperative, blocked trying to acquire m_Lock (held by thread 60):
00 libc_so!__lll_lock_wait_private+0x90
01 libc_so!pthread_mutex_lock+0x167
02 libcoreclr!CrstBase::Enter+0x94 crst.cpp:265
03 libcoreclr!CrstBase::AcquireLock+0x5 crst.h:174
04 libcoreclr!CrstBase::CrstHolder::CrstHolder+0x5 crosscomp.h:349
05 libcoreclr!CodeFragmentHeap::RealAllocAlignedMem+0x2a
06 libcoreclr!VirtualCallStubManager::GenerateResolveStub+0xb6
07 libcoreclr!VirtualCallStubManager::ResolveWorker+0x8a8
08 libcoreclr!VSD_ResolveWorker+0x2e7
09 libcoreclr!ResolveWorkerAsmStub+0x71
Because thread 61 is cooperative and the GC is trying to suspend the runtime, the GC thread waits on this thread to either reach a safe point or go preemptive, which it cannot do because it is parked inside pthread_mutex_lock.
Cycle:
GC -> waits for cooperative threads to suspend
T60 -> preemptive, in RareDisablePreemptiveGC waiting for GC to finish
(holds CodeFragmentHeap::m_Lock)
T61 -> cooperative, blocked on CodeFragmentHeap::m_Lock held by T60
(and the GC is waiting on T61 to suspend)
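The cycle above can be written as a wait-for graph and checked mechanically. The following is a minimal sketch under stated assumptions: the node names mirror the dump (GC, T60, T61), the edges are transcribed from the cycle as described, and the helper names (BuildGraphFromDumps, HasDeadlock) are hypothetical and not part of the runtime or any debugger.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Participants in the hang, used as node indices in the wait-for graph.
enum Node : std::size_t { GC = 0, T60 = 1, T61 = 2, NodeCount = 3 };

// waitsFor[a] lists every node that `a` is blocked on.
using WaitForGraph = std::vector<std::vector<std::size_t>>;

// Edges transcribed from the cycle described in the issue.
inline WaitForGraph BuildGraphFromDumps() {
    WaitForGraph g(NodeCount);
    g[GC].push_back(T61);   // GC waits for cooperative T61 to suspend
    g[T61].push_back(T60);  // T61 waits for m_Lock, held by T60
    g[T60].push_back(GC);   // T60 waits in RareDisablePreemptiveGC for the GC
    return g;
}

// Depth-first search; state is 0 = unvisited, 1 = on the current path,
// 2 = fully explored. Reaching a node that is on the current path means
// the graph has a cycle.
inline bool HasCycleFrom(const WaitForGraph& g, std::size_t node,
                         std::vector<int>& state) {
    assert(node < g.size());
    if (state[node] == 1) return true;
    if (state[node] == 2) return false;
    state[node] = 1;
    for (std::size_t next : g[node]) {
        if (HasCycleFrom(g, next, state)) return true;
    }
    state[node] = 2;
    return false;
}

// A cycle in the wait-for graph means no participant can make progress.
inline bool HasDeadlock(const WaitForGraph& g) {
    std::vector<int> state(g.size(), 0);
    for (std::size_t i = 0; i < g.size(); ++i) {
        if (state[i] == 0 && HasCycleFrom(g, i, state)) return true;
    }
    return false;
}
```

HasDeadlock(BuildGraphFromDumps()) reports a cycle. Removing any one edge, for example the T61 -> T60 edge that exists only because m_Lock is still held during the PerfMap call, leaves the graph acyclic.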
Regression?
Likely a regression from #113943.
Known Workarounds
Two environment knobs that, set together, prevent the bad path from being exercised:
DOTNET_EnableDiagnostics_IPC=0
DOTNET_PerfMapEnabled=0
DOTNET_PerfMapEnabled=0 ensures PerfMap::s_enabled stays false at startup, so PerfMap::LogStubs early-outs and never touches s_csPerfMap.
DOTNET_EnableDiagnostics_IPC=0 shuts off the Diagnostics Server, which is otherwise able to enable PerfMap at runtime via the ds_rt_enable_perfmap IPC handler regardless of the env var (dotnet-trace --enable-perfmap, dotnet-monitor, third-party APM agents).
Both are needed: setting only one leaves the other path open.
Configuration
Likely not Linux-specific, but that's where I debugged it.