fix: skip SA_NODEFER when CHAIN_AT_START is active#1572
fix: skip SA_NODEFER when CHAIN_AT_START is active#1572
Conversation
SA_NODEFER (added in #1446) is incompatible with the CHAIN_AT_START signal handler strategy. When chaining to the runtime's signal handler (e.g. Mono), the runtime may reset the signal to SIG_DFL and re-raise. With SA_NODEFER the re-raised signal is delivered immediately, killing the process before our handler can regain control. Without SA_NODEFER, the re-raised signal is blocked during handler execution, allowing the runtime handler to return and sentry-native to proceed with crash capture. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
89200fa to
ab26763
Compare
Yeah, this makes sense, but is there no Mono test in the downstream integration tests? It is quite painful that the two runtimes differ so severely, given that we seem to lack any early warning for that particular config. Or did this only happen with .NET 10?
Not critical is an understatement. It is as critical as with all other use cases, when the signal actually comes from code that the Native SDK should handle. Wouldn't it be better to just mask the incoming signal before we invoke the handler at start? This way, a |
| g_sigaction.sa_flags = SA_SIGINFO | SA_ONSTACK; | ||
| if (g_backend_config.handler_strategy | ||
| != SENTRY_HANDLER_STRATEGY_CHAIN_AT_START) { | ||
| g_sigaction.sa_flags |= SA_NODEFER; | ||
| } |
There was a problem hiding this comment.
Tbh, this feels quite like the hammer.
Could you test an approach like this inside the chain-at-start block of process_ucontext()
sigset_t mask, old_mask;
sigemptyset(&mask);
sigaddset(&mask, uctx->signum);
sigprocmask(SIG_BLOCK, &mask, &old_mask);
invoke_signal_handler(uctx->signum, uctx->siginfo, (void *)uctx->user_context);
if (ip != get_instruction_pointer(uctx)
|| sp != get_stack_pointer(uctx)) {
// No need to restore the signal mask here: sigreturn will
// restore it from the saved ucontext.
return;
}
// once we know we own the signal, unmask again
sigprocmask(SIG_SETMASK, &old_mask, NULL);There was a problem hiding this comment.
Thanks for the suggestion. I tested it on an emulator and faced a couple of issues:
-
sigprocmaskseems to be intercepted bylibsigchainon Android. A rawsyscall(SYS_rt_sigprocmask, SIG_BLOCK, ...)works, though. -
unmasking delivers a pending
SIGSEGVand kills the processMono restores
SIG_DFLand re-raisesSIGSEGV:sigaction(SIGSEGV, &saved_default_handler, NULL); raise(SIGSEGV);
With the signal blocked, the
raise()creates a pending signal instead of being delivered immediately. When unmasked, the pendingSIGSEGVis delivered, but the handler is nowSIG_DFL, so it terminates the process before sentry-native can capture the crash.I guess leaving it unmasked would have more or less the same trade-off as the original
SA_NODEFERremoval approach, basically losing recursive crash detection. What if we would restore our handler, and consume the the pending signal withsigtimedwait?
There was a problem hiding this comment.
sigset_t mask, old_mask;
sigemptyset(&mask);
sigaddset(&mask, uctx->signum);
// bypass libsigchain on Android
syscall(SYS_rt_sigprocmask, SIG_BLOCK, &mask, &old_mask,
sizeof(sigset_t));
invoke_signal_handler(uctx->signum, uctx->siginfo, (void *)uctx->user_context);
if (ip != get_instruction_pointer(uctx)
|| sp != get_stack_pointer(uctx)) {
return;
}
// restore our handler
struct sigaction current;
sigaction(uctx->signum, NULL, ¤t);
if (current.sa_handler == SIG_DFL) {
sigaction(uctx->signum, &g_sigaction, NULL);
}
// consume pending signal
struct timespec timeout = { 0, 0 };
sigtimedwait(&mask, NULL, &timeout);
// unmask
syscall(SYS_rt_sigprocmask, SIG_SETMASK, &old_mask, NULL,
sizeof(sigset_t));
SA_NODEFER(added in #1446) breaks theCHAIN_AT_STARTsignal handler strategy, which we are trying to take into use in Sentry .NET for Android (getsentry/sentry-dotnet#4676).When
CHAIN_AT_STARTis active, sentry-native chains to the runtime's previous signal handler before processing the crash. Mono'smono_handle_native_crashresets the crashing signal toSIG_DFLand re-raises it as part of its crash handling flow. WithSA_NODEFER, the re-raised signal is delivered immediately — hittingSIG_DFLand killing the process before sentry-native can regain control and capture the crash.Without
SA_NODEFER, the re-raised signal is blocked while the handler is still running, so Mono's handler returns normally and sentry-native proceeds with crash capture.The fix
Only set
SA_NODEFERwhenCHAIN_AT_STARTis not active. Recursive crash detection (the reasonSA_NODEFERwas added) is not critical forCHAIN_AT_STARTbecause:process_ucontextbetween the runtime handler returning and the dispatch to the handler threadHow it was found
Investigated on Android x86_64 with the fresh new .NET 10.0.4 release. Used
straceto tracert_sigactioncalls and observed that after sentry-native called Mono's SIGSEGV handler viainvoke_signal_handler, a second SIGSEGV was delivered immediately (due toSA_NODEFER) and went toSIG_DFL, killing the process. RemovingSA_NODEFERforCHAIN_AT_STARTallows Mono's handler to return and sentry-native to capture the crash successfully.