fix: skip SA_NODEFER when CHAIN_AT_START is active #1572

Open
jpnurmi wants to merge 2 commits into master from jpnurmi/fix/sa-nodefer-chain-at-start

@jpnurmi
Collaborator

@jpnurmi jpnurmi commented Mar 11, 2026

SA_NODEFER (added in #1446) breaks the CHAIN_AT_START signal handler strategy, which we are trying to adopt in Sentry .NET for Android (getsentry/sentry-dotnet#4676).

When CHAIN_AT_START is active, sentry-native chains to the runtime's previous signal handler before processing the crash. Mono's mono_handle_native_crash resets the crashing signal to SIG_DFL and re-raises it as part of its crash handling flow. With SA_NODEFER, the re-raised signal is delivered immediately — hitting SIG_DFL and killing the process before sentry-native can regain control and capture the crash.

Without SA_NODEFER, the re-raised signal is blocked while the handler is still running, so Mono's handler returns normally and sentry-native proceeds with crash capture.
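
The difference between the two modes can be reproduced outside sentry-native. The following is a minimal sketch, not code from this PR: it assumes a harmless signal (SIGUSR1) stands in for the crash signal, and `reraise_handler` and `max_nesting_depth` are hypothetical names. The handler re-raises its own signal once and records how deeply handler invocations nest.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t g_depth, g_max_depth, g_calls;

/* Hypothetical handler: re-raises its own signal once, tracks nesting. */
static void reraise_handler(int signum)
{
    g_depth++;
    g_calls++;
    if (g_depth > g_max_depth) {
        g_max_depth = g_depth;
    }
    if (g_calls == 1) {
        raise(signum); /* re-raise from inside the running handler */
    }
    g_depth--;
}

/* Installs the handler with the given sa_flags, raises SIGUSR1 once and
 * returns the deepest handler nesting that was observed. */
static int max_nesting_depth(int flags)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = reraise_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = flags;

    g_depth = 0;
    g_max_depth = 0;
    g_calls = 0;

    sigaction(SIGUSR1, &sa, &old);
    raise(SIGUSR1);
    sigaction(SIGUSR1, &old, NULL); /* put the previous disposition back */
    return g_max_depth;
}
```

With `flags == 0` the re-raised signal stays pending until the first invocation returns, so the depth never exceeds 1. With `SA_NODEFER` it is delivered inside the running handler and the depth reaches 2, which is the window in which a handler reset to SIG_DFL would terminate the process.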

The fix

Only set SA_NODEFER when CHAIN_AT_START is not active. Recursive crash detection (the reason SA_NODEFER was added) is not critical for CHAIN_AT_START because:

  • The handler thread architecture means most crash processing happens off the signal handler thread
  • The only vulnerable window is the small amount of code in process_ucontext between the runtime handler returning and the dispatch to the handler thread
  • In that narrow edge case, the process hangs instead of cleanly bailing out — arguably better than never capturing crashes at all

How it was found

Investigated on Android x86_64 with the newly released .NET 10.0.4. Used strace to trace rt_sigaction calls and observed that after sentry-native called Mono's SIGSEGV handler via invoke_signal_handler, a second SIGSEGV was delivered immediately (due to SA_NODEFER) and went to SIG_DFL, killing the process. Removing SA_NODEFER for CHAIN_AT_START allows Mono's handler to return and sentry-native to capture the crash successfully.

jpnurmi and others added 2 commits March 11, 2026 13:27
SA_NODEFER (added in #1446) is incompatible with the CHAIN_AT_START
signal handler strategy. When chaining to the runtime's signal handler
(e.g. Mono), the runtime may reset the signal to SIG_DFL and re-raise.
With SA_NODEFER the re-raised signal is delivered immediately, killing
the process before our handler can regain control.

Without SA_NODEFER, the re-raised signal is blocked during handler
execution, allowing the runtime handler to return and sentry-native
to proceed with crash capture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jpnurmi force-pushed the jpnurmi/fix/sa-nodefer-chain-at-start branch from 89200fa to ab26763 on March 11, 2026 12:40
jpnurmi requested a review from supervacuus on March 11, 2026 12:52

@supervacuus
Collaborator

> Mono's mono_handle_native_crash resets the crashing signal to SIG_DFL and re-raises it as part of its crash handling flow. With SA_NODEFER, the re-raised signal is delivered immediately

Yeah, this makes sense, but is there no Mono test in the downstream integration tests? It is quite painful that the two runtimes differ so severely, given that we seem to lack any early warning for that particular config. Or did this only happen with .NET 10?

> Recursive crash detection (the reason SA_NODEFER was added) is not critical for CHAIN_AT_START because:

"Not critical" is an understatement. It is as critical as in all other use cases when the signal actually comes from code that the Native SDK should handle.

Wouldn't it be better to just mask the incoming signal before we invoke the handler at start? This way, a raise inside the .NET handler is blocked until we sigreturn, and we retain the recursive reentrancy semantics of our handler (which are critical because we execute user-provided code from the handler). Unmask again once we know the .NET handler didn't feel responsible for the signal.

Comment on lines +515 to +519
g_sigaction.sa_flags = SA_SIGINFO | SA_ONSTACK;
if (g_backend_config.handler_strategy
    != SENTRY_HANDLER_STRATEGY_CHAIN_AT_START) {
    g_sigaction.sa_flags |= SA_NODEFER;
}
Collaborator

Tbh, this feels quite like the hammer.

Could you test an approach like this inside the chain-at-start block of process_ucontext()?

sigset_t mask, old_mask;
sigemptyset(&mask);
sigaddset(&mask, uctx->signum);
sigprocmask(SIG_BLOCK, &mask, &old_mask);

invoke_signal_handler(uctx->signum, uctx->siginfo, (void *)uctx->user_context);

if (ip != get_instruction_pointer(uctx)
    || sp != get_stack_pointer(uctx)) {
    // No need to restore the signal mask here: sigreturn will 
    // restore it from the saved ucontext.
    return;
}

// once we know we own the signal, unmask again
sigprocmask(SIG_SETMASK, &old_mask, NULL);
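
The comment in the early-return branch relies on the kernel restoring the pre-signal mask when a handler returns. A minimal sketch of that behavior, assuming SIGUSR1 and SIGUSR2 as stand-in signals, that SIGUSR2 is not already blocked by the caller, and hypothetical names `masking_handler` and `mask_change_survives_handler`:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t g_blocked_in_handler;

/* Hypothetical handler: blocks SIGUSR2 and never unblocks it itself. */
static void masking_handler(int signum)
{
    sigset_t mask, cur;
    (void)signum;

    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR2);
    sigprocmask(SIG_BLOCK, &mask, NULL); /* change the mask in the handler */

    sigprocmask(SIG_BLOCK, NULL, &cur);  /* read the current mask back */
    g_blocked_in_handler = sigismember(&cur, SIGUSR2);
    /* deliberately no SIG_SETMASK or SIG_UNBLOCK before returning */
}

/* Returns 1 if SIGUSR2 is still blocked after the handler has returned,
 * 0 if the mask change was undone on return. */
static int mask_change_survives_handler(void)
{
    struct sigaction sa, old;
    sigset_t cur;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = masking_handler;
    sigemptyset(&sa.sa_mask);

    sigaction(SIGUSR1, &sa, &old);
    raise(SIGUSR1);
    sigaction(SIGUSR1, &old, NULL);

    sigprocmask(SIG_BLOCK, NULL, &cur);
    return sigismember(&cur, SIGUSR2);
}
```

The mask change is visible inside the handler but gone once it returns, which is why the early `return` in the suggestion does not need its own `sigprocmask(SIG_SETMASK, ...)` call.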

Collaborator Author

@jpnurmi jpnurmi Mar 11, 2026

Thanks for the suggestion. I tested it on an emulator and faced a couple of issues:

  1. sigprocmask seems to be intercepted by libsigchain on Android. A raw syscall(SYS_rt_sigprocmask, SIG_BLOCK, ...) works, though.

  2. Unmasking delivers a pending SIGSEGV and kills the process.

    Mono restores SIG_DFL and re-raises SIGSEGV:

    sigaction(SIGSEGV, &saved_default_handler, NULL);
    raise(SIGSEGV);

    With the signal blocked, raise() leaves the signal pending instead of delivering it immediately. When unmasked, the pending SIGSEGV is delivered, but the handler is now SIG_DFL, so it terminates the process before sentry-native can capture the crash.

    I guess leaving it unmasked would have more or less the same trade-off as the original SA_NODEFER removal approach, basically losing recursive crash detection. What if we restored our handler and consumed the pending signal with sigtimedwait?
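
The second issue can be reproduced in isolation. This is a sketch under assumptions rather than the actual Mono flow: SIGUSR1 stands in for SIGSEGV, the sequence runs in a forked child so the termination can be observed from the parent, and `unblock_with_default_disposition` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the signal that terminated the child, 0 if the child exited
 * normally, or -1 if fork/waitpid failed. */
static int unblock_with_default_disposition(void)
{
    int status = 0;
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        sigset_t mask;
        sigemptyset(&mask);
        sigaddset(&mask, SIGUSR1);
        sigprocmask(SIG_BLOCK, &mask, NULL);

        signal(SIGUSR1, SIG_DFL); /* stands in for the runtime's reset */
        raise(SIGUSR1);           /* blocked, so it only becomes pending */

        /* Unblocking delivers the pending signal; with SIG_DFL in place
         * the default action terminates the child right here. */
        sigprocmask(SIG_UNBLOCK, &mask, NULL);
        _exit(0); /* only reached if the signal did not terminate us */
    }
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The child never reaches `_exit(0)`: it is terminated by SIGUSR1 at the unblock step, mirroring what happens to the process when the mask is restored after Mono has reset the disposition.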

Collaborator Author

sigset_t mask, old_mask;
sigemptyset(&mask);
sigaddset(&mask, uctx->signum);
// bypass libsigchain on Android
syscall(SYS_rt_sigprocmask, SIG_BLOCK, &mask, &old_mask,
    sizeof(sigset_t));

invoke_signal_handler(uctx->signum, uctx->siginfo, (void *)uctx->user_context);

if (ip != get_instruction_pointer(uctx)
    || sp != get_stack_pointer(uctx)) {
    return;
}

// restore our handler
struct sigaction current;
sigaction(uctx->signum, NULL, &current);
if (current.sa_handler == SIG_DFL) {
    sigaction(uctx->signum, &g_sigaction, NULL);
}

// consume pending signal
struct timespec timeout = { 0, 0 };
sigtimedwait(&mask, NULL, &timeout);

// unmask
syscall(SYS_rt_sigprocmask, SIG_SETMASK, &old_mask, NULL,
    sizeof(sigset_t));
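
The consume step can be tried on its own. A minimal sketch, assuming Linux (sigtimedwait is not available on every POSIX system), plain sigprocmask instead of the raw syscall, SIGUSR1 in place of SIGSEGV, and a hypothetical helper name `consume_blocked_signal`:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <time.h>

/* Blocks signum, raises it, then polls it off the pending set.
 * Returns the consumed signal number, or -1 if nothing was consumed
 * or the signal is still pending afterwards. */
static int consume_blocked_signal(int signum)
{
    sigset_t mask, old_mask, pending;
    struct timespec timeout = { 0, 0 }; /* zero timeout: poll, don't wait */
    int consumed, still_pending;

    sigemptyset(&mask);
    sigaddset(&mask, signum);
    sigprocmask(SIG_BLOCK, &mask, &old_mask);

    raise(signum); /* blocked, so the signal stays pending */

    /* Dequeue the pending signal without running any handler. */
    consumed = sigtimedwait(&mask, NULL, &timeout);

    sigpending(&pending);
    still_pending = sigismember(&pending, signum);

    /* Nothing is pending any more, so restoring the mask is safe even
     * if the disposition is SIG_DFL. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return still_pending ? -1 : consumed;
}
```

If the pending signal were not consumed, restoring the mask with the default disposition in place would terminate the caller, so a normal return from this helper is itself evidence that the signal was dequeued.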

2 participants