(POC) feat: Sentry native crash backend #1433

mujacica · 2025-10-29T19:54:05Z

Sentry native crash backend
Out-of-process daemon/handler
MacOS/Android/Windows/Linux support
Integration with external crash reporter
IPC/SHM/Signaling multi-platform implementation
Minidump writers for all platforms
In-process option (process_crash function)
Different options for Minidump sizes
Full sentry-codebase integration
Sentry logger integration
Sentry debug-flags integration

mujacica · 2026-01-12T12:27:56Z

mujacica · 2026-01-13T17:54:54Z

src/backends/native/sentry_crash_daemon.c

Debug symbols are not required for basic E2E testing - the tests verify crash events are received, not symbolication quality. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add (void)thread_idx to suppress C4100 warning on Windows where the parameter is only used in macOS-specific code paths. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The debug_id for ELF modules must be converted to little-endian GUID format for Sentry symbolication. Apply the same byte swapping as sentry_modulefinder_linux.c does. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

On Linux, the signal handler only captures the crashing thread because opendir/readdir aren't signal-safe. Add daemon-side thread enumeration from /proc/<pid>/task to capture all thread IDs for the native event. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add enable-logs and capture-log arguments to E2E tests so structured logs are captured and sent with crash events. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Fix module capture from /proc/maps to correctly handle PIE binaries: - Capture first mapping for each file regardless of permissions - Calculate base address by subtracting file offset from mapping start - This ensures modules are found even when base mapping is r-- not r-x - Fix ARM64 address validation in is_valid_code_addr(): - x86_64 user space is below 0x00007FFFFFFFFFFF - ARM64 user space can use addresses like 0xAAAA_xxxx with ASLR - Only reject kernel space addresses (>= 0xFFFF_0000_0000_0000) - Add Windows module and thread capture for native mode: - EnumProcessModules for module enumeration - CreateToolhelp32Snapshot for thread enumeration Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add explicit casts when passing stack_size (uint64_t) to sentry_malloc and ReadProcessMemory on 32-bit Windows where size_t is 32-bit. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The default 20 attempts with 5 second intervals (100s total) was too short for some events to appear in Sentry. Increased to 100 attempts with 6 second intervals (600s total) for more reliable E2E tests. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Extract PE TimeDateStamp from module files on disk and include code_id (timestamp + size) in debug_meta images. This helps Sentry identify Windows PE modules even without full PDB debug info. The code_id format is "{TimeDateStamp:08X}{SizeOfImage:x}" which matches the Windows symbol server convention. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Start stack capture from minimum of SP and FP (FP might be below SP) - Add debug logging for SP, FP, and stack start values - Log GetLastError() when ReadProcessMemory or OpenProcess fails Note: Frame pointer-based unwinding on Windows x64/ARM64 is unreliable since compilers often don't use frame pointers. A proper fix would use StackWalk64 from dbghelp.dll, but this improves diagnostics for now. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Replace frame pointer-based stack walking with StackWalk64 from dbghelp.dll for reliable out-of-process stack unwinding on Windows x64/ARM64. Key changes: - Add walk_stack_with_dbghelp() function for out-of-process unwinding - Use custom ReadMemoryRoutine callback for cross-process memory access - Initialize dbghelp with SymInitialize() for proper PE unwind info access - Works like the inproc backend but for the crashed process This fixes the issue where Windows crash events only had 1 frame because frame pointer-based unwinding doesn't work reliably on x64/ARM64 where compilers often optimize away frame pointers. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Move dbghelp.h include and forward declaration to top of file so the function is declared before it's called in build_stacktrace_for_thread. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…hreads - Add enrich_frame_with_module_info() to set package and image_addr on frames - Update Linux/Windows thread loops to build stacktraces for all threads - Use thread-specific context for Windows StackWalk64 calls - Fix macOS stack file deletion timing to preserve files for native stacktrace Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Return null stacktrace when frames array is empty (Sentry rejects empty frames) - Only add stacktrace to thread if it contains frames - Use thread-specific ucontext_t for Linux register extraction Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Remove unused thread_id variable from register extraction section (the StackWalk64 section already has its own walk_thread_id). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The module capture from /proc/pid/maps was only using the size of the first memory mapping for each module. This caused frame-to-module matching to fail for addresses in later segments (code, data) since those addresses fell outside the incorrectly small module bounds. For example, a PIE binary might have: - First mapping: 0x5500-0x5510 (rodata, offset=0) → size=0x10 - Code mapping: 0x5510-0x5590 (code, offset=0x10) - Data mapping: 0x5590-0x55a0 (data, offset=0x90) Previously we'd set size=0x10, so code addresses like 0x5540 would not match the module. Now we track all mappings and extend the size to cover from base_address to the maximum end address of all segments. This fixes the issue where Linux native backend crashes showed <unknown> for module names in Sentry, while Windows and macOS worked correctly (they use platform APIs that return accurate sizes). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The mod->uuid field is uint8_t[] with 1-byte alignment, but GUID structure requires 4-byte alignment. Casting directly to GUID* could cause undefined behavior or incorrect reads on some platforms. Use memcpy to copy bytes into a properly aligned GUID struct before passing to sentry__uuid_from_native(). This fix ensures debug_id is correctly formatted, allowing Sentry to match frames to debug images properly. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add debug logging to crash daemon: - Log module count before adding debug_meta - Warn when no modules are captured - Log when frames can't be matched to any module Add diagnostic output to E2E tests: - Add get_debug_meta_from_event() helper function - Print debug_meta.images info (count, addresses, sizes) - Print frame info (instruction addresses, symbolication status) This helps diagnose why Windows shows "unknown problem" while Linux shows "missing debug files" - the difference indicates frames aren't being matched to debug images on Windows. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

mujacica · 2026-01-16T16:37:38Z

@sentry review

src/backends/native/sentry_crash_ipc.c

The native backend on Windows was causing symbolicator to fail with "internal server error" because it was missing the debug_file field that minidump events include. This prevented proper symbolication. Changes: - Add pdb_name field to sentry_module_info_t struct - Extract PDB filename from PE CodeView debug directory - Set debug_file field on PE images in native crash events - Fix code_id format to use lowercase hex (matching minidump format) The debug_file field is essential for Sentry's symbolicator to know where to look for PDB files when symbolicating Windows crash events. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The native backend on Windows was causing symbolicator to fail with "internal server error" because: 1. Missing debug_file field that tells symbolicator where to find PDBs 2. code_id format was wrong - timestamp needs 8-digit zero-padding Changes: - Add pdb_name field to sentry_module_info_t struct (Windows-only) - Extract PDB filename from PE CodeView debug directory - Set debug_file field on PE images in native crash events - Fix code_id format: use %08x%x (8-digit padded timestamp + size) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.}

src/backends/native/minidump/sentry_minidump_format.h

src/backends/native/minidump/sentry_minidump_macos.c

sentry · 2026-01-19T12:06:16Z

src/backends/native/minidump/sentry_minidump_macos.c

+    // SP points to the top of stack (lowest used address).
+    // Return addresses and saved registers are at addresses >= SP.
+    // We need to capture from SP *upward* for stack unwinding to work.
+    const size_t MAX_STACK_SIZE = SENTRY_CRASH_MAX_STACK_CAPTURE / 8;


Bug: The MAX_STACK_SIZE for macOS minidumps is hardcoded to 64KB, while the signal handler captures up to 512KB, leading to truncated stack traces.
_{Severity: HIGH}

Suggested Fix

In src/backends/native/minidump/sentry_minidump_macos.c, change the definition of MAX_STACK_SIZE to use the full captured size. For example, change const size_t MAX_STACK_SIZE = SENTRY_CRASH_MAX_STACK_CAPTURE / 8; to const size_t MAX_STACK_SIZE = SENTRY_CRASH_MAX_STACK_CAPTURE; to align with the signal handler's behavior and other platforms.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: src/backends/native/minidump/sentry_minidump_macos.c#L531 Potential issue: In `sentry_minidump_macos.c`, the `MAX_STACK_SIZE` for stack capture is set to `SENTRY_CRASH_MAX_STACK_CAPTURE / 8`, which evaluates to 64KB. However, the signal handler in `sentry_crash_handler.c` is configured to capture up to `SENTRY_CRASH_MAX_STACK_CAPTURE` (512KB). This discrepancy causes the minidump writer to discard most of the captured stack data. As a result, applications on macOS, especially those with deep call stacks, will have incomplete stack traces in their crash reports, hindering debugging efforts. This is inconsistent with the Linux implementation which uses a much larger stack size.

_{Did we get this right? 👍 / 👎 to inform future reviews.}

cursor

Cursor Bugbot has reviewed your changes and found 4 potential issues.

^{Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.}

cursor · 2026-01-19T12:13:35Z

src/backends/native/minidump/sentry_minidump_linux.c

+                        ptr += ((nhdr->n_namesz + 3) & ~3); // Align to 4 bytes
+                        size_t len = nhdr->n_descsz < max_len ? nhdr->n_descsz
+                                                              : max_len;
+                        memcpy(build_id, ptr, len);


ELF note parsing bounds check ignores alignment padding

Low Severity

The bounds check at line 839 uses unaligned sizes (nhdr->n_namesz + nhdr->n_descsz) but line 846 advances ptr by the aligned name size (((nhdr->n_namesz + 3) & ~3)). This mismatch means a malformed ELF file with small n_namesz requiring alignment padding could pass the bounds check but then cause a buffer over-read when copying n_descsz bytes at line 849. For example, with n_namesz=1, n_descsz=10, and 11 remaining bytes: the check passes (1+10≤11), but after aligned advance of 4 bytes, only 7 bytes remain for a 10-byte read.

cursor · 2026-01-19T12:13:35Z

src/backends/native/minidump/sentry_minidump_macos.c

+        }
+
+        ptr += cmd->cmdsize;
+    }


Mach-O load command parsing lacks bounds validation

Low Severity

The Mach-O load command parsing loop iterates up to header.ncmds times without verifying that ptr remains within commands_buf. Before reading cmd->cmd and cmd->cmdsize at line 253-255, there's no check that ptr + sizeof(struct load_command) is within bounds. Additionally, ptr += cmd->cmdsize at line 262 could advance past the buffer if cmdsize is corrupted in a malformed Mach-O file. A malformed binary could cause out-of-bounds memory reads.

cursor · 2026-01-19T12:13:35Z

src/backends/native/minidump/sentry_minidump_macos.c

+        minidump_module_t *mdmodule = &module_list->modules[i];
+        memset(mdmodule, 0, sizeof(*mdmodule));
+
+        const sentry_module_info_t *module = &writer->crash_ctx->modules[i];


Missing bounds check for crash context array indices

Medium Severity

The minidump writers use module_count and num_threads from the crash context to iterate over fixed-size arrays (modules[SENTRY_CRASH_MAX_MODULES] with 512 elements and threads[SENTRY_CRASH_MAX_THREADS] with 256 elements), but never validate these counts against the array bounds. If the crash context contains values exceeding these limits (due to signal handler bugs or shared memory corruption), the code performs out-of-bounds array access. For example, macOS write_module_list_stream accesses writer->crash_ctx->modules[i] where i can reach module_count - 1 without checking if it exceeds 511.

Additional Locations (2)

src/backends/native/minidump/sentry_minidump_linux.c#L1065-L1070

src/backends/native/minidump/sentry_minidump_macos.c#L669-L678

cursor · 2026-01-19T12:13:35Z

src/backends/native/minidump/sentry_minidump_format.h

+    uint64_t count;
+    minidump_rva_t base_rva; // All memory starts here
+    minidump_memory64_descriptor_t ranges[]; // Variable length
+} PACKED_ATTR minidump_memory64_list_t;


Memory64 list base_rva uses wrong field size

Low Severity

The minidump_memory64_list_t structure defines base_rva as minidump_rva_t (32-bit uint32_t), but per Microsoft's minidump specification, MINIDUMP_MEMORY64_LIST.BaseRva is RVA64 which is 64-bit. This format mismatch would produce invalid minidumps if this type were used to write Memory64 streams. While currently unused in the implementation, the definition is incorrect and could cause data corruption if used in the future.

bitsandfoxes mentioned this pull request Nov 13, 2025

Custom Sentry Crash Reporter to replace vendored backends #1449

Open

sentry bot reviewed Jan 13, 2026

View reviewed changes

src/backends/native/sentry_crash_daemon.c Show resolved Hide resolved

mujacica added 26 commits January 15, 2026 10:33

Initial implementation

22cc593

Update build structure and daemon

c08fdb4

Updates for Windows support

14acb34

Fix logging

5edcb7d

Fix MacOs builds

02a6d71

Fix Warnings

5395f73

Test fixes

4951eda

Fix linux builds

9c0fd1b

Fix linux/mac modules

a71cf41

Fix concurency

18e5eb4

Fix build errors

1bd917c

Formatting

6a6e5c2

Fix 32bit builds

d5e589d

Fix format

6557ce8

Fix some issues

6a7dc03

Native tests need http/transport

931e568

Fix CMake

95728b6

Fix build

94f3250

Fix errors

d7ef2a6

Fix format

3958e80

Fix style

5d8ba3f

Fix some of the issues

da3e584

Fix format

1f9789c

Moooore fixes

203d113

Debug CI

866b4f0

More debug

f2719fc

mujacica and others added 20 commits January 15, 2026 23:20

Remove debug symbol upload from E2E workflow

9f1964c

Debug symbols are not required for basic E2E testing - the tests verify crash events are received, not symbolication quality. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fix unused parameter warning on Windows

4a70066

Add (void)thread_idx to suppress C4100 warning on Windows where the parameter is only used in macOS-specific code paths. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fix ELF debug_id byte swapping for Linux module capture

d17f0fa

The debug_id for ELF modules must be converted to little-endian GUID format for Sentry symbolication. Apply the same byte swapping as sentry_modulefinder_linux.c does. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fix code style

31ee726

Enable structured logs in E2E crash tests

cf804f7

Add enable-logs and capture-log arguments to E2E tests so structured logs are captured and sent with crash events. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fix Windows 32-bit compile warnings for uint64_t to size_t conversion

b83de19

Add explicit casts when passing stack_size (uint64_t) to sentry_malloc and ReadProcessMemory on 32-bit Windows where size_t is 32-bit. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fix Windows build: add forward declaration for walk_stack_with_dbghelp

29dca8a

Move dbghelp.h include and forward declaration to top of file so the function is declared before it's called in build_stacktrace_for_thread. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fix unused variable warning on Windows ClangCL build

8b3565d

Remove unused thread_id variable from register extraction section (the StackWalk64 section already has its own walk_thread_id). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

sentry bot reviewed Jan 16, 2026

View reviewed changes

src/backends/native/sentry_crash_ipc.c Show resolved Hide resolved

mujacica force-pushed the feat/sentry_native_backend branch from d46d50a to d38b5e1 Compare January 17, 2026 08:10

mujacica marked this pull request as ready for review January 17, 2026 11:50

cursor bot reviewed Jan 17, 2026

View reviewed changes

src/backends/native/minidump/sentry_minidump_format.h Outdated Show resolved Hide resolved

src/backends/native/minidump/sentry_minidump_macos.c Show resolved Hide resolved

More windows fixes + PR fixes

464a622

sentry bot reviewed Jan 19, 2026

View reviewed changes

cursor bot reviewed Jan 19, 2026

View reviewed changes

Uh oh!

(POC) feat: Sentry native crash backend #1433

Are you sure you want to change the base?

(POC) feat: Sentry native crash backend #1433

Uh oh!

Conversation

mujacica commented Oct 29, 2025

Uh oh!

mujacica commented Jan 12, 2026

Uh oh!

mujacica commented Jan 13, 2026

Uh oh!

Uh oh!

mujacica commented Jan 16, 2026

Uh oh!

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

sentry bot Jan 19, 2026

Choose a reason for hiding this comment

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor bot Jan 19, 2026

Choose a reason for hiding this comment

ELF note parsing bounds check ignores alignment padding

Uh oh!

cursor bot Jan 19, 2026

Choose a reason for hiding this comment

Mach-O load command parsing lacks bounds validation

Uh oh!

cursor bot Jan 19, 2026

Choose a reason for hiding this comment

Missing bounds check for crash context array indices

Uh oh!

cursor bot Jan 19, 2026

Choose a reason for hiding this comment

Memory64 list base_rva uses wrong field size

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants