-
-
Notifications
You must be signed in to change notification settings - Fork 200
(POC) feat: Sentry native crash backend #1433
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
mujacica
commented
Oct 29, 2025
- Sentry native crash backend
- Out-of-process daemon/handler
- MacOS/Android/Windows/Linux support
- Integration with external crash reporter
- IPC/SHM/Signaling multi-platform implementation
- Minidump writers for all platforms
- In-process option (process_crash function)
- Different options for Minidump sizes
- Full sentry-codebase integration
- Sentry logger integration
- Sentry debug-flags integration
|
@sentry review |
1 similar comment
|
@sentry review |
Debug symbols are not required for basic E2E testing - the tests verify crash events are received, not symbolication quality. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add (void)thread_idx to suppress C4100 warning on Windows where the parameter is only used in macOS-specific code paths. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The debug_id for ELF modules must be converted to little-endian GUID format for Sentry symbolication. Apply the same byte swapping as sentry_modulefinder_linux.c does. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
On Linux, the signal handler only captures the crashing thread because opendir/readdir aren't signal-safe. Add daemon-side thread enumeration from /proc/<pid>/task to capture all thread IDs for the native event. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add enable-logs and capture-log arguments to E2E tests so structured logs are captured and sent with crash events. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix module capture from /proc/maps to correctly handle PIE binaries: - Capture first mapping for each file regardless of permissions - Calculate base address by subtracting file offset from mapping start - This ensures modules are found even when base mapping is r-- not r-x - Fix ARM64 address validation in is_valid_code_addr(): - x86_64 user space is below 0x00007FFFFFFFFFFF - ARM64 user space can use addresses like 0xAAAA_xxxx with ASLR - Only reject kernel space addresses (>= 0xFFFF_0000_0000_0000) - Add Windows module and thread capture for native mode: - EnumProcessModules for module enumeration - CreateToolhelp32Snapshot for thread enumeration Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add explicit casts when passing stack_size (uint64_t) to sentry_malloc and ReadProcessMemory on 32-bit Windows where size_t is 32-bit. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The default 20 attempts with 5 second intervals (100s total) was too short for some events to appear in Sentry. Increased to 100 attempts with 6 second intervals (600s total) for more reliable E2E tests. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extract PE TimeDateStamp from module files on disk and include
code_id (timestamp + size) in debug_meta images. This helps Sentry
identify Windows PE modules even without full PDB debug info.
The code_id format is "{TimeDateStamp:08X}{SizeOfImage:x}" which
matches the Windows symbol server convention.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Start stack capture from minimum of SP and FP (FP might be below SP) - Add debug logging for SP, FP, and stack start values - Log GetLastError() when ReadProcessMemory or OpenProcess fails Note: Frame pointer-based unwinding on Windows x64/ARM64 is unreliable since compilers often don't use frame pointers. A proper fix would use StackWalk64 from dbghelp.dll, but this improves diagnostics for now. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace frame pointer-based stack walking with StackWalk64 from dbghelp.dll for reliable out-of-process stack unwinding on Windows x64/ARM64. Key changes: - Add walk_stack_with_dbghelp() function for out-of-process unwinding - Use custom ReadMemoryRoutine callback for cross-process memory access - Initialize dbghelp with SymInitialize() for proper PE unwind info access - Works like the inproc backend but for the crashed process This fixes the issue where Windows crash events only had 1 frame because frame pointer-based unwinding doesn't work reliably on x64/ARM64 where compilers often optimize away frame pointers. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move dbghelp.h include and forward declaration to top of file so the function is declared before it's called in build_stacktrace_for_thread. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…hreads - Add enrich_frame_with_module_info() to set package and image_addr on frames - Update Linux/Windows thread loops to build stacktraces for all threads - Use thread-specific context for Windows StackWalk64 calls - Fix macOS stack file deletion timing to preserve files for native stacktrace Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Return null stacktrace when frames array is empty (Sentry rejects empty frames) - Only add stacktrace to thread if it contains frames - Use thread-specific ucontext_t for Linux register extraction Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove unused thread_id variable from register extraction section (the StackWalk64 section already has its own walk_thread_id). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The module capture from /proc/pid/maps was only using the size of the first memory mapping for each module. This caused frame-to-module matching to fail for addresses in later segments (code, data) since those addresses fell outside the incorrectly small module bounds. For example, a PIE binary might have: - First mapping: 0x5500-0x5510 (rodata, offset=0) → size=0x10 - Code mapping: 0x5510-0x5590 (code, offset=0x10) - Data mapping: 0x5590-0x55a0 (data, offset=0x90) Previously we'd set size=0x10, so code addresses like 0x5540 would not match the module. Now we track all mappings and extend the size to cover from base_address to the maximum end address of all segments. This fixes the issue where Linux native backend crashes showed <unknown> for module names in Sentry, while Windows and macOS worked correctly (they use platform APIs that return accurate sizes). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The mod->uuid field is uint8_t[] with 1-byte alignment, but GUID structure requires 4-byte alignment. Casting directly to GUID* could cause undefined behavior or incorrect reads on some platforms. Use memcpy to copy bytes into a properly aligned GUID struct before passing to sentry__uuid_from_native(). This fix ensures debug_id is correctly formatted, allowing Sentry to match frames to debug images properly. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add debug logging to crash daemon: - Log module count before adding debug_meta - Warn when no modules are captured - Log when frames can't be matched to any module Add diagnostic output to E2E tests: - Add get_debug_meta_from_event() helper function - Print debug_meta.images info (count, addresses, sizes) - Print frame info (instruction addresses, symbolication status) This helps diagnose why Windows shows "unknown problem" while Linux shows "missing debug files" - the difference indicates frames aren't being matched to debug images on Windows. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add debug logging to crash daemon: - Log module count before adding debug_meta - Warn when no modules are captured - Log when frames can't be matched to any module Add diagnostic output to E2E tests: - Add get_debug_meta_from_event() helper function - Print debug_meta.images info (count, addresses, sizes) - Print frame info (instruction addresses, symbolication status) This helps diagnose why Windows shows "unknown problem" while Linux shows "missing debug files" - the difference indicates frames aren't being matched to debug images on Windows. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
@sentry review |
The native backend on Windows was causing symbolicator to fail with "internal server error" because it was missing the debug_file field that minidump events include. This prevented proper symbolication. Changes: - Add pdb_name field to sentry_module_info_t struct - Extract PDB filename from PE CodeView debug directory - Set debug_file field on PE images in native crash events - Fix code_id format to use lowercase hex (matching minidump format) The debug_file field is essential for Sentry's symbolicator to know where to look for PDB files when symbolicating Windows crash events. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
d46d50a to
d38b5e1
Compare
The native backend on Windows was causing symbolicator to fail with "internal server error" because: 1. Missing debug_file field that tells symbolicator where to find PDBs 2. code_id format was wrong - timestamp needs 8-digit zero-padding Changes: - Add pdb_name field to sentry_module_info_t struct (Windows-only) - Extract PDB filename from PE CodeView debug directory - Set debug_file field on PE images in native crash events - Fix code_id format: use %08x%x (8-digit padded timestamp + size) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| // SP points to the top of stack (lowest used address). | ||
| // Return addresses and saved registers are at addresses >= SP. | ||
| // We need to capture from SP *upward* for stack unwinding to work. | ||
| const size_t MAX_STACK_SIZE = SENTRY_CRASH_MAX_STACK_CAPTURE / 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: The MAX_STACK_SIZE for macOS minidumps is hardcoded to 64KB, while the signal handler captures up to 512KB, leading to truncated stack traces.
Severity: HIGH
Suggested Fix
In src/backends/native/minidump/sentry_minidump_macos.c, change the definition of MAX_STACK_SIZE to use the full captured size. For example, change const size_t MAX_STACK_SIZE = SENTRY_CRASH_MAX_STACK_CAPTURE / 8; to const size_t MAX_STACK_SIZE = SENTRY_CRASH_MAX_STACK_CAPTURE; to align with the signal handler's behavior and other platforms.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/backends/native/minidump/sentry_minidump_macos.c#L531
Potential issue: In `sentry_minidump_macos.c`, the `MAX_STACK_SIZE` for stack capture is
set to `SENTRY_CRASH_MAX_STACK_CAPTURE / 8`, which evaluates to 64KB. However, the
signal handler in `sentry_crash_handler.c` is configured to capture up to
`SENTRY_CRASH_MAX_STACK_CAPTURE` (512KB). This discrepancy causes the minidump writer to
discard most of the captured stack data. As a result, applications on macOS, especially
those with deep call stacks, will have incomplete stack traces in their crash reports,
hindering debugging efforts. This is inconsistent with the Linux implementation which
uses a much larger stack size.
Did we get this right? 👍 / 👎 to inform future reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 4 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| ptr += ((nhdr->n_namesz + 3) & ~3); // Align to 4 bytes | ||
| size_t len = nhdr->n_descsz < max_len ? nhdr->n_descsz | ||
| : max_len; | ||
| memcpy(build_id, ptr, len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ELF note parsing bounds check ignores alignment padding
Low Severity
The bounds check at line 839 uses unaligned sizes (nhdr->n_namesz + nhdr->n_descsz) but line 846 advances ptr by the aligned name size (((nhdr->n_namesz + 3) & ~3)). This mismatch means a malformed ELF file with small n_namesz requiring alignment padding could pass the bounds check but then cause a buffer over-read when copying n_descsz bytes at line 849. For example, with n_namesz=1, n_descsz=10, and 11 remaining bytes: the check passes (1+10≤11), but after aligned advance of 4 bytes, only 7 bytes remain for a 10-byte read.
| } | ||
|
|
||
| ptr += cmd->cmdsize; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mach-O load command parsing lacks bounds validation
Low Severity
The Mach-O load command parsing loop iterates up to header.ncmds times without verifying that ptr remains within commands_buf. Before reading cmd->cmd and cmd->cmdsize at line 253-255, there's no check that ptr + sizeof(struct load_command) is within bounds. Additionally, ptr += cmd->cmdsize at line 262 could advance past the buffer if cmdsize is corrupted in a malformed Mach-O file. A malformed binary could cause out-of-bounds memory reads.
| minidump_module_t *mdmodule = &module_list->modules[i]; | ||
| memset(mdmodule, 0, sizeof(*mdmodule)); | ||
|
|
||
| const sentry_module_info_t *module = &writer->crash_ctx->modules[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing bounds check for crash context array indices
Medium Severity
The minidump writers use module_count and num_threads from the crash context to iterate over fixed-size arrays (modules[SENTRY_CRASH_MAX_MODULES] with 512 elements and threads[SENTRY_CRASH_MAX_THREADS] with 256 elements), but never validate these counts against the array bounds. If the crash context contains values exceeding these limits (due to signal handler bugs or shared memory corruption), the code performs out-of-bounds array access. For example, macOS write_module_list_stream accesses writer->crash_ctx->modules[i] where i can reach module_count - 1 without checking if it exceeds 511.
Additional Locations (2)
| uint64_t count; | ||
| minidump_rva_t base_rva; // All memory starts here | ||
| minidump_memory64_descriptor_t ranges[]; // Variable length | ||
| } PACKED_ATTR minidump_memory64_list_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Memory64 list base_rva uses wrong field size
Low Severity
The minidump_memory64_list_t structure defines base_rva as minidump_rva_t (32-bit uint32_t), but per Microsoft's minidump specification, MINIDUMP_MEMORY64_LIST.BaseRva is RVA64 which is 64-bit. This format mismatch would produce invalid minidumps if this type were used to write Memory64 streams. While currently unused in the implementation, the definition is incorrect and could cause data corruption if used in the future.