Skip to content

Conversation

@mujacica
Copy link
Contributor

  • Sentry native crash backend
  • Out-of-process daemon/handler
  • MacOS/Android/Windows/Linux support
  • Integration with external crash reporter
  • IPC/SHM/Signaling multi-platform implementation
  • Minidump writers for all platforms
  • In-process option (process_crash function)
  • Different options for Minidump sizes
  • Full sentry-codebase integration
  • Sentry logger integration
  • Sentry debug-flags integration

@mujacica
Copy link
Contributor Author

@sentry review

1 similar comment
@mujacica
Copy link
Contributor Author

@sentry review

mujacica and others added 20 commits January 15, 2026 23:20
Debug symbols are not required for basic E2E testing - the tests
verify crash events are received, not symbolication quality.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add (void)thread_idx to suppress C4100 warning on Windows where
the parameter is only used in macOS-specific code paths.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The debug_id for ELF modules must be converted to little-endian GUID
format for Sentry symbolication. Apply the same byte swapping as
sentry_modulefinder_linux.c does.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
On Linux, the signal handler only captures the crashing thread because
opendir/readdir aren't signal-safe. Add daemon-side thread enumeration
from /proc/<pid>/task to capture all thread IDs for the native event.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add enable-logs and capture-log arguments to E2E tests so structured
logs are captured and sent with crash events.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix module capture from /proc/maps to correctly handle PIE binaries:
  - Capture first mapping for each file regardless of permissions
  - Calculate base address by subtracting file offset from mapping start
  - This ensures modules are found even when base mapping is r-- not r-x

- Fix ARM64 address validation in is_valid_code_addr():
  - x86_64 user space is below 0x00007FFFFFFFFFFF
  - ARM64 user space can use addresses like 0xAAAA_xxxx with ASLR
  - Only reject kernel space addresses (>= 0xFFFF_0000_0000_0000)

- Add Windows module and thread capture for native mode:
  - EnumProcessModules for module enumeration
  - CreateToolhelp32Snapshot for thread enumeration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add explicit casts when passing stack_size (uint64_t) to sentry_malloc
and ReadProcessMemory on 32-bit Windows where size_t is 32-bit.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The default 20 attempts with 5 second intervals (100s total) was too
short for some events to appear in Sentry. Increased to 100 attempts
with 6 second intervals (600s total) for more reliable E2E tests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extract PE TimeDateStamp from module files on disk and include
code_id (timestamp + size) in debug_meta images. This helps Sentry
identify Windows PE modules even without full PDB debug info.

The code_id format is "{TimeDateStamp:08X}{SizeOfImage:x}" which
matches the Windows symbol server convention.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Start stack capture from minimum of SP and FP (FP might be below SP)
- Add debug logging for SP, FP, and stack start values
- Log GetLastError() when ReadProcessMemory or OpenProcess fails

Note: Frame pointer-based unwinding on Windows x64/ARM64 is unreliable
since compilers often don't use frame pointers. A proper fix would use
StackWalk64 from dbghelp.dll, but this improves diagnostics for now.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace frame pointer-based stack walking with StackWalk64 from dbghelp.dll
for reliable out-of-process stack unwinding on Windows x64/ARM64.

Key changes:
- Add walk_stack_with_dbghelp() function for out-of-process unwinding
- Use custom ReadMemoryRoutine callback for cross-process memory access
- Initialize dbghelp with SymInitialize() for proper PE unwind info access
- Works like the inproc backend but for the crashed process

This fixes the issue where Windows crash events only had 1 frame because
frame pointer-based unwinding doesn't work reliably on x64/ARM64 where
compilers often optimize away frame pointers.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move dbghelp.h include and forward declaration to top of file so the
function is declared before it's called in build_stacktrace_for_thread.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…hreads

- Add enrich_frame_with_module_info() to set package and image_addr on frames
- Update Linux/Windows thread loops to build stacktraces for all threads
- Use thread-specific context for Windows StackWalk64 calls
- Fix macOS stack file deletion timing to preserve files for native stacktrace

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Return null stacktrace when frames array is empty (Sentry rejects empty frames)
- Only add stacktrace to thread if it contains frames
- Use thread-specific ucontext_t for Linux register extraction

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove unused thread_id variable from register extraction section
(the StackWalk64 section already has its own walk_thread_id).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The module capture from /proc/pid/maps was only using the size of the
first memory mapping for each module. This caused frame-to-module
matching to fail for addresses in later segments (code, data) since
those addresses fell outside the incorrectly small module bounds.

For example, a PIE binary might have:
- First mapping: 0x5500-0x5510 (rodata, offset=0) → size=0x10
- Code mapping:  0x5510-0x5590 (code, offset=0x10)
- Data mapping:  0x5590-0x55a0 (data, offset=0x90)

Previously we'd set size=0x10, so code addresses like 0x5540 would
not match the module. Now we track all mappings and extend the size
to cover from base_address to the maximum end address of all segments.

This fixes the issue where Linux native backend crashes showed
<unknown> for module names in Sentry, while Windows and macOS worked
correctly (they use platform APIs that return accurate sizes).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The mod->uuid field is uint8_t[] with 1-byte alignment, but GUID
structure requires 4-byte alignment. Casting directly to GUID* could
cause undefined behavior or incorrect reads on some platforms.

Use memcpy to copy bytes into a properly aligned GUID struct before
passing to sentry__uuid_from_native().

This fix ensures debug_id is correctly formatted, allowing Sentry
to match frames to debug images properly.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add debug logging to crash daemon:
- Log module count before adding debug_meta
- Warn when no modules are captured
- Log when frames can't be matched to any module

Add diagnostic output to E2E tests:
- Add get_debug_meta_from_event() helper function
- Print debug_meta.images info (count, addresses, sizes)
- Print frame info (instruction addresses, symbolication status)

This helps diagnose why Windows shows "unknown problem" while Linux
shows "missing debug files" - the difference indicates frames aren't
being matched to debug images on Windows.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add debug logging to crash daemon:
- Log module count before adding debug_meta
- Warn when no modules are captured
- Log when frames can't be matched to any module

Add diagnostic output to E2E tests:
- Add get_debug_meta_from_event() helper function
- Print debug_meta.images info (count, addresses, sizes)
- Print frame info (instruction addresses, symbolication status)

This helps diagnose why Windows shows "unknown problem" while Linux
shows "missing debug files" - the difference indicates frames aren't
being matched to debug images on Windows.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@mujacica
Copy link
Contributor Author

@sentry review

The native backend on Windows was causing symbolicator to fail with
"internal server error" because it was missing the debug_file field
that minidump events include. This prevented proper symbolication.

Changes:
- Add pdb_name field to sentry_module_info_t struct
- Extract PDB filename from PE CodeView debug directory
- Set debug_file field on PE images in native crash events
- Fix code_id format to use lowercase hex (matching minidump format)

The debug_file field is essential for Sentry's symbolicator to know
where to look for PDB files when symbolicating Windows crash events.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@mujacica mujacica force-pushed the feat/sentry_native_backend branch from d46d50a to d38b5e1 Compare January 17, 2026 08:10
The native backend on Windows was causing symbolicator to fail with
"internal server error" because:
1. Missing debug_file field that tells symbolicator where to find PDBs
2. code_id format was wrong - timestamp needs 8-digit zero-padding

Changes:
- Add pdb_name field to sentry_module_info_t struct (Windows-only)
- Extract PDB filename from PE CodeView debug directory
- Set debug_file field on PE images in native crash events
- Fix code_id format: use %08x%x (8-digit padded timestamp + size)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@mujacica mujacica marked this pull request as ready for review January 17, 2026 11:50
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

// SP points to the top of stack (lowest used address).
// Return addresses and saved registers are at addresses >= SP.
// We need to capture from SP *upward* for stack unwinding to work.
const size_t MAX_STACK_SIZE = SENTRY_CRASH_MAX_STACK_CAPTURE / 8;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The MAX_STACK_SIZE for macOS minidumps is hardcoded to 64KB, while the signal handler captures up to 512KB, leading to truncated stack traces.
Severity: HIGH

Suggested Fix

In src/backends/native/minidump/sentry_minidump_macos.c, change the definition of MAX_STACK_SIZE to use the full captured size. For example, change const size_t MAX_STACK_SIZE = SENTRY_CRASH_MAX_STACK_CAPTURE / 8; to const size_t MAX_STACK_SIZE = SENTRY_CRASH_MAX_STACK_CAPTURE; to align with the signal handler's behavior and other platforms.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/backends/native/minidump/sentry_minidump_macos.c#L531

Potential issue: In `sentry_minidump_macos.c`, the `MAX_STACK_SIZE` for stack capture is
set to `SENTRY_CRASH_MAX_STACK_CAPTURE / 8`, which evaluates to 64KB. However, the
signal handler in `sentry_crash_handler.c` is configured to capture up to
`SENTRY_CRASH_MAX_STACK_CAPTURE` (512KB). This discrepancy causes the minidump writer to
discard most of the captured stack data. As a result, applications on macOS, especially
those with deep call stacks, will have incomplete stack traces in their crash reports,
hindering debugging efforts. This is inconsistent with the Linux implementation which
uses a much larger stack size.

Did we get this right? 👍 / 👎 to inform future reviews.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 4 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

ptr += ((nhdr->n_namesz + 3) & ~3); // Align to 4 bytes
size_t len = nhdr->n_descsz < max_len ? nhdr->n_descsz
: max_len;
memcpy(build_id, ptr, len);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ELF note parsing bounds check ignores alignment padding

Low Severity

The bounds check at line 839 uses unaligned sizes (nhdr->n_namesz + nhdr->n_descsz) but line 846 advances ptr by the aligned name size (((nhdr->n_namesz + 3) & ~3)). This mismatch means a malformed ELF file with small n_namesz requiring alignment padding could pass the bounds check but then cause a buffer over-read when copying n_descsz bytes at line 849. For example, with n_namesz=1, n_descsz=10, and 11 remaining bytes: the check passes (1+10≤11), but after aligned advance of 4 bytes, only 7 bytes remain for a 10-byte read.

Fix in Cursor Fix in Web

}

ptr += cmd->cmdsize;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mach-O load command parsing lacks bounds validation

Low Severity

The Mach-O load command parsing loop iterates up to header.ncmds times without verifying that ptr remains within commands_buf. Before reading cmd->cmd and cmd->cmdsize at line 253-255, there's no check that ptr + sizeof(struct load_command) is within bounds. Additionally, ptr += cmd->cmdsize at line 262 could advance past the buffer if cmdsize is corrupted in a malformed Mach-O file. A malformed binary could cause out-of-bounds memory reads.

Fix in Cursor Fix in Web

minidump_module_t *mdmodule = &module_list->modules[i];
memset(mdmodule, 0, sizeof(*mdmodule));

const sentry_module_info_t *module = &writer->crash_ctx->modules[i];
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing bounds check for crash context array indices

Medium Severity

The minidump writers use module_count and num_threads from the crash context to iterate over fixed-size arrays (modules[SENTRY_CRASH_MAX_MODULES] with 512 elements and threads[SENTRY_CRASH_MAX_THREADS] with 256 elements), but never validate these counts against the array bounds. If the crash context contains values exceeding these limits (due to signal handler bugs or shared memory corruption), the code performs out-of-bounds array access. For example, macOS write_module_list_stream accesses writer->crash_ctx->modules[i] where i can reach module_count - 1 without checking if it exceeds 511.

Additional Locations (2)

Fix in Cursor Fix in Web

uint64_t count;
minidump_rva_t base_rva; // All memory starts here
minidump_memory64_descriptor_t ranges[]; // Variable length
} PACKED_ATTR minidump_memory64_list_t;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memory64 list base_rva uses wrong field size

Low Severity

The minidump_memory64_list_t structure defines base_rva as minidump_rva_t (32-bit uint32_t), but per Microsoft's minidump specification, MINIDUMP_MEMORY64_LIST.BaseRva is RVA64 which is 64-bit. This format mismatch would produce invalid minidumps if this type were used to write Memory64 streams. While currently unused in the implementation, the definition is incorrect and could cause data corruption if used in the future.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants