@ryanbreen (Owner)

Summary

Major fixes for ARM64 boot stability, improving success rate from 0% to ~75-80%.

Memory Layout Fixes

  • Move heap to 0x5000_0000 (after frame allocator ends)
  • Move kernel stacks to 0x5200_0000-0x5400_0000 (after 32MB heap)
  • Eliminates collision between heap and kernel stack regions
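A minimal Rust sketch of how this layout could be pinned down with compile-time checks, so that growing the heap past 32 MiB or moving the stack region breaks the build instead of silently reintroducing the overlap. The addresses come from this PR. The constant names (FRAME_ALLOC_END, HEAP_START, KERNEL_STACK_START, ...) and the Region type are hypothetical, not the kernel's actual identifiers.

```rust
// Hypothetical names; the addresses are taken from the PR description.
const MIB: u64 = 1024 * 1024;

/// End (exclusive) of the region handed out by the frame allocator.
const FRAME_ALLOC_END: u64 = 0x5000_0000;
/// Kernel heap: starts where the frame allocator's region ends.
const HEAP_START: u64 = 0x5000_0000;
const HEAP_SIZE: u64 = 32 * MIB;
/// Kernel stacks: placed directly after the 32 MiB heap.
const KERNEL_STACK_START: u64 = 0x5200_0000;
const KERNEL_STACK_END: u64 = 0x5400_0000;

/// Half-open address range [start, end).
#[derive(Clone, Copy, Debug)]
struct Region {
    start: u64,
    end: u64,
}

impl Region {
    const fn overlaps(&self, other: &Region) -> bool {
        self.start < other.end && other.start < self.end
    }
}

const HEAP: Region = Region { start: HEAP_START, end: HEAP_START + HEAP_SIZE };
const KSTACKS: Region = Region { start: KERNEL_STACK_START, end: KERNEL_STACK_END };

// Evaluated at compile time: the build fails if the layout regresses.
const _: () = assert!(HEAP_START >= FRAME_ALLOC_END);
const _: () = assert!(HEAP.end == KSTACKS.start);
const _: () = assert!(!HEAP.overlaps(&KSTACKS));

fn main() {
    println!("heap    {:#x}..{:#x}", HEAP.start, HEAP.end);
    println!("kstacks {:#x}..{:#x}", KSTACKS.start, KSTACKS.end);
}
```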

User Stack Mapping (TTBR0 vs TTBR1)

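  • ARM64 userspace addresses must be in the TTBR0 (lower-half) range (< 0xFFFF...); upper-half addresses are translated through TTBR1, i.e. the kernel's tables
  • The stack allocator returns HHDM (kernel) addresses, but userspace needs addresses in the user stack region starting at USER_STACK_REGION_START
  • Add map_user_stack_to_process_with_phys() to map the stack's physical frames to proper userspace virtual addresses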
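The TTBR0/TTBR1 split can be illustrated with a small address classifier. This sketch assumes 48-bit virtual addresses in both halves (T0SZ = T1SZ = 16) and top-byte-ignore disabled. Under those assumptions, an address whose bits [63:48] are all zeros is translated via TTBR0_EL1, all ones via TTBR1_EL1, and anything else is non-canonical and faults. The HHDM address in the example is made up for illustration.

```rust
/// Which translation table base register translates a given virtual address.
#[derive(Debug, PartialEq, Eq)]
enum Ttbr {
    /// Lower half: per-process user mappings.
    Ttbr0,
    /// Upper half: kernel mappings, including the HHDM.
    Ttbr1,
    /// Neither half: an access causes a translation fault.
    NonCanonical,
}

/// Assumed VA width for both halves (T0SZ = T1SZ = 16).
const VA_BITS: u32 = 48;

fn ttbr_for(va: u64) -> Ttbr {
    // With 48-bit VAs, the 16 bits above bit 47 must be all 0s or all 1s.
    match va >> VA_BITS {
        0 => Ttbr::Ttbr0,
        0xFFFF => Ttbr::Ttbr1,
        _ => Ttbr::NonCanonical,
    }
}

fn main() {
    // USER_STACK_REGION_START from the commit message: lower half.
    let user_stack = 0x0000_FFFF_FF00_0000_u64;
    // Hypothetical HHDM address of the kind the stack allocator returns.
    let hhdm_stack = 0xFFFF_0000_4200_0000_u64;
    println!("{:#x} -> {:?}", user_stack, ttbr_for(user_stack));
    println!("{:#x} -> {:?}", hhdm_stack, ttbr_for(hhdm_stack));
}
```

Handing an HHDM address to EL0 as its stack pointer would therefore resolve through the kernel's TTBR1 tables rather than the process's own TTBR0 tables, which is why the stack's frames need a second, lower-half mapping.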

Frame Allocator Improvements

  • Change from fetch_add to a compare-exchange loop so a frame slot is not consumed when get_usable_frame() returns None
  • Temporarily disable FREE_FRAMES reuse on ARM64 (investigating potential Vec corruption during concurrent access)
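The difference between the two counter strategies can be sketched with a simplified bump allocator. Here get_usable_frame() is a hypothetical stand-in that maps an index linearly onto a physical range; the real function's signature and allocator structure are not shown in this PR. With fetch_add, every call advances the counter, including calls whose lookup returns None. With the compare-exchange loop, the counter only moves after a frame has been found, and a lost race simply retries with the winning CPU's value.

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

/// Assumed 4 KiB frames.
const FRAME_SIZE: u64 = 4096;

/// Simplified bump-style frame allocator (hypothetical, not the kernel's type).
struct BumpFrameAllocator {
    /// Index of the next usable frame to hand out.
    next: AtomicUsize,
    base: u64,
    frame_count: usize,
}

impl BumpFrameAllocator {
    const fn new(base: u64, frame_count: usize) -> Self {
        Self { next: AtomicUsize::new(0), base, frame_count }
    }

    /// Hypothetical stand-in for get_usable_frame(): physical address of the
    /// n-th usable frame, or None when no such frame exists.
    fn get_usable_frame(&self, n: usize) -> Option<u64> {
        (n < self.frame_count).then(|| self.base + n as u64 * FRAME_SIZE)
    }

    /// Old scheme: the index is consumed even when the lookup fails.
    fn allocate_fetch_add(&self) -> Option<u64> {
        let n = self.next.fetch_add(1, Ordering::Relaxed);
        self.get_usable_frame(n)
    }

    /// New scheme: advance the counter only after a frame has been found.
    fn allocate_cas(&self) -> Option<u64> {
        let mut current = self.next.load(Ordering::Relaxed);
        loop {
            // No frame for this index: return without touching the counter.
            let frame = self.get_usable_frame(current)?;
            match self.next.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Some(frame),
                // Another CPU claimed `current` (or a spurious failure): retry.
                Err(actual) => current = actual,
            }
        }
    }
}

fn main() {
    let with_fetch_add = BumpFrameAllocator::new(0x4000_0000, 2);
    let with_cas = BumpFrameAllocator::new(0x4000_0000, 2);
    for _ in 0..3 {
        let _ = with_fetch_add.allocate_fetch_add();
        let _ = with_cas.allocate_cas();
    }
    // The failed third fetch_add call still advanced its counter; the CAS loop's did not.
    println!("fetch_add counter = {}", with_fetch_add.next.load(Ordering::Relaxed));
    println!("CAS counter       = {}", with_cas.next.load(Ordering::Relaxed));
}
```

compare_exchange_weak is used because the call already sits in a retry loop, so a spurious failure costs only one more iteration.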

Known Issues (~25% failure rate)

  • Intermittent data abort at 0x28 during FdKind::clone (timing race)
  • Occasional OOM during stack allocation (frame exhaustion)

These appear to be pre-existing timing bugs exposed by the memory fixes; they will be addressed in follow-up work.

Test plan

  • Run strict boot tests: 75-80% success rate (up from 0%)
  • Investigate remaining timing bugs in follow-up branch

🤖 Generated with Claude Code

Major fixes for ARM64 boot stability (0% → 75% success rate):

Memory Layout Fixes:
- Move heap to 0x5000_0000, where the frame allocator's region ends
- Move kernel stacks to 0x5200_0000-0x5400_0000 (after 32MB heap)
- This eliminates collision between heap and kernel stack regions

User Stack Mapping (TTBR0 vs TTBR1):
- ARM64 userspace addresses must be in the TTBR0 (lower-half) range (< 0xFFFF...)
- Stack allocator returns HHDM (kernel) addresses, but userspace needs
  addresses in the user stack region starting at USER_STACK_REGION_START
  (0x0000_FFFF_FF00_0000)
- Add map_user_stack_to_process_with_phys() to map physical frames
  to proper userspace virtual addresses
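A minimal sketch of the remapping idea, assuming 4 KiB pages and a linear HHDM, so a contiguous HHDM stack is also physically contiguous. The function below is a hypothetical analogue of map_user_stack_to_process_with_phys(), not its actual signature: the per-process page table is modelled as a BTreeMap from user virtual page to physical frame, and HHDM_BASE is an assumed value.

```rust
use std::collections::BTreeMap;

/// Assumed page size (4 KiB granule).
const PAGE_SIZE: u64 = 4096;
/// Start of the user stack region, from the commit message.
const USER_STACK_REGION_START: u64 = 0x0000_FFFF_FF00_0000;
/// Hypothetical HHDM base; the real offset depends on the kernel's layout.
const HHDM_BASE: u64 = 0xFFFF_0000_0000_0000;

/// Toy stand-in for a process's TTBR0 page table: user VA page -> physical frame.
#[derive(Default)]
struct UserPageTable {
    entries: BTreeMap<u64, u64>,
}

/// Convert an HHDM virtual address back to the physical address it maps.
fn hhdm_to_phys(va: u64) -> u64 {
    assert!(va >= HHDM_BASE, "not an HHDM address: {va:#x}");
    va - HHDM_BASE
}

/// Hypothetical analogue of map_user_stack_to_process_with_phys(): map each
/// physical frame at consecutive user pages starting at `user_base` and return
/// the initial stack pointer (top of the mapping; the stack grows down).
fn map_user_stack_with_phys(pt: &mut UserPageTable, user_base: u64, frames: &[u64]) -> u64 {
    for (i, &phys) in frames.iter().enumerate() {
        let va = user_base + i as u64 * PAGE_SIZE;
        // The user mapping must live in the lower (TTBR0) half.
        assert_eq!(va >> 48, 0, "user VA {va:#x} is not in the TTBR0 range");
        pt.entries.insert(va, phys);
    }
    user_base + frames.len() as u64 * PAGE_SIZE
}

fn main() {
    // Hypothetical 4-page stack as returned by the stack allocator (HHDM address).
    let hhdm_bottom = HHDM_BASE + 0x4200_0000;
    let phys_bottom = hhdm_to_phys(hhdm_bottom);
    // The HHDM is a linear map, so the pages are physically contiguous too.
    let frames: Vec<u64> = (0..4).map(|i| phys_bottom + i * PAGE_SIZE).collect();

    let mut pt = UserPageTable::default();
    let sp = map_user_stack_with_phys(&mut pt, USER_STACK_REGION_START, &frames);
    for (va, pa) in &pt.entries {
        println!("user {va:#x} -> phys {pa:#x}");
    }
    println!("initial SP_EL0 = {sp:#x}");
}
```

The same physical frames stay reachable from the kernel through the HHDM, while the process sees them only at lower-half addresses in its own TTBR0 tables.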

Frame Allocator Improvements:
- Change from fetch_add to compare-exchange loop to avoid wasting
  frame slots when get_usable_frame() returns None
- Temporarily disable FREE_FRAMES reuse on ARM64 while investigating
  potential Vec corruption during concurrent access
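One way to express a temporary, architecture-specific switch like this is a compile-time flag derived from cfg!(target_arch = "aarch64"). The sketch below is hypothetical: FreeFrames stands in for FREE_FRAMES, and a std Mutex stands in for whatever lock guards it in the kernel. With reuse disabled, freed frames are dropped without touching the Vec at all, which takes the suspect code path out of the picture at the cost of never recycling those frames.

```rust
use std::sync::Mutex;

/// Assumed compile-time switch: reuse freed frames everywhere except ARM64.
const REUSE_FREED_FRAMES: bool = !cfg!(target_arch = "aarch64");

/// Hypothetical free list standing in for FREE_FRAMES.
struct FreeFrames {
    frames: Mutex<Vec<u64>>,
    reuse: bool,
}

impl FreeFrames {
    fn new(reuse: bool) -> Self {
        Self { frames: Mutex::new(Vec::new()), reuse }
    }

    /// Return a frame to the pool. With reuse disabled the frame is leaked
    /// and the Vec is never touched.
    fn free(&self, phys: u64) {
        if self.reuse {
            self.frames.lock().unwrap().push(phys);
        }
    }

    /// Take a previously freed frame, if reuse is enabled and one is available.
    fn take(&self) -> Option<u64> {
        if !self.reuse {
            return None;
        }
        self.frames.lock().unwrap().pop()
    }
}

fn main() {
    let pool = FreeFrames::new(REUSE_FREED_FRAMES);
    pool.free(0x4000_0000);
    println!("reuse enabled: {}, reused frame: {:?}", REUSE_FREED_FRAMES, pool.take());
}
```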

Remaining Issues (~25% failure rate):
- Intermittent data abort at 0x28 during FdKind::clone (timing race)
- Occasional OOM during stack allocation (frame exhaustion)
These appear to be pre-existing timing bugs exposed by the memory fixes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ryanbreen merged commit 19aff9e into main on Jan 31, 2026
1 of 2 checks passed
@ryanbreen deleted the feature/arm64-idle-context-fix branch on January 31, 2026 at 10:55