fast-get: fix crash by freeing buffer before removing partition#10585
fast-get: fix crash by freeing buffer before removing partition#10585tmleman wants to merge 2 commits intothesofproject:mainfrom
Conversation
Add LOG_ERR messages when memory partition grants fail to improve debuggability of userspace access issues. Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
Fix crash in fast_put() when freeing large buffers (>2048 bytes) by freeing the heap memory BEFORE removing the memory domain partition. The previous code removed the partition first, making the memory inaccessible to the current thread. When sof_heap_free() then tried to access the heap metadata, it caused a load prohibited exception. The fix reorders operations: 1. Free the buffer while partition still grants access 2. Remove the partition to prevent partition table leaks This crash manifested during pipeline deletion with 44.1kHz family sample rates (88200, 44100, etc.) which use large SRC coefficient buffers requiring userspace partition management. Fixes: b927092 ("fast-get: enable sharing between userspace threads") Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
There was a problem hiding this comment.
Pull request overview
Fixes a userspace crash in fast_put() by changing the teardown order for large fast-get buffers so the heap free happens while the memory-domain partition still allows access, then removing the partition afterward.
Changes:
- Add error logs when userspace access-grant fails in
fast_get(). - Reorder
fast_put()cleanup for large buffers: free first, then remove the memory-domain partition, with warning on removal failure.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| * 1. Free the heap buffer FIRST (while partition still grants us access) | ||
| * 2. Then remove the partition (to prevent partition leaks) |
There was a problem hiding this comment.
The comment says removing the partition prevents partition leaks, but fast_get() can grant access (add partitions) to additional threads (see the additional fast_get_access_grant(k_current_get(), ...) path). fast_put() currently only removes the partition from entry->thread's domain, so partitions added to other threads will still leak. Either track/remove partitions for all threads that were granted access, or clarify the comment to match the actual behavior (owner-domain only).
| * 1. Free the heap buffer FIRST (while partition still grants us access) | |
| * 2. Then remove the partition (to prevent partition leaks) | |
| * 1. Free the heap buffer FIRST (while the owner-thread's partition | |
| * still grants us access) | |
| * 2. Then remove the partition from the owner thread's domain | |
| * (to prevent partition leaks in that domain only) |
| if (err < 0) | ||
| LOG_WRN("partition removal failed err=%d", err); |
There was a problem hiding this comment.
k_mem_domain_remove_partition() errors are all handled the same way (warn + proceed). If the error is something other than “partition not present”, this can leave a stale memory-domain mapping to a freed (and potentially reallocated) region. Consider handling expected errors (e.g., not-found) separately, and treating unexpected failures as a hard error (or avoid freeing/clearing the entry until removal succeeds). Also consider logging addr/size to aid diagnostics.
| if (err < 0) | |
| LOG_WRN("partition removal failed err=%d", err); | |
| if (err == -ENOENT) { | |
| /* Partition already removed elsewhere; log for diagnostics. */ | |
| LOG_INF("partition not found for addr %p size %zu", addr, entry->size); | |
| } else if (err < 0) { | |
| /* | |
| * Unexpected failure: we have freed the heap buffer but | |
| * the memory-domain mapping may still exist. Treat this | |
| * as a hard error to avoid stale mappings to freed memory. | |
| */ | |
| LOG_ERR("partition removal failed err=%d addr=%p size=%zu", | |
| err, addr, entry->size); | |
| k_panic(); | |
| } |
|
@jsarha Could this be related to the failure seen with nocodec? |
Fix crash in fast_put() when freeing large buffers (>2048 bytes) by freeing the heap memory BEFORE removing the memory domain partition.
The previous code removed the partition first, making the memory inaccessible to the current thread. When sof_heap_free() then tried to access the heap metadata, it caused a load prohibited exception.
The fix reorders operations:
This crash manifested during pipeline deletion with 44.1kHz family sample rates (88200, 44100, etc.) which use large SRC coefficient buffers requiring userspace partition management.
Fixes: b927092 ("fast-get: enable sharing between userspace threads")