Skip to content

fast-get: fix crash by freeing buffer before removing partition#10585

Open
tmleman wants to merge 2 commits intothesofproject:mainfrom
tmleman:topic/upstream/pr/fast_get/fix/userspace
Open

fast-get: fix crash by freeing buffer before removing partition#10585
tmleman wants to merge 2 commits intothesofproject:mainfrom
tmleman:topic/upstream/pr/fast_get/fix/userspace

Conversation

@tmleman
Copy link
Contributor

@tmleman tmleman commented Mar 2, 2026

Fix crash in fast_put() when freeing large buffers (>2048 bytes) by freeing the heap memory BEFORE removing the memory domain partition.

The previous code removed the partition first, making the memory inaccessible to the current thread. When sof_heap_free() then tried to access the heap metadata, it caused a load prohibited exception.

The fix reorders operations:

  1. Free the buffer while partition still grants access
  2. Remove the partition to prevent partition table leaks

This crash manifested during pipeline deletion with 44.1kHz family sample rates (88200, 44100, etc.) which use large SRC coefficient buffers requiring userspace partition management.

Fixes: b927092 ("fast-get: enable sharing between userspace threads")

tmleman added 2 commits March 2, 2026 14:18
Add LOG_ERR messages when memory partition grants fail to improve
debuggability of userspace access issues.

Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
Fix crash in fast_put() when freeing large buffers (>2048 bytes) by
freeing the heap memory BEFORE removing the memory domain partition.

The previous code removed the partition first, making the memory
inaccessible to the current thread. When sof_heap_free() then tried to
access the heap metadata, it caused a load prohibited exception.

The fix reorders operations:
1. Free the buffer while partition still grants access
2. Remove the partition to prevent partition table leaks

This crash manifested during pipeline deletion with 44.1kHz family
sample rates (88200, 44100, etc.) which use large SRC coefficient
buffers requiring userspace partition management.

Fixes: b927092 ("fast-get: enable sharing between userspace
threads")

Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a userspace crash in fast_put() by changing the teardown order for large fast-get buffers so the heap free happens while the memory-domain partition still allows access, then removing the partition afterward.

Changes:

  • Add error logs when userspace access-grant fails in fast_get().
  • Reorder fast_put() cleanup for large buffers: free first, then remove the memory-domain partition, with warning on removal failure.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +285 to +286
* 1. Free the heap buffer FIRST (while partition still grants us access)
* 2. Then remove the partition (to prevent partition leaks)
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says removing the partition prevents partition leaks, but fast_get() can grant access (add partitions) to additional threads (see the additional fast_get_access_grant(k_current_get(), ...) path). fast_put() currently only removes the partition from entry->thread's domain, so partitions added to other threads will still leak. Either track/remove partitions for all threads that were granted access, or clarify the comment to match the actual behavior (owner-domain only).

Suggested change
* 1. Free the heap buffer FIRST (while partition still grants us access)
* 2. Then remove the partition (to prevent partition leaks)
* 1. Free the heap buffer FIRST (while the owner-thread's partition
* still grants us access)
* 2. Then remove the partition from the owner thread's domain
* (to prevent partition leaks in that domain only)

Copilot uses AI. Check for mistakes.
Comment on lines +302 to +303
if (err < 0)
LOG_WRN("partition removal failed err=%d", err);
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k_mem_domain_remove_partition() errors are all handled the same way (warn + proceed). If the error is something other than “partition not present”, this can leave a stale memory-domain mapping to a freed (and potentially reallocated) region. Consider handling expected errors (e.g., not-found) separately, and treating unexpected failures as a hard error (or avoid freeing/clearing the entry until removal succeeds). Also consider logging addr/size to aid diagnostics.

Suggested change
if (err < 0)
LOG_WRN("partition removal failed err=%d", err);
if (err == -ENOENT) {
/* Partition already removed elsewhere; log for diagnostics. */
LOG_INF("partition not found for addr %p size %zu", addr, entry->size);
} else if (err < 0) {
/*
* Unexpected failure: we have freed the heap buffer but
* the memory-domain mapping may still exist. Treat this
* as a hard error to avoid stale mappings to freed memory.
*/
LOG_ERR("partition removal failed err=%d addr=%p size=%zu",
err, addr, entry->size);
k_panic();
}

Copilot uses AI. Check for mistakes.
@kv2019i kv2019i requested a review from jsarha March 2, 2026 14:03
@kv2019i
Copy link
Collaborator

kv2019i commented Mar 2, 2026

@jsarha Could this be related to the failure seen with nocodec?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants