ipc3: Fix spinlock violation in PM context save on fuzzer#10193
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR fixes a spinlock validation assertion failure in native simulation builds with IPC3 by preventing hardware PM operations in POSIX environments. The recent Zephyr spinlock validation detects context switching while holding locks, which occurs when ipc_pm_context_save calls arch_irq_lock before the EDF work queue yields.
- Extends the existing guard condition to exclude hardware PM operations when
CONFIG_ZEPHYR_POSIXis defined - Preserves PM functionality while avoiding spinlock violations in native simulation builds
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
UT(https://github.com/thesofproject/sof/actions/runs/17238881393/job/48911240441?pr=10193) unfortunately already broken on the main branch (lucky for me). |
Fix spinlock validation assertion failure during fuzzing on native_sim builds with IPC3. Recent Zephyr commit added spinlock validation that detects context switching while holding spinlocks. This revealed that `ipc_pm_context_save` calls `arch_irq_lock` before the EDF work queue yields, causing: ASSERTION FAIL [arch_irq_unlocked(key) || ...] Context switching while holding lock! The hardware PM operations (arch_irq_lock, platform_timer_stop, etc.) are not needed for POSIX simulation environments. Extend the existing guard to exclude these operations when CONFIG_ZEPHYR_POSIX is defined. This preserves PM functionality while avoiding spinlock violations in native simulation builds. Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
kv2019i
left a comment
There was a problem hiding this comment.
Some opens on why exactly the lock is held, but not really a reason against this change.
| platform_context_save(sof_get()); | ||
|
|
||
| #if !defined(CONFIG_LIBRARY) | ||
| #if !defined(CONFIG_LIBRARY) && !defined(CONFIG_ZEPHYR_POSIX) |
There was a problem hiding this comment.
Sorry for late review. I'm still a bit baffled how come spinlock is held in the POSIX simulation runs. As far as I can tell, this shouldn't be the case. Do we have a bug lurking here somewhere?
The patch itself is ok. There is no need for these actions in simulator runs, but curious if @tmleman you looked up the where the lock is taken?
There was a problem hiding this comment.
In line 681:
/* mask all DSP interrupts */
arch_irq_lock();There was a problem hiding this comment.
@tmleman Aa, right, sorry, I thought the assert is hit on L681. But ack, no, we take the lock there, and in real hw we go to D3 (and code won't return) but in simulation there is a context switch and this then hits an assert. Thanks for clarifying, then this is ok!
Fix spinlock validation assertion failure during fuzzing on native_sim builds with IPC3.
Recent Zephyr commit added spinlock validation that detects context switching while holding spinlocks. This revealed that
ipc_pm_context_savecallsarch_irq_lockbefore the EDF work queue yields, causing:ASSERTION FAIL [arch_irq_unlocked(key) || ...] Context switching while holding lock!
The hardware PM operations (arch_irq_lock, platform_timer_stop, etc.) are not needed for POSIX simulation environments. Extend the existing guard to exclude these operations when CONFIG_ZEPHYR_POSIX is defined.
This preserves PM functionality while avoiding spinlock violations in native simulation builds.