libnvme: fix NULL handle dereference in discovery and identify paths #3341
Open
martin-belanger wants to merge 1 commit into linux-nvme:master
libnvme_ctrl_get_transport_handle() opens the device lazily and returns NULL if the open fails. This can happen when a udev-triggered nvmf-connect@.service fires for a discovery controller that is already in the process of being removed: the device is still visible in sysfs at scan time but the kernel returns EAGAIN on open because the controller is no longer in LIVE state.

None of the callers checked the returned handle before passing it to libnvme_get_log() or libnvme_submit_admin_passthru(), causing a segmentation fault.

Add NULL handle guards with appropriate log messages in nvme_discovery_log(), nvmf_dim(), and libnvme_ctrl_identify(). Add a silent safety net with a comment in libnvme_get_log() itself as a last line of defence against future callers that omit the check.

Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]
Problem
libnvme_ctrl_get_transport_handle() uses lazy open semantics: it opens /dev/nvmeX on first use and returns NULL if the open fails. None of the callers validated the returned handle before passing it down the call chain, leading to a segmentation fault.

The crash was observed in an nvmf-connect@.service instance triggered by udevd. The service fires when a discovery controller emits an AEN, but udevd can process the event after the controller has already begun teardown. By that point the kernel returns EAGAIN from open() because the controller is no longer in LIVE state, libnvme_open() fails, and the NULL handle is eventually dereferenced in libnvme_get_log().

Fix
- nvme_discovery_log(): guard after libnvme_ctrl_get_transport_handle(), with a LIBNVME_LOG_DEBUG message to aid tracing
- nvmf_dim(): same guard, placed after the existing pre-condition checks where ctx is known valid, with a LIBNVME_LOG_ERR message consistent with the surrounding style
- libnvme_ctrl_identify(): same guard with a LIBNVME_LOG_DEBUG message
- libnvme_get_log(): silent safety net with a comment as a last line of defence for any future caller that omits the check

Testing
Verified by running nvme-stas tests that rapidly create and delete NVMe devices, which reliably triggered the crash before this fix. No crash observed after applying the patch.
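
For readers unfamiliar with the pattern, the following is a minimal, self-contained sketch of a lazy-open getter that can return NULL, a caller-side guard, and a callee-side safety net. It is not the patch itself: every `demo_*` name, the struct layouts, the log text, and the choice of `-ENODEV` and `-EINVAL` as return values are assumptions made for illustration and do not reflect the actual libnvme types, signatures, or error codes.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-ins; these are NOT the libnvme types or signatures. */
struct demo_handle {
	int fd;
};

struct demo_ctrl {
	const char *dev_path;      /* device node to open on first use */
	struct demo_handle handle; /* storage for the lazily opened handle */
	int opened;                /* non-zero once open() has succeeded */
};

/* Lazy open: open the device on first use, return NULL if open() fails. */
static struct demo_handle *demo_ctrl_get_transport_handle(struct demo_ctrl *c)
{
	if (!c->opened) {
		int fd = open(c->dev_path, O_RDONLY);

		if (fd < 0)
			return NULL; /* errno holds the reason for the failure */
		c->handle.fd = fd;
		c->opened = 1;
	}
	return &c->handle;
}

/* Callee-side safety net: reject a NULL handle instead of dereferencing it. */
static int demo_get_log(struct demo_handle *hdl)
{
	if (!hdl)
		return -EINVAL; /* assumed error code, for illustration only */

	/* A real implementation would issue the admin command on hdl->fd. */
	return 0;
}

/* Caller-side guard: check the handle, log, and bail out early. */
static int demo_discovery_log(struct demo_ctrl *c)
{
	struct demo_handle *hdl = demo_ctrl_get_transport_handle(c);

	if (!hdl) {
		fprintf(stderr, "%s: failed to open device\n", c->dev_path);
		return -ENODEV; /* assumed error code, for illustration only */
	}
	return demo_get_log(hdl);
}

/* Release the descriptor if the lazy open ever succeeded. */
static void demo_ctrl_close(struct demo_ctrl *c)
{
	if (c->opened) {
		close(c->handle.fd);
		c->opened = 0;
	}
}
```

In this sketch, calling `demo_discovery_log()` on a controller whose device node cannot be opened returns a negative error code rather than crashing, and `demo_get_log(NULL)` is rejected even if a future caller forgets the guard. Keeping both checks mirrors the split described above: the callers report the failure in context, while the callee check stays silent and only prevents the dereference.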