libnvme: fix NULL handle dereference in discovery and identify paths #3341

Open
martin-belanger wants to merge 1 commit into linux-nvme:master from martin-belanger:fix-libnvme-null-hdl

Problem

libnvme_ctrl_get_transport_handle() uses lazy open semantics: it opens /dev/nvmeX on first use and returns NULL if the open fails. None of the callers validated the returned handle before passing it down the call chain, leading to a segmentation fault.

The crash was observed in an nvmf-connect@.service instance triggered by udevd. The service fires when a discovery controller emits an AEN, but udevd can process the event after the controller has already begun teardown. By that point the controller is no longer in LIVE state, so the kernel returns EAGAIN from open(), libnvme_open() fails, and the resulting NULL handle is eventually dereferenced in libnvme_get_log().
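To make the lazy-open behaviour concrete, here is a minimal sketch of the pattern, not the libnvme implementation. The names `demo_ctrl`, `demo_handle`, and `demo_ctrl_get_handle()` are hypothetical stand-ins, and the sketch assumes a plain POSIX `open()` on a device path. It only illustrates that a getter which opens on first use has to be able to return NULL, whatever errno the failed open reports.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-ins for illustration; these are not libnvme types. */
struct demo_handle {
	int fd;
};

struct demo_ctrl {
	const char *dev_path;     /* device node to open, e.g. a /dev/nvmeX path */
	struct demo_handle *hdl;  /* NULL until the first successful open */
};

/*
 * Lazy open: open the device on first use and cache the handle.
 * Returns NULL if open() fails (errno is left as set by open()),
 * so every caller must check the result before using it.
 */
static struct demo_handle *demo_ctrl_get_handle(struct demo_ctrl *c)
{
	struct demo_handle *h;
	int fd;

	if (c->hdl)
		return c->hdl;  /* already opened on an earlier call */

	fd = open(c->dev_path, O_RDONLY);
	if (fd < 0)
		return NULL;    /* e.g. the device is going away */

	h = malloc(sizeof(*h));
	if (!h) {
		close(fd);
		errno = ENOMEM;
		return NULL;
	}

	h->fd = fd;
	c->hdl = h;
	return h;
}
```

A caller that passes the return value straight to a function that dereferences it will crash in exactly the window described above, when the device node is still listed but can no longer be opened.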

Fix

  • nvme_discovery_log() — guard after libnvme_ctrl_get_transport_handle() with a LIBNVME_LOG_DEBUG message to aid tracing
  • nvmf_dim() — same guard, placed after the existing pre-condition checks where ctx is known valid, with a LIBNVME_LOG_ERR message consistent with the surrounding style
  • libnvme_ctrl_identify() — same guard with a LIBNVME_LOG_DEBUG message
  • libnvme_get_log() — silent safety net with a comment as a last line of defence for any future caller that omits the check
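The two layers of the fix listed above can be sketched as follows. This is a simplified model under stated assumptions, not the patch itself: the `sketch_*` names are hypothetical, the log goes to stderr instead of the libnvme logging macros, and the `-ENODEV` and `-EINVAL` return values are placeholders, since the description does not say which error codes the real guards return.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; these are not libnvme types. */
struct sketch_handle {
	int fd;
};

struct sketch_ctrl {
	struct sketch_handle *hdl;  /* NULL models a failed lazy open */
};

static struct sketch_handle *sketch_get_handle(struct sketch_ctrl *c)
{
	return c->hdl;
}

/* Callee: silent safety net for callers that omit the check. */
static int sketch_get_log(struct sketch_handle *hdl, void *buf, size_t len)
{
	/* Last line of defence: never dereference a NULL handle. */
	if (!hdl)
		return -EINVAL;  /* assumed error code */

	memset(buf, 0, len);     /* stand-in for the real log page read */
	return 0;
}

/* Caller: guard right after fetching the handle, with a log message. */
static int sketch_discovery_log(struct sketch_ctrl *c, void *buf, size_t len)
{
	struct sketch_handle *hdl = sketch_get_handle(c);

	if (!hdl) {
		fprintf(stderr, "failed to get transport handle\n");
		return -ENODEV;  /* assumed error code */
	}

	return sketch_get_log(hdl, buf, len);
}
```

Checking in the caller keeps the log message close to the context that explains it, while the check in the callee turns a forgotten guard into an error return instead of a segmentation fault.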

Testing

Verified by running nvme-stas tests that rapidly create and delete NVMe devices, which reliably triggered the crash before this fix. No crash was observed after applying the patch.

libnvme_ctrl_get_transport_handle() opens the device lazily and returns
NULL if the open fails. This can happen when a udev-triggered
nvmf-connect@.service fires for a discovery controller that is already
in the process of being removed: the device is still visible in sysfs
at scan time but the kernel returns EAGAIN on open because the
controller is no longer in LIVE state.

None of the callers checked the returned handle before passing it to
libnvme_get_log() or libnvme_submit_admin_passthru(), causing a
segmentation fault.

Add NULL handle guards with appropriate log messages in
nvme_discovery_log(), nvmf_dim(), and libnvme_ctrl_identify(). Add a
silent safety net with a comment in libnvme_get_log() itself as a last
line of defence against future callers that omit the check.

Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]
martin-belanger force-pushed the fix-libnvme-null-hdl branch from 0aabdf9 to ca62b42 on May 5, 2026 22:09