Draft: Migrate to go-nvlib#73

Open

cdesiniotis wants to merge 4 commits into NVIDIA:master from cdesiniotis:migrate-to-go-nvlib

@cdesiniotis

This PR migrates the plugin to go-nvlib for enumerating the NVIDIA PCI and vGPU devices on the system and for parsing the PCI database file. go-nvlib is a set of common Go packages shared across many cloud-native components, including the k8s-device-plugin and vgpu-device-manager.
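For readers unfamiliar with the PCI database, here is a minimal Go sketch of what parsing it involves: it reads text in the pci.ids format (unindented vendor lines, tab-indented device lines, double-tab subsystem lines) and collects the device names listed under one vendor ID. It is not go-nvlib's pciids code; the function name `deviceNames` and the database excerpt, including its IDs and names, are illustrative assumptions.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// pciIDsExcerpt is an illustrative snippet in the pci.ids text format.
// Entries are for demonstration only and may not match the real database.
const pciIDsExcerpt = "# illustrative excerpt\n" +
	"10de  NVIDIA Corporation\n" +
	"\t2236  GA102GL [A10]\n" +
	"\t\t10de 0000  Example Subsystem\n" +
	"\t20b0  GA100 [A100 SXM4 40GB]\n" +
	"8086  Intel Corporation\n" +
	"\t1234  Example Device\n"

// deviceNames (hypothetical helper) maps device ID -> device name for the
// given vendor ID, reading pci.ids-formatted text.
func deviceNames(text, vendorID string) map[string]string {
	names := make(map[string]string)
	inVendor := false
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := scanner.Text()
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and comments
		}
		switch {
		case strings.HasPrefix(line, "\t\t"):
			// Subsystem entry: ignored in this sketch.
		case strings.HasPrefix(line, "\t"):
			// Device entry: only record it under the requested vendor.
			if !inVendor {
				continue
			}
			id, name, ok := strings.Cut(strings.TrimPrefix(line, "\t"), "  ")
			if ok {
				names[id] = name
			}
		default:
			// Vendor entry: track whether we are inside the requested vendor.
			id, _, _ := strings.Cut(line, "  ")
			inVendor = id == vendorID
		}
	}
	return names
}

func main() {
	names := deviceNames(pciIDsExcerpt, "10de")
	fmt.Println(names["2236"]) // prints "GA102GL [A10]"
	fmt.Println(len(names))    // prints "2"
}
```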

The nvpci package enumerates all NVIDIA PCI devices and is used to build the iommuMap and deviceMap, which represent all of the passthrough GPUs. The nvmdev package enumerates all NVIDIA vGPU (mdev) devices and is used to build the vGpuMap and gpuVgpuMap, which represent all of the vGPU devices. The pciids package parses the PCI database.
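A rough Go sketch of how the four maps could be derived from enumerated devices follows. The `pciDevice` and `mdevDevice` structs are hypothetical stand-ins for the records nvpci and nvmdev return, and the key/value shapes chosen for each map (for example, deviceMap as device ID to IOMMU groups) are assumptions for illustration, not the plugin's actual definitions. The vGPU type normalization mirrors the `strings.ReplaceAll(dev.MDEVType, " ", "_")` call in this PR's diff.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, simplified stand-ins for go-nvlib's nvpci/nvmdev records;
// the real types have more fields and may use different names.
type pciDevice struct {
	Address    string // PCI bus address of the GPU
	DeviceID   string // PCI device ID (hex)
	IommuGroup string // IOMMU group number
}

type mdevDevice struct {
	UUID          string // mediated device UUID
	MDEVType      string // vGPU type name, e.g. "A10-12Q"
	ParentAddress string // PCI address of the parent GPU
}

// buildPassthroughMaps returns iommuMap (IOMMU group -> GPU addresses) and
// deviceMap (device ID -> distinct IOMMU groups holding that model).
func buildPassthroughMaps(devs []pciDevice) (map[string][]string, map[string][]string) {
	iommuMap := make(map[string][]string)
	deviceMap := make(map[string][]string)
	for _, d := range devs {
		iommuMap[d.IommuGroup] = append(iommuMap[d.IommuGroup], d.Address)
		if !contains(deviceMap[d.DeviceID], d.IommuGroup) {
			deviceMap[d.DeviceID] = append(deviceMap[d.DeviceID], d.IommuGroup)
		}
	}
	return iommuMap, deviceMap
}

// buildVgpuMaps returns vGpuMap (normalized vGPU type -> mdev UUIDs) and
// gpuVgpuMap (parent GPU address -> mdev UUIDs).
func buildVgpuMaps(devs []mdevDevice) (map[string][]string, map[string][]string) {
	vGpuMap := make(map[string][]string)
	gpuVgpuMap := make(map[string][]string)
	for _, d := range devs {
		vgpuType := strings.ReplaceAll(d.MDEVType, " ", "_")
		vGpuMap[vgpuType] = append(vGpuMap[vgpuType], d.UUID)
		gpuVgpuMap[d.ParentAddress] = append(gpuVgpuMap[d.ParentAddress], d.UUID)
	}
	return vGpuMap, gpuVgpuMap
}

// contains reports whether s is present in list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

func main() {
	gpus := []pciDevice{
		{Address: "0000:3b:00.0", DeviceID: "2236", IommuGroup: "12"},
		{Address: "0000:af:00.0", DeviceID: "2236", IommuGroup: "45"},
	}
	iommuMap, deviceMap := buildPassthroughMaps(gpus)
	fmt.Println(len(iommuMap), deviceMap["2236"])

	mdevs := []mdevDevice{
		{UUID: "uuid-1", MDEVType: "A10-12Q", ParentAddress: "0000:3b:00.0"},
		{UUID: "uuid-2", MDEVType: "A10-12Q", ParentAddress: "0000:3b:00.0"},
	}
	vGpuMap, gpuVgpuMap := buildVgpuMaps(mdevs)
	fmt.Println(vGpuMap["A10-12Q"], gpuVgpuMap["0000:3b:00.0"])
}
```

Keying the passthrough side by IOMMU group reflects that devices in one group must be assigned to a VM together; the vGPU side keys by type so that each type can become its own resource.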

cc @rthallisey @zvonkok @elezar @shivamerla

…ing device names

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
```go
}
for _, dev := range devices {
	gpuAddress := dev.Parent.Address
	vgpuType := strings.ReplaceAll(dev.MDEVType, " ", "_")
```
@cdesiniotis (Contributor, Author) commented:

Note: before this change, the device plugin used the full contents of the `mdev_type/name` file to construct the corresponding resource name. For example, `mdev_type/name` would contain the string `NVIDIA A10-12Q`, and the plugin would advertise `nvidia.com/NVIDIA_A10-12Q` resources. The nvmdev package strips the leading `NVIDIA` or `GRID` prefix from this string, so the resource name would now be `nvidia.com/A10-12Q`.
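The naming difference can be illustrated with a small Go sketch. `resourceNameFromFullName` models the pre-migration behavior described above, and `stripVendorPrefix` approximates the prefix stripping attributed to nvmdev. All three functions are hypothetical helpers written for this example, not go-nvlib or plugin code, and trimming the trailing newline of the sysfs file is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

const resourcePrefix = "nvidia.com/"

// resourceNameFromFullName models the old behavior: use the whole contents
// of mdev_type/name, with spaces replaced by underscores.
func resourceNameFromFullName(mdevTypeName string) string {
	return resourcePrefix + strings.ReplaceAll(strings.TrimSpace(mdevTypeName), " ", "_")
}

// stripVendorPrefix approximates dropping a leading "NVIDIA " or "GRID ".
func stripVendorPrefix(name string) string {
	name = strings.TrimSpace(name)
	for _, p := range []string{"NVIDIA ", "GRID "} {
		if strings.HasPrefix(name, p) {
			return name[len(p):]
		}
	}
	return name
}

// resourceNameFromStrippedType models the new behavior after migration.
func resourceNameFromStrippedType(mdevTypeName string) string {
	return resourcePrefix + strings.ReplaceAll(stripVendorPrefix(mdevTypeName), " ", "_")
}

func main() {
	name := "NVIDIA A10-12Q\n" // assumed contents of mdev_type/name
	fmt.Println(resourceNameFromFullName(name))     // prints "nvidia.com/NVIDIA_A10-12Q"
	fmt.Println(resourceNameFromStrippedType(name)) // prints "nvidia.com/A10-12Q"
}
```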

I would personally prefer the latter naming scheme, but it is a breaking change from the user's perspective. I have opened https://gitlab.com/nvidia/cloud-native/go-nvlib/-/merge_requests/45 to add a `device.MDEVName` field, so that we can use the full contents of `mdev_type/name` when constructing the resource name and thereby retain the plugin's current behavior.
