vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR crashes on Radeon 860M (RDNA 3.5, Strix Point) #422

@xErik

Summary

vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR triggers an access violation inside amdvlk64.dll when called against the Radeon 860M integrated GPU on Strix Point under Windows 11 with the current AMD Adrenalin driver. The driver advertises both the VK_KHR_cooperative_matrix extension and the cooperativeMatrix feature flag, so callers reasonably proceed to query properties, but the property query itself crashes the host process.

Environment

  • Hardware: AMD Ryzen AI 9 HX 370 (Strix Point) with Radeon 860M (RDNA 3.5) integrated GPU. Hybrid laptop, also has an NVIDIA RTX 5050 Laptop dGPU.
  • OS: Windows 11 Home Single Language 10.0.26200.
  • Driver: AMD Adrenalin, driverInfo: 26.3.1 (LLPC), driverVersion: 2.0.388. ICD reports as VK_DRIVER_ID_AMD_PROPRIETARY (driverID = 1) but the loaded binary is amdvlk64.dll. The (LLPC) suffix confirms the AMDVLK lineage.
  • Device: vendorID 0x1002, deviceID 0x1114, deviceType PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU.
  • Vulkan SDK: 1.4.341.1 (LunarG).
  • Device API: 1.4.344. Conformance: 1.4.3.3.

Crash

Captured a full memory dump via Sysinternals ProcDump on a RelWithDebInfo build of ggml-org/llama.cpp tag b9016 (which calls the relevant Vulkan APIs). Visual Studio 2022 resolved the call stack with PDBs:

amdvlk64.dll!0x00007ffc4c415672                                        ← crash site (no symbols)
ggml-vulkan.dll!ggml_vk_get_device(unsigned __int64 idx) Line 5476     ← caller in ggml-vulkan.cpp
ggml-vulkan.dll!ggml_backend_vk_host_buffer_type() Line 13909
... (consumer code)

Exception: 0xC0000005 Access violation writing location 0x00007FFD2FBE79C0 (an address inside vulkan-1.dll's mapped region).

The crash occurs between the count-only call (pProperties = nullptr, line 5474 in ggml-vulkan.cpp) and the cm_props.resize(cm_props_num) that immediately follows it (line 5476). The count-only call returns successfully with cm_props_num = 4. The access violation surfaces right after that, before the second (fill) call at line 5482 is reached.

Reproducer

The minimal reproducer is the standard two-call pattern documented in the spec:

// Vulkan 1.4 instance, AMD physical device for the Radeon 860M iGPU.
// Device advertises VK_KHR_cooperative_matrix in extensions; vkGetPhysicalDeviceFeatures2
// reports cooperativeMatrix == VK_TRUE on the device feature struct.

PFN_vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR fn =
    (PFN_vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR)
    vkGetInstanceProcAddr(instance, "vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR");

uint32_t count = 0;
fn(physicalDevice, &count, nullptr);                 // returns count=4 successfully
std::vector<VkCooperativeMatrixPropertiesKHR> props(count);
// access violation occurs at or before this point, inside amdvlk64.dll.

A self-contained C++ reproducer can be built from the source above plus a minimal Vulkan 1.4 instance and physical-device selection. The b9016 build of ggml-org/llama.cpp exhibits the crash through ggml_vk_get_device. A crash dump is available on request.
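
For reference, the full two-call pattern that the snippet above abbreviates can be sketched as follows. This is a minimal sketch, not the llama.cpp code: `enumerate_two_call`, `MockProps`, `mock_query`, `kSuccess` and `kIncomplete` are hypothetical names introduced here, and the result codes are stand-ins carrying the numeric values of VK_SUCCESS (0) and VK_INCOMPLETE (5) so that the block compiles without the Vulkan headers. Since the report places the access violation before the fill call, this helper illustrates what callers are expected to do and is not a way to avoid the crash.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in result codes with the same numeric values as VK_SUCCESS and
// VK_INCOMPLETE, so the sketch compiles without the Vulkan headers.
constexpr int32_t kSuccess = 0;
constexpr int32_t kIncomplete = 5;

// Hypothetical helper: runs the count call, prepares every element, then runs
// the fill call. `query(count, ptr)` wraps the real entry point and
// `init(element)` sets sType/pNext on each output struct.
template <typename Props, typename QueryFn, typename InitFn>
bool enumerate_two_call(QueryFn query, InitFn init, std::vector<Props>& out) {
    out.clear();
    uint32_t count = 0;
    // First call: null array pointer, so only the count is written.
    if (query(&count, static_cast<Props*>(nullptr)) != kSuccess) return false;
    if (count == 0) return true;

    out.resize(count);
    for (Props& p : out) init(p);  // output structs must be initialized first

    // Second call: up to `count` elements are filled and `count` is updated.
    const int32_t r = query(&count, out.data());
    if (r != kSuccess && r != kIncomplete) { out.clear(); return false; }
    assert(count <= out.size());
    out.resize(count);
    return true;
}

// Mock standing in for the driver so the helper can be exercised anywhere.
struct MockProps { uint32_t sType; uint32_t value; };

inline int32_t mock_query(uint32_t* count, MockProps* props) {
    const uint32_t available = 4;  // same count the report observed
    if (props == nullptr) { *count = available; return kSuccess; }
    const uint32_t n = *count < available ? *count : available;
    for (uint32_t i = 0; i < n; ++i) props[i].value = i * 10;
    *count = n;
    return n < available ? kIncomplete : kSuccess;
}
```

With the Vulkan headers available, `query` would be a lambda forwarding to fn(physicalDevice, n, p), and `init` would set each element's sType to VK_STRUCTURE_TYPE_COOPERATIVE_MATRIX_PROPERTIES_KHR and its pNext to nullptr before the fill call.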

Notes for triage

  • The bug has so far been observed only on an integrated AMD GPU reporting cooperative-matrix (cm) support on this driver line. Whether discrete AMD GPUs on the same driver are affected is unknown: none were tested, because the Radeon 860M is the only AMD device in the reporter's hybrid system.
  • vkGetPhysicalDeviceFeatures2 reports cooperativeMatrix == VK_TRUE, which is what causes consumers (including ggml-vulkan in llama.cpp) to proceed to query properties. If the driver does not actually support cm on this hardware, the feature flag should be VK_FALSE. If it does support cm, the property query should not access-violate.
  • A workaround patch has been proposed upstream in ggml-org/llama.cpp (skip cm on integrated AMD GPUs regardless of advertised support). Link will be added once the PR is open.
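
The workaround described in the last bullet amounts to a small predicate evaluated before the property query. The following is a minimal sketch of that idea, not the upstream patch: `DeviceSummary`, `kVendorIdAmd` and `should_query_coopmat` are hypothetical names, and the struct is a header-free stand-in for values a caller would normally read from VkPhysicalDeviceProperties and the feature query.

```cpp
#include <cstdint>

// Hypothetical, header-free summary of what the caller already knows about
// the physical device. With the Vulkan headers, the fields would come from:
//   vendor_id          <- VkPhysicalDeviceProperties::vendorID
//   integrated         <- deviceType == VK_PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
//   coopmat_advertised <- extension listed and cooperativeMatrix == VK_TRUE
struct DeviceSummary {
    uint32_t vendor_id;
    bool     integrated;
    bool     coopmat_advertised;
};

// PCI vendor ID reported for the Radeon 860M in this issue.
constexpr uint32_t kVendorIdAmd = 0x1002;

// Returns true only when it looks safe to call
// vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR: the device must advertise
// cooperative matrix support and must not be an integrated AMD GPU.
constexpr bool should_query_coopmat(const DeviceSummary& d) {
    return d.coopmat_advertised &&
           !(d.vendor_id == kVendorIdAmd && d.integrated);
}
```

A caller would skip the property query and treat cooperative matrix as unsupported whenever the predicate returns false. The check is deliberately broad: it also disables cm on integrated AMD GPUs whose drivers may work correctly, which is the trade-off the bullet describes ("regardless of advertised support").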
