

@melihmete

@melihmete melihmete commented Nov 19, 2025

Description

When developing Vulkan applications, understanding and handling GPU errors is crucial. Currently, traditional graphics debugging methods do not give detailed information about GPU faults.
The VK_EXT_device_fault extension provides detailed information when VK_ERROR_DEVICE_LOST occurs, while the VK_EXT_device_address_binding_report extension helps monitor GPU memory usage by reporting the addresses that are allocated and bound/unbound in a Vulkan application.
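To show how the two extensions can complement each other, here is a minimal, self-contained sketch of the correlation step. It assumes the application forwards every address binding report (VkDeviceAddressBindingCallbackDataEXT, delivered through the debug utils messenger) into a tracker, and that after VK_ERROR_DEVICE_LOST it passes in each faulting address reported by vkGetDeviceFaultInfoEXT. The `BindingEvent`, `FaultAddress` and `AddressBindingTracker` names are hypothetical stand-ins that mirror only the fields needed, so the code builds without the Vulkan headers; it is not necessarily how the sample under review does it. The lower/upper bounds follow the formula given for VkDeviceFaultAddressInfoEXT, where `addressPrecision` is a power of two.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-ins for the fields of VkDeviceAddressBindingCallbackDataEXT
// and VkDeviceFaultAddressInfoEXT that this sketch needs.
enum class BindingType
{
    Bind,
    Unbind
};

struct BindingEvent
{
    uint64_t    base_address;
    uint64_t    size;
    BindingType type;
    std::string object_name;        // label chosen by the application
};

struct FaultAddress
{
    uint64_t reported_address;
    uint64_t address_precision;        // power of two, as in VkDeviceFaultAddressInfoEXT
};

// Keeps the currently bound address ranges, keyed by base address.
class AddressBindingTracker
{
  public:
    void on_binding_event(const BindingEvent &event)
    {
        if (event.type == BindingType::Bind)
        {
            ranges[event.base_address] = BoundRange{event.size, event.object_name};
        }
        else
        {
            ranges.erase(event.base_address);
        }
    }

    // Returns the name of the first bound range overlapping the possible fault range.
    std::optional<std::string> find_owner(const FaultAddress &fault) const
    {
        // Possible address range implied by the reported precision.
        const uint64_t lower = fault.reported_address & ~(fault.address_precision - 1);
        const uint64_t upper = fault.reported_address | (fault.address_precision - 1);

        for (const auto &[base, range] : ranges)
        {
            const uint64_t end = base + range.size;        // exclusive end
            if (base <= upper && end > lower)
            {
                return range.name;
            }
        }
        return std::nullopt;        // fault is not inside any currently bound range
    }

  private:
    struct BoundRange
    {
        uint64_t    size;
        std::string name;
    };
    std::map<uint64_t, BoundRange> ranges;
};
```

A linear scan keeps the overlap test obvious and also copes with aliased (overlapping) bindings; an application tracking many objects would likely want an interval structure instead.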

General Checklist:

Please ensure the following points are checked:

  • My code follows the coding style
  • I have reviewed file licenses
  • I have commented any added functions (in line with Doxygen)
  • I have commented any code that could be hard to understand
  • My changes do not add any new compiler warnings
  • My changes do not add any new validation layer errors or warnings
  • I have used existing framework/helper functions where possible
  • My changes do not add any regressions
  • I have tested every sample to ensure everything runs correctly
  • This PR describes the scope and expected impact of the changes I am making

Note: The Samples CI runs a number of checks including:

  • I have updated the header Copyright to reflect the current year (CI build will fail if Copyright is out of date)
  • My changes build on Windows, Linux, macOS and Android. Otherwise I have documented any exceptions

If this PR contains framework changes:

  • [n/a] I did a full batch run using the batch command line argument to make sure all samples still work properly

Sample Checklist

If your PR contains a new or modified sample, these further checks must be carried out in addition to the General Checklist:

  • I have tested the sample on at least one compliant Vulkan implementation
  • [n/a] If the sample is vendor-specific, I have tagged it appropriately
  • [n/a] I have stated on what implementation the sample has been tested so that others can test on different implementations and platforms
  • [n/a] Any dependent assets have been merged and published in downstream modules
  • For new samples, I have added a paragraph with a summary to the appropriate chapter in the readme of the folder that the sample belongs to e.g. api samples readme
  • For new samples, I have added a tutorial README.md file to guide users through what they need to know to implement code using this feature. For example, see conditional_rendering
  • For new samples, I have added a link to the Antora navigation so that the sample will be listed at the Vulkan documentation site

@CLAassistant

CLAassistant commented Nov 19, 2025

CLA assistant check
All committers have signed the CLA.

@melihmete melihmete changed the title "VK_EXT_device_fault extension implementation" to "Add a new sample for VK_EXT_device_fault" Nov 19, 2025
@gary-sweet
Contributor

I see a bunch of compilation errors when I try to build this:

samples/extensions/device_fault/device_fault.cpp:595:49: error: macro "REQUEST_REQUIRED_FEATURE" passed 4 arguments, but takes just 3
  595 |                          bufferDeviceAddress);
      |                                                 ^

framework/core/physical_device.h:188: note: macro "REQUEST_REQUIRED_FEATURE" defined here
  188 | #define REQUEST_REQUIRED_FEATURE(gpu, Feature, flag) gpu.request_required_feature<Feature>(&Feature::flag, #Feature, #flag)
      | 

samples/extensions/device_fault/device_fault.cpp:601:41: error: macro "REQUEST_REQUIRED_FEATURE" passed 4 arguments, but takes just 3
  601 |                          deviceFault);
      |                                         ^

samples/extensions/device_fault/device_fault.cpp:607:50: error: macro "REQUEST_OPTIONAL_FEATURE" passed 4 arguments, but takes just 3
  607 |                          reportAddressBinding);
      |                                                  ^

framework/core/physical_device.h:187: note: macro "REQUEST_OPTIONAL_FEATURE" defined here
  187 | #define REQUEST_OPTIONAL_FEATURE(gpu, Feature, flag) gpu.request_optional_feature<Feature>(&Feature::flag, #Feature, #flag)
      | 

samples/extensions/device_fault/device_fault.h:31:18: error: 'virtual void DeviceFault::request_gpu_features(vkb::PhysicalDevice&)' marked 'override', but does not override
   31 |     virtual void request_gpu_features(vkb::PhysicalDevice &gpu) override;
      |                  ^~~~~~~~~~~~~~~~~~~~

samples/extensions/device_fault/device_fault.cpp:592:5: error: 'REQUEST_REQUIRED_FEATURE' was not declared in this scope
  592 |     REQUEST_REQUIRED_FEATURE(gpu,
      |     ^~~~~~~~~~~~~~~~~~~~~~~~

samples/extensions/device_fault/device_fault.cpp:604:5: error: 'REQUEST_OPTIONAL_FEATURE' was not declared in this scope
  604 |     REQUEST_OPTIONAL_FEATURE(gpu,
      |     ^~~~~~~~~~~~~~~~~~~~~~~~

samples/extensions/device_fault/device_fault.cpp:610:22: error: invalid use of incomplete type 'class vkb::PhysicalDevice'
  610 |     auto &features = gpu.get_mutable_requested_features();
      |                      ^~~

framework/core/device.h:36:7: note: forward declaration of 'class vkb::PhysicalDevice'
   36 | class PhysicalDevice;
      |       ^~~~~~~~~~~~~~
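Most of these errors come from an argument-count mismatch: in the framework version used for this build, REQUEST_REQUIRED_FEATURE and REQUEST_OPTIONAL_FEATURE take three parameters (gpu, Feature, flag), while the sample passed four. The sketch below reproduces the shape of the three-parameter macro to show what each argument turns into: a pointer to the boolean member to enable, plus the stringified struct and member names. `PhysicalDeviceStub`, `FaultFeaturesStub` and `AddressBindingFeaturesStub` are hypothetical stand-ins for vkb::PhysicalDevice, VkPhysicalDeviceFaultFeaturesEXT and VkPhysicalDeviceAddressBindingReportFeaturesEXT; this is an illustration, not the framework's implementation.

```cpp
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-ins for the Vulkan feature structs (VkBool32 modeled as uint32_t).
struct FaultFeaturesStub
{
    uint32_t deviceFault             = 0;
    uint32_t deviceFaultVendorBinary = 0;
};

struct AddressBindingFeaturesStub
{
    uint32_t reportAddressBinding = 0;
};

// Hypothetical stand-in for vkb::PhysicalDevice: keeps one struct per feature type
// and logs every flag that was requested.
struct PhysicalDeviceStub
{
    std::tuple<FaultFeaturesStub, AddressBindingFeaturesStub> features;
    std::vector<std::string>                                  requested;

    template <typename Feature>
    void request_required_feature(uint32_t Feature::*flag, const char *feature_name, const char *flag_name)
    {
        // Enable the chosen member through the pointer-to-member.
        std::get<Feature>(features).*flag = 1;
        requested.push_back(std::string(feature_name) + "::" + flag_name);
    }
};

// Same three-parameter shape as the macro quoted in the build log:
// (gpu, Feature, flag) -> pointer-to-member plus stringified names.
#define REQUEST_REQUIRED_FEATURE(gpu, Feature, flag) \
    gpu.request_required_feature<Feature>(&Feature::flag, #Feature, #flag)
```

With this shape, `REQUEST_REQUIRED_FEATURE(gpu, FaultFeaturesStub, deviceFault);` expands to a valid call, whereas any fourth argument makes the preprocessor report "passed 4 arguments, but takes just 3".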
     

@melihmete
Author

@gary-sweet All fixed and pushed the changes.

@SaschaWillems
Collaborator

Can you move the shaders to a subdirectory called "glsl"? We support different shading languages and the goal is for all samples to have shaders in different shading languages (glsl, slang and/or hlsl).

const std::vector<VkPushConstantRange> ranges = {
    vkb::initializers::push_constant_range(graphics ? VK_SHADER_STAGE_VERTEX_BIT : VK_SHADER_STAGE_COMPUTE_BIT,
                                           graphics ? sizeof(PushVertex) : sizeof(PushCompute), 0),
};
Contributor


Why do you use a std::vector here, when you have just one element?

Author


Hi, this sample is built on top of buffer_device_address, so that snippet comes from the buffer_device_address extension sample.

memory_barrier.dst_access_mask = VK_ACCESS_INDEX_READ_BIT;
memory_barrier.src_stage_mask = VK_PIPELINE_STAGE_TRANSFER_BIT;
memory_barrier.dst_stage_mask = VK_PIPELINE_STAGE_VERTEX_INPUT_BIT;
cmd->buffer_memory_barrier(*index_buffer, 0, VK_WHOLE_SIZE, memory_barrier);
Contributor


Is this memory barrier needed at all, considering that you wait_idle a few lines later anyways?

Author


That section is also the same as in the buffer_device_address extension sample.

// So we incorrectly call wait_idle here, so we can get the GPU in error state, and we can query it for device_fault before an exception is thrown.
VkResult error = get_device().get_queue_by_present(0).wait_idle();
check_device_fault(error);
ApiVulkanSample::submit_frame();
Contributor


Maybe, instead of this slightly hacky approach:

	try
	{
		ApiVulkanSample::submit_frame();
	}
	catch (std::exception const &e)
	{
		vk::DeviceLostError const *device_lost_error = reinterpret_cast<vk::DeviceLostError const *>(&e);
		if (device_lost_error)
		{
			check_device_fault(VK_ERROR_DEVICE_LOST);
		}
	}

Author


Great hack! Changed it and tested it.

VkResult error = get_device().get_queue_by_present(0).wait_idle();
check_device_fault(error);
ApiVulkanSample::submit_frame();

Contributor


The comments above are outdated with this change.

if (device_lost_error)
{
check_device_fault(VK_ERROR_DEVICE_LOST);
}
Contributor


Unfortunately, my suggested code snippet is not correct: the reinterpret_cast would always give you a non-null pointer.
I think the best you could do here is something like:

    catch (std::runtime_error const &e)
    {
        // Query fault info only when the framework reports a device loss, then rethrow the original exception.
        if (std::strcmp("Detected Vulkan error: ERROR_DEVICE_LOST", e.what()) == 0)
        {
            check_device_fault(VK_ERROR_DEVICE_LOST);
        }
        throw;
    }
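The reinterpret_cast problem can be reproduced in isolation. reinterpret_cast only relabels the address of the caught object, so the result is non-null for every exception; a checked dynamic_cast, or a catch clause for the specific type, is what actually distinguishes a device-lost error. `DeviceLostErrorStub` and `run_and_check` are hypothetical names standing in for vk::DeviceLostError and the submit path, so the sketch builds without Vulkan-Hpp. It assumes nothing about which exception type the framework's submit_frame() actually throws; type-based catching only helps if the framework throws that type.

```cpp
#include <exception>
#include <stdexcept>

// Hypothetical stand-in for vk::DeviceLostError (an exception type derived from std::exception).
struct DeviceLostErrorStub : std::runtime_error
{
    DeviceLostErrorStub() : std::runtime_error("ERROR_DEVICE_LOST") {}
};

// Incorrect check: reinterpret_cast never fails, so this is true for any exception.
bool is_device_lost_reinterpret(const std::exception &e)
{
    return reinterpret_cast<const DeviceLostErrorStub *>(&e) != nullptr;
}

// Correct check: dynamic_cast yields nullptr unless e really is a DeviceLostErrorStub.
bool is_device_lost_dynamic(const std::exception &e)
{
    return dynamic_cast<const DeviceLostErrorStub *>(&e) != nullptr;
}

// Runs `work`; only a device-lost error triggers `on_device_lost`
// (e.g. check_device_fault(VK_ERROR_DEVICE_LOST)). Every exception keeps propagating.
template <typename Work, typename OnDeviceLost>
void run_and_check(Work &&work, OnDeviceLost &&on_device_lost)
{
    try
    {
        work();
    }
    catch (const DeviceLostErrorStub &)
    {
        on_device_lost();
        throw;        // rethrow the original exception object
    }
}
```

Other exceptions never reach the catch clause, so they propagate unchanged without a string comparison on `what()`.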

Author


catch (std::exception const &e)
    {
        vk::DeviceLostError const *device_lost_error = reinterpret_cast<vk::DeviceLostError const *>(&e);
        if (device_lost_error)
        {
            check_device_fault(VK_ERROR_DEVICE_LOST);
        }
    }

Actually, the previous snippet I tested reports the error as VK_ERROR_DEVICE_LOST, and I can retrieve the log I need. The second snippet you suggested does not work for the purpose of this extension.

Contributor


Actually previous snippet I've tested reports the error as VK_ERROR_DEVICE_LOST and I can retrieve the log I need.

Sure. But it would identify every error as a device lost error.
What does not work with the strcmp-based approach? What does e.what() look like on your end?

Author

@melihmete melihmete Dec 10, 2025


In the previous version, logcat reports the error itself as the VkResult VK_ERROR_DEVICE_LOST. With the changed version, the framework currently reports "Detected Vulkan error: ERROR_INITIALIZATION_FAILED" and does not log VK_ERROR_DEVICE_LOST.

SaschaWillems previously approved these changes Nov 28, 2025
@gary-sweet
Contributor

This builds OK for me now, and it correctly reports that it is not supported. I can't say any more than that, though.

@melihmete
Author

@SaschaWillems Hello, I've just received errors from Quality Checks / Copyright Headers Check in files I haven't modified. Should I fix them, or is this a CI issue that needs fixing?

@SaschaWillems
Collaborator

As per yesterday's call, we'll ignore the CI failure caused by files you didn't touch. We'll try to make that CI step more robust and less error-prone.



5 participants