Skip to content

dxg/wsl: fix div-by-zero in topology; skip non-queryable adapters#8

Open
hammmmy wants to merge 1 commit intoROCm:developfrom
hammmmy:wsl-gfx1103-support
Open

dxg/wsl: fix div-by-zero in topology; skip non-queryable adapters#8
hammmmy wants to merge 1 commit intoROCm:developfrom
hammmmy:wsl-gfx1103-support

Conversation

@hammmmy
Copy link
Copy Markdown

@hammmmy hammmmy commented Mar 16, 2026

dxg/wsl: fix div-by-zero in topology; skip non-queryable adapters

Motivation

Two generic robustness issues in librocdxg can cause crashes or silent
failures on any supported device:

  1. Integer divisions and a std::log2 call in topology.cpp are performed
    without guarding against zero denominators, triggering undefined behaviour
    for devices with partially populated DeviceInfo.
  2. A WDDMQueryAdapter failure for one adapter aborts the entire device
    enumeration loop, silently preventing any GPU from being found on systems
    where a non-queryable adapter (e.g. a software renderer) appears before
    the compute GPU.

Technical Details

src/topology.cpp

  • Guard WatchPointsNum() against zero before passing to std::log2.
  • Guard SimdPerCu() against zero before computing MaxWavesPerSIMD.
  • Guard NumArrays against zero before computing NumCUPerArray.
  • Replace assert(EngineId.Major && "…") with pr_err. Release builds
    must not abort on this condition; the diagnostic needs to reach the user.
  • When EngineId.Major remains 0 after ParseDeviceInfo, apply
    OverrideEngineId (from HSA_OVERRIDE_GFX_VERSION) and log a warning;
    log an error if both are unset.

All guards produce identical results for devices that already report
non-zero values — only the zero case is changed.

src/wddm/device.cppWDDMCreateDevices

Changed goto err_out1 to continue with a pr_debug message so a
single non-queryable adapter does not abort the entire enumeration.
The QueryAdapterSupported allowlist is unchanged and remains the gate
for device enumeration.

JIRA ID

N/A

Test Plan

Built librocdxg on WSL2 Ubuntu 24.04 with the standard cmake build. The
div-by-zero guards were validated to produce identical output to the
original code for all non-zero inputs.

Test Result

Build succeeds with no warnings introduced by these changes. The guarded
expressions return the same values as before for any device that reports
non-zero fields.

Submission Checklist

@hammmmy hammmmy force-pushed the wsl-gfx1103-support branch from e570ac2 to 565f303 Compare March 16, 2026 21:17
@hammmmy hammmmy closed this Mar 17, 2026
@hammmmy hammmmy reopened this Mar 17, 2026
topology.cpp: guard WatchPointsNum(), SimdPerCu() and NumArrays against
zero before use as divisors or log2 arguments.  Replace assert on
EngineId.Major with pr_err so release builds receive a diagnostic
instead of aborting.  Apply OverrideEngineId (HSA_OVERRIDE_GFX_VERSION)
as a last resort when EngineId.Major is still 0 after ParseDeviceInfo.

wddm/device.cpp: a WDDMQueryAdapter failure for one adapter previously
aborted the entire enumeration loop via goto err_out1.  On systems where
a non-queryable adapter (e.g. a software renderer) appears before the
compute GPU this silently prevents any GPU from being found.  Changed to
skip with pr_debug and continue.

Both changes are generic robustness fixes with no effect on devices that
already report non-zero values.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@hammmmy hammmmy force-pushed the wsl-gfx1103-support branch from 565f303 to d64deb7 Compare March 27, 2026 16:19
@hammmmy hammmmy changed the title wsl: add gfx1103 (RDNA3 Hawk Point APU) support dxg/wsl: fix div-by-zero in topology; skip non-queryable adapters Mar 27, 2026
@hammmmy hammmmy requested a review from fcui-amd March 27, 2026 16:23
* the topology-level override as a last resort; log an error if still 0. */
if (!props.EngineId.ui32.Major) {
if (props.OverrideEngineId.ui32.Major) {
props.EngineId = props.OverrideEngineId;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rocr would handle the EngineId and OverrideEngineId

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants