dxg/wsl: fix div-by-zero in topology; skip non-queryable adapters #8
Open
hammmmy wants to merge 1 commit into ROCm:develop
e570ac2 to 565f303
fcui-amd reviewed Mar 18, 2026
topology.cpp: guard WatchPointsNum(), SimdPerCu() and NumArrays against zero before use as divisors or log2 arguments. Replace assert on EngineId.Major with pr_err so release builds receive a diagnostic instead of aborting. Apply OverrideEngineId (HSA_OVERRIDE_GFX_VERSION) as a last resort when EngineId.Major is still 0 after ParseDeviceInfo.

wddm/device.cpp: a WDDMQueryAdapter failure for one adapter previously aborted the entire enumeration loop via goto err_out1. On systems where a non-queryable adapter (e.g. a software renderer) appears before the compute GPU this silently prevents any GPU from being found. Changed to skip with pr_debug and continue.

Both changes are generic robustness fixes with no effect on devices that already report non-zero values.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
565f303 to d64deb7
fcui-amd reviewed Mar 30, 2026
     * the topology-level override as a last resort; log an error if still 0. */
    if (!props.EngineId.ui32.Major) {
      if (props.OverrideEngineId.ui32.Major) {
        props.EngineId = props.OverrideEngineId;
Collaborator
rocr would handle the EngineId and OverrideEngineId
dxg/wsl: fix div-by-zero in topology; skip non-queryable adapters
Motivation
Two generic robustness issues in librocdxg can cause crashes or silent
failures on any supported device:
- Divisions and a std::log2 call in topology.cpp are performed without
  guarding against zero denominators, triggering undefined behaviour for
  devices with partially populated DeviceInfo.
- A WDDMQueryAdapter failure for one adapter aborts the entire device
  enumeration loop, silently preventing any GPU from being found on systems
  where a non-queryable adapter (e.g. a software renderer) appears before
  the compute GPU.
Technical Details
src/topology.cpp
- Guard WatchPointsNum() against zero before passing to std::log2.
- Guard SimdPerCu() against zero before computing MaxWavesPerSIMD.
- Guard NumArrays against zero before computing NumCUPerArray.
- Replace assert(EngineId.Major && "…") with pr_err. Release builds
  must not abort on this condition; the diagnostic needs to reach the user.
- If EngineId.Major remains 0 after ParseDeviceInfo, apply OverrideEngineId
  (from HSA_OVERRIDE_GFX_VERSION) and log a warning; log an error if both
  are unset.
All guards produce identical results for devices that already report
non-zero values; only the zero case is changed.
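The guard pattern described above can be sketched as follows. This is a minimal illustration, not the code in the PR: the `DeviceCounts` struct, its field names, the numerators, and the choice of 0 as the fallback value are all assumptions made for the example.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the values topology.cpp reads from DeviceInfo.
struct DeviceCounts {
  uint32_t watch_points;      // stands in for WatchPointsNum()
  uint32_t simd_per_cu;       // stands in for SimdPerCu()
  uint32_t num_arrays;        // stands in for NumArrays
  uint32_t max_waves_per_cu;  // assumed numerator for MaxWavesPerSIMD
  uint32_t num_cu;            // assumed numerator for NumCUPerArray
};

// log2 of the watch-point count; falls back to 0 (an assumption) when the
// count is zero, instead of converting log2(0) == -inf to an integer.
inline uint32_t WatchPointsLog2(const DeviceCounts& d) {
  return d.watch_points ? static_cast<uint32_t>(std::log2(d.watch_points)) : 0;
}

// Integer division guarded against a zero divisor.
inline uint32_t MaxWavesPerSimd(const DeviceCounts& d) {
  return d.simd_per_cu ? d.max_waves_per_cu / d.simd_per_cu : 0;
}

inline uint32_t NumCuPerArray(const DeviceCounts& d) {
  return d.num_arrays ? d.num_cu / d.num_arrays : 0;
}
```

For any non-zero field each helper evaluates the same expression as the unguarded form, which matches the claim that only the zero case changes.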
src/wddm/device.cpp (WDDMCreateDevices)
- Changed goto err_out1 to continue with a pr_debug message so a
  single non-queryable adapter does not abort the entire enumeration.
The QueryAdapterSupported allowlist is unchanged and remains the gate
for device enumeration.
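The skip-and-continue behaviour can be illustrated with a small stand-alone sketch. The `Adapter` type, `QueryAdapter`, and `EnumerateDevices` below are hypothetical names invented for this example; they only mimic the control flow and are not the WDDM code in device.cpp.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical adapter record; `queryable` simulates whether the query succeeds.
struct Adapter {
  int id;
  bool queryable;
};

// Stand-in for the adapter query; returns false on failure.
inline bool QueryAdapter(const Adapter& a) { return a.queryable; }

// Collects the ids of adapters that can be queried. A failed query skips
// that adapter and moves on instead of aborting the whole loop.
inline std::vector<int> EnumerateDevices(const std::vector<Adapter>& adapters) {
  std::vector<int> found;
  for (const Adapter& a : adapters) {
    if (!QueryAdapter(a)) {
      std::fprintf(stderr, "skipping adapter %d: query failed\n", a.id);
      continue;  // previously the equivalent of `goto err_out1`
    }
    found.push_back(a.id);
  }
  return found;
}
```

With a non-queryable adapter listed first, for example `{{0, false}, {1, true}}`, the loop still reaches and returns adapter 1.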
JIRA ID
N/A
Test Plan
Built librocdxg on WSL2 Ubuntu 24.04 with the standard cmake build. The
div-by-zero guards were validated to produce identical output to the
original code for all non-zero inputs.
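One way such an equivalence check could be sketched is shown below. It is not the validation the author ran; `GuardedDiv`, `UnguardedDiv`, and `GuardMatchesOriginal` are hypothetical helpers, and the fallback value 0 is an assumption.

```cpp
#include <cstdint>

// Original form: undefined behaviour when divisor == 0.
inline uint32_t UnguardedDiv(uint32_t numerator, uint32_t divisor) {
  return numerator / divisor;
}

// Guarded form: returns 0 (an assumed fallback) for a zero divisor.
inline uint32_t GuardedDiv(uint32_t numerator, uint32_t divisor) {
  return divisor ? numerator / divisor : 0;
}

// Returns true when both forms agree for every divisor in [1, max_divisor].
// The loop counter is 64-bit so that max_divisor == UINT32_MAX terminates.
inline bool GuardMatchesOriginal(uint32_t numerator, uint32_t max_divisor) {
  for (uint64_t d = 1; d <= max_divisor; ++d) {
    const uint32_t divisor = static_cast<uint32_t>(d);
    if (GuardedDiv(numerator, divisor) != UnguardedDiv(numerator, divisor)) {
      return false;
    }
  }
  return true;
}
```

The zero divisor is deliberately excluded from the comparison, since the unguarded form has no defined result there.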
Test Result
Build succeeds with no warnings introduced by these changes. The guarded
expressions return the same values as before for any device that reports
non-zero fields.
Submission Checklist