[Release SM 6.9] Cherry-Pick Fix rawBufferVectorLoad/Store to widen min precision types to 32-bit #8369

Open
alsepkow wants to merge 2 commits into microsoft:release-1.9.2602 from alsepkow:user/alsepkow/cherry-pick-dc4354b

@alsepkow
Contributor

Cherry-picks PR #8274 and PR #8321, which reverts the out-of-scope changes from #8274.

Assisted by gh copilot.

SHA dc4354b
SHA 71aa195

alsepkow and others added 2 commits April 13, 2026 20:29
…icrosoft#8274)

## Summary

Fixes `RawBufferVectorLoad`/`Store` to use 32-bit element types
(`i32`/`f32`) for min precision types (`min16int`, `min16uint`,
`min16float`) instead of 16-bit (`i16`/`f16`). This matches how
pre-SM6.9 `RawBufferLoad` handles min precision.

Resolves microsoft#8273

## Root Cause

`TranslateBufLoad` in `HLOperationLower.cpp` creates the vector type
directly from the min precision element type (`i16`/`f16`) without
widening to `i32`/`f32`. This causes WARP (and potentially other
drivers) to load/store 2 bytes per element instead of 4, mismatching the
buffer layout.
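
To illustrate the mismatch, here is a minimal host-side C++ sketch (not DXC or driver code). The helper names `packAs32` and `readElements` are hypothetical, and little-endian packing is an assumption made only for this illustration. It shows how reading a buffer laid out as 4-byte elements with a 2-byte stride returns the wrong values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pack values as 32-bit little-endian elements,
// i.e. the 4-bytes-per-element layout the buffer is expected to hold.
std::vector<uint8_t> packAs32(const std::vector<int32_t>& values) {
  std::vector<uint8_t> bytes;
  for (int32_t v : values) {
    uint32_t u = static_cast<uint32_t>(v);
    for (int i = 0; i < 4; ++i)
      bytes.push_back(static_cast<uint8_t>(u >> (8 * i)));
  }
  return bytes;
}

// Hypothetical helper: read `count` elements assuming each one occupies
// `stride` bytes (2 or 4). A 2-byte read is sign-extended to 32 bits.
std::vector<int32_t> readElements(const std::vector<uint8_t>& bytes,
                                  std::size_t count, std::size_t stride) {
  assert(stride == 2 || stride == 4);
  assert(bytes.size() >= count * stride);
  std::vector<int32_t> out;
  for (std::size_t e = 0; e < count; ++e) {
    uint32_t u = 0;
    for (std::size_t i = 0; i < stride; ++i)
      u |= static_cast<uint32_t>(bytes[e * stride + i]) << (8 * i);
    // Sign-extend from bit 15 when only two bytes were read.
    if (stride == 2 && (u & 0x8000u))
      u |= 0xFFFF0000u;
    out.push_back(static_cast<int32_t>(u));
  }
  return out;
}
```

With `{1, 2, 3, 4}` packed by `packAs32`, `readElements(bytes, 4, 4)` returns `{1, 2, 3, 4}`, while `readElements(bytes, 4, 2)` returns `{1, 0, 2, 0}`: the 2-byte stride walks through the halves of the first two elements instead of reaching the third and fourth.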

## Fix

Apply the same widening pattern used for bool types:
- **Load**: Load as `v_i32`/`v_f32`, then `trunc`/`fptrunc` back to
`i16`/`half`
- **Store**: `sext`/`fpext` to `i32`/`f32`, then store as
`v_i32`/`v_f32`
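
The integer half of this pattern can be sketched in plain C++ as follows. This is an analogy for the IR-level `sext` and `trunc`, not the code in `HLOperationLower.cpp`; the names `widenForStore` and `narrowAfterLoad` are hypothetical, and the float case (`fpext`/`fptrunc`) is omitted because standard C++ has no portable `half` type.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the store path: sign-extend each 16-bit
// element to 32 bits before it is written to the buffer.
std::vector<int32_t> widenForStore(const std::vector<int16_t>& v) {
  std::vector<int32_t> out;
  for (int16_t x : v)
    out.push_back(static_cast<int32_t>(x));  // value-preserving, like sext
  return out;
}

// Hypothetical stand-in for the load path: keep only the low 16 bits of
// each 32-bit element, like trunc.
std::vector<int16_t> narrowAfterLoad(const std::vector<int32_t>& v) {
  std::vector<int16_t> out;
  for (int32_t x : v) {
    uint16_t low = static_cast<uint16_t>(static_cast<uint32_t>(x) & 0xFFFFu);
    // Reinterpret the low 16 bits as a signed value without relying on
    // implementation-defined narrowing conversions.
    int32_t asSigned = low >= 0x8000u ? static_cast<int32_t>(low) - 0x10000
                                      : static_cast<int32_t>(low);
    out.push_back(static_cast<int16_t>(asSigned));
  }
  return out;
}
```

For signed 16-bit inputs the two steps round-trip exactly: `narrowAfterLoad(widenForStore(v))` equals `v`, while every element occupies 4 bytes in the buffer.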

## Testing

Added FileCheck test verifying all 3 min precision types produce
`i32`/`f32` vector load/store ops.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Tex Riddell <texr@microsoft.com>
…icrosoft#8321)

## Summary

Reverts the `DxilGenerationPass` ByteAddressBuffer scalar store changes
and removes scalar store tests that were included in PR microsoft#8274. These
changes were out of scope for the vector load/store fix and, on further
discussion, were concluded to be incomplete.

Follow-up issue for the proper fix: microsoft#8322

## What changed

- **DxilGenerationPass.cpp**: Reverted the ByteAddressBuffer fallback
path added in `ReplaceMinPrecisionRawBufferStoreByType`. The `sext`
fallback was wrong for `min16uint` (signedness is no longer known at that
point, so unsigned values get sign-extended), and the fix
belongs in CodeGen where signedness info is still available.
- **min_precision_raw_load_store.hlsl**: Removed scalar load/store
tests. Scalar `ByteAddressBuffer::Store<min16int>()` hits a pre-existing
crash in `TranslateMinPrecisionRawBuffer` (`cast<StructType>` on
ByteAddressBuffer's `i32` inner element). Test now covers vector
loads/stores only, which is the scope of the original fix.

## Context

The original PR microsoft#8274 correctly fixed `RawBufferVectorLoad/Store` to
widen min precision types to 32-bit. However, it also added a
ByteAddressBuffer scalar store fix in `DxilGenerationPass` that:
1. Crept outside the scope of the vector load/store fix
2. Was incomplete — the `sext` fallback is wrong for unsigned types
(`min16uint`)
3. Should instead be handled during Clang CodeGen, where signedness
information is available
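
Why `sext` is wrong for unsigned values can be shown with a small C++ sketch. The function names are hypothetical and the code only models the bit-level behavior of sign extension versus zero extension; it is not taken from `DxilGenerationPass`.

```cpp
#include <cassert>
#include <cstdint>

// Sign extension: copy bit 15 into the upper 16 bits (models `sext`).
uint32_t signExtend16(uint16_t bits) {
  uint32_t wide = bits;
  return (wide & 0x8000u) ? (wide | 0xFFFF0000u) : wide;
}

// Zero extension: upper 16 bits are always zero (models `zext`).
uint32_t zeroExtend16(uint16_t bits) {
  return static_cast<uint32_t>(bits);
}
```

For the 16-bit pattern `0xFFFF`, which is 65535 when interpreted as unsigned, `zeroExtend16` yields `0x0000FFFF` (still 65535) while `signExtend16` yields `0xFFFFFFFF` (4294967295 as unsigned). The two agree only when bit 15 is clear, for example `0x7FFF`, so the correct choice depends on signedness information.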

Scalar ByteAddressBuffer template store widening for min precision types
is a separate pre-existing issue that needs a proper fix in CodeGen.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@azure-pipelines

Azure Pipelines:
Successfully started running 1 pipeline(s).
