Ultimately, we concluded that the invocations in a subgroup were probably falling out of sync with each loop iteration.
In particular, the last invocation, which spends extra time writing to shared memory, may have been lagging behind the others.
The fix to the emulated subgroup reduce and scan is simple: a subgroup barrier was enough. Unlike a memory barrier, which only orders memory accesses, a control barrier makes every invocation in the subgroup wait until the others have reached it.
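```cpp
T inclusive_scan(T value)
{
    // Wait until every invocation in the subgroup reaches this point before shuffling.
    // A memory barrier alone would not make lagging invocations catch up.
    control_barrier();

    // First step (offset 1): combine with the value held by the invocation one lane below.
    // The first invocation has no such neighbour, so it combines with the identity instead.
    T rhs = shuffleUp(value, 1);
    value = value + (firstInvocation ? identity : rhs);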
    // ... (the rest of the function is not shown in this excerpt)
}
```
As a side note, and somewhat surprisingly, enabling the `SPV_KHR_maximal_reconvergence` extension does not resolve this issue.

However, this problem was only observed on Nvidia devices.
As the title of this article states, it's unclear whether this is a bug in Nvidia's SPIR-V compiler or whether subgroup shuffle operations simply do not imply reconvergence under the Vulkan specification.

----------------------------
_This issue was observed intermittently on Nvidia driver version 576.80, released 17th June 2025._