-
Notifications
You must be signed in to change notification settings - Fork 309
Add amdgpu intrinsics #1976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add amdgpu intrinsics #1976
Conversation
|
I think it can be "tested" the same way we do |
d6043df to
e102811
Compare
|
Thanks, I tried to replicate what’s there for nvptx + adding The diff to my first push from when opening this PR is here: https://github.com/rust-lang/stdarch/compare/b3f5bdae0efbdd5f7297d0225623bd31c7fe895b..e1028110e77561574bfb7ea349154d46b5ea7b86 |
sayantn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have put some comments
|
Also I have noticed that Clang provides a different set of intrinsics than these ( |
|
Thanks for the review! Interesting, I thought I do plan to write a common mod amdgpu {
mod gpu {
use super::*;
pub fn block_id_x() -> u32 {
workitem_id_x()
}
}
}
// Same for nvptx
mod gpu {
#[cfg(target_arch = "amdgpu")]
pub use amdgpu::gpu::*;
#[cfg(target_arch = "nvptx64")]
pub use nvptx::gpu::*;
// + more intrinsics as in gpuintrin.h
} |
|
The analogue to If there are some interesting platform-specific intrinsics, we can add them, but generally we follow GCC and Clang in regards to intrinsics. Yea, a common |
Add intrinsics for the amdgpu architecture.
e102811 to
d850408
Compare
|
Thanks for the review, that’s a nice case for const generics! I fixed all your inline comments. Also thanks for linking the clang intrinsics, I tend to forget about those. As common GPU intrinsics is a larger topic, I started a Discourse post here (please add your thoughts!): https://internals.rust-lang.org/t/naming-gpu-things-in-the-rust-compiler-and-standard-library/23833 My planned intrinsic “stack” is roughly as follows, from high-level to low-level:
We could also use clang names for
I would prefer following LLVM’s intrinsic names in Note for my future self: I got run.sh to work with |
sayantn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, left some more comments
| //! [LLVM implementation]: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/IR/IntrinsicsAMDGPU.td | ||
| #[allow(improper_ctypes)] | ||
| unsafe extern "unadjusted" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there some specific reason you only made some of the (both LLVM and Rust) intrinsics safe? Normally we make an intrinsic unsafe only if it can cause memory unsafety, and I don't see how something like s.barrier.signal is unsafe, where s.barrier is safe?
| fn llvm_s_barrier_leave(barrier_type: u16); | ||
| #[link_name = "llvm.amdgcn.s.get.barrier.state"] | ||
| fn llvm_s_get_barrier_state(barrier_type: i32) -> u32; | ||
| #[link_name = "llvm.amdgcn.s.wave.barrier"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| #[link_name = "llvm.amdgcn.s.wave.barrier"] | |
| #[link_name = "llvm.amdgcn.wave.barrier"] |
typo (probably also rename the intrinsics to (llvm_)wave_barrier
| safe fn llvm_wave_reduce_add(value: u32, strategy: u32) -> u32; | ||
| #[link_name = "llvm.amdgcn.wave.reduce.fadd"] | ||
| safe fn llvm_wave_reduce_fadd(value: f32, strategy: u32) -> f32; | ||
| #[link_name = "llvm.amdgcn.wave.reduce.and"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that LLVM also defined fsub and sub versions of these intrinsics, why aren't they included here?
| safe fn llvm_s_get_waveid_in_workgroup() -> u32; | ||
|
|
||
| // gfx11 | ||
| #[link_name = "llvm.amdgcn.permlane64"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| #[link_name = "llvm.amdgcn.permlane64"] | |
| #[link_name = "llvm.amdgcn.permlane64.i32"] |
the type parameter should be specified
| #[inline] | ||
| #[unstable(feature = "stdarch_amdgpu", issue = "149988")] | ||
| pub unsafe fn sched_barrier<const MASK: u32>() { | ||
| static_assert_uimm_bits!(MASK, 11); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check enough? It seems like MASK can only take values mentioned above, not any linear combination of them -- in that case we should also check MASK.is_power_of_two() or something like that. Also nit, it might be better to make some constants for these values, but I'll leave it to your discretion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason when I tried to build this with all #[inline]s replaced by #[inline(never)] (so that the functions actually codegen), I get a segfault for permlane64_u32, and LLVM selection errors for s.get.waveid.in.workgroup and wave.id. This is probably due to missing target features or something similar, I noticed that you made comments that these are only available in gfx12 and higher (probably, my knowledge of GPUs is nonexistent), but we are building with target-cpu=gfx900.
|
I don't have any strong preferences, if you feel that the lower-level LLVM intrinsics are more useful that just having the clang intrinsics, go ahead with that -- we can change it, remove it, or add the clang intrinsics too if required (as long as they are unstable) |
Add intrinsics for the amdgpu architecture.
I’m not sure how to add/run CI (
ci/run.shfails for me e.g. fornvptx because
corecannot be found), but I checked that it compileswithout warnings with
Tracking issue: rust-lang/rust#149988