Perf: bring SIMD take back and generalize by Copy
#5722
base: develop
Force-pushed from 759b7ce to 4b3a5c3.
CodSpeed Performance Report: merging #5722 will improve performance by 22.72%.
It might be the case that the portable SIMD version is actually faster than the AVX2 implementation. I think we need to do some more directed benchmarks...
Force-pushed from cc8671c to 5312d71.
add OOB check + safety comments
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Force-pushed from f78fda7 to 078ca7c.
Brings back the `portable_simd` `take` implementation, and instead of constraining by `NativePType`, this bounds by `T: Copy` and casts to `u8`-`u64` depending on the size of the type.

This also adds a check for out-of-bounds indices that adds a single SIMD instruction and a single bitwise instruction to the hot loop, so we correctly panic at the end if any index was out of bounds.
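As a rough illustration of the size-based dispatch described above, the sketch below maps any `T: Copy` to an unsigned lane type chosen by `size_of::<T>()`. The `Lane` enum and `lane_for` function are hypothetical names for illustration and are not taken from this PR; the real kernel would presumably reinterpret the data as the chosen integer type before the SIMD gather, which is not shown here.

```rust
use std::mem::size_of;

/// Hypothetical tag for the unsigned integer type a value would be treated as.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lane {
    U8,
    U16,
    U32,
    U64,
}

/// Pick a lane type purely from the size of `T`.
/// Returns `None` for sizes with no matching integer width, where a
/// caller would presumably fall back to a scalar path (assumption).
fn lane_for<T: Copy>() -> Option<Lane> {
    match size_of::<T>() {
        1 => Some(Lane::U8),
        2 => Some(Lane::U16),
        4 => Some(Lane::U32),
        8 => Some(Lane::U64),
        _ => None,
    }
}
```

Because only the size matters, `f32` and `i32` both select the `u32` lane, which is what lets the bound relax from a fixed set of native types to `T: Copy`.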
It might be the case that separating out the `&=` and the `simd_lt` is better, so that the gather doesn't depend on the `&=`, but if register pressure is high then we do not want to evict any data from the SIMD registers. I should probably benchmark that...
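The deferred out-of-bounds check can be sketched in scalar form on stable Rust. This is not the PR's SIMD code: `take_deferred_check` is a hypothetical name, the per-element comparison stands in for `simd_lt`, and redirecting bad indices to slot 0 stands in for a masked gather. The point is only that the loop accumulates a single flag with `&=` and panics once after the loop instead of branching per element.

```rust
/// Scalar stand-in for the SIMD kernel (illustrative only).
/// Gathers `values[indices[i]]` and panics after the loop if any index
/// was out of bounds.
fn take_deferred_check<T: Copy>(values: &[T], indices: &[u32]) -> Vec<T> {
    if indices.is_empty() {
        return Vec::new();
    }
    // With no values, every index is out of bounds.
    assert!(!values.is_empty(), "take index out of bounds");

    let len = values.len();
    let mut out = Vec::with_capacity(indices.len());
    let mut all_in_bounds = true;

    for &idx in indices {
        let i = idx as usize;
        // Stand-in for the `simd_lt` comparison against the length.
        let ok = i < len;
        // Stand-in for the `&=` accumulation in the hot loop.
        all_in_bounds &= ok;
        // Stand-in for a masked gather: bad lanes read slot 0 instead,
        // so the read itself is always in bounds.
        let safe = if ok { i } else { 0 };
        out.push(values[safe]);
    }

    // A single check at the end replaces a branch per element.
    assert!(all_in_bounds, "take index out of bounds");
    out
}
```

In this form the gather uses `ok` directly and never reads the accumulated flag, which is the dependency question raised above; whether that ordering helps in the real SIMD loop still needs the benchmark.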