Perf: bring SIMD `take` back and generalize by `Copy` #5722

connortsui20 · 2025-12-12T21:57:16Z

Brings back the `portable_simd` `take` implementation. Instead of constraining by `NativePType`, this bounds by `T: Copy` and casts to `u8` through `u64` depending on the size of the type.
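A minimal sketch of the size-based dispatch described above, written in stable scalar Rust rather than `portable_simd`. The names `fits`, `take_lanes` and `take` are hypothetical and are not taken from this PR. The sketch assumes that elements are moved as opaque lanes and never interpreted as integers, so it uses `MaybeUninit<U>` lanes to stay sound for types with padding. It also assumes a scalar fallback for sizes or alignments that match no lane type.

```rust
use std::mem::{align_of, size_of, MaybeUninit};

/// True when a `T` can be moved as one lane of type `U`.
fn fits<T, U>() -> bool {
    size_of::<T>() == size_of::<U>() && align_of::<T>() >= align_of::<U>()
}

/// Hypothetical stand-in for the SIMD kernel: copies `values[indices[i]]`
/// lane by lane, where `U` is the unsigned lane type (u8, u16, u32 or u64).
fn take_lanes<T: Copy, U: Copy>(values: &[T], indices: &[usize]) -> Vec<T> {
    assert!(fits::<T, U>());

    let src = values.as_ptr().cast::<MaybeUninit<U>>();
    let mut out: Vec<T> = Vec::with_capacity(indices.len());
    let dst = out.as_mut_ptr().cast::<MaybeUninit<U>>();

    for (i, &idx) in indices.iter().enumerate() {
        assert!(idx < values.len(), "index {} out of bounds", idx);
        // SAFETY: `idx` is within `values`, `i` is within the capacity of
        // `out`, and `fits` guarantees matching size and sufficient alignment.
        // `MaybeUninit` keeps the bit copy valid even if `T` has padding.
        unsafe { dst.add(i).write(src.add(idx).read()) };
    }

    // SAFETY: the first `indices.len()` slots hold bit copies of valid `T`s.
    unsafe { out.set_len(indices.len()) };
    out
}

/// Dispatches on the size of `T`; only `T: Copy` is required.
fn take<T: Copy>(values: &[T], indices: &[usize]) -> Vec<T> {
    if fits::<T, u8>() {
        take_lanes::<T, u8>(values, indices)
    } else if fits::<T, u16>() {
        take_lanes::<T, u16>(values, indices)
    } else if fits::<T, u32>() {
        take_lanes::<T, u32>(values, indices)
    } else if fits::<T, u64>() {
        take_lanes::<T, u64>(values, indices)
    } else {
        // Assumed fallback: plain bounds-checked indexing.
        indices.iter().map(|&i| values[i]).collect()
    }
}
```

With this shape, `take(&[1.5f32, 2.5, 3.5], &[2, 0])` goes through the `u32` lane path, while a 3-byte type such as `[u8; 3]` falls through to the scalar branch.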

This also adds a check for out-of-bounds indices. It adds one SIMD instruction and one bitwise instruction to the hot loop, so that we correctly panic at the end if any index was out of bounds.
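The deferred check can be sketched roughly as follows, again in stable scalar Rust with a fixed-size array standing in for a SIMD mask. `take_checked` and `LANES` are hypothetical names, and reading a default value for out-of-bounds lanes is an assumption of this sketch, not something the PR description states.

```rust
const LANES: usize = 8; // assumed lane count, for illustration only

/// Gathers `values[indices[i]]`, accumulating an in-bounds mask per lane
/// and panicking once at the end instead of branching inside the hot loop.
fn take_checked(values: &[u32], indices: &[u32]) -> Vec<u32> {
    let mut out = Vec::with_capacity(indices.len());
    let mut all_in_bounds = [true; LANES];

    let mut chunks = indices.chunks_exact(LANES);
    for chunk in &mut chunks {
        for (ok, &idx) in all_in_bounds.iter_mut().zip(chunk) {
            // Counterpart of the lane-wise `simd_lt` comparison.
            let in_bounds = (idx as usize) < values.len();
            // Counterpart of the single bitwise `&=` on the running mask.
            *ok &= in_bounds;
            // Assumption: out-of-bounds lanes read a default value rather
            // than touching memory, like a masked gather would.
            out.push(if in_bounds { values[idx as usize] } else { 0 });
        }
    }

    // Scalar tail: ordinary indexing, which panics on a bad index.
    for &idx in chunks.remainder() {
        out.push(values[idx as usize]);
    }

    // One check after the loop covers every full chunk.
    assert!(
        all_in_bounds.iter().all(|&ok| ok),
        "take index out of bounds"
    );
    out
}
```

The trade-off is that a bad index is only reported after the whole loop has run, in exchange for keeping branches out of the loop body.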

Separating the `&=` from the `simd_lt` might help because the gather then doesn't depend on the `&=`, but if register pressure is high we do not want to evict any data from the SIMD registers. I should probably benchmark that...

codecov · 2025-12-12T22:06:32Z

Codecov Report

❌ Patch coverage is 54.32099% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.68%. Comparing base (9a76489) to head (078ca7c).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| vortex-compute/src/take/slice/asm_stubs.rs | 0.00% | 18 Missing ⚠️ |
| vortex-compute/src/take/slice/portable.rs | 44.82% | 16 Missing ⚠️ |
| vortex-compute/src/take/slice/mod.rs | 0.00% | 2 Missing ⚠️ |
| vortex-compute/src/take/slice/avx2.rs | 96.87% | 1 Missing ⚠️ |


codspeed-hq · 2025-12-12T22:27:07Z

CodSpeed Performance Report

Merging #5722 will improve performances by 22.72%

Comparing `ct/simd-portable-take` (078ca7c) with `develop` (9a76489)

Summary

⚡ 20 improvements
✅ 1236 untouched
⏩ 621 skipped¹

Benchmarks breakdown

| | Benchmark | `BASE` | `HEAD` | Change |
|---|---|---|---|---|
| ⚡ | `pvector_take_zipfian[16, 1000]` | 10.4 µs | 9.3 µs | +11.81% |
| ⚡ | `pvector_take_zipfian[256, 100000]` | 674.8 µs | 550 µs | +22.7% |
| ⚡ | `pvector_take_zipfian[2048, 10000]` | 74.6 µs | 62.2 µs | +19.88% |
| ⚡ | `pvector_take_zipfian[2048, 100000]` | 678.3 µs | 553.5 µs | +22.56% |
| ⚡ | `pvector_take_zipfian[8192, 10000]` | 85.8 µs | 73.4 µs | +16.82% |
| ⚡ | `pvector_take_zipfian[256, 10000]` | 71.3 µs | 59 µs | +20.92% |
| ⚡ | `pvector_take_zipfian[256, 1000]` | 10.8 µs | 9.7 µs | +11.33% |
| ⚡ | `pvector_take_uniform[16, 100000]` | 674.4 µs | 549.6 µs | +22.71% |
| ⚡ | `pvector_take_uniform[16, 10000]` | 70.5 µs | 58.2 µs | +21.21% |
| ⚡ | `pvector_take_zipfian[8192, 100000]` | 700.7 µs | 575.8 µs | +21.68% |
| ⚡ | `pvector_take_uniform[2048, 100000]` | 678.1 µs | 553.3 µs | +22.56% |
| ⚡ | `pvector_take_uniform[256, 100000]` | 674.9 µs | 550 µs | +22.69% |
| ⚡ | `pvector_take_uniform[2048, 10000]` | 74.9 µs | 62.6 µs | +19.67% |
| ⚡ | `pvector_take_zipfian[16, 100000]` | 674.4 µs | 549.5 µs | +22.72% |
| ⚡ | `pvector_take_uniform[16, 1000]` | 11.1 µs | 10.1 µs | +10.21% |
| ⚡ | `pvector_take_uniform[256, 10000]` | 71.3 µs | 59 µs | +20.93% |
| ⚡ | `pvector_take_uniform[256, 1000]` | 11.6 µs | 10.5 µs | +10.46% |
| ⚡ | `pvector_take_zipfian[16, 10000]` | 70.9 µs | 58.5 µs | +21.08% |
| ⚡ | `pvector_take_uniform[8192, 10000]` | 87.9 µs | 75.6 µs | +16.27% |
| ⚡ | `pvector_take_uniform[8192, 100000]` | 717.4 µs | 592.9 µs | +21.01% |

¹ 621 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
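For reading the Change column in the report above: the listed values appear consistent with `BASE / HEAD - 1` (a speedup ratio), not with `(BASE - HEAD) / BASE`. This is inferred from the numbers in the table and is not a statement of how CodSpeed defines the metric. The helper name `speedup_pct` is hypothetical.

```rust
/// Relative improvement in percent, assuming the metric is `base / head - 1`.
/// Both arguments are timings in the same unit (here, microseconds).
fn speedup_pct(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}
```

For `pvector_take_zipfian[256, 100000]`, `speedup_pct(674.8, 550.0)` is roughly 22.69, which matches the reported +22.7% after rounding. Rows with shorter timings match less exactly, presumably because the displayed times are themselves rounded.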

connortsui20 · 2025-12-12T23:14:25Z

It might be the case that the portable SIMD implementation is actually faster than the AVX2 impl? I think we need to do some more directed benchmarks...

connortsui20 requested review from a10y and gatesn December 12, 2025 21:57

connortsui20 added the performance label (release label indicating an improvement to performance) Dec 12, 2025

connortsui20 force-pushed the ct/simd-portable-take branch from 759b7ce to 4b3a5c3 Compare December 12, 2025 21:58

connortsui20 marked this pull request as draft December 12, 2025 22:04

connortsui20 changed the title ~~Perf: bring portable simd back and generalize by Copy~~ Perf: bring SIMD take back and generalize by Copy Dec 12, 2025

connortsui20 mentioned this pull request Dec 13, 2025

Perf: optimize take_scalar #5723

Merged

connortsui20 force-pushed the ct/simd-portable-take branch 3 times, most recently from cc8671c to 5312d71 Compare December 13, 2025 17:49

connortsui20 added 3 commits December 15, 2025 09:04

bring portable simd take back

859e7d8

add OOB check + safety comments Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

add avx2 take impl back and bound by Copy

e184b25

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

TODO

078ca7c

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

connortsui20 force-pushed the ct/simd-portable-take branch from f78fda7 to 078ca7c Compare December 15, 2025 14:05
