[Optimization]: Reduce branching when possible in casting.hpp #117

Open
zacharyvincze wants to merge 28 commits into ROCm:develop from zacharyvincze:zv/optimization/optimize-casting-performance

@zacharyvincze
Contributor

Details

  • Removes branching where possible from the casting helper functions in casting.hpp, aiming to reduce divergence in GPU kernel implementations.
  • Includes fixes to some float -> integer saturation casts, especially for 32/64-bit integer types whose limits cannot be represented exactly as 32-bit floats.
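
To illustrate the saturation issue described above, here is a minimal sketch, not the code from this PR. `SaturateFloatToInt32` is a hypothetical helper name, and mapping NaN to 0 is an assumption. It shows why the 32-bit limit needs care: `INT32_MAX` (2^31 - 1) rounds up to 2^31 when converted to `float`, so clamping against `static_cast<float>(INT32_MAX)` and then casting would convert an out-of-range value. The sketch uses explicit comparisons for clarity rather than a branch-free form.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper for illustration only; not the PR's ScalarSaturateCast.
inline std::int32_t SaturateFloatToInt32(float v) {
    // 2^31 is exactly representable as a float. INT32_MAX (2^31 - 1) is not:
    // it rounds up to 2^31 when converted to float.
    constexpr float kUpper = 2147483648.0f;   // 2^31
    constexpr float kLower = -2147483648.0f;  // -2^31, exactly INT32_MIN

    if (std::isnan(v)) return 0;  // assumption: NaN maps to 0
    if (v >= kUpper) return std::numeric_limits<std::int32_t>::max();
    if (v <= kLower) return std::numeric_limits<std::int32_t>::min();

    // Every float strictly between the bounds rounds to a value that fits
    // in int32_t, so the cast below is well defined.
    return static_cast<std::int32_t>(std::rint(v));
}
```

For example, `SaturateFloatToInt32(3.0e9f)` returns `INT32_MAX`, and an input of exactly `2147483648.0f` also saturates instead of being cast directly.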

Copilot AI left a comment

Pull request overview

This PR updates the core casting helpers to reduce branching (especially for GPU code paths) and adjusts saturation behavior for some float→integer conversions, alongside adding a small test and extending supported type traits.

Changes:

  • Refactors ScalarSaturateCast / ScalarRangeCast logic in casting.hpp to use more branchless/min-max based clamping and special-case small integer widths.
  • Extends type traits support to include long/ulong vectorized types.
  • Adds a new C++ test covering basic SaturateCast behavior and a few limit/vector cases.
  • Adjusts the GPU block dimensions for the Composite operator kernel launch.
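
The min/max-based clamping mentioned in the first bullet can be sketched as follows for a small integer width. This is an illustrative example under assumptions, not the PR's implementation: `SaturateFloatToUint8` is a hypothetical name, and whether the result compiles to branch-free instructions depends on the compiler and target.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper for illustration only; not taken from casting.hpp.
inline std::uint8_t SaturateFloatToUint8(float v) {
    // Both limits (0 and 255) are exactly representable as floats, so a
    // plain min/max clamp is safe for this width.
    // std::fmax returns the non-NaN operand, so a NaN input clamps to 0.
    const float clamped = std::fmin(std::fmax(v, 0.0f), 255.0f);

    // Round to nearest (ties to even in the default rounding mode), then
    // convert; the value is already inside [0, 255].
    return static_cast<std::uint8_t>(std::rint(clamped));
}
```

For example, `SaturateFloatToUint8(300.0f)` yields 255 and `SaturateFloatToUint8(-5.0f)` yields 0, with no explicit `if` in the source.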

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| include/core/detail/casting.hpp | Refactors saturate/range cast implementations to reduce branching and adjust clamping/rounding logic. |
| include/core/detail/type_traits.hpp | Adds long / ulong to the type-traits macro set. |
| tests/roccv/cpp/src/tests/core/detail/test_saturate_cast.cpp | Introduces a basic unit test for SaturateCast, including a couple of vectorized casts. |
| src/op_composite.cpp | Changes GPU kernel launch block dimensions for the composite operator. |

Comment thread include/core/detail/casting.hpp
Comment thread include/core/detail/casting.hpp Outdated
Comment thread include/core/detail/casting.hpp Outdated
Comment thread tests/roccv/cpp/src/tests/core/detail/test_saturate_cast.cpp
Comment thread src/op_composite.cpp
@codecov-commenter

codecov-commenter commented Feb 6, 2026

Codecov Report

❌ Patch coverage is 88.88889% with 8 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| include/core/detail/casting.hpp | 88.89% | 4 Missing and 4 partials ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #117      +/-   ##
===========================================
+ Coverage    78.13%   78.31%   +0.18%     
===========================================
  Files           79       79              
  Lines         3347     3389      +42     
  Branches       733      733              
===========================================
+ Hits          2615     2654      +39     
  Misses         369      369              
- Partials       363      366       +3     

| Files with missing lines | Coverage Δ |
| --- | --- |
| include/core/detail/type_traits.hpp | 100.00% <ø> (ø) |
| include/core/detail/casting.hpp | 92.73% <88.89%> (+0.08%) ⬆️ |

@simonCatBot

Review: [Optimization] Reduce branching in casting.hpp

Kernel optimization focusing on GPU divergence:

Changes:

  • Branch reduction in casting helper functions
  • Fixes for float->integer saturation casts (32/64-bit cases)
  • 4 files changed, +129/-33 lines

Assessment: Needs Review - Performance optimization.

Reducing branching in GPU kernels generally helps warp (wavefront) efficiency by limiting divergence. The fixes for 32/64-bit integer saturation casts sound important: precision issues in type conversion can cause subtle bugs.

Would benefit from:

  1. Performance benchmarks showing divergence reduction
  2. Verification that precision is maintained for edge cases
  3. Review of the saturation logic changes

Solid optimization PR.

@zacharyvincze
Contributor Author

Fixed some issues caused by assuming that certain types were floats. The code now uses the double-precision version of each function when computing with doubles, to maintain precision.

Added some more tests as well to catch more edge cases for Range/Saturate/Static casting.
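
As a rough illustration of the float-versus-double issue mentioned in this comment, the sketch below contrasts rounding a `double` through a `float` intermediate with rounding it in double precision. Both function names are hypothetical and are not taken from casting.hpp; the actual functions affected in the PR are not shown on this page.

```cpp
#include <cmath>

// Hypothetical example of the problematic pattern: treating the input as a
// float drops low-order bits before rounding even starts.
inline double RoundViaFloat(double v) {
    return std::rint(static_cast<float>(v));  // resolves to the float overload
}

// Using the double overload keeps the full 53-bit significand.
inline double RoundAsDouble(double v) {
    return std::rint(v);
}
```

With the input 16777217.0 (2^24 + 1), which a 32-bit float cannot represent, `RoundViaFloat` returns 16777216.0 while `RoundAsDouble` returns 16777217.0.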

Labels: ci:precheckin, enhancement (New feature or request)

4 participants