[rocblas] Optimize tpsv APa address calculation #4213
Merged
TorreZuk merged 4 commits into ROCm:develop on Apr 6, 2026
Contributor
There are unrelated CI issues, so this requires a rebase. I will review tomorrow if no one else gets to it first.
Contributor
Sorry for the delay, but the extended tests on this PR fail. Running rocblas-test with --gtest_filter=nightly-known_bug' fails with "Memory access fault by GPU". If you can run these tests locally, you can likely refine the filter there to find and fix the bug.
Contributor, Author
ping @TorreZuk
Contributor
@actinks I am re-running some tests with your latest fixes. Hopefully I can provide an update tomorrow.
Contributor
Sorry, unrelated technical issues were blocking the tests. I am still working on them so that this can be merged.
TorreZuk approved these changes on Apr 6, 2026
Contributor
TorreZuk
left a comment
Changes are passing all tests and are clang-formatted.
assistant-librarian Bot pushed a commit to ROCm/rocBLAS that referenced this pull request on Apr 6, 2026
[rocblas] Optimize tpsv APa address calculation

This pull request refactors and optimizes the forward and backward substitution routines in `rocblas_tpsv_kernels.cpp` by improving index calculations and loop structure. The changes enhance code clarity, reduce redundant calculations, and ensure more efficient traversal of packed matrix elements.

**Forward substitution improvements:**

* Refactored the calculation of matrix indices in the main substitution loop to initialize `rowA`, `colA`, and `indexA` before the loop and incrementally update them, reducing repeated computation and improving clarity.
* Optimized the summation loop by initializing `colA` and `indexA` outside the loop and updating them incrementally, streamlining access to packed matrix elements.

**Backward substitution improvements:**

* Refactored the backward substitution loop to initialize `rowA`, `colA`, and `indexA` before the loop and decrementally update them, improving efficiency and readability.
* Optimized the summation loop in backward substitution by initializing and updating `colA` and `indexA` incrementally, mirroring the improvements in forward substitution.
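To make the idea of incremental index updates concrete, here is a minimal CPU-side sketch. It is not the code from `rocblas_tpsv_kernels.cpp`. It assumes a lower-triangular matrix in column-major packed storage, no transpose, and a non-unit diagonal, and it solves the system one row at a time. The function names `packed_lower_index`, `tpsv_lower_direct`, and `tpsv_lower_incremental` are hypothetical, and the names `rowA`, `colA`, and `indexA` are borrowed from the description only for readability; how the actual kernel uses them is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct packed index of element (row, col), row >= col, for a lower-triangular
// n x n matrix stored column by column (standard BLAS packed layout).
inline std::size_t packed_lower_index(std::size_t row, std::size_t col, std::size_t n)
{
    return row + col * (2 * n - col - 1) / 2;
}

// Baseline (hypothetical): recompute the packed index for every element touched.
std::vector<double> tpsv_lower_direct(const std::vector<double>& AP,
                                      std::vector<double>        x,
                                      std::size_t                n)
{
    assert(AP.size() == n * (n + 1) / 2);
    for(std::size_t rowA = 0; rowA < n; ++rowA)
    {
        double sum = 0.0;
        for(std::size_t colA = 0; colA < rowA; ++colA)
            sum += AP[packed_lower_index(rowA, colA, n)] * x[colA];
        x[rowA] = (x[rowA] - sum) / AP[packed_lower_index(rowA, rowA, n)];
    }
    return x;
}

// Incremental variant (hypothetical): initialize indexA once per row and step
// it from column to column instead of re-evaluating the index formula.
std::vector<double> tpsv_lower_incremental(const std::vector<double>& AP,
                                           std::vector<double>        x,
                                           std::size_t                n)
{
    assert(AP.size() == n * (n + 1) / 2);
    for(std::size_t rowA = 0; rowA < n; ++rowA)
    {
        double      sum    = 0.0;
        std::size_t indexA = rowA; // packed index of element (rowA, 0)
        for(std::size_t colA = 0; colA < rowA; ++colA)
        {
            sum += AP[indexA] * x[colA];
            // Moving from (rowA, colA) to (rowA, colA + 1) skips the remainder
            // of column colA, which holds n - colA entries in total.
            indexA += n - colA - 1;
        }
        // After the loop, indexA is the packed index of the diagonal (rowA, rowA).
        x[rowA] = (x[rowA] - sum) / AP[indexA];
    }
    return x;
}
```

Both functions take the right-hand side as `x` and return the solution. The incremental variant replaces a multiply and a divide per element with a single addition, which is the kind of saving the description refers to. A backward (upper-triangular) version would walk the indices downward in the same spirit.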
vidyasagar-amd pushed a commit that referenced this pull request on Apr 9, 2026
Co-authored-by: Torre Zuk <Torre.Zuk@amd.com>
AaronStGeorge pushed a commit to AaronStGeorge/rocm-libraries that referenced this pull request on Apr 16, 2026
Co-authored-by: Torre Zuk <Torre.Zuk@amd.com>