[rocblas] Optimize tpsv APa address calculation #4213

Merged
TorreZuk merged 4 commits into ROCm:develop from actinks:users/actinks/tpsv-opt
Apr 6, 2026

actinks (Contributor) commented Feb 3, 2026

This pull request refactors and optimizes the forward and backward substitution routines in rocblas_tpsv_kernels.cpp by improving index calculations and loop structure. The changes enhance code clarity, reduce redundant calculations, and ensure more efficient traversal of packed matrix elements.

Forward substitution improvements:

  • Refactored the calculation of matrix indices in the main substitution loop to initialize rowA, colA, and indexA before the loop and incrementally update them, reducing repeated computation and improving clarity.
  • Optimized the summation loop by initializing colA and indexA outside the loop and updating them incrementally, streamlining access to packed matrix elements.
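
The incremental indexing described above can be illustrated with a minimal CPU-side sketch. It is not the rocBLAS kernel: it assumes a non-unit lower-triangular matrix in BLAS column-major packed storage, and the function names `packed_lower_index` and `forward_substitute_packed_lower` are hypothetical. Only `rowA`, `colA`, and `indexA` are borrowed from the description. The closed-form offset is kept solely as a debug check on the running `indexA`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Closed-form offset of element (row, col), row >= col, in a column-major
// packed lower-triangular matrix of order n (0-based). Hypothetical helper.
inline std::size_t packed_lower_index(std::size_t n, std::size_t row, std::size_t col)
{
    assert(col <= row && row < n);
    return row + col * (2 * n - col - 1) / 2;
}

// Solves L * x = b in place (x holds b on entry, the solution on exit).
// Hypothetical CPU helper used only to illustrate the indexing pattern.
inline void forward_substitute_packed_lower(std::size_t                n,
                                            const std::vector<double>& AP,
                                            std::vector<double>&       x)
{
    assert(AP.size() == n * (n + 1) / 2 && x.size() == n);
    for(std::size_t rowA = 0; rowA < n; ++rowA)
    {
        // Element (rowA, 0) sits at packed offset rowA.
        std::size_t indexA = rowA;
        double      sum    = 0.0;
        for(std::size_t colA = 0; colA < rowA; ++colA)
        {
            // Debug check: the running offset matches the closed form.
            assert(indexA == packed_lower_index(n, rowA, colA));
            sum += AP[indexA] * x[colA];
            // Stepping one column right on the same row skips the
            // n - colA - 1 remaining entries of column colA.
            indexA += n - colA - 1;
        }
        // indexA now addresses the diagonal element (rowA, rowA).
        x[rowA] = (x[rowA] - sum) / AP[indexA];
    }
}
```

The inner loop replaces a multiply and divide per element with a single addition, which is the kind of saving the description refers to.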

Backward substitution improvements:

  • Refactored the backward substitution loop to initialize rowA, colA, and indexA before the loop and update them by decrementing, improving efficiency and readability.
  • Optimized the summation loop in backward substitution by initializing and updating colA and indexA incrementally, mirroring the improvements in forward substitution.
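
A comparable sketch for the backward direction follows, under the assumption of a non-unit upper-triangular matrix in BLAS column-major packed storage. The function names `packed_upper_index` and `backward_substitute_packed_upper` are hypothetical and the code runs on the CPU; it is not the rocBLAS kernel. For simplicity, only the summation loop is made incremental here, and the starting offset of each row is still computed from the closed form.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Closed-form offset of element (row, col), row <= col, in a column-major
// packed upper-triangular matrix (0-based). Hypothetical helper.
inline std::size_t packed_upper_index(std::size_t row, std::size_t col)
{
    assert(row <= col);
    return row + col * (col + 1) / 2;
}

// Solves U * x = b in place (x holds b on entry, the solution on exit).
// Hypothetical CPU helper used only to illustrate the indexing pattern.
inline void backward_substitute_packed_upper(std::size_t                n,
                                             const std::vector<double>& AP,
                                             std::vector<double>&       x)
{
    assert(AP.size() == n * (n + 1) / 2 && x.size() == n);
    for(std::size_t rowA = n; rowA-- > 0;)
    {
        // Start at (rowA, n - 1), the last column of this row.
        std::size_t colA   = n - 1;
        std::size_t indexA = rowA + colA * (colA + 1) / 2;
        double      sum    = 0.0;
        while(colA > rowA)
        {
            // Debug check: the running offset matches the closed form.
            assert(indexA == packed_upper_index(rowA, colA));
            sum += AP[indexA] * x[colA];
            // Column colA stores colA + 1 entries, so the same row in the
            // previous column lies colA entries earlier.
            indexA -= colA;
            --colA;
        }
        // indexA now addresses the diagonal element (rowA, rowA).
        x[rowA] = (x[rowA] - sum) / AP[indexA];
    }
}
```

Because the decrement depends on `colA`, the order of the two updates matters: `indexA` must be adjusted before `colA` is decremented.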

@assistant-librarian assistant-librarian Bot added the external contribution label Feb 3, 2026
@actinks actinks marked this pull request as ready for review February 4, 2026 13:03
@actinks actinks requested a review from a team as a code owner February 4, 2026 13:03
@actinks actinks marked this pull request as draft February 4, 2026 13:07
@actinks actinks force-pushed the users/actinks/tpsv-opt branch from 0f3e5d7 to f3116ec Compare February 5, 2026 02:50
@actinks actinks marked this pull request as ready for review February 5, 2026 02:50
@TorreZuk TorreZuk force-pushed the users/actinks/tpsv-opt branch from f3116ec to cf01e6b Compare February 11, 2026 22:38
TorreZuk (Contributor) commented

There are unrelated CI issues, so this requires a rebase. I will review tomorrow if no one else gets to it first.

@actinks actinks force-pushed the users/actinks/tpsv-opt branch from cf01e6b to 87bf44b Compare February 12, 2026 01:10
@TorreZuk TorreZuk force-pushed the users/actinks/tpsv-opt branch from 87bf44b to 74453d4 Compare February 17, 2026 23:08
TorreZuk (Contributor) commented Mar 3, 2026

Sorry for the delay, but extended tests on this PR fail. Running rocblas-test with --gtest_filter=nightly-known_bug' fails with "Memory access fault by GPU". If you can run these tests locally, you can likely narrow down and fix the bug there.

@actinks actinks force-pushed the users/actinks/tpsv-opt branch from f1d7872 to 3c3b231 Compare March 25, 2026 09:03
actinks (Contributor, Author) commented Mar 30, 2026

ping @TorreZuk

TorreZuk (Contributor) commented

@actinks I am re-running some tests with your latest fixes; hopefully I can provide an update tomorrow.

@TorreZuk TorreZuk force-pushed the users/actinks/tpsv-opt branch from 3c3b231 to f4d880c Compare April 2, 2026 14:05
TorreZuk (Contributor) commented Apr 2, 2026

Sorry, unrelated technical issues were blocking tests. I am still working on them to allow the merge.

@TorreZuk TorreZuk force-pushed the users/actinks/tpsv-opt branch from f4d880c to 8fa8d8c Compare April 2, 2026 20:08
TorreZuk (Contributor) left a comment

Changes are passing all tests and are clang-formatted.

@TorreZuk TorreZuk merged commit 3ea8988 into ROCm:develop Apr 6, 2026
33 checks passed
assistant-librarian Bot pushed a commit to ROCm/rocBLAS that referenced this pull request Apr 6, 2026
[rocblas] Optimize tpsv APa address calculation

vidyasagar-amd pushed a commit that referenced this pull request Apr 9, 2026
Co-authored-by: Torre Zuk <Torre.Zuk@amd.com>
AaronStGeorge pushed a commit to AaronStGeorge/rocm-libraries that referenced this pull request Apr 16, 2026
Co-authored-by: Torre Zuk <Torre.Zuk@amd.com>

Labels

ci:extended, external contribution, project: rocblas

2 participants