Enhance the coverage of FP8 TN gridbased on Navi4x #4209

Merged: ericwan-amd merged 5 commits into develop from users/ericwan/navi4x_tuning_fp8_tn_gridbased on Feb 24, 2026

ericwan-amd commented Feb 3, 2026

Motivation

This PR improves the performance of the gridbased kernel on Navi4x by enabling both DTVA and DTVB tuning for FP8 TN. It also introduces problem size distributions, based on the previous version, that better reflect realistic workloads.

Technical Details

  • Enabled DTVA and DTVB for more comprehensive tuning coverage
  • Expanded the tuning MT combinations and other parameters
  • Replaced deprecated logic YAML naming conventions for consistency
  • Extended tuning support from f8f8s to all f8 datatypes
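
To illustrate why enabling DTVA and DTVB alongside more MT combinations widens tuning coverage, here is a minimal sketch of how the candidate space grows as a Cartesian product. It assumes DTVA/DTVB correspond to Tensile's `DirectToVgprA`/`DirectToVgprB` parameters and MT to `MacroTile`; the parameter values and the helper `enumerate_candidates` are hypothetical placeholders, not the settings tuned in this PR.

```python
from itertools import product

# Hypothetical fork parameters. The names are assumed to follow Tensile's
# conventions; the values are placeholders, not the ones tuned in this PR.
FORK_PARAMETERS = {
    "MacroTile": [(64, 64), (128, 64), (128, 128)],  # hypothetical MT combinations
    "DirectToVgprA": [False, True],                  # DTVA off / on
    "DirectToVgprB": [False, True],                  # DTVB off / on
}


def enumerate_candidates(fork_parameters):
    """Return every combination of the fork parameters as a list of dicts."""
    names = list(fork_parameters)
    return [
        dict(zip(names, values))
        for values in product(*fork_parameters.values())
    ]


candidates = enumerate_candidates(FORK_PARAMETERS)
print(len(candidates))  # 3 MacroTiles * 2 DTVA settings * 2 DTVB settings = 12
```

With both DTV switches fixed to a single value the same three tiles would yield only 3 candidates, so each enabled switch doubles the search space. A real tuning run would also discard combinations the kernel generator rejects, which this sketch does not model.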

The F8F8S tuning results are shown in the following figures:
gfx1201: [figure: F8F8S tuning results]

gfx1200: [figure: F8F8S tuning results]

Test Plan

Verified locally using hipblaslt-test on gfx1200 and gfx1201 platforms.

Test Result

  • gfx1201 local hipblaslt-test result:
    [----------] Global test environment tear-down
    [==========] 40101 tests from 11 test suites ran. (388998 ms total)
    [  PASSED  ] 40101 tests.
    hipBLASLt version: 100200
    hipBLASLt git version:
    command line: ./build/release/clients/hipblaslt-test

  • gfx1200 local hipblaslt-test result:
    [----------] Global test environment tear-down
    [==========] 40101 tests from 11 test suites ran. (473105 ms total)
    [  PASSED  ] 40101 tests.
    hipBLASLt version: 100200
    hipBLASLt git version:
    command line: ./build/release/clients/hipblaslt-test
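
When comparing runs across gfx1200 and gfx1201, the googletest summary can be checked mechanically instead of by eye. The following is a minimal sketch that parses the summary lines shown above; the helper name `summarize_gtest` and the returned keys are hypothetical, and the sample log is the gfx1201 output quoted in this PR.

```python
import re


def summarize_gtest(output: str) -> dict:
    """Extract totals from a googletest summary (hypothetical helper)."""
    ran = re.search(
        r"\[=+\]\s+(\d+) tests? from (\d+) test suites? ran\. \((\d+) ms total\)",
        output,
    )
    if ran is None:
        raise ValueError("no googletest summary line found")
    passed = re.search(r"\[\s*PASSED\s*\]\s+(\d+) tests?", output)
    failed = re.search(r"\[\s*FAILED\s*\]\s+(\d+) tests?", output)
    total = int(ran.group(1))
    n_passed = int(passed.group(1)) if passed else 0
    n_failed = int(failed.group(1)) if failed else 0
    return {
        "total": total,
        "suites": int(ran.group(2)),
        "elapsed_ms": int(ran.group(3)),
        "passed": n_passed,
        "failed": n_failed,
        # A run counts as clean only if every test that ran also passed.
        "all_passed": n_passed == total and n_failed == 0,
    }


# Summary lines copied from the gfx1201 result above.
GFX1201_LOG = """\
[----------] Global test environment tear-down
[==========] 40101 tests from 11 test suites ran. (388998 ms total)
[  PASSED  ] 40101 tests.
"""

summary = summarize_gtest(GFX1201_LOG)
print(summary["all_passed"], summary["elapsed_ms"])
```

Skipped or disabled tests are reported on separate lines that this sketch ignores.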

Related Tickets

Submission Checklist

ericwan-amd requested a review from a team as a code owner on February 3, 2026 05:22
ericwan-amd force-pushed the users/ericwan/navi4x_tuning_fp8_tn_gridbased branch from 9721f80 to bc7f34f on February 11, 2026 13:29
@math-ci-webhook

perfci run on commit 1560727

math-ci run

ericwan-amd and others added 5 commits February 12, 2026 22:24
[note]:
  * tuned with DTV enabled
  * extended problem size to support frequently used models
[note]:
  * tuned with DTV enabled
  * extended gridpoints to support frequently used models
ericwan-amd force-pushed the users/ericwan/navi4x_tuning_fp8_tn_gridbased branch from 1560727 to 9830931 on February 14, 2026 08:16
ericwan-amd changed the title from "Users/ericwan/navi4x tuning fp8 tn gridbased" to "Enhance the coverage of FP8 TN gridbased on Navi4x" on Feb 14, 2026
@math-ci-webhook

perfci run on commit 9830931

math-ci run

ericwan-amd merged commit f828c84 into develop on Feb 24, 2026
62 of 65 checks passed
ericwan-amd deleted the users/ericwan/navi4x_tuning_fp8_tn_gridbased branch on February 24, 2026 09:17
aosewski pushed a commit that referenced this pull request Feb 24, 2026
## Motivation
This PR improves the performance of the gridbased kernel on Navi4x
by enabling both DTVA and DTVB tuning for FP8 TN. It also
introduces problem size distributions, based on the previous version,
that better reflect realistic workloads.

## Technical Details
* Enabled DTVA and DTVB for more comprehensive tuning coverage
* Expanded the tuning MT combinations and other parameters
* Replaced deprecated logic YAML naming conventions for consistency
* Extended tuning support from f8f8s to all f8 datatypes
 
The F8F8S tuning results are shown in the following figures:
**gfx1201:**
<img width="288" height="300" alt="image"
src="https://github.com/user-attachments/assets/65411684-4958-4842-a8d1-e4c2151a57f0"
/>

**gfx1200:**
<img width="290" height="304" alt="image"
src="https://github.com/user-attachments/assets/f012ce9c-9ffa-4bf5-be1c-ba582937dec5"
/>



## Test Plan
Verified locally using hipblaslt-test on gfx1200 and gfx1201 platforms.


## Test Result
* gfx1201 local hipblaslt-test result:
`[----------] Global test environment tear-down
[==========] 40101 tests from 11 test suites ran. (388998 ms total)
[  PASSED  ] 40101 tests.
hipBLASLt version: 100200
hipBLASLt git version: 
command line: ./build/release/clients/hipblaslt-test `

* gfx1200 local hipblaslt-test result
`[----------] Global test environment tear-down
[==========] 40101 tests from 11 test suites ran. (473105 ms total)
[  PASSED  ] 40101 tests.
hipBLASLt version: 100200
hipBLASLt git version: 
command line: ./build/release/clients/hipblaslt-test`


## Related Tickets
- https://amd-hub.atlassian.net/browse/AIHPBLAS-776

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: ericwan-amd <Eric.Wang@amd.com>
NaveenElumalaiAMD pushed a commit that referenced this pull request Mar 6, 2026
jovanau pushed a commit to jovanau/rocm-libraries that referenced this pull request Mar 19, 2026