Enhance the coverage of FP8 TN gridbased on Navi4x by ericwan-amd · Pull Request #4209 · ROCm/rocm-libraries

ericwan-amd · 2026-02-03T05:22:22Z

Motivation

This PR improves the performance of the gridbased kernels on Navi4x by enabling both DTVA and DTVB tuning for FP8 TN. It also introduces problem size distributions, based on the previous version, that better reflect realistic workloads.

Technical Details

* Enabled DTVA and DTVB for more comprehensive tuning coverage
* Expanded the tuning MT combinations and other parameters
* Replaced deprecated logic YAML naming conventions for consistency
* Extended tuning support from f8f8s to all f8 datatypes

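The F8F8S tuning results are shown in the following figures:
gfx1201: https://github.com/user-attachments/assets/65411684-4958-4842-a8d1-e4c2151a57f0

gfx1200: https://github.com/user-attachments/assets/f012ce9c-9ffa-4bf5-be1c-ba582937dec5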

Test Plan

Verified locally using hipblaslt-test on gfx1200 and gfx1201 platforms.

Test Result

gfx1201 local hipblaslt-test result:
[----------] Global test environment tear-down
[==========] 40101 tests from 11 test suites ran. (388998 ms total)
[  PASSED  ] 40101 tests.
hipBLASLt version: 100200
hipBLASLt git version:
command line: ./build/release/clients/hipblaslt-test

gfx1200 local hipblaslt-test result:
[----------] Global test environment tear-down
[==========] 40101 tests from 11 test suites ran. (473105 ms total)
[  PASSED  ] 40101 tests.
hipBLASLt version: 100200
hipBLASLt git version:
command line: ./build/release/clients/hipblaslt-test
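
When comparing runs across both platforms, the summary lines above can be checked mechanically rather than by eye. The following is a minimal sketch, assuming only the googletest summary format quoted in the logs above; `parse_gtest_summary` is a hypothetical helper written for illustration and is not part of hipBLASLt or its test client.

```python
import re

# Hypothetical helper (not part of hipBLASLt): extracts the totals from a
# googletest summary such as the hipblaslt-test output quoted above.
_RAN = re.compile(
    r"\[=+\]\s+(\d+) tests? from (\d+) test suites? ran\. \((\d+) ms total\)"
)
_PASSED = re.compile(r"\[\s*PASSED\s*\]\s+(\d+) tests?\.")


def parse_gtest_summary(log: str) -> dict:
    """Return test counts and wall time parsed from a googletest log."""
    ran = _RAN.search(log)
    passed = _PASSED.search(log)
    if ran is None or passed is None:
        raise ValueError("no googletest summary found in log")
    total, suites, millis = (int(g) for g in ran.groups())
    passed_count = int(passed.group(1))
    return {
        "total": total,
        "suites": suites,
        "seconds": millis / 1000.0,
        "passed": passed_count,
        # True only when every test that ran is reported as passed.
        "all_passed": passed_count == total,
    }


if __name__ == "__main__":
    # Summary text copied from the gfx1201 result in this PR.
    gfx1201_log = (
        "[==========] 40101 tests from 11 test suites ran. (388998 ms total) "
        "[ PASSED ] 40101 tests."
    )
    print(parse_gtest_summary(gfx1201_log))
```

Feeding it the captured output of `./build/release/clients/hipblaslt-test` from each platform gives a quick way to confirm that the test count matches between gfx1200 and gfx1201 and that no tests were dropped.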

Related Tickets

https://amd-hub.atlassian.net/browse/AIHPBLAS-776

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

math-ci-webhook · 2026-02-11T15:49:18Z

perfci run on commit `1560727`

math-ci run

math-ci-webhook · 2026-02-14T09:39:44Z

perfci run on commit `9830931`

math-ci run

Co-authored-by: ericwan-amd <Eric.Wang@amd.com>

ericwan-amd requested a review from a team as a code owner February 3, 2026 05:22

github-actions Bot added the project: hipblaslt label Feb 3, 2026

assistant-librarian Bot added the organization: ROCm label Feb 3, 2026

ericwan-amd force-pushed the users/ericwan/navi4x_tuning_fp8_tn_gridbased branch from 9721f80 to bc7f34f February 11, 2026 13:29

ericwan-amd and others added 5 commits February 12, 2026 22:24

Enhance the coverage of gridbased on gfx1201

b281d93

[note]: * tuned with DTV enabled * extended problem size to support frequently used models

Update FP8 logic YAML to include missing tuning results for gfx1201 gridbased

cf9334d

Enhance the coverage of gridbased on gfx1200

cf5816e

[note]: * tuned with DTV enabled * extended gridpoints to support frequently used models

Update FP8 TN gridbased logic YAML to unify UseCustomMainLoopSchedule handling on gfx1201

2deaaec

Update FP8 TN gridbased logic YAML to unify UseCustomMainLoopSchedule handling on gfx1200

9830931

ericwan-amd force-pushed the users/ericwan/navi4x_tuning_fp8_tn_gridbased branch from 1560727 to 9830931 February 14, 2026 08:16

ericwan-amd changed the title ~~Users/ericwan/navi4x tuning fp8 tn gridbased~~ Enhance the coverage of FP8 TN gridbased on Navi4x Feb 14, 2026

ericwan-amd requested review from CurtisFu1002, cliffxzx, cmingch and wenchuanchen February 14, 2026 08:48

cmingch approved these changes Feb 24, 2026

wenchuanchen approved these changes Feb 24, 2026

ericwan-amd merged commit f828c84 into develop Feb 24, 2026
62 of 65 checks passed

ericwan-amd deleted the users/ericwan/navi4x_tuning_fp8_tn_gridbased branch February 24, 2026 09:17
