[asm]Add wide_stores support for MXFP4 (4wave) GEMM #1286

Open
xintin wants to merge 5 commits into main from xintin/post_loop_stores_optimization_asm
xintin (Contributor) commented Apr 9, 2026

Adds WaveASM backend support for coalesced `buffer_store_dwordx4` stores, implemented via `v_permlane16_swap_b32`.
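For readers unfamiliar with the lane swap, here is a minimal Python model of `v_permlane16_swap_b32`. It assumes the semantics given in the LLVM AMDGPU documentation for the corresponding intrinsic: odd rows of the first operand are swapped with even rows of the second operand, where one row is 16 lanes. The function name `permlane16_swap` and the list-of-lanes representation are hypothetical, the model ignores the EXEC mask and instruction modifiers, and it does not show how this PR combines swaps to form the `dwordx4` stores.

```python
WAVE_SIZE = 64  # lanes per wave (assumed wave64)
ROW = 16        # one "row" is 16 lanes


def permlane16_swap(vdst, src0):
    """Model of the swap: odd rows of vdst <-> even rows of src0.

    Both arguments are per-lane value lists of length WAVE_SIZE.
    Returns the new (vdst, src0) pair without mutating the inputs.
    """
    assert len(vdst) == len(src0) == WAVE_SIZE
    vdst, src0 = list(vdst), list(src0)
    # Two passes: rows (0, 1) starting at lane 0 and rows (2, 3) at lane 32.
    for base in range(0, WAVE_SIZE, 2 * ROW):
        for lane in range(ROW):
            even = base + lane        # lane in the even row of src0
            odd = base + ROW + lane   # lane in the odd row of vdst
            vdst[odd], src0[even] = src0[even], vdst[odd]
    return vdst, src0


# Tag each value with its source register and lane to make the movement visible.
a = [("a", lane) for lane in range(WAVE_SIZE)]
b = [("b", lane) for lane in range(WAVE_SIZE)]
new_a, new_b = permlane16_swap(a, b)
```

Under this model, `new_a` holds rows `a0, b0, a2, b2` and `new_b` holds rows `a1, b1, a3, b3`, so values that started in different rows of one register end up in matching lanes of two registers.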

| Shape (M, N, K) | Baseline (TFLOPS) | Coalesced (TFLOPS) | Speedup |
| --- | --- | --- | --- |
| (315904, 384, 1792) | 1335.990 | 1336.420 | 1.000x |
| (20480, 1152, 28928) | 2157.540 | 2153.370 | 1.000x |
| (5888, 1920, 3328) | 1400.030 | 1557.610 | 1.113x |
| (4736, 2432, 321280) | 2134.450 | 2132.890 | 0.999x |
| (46336, 2688, 8448) | 2166.370 | 2166.730 | 1.000x |
| (3584, 2944, 512) | 506.260 | 512.640 | 1.013x |
| (512, 3072, 428288) | 892.940 | 893.070 | 1.000x |
| (7424, 3200, 4608) | 1710.380 | 1830.580 | 1.070x |
| (256, 4224, 102656) | 651.640 | 654.040 | 1.004x |
| (4864, 4608, 2560) | 1550.700 | 1633.050 | 1.053x |
| (4864, 4608, 5888) | 1858.190 | 1950.210 | 1.050x |
| (4608, 4992, 7424) | 1973.810 | 2074.320 | 1.051x |
| (2048, 5376, 63232) | 2342.210 | 2413.170 | 1.030x |
| (2048, 5760, 6400) | 1875.560 | 1914.100 | 1.021x |
| (256, 6784, 394496) | 970.000 | 970.780 | 1.001x |
| (3584, 6912, 548608) | 2369.900 | 2376.520 | 1.003x |
| (5120, 7040, 4864) | 1863.820 | 1900.100 | 1.019x |
| (4864, 7680, 7936) | 2186.690 | 2234.010 | 1.022x |
| (2816, 7808, 1792) | 1212.600 | 1251.280 | 1.032x |
| (3072, 7808, 3328) | 1602.910 | 1661.940 | 1.037x |
| (14848, 12672, 10496) | 2539.220 | 2538.450 | 1.000x |
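To sanity-check or summarize the numbers above, the speedup column can be recomputed as coalesced TFLOPS divided by baseline TFLOPS. The sketch below is not part of the PR's benchmark tooling. It copies the two TFLOPS columns from the table and adds a geometric mean as one possible summary statistic, which is an assumption about how to aggregate. Because it works from the rounded TFLOPS values shown, a recomputed ratio may differ slightly from the reported figure.

```python
from math import prod

# ((M, N, K), baseline TFLOPS, coalesced TFLOPS), copied from the table above.
RESULTS = [
    ((315904, 384, 1792), 1335.990, 1336.420),
    ((20480, 1152, 28928), 2157.540, 2153.370),
    ((5888, 1920, 3328), 1400.030, 1557.610),
    ((4736, 2432, 321280), 2134.450, 2132.890),
    ((46336, 2688, 8448), 2166.370, 2166.730),
    ((3584, 2944, 512), 506.260, 512.640),
    ((512, 3072, 428288), 892.940, 893.070),
    ((7424, 3200, 4608), 1710.380, 1830.580),
    ((256, 4224, 102656), 651.640, 654.040),
    ((4864, 4608, 2560), 1550.700, 1633.050),
    ((4864, 4608, 5888), 1858.190, 1950.210),
    ((4608, 4992, 7424), 1973.810, 2074.320),
    ((2048, 5376, 63232), 2342.210, 2413.170),
    ((2048, 5760, 6400), 1875.560, 1914.100),
    ((256, 6784, 394496), 970.000, 970.780),
    ((3584, 6912, 548608), 2369.900, 2376.520),
    ((5120, 7040, 4864), 1863.820, 1900.100),
    ((4864, 7680, 7936), 2186.690, 2234.010),
    ((2816, 7808, 1792), 1212.600, 1251.280),
    ((3072, 7808, 3328), 1602.910, 1661.940),
    ((14848, 12672, 10496), 2539.220, 2538.450),
]


def speedup(baseline, coalesced):
    """Ratio of coalesced to baseline throughput (>1 means faster)."""
    return coalesced / baseline


def geomean(values):
    """Geometric mean, a common way to average ratios."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


speedups = [speedup(b, c) for _, b, c in RESULTS]
for (shape, b, c), s in zip(RESULTS, speedups):
    print(f"{str(shape):<26} {b:>9.3f} {c:>9.3f} {s:.3f}x")
print(f"geomean speedup: {geomean(speedups):.3f}x")
```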

@xintin xintin force-pushed the xintin/post_loop_stores_optimization_asm branch 2 times, most recently from cd1a2ef to cfea99c Compare April 9, 2026 12:36
@xintin xintin marked this pull request as ready for review April 9, 2026 13:10
@xintin xintin changed the title [wip][asm]Add wide_stores supprot for MXFP4 (4wave) GEMM [wip][asm]Add wide_stores support for MXFP4 (4wave) GEMM Apr 9, 2026
@xintin xintin force-pushed the xintin/post_loop_stores_optimization_asm branch 3 times, most recently from 74efac8 to 8cce94b Compare April 9, 2026 14:35
@xintin xintin changed the title [wip][asm]Add wide_stores support for MXFP4 (4wave) GEMM [asm]Add wide_stores support for MXFP4 (4wave) GEMM Apr 9, 2026
@xintin xintin requested a review from panditsa April 9, 2026 16:53
@xintin xintin force-pushed the xintin/wide_stores_llvm branch from 9f1c6cb to f01106c Compare April 9, 2026 20:22
Base automatically changed from xintin/wide_stores_llvm to main April 9, 2026 21:05
@xintin xintin changed the title [asm]Add wide_stores support for MXFP4 (4wave) GEMM [wip][asm]Add wide_stores support for MXFP4 (4wave) GEMM Apr 9, 2026
xintin added 3 commits April 9, 2026 21:27
Signed-off-by: xintin <gaurav.verma@amd.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin xintin force-pushed the xintin/post_loop_stores_optimization_asm branch from 8cce94b to 544c209 Compare April 9, 2026 21:28
@xintin xintin changed the title [wip][asm]Add wide_stores support for MXFP4 (4wave) GEMM [asm]Add wide_stores support for MXFP4 (4wave) GEMM Apr 9, 2026
xintin added 2 commits April 10, 2026 01:09
Signed-off-by: xintin <gaurav.verma@amd.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin xintin force-pushed the xintin/post_loop_stores_optimization_asm branch from 759055e to 97e0b05 Compare April 10, 2026 07:08