Hi!
When we ran host_to_device_memcpy_sm and device_to_host_memcpy_sm separately on the H100 cluster, we got two very different bandwidth values:
Running host_to_device_memcpy_sm.
memcpy SM CPU(row) -> GPU(column) bandwidth (GB/s)
        0      1      2      3      4      5      6      7
 0  35.19  35.25  35.30  35.03  35.25  35.32  35.39  35.06
Running device_to_host_memcpy_sm.
memcpy SM CPU(row) <- GPU(column) bandwidth (GB/s)
        0      1      2      3      4      5      6      7
 0  52.77  52.77  52.77  52.78  52.76  52.77  52.78  52.77
We expected the two directions to give similar bandwidth.
What could be causing this?