Conversation
Greptile Summary

This PR adds PyTorch symmetric memory backend support as an alternative to nvfuser's native CUDA VMM implementation. The implementation allows users to choose between Native (default), PyTorchNccl, PyTorchNvshmem, or PyTorchCuda backends via the `NVFUSER_ENABLE=symmetric_memory_backend(...)` option.

Key changes:
Issues found:
Confidence Score: 3/5
Important Files Changed
Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
Start[SymmetricTensor::allocate] --> CheckBackend{getSymmetricMemoryBackend}
CheckBackend -->|Native| CheckVMM[Check VMM Support]
CheckBackend -->|PyTorch*| PyTorchPath[PyTorch Backend Path]
PyTorchPath --> InitBackend[ensurePyTorchSymmMemBackend]
InitBackend --> SetBackend[set_backend NCCL/NVSHMEM/CUDA]
SetBackend --> SetGroupInfo[set_group_info with Communicator]
SetGroupInfo --> ComputeStrides[Compute row-major strides]
ComputeStrides --> CheckNCCL{Backend == PyTorchNccl?}
CheckNCCL -->|Yes| AllocNoGroup[empty_strided_p2p with nullopt group]
CheckNCCL -->|No| AllocWithGroup[empty_strided_p2p with nvfuser_symm group]
AllocNoGroup --> Rendezvous[c10d::symmetric_memory::rendezvous]
AllocWithGroup --> Rendezvous
Rendezvous --> CacheHandle[Cache handle by data_ptr]
CacheHandle --> ReturnTensor[Return tensor]
CheckVMM --> NativeAlloc[Native CUDA VMM allocation]
NativeAlloc --> ReturnTensor
ReturnTensor --> UserCode[User constructs SymmetricTensor]
UserCode --> Constructor[SymmetricTensor constructor]
Constructor --> LookupCache{Check handle cache}
LookupCache -->|Found| MovePyHandle[Move py_symm_handle_ from cache]
LookupCache -->|Not found| NativeSetup[Native IPC setup]
MovePyHandle --> RemoteAccess[Remote tensor access via PyTorch]
NativeSetup --> RemoteAccess
Last reviewed commit: 14816aa
| #!/bin/bash | ||
|
|
||
| export CC=clang-20 | ||
| export CXX=clang++-20 | ||
| export LDFLAGS="-fuse-ld=mold" | ||
|
|
||
| export NVFUSER_BUILD_ENABLE_PCH | ||
|
|
||
| export UCC_HOME="/opt/hpcx/ucc" | ||
| export UCC_DIR="/opt/hpcx/ucc/lib/cmake/ucc" | ||
| export UCX_HOME="/opt/hpcx/ucx" | ||
| export UCX_DIR="/opt/hpcx/ucx/lib/cmake/ucx" | ||
|
|
||
| # export TORCH_CUDA_ARCH_LIST="9.0" | ||
|
|
||
| export NVFUSER_BUILD_WITH_UCC=1 | ||
| export NVFUSER_BUILD_INSTALL_DIR=$BUILD_DIRECTORY/nvfuser | ||
| export NVFUSER_BUILD_DIR=$BUILD_DIRECTORY | ||
|
|
||
| # Enable debug mode, leave empty for non-debug compilation | ||
| export NVFUSER_BUILD_BUILD_TYPE=Debug | ||
| export RUN_CMAKE="" | ||
|
|
||
| pip install -v -e ./python --no-build-isolation |
There was a problem hiding this comment.
This appears to be a personal build script with hardcoded paths (/opt/hpcx/), specific compiler versions (clang-20), and debug settings. It should not be committed to the repository.
| #!/bin/bash | |
| export CC=clang-20 | |
| export CXX=clang++-20 | |
| export LDFLAGS="-fuse-ld=mold" | |
| export NVFUSER_BUILD_ENABLE_PCH | |
| export UCC_HOME="/opt/hpcx/ucc" | |
| export UCC_DIR="/opt/hpcx/ucc/lib/cmake/ucc" | |
| export UCX_HOME="/opt/hpcx/ucx" | |
| export UCX_DIR="/opt/hpcx/ucx/lib/cmake/ucx" | |
| # export TORCH_CUDA_ARCH_LIST="9.0" | |
| export NVFUSER_BUILD_WITH_UCC=1 | |
| export NVFUSER_BUILD_INSTALL_DIR=$BUILD_DIRECTORY/nvfuser | |
| export NVFUSER_BUILD_DIR=$BUILD_DIRECTORY | |
| # Enable debug mode, leave empty for non-debug compilation | |
| export NVFUSER_BUILD_BUILD_TYPE=Debug | |
| export RUN_CMAKE="" | |
| pip install -v -e ./python --no-build-isolation | |
| # Remove this file - it should not be part of the PR |
| // TEST_F(SymmetricTensorTest, PyTorchBackend_RemoteAccessCorrectness) { | ||
| // if (communicator_->size() == 1) { | ||
| // GTEST_SKIP() << "Skipping test for single device"; | ||
| // } | ||
| // SymmetricMemoryBackend backend = getSymmetricMemoryBackend(); | ||
| // if (backend == SymmetricMemoryBackend::Native) { | ||
| // GTEST_SKIP() | ||
| // << "PyTorch backend not selected; set NVFUSER_ENABLE=symmetric_memory_backend(pytorch_nccl) to run"; | ||
| // } | ||
|
|
||
| // const int64_t rank = communicator_->deviceId(); | ||
| // const int64_t world_size = communicator_->size(); | ||
|
|
||
| // at::Tensor local_tensor = SymmetricTensor::allocate( | ||
| // {256, 512}, at::ScalarType::Float, communicator_->device()); | ||
| // SymmetricTensor sym_tensor(local_tensor); | ||
|
|
||
| // EXPECT_TRUE(local_tensor.is_cuda()); | ||
| // EXPECT_EQ(local_tensor.numel(), 256 * 512); | ||
|
|
||
| // float local_value = static_cast<float>(rank + 200); | ||
| // local_tensor.fill_(local_value); | ||
|
|
||
| // sym_tensor.setupRemoteHandles(); | ||
|
|
||
| // for (int64_t peer_rank = 0; peer_rank < world_size; ++peer_rank) { | ||
| // void* peer_ptr = sym_tensor.remoteTensor(peer_rank).data_ptr(); | ||
| // EXPECT_NE(peer_ptr, nullptr); | ||
|
|
||
| // float peer_value; | ||
| // NVFUSER_CUDA_RT_SAFE_CALL(cudaMemcpy( | ||
| // &peer_value, peer_ptr, sizeof(float), cudaMemcpyDeviceToHost)); | ||
|
|
||
| // float expected_value = static_cast<float>(peer_rank + 200); | ||
| // EXPECT_FLOAT_EQ(peer_value, expected_value) | ||
| // << "Rank " << rank << " reading from rank " << peer_rank | ||
| // << " (PyTorch backend)"; | ||
| // } | ||
| // } |
There was a problem hiding this comment.
Large block of commented-out test code. Either remove it or uncomment and enable it for testing the PyTorch backend path.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
No description provided.