[PyTorch] torch.compile support for permutation functions #2686

Open

pggPL wants to merge 8 commits into NVIDIA:main from pggPL:moe_torch_compile

Conversation

Collaborator

pggPL commented Feb 17, 2026

Description

This PR adds torch.compile(fullgraph=True) support for MoE permutation operations (moe_permute, moe_unpermute, moe_sort_chunks_by_index) by converting all torch.autograd.Function implementations to PyTorch custom operators using torch.library.custom_op.

Note that this PR does not add torch.compile support for QuantizedTensor inputs.
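For context, the conversion follows the standard custom-op pattern shown below. This is a minimal sketch only: the demo_moe namespace, op name, and permutation logic are illustrative placeholders, not the PR's actual te_moe:: ops.

# Minimal sketch of the custom-op pattern (illustrative demo_moe namespace,
# not the PR's actual te_moe:: ops). A custom op plus a fake implementation
# lets dynamo trace the call with fullgraph=True instead of graph-breaking
# on an opaque autograd.Function.
import torch

@torch.library.custom_op("demo_moe::permute", mutates_args=())
def demo_permute(inp: torch.Tensor, row_map: torch.Tensor) -> torch.Tensor:
    # Stand-in for the real permutation kernel.
    return inp.index_select(0, row_map)

@demo_permute.register_fake
def _(inp: torch.Tensor, row_map: torch.Tensor) -> torch.Tensor:
    # Shape/dtype-only version used while torch.compile traces the graph.
    return inp.new_empty((row_map.shape[0], inp.shape[1]))

@torch.compile(fullgraph=True)
def run(x, row_map):
    return torch.ops.demo_moe.permute(x, row_map)

In eager mode the same torch.ops.demo_moe.permute call dispatches straight to the real implementation, so one code path serves both compiled and uncompiled use.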

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
pggPL force-pushed the moe_torch_compile branch from 41e22ef to 8159d26 on February 18, 2026 17:31
pre-commit-ci bot and others added 4 commits February 18, 2026 17:32
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL pggPL marked this pull request as ready for review February 19, 2026 15:45
Collaborator Author

pggPL commented Feb 19, 2026

/te-ci pytorch

Contributor

greptile-apps bot commented Feb 19, 2026

Greptile Summary

Converted MoE permutation operations from torch.autograd.Function to torch.library.custom_op to enable torch.compile(fullgraph=True) support. The refactor maintains backward compatibility while adding compile support for moe_permute, moe_unpermute, and moe_sort_chunks_by_index.

Key changes:

  • Replaced 6 autograd.Function classes with custom ops registered under the te_moe:: namespace
  • Added fake implementations for shape inference during compilation tracing
  • Implemented proper autograd registration with setup_context and backward wrappers (see the sketch after this summary)
  • Added _quantized_tensor_passthrough_ops to prevent unwrapping FP8 tensors in __torch_dispatch__
  • Configured torch._dynamo.config.reorderable_logging_functions to allow warnings without graph breaks
  • Added test coverage via a use_torch_compile parameter (limited to a single config to keep test time down)

Note: FP8 QuantizedTensor inputs are explicitly not supported under torch.compile (a runtime error is raised), as stated in the PR description.
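A rough sketch of the autograd and logging pieces mentioned above, continuing the illustrative demo_moe::permute op from the description. This is not the PR's registration code; the backward assumes row_map is a plain permutation of the rows.

# Continuing the demo_moe::permute sketch above (illustrative, not the PR's code).
import warnings
import torch
import torch._dynamo

def _setup_context(ctx, inputs, output):
    # Runs during the forward pass; stashes what backward needs.
    _, row_map = inputs
    ctx.save_for_backward(row_map)

def _backward(ctx, grad_out):
    (row_map,) = ctx.saved_tensors
    # Assuming row_map is a full permutation of the rows: scatter each output
    # gradient row back to its source position. The index argument gets None.
    grad_in = torch.zeros_like(grad_out)
    grad_in.index_add_(0, row_map, grad_out)
    return grad_in, None

demo_permute.register_autograd(_backward, setup_context=_setup_context)

# Let warnings.warn inside the ops be reordered instead of graph-breaking.
torch._dynamo.config.reorderable_logging_functions.add(warnings.warn)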

Confidence Score: 4/5

  • Safe to merge with minor consideration for global state thread-safety in highly concurrent scenarios
  • The implementation correctly converts autograd functions to custom ops with proper fake implementations and autograd registration. Tests provide good coverage. One minor concern: global workspace state could theoretically cause issues in concurrent compilation scenarios, though this is likely acceptable for the current use case.
  • No files require special attention - the implementation is well-structured and tested

Important Files Changed

  • transformer_engine/pytorch/permutation.py: Converted the torch.autograd.Function implementations to torch.library.custom_op for torch.compile support; added fake implementations and proper autograd registration; includes global state management that may have thread-safety implications
  • tests/pytorch/test_permutation.py: Added torch.compile test coverage via a use_torch_compile parameter (limited to a single configuration to keep test time down); includes proper dynamo reset and functorch config (pattern sketched below)
  • transformer_engine/pytorch/quantized_tensor.py: Added a passthrough mechanism so custom ops can handle quantized tensors without unwrapping them in __torch_dispatch__
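The compile-enabled test path follows the usual eager-vs-compiled parametrization pattern. The sketch below is illustrative only: the function under test and its arguments are placeholders, not the TE permutation API or the PR's actual test.

# Illustrative compile-vs-eager parametrization (not the PR's actual test).
import pytest
import torch
import torch._dynamo

def _permute_rows(x, row_map):
    # Placeholder for the op under test.
    return x.index_select(0, row_map)

@pytest.mark.parametrize("use_torch_compile", [False, True])
def test_permute_rows(use_torch_compile):
    torch._dynamo.reset()  # clear compile caches between parametrizations
    fn = torch.compile(_permute_rows, fullgraph=True) if use_torch_compile else _permute_rows
    x = torch.randn(8, 4, requires_grad=True)
    row_map = torch.randperm(8)
    out = fn(x, row_map)
    out.sum().backward()
    assert x.grad is not None and x.grad.shape == x.shape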

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User calls moe_permute/moe_unpermute] --> B{torch.compile?}
    B -->|No| C[Direct custom op call]
    B -->|Yes| D[torch.ops.te_moe.* custom ops]
    
    D --> E[Forward Op]
    E --> F[register_fake for shape inference]
    E --> G[Actual implementation]
    
    G --> H{FP8 QuantizedTensor?}
    H -->|Yes| I[Passthrough in __torch_dispatch__]
    H -->|No| J[Normal tensor processing]
    
    I --> K[Handle FP8 internally]
    J --> K
    
    K --> L[setup_context saves state]
    L --> M[Backward Op via register_autograd]
    
    M --> N[Custom backward implementation]
    
    subgraph "Custom Ops"
        E
        F
        G
        L
        M
        N
    end
    
    subgraph "QuantizedTensor Handling"
        H
        I
        K
    end

Last reviewed commit: 981beb4

Contributor

greptile-apps bot left a comment


3 files reviewed, no comments


pggPL and others added 2 commits February 19, 2026 15:57
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Collaborator Author

pggPL commented Feb 19, 2026

/te-ci pytorch

Contributor

greptile-apps bot left a comment


3 files reviewed, 1 comment


Comment on lines +33 to +34
_moe_permute_index_map_workspace = None
_moe_permute_index_map_max_expanded_token_num = 0
Contributor

global mutable state may cause issues with concurrent execution or re-compilation

The workspace variables are shared module-level state that gets mutated. In torch.compile with multiple threads or parallel compilation, this could lead to race conditions. Consider thread-local storage or passing these as function arguments if possible.
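One possible shape of that suggestion, purely as a sketch: cache the growable workspace in thread-local storage instead of module-level globals. The names and dtype below are hypothetical, not the PR's code.

# Hypothetical thread-local workspace cache (sketch of the reviewer's suggestion).
import threading
import torch

_workspace_tls = threading.local()

def _get_index_map_workspace(expanded_token_num: int, device: torch.device) -> torch.Tensor:
    # Each thread owns its own cached buffer, so concurrent compilation or
    # execution cannot race on a shared module-level variable.
    buf = getattr(_workspace_tls, "buffer", None)
    if buf is None or buf.numel() < expanded_token_num or buf.device != device:
        buf = torch.empty(expanded_token_num, dtype=torch.int32, device=device)
        _workspace_tls.buffer = buf
    return buf[:expanded_token_num]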
