[PyTorch] Error out if constructing LayerNormLinear with row tensor parallelism#2688
timmoon10 wants to merge 3 commits into NVIDIA:main

Conversation
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Greptile Summary

Added validation to prevent construction of LayerNormLinear with row tensor parallelism.

Key changes:
Issues found:
Confidence Score: 3/5
Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[LayerNormLinear.__init__] --> B{parallel_mode in GemmParallelModes?}
    B -->|No| C[AssertionError: parallel_mode not supported]
    B -->|Yes| D{parallel_mode == 'row'?}
    D -->|Yes| E[NotImplementedError: Normalization does not support TP distribution]
    D -->|No| F{parallel_mode == 'column'?}
    F -->|Yes| G[Divide out_features by tp_size]
    F -->|No| H[No TP distribution]
    G --> I[Continue initialization]
    H --> I
    style E fill:#f66,stroke:#f00,color:#fff
    style C fill:#f66,stroke:#f00,color:#fff
    style I fill:#6f6,stroke:#0f0
```
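In code form, the check that the flowchart summarizes might look roughly like the following standalone sketch. `validate_parallel_mode` is a hypothetical helper written only for illustration; `GemmParallelModes`, `out_features`, `tp_size`, and the error types mirror the flowchart, while the exact message wording is an assumption.

```python
# Hypothetical standalone sketch of the parallel_mode validation added to
# LayerNormLinear.__init__; names mirror the flowchart above.
GemmParallelModes = ("row", "column", None)

def validate_parallel_mode(parallel_mode, out_features, tp_size):
    assert parallel_mode in GemmParallelModes, f"parallel_mode {parallel_mode} not supported"
    if parallel_mode == "row":
        # LayerNorm/RMSNorm statistics cannot be computed over an input sharded
        # along its inner dimension, so reject row TP up front instead of hitting
        # an illegal memory access inside the norm kernel.
        raise NotImplementedError("Normalization does not support tensor-parallel distribution")
    if parallel_mode == "column":
        # Column TP shards the output features across the tensor-parallel group.
        assert out_features % tp_size == 0, "out_features must be divisible by tp_size"
        out_features = out_features // tp_size
    return out_features
```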
Last reviewed commit: 1ef2ae0
Additional Comments (1)
This comment was marked as outdated.
Signed-off-by: Tim Moon <tmoon@nvidia.com>
/te-ci pytorch
Additional Comments (3)
Description
LayerNormLinear modules with row tensor parallelism have input tensors that are sharded along the inner dimension: TransformerEngine/transformer_engine/pytorch/module/layernorm_linear.py
Lines 1199 to 1200 in 7e48fa1
However, we currently don't support tensor-parallel LayerNorm or RMSNorm, which would involve a tensor-parallel all-reduce to compute statistics. If the user attempts to run
LayerNormLinear with row tensor parallelism, then they experience an illegal memory access when the norm kernel accesses values in the unsharded norm weight tensor. We haven't experienced problems so far because row TP is usually used for the proj and fc2 layers, which are usually Linears.

This PR adds an error message to make the failure more obvious.
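For illustration, a minimal usage sketch of the new behavior follows. This is a sketch rather than code from the PR: the constructor arguments follow the public transformer_engine.pytorch API, the raised error type follows the validation described above, and the feature sizes and process-group setup are placeholders.

```python
import torch.distributed as dist
import transformer_engine.pytorch as te

# Placeholder tensor-parallel process-group setup (assumes init_process_group was called).
tp_group = dist.new_group()
tp_size = dist.get_world_size(group=tp_group)

# Before this PR: risked an illegal memory access inside the norm kernel.
# After this PR: raises NotImplementedError at construction time.
layer = te.LayerNormLinear(
    4096,               # in_features (placeholder size)
    4096,               # out_features (placeholder size)
    tp_group=tp_group,
    tp_size=tp_size,
    parallel_mode="row",
)

# Row TP is typically used for the proj and fc2 layers, which are usually plain
# Linear modules (no fused normalization), so those are unaffected:
proj = te.Linear(4096, 4096, tp_group=tp_group, tp_size=tp_size, parallel_mode="row")
```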
Type of change
Changes
Error out if constructing LayerNormLinear with row tensor parallelism

Checklist: