Conversation


@justinchuby justinchuby commented Dec 31, 2025

Fix aten__native_batch_norm_legit_functional, where the running mean/var were returned directly as graph outputs without creating new values, making the graph invalid.

Fixes pytorch/pytorch#171471

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>

codecov bot commented Dec 31, 2025

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 70.09%. Comparing base (519ef5a) to head (a9f8ff0).
⚠️ Report is 2 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
onnxscript/function_libs/torch_lib/ops/core.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2753   +/-   ##
=======================================
  Coverage   70.09%   70.09%           
=======================================
  Files         228      228           
  Lines       27382    27382           
  Branches     2783     2783           
=======================================
  Hits        19194    19194           
  Misses       7229     7229           
  Partials      959      959           

☔ View full report in Codecov by Sentry.

@justinchuby justinchuby added the module: torchlib Related to the torch/aten function lib in development label Dec 31, 2025
onnxscript/function_libs/torch_lib/ops/core.py:

      running_mean_fp32 = op.Cast(running_mean, to=FLOAT.dtype)
      invstd = op.Cast(invstd, to=FLOAT.dtype)
  -   return norm, running_mean_fp32, invstd, running_mean, running_var
  +   return norm, running_mean_fp32, invstd, op.Identity(running_mean), op.Identity(running_var)
Collaborator

Do you know what was happening? I assume running_mean/var were valid values already ... but presumably some other requirement was violated (like an input value cannot also be an output value)?

In other words, what is the requirement for torchlib functions from a dev's perspective? Are they required to never return an input value as an output value without wrapping it in an Identity? Seems like something the underlying infrastructure could take care of without burdening the torchlib developer.

Collaborator Author

@justinchuby justinchuby Dec 31, 2025

Here the running_mean and running_var are treated as mutable buffers in the PyTorch model, so an initializer ended up being used directly as a graph output.

This only happens with the training graph, so we did not see the case in our testing.

From torchlib's perspective, yes: an input should not be returned directly as an output. It is probably true that we could detect this externally and wrap the output in an Identity.
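The invariant under discussion can be sketched with a toy check (this is an illustration, not the real ONNX checker or the torchlib infrastructure; the dictionary-based "node" shape here is invented for the example): a graph output is only valid if some node actually produces it, so passing an input like running_mean straight through violates the constraint, while inserting an Identity node satisfies it.

```python
# Toy illustration of the graph invariant: every graph output name must
# be produced by some node in the graph. Returning an input (e.g.
# running_mean) directly as an output violates this; wrapping it in an
# Identity node creates a fresh value that satisfies it.
def outputs_are_produced(nodes, graph_outputs):
    produced = {name for node in nodes for name in node["outputs"]}
    return all(name in produced for name in graph_outputs)

# Before the fix: running_mean is passed straight through -> invalid.
nodes_before = [{"op": "BatchNormalization", "outputs": ["norm"]}]
assert not outputs_are_produced(nodes_before, ["norm", "running_mean"])

# After the fix: an Identity node produces a new value -> valid.
nodes_after = nodes_before + [
    {"op": "Identity", "outputs": ["running_mean_out"]},
]
assert outputs_are_produced(nodes_after, ["norm", "running_mean_out"])
```

This is also why an automatic external pass could plausibly do the wrapping: the condition is purely structural and detectable from the graph alone.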

@justinchuby justinchuby merged commit a571309 into main Dec 31, 2025
33 checks passed
@justinchuby justinchuby deleted the justinchu/fix-native-batch-norm branch December 31, 2025 17:32

Labels

module: torchlib Related to the torch/aten function lib in development

Development

Successfully merging this pull request may close these issues.

[ONNX] Exporter fails some torchvision models

3 participants