
Fix bus error or segfault from roi_align with large batchsize #9441

Draft
zy1git wants to merge 3 commits into pytorch:main from zy1git:issue-8206

Conversation

Contributor

@zy1git commented Mar 13, 2026

Summary
Bug: roi_align in torchvision crashes with a bus error/segfault on CPU, or silently returns wrong (all-zero) results on CUDA, when the total number of output elements exceeds INT_MAX (~2.1 billion). The cause is 32-bit int overflow in index arithmetic within the C++ and CUDA kernels.

Root Cause: The kernels use int for composite index calculations like n × channels × pooled_width × pooled_height and pointer offsets like (roi_batch_ind × channels + c) × height × width. When these products exceed 2,147,483,647, the int wraps to a negative value, causing out-of-bounds memory access.

Example: FasterRCNN with batch_size=172 generates ~172,000 ROIs. The output index reaches 171,999 × 256 × 7 × 7 = 2,157,555,456 > INT_MAX, which matches the reporter's observed threshold exactly.
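The wraparound described above can be reproduced in isolation. The sketch below (illustrative only; `i32` is a hypothetical helper emulating C's 32-bit `int`, and the shapes are taken from the numbers in this description) shows the same product computed in 64-bit and in 32-bit arithmetic:

```python
INT_MAX = 2**31 - 1  # 2,147,483,647

def i32(x):
    """Emulate C `int`: truncate to 32-bit two's-complement."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

# Shapes from the report: ~172,000 ROIs, 256 channels, 7x7 pooled output.
n_rois, channels, pooled_h, pooled_w = 172_000, 256, 7, 7

# Index of the last ROI's first output element, computed in 64-bit math:
true_index = (n_rois - 1) * channels * pooled_h * pooled_w
print(true_index, true_index > INT_MAX)   # 2157555456 True

# The same product with 32-bit intermediates wraps to a negative offset,
# which the kernel then uses for an out-of-bounds memory access.
wrapped = i32(i32(i32((n_rois - 1) * channels) * pooled_h) * pooled_w)
print(wrapped)                            # -2137411840
```

A negative offset on CPU tends to fault (bus error/segfault), while on CUDA the out-of-bounds access can fail silently, consistent with the all-zero outputs the reporter saw.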

Fix: Promoted int to int64_t for all index, offset, and stride variables in the affected C++ and CUDA kernel files.

Test Plan
New overflow regression test
pytest test/test_ops.py::TestRoIAlign::test_roi_align_large_index -v

Existing tests — verify no regressions
pytest test/test_ops.py::TestRoIAlign -v

Fixes #8206

@pytorch-bot

pytorch-bot bot commented Mar 13, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9441

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d9ab5ce with merge base 6f131f1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the cla signed label Mar 13, 2026
@zy1git zy1git marked this pull request as draft March 13, 2026 10:03
output_bytes = n_rois * channels * pooled_h * pooled_w * 4  # float32
if output_bytes > 9 * 1024**3:
    pytest.skip("Test requires ~9 GB of memory")

Member


All these values are statically defined, so this if block is either always True or always False.

x = torch.rand(num_imgs, channels, height, width, dtype=torch.float32, device=device)
rois = torch.zeros(n_rois, 5, dtype=torch.float32, device=device)
except RuntimeError:
    pytest.skip("Not enough memory to allocate test tensors")
Member

@NicolasHug commented Mar 16, 2026


Please verify tests aren't being skipped on the CI. If they pass, remove the try/except; if they don't, we'll have to consider other strategies to test this.

template <typename T>
void roi_align_backward_kernel_impl(
-    int nthreads,
+    int64_t nthreads,
Member


Can you explain why nthreads needs to be int64_t? It should never need to be that large?
