
Fix bus error or segfault from roi_align with large batchsize #9441

Draft
zy1git wants to merge 3 commits into pytorch:main from zy1git:issue-8206

Conversation

Contributor

@zy1git commented Mar 13, 2026

Summary
Bug: roi_align in torchvision crashes with a bus error/segfault on CPU, or silently returns wrong (all-zero) results on CUDA, when the total number of output elements exceeds INT_MAX (~2.1 billion). The cause is 32-bit int overflow in index arithmetic within the C++ and CUDA kernels.

Root Cause: The kernels use int for composite index calculations like n × channels × pooled_width × pooled_height and pointer offsets like (roi_batch_ind × channels + c) × height × width. When these products exceed 2,147,483,647, the int wraps to a negative value, causing out-of-bounds memory access.

Example: FasterRCNN with batch_size=172 generates ~172,000 ROIs. The output index reaches 171,999 × 256 × 7 × 7 = 2,157,555,456 > INT_MAX, which matches the reporter's observed threshold exactly.
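The wraparound described above can be reproduced in isolation. The sketch below (illustrative only; `i32` is a hypothetical helper emulating C's 32-bit `int`, and the shapes are taken from the numbers in this description) shows the same product computed in 64-bit and in 32-bit arithmetic:

```python
INT_MAX = 2**31 - 1  # 2,147,483,647

def i32(x):
    """Emulate C `int`: truncate to 32-bit two's-complement."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

# Shapes from the report: ~172,000 ROIs, 256 channels, 7x7 pooled output.
n_rois, channels, pooled_h, pooled_w = 172_000, 256, 7, 7

# Index of the last ROI's first output element, computed in 64-bit math:
true_index = (n_rois - 1) * channels * pooled_h * pooled_w
print(true_index, true_index > INT_MAX)   # 2157555456 True

# The same product with 32-bit intermediates wraps to a negative offset,
# which the kernel then uses for an out-of-bounds memory access.
wrapped = i32(i32(i32((n_rois - 1) * channels) * pooled_h) * pooled_w)
print(wrapped)                            # -2137411840
```

A negative offset on CPU tends to fault (bus error/segfault), while on CUDA the out-of-bounds access can fail silently, consistent with the all-zero outputs the reporter saw.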

Fix: Promoted int to int64_t for all index, offset, and stride variables in the affected C++ and CUDA kernel files.

Test Plan
New overflow regression test
pytest test/test_ops.py::TestRoIAlign::test_roi_align_large_index -v

Existing tests — verify no regressions
pytest test/test_ops.py::TestRoIAlign -v

Fixes #8206

@pytorch-bot

pytorch-bot bot commented Mar 13, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9441

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d9ab5ce with merge base 6f131f1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the cla signed label Mar 13, 2026
@zy1git zy1git marked this pull request as draft March 13, 2026 10:03
output_bytes = n_rois * channels * pooled_h * pooled_w * 4  # float32
if output_bytes > 9 * 1024**3:
    pytest.skip("Test requires ~9 GB of memory")

Member


All these values are statically defined, so this if block is either always True or always False.

x = torch.rand(num_imgs, channels, height, width, dtype=torch.float32, device=device)
rois = torch.zeros(n_rois, 5, dtype=torch.float32, device=device)
except RuntimeError:
    pytest.skip("Not enough memory to allocate test tensors")
Member

@NicolasHug commented Mar 16, 2026


Please verify tests aren't being skipped on the CI. If they pass, remove the try/except; if they don't, we'll have to consider other strategies to test this.

template <typename T>
void roi_align_backward_kernel_impl(
-    int nthreads,
+    int64_t nthreads,
Member


Can you explain why nthreads needs to be int64_t? It should never need to be that large?
