CVS-183516: Fix offset & length conversion in weight sharing logic as size_t #988

Merged
ankitm3k merged 1 commit into ovep-develop from ankit/weight_sharing_fix
Mar 24, 2026

@ankitm3k ankitm3k commented Mar 24, 2026

CVS-183516: Fix offset & length conversion in weight sharing logic as size_t

Summary

Fixes a Windows-only crash (std::out_of_range) when creating inference sessions with
OVEP weight sharing enabled on models whose external data files exceed 4 GiB
(≈ 4.29 GB), so that tensor offsets no longer fit in 32 bits.


Problem

During weight-sharing session creation, CreateModelWithStrippedQDQNodes iterates over
TensorProto external data entries and parses the offset and length fields — stored
as plain decimal strings per the ONNX spec — using std::stoul:

data_offset = std::stoul(pb_value);  // "offset"
size        = std::stoul(pb_value);  // "length"

On Windows, MSVC defines unsigned long as 32-bit (max 4,294,967,295 = 2^32 − 1 ≈ 4.29 GB).
For models like Phi-Silica PSU2, external tensor offsets into phi_36_lora_2_5_1.data_proxy
reach ~5.67 GiB (≈ 6.09 GB), well above this ceiling. std::stoul throws std::out_of_range
before any value is assigned, crashing session creation.

The target variables data_offset and size are already declared as size_t (64-bit on
x64), so the overflow happens entirely inside stoul itself.

On 64-bit Linux (LP64), GCC and Clang define unsigned long as 64-bit, so the bug is
latent and never triggered there. The code was nevertheless non-portable.
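
The width dependence can be reproduced on any platform without relying on the host's unsigned long. The sketch below is illustrative only and not part of the patch: it uses std::from_chars with std::uint32_t as a stand-in for MSVC's 32-bit unsigned long, and std::uint64_t as a stand-in for size_t on x64. The helper name fits_in is hypothetical.

```cpp
#include <charconv>
#include <cstdint>
#include <string_view>
#include <system_error>

// Hypothetical helper (not from the patch): returns true if `text` parses
// completely as a plain decimal value that fits in the unsigned type T.
template <typename T>
bool fits_in(std::string_view text) {
  T value{};
  const char* first = text.data();
  const char* last = first + text.size();
  // from_chars reports overflow via ec == std::errc::result_out_of_range
  // instead of throwing, so the check is safe to run for any width.
  const auto [ptr, ec] = std::from_chars(first, last, value);
  return ec == std::errc{} && ptr == last;
}
```

With these stand-ins, fits_in<std::uint32_t>("6091128832") is false while fits_in<std::uint64_t>("6091128832") is true, which mirrors the Windows versus 64-bit Linux behavior described above.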


Fix

Replace std::stoul with std::from_chars from <charconv> (C++17):

#include <charconv>

std::from_chars(pb_value.data(), pb_value.data() + pb_value.size(), data_offset);
std::from_chars(pb_value.data(), pb_value.data() + pb_value.size(), size);

std::from_chars is the correct tool here for several reasons:

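| Property | std::stoul | std::from_chars |
| --- | --- | --- |
| Parse width | unsigned long (32-bit on Windows) | Deduced from target type (size_t = 64-bit on x64) |
| Error handling | Throws std::out_of_range (or std::invalid_argument) | Returns a result struct (ptr, ec); never throws |
| Locale | Locale-aware (subtle portability risk) | Always locale-independent decimal |
| ONNX wire format | Mismatch risk: also accepts leading whitespace and a sign | Digits only, matching the plain decimal format |
| Performance | Typically slower (locale handling, exceptions) | Typically the fastest standard string→integer parse |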
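
Because std::from_chars never throws, a failed parse is only visible through the returned result, and the two calls shown above discard it. If a caller wants malformed or oversized values surfaced, one option is a small checked wrapper. This is a minimal sketch under that assumption, not the code in the patch; parse_size is a hypothetical name.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper (not part of the patch): parses a plain decimal string
// into size_t and returns std::nullopt on any failure.
std::optional<std::size_t> parse_size(std::string_view text) {
  std::size_t value = 0;
  const char* first = text.data();
  const char* last = first + text.size();
  const auto [ptr, ec] = std::from_chars(first, last, value);
  // ec is set for non-numeric input or overflow; ptr != last means trailing
  // characters were left unparsed (for example "123abc").
  if (ec != std::errc{} || ptr != last) {
    return std::nullopt;
  }
  return value;
}
```

A caller could then write `if (auto v = parse_size(pb_value)) data_offset = *v;` and decide explicitly how to handle the failure branch.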

Root Cause Chain

TensorProto::external_data[i].value()   →  std::string  (decimal, e.g. "6091128832")
std::stoul(value)                        →  unsigned long  (32-bit on MSVC)
                                         →  std::out_of_range thrown  (> 4,294,967,295)
Session creation aborts

Files Changed

| File | Change |
| --- | --- |
| onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc | Add #include <charconv>; replace 2× std::stoul with std::from_chars targeting size_t |

Testing

Validated against Phi-Silica PSU2 with ep.share_ep_contexts=1 on Windows. Session
creation completes without std::out_of_range. No functional change on Linux.

@ankitm3k ankitm3k merged commit 9e79db8 into ovep-develop Mar 24, 2026
7 of 8 checks passed