Skip to content

Sync with Microsoft ONNX Runtime - 25032026#989

Open
ai-fw-intg wants to merge 18 commits intoovep-developfrom
sync_msft_25032026
Open

Sync with Microsoft ONNX Runtime - 25032026#989
ai-fw-intg wants to merge 18 commits intoovep-developfrom
sync_msft_25032026

Conversation

@ai-fw-intg
Copy link

Automated daily backmerge from ORT main to ovep-develop. No conflicts detected. Do NOT squash or rebase - use merge commit only.

ShirasawaSama and others added 18 commits March 23, 2026 08:46
### Description
<!-- Describe your changes. -->
This change adds support for the Deformable Convolution 2D operator
(DeformConv2D) to ONNX Runtime. The branch implements the operator
schema and registration, provides kernel implementations (CPU and
GPU/CUDA where available), implements shape inference, and adds unit and
integration tests to validate correctness and numerical parity with
reference implementations. The changes include performance-oriented
optimizations and necessary changes to build/test scripts.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Deformable convolutions are widely used in vision models that require
spatial sampling flexibility (e.g., Deformable ConvNets, some
detection/segmentation models). Native support in ONNX Runtime enables
these models to run efficiently without custom operators or external
runtimes, broadening the set of compatible models and improving
performance and portability.


### See also

- https://onnx.ai/onnx/operators/onnx__DeformConv.html
-
https://docs.pytorch.org/vision/main/generated/torchvision.ops.deform_conv2d.html
- https://arxiv.org/abs/1811.11168
- https://arxiv.org/abs/1703.06211
-
https://github.com/pytorch/vision/blob/0f6d91d9fe514e6de2f5519114cbeb389d498b2d/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu
-
https://github.com/open-mmlab/mmdetection/blob/master/mmdet/ops/dcn/src/deform_conv_cuda.cpp
-
https://github.com/pytorch/vision/blob/0f6d91d9fe514e6de2f5519114cbeb389d498b2d/torchvision/csrc/ops/cpu/deform_conv2d_kernel.cpp
-
https://github.com/open-mmlab/mmdetection/blob/master/mmdet/ops/dcn/src/deform_conv_cuda.cpp

- microsoft#22060
- microsoft#15572
- microsoft#20810
- microsoft#16903
- onnx/onnx#5451
- ZhengPeng7/BiRefNet#167
- pytorch/pytorch#68910
- pytorch/vision#2066
…de (microsoft#27724)

### Description
On Windows, `std::filesystem::path::string()` converts the internal
UTF-16 representation to a narrow string using the system's active ANSI
code page. When the path contains characters outside that code page
(Japanese, Chinese, Korean, etc.), this throws std::system_error with
'No mapping for the Unicode character exists in the target multi-byte
code page.'

This affected both the core session telemetry logging (causing
`InferenceSession::Initialize()` to fail) and execution provider code
(OpenVINO, TensorRT, TensorRT RTX, QNN, MIGraphX) where model paths are
converted for EPContext attributes and profiling.

### Motivation and Context
Fix: Replace `.filename().string()` with
`PathToUTF8String(.filename().native())` which uses
`WideCharToMultiByte(CP_UTF8, ...)` and handles all Unicode characters
correctly. This pattern is already used elsewhere in the codebase for
path-to-string conversions.

Note: Two remaining instances in Linux-only code (`cann_utils.cc`,
`device_discovery.cc`) are left as-is since `.string()` is safe on Linux
where paths are already narrow strings.

Fixes microsoft/WindowsAppSDK#6173

---------

Co-authored-by: Sagar Bhure <sagarbhure@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Support RoiAlign for opset versions 16 and 22

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…27774)

### Description

This PR consolidates PRs microsoft#27416 and microsoft#27708 to extend CUDA Pad kernel
support through opset 25, including wrap mode implementation.

### Motivation and Context

The CUDA execution provider previously only registered the Pad kernel up
to opset 18 and did not implement wrap mode. When an ONNX model exported
with opset 19+ was run on the CUDA executor, the Pad operation was
forced to fall back to CPU, resulting in significant performance
degradation. This PR aligns CUDA Pad registration with the ONNX Pad
schema evolution through opset 25 and provides a correct wrap mode
implementation.

Related issues: microsoft#26393
Related PRs: microsoft#27416, microsoft#27708

### Summary of Changes

#### Kernel registration and opset coverage

| File | Change |
|------|--------|
| `onnxruntime/core/providers/cuda/tensor/pad.cc` | Adds CUDA Pad kernel
registrations for opset ranges 18, 19-20, 21-22, 23, 24, and 25. |
| `onnxruntime/core/providers/cuda/cuda_execution_provider.cc` |
Registers the new Pad kernel versions in the CUDA EP registry under the
existing per-opset sections. |

#### CUDA Pad implementation

| File | Change |
|------|--------|
| `onnxruntime/core/providers/cuda/tensor/pad_impl.h` | Extends the Pad
kernel interface to pass effective sliced extents and per-axis input
offsets. |
| `onnxruntime/core/providers/cuda/tensor/pad_impl.cu` | Adds CUDA wrap
mode using a `WrapCoordinate` device helper with `if constexpr`
compile-time specialization. Removes dead wrap code from the
NCHW-specialized kernel path. |
| `onnxruntime/core/providers/cuda/tensor/pad.cc` | Computes effective
sliced input extents/offsets for wrap behavior with negative pads.
Bypasses the NCHW fast-path for wrap mode and routes through the generic
implementation. |

#### Documentation

| File | Change |
|------|--------|
| `docs/OperatorKernels.md` | Updates the CUDA Pad kernel opset coverage
to reflect the new version splits (25+, 24, 23, [21,22], [19,20], 18) up
to opset 25. |

#### Test coverage

| File | Change |
|------|--------|
| `onnxruntime/test/providers/cpu/tensor/pad_test.cc` | Adds CUDA-only
Pad coverage for `edge` across opsets 18-25 and `wrap` across opsets
19-25. Updates existing wrap test comment. |

### Checklist

- [x] Tests added/updated
- [x] No breaking changes

<!-- START COPILOT CODING AGENT TIPS -->
---

✨ Let Copilot coding agent [set things up for
you](https://github.com/microsoft/onnxruntime/issues/new?title=✨+Set+up+Copilot+instructions&body=Configure%20instructions%20for%20this%20repository%20as%20documented%20in%20%5BBest%20practices%20for%20Copilot%20coding%20agent%20in%20your%20repository%5D%28https://gh.io/copilot-coding-agent-tips%29%2E%0A%0A%3COnboard%20this%20repo%3E&assignees=copilot)
— coding agent works faster and does higher quality work when set up for
your repo.

---------

Co-authored-by: Shirasawa <764798966@qq.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
…ft#27800)

### Description

QNN python wheel pipeline for Linux didn't take into account the QNN
version that the user can override when running the pipeline and always
used whatever was the default in the pipeline yamls. Allow for
overriding QNN version in onnxruntime-qnn Linux python wheel pipelines.
…crosoft#27812)

The kernel computes neg_scaled_zp = -(scale * zp) in fp16 first
(intermediate rounding), then uses it in the fma. For scale*zp in range
[128, 256), the fp16 ULP is 0.125, so this intermediate rounding error
(~0.06) propagates to the result and exceeded the test tolerance.

The reference was computing everything in fp32 and converting to fp16
only at the end, avoiding this intermediate rounding. This caused
mismatches up to 0.15 (29 fp16 ULPs).

Fix: emulate the kernel's fp16 computation order in the reference:
1. neg_szp = MLAS_FP16(-(scale * zp)).ToFloat()  // fp16 round-trip
2. result = MLAS_FP16(neg_szp + value * scale)    // emulates fma
… quantization (microsoft#27769)

Extends the QDQ `DQMatMulToMatMulNBits` fusion to handle additional
quantization patterns beyond the existing blockwise DQ→MatMul case.

### New support
- **Gemm**: Fuses DQ→Gemm (with optional bias, including DQ bias) into
MatMulNBits, stripping Gemm-specific attributes (`alpha`, `beta`,
`transB`).
- **Per-tensor & per-channel quantization**: Expands scalar/1D scales
and zero-points into block-quantized format expected by MatMulNBits.
Block size is configurable via `session.qdq_matmulnbits_block_size`
(default: 32).

### Changes
- **Selectors** (qdq_selectors.cc): Replaced
`ValidateBlockwiseDQForMatMulNBits` with `ValidateDQForMatMulNBits`
supporting all three quantization modes. Added Gemm-specific validation.
- **Actions** (qdq_actions.cc): Added scale/zp expansion for
non-blockwise cases, Gemm attribute cleanup, and bias wiring to
MatMulNBits input 5.
- **Registration** (qdq_selector_action_transformer.cc): Registered
`Gemm` alongside `MatMul`; threaded `qdq_matmulnbits_block_size` from
session config.
- **Tests** (qdq_matmulnbits_transformer_test.cc): Added tests for
per-tensor, per-channel, Gemm (no bias, constant bias, DQ bias), block
size options, and negative cases.
…es/vite-default (microsoft#27463)

Bumps [rollup](https://github.com/rollup/rollup) from 4.35.0 to 4.59.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/rollup/rollup/releases">rollup's
releases</a>.</em></p>
<blockquote>
<h2>v4.59.0</h2>
<h2>4.59.0</h2>
<p><em>2026-02-22</em></p>
<h3>Features</h3>
<ul>
<li>Throw when the generated bundle contains paths that would leave the
output directory (<a
href="https://redirect.github.com/rollup/rollup/issues/6276">#6276</a>)</li>
</ul>
<h3>Pull Requests</h3>
<ul>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6275">#6275</a>:
Validate bundle stays within output dir (<a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
</ul>
<h2>v4.58.0</h2>
<h2>4.58.0</h2>
<p><em>2026-02-20</em></p>
<h3>Features</h3>
<ul>
<li>Also support <code>__NO_SIDE_EFFECTS__</code> annotation before
variable declarations declaring function expressions (<a
href="https://redirect.github.com/rollup/rollup/issues/6272">#6272</a>)</li>
</ul>
<h3>Pull Requests</h3>
<ul>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6256">#6256</a>:
docs: document PreRenderedChunk properties including isDynamicEntry and
isImplicitEntry (<a
href="https://github.com/njg7194"><code>@​njg7194</code></a>, <a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6259">#6259</a>:
docs: Correct typo and improve sentence structure in docs for
<code>output.experimentalMinChunkSize</code> (<a
href="https://github.com/millerick"><code>@​millerick</code></a>, <a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6260">#6260</a>:
fix(deps): update rust crate swc_compiler_base to v47 (<a
href="https://github.com/renovate"><code>@​renovate</code></a>[bot], <a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6261">#6261</a>:
fix(deps): lock file maintenance minor/patch updates (<a
href="https://github.com/renovate"><code>@​renovate</code></a>[bot], <a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6262">#6262</a>:
Avoid unnecessary cloning of the code string (<a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6263">#6263</a>:
fix(deps): update minor/patch updates (<a
href="https://github.com/renovate"><code>@​renovate</code></a>[bot], <a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6265">#6265</a>:
chore(deps): lock file maintenance (<a
href="https://github.com/renovate"><code>@​renovate</code></a>[bot])</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6267">#6267</a>:
fix(deps): update minor/patch updates (<a
href="https://github.com/renovate"><code>@​renovate</code></a>[bot])</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6268">#6268</a>:
chore(deps): update dependency eslint-plugin-unicorn to v63 (<a
href="https://github.com/renovate"><code>@​renovate</code></a>[bot], <a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6269">#6269</a>:
chore(deps): update dependency lru-cache to v11 (<a
href="https://github.com/renovate"><code>@​renovate</code></a>[bot])</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6270">#6270</a>:
chore(deps): lock file maintenance (<a
href="https://github.com/renovate"><code>@​renovate</code></a>[bot])</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6272">#6272</a>:
forward NO_SIDE_EFFECTS annotations to function expressions in variable
declarations (<a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
</ul>
<h2>v4.57.1</h2>
<h2>4.57.1</h2>
<p><em>2026-01-30</em></p>
<h3>Bug Fixes</h3>
<ul>
<li>Fix heap corruption issue in Windows (<a
href="https://redirect.github.com/rollup/rollup/issues/6251">#6251</a>)</li>
<li>Ensure exports of a dynamic import are fully included when called
from a try...catch (<a
href="https://redirect.github.com/rollup/rollup/issues/6254">#6254</a>)</li>
</ul>
<h3>Pull Requests</h3>
<ul>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6251">#6251</a>:
fix: Isolate and cache <code>process.report.getReport()</code> calls in
a child process for robust environment detection (<a
href="https://github.com/alan-agius4"><code>@​alan-agius4</code></a>, <a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/rollup/rollup/blob/master/CHANGELOG.md">rollup's
changelog</a>.</em></p>
<blockquote>
<h2>4.59.0</h2>
<p><em>2026-02-22</em></p>
<h3>Features</h3>
<ul>
<li>Throw when the generated bundle contains paths that would leave the
output directory (<a
href="https://redirect.github.com/rollup/rollup/issues/6276">#6276</a>)</li>
</ul>
<h3>Pull Requests</h3>
<ul>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6275">#6275</a>:
Validate bundle stays within output dir (<a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
</ul>
<h2>4.58.0</h2>
<p><em>2026-02-20</em></p>
<h3>Features</h3>
<ul>
<li>Also support <code>__NO_SIDE_EFFECTS__</code> annotation before
variable declarations declaring function expressions (<a
href="https://redirect.github.com/rollup/rollup/issues/6272">#6272</a>)</li>
</ul>
<h3>Pull Requests</h3>
<ul>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6256">#6256</a>:
docs: document PreRenderedChunk properties including isDynamicEntry and
isImplicitEntry (<a
href="https://github.com/njg7194"><code>@​njg7194</code></a>, <a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6259">#6259</a>:
docs: Correct typo and improve sentence structure in docs for
<code>output.experimentalMinChunkSize</code> (<a
href="https://github.com/millerick"><code>@​millerick</code></a>, <a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6260">#6260</a>:
fix(deps): update rust crate swc_compiler_base to v47 (<a
href="https://github.com/renovate"><code>@​renovate</code></a>[bot], <a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6261">#6261</a>:
fix(deps): lock file maintenance minor/patch updates (<a
href="https://github.com/renovate"><code>@​renovate</code></a>[bot], <a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6262">#6262</a>:
Avoid unnecessary cloning of the code string (<a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6263">#6263</a>:
fix(deps): update minor/patch updates (<a
href="https://github.com/renovate"><code>@​renovate</code></a>[bot], <a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6265">#6265</a>:
chore(deps): lock file maintenance (<a
href="https://github.com/renovate"><code>@​renovate</code></a>[bot])</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6267">#6267</a>:
fix(deps): update minor/patch updates (<a
href="https://github.com/renovate"><code>@​renovate</code></a>[bot])</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6268">#6268</a>:
chore(deps): update dependency eslint-plugin-unicorn to v63 (<a
href="https://github.com/renovate"><code>@​renovate</code></a>[bot], <a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6269">#6269</a>:
chore(deps): update dependency lru-cache to v11 (<a
href="https://github.com/renovate"><code>@​renovate</code></a>[bot])</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6270">#6270</a>:
chore(deps): lock file maintenance (<a
href="https://github.com/renovate"><code>@​renovate</code></a>[bot])</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6272">#6272</a>:
forward NO_SIDE_EFFECTS annotations to function expressions in variable
declarations (<a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
</ul>
<h2>4.57.1</h2>
<p><em>2026-01-30</em></p>
<h3>Bug Fixes</h3>
<ul>
<li>Fix heap corruption issue in Windows (<a
href="https://redirect.github.com/rollup/rollup/issues/6251">#6251</a>)</li>
<li>Ensure exports of a dynamic import are fully included when called
from a try...catch (<a
href="https://redirect.github.com/rollup/rollup/issues/6254">#6254</a>)</li>
</ul>
<h3>Pull Requests</h3>
<ul>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6251">#6251</a>:
fix: Isolate and cache <code>process.report.getReport()</code> calls in
a child process for robust environment detection (<a
href="https://github.com/alan-agius4"><code>@​alan-agius4</code></a>, <a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6252">#6252</a>:
chore(deps): update dependency lru-cache to v11 (<a
href="https://github.com/renovate"><code>@​renovate</code></a>[bot])</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6253">#6253</a>:
chore(deps): lock file maintenance minor/patch updates (<a
href="https://github.com/renovate"><code>@​renovate</code></a>[bot], <a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
<li><a
href="https://redirect.github.com/rollup/rollup/pull/6254">#6254</a>:
Fully include dynamic imports in a try-catch (<a
href="https://github.com/lukastaegert"><code>@​lukastaegert</code></a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/rollup/rollup/commit/ae846957f109690a866cc3e4c073613c338d3476"><code>ae84695</code></a>
4.59.0</li>
<li><a
href="https://github.com/rollup/rollup/commit/b39616e9175b3d9fc3977c99153174c490805a93"><code>b39616e</code></a>
Update audit-resolve</li>
<li><a
href="https://github.com/rollup/rollup/commit/c60770d7aaf750e512c1b2774989ea4596e660b2"><code>c60770d</code></a>
Validate bundle stays within output dir (<a
href="https://redirect.github.com/rollup/rollup/issues/6275">#6275</a>)</li>
<li><a
href="https://github.com/rollup/rollup/commit/33f39c1f205ea2eadaf4b589e493453e2baa3662"><code>33f39c1</code></a>
4.58.0</li>
<li><a
href="https://github.com/rollup/rollup/commit/b61c40803b717854c1c28937e8098e5ad3c7b8ca"><code>b61c408</code></a>
forward NO_SIDE_EFFECTS annotations to function expressions in variable
decla...</li>
<li><a
href="https://github.com/rollup/rollup/commit/7f00689ec90e2cafb11c26eefbcac62343c936f6"><code>7f00689</code></a>
Extend agent instructions</li>
<li><a
href="https://github.com/rollup/rollup/commit/e7b2b85af0901244ecc141b9d792c6db6b527ea4"><code>e7b2b85</code></a>
chore(deps): lock file maintenance (<a
href="https://redirect.github.com/rollup/rollup/issues/6270">#6270</a>)</li>
<li><a
href="https://github.com/rollup/rollup/commit/2aa5da9baf82211b8207d268c8751630cb766970"><code>2aa5da9</code></a>
fix(deps): update minor/patch updates (<a
href="https://redirect.github.com/rollup/rollup/issues/6267">#6267</a>)</li>
<li><a
href="https://github.com/rollup/rollup/commit/4319837c5448d0c10d89e9ded118888deec2eeec"><code>4319837</code></a>
chore(deps): update dependency lru-cache to v11 (<a
href="https://redirect.github.com/rollup/rollup/issues/6269">#6269</a>)</li>
<li><a
href="https://github.com/rollup/rollup/commit/c3b6b4bdc4f2ed978fa233132a526957e6513233"><code>c3b6b4b</code></a>
chore(deps): update dependency eslint-plugin-unicorn to v63 (<a
href="https://redirect.github.com/rollup/rollup/issues/6268">#6268</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/rollup/rollup/compare/v4.35.0...v4.59.0">compare
view</a></li>
</ul>
</details>
<details>
<summary>Maintainer changes</summary>
<p>This version was pushed to npm by [GitHub Actions](<a
href="https://www.npmjs.com/~GitHub">https://www.npmjs.com/~GitHub</a>
Actions), a new releaser for rollup since your current version.</p>
</details>
<details>
<summary>Install script changes</summary>
<p>This version modifies <code>prepare</code> script that runs during
installation. Review the package contents before updating.</p>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=rollup&package-manager=npm_and_yarn&previous-version=4.35.0&new-version=4.59.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…7785)

Bumps [flatted](https://github.com/WebReflection/flatted) from 3.3.3 to
3.4.2.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/WebReflection/flatted/commit/3bf09091c3562e17a0647bc06710dd6097079cf7"><code>3bf0909</code></a>
3.4.2</li>
<li><a
href="https://github.com/WebReflection/flatted/commit/885ddcc33cf9657caf38c57c7be45ae1c5272802"><code>885ddcc</code></a>
fix CWE-1321</li>
<li><a
href="https://github.com/WebReflection/flatted/commit/0bdba705d130f00892b1b8fcc80cf4cdea0631e3"><code>0bdba70</code></a>
added flatted-view to the benchmark</li>
<li><a
href="https://github.com/WebReflection/flatted/commit/2a02dce7c641dec31194c67663f9b0b12e62da20"><code>2a02dce</code></a>
3.4.1</li>
<li><a
href="https://github.com/WebReflection/flatted/commit/fba4e8f2e113665da275b19cd0f695f3d98e9416"><code>fba4e8f</code></a>
Merge pull request <a
href="https://redirect.github.com/WebReflection/flatted/issues/89">#89</a>
from WebReflection/python-fix</li>
<li><a
href="https://github.com/WebReflection/flatted/commit/5fe86485e6df7f7f34a07a2a85498bd3e17384e7"><code>5fe8648</code></a>
added &quot;when in Rome&quot; also a test for PHP</li>
<li><a
href="https://github.com/WebReflection/flatted/commit/53517adbefe724fe472b2f9ebcdb01910d0ae3f0"><code>53517ad</code></a>
some minor improvement</li>
<li><a
href="https://github.com/WebReflection/flatted/commit/b3e2a0c387bf446435fec45ad7f05299f012346f"><code>b3e2a0c</code></a>
Fixing recursion issue in Python too</li>
<li><a
href="https://github.com/WebReflection/flatted/commit/c4b46dbcbf782326e54ea1b65d3ebb1dc7a23fad"><code>c4b46db</code></a>
Add SECURITY.md for security policy and reporting</li>
<li><a
href="https://github.com/WebReflection/flatted/commit/f86d071e0f70de5a7d8200198824a3f07fc9c988"><code>f86d071</code></a>
Create dependabot.yml for version updates</li>
<li>Additional commits viewable in <a
href="https://github.com/WebReflection/flatted/compare/v3.3.3...v3.4.2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=flatted&package-manager=npm_and_yarn&previous-version=3.3.3&new-version=3.4.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [flatted](https://github.com/WebReflection/flatted) from 3.3.3 to
3.4.2.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/WebReflection/flatted/commit/3bf09091c3562e17a0647bc06710dd6097079cf7"><code>3bf0909</code></a>
3.4.2</li>
<li><a
href="https://github.com/WebReflection/flatted/commit/885ddcc33cf9657caf38c57c7be45ae1c5272802"><code>885ddcc</code></a>
fix CWE-1321</li>
<li><a
href="https://github.com/WebReflection/flatted/commit/0bdba705d130f00892b1b8fcc80cf4cdea0631e3"><code>0bdba70</code></a>
added flatted-view to the benchmark</li>
<li><a
href="https://github.com/WebReflection/flatted/commit/2a02dce7c641dec31194c67663f9b0b12e62da20"><code>2a02dce</code></a>
3.4.1</li>
<li><a
href="https://github.com/WebReflection/flatted/commit/fba4e8f2e113665da275b19cd0f695f3d98e9416"><code>fba4e8f</code></a>
Merge pull request <a
href="https://redirect.github.com/WebReflection/flatted/issues/89">#89</a>
from WebReflection/python-fix</li>
<li><a
href="https://github.com/WebReflection/flatted/commit/5fe86485e6df7f7f34a07a2a85498bd3e17384e7"><code>5fe8648</code></a>
added &quot;when in Rome&quot; also a test for PHP</li>
<li><a
href="https://github.com/WebReflection/flatted/commit/53517adbefe724fe472b2f9ebcdb01910d0ae3f0"><code>53517ad</code></a>
some minor improvement</li>
<li><a
href="https://github.com/WebReflection/flatted/commit/b3e2a0c387bf446435fec45ad7f05299f012346f"><code>b3e2a0c</code></a>
Fixing recursion issue in Python too</li>
<li><a
href="https://github.com/WebReflection/flatted/commit/c4b46dbcbf782326e54ea1b65d3ebb1dc7a23fad"><code>c4b46db</code></a>
Add SECURITY.md for security policy and reporting</li>
<li><a
href="https://github.com/WebReflection/flatted/commit/f86d071e0f70de5a7d8200198824a3f07fc9c988"><code>f86d071</code></a>
Create dependabot.yml for version updates</li>
<li>Additional commits viewable in <a
href="https://github.com/WebReflection/flatted/compare/v3.3.3...v3.4.2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=flatted&package-manager=npm_and_yarn&previous-version=3.3.3&new-version=3.4.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Description
Add fused Silu and Exact Gelu (Erf based) for AVX512f

Silu benchmarks:
<img width="225" height="442" alt="image"
src="https://github.com/user-attachments/assets/42ce53a2-10cb-496f-b3d9-23b9ebf26be3"
/>

GELU exact (Erf) benchmarks:
<img width="218" height="431" alt="image"
src="https://github.com/user-attachments/assets/c68b260a-c209-437e-819b-fb200212ee54"
/>

### Motivation and Context
Improve performance on AVX512F


Silu shows small regression at B=1 but I don't think the absolute
difference is much

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
### Description

This PR makes it possible to build WebGPU EP as an EP API based plugin
EP.

#### Requirements

The goal of this PR is to support both building WebGPU EP as a bundled
EP and an EP API based plugin EP. This approach allows:
- enabling WebGPU EP as a standalone plugin EP package for WCR usage
- graceful transition for WebGPU EP as an native EP for language
binding, from the bundled EP to an EP API based plugin EP
- keep the existing usage (static library) working (majorly for web)




#### Design & Implementation

Instead of **changing** WebGPU EP from a bundled EP to an EP API based
plugin EP in one shot, this PR **extend** WebGPU EP to support building
as plugin EP.

- add a new folder `include/onnxruntime/ep` with a bunches of header
files. Those files are not WebGPU specific. They are used for:
  - include common defines/functions/macros for plugin EP to use
- include a few "adapter" classes that takes C-API objects to simulate
ORT internal classes behaviors
- include a few "override" classes that simulate ORT internal classes,
but using implementations that only depend on C-API
  - include a special base class `onnxruntime::ep::Ep` to inherit from
  
These header files allow a compile time "switch" to the different set of
types to minimize changes to existing code. Specifically, `pch.h` is
required to be included as PCH to make sure the "override" to take place
correctly.

- add a new folder `onnxruntime/core/providers/webgpu/ep` for EP API
implementation, specifically:
  - `api.cc`: implements `CreateEpFactories` and `ReleaseEpFactory`
  - `ep.cc` `ep.h`: implement class `onnxruntime::webgpu::ep::Ep`
- `factory.cc` `factory.h`: implement class
`onnxruntime::webgpu::ep::Factory`

#### Dependencies and Prerequisites

(unmerged changes are included as a part of current PR)

- microsoft#26855
- microsoft#26803
- microsoft#26859
- microsoft#26879
- microsoft#26919
- microsoft#27569
- microsoft#27587

#### Missing Parts

- Allow setting Global/Default EP options

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
### Description
<!-- Describe your changes. -->

Fixes WebGPU Einsum op by replacing the manual
`uniforms.inputN_shape[idx]` string with `GetElementAt(...)` which
correctly handles uniform shape access for all tensor ranks.

I also added a bunch of tests for this...

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Closes microsoft#27762
…eation (microsoft#27634)

## Description

We had a weird behavior in Transformers.js V4. After calling
`InferenceSession.release()` on a WebGPU session, attempting to create a
new WebGPU session fails with:
```
WebGPU device lost (2): Device was destroyed.
```
In Transformers.js we encourage the use of the `create -> release ->
create` pattern, because we expect the application to run for some time
and might use multiple models. So it makes sense to unload models after
the job is done.

It seems like this was introduced in
[e03631e](microsoft@e03631ee528),
which added the `preserveDevice` option with a default value of `false`.
When the last session is released and `preserveDevice=false`, the C++
side destroys the WebGPU device, but the JavaScript reference in
`env.webgpu.device` is never cleared, leaving a stale reference to a
destroyed device.

## Changes

**Clear stale device reference when lost** (`backend-webgpu.ts`)

1. Made device property `configurable: true` to allow deletion
2. Added cleanup logic in `dispose()` to detect device loss via
`device.lost` promise
3. When device is lost (destroyed, driver crash, etc.), delete the stale
`env.webgpu.device` reference

This allows subsequent session creation to acquire a fresh device
instead of attempting to reuse a lost one.
…c transformer suite (microsoft#27821)

### Description
As title



### Motivation and Context
Tiny continuation to microsoft#27691

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.