Sync with Microsoft ONNX Runtime - 25032026#989
Open
ai-fw-intg wants to merge 18 commits intoovep-developfrom
Open
Sync with Microsoft ONNX Runtime - 25032026#989ai-fw-intg wants to merge 18 commits intoovep-developfrom
ai-fw-intg wants to merge 18 commits intoovep-developfrom
Conversation
### Description <!-- Describe your changes. --> This change adds support for the Deformable Convolution 2D operator (DeformConv2D) to ONNX Runtime. The branch implements the operator schema and registration, provides kernel implementations (CPU and GPU/CUDA where available), implements shape inference, and adds unit and integration tests to validate correctness and numerical parity with reference implementations. The changes include performance-oriented optimizations and necessary changes to build/test scripts. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Deformable convolutions are widely used in vision models that require spatial sampling flexibility (e.g., Deformable ConvNets, some detection/segmentation models). Native support in ONNX Runtime enables these models to run efficiently without custom operators or external runtimes, broadening the set of compatible models and improving performance and portability. ### See also - https://onnx.ai/onnx/operators/onnx__DeformConv.html - https://docs.pytorch.org/vision/main/generated/torchvision.ops.deform_conv2d.html - https://arxiv.org/abs/1811.11168 - https://arxiv.org/abs/1703.06211 - https://github.com/pytorch/vision/blob/0f6d91d9fe514e6de2f5519114cbeb389d498b2d/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu - https://github.com/open-mmlab/mmdetection/blob/master/mmdet/ops/dcn/src/deform_conv_cuda.cpp - https://github.com/pytorch/vision/blob/0f6d91d9fe514e6de2f5519114cbeb389d498b2d/torchvision/csrc/ops/cpu/deform_conv2d_kernel.cpp - https://github.com/open-mmlab/mmdetection/blob/master/mmdet/ops/dcn/src/deform_conv_cuda.cpp - microsoft#22060 - microsoft#15572 - microsoft#20810 - microsoft#16903 - onnx/onnx#5451 - ZhengPeng7/BiRefNet#167 - pytorch/pytorch#68910 - pytorch/vision#2066
…de (microsoft#27724) ### Description On Windows, `std::filesystem::path::string()` converts the internal UTF-16 representation to a narrow string using the system's active ANSI code page. When the path contains characters outside that code page (Japanese, Chinese, Korean, etc.), this throws std::system_error with 'No mapping for the Unicode character exists in the target multi-byte code page.' This affected both the core session telemetry logging (causing `InferenceSession::Initialize()` to fail) and execution provider code (OpenVINO, TensorRT, TensorRT RTX, QNN, MIGraphX) where model paths are converted for EPContext attributes and profiling. ### Motivation and Context Fix: Replace `.filename().string()` with `PathToUTF8String(.filename().native())` which uses `WideCharToMultiByte(CP_UTF8, ...)` and handles all Unicode characters correctly. This pattern is already used elsewhere in the codebase for path-to-string conversions. Note: Two remaining instances in Linux-only code (`cann_utils.cc`, `device_discovery.cc`) are left as-is since `.string()` is safe on Linux where paths are already narrow strings. Fixes microsoft/WindowsAppSDK#6173 --------- Co-authored-by: Sagar Bhure <sagarbhure@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Support RoiAlign for opset versions 16 and 22 --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…27774) ### Description This PR consolidates PRs microsoft#27416 and microsoft#27708 to extend CUDA Pad kernel support through opset 25, including wrap mode implementation. ### Motivation and Context The CUDA execution provider previously only registered the Pad kernel up to opset 18 and did not implement wrap mode. When an ONNX model exported with opset 19+ was run on the CUDA executor, the Pad operation was forced to fall back to CPU, resulting in significant performance degradation. This PR aligns CUDA Pad registration with the ONNX Pad schema evolution through opset 25 and provides a correct wrap mode implementation. Related issues: microsoft#26393 Related PRs: microsoft#27416, microsoft#27708 ### Summary of Changes #### Kernel registration and opset coverage | File | Change | |------|--------| | `onnxruntime/core/providers/cuda/tensor/pad.cc` | Adds CUDA Pad kernel registrations for opset ranges 18, 19-20, 21-22, 23, 24, and 25. | | `onnxruntime/core/providers/cuda/cuda_execution_provider.cc` | Registers the new Pad kernel versions in the CUDA EP registry under the existing per-opset sections. | #### CUDA Pad implementation | File | Change | |------|--------| | `onnxruntime/core/providers/cuda/tensor/pad_impl.h` | Extends the Pad kernel interface to pass effective sliced extents and per-axis input offsets. | | `onnxruntime/core/providers/cuda/tensor/pad_impl.cu` | Adds CUDA wrap mode using a `WrapCoordinate` device helper with `if constexpr` compile-time specialization. Removes dead wrap code from the NCHW-specialized kernel path. | | `onnxruntime/core/providers/cuda/tensor/pad.cc` | Computes effective sliced input extents/offsets for wrap behavior with negative pads. Bypasses the NCHW fast-path for wrap mode and routes through the generic implementation. | #### Documentation | File | Change | |------|--------| | `docs/OperatorKernels.md` | Updates the CUDA Pad kernel opset coverage to reflect the new version splits (25+, 24, 23, [21,22], [19,20], 18) up to opset 25. | #### Test coverage | File | Change | |------|--------| | `onnxruntime/test/providers/cpu/tensor/pad_test.cc` | Adds CUDA-only Pad coverage for `edge` across opsets 18-25 and `wrap` across opsets 19-25. Updates existing wrap test comment. | ### Checklist - [x] Tests added/updated - [x] No breaking changes <!-- START COPILOT CODING AGENT TIPS --> --- ✨ Let Copilot coding agent [set things up for you](https://github.com/microsoft/onnxruntime/issues/new?title=✨+Set+up+Copilot+instructions&body=Configure%20instructions%20for%20this%20repository%20as%20documented%20in%20%5BBest%20practices%20for%20Copilot%20coding%20agent%20in%20your%20repository%5D%28https://gh.io/copilot-coding-agent-tips%29%2E%0A%0A%3COnboard%20this%20repo%3E&assignees=copilot) — coding agent works faster and does higher quality work when set up for your repo. --------- Co-authored-by: Shirasawa <764798966@qq.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com> Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
…ft#27800) ### Description QNN python wheel pipeline for Linux didn't take into account the QNN version that the user can override when running the pipeline and always used whatever was the default in the pipeline yamls. Allow for overriding QNN version in onnxruntime-qnn Linux python wheel pipelines.
…crosoft#27812) The kernel computes neg_scaled_zp = -(scale * zp) in fp16 first (intermediate rounding), then uses it in the fma. For scale*zp in range [128, 256), the fp16 ULP is 0.125, so this intermediate rounding error (~0.06) propagates to the result and exceeded the test tolerance. The reference was computing everything in fp32 and converting to fp16 only at the end, avoiding this intermediate rounding. This caused mismatches up to 0.15 (29 fp16 ULPs). Fix: emulate the kernel's fp16 computation order in the reference: 1. neg_szp = MLAS_FP16(-(scale * zp)).ToFloat() // fp16 round-trip 2. result = MLAS_FP16(neg_szp + value * scale) // emulates fma
…sor default op… (microsoft#27773) fixes a few webnn test cases for webgpu ep. This https://wpt.live/webnn/conformance_tests/transpose.https.any.html?gpu should pass 100% now
… quantization (microsoft#27769) Extends the QDQ `DQMatMulToMatMulNBits` fusion to handle additional quantization patterns beyond the existing blockwise DQ→MatMul case. ### New support - **Gemm**: Fuses DQ→Gemm (with optional bias, including DQ bias) into MatMulNBits, stripping Gemm-specific attributes (`alpha`, `beta`, `transB`). - **Per-tensor & per-channel quantization**: Expands scalar/1D scales and zero-points into block-quantized format expected by MatMulNBits. Block size is configurable via `session.qdq_matmulnbits_block_size` (default: 32). ### Changes - **Selectors** (qdq_selectors.cc): Replaced `ValidateBlockwiseDQForMatMulNBits` with `ValidateDQForMatMulNBits` supporting all three quantization modes. Added Gemm-specific validation. - **Actions** (qdq_actions.cc): Added scale/zp expansion for non-blockwise cases, Gemm attribute cleanup, and bias wiring to MatMulNBits input 5. - **Registration** (qdq_selector_action_transformer.cc): Registered `Gemm` alongside `MatMul`; threaded `qdq_matmulnbits_block_size` from session config. - **Tests** (qdq_matmulnbits_transformer_test.cc): Added tests for per-tensor, per-channel, Gemm (no bias, constant bias, DQ bias), block size options, and negative cases.
…es/vite-default (microsoft#27463) Bumps [rollup](https://github.com/rollup/rollup) from 4.35.0 to 4.59.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/rollup/rollup/releases">rollup's releases</a>.</em></p> <blockquote> <h2>v4.59.0</h2> <h2>4.59.0</h2> <p><em>2026-02-22</em></p> <h3>Features</h3> <ul> <li>Throw when the generated bundle contains paths that would leave the output directory (<a href="https://redirect.github.com/rollup/rollup/issues/6276">#6276</a>)</li> </ul> <h3>Pull Requests</h3> <ul> <li><a href="https://redirect.github.com/rollup/rollup/pull/6275">#6275</a>: Validate bundle stays within output dir (<a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> </ul> <h2>v4.58.0</h2> <h2>4.58.0</h2> <p><em>2026-02-20</em></p> <h3>Features</h3> <ul> <li>Also support <code>__NO_SIDE_EFFECTS__</code> annotation before variable declarations declaring function expressions (<a href="https://redirect.github.com/rollup/rollup/issues/6272">#6272</a>)</li> </ul> <h3>Pull Requests</h3> <ul> <li><a href="https://redirect.github.com/rollup/rollup/pull/6256">#6256</a>: docs: document PreRenderedChunk properties including isDynamicEntry and isImplicitEntry (<a href="https://github.com/njg7194"><code>@njg7194</code></a>, <a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6259">#6259</a>: docs: Correct typo and improve sentence structure in docs for <code>output.experimentalMinChunkSize</code> (<a href="https://github.com/millerick"><code>@millerick</code></a>, <a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6260">#6260</a>: fix(deps): update rust crate swc_compiler_base to v47 (<a href="https://github.com/renovate"><code>@renovate</code></a>[bot], <a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6261">#6261</a>: fix(deps): lock file maintenance minor/patch updates (<a href="https://github.com/renovate"><code>@renovate</code></a>[bot], <a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6262">#6262</a>: Avoid unnecessary cloning of the code string (<a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6263">#6263</a>: fix(deps): update minor/patch updates (<a href="https://github.com/renovate"><code>@renovate</code></a>[bot], <a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6265">#6265</a>: chore(deps): lock file maintenance (<a href="https://github.com/renovate"><code>@renovate</code></a>[bot])</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6267">#6267</a>: fix(deps): update minor/patch updates (<a href="https://github.com/renovate"><code>@renovate</code></a>[bot])</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6268">#6268</a>: chore(deps): update dependency eslint-plugin-unicorn to v63 (<a href="https://github.com/renovate"><code>@renovate</code></a>[bot], <a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6269">#6269</a>: chore(deps): update dependency lru-cache to v11 (<a href="https://github.com/renovate"><code>@renovate</code></a>[bot])</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6270">#6270</a>: chore(deps): lock file maintenance (<a href="https://github.com/renovate"><code>@renovate</code></a>[bot])</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6272">#6272</a>: forward NO_SIDE_EFFECTS annotations to function expressions in variable declarations (<a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> </ul> <h2>v4.57.1</h2> <h2>4.57.1</h2> <p><em>2026-01-30</em></p> <h3>Bug Fixes</h3> <ul> <li>Fix heap corruption issue in Windows (<a href="https://redirect.github.com/rollup/rollup/issues/6251">#6251</a>)</li> <li>Ensure exports of a dynamic import are fully included when called from a try...catch (<a href="https://redirect.github.com/rollup/rollup/issues/6254">#6254</a>)</li> </ul> <h3>Pull Requests</h3> <ul> <li><a href="https://redirect.github.com/rollup/rollup/pull/6251">#6251</a>: fix: Isolate and cache <code>process.report.getReport()</code> calls in a child process for robust environment detection (<a href="https://github.com/alan-agius4"><code>@alan-agius4</code></a>, <a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/rollup/rollup/blob/master/CHANGELOG.md">rollup's changelog</a>.</em></p> <blockquote> <h2>4.59.0</h2> <p><em>2026-02-22</em></p> <h3>Features</h3> <ul> <li>Throw when the generated bundle contains paths that would leave the output directory (<a href="https://redirect.github.com/rollup/rollup/issues/6276">#6276</a>)</li> </ul> <h3>Pull Requests</h3> <ul> <li><a href="https://redirect.github.com/rollup/rollup/pull/6275">#6275</a>: Validate bundle stays within output dir (<a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> </ul> <h2>4.58.0</h2> <p><em>2026-02-20</em></p> <h3>Features</h3> <ul> <li>Also support <code>__NO_SIDE_EFFECTS__</code> annotation before variable declarations declaring function expressions (<a href="https://redirect.github.com/rollup/rollup/issues/6272">#6272</a>)</li> </ul> <h3>Pull Requests</h3> <ul> <li><a href="https://redirect.github.com/rollup/rollup/pull/6256">#6256</a>: docs: document PreRenderedChunk properties including isDynamicEntry and isImplicitEntry (<a href="https://github.com/njg7194"><code>@njg7194</code></a>, <a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6259">#6259</a>: docs: Correct typo and improve sentence structure in docs for <code>output.experimentalMinChunkSize</code> (<a href="https://github.com/millerick"><code>@millerick</code></a>, <a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6260">#6260</a>: fix(deps): update rust crate swc_compiler_base to v47 (<a href="https://github.com/renovate"><code>@renovate</code></a>[bot], <a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6261">#6261</a>: fix(deps): lock file maintenance minor/patch updates (<a href="https://github.com/renovate"><code>@renovate</code></a>[bot], <a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6262">#6262</a>: Avoid unnecessary cloning of the code string (<a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6263">#6263</a>: fix(deps): update minor/patch updates (<a href="https://github.com/renovate"><code>@renovate</code></a>[bot], <a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6265">#6265</a>: chore(deps): lock file maintenance (<a href="https://github.com/renovate"><code>@renovate</code></a>[bot])</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6267">#6267</a>: fix(deps): update minor/patch updates (<a href="https://github.com/renovate"><code>@renovate</code></a>[bot])</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6268">#6268</a>: chore(deps): update dependency eslint-plugin-unicorn to v63 (<a href="https://github.com/renovate"><code>@renovate</code></a>[bot], <a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6269">#6269</a>: chore(deps): update dependency lru-cache to v11 (<a href="https://github.com/renovate"><code>@renovate</code></a>[bot])</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6270">#6270</a>: chore(deps): lock file maintenance (<a href="https://github.com/renovate"><code>@renovate</code></a>[bot])</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6272">#6272</a>: forward NO_SIDE_EFFECTS annotations to function expressions in variable declarations (<a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> </ul> <h2>4.57.1</h2> <p><em>2026-01-30</em></p> <h3>Bug Fixes</h3> <ul> <li>Fix heap corruption issue in Windows (<a href="https://redirect.github.com/rollup/rollup/issues/6251">#6251</a>)</li> <li>Ensure exports of a dynamic import are fully included when called from a try...catch (<a href="https://redirect.github.com/rollup/rollup/issues/6254">#6254</a>)</li> </ul> <h3>Pull Requests</h3> <ul> <li><a href="https://redirect.github.com/rollup/rollup/pull/6251">#6251</a>: fix: Isolate and cache <code>process.report.getReport()</code> calls in a child process for robust environment detection (<a href="https://github.com/alan-agius4"><code>@alan-agius4</code></a>, <a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6252">#6252</a>: chore(deps): update dependency lru-cache to v11 (<a href="https://github.com/renovate"><code>@renovate</code></a>[bot])</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6253">#6253</a>: chore(deps): lock file maintenance minor/patch updates (<a href="https://github.com/renovate"><code>@renovate</code></a>[bot], <a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> <li><a href="https://redirect.github.com/rollup/rollup/pull/6254">#6254</a>: Fully include dynamic imports in a try-catch (<a href="https://github.com/lukastaegert"><code>@lukastaegert</code></a>)</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/rollup/rollup/commit/ae846957f109690a866cc3e4c073613c338d3476"><code>ae84695</code></a> 4.59.0</li> <li><a href="https://github.com/rollup/rollup/commit/b39616e9175b3d9fc3977c99153174c490805a93"><code>b39616e</code></a> Update audit-resolve</li> <li><a href="https://github.com/rollup/rollup/commit/c60770d7aaf750e512c1b2774989ea4596e660b2"><code>c60770d</code></a> Validate bundle stays within output dir (<a href="https://redirect.github.com/rollup/rollup/issues/6275">#6275</a>)</li> <li><a href="https://github.com/rollup/rollup/commit/33f39c1f205ea2eadaf4b589e493453e2baa3662"><code>33f39c1</code></a> 4.58.0</li> <li><a href="https://github.com/rollup/rollup/commit/b61c40803b717854c1c28937e8098e5ad3c7b8ca"><code>b61c408</code></a> forward NO_SIDE_EFFECTS annotations to function expressions in variable decla...</li> <li><a href="https://github.com/rollup/rollup/commit/7f00689ec90e2cafb11c26eefbcac62343c936f6"><code>7f00689</code></a> Extend agent instructions</li> <li><a href="https://github.com/rollup/rollup/commit/e7b2b85af0901244ecc141b9d792c6db6b527ea4"><code>e7b2b85</code></a> chore(deps): lock file maintenance (<a href="https://redirect.github.com/rollup/rollup/issues/6270">#6270</a>)</li> <li><a href="https://github.com/rollup/rollup/commit/2aa5da9baf82211b8207d268c8751630cb766970"><code>2aa5da9</code></a> fix(deps): update minor/patch updates (<a href="https://redirect.github.com/rollup/rollup/issues/6267">#6267</a>)</li> <li><a href="https://github.com/rollup/rollup/commit/4319837c5448d0c10d89e9ded118888deec2eeec"><code>4319837</code></a> chore(deps): update dependency lru-cache to v11 (<a href="https://redirect.github.com/rollup/rollup/issues/6269">#6269</a>)</li> <li><a href="https://github.com/rollup/rollup/commit/c3b6b4bdc4f2ed978fa233132a526957e6513233"><code>c3b6b4b</code></a> chore(deps): update dependency eslint-plugin-unicorn to v63 (<a href="https://redirect.github.com/rollup/rollup/issues/6268">#6268</a>)</li> <li>Additional commits viewable in <a href="https://github.com/rollup/rollup/compare/v4.35.0...v4.59.0">compare view</a></li> </ul> </details> <details> <summary>Maintainer changes</summary> <p>This version was pushed to npm by [GitHub Actions](<a href="https://www.npmjs.com/~GitHub">https://www.npmjs.com/~GitHub</a> Actions), a new releaser for rollup since your current version.</p> </details> <details> <summary>Install script changes</summary> <p>This version modifies <code>prepare</code> script that runs during installation. Review the package contents before updating.</p> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…7785) Bumps [flatted](https://github.com/WebReflection/flatted) from 3.3.3 to 3.4.2. <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/WebReflection/flatted/commit/3bf09091c3562e17a0647bc06710dd6097079cf7"><code>3bf0909</code></a> 3.4.2</li> <li><a href="https://github.com/WebReflection/flatted/commit/885ddcc33cf9657caf38c57c7be45ae1c5272802"><code>885ddcc</code></a> fix CWE-1321</li> <li><a href="https://github.com/WebReflection/flatted/commit/0bdba705d130f00892b1b8fcc80cf4cdea0631e3"><code>0bdba70</code></a> added flatted-view to the benchmark</li> <li><a href="https://github.com/WebReflection/flatted/commit/2a02dce7c641dec31194c67663f9b0b12e62da20"><code>2a02dce</code></a> 3.4.1</li> <li><a href="https://github.com/WebReflection/flatted/commit/fba4e8f2e113665da275b19cd0f695f3d98e9416"><code>fba4e8f</code></a> Merge pull request <a href="https://redirect.github.com/WebReflection/flatted/issues/89">#89</a> from WebReflection/python-fix</li> <li><a href="https://github.com/WebReflection/flatted/commit/5fe86485e6df7f7f34a07a2a85498bd3e17384e7"><code>5fe8648</code></a> added "when in Rome" also a test for PHP</li> <li><a href="https://github.com/WebReflection/flatted/commit/53517adbefe724fe472b2f9ebcdb01910d0ae3f0"><code>53517ad</code></a> some minor improvement</li> <li><a href="https://github.com/WebReflection/flatted/commit/b3e2a0c387bf446435fec45ad7f05299f012346f"><code>b3e2a0c</code></a> Fixing recursion issue in Python too</li> <li><a href="https://github.com/WebReflection/flatted/commit/c4b46dbcbf782326e54ea1b65d3ebb1dc7a23fad"><code>c4b46db</code></a> Add SECURITY.md for security policy and reporting</li> <li><a href="https://github.com/WebReflection/flatted/commit/f86d071e0f70de5a7d8200198824a3f07fc9c988"><code>f86d071</code></a> Create dependabot.yml for version updates</li> <li>Additional commits viewable in <a href="https://github.com/WebReflection/flatted/compare/v3.3.3...v3.4.2">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [flatted](https://github.com/WebReflection/flatted) from 3.3.3 to 3.4.2. <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/WebReflection/flatted/commit/3bf09091c3562e17a0647bc06710dd6097079cf7"><code>3bf0909</code></a> 3.4.2</li> <li><a href="https://github.com/WebReflection/flatted/commit/885ddcc33cf9657caf38c57c7be45ae1c5272802"><code>885ddcc</code></a> fix CWE-1321</li> <li><a href="https://github.com/WebReflection/flatted/commit/0bdba705d130f00892b1b8fcc80cf4cdea0631e3"><code>0bdba70</code></a> added flatted-view to the benchmark</li> <li><a href="https://github.com/WebReflection/flatted/commit/2a02dce7c641dec31194c67663f9b0b12e62da20"><code>2a02dce</code></a> 3.4.1</li> <li><a href="https://github.com/WebReflection/flatted/commit/fba4e8f2e113665da275b19cd0f695f3d98e9416"><code>fba4e8f</code></a> Merge pull request <a href="https://redirect.github.com/WebReflection/flatted/issues/89">#89</a> from WebReflection/python-fix</li> <li><a href="https://github.com/WebReflection/flatted/commit/5fe86485e6df7f7f34a07a2a85498bd3e17384e7"><code>5fe8648</code></a> added "when in Rome" also a test for PHP</li> <li><a href="https://github.com/WebReflection/flatted/commit/53517adbefe724fe472b2f9ebcdb01910d0ae3f0"><code>53517ad</code></a> some minor improvement</li> <li><a href="https://github.com/WebReflection/flatted/commit/b3e2a0c387bf446435fec45ad7f05299f012346f"><code>b3e2a0c</code></a> Fixing recursion issue in Python too</li> <li><a href="https://github.com/WebReflection/flatted/commit/c4b46dbcbf782326e54ea1b65d3ebb1dc7a23fad"><code>c4b46db</code></a> Add SECURITY.md for security policy and reporting</li> <li><a href="https://github.com/WebReflection/flatted/commit/f86d071e0f70de5a7d8200198824a3f07fc9c988"><code>f86d071</code></a> Create dependabot.yml for version updates</li> <li>Additional commits viewable in <a href="https://github.com/WebReflection/flatted/compare/v3.3.3...v3.4.2">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Description Add fused Silu and Exact Gelu (Erf based) for AVX512f Silu benchmarks: <img width="225" height="442" alt="image" src="https://github.com/user-attachments/assets/42ce53a2-10cb-496f-b3d9-23b9ebf26be3" /> GELU exact (Erf) benchmarks: <img width="218" height="431" alt="image" src="https://github.com/user-attachments/assets/c68b260a-c209-437e-819b-fb200212ee54" /> ### Motivation and Context Improve performance on AVX512F Silu shows small regression at B=1 but I don't think the absolute difference is much --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
### Description This PR makes it possible to build WebGPU EP as an EP API based plugin EP. #### Requirements The goal of this PR is to support both building WebGPU EP as a bundled EP and an EP API based plugin EP. This approach allows: - enabling WebGPU EP as a standalone plugin EP package for WCR usage - graceful transition for WebGPU EP as an native EP for language binding, from the bundled EP to an EP API based plugin EP - keep the existing usage (static library) working (majorly for web) #### Design & Implementation Instead of **changing** WebGPU EP from a bundled EP to an EP API based plugin EP in one shot, this PR **extend** WebGPU EP to support building as plugin EP. - add a new folder `include/onnxruntime/ep` with a bunches of header files. Those files are not WebGPU specific. They are used for: - include common defines/functions/macros for plugin EP to use - include a few "adapter" classes that takes C-API objects to simulate ORT internal classes behaviors - include a few "override" classes that simulate ORT internal classes, but using implementations that only depend on C-API - include a special base class `onnxruntime::ep::Ep` to inherit from These header files allow a compile time "switch" to the different set of types to minimize changes to existing code. Specifically, `pch.h` is required to be included as PCH to make sure the "override" to take place correctly. - add a new folder `onnxruntime/core/providers/webgpu/ep` for EP API implementation, specifically: - `api.cc`: implements `CreateEpFactories` and `ReleaseEpFactory` - `ep.cc` `ep.h`: implement class `onnxruntime::webgpu::ep::Ep` - `factory.cc` `factory.h`: implement class `onnxruntime::webgpu::ep::Factory` #### Dependencies and Prerequisites (unmerged changes are included as a part of current PR) - microsoft#26855 - microsoft#26803 - microsoft#26859 - microsoft#26879 - microsoft#26919 - microsoft#27569 - microsoft#27587 #### Missing Parts - Allow setting Global/Default EP options --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
### Description <!-- Describe your changes. --> Fixes WebGPU Einsum op by replacing the manual `uniforms.inputN_shape[idx]` string with `GetElementAt(...)` which correctly handles uniform shape access for all tensor ranks. I also added a bunch of tests for this... ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Closes microsoft#27762
…eation (microsoft#27634) ## Description We had a weird behavior in Transformers.js V4. After calling `InferenceSession.release()` on a WebGPU session, attempting to create a new WebGPU session fails with: ``` WebGPU device lost (2): Device was destroyed. ``` In Transformers.js we encourage the use of the `create -> release -> create` pattern, because we expect the application to run for some time and might use multiple models. So it makes sense to unload models after the job is done. It seems like this was introduced in [e03631e](microsoft@e03631ee528), which added the `preserveDevice` option with a default value of `false`. When the last session is released and `preserveDevice=false`, the C++ side destroys the WebGPU device, but the JavaScript reference in `env.webgpu.device` is never cleared, leaving a stale reference to a destroyed device. ## Changes **Clear stale device reference when lost** (`backend-webgpu.ts`) 1. Made device property `configurable: true` to allow deletion 2. Added cleanup logic in `dispose()` to detect device loss via `device.lost` promise 3. When device is lost (destroyed, driver crash, etc.), delete the stale `env.webgpu.device` reference This allows subsequent session creation to acquire a fresh device instead of attempting to reuse a lost one.
…c transformer suite (microsoft#27821) ### Description As title ### Motivation and Context Tiny continuation to microsoft#27691 --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Automated daily backmerge from ORT main to ovep-develop. No conflicts detected. Do NOT squash or rebase - use merge commit only.