
Use convert_to_tensor inside __init__ to avoid eager-only numpy() under XLA #2

Draft

CodersAcademy006 wants to merge 1822 commits into master from clean/convert-constants-init

Conversation

@CodersAcademy006 (Owner)

Replaces `tf.constant(...)` with `tf.convert_to_tensor(...)` inside `__init__` in example/test code so these classes work correctly with `@tf.function(jit_compile=True)`. Fixes part of tensorflow#105151.
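The pattern this PR applies can be sketched as follows (a minimal illustration; the `Scaler` class and values are hypothetical, not code from the PR):

```python
import tensorflow as tf

class Scaler:
    def __init__(self, factor):
        # tf.constant(factor) would try to materialize a concrete value
        # (calling the eager-only .numpy() on tensor inputs), which fails
        # when `factor` is a symbolic tensor traced under
        # @tf.function(jit_compile=True). tf.convert_to_tensor passes
        # existing tensors through unchanged and only converts Python/NumPy
        # values, so it works in both eager and graph/XLA modes.
        self.factor = tf.convert_to_tensor(factor, dtype=tf.float32)

    def __call__(self, x):
        return x * self.factor

@tf.function(jit_compile=True)
def scale(x):
    # `x` is symbolic during tracing; Scaler(x) would raise with tf.constant.
    return Scaler(x)(x)

print(scale(tf.constant([1.0, 2.0, 3.0])))  # x * x -> [1., 4., 9.]
```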

tensorflower-gardener and others added 30 commits November 20, 2025 03:12
PiperOrigin-RevId: 834674902
Imported from GitHub PR openxla/xla#34103

📝 Summary of Changes
"mandatory" compatible layouts have to be assigned to both operands and outputs simultaneously such that subsequent layout propagation does not alter one of them making the operation invalid.

🚀 Kind of Contribution
🐛 Bug Fix

🧪 Unit Tests:
yes

🧪 Execution Tests:
no
Copybara import of the project:

--
f0ff62e4bf031a3aebf4cdadb66139b3b1120307 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix layout assignment of bitcast-converts.

"mandatory" compatible layouts have to be assigned to both operands and
outputs simultaneously such that subsequent layout propagation does not
alter one of them making the operation invalid.

Merging this change closes tensorflow#34103

PiperOrigin-RevId: 834676734
…parameter` is true.

Old GSPMD propagation needs them since they do not have the concept of open/closed sharding. In Shardy with sdy-round-trip, JAX creates the correct open/closed shardings for parameters and results. We do not need these vectors at all.

Before this change, we always canonicalized the layout after Shardy propagation, which may be redundant.

PiperOrigin-RevId: 834684933
PiperOrigin-RevId: 834691273
…tions

Imported from GitHub PR openxla/xla#32053

📝 Summary of Changes

Added CommandBuffer support for convolution ops.
Graph capture of convolutions is enabled only for convolution custom-call targets explicitly added to the '--legacy_command_buffer_custom_call_targets' list; see command_buffer_scheduling_test.cc for an example.
🎯 Justification
This op was missing, which results in graph fragmentation, especially for large models: one gets several (sometimes many) execution graphs instead of just one.

🚀 Kind of Contribution
✨ New Feature

🧪 Unit Tests:
Added new subtest to xla/service/gpu/transforms/command_buffer_scheduling_test.cc

This is a split PR originating from openxla/xla#30855

@xla-rotation, could you have a look, please?
Copybara import of the project:

--
5f5f5bc8ba8212ceb6afde6f9729ba4a951e4051 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

adding coll permute and convolution to command buffers

added UTs  and convolution command

test fixes

added rebase fixes

capture only those convolution targets which are explicitly

Revert "adding coll permute and convolution to command buffers"

This reverts commit 75847e67261b4589162411c9846ed9c0b9fc1ed5.

added conv to command buffers

fixing build and test

--
e8afa3296a4a8ad079cde2e84391c7e0006ddf52 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

fixing build

--
b529288221708e59a04bfaacdfa5b7a1c25b091e by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

rewritten ConvolutionCmd, adapted command_buffer_conv_pass

--
3ecd3d0516a7766d70d636b2110b4b310a9be7b2 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

some cosmetics

Merging this change closes tensorflow#32053

PiperOrigin-RevId: 834708288
…numbers from a structured MLIR attribute.

The `ParseDimensionNumbers` function now expects a single `dimension_numbers` attribute in the composite attributes, structured as an array of arrays representing contracting and batch dimensions. This change simplifies the attribute parsing. Additionally, checks are added to ensure that the parsed dimension numbers are supported and to handle cases where scale factors are not divisible by 32 for non-BF16 types, preventing rewriting in those scenarios. The test case is updated to reflect the new attribute format.

PiperOrigin-RevId: 834777398
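The attribute shape described above can be sketched in Python (a simplified, hypothetical sketch; the real code parses a structured MLIR attribute, and `parse_dimension_numbers` here is not the actual `ParseDimensionNumbers`):

```python
def parse_dimension_numbers(attrs):
    """Parse a `dimension_numbers` attribute structured as an array of
    arrays: [[contracting dims...], [batch dims...]]."""
    dims = attrs["dimension_numbers"]
    if len(dims) != 2:
        raise ValueError("expected [contracting, batch] dimension lists")
    contracting, batch = dims
    return list(contracting), list(batch)

# A dot with contracting dimension 1 and batch dimension 0:
print(parse_dimension_numbers({"dimension_numbers": [[1], [0]]}))  # ([1], [0])
```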
…rror flakes.

They may be causing permission errors on Windows when Bazel tries to access header
files in Windows SDK/Clang while building @@bazel_tools targets.

PiperOrigin-RevId: 834792776
In LiteRT, input and output names shouldn't be empty. Populate default names
if tensors don't have names.

PiperOrigin-RevId: 834808503
PiperOrigin-RevId: 834837889
…ation model for collective-permute uses transfer size and communication pattern type as input key.

The interpolation is implemented as follows:
- `CollectivePermuteCostModelType` which classifies communication patterns (e.g. one-way, two-way-all-mutual) is added to `ExactInterpolatorKey` for collective-permute instructions.
- Interpolation for collective-permute uses only exact matching via `ExactInterpolator` based on transfer size, and does not use `FallbackInterpolator`. This is because the cost of collective-permute depends primarily on bytes transferred and communication pattern, and not on the number of devices in the same way as other collectives.

PiperOrigin-RevId: 834846468
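The exact-match lookup described above can be sketched as a table keyed by transfer size and pattern type (names and cost values here are illustrative assumptions, not XLA's actual interpolator API):

```python
from enum import Enum

class CollectivePermutePattern(Enum):
    ONE_WAY = "one-way"
    TWO_WAY_ALL_MUTUAL = "two-way-all-mutual"

# Hypothetical profile data: (transfer_bytes, pattern) -> measured cost in us.
_EXACT_TABLE = {
    (1 << 20, CollectivePermutePattern.ONE_WAY): 12.0,
    (1 << 20, CollectivePermutePattern.TWO_WAY_ALL_MUTUAL): 19.5,
}

def interpolate_collective_permute(transfer_bytes, pattern):
    # Exact matching only: unlike other collectives, there is no fallback
    # interpolation over device counts.
    key = (transfer_bytes, pattern)
    if key not in _EXACT_TABLE:
        raise KeyError(f"no exact profile for {key}")
    return _EXACT_TABLE[key]
```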
Updates LLVM usage to match
[355e0f94af5a](llvm/llvm-project@355e0f94af5a)

PiperOrigin-RevId: 834865231
…rvice

New namespace is absl_testing::

PiperOrigin-RevId: 834872277
When the call splitter is called on a non-flat graph, we don't want to implicitly flatten it by creating new bodies at each callsite.

PiperOrigin-RevId: 834875376
PiperOrigin-RevId: 834896529
Fixes a problem where structured_tensor fails to import under Python 3.14 with the error:

```
  File ".../tensorflow/python/ops/structured/structured_tensor.py", line 54, in <module>
    class StructuredTensor(extension_type.BatchableExtensionType):
    ...<1135 lines>...
          return self._ragged_shape.rank
  File ".../tensorflow/python/framework/extension_type.py", line 90, in __init__
    _check_field_annotations(cls)
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
  File ".../tensorflow/python/framework/extension_type.py", line 935, in _check_field_annotations
    raise ValueError(
    ...<2 lines>...
    )
ValueError: The field annotations for StructuredTensor are invalid. Field FieldName is missing a type annotation.
```
PiperOrigin-RevId: 834982237
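The failing check enforces that every public field of the class carries a type annotation. A minimal analogue of such a check (an illustrative sketch only, not TensorFlow's `_check_field_annotations` implementation):

```python
def check_field_annotations(cls):
    """Raise if any non-private, non-callable class attribute lacks a
    type annotation (simplified analogue of the check in the traceback)."""
    annotations = getattr(cls, "__annotations__", {})
    for name, value in vars(cls).items():
        if name.startswith("_") or callable(value):
            continue
        if name not in annotations:
            raise ValueError(
                f"The field annotations for {cls.__name__} are invalid. "
                f"Field {name} is missing a type annotation.")

class Good:
    x: int  # annotated: passes the check

class Bad:
    x = 3   # no annotation: the check raises ValueError

check_field_annotations(Good)  # no error
```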
PiperOrigin-RevId: 834982495
…host transfer extension.

PiperOrigin-RevId: 835038444
PiperOrigin-RevId: 835066344
PiperOrigin-RevId: 835085920
PiperOrigin-RevId: 835096840
PiperOrigin-RevId: 835111320
PiperOrigin-RevId: 835118587
Peter Gavin and others added 30 commits November 26, 2025 16:58
PiperOrigin-RevId: 837297785
…ABSL versions internally and in OSS.

PiperOrigin-RevId: 837314334
…already writes the chosen node into `ready_chosen` using `memcpy`, so we don't need the conditionals in the print statement. Also, we need to save the "unchosen" node's information for printing purposes.

PiperOrigin-RevId: 837321312
To avoid bringing GPU dependencies with default FFI target, split GPU-specific context decoding into backends/gpu:ffi

PiperOrigin-RevId: 837342185
This is expected to be a safe change. Calls to multimap::find() historically returned the first matching element, or end() if there is no match. However, this is not guaranteed, and recent changes in libc++ changed this to return an arbitrary element that matches. Using equal_range() is a safe replacement that will preserve the current behavior.

PiperOrigin-RevId: 837348099
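The pitfall is language-agnostic: with duplicate keys, "find one match" does not pin down which match you get, while an equal_range-style lookup returns all of them so callers can choose deterministically. A Python analogue over a sorted list of pairs (illustrative, not part of this change):

```python
import bisect

def equal_range(pairs, key):
    """Return the slice of `pairs` (sorted by key) whose key equals `key`.

    Like std::multimap::equal_range, this yields every matching element
    instead of an arbitrary single match.
    """
    keys = [k for k, _ in pairs]
    lo = bisect.bisect_left(keys, key)
    hi = bisect.bisect_right(keys, key)
    return pairs[lo:hi]

pairs = [("a", 1), ("b", 2), ("b", 3), ("c", 4)]
print(equal_range(pairs, "b"))  # [('b', 2), ('b', 3)]
```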
PiperOrigin-RevId: 837373785
…s one was missed in previous consolidations.

PiperOrigin-RevId: 837391515
PiperOrigin-RevId: 837393069
…tx backend

Instead of relying on the is_autotuning_compilation boolean, it's now up to the
backend runner to properly set --fail_ptx_compilation_on_register_spilling based on --xla_gpu_filter_kernels_spilling_registers_on_autotuning.

Note the change in AutotunerCompileUtil::Compile calls
GpuCodegenBackend::AdjustDebugOptionsForAutotuning. That seems to also
improve compile time of benchmarks.

Dropped the gemm fusion autotuner tests about spilling, as they
basically tested whether the backend respects debug option flags. With this
change they would become tautological at best and wouldn't exercise any behavior of
gemm_fusion_autotuner.

PiperOrigin-RevId: 837403708
Prior to this change we would hit an assert when constructing the tensor::CollapseShapeOp in the 0D->0D case.

PiperOrigin-RevId: 837419131
Can be enabled with XLA_FLAGS="--xla_backend_extra_options=xla_cpu_enable_tiled_emitter" (!warning! may not work as expected for now)

Reverts 3a4bd44

PiperOrigin-RevId: 837419373
PiperOrigin-RevId: 837425105
…uctions to tiled emitter.

PiperOrigin-RevId: 837432186
flag

The corresponding field is already removed from the proto.

PiperOrigin-RevId: 837451384
To get the behavior in line with the internal ASSERT_OK_AND_ASSIGN.
Unlike the non-TF_ counterpart, the macro expands to code that already
includes the trailing semicolon, so it happens to work.

Attempts to replace it with an internal variant as part of b/444419873
make it not work anymore, until the missing semicolons are added.

PiperOrigin-RevId: 837461910
Some arguments to kernels may not be managed by the buffer assignments and
consequently have no buffer slices attached to them. Examples of these include
scalars and arguments whose memory is managed by the runtime thunks
[CollectiveKernelThunk]. This change introduces a new type for arguments to be
passed in with a shape but without an associated slice.

PiperOrigin-RevId: 837470085
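Conceptually, the new argument kind carries a shape but may have no buffer slice. A hypothetical Python sketch (not XLA's actual C++ types):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class KernelArgument:
    """A kernel argument: the shape is always known, but the buffer slice
    may be absent. slice_index is None for arguments not managed by buffer
    assignment, e.g. scalars or buffers owned by runtime thunks such as
    CollectiveKernelThunk."""
    shape: Tuple[int, ...]
    slice_index: Optional[int] = None

scalar = KernelArgument(shape=())                        # no buffer slice
buffer = KernelArgument(shape=(128, 64), slice_index=3)  # slice-backed
print(scalar.slice_index is None, buffer.slice_index)    # True 3
```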
This change replaces usages of tsl::errors::AlreadyExists with absl::AlreadyExistsError,
wrapping arguments in absl::StrCat where necessary. This addresses deprecation
warnings and moves towards standard Abseil error handling.

Changes:
- Replaced errors::AlreadyExists with absl::AlreadyExistsError.
- Used absl::StrCat to construct error messages where necessary.

Reverts 2fc3b48

PiperOrigin-RevId: 837473556
This change replaces usages of tsl::errors::ResourceExhausted with absl::ResourceExhaustedError,
wrapping arguments in absl::StrCat where necessary. This addresses deprecation
warnings and moves towards standard Abseil error handling.

Changes:
- Replaced errors::ResourceExhausted with absl::ResourceExhaustedError.
- Used absl::StrCat to construct error messages where necessary.
- Fixed missing dependencies in BUILD files using build_cleaner.
- Reordered includes in windows_file_system.cc.
PiperOrigin-RevId: 837490121
It's currently a cc_library target, which breaks the layering_check I'm
trying to enable. Therefore this change introduces a new Bazel function `mkl_dep` which returns a select that resolves to a single target.

This is based on the function `mkl_deps`, which returns a select that resolves to a list of targets. But this list always has at most one element, which is why it's easy to make this an alias.

PiperOrigin-RevId: 837493107
Imported from GitHub PR openxla/xla#34467

📝 Summary of Changes
AMD/ROCm Triton backend does not support warp specialization. `ThreadDims` are therefore calculated from module attributes and not retrieved from `nvvm.reqntid`.

🎯 Justification
As warp specialization is not currently supported by the AMD/ROCm Triton backend, this backend ignores the `nvvm.reqntid` attribute. Therefore, this attribute does not contain a correct value, as the current Triton implementation assumes the number of threads per warp is always 32, which is not the case for some AMD targets (see https://github.com/triton-lang/triton/blob/49e174c6856aed1d36b85fb2b398ffaa32a80aa8/lib/Conversion/TritonGPUToLLVM/FuncOpToLLVM.cpp#L204C53-L204C68).
Consequently, `ExtractThreadDims` has been adapted to calculate `ThreadDims` based only on attributes used and updated by the AMD Triton backend.

🚀 Kind of Contribution
🐛 Bug Fix

📊 Benchmark (for Performance Improvements)
Not relevant

🧪 Unit Tests:
Fixes failures of type:
```
xla/backends/gpu/codegen/triton/fusion_emitter_device_test.cc:4234: Failure
Value of: RunAndCompareNoHloPasses(kHloText, ErrorSpecForDotAlgorithm(algorithm))
 Actual: false (INTERNAL: Expected total threads as per reqntid attribute to be 32 but got 64 as per ttg.total-num-warps and tt.threads-per-warp attributes.)
Expected: true
```
for Triton Tests when targeting AMD GPUs.

🧪 Execution Tests:
Not relevant

Copybara import of the project:

--
24086e4e80223cdccd38c82af46f5bde96124b5a by Maxime France-Pillois <mfrancep@amd.com>:

[ROCm] Fix ExtractThreadDims for AMD targets

AMD/ROCm Triton backend does not support warp specialization.
ThreadDims are therefore calculated from the Module attributes and not retrieved from `nvvm.reqntid`.

Merging this change closes tensorflow#34467

PiperOrigin-RevId: 837500152
…o_tensor(...) to avoid .numpy() in graph/XLA mode