Use convert_to_tensor inside __init__ to avoid eager-only numpy() under XLA #2
Draft
CodersAcademy006 wants to merge 1,822 commits into master
Conversation
PiperOrigin-RevId: 834674902
Imported from GitHub PR openxla/xla#34103 📝 Summary of Changes "mandatory" compatible layouts have to be assigned to both operands and outputs simultaneously, such that subsequent layout propagation does not alter one of them, making the operation invalid. 🚀 Kind of Contribution 🐛 Bug Fix 🧪 Unit Tests: yes 🧪 Execution Tests: no Copybara import of the project: -- f0ff62e4bf031a3aebf4cdadb66139b3b1120307 by Ilia Sergachev <isergachev@nvidia.com>: [GPU] Fix layout assignment of bitcast-converts. "mandatory" compatible layouts have to be assigned to both operands and outputs simultaneously, such that subsequent layout propagation does not alter one of them, making the operation invalid. Merging this change closes tensorflow#34103 PiperOrigin-RevId: 834676734
…intrinsics. PiperOrigin-RevId: 834684558
…parameter` is true. Old GSPMD propagation needs them since it does not have the concept of open/closed shardings. In Shardy with sdy-round-trip, JAX creates the correct open/closed shardings for parameters and results. We do not need these vectors at all. Before this change, we always canonicalized layouts after Shardy propagation, which may be redundant. PiperOrigin-RevId: 834684933
PiperOrigin-RevId: 834691273
…tions Imported from GitHub PR openxla/xla#32053 📝 Summary of Changes Added CommandBuffer support for Convolution ops. Graph capture of convolutions is enabled only for convolution custom call targets explicitly added to the '--legacy_command_buffer_custom_call_targets' list: see command_buffer_scheduling_test.cc for an example. 🎯 Justification This op was missing for whatever reason: this results in graph fragmentation, especially for large models. Hence one gets several (sometimes many) execution graphs instead of just one. 🚀 Kind of Contribution ✨ New Feature 🧪 Unit Tests: Added new subtest to xla/service/gpu/transforms/command_buffer_scheduling_test.cc This is a split PR originating from openxla/xla#30855 @xla-rotation could you have a look please? Copybara import of the project: -- 5f5f5bc8ba8212ceb6afde6f9729ba4a951e4051 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>: adding coll permute and convolution to command buffers added UTs and convolution command test fixes added rebase fixes capture only those convolution targets which are explicitly Revert "adding coll permute and convolution to command buffers" This reverts commit 75847e67261b4589162411c9846ed9c0b9fc1ed5. added conv to command buffers fixing build and test -- e8afa3296a4a8ad079cde2e84391c7e0006ddf52 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>: fixing build -- b529288221708e59a04bfaacdfa5b7a1c25b091e by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>: rewritten ConvolutionCmd, adapted command_buffer_conv_pass -- 3ecd3d0516a7766d70d636b2110b4b310a9be7b2 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>: some cosmetics Merging this change closes tensorflow#32053 PiperOrigin-RevId: 834708288
PiperOrigin-RevId: 834762307
PiperOrigin-RevId: 834776465
…numbers from a structured MLIR attribute. The `ParseDimensionNumbers` function now expects a single `dimension_numbers` attribute in the composite attributes, structured as an array of arrays representing contracting and batch dimensions. This change simplifies the attribute parsing. Additionally, checks are added to ensure that the parsed dimension numbers are supported and to handle cases where scale factors are not divisible by 32 for non-BF16 types, preventing rewriting in those scenarios. The test case is updated to reflect the new attribute format. PiperOrigin-RevId: 834777398
…rror flakes. They may be causing permission errors on Windows when Bazel tries to access header files in Windows SDK/Clang while building @@bazel_tools targets. PiperOrigin-RevId: 834792776
In LiteRT, input and output names shouldn't be empty. Populate default names if tensors don't have names. PiperOrigin-RevId: 834808503
PiperOrigin-RevId: 834837889
PiperOrigin-RevId: 834839651
…ation model for collective-permute uses transfer size and communication pattern type as input key. The interpolation is implemented as follows: - `CollectivePermuteCostModelType`, which classifies communication patterns (e.g. one-way, two-way-all-mutual), is added to `ExactInterpolatorKey` for collective-permute instructions. - Interpolation for collective-permute uses only exact matching via `ExactInterpolator` based on transfer size, and does not use `FallbackInterpolator`. This is because the cost of collective-permute depends primarily on bytes transferred and communication pattern, not on the number of devices in the same way as other collectives. PiperOrigin-RevId: 834846468
PiperOrigin-RevId: 834846503
Updates LLVM usage to match [355e0f94af5a](llvm/llvm-project@355e0f94af5a) PiperOrigin-RevId: 834865231
…rvice. The new namespace is absl_testing:: PiperOrigin-RevId: 834872277
When the call splitter is called on a non-flat graph, we don't want to implicitly flatten it by creating new bodies at each callsite. PiperOrigin-RevId: 834875376
PiperOrigin-RevId: 834896529
PiperOrigin-RevId: 834972320
Fixes a problem where structured_tensor fails to import under Python 3.14 with the error:
```
File ".../tensorflow/python/ops/structured/structured_tensor.py", line 54, in <module>
class StructuredTensor(extension_type.BatchableExtensionType):
...<1135 lines>...
return self._ragged_shape.rank
File ".../tensorflow/python/framework/extension_type.py", line 90, in __init__
_check_field_annotations(cls)
~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
File ".../tensorflow/python/framework/extension_type.py", line 935, in _check_field_annotations
raise ValueError(
...<2 lines>...
)
ValueError: The field annotations for StructuredTensor are invalid. Field FieldName is missing a type annotation.
```
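As a minimal sketch of the rule the failing check enforces (the class and field names below are illustrative, not taken from the TensorFlow sources): every field of a `tf.experimental.ExtensionType` subclass, the public entry point for the `BatchableExtensionType` machinery, must carry a type annotation, or `_check_field_annotations` raises the `ValueError` quoted above.

```python
import tensorflow as tf

# Hypothetical example class; ExtensionType collects the annotated class
# attributes as fields and auto-generates __init__, __eq__, etc.
class Point2D(tf.experimental.ExtensionType):
    x: tf.Tensor  # annotated fields are accepted
    y: tf.Tensor  # a field without an annotation would fail the check

p = Point2D(x=tf.constant([1.0]), y=tf.constant([2.0]))
print(float(p.x[0]))  # fields are regular read-only attributes
```

The Python 3.14 failure above arises from how class annotations are surfaced to this check, not from a missing annotation in the user's code.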
PiperOrigin-RevId: 834982237
PiperOrigin-RevId: 834982495
PiperOrigin-RevId: 834997967
…host transfer extension. PiperOrigin-RevId: 835038444
PiperOrigin-RevId: 835066344
PiperOrigin-RevId: 835085920
PiperOrigin-RevId: 835096840
PiperOrigin-RevId: 835111311
PiperOrigin-RevId: 835111320
PiperOrigin-RevId: 835118587
PiperOrigin-RevId: 837297785
PiperOrigin-RevId: 837301178
…ABSL versions internally and in OSS. PiperOrigin-RevId: 837314334
…already writes the chosen node into `ready_chosen` using `memcpy`, so we don't need the conditionals in the print statement. Also, we need to save the "unchosen" node's information for printing purposes. PiperOrigin-RevId: 837321312
To avoid bringing GPU dependencies with the default FFI target, split GPU-specific context decoding into backends/gpu:ffi PiperOrigin-RevId: 837342185
This is expected to be a safe change. Calls to multimap::find() historically returned the first matching element, or end() if there is no match. However, this is not guaranteed, and recent changes in libc++ changed this to return an arbitrary element that matches. Using equal_range() is a safe replacement that will preserve the current behavior. PiperOrigin-RevId: 837348099
… id mapping PiperOrigin-RevId: 837366466
PiperOrigin-RevId: 837373785
PiperOrigin-RevId: 837386345
…s one was missed in previous consolidations. PiperOrigin-RevId: 837391515
PiperOrigin-RevId: 837393069
…tx backend Instead of relying on the is_autotuning_compilation boolean, it's now up to the backend runner to properly set --fail_ptx_compilation_on_register_spilling based on --xla_gpu_filter_kernels_spilling_registers_on_autotuning. Note that AutotunerCompileUtil::Compile now calls GpuCodegenBackend::AdjustDebugOptionsForAutotuning. That seems to also improve compile time of benchmarks. Dropped the gemm fusion autotuner tests about spilling, as they basically tested whether the backend respects debug option flags. With this change they would become tautological at best and would not exercise any behavior of gemm_fusion_autotuner. PiperOrigin-RevId: 837403708
…nalysis. PiperOrigin-RevId: 837407555
Prior to this change we would hit an assert when constructing the tensor::CollapseShapeOp in the 0D->0D case. PiperOrigin-RevId: 837419131
Can be enabled with XLA_FLAGS="--xla_backend_extra_options=xla_cpu_enable_tiled_emitter" (!warning! may not work as expected for now) Reverts 3a4bd44 PiperOrigin-RevId: 837419373
…er base class. PiperOrigin-RevId: 837424830
PiperOrigin-RevId: 837425105
PiperOrigin-RevId: 837425159
…uctions to tiled emitter. PiperOrigin-RevId: 837432186
PiperOrigin-RevId: 837446324
flag. The corresponding field is already removed from the proto. PiperOrigin-RevId: 837451384
PiperOrigin-RevId: 837453532
To get the behavior in line with internal ASSERT_OK_AND_ASSIGN. Unlike the non-TF_ counterpart, the macro expands to code that already includes the trailing semicolon so it happens to work. Attempts to replace it with an internal variant as part of b/444419873 make it not work anymore, until the missing semicolons are added. PiperOrigin-RevId: 837461910
Some arguments to kernels may not be managed by the buffer assignments and consequently have no buffer slices attached to them. Examples of these include scalars and arguments whose memory is managed by the runtime thunks [CollectiveKernelThunk]. This change introduces a new type for arguments to be passed in with a shape but without an associated slice. PiperOrigin-RevId: 837470085
This change replaces usages of tsl::errors::AlreadyExists with absl::AlreadyExistsError, wrapping arguments in absl::StrCat where necessary. This addresses deprecation warnings and moves towards standard Abseil error handling. Changes: - Replaced errors::AlreadyExists with absl::AlreadyExistsError. - Used absl::StrCat to construct error messages where necessary. Reverts 2fc3b48 PiperOrigin-RevId: 837473556
This change replaces usages of tsl::errors::ResourceExhausted with absl::ResourceExhaustedError, wrapping arguments in absl::StrCat where necessary. This addresses deprecation warnings and moves towards standard Abseil error handling. Changes: - Replaced errors::ResourceExhausted with absl::ResourceExhaustedError. - Used absl::StrCat to construct error messages where necessary. - Fixed missing dependencies in BUILD files using build_cleaner. - Reordered includes in windows_file_system.cc. PiperOrigin-RevId: 837490121
It's currently a cc_library target which breaks the layering_check which I'm trying to enable. Therefore this change introduces a new Bazel function `mkl_dep` which returns a select which resolves to a single target. This is based on the function `mkl_deps` which returns a select that resolves to a list of targets. But this list always has one element or fewer, which is why it's easy to make this an alias. PiperOrigin-RevId: 837493107
Imported from GitHub PR openxla/xla#34467 📝 Summary of Changes The AMD/ROCm Triton backend does not support warp specialization. `ThreadDims` are therefore calculated from module attributes and not retrieved from `nvvm.reqntid`. 🎯 Justification As warp specialization is not currently supported by the AMD/ROCm Triton backend, this backend ignores the `nvvm.reqntid` attribute. Therefore, this attribute does not contain a correct value, as the current Triton implementation assumes the number of threads per warp is always 32, which is not the case for some AMD targets (see https://github.com/triton-lang/triton/blob/49e174c6856aed1d36b85fb2b398ffaa32a80aa8/lib/Conversion/TritonGPUToLLVM/FuncOpToLLVM.cpp#L204C53-L204C68). Consequently, `ExtractThreadDims` has been adapted to calculate `ThreadDims` based only on attributes used and updated by the AMD Triton backend. 🚀 Kind of Contribution 🐛 Bug Fix 📊 Benchmark (for Performance Improvements) Not relevant 🧪 Unit Tests: Fixes failures of type: ``` xla/backends/gpu/codegen/triton/fusion_emitter_device_test.cc:4234: Failure Value of: RunAndCompareNoHloPasses(kHloText, ErrorSpecForDotAlgorithm(algorithm)) Actual: false (INTERNAL: Expected total threads as per reqntid attribute to be 32 but got 64 as per ttg.total-num-warps and tt.threads-per-warp attributes.) Expected: true ``` for Triton tests when targeting AMD GPUs. 🧪 Execution Tests: Not relevant Copybara import of the project: -- 24086e4e80223cdccd38c82af46f5bde96124b5a by Maxime France-Pillois <mfrancep@amd.com>: [ROCm] Fix ExtractThreadDims for AMD targets AMD/ROCm Triton backend does not support warp specialization. ThreadDims are therefore calculated from the Module attributes and not retrieved from `nvvm.reqntid`. Merging this change closes tensorflow#34467 PiperOrigin-RevId: 837500152
…o_tensor(...) to avoid .numpy() in graph/XLA mode
Replaces tf.constant(...) with tf.convert_to_tensor(...) inside `__init__` in example/test code so these classes work correctly with @tf.function(jit_compile=True). Fixes part of tensorflow#105151.
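A minimal sketch of the pattern (the `ScaledShift` class and its fields are illustrative names, not the classes touched by this PR): `tf.convert_to_tensor` works both eagerly and during graph/XLA tracing, whereas anything that calls `.numpy()` in `__init__` only works eagerly.

```python
import tensorflow as tf

class ScaledShift:
    """Illustrative class; names are hypothetical, not from the PR diff."""

    def __init__(self, scale, shift):
        # tf.convert_to_tensor succeeds under @tf.function(jit_compile=True)
        # tracing. By contrast, something like tf.constant(scale).numpy()
        # would raise while tracing, because a symbolic graph tensor has no
        # concrete value for .numpy() to return.
        self.scale = tf.convert_to_tensor(scale, dtype=tf.float32)
        self.shift = tf.convert_to_tensor(shift, dtype=tf.float32)

    def __call__(self, x):
        return x * self.scale + self.shift


@tf.function(jit_compile=True)
def apply(x):
    # Constructing the object inside a jit-compiled function exercises the
    # graph/XLA code path that an eager-only .numpy() call would break.
    return ScaledShift(2.0, 1.0)(x)
```

Calling `apply(tf.constant(3.0))` compiles the function with XLA and returns `3.0 * 2.0 + 1.0`; with a `.numpy()` call in `__init__`, the same trace would fail before reaching execution.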