        tiling_optimization_level (str): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for better tiling strategy. We currently support ["none", "fast", "moderate", "full"].
        l2_limit_for_tiling (int): The target L2 cache usage limit (in bytes) for tiling optimization (default is -1 which means no limit).
        use_distributed_mode_trace (bool): Using aot_autograd to trace the graph. This is enabled when DTensors or distributed tensors are present in distributed model
+       dynamically_allocate_resources (bool): Dynamically allocate resources during engine execution.
        **kwargs: Any,
    Returns:
        torch.fx.GraphModule: Compiled FX Module, when run it will execute via TensorRT

@@ -512,6 +516,7 @@ def compile(
    """Compile an ExportedProgram module for NVIDIA GPUs using TensorRT
        l2_limit_for_tiling (int): The target L2 cache usage limit (in bytes) for tiling optimization (default is -1 which means no limit).
        offload_module_to_cpu (bool): Offload the module to CPU. This is useful when we need to minimize GPU memory usage.
        use_distributed_mode_trace (bool): Using aot_autograd to trace the graph. This is enabled when DTensors or distributed tensors are present in distributed model
+       dynamically_allocate_resources (bool): Dynamically allocate resources during engine execution.
        **kwargs: Any,
    Returns:
        torch.fx.GraphModule: Compiled FX Module, when run it will execute via TensorRT
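Taken together, the keyword arguments documented in the hunks above might be assembled and passed to the compiler as sketched below. This is an illustrative sketch, not code from the PR: the argument values are example choices (only `l2_limit_for_tiling = -1` matches a documented default), and the actual `compile` call is commented out because it requires `torch_tensorrt` and a GPU.

```python
# Illustrative kwargs for torch_tensorrt.dynamo.compile, built from the
# parameters documented in the diff above. Values here are example choices,
# not library defaults (except l2_limit_for_tiling, documented as -1).
trt_compile_kwargs = {
    "tiling_optimization_level": "moderate",  # supported: "none", "fast", "moderate", "full"
    "l2_limit_for_tiling": -1,                # -1 means no L2 cache usage limit
    "offload_module_to_cpu": True,            # minimize GPU memory usage during compilation
    "use_distributed_mode_trace": False,      # aot_autograd tracing for DTensor/distributed models
    "dynamically_allocate_resources": True,   # the new flag this diff introduces
}

# On a machine with torch_tensorrt installed and a GPU available, the call
# would look roughly like:
# trt_gm = torch_tensorrt.dynamo.compile(exported_program, inputs, **trt_compile_kwargs)

# Sanity-check the tiling level against the supported set from the docstring.
assert trt_compile_kwargs["tiling_optimization_level"] in ("none", "fast", "moderate", "full")
```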
py/torch_tensorrt/dynamo/_settings.py (+4 lines changed: 4 additions & 0 deletions)
@@ -11,6 +11,7 @@
     DLA_GLOBAL_DRAM_SIZE,
     DLA_LOCAL_DRAM_SIZE,
     DLA_SRAM_SIZE,
+    DYNAMICALLY_ALLOCATE_RESOURCES,
     DRYRUN,
     ENABLE_CROSS_COMPILE_FOR_WINDOWS,
     ENABLE_EXPERIMENTAL_DECOMPOSITIONS,
@@ -97,6 +98,8 @@ class CompilationSettings:
        tiling_optimization_level (str): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for better tiling strategy. We currently support ["none", "fast", "moderate", "full"].
        l2_limit_for_tiling (int): The target L2 cache usage limit (in bytes) for tiling optimization (default is -1 which means no limit).
        use_distributed_mode_trace (bool): Using aot_autograd to trace the graph. This is enabled when DTensors or distributed tensors are present in distributed model
+       offload_module_to_cpu (bool): Offload the model to CPU to reduce memory footprint during compilation
+       dynamically_allocate_resources (bool): Dynamically allocate resources for TensorRT engines
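For context, the shape of the `CompilationSettings` change can be sketched with a standalone dataclass mimic. This is hypothetical: the real class lives in py/torch_tensorrt/dynamo/_settings.py and imports `DYNAMICALLY_ALLOCATE_RESOURCES` from the `_defaults` module; the default value `False` used here is an assumption, not taken from the diff.

```python
from dataclasses import dataclass

# Assumed default for the new setting; the real value comes from
# py/torch_tensorrt/dynamo/_defaults.py, which this sketch does not import.
DYNAMICALLY_ALLOCATE_RESOURCES = False

@dataclass
class CompilationSettingsSketch:
    """Standalone mimic of a few CompilationSettings fields (hypothetical)."""
    tiling_optimization_level: str = "none"
    l2_limit_for_tiling: int = -1           # -1 means no L2 cache usage limit
    offload_module_to_cpu: bool = False
    use_distributed_mode_trace: bool = False
    dynamically_allocate_resources: bool = DYNAMICALLY_ALLOCATE_RESOURCES

# The module-level default can be overridden per compilation, mirroring how
# the real settings object is populated from compile() keyword arguments.
settings = CompilationSettingsSketch(dynamically_allocate_resources=True)
```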