tiling_optimization_level (str): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for a better tiling strategy. We currently support ["none", "fast", "moderate", "full"].
l2_limit_for_tiling (int): The target L2 cache usage limit (in bytes) for tiling optimization (default is -1, which means no limit).
use_distributed_mode_trace (bool): Use aot_autograd to trace the graph. This should be enabled when DTensors or distributed tensors are present in the distributed model.
+ cpu_memory_budget (int): The maximum amount of CPU memory to use for compilation. If compilation requires more memory than this budget, it will fail. If set to -1, compilation will use all available CPU memory.
**kwargs: Any,
Returns:
torch.fx.GraphModule: Compiled FX Module; when run, it executes via TensorRT
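A minimal usage sketch of the settings documented above, not part of the diff: they are passed as keyword arguments to torch_tensorrt.compile with the dynamo frontend. Exact kwarg availability depends on the installed torch_tensorrt version, and cpu_memory_budget is the option introduced by this change; the placeholder model and values below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch_tensorrt

# Placeholder model; any nn.Module that torch_tensorrt can trace will do.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU()).eval().cuda()
inputs = [torch.randn(1, 3, 224, 224).cuda()]

trt_gm = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=inputs,
    tiling_optimization_level="moderate",  # one of "none", "fast", "moderate", "full"
    l2_limit_for_tiling=-1,                # -1 means no L2 cache usage limit
    use_distributed_mode_trace=False,      # set True when DTensors / distributed tensors are present
    cpu_memory_budget=-1,                  # -1 means use all available CPU memory
)
```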
# The global partitioner leaves non-TRT nodes as-is
@@ -868,6 +882,16 @@ def preserve_module_specs(
# Iterate over all components that can be accelerated
# Generate the corresponding TRT Module for those
+
+ # Here we delete the frozen parameters from the graph module. Note that this does not affect the submodules; frozen parameters in the submodules are deleted later, in the convert_module function.
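A minimal sketch, not the library's actual code, of what deleting frozen parameters from only the top-level GraphModule could look like; the helper name _delete_frozen_params and the recurse=False criterion are assumptions for illustration, matching the comment's intent that submodule parameters are handled later in convert_module.

```python
import torch
from torch import fx

def _delete_frozen_params(gm: fx.GraphModule) -> None:
    # Hypothetical helper: drop parameters owned directly by the top-level
    # GraphModule (recurse=False), freeing host memory held by frozen weights.
    # Parameters living inside submodules are intentionally left untouched;
    # in the flow described above they are removed later, in convert_module.
    for name, _ in list(gm.named_parameters(recurse=False)):
        delattr(gm, name)
```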