Alternatively, you can launch training manually with `paddle.distributed.launch`, a built-in PaddlePaddle module that spawns multiple distributed training processes on each training node.
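As a minimal sketch of such a manual launch (the entry point `train.py` and its flags are placeholders, not the repository's confirmed script name):

```bash
# Spawn one training process per listed GPU on a single node.
# Replace train.py and its arguments with the repository's actual
# training script and options.
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" train.py --amp
```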
@@ -497,6 +497,7 @@ Advanced Training:
   --use-dynamic-loss-scaling
                         Enable dynamic loss scaling in AMP training, only be applied when --amp is set. (default: False)
   --use-pure-fp16       Enable pure FP16 training, only be applied when --amp is set. (default: False)
+  --fuse-resunit        Enable CUDNNv8 ResUnit fusion, only be applied when --amp is set. (default: False)
   --asp                 Enable automatic sparse training (ASP). (default: False)
   --prune-model         Prune model to 2:4 sparse pattern, only be applied when --asp is set. (default: False)
   --mask-algo {mask_1d,mask_2d_greedy,mask_2d_best}
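For reference, a hypothetical invocation combining the AMP-related options above might look like the following; the flags come from the help text in this hunk, while the script name `train.py` is an assumed placeholder:

```bash
# Hypothetical example: enable AMP with dynamic loss scaling, pure FP16,
# and the CUDNNv8 ResUnit fusion added in this change.
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" train.py \
    --amp \
    --use-dynamic-loss-scaling \
    --use-pure-fp16 \
    --fuse-resunit
```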
@@ -827,8 +828,8 @@ To achieve these same results, follow the steps in the [Quick Start Guide](#quic
 |**GPUs**|**Throughput - TF32**|**Throughput - mixed precision**|**Throughput speedup (TF32 to mixed precision)**|**TF32 Scaling**|**Mixed Precision Scaling**|**Mixed Precision Training Time (90E)**|**TF32 Training Time (90E)**|