docs/tutorials/blogs_publications.md (+3 −2)

```diff
@@ -1,8 +1,9 @@
 Blogs & Publications
 ====================
 
-* [Accelerate PyTorch\* INT8 Inference with New “X86” Quantization Backend on X86 CPUs](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-pytorch-int8-inf-with-new-x86-backend.html)
-* [Intel® Deep Learning Boost - Improve Inference Performance of BERT Base Model from Hugging Face for Network Security Technology Guide](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-improve-inference-performance-of-bert-base-model-from-hugging-face-for-network-security-technology-guide)
+* [Intel® Deep Learning Boost (Intel® DL Boost) - Improve Inference Performance of Hugging Face BERT Base Model in Google Cloud Platform (GCP) Technology Guide, Apr 2023](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-intel-dl-boost-improve-inference-performance-of-hugging-face-bert-base-model-in-google-cloud-platform-gcp-technology-guide)
+* [Get Started with Intel® Extension for PyTorch\* on GPU | Intel Software, Mar 2023](https://www.youtube.com/watch?v=Id-rE2Q7xZ0&t=1s)
+* [Accelerate PyTorch\* INT8 Inference with New “X86” Quantization Backend on X86 CPUs, Mar 2023](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-pytorch-int8-inf-with-new-x86-backend.html)
 * [Accelerating PyTorch Transformers with Intel Sapphire Rapids, Part 1, Jan 2023](https://huggingface.co/blog/intel-sapphire-rapids)
 * [Intel® Deep Learning Boost - Improve Inference Performance of BERT Base Model from Hugging Face for Network Security Technology Guide, Jan 2023](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-improve-inference-performance-of-bert-base-model-from-hugging-face-for-network-security-technology-guide)
 * [Scaling inference on CPUs with TorchServe, PyTorch Conference, Dec 2022](https://www.youtube.com/watch?v=066_Jd6cwZg)
```
docs/tutorials/examples.md (+1 −1)

```diff
@@ -312,4 +312,4 @@ $ ldd example-app
 
 ## Model Zoo
 
-Use cases that had already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r2.0-models). A bunch of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/pytorch-r2.0-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-box by simply running scipts in the Model Zoo.
+Use cases that have already been optimized by Intel engineers are available in the [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r2.0.100-models). A number of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/pytorch-r2.0.100-models/benchmarks#pytorch-use-cases). You can get out-of-the-box performance benefits by simply running the scripts in the Model Zoo.
```
docs/tutorials/features/hypertune.md (+3 −3)

````diff
@@ -95,15 +95,15 @@ This is the script as an optimization function.
 'target_val' # optional. Target value of the objective function. Default is -float('inf')
 ```
 
-Have a look at the [example script](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py).
+Have a look at the [example script](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py).
 
 ## Usage Examples
 
 **Tuning `ncores_per_instance` for minimum `latency`**
 
 Suppose we want to tune `ncores_per_instance` for a single instance to minimize latency for resnet50 on a machine with two Intel(R) Xeon(R) Platinum 8180M CPUs. Each socket has 28 physical cores and another 28 logical cores.
 
-Run the following command with [example.yaml](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/example.yaml) and [resnet50.py](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py):
+Run the following command with [example.yaml](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/example.yaml) and [resnet50.py](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py):
@@ … @@
 15 `ncores_per_instance` gave the minimum latency.
 
-You will also find the tuning history in `<output_dir>/record.csv`. You can take [a sample csv file](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/record.csv) as a reference.
+You will also find the tuning history in `<output_dir>/record.csv`. You can take [a sample csv file](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/record.csv) as a reference.
 
 Hypertune can also optimize multi-objective functions. Add as many objectives as you would like to your script.
````
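For orientation, the sketch below shows how a script used as an optimization function might measure a latency objective for resnet50. It is a hypothetical stand-in, not the shipped resnet50.py, and it assumes torchvision is installed; how the measured value is reported back to Hypertune (and the exact launcher invocation) should be taken from the linked example script and example.yaml.

```python
import time

import torch
import torchvision.models as models  # assumption: torchvision is available


def measure_latency_ms(n_warmup: int = 10, n_iters: int = 50) -> float:
    """Average forward-pass latency of resnet50 in milliseconds."""
    model = models.resnet50().eval()
    data = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        for _ in range(n_warmup):  # warm-up runs are excluded from timing
            model(data)
        start = time.time()
        for _ in range(n_iters):
            model(data)
    return (time.time() - start) / n_iters * 1000.0


if __name__ == "__main__":
    # For a multi-objective run, compute and report several such values.
    print(measure_latency_ms())
```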
To ensure a smooth compilation, a script is provided in the GitHub repo. If you would like to compile the binaries from source, using this script is highly recommended.
docs/tutorials/performance_tuning/known_issues.md (+11 −4)

````diff
@@ -7,6 +7,13 @@ Known Issues
 
 - If you find that a workload run with Intel® Extension for PyTorch\* occupies a remarkably large amount of memory, you can try to reduce the memory footprint by setting the `weights_prepack` parameter of the `ipex.optimize()` function to `False`.
 
+- If running DDP with the launch script, explicit configuration of the `nprocs_per_node` argument won't take effect. Please replace line 155 of the `intel_extension_for_pytorch/cpu/launch/launcher_distributed.py` file with the following code snippet.
+
+```
+if args.nprocs_per_node == 0:
+    args.nprocs_per_node = len(set([c.node for c in self.cpuinfo.pool_all])) if len(nodes_list) == 0 else len(nodes_list)
+```
+
 - If inference is done with a custom function, the `conv+bn` folding feature of the `ipex.optimize()` function doesn't work.
 
 ```
@@ -54,13 +61,13 @@
 - When working with NLP model inference with dynamic input data length applied with TorchScript (either `torch.jit.trace` or `torch.jit.script`), performance with Intel® Extension for PyTorch\* may be lower than without it. In this case, adding the workarounds below would help solve this issue.
@@ … @@
-Example script [resnet50.py](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/examples/cpu/inference/resnet50_general_inference_script.py) will be used in this guide.
+Example script [resnet50.py](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.100+cpu/examples/cpu/inference/resnet50_general_inference_script.py) will be used in this guide.
 
 - Single instance for inference
   - [I. Use all physical cores](#i-use-all-physical-cores)
````
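As an illustration of the first known issue above, here is a minimal sketch of the `weights_prepack` workaround. The model and input are placeholders; only the `weights_prepack=False` argument to `ipex.optimize()` is the point of the example.

```python
import torch
import intel_extension_for_pytorch as ipex

# Placeholder model and input; any torch.nn.Module follows the same pattern.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU()).eval()
data = torch.randn(32, 1024)

# Disabling weight prepacking trades some performance for a noticeably
# smaller memory footprint.
model = ipex.optimize(model, dtype=torch.float32, weights_prepack=False)

with torch.no_grad():
    output = model(data)
```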
docs/tutorials/releases.md (+17 −0)

```diff
@@ -1,6 +1,23 @@
 Releases
 =============
 
+## 2.0.100
+
+### Highlights
+
+- Enhanced the functionality of Intel® Extension for PyTorch as a backend of `torch.compile`: [#1568](https://github.com/intel/intel-extension-for-pytorch/commit/881c6fe0e6f8ab84a564b02216ddb96a3589363e) [#1585](https://github.com/intel/intel-extension-for-pytorch/commit/f5ce6193496ae68a57d688a3b3bbff541755e4ce) [#1590](https://github.com/intel/intel-extension-for-pytorch/commit/d8723df73358ae495ae5f62b5cdc90ae08920d27)
+- Fixed the Stable Diffusion fine-tuning accuracy issue [#1587](https://github.com/intel/intel-extension-for-pytorch/commit/bc76ab133b7330852931db9cda8dca7c69a0b594) [#1594](https://github.com/intel/intel-extension-for-pytorch/commit/b2983b4d35fc0ea7f5bdaf37f6e269256f8c36c4)
+- Fixed the ISA check on old hypervisor-based VMs [#1513](https://github.com/intel/intel-extension-for-pytorch/commit/a34eab577c4efa1c336b1f91768075bb490c1f14)
+- Addressed the excessive memory usage in weight prepack [#1593](https://github.com/intel/intel-extension-for-pytorch/commit/ee7dc343790d1d63bab1caf71e57dd3f7affdce9)
+- Fixed the weight prepack of convolution when `padding_mode` is not `'zeros'` [#1580](https://github.com/intel/intel-extension-for-pytorch/commit/02449ccb3a6b475643116532a4cffbe1f974c1d9)
+- Optimized the INT8 LSTM performance [#1566](https://github.com/intel/intel-extension-for-pytorch/commit/fed42b17391fed477ae8adec83d920f8f8fb1a80)
+- Fixed BF16 RNN-T inference when `AVX512_CORE_VNNI` ISA is used [#1592](https://github.com/intel/intel-extension-for-pytorch/commit/023c104ab5953cf63b84efeb5176007d876015a2)
+- Fixed the ROIAlign operator [#1589](https://github.com/intel/intel-extension-for-pytorch/commit/6beb3d4661f09f55d031628ebe9fa6d63f04cab1)
+- Enabled execution on designated numa nodes with the launch script [#1517](https://github.com/intel-innersource/frameworks.ai.pytorch.ipex-cpu/commit/2ab3693d50d6edd4bfae766f75dc273396a79488)
+
 We are pleased to announce the release of Intel® Extension for PyTorch\* 2.0.0-cpu which accompanies PyTorch 2.0. This release mainly brings in our latest optimizations for NLP (BERT), support for PyTorch 2.0's hero API `torch.compile` as one of its backends, together with a set of bug fixes and small optimizations.
```
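For context on the `torch.compile` items above, here is a minimal sketch of using the extension as a `torch.compile` backend. It assumes the backend is registered under the name `"ipex"`, as in the 2.0-era documentation; the model and input shapes are placeholders.

```python
import torch
import intel_extension_for_pytorch as ipex  # assumption: importing registers the "ipex" backend

# Placeholder model; any torch.nn.Module follows the same pattern.
model = torch.nn.Linear(512, 512).eval()
model = ipex.optimize(model)  # apply extension optimizations first

# Compile with the extension as the torch.compile backend.
compiled_model = torch.compile(model, backend="ipex")

with torch.no_grad():
    out = compiled_model(torch.randn(4, 512))
```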