docs/tutorials/blogs_publications.md (+3 −2)

```diff
@@ -1,8 +1,9 @@
 Blogs & Publications
 ====================
 
-* [Accelerate PyTorch\* INT8 Inference with New “X86” Quantization Backend on X86 CPUs](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-pytorch-int8-inf-with-new-x86-backend.html)
-* [Intel® Deep Learning Boost - Improve Inference Performance of BERT Base Model from Hugging Face for Network Security Technology Guide](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-improve-inference-performance-of-bert-base-model-from-hugging-face-for-network-security-technology-guide)
+* [Intel® Deep Learning Boost (Intel® DL Boost) - Improve Inference Performance of Hugging Face BERT Base Model in Google Cloud Platform (GCP) Technology Guide, Apr 2023](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-intel-dl-boost-improve-inference-performance-of-hugging-face-bert-base-model-in-google-cloud-platform-gcp-technology-guide)
+* [Get Started with Intel® Extension for PyTorch\* on GPU | Intel Software, Mar 2023](https://www.youtube.com/watch?v=Id-rE2Q7xZ0&t=1s)
+* [Accelerate PyTorch\* INT8 Inference with New “X86” Quantization Backend on X86 CPUs, Mar 2023](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-pytorch-int8-inf-with-new-x86-backend.html)
 * [Accelerating PyTorch Transformers with Intel Sapphire Rapids, Part 1, Jan 2023](https://huggingface.co/blog/intel-sapphire-rapids)
 * [Intel® Deep Learning Boost - Improve Inference Performance of BERT Base Model from Hugging Face for Network Security Technology Guide, Jan 2023](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-improve-inference-performance-of-bert-base-model-from-hugging-face-for-network-security-technology-guide)
 * [Scaling inference on CPUs with TorchServe, PyTorch Conference, Dec 2022](https://www.youtube.com/watch?v=066_Jd6cwZg)
```
docs/tutorials/examples.md (+1 −1)

```diff
@@ -312,4 +312,4 @@ $ ldd example-app
 
 ## Model Zoo
 
-Use cases that had already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r2.0-models). A bunch of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/pytorch-r2.0-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-box by simply running scipts in the Model Zoo.
+Use cases that have already been optimized by Intel engineers are available in the [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r2.0.100-models). A number of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/pytorch-r2.0.100-models/benchmarks#pytorch-use-cases). You can get out-of-the-box performance benefits by simply running the scripts in the Model Zoo.
```
docs/tutorials/features/hypertune.md (+3 −3)

````diff
@@ -95,15 +95,15 @@ This is the script as an optimization function.
 'target_val' # optional. Target value of the objective function. Default is -float('inf')
 ```
 
-Have a look at the [example script](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py).
+Have a look at the [example script](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py).
 
 ## Usage Examples
 
 **Tuning `ncores_per_instance` for minimum `latency`**
 
 Suppose we want to tune `ncores_per_instance` for a single instance to minimize latency for resnet50 on a machine with two Intel(R) Xeon(R) Platinum 8180M CPUs. Each socket has 28 physical cores and another 28 logical cores.
 
-Run the following command with [example.yaml](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/example.yaml) and [resnet50.py](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py):
+Run the following command with [example.yaml](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/example.yaml) and [resnet50.py](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py):
@@ … @@
 15 `ncores_per_instance` gave the minimum latency.
 
-You will also find the tuning history in `<output_dir>/record.csv`. You can take [a sample csv file](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/record.csv) as a reference.
+You will also find the tuning history in `<output_dir>/record.csv`. You can take [a sample csv file](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/record.csv) as a reference.
 
 Hypertune can also optimize multi-objective functions. Add as many objectives as you would like to your script.
````
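For orientation, the sketch below shows how a script used as an optimization function might measure a latency objective for resnet50. It is a hypothetical stand-in, not the shipped resnet50.py, and it assumes torchvision is installed; how the measured value is reported back to Hypertune (and the exact launcher invocation) should be taken from the linked example script and example.yaml.

```python
import time

import torch
import torchvision.models as models  # assumption: torchvision is available


def measure_latency_ms(n_warmup: int = 10, n_iters: int = 50) -> float:
    """Average forward-pass latency of resnet50 in milliseconds."""
    model = models.resnet50().eval()
    data = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        for _ in range(n_warmup):  # warm-up runs are excluded from timing
            model(data)
        start = time.time()
        for _ in range(n_iters):
            model(data)
    return (time.time() - start) / n_iters * 1000.0


if __name__ == "__main__":
    # For a multi-objective run, compute and report several such values.
    print(measure_latency_ms())
```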
To ensure a smooth compilation, a script is provided in the GitHub repo. If you would like to compile the binaries from source, using this script is highly recommended.
docs/tutorials/performance_tuning/known_issues.md (+11 −4)

````diff
@@ -7,6 +7,13 @@ Known Issues
 
 - If you find that a workload run with Intel® Extension for PyTorch\* occupies a remarkably large amount of memory, you can try to reduce the memory footprint by setting the `weights_prepack` parameter of the `ipex.optimize()` function to `False`.
 
+- If running DDP with the launch script, explicit configuration of the `nprocs_per_node` argument won't take effect. Please replace line 155 of the `intel_extension_for_pytorch/cpu/launch/launcher_distributed.py` file with the following code snippet.
+
+```
+if args.nprocs_per_node == 0:
+    args.nprocs_per_node = len(set([c.node for c in self.cpuinfo.pool_all])) if len(nodes_list) == 0 else len(nodes_list)
+```
+
 - If inference is done with a custom function, the `conv+bn` folding feature of the `ipex.optimize()` function doesn't work.
 
 ```
@@ -54,13 +61,13 @@
 - When working with NLP model inference with dynamic input data length applied with TorchScript (either `torch.jit.trace` or `torch.jit.script`), performance with Intel® Extension for PyTorch\* may be lower than without it. In this case, adding the workarounds below would help solve this issue.
@@ … @@
-Example script [resnet50.py](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/examples/cpu/inference/resnet50_general_inference_script.py) will be used in this guide.
+Example script [resnet50.py](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.100+cpu/examples/cpu/inference/resnet50_general_inference_script.py) will be used in this guide.
 
 - Single instance for inference
   - [I. Use all physical cores](#i-use-all-physical-cores)
````
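As an illustration of the first known issue above, here is a minimal sketch of the `weights_prepack` workaround. The model and input are placeholders; only the `weights_prepack=False` argument to `ipex.optimize()` is the point of the example.

```python
import torch
import intel_extension_for_pytorch as ipex

# Placeholder model and input; any torch.nn.Module follows the same pattern.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU()).eval()
data = torch.randn(32, 1024)

# Disabling weight prepacking trades some performance for a noticeably
# smaller memory footprint.
model = ipex.optimize(model, dtype=torch.float32, weights_prepack=False)

with torch.no_grad():
    output = model(data)
```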
docs/tutorials/releases.md (+17 −0)

```diff
@@ -1,6 +1,23 @@
 Releases
 =============
 
+## 2.0.100
+
+### Highlights
+
+- Enhanced the functionality of Intel® Extension for PyTorch as a backend of `torch.compile`: [#1568](https://github.com/intel/intel-extension-for-pytorch/commit/881c6fe0e6f8ab84a564b02216ddb96a3589363e) [#1585](https://github.com/intel/intel-extension-for-pytorch/commit/f5ce6193496ae68a57d688a3b3bbff541755e4ce) [#1590](https://github.com/intel/intel-extension-for-pytorch/commit/d8723df73358ae495ae5f62b5cdc90ae08920d27)
+- Fixed the Stable Diffusion fine-tuning accuracy issue [#1587](https://github.com/intel/intel-extension-for-pytorch/commit/bc76ab133b7330852931db9cda8dca7c69a0b594) [#1594](https://github.com/intel/intel-extension-for-pytorch/commit/b2983b4d35fc0ea7f5bdaf37f6e269256f8c36c4)
+- Fixed the ISA check on old hypervisor-based VMs [#1513](https://github.com/intel/intel-extension-for-pytorch/commit/a34eab577c4efa1c336b1f91768075bb490c1f14)
+- Addressed the excessive memory usage in weight prepack [#1593](https://github.com/intel/intel-extension-for-pytorch/commit/ee7dc343790d1d63bab1caf71e57dd3f7affdce9)
+- Fixed the weight prepack of convolution when `padding_mode` is not `'zeros'` [#1580](https://github.com/intel/intel-extension-for-pytorch/commit/02449ccb3a6b475643116532a4cffbe1f974c1d9)
+- Optimized the INT8 LSTM performance [#1566](https://github.com/intel/intel-extension-for-pytorch/commit/fed42b17391fed477ae8adec83d920f8f8fb1a80)
+- Fixed BF16 RNN-T inference when `AVX512_CORE_VNNI` ISA is used [#1592](https://github.com/intel/intel-extension-for-pytorch/commit/023c104ab5953cf63b84efeb5176007d876015a2)
+- Fixed the ROIAlign operator [#1589](https://github.com/intel/intel-extension-for-pytorch/commit/6beb3d4661f09f55d031628ebe9fa6d63f04cab1)
+- Enabled execution on designated numa nodes with the launch script [#1517](https://github.com/intel-innersource/frameworks.ai.pytorch.ipex-cpu/commit/2ab3693d50d6edd4bfae766f75dc273396a79488)
+
 We are pleased to announce the release of Intel® Extension for PyTorch\* 2.0.0-cpu which accompanies PyTorch 2.0. This release mainly brings in our latest optimizations for NLP (BERT), support for PyTorch 2.0's hero API `torch.compile` as one of its backends, together with a set of bug fixes and small optimizations.
```
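For context on the `torch.compile` items above, here is a minimal sketch of using the extension as a `torch.compile` backend. It assumes the backend is registered under the name `"ipex"`, as in the 2.0-era documentation; the model and input shapes are placeholders.

```python
import torch
import intel_extension_for_pytorch as ipex  # assumption: importing registers the "ipex" backend

# Placeholder model; any torch.nn.Module follows the same pattern.
model = torch.nn.Linear(512, 512).eval()
model = ipex.optimize(model)  # apply extension optimizations first

# Compile with the extension as the torch.compile backend.
compiled_model = torch.compile(model, backend="ipex")

with torch.no_grad():
    out = compiled_model(torch.randn(4, 512))
```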