docs/tutorials/performance_tuning/known_issues.md (+9 −15)
@@ -6,7 +6,7 @@ Troubleshooting
### General Usage
- **Problem**: FP64 data type is unsupported on the current platform.
- **Cause**: FP64 is not natively supported by the [Intel® Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/data-center-gpu/flex-series/overview.html) and [Intel® Arc™ A-Series Graphics](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/arc.html) platforms. If you run any AI workload on these platforms and receive this error message, it means a kernel requires FP64 instructions that are not supported, so the execution is stopped.
- **Problem**: Runtime error `invalid device pointer` when `import horovod.torch as hvd` is executed before `import intel_extension_for_pytorch`.
- **Cause**: Intel® Optimization for Horovod\* uses utilities provided by Intel® Extension for PyTorch\*. The improper import order causes Intel® Extension for PyTorch\* to be unloaded before Intel® Optimization for Horovod\*, which triggers this error.
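In practice the fix is just to place the `intel_extension_for_pytorch` import first. A minimal sketch of how one could verify the ordering in a running process; the `imported_in_order` helper is hypothetical, written here only to illustrate the rule:

```python
import sys

# Required order in the training script (both packages assumed installed):
#   import intel_extension_for_pytorch
#   import horovod.torch as hvd

def imported_in_order(first: str, second: str) -> bool:
    """True if `second` was never imported, or `first` was imported earlier.

    sys.modules preserves insertion order on CPython 3.7+, so its key
    order reflects the order in which modules were first imported.
    """
    mods = list(sys.modules)
    if second not in mods:
        return True
    return first in mods and mods.index(first) < mods.index(second)

print(imported_in_order("intel_extension_for_pytorch", "horovod.torch"))  # → True
```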
@@ -25,9 +25,9 @@ Troubleshooting
- **Solution**: Pass `export GLIBCXX_USE_CXX11_ABI=1` and compile PyTorch\* with a compiler that supports `_GLIBCXX_USE_CXX11_ABI=1`. We recommend using prebuilt wheels from the [download server](https://developer.intel.com/ipex-whl-stable-xpu) to avoid this issue.
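Before rebuilding, it can help to confirm which C++ ABI the installed PyTorch\* wheel was compiled with. A small sketch: `torch._C._GLIBCXX_USE_CXX11_ABI` is a standard attribute of PyTorch builds, and the import is guarded in case torch is absent from the current environment:

```python
def torch_cxx11_abi() -> str:
    """Report the C++ ABI setting of the installed torch build, if any."""
    try:
        import torch
        return str(torch._C._GLIBCXX_USE_CXX11_ABI)
    except (ImportError, AttributeError):
        return "torch not installed"

print("torch CXX11 ABI:", torch_cxx11_abi())
```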
- **Problem**: Bad termination after AI model execution finishes when using Intel MPI.
- **Cause**: This is a random issue occurring when the AI model (e.g. RN50 training) execution finishes in an Intel MPI environment. The model execution ends ungracefully. It has been fixed in PyTorch\* 2.3 ([#116312](https://github.com/pytorch/pytorch/commit/f657b2b1f8f35aa6ee199c4690d38a2b460387ae)).
- **Solution**: Add `dist.destroy_process_group()` during the cleanup stage in the model script, as described in [Getting Started with Distributed Data Parallel](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html), until Intel® Extension for PyTorch\* supports PyTorch\* 2.3.
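A minimal sketch of that cleanup pattern, reduced to a single process with the `gloo` backend purely so the snippet is self-contained; a real Intel MPI launch would set the backend, rank, and world size differently:

```python
import os
import socket

def run_and_clean_up() -> str:
    """Init a process group, (train), then destroy it during cleanup."""
    try:
        import torch.distributed as dist
    except ImportError:
        return "torch not installed"
    if not dist.is_available():
        return "distributed not available"
    # Pick a free local port so the sketch does not clash with other jobs.
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        os.environ["MASTER_PORT"] = str(s.getsockname()[1])
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    try:
        dist.init_process_group("gloo", rank=0, world_size=1)
    except (RuntimeError, ValueError):
        return "gloo backend not available"
    # ... model training / inference would run here ...
    dist.destroy_process_group()  # the explicit cleanup recommended above
    return "clean shutdown"

print(run_and_clean_up())
```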
- **Problem**: `-997 runtime error` when running some AI models on Intel® Arc™ A-Series GPUs.
- **Cause**: Some of the `-997 runtime error` occurrences are actually out-of-memory errors. As Intel® Arc™ A-Series GPUs have less device memory than Intel® Data Center GPU Flex Series 170 and Intel® Data Center GPU Max Series, running some AI models on them may trigger out-of-memory errors, most likely reported as a failure such as `-997 runtime error`. This is expected. Memory usage optimization is a work in progress to allow Intel® Arc™ A-Series GPUs to support more AI models.
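Until that optimization lands, the usual workaround for out-of-memory failures is to shrink the batch size. A rough, hypothetical sizing helper; the function and all numbers are illustrative, not measured values from any specific model:

```python
def max_batch_size(device_mb: int, model_mb: int, per_sample_mb: float) -> int:
    """Largest batch whose rough memory footprint fits the device budget."""
    free = device_mb - model_mb
    return max(int(free // per_sample_mb), 0)

# Example: 8 GB Arc GPU, ~2 GB of weights/workspace, ~150 MB per sample.
print(max_batch_size(8192, 2048, 150))  # → 40
```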
@@ -38,21 +38,13 @@ Troubleshooting
- **Problem**: Some workloads terminate with an error `CL_DEVICE_NOT_FOUND` after some time on WSL2.
- **Cause**: This issue is due to the [TDR feature](https://learn.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys#tdrdelay) on Windows.
- **Solution**: Try increasing `TdrDelay` in your Windows Registry to a large value, such as 20 (the default is 2 seconds), and reboot.
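On Windows this can be done from an elevated command prompt. A sketch using the key and value named in the linked TDR documentation (back up your registry first, and reboot afterwards):

```shell
# Raise TdrDelay to 20 seconds (REG_DWORD, default is 2). Run as Administrator.
reg add "HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" /v TdrDelay /t REG_DWORD /d 20 /f
```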
- **Problem**: Random bad termination after an AI model convergence test (>24 hours) finishes.
- **Cause**: This is a random issue occurring when some AI model convergence test executions finish. The model execution ends ungracefully.
- **Solution**: Kill the process after the convergence test finishes, or use checkpoints to divide the convergence test into several phases and execute each phase separately.
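A minimal sketch of the checkpoint-phasing idea: the file name and epoch counts are illustrative, and real training code replaces the placeholder loop body:

```python
import json
import os

CKPT = "convergence_ckpt.json"       # illustrative checkpoint file
TOTAL_EPOCHS, PHASE_EPOCHS = 12, 4   # illustrative phase sizes

def run_phase() -> int:
    """Run one phase, persisting the last finished epoch to the checkpoint."""
    start = 0
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            start = json.load(f)["epoch"]
    end = min(start + PHASE_EPOCHS, TOTAL_EPOCHS)
    for _epoch in range(start, end):
        pass  # one epoch of real training would run here
    with open(CKPT, "w") as f:
        json.dump({"epoch": end}, f)
    return end

if os.path.exists(CKPT):
    os.remove(CKPT)                  # start the demo from scratch
print(run_phase())                   # → 4 (first phase ends at epoch 4)
```

Each invocation of the script advances one phase and exits cleanly, so an ungraceful termination only loses the current phase.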
- **Problem**: Random instability issues, such as page faults or atomic access violations, when executing LLM inference workloads on Intel® Data Center GPU Max Series cards.
- **Cause**: This issue is reported on LTS driver [803.29](https://dgpu-docs.intel.com/releases/LTS_803.29_20240131.html). The root cause is under investigation.
- **Solution**: Use the active rolling stable release driver [775.20](https://dgpu-docs.intel.com/releases/stable_775_20_20231219.html) or the latest driver version as a workaround.
### Library Dependencies
- **Problem**: Cannot find oneMKL library when building Intel® Extension for PyTorch\* without oneMKL.
@@ -104,6 +96,8 @@ Troubleshooting
The following unit test fails on Intel® Data Center GPU Flex Series 170 but the same testcase passes on Intel® Data Center GPU Max Series. The root cause of the failure is under investigation.
docs/tutorials/releases.md (+25 −0)
@@ -1,6 +1,31 @@
Releases
=============

## 2.1.20+xpu
Intel® Extension for PyTorch\* v2.1.20+xpu is a minor release which supports Intel® GPU platforms (Intel® Data Center GPU Flex Series, Intel® Data Center GPU Max Series and Intel® Arc™ A-Series Graphics) based on PyTorch\* 2.1.0.

### Highlights
- Intel® oneAPI Base Toolkit 2024.1 compatibility
- Intel® oneDNN v3.4 integration
- LLM inference scaling optimization based on Intel® oneCCL 2021.12 (Prototype)
- Bug fixing and other optimization
  - Uplift XeTLA to v0.3.4.1 [#3696](https://github.com/intel/intel-extension-for-pytorch/commit/dc0f6d39739404d38226ccf444c421706f14f2de)
  - [SDP] Fallback unsupported bias size to native impl [#3706](https://github.com/intel/intel-extension-for-pytorch/commit/d897ebd585da05a90295165584efc448e265a38d)
  - Fix beam search accuracy issue in workgroup reduce [#3796](https://github.com/intel/intel-extension-for-pytorch/commit/f2f20a523ee85ed1f44c7fa6465b8e5e1e2edfea)
  - Support int32 index tensor in index operator [#3808](https://github.com/intel/intel-extension-for-pytorch/commit/f7bb4873c0416a9f56d1f7ecfbcdbe7ad58b47cd)
  - Add deepspeed in LLM dockerfile [#3829](https://github.com/intel/intel-extension-for-pytorch/commit/6266f89833f8010d6c683f9b45cfb2031575ad92)
  - Fix windows build failure with Intel® oneMKL 2024.1 in torch_patches [#18](https://github.com/intel/intel-extension-for-pytorch/blob/release/xpu/2.1.20/torch_patches/0018-use-ONEMKL_LIBRARIES-for-mkl-libs-in-torch-to-not-ov.patch)
  - Fix FFT core dump issue with Intel® oneMKL 2024.1 in torch_patches [#20](https://github.com/intel/intel-extension-for-pytorch/blob/release/xpu/2.1.20/torch_patches/0020-Hide-MKL-symbols-211-212.patch), [#21](https://github.com/intel/intel-extension-for-pytorch/blob/release/xpu/2.1.20/torch_patches/0021-Fix-Windows-Build-214-215.patch)
### Known Issues
Please refer to [Known Issues webpage](./performance_tuning/known_issues.md).
## 2.1.10+xpu
Intel® Extension for PyTorch\* v2.1.10+xpu is the new Intel® Extension for PyTorch\* release which supports both CPU platforms and GPU platforms (Intel® Data Center GPU Flex Series, Intel® Data Center GPU Max Series and Intel® Arc™ A-Series Graphics) based on PyTorch\* 2.1.0. It extends PyTorch\* 2.1.0 with up-to-date features and optimizations on `xpu` for an extra performance boost on Intel hardware. Optimizations take advantage of AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs as well as Intel Xe Matrix Extensions (XMX) AI engines on Intel discrete GPUs. Moreover, through the PyTorch\* `xpu` device, Intel® Extension for PyTorch\* provides easy GPU acceleration for Intel discrete GPUs with PyTorch\*.