Skip to content

Commit 716d786

Browse files
authored
Update Release Note and Known Issues (#3976)
1 parent 54b8171 commit 716d786

File tree

2 files changed

+34
-15
lines changed

2 files changed

+34
-15
lines changed

docs/tutorials/performance_tuning/known_issues.md

Lines changed: 9 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@ Troubleshooting
66
### General Usage
77

88
- **Problem**: FP64 data type is unsupported on current platform.
9-
- **Cause**: FP64 is not natively supported by the [Intel® Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/data-center-gpu/flex-series/overview.html) platform.
9+
- **Cause**: FP64 is not natively supported by the [Intel® Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/data-center-gpu/flex-series/overview.html) and [Intel® Arc™ A-Series Graphics](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/arc.html) platforms.
1010
If you run any AI workload on that platform and receive this error message, it means a kernel requires FP64 instructions that are not supported and the execution is stopped.
1111
- **Problem**: Runtime error `invalid device pointer` if `import horovod.torch as hvd` before `import intel_extension_for_pytorch`
1212
- **Cause**: Intel® Optimization for Horovod\* uses utilities provided by Intel® Extension for PyTorch\*. The improper import order causes Intel® Extension for PyTorch\* to be unloaded before Intel®
@@ -25,9 +25,9 @@ Troubleshooting
2525
- **Solution**: Pass `export GLIBCXX_USE_CXX11_ABI=1` and compile PyTorch\* with particular compiler which supports `_GLIBCXX_USE_CXX11_ABI=1`. We recommend using prebuilt wheels
2626
in [download server](https:// developer.intel.com/ipex-whl-stable-xpu) to avoid this issue.
2727
- **Problem**: Bad termination after AI model execution finishes when using Intel MPI.
28-
- **Cause**: This is a random issue when the AI model (e.g. RN50 training) execution finishes in an Intel MPI environment. It is not user-friendly as the model execution ends ungracefully.
28+
- **Cause**: This is a random issue when the AI model (e.g. RN50 training) execution finishes in an Intel MPI environment. It is not user-friendly as the model execution ends ungracefully. It has been fixed in PyTorch* 2.3 ([#116312](https://github.com/pytorch/pytorch/commit/f657b2b1f8f35aa6ee199c4690d38a2b460387ae)).
2929
- **Solution**: Add `dist.destroy_process_group()` during the cleanup stage in the model script, as described
30-
in [Getting Started with Distributed Data Parallel](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html).
30+
in [Getting Started with Distributed Data Parallel](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html), before Intel® Extension for PyTorch* supports PyTorch* 2.3.
3131
- **Problem**: `-997 runtime error` when running some AI models on Intel® Arc™ A-Series GPUs.
3232
- **Cause**: Some of the `-997 runtime error` are actually out-of-memory errors. As Intel® Arc™ A-Series GPUs have less device memory than Intel® Data Center GPU Flex Series 170 and Intel® Data Center GPU
3333
Max Series, running some AI models on them may trigger out-of-memory errors and cause them to report failure such as `-997 runtime error` most likely. This is expected. Memory usage optimization is a work in progress to allow Intel® Arc™ A-Series GPUs to support more AI models.
@@ -38,21 +38,13 @@ Troubleshooting
3838
- **Problem**: Some workloads terminate with an error `CL_DEVICE_NOT_FOUND` after some time on WSL2.
3939
- **Cause**: This issue is due to the [TDR feature](https://learn.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys#tdrdelay) on Windows.
4040
- **Solution**: Try increasing TDRDelay in your Windows Registry to a large value, such as 20 (it is 2 seconds, by default), and reboot.
41-
- **Problem**: Runtime error `Unable to find TSan function` might be raised when running some CPU AI workloads in certain scenarios.
42-
- **Cause**: This issue is probably caused by the compatibility issue of OMP tool libraries.
43-
- **Solution**: Please try the workaround: disable OMP tool libraries by `export OMP_TOOL="disabled"`, to unblock your workload. We are working on the final solution and will release it as soon as possible.
44-
- **Problem**: The profiled data on GPU operators using legacy profiler is not accurate sometimes.
45-
- **Cause**: Compiler in 2024.0 oneAPI basekit optimizes barrier implementation which brings negative impact on legacy profiler.
46-
- **Solution**: Use Kineto profiler instead. Or use legacy profiler with `export UR_L0_IN_ORDER_BARRIER_BY_SIGNAL=0` to workaround this issue.
4741
- **Problem**: Random bad termination after AI model convergence test (>24 hours) finishes.
4842
- **Cause**: This is a random issue when some AI model convergence test execution finishes. It is not user-friendly as the model execution ends ungracefully.
4943
- **Solution**: Kill the process after the convergence test finished, or use checkpoints to divide the convergence test into several phases and execute separately.
50-
- **Problem**: Random GPU hang issue when executing the first allreduce in LLM inference workloads on 1 Intel® Data Center GPU Max 1550 card.
51-
- **Cause**: Race condition happens between oneDNN kernels and oneCCL Bindings for Pytorch\* allreduce primitive.
52-
- **Solution**: Use `TORCH_LLM_ALLREDUCE=0` to workaround this issue.
53-
- **Problem**: GPU hang issue when executing LLM inference workloads on multi Intel® Data Center GPU Max series cards over PCIe communication.
54-
- **Cause**: oneCCL Bindings for Pytorch\* allreduce primitive does not support PCIe for cross-cards communication.
55-
- **Solution**: Enable XeLink for cross-cards communication, or use `TORCH_LLM_ALLREDUCE=0` for the PCIe only environments.
44+
- **Problem**: Random instability issues such as page fault or atomic access violation when executing LLM inference workloads on Intel® Data Center GPU Max series cards.
45+
- **Cause**: This issue is reported on LTS driver [803.29](https://dgpu-docs.intel.com/releases/LTS_803.29_20240131.html). The root cause is under investigation.
46+
- **Solution**: Use active rolling stable release driver [775.20](https://dgpu-docs.intel.com/releases/stable_775_20_20231219.html) or latest driver version to workaround.
47+
5648
### Library Dependencies
5749
5850
- **Problem**: Cannot find oneMKL library when building Intel® Extension for PyTorch\* without oneMKL.
@@ -104,6 +96,8 @@ Troubleshooting
10496
The following unit test fails on Intel® Data Center GPU Flex Series 170 but the same test case passes on Intel® Data Center GPU Max Series. The root cause of the failure is under investigation.
10597
- `test_weight_norm.py::TestNNMethod::test_weight_norm_differnt_type`
10698

99+
- Unit test failures on Intel® Arc™ A770
100+
107101
The following unit tests fail in Windows environment on Intel® Arc™ A770 Graphic card. The root cause of the failures is under investigation.
108102
- `test_foreach.py::TestTorchMethod::test_foreach_cos`
109103
- `test_foreach.py::TestTorchMethod::test_foreach_sin`

docs/tutorials/releases.md

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,31 @@
11
Releases
22
=============
33

4+
## 2.1.20+xpu
5+
6+
Intel® Extension for PyTorch\* v2.1.20+xpu is a minor release which supports Intel® GPU platforms (Intel® Data Center GPU Flex Series, Intel® Data Center GPU Max Series and Intel® Arc™ A-Series Graphics) based on PyTorch\* 2.1.0.
7+
8+
### Highlights
9+
10+
- Intel® oneAPI Base Toolkit 2024.1 compatibility
11+
- Intel® oneDNN v3.4 integration
12+
- LLM inference scaling optimization based on Intel® oneCCL 2021.12 (Prototype)
13+
- Bug fixing and other optimization
14+
- Uplift XeTLA to v0.3.4.1 [#3696](https://github.com/intel/intel-extension-for-pytorch/commit/dc0f6d39739404d38226ccf444c421706f14f2de)
15+
- [SDP] Fallback unsupported bias size to native impl [#3706](https://github.com/intel/intel-extension-for-pytorch/commit/d897ebd585da05a90295165584efc448e265a38d)
16+
- Error handling enhancement [#3788](https://github.com/intel/intel-extension-for-pytorch/commit/bd034e7a37822f84706f0068ec85d989fb766529), [#3841](https://github.com/intel/intel-extension-for-pytorch/commit/7d4f297ecb4c076586a22908ecadf4689cb2d5ef)
17+
- Fix beam search accuracy issue in workgroup reduce [#3796](https://github.com/intel/intel-extension-for-pytorch/commit/f2f20a523ee85ed1f44c7fa6465b8e5e1e2edfea)
18+
- Support int32 index tensor in index operator [#3808](https://github.com/intel/intel-extension-for-pytorch/commit/f7bb4873c0416a9f56d1f7ecfbcdbe7ad58b47cd)
19+
- Add deepspeed in LLM dockerfile [#3829](https://github.com/intel/intel-extension-for-pytorch/commit/6266f89833f8010d6c683f9b45cfb2031575ad92)
20+
- Fix batch norm accuracy issue [#3882](https://github.com/intel/intel-extension-for-pytorch/commit/a1e2271717ff61dc3ea7d8d471c2356b3e469b93)
21+
- Prebuilt wheel dockerfile update [#3887](https://github.com/intel/intel-extension-for-pytorch/commit/8d5d71522910c1f622dac6a52cb0025e469774b2#diff-022fb5910f470cc5c44ab38cb20586d014f37c06ac8f3378e146ed35ee202a46), [#3970](https://github.com/intel/intel-extension-for-pytorch/commit/54b8171940cd694ba91c928c99acc440c9993881)
22+
- Fix windows build failure with Intel® oneMKL 2024.1 in torch_patches [#18](https://github.com/intel/intel-extension-for-pytorch/blob/release/xpu/2.1.20/torch_patches/0018-use-ONEMKL_LIBRARIES-for-mkl-libs-in-torch-to-not-ov.patch)
23+
- Fix FFT core dump issue with Intel® oneMKL 2024.1 in torch_patches [#20](https://github.com/intel/intel-extension-for-pytorch/blob/release/xpu/2.1.20/torch_patches/0020-Hide-MKL-symbols-211-212.patch), [#21](https://github.com/intel/intel-extension-for-pytorch/blob/release/xpu/2.1.20/torch_patches/0021-Fix-Windows-Build-214-215.patch)
24+
25+
### Known Issues
26+
27+
Please refer to [Known Issues webpage](./performance_tuning/known_issues.md).
28+
429
## 2.1.10+xpu
530

631
Intel® Extension for PyTorch\* v2.1.10+xpu is the new Intel® Extension for PyTorch\* release supports both CPU platforms and GPU platforms (Intel® Data Center GPU Flex Series, Intel® Data Center GPU Max Series and Intel® Arc™ A-Series Graphics) based on PyTorch\* 2.1.0. It extends PyTorch\* 2.1.0 with up-to-date features and optimizations on `xpu` for an extra performance boost on Intel hardware. Optimizations take advantage of AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs as well as Intel Xe Matrix Extensions (XMX) AI engines on Intel discrete GPUs. Moreover, through PyTorch* `xpu` device, Intel® Extension for PyTorch* provides easy GPU acceleration for Intel discrete GPUs with PyTorch*.

0 commit comments

Comments
 (0)