Commit 6b577a2

Authored by ZhaoqiongZ, zhuyuhua-v, tye1, and jingxu10

update documentation for v2.1.10+xpu (#3594)

* Update arguments for LLM running scripts (#3581)
  Signed-off-by: zhuyuhua-v <yuhua.zhu@intel.com>
* update LLM README
* add LLM Optimization Methodology
  Signed-off-by: Wu Hui H <hui.h.wu@intel.com>
* Update releases.md and known_issues.md
  Signed-off-by: Ye Ting <ting.ye@intel.com>
* Add feature descriptions for new features in features.rst
* Fix image links / installation links

Signed-off-by: zhuyuhua-v <yuhua.zhu@intel.com>
Co-authored-by: zhuyuhua-v <yuhua.zhu@intel.com>
Co-authored-by: Ye Ting <ting.ye@intel.com>
Co-authored-by: Jing Xu <jing.xu@intel.com>
1 parent 7672412 commit 6b577a2

29 files changed: +518 −139 lines

README.md

Lines changed: 12 additions & 5 deletions
````diff
@@ -14,6 +14,8 @@ Intel® Extension for PyTorch\* provides optimizations for both eager mode and g
 The extension can be loaded as a Python module for Python programs or linked as a C++ library for C++ programs. In Python scripts users can enable it dynamically by importing `intel_extension_for_pytorch`.
 
+In the current technological landscape, Generative AI (GenAI) workloads and models have gained widespread attention and popularity. Large Language Models (LLMs) have emerged as the dominant models driving these GenAI applications. Starting from 2.1.0, specific optimizations for certain LLMs are introduced in the Intel® Extension for PyTorch\*.
+
 * Check [CPU tutorial](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/) for detailed information of Intel® Extension for PyTorch\* for Intel® CPUs. Source code is available at the [main branch](https://github.com/intel/intel-extension-for-pytorch/tree/main).
 * Check [GPU tutorial](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/) for detailed information of Intel® Extension for PyTorch\* for Intel® GPUs. Source code is available at the [xpu-main branch](https://github.com/intel/intel-extension-for-pytorch/tree/xpu-main).
@@ -24,29 +26,34 @@ The extension can be loaded as a Python module for Python programs or linked as
 You can use either of the following 2 commands to install Intel® Extension for PyTorch\* CPU version.
 
 ```bash
-python -m pip install intel_extension_for_pytorch
-python -m pip install intel_extension_for_pytorch -f https://developer.intel.com/ipex-whl-stable-cpu
+python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
+python -m pip install intel-extension-for-pytorch --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
+# PRC users can use the following index instead
+python -m pip install intel-extension-for-pytorch --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/cn/
 ```
 
 **Note:** Intel® Extension for PyTorch\* has PyTorch version requirement. Please check more detailed information via the URL below.
 
 More installation methods can be found at [CPU Installation Guide](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/installation.html).
 
-Compilation instruction of the latest CPU code base `main` branch can be found at [Installation Guide](https://github.com/intel/intel-extension-for-pytorch/blob/main/docs/tutorials/installation.md#install-via-compiling-from-source).
+Compilation instructions for the latest CPU code base (`main` branch) can be found in the `source` package section of the [Installation Guide](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/installation.html).
 
 ### GPU version
 
 You can install Intel® Extension for PyTorch\* for GPU via command below.
 
 ```bash
-python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 intel_extension_for_pytorch==2.1.10+xpu -f https://developer.intel.com/ipex-whl-stable-xpu
+python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+# PRC users can use the following index instead
+python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
+
 ```
 
 **Note:** The patched PyTorch 2.1.0 is required to work with Intel® Extension for PyTorch\* on Intel® graphics card for now.
 
 More installation methods can be found at [GPU Installation Guide](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/installation.html).
 
-Compilation instruction of the latest GPU code base `xpu-main` branch can be found at [Installation Guide For Linux/WSL2](https://github.com/intel/intel-extension-for-pytorch/blob/xpu-main/docs/tutorials/installations/linux.rst#install-via-compiling-from-source) and [Installation Guide For Windows](https://github.com/intel/intel-extension-for-pytorch/blob/xpu-main/docs/tutorials/installations/windows.rst#install-via-compiling-from-source).
+Compilation instructions for the latest GPU code base (`xpu-main` branch) can be found in the `source` package section of the [Installation Guide](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/installation.html).
 
 ## Getting Started
````
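The new install commands can be sanity-checked afterwards by importing the package. A minimal sketch (the version strings in the comments are what this release should report; `torch.xpu` is only registered by GPU builds):

```python
# Post-install sanity check (sketch; assumes one of the pip commands above succeeded).
import torch
import intel_extension_for_pytorch as ipex  # GPU builds register the `xpu` device on import

print(torch.__version__)  # e.g. 2.1.0a0 for the GPU build
print(ipex.__version__)   # e.g. 2.1.10+xpu

if hasattr(torch, "xpu") and torch.xpu.is_available():
    print(torch.xpu.get_device_name(0))
```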

New binary image files:

- … (146 KB)
- docs/images/llm/llm_iakv_2.png (31.1 KB)
- docs/images/llm/llm_kvcache.png (32.1 KB)
- … (30.7 KB)

docs/index.rst

Lines changed: 2 additions & 1 deletion
```diff
@@ -10,7 +10,7 @@ Optimizations take advantage of Intel® Advanced Vector Extensions 512 (Intel®
 Moreover, Intel® Extension for PyTorch* provides easy GPU acceleration for Intel discrete GPUs through the PyTorch* ``xpu`` device.
 
 In the current technological landscape, Generative AI (GenAI) workloads and models have gained widespread attention and popularity. Large Language Models (LLMs) have emerged as the dominant models driving these GenAI applications. Starting from 2.1.0, specific optimizations for certain
-LLM models are introduced in the Intel® Extension for PyTorch*. For more information on LLM optimizations, refer to the `Large Language Models (LLM) <llm.html>`_ section.
+Large Language Models (LLMs) are introduced in the Intel® Extension for PyTorch*. For more information on LLM optimizations, refer to the `Large Language Models (LLMs) <./tutorials/llm.html>`_ section.
 
 The extension can be loaded as a Python module for Python programs or linked as a C++ library for C++ programs. In Python scripts, users can enable it dynamically by importing ``intel_extension_for_pytorch``.
@@ -58,6 +58,7 @@ The team tracks bugs and enhancement requests using `GitHub issues <https://gith
 
    tutorials/introduction
   tutorials/features
+   Large Language Models (LLM) <tutorials/llm>
   tutorials/technical_details
   tutorials/releases
   tutorials/performance_tuning/known_issues
```

docs/tutorials/api_doc.rst

Lines changed: 5 additions & 0 deletions
```diff
@@ -43,6 +43,11 @@ Miscellaneous
 .. autofunction:: quantization._gptq
 .. autofunction:: fp8_autocast
 
+.. currentmodule:: intel_extension_for_pytorch.xpu.fp8.fp8
+.. autofunction:: fp8_autocast
+.. currentmodule:: intel_extension_for_pytorch.quantization
+.. autofunction:: _gptq
+
 Random Number Generator
 =======================
```
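The directives added above publish the XPU FP8 autocast context manager and the GPTQ helper under their full module paths. A hypothetical usage sketch for the FP8 entry point (the `enabled` keyword is an assumption; consult the generated API documentation for the real signature):

```python
# Hypothetical sketch of the newly documented XPU FP8 autocast API.
import torch
import intel_extension_for_pytorch as ipex  # registers the xpu device
from intel_extension_for_pytorch.xpu.fp8.fp8 import fp8_autocast

model = torch.nn.Linear(64, 64).to("xpu")
inp = torch.randn(16, 64, device="xpu")

with fp8_autocast(enabled=True):  # `enabled` is an assumed keyword
    out = model(inp)
```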

docs/tutorials/contribution.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -16,7 +16,7 @@ Once you implement and test your feature or bug-fix, submit a Pull Request to ht
 ## Developing Intel® Extension for PyTorch\* on XPU
 
-A full set of instructions on installing Intel® Extension for PyTorch\* from source is in the [Installation document](installation.md#install-via-source-compilation).
+A full set of instructions on installing Intel® Extension for PyTorch\* from source is in the [Installation document](../../../index.html#installation?platform=gpu&version=v2.1.10%2Bxpu).
 
 To develop on your machine, here are some tips:
```

docs/tutorials/examples.md

Lines changed: 24 additions & 14 deletions
```diff
@@ -187,6 +187,15 @@ The example code below works for all data types.
 
 ### Basic Usage
 
+**Download and Install cppsdk**
+
+Make sure you have downloaded and installed the cppsdk from the [installation page](https://intel.github.io/intel-extension-for-pytorch/index.html#installation) before compiling the C++ code:
+
+1. Go to the [installation page](https://intel.github.io/intel-extension-for-pytorch/index.html#installation).
+2. Select the desired Platform, Version, and OS.
+3. In the Package field, select cppsdk.
+4. Follow the instructions on the cppsdk installation page to download and install the cppsdk into libtorch.
+
 **example-app.cpp**
 
 [//]: # (marker_cppsdk_sample_app)
```
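The sample source itself is injected at the marker above when the documentation is built. For orientation, a minimal stand-in for what such an `example-app.cpp` contains (a sketch only, not the repository's actual sample; `torch::kXPU` assumes the cppsdk registers the XPU device type):

```cpp
// Minimal example-app.cpp sketch (not the repository's actual sample).
#include <torch/torch.h>
#include <iostream>

int main() {
  // Allocate on CPU, then move to the XPU device provided by the cppsdk.
  torch::Tensor input = torch::randn({2, 3}).to(torch::kXPU);
  torch::Tensor output = input * 2;
  std::cout << output.to(torch::kCPU) << std::endl;  // copy back before printing
  return 0;
}
```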
````diff
@@ -206,21 +215,22 @@
 $ cd build
 $ CC=icx CXX=icpx cmake -DCMAKE_PREFIX_PATH=<LIBPYTORCH_PATH> ..
 $ make
 ```
+
+`<LIBPYTORCH_PATH>` is the absolute path of the libtorch installed in the first step.
 
 If *Found IPEX* is shown as dynamic library paths, the extension was linked into the binary. This can be verified with the Linux command *ldd*.
 
 ```bash
 $ CC=icx CXX=icpx cmake -DCMAKE_PREFIX_PATH=/workspace/libtorch ..
--- The C compiler identification is IntelLLVM 2023.2.0
--- The CXX compiler identification is IntelLLVM 2023.2.0
+-- The C compiler identification is IntelLLVM 2024.0.0
+-- The CXX compiler identification is IntelLLVM 2024.0.0
 -- Detecting C compiler ABI info
 -- Detecting C compiler ABI info - done
--- Check for working C compiler: /workspace/intel/oneapi/compiler/2023.2.0/linux/bin/icx - skipped
+-- Check for working C compiler: /workspace/intel/oneapi/compiler/2024.0.0/linux/bin/icx - skipped
 -- Detecting C compile features
 -- Detecting C compile features - done
 -- Detecting CXX compiler ABI info
 -- Detecting CXX compiler ABI info - done
--- Check for working CXX compiler: /workspace/intel/oneapi/compiler/2023.2.0/linux/bin/icpx - skipped
+-- Check for working CXX compiler: /workspace/intel/oneapi/compiler/2024.0.0/linux/bin/icpx - skipped
 -- Detecting CXX compile features
 -- Detecting CXX compile features - done
 -- Looking for pthread.h
@@ -242,16 +252,16 @@ $ ldd example-app
     libintel-ext-pt-cpu.so => /workspace/libtorch/lib/libintel-ext-pt-cpu.so (0x00007fd5a1a1b000)
     libintel-ext-pt-gpu.so => /workspace/libtorch/lib/libintel-ext-pt-gpu.so (0x00007fd5862b0000)
     ...
-    libmkl_intel_lp64.so.2 => /workspace/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_intel_lp64.so.2 (0x00007fd584ab0000)
-    libmkl_core.so.2 => /workspace/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_core.so.2 (0x00007fd5806cc000)
-    libmkl_gnu_thread.so.2 => /workspace/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_gnu_thread.so.2 (0x00007fd57eb1d000)
-    libmkl_sycl.so.3 => /workspace/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_sycl.so.3 (0x00007fd55512c000)
-    libOpenCL.so.1 => /workspace/intel/oneapi/compiler/2023.2.0/linux/lib/libOpenCL.so.1 (0x00007fd55511d000)
-    libsvml.so => /workspace/intel/oneapi/compiler/2023.2.0/linux/compiler/lib/intel64_lin/libsvml.so (0x00007fd553b11000)
-    libirng.so => /workspace/intel/oneapi/compiler/2023.2.0/linux/compiler/lib/intel64_lin/libirng.so (0x00007fd553600000)
-    libimf.so => /workspace/intel/oneapi/compiler/2023.2.0/linux/compiler/lib/intel64_lin/libimf.so (0x00007fd55321b000)
-    libintlc.so.5 => /workspace/intel/oneapi/compiler/2023.2.0/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x00007fd553a9c000)
-    libsycl.so.6 => /workspace/intel/oneapi/compiler/2023.2.0/linux/lib/libsycl.so.6 (0x00007fd552f36000)
+    libmkl_intel_lp64.so.2 => /workspace/intel/oneapi/mkl/2024.0.0/lib/intel64/libmkl_intel_lp64.so.2 (0x00007fd584ab0000)
+    libmkl_core.so.2 => /workspace/intel/oneapi/mkl/2024.0.0/lib/intel64/libmkl_core.so.2 (0x00007fd5806cc000)
+    libmkl_gnu_thread.so.2 => /workspace/intel/oneapi/mkl/2024.0.0/lib/intel64/libmkl_gnu_thread.so.2 (0x00007fd57eb1d000)
+    libmkl_sycl.so.3 => /workspace/intel/oneapi/mkl/2024.0.0/lib/intel64/libmkl_sycl.so.3 (0x00007fd55512c000)
+    libOpenCL.so.1 => /workspace/intel/oneapi/compiler/2024.0.0/linux/lib/libOpenCL.so.1 (0x00007fd55511d000)
+    libsvml.so => /workspace/intel/oneapi/compiler/2024.0.0/linux/compiler/lib/intel64_lin/libsvml.so (0x00007fd553b11000)
+    libirng.so => /workspace/intel/oneapi/compiler/2024.0.0/linux/compiler/lib/intel64_lin/libirng.so (0x00007fd553600000)
+    libimf.so => /workspace/intel/oneapi/compiler/2024.0.0/linux/compiler/lib/intel64_lin/libimf.so (0x00007fd55321b000)
+    libintlc.so.5 => /workspace/intel/oneapi/compiler/2024.0.0/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x00007fd553a9c000)
+    libsycl.so.6 => /workspace/intel/oneapi/compiler/2024.0.0/linux/lib/libsycl.so.6 (0x00007fd552f36000)
     ...
 ```
````

docs/tutorials/features.rst

Lines changed: 55 additions & 3 deletions
```diff
@@ -50,16 +50,17 @@ Intel® Extension for PyTorch* provides built-in INT8 quantization recipes to de
 
 Check more detailed information for `INT8 Quantization [CPU] <features/int8_overview.md>`_ and `INT8 recipe tuning API guide (Experimental, *NEW feature in 1.13.0* on CPU) <features/int8_recipe_tuning_api.md>`_ on CPU side.
 
-On Intel® GPUs, quantization usages follow PyTorch default quantization APIs. Check sample codes at `Examples <./examples.html#int8>`_ page.
+Check more detailed information for `INT8 Quantization [XPU] <features/int8_overview_xpu.md>`_.
 
-Intel® Extension for PyTorch* also provides INT4 and FP8 Quantization. Check more detailed information for `FP8 Quantization <./features/float8.md>`_ and `INT4 Quantization <./features/int4.md>`_
+On Intel® GPUs, Intel® Extension for PyTorch* also provides INT4 and FP8 Quantization. Check more detailed information for `FP8 Quantization <./features/float8.md>`_ and `INT4 Quantization <./features/int4.md>`_
 
 .. toctree::
    :hidden:
   :maxdepth: 1
 
    features/int8_overview
   features/int8_recipe_tuning_api
+   features/int8_overview_xpu
   features/int4
   features/float8
```

```diff
@@ -108,20 +109,45 @@ Check the `API Documentation`_ for the details of API functions. `DPC++ Extensio
 
    features/DPC++_Extension
 
-
 Advanced Configuration
 ----------------------
 
 The default settings for Intel® Extension for PyTorch* are sufficient for most use cases. However, if you need to customize Intel® Extension for PyTorch*, advanced configuration is available at build time and runtime.
 
 For more detailed information, check `Advanced Configuration <features/advanced_configuration.md>`_.
 
+A driver environment variable `ZE_FLAT_DEVICE_HIERARCHY` is currently used to select the device hierarchy model with which the underlying hardware is exposed. By default, each GPU tile is used as a device. Check the `Level Zero Specification Documentation <https://spec.oneapi.io/level-zero/latest/core/PROG.html#environment-variables>`_ for more details.
+
 .. toctree::
    :hidden:
   :maxdepth: 1
 
    features/advanced_configuration
```
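To make the new paragraph concrete: the hierarchy model can be chosen per process, as long as the variable is set before the Level Zero driver initializes. A sketch (the values `FLAT`, `COMPOSITE`, and `COMBINED` are the ones defined in the linked Level Zero spec; verify against that page):

```python
# Sketch: select the Level Zero device hierarchy model before any XPU use.
import os

# "COMPOSITE" exposes each card as one device; the default "FLAT" exposes
# each tile as a device. Must be set before the first XPU query in this process.
os.environ["ZE_FLAT_DEVICE_HIERARCHY"] = "COMPOSITE"

import torch
import intel_extension_for_pytorch as ipex  # noqa: F401, registers the xpu device

print(torch.xpu.device_count())
```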
```diff
+Fully Sharded Data Parallel (FSDP)
+----------------------------------
+
+`Fully Sharded Data Parallel (FSDP)` is a PyTorch\* module that provides an industry-grade solution for large model training. FSDP is a type of data-parallel training; unlike DDP, where each process/worker maintains a replica of the model, FSDP shards model parameters, optimizer states and gradients across DDP ranks to reduce the GPU memory footprint used in training. This makes the training of some large-scale models feasible.
+
+For more detailed information, check `FSDP <features/FSDP.md>`_.
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+
+   features/FSDP
+
```
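A minimal launch sketch for FSDP on the `xpu` device (assumptions: a oneCCL-backed process group provided by the separate `oneccl_bindings_for_pytorch` package and one rank per tile; the FSDP feature page linked above has the authoritative example):

```python
# FSDP-on-XPU sketch (assumes oneccl_bindings_for_pytorch provides the "ccl" backend).
import os
import torch
import torch.distributed as dist
import intel_extension_for_pytorch as ipex  # noqa: F401, registers the xpu device
import oneccl_bindings_for_pytorch          # noqa: F401, registers the "ccl" backend
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

rank = int(os.environ["LOCAL_RANK"])        # set by the launcher, e.g. mpirun or torchrun
dist.init_process_group(backend="ccl")
torch.xpu.set_device(rank)

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 1024)
).to(f"xpu:{rank}")

# Parameters, gradients and optimizer state are sharded across the ranks.
model = FSDP(model, device_id=torch.device(f"xpu:{rank}"))
```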
```diff
+Inductor
+--------
+
+Intel® Extension for PyTorch\* now lets users harness graph compilation for optimal PyTorch model performance on Intel GPU via the flagship `torch.compile <https://pytorch.org/docs/stable/generated/torch.compile.html#torch-compile>`_ API through the default "inductor" backend (`TorchInductor <https://dev-discuss.pytorch.org/t/torchinductor-a-pytorch-native-compiler-with-define-by-run-ir-and-symbolic-shapes/747/1>`_).
+
+For more detailed information, check `Inductor <features/torch_compile_gpu.md>`_.
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+
+   features/torch_compile_gpu
 
 Legacy Profiler Tool (Experimental)
 -----------------------------------
```
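What this enables, in a sketch (standard `torch.compile` usage; the only extension-specific piece is importing it so the `xpu` device and the GPU Inductor support are registered):

```python
# torch.compile on the xpu device (sketch).
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401, registers xpu + GPU Inductor support

model = torch.nn.Linear(128, 128).to("xpu")
compiled = torch.compile(model)  # default "inductor" backend

x = torch.randn(64, 128, device="xpu")
with torch.no_grad():
    y = compiled(x)
```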
```diff
@@ -149,6 +175,32 @@ For more detailed information, check `Simple Trace Tool <features/simple_trace.m
 
    features/simple_trace
 
+Kineto Supported Profiler Tool (Experimental)
+---------------------------------------------
+
+The Kineto supported profiler tool is an extension of the PyTorch\* profiler for profiling operators' execution time cost on GPU devices. With this tool, you can get profiling information on many fields of the models or code scripts you run. Intel® Extension for PyTorch\* is built with Kineto support by default; enable the tool by wrapping the code segment to be profiled in a `with` statement.
+
+For more detailed information, check `Profiler Kineto <features/profiler_kineto.md>`_.
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+
+   features/profiler_kineto
+
```
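A sketch of the `with`-statement usage described above (assumption: this build exposes an XPU activity through the stock `torch.profiler` API; check the Profiler Kineto page for the exact names):

```python
# Kineto profiler sketch; ProfilerActivity.XPU is an assumed name for this build.
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401
from torch.profiler import profile, ProfilerActivity

x = torch.randn(1024, 1024, device="xpu")
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.XPU]) as prof:
    y = x @ x
    torch.xpu.synchronize()

print(prof.key_averages().table())
```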
```diff
+
+Compute Engine (Experimental feature for debug)
+-----------------------------------------------
+
+Compute engine is an experimental feature that provides the ability to choose a specific backend for operators that have multiple implementations.
+
+For more detailed information, check `Compute Engine <features/compute_engine.md>`_.
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+
+   features/compute_engine
 
 CPU-Specific
 ************
```
