
Commit e91e80b

Remove cpu doc (#3923)
1 parent 833e441 commit e91e80b

24 files changed, +89 −2430 lines changed
[4 binary image files changed: 59.9 KB, 48.7 KB, 58.4 KB, 44.7 KB]

docs/index.rst

Lines changed: 2 additions & 10 deletions

@@ -27,7 +27,7 @@ Intel® Extension for PyTorch* has been released as an open–source project at
 You can find more information about the product at:
 
 - `Features <https://intel.github.io/intel-extension-for-pytorch/gpu/latest/tutorials/features>`_
-- `Performance <https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/performance>`_
+- `Performance <./tutorials/performance.html>`_
 
 Architecture
 ------------
@@ -59,6 +59,7 @@ The team tracks bugs and enhancement requests using `GitHub issues <https://gith
    tutorials/introduction
    tutorials/features
    Large Language Models (LLM)<tutorials/llm>
+   tutorials/performance
    tutorials/technical_details
    tutorials/releases
    tutorials/performance_tuning/known_issues
@@ -82,15 +83,6 @@ The team tracks bugs and enhancement requests using `GitHub issues <https://gith
 
    tutorials/api_doc
 
-.. toctree::
-   :maxdepth: 3
-   :caption: PERFORMANCE TUNING
-   :hidden:
-
-   tutorials/performance_tuning/tuning_guide
-   tutorials/performance_tuning/launch_script
-   tutorials/performance_tuning/torchserve
-
 .. toctree::
    :maxdepth: 3
    :caption: CONTRIBUTING GUIDE

docs/tutorials/api_doc.rst

Lines changed: 0 additions & 36 deletions

@@ -105,39 +105,3 @@ C++ API
 
 .. doxygenfunction:: xpu::get_queue_from_stream
 
-
-CPU-Specific
-************
-
-Miscellaneous
-=============
-
-.. currentmodule:: intel_extension_for_pytorch
-.. autofunction:: enable_onednn_fusion
-
-Quantization (Prototype)
-========================
-
-.. automodule:: intel_extension_for_pytorch.quantization
-.. autofunction:: prepare
-.. autofunction:: convert
-
-Introduction is available at `feature page <./features/int8_recipe_tuning_api.md>`_.
-
-.. autofunction:: autotune
-
-CPU Runtime
-===========
-
-.. automodule:: intel_extension_for_pytorch.cpu.runtime
-.. autofunction:: is_runtime_ext_enabled
-.. autoclass:: CPUPool
-.. autoclass:: pin
-.. autoclass:: MultiStreamModuleHint
-.. autoclass:: MultiStreamModule
-.. autoclass:: Task
-.. autofunction:: get_core_list_of_node_id
-
-.. .. automodule:: intel_extension_for_pytorch.quantization
-..    :members:

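For reference, the `prepare`/`convert` entry points whose docs are removed above followed PyTorch's post-training static quantization flow on CPU. A minimal sketch of how earlier releases documented their use (the model, calibration loop, and shapes are illustrative placeholders, not code from this commit):

```python
import torch
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert

# Placeholder FP32 model and calibration sample.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
example_input = torch.rand(1, 64)

# Insert observers using the default static quantization recipe.
qconfig = ipex.quantization.default_static_qconfig
prepared = prepare(model, qconfig, example_inputs=example_input, inplace=False)

# Calibrate with representative data so the observers record value ranges.
with torch.no_grad():
    for _ in range(10):
        prepared(example_input)

# Convert to an INT8 model, then trace and freeze it for deployment.
quantized = convert(prepared)
with torch.no_grad():
    traced = torch.jit.trace(quantized, example_input)
    traced = torch.jit.freeze(traced)
```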
docs/tutorials/features.rst

Lines changed: 5 additions & 88 deletions

@@ -29,26 +29,22 @@ Intel® Extension for PyTorch* automatically converts a model to channels last m
 Auto Mixed Precision (AMP)
 --------------------------
 
-Benefiting from less memory usage and computation, low precision data types typically speed up both training and inference workloads. Furthermore, accelerated by Intel® native hardware instructions, including Intel® Deep Learning Boost (Intel® DL Boost) on the 3rd Generation Intel® Xeon® Scalable Processors (aka Cooper Lake) and the Intel® Advanced Matrix Extensions (Intel® AMX) instruction set on the 4th Generation Intel® Xeon® Scalable Processors (aka Sapphire Rapids), the low precision data types BFloat16 and Float16 provide further performance boosts. We recommend using AMP to accelerate convolution- and matmul-based neural networks.
+Benefiting from less memory usage and computation, low precision data types typically speed up both training and inference workloads.
+On the GPU side, both BFloat16 and Float16 are supported in Intel® Extension for PyTorch\*. BFloat16 is the default low precision floating point data type when AMP is enabled.
 
-The support of Auto Mixed Precision (AMP) with `BFloat16 on CPU <https://www.intel.com/content/www/us/en/developer/articles/technical/intel-deep-learning-boost-new-instruction-bfloat16.html>`_ and BFloat16 optimization of operators has been enabled in Intel® Extension for PyTorch\*, and partially upstreamed to the PyTorch master branch. These optimizations will land in PyTorch master through PRs that are being submitted and reviewed. On the GPU side, both BFloat16 and Float16 are supported in Intel® Extension for PyTorch\*. BFloat16 is the default low precision floating point data type when AMP is enabled.
-
-Detailed information on AMP for GPU and CPU is available at `Auto Mixed Precision (AMP) on GPU <features/amp_gpu.md>`_ and `Auto Mixed Precision (AMP) on CPU <features/amp_cpu.md>`_ respectively.
+Detailed information on AMP for GPU is available at `Auto Mixed Precision (AMP) on GPU <features/amp_gpu.md>`_.
 
 .. toctree::
    :hidden:
    :maxdepth: 1
 
-   features/amp_cpu
    features/amp_gpu
 
 
 Quantization
 ------------
 
-Intel® Extension for PyTorch* provides built-in INT8 quantization recipes to deliver good statistical accuracy for most popular DL workloads, including CNN, NLP and recommendation models, on the CPU side. On top of that, if users would like to tune for higher accuracy than the default recipe provides, a recipe tuning API powered by Intel® Neural Compressor is available for users to try.
-
-Check more detailed information for `INT8 Quantization [CPU] <features/int8_overview.md>`_ and the `INT8 recipe tuning API guide (Prototype, *NEW feature in 1.13.0* on CPU) <features/int8_recipe_tuning_api.md>`_ on the CPU side.
+Intel® Extension for PyTorch* currently supports imperative mode and TorchScript mode for post-training static quantization on GPU. This section illustrates the quantization workflow on Intel GPUs.
 
 Check more detailed information for `INT8 Quantization [XPU] <features/int8_overview_xpu.md>`_.
 
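Stepping back to the AMP change above: the retained text says BFloat16 is the default low precision data type on XPU when AMP is enabled. A minimal inference sketch of that behavior, assuming a machine where the `xpu` device is available (the model is a placeholder):

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device

model = torch.nn.Linear(64, 64).eval().to("xpu")  # placeholder model
data = torch.rand(8, 64).to("xpu")

model = ipex.optimize(model, dtype=torch.bfloat16)

# BFloat16 is the default low precision dtype for AMP on XPU.
with torch.no_grad(), torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    output = model(data)
```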
@@ -58,8 +54,6 @@ On Intel® GPUs, Intel® Extension for PyTorch* also provides INT4 and FP8 Quant
    :hidden:
    :maxdepth: 1
 
-   features/int8_overview
-   features/int8_recipe_tuning_api
    features/int8_overview_xpu
    features/int4
    features/float8
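
The GPU quantization workflow referenced above (imperative mode, post-training static) builds on PyTorch's eager quantization API. A rough sketch under that assumption; the model, observer choices, and calibration loop are illustrative, and `int8_overview_xpu.md` remains the authoritative guide:

```python
import torch
import intel_extension_for_pytorch  # noqa: F401  (enables the "xpu" device)

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
model = torch.quantization.QuantWrapper(model).to("xpu")

# Per-tensor symmetric quantization config; illustrative observer choices.
model.qconfig = torch.quantization.QConfig(
    activation=torch.quantization.observer.MinMaxObserver.with_args(
        qscheme=torch.per_tensor_symmetric, dtype=torch.quint8
    ),
    weight=torch.quantization.default_weight_observer,
)

torch.quantization.prepare(model, inplace=True)

# Calibration pass on representative data.
with torch.no_grad():
    for _ in range(10):
        model(torch.rand(1, 64).to("xpu"))

torch.quantization.convert(model, inplace=True)
```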
@@ -68,7 +62,7 @@ On Intel® GPUs, Intel® Extension for PyTorch* also provides INT4 and FP8 Quant
 Distributed Training
 --------------------
 
-To meet demands of large scale model training over multiple devices, distributed training on Intel® GPUs and CPUs is supported. Two alternative methodologies are available. Users can choose either to use the PyTorch native distributed training module, `Distributed Data Parallel (DDP) <https://pytorch.org/docs/stable/notes/ddp.html>`_, with `Intel® oneAPI Collective Communications Library (oneCCL) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html>`_ support via `Intel® oneCCL Bindings for PyTorch (formerly known as torch_ccl) <https://github.com/intel/torch-ccl>`_, or to use Horovod with `Intel® oneAPI Collective Communications Library (oneCCL) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html>`_ support (Prototype).
+To meet demands of large scale model training over multiple devices, distributed training on Intel® GPUs is supported. Two alternative methodologies are available. Users can choose either to use the PyTorch native distributed training module, `Distributed Data Parallel (DDP) <https://pytorch.org/docs/stable/notes/ddp.html>`_, with `Intel® oneAPI Collective Communications Library (oneCCL) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html>`_ support via `Intel® oneCCL Bindings for PyTorch (formerly known as torch_ccl) <https://github.com/intel/torch-ccl>`_, or to use Horovod with `Intel® oneAPI Collective Communications Library (oneCCL) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html>`_ support (Prototype).
 
 For more detailed information, check `DDP <features/DDP.md>`_ and `Horovod (Prototype) <features/horovod.md>`_.
 
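A minimal DDP-over-oneCCL sketch following the pattern the DDP page describes; the rendezvous values and the MPI-style rank/size environment variables (`PMI_RANK`, `PMI_SIZE`) are assumptions about how the script is launched:

```python
import os
import torch
import torch.distributed as dist
import intel_extension_for_pytorch  # noqa: F401  (enables the "xpu" device)
import oneccl_bindings_for_pytorch  # noqa: F401  (registers the "ccl" backend)
from torch.nn.parallel import DistributedDataParallel as DDP

# Rendezvous settings; in practice these come from the launcher.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
rank = int(os.environ.get("PMI_RANK", 0))
world_size = int(os.environ.get("PMI_SIZE", 1))

dist.init_process_group("ccl", rank=rank, world_size=world_size)

# One XPU tile per process; wrap the model so gradients sync over oneCCL.
device = f"xpu:{rank}"
model = torch.nn.Linear(16, 16).to(device)
model = DDP(model, device_ids=[device])
```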
@@ -202,81 +196,4 @@ For more detailed information, check `Compute Engine <features/compute_engine.md
 
    features/compute_engine
 
-CPU-Specific
-************
-
-Operator Optimization
----------------------
-
-Intel® Extension for PyTorch* also optimizes operators and implements several customized operators for performance boosts. A few ATen operators are replaced by their optimized counterparts in Intel® Extension for PyTorch* via the ATen registration mechanism. Some customized operators are implemented for several popular topologies; for instance, ROIAlign and NMS are defined in Mask R-CNN. To improve performance of these topologies, Intel® Extension for PyTorch* also optimized these customized operators.
-
-.. currentmodule:: intel_extension_for_pytorch.nn
-.. autoclass:: FrozenBatchNorm2d
-
-.. currentmodule:: intel_extension_for_pytorch.nn.functional
-.. autofunction:: interaction
-
-**Auto kernel selection** is a feature that enables users to tune for better performance with GEMM operations. It is provided as the boolean parameter ``auto_kernel_selection`` of the ``ipex.optimize()`` function. By default, GEMM kernels are computed with oneMKL primitives; under certain circumstances, oneDNN primitives run faster, and setting ``auto_kernel_selection`` to ``True`` runs GEMM kernels with oneDNN primitives instead. We aim to provide good default performance by leveraging the best of the math libraries and enabling ``weights_prepack``, and this has been verified with a broad set of models. If you would like to try other alternatives, use the ``auto_kernel_selection`` toggle in ``ipex.optimize`` to switch, and disable ``weights_prepack`` in ``ipex.optimize`` if the memory footprint concerns you more than the performance gain. In the majority of cases, keeping the defaults is recommended.
-
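These removed lines describe tuning knobs of ``ipex.optimize`` on CPU. A minimal sketch of how the two parameters named above were toggled (the model is a placeholder):

```python
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Linear(1024, 1024).eval()  # placeholder FP32 model

# Default: oneMKL GEMM kernels, weight prepacking enabled.
optimized = ipex.optimize(model)

# Alternatives described above: allow oneDNN GEMM kernels, and trade the
# prepacking speedup for a smaller memory footprint.
tuned = ipex.optimize(
    model,
    auto_kernel_selection=True,  # let GEMM run on oneDNN primitives
    weights_prepack=False,       # reduce memory footprint
)
```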
-Runtime Extension
------------------
-
-Intel® Extension for PyTorch* Runtime Extension provides PyTorch frontend APIs that give users finer-grained control of the thread runtime:
-
-- Multi-stream inference via the Python frontend module MultiStreamModule.
-- Spawn asynchronous tasks from both the Python and C++ frontends.
-- Program core bindings for OpenMP threads from both the Python and C++ frontends.
-
-.. note:: Intel® Extension for PyTorch* Runtime Extension is still in the prototype stage. The API is subject to change. More detailed descriptions are available in the `API Documentation <api_doc.html>`_.
-
-For more detailed information, check `Runtime Extension <features/runtime_extension.md>`_.
-
-.. toctree::
-   :hidden:
-   :maxdepth: 1
-
-   features/runtime_extension
-
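The removed Runtime Extension section documented the classes listed in api_doc.rst above. A minimal multi-stream inference sketch under those names (the model, core IDs, and stream count are illustrative; the API was prototype-stage and subject to change):

```python
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Linear(64, 64).eval()  # placeholder model
traced = torch.jit.trace(model, torch.rand(4, 64)).eval()

# Pin a pool of physical cores, then fan inference out over 2 streams;
# each stream runs on its own share of the pool's cores.
cpu_pool = ipex.cpu.runtime.CPUPool(core_ids=[0, 1, 2, 3])
multi_stream_model = ipex.cpu.runtime.MultiStreamModule(
    traced, num_streams=2, cpu_pool=cpu_pool
)

with torch.no_grad():
    y = multi_stream_model(torch.rand(4, 64))
```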
-Codeless Optimization (Prototype, *NEW feature in 1.13.\**)
---------------------------------------------------------------
-
-This feature enables users to get performance benefits from Intel® Extension for PyTorch* without changing their Python scripts. It eases usage and has been verified to work well with a broad scope of models, though in a few cases there could be a small overhead compared to applying optimizations with Intel® Extension for PyTorch* APIs directly.
-
-For more detailed information, check `Codeless Optimization <features/codeless_optimization.md>`_.
-
-.. toctree::
-   :hidden:
-   :maxdepth: 1
-
-   features/codeless_optimization.md
-
-
-Graph Capture (Prototype, *NEW feature in 1.13.0\**)
--------------------------------------------------------
-
-Since graph mode is key for deployment performance, this feature automatically captures graphs based on the set of technologies that PyTorch supports, such as TorchScript and TorchDynamo. Users won't need to learn and try different PyTorch APIs to capture graphs; instead, they can turn on the new boolean flag ``graph_mode`` (default off) in ``ipex.optimize`` to get the best of graph optimization.
-
-For more detailed information, check `Graph Capture <features/graph_capture.md>`_.
-
-.. toctree::
-   :hidden:
-   :maxdepth: 1
-
-   features/graph_capture
-
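The ``graph_mode`` flag described above maps to a parameter of ``ipex.optimize``. A rough sketch under that assumption (the model is a placeholder; depending on the release, the call may also want example inputs for tracing):

```python
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Linear(64, 64).eval()  # placeholder FP32 model

# Let IPEX pick the graph capture path (TorchScript, TorchDynamo, ...)
# instead of tracing or scripting the model by hand.
optimized = ipex.optimize(model, graph_mode=True)

with torch.no_grad():
    y = optimized(torch.rand(1, 64))
```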
-HyperTune (Prototype, *NEW feature in 1.13.0\**)
----------------------------------------------------
-
-HyperTune is a prototype feature for hyperparameter and execution-configuration searching. Such searching is used in various areas, such as optimization of the hyperparameters of deep learning models. It is extremely useful in real situations, where the number of hyperparameters, including the configuration of script execution, and their search spaces are so large that manually tuning them is impractical and time-consuming. HyperTune automates this process of execution-configuration searching for the `launcher <performance_tuning/launch_script.md>`_ and Intel® Extension for PyTorch*.
-
-For more detailed information, check `HyperTune <features/hypertune.md>`_.
-
-.. toctree::
-   :hidden:
-   :maxdepth: 1
-
-   features/hypertune
 

docs/tutorials/features/codeless_optimization.md

Lines changed: 0 additions & 107 deletions
This file was deleted.

docs/tutorials/features/graph_capture.md

Lines changed: 0 additions & 12 deletions
This file was deleted.
