
Commit 211813b

update llm readme doc (#2585)

* update llm overview part per Guobing
* remove redundant period
* correct modelzoo version; add note for Falcon dist. inference availability with ds v0.13.1
* updates for r2.2
* misc updates
* add qconfig download links; correct optimized model scope
* update model optimized scope desc. and baichuan modelID
* update model support status for bloom1b7, baichuan13b, opt30b
* update optimized model tables
* update specific argument changes for individual models in INT8 WOQ deepspeed inf.
* update model scope table in mainpage README.md; update expression and table in llm/README.md
* trial for new model scope table
* update neox dist. support scope and recipe
* add qconfig download link for 2 more models
* revert model table in llm.rst; will update by another pr
* update llm.rst

1 parent 31f5ced commit 211813b

File tree

9 files changed: +334 -163 lines changed

README.md

Lines changed: 28 additions & 23 deletions
@@ -21,36 +21,41 @@ In the current technological landscape, Generative AI (GenAI) workloads and mode

| MODEL FAMILY | MODEL NAME (Huggingface hub) | FP32 | BF16 | Static quantization INT8 | Weight only quantization INT8 | Weight only quantization INT4 |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
-|LLAMA| meta-llama/Llama-2-7b-hf ||||| ☑️ |
-|LLAMA| meta-llama/Llama-2-13b-hf ||||| ☑️ |
-|LLAMA| meta-llama/Llama-2-70b-hf ||||| ☑️ |
-|GPT-J| EleutherAI/gpt-j-6b ||||||
-|GPT-NEOX| EleutherAI/gpt-neox-20b ||| ☑️ || ☑️ |
-|DOLLY| databricks/dolly-v2-12b ||| ☑️ | ☑️ | ☑️ |
-|FALCON| tiiuae/falcon-40b ||||||
-|OPT| facebook/opt-30b |||| | ☑️ |
-|OPT| facebook/opt-1.3b ||||| ☑️ |
-|Bloom| bigscience/bloom-1b7 || ☑️ || | ☑️ |
-|CodeGen| Salesforce/codegen-2B-multi ||| ☑️ |||
-|Baichuan| baichuan-inc/Baichuan2-7B-Chat ||||| |
-|Baichuan| baichuan-inc/Baichuan2-13B-Chat ||| || |
-|Baichuan| baichuan-inc/Baichuan-13B-Chat || ☑️ || | |
-|ChatGLM| THUDM/chatglm3-6b ||| ☑️ || |
-|ChatGLM| THUDM/chatglm2-6b || ☑️ | ☑️ | ☑️ | |
-|GPTBigCode| bigcode/starcoder ||| ☑️ || ☑️ |
-|T5| google/flan-t5-xl ||| ☑️ || |
-|Mistral| mistralai/Mistral-7B-v0.1 ||| ☑️ || ☑️ |
-|MPT| mosaicml/mpt-7b ||| ☑️ |||
-
-*Note*: All above models have undergone thorough optimization and verification processes for both performance and accuracy. In the context of the optimized model list table above, the symbol ✅ signifies that the model can achieve an accuracy drop of less than 1% when using a specific data type compared to FP32, whereas the accuracy drop may exceed 1% for ☑️ marked ones. We are working in progress to better support the models in the table with various data types. In addition, more models will be optimized, which will expand the table.
+|LLAMA| meta-llama/Llama-2-7b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟨 |
+|LLAMA| meta-llama/Llama-2-13b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟨 |
+|LLAMA| meta-llama/Llama-2-70b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟨 |
+|GPT-J| EleutherAI/gpt-j-6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
+|GPT-NEOX| EleutherAI/gpt-neox-20b | 🟩 | 🟨 | 🟨 | 🟩 | 🟨 |
+|DOLLY| databricks/dolly-v2-12b | 🟩 | 🟨 | 🟨 | 🟩 | 🟨 |
+|FALCON| tiiuae/falcon-40b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
+|OPT| facebook/opt-30b | 🟩 | 🟩 | 🟩 | 🟩 | 🟨 |
+|OPT| facebook/opt-1.3b | 🟩 | 🟩 | 🟩 | 🟩 | 🟨 |
+|Bloom| bigscience/bloom-1b7 | 🟩 | 🟨 | 🟩 | 🟩 | 🟨 |
+|CodeGen| Salesforce/codegen-2B-multi | 🟩 | 🟩 | 🟨 | 🟩 | 🟩 |
+|Baichuan| baichuan-inc/Baichuan2-7B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | |
+|Baichuan| baichuan-inc/Baichuan2-13B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | |
+|Baichuan| baichuan-inc/Baichuan-13B-Chat | 🟩 | 🟨 | 🟩 | 🟩 | |
+|ChatGLM| THUDM/chatglm3-6b | 🟩 | 🟩 | 🟨 | 🟩 | |
+|ChatGLM| THUDM/chatglm2-6b | 🟩 | 🟩 | 🟨 | 🟩 | |
+|GPTBigCode| bigcode/starcoder | 🟩 | 🟩 | 🟨 | 🟩 | 🟨 |
+|T5| google/flan-t5-xl | 🟩 | 🟩 | 🟨 | 🟩 | |
+|Mistral| mistralai/Mistral-7B-v0.1 | 🟩 | 🟩 | 🟨 | 🟩 | 🟨 |
+|MPT| mosaicml/mpt-7b | 🟩 | 🟩 | 🟨 | 🟩 | 🟩 |
+
+- 🟩 signifies that the model can perform well and with good accuracy (<1% difference compared with FP32).
+
+- 🟨 signifies that the model can perform well while accuracy may not be in a perfect state (>1% difference compared with FP32).
+
+*Note*: The above verified models (including other models in the same model family, like "codellama/CodeLlama-7b-hf" from the LLAMA family) are well supported with all optimizations like indirect access KV cache, fused ROPE, and prepacked TPP Linear (fp32/bf16).
+We are continuously working to improve support for the models in the tables with various data types. In addition, more models will be optimized in the future.

## Support

The team tracks bugs and enhancement requests using [GitHub issues](https://github.com/intel/intel-extension-for-pytorch/issues/). Before submitting a suggestion or bug report, search the existing GitHub issues to see if your issue has already been reported.

## Intel® AI Reference Models

-Use cases that had already been optimized by Intel engineers are available at [Intel® AI Reference Models](https://github.com/IntelAI/models/tree/pytorch-r2.2.0-models) (former Model Zoo). A bunch of PyTorch use cases for benchmarking are also available on the [Github page](https://github.com/IntelAI/models/tree/pytorch-r2.2.0-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-box by simply running the scripts in the Reference Models.
+Use cases that have already been optimized by Intel engineers are available at [Intel® AI Reference Models](https://github.com/IntelAI/models/tree/pytorch-r2.2-models) (former Model Zoo). A number of PyTorch use cases for benchmarking are also available on the [Github page](https://github.com/IntelAI/models/tree/pytorch-r2.2-models/benchmarks#pytorch-use-cases). You can get performance benefits out of the box by simply running the scripts in the Reference Models.

## License
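To make the updated model table concrete, here is a minimal sketch of running one of the verified models with the LLM optimizations the note above mentions (indirect access KV cache, fused ROPE, prepacked TPP Linear). It assumes the `ipex.llm.optimize` entry point shipped with the 2.2 release; the keyword arguments follow the v2.2 examples and should be verified against the installed version.

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

# One of the verified models from the table above, loaded in BF16.
model_id = "EleutherAI/gpt-j-6b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# Apply the LLM-specific optimizations described in the note above.
# `ipex.llm.optimize` and its arguments are taken from the v2.2 examples;
# treat them as assumptions and check the release you have installed.
model = ipex.llm.optimize(model, dtype=torch.bfloat16, inplace=True)

with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    inputs = tokenizer("What does IPEX optimize in an LLM?", return_tensors="pt")
    out = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```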
docker/Dockerfile.prebuilt

Lines changed: 4 additions & 4 deletions
@@ -27,10 +27,10 @@ RUN ${PYTHON} -m pip --no-cache-dir install --upgrade \
# Some TF tools expect a "python" binary
RUN ln -s $(which ${PYTHON}) /usr/local/bin/python

-ARG IPEX_VERSION=2.1.100
-ARG PYTORCH_VERSION=2.1.1
-ARG TORCHAUDIO_VERSION=2.1.1
-ARG TORCHVISION_VERSION=0.16.1
+ARG IPEX_VERSION=2.2.0
+ARG PYTORCH_VERSION=2.2.0
+ARG TORCHAUDIO_VERSION=2.2.0
+ARG TORCHVISION_VERSION=0.17.0
ARG TORCH_CPU_URL=https://download.pytorch.org/whl/cpu/torch_stable.html

RUN \

docs/design_doc/cpu/isa_dyndisp.md

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ PyTorch & IPEX CPU ISA support statement:
| IPEX-1.12 |||||||||
| IPEX-1.13 |||||||||
| IPEX-2.1 |||||||||
+| IPEX-2.2 |||||||||

\* Current IPEX DEFAULT level implemented as same as AVX2 level.
docs/tutorials/features/sq_recipe_tuning_api.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
-Smooth Quant Recipe Tuning API
-====================================
+Smooth Quant Recipe Tuning API (Experimental)
+=============================================
Smooth Quantization is a popular method to improve the accuracy of int8 quantization. The [autotune API](../api_doc.html#ipex.quantization.autotune) allows automatic global alpha tuning, and automatic layer-by-layer alpha tuning provided by Intel® Neural Compressor for the best INT8 accuracy.

SmoothQuant will introduce alpha to calculate the ratio of input and weight updates to reduce quantization error. SmoothQuant arguments are as below:
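Since the page only names the autotune entry point, a minimal usage sketch may help. The tiny model, calibration set, and eval function are placeholders, and the keyword names follow the v2.2-era SmoothQuant recipe examples; treat all of them as assumptions to be checked against the installed release.

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

# Tiny stand-in model and calibration set, only to make the sketch runnable.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()
calib_dataset = torch.utils.data.TensorDataset(torch.randn(64, 16))
calib_dataloader = torch.utils.data.DataLoader(calib_dataset, batch_size=8)

def eval_func(m):
    # Return a scalar metric for the tuner to compare candidate alphas.
    # A real eval function would measure task accuracy on a held-out set.
    with torch.no_grad():
        return -m(torch.randn(8, 16)).abs().mean().item()

# Keyword names are assumptions based on the v2.2-era recipe examples.
tuned_model = ipex.quantization.autotune(
    model,
    calib_dataloader,
    eval_func=eval_func,
    sampling_sizes=[64],
    accuracy_criterion={"relative": 0.01},  # tolerate <=1% relative drop
    tuning_time=0,  # 0 means no time limit on the tuning loop
)
```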

docs/tutorials/getting_started.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
# Quick Start

-The following instructions assume you have installed the Intel® Extension for PyTorch\*. For installation instructions, refer to [Installation](../../../index.html#installation?platform=cpu&version=v2.1.0%2Bcpu).
+The following instructions assume you have installed the Intel® Extension for PyTorch\*. For installation instructions, refer to [Installation](../../../index.html#installation?platform=cpu&version=v2.2.0%2Bcpu).

To start using the Intel® Extension for PyTorch\* in your code, you need to make the following changes:

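Concretely, the changes the Quick Start refers to amount to importing the extension and wrapping the model with `ipex.optimize`. A minimal sketch, using a placeholder ResNet-50 purely for illustration:

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex  # change 1: import the extension

model = models.resnet50(weights=None)  # placeholder model for illustration
model.eval()

# change 2: let IPEX apply operator fusion and weight-prepacking optimizations
model = ipex.optimize(model)

with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224))
```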
docs/tutorials/installation.md

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
Installation
============

-Select your preferences and follow the installation instructions provided on the [Installation page](../../../index.html#installation?platform=cpu&version=v2.1.0%2Bcpu).
+Select your preferences and follow the installation instructions provided on the [Installation page](../../../index.html#installation?platform=cpu&version=v2.2.0%2Bcpu).

After successful installation, refer to the [Quick Start](getting_started.md) and [Examples](examples.md) sections to start using the extension in your code.

-**NOTE:** For detailed instructions on installing and setting up the environment for Large Language Models (LLM), as well as example scripts, refer to the [LLM best practices](https://github.com/intel/intel-extension-for-pytorch/tree/v2.1.0%2Bcpu/examples/cpu/inference/python/llm).
+**NOTE:** For detailed instructions on installing and setting up the environment for Large Language Models (LLM), as well as example scripts, refer to the [LLM best practices](https://github.com/intel/intel-extension-for-pytorch/tree/v2.2.0%2Bcpu/examples/cpu/inference/python/llm).
