* fix bug in LLM env activate scripts
* remove LLM training and revert to example training
* update env activate script for bitsandbytes example
* remove llama7b/13b in run_accuracy scripts
* add README for bitsandbytes
* specify the client GPU that was validated
---------
Co-authored-by: Zheng, Zhaoqiong <zhaoqiong.zheng@intel.com>
examples/gpu/llm/README.md: 7 additions & 5 deletions
@@ -2,7 +2,7 @@
 
 Here you can find benchmarking scripts for large language models (LLM) text generation. These scripts:
 
-- Support Llama, GPT-J, Qwen, OPT, Bloom model families and some other models such as ChatGLMv3-6B, Baichuan2-13B and Phi3-mini.
+- Support Llama, GPT-J, Qwen, OPT, Bloom model families and some other models such as Baichuan2-13B and Phi3-mini.
 - Include both single instance and distributed (DeepSpeed) use cases for FP16 optimization.
 - Cover model generation inference with low precision cases for different models with best performance and accuracy (fp16 AMP and weight only quantization)
Here you can find the quantized-model LoRA finetuning scripts for Llama3.

## Supported Platforms

\* Intel® Data Center GPU Max Series (1550/1100): supports Llama3.1-8B.<br />
\* Intel® Core™ Ultra Processors with Intel® Arc™ B Series Graphics: supports Llama3.2-3B.<br />
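If you are unsure which of the two platforms you are on, a minimal check along these lines can confirm the visible device (a sketch, assuming a PyTorch build with XPU support, e.g. PyTorch 2.4+ or an environment with intel-extension-for-pytorch installed; not part of the original scripts):

```python
# Sketch: list visible XPU devices to decide between the PVC and client scripts.
# Assumes a PyTorch build with XPU support; not part of the original example.
import torch

if torch.xpu.is_available():
    for i in range(torch.xpu.device_count()):
        print(f"[{i}] {torch.xpu.get_device_name(i)}")
else:
    print("No XPU device detected; check your driver and environment setup.")
```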
## Run Models

**Note**: During execution, you may need to log in to your Hugging Face account to access model files. Refer to [HuggingFace Login](https://huggingface.co/docs/huggingface_hub/quick-start#login).

```
huggingface-cli login --token <your_token_here>
```
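If the `huggingface_hub` package is available, the same login can also be done programmatically (a sketch; the token placeholder is the same one used above):

```python
# Programmatic alternative to `huggingface-cli login`; assumes huggingface_hub
# is installed. Calling login() with no token opens an interactive prompt.
from huggingface_hub import login

login(token="<your_token_here>")
```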
### Environment Set Up

Set up the environment by following [LLM Environment Set Up](../README.md).

### Run QLoRA finetuning with quantized model using Bash Script

The related code and run scripts are prepared in the folder. Run everything with the one-click bash script `run_qlora_pvc.sh` or `run_qlora_client.sh`.

If you are running on an Intel® Data Center GPU Max Series device:

```
bash run_qlora_pvc.sh
```

If you are running on an Intel Client GPU:

```
bash run_qlora_client.sh
```
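The scripts' exact contents are not reproduced in this diff. As a rough sketch of the kind of QLoRA setup they drive (the model id, LoRA hyperparameters, and target modules below are illustrative assumptions, not the scripts' actual values):

```python
# Illustrative QLoRA setup with bitsandbytes + PEFT. The model id and all
# hyperparameters are assumptions for this sketch, not values taken from
# run_qlora_pvc.sh / run_qlora_client.sh.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Meta-Llama-3-8B"  # assumed; gated, requires the HF login above

# Load the frozen base weights in 4-bit NF4 to fit in GPU memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Attach small trainable LoRA adapters on top of the quantized model.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```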
### Run inference with quantized model

```
# set quant_type and max_new_tokens according to your needs
```
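A hedged sketch of such an inference step, with `quant_type` and `max_new_tokens` as the knobs the comment above refers to (the model id, prompt, and default values are assumptions):

```python
# Sketch of inference with a bitsandbytes-quantized model; the model id,
# prompt, and knob values are assumptions, not the example's actual code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B"  # assumed
quant_type = "nf4"        # set quant_type according to your needs
max_new_tokens = 64       # set max_new_tokens according to your needs

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type=quant_type,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What does QLoRA finetuning change?", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```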
examples/gpu/llm/fine-tuning/Phi3/README.md: 0 additions & 78 deletions
@@ -43,29 +43,6 @@ python phi3_ft.py \
 
 #### Fine-tuning on single card
 
-Example: Phi-3 Mini 4k full fine-tuning on single card. The default dataset `financial_phrasebank` is loaded in `phi3_ft.py`.
-
-```bash
-export TORCH_LLM_ALLREDUCE=1
-
-export model="microsoft/Phi-3-mini-4k-instruct"
-
-python phi3_ft.py \
-    --model_name_or_path ${model} \
-    --use_flashattn False \
-    --custom_mp True \
-    --max_seq_length 128 \
-    --output_dir="output" \
-    --evaluation_strategy="epoch" \
-    --learning_rate=1e-3 \
-    --auto_find_batch_size=True \
-    --num_train_epochs=1 \
-    --save_steps=500 \
-    --logging_steps=1 \
-    --save_total_limit=8
-```
-
-
 Example: Phi-3 Mini 4k LoRA fine-tuning on single card. The default dataset `financial_phrasebank` is loaded in `phi3_ft.py`.
 
 ```bash
@@ -95,61 +72,6 @@ python phi3_ft.py \
 The default `fsdp_config.yml` is set for 1 machine with 4 cards (8 tiles). If you are using a different setting, please change `num_processes: 8` accordingly. For example, to use 8 cards (16 tiles), the line in `fsdp_config.yml` should be changed to `num_processes: 16`.
examples/gpu/llm/fine-tuning/README.md: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ Here we mainly focus on the memory-constrained fine-tuning on single GPU, and pr
 
 ### Profile the finetuning
 
-For profiling the process of finetuning, Apply the `patches/transformers.patch` to transformers v4.41.2 and set the following VARIABLE before finetuning.
+For profiling the process of finetuning, Apply the `patches/transformers.patch` to transformers v4.44.2 and set the following VARIABLE before finetuning.