From 95fb7086d031909c159663dba403193c6fbebd4d Mon Sep 17 00:00:00 2001 From: shifan <609471158@qq.com> Date: Sun, 23 Nov 2025 17:41:02 +0800 Subject: [PATCH 1/8] add Qwen2.5-VL README Signed-off-by: shifan <609471158@qq.com> --- docs/source/tutorials/Qwen2.5-VL-32B.md | 146 ++++++++++++++++++++++++ 1 file changed, 146 insertions(+) create mode 100644 docs/source/tutorials/Qwen2.5-VL-32B.md diff --git a/docs/source/tutorials/Qwen2.5-VL-32B.md b/docs/source/tutorials/Qwen2.5-VL-32B.md new file mode 100644 index 00000000000..3f28a817bfd --- /dev/null +++ b/docs/source/tutorials/Qwen2.5-VL-32B.md @@ -0,0 +1,146 @@ +# Multi-NPU (Qwen2.5-VL-32B-Instruct) + +## Introduction + +Key Enhancements: +- Understand things visually: Qwen2.5-VL is not only proficient in recognizing common objects such as flowers, birds, fish, and insects, but it is highly capable of analyzing texts, charts, icons, graphics, and layouts within images. + +- Being agentic: Qwen2.5-VL directly plays as a visual agent that can reason and dynamically direct tools, which is capable of computer use and phone use. + +- Understanding long videos and capturing events: Qwen2.5-VL can comprehend videos of over 1 hour, and this time it has a new ability of cpaturing event by pinpointing the relevant video segments. + +- Capable of visual localization in different formats: Qwen2.5-VL can accurately localize objects in an image by generating bounding boxes or points, and it can provide stable JSON outputs for coordinates and attributes. + +- Generating structured outputs: for data like scans of invoices, forms, tables, etc. Qwen2.5-VL supports structured outputs of their contents, benefiting usages in finance, commerce, etc. + +This document will demonstrate the main validation steps of the model, including supported features, feature configuration, environment preparation, single-node deployment, as well as accuracy and performance evaluation. + +## Supported Features + +Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix. + +Refer to [feature guide](../user_guide/feature_guide/index.md) to get the feature's configuration. + +## Environment Preparation + +### Model Weight + +- A sample Qwen2.5-VL quantization script can be found in the modelslim code repository. [Qwen2.5-VL Quantization Script Example](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/multimodal_vlm/Qwen2.5-VL/README.md) + +- `Qwen2.5-VL-32B-Instruct-w8a8`(Quantized version): require 1 Atlas 800 A2 (64G × 8) node. + +It is recommended to download the model weight to the shared directory of multiple nodes, such as `/root/.cache/` + +### Verify Multi-node Communication(Optional) + +If you want to deploy multi-node environment, you need to verify multi-node communication according to [verify multi-node communication environment](../installation.md#verify-multi-node-communication). 
+
+
+## Deployment
+### Run docker container
+```shell
+export IMAGE=quay.io/ascend/vllm-ascend:0.11.0rc1
+docker run --rm \
+--shm-size=1g \
+--net=host \
+--name vllm-ascend-qwen25_VL \
+--device /dev/davinci0 \
+--device /dev/davinci1 \
+--device /dev/davinci_manager \
+--device /dev/devmm_svm \
+--device /dev/hisi_hdc \
+-v /usr/local/dcmi:/usr/local/dcmi \
+-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
+-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
+-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
+-v /etc/ascend_install.info:/etc/ascend_install.info \
+-v /root/.cache:/root/.cache \
+-v /data:/data \
+-it $IMAGE bash
+```
+
+### Single-node Deployment
+
+Run the following script to execute online inference. Recommend two NPU cards for deploying the Qwen2.5-VL-32B-Instruct-w8a8 model.
+
+```shell
+#!/bin/sh
+# apt install libjemalloc2 or yum install jemalloc
+export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD
+export HCCL_OP_EXPANSION_MODE="AIV"
+export VLLM_USE_V1=1
+
+vllm serve /data/Qwen2.5-VL-32B-Instruct-w8a8 \
+    --host 0.0.0.0 \
+    --port 8000 \
+    --served-model-name qwen25_vl \
+    --quantization ascend \
+    --async-scheduling \
+    --tensor-parallel-size 2 \
+    --max_model_len 15000 \
+    --max-num-batched-tokens 30000 \
+    --max-num-seqs 30 \
+    --no-enable-prefix-caching \
+    --trust-remote-code \
+    --additional-config '{"enable_weight_nz_layout":true}'
+
+```
+
+
+### Prefill-Decode Disaggregation
+
+Not supported yet.
+
+## Functional Verification
+
+Once your server is started, you can query the model with input prompts:
+
+```shell
+curl http://localhost:8000/v1/chat/completions \
+    -H "Content-Type: application/json" \
+    -d '{
+        "model": "qwen25_vl",
+        "messages": [
+            {"role": "system", "content": "You are a helpful assistant."},
+            {"role": "user", "content": [
+                {"type": "image_url", "image_url": {"url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"}},
+                {"type": "text", "text": "What is the text in the illustration?"}
+            ]}
+        ]
+    }'
+```
+
+## Accuracy Evaluation
+
+Here are two accuracy evaluation methods.
+
+### Using AISBench
+
+1. Refer to [Using AISBench](../developer_guide/evaluation/using_ais_bench.md) for details.
+
+
+## Performance
+
+### Using AISBench
+
+Refer to [Using AISBench for performance evaluation](../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.
+
+### Using vLLM Benchmark
+
+Run the performance evaluation of `Qwen2.5-VL-32B-Instruct-w8a8` as an example.
+
+Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.
+
+There are three `vllm bench` subcommands:
+- `latency`: Benchmark the latency of a single batch of requests.
+- `serve`: Benchmark the online serving throughput.
+- `throughput`: Benchmark offline inference throughput.
+
+Take `serve` as an example. Run the command as follows.
+
+```shell
+export VLLM_USE_MODELSCOPE=true
+vllm bench serve --model /data/Qwen2.5-VL-32B-Instruct-w8a8 --dataset-name random --random-input 200 --num-prompt 200 --request-rate 1 --save-result --result-dir ./
+```
+
+After a few minutes, you can get the performance evaluation result.
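+
+If `--save-result` is set as above, the benchmark also writes a JSON result file into the directory given by `--result-dir`. The following is a minimal sketch for inspecting that file; the exact file name pattern is not fixed here, and it assumes `python3` is available inside the container:
+
+```shell
+# Show the most recently written benchmark result file (the name depends on backend and timestamp).
+ls -t ./*.json | head -n 1
+# Pretty-print the newest result file to read the throughput and latency metrics.
+python3 -m json.tool "$(ls -t ./*.json | head -n 1)"
+```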
From 06107f0fe3606df32509bda4b0b198c908fa74d9 Mon Sep 17 00:00:00 2001 From: shifan <609471158@qq.com> Date: Tue, 25 Nov 2025 21:30:00 +0800 Subject: [PATCH 2/8] update Qwen2.5-VL README Signed-off-by: shifan <609471158@qq.com> --- docs/source/tutorials/Qwen2.5-VL-32B.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/tutorials/Qwen2.5-VL-32B.md b/docs/source/tutorials/Qwen2.5-VL-32B.md index 3f28a817bfd..0e6b50305a4 100644 --- a/docs/source/tutorials/Qwen2.5-VL-32B.md +++ b/docs/source/tutorials/Qwen2.5-VL-32B.md @@ -77,7 +77,7 @@ vllm serve /data/Qwen2.5-VL-32B-Instruct-w8a8 \ --quantization ascend \ --async-scheduling \ --tensor-parallel-size 2 \ - --max_model_len 15000 \ + --max-model-len 15000 \ --max-num-batched-tokens 30000 \ --max-num-seqs 30 \ --no-enable-prefix-caching \ From ff0c8eeaaff3fc99597c6713745929e71302b832 Mon Sep 17 00:00:00 2001 From: shifan <609471158@qq.com> Date: Sat, 29 Nov 2025 14:57:48 +0800 Subject: [PATCH 3/8] add multi_npu_qwen2.5_vl tutorials Signed-off-by: shifan <609471158@qq.com> --- docs/source/tutorials/index.md | 1 + .../{Qwen2.5-VL-32B.md => multi_npu_qwen2.5_vl.md} | 13 ++++--------- 2 files changed, 5 insertions(+), 9 deletions(-) rename docs/source/tutorials/{Qwen2.5-VL-32B.md => multi_npu_qwen2.5_vl.md} (96%) diff --git a/docs/source/tutorials/index.md b/docs/source/tutorials/index.md index 321ec22d9cc..450e7fbb803 100644 --- a/docs/source/tutorials/index.md +++ b/docs/source/tutorials/index.md @@ -15,6 +15,7 @@ multi_npu multi_npu_moge multi_npu_qwen3_moe multi_npu_quantization +multi_npu_qwen2.5_vl single_node_300i DeepSeek-V3.2-Exp.md multi_node diff --git a/docs/source/tutorials/Qwen2.5-VL-32B.md b/docs/source/tutorials/multi_npu_qwen2.5_vl.md similarity index 96% rename from docs/source/tutorials/Qwen2.5-VL-32B.md rename to docs/source/tutorials/multi_npu_qwen2.5_vl.md index 0e6b50305a4..89f07b92ab4 100644 --- a/docs/source/tutorials/Qwen2.5-VL-32B.md +++ b/docs/source/tutorials/multi_npu_qwen2.5_vl.md @@ -1,4 +1,4 @@ -# Multi-NPU (Qwen2.5-VL-32B-Instruct) +# Multi-NPU (Qwen2.5-VL-32B-Instruct-W8A8) ## Introduction @@ -27,7 +27,7 @@ Refer to [feature guide](../user_guide/feature_guide/index.md) to get the featur - A sample Qwen2.5-VL quantization script can be found in the modelslim code repository. [Qwen2.5-VL Quantization Script Example](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/multimodal_vlm/Qwen2.5-VL/README.md) -- `Qwen2.5-VL-32B-Instruct-w8a8`(Quantized version): require 1 Atlas 800 A2 (64G × 8) node. +- `Qwen2.5-VL-32B-Instruct-w8a8`(Quantized version): require 1 Atlas 800 A2 (64G × 8) node. It is recommended to download the model weight to the shared directory of multiple nodes, such as `/root/.cache/` @@ -77,8 +77,8 @@ vllm serve /data/Qwen2.5-VL-32B-Instruct-w8a8 \ --quantization ascend \ --async-scheduling \ --tensor-parallel-size 2 \ - --max-model-len 15000 \ - --max-num-batched-tokens 30000 \ + --max-model-len 30000 \ + --max-num-batched-tokens 50000 \ --max-num-seqs 30 \ --no-enable-prefix-caching \ --trust-remote-code \ @@ -86,11 +86,6 @@ vllm serve /data/Qwen2.5-VL-32B-Instruct-w8a8 \ ``` - -### Prefill-Decode Disaggregation - -Not supported yet. 
- ## Functional Verification Once your server is started, you can query the model with input prompts: From b408c6d081eae17acc9d7e63444240f1e45f3aea Mon Sep 17 00:00:00 2001 From: shifan <609471158@qq.com> Date: Sat, 29 Nov 2025 15:16:10 +0800 Subject: [PATCH 4/8] update multi_npu_qwen2.5_vl tutorials Signed-off-by: shifan <609471158@qq.com> --- docs/source/tutorials/multi_npu_qwen2.5_vl.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/tutorials/multi_npu_qwen2.5_vl.md b/docs/source/tutorials/multi_npu_qwen2.5_vl.md index 89f07b92ab4..8c6b913058d 100644 --- a/docs/source/tutorials/multi_npu_qwen2.5_vl.md +++ b/docs/source/tutorials/multi_npu_qwen2.5_vl.md @@ -7,7 +7,7 @@ Key Enhancements: - Being agentic: Qwen2.5-VL directly plays as a visual agent that can reason and dynamically direct tools, which is capable of computer use and phone use. -- Understanding long videos and capturing events: Qwen2.5-VL can comprehend videos of over 1 hour, and this time it has a new ability of cpaturing event by pinpointing the relevant video segments. +- Understanding long videos and capturing events: Qwen2.5-VL can comprehend videos of over 1 hour, and this time it has a new ability of capturing event by pinpointing the relevant video segments. - Capable of visual localization in different formats: Qwen2.5-VL can accurately localize objects in an image by generating bounding boxes or points, and it can provide stable JSON outputs for coordinates and attributes. From 747be253fe9f65ca959803051424afd661dee2a3 Mon Sep 17 00:00:00 2001 From: shifan <609471158@qq.com> Date: Sat, 29 Nov 2025 15:53:26 +0800 Subject: [PATCH 5/8] update multi_npu_qwen2.5_vl tutorials Signed-off-by: shifan <609471158@qq.com> --- docs/source/tutorials/multi_npu_qwen2.5_vl.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/source/tutorials/multi_npu_qwen2.5_vl.md b/docs/source/tutorials/multi_npu_qwen2.5_vl.md index 8c6b913058d..6a3a177f502 100644 --- a/docs/source/tutorials/multi_npu_qwen2.5_vl.md +++ b/docs/source/tutorials/multi_npu_qwen2.5_vl.md @@ -35,9 +35,9 @@ It is recommended to download the model weight to the shared directory of multip If you want to deploy multi-node environment, you need to verify multi-node communication according to [verify multi-node communication environment](../installation.md#verify-multi-node-communication). - ## Deployment ### Run docker container + ```shell export IMAGE=quay.io/ascend/vllm-ascend:0.11.0rc1 docker run --rm \ @@ -113,7 +113,6 @@ Here are two accuracy evaluation methods. 1. Refer to [Using AISBench](../developer_guide/evaluation/using_ais_bench.md) for details. 
- ## Performance ### Using AISBench From 028f6c029a177245332cbbbc12815f29a5406b86 Mon Sep 17 00:00:00 2001 From: shifan <609471158@qq.com> Date: Sat, 29 Nov 2025 16:28:49 +0800 Subject: [PATCH 6/8] modify multi_npu_qwen2.5_vl tutorials Signed-off-by: shifan <609471158@qq.com> --- docs/source/tutorials/multi_npu_qwen2.5_vl.md | 33 ++++++++++++++++--- 1 file changed, 28 insertions(+), 5 deletions(-) diff --git a/docs/source/tutorials/multi_npu_qwen2.5_vl.md b/docs/source/tutorials/multi_npu_qwen2.5_vl.md index 6a3a177f502..8b193f35b19 100644 --- a/docs/source/tutorials/multi_npu_qwen2.5_vl.md +++ b/docs/source/tutorials/multi_npu_qwen2.5_vl.md @@ -15,6 +15,10 @@ Key Enhancements: This document will demonstrate the main validation steps of the model, including supported features, feature configuration, environment preparation, single-node deployment, as well as accuracy and performance evaluation. +## **Attention** + +This example requires version **v0.11.0rc1**.Earlier versions may lack certain features. + ## Supported Features Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix. @@ -36,14 +40,21 @@ It is recommended to download the model weight to the shared directory of multip If you want to deploy multi-node environment, you need to verify multi-node communication according to [verify multi-node communication environment](../installation.md#verify-multi-node-communication). ## Deployment + +The specific example scenario is as follows: +- The machine environment is an Atlas 800 A2 (64G*8) +- The LLM is Qwen2.5-VL-32B-Instruct-W8A8 + ### Run docker container -```shell -export IMAGE=quay.io/ascend/vllm-ascend:0.11.0rc1 +```{code-block} bash + :substitutions: +# Update the vllm-ascend image +export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version| docker run --rm \ --shm-size=1g \ --net=host \ ---name vllm-ascend-qwen25_VL \ +--name vllm-ascend \ --device /dev/davinci0 \ --device /dev/davinci1 \ --device /dev/davinci_manager \ @@ -65,9 +76,21 @@ Run the following script to execute online inference. 
Recommend two NPU cards fo ```shell #!/bin/sh -# apt install libjemalloc2 or yum install jemalloc -export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD +# if os is Ubuntu +apt install libjemalloc2 +# if os is openEuler +yum install jemalloc +# Add the LD_PRELOAD environment variable +if [ -f /usr/lib/aarch64-linux-gnu/libjemalloc.so.2 ]; then + # On Ubuntu, first install with `apt install libjemalloc2` + export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD +elif [ -f /usr/lib64/libjemalloc.so.2 ]; then + # On openEuler, first install with `yum install jemalloc` + export LD_PRELOAD=/usr/lib64/libjemalloc.so.2:$LD_PRELOAD +fi +# Enable the AIVector core to directly schedule ROCE communication export HCCL_OP_EXPANSION_MODE="AIV" +# Set vLLM to Engine V1 export VLLM_USE_V1=1 vllm serve /data/Qwen2.5-VL-32B-Instruct-w8a8 \ From 858b362c7d84e5391c812b6daa66d4285be32f05 Mon Sep 17 00:00:00 2001 From: shifan <609471158@qq.com> Date: Sat, 29 Nov 2025 16:40:20 +0800 Subject: [PATCH 7/8] modify multi_npu_qwen2.5_vl tutorials Signed-off-by: shifan <609471158@qq.com> --- docs/source/tutorials/multi_npu_qwen2.5_vl.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/source/tutorials/multi_npu_qwen2.5_vl.md b/docs/source/tutorials/multi_npu_qwen2.5_vl.md index 8b193f35b19..b3ebc688121 100644 --- a/docs/source/tutorials/multi_npu_qwen2.5_vl.md +++ b/docs/source/tutorials/multi_npu_qwen2.5_vl.md @@ -29,6 +29,8 @@ Refer to [feature guide](../user_guide/feature_guide/index.md) to get the featur ### Model Weight +- `Qwen2.5-VL-32B-Instruct`(BF16 version): [Download model weight](https://modelscope.cn/models/Qwen/Qwen2.5-VL-32B-Instruct) + - A sample Qwen2.5-VL quantization script can be found in the modelslim code repository. [Qwen2.5-VL Quantization Script Example](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/multimodal_vlm/Qwen2.5-VL/README.md) - `Qwen2.5-VL-32B-Instruct-w8a8`(Quantized version): require 1 Atlas 800 A2 (64G × 8) node. From 2d8e42e6f1bd93996d743fb8c7be35d83f404777 Mon Sep 17 00:00:00 2001 From: shifan <609471158@qq.com> Date: Tue, 2 Dec 2025 09:55:19 +0800 Subject: [PATCH 8/8] rename to Qwen2.5-VL Signed-off-by: shifan <609471158@qq.com> --- ...{multi_npu_qwen2.5_vl.md => Qwen2.5-VL.md} | 31 ++++++++++++------- docs/source/tutorials/index.md | 2 +- 2 files changed, 20 insertions(+), 13 deletions(-) rename docs/source/tutorials/{multi_npu_qwen2.5_vl.md => Qwen2.5-VL.md} (73%) diff --git a/docs/source/tutorials/multi_npu_qwen2.5_vl.md b/docs/source/tutorials/Qwen2.5-VL.md similarity index 73% rename from docs/source/tutorials/multi_npu_qwen2.5_vl.md rename to docs/source/tutorials/Qwen2.5-VL.md index b3ebc688121..7e981c531f6 100644 --- a/docs/source/tutorials/multi_npu_qwen2.5_vl.md +++ b/docs/source/tutorials/Qwen2.5-VL.md @@ -1,17 +1,18 @@ -# Multi-NPU (Qwen2.5-VL-32B-Instruct-W8A8) +# Qwen2.5-VL ## Introduction -Key Enhancements: -- Understand things visually: Qwen2.5-VL is not only proficient in recognizing common objects such as flowers, birds, fish, and insects, but it is highly capable of analyzing texts, charts, icons, graphics, and layouts within images. +The key features include: -- Being agentic: Qwen2.5-VL directly plays as a visual agent that can reason and dynamically direct tools, which is capable of computer use and phone use. 
+- **Understand things visually**: Qwen2.5-VL is not only proficient in recognizing common objects such as flowers, birds, fish, and insects, but it is highly capable of analyzing texts, charts, icons, graphics, and layouts within images. -- Understanding long videos and capturing events: Qwen2.5-VL can comprehend videos of over 1 hour, and this time it has a new ability of capturing event by pinpointing the relevant video segments. +- **Being agentic**: Qwen2.5-VL directly plays as a visual agent that can reason and dynamically direct tools, which is capable of computer use and phone use. -- Capable of visual localization in different formats: Qwen2.5-VL can accurately localize objects in an image by generating bounding boxes or points, and it can provide stable JSON outputs for coordinates and attributes. +- **Understanding long videos and capturing events**: Qwen2.5-VL can comprehend videos of over 1 hour, and this time it has a new ability of capturing event by pinpointing the relevant video segments. -- Generating structured outputs: for data like scans of invoices, forms, tables, etc. Qwen2.5-VL supports structured outputs of their contents, benefiting usages in finance, commerce, etc. +- **Capable of visual localization in different formats**: Qwen2.5-VL can accurately localize objects in an image by generating bounding boxes or points, and it can provide stable JSON outputs for coordinates and attributes. + +- **Generating structured outputs**: for data like scans of invoices, forms, tables, etc. Qwen2.5-VL supports structured outputs of their contents, benefiting usages in finance, commerce, etc. This document will demonstrate the main validation steps of the model, including supported features, feature configuration, environment preparation, single-node deployment, as well as accuracy and performance evaluation. @@ -29,11 +30,17 @@ Refer to [feature guide](../user_guide/feature_guide/index.md) to get the featur ### Model Weight -- `Qwen2.5-VL-32B-Instruct`(BF16 version): [Download model weight](https://modelscope.cn/models/Qwen/Qwen2.5-VL-32B-Instruct) +- `Qwen2.5-VL-3B-Instruct`(BF16 version): require 1 Atlas 800I A2 (64G × 8) node. [Download model weight](https://modelscope.cn/models/Qwen/Qwen2.5-VL-3B-Instruct) -- A sample Qwen2.5-VL quantization script can be found in the modelslim code repository. [Qwen2.5-VL Quantization Script Example](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/multimodal_vlm/Qwen2.5-VL/README.md) +- `Qwen2.5-VL-7B-Instruct`(BF16 version): require 1 Atlas 800I A2 (64G × 8) node. [Download model weight](https://modelscope.cn/models/Qwen/Qwen2.5-VL-7B-Instruct) + +- `Qwen2.5-VL-32B-Instruct`(BF16 version): require 1 Atlas 800I A2 (64G × 8) node. [Download model weight](https://modelscope.cn/models/Qwen/Qwen2.5-VL-32B-Instruct) -- `Qwen2.5-VL-32B-Instruct-w8a8`(Quantized version): require 1 Atlas 800 A2 (64G × 8) node. +- `Qwen2.5-VL-72B-Instruct`(BF16 version): require 1 Atlas 800I A2 (64G × 8) node. [Download model weight](https://modelscope.cn/models/Qwen/Qwen2.5-VL-72B-Instruct) + +- `Qwen2.5-VL-32B-Instruct-w8a8`(Quantized version): require 1 Atlas 800I A2 (64G × 8) node. + +- A sample Qwen2.5-VL quantization script can be found in the modelslim code repository. 
[Qwen2.5-VL Quantization Script Example](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/multimodal_vlm/Qwen2.5-VL/README.md)
 
 It is recommended to download the model weight to the shared directory of multiple nodes, such as `/root/.cache/`
 
@@ -44,7 +51,7 @@ If you want to deploy multi-node environment, you need to verify multi-node comm
 ## Deployment
 
 The specific example scenario is as follows:
-- The machine environment is an Atlas 800 A2 (64G*8)
+- The machine environment is an Atlas 800I A2 (64G × 8)
 - The LLM is Qwen2.5-VL-32B-Instruct-W8A8
 
 ### Run docker container
@@ -132,7 +139,7 @@ curl http://localhost:8000/v1/chat/completions \
 
 ## Accuracy Evaluation
 
-Here are two accuracy evaluation methods.
+Here is one accuracy evaluation method.
 
 ### Using AISBench
 
diff --git a/docs/source/tutorials/index.md b/docs/source/tutorials/index.md
index 450e7fbb803..71cf136063e 100644
--- a/docs/source/tutorials/index.md
+++ b/docs/source/tutorials/index.md
@@ -15,7 +15,7 @@ multi_npu
 multi_npu_moge
 multi_npu_qwen3_moe
 multi_npu_quantization
-multi_npu_qwen2.5_vl
+Qwen2.5-VL
 single_node_300i
 DeepSeek-V3.2-Exp.md
 multi_node
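
For a quick sanity check of the deployment described in the tutorial above, a minimal sketch is shown below. It assumes the `vllm serve` command from the tutorial is listening on `http://localhost:8000` and serving the model under the name `qwen25_vl`:

```shell
# List the models exposed by the OpenAI-compatible server.
curl http://localhost:8000/v1/models
# Send a text-only chat request to confirm the endpoint responds end to end.
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "qwen25_vl", "messages": [{"role": "user", "content": "Introduce yourself in one sentence."}]}'
```

A successful JSON response confirms that the OpenAI-compatible endpoint and the quantized weights are working before running the heavier AISBench or `vllm bench` evaluations.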