
Conversation


@MrFan-yes MrFan-yes commented Nov 23, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: shifan <609471158@qq.com>
@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 23, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds a new README file with instructions for running the Qwen2.5-VL-32B model. The documentation is comprehensive, but I've found a couple of critical issues in the example commands that would prevent users from running them successfully. Specifically, there's a port mismatch between the server and client commands, and an inconsistent command-line argument. My review includes suggestions to fix these issues.

Once your server is started, you can query the model with input prompts:

```shell
curl http://localhost:8000/v1/chat/completions \
```

critical

The vllm serve command is configured to listen on port 8888, but this curl command is attempting to connect to port 8000. This will cause a connection failure. The port in this command should be updated to match the server's port.

Suggested change
- curl http://localhost:8000/v1/chat/completions \
+ curl http://localhost:8888/v1/chat/completions \
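For illustration, a complete multimodal request against the server might look like the sketch below. The served model name, image URL, and prompt are assumptions for illustration only; the port (8888) follows the `vllm serve` command referenced in the review comment.

```shell
# Sketch of a full chat-completions request for Qwen2.5-VL.
# Assumptions: model name, image URL, and prompt are illustrative;
# port 8888 matches the server command discussed in the review.
PAYLOAD='{
  "model": "Qwen/Qwen2.5-VL-32B-Instruct",
  "messages": [
    {"role": "user", "content": [
      {"type": "text", "text": "Describe this image."},
      {"type": "image_url", "image_url": {"url": "https://example.com/demo.jpg"}}
    ]}
  ],
  "max_tokens": 128
}'

# Sanity-check that the payload is well-formed JSON before sending it.
echo "$PAYLOAD" | python3 -c 'import json,sys; json.load(sys.stdin); print("payload OK")'

# Then send it (uncomment once the server is running):
# curl http://localhost:8888/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$PAYLOAD"
```

The request body follows the OpenAI-compatible chat-completions schema that vLLM exposes, with image input passed as an `image_url` content part.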

```shell
    --quantization ascend \
    --async-scheduling \
    --tensor-parallel-size 2 \
    --max_model_len 15000 \
```
Contributor


high

The command-line argument --max_model_len uses an underscore. For consistency with other arguments (e.g., --tensor-parallel-size) and standard vLLM CLI usage, it should be --max-model-len with a hyphen. This will prevent potential errors and confusion for users.

Suggested change
- --max_model_len 15000 \
+ --max-model-len 15000 \
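Putting the reviewed flags together, the full server launch might look like the sketch below. The model path and the port (8888) are assumptions carried over from the review comments, not confirmed by the README; adjust them to your environment.

```shell
# Sketch of the full serve command with all flags hyphenated consistently.
# Assumptions: model path and port are illustrative placeholders.
vllm serve Qwen/Qwen2.5-VL-32B-Instruct \
  --port 8888 \
  --quantization ascend \
  --async-scheduling \
  --tensor-parallel-size 2 \
  --max-model-len 15000
```

Note that every flag uses hyphens, matching standard vLLM CLI usage and avoiding the underscore inconsistency flagged above.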

Signed-off-by: shifan <609471158@qq.com>
@MrFan-yes MrFan-yes changed the title add Qwen2.5-VL README add multi_npu_qwen2.5_vl tutorials Nov 29, 2025
Signed-off-by: shifan <609471158@qq.com>
multi_npu_moge
multi_npu_qwen3_moe
multi_npu_quantization
multi_npu_qwen2.5_vl
Contributor


Please rename to “Qwen2.5-VL”; deployment sections such as "single node" and "multi nodes" should be added to this README.

Author


Modified

@@ -0,0 +1,165 @@
# Multi-NPU (Qwen2.5-VL-32B-Instruct-W8A8)
Contributor


rename the title to "Qwen2.5-VL"

Author


Modified

Signed-off-by: shifan <609471158@qq.com>
@MrFan-yes MrFan-yes changed the title add multi_npu_qwen2.5_vl tutorials add Qwen2.5-VL tutorials Dec 2, 2025
@MrFan-yes MrFan-yes requested a review from 1092626063 December 2, 2025 07:02