add Qwen2.5-VL tutorials #4364
base: main
Conversation
Signed-off-by: shifan <609471158@qq.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adds a new README file with instructions for running the Qwen2.5-VL-32B model. The documentation is comprehensive, but I've found a couple of critical issues in the example commands that would prevent users from running them successfully. Specifically, there's a port mismatch between the server and client commands, and an inconsistent command-line argument. My review includes suggestions to fix these issues.
Once your server is started, you can query the model with input prompts:

```shell
curl http://localhost:8000/v1/chat/completions \
```
The vllm serve command is configured to listen on port 8888, but this curl command is attempting to connect to port 8000. This will cause a connection failure. The port in this command should be updated to match the server's port.
Suggested change:
- curl http://localhost:8000/v1/chat/completions \
+ curl http://localhost:8888/v1/chat/completions \
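For reference, a complete request against the corrected port might look like the sketch below. The model name and prompt are illustrative assumptions rather than values taken from the PR; the server is assumed to expose the standard OpenAI-compatible chat completions API on port 8888.

```shell
# Minimal sketch of a chat completion request against the server on port 8888.
# The model name and prompt below are assumptions for illustration only.
curl http://localhost:8888/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-VL-32B-Instruct",
        "messages": [{"role": "user", "content": "Describe a photo of a city skyline."}],
        "max_tokens": 64
    }'
```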
    --quantization ascend \
    --async-scheduling \
    --tensor-parallel-size 2 \
    --max_model_len 15000 \
The command-line argument --max_model_len uses an underscore. For consistency with other arguments (e.g., --tensor-parallel-size) and standard vLLM CLI usage, it should be --max-model-len with a hyphen. This will prevent potential errors and confusion for users.
Suggested change:
- --max_model_len 15000 \
+ --max-model-len 15000 \
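Taken together with the context above, the corrected serve command might look roughly like the following sketch. The model path and the --port 8888 value are assumptions inferred from the review comments; the remaining flags are copied from the PR's snippet.

```shell
# Sketch of the vllm serve invocation with the hyphenated flag.
# The model path and port are assumptions; the other flags come from the PR snippet above.
vllm serve Qwen/Qwen2.5-VL-32B-Instruct \
    --port 8888 \
    --quantization ascend \
    --async-scheduling \
    --tensor-parallel-size 2 \
    --max-model-len 15000
```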
docs/source/tutorials/index.md (Outdated)
multi_npu_moge
multi_npu_qwen3_moe
multi_npu_quantization
multi_npu_qwen2.5_vl
Please rename this to "Qwen2.5-VL". Deployment scenarios such as "single node" and "multi nodes" should also be added to this README.
Modified
@@ -0,0 +1,165 @@
# Multi-NPU (Qwen2.5-VL-32B-Instruct-W8A8)
rename the title to "Qwen2.5-VL"
Modified
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?