[Draft][Quantization][Feature] Add AWQ quantization in vllm-ascend. #4316
base: main
Conversation
Code Review
This pull request introduces support for AWQ quantization in vllm-ascend. The changes are well-structured, adding the necessary configurations and Ascend-specific implementations for AWQ, including linear and MoE layers. A key improvement is the added robustness in AscendRMSNorm to handle different quantization configurations without crashing. My review has identified a couple of redundant function calls within the new npu_fused_experts function that should be removed to improve performance.
vllm_ascend/quantization/awq/awq.py
Outdated
```python
# gmm1: gate_up_proj
hidden_states, pertoken_scale = torch_npu.npu_dynamic_quant(hidden_states)
if not use_wna16:
    hidden_states, pertoken_scale = torch_npu.npu_dynamic_quant(hidden_states)
```
vllm_ascend/quantization/awq/awq.py
Outdated
```python
hidden_states = torch_npu.npu_swiglu(hidden_states)
hidden_states, pertoken_scale = torch_npu.npu_dynamic_quant(hidden_states)
if not use_wna16:
    hidden_states, pertoken_scale = torch_npu.npu_dynamic_quant(hidden_states)
```
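The review's point in both hunks is the same: `npu_dynamic_quant` is called unconditionally and then called again inside the `if not use_wna16` branch, so the non-WNA16 path quantizes twice. The fix is to keep only the guarded call. A minimal pure-Python sketch of the corrected control flow, with a hypothetical `dynamic_quant` stand-in for `torch_npu.npu_dynamic_quant` (the real op needs Ascend NPU tensors):

```python
# Sketch of the corrected flow. `dynamic_quant` is a hypothetical stand-in
# for torch_npu.npu_dynamic_quant; the call counter makes the redundancy
# (or its absence) observable.
calls = {"count": 0}

def dynamic_quant(x):
    """Stand-in: return (int8-like quantized values, per-token scale)."""
    calls["count"] += 1
    scale = max(abs(v) for v in x) / 127.0
    return [round(v / scale) for v in x], scale

def gate_up_quant(hidden_states, use_wna16):
    # Corrected: quantize once, and only on the path that needs int8 inputs.
    pertoken_scale = None
    if not use_wna16:
        hidden_states, pertoken_scale = dynamic_quant(hidden_states)
    return hidden_states, pertoken_scale

out, scale = gate_up_quant([0.5, -1.0, 0.25], use_wna16=False)
assert calls["count"] == 1  # quantized exactly once, not twice
```

On the WNA16 path the activations stay in floating point and no scale is produced, which is why the unconditional first call in the original hunks was both redundant and wasteful.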
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
This pull request has conflicts; please resolve them before we can evaluate the pull request.
@paulyu12 this PR implements AWQ quantization and is currently under testing. Please take a look.
Validation is tracked in issue #4378.
Please rebase to main now.
DeepSeek-V3.1-AWQ. Signed-off-by: menogrey <1299267905@qq.com>
What this PR does / why we need it?
Add AWQ quantization in vllm-ascend. Most of the code refers to the sglang implementation (sgl-project/sglang#10158), and the new quantization adaptation follows the compressed-tensors PR: #4036.
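For background on what the new quantization path has to undo at load time: AWQ stores weights as packed 4-bit integers with per-group scales and zero points, and dequantization is `w = (q - zero) * scale` per group. A small illustrative pure-Python sketch of group-wise INT4 dequantization (not the Ascend kernel path this PR adds; real implementations unpack int32-packed 4-bit weights and run fused NPU kernels):

```python
# Illustrative group-wise AWQ-style dequantization: w = (q - zero) * scale.
# Each contiguous group of `group_size` weights shares one scale and one
# zero point, which is what keeps the 4-bit format accurate.

def dequantize_awq(qweight, scales, zeros, group_size):
    """qweight: flat list of 4-bit ints (0..15); scales/zeros: one per group."""
    out = []
    for i, q in enumerate(qweight):
        g = i // group_size  # index of the quantization group this weight is in
        out.append((q - zeros[g]) * scales[g])
    return out

# Two groups of 4 weights each, with distinct scales and zero points.
w = dequantize_awq(
    qweight=[8, 9, 7, 8, 0, 15, 8, 4],
    scales=[0.1, 0.2],
    zeros=[8, 8],
    group_size=4,
)
```

With zero point 8 (the midpoint of the unsigned 4-bit range), a stored value of 8 dequantizes to exactly 0.0 in either group, while the same stored value maps to different magnitudes depending on the group's scale.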
Does this PR introduce any user-facing change?
How was this patch tested?