
Conversation


@1092626063 1092626063 commented Nov 22, 2025

What this PR does / why we need it?

This PR is cherry-picked from #2958 and #4340.

Past:
npu_moe_gating_top_k only supports the 'group_count=256' pattern.

Now:
1. npu_moe_gating_top_k supports all sizes of group_count
2. the functionality of torch_npu.npu_moe_gating_top_k_softmax is included in torch_npu.npu_moe_gating_top_k

CANN: depends on 8.3.RC1

Performance:

  1. GLM4.5-w8a8: TPS improves by 6%
  2. Qwen3: same as before
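
For orientation, here is a minimal unfused PyTorch sketch of the grouped top-k routing that the fused torch_npu.npu_moe_gating_top_k kernel computes in a single call. The function and variable names are illustrative, and the softmax-then-group-selection order is an assumption drawn from this description, not the exact kernel contract:

```python
import torch


def grouped_topk_reference(router_logits: torch.Tensor,
                           top_k: int,
                           num_expert_group: int,
                           topk_group: int):
    """Unfused reference for grouped top-k gating (illustrative sketch only)."""
    num_tokens, num_experts = router_logits.shape
    experts_per_group = num_experts // num_expert_group

    # Softmax over all experts, roughly the scoring that the softmax variant
    # provided and that the fused operator now also covers.
    scores = torch.softmax(router_logits, dim=-1)

    # Score each expert group by its best expert and keep the topk_group groups.
    group_scores = scores.view(num_tokens, num_expert_group,
                               experts_per_group).max(dim=-1).values
    group_ids = torch.topk(group_scores, k=topk_group, dim=-1).indices

    # Mask out experts that do not belong to a selected group.
    group_mask = torch.zeros_like(group_scores)
    group_mask.scatter_(1, group_ids, 1.0)
    expert_mask = group_mask.repeat_interleave(experts_per_group, dim=-1)

    masked_scores = scores.masked_fill(expert_mask == 0, 0.0)
    topk_weights, topk_ids = torch.topk(masked_scores, k=top_k, dim=-1)
    return topk_weights, topk_ids


# Example: 4 tokens, 256 experts split into 8 groups, route to 8 experts
# chosen from the best 4 groups.
weights, ids = grouped_topk_reference(torch.randn(4, 256),
                                      top_k=8,
                                      num_expert_group=8,
                                      topk_group=4)
```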

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for a fused gatingtopk MoE operator. The changes introduce a check function, _check_npu_moe_gating_top_k, to determine whether the fused operator can be used, and refactor the expert selection logic to use this check. The tests have also been updated to reflect these changes.

My review has identified a critical bug in _check_npu_moe_gating_top_k (and its duplicated version in the test file) that could lead to a ZeroDivisionError. The order of checks is incorrect, using num_expert_group as a divisor before validating it. Please see the detailed comments for the fix.

Comment on lines 99 to 108
if top_k < 1 or \
        top_k > (hidden_states.shape[-1] / (num_expert_group * topk_group)):
    return False
if topk_group < 1 or topk_group > num_expert_group:
    return False
if topk_group * hidden_states.shape[-1] / num_expert_group < top_k:
    return False
if not (num_expert_group > 0 and hidden_states.shape[-1] % num_expert_group
        == 0 and hidden_states.shape[-1] // num_expert_group > 2):
    return False

critical

This function has a potential ZeroDivisionError. The variable num_expert_group is used as a divisor on lines 100 and 104, but it is only checked to be greater than 0 on line 106. If num_expert_group is 0, this will cause a crash. To fix this, the check for num_expert_group > 0 should be performed before any division by it. It would also be better to import and use the _check_npu_moe_gating_top_k function from vllm_ascend.ops.moe.experts_selector to avoid code duplication and ensure consistency.

Suggested change
if top_k < 1 or \
        top_k > (hidden_states.shape[-1] / (num_expert_group * topk_group)):
    return False
if topk_group < 1 or topk_group > num_expert_group:
    return False
if topk_group * hidden_states.shape[-1] / num_expert_group < top_k:
    return False
if not (num_expert_group > 0 and hidden_states.shape[-1] % num_expert_group
        == 0 and hidden_states.shape[-1] // num_expert_group > 2):
    return False
if not (num_expert_group > 0 and hidden_states.shape[-1] % num_expert_group
        == 0 and hidden_states.shape[-1] // num_expert_group > 2):
    return False
if top_k < 1 or \
        top_k > (hidden_states.shape[-1] / (num_expert_group * topk_group)):
    return False
if topk_group < 1 or topk_group > num_expert_group:
    return False
if topk_group * hidden_states.shape[-1] / num_expert_group < top_k:
    return False
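
Following the reviewer's deduplication suggestion, the test could import the shared helper and add a regression case for the zero-group input. This is a sketch: the argument names and order are inferred from the diff, and the real helper may take additional parameters.

```python
import torch

# Reuse the production check instead of keeping a duplicated copy whose
# validation order can drift (module path taken from the review comment).
from vllm_ascend.ops.moe.experts_selector import _check_npu_moe_gating_top_k


def test_check_rejects_zero_expert_group():
    hidden_states = torch.randn(8, 256)  # placeholder shape: 8 tokens, 256 experts
    # With the checks reordered, num_expert_group == 0 should return False
    # rather than raising ZeroDivisionError.
    assert _check_npu_moe_gating_top_k(hidden_states,
                                       top_k=8,
                                       num_expert_group=0,
                                       topk_group=1) is False
```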

Comment on lines 114 to 123
if top_k < 1 or \
        top_k > (hidden_states.shape[-1] / (num_expert_group * topk_group)):
    return False
if topk_group < 1 or topk_group > num_expert_group:
    return False
if topk_group * hidden_states.shape[-1] / num_expert_group < top_k:
    return False
if not (num_expert_group > 0 and hidden_states.shape[-1] % num_expert_group
        == 0 and hidden_states.shape[-1] // num_expert_group > 2):
    return False

critical

There is a potential ZeroDivisionError in this function. The variable num_expert_group is used as a divisor on lines 115 and 119 before it is checked to be greater than 0 on line 121. If num_expert_group is 0, the code will crash. The check for num_expert_group > 0 should be moved before it is used in any division operations to prevent this.

Suggested change
if top_k < 1 or \
        top_k > (hidden_states.shape[-1] / (num_expert_group * topk_group)):
    return False
if topk_group < 1 or topk_group > num_expert_group:
    return False
if topk_group * hidden_states.shape[-1] / num_expert_group < top_k:
    return False
if not (num_expert_group > 0 and hidden_states.shape[-1] % num_expert_group
        == 0 and hidden_states.shape[-1] // num_expert_group > 2):
    return False
if not (num_expert_group > 0 and hidden_states.shape[-1] % num_expert_group
        == 0 and hidden_states.shape[-1] // num_expert_group > 2):
    return False
if top_k < 1 or \
        top_k > (hidden_states.shape[-1] / (num_expert_group * topk_group)):
    return False
if topk_group < 1 or topk_group > num_expert_group:
    return False
if topk_group * hidden_states.shape[-1] / num_expert_group < top_k:
    return False

@1092626063 1092626063 changed the title support gatingtopk [refactor]support gatingtopk operator generalization Nov 22, 2025
@1092626063 1092626063 force-pushed the gatingtopk0110 branch 4 times, most recently from dc446ff to d517c57 Compare November 24, 2025 08:52
@wangxiyuan wangxiyuan added the ready (read for review) and ready-for-test (start test by label for PR) labels Nov 24, 2025
@1092626063 1092626063 force-pushed the gatingtopk0110 branch 3 times, most recently from b3ad464 to 4e6de1e Compare December 3, 2025 01:55
…lm-project#4050)

### What this PR does / why we need it?
pick from: vllm-project#2958
Past:
npu_moe_gating_top_k only supports the 'group_count=256' pattern.

Now:
1. npu_moe_gating_top_k supports all sizes of group_count
2. the functionality of `torch_npu.npu_moe_gating_top_k_softmax` is
included in `torch_npu.npu_moe_gating_top_k`

CANN: depends on 8.3.RC1

Performance:
1. GLM4.5-w8a8: TPS improves by 6%
2. Qwen3: same as before


Signed-off-by: 1092626063 <1092626063@qq.com>
@wangxiyuan wangxiyuan merged commit c4a11a7 into vllm-project:v0.11.0-dev Dec 4, 2025
16 checks passed