[Fix] Remove unnecessary NPU synchronization in MTP proposer #4325

yiz-liu · 2025-11-21T03:39:56Z

What this PR does / why we need it?

Remove unnecessary NPU synchronization in MTP proposer to improve performance.

Removing this synchronization point improves pipeline efficiency by allowing better overlap between CPU and NPU operations. A more appropriate synchronization point is already implemented in #4233.
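To make the overlap argument concrete, here is a toy timing model in Python. It is only a sketch: the function name `simulate` and the step counts and millisecond costs are made up for illustration and are not measurements from this PR. It assumes the host prepares metadata for each step, then launches device work asynchronously, and that the device runs launched work in order.

```python
def simulate(steps, cpu_ms, npu_ms, sync_each_step):
    """Return the total time of a toy host/device pipeline.

    Hypothetical model: each step costs cpu_ms of host preparation followed by
    an asynchronous launch of npu_ms of device work. Device work runs in order.
    """
    host_time = 0      # when the host is next free
    device_free = 0    # when the device finishes its queued work
    for _ in range(steps):
        host_time += cpu_ms                    # host prepares metadata
        start = max(host_time, device_free)    # device starts when both are ready
        device_free = start + npu_ms
        if sync_each_step:
            host_time = device_free            # host blocks until the device is idle
    return device_free                         # one final wait for the last step


if __name__ == "__main__":
    print(simulate(4, 2, 3, sync_each_step=True))
    print(simulate(4, 2, 3, sync_each_step=False))
```

In this model a per-step synchronization makes every step cost the sum of the host and device times, while removing it lets the host preparation for the next step hide behind the device work of the current one.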

Does this PR introduce any user-facing change?

None.

How was this patch tested?

None.

vLLM version: v0.11.0
vLLM main: vllm-project/vllm@2918c1b

…formances. Removing this synchronization point improves pipeline efficiency by allowing for better overlap between CPU and NPU operations. A more proper one is already implemented in vllm-project#4233. Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>

yiz-liu · 2025-11-21T03:40:54Z

This is a follow-up to an earlier discussion.

gemini-code-assist

Code Review

This pull request proposes removing a torch.npu.synchronize() call within the speculative decoding loop in MtpProposer. This change aims to enhance performance by allowing greater overlap between CPU-based metadata preparation and NPU computation. Based on my analysis, the explicit synchronization is redundant because data dependencies are managed by the NPU stream, and a more suitable synchronization point is already in place for graph-based execution. Therefore, this modification is a sound performance optimization, and I have not identified any potential correctness issues.
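The reviewer's point about stream ordering can be illustrated with a pure-Python stand-in. This is a sketch under stated assumptions, not the vllm-ascend code and not the `torch.npu` API: `FakeStream` and `run_steps` are hypothetical names, and a worker thread draining a FIFO queue plays the role of a device stream that executes launched work in order.

```python
import queue
import threading


class FakeStream:
    """Hypothetical stand-in for a device stream: runs work in FIFO order."""

    def __init__(self):
        self._queue = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        while True:
            fn = self._queue.get()
            try:
                fn()
            finally:
                self._queue.task_done()

    def launch(self, fn):
        """Enqueue work and return immediately, like an asynchronous launch."""
        self._queue.put(fn)

    def synchronize(self):
        """Block the caller until all queued work has finished."""
        self._queue.join()


def run_steps(num_steps, sync_each_step):
    stream = FakeStream()
    state = {"value": 1}
    host_blocks = 0

    def make_kernel(step):
        def kernel():
            # Each step depends on the result of the previous step.
            state["value"] = state["value"] * 2 + step
        return kernel

    for step in range(num_steps):
        stream.launch(make_kernel(step))
        if sync_each_step:
            stream.synchronize()
            host_blocks += 1

    stream.synchronize()  # single wait before the host reads the result
    host_blocks += 1
    return state["value"], host_blocks
```

Because the queue preserves launch order, each step sees the previous step's result whether or not the host waits in between; the per-step wait only adds host stalls. Whether the same holds for a given real workload depends on every dependent operation being issued on the same stream, which this toy takes as given.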

github-actions · 2025-11-21T04:12:25Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Copilot

Pull Request Overview

This PR removes an unnecessary NPU synchronization call from the MTP proposer to improve pipeline efficiency by enabling better overlap between CPU and NPU operations. The PR description indicates that a more appropriate synchronization mechanism is already implemented in PR #4233.

Removed torch.npu.synchronize() call from the metadata preparation loop in the _propose method

MengqingCao

LGTM

yiz-liu · 2025-11-24T06:07:00Z

There was an irrelevant lint error.

…oject#4325) ### What this PR does / why we need it? Remove unnecessary NPU synchronization in MTP proposer to improve performances. Removing this synchronization point improves pipeline efficiency by allowing for better overlap between CPU and NPU operations. A more proper one is already implemented in vllm-project#4233 ### Does this PR introduce _any_ user-facing change? None. ### How was this patch tested? None. - vLLM version: v0.11.0 - vLLM main: vllm-project/vllm@2918c1b Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com> Signed-off-by: nsdie <yeyifan@huawei.com>

Copilot AI review requested due to automatic review settings November 21, 2025 03:39

gemini-code-assist bot reviewed Nov 21, 2025

Copilot started reviewing on behalf of yiz-liu November 21, 2025 04:16

Copilot finished reviewing on behalf of yiz-liu November 21, 2025 04:18

Copilot AI reviewed Nov 21, 2025

yiz-liu added the ready (read for review) and ready-for-test (start test by label for PR) labels Nov 21, 2025

Merge branch 'vllm-project:main' into fix-mtp

1ca7d50

MengqingCao approved these changes Nov 24, 2025

yiz-liu merged commit 9799934 into vllm-project:main Nov 24, 2025
24 of 25 checks passed

yiz-liu deleted the fix-mtp branch November 24, 2025 06:07

2 participants