Skip to content

Conversation

@yiz-liu
Copy link
Collaborator

@yiz-liu yiz-liu commented Nov 21, 2025

What this PR does / why we need it?

Remove unnecessary NPU synchronization in MTP proposer to improve performances.

Removing this synchronization point improves pipeline efficiency by allowing for better overlap between CPU and NPU operations. A more proper one is already implemented in #4233

Does this PR introduce any user-facing change?

None.

How was this patch tested?

None.

…formances.

Removing this synchronization point improves pipeline efficiency by allowing for better overlap between CPU and NPU operations.. A more proper one is already implemented in vllm-project#4233.

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Copilot AI review requested due to automatic review settings November 21, 2025 03:39
@yiz-liu
Copy link
Collaborator Author

yiz-liu commented Nov 21, 2025

This is a follow up of former discussion.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request proposes removing a torch.npu.synchronize() call within the speculative decoding loop in MtpProposer. This change aims to enhance performance by allowing greater overlap between CPU-based metadata preparation and NPU computation. Based on my analysis, the explicit synchronization is redundant because data dependencies are managed by the NPU stream, and a more suitable synchronization point is already in place for graph-based execution. Therefore, this modification is a sound performance optimization, and I have not identified any potential correctness issues.

@github-actions
Copy link

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

Copilot finished reviewing on behalf of yiz-liu November 21, 2025 04:18
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR removes an unnecessary NPU synchronization call from the MTP proposer to improve pipeline efficiency by enabling better overlap between CPU and NPU operations. The PR description indicates that a more appropriate synchronization mechanism is already implemented in PR #4233.

  • Removed torch.npu.synchronize() call from the metadata preparation loop in the _propose method

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@yiz-liu yiz-liu added ready read for review ready-for-test start test by label for PR labels Nov 21, 2025
Copy link
Collaborator

@MengqingCao MengqingCao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yiz-liu
Copy link
Collaborator Author

yiz-liu commented Nov 24, 2025

There was an irrlevant lint error.

@yiz-liu yiz-liu merged commit 9799934 into vllm-project:main Nov 24, 2025
24 of 25 checks passed
@yiz-liu yiz-liu deleted the fix-mtp branch November 24, 2025 06:07
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
…oject#4325)

### What this PR does / why we need it?
Remove unnecessary NPU synchronization in MTP proposer to improve
performances.

Removing this synchronization point improves pipeline efficiency by
allowing for better overlap between CPU and NPU operations. A more
proper one is already implemented in vllm-project#4233

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
None.

- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@2918c1b

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Signed-off-by: nsdie <yeyifan@huawei.com>
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
…oject#4325)

### What this PR does / why we need it?
Remove unnecessary NPU synchronization in MTP proposer to improve
performances.

Removing this synchronization point improves pipeline efficiency by
allowing for better overlap between CPU and NPU operations. A more
proper one is already implemented in vllm-project#4233

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
None.

- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@2918c1b

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Signed-off-by: nsdie <yeyifan@huawei.com>
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
…oject#4325)

### What this PR does / why we need it?
Remove unnecessary NPU synchronization in MTP proposer to improve
performances.

Removing this synchronization point improves pipeline efficiency by
allowing for better overlap between CPU and NPU operations. A more
proper one is already implemented in vllm-project#4233

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
None.

- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@2918c1b

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Signed-off-by: nsdie <yeyifan@huawei.com>
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
…oject#4325)

### What this PR does / why we need it?
Remove unnecessary NPU synchronization in MTP proposer to improve
performances.

Removing this synchronization point improves pipeline efficiency by
allowing for better overlap between CPU and NPU operations. A more
proper one is already implemented in vllm-project#4233

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
None.

- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@2918c1b

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Signed-off-by: nsdie <yeyifan@huawei.com>
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
…oject#4325)

### What this PR does / why we need it?
Remove unnecessary NPU synchronization in MTP proposer to improve
performances.

Removing this synchronization point improves pipeline efficiency by
allowing for better overlap between CPU and NPU operations. A more
proper one is already implemented in vllm-project#4233

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
None.

- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@2918c1b

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Signed-off-by: nsdie <yeyifan@huawei.com>
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Nov 29, 2025
…oject#4325)

### What this PR does / why we need it?
Remove unnecessary NPU synchronization in MTP proposer to improve
performances.

Removing this synchronization point improves pipeline efficiency by
allowing for better overlap between CPU and NPU operations. A more
proper one is already implemented in vllm-project#4233

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
None.


- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@2918c1b

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready read for review ready-for-test start test by label for PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants