
Conversation

@LucasWilkinson LucasWilkinson commented Nov 19, 2025

Fixes vllm serve deepseek-ai/DeepSeek-R1 --speculative-config '{"method": "mtp", "num_speculative_tokens": 2}' -tp 8 falsely asserting; the assertion should only fire when sequence parallelism is enabled. (Introduced in #28315)

Main

vllm serve meta-llama/Meta-Llama-3-8B-Instruct -tp 4 --port 3333 --speculative-config '{"method": "eagle", "model": "yuhuili/EAGLE-LLaMA3-Instruct-8B", "num_speculative_tokens": 2}'
...
RuntimeError: Worker failed with error 'Can't determine cudagraph shapes that are both a multiple of 3 (num_speculative_tokens + 1) required by spec-decode and 4 (tensor_parallel_size) required by sequence parallelism please adjust num_speculative_tokens or disable sequence parallelism'

PR

Boots fine.
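The shape of the fix described above can be sketched as follows. This is a hypothetical simplification for illustration, not the actual vLLM code: the function name `filter_capture_sizes` and its signature are invented here; only the constraint logic (divisibility by `num_speculative_tokens + 1` for spec-decode, and additionally by the TP size only when sequence parallelism is on) comes from the PR description.

```python
import math

def filter_capture_sizes(sizes, num_speculative_tokens, tensor_parallel_size,
                         enable_sequence_parallelism):
    """Keep only cudagraph capture sizes compatible with the active features.

    Spec-decode requires sizes divisible by (num_speculative_tokens + 1).
    Sequence parallelism additionally requires divisibility by the tensor
    parallel size; the bug was applying that second constraint even when
    sequence parallelism was disabled.
    """
    multiple = num_speculative_tokens + 1
    if enable_sequence_parallelism:
        # Only when SP is enabled must sizes also be a multiple of TP size.
        multiple = math.lcm(multiple, tensor_parallel_size)
    valid = [s for s in sizes if s % multiple == 0]
    if not valid:
        raise RuntimeError(
            "Can't determine cudagraph shapes compatible with spec-decode "
            "and sequence parallelism; please adjust num_speculative_tokens "
            "or disable sequence parallelism")
    return valid
```

With sequence parallelism off, the MTP=2 / TP=8 case now only needs sizes divisible by 3, so suitable capture sizes exist and the server boots.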

@LucasWilkinson LucasWilkinson added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 19, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug where an assertion related to sequence parallelism was incorrectly triggered even when sequence parallelism was disabled. The change correctly adds a check for self.pass_config.enable_sequence_parallelism before applying constraints on cudagraph capture sizes related to tensor_parallel_size. This ensures that the logic is only applied when sequence parallelism is active, resolving the false assertion. The fix is correct and well-targeted.
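As a note on the arithmetic the guarded check enforces: when both constraints are active, a capture size must be divisible by both num_speculative_tokens + 1 and the tensor parallel size, i.e. by their least common multiple. A quick plain-Python illustration (not vLLM code; the helper name is invented):

```python
import math

def smallest_valid_size(num_speculative_tokens, tensor_parallel_size):
    # A size satisfying both the spec-decode constraint (divisible by
    # num_speculative_tokens + 1) and the sequence-parallelism constraint
    # (divisible by tensor_parallel_size) must be divisible by their lcm.
    return math.lcm(num_speculative_tokens + 1, tensor_parallel_size)

# The Eagle repro from the "Main" section: multiples of 3 and 4 -> 12.
print(smallest_valid_size(2, 4))  # 12
# The MTP=2 / TP=8 case from the PR description: multiples of 3 and 8 -> 24.
print(smallest_valid_size(2, 8))  # 24
```

Once sequence parallelism is disabled, the second factor drops out and any multiple of num_speculative_tokens + 1 is acceptable.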

@LucasWilkinson LucasWilkinson changed the title [BugFix] Fix false assertion with MTP=2 and TP=8 [BugFix] Fix false assertion with spec-decode and TP>2 Nov 19, 2025
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
@LucasWilkinson LucasWilkinson changed the title [BugFix] Fix false assertion with spec-decode and TP>2 [BugFix] Fix false assertion with spec-decode=[2,4,..] and TP>2 Nov 19, 2025
@vllm-bot vllm-bot merged commit 8f4f77a into vllm-project:main Nov 19, 2025
10 of 17 checks passed
@MatthewBonanni MatthewBonanni deleted the lwilkinson/mtp-fix branch November 19, 2025 21:44
khluu pushed a commit that referenced this pull request Nov 19, 2025
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
(cherry picked from commit 8f4f77a)
Victor49152 pushed a commit to Victor49152/vllm that referenced this pull request Nov 20, 2025
LuminolT pushed a commit to LuminolT/vllm that referenced this pull request Nov 21, 2025
…-project#29036)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: LuminolT <lumischen01@gmail.com>
bigPYJ1151 pushed a commit that referenced this pull request Nov 25, 2025
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
bringlein pushed a commit to bringlein/vllm that referenced this pull request Nov 26, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025