-
-
Notifications
You must be signed in to change notification settings - Fork 11.6k
[BugFix] Fix false assertion with spec-decode=[2,4,..] and TP>2 #29036
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request addresses a bug where an assertion related to sequence parallelism was incorrectly triggered even when sequence parallelism was disabled. The change correctly adds a check for self.pass_config.enable_sequence_parallelism before applying constraints on cudagraph capture sizes related to tensor_parallel_size. This ensures that the logic is only applied when sequence parallelism is active, resolving the false assertion. The fix is correct and well-targeted.
100b3fd to
bf20165
Compare
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> (cherry picked from commit 8f4f77a)
…-project#29036) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
…-project#29036) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: LuminolT <lumischen01@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: jiang1.li <jiang1.li@intel.com>
…-project#29036) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
…-project#29036) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Fix
vllm serve deepseek-ai/DeepSeek-R1 --speculative-config '{"method": "mtp", "num_speculative_tokens": 2}' -tp 8falsely asserting; should only assert when sequence parallelism is enabled. (Introduced in #28315)Main
PR
Boots fine