[Bugfix] use module-level import for patched function in Qwen3Next #4354
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request correctly addresses a bug related to function patching in specific execution environments. By changing the import of chunk_gated_delta_rule from a direct function import to a module-level import, the code ensures that the function call is resolved at runtime. This allows for monkey-patching to work as intended, preventing the model from holding a reference to an old, unpatched function. The change is minimal, targeted, and effectively solves the described problem. The implementation is sound and follows Python best practices for creating patchable code. The changes look good and I have no further comments.
0db2bd3 to
caeb8cf
Compare
@wangxiyuan Hi, could you please review this PR?
58e92af to
d0f9b3b
Compare
Signed-off-by: zjchenn <zjchenn@gmail.com>
456e709 to
112061f
Compare
MengqingCao left a comment
LGTM, thx for this fix!
…llm-project#4354) ### What this PR does / why we need it? **Problem**: The Qwen3Next model implementation currently imports `chunk_gated_delta_rule` directly using `from ... import ...`. In frameworks like `verl`, the model file is often imported before `vllm-ascend` initializes and applies its patches. This causes the model to permanently hold a reference to the original (unpatched) vLLM kernel, resulting in execution errors on Ascend devices even if the patch runs later. **Solution**: Changed the import style to `from vllm...ops import chunk` and call `chunk.chunk_gated_delta_rule()`. This ensures that the function lookup happens at runtime (dynamic dispatch), allowing the model to correctly pick up the patched function regardless of import order. - vLLM version: v0.11.0 - vLLM main: vllm-project/vllm@2918c1b Signed-off-by: zjchenn <zjchenn@gmail.com>
What this PR does / why we need it?
Problem: The Qwen3Next model implementation currently imports `chunk_gated_delta_rule` directly using `from ... import ...`. In frameworks like `verl`, the model file is often imported before `vllm-ascend` initializes and applies its patches. This causes the model to permanently hold a reference to the original (unpatched) vLLM kernel, resulting in execution errors on Ascend devices even if the patch runs later.
Solution: Changed the import style to `from vllm...ops import chunk` and call `chunk.chunk_gated_delta_rule()`. This ensures that the function lookup happens at runtime (dynamic dispatch), allowing the model to correctly pick up the patched function regardless of import order.
Does this PR introduce any user-facing change?
No. This is an internal bug fix to resolve import reference issues.
How was this patch tested?