Skip to content

Conversation

@zjchenn
Copy link
Contributor

@zjchenn zjchenn commented Nov 22, 2025

What this PR does / why we need it?

Problem: The Qwen3Next model implementation currently imports chunk_gated_delta_rule directly using from ... import ...

In frameworks like verl, the model file is often imported before vllm-ascend initializes and applies its patches. This causes the model to permanently hold a reference to the original (unpatched) vLLM kernel, resulting in execution errors on Ascend devices even if the patch runs later.

Solution: Changed the import style to from vllm...ops import chunk and call chunk.chunk_gated_delta_rule().

This ensures that the function lookup happens at runtime (dynamic dispatch), allowing the model to correctly pick up the patched function regardless of import order.

Does this PR introduce any user-facing change?

No. This is an internal bug fix to resolve import reference issues.

How was this patch tested?

@github-actions
Copy link

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly addresses a bug related to function patching in specific execution environments. By changing the import of chunk_gated_delta_rule from a direct function import to a module-level import, the code ensures that the function call is resolved at runtime. This allows for monkey-patching to work as intended, preventing the model from holding a reference to an old, unpatched function. The change is minimal, targeted, and effectively solves the described problem. The implementation is sound and follows Python best practices for creating patchable code. The changes look good and I have no further comments.

@zjchenn zjchenn force-pushed the fix/module-level-import-for-qwen3next branch from 0db2bd3 to caeb8cf Compare November 22, 2025 02:27
@zjchenn
Copy link
Contributor Author

zjchenn commented Nov 22, 2025

@wangxiyuan hi, please help me have a review

@zjchenn zjchenn changed the title [bugfix] use module-level import for 'chunk_gated_delta_rule' in Qwen3Next [Bugfix] use module-level import for 'chunk_gated_delta_rule' in Qwen3Next Nov 22, 2025
@zjchenn zjchenn force-pushed the fix/module-level-import-for-qwen3next branch 2 times, most recently from 58e92af to d0f9b3b Compare November 22, 2025 07:15
@zjchenn
Copy link
Contributor Author

zjchenn commented Nov 24, 2025

/cc @MengqingCao @yiz-liu @weijinqian0

@zjchenn zjchenn changed the title [Bugfix] use module-level import for 'chunk_gated_delta_rule' in Qwen3Next [Bugfix] use module-level import for patched function in Qwen3Next Nov 24, 2025
@weijinqian0 weijinqian0 added ready read for review ready-for-test start test by label for PR labels Nov 25, 2025
Signed-off-by: zjchenn <zjchenn@gmail.com>
@zjchenn zjchenn force-pushed the fix/module-level-import-for-qwen3next branch from 456e709 to 112061f Compare November 25, 2025 07:56
Copy link
Collaborator

@MengqingCao MengqingCao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thx for this fix!

@MengqingCao MengqingCao merged commit 463910e into vllm-project:main Nov 25, 2025
22 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Nov 29, 2025
…llm-project#4354)

### What this PR does / why we need it?

**Problem**: The Qwen3Next model implementation currently imports
chunk_gated_delta_rule directly using `from ... import ...`

In frameworks like `verl`, the model file is often imported before
`vllm-ascend` initializes and applies its patches. This causes the model
to permanently hold a reference to the original (unpatched) vLLM kernel,
resulting in execution errors on Ascend devices even if the patch runs
later.

**Solution**: Changed the import style to `from vllm...ops import chunk`
and call `chunk.chunk_gated_delta_rule().`

This ensures that the function lookup happens at runtime (dynamic
dispatch), allowing the model to correctly pick up the patched function
regardless of import order.

- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@2918c1b

Signed-off-by: zjchenn <zjchenn@gmail.com>
Meihan-chen pushed a commit to Meihan-chen/vllm-ascend that referenced this pull request Dec 5, 2025
…llm-project#4354)

### What this PR does / why we need it?

**Problem**: The Qwen3Next model implementation currently imports
chunk_gated_delta_rule directly using `from ... import ...`

In frameworks like `verl`, the model file is often imported before
`vllm-ascend` initializes and applies its patches. This causes the model
to permanently hold a reference to the original (unpatched) vLLM kernel,
resulting in execution errors on Ascend devices even if the patch runs
later.

**Solution**: Changed the import style to `from vllm...ops import chunk`
and call `chunk.chunk_gated_delta_rule().`

This ensures that the function lookup happens at runtime (dynamic
dispatch), allowing the model to correctly pick up the patched function
regardless of import order.

- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@2918c1b

Signed-off-by: zjchenn <zjchenn@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready read for review ready-for-test start test by label for PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants