Skip to content

Conversation

@ZJY0516
Copy link
Contributor

@ZJY0516 ZJY0516 commented Nov 8, 2025

Purpose

Fix fused_gdn_gating accuracy issue

Test Plan

vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct --enable-expert-parallel -tp 4

lm_eval --model local-chat-completions --model_args model=Qwen/Qwen3-Next-80B-A3B-Instruct,base_url=http://localhost:8000/v1/chat/completions,num_concurrent=280 --tasks gsm8k --apply_chat_template --num_fewshot 5

Test Result

Tasks Version Filter n-shot Metric Value Stderr
gsm8k 3 flexible-extract 5 exact_match 0.7839 ± 0.0113
strict-match 5 exact_match 0.6566 ± 0.0131

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
@ZJY0516 ZJY0516 requested a review from sighingnow as a code owner November 8, 2025 08:37
@mergify mergify bot added the qwen Related to Qwen models label Nov 8, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses an accuracy issue in the fused_gdn_gating_kernel. The changes are twofold: first, the manual sigmoid implementation is replaced with tl.sigmoid, which should provide better numerical stability. Second, the data type for storing the beta_output is corrected to match the input tensor b's data type. This change ensures that the kernel's behavior correctly mimics b.sigmoid(), resolving a precision mismatch that was likely the cause of the accuracy problem. The fix appears correct and well-targeted. I have no further suggestions.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@vadiklyutiy
Copy link
Collaborator

Could you add accuracy test before this PR

@ZJY0516
Copy link
Contributor Author

ZJY0516 commented Nov 8, 2025

Could you add accuracy test before this PR

Before this PR, sometimes gsm8k score was zero

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
@ZJY0516
Copy link
Contributor Author

ZJY0516 commented Nov 8, 2025

@codex review

@chatgpt-codex-connector
Copy link

Codex Review: Didn't find any major issues. What shall we delve into next?

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@vadiklyutiy
Copy link
Collaborator

Could you add accuracy test before this PR

Before this PR, sometimes gsm8k score was zero

what sometimes exactly mean?

  • use the same cmd and result was randomly 0 or around .80
  • or different cmd give different results
  • ?

@ZJY0516
Copy link
Contributor Author

ZJY0516 commented Nov 8, 2025

Could you add accuracy test before this PR

Before this PR, sometimes gsm8k score was zero

what sometimes exactly mean?

  • use the same cmd and result was randomly 0 or around .80
  • or different cmd give different results
  • ?

same cmd, randomly 0 or around .80

@ZJY0516 ZJY0516 requested a review from vadiklyutiy November 8, 2025 10:48
@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 8, 2025
@mgoin mgoin merged commit c4768dc into vllm-project:main Nov 9, 2025
53 checks passed
@ZJY0516 ZJY0516 deleted the fix-fused_gdn_gating branch November 13, 2025 09:38
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Nov 13, 2025
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants