[None][feat] AutoDeploy: Use the router gemm op for nemotron MOE #9500
base: main
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
/bot run
Walkthrough
Router logits computation in NemotronHTopkRouter now branches on the weight dtype: float32 weights go through the standard linear operation, while weights of any other dtype go through a specialized Triton GEMM operator that takes the transposed weights and forces float32 output.
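The dtype branch described in the walkthrough could look roughly like the sketch below. This is not the PR's code: `compute_router_logits` and `router_gemm_fp32_out` are hypothetical names, and the specialized GEMM operator is replaced by a plain PyTorch stand-in so the example runs on CPU. Casting the activations to float32 in the first branch is also an assumption.

```python
import torch
import torch.nn.functional as F


def router_gemm_fp32_out(hidden: torch.Tensor, weight_t: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for the specialized router GEMM op.

    Takes the weight already transposed to (hidden_size, num_experts) and
    always returns float32 logits, mirroring the behaviour the walkthrough
    describes. The real operator is not reproduced here.
    """
    return torch.matmul(hidden.to(torch.float32), weight_t.to(torch.float32))


def compute_router_logits(hidden: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """Compute router logits, branching on the dtype of the router weight.

    hidden: (num_tokens, hidden_size)
    weight: (num_experts, hidden_size)
    """
    if weight.dtype == torch.float32:
        # float32 weights: standard linear operation (assumed activation cast).
        return F.linear(hidden.to(torch.float32), weight)
    # Any other dtype: specialized GEMM on transposed weights, float32 output.
    return router_gemm_fp32_out(hidden, weight.t())


# Small CPU example with bfloat16 weights (shapes are illustrative only).
hidden = torch.randn(4, 16, dtype=torch.bfloat16)
weight = torch.randn(8, 16, dtype=torch.bfloat16)
logits = compute_router_logits(hidden, weight)
```

In this sketch both branches return float32 logits of shape (num_tokens, num_experts), so downstream top-k selection would not need to know which path was taken.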
Estimated code review effort: 2 (Simple), ~10–15 minutes
Pre-merge checks: 1 failed (warning), 2 passed
PR_Github #25888 [ run ] triggered by Bot. Commit:
PR_Github #25888 [ run ] completed with state