update autotuner input tensor random range #2116
Conversation
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Summary of Changes

Hello @jiahanc, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the FlashInfer autotuner by modifying the range of its randomly generated input tensors, expanding it from [0, 1) to [-5, 5).
Walkthrough

Default tensor initializers in the autotuner were changed to use uniform random values scaled to [-5, 5) and cast to the target dtype, replacing the previous normal and uniform behaviours during tensor construction and input preparation.
Code Review
This pull request updates the random range for input tensors in the autotuner to be [-5, 5). While the change in _prepare_input_tensors correctly implements this, the update in DynamicTensorSpec is inconsistent. It uses torch.randn which results in a different distribution, and the order of operations could lead to precision loss. My review includes a suggestion to make the initializers consistent and robust.
flashinfer/autotuner.py
Outdated
```python
lambda shapes, dtype, device: (
    torch.randn(shapes, device=device).to(dtype) * 10 - 5
)
```
This change has a couple of issues:
- Inconsistent random distribution: This uses `torch.randn`, which generates values from a normal distribution. With the `* 10 - 5` transformation, this will produce values from a normal distribution with a mean of -5 and a standard deviation of 10. This is inconsistent with the change in `_prepare_input_tensors`, which uses `torch.rand` to generate values in a uniform [-5, 5) range, and it also doesn't seem to match the PR description's goal of a [-5, 5) range. For consistency, `torch.rand` should be used here as well.
- Potential precision loss: The type conversion `.to(dtype)` is applied before the multiplication and subtraction. This can cause a loss of precision, especially with lower-precision dtypes like `float16`. The arithmetic operations should be performed in `float32` (the default for `torch.rand`/`torch.randn`) before casting to the target `dtype`.
The suggested change below addresses both points for consistency and correctness.
```python
lambda shapes, dtype, device: (torch.rand(shapes, device=device) * 10 - 5).to(dtype)
```
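For illustration only, a minimal standalone sketch (not part of the PR) contrasting the two initializers; the shape and dtype below are arbitrary assumptions:

```python
import torch

shape, dtype = (1_000_000,), torch.float16  # arbitrary example values

# Order used in the original change: Gaussian draw, cast to the target dtype, then scale/shift.
gaussian_style = torch.randn(shape).to(dtype) * 10 - 5

# Order suggested in the review: uniform draw, arithmetic in float32, cast last.
uniform_style = (torch.rand(shape) * 10 - 5).to(dtype)

print(gaussian_style.float().mean(), gaussian_style.float().std())  # roughly -5 and 10
print(uniform_style.float().min(), uniform_style.float().max())     # stays within [-5, 5)
```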
Actionable comments posted: 1
🧹 Nitpick comments (1)
flashinfer/autotuner.py (1)
764-766: Consider refactoring lambda to a named function. The implementation correctly achieves the [-5, 5) range using uniform distribution. However, the static analysis tool suggests defining this as a proper function instead of a lambda for better readability.
As per coding guidelines:
```diff
- default_initializer = lambda shapes, dtype, device: (
-     torch.rand(shapes, device=device) * 10 - 5
- ).to(dtype)
+ def default_initializer(shapes, dtype, device):
+     return (torch.rand(shapes, device=device) * 10 - 5).to(dtype)
```
📒 Files selected for processing (1)
flashinfer/autotuner.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
flashinfer/autotuner.py
764-766: Do not assign a lambda expression, use a def
Rewrite default_initializer as a def
(E731)
/bot run
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
/bot run
Actionable comments posted: 0
🧹 Nitpick comments (1)
flashinfer/autotuner.py (1)
764-766: LGTM! Consider extracting lambda to a named function. The implementation correctly generates uniform random tensors in [-5, 5) and is consistent with the initialization at lines 64-66.
As noted by static analysis, consider extracting the lambda to a named function for better readability:
```diff
+ def _default_tensor_initializer(shapes, dtype, device):
+     return (torch.rand(shapes, device=device) * 10 - 5).to(dtype)
+
  def _prepare_input_tensors(
      self, profile: OptimizationProfile, inputs: List[torch.Tensor]
  ) -> List[torch.Tensor]:
-     default_initializer = lambda shapes, dtype, device: (
-         torch.rand(shapes, device=device) * 10 - 5
-     ).to(dtype)
+     default_initializer = self._default_tensor_initializer
```

As per static analysis.
📒 Files selected for processing (1)
flashinfer/autotuner.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
flashinfer/autotuner.py
764-766: Do not assign a lambda expression, use a def
Rewrite default_initializer as a def
(E731)
🔇 Additional comments (1)
flashinfer/autotuner.py (1)
64-66: LGTM! Tensor initialization correctly implements uniform [-5, 5) range. The implementation properly addresses previous review concerns by:
- Using `torch.rand` for uniform distribution (not `torch.randn`)
- Performing arithmetic operations before dtype conversion to preserve precision
- Maintaining consistency with the `default_initializer` at lines 764-766
The transformation correctly maps [0, 1) → [0, 10) → [-5, 5).
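A quick standalone check of that mapping (illustrative only; the sample count is an arbitrary assumption):

```python
import torch

# [0, 1) -> scale by 10 -> [0, 10) -> shift by -5 -> [-5, 5)
x = torch.rand(100_000) * 10 - 5
assert x.min().item() >= -5.0 and x.max().item() < 5.0
```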
[CANCELING] Pipeline #38830366: canceled

/bot run
```python
# Set default tensor_initializers if not provided
if self.tensor_initializers is None:
    self.tensor_initializers = [
        lambda shapes, dtype, device: torch.randn(shapes, device=device).to(
```
randn is a Gaussian distribution, which is different from your description:
input tensor random range from [0,1) to [-5,5) for larger range
where [0, 1) is a uniform distribution.
I have no idea about the real data distribution tbh and changing it to [-5, 5) seems fine. Just a heads up to make sure it's not a typo.
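For reference, a minimal standalone sketch of that distinction (not from the PR; sample count is arbitrary):

```python
import torch

g = torch.randn(100_000)  # Gaussian: mean ~0, std ~1, unbounded tails
u = torch.rand(100_000)   # uniform: every value lies in [0, 1)

print(g.min().item(), g.max().item())  # typically well outside [0, 1)
print(u.min().item(), u.max().item())  # always within [0, 1)
```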
Thanks for pointing that out. Is there a reason why randn is used here but rand in the other place?
The reason for changing to [-5, 5) is that @rosenrodt did some MXFP4 tuning experiments and found this range gets a better autotuner config than [0, 1).
I speculated [-5, 5) is better than [0, 1) because the latter could truncate to 0s, thus affecting the power profile during autotune and making it less representative of the power profile of the actual workload.
Yes, I think if [-5, 5) is better let's use it; data distribution affects kernel execution time.
```python
def _prepare_input_tensors(
    self, profile: OptimizationProfile, inputs: List[torch.Tensor]
) -> List[torch.Tensor]:
    default_initializer = lambda shapes, dtype, device: torch.rand(
```
This is indeed a uniform distribution over [0, 1).
[FAILED] Pipeline #38838385: 13/18 passed
📌 Description
Update the autotuner input tensor random range from [0, 1) to [-5, 5) for a larger range that is closer to real tensor values.
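For illustration only, a minimal sketch of the before/after behaviour described above (the function names are hypothetical, not the autotuner's actual code):

```python
import torch

def old_default_initializer(shapes, dtype, device):
    # Previous behaviour: uniform values in [0, 1)
    return torch.rand(shapes, device=device).to(dtype)

def new_default_initializer(shapes, dtype, device):
    # New behaviour: uniform values in [-5, 5), scaled in float32 before the dtype cast
    return (torch.rand(shapes, device=device) * 10 - 5).to(dtype)

# Example usage with an arbitrary shape and dtype:
t = new_default_initializer((64, 64), torch.bfloat16, "cpu")
print(t.float().min().item(), t.float().max().item())  # within [-5, 5)
```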
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- Installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- Installed the hooks with `pre-commit install`.
- Ran `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests
- All tests are passing (unittest, etc.).

Reviewer Notes