Conversation

@jiahanc
Collaborator

@jiahanc jiahanc commented Nov 19, 2025

📌 Description

Update the autotuner input tensor random range from [0,1) to [-5,5) for a larger range that is closer to real tensor values.
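
For reference, a minimal sketch of the new initializer behavior (a def is used here for readability; the exact form in flashinfer/autotuner.py may differ):

    import torch

    # Uniform values in [-5, 5) instead of the previous [0, 1);
    # do the scale/shift in float32 first, then cast to the target dtype.
    def default_initializer(shapes, dtype, device):
        return (torch.rand(shapes, device=device) * 10 - 5).to(dtype)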

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Bug Fixes
    • Improved tensor initialization used during autotuning: values are now drawn uniformly from a symmetric range around zero ([-5, 5)), yielding more representative and stable parameter tuning results.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
@gemini-code-assist
Contributor

Summary of Changes

Hello @jiahanc, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the FlashInfer autotuner by modifying the range of its randomly generated input tensors. By expanding the range from [0,1) to [-5,5), the autotuner will now be tested with a wider and more realistic distribution of values, which should lead to more robust and accurate optimization profiles.

Highlights

  • Autotuner Input Tensor Range Update: The range for randomly generated input tensors used by the autotuner has been updated from [0,1) to [-5,5). This change applies to both the default tensor_initializers in __post_init__ and the default_initializer in _prepare_input_tensors.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Contributor

coderabbitai bot commented Nov 19, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Default tensor initializers in the autotuner were changed to use uniform random values scaled to [-5, 5) and cast to the target dtype, replacing the previous normal/uniform behavior during tensor construction and input preparation.

Changes

Cohort / File(s) Change Summary
Autotuner initialization logic
flashinfer/autotuner.py
Replaced the previous tensor initialization with default initializers that generate random values via (rand * 10 - 5) (range [-5, 5)) and then cast to the specified dtype; applied in DynamicTensorSpec.__post_init__ and _prepare_input_tensors.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

  • Single file with consistent initializer changes.
  • Check dtype casts and any assumptions about value distributions in downstream code.

Poem

🐰 I nibble code as daylight bends,
I scale and shift with playful trends,
From normals gone to ranges wide,
Tiny hops in tensors glide,
A rabbit's tweak — the inputs smile. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: updating the random range for autotuner input tensors from [0,1) to [-5,5).
Description check ✅ Passed The description includes the required Description section explaining the change rationale, follows the template structure, and confirms pre-commit checks were completed.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates the random range for input tensors in the autotuner to be [-5, 5). While the change in _prepare_input_tensors correctly implements this, the update in DynamicTensorSpec is inconsistent. It uses torch.randn which results in a different distribution, and the order of operations could lead to precision loss. My review includes a suggestion to make the initializers consistent and robust.

Comment on lines 64 to 66
                lambda shapes, dtype, device: (
                    torch.randn(shapes, device=device).to(dtype) * 10 - 5
                )
Contributor

high

This change has a couple of issues:

  1. Inconsistent random distribution: This uses torch.randn, which generates values from a normal distribution. With the * 10 - 5 transformation, this will produce values from a normal distribution with a mean of -5 and a standard deviation of 10. This is inconsistent with the change in _prepare_input_tensors which uses torch.rand to generate values in a uniform [-5, 5) range, and also doesn't seem to match the PR description's goal of a [-5, 5) range. For consistency, torch.rand should be used here as well.

  2. Potential precision loss: The type conversion .to(dtype) is applied before the multiplication and subtraction. This can cause a loss of precision, especially with lower-precision dtypes like float16. The arithmetic operations should be performed in float32 (the default for torch.rand/torch.randn) before casting to the target dtype.

The suggested change below addresses both points for consistency and correctness.

                lambda shapes, dtype, device: (torch.rand(shapes, device=device) * 10 - 5).to(dtype)
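
For illustration only (not part of the diff), a small script that checks both points above — the distribution each variant produces, and the precision effect of casting to float16 before versus after the scale/shift:

    import torch

    torch.manual_seed(0)
    n = 1 << 20

    # randn * 10 - 5 is normal with mean -5 and std 10;
    # rand * 10 - 5 is uniform over [-5, 5).
    a = torch.randn(n) * 10 - 5
    b = torch.rand(n) * 10 - 5
    print(a.mean().item(), a.std().item())   # roughly -5 and 10
    print(b.min().item(), b.max().item())    # within [-5, 5)

    # Casting to float16 before the arithmetic rounds early and compounds error;
    # casting after keeps the math in float32 until the end.
    x = torch.rand(n)
    ref = x * 10 - 5
    cast_first = x.to(torch.float16) * 10 - 5
    cast_last = (x * 10 - 5).to(torch.float16)
    print((cast_first.float() - ref).abs().max().item())  # larger error
    print((cast_last.float() - ref).abs().max().item())   # smaller error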

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
flashinfer/autotuner.py (1)

764-766: Consider refactoring lambda to a named function.

The implementation correctly achieves the [-5, 5) range using uniform distribution. However, the static analysis tool suggests defining this as a proper function instead of a lambda for better readability.

As per coding guidelines:

-        default_initializer = lambda shapes, dtype, device: (
-            torch.rand(shapes, device=device) * 10 - 5
-        ).to(dtype)
+        def default_initializer(shapes, dtype, device):
+            return (torch.rand(shapes, device=device) * 10 - 5).to(dtype)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a23405 and cad6e85.

📒 Files selected for processing (1)
  • flashinfer/autotuner.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
flashinfer/autotuner.py

764-766: Do not assign a lambda expression, use a def

Rewrite default_initializer as a def

(E731)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs

@jiahanc
Collaborator Author

jiahanc commented Nov 19, 2025

/bot run

@flashinfer-bot
Collaborator

GitLab MR !153 has been created, and the CI pipeline #38825852 is currently running. I'll report back once the pipeline job completes.

Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
@jiahanc
Collaborator Author

jiahanc commented Nov 20, 2025

/bot run

@flashinfer-bot
Collaborator

GitLab MR !153 has been updated with latest changes, and the CI pipeline #38830366 is currently running. I'll report back once the pipeline job completes.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
flashinfer/autotuner.py (1)

764-766: LGTM! Consider extracting lambda to a named function.

The implementation correctly generates uniform random tensors in [-5, 5) and is consistent with the initialization at lines 64-66.

As noted by static analysis, consider extracting the lambda to a named function for better readability:

+    def _default_tensor_initializer(shapes, dtype, device):
+        return (torch.rand(shapes, device=device) * 10 - 5).to(dtype)
+
     def _prepare_input_tensors(
         self, profile: OptimizationProfile, inputs: List[torch.Tensor]
     ) -> List[torch.Tensor]:
-        default_initializer = lambda shapes, dtype, device: (
-            torch.rand(shapes, device=device) * 10 - 5
-        ).to(dtype)
+        default_initializer = self._default_tensor_initializer

As per static analysis.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cad6e85 and 3e9bc87.

📒 Files selected for processing (1)
  • flashinfer/autotuner.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
flashinfer/autotuner.py

764-766: Do not assign a lambda expression, use a def

Rewrite default_initializer as a def

(E731)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs
🔇 Additional comments (1)
flashinfer/autotuner.py (1)

64-66: LGTM! Tensor initialization correctly implements uniform [-5, 5) range.

The implementation properly addresses previous review concerns by:

  • Using torch.rand for uniform distribution (not torch.randn)
  • Performing arithmetic operations before dtype conversion to preserve precision
  • Maintaining consistency with the default_initializer at lines 764-766

The transformation correctly maps [0, 1) → [0, 10) → [-5, 5).
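
A quick sanity check of that mapping (a throwaway sketch, not part of the PR):

    import torch

    t = torch.rand(1_000_000) * 10 - 5
    assert t.min().item() >= -5.0 and t.max().item() < 5.0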

@flashinfer-bot
Collaborator

[CANCELING] Pipeline #38830366: canceled

@jiahanc
Collaborator Author

jiahanc commented Nov 20, 2025

/bot run

@flashinfer-bot
Collaborator

GitLab MR !153 has been created, and the CI pipeline #38838385 is currently running. I'll report back once the pipeline job completes.

        # Set default tensor_initializers if not provided
        if self.tensor_initializers is None:
            self.tensor_initializers = [
                lambda shapes, dtype, device: torch.randn(shapes, device=device).to(
Collaborator

randn is a Gaussian distribution, which is different from your description:

input tensor random range from [0,1) to [-5,5) for larger range

where [0, 1) is a uniform distribution.

I have no idea about the real data distribution tbh and changing it to [-5, 5) seems fine. Just a heads up to make sure it's not a typo.

Collaborator Author

@jiahanc jiahanc Nov 20, 2025

Thanks for pointing that out. Is there a reason why randn is used here but rand in the other place?
The reason for changing to [-5,5) is that @rosenrodt did some work on MXFP4 tuning experiments and found this range can get a better autotuner config than [0,1).

I speculated [-5, 5) is better than [0, 1) because the latter could truncate to 0s, thus affecting the power profile during autotune and making it less representative of the power profile of the actual workload.

Collaborator

Yes, I think if [-5, 5) is better let's use it; data distribution affects kernel execution time.
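
One rough way to check that effect for a given kernel is a timing sketch like the following (assumes a CUDA device; the matmul here just stands in for whichever kernel is being tuned):

    import torch

    def time_op(x, w, iters=50):
        # Warm up, then time with CUDA events.
        for _ in range(5):
            x @ w
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            x @ w
        end.record()
        torch.cuda.synchronize()
        return start.elapsed_time(end) / iters  # ms per call

    w = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    x_01 = torch.rand(4096, 4096, device="cuda").to(torch.float16)             # old [0, 1)
    x_55 = (torch.rand(4096, 4096, device="cuda") * 10 - 5).to(torch.float16)  # new [-5, 5)
    print(time_op(x_01, w), time_op(x_55, w))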

    def _prepare_input_tensors(
        self, profile: OptimizationProfile, inputs: List[torch.Tensor]
    ) -> List[torch.Tensor]:
        default_initializer = lambda shapes, dtype, device: torch.rand(
Collaborator

this is indeed a uniform distribution over [0, 1)

@flashinfer-bot
Collaborator

[FAILED] Pipeline #38838385: 13/18 passed

@yzh119 yzh119 merged commit 84df81e into flashinfer-ai:main Nov 22, 2025
4 checks passed