8 changes: 4 additions & 4 deletions flashinfer/autotuner.py
@@ -61,8 +61,8 @@ def __post_init__(self):
         # Set default tensor_initializers if not provided
         if self.tensor_initializers is None:
             self.tensor_initializers = [
-                lambda shapes, dtype, device: torch.randn(shapes, device=device).to(
Collaborator:
randn is a Gaussian distribution, which is different from your description:

> input tensor random range from [0,1) to [-5,5) for larger range

where [0, 1) is a uniform distribution.

I have no idea about the real data distribution tbh, and changing it to [-5, 5) seems fine. Just a heads up to make sure it's not a typo.
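For reference, a minimal standalone sketch (not from the PR) contrasting the two samplers; only torch is assumed:

    import torch

    x = torch.rand(1_000_000)    # uniform samples in [0, 1)
    y = torch.randn(1_000_000)   # Gaussian samples: mean 0, std 1, unbounded

    print(x.min().item(), x.max().item())    # ~0.0 and just under 1.0
    print(y.min().item(), y.max().item())    # typically around -5 and +5 for 1e6 samples
    print(y.mean().item(), y.std().item())   # ~0.0 and ~1.0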

Collaborator (Author) @jiahanc, Nov 20, 2025:

Thanks for pointing that out. Is there a reason why it's randn here but rand in the other place?
The reason for changing to [-5, 5) is that @rosenrodt did some MXFP4 tuning experiments and found this range can get a better autotuner config than [0, 1).

Reply:

I speculated [-5, 5) is better than [0, 1) because the latter could truncate to 0s, affecting the power profile during autotune and making it less representative of the power profile of the actual workload.

Collaborator:

Yes, I think if [-5, 5) is better let's use it; data distribution affects kernel execution time.

-                    dtype
+                lambda shapes, dtype, device: (
+                    torch.randn(shapes, device=device).to(dtype) * 10 - 5
                 )
Contributor (severity: high):

This change has a couple of issues:

  1. Inconsistent random distribution: This uses torch.randn, which generates values from a normal distribution. With the * 10 - 5 transformation, this will produce values from a normal distribution with a mean of -5 and a standard deviation of 10. This is inconsistent with the change in _prepare_input_tensors which uses torch.rand to generate values in a uniform [-5, 5) range, and also doesn't seem to match the PR description's goal of a [-5, 5) range. For consistency, torch.rand should be used here as well.

  2. Potential precision loss: The type conversion .to(dtype) is applied before the multiplication and subtraction. This can cause a loss of precision, especially with lower-precision dtypes like float16. The arithmetic operations should be performed in float32 (the default for torch.rand/torch.randn) before casting to the target dtype.

The suggested change below addresses both points for consistency and correctness.

                lambda shapes, dtype, device: (torch.rand(shapes, device=device) * 10 - 5).to(dtype)
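For reference, a small standalone sketch (not from the PR) illustrating both points; fp16 is used only as an example dtype:

    import torch

    x32 = torch.rand(1_000_000)  # uniform [0, 1), float32

    # Point 1: the randn-based transform (cast omitted here) is normal with
    # mean -5 and std 10, not uniform over [-5, 5).
    g = torch.randn(1_000_000) * 10 - 5
    u = x32 * 10 - 5
    print(g.min().item(), g.max().item())  # lands well outside [-5, 5)
    print(u.min().item(), u.max().item())  # stays inside [-5, 5)

    # Point 2: casting before vs after the arithmetic.
    ref = x32 * 10 - 5                                  # float32 reference
    late = (x32 * 10 - 5).to(torch.float16).float()     # scale, then cast (suggested order)
    early = (x32.to(torch.float16) * 10 - 5).float()    # cast, then scale
    print((late - ref).abs().max().item())              # smaller rounding error
    print((early - ref).abs().max().item())             # noticeably larger error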

                 for _ in range(len(self.input_idx))
             ]
@@ -761,8 +761,8 @@ def _create_tensor_like(
     def _prepare_input_tensors(
         self, profile: OptimizationProfile, inputs: List[torch.Tensor]
     ) -> List[torch.Tensor]:
-        default_initializer = lambda shapes, dtype, device: torch.rand(
Collaborator:

This is indeed a uniform distribution over [0, 1).

-            shapes, device=device
+        default_initializer = lambda shapes, dtype, device: (
+            torch.rand(shapes, device=device) * 10 - 5
         ).to(dtype)
         tensors = []
         for i, p in enumerate(profile.shapes):
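For context, the new default_initializer from the hunk above can be exercised on its own; the shapes, dtype, and device below are arbitrary example values:

    import torch

    # Same expression as the new default_initializer in this hunk
    default_initializer = lambda shapes, dtype, device: (
        torch.rand(shapes, device=device) * 10 - 5
    ).to(dtype)

    t = default_initializer((4, 16), torch.float16, "cpu")
    print(t.dtype, t.min().item(), t.max().item())  # torch.float16, values within [-5, 5)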