13 changes: 7 additions & 6 deletions tensorrt_llm/_torch/auto_deploy/models/patches/nemotron_h.py
@@ -93,14 +93,15 @@ def _nemotron_h_topk_router_forward(self, hidden_states):
     Forward pass for NemotronHTopkRouter using the optimized noaux_tc_op kernel.

     This replaces the original forward method which used pure PyTorch operations
-    with a fused CUDA kernel that performs:
-    1. Sigmoid activation of logits
-    2. Group-based expert selection
-    3. Top-k selection within selected groups
-    4. Normalized weight computation
+    with optimized CUDA kernels.
     """
     hidden_states = hidden_states.view(-1, self.config.hidden_size)
-    router_logits = F.linear(hidden_states.type(torch.float32), self.weight.type(torch.float32))
+    if self.weight.dtype == torch.float32:
+        router_logits = F.linear(hidden_states.type(torch.float32), self.weight)
+    else:
+        router_logits = torch.ops.trtllm.dsv3_router_gemm_op(
+            hidden_states, self.weight.t(), bias=None, out_dtype=torch.float32
+        )
Comment on lines +99 to +104
Collaborator
Could we do this through the pattern matcher + replacement approach, as opposed to directly modifying the patch?


     # Use the fused noaux_tc_op kernel which applies sigmoid internally
     # and performs group-based top-k selection with normalization
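
For reference, below is a rough pure-PyTorch sketch of the group-limited top-k routing the docstring describes (the authoritative semantics live in the `noaux_tc_op` kernel itself, which this patch calls further down). The function name, the default group counts, and the group-scoring rule (sum of each group's top-2 sigmoid scores, as in DeepSeek-V3-style no-aux routing) are illustrative assumptions, not taken from this patch.

```python
import torch


def grouped_topk_reference(
    router_logits: torch.Tensor,  # [num_tokens, n_experts], float32
    n_group: int = 8,
    topk_group: int = 4,
    top_k: int = 2,
    routed_scaling_factor: float = 1.0,
):
    """Sketch of group-limited top-k routing (assumed noaux_tc_op semantics)."""
    num_tokens, n_experts = router_logits.shape
    # 1. Sigmoid activation of logits.
    scores = router_logits.sigmoid()
    # 2. Group-based expert selection: score each group by the sum of its
    #    top-2 expert scores, then keep only the best `topk_group` groups.
    group_scores = scores.view(num_tokens, n_group, -1).topk(2, dim=-1).values.sum(dim=-1)
    group_idx = group_scores.topk(topk_group, dim=-1).indices
    group_mask = torch.zeros_like(group_scores).scatter_(1, group_idx, 1.0)
    score_mask = (
        group_mask.unsqueeze(-1)
        .expand(num_tokens, n_group, n_experts // n_group)
        .reshape(num_tokens, n_experts)
    )
    # 3. Top-k selection restricted to the surviving groups.
    masked_scores = scores.masked_fill(score_mask == 0, float("-inf"))
    topk_weights, topk_ids = masked_scores.topk(top_k, dim=-1)
    # 4. Normalized weight computation.
    topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
    return topk_weights * routed_scaling_factor, topk_ids
```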
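On the reviewer's question: a pattern-matcher rewrite would express the same change as a graph transformation instead of editing the patched forward directly. Below is a minimal sketch using torch.fx's generic subgraph rewriter; auto_deploy has its own pattern-matcher utilities whose exact API is not shown in this diff, and the toy `Router` module, shapes, and dtypes are all illustrative assumptions. Tracing the replacement also requires tensorrt_llm to be imported so the `trtllm` custom ops are registered.

```python
import torch
import torch.nn.functional as F
from torch.fx import subgraph_rewriter, symbolic_trace


class Router(torch.nn.Module):
    """Toy stand-in for the router module whose forward we want to rewrite."""

    def __init__(self, hidden_size: int, n_experts: int):
        super().__init__()
        self.weight = torch.nn.Parameter(
            torch.randn(n_experts, hidden_size, dtype=torch.bfloat16)
        )

    def forward(self, hidden_states):
        # The original pure-PyTorch router GEMM.
        return F.linear(hidden_states.type(torch.float32), self.weight.type(torch.float32))


# Pattern: the fp32 router GEMM as it appears in the traced graph.
def pattern(hidden_states, weight):
    return F.linear(hidden_states.type(torch.float32), weight.type(torch.float32))


# Replacement: dispatch to the fused router GEMM kernel instead.
def replacement(hidden_states, weight):
    return torch.ops.trtllm.dsv3_router_gemm_op(
        hidden_states, weight.t(), bias=None, out_dtype=torch.float32
    )


gm = symbolic_trace(Router(hidden_size=4096, n_experts=64))
subgraph_rewriter.replace_pattern(gm, pattern, replacement)
```

One wrinkle: the dtype guard in the patch (falling back to `F.linear` for fp32 weights) is a runtime condition, so a graph rewrite would have to check the weight dtype at match time rather than inside the pattern itself, which may be part of why the forward was patched directly here.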