@evkotov evkotov commented Nov 25, 2025

Details:

[Image: subgraph from silero_vad.onnx]

Problem

ONNX models exported from PyTorch frequently contain Unsqueeze operations before LSTM nodes. These operations add extra dimensions to tensors, resulting in rank-4 or rank-5 inputs to LSTM nodes. However, the ONNX LSTM specification strictly requires rank-3 inputs; X, for example, has shape [seq_length, batch_size, input_size] in the default layout.

Why this happens:

  • PyTorch models use various tensor shapes during training
  • During ONNX export, shape mismatches are "fixed" by inserting Unsqueeze nodes
  • These Unsqueeze operations add dimensions of size 1 to match the expected shapes
  • The resulting LSTM inputs have rank > 3, violating the ONNX LSTM specification
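
To make the rank growth concrete, here is a minimal sketch of how Unsqueeze changes a shape. It works on plain shape vectors rather than OpenVINO nodes, and `unsqueeze_shape` is a hypothetical helper written for this illustration, not a function from the frontend.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Shape = std::vector<int64_t>;

// Hypothetical helper: output shape of an ONNX-style Unsqueeze.
// `axes` are non-negative positions in the OUTPUT shape that receive a 1.
Shape unsqueeze_shape(const Shape& input, std::vector<int64_t> axes) {
    std::sort(axes.begin(), axes.end());
    const int64_t out_rank = static_cast<int64_t>(input.size() + axes.size());
    Shape output;
    std::size_t next_axis = 0;
    std::size_t next_dim = 0;
    for (int64_t i = 0; i < out_rank; ++i) {
        if (next_axis < axes.size() && axes[next_axis] == i) {
            output.push_back(1);  // inserted dimension
            ++next_axis;
        } else {
            // Duplicate or out-of-range axes exhaust the input dimensions early.
            if (next_dim >= input.size()) {
                throw std::invalid_argument("invalid Unsqueeze axes");
            }
            output.push_back(input[next_dim++]);
        }
    }
    return output;
}
```

With a rank-3 LSTM input of shape [3, 2, 4] and axes {0, 1}, the helper returns the rank-5 shape [1, 1, 3, 2, 4], which is the pattern this PR targets.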

Real-world impact:

  • Models like silero_vad.onnx contain 4 LSTM nodes, all preceded by Unsqueeze operations
  • Without this fix, such LSTM models fail to convert to OpenVINO IR

Solution
This fix adds automatic rank reduction in the ONNX Frontend LSTM converter (src/frontends/onnx/frontend/src/op/lstm.cpp). The implementation uses a two-strategy approach:

  1. Squeeze Strategy (optimal path):
    Used when all extra leading dimensions equal 1
    Example: [1, 1, seq, batch, input] → [seq, batch, input]
    Zero-cost operation that only changes metadata, no data movement
    Applies to most real-world models (including silero_vad.onnx)
  2. Reshape Strategy (fallback path):
    Used when extra dimensions are > 1 or have dynamic shapes
    Example: [2, 3, seq, batch, input] → [6 * seq, batch, input] (flattens the leading dimensions into the sequence dimension)
    Handles edge cases and dynamic shapes
    Uses dynamic shape calculation at runtime

Implementation details:

  • New function reduce_tensor_rank() analyzes the input tensor's rank and shape
  • Automatically selects the optimal strategy based on dimension values
  • Applied to all LSTM inputs: X (data), initial_h (hidden state), initial_c (cell state)
  • Transparent to users: no model modifications required

Code structure:

// Analyze input shape
if (input_rank <= target_rank) {
    return input;  // No reduction needed
}

// Check if all extra dimensions equal 1
if (all_extra_dims_are_one) {
    // Use Squeeze - optimal path
    return Squeeze(input, axes);
} else {
    // Use Reshape - fallback path
    return Reshape(input, new_shape);
}
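
The pseudocode above can be modeled as a small runnable sketch on plain shape vectors. This is only an illustration of the selection logic as described in this section: `Strategy` and `choose_strategy` are hypothetical names, a dynamic dimension is represented as -1 by assumption, and no OpenVINO API is used.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using Shape = std::vector<int64_t>;

// Hypothetical result type for this sketch.
enum class Strategy { None, Squeeze, Reshape };

// Decides how a shape would be reduced to `target_rank`.
// Assumption: a dynamic dimension is encoded as -1, so it never compares equal to 1.
Strategy choose_strategy(const Shape& shape, std::size_t target_rank) {
    if (shape.size() <= target_rank) {
        return Strategy::None;  // no reduction needed
    }
    const auto extra = static_cast<std::ptrdiff_t>(shape.size() - target_rank);
    const bool all_extra_dims_are_one =
        std::all_of(shape.begin(), shape.begin() + extra, [](int64_t dim) { return dim == 1; });
    return all_extra_dims_are_one ? Strategy::Squeeze : Strategy::Reshape;
}
```

For example, [1, 1, 3, 2, 4] selects the Squeeze path, while [2, 3, 3, 2, 4] or a shape with a dynamic leading dimension falls through to Reshape.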

Performance:

  • The Squeeze path has zero runtime overhead (metadata-only operation)
  • The Reshape path adds minimal overhead, and only for edge cases
  • No impact on models that already have rank-3 inputs

Tickets:

  • 162986

Problem: Models like silero_vad contain LSTM layers with high-rank input tensors
(rank > 3), but OpenVINO's LSTM expects exactly rank 3 [batch, sequence, features].
This causes conversion failures for models with shapes like [1, 1, ?, ?, ?].

Solution: Add reduce_tensor_rank() helper function that processes LSTM inputs
(X, initial_h, initial_c) before axis reordering. The function:
- Squeezes leading dimensions equal to 1 when possible
- Uses Reshape to collapse leading dimensions if they aren't all 1
- Reduces rank to target rank 3 before reordering axes
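
A short sketch of why the rank reduction has to run before the axes are reordered. It assumes the reordering is the permutation {1, 0, 2}, which maps ONNX's [seq, batch, input] to the [batch, sequence, features] layout mentioned above; that permutation is inferred from the two layouts quoted in this PR rather than copied from lstm.cpp, and `reorder_shape` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Shape = std::vector<int64_t>;

// Hypothetical helper: the i-th output dimension is shape[order[i]].
// Throws if the number of axes in `order` differs from the tensor rank;
// it does not check `order` for duplicate indices.
Shape reorder_shape(const Shape& shape, const std::vector<std::size_t>& order) {
    if (order.size() != shape.size()) {
        throw std::invalid_argument("axis order does not match tensor rank");
    }
    Shape result;
    result.reserve(shape.size());
    for (std::size_t axis : order) {
        result.push_back(shape.at(axis));
    }
    return result;
}
```

Applied to the reduced shape [3, 2, 4], the permutation gives [2, 3, 4]. Applied directly to the unreduced rank-5 shape [1, 1, 3, 2, 4], the same three-axis permutation is rejected, which is why the reduction comes first.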

Test: Added onnx_model_lstm_high_rank_input test with rank-5 input [1,1,3,2,4]
that gets reduced to [3,2,4]. Reference outputs generated using ONNX Runtime
with equivalent rank-3 input.

This test does not reproduce the silero_vad.onnx structure.
Keep only tests that match actual silero_vad patterns:
- lstm_rank5_squeeze: multi-axis squeeze (with GPU skip)
- lstm_rank4_with_unsqueeze: exact silero_vad structure
@evkotov evkotov requested a review from mvafin November 27, 2025 12:52
};

// Helper function to reduce tensor rank to target_rank by squeezing or reshaping
std::shared_ptr<ov::Node> reduce_tensor_rank(const ov::Output<ov::Node>& input, int64_t target_rank) {
Contributor

Suggested change
- std::shared_ptr<ov::Node> reduce_tensor_rank(const ov::Output<ov::Node>& input, int64_t target_rank) {
+ ov::Output<ov::Node> reduce_tensor_rank(const ov::Output<ov::Node>& input, int64_t target_rank) {

Avoid working with nodes: this might be a node with many outputs, and going through the node means you end up using its first output.
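
A toy illustration of the concern, using stand-in types rather than the OpenVINO classes. `ToyNode`, `ToyOutput`, and both passthrough functions are hypothetical. The sketch only assumes that rebuilding an output handle from a bare node falls back to output index 0.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a graph node with several named outputs.
struct ToyNode {
    std::vector<std::string> output_names;
};

// Toy stand-in for an output handle: a node plus the index of one of its outputs.
struct ToyOutput {
    std::shared_ptr<ToyNode> node;
    std::size_t index = 0;  // a handle rebuilt from a bare node defaults to output 0

    const std::string& name() const { return node->output_names.at(index); }
};

// Returning the node drops the output index.
std::shared_ptr<ToyNode> passthrough_as_node(const ToyOutput& input) {
    return input.node;
}

// Returning the handle keeps the output index.
ToyOutput passthrough_as_output(const ToyOutput& input) {
    return input;
}
```

If the input handle points at output 1 of a three-output node, `passthrough_as_output` still refers to output 1, whereas a handle rebuilt from the result of `passthrough_as_node` refers to output 0.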

Contributor Author

Fixed.

Comment on lines 48 to 50
if (!input_shape.rank().is_static()) {
    return input.get_node_shared_ptr();
}
Contributor

Suggested change
- if (!input_shape.rank().is_static()) {
-     return input.get_node_shared_ptr();
- }
+ if (input_shape.rank().is_dynamic()) {
+     return input;
+ }

Contributor Author

Fixed.

const auto input_rank = input_shape.rank().get_length();

if (input_rank <= target_rank) {
    return input.get_node_shared_ptr();
Contributor

Suggested change
- return input.get_node_shared_ptr();
+ return input;

Contributor Author

Fixed.

Comment on lines 72 to 81
auto shape_of_input = std::make_shared<v3::ShapeOf>(input);
auto start_idx = v0::Constant::create(ov::element::i64, Shape{1}, {input_rank - target_rank});
auto stop_idx = v0::Constant::create(ov::element::i64, Shape{1}, {input_rank});
auto step = v0::Constant::create(ov::element::i64, Shape{1}, {1});

// Get last target_rank dimensions: shape[-target_rank:]
auto last_dims = std::make_shared<v8::Slice>(shape_of_input, start_idx, stop_idx, step);

// Reshape to extract last target_rank dimensions
return std::make_shared<v1::Reshape>(input, last_dims, false);
Contributor

Can the extra input dimensions be something other than 1? In that case the Reshape will fail.
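
The failure mode can be checked with simple arithmetic: a Reshape must preserve the total element count, so reshaping to the last target_rank dimensions only works when the dropped leading dimensions multiply to 1. A minimal sketch on plain shape vectors, with hypothetical helper names and static dimensions assumed:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

using Shape = std::vector<int64_t>;

// Total number of elements described by a fully static shape.
int64_t element_count(const Shape& shape) {
    return std::accumulate(shape.begin(), shape.end(), int64_t{1}, std::multiplies<int64_t>());
}

// Mirrors shape[-target_rank:] from the snippet under review.
// Precondition: shape.size() >= target_rank.
Shape trailing_dims(const Shape& shape, std::size_t target_rank) {
    const auto skip = static_cast<std::ptrdiff_t>(shape.size() - target_rank);
    return Shape(shape.begin() + skip, shape.end());
}

// A reshape is only valid when the source and target element counts match.
bool reshape_to_trailing_is_valid(const Shape& shape, std::size_t target_rank) {
    return element_count(shape) == element_count(trailing_dims(shape, target_rank));
}
```

For [1, 1, 3, 2, 4] both sides hold 24 elements, so the reshape is valid. For [2, 3, 3, 2, 4] the input holds 144 elements but the target shape [3, 2, 4] holds 24, so the reshape cannot succeed.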

Contributor Author

I checked how ONNX Runtime handles this case. It strictly validates that X has rank 3: https://github.com/microsoft/onnxruntime/blob/423a03f1fc80d3cbed4f973574ee96f31521a3d3/onnxruntime/core/providers/cpu/rnn/lstm_base.cc#L191-L192

if (X_shape.NumDimensions() != 3)
    return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
        "Input X must have 3 dimensions only. Actual:", X_shape);

ONNX specification also requires rank=3: https://github.com/onnx/onnx/blob/main/onnx/defs/rnn/defs.cc#L26-L28

if (first_input_shape.dim_size() != 3) {
    fail_shape_inference("First input tensor must have rank 3");
}

ONNX Runtime does not support LSTM inputs with rank > 3; it simply fails with an error. Our approach in OpenVINO is an extension that handles models where Unsqueeze operations precede LSTM nodes.

I updated the solution:

  • Removed the Reshape fallback, since it is incorrect for dimensions != 1
  • We now always use Squeeze for all leading dimensions
  • Per the ONNX spec, LSTM requires rank 3, so any extra dimensions must be 1 (they come from Unsqueeze operations)
  • If extra dimensions turn out to be != 1 at runtime, Squeeze fails with a clear error message, which is the correct behavior because such input violates the ONNX LSTM specification
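
The updated behavior can be sketched on static shape vectors as follows. This is a model of the rule described above, not the code from lstm.cpp: `squeeze_leading_dims` is a hypothetical helper, and the exception stands in for whatever error Squeeze reports.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using Shape = std::vector<int64_t>;

// Hypothetical helper: drops leading dimensions until `target_rank` remains.
// Every dropped dimension must be exactly 1; otherwise the input is rejected.
Shape squeeze_leading_dims(const Shape& shape, std::size_t target_rank) {
    if (shape.size() <= target_rank) {
        return shape;  // already at or below the target rank
    }
    const std::size_t extra = shape.size() - target_rank;
    for (std::size_t axis = 0; axis < extra; ++axis) {
        if (shape[axis] != 1) {
            throw std::invalid_argument("cannot squeeze axis " + std::to_string(axis) +
                                        " with dimension " + std::to_string(shape[axis]));
        }
    }
    return Shape(shape.begin() + static_cast<std::ptrdiff_t>(extra), shape.end());
}
```

Here [1, 1, 3, 2, 4] reduces to [3, 2, 4], a rank-3 shape passes through unchanged, and [2, 1, 3, 2, 4] is rejected instead of being silently reinterpreted.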

mvafin commented Nov 27, 2025

LSTM spec (https://onnx.ai/onnx/operators/onnx__LSTM.html) describes X, init C and H as 3D tensors. Is this a broader behavior supported by onnxruntime?

Address code review feedback:
- Change return type from std::shared_ptr<ov::Node> to ov::Output<ov::Node>
- Use is_dynamic() instead of !is_static() for clarity
- Return input directly instead of input.get_node_shared_ptr()
- Remove Reshape fallback, always use Squeeze for leading dimensions
- Remove unused includes (reshape.hpp, slice.hpp)

Per ONNX spec, LSTM requires rank-3 inputs. Extra dimensions must be == 1.
ONNX Runtime strictly validates this and fails for rank > 3.
Our Squeeze-based approach is an extension that handles models where
Unsqueeze operations precede LSTM nodes (common in PyTorch exports).
If extra dimensions are != 1 at runtime, Squeeze will fail with clear error.
@evkotov evkotov requested a review from mvafin December 3, 2025 11:42

evkotov commented Dec 3, 2025

> LSTM spec (https://onnx.ai/onnx/operators/onnx__LSTM.html) describes X, init C and H as 3D tensors. Is this a broader behavior supported by onnxruntime?

No, ONNX Runtime does not support broader rank behavior for LSTM inputs. It strictly validates that X has rank 3: https://github.com/microsoft/onnxruntime/blob/423a03f1fc80d3cbed4f973574ee96f31521a3d3/onnxruntime/core/providers/cpu/rnn/lstm_base.cc#L191-L192

if (X_shape.NumDimensions() != 3)
    return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
        "Input X must have 3 dimensions only. Actual:", X_shape);

Similarly for initial_h and initial_c: https://github.com/microsoft/onnxruntime/blob/423a03f1fc80d3cbed4f973574ee96f31521a3d3/onnxruntime/core/providers/cpu/rnn/lstm_base.cc#L221-L230

if (initial_h_shape.NumDimensions() != 3 || ...)
    return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Input initial_h must have shape {...}");

ONNX shape inference also validates rank=3: https://github.com/onnx/onnx/blob/main/onnx/defs/rnn/defs.cc#L26-L28

if (first_input_shape.dim_size() != 3) {
    fail_shape_inference("First input tensor must have rank 3");
}

So models like silero_vad.onnx work in ONNX Runtime either because the Unsqueeze nodes are removed during graph optimization (typically when their input is a constant) or because the model was exported with correct shapes. Our fix in OpenVINO handles the case where the Unsqueeze cannot be optimized away (e.g., when its input is dynamic).

Labels

category: ONNX FE OpenVINO ONNX FrontEnd
