Issue Description
I'm experiencing a persistent output-shape issue with the FastSpeech2/HiFi-GAN TFLite models during Android inference. Despite following all documented steps and trying numerous troubleshooting approaches, the output shape is always [1, 1, 80] instead of the expected [1, N, 80], where N is the variable-length time dimension (the number of mel frames, which should grow with input length).
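For concreteness, here is a stripped-down sketch of my inference path. The model file name, token IDs, and the exact input set/order are placeholders for my actual setup (my real code matches the model's input signature):

```kotlin
import org.tensorflow.lite.Interpreter
import java.io.File

// Stripped-down sketch of my inference path (file name and token IDs are
// placeholders; the actual input set/order matches my model's signature).
val interpreter = Interpreter(File("fastspeech2_quant.tflite"))

// Token IDs from the text-to-sequence step (placeholder values).
val inputIds = arrayOf(intArrayOf(12, 57, 3, 44, 19, 8))

// Resize input 0 to the real token length and re-allocate tensors.
interpreter.resizeInput(0, intArrayOf(1, inputIds[0].size))
interpreter.allocateTensors()

val speakerIds = intArrayOf(0)
val speedRatios = floatArrayOf(1.0f)

// Output buffer sized to what the interpreter reports -- which is [1, 1, 80].
val mel = Array(1) { Array(1) { FloatArray(80) } }
interpreter.runForMultipleInputsOutputs(
    arrayOf<Any>(inputIds, speakerIds, speedRatios),
    mapOf<Int, Any>(0 to mel)
)

// Always prints [1, 1, 80], regardless of input length.
println(interpreter.getOutputTensor(0).shape().contentToString())
```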
What I've Tried
I have exhaustively attempted the following:
- ✅ Proper tensor handling and memory allocation
- ✅ Verified official pre-trained models
- ✅ Correct preprocessing steps as documented
- ✅ Dynamic resizing configurations (`resizeInput` + `allocateTensors`; see the diagnostic sketch after this list)
- ✅ Multiple tokenization approaches
- ✅ Different input text lengths
- ✅ Various model loading methods
- ✅ Checked tensor specifications and signatures
- ✅ Validated input shapes are correct
- ✅ Tested with different Android devices and API levels
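The dynamic-resizing item above refers to variations of the following diagnostic. My understanding of the Java API is that `shapeSignature()` exposes dynamic axes as -1, while `shape()` reports the currently allocated shape:

```kotlin
// Diagnostic: does the converted model even declare a dynamic output axis?
// shape() is the currently allocated shape; shapeSignature() should contain
// -1 for any axis that is dynamic in the exported model.
val out = interpreter.getOutputTensor(0)
println("shape:          ${out.shape().contentToString()}")          // I get [1, 1, 80]
println("shapeSignature: ${out.shapeSignature().contentToString()}") // expected [1, -1, 80]
```

If `shapeSignature()` also reports [1, 1, 80], that would suggest the dimension was frozen at conversion time rather than at runtime, but I haven't been able to confirm which case I'm hitting.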
The Problem
No matter what configuration or approach I use, the output is truncated to [1, 1, 80]. This appears to be a fundamental issue with how the TFLite models are handling the variable-length output dimension during inference on Android.
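For completeness, I also tried the post-invoke pattern from my reading of the `Interpreter` Java docs, on the assumption that a data-dependent output length is only known after invocation (run with an empty output map, then read via `Tensor.asReadOnlyBuffer()`):

```kotlin
import java.nio.ByteOrder

// Same placeholder inputs as in the first sketch above.
val inputs = arrayOf<Any>(inputIds, speakerIds, speedRatios)

// Post-invoke pattern: run with an empty output map, then query the output
// tensor after inference, when a data-dependent shape should be known.
interpreter.runForMultipleInputsOutputs(inputs, mapOf<Int, Any>())

val outTensor = interpreter.getOutputTensor(0)
println(outTensor.shape().contentToString()) // still [1, 1, 80] for me

// Copy the mel data out of the tensor's backing buffer.
val melFlat = FloatArray(outTensor.numElements())
outTensor.asReadOnlyBuffer().order(ByteOrder.nativeOrder()).asFloatBuffer().get(melFlat)
```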
Questions & Request for Help
- Are there model export flags or converter configurations required for proper variable-length output support in TFLite that aren't documented?
- Are there specific binding or interpreter settings needed for the Android TFLite runtime to handle dynamic output shapes correctly?
- Are there known TFLite runtime issues on Android that affect output-shape handling for FastSpeech2/HiFi-GAN models?
- Could Android wrapper or quantization issues be causing this output-shape limitation?
- Is there a specific TensorFlow Lite version or runtime configuration required for proper variable-length output?
Any guidance on resolving this output-shape truncation would be greatly appreciated. I'm happy to provide additional details about my setup, code, or testing results if needed.
Thank you!