Issue Description
I'm experiencing a persistent output-shape issue with the FastSpeech2/HiFi-GAN TFLite models during Android inference. Despite following all documented steps and trying numerous troubleshooting approaches, the output shape is always [1, 1, 80] instead of the expected [1, N, 80], where N is the variable-length time dimension (the number of mel frames, which should grow with input length).
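For concreteness, here is a stripped-down sketch of my inference path. The model file name, token IDs, and the exact input set/order are placeholders for my actual setup (my real code matches the model's input signature):

```kotlin
import org.tensorflow.lite.Interpreter
import java.io.File

// Stripped-down sketch of my inference path (file name and token IDs are
// placeholders; the actual input set/order matches my model's signature).
val interpreter = Interpreter(File("fastspeech2_quant.tflite"))

// Token IDs from the text-to-sequence step (placeholder values).
val inputIds = arrayOf(intArrayOf(12, 57, 3, 44, 19, 8))

// Resize input 0 to the real token length and re-allocate tensors.
interpreter.resizeInput(0, intArrayOf(1, inputIds[0].size))
interpreter.allocateTensors()

val speakerIds = intArrayOf(0)
val speedRatios = floatArrayOf(1.0f)

// Output buffer sized to what the interpreter reports -- which is [1, 1, 80].
val mel = Array(1) { Array(1) { FloatArray(80) } }
interpreter.runForMultipleInputsOutputs(
    arrayOf<Any>(inputIds, speakerIds, speedRatios),
    mapOf<Int, Any>(0 to mel)
)

// Always prints [1, 1, 80], regardless of input length.
println(interpreter.getOutputTensor(0).shape().contentToString())
```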
What I've Tried
I have exhaustively attempted the following:
- ✅ Proper tensor handling and memory allocation
- ✅ Verified official pre-trained models
- ✅ Correct preprocessing steps as documented
- ✅ Dynamic resizing configurations (`resizeInput` + `allocateTensors`; see the diagnostic sketch after this list)
- ✅ Multiple tokenization approaches
- ✅ Different input text lengths
- ✅ Various model loading methods
- ✅ Checked tensor specifications and signatures
- ✅ Validated input shapes are correct
- ✅ Tested with different Android devices and API levels
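The dynamic-resizing item above refers to variations of the following diagnostic. My understanding of the Java API is that `shapeSignature()` exposes dynamic axes as -1, while `shape()` reports the currently allocated shape:

```kotlin
// Diagnostic: does the converted model even declare a dynamic output axis?
// shape() is the currently allocated shape; shapeSignature() should contain
// -1 for any axis that is dynamic in the exported model.
val out = interpreter.getOutputTensor(0)
println("shape:          ${out.shape().contentToString()}")          // I get [1, 1, 80]
println("shapeSignature: ${out.shapeSignature().contentToString()}") // expected [1, -1, 80]
```

If `shapeSignature()` also reports [1, 1, 80], that would suggest the dimension was frozen at conversion time rather than at runtime, but I haven't been able to confirm which case I'm hitting.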
The Problem
No matter what configuration or approach I use, the output is truncated to [1, 1, 80]. This appears to be a fundamental issue with how the TFLite models are handling the variable-length output dimension during inference on Android.
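For completeness, I also tried the post-invoke pattern from my reading of the `Interpreter` Java docs, on the assumption that a data-dependent output length is only known after invocation (run with an empty output map, then read via `Tensor.asReadOnlyBuffer()`):

```kotlin
import java.nio.ByteOrder

// Same placeholder inputs as in the first sketch above.
val inputs = arrayOf<Any>(inputIds, speakerIds, speedRatios)

// Post-invoke pattern: run with an empty output map, then query the output
// tensor after inference, when a data-dependent shape should be known.
interpreter.runForMultipleInputsOutputs(inputs, mapOf<Int, Any>())

val outTensor = interpreter.getOutputTensor(0)
println(outTensor.shape().contentToString()) // still [1, 1, 80] for me

// Copy the mel data out of the tensor's backing buffer.
val melFlat = FloatArray(outTensor.numElements())
outTensor.asReadOnlyBuffer().order(ByteOrder.nativeOrder()).asFloatBuffer().get(melFlat)
```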
Questions & Request for Help
- Are there model export flags or converter configurations required for proper variable-length output support in TFLite that aren't documented?
- Are there specific binding or interpreter settings needed for the Android TFLite runtime to handle dynamic output shapes correctly?
- Are there known TFLite runtime issues on Android that affect output-shape handling for FastSpeech2/HiFi-GAN models?
- Could Android wrapper or quantization issues be causing this output-shape limitation?
- Is there a specific TensorFlow Lite version or runtime configuration required for proper variable-length output?
Any guidance on resolving this output-shape truncation would be greatly appreciated. I'm happy to provide additional details about my setup, code, or testing results if needed.
Thank you!