I am trying to deploy the model on the CPU of a Qualcomm QCS6490 board. I have quantized the model to Q8_0 and Q5_0. When running inference on audio clips of varying length (for example 9 s, 10 s, and 12 s), the Q5_0 model takes longer than the Q8_0 model for every clip.

| Audio duration (ms) | Q8_0 inference time (ms) | Q5_0 inference time (ms) |
|---|---|---|
| 6180 | 7404.38 | 9039.27 |
| 6240 | 7443.3 | 9275.39 |
| 6800 | 7387.46 | 8923.46 |
| 7200 | 7923.13 | 9377.34 |
| 8340 | 7477.39 | 8953.79 |
| 8760 | 7790.3 | 9420.6 |
| 10580 | 7691.98 | 9557.05 |
| 10680 | 7791.84 | 9290.73 |
| 17700 | 8143.61 | 9556.05 |
| 19980 | 8083.97 | 9774.57 |
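
As a rough way to quantify the gap, the Python sketch below computes the Q5_0 / Q8_0 time ratio for each clip. The numbers are copied from the measurements above; the names `rows` and `slowdowns` are illustrative only and not part of any benchmark tooling.

```python
# (audio_duration_ms, q8_0_time_ms, q5_0_time_ms), copied from the measurements above
rows = [
    (6180, 7404.38, 9039.27),
    (6240, 7443.3, 9275.39),
    (6800, 7387.46, 8923.46),
    (7200, 7923.13, 9377.34),
    (8340, 7477.39, 8953.79),
    (8760, 7790.3, 9420.6),
    (10580, 7691.98, 9557.05),
    (10680, 7791.84, 9290.73),
    (17700, 8143.61, 9556.05),
    (19980, 8083.97, 9774.57),
]

# A ratio above 1 means Q5_0 was slower than Q8_0 for that clip.
slowdowns = [q5 / q8 for _, q8, q5 in rows]

for (duration, _, _), ratio in zip(rows, slowdowns):
    print(f"{duration:>6} ms audio: Q5_0 / Q8_0 = {ratio:.3f}")

print(f"min = {min(slowdowns):.3f}, max = {max(slowdowns):.3f}")
```

On these measurements the ratio stays between roughly 1.17 and 1.25, so Q5_0 is consistently about 17% to 25% slower than Q8_0 regardless of clip length.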