Q5_0 quantized model inference time is more than Q8_0 #3491

@senthileng386

I am trying to deploy the model on the CPU of a Qualcomm QCS6490 board. I have quantized the model to Q8_0 and Q5_0. When running inference on audio clips of varying length (for example 9 sec, 10 sec, and 12 sec), the Q5_0 model takes longer than the Q8_0 model for every clip.

| Audio_Duration_ms | Q8_0_Inference_Time_ms | Q5_0_Inference_Time_ms |
|---|---|---|
| 6180 | 7404.38 | 9039.27 |
| 6240 | 7443.3 | 9275.39 |
| 6800 | 7387.46 | 8923.46 |
| 7200 | 7923.13 | 9377.34 |
| 8340 | 7477.39 | 8953.79 |
| 8760 | 7790.3 | 9420.6 |
| 10580 | 7691.98 | 9557.05 |
| 10680 | 7791.84 | 9290.73 |
| 17700 | 8143.61 | 9556.05 |
| 19980 | 8083.97 | 9774.57 |
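
To put a number on the gap, here is a small Python sketch that computes the Q5_0/Q8_0 inference-time ratio for each row of the measurements above. The `rows` list and the `slowdown` helper are only illustrative names, not part of any library. On these numbers the ratio falls between roughly 1.17x and 1.25x for every clip.

```python
# Each tuple is (audio_duration_ms, q8_0_inference_ms, q5_0_inference_ms),
# copied from the measurements reported above.
rows = [
    (6180, 7404.38, 9039.27),
    (6240, 7443.3, 9275.39),
    (6800, 7387.46, 8923.46),
    (7200, 7923.13, 9377.34),
    (8340, 7477.39, 8953.79),
    (8760, 7790.3, 9420.6),
    (10580, 7691.98, 9557.05),
    (10680, 7791.84, 9290.73),
    (17700, 8143.61, 9556.05),
    (19980, 8083.97, 9774.57),
]


def slowdown(measurements):
    """Return the Q5_0 / Q8_0 inference-time ratio for each measurement."""
    return [q5_ms / q8_ms for _, q8_ms, q5_ms in measurements]


ratios = slowdown(rows)

for (audio_ms, _, _), ratio in zip(rows, ratios):
    print(f"{audio_ms:>6} ms audio: Q5_0 takes {ratio:.2f}x the Q8_0 time")
```

The ratio stays fairly flat as the audio duration grows, so the extra cost does not appear to depend on clip length.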
