@@ -20,11 +20,25 @@ module to generate the spectrogram tensor.
 
 ## Build
 
-Currently we have CUDA build support only. CPU and Metal backend builds are WIP.
+Currently we support CUDA and Metal builds; CPU support is still a work in progress.
+
+For CUDA:
+```bash
+BUILD_BACKEND="EXECUTORCH_BUILD_CUDA"
+```
+
+For Metal:
+```bash
+BUILD_BACKEND="EXECUTORCH_BUILD_METAL"
+```
 
 ```bash
 # Install ExecuTorch libraries:
-cmake --preset llm -DEXECUTORCH_BUILD_CUDA=ON -DCMAKE_INSTALL_PREFIX=cmake-out -DCMAKE_BUILD_TYPE=Release . -Bcmake-out
+cmake --preset llm \
+  -D${BUILD_BACKEND}=ON \
+  -DCMAKE_INSTALL_PREFIX=cmake-out \
+  -DCMAKE_BUILD_TYPE=Release \
+  -Bcmake-out -S.
 cmake --build cmake-out -j$(nproc) --target install --config Release
 
 # Build the runner:
@@ -44,6 +58,8 @@ tokenizer target (`tokenizers::tokenizers`).
 
 Use [Optimum-ExecuTorch](https://github.com/huggingface/optimum-executorch) to export a Whisper model from Hugging Face:
 
+#### CUDA backend:
+
 ```bash
 optimum-cli export executorch \
   --model openai/whisper-small \
@@ -58,6 +74,23 @@ This command generates:
 - `model.pte` — Compiled Whisper model
 - `aoti_cuda_blob.ptd` — Weight data file for CUDA backend
 
+#### Metal backend:
+
+```bash
+optimum-cli export executorch \
+  --model openai/whisper-small \
+  --task automatic-speech-recognition \
+  --recipe metal \
+  --dtype bfloat16 \
+  --output_dir ./
+```
+
+This command generates:
+- `model.pte` — Compiled Whisper model
+- `aoti_metal_blob.ptd` — Weight data file for Metal backend
+
+### Preprocessor
+
 Export a preprocessor to convert raw audio to mel-spectrograms:
 
 ```bash
@@ -71,7 +104,7 @@ python -m executorch.extension.audio.mel_spectrogram \
 
 ### Quantization
 
-Export quantized models to reduce size and improve performance:
+Export quantized models to reduce size and improve performance (not yet enabled for the Metal backend):
 
 ```bash
 # 4-bit tile packed quantization for encoder
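For intuition, the log-mel conversion such a preprocessor performs can be sketched in plain NumPy. This is a hedged illustration using Whisper's published front-end constants (16 kHz audio, `n_fft=400`, `hop=160`, 80 mel bins), not the ExecuTorch implementation; the filterbank construction here is a simplified HTK-style variant.

```python
import numpy as np

def log_mel_spectrogram(audio, sr=16000, n_fft=400, hop=160, n_mels=80):
    # Frame the signal and apply a Hann window.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Simplified triangular mel filterbank (HTK-style mel scale).
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, center, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, center):
            fb[m - 1, k] = (k - lo) / max(center - lo, 1)
        for k in range(center, hi):
            fb[m - 1, k] = (hi - k) / max(hi - center, 1)
    mel = fb @ power.T                      # (n_mels, n_frames)
    return np.log10(np.maximum(mel, 1e-10))

# One second of noise stands in for real speech.
spec = log_mel_spectrogram(np.random.randn(16000).astype(np.float32))
print(spec.shape)  # → (80, 98)
```

In practice the exported `whisper_preprocessor.pte` performs this conversion for you; the sketch only shows what the spectrogram tensor represents.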
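To make the "4-bit" comment above concrete, here is a conceptual NumPy sketch of 4-bit weight quantization with per-group scales and two values packed per byte. It illustrates the general idea of a packed int4 scheme only; the actual tile layout, group size, and kernel format used by ExecuTorch are not taken from this document.

```python
import numpy as np

def quantize_int4(w, group=32):
    # Per-group symmetric quantization to the int4 range [-8, 7].
    w = w.reshape(-1, group)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    # Pack two int4 values into each byte (low nibble, high nibble).
    packed = ((q[:, 0::2] & 0xF) | ((q[:, 1::2] & 0xF) << 4)).astype(np.uint8)
    return packed, scale

def dequantize_int4(packed, scale, group=32):
    lo = (packed & 0xF).astype(np.int8)
    hi = ((packed >> 4) & 0xF).astype(np.int8)
    lo[lo > 7] -= 16                        # sign-extend the 4-bit values
    hi[hi > 7] -= 16
    q = np.empty((packed.shape[0], group), dtype=np.int8)
    q[:, 0::2], q[:, 1::2] = lo, hi
    return q * scale

w = np.random.randn(4, 64).astype(np.float32)
packed, scale = quantize_int4(w)
w_hat = dequantize_int4(packed, scale).reshape(4, 64)
print(packed.nbytes, w.nbytes)  # → 128 1024 (8x smaller than fp32)
```

The reconstruction error is bounded by half a quantization step per group, which is why 4-bit schemes usually pair the packed weights with per-group (or per-tile) scales.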
@@ -120,6 +153,8 @@ python -c "from datasets import load_dataset; import soundfile as sf; sample = l
 
 After building the runner (see [Build](#build) section), execute it with the exported model and audio:
 
+#### CUDA backend:
+
 ```bash
 # Set library path for CUDA dependencies
 export LD_LIBRARY_PATH=/opt/conda/lib:$LD_LIBRARY_PATH
@@ -133,3 +168,16 @@ cmake-out/examples/models/whisper/whisper_runner \
   --processor_path whisper_preprocessor.pte \
   --temperature 0
 ```
+
+#### Metal backend:
+
+```bash
+# Run the Whisper runner
+cmake-out/examples/models/whisper/whisper_runner \
+  --model_path model.pte \
+  --data_path aoti_metal_blob.ptd \
+  --tokenizer_path ./ \
+  --audio_path output.wav \
+  --processor_path whisper_preprocessor.pte \
+  --temperature 0
+```
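Whisper models consume 16 kHz mono audio, so it can be worth validating the input file before invoking the runner. A stdlib-only sketch follows; the file name `sample.wav` and both helper names are made up for illustration (substitute the `output.wav` produced earlier):

```python
import math
import struct
import wave

def write_tone(path, sr=16000, seconds=1.0, freq=440.0):
    """Write a mono 16-bit sine tone; stands in for real speech audio."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sr)
        n = int(sr * seconds)
        w.writeframes(b"".join(
            struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * freq * t / sr)))
            for t in range(n)))

def check_wav(path, expected_sr=16000):
    """Return (ok, sample_rate, channels); runner input should be 16 kHz mono."""
    with wave.open(path, "rb") as w:
        sr, ch = w.getframerate(), w.getnchannels()
    return sr == expected_sr and ch == 1, sr, ch

write_tone("sample.wav")
print(check_wav("sample.wav"))  # → (True, 16000, 1)
```

If the check fails, resample or downmix the audio first (e.g. with ffmpeg or soundfile) before passing it via `--audio_path`.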