Replies: 11 comments 1 reply
-
+1 to this, it would be nice to get performance comparable to TensorRT without having to export models to ONNX etc. first!
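For context, a rough sketch of the conversion step this would avoid, assuming the usual PyTorch → ONNX → TensorRT flow (the model and input shape below are placeholders, not anything from this thread):

```python
import torch
import torchvision.models as models

# Placeholder model; any exportable nn.Module follows the same path.
model = models.resnet18(weights=None).eval()
example_input = torch.randn(1, 3, 224, 224)

# The extra serialization step: write out ONNX so TensorRT can
# consume it (e.g. via `trtexec --onnx=model.onnx`).
torch.onnx.export(model, (example_input,), "model.onnx")
```

A native CUDA backend would let an exported ExecuTorch program target the GPU without this intermediate format.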
-
+1
-
@mindbeast @bionictoucan @hietalajulius Hi, thanks for the comments. Yes, that makes sense in general. Right now we are integrating Vulkan into ExecuTorch, since it is a suitable solution for mobile GPUs, and enabling mobile use cases is our primary goal at the moment. We will revisit CUDA, but probably in the second half of the year. Curious: what are your current product needs?
-
Apologies for opening a similar feature request in #5263.
@mergennachin We want to deploy LLMs in cars, but Python-based inference frameworks like vLLM and SGLang are not suitable for edge devices.
Nearly five months have passed; is there any progress on this?
-
Thank you for following up, @DzAvril.
I guess this is using a platform similar to Jetson?
No update on a CUDA backend for ET at the moment; we will get back to you here once we plan something.
-
@digantdesai Yes, Jetson Orin for now, and possibly Thor in the future. Looking forward to your update. |
-
For a mobile CUDA backend, does torch_tensorrt already cover this?
-
@DuinoDu My expectation is that compatibility with torch_tensorrt is poor. I expect that a more compliant backend in ExecuTorch would help a lot of developers.
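For comparison, a rough sketch of the torch_tensorrt path under discussion (the toy model, input shape, and precision settings are illustrative only):

```python
import torch
import torch.nn as nn
import torch_tensorrt

# Toy model standing in for whatever is actually being deployed.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval().cuda()

# Compile with Torch-TensorRT; ops TensorRT cannot convert fall back
# to PyTorch, which is where the compatibility gaps tend to show up.
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.half},
)
out = trt_model(torch.randn(1, 3, 224, 224, device="cuda"))
```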
-
Any update on this? What is the best alternative so far for running on CUDA on Jetson? PyTorch directly? ONNX?
-
Perhaps @Gasoonjia would like to update this discussion?
-
We have a WIP CUDA backend, backed by AOTInductor: https://docs.pytorch.org/docs/stable/torch.compiler_aot_inductor.html. We have enabled some popular models (whisper, voxtral, gemma3, etc.); please check out this README.md to give whisper a try! You can also find voxtral instructions here: https://github.com/pytorch/executorch/tree/main/examples/models/voxtral#readme, and gemma3 instructions here: https://github.com/pytorch/executorch/blob/main/examples/models/gemma3/README.md
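For anyone who wants a feel for the AOTInductor flow this backend builds on, here is a minimal sketch based on the linked docs (the toy model is a placeholder, and the ExecuTorch-side entry points may differ while the backend is WIP):

```python
import torch

# Placeholder model standing in for whisper/voxtral/gemma3 above.
class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(10, 10)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = MLP().eval().cuda()
example_inputs = (torch.randn(8, 10, device="cuda"),)

# Export, then ahead-of-time compile and package with AOTInductor.
ep = torch.export.export(model, example_inputs)
pkg = torch._inductor.aoti_compile_and_package(ep, package_path="mlp.pt2")

# The packaged artifact loads and runs without the Python model code.
compiled = torch._inductor.aoti_load_package(pkg)
out = compiled(*example_inputs)
```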
-
Does it make sense for ExecuTorch to have a mobile CUDA backend? There are many edge devices in NVIDIA's Jetson lineup that have a CUDA GPU but would benefit from not having to link an enormous libtorch dependency.