hexagon: various Op fixes #17135
Conversation
Great work! I've tested the fix on my device and it works well. One small thing: looking ahead, have you considered maintaining a similar structure to store these values on the CPU? That could be optimal, since the weight dimensions remain fixed throughout all stages.
chraac
left a comment
lgtm!
llm_graph_context::build_inp_out_ids() can generate tensors with zero nrows. Somehow, other backends seem to handle this without obvious explicit checks. In the hexagon case we need to check explicitly and skip them.
Co-authored-by: chraac <chraac@gmail.com>
Force-pushed from 918e859 to 313f261
Yeah, I did lots of profiling runs and the overall perf is the same as before. And yes, let's think about caching those. I was thinking host at first, as we discussed in the other PR, but maybe it makes sense to allocate a little bit of VTCM and cache them there.
@lhez please take a look
lhez
left a comment
Looks good!
Introduce fastdiv and fix test-backend-ops failures for ADD/SUB/MUL. Thanks @chraac!
Subsumes #17042
Fixed inference with Qwen3-VL models that generate graphs with ne[1] == 0