Add image-text-to-image and image-text-to-video tasks
#1866
AI agent disclosure: I added the
cc @merveenoyan
gary149
left a comment
looks good - maybe @merveenoyan you'll want to do some edits to md files
merveenoyan
left a comment
thanks a lot for seeing this gap and working on it!
  ],
  models: [
    {
      description: "A powerful model for image-text-to-video generation.",
would be nice to add more here and the Space too!
cc @julien-c as well on the (orthogonal) topic of generic any-to-any task + modality selection
Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
…/huggingface.js into new-image-text-tasks
Thanks @merveenoyan, modified the examples and opened this PR here: https://huggingface.co/datasets/huggingfacejs/tasks/discussions/12
hanouticelina
left a comment
Reviewed the inference part, all good! thanks
merveenoyan
left a comment
thank you!
The goal of these new tasks is to support models that take both image and text input and output either an image or a video.
The goal of this PR is to make the tasks as analogous to image-to-image and image-to-video as possible, with the only difference being that the image input is now optional: an empty image with a valid prompt should still work for a model like FLUX.2 (which supports both text-to-image and image-to-image) or LTX Video (both text-to-video and image-to-video).
Once this is in, I'll also open a widget PR in Moon to support these tasks in model cards and widgets, and a follow-up PR adding them to the inference providers, so that we can then open PRs on repos to change the task for compatible models.
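To illustrate the optional-image behaviour described above, here is a minimal TypeScript sketch. The `ImageTextToImageInput` interface, its field names, and the `resolveMode` helper are hypothetical and made up for this example; they are not the types added by this PR or the actual huggingface.js API.

```typescript
// Hypothetical input shape for illustration only; not the real huggingface.js types.
interface ImageTextToImageInput {
  /** Text prompt guiding the generation (assumed required here). */
  prompt: string;
  /** Optional input image; when absent, the request is treated as prompt-only. */
  image?: Uint8Array;
}

type Mode = "text-only" | "image-and-text";

// Hypothetical helper: decides which path a request would take.
function resolveMode(input: ImageTextToImageInput): Mode {
  if (input.prompt.trim().length === 0) {
    throw new Error("A non-empty prompt is required");
  }
  // A missing or zero-length image falls back to prompt-only generation,
  // mirroring the "empty image and a valid prompt should still work" requirement.
  const hasImage = input.image !== undefined && input.image.length > 0;
  return hasImage ? "image-and-text" : "text-only";
}
```

Under these assumptions, a call with only a prompt takes the text-to-image (or text-to-video) path, while a call with both a prompt and image bytes takes the image-conditioned path, which is how a single task could cover models such as FLUX.2 or LTX Video.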