[ML] Introduce InferenceString wrapper object #137711
base: main
Conversation
To support multimodal embedding, where inputs may be a mix of text and images, we need some way of tracking whether a given input is text or an image. The InferenceString object wraps the input String and associates it with a DataType enum value which indicates the type of data represented by the String.

- Introduce the InferenceString object to allow image inputs to be passed through inference code
- Refactor the EmbeddingsInput, EmbeddingRequestChunker and ChunkInferenceInput classes to handle InferenceString instead of String
- Unwrap InferenceString prior to passing it into the existing Request classes used for embeddings, to preserve existing behaviour
- Update existing tests to handle InferenceString
- Add additional test coverage for the new behaviour
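The wrapper described above can be sketched roughly as follows. The names InferenceString, DataType, TEXT and IMAGE_BASE64 come from the PR; everything else (the record shape, the text() factory, the isText() accessor) is an assumption for illustration, not the actual Elasticsearch implementation.

```java
public class Main {
    enum DataType { TEXT, IMAGE_BASE64 }

    // Wrapper pairing an input String with the type of data it carries.
    record InferenceString(String value, DataType dataType) {
        // Convenience factory for the common text case (assumed helper).
        static InferenceString text(String value) {
            return new InferenceString(value, DataType.TEXT);
        }

        boolean isText() {
            return DataType.TEXT.equals(dataType);
        }
    }

    public static void main(String[] args) {
        InferenceString text = InferenceString.text("a sentence to embed");
        InferenceString image = new InferenceString("aGVsbG8=", DataType.IMAGE_BASE64);
        System.out.println(text.isText());  // true
        System.out.println(image.isText()); // false
    }
}
```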
Pinging @elastic/ml-core (Team:ML)
```java
return DataType.TEXT.equals(dataType);
}
```

```java
public static List<String> toStringList(List<InferenceString> inferenceStrings) {
```
Should we filter out DataType.IMAGE_BASE64 items?
I don't think we should just filter out non-text inputs: if any manage to make it into either of the two places where we call this method, then there's a problem somewhere. Maybe an assert, like the one in EmbeddingsInput.getTextInputs(), just for safety? The two classes where this method is called (ElasticsearchInternalService and SageMakerService) don't use EmbeddingsInput, which is why they have a slightly different flow.
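The assert-rather-than-filter approach discussed here could look something like the sketch below. This is a guess at the shape of toStringList, not the actual Elasticsearch code; the assertion message and stream-based body are assumptions. (Java asserts only fire when the JVM runs with -ea, which suits the "fail loudly in tests, no-op in production" intent.)

```java
import java.util.List;

public class Main {
    enum DataType { TEXT, IMAGE_BASE64 }

    record InferenceString(String value, DataType dataType) {}

    // Sketch of toStringList with the safety assert discussed above.
    public static List<String> toStringList(List<InferenceString> inferenceStrings) {
        // Non-text inputs should never reach this text-only path; flag it in tests.
        assert inferenceStrings.stream().allMatch(s -> s.dataType() == DataType.TEXT)
            : "non-text input reached a text-only code path";
        return inferenceStrings.stream().map(InferenceString::value).toList();
    }

    public static void main(String[] args) {
        var strings = List.of(
            new InferenceString("first", DataType.TEXT),
            new InferenceString("second", DataType.TEXT)
        );
        System.out.println(toStringList(strings)); // [first, second]
    }
}
```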
```diff
 for (int chunkIndex = 0; chunkIndex < chunks.size(); chunkIndex++) {
     // If the number of chunks is larger than the maximum allowed value,
-    // scale the indices to [0, MAX) with similar number of original
+    // scale the indices to [0, MAX] with similar number of original
```
Just to confirm, the change here is because MAX is inclusive right?
Oh, my mistake, I thought this was just a typo rather than indicating inclusive/exclusive. I learned something new today!
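The inclusive/exclusive distinction the comment change captures can be shown with a toy scaling function. The formula below is an assumption chosen to make the top bucket reachable; it is not taken from the actual EmbeddingRequestChunker implementation.

```java
public class Main {
    // Map chunk index i in [0, n) onto a bucket index. With (max + 1) in the
    // numerator, the largest input index lands exactly on max, so the output
    // range is [0, MAX] inclusive, matching the corrected comment.
    static int scaleInclusive(int i, int n, int max) {
        return i * (max + 1) / n;
    }

    public static void main(String[] args) {
        int n = 1000, max = 9;
        System.out.println(scaleInclusive(0, n, max));     // 0
        System.out.println(scaleInclusive(n - 1, n, max)); // 9, so MAX itself is hit
    }
}
```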