[ML] Introduce InferenceString wrapper object #137711

DonalEvans · 2025-11-07T01:08:48Z

To support multimodal embedding, where inputs may be a mix of text and images, we need some way of tracking whether a given input is text or an image. The InferenceString object wraps the input String and associates it with a DataType enum value which indicates the type of data represented by the String.

- Introduce InferenceString object to allow image inputs to be passed through inference code
- Refactor EmbeddingsInput, EmbeddingRequestChunker and ChunkInferenceInput classes to handle InferenceString instead of String
- Unwrap InferenceString prior to passing it into the existing Request classes used for embeddings to preserve existing behaviour
- Update existing tests to handle InferenceString
- Add additional test coverage for new behaviour
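
For readers skimming the description, here is a minimal sketch of the wrapper idea, not the code from this PR. The `DataType` values and the `isText()` check appear in the diff excerpts in this conversation; the record shape, the field names, the `unwrap` helper and the `InferenceStringSketch` holder class are assumptions made for illustration.

```java
import java.util.List;
import java.util.Objects;

// Illustrative sketch only; the real class lives in
// server/src/main/java/org/elasticsearch/inference/InferenceString.java and may differ.
final class InferenceStringSketch {

    // The kind of data represented by the wrapped String.
    enum DataType {
        TEXT,
        IMAGE_BASE64
    }

    // Pairs the raw input String with its data type (record shape is an assumption).
    record InferenceString(String value, DataType dataType) {
        InferenceString {
            Objects.requireNonNull(value);
            Objects.requireNonNull(dataType);
        }

        boolean isText() {
            return DataType.TEXT.equals(dataType);
        }
    }

    // Hypothetical helper: unwraps to plain Strings so that existing
    // String-based request classes can be used unchanged.
    static List<String> unwrap(List<InferenceString> inputs) {
        return inputs.stream().map(InferenceString::value).toList();
    }
}
```

Unwrapping at the boundary is what lets the existing embedding Request classes keep their current String-based behaviour.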

elasticsearchmachine · 2025-11-07T01:09:14Z

Pinging @elastic/ml-core (Team:ML)

jonathan-buttner · 2025-11-07T19:46:36Z

server/src/main/java/org/elasticsearch/inference/InferenceString.java

+        return DataType.TEXT.equals(dataType);
+    }
+
+    public static List<String> toStringList(List<InferenceString> inferenceStrings) {


Should we filter out DataType.IMAGE_BASE64 items?

DonalEvans · 2025-11-07

I don't think we should just filter out non-text inputs, because if any manage to make it into one of the two places we call this method, then there's a problem somewhere. Maybe an assert like in EmbeddingsInput.getTextInputs() just for safety? The two classes where this method is called (in ElasticsearchInternalService and SageMakerService) don't use EmbeddingsInput, which is why there's a slightly different flow for them.
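
A rough sketch of the assert-instead-of-filter option discussed above, under stated assumptions: the method name and signature come from the diff excerpt, but the body, the assertion message, the `ToStringListSketch` holder class and the simplified `InferenceString` record are hypothetical and are not taken from the PR.

```java
import java.util.List;

// Illustrative sketch only; not the implementation from the PR.
final class ToStringListSketch {

    enum DataType {
        TEXT,
        IMAGE_BASE64
    }

    // Simplified stand-in for the real InferenceString class.
    record InferenceString(String value, DataType dataType) {
        boolean isText() {
            return DataType.TEXT.equals(dataType);
        }
    }

    static List<String> toStringList(List<InferenceString> inferenceStrings) {
        // Fail loudly in tests (assertions enabled with -ea) rather than silently
        // dropping non-text inputs, which would hide a bug upstream.
        assert inferenceStrings.stream().allMatch(InferenceString::isText) : "Non-text input passed to toStringList";
        return inferenceStrings.stream().map(InferenceString::value).toList();
    }
}
```

Because Java assertions are disabled unless the JVM runs with `-ea`, a check like this would only guard test and development runs, which matches the "just for safety" intent.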

jonathan-buttner · 2025-11-07T19:50:34Z

...e/src/main/java/org/elasticsearch/xpack/core/inference/chunking/EmbeddingRequestChunker.java

            for (int chunkIndex = 0; chunkIndex < chunks.size(); chunkIndex++) {
                // If the number of chunks is larger than the maximum allowed value,
-                // scale the indices to [0, MAX) with similar number of original
+                // scale the indices to [0, MAX] with similar number of original


Just to confirm, the change here is because MAX is inclusive right?

DonalEvans · 2025-11-07

Oh, my mistake, I thought this was just a typo rather than indicating inclusive/exclusive. I learned something new today!
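
For context on the notation: `[0, MAX)` is a half-open interval, so 0 is included and MAX is excluded. The sketch below shows one way such index scaling could look; the formula, the method name `scale`, its parameters and the `ChunkIndexScalingSketch` class are assumptions for illustration and are not copied from EmbeddingRequestChunker.

```java
// Illustrative sketch only; not the code from EmbeddingRequestChunker.
final class ChunkIndexScalingSketch {

    // Maps chunkIndex in [0, numChunks) onto [0, max) when there are more chunks
    // than allowed. The upper bound is exclusive: max itself is never returned.
    static int scale(int chunkIndex, int numChunks, int max) {
        if (numChunks <= max) {
            return chunkIndex; // already within range, no scaling needed
        }
        // Integer division floors the result, so the largest value is max - 1.
        return (int) ((long) chunkIndex * max / numChunks);
    }
}
```

With 10 chunks and a maximum of 4, the last chunk (index 9) maps to 9 * 4 / 10 = 3, the largest index in `[0, 4)`.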

DonalEvans added the >refactoring, :ml (Machine learning), Team:ML (Meta label for the ML team) and v9.3.0 labels on Nov 7, 2025

jonathan-buttner reviewed Nov 7, 2025

jonathan-buttner approved these changes Nov 7, 2025