From 845bf71877580c7fe8220f219be3407baa8c4ea6 Mon Sep 17 00:00:00 2001
From: kosabogi
Date: Fri, 7 Nov 2025 12:18:40 +0100
Subject: [PATCH 1/3] Documents separator groups for recursive chunking strategy

---
 .../elastic-inference/inference-api.md | 66 +++++++++++++++++--
 1 file changed, 60 insertions(+), 6 deletions(-)

diff --git a/explore-analyze/elastic-inference/inference-api.md b/explore-analyze/elastic-inference/inference-api.md
index f8ce5450b9..2366313e13 100644
--- a/explore-analyze/elastic-inference/inference-api.md
+++ b/explore-analyze/elastic-inference/inference-api.md
@@ -161,11 +161,64 @@ PUT _inference/sparse_embedding/word_chunks
 stack: ga 9.1
 ```
 
-The `recursive` strategy splits the input text based on a configurable list of separator patterns (for example, newlines or Markdown headers). The chunker applies these separators in order, recursively splitting any chunk that exceeds the `max_chunk_size` word limit. If no separator produces a small enough chunk, the strategy falls back to sentence-level splitting.
+The `recursive` strategy splits the input text based on a configurable list of separator patterns, such as paragraph boundaries or Markdown structural elements like headings and horizontal rules. The chunker applies these separators in order, recursively splitting any chunk that exceeds the `max_chunk_size` word limit. If no separator produces a small enough chunk, the strategy falls back to [sentence-level splitting](#sentence).
 
-##### Markdown separator group
+You can configure the `recursive` strategy using either:
+- [Predefined separator groups](#separator-groups): [`plaintext`](#plaintext) or [`markdown`](#markdown)
+- [Custom separators](#custom-separators): Define your own regular expression patterns
 
-The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures chunking with the `recursive` strategy using the markdown separator group and a maximum of 200 words per chunk.
+##### Predefined separator groups [separator-groups]
+
+Predefined separator groups provide optimized patterns for common text formats: [`plaintext`](#plaintext) for simple line-structured text without markup, and [`markdown`](#markdown) for Markdown-formatted content.
+
+###### `plaintext`
+
+The `plaintext` separator group splits text at paragraph boundaries, first attempting to split on double newlines (paragraph breaks), then falling back to single newlines when chunks are still too large.
+
+:::{dropdown} Regular expression patterns for the `plaintext` separator group
+
+1. `(?
Date: Wed, 12 Nov 2025 12:19:59 +0100
Subject: [PATCH 2/3] Update explore-analyze/elastic-inference/inference-api.md

Co-authored-by: Benjamin Ironside Goldstein <91905639+benironside@users.noreply.github.com>
---
 explore-analyze/elastic-inference/inference-api.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/explore-analyze/elastic-inference/inference-api.md b/explore-analyze/elastic-inference/inference-api.md
index 2366313e13..44f941bfc6 100644
--- a/explore-analyze/elastic-inference/inference-api.md
+++ b/explore-analyze/elastic-inference/inference-api.md
@@ -169,7 +169,7 @@ You can configure the `recursive` strategy using either:
 
 ##### Predefined separator groups [separator-groups]
 
-Predefined separator groups provide optimized patterns for common text formats: [`plaintext`](#plaintext) for simple line-structured text without markup, and [`markdown`](#markdown) for Markdown-formatted content.
+Predefined separator groups provide optimized patterns for common text formats: [`plaintext`](#plaintext) works for simple line-structured text without markup, and [`markdown`](#markdown) works for Markdown-formatted content.
 
 ###### `plaintext`
 
From 17dddf0b6071242dbd3c62d18877619c138b7b9b Mon Sep 17 00:00:00 2001
From: kosabogi <105062005+kosabogi@users.noreply.github.com>
Date: Wed, 12 Nov 2025 12:20:44 +0100
Subject: [PATCH 3/3] Update explore-analyze/elastic-inference/inference-api.md

Co-authored-by: Benjamin Ironside Goldstein <91905639+benironside@users.noreply.github.com>
---
 explore-analyze/elastic-inference/inference-api.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/explore-analyze/elastic-inference/inference-api.md b/explore-analyze/elastic-inference/inference-api.md
index 44f941bfc6..f696055207 100644
--- a/explore-analyze/elastic-inference/inference-api.md
+++ b/explore-analyze/elastic-inference/inference-api.md
@@ -164,7 +164,7 @@ stack: ga 9.1
 The `recursive` strategy splits the input text based on a configurable list of separator patterns, such as paragraph boundaries or Markdown structural elements like headings and horizontal rules. The chunker applies these separators in order, recursively splitting any chunk that exceeds the `max_chunk_size` word limit. If no separator produces a small enough chunk, the strategy falls back to [sentence-level splitting](#sentence).
 
 You can configure the `recursive` strategy using either:
-- [Predefined separator groups](#separator-groups): [`plaintext`](#plaintext) or [`markdown`](#markdown)
+- [Predefined separator groups](#separator-groups): [`Plaintext`](#plaintext) or [`markdown`](#markdown)
 - [Custom separators](#custom-separators): Define your own regular expression patterns
 
 ##### Predefined separator groups [separator-groups]
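
Reviewer note: the behavior these patches document (apply separators in order, recurse into any chunk over the `max_chunk_size` word limit, fall back to sentence-level splitting) can be sketched roughly as below. This is an illustrative Python sketch only, not the Elasticsearch implementation. The function names, the two patterns in `PLAINTEXT_LIKE` (stand-ins for "double newline, then single newline"; the real `plaintext` patterns are not reproduced here), and the simplified sentence splitter are all assumptions made for the example.

```python
import re


def word_count(text: str) -> int:
    """Count whitespace-delimited words."""
    return len(text.split())


def sentence_fallback(text: str, max_chunk_size: int) -> list[str]:
    """Simplified stand-in for sentence-level splitting (assumption).

    Packs whole sentences greedily; a single sentence longer than
    max_chunk_size is kept whole in this sketch.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], []
    for sentence in sentences:
        if current and word_count(" ".join(current + [sentence])) > max_chunk_size:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def recursive_chunks(text: str, separators: list[str], max_chunk_size: int) -> list[str]:
    """Split text with the first separator, recursing with the remaining
    separators into any piece that still exceeds max_chunk_size words."""
    text = text.strip()
    if not text:
        return []
    if word_count(text) <= max_chunk_size:
        return [text]
    if not separators:
        # No separator produced a small enough chunk.
        return sentence_fallback(text, max_chunk_size)
    first, rest = separators[0], separators[1:]
    chunks = []
    for piece in re.split(first, text):
        chunks.extend(recursive_chunks(piece, rest, max_chunk_size))
    return chunks


# Hypothetical patterns: paragraph breaks first, then single newlines.
PLAINTEXT_LIKE = [r"\n\n", r"\n"]

sample = "one two three\nfour five six\n\nseven eight. nine ten eleven twelve."
for chunk in recursive_chunks(sample, PLAINTEXT_LIKE, max_chunk_size=4):
    print(repr(chunk))
```

In the sample, the first paragraph is too long and is split again on the single newline, while the second paragraph contains no newline and therefore reaches the sentence fallback. The sketch does not merge small neighboring chunks; whether the real chunker does is not stated in the patched text.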