From bb1544f11903bb88ba7899993be1a9f03eb3a810 Mon Sep 17 00:00:00 2001 From: Chad Tindel Date: Fri, 7 Nov 2025 17:42:56 +0000 Subject: [PATCH 1/6] Add semantic_text index_options examples for BBQ quantization MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Addresses #3804 by adding comprehensive examples showing how to use index_options with semantic_text fields for dense vector quantization strategies. Changes: - Added new section "Optimizing vector storage with index_options" to semantic-search-semantic-text.md - Included 5 complete examples: bbq_hnsw, bbq_flat, bbq_disk (DiskBBQ), int8_hnsw, and custom HNSW tuning - Added cross-references from dense-vector.md and knn.md to semantic_text examples - All examples tested and verified on Elasticsearch 9.2 The examples demonstrate memory optimization strategies including: - bbq_hnsw: Up to 32x memory reduction (default for 384+ dimensions) - bbq_flat: BBQ without HNSW for simpler use cases - bbq_disk: Disk-based storage with minimal memory requirements (ES 9.2+) - int8_hnsw: 8-bit quantization for 4x memory reduction - Custom HNSW parameters: m and ef_construction tuning 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- .../semantic-search-semantic-text.md | 149 +++++++++++++++++- solutions/search/vector/dense-vector.md | 4 + solutions/search/vector/knn.md | 6 +- 3 files changed, 156 insertions(+), 3 deletions(-) diff --git a/solutions/search/semantic-search/semantic-search-semantic-text.md b/solutions/search/semantic-search/semantic-search-semantic-text.md index c7da8c2cbc..d303c50558 100644 --- a/solutions/search/semantic-search/semantic-search-semantic-text.md +++ b/solutions/search/semantic-search/semantic-search-semantic-text.md @@ -29,6 +29,10 @@ The mapping of the destination index - the index that contains the embeddings th You can run {{infer}} either using the [Elastic {{infer-cap}} Service](/explore-analyze/elastic-inference/eis.md) or on your own ML-nodes. The following examples show you both scenarios. +::::{tip} +For production deployments with dense vector embeddings, consider optimizing storage and performance using [`index_options`](#semantic-text-index-options). This allows you to configure quantization strategies like BBQ (Better Binary Quantization) that can reduce memory usage by up to 32x. +:::: + :::::::{tab-set} ::::::{tab-item} Using EIS on Serverless @@ -107,10 +111,151 @@ PUT semantic-embeddings ::::::: -To try the ELSER model on the Elastic Inference Service, explicitly set the `inference_id` to `.elser-2-elastic`. For instructions, refer to [Using `semantic_text` with ELSER on EIS](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#using-elser-on-eis). +To try the ELSER model on the Elastic Inference Service, explicitly set the `inference_id` to `.elser-2-elastic`. For instructions, refer to [Using `semantic_text` with ELSER on EIS](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#using-elser-on-eis). + +### Optimizing vector storage with `index_options` [semantic-text-index-options] + +When using `semantic_text` with dense vector embeddings (such as E5 or other text embedding models), you can optimize storage and search performance by configuring `index_options` on the underlying `dense_vector` field. This is particularly useful for large-scale deployments. + +The `index_options` parameter controls how vectors are indexed and stored. 
For dense vector embeddings, you can specify quantization strategies like Better Binary Quantization (BBQ) that significantly reduce memory footprint while maintaining search quality. For details on available options and their trade-offs, refer to the [`dense_vector` `index_options` documentation](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options). + +::::{tip} +For most production use cases with dense vector embeddings, using BBQ is recommended as it provides up to 32x memory reduction with minimal accuracy loss. Choose from: +- `bbq_hnsw` - Best for most use cases (default for 384+ dimensions) +- `bbq_flat` - Simpler option for smaller datasets +- `bbq_disk` - Disk-based storage for very large datasets with minimal memory requirements (Elasticsearch 9.2+) +:::: + +Here's an example using `semantic_text` with a text embedding inference endpoint and BBQ quantization: + +```console +PUT semantic-embeddings-optimized +{ + "mappings": { + "properties": { + "content": { + "type": "semantic_text", + "inference_id": "my-e5-model", <1> + "index_options": { + "dense_vector": { + "type": "bbq_hnsw" <2> + } + } + } + } + } +} +``` + +1. Reference to a text embedding inference endpoint (e.g., E5, OpenAI, or Cohere embeddings). You must create this endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put). +2. Use Better Binary Quantization with HNSW indexing for optimal memory efficiency. This setting applies to the underlying `dense_vector` field that stores the embeddings. + +You can also use `bbq_flat` for simpler datasets or when you don't need the HNSW graph: + +```console +PUT semantic-embeddings-flat +{ + "mappings": { + "properties": { + "content": { + "type": "semantic_text", + "inference_id": "my-e5-model", + "index_options": { + "dense_vector": { + "type": "bbq_flat" <1> + } + } + } + } + } +} +``` + +1. Use BBQ without HNSW for simpler use cases with fewer vectors. This requires less compute resources during indexing. + +For very large datasets where memory is constrained, use `bbq_disk` (DiskBBQ) to store vectors on disk: + +```console +PUT semantic-embeddings-disk +{ + "mappings": { + "properties": { + "content": { + "type": "semantic_text", + "inference_id": "my-e5-model", + "index_options": { + "dense_vector": { + "type": "bbq_disk" <1> + } + } + } + } + } +} +``` + +```{applies_to} +stack: ga 9.2 +serverless: unavailable +``` + +1. Use DiskBBQ for disk-based vector storage with minimal memory requirements. Available in Elasticsearch 9.2+. This option stores compressed vectors on disk, reducing RAM usage to as little as 100 MB while maintaining query latencies around 15ms. + +Other quantization options include `int8_hnsw` and `int4_hnsw`: + +```console +PUT semantic-embeddings-int8 +{ + "mappings": { + "properties": { + "content": { + "type": "semantic_text", + "inference_id": "my-e5-model", + "index_options": { + "dense_vector": { + "type": "int8_hnsw" <1> + } + } + } + } + } +} +``` + +1. Use 8-bit integer quantization for 4x memory reduction with high accuracy retention. 
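+
+For completeness, the 4-bit variant uses the same mapping shape; only the `type` value changes. The following is a minimal sketch that reuses the same example `my-e5-model` endpoint:
+
+```console
+PUT semantic-embeddings-int4
+{
+  "mappings": {
+    "properties": {
+      "content": {
+        "type": "semantic_text",
+        "inference_id": "my-e5-model",
+        "index_options": {
+          "dense_vector": {
+            "type": "int4_hnsw" <1>
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+1. Use 4-bit integer quantization for 8x memory reduction. It typically trades slightly more accuracy than `int8_hnsw` in exchange for the smaller footprint.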
+ +For HNSW-specific tuning parameters like `m` and `ef_construction`, you can include them in the `index_options`: + +```console +PUT semantic-embeddings-custom +{ + "mappings": { + "properties": { + "content": { + "type": "semantic_text", + "inference_id": "my-e5-model", + "index_options": { + "dense_vector": { + "type": "bbq_hnsw", + "m": 32, <1> + "ef_construction": 200 <2> + } + } + } + } + } +} +``` + +1. Number of bidirectional links per node in the HNSW graph. Higher values improve recall but increase memory usage. Default is 16. +2. Number of candidates considered during graph construction. Higher values improve index quality but slow down indexing. Default is 100. + +::::{note} +The `index_options` parameter is only applicable when using inference endpoints that produce dense vector embeddings (like E5, OpenAI embeddings, Cohere embeddings, etc.). It does not apply to sparse vector models like ELSER, which use a different internal representation. +:::: ::::{note} -If you’re using web crawlers or connectors to generate indices, you have to [update the index mappings](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-mapping) for these indices to include the `semantic_text` field. Once the mapping is updated, you’ll need to run a full web crawl or a full connector sync. This ensures that all existing documents are reprocessed and updated with the new semantic embeddings, enabling semantic search on the updated data. +If you're using web crawlers or connectors to generate indices, you have to [update the index mappings](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-mapping) for these indices to include the `semantic_text` field. Once the mapping is updated, you'll need to run a full web crawl or a full connector sync. This ensures that all existing documents are reprocessed and updated with the new semantic embeddings, enabling semantic search on the updated data. :::: diff --git a/solutions/search/vector/dense-vector.md b/solutions/search/vector/dense-vector.md index 75614442f9..25c89c0ebf 100644 --- a/solutions/search/vector/dense-vector.md +++ b/solutions/search/vector/dense-vector.md @@ -46,3 +46,7 @@ For more information about how the profile affects virtual compute unit (VCU) al Better Binary Quantization (BBQ) is an advanced vector quantization technique for `dense_vector` fields. It compresses embeddings into compact binary form, enabling faster similarity search and reducing memory usage. This improves both search relevance and cost efficiency, especially when used with HNSW (Hierarchical Navigable Small World). Learn more about how BBQ works, supported algorithms, and configuration examples in the [Better Binary Quantization (BBQ) documentation](https://www.elastic.co/docs/reference/elasticsearch/index-settings/bbq). + +::::{tip} +When using the [`semantic_text` field type](../semantic-search/semantic-search-semantic-text.md), you can configure BBQ and other quantization options through the `model_settings.index_options` parameter. See [Optimizing vector storage with `index_options`](../semantic-search/semantic-search-semantic-text.md#semantic-text-index-options) for examples of using `bbq_hnsw`, `int8_hnsw`, and other quantization strategies with semantic text fields. 
+:::: diff --git a/solutions/search/vector/knn.md b/solutions/search/vector/knn.md index 6ed002ab1b..efd8986cdb 100644 --- a/solutions/search/vector/knn.md +++ b/solutions/search/vector/knn.md @@ -134,7 +134,11 @@ Support for approximate kNN search was added in version 8.0. Before 8.0, `dense_ For approximate kNN, {{es}} stores dense vector values per segment as an [HNSW graph](https://arxiv.org/abs/1603.09320). Building HNSW graphs is compute-intensive, so indexing vectors can take time; you may need to increase client request timeouts for index and bulk operations. The [approximate kNN tuning guide](/deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md) covers indexing performance, sizing, and configuration trade-offs that affect search performance. -In addition to search-time parameters, HNSW exposes index-time settings that balance graph build cost, search speed, and accuracy. When defining your `dense_vector` mapping, use [`index_options`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options) to set these parameters: +In addition to search-time parameters, HNSW exposes index-time settings that balance graph build cost, search speed, and accuracy. When defining your `dense_vector` mapping, use [`index_options`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options) to set these parameters. + +::::{tip} +When using the [`semantic_text` field type](../semantic-search/semantic-search-semantic-text.md) with dense vector embeddings, you can also configure `index_options` through the `model_settings` parameter. See [Optimizing vector storage with `index_options`](../semantic-search/semantic-search-semantic-text.md#semantic-text-index-options) for examples. +:::: ```console PUT image-index From 754604ec6562bc91ad5a243c45f0ee468ad7b655 Mon Sep 17 00:00:00 2001 From: Chad Tindel Date: Mon, 10 Nov 2025 16:50:34 +0000 Subject: [PATCH 2/6] Improve technical accuracy and completeness of index_options documentation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Addresses feedback from issue #3804 by clarifying parameter references and expanding quantization strategy documentation. Changes: - Add explicit int4_hnsw documentation with 8x memory reduction guidance - Fix parameter reference: "model_settings.index_options" → "index_options" - Clarify that index_options is configured directly on the semantic_text field - Improve consistency across cross-references in dense-vector.md and knn.md These refinements ensure users have accurate information about configuring vector quantization strategies for semantic search. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- .../search/semantic-search/semantic-search-semantic-text.md | 4 ++-- solutions/search/vector/dense-vector.md | 2 +- solutions/search/vector/knn.md | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/solutions/search/semantic-search/semantic-search-semantic-text.md b/solutions/search/semantic-search/semantic-search-semantic-text.md index d303c50558..94f3304d1b 100644 --- a/solutions/search/semantic-search/semantic-search-semantic-text.md +++ b/solutions/search/semantic-search/semantic-search-semantic-text.md @@ -201,7 +201,7 @@ serverless: unavailable 1. Use DiskBBQ for disk-based vector storage with minimal memory requirements. Available in Elasticsearch 9.2+. 
This option stores compressed vectors on disk, reducing RAM usage to as little as 100 MB while maintaining query latencies around 15ms. -Other quantization options include `int8_hnsw` and `int4_hnsw`: +Other quantization options include `int8_hnsw` (8-bit integer quantization) and `int4_hnsw` (4-bit integer quantization): ```console PUT semantic-embeddings-int8 @@ -222,7 +222,7 @@ PUT semantic-embeddings-int8 } ``` -1. Use 8-bit integer quantization for 4x memory reduction with high accuracy retention. +1. Use 8-bit integer quantization for 4x memory reduction with high accuracy retention. For 4-bit quantization, use `"type": "int4_hnsw"` instead, which provides 8x memory reduction. For HNSW-specific tuning parameters like `m` and `ef_construction`, you can include them in the `index_options`: diff --git a/solutions/search/vector/dense-vector.md b/solutions/search/vector/dense-vector.md index 25c89c0ebf..f11a9cb74c 100644 --- a/solutions/search/vector/dense-vector.md +++ b/solutions/search/vector/dense-vector.md @@ -48,5 +48,5 @@ Better Binary Quantization (BBQ) is an advanced vector quantization technique fo Learn more about how BBQ works, supported algorithms, and configuration examples in the [Better Binary Quantization (BBQ) documentation](https://www.elastic.co/docs/reference/elasticsearch/index-settings/bbq). ::::{tip} -When using the [`semantic_text` field type](../semantic-search/semantic-search-semantic-text.md), you can configure BBQ and other quantization options through the `model_settings.index_options` parameter. See [Optimizing vector storage with `index_options`](../semantic-search/semantic-search-semantic-text.md#semantic-text-index-options) for examples of using `bbq_hnsw`, `int8_hnsw`, and other quantization strategies with semantic text fields. +When using the [`semantic_text` field type](../semantic-search/semantic-search-semantic-text.md), you can configure BBQ and other quantization options through the `index_options` parameter. See [Optimizing vector storage with `index_options`](../semantic-search/semantic-search-semantic-text.md#semantic-text-index-options) for examples of using `bbq_hnsw`, `int8_hnsw`, and other quantization strategies with semantic text fields. :::: diff --git a/solutions/search/vector/knn.md b/solutions/search/vector/knn.md index efd8986cdb..19cdee7f0d 100644 --- a/solutions/search/vector/knn.md +++ b/solutions/search/vector/knn.md @@ -137,7 +137,7 @@ For approximate kNN, {{es}} stores dense vector values per segment as an [HNSW g In addition to search-time parameters, HNSW exposes index-time settings that balance graph build cost, search speed, and accuracy. When defining your `dense_vector` mapping, use [`index_options`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options) to set these parameters. ::::{tip} -When using the [`semantic_text` field type](../semantic-search/semantic-search-semantic-text.md) with dense vector embeddings, you can also configure `index_options` through the `model_settings` parameter. See [Optimizing vector storage with `index_options`](../semantic-search/semantic-search-semantic-text.md#semantic-text-index-options) for examples. +When using the [`semantic_text` field type](../semantic-search/semantic-search-semantic-text.md) with dense vector embeddings, you can also configure `index_options` directly on the field. See [Optimizing vector storage with `index_options`](../semantic-search/semantic-search-semantic-text.md#semantic-text-index-options) for examples. 
 ::::
 
 ```console
 PUT image-index

From dd7b107bbb8f2b10a5edc60e46554a158f42e75c Mon Sep 17 00:00:00 2001
From: Chad Tindel
Date: Mon, 10 Nov 2025 11:54:32 -0500
Subject: [PATCH 3/6] Update solutions/search/semantic-search/semantic-search-semantic-text.md

Co-authored-by: Kathleen DeRusso
---
 .../search/semantic-search/semantic-search-semantic-text.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/solutions/search/semantic-search/semantic-search-semantic-text.md b/solutions/search/semantic-search/semantic-search-semantic-text.md
index 94f3304d1b..e41b9fcb65 100644
--- a/solutions/search/semantic-search/semantic-search-semantic-text.md
+++ b/solutions/search/semantic-search/semantic-search-semantic-text.md
@@ -120,7 +120,7 @@ When using `semantic_text` with dense vector embeddings (such as E5 or other tex
 The `index_options` parameter controls how vectors are indexed and stored. For dense vector embeddings, you can specify quantization strategies like Better Binary Quantization (BBQ) that significantly reduce memory footprint while maintaining search quality. For details on available options and their trade-offs, refer to the [`dense_vector` `index_options` documentation](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
 
 ::::{tip}
-For most production use cases with dense vector embeddings, using BBQ is recommended as it provides up to 32x memory reduction with minimal accuracy loss. Choose from:
+For most production use cases using `semantic_text` with dense vector embeddings, using BBQ is recommended as it provides up to 32x memory reduction with minimal accuracy loss. Choose from:
 - `bbq_hnsw` - Best for most use cases (default for 384+ dimensions)
 - `bbq_flat` - Simpler option for smaller datasets
 - `bbq_disk` - Disk-based storage for very large datasets with minimal memory requirements (Elasticsearch 9.2+)

From 7bb6a2f8b31fc23ad6330340cdca2424869ccc95 Mon Sep 17 00:00:00 2001
From: Chad Tindel
Date: Mon, 10 Nov 2025 17:02:21 +0000
Subject: [PATCH 4/6] Address all PR review feedback from @kderusso
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Implements comprehensive improvements based on code review:

1. Added BBQ default behavior notes (384+ dims default to BBQ HNSW)
2. Enhanced quantization explanation with blog link and clearer description
3. Qualified BBQ recommendations for text embeddings specifically
4. Added BBQ 64 dimensions minimum requirement
5. Updated all examples to use built-in E5 endpoint (.multilingual-e5-small-elasticsearch)
6. Clarified E5/ELSER automatic availability
7. Improved bbq_flat description (maximum accuracy at expense of speed)
8. Corrected bbq_disk description (disk-based storage for memory-constrained, very large datasets)
9. Added reference to full list of quantization options (int4_flat, etc.)
10. Added default behavior note to dense-vector.md

All changes ensure users have accurate, complete information about BBQ quantization strategies and their appropriate use cases.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude
---
 .../semantic-search-semantic-text.md          | 24 +++++++++----------
 solutions/search/vector/dense-vector.md       |  2 ++
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/solutions/search/semantic-search/semantic-search-semantic-text.md b/solutions/search/semantic-search/semantic-search-semantic-text.md
index e41b9fcb65..4dd39951a0 100644
--- a/solutions/search/semantic-search/semantic-search-semantic-text.md
+++ b/solutions/search/semantic-search/semantic-search-semantic-text.md
@@ -30,7 +30,7 @@ The mapping of the destination index - the index that contains the embeddings th
 You can run {{infer}} either using the [Elastic {{infer-cap}} Service](/explore-analyze/elastic-inference/eis.md) or on your own ML-nodes. The following examples show you both scenarios.
 
 ::::{tip}
-For production deployments with dense vector embeddings, consider optimizing storage and performance using [`index_options`](#semantic-text-index-options). This allows you to configure quantization strategies like BBQ (Better Binary Quantization) that can reduce memory usage by up to 32x.
+For production deployments with dense vector embeddings, consider optimizing storage and performance using [`index_options`](#semantic-text-index-options). This allows you to configure quantization strategies like BBQ (Better Binary Quantization) that can reduce memory usage by up to 32x. Note that new indices with `dense_vector` fields of 384 or more dimensions will default to BBQ HNSW automatically.
 ::::
 
 :::::::{tab-set}
@@ -117,10 +117,10 @@ To try the ELSER model on the Elastic Inference Service, explicitly set the `inf
 
 When using `semantic_text` with dense vector embeddings (such as E5 or other text embedding models), you can optimize storage and search performance by configuring `index_options` on the underlying `dense_vector` field. This is particularly useful for large-scale deployments.
 
-The `index_options` parameter controls how vectors are indexed and stored. For dense vector embeddings, you can specify quantization strategies like Better Binary Quantization (BBQ) that significantly reduce memory footprint while maintaining search quality. For details on available options and their trade-offs, refer to the [`dense_vector` `index_options` documentation](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
+The `index_options` parameter controls how vectors are indexed and stored. For dense vector embeddings, you can specify [quantization strategies](https://www.elastic.co/blog/vector-search-elasticsearch-rationale) like Better Binary Quantization (BBQ) that significantly reduce memory footprint while maintaining search quality. Quantization compresses high-dimensional vectors into more efficient representations, enabling faster searches and lower memory consumption. For details on available options and their trade-offs, refer to the [`dense_vector` `index_options` documentation](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
 
 ::::{tip}
-For most production use cases using `semantic_text` with dense vector embeddings, using BBQ is recommended as it provides up to 32x memory reduction with minimal accuracy loss. Choose from:
+For most production use cases using `semantic_text` with dense vector embeddings from text models (like E5, OpenAI, or Cohere), BBQ is recommended as it provides up to 32x memory reduction with minimal accuracy loss. BBQ requires a minimum of 64 dimensions and works best with text embeddings (it may not perform well with other types like image embeddings). Choose from:
 - `bbq_hnsw` - Best for most use cases (default for 384+ dimensions)
 - `bbq_flat` - Simpler option for smaller datasets
 - `bbq_disk` - Disk-based storage for very large datasets with minimal memory requirements (Elasticsearch 9.2+)
@@ -135,7 +135,7 @@ PUT semantic-embeddings-optimized
     "properties": {
       "content": {
         "type": "semantic_text",
-        "inference_id": "my-e5-model", <1>
+        "inference_id": ".multilingual-e5-small-elasticsearch", <1>
         "index_options": {
           "dense_vector": {
             "type": "bbq_hnsw" <2>
@@ -147,10 +147,10 @@ PUT semantic-embeddings-optimized
 }
 ```
 
-1. Reference to a text embedding inference endpoint (e.g., E5, OpenAI, or Cohere embeddings). You must create this endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).
+1. Reference to a text embedding inference endpoint. This example uses the built-in E5 endpoint that is automatically available. For custom models, you must create the endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).
 2. Use Better Binary Quantization with HNSW indexing for optimal memory efficiency. This setting applies to the underlying `dense_vector` field that stores the embeddings.
 
-You can also use `bbq_flat` for simpler datasets or when you don't need the HNSW graph:
+You can also use `bbq_flat` for simpler datasets where you need maximum accuracy at the expense of speed:
 
 ```console
 PUT semantic-embeddings-flat
@@ -159,7 +159,7 @@ PUT semantic-embeddings-flat
     "properties": {
       "content": {
         "type": "semantic_text",
-        "inference_id": "my-e5-model",
+        "inference_id": ".multilingual-e5-small-elasticsearch",
         "index_options": {
           "dense_vector": {
             "type": "bbq_flat" <1>
@@ -182,7 +182,7 @@ PUT semantic-embeddings-disk
     "properties": {
       "content": {
         "type": "semantic_text",
-        "inference_id": "my-e5-model",
+        "inference_id": ".multilingual-e5-small-elasticsearch",
         "index_options": {
           "dense_vector": {
             "type": "bbq_disk" <1>
@@ -199,7 +199,7 @@ stack: ga 9.2
 serverless: unavailable
 ```
 
-1. Use DiskBBQ for disk-based vector storage with minimal memory requirements. Available in Elasticsearch 9.2+. This option stores compressed vectors on disk, reducing RAM usage to as little as 100 MB while maintaining query latencies around 15ms.
+1. Use disk-optimized BBQ for very large datasets where memory is constrained. Available in Elasticsearch 9.2+, this option stores compressed vectors on disk, reducing RAM usage to as little as 100 MB while maintaining query latencies around 15ms.
 
 Other quantization options include `int8_hnsw` (8-bit integer quantization) and `int4_hnsw` (4-bit integer quantization):
 
@@ -210,7 +210,7 @@ PUT semantic-embeddings-int8
     "properties": {
       "content": {
         "type": "semantic_text",
-        "inference_id": "my-e5-model",
+        "inference_id": ".multilingual-e5-small-elasticsearch",
         "index_options": {
           "dense_vector": {
             "type": "int8_hnsw" <1>
@@ -222,7 +222,7 @@ PUT semantic-embeddings-int8
 }
 ```
 
-1. Use 8-bit integer quantization for 4x memory reduction with high accuracy retention. For 4-bit quantization, use `"type": "int4_hnsw"` instead, which provides 8x memory reduction.
+1. Use 8-bit integer quantization for 4x memory reduction with high accuracy retention. For 4-bit quantization, use `"type": "int4_hnsw"` instead, which provides 8x memory reduction. For the full list of other available quantization options (including `int4_flat` and others), refer to the [`dense_vector` `index_options` documentation](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
 
 For HNSW-specific tuning parameters like `m` and `ef_construction`, you can include them in the `index_options`:
 
@@ -233,7 +233,7 @@ PUT semantic-embeddings-custom
     "properties": {
       "content": {
         "type": "semantic_text",
-        "inference_id": "my-e5-model",
+        "inference_id": ".multilingual-e5-small-elasticsearch",
         "index_options": {
           "dense_vector": {
             "type": "bbq_hnsw",
diff --git a/solutions/search/vector/dense-vector.md b/solutions/search/vector/dense-vector.md
index f11a9cb74c..c97c73798c 100644
--- a/solutions/search/vector/dense-vector.md
+++ b/solutions/search/vector/dense-vector.md
@@ -45,6 +45,8 @@ For more information about how the profile affects virtual compute unit (VCU) al
 
 Better Binary Quantization (BBQ) is an advanced vector quantization technique for `dense_vector` fields. It compresses embeddings into compact binary form, enabling faster similarity search and reducing memory usage. This improves both search relevance and cost efficiency, especially when used with HNSW (Hierarchical Navigable Small World).
 
+New indices with `dense_vector` fields of 384 or more dimensions will default to BBQ HNSW automatically for optimal performance and memory efficiency.
+
 Learn more about how BBQ works, supported algorithms, and configuration examples in the [Better Binary Quantization (BBQ) documentation](https://www.elastic.co/docs/reference/elasticsearch/index-settings/bbq).

From 8bb10f3a788a7032773c16f492bd5a69917486fc Mon Sep 17 00:00:00 2001
From: Chad Tindel
Date: Mon, 10 Nov 2025 17:07:38 +0000
Subject: [PATCH 5/6] Add Claude directories to .gitignore

---
 .gitignore | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index 011dd3465d..2576bc1e2a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -12,4 +12,7 @@ AGENTS.md
 .github/instructions/**.instructions.md
 CLAUDE.md
 GEMINI.md
-.cursor
\ No newline at end of file
+.cursor
+.claude
+.claude-flow
+.hive-mind

From 1a1a5621aa36bf3483f2356cdc4082b65bded737 Mon Sep 17 00:00:00 2001
From: Chad Tindel
Date: Mon, 10 Nov 2025 12:29:49 -0500
Subject: [PATCH 6/6] Update solutions/search/semantic-search/semantic-search-semantic-text.md

Co-authored-by: Kathleen DeRusso
---
 .../search/semantic-search/semantic-search-semantic-text.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/solutions/search/semantic-search/semantic-search-semantic-text.md b/solutions/search/semantic-search/semantic-search-semantic-text.md
index 4dd39951a0..7a9925267e 100644
--- a/solutions/search/semantic-search/semantic-search-semantic-text.md
+++ b/solutions/search/semantic-search/semantic-search-semantic-text.md
@@ -171,7 +171,7 @@ PUT semantic-embeddings-flat
 }
 ```
 
-1. Use BBQ without HNSW for simpler use cases with fewer vectors. This requires less compute resources during indexing.
+1. Use BBQ without HNSW when you need maximum accuracy at the expense of search speed. This requires less compute resources during indexing.
 
 For very large datasets where memory is constrained, use `bbq_disk` (DiskBBQ) to store vectors on disk: