Add semantic_text index_options examples for BBQ quantization #3854
Open
ctindel wants to merge 7 commits into elastic:main from ctindel:add-semantic-text-index-options-examples
+162 −4
Changes from all commits
Commits (7, all by ctindel):
- bb1544f Add semantic_text index_options examples for BBQ quantization
- 754604e Improve technical accuracy and completeness of index_options document…
- dd7b107 Update solutions/search/semantic-search/semantic-search-semantic-text.md
- 9c9edf5 Merge branch 'elastic:main' into add-semantic-text-index-options-exam…
- 7bb6a2f Address all PR review feedback from @kderusso
- 8bb10f3 Added claude directories for .gitignore
- 1a1a562 Update solutions/search/semantic-search/semantic-search-semantic-text.md
@@ -29,6 +29,10 @@ The mapping of the destination index - the index that contains the embeddings th

You can run {{infer}} either using the [Elastic {{infer-cap}} Service](/explore-analyze/elastic-inference/eis.md) or on your own ML-nodes. The following examples show you both scenarios.
::::{tip}
For production deployments with dense vector embeddings, consider optimizing storage and performance using [`index_options`](#semantic-text-index-options). This allows you to configure quantization strategies like BBQ (Better Binary Quantization) that can reduce memory usage by up to 32x. Note that new indices with 384 or more dimensions will default to BBQ HNSW automatically.
::::

> Reviewer comment (Member) on the tip above: Semantic text aggressively defaults across the board, using the minimum of 64 dimensions.
> Suggested change
:::::::{tab-set}

::::::{tab-item} Using EIS on Serverless

@@ -107,10 +111,151 @@ PUT semantic-embeddings

:::::::
To try the ELSER model on the Elastic Inference Service, explicitly set the `inference_id` to `.elser-2-elastic`. For instructions, refer to [Using `semantic_text` with ELSER on EIS](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#using-elser-on-eis).
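As a quick illustration, a mapping that targets the ELSER endpoint on EIS might look like the following. This is a minimal sketch: the index name `elser-eis-embeddings` and the field name `content` are placeholders, while the `.elser-2-elastic` endpoint ID comes from the instructions above.

```console
PUT elser-eis-embeddings
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".elser-2-elastic"
      }
    }
  }
}
```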
### Optimizing vector storage with `index_options` [semantic-text-index-options]

When using `semantic_text` with dense vector embeddings (such as E5 or other text embedding models), you can optimize storage and search performance by configuring `index_options` on the underlying `dense_vector` field. This is particularly useful for large-scale deployments.

The `index_options` parameter controls how vectors are indexed and stored. For dense vector embeddings, you can specify [quantization strategies](https://www.elastic.co/blog/vector-search-elasticsearch-rationale) like Better Binary Quantization (BBQ) that significantly reduce memory footprint while maintaining search quality. Quantization compresses high-dimensional vectors into more efficient representations, enabling faster searches and lower memory consumption. For details on available options and their trade-offs, refer to the [`dense_vector` `index_options` documentation](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
::::{tip}
For most production use cases using `semantic_text` with dense vector embeddings from text models (like E5, OpenAI, or Cohere), BBQ is recommended as it provides up to 32x memory reduction with minimal accuracy loss. BBQ requires a minimum of 64 dimensions and works best with text embeddings (it may not perform well with other types like image embeddings). Choose from:
- `bbq_hnsw` - Best for most use cases (default for 384+ dimensions)
- `bbq_flat` - Simpler option for smaller datasets
- `bbq_disk` - Disk-based storage for very large datasets with minimal memory requirements (Elasticsearch 9.2+)
::::
Here's an example using `semantic_text` with a text embedding inference endpoint and BBQ quantization:

```console
PUT semantic-embeddings-optimized
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch", <1>
        "index_options": {
          "dense_vector": {
            "type": "bbq_hnsw" <2>
          }
        }
      }
    }
  }
}
```

1. Reference to a text embedding inference endpoint. This example uses the built-in E5 endpoint that is automatically available. For custom models, you must create the endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).
2. Use Better Binary Quantization with HNSW indexing for optimal memory efficiency. This setting applies to the underlying `dense_vector` field that stores the embeddings.
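To double-check how these settings were applied, you can retrieve the mapping of the index you just created (a usage sketch reusing the `semantic-embeddings-optimized` index from the example above):

```console
GET semantic-embeddings-optimized/_mapping
```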
You can also use `bbq_flat` for simpler datasets where you need maximum accuracy at the expense of speed:

```console
PUT semantic-embeddings-flat
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch",
        "index_options": {
          "dense_vector": {
            "type": "bbq_flat" <1>
          }
        }
      }
    }
  }
}
```

1. Use flat BBQ for simpler use cases with fewer vectors. Because it does not build an HNSW graph, it requires fewer compute resources during indexing.
For very large datasets where memory is constrained, use `bbq_disk` (DiskBBQ) to store vectors on disk:

```console
PUT semantic-embeddings-disk
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch",
        "index_options": {
          "dense_vector": {
            "type": "bbq_disk" <1>
          }
        }
      }
    }
  }
}
```

```{applies_to}
stack: ga 9.2
serverless: unavailable
```

1. Use disk-optimized BBQ (DiskBBQ) for very large datasets where memory is constrained. Available in Elasticsearch 9.2+, this option stores compressed vectors on disk, reducing RAM usage to as little as 100 MB while maintaining query latencies around 15ms.
Other quantization options include `int8_hnsw` (8-bit integer quantization) and `int4_hnsw` (4-bit integer quantization):

```console
PUT semantic-embeddings-int8
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch",
        "index_options": {
          "dense_vector": {
            "type": "int8_hnsw" <1>
          }
        }
      }
    }
  }
}
```

1. Use 8-bit integer quantization for 4x memory reduction with high accuracy retention. For 4-bit quantization, use `"type": "int4_hnsw"` instead, which provides 8x memory reduction. For the full list of other available quantization options (including `int4_flat` and others), refer to the [`dense_vector` `index_options` documentation](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
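For comparison, the 4-bit variant only changes the `type` value. The following is a sketch mirroring the example above; the index name `semantic-embeddings-int4` is a placeholder.

```console
PUT semantic-embeddings-int4
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch",
        "index_options": {
          "dense_vector": {
            "type": "int4_hnsw"
          }
        }
      }
    }
  }
}
```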
For HNSW-specific tuning parameters like `m` and `ef_construction`, you can include them in the `index_options`:

```console
PUT semantic-embeddings-custom
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch",
        "index_options": {
          "dense_vector": {
            "type": "bbq_hnsw",
            "m": 32, <1>
            "ef_construction": 200 <2>
          }
        }
      }
    }
  }
}
```

1. Number of bidirectional links per node in the HNSW graph. Higher values improve recall but increase memory usage. Default is 16.
2. Number of candidates considered during graph construction. Higher values improve index quality but slow down indexing. Default is 100.
::::{note}
The `index_options` parameter is only applicable when using inference endpoints that produce dense vector embeddings (like E5, OpenAI embeddings, Cohere embeddings, etc.). It does not apply to sparse vector models like ELSER, which use a different internal representation.
::::
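For comparison, a `semantic_text` field backed by a sparse model such as ELSER is mapped without any `index_options`. This is a minimal sketch; the index name is a placeholder, and omitting `inference_id` assumes the default ELSER endpoint is used.

```console
PUT semantic-embeddings-sparse
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text"
      }
    }
  }
}
```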
::::{note}
If you're using web crawlers or connectors to generate indices, you have to [update the index mappings](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-mapping) for these indices to include the `semantic_text` field. Once the mapping is updated, you'll need to run a full web crawl or a full connector sync. This ensures that all existing documents are reprocessed and updated with the new semantic embeddings, enabling semantic search on the updated data.
::::
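For example, adding the field to an existing connector- or crawler-managed index could look like the sketch below. The index name `search-my-connector` and the field name are placeholders, and the inference endpoint is the built-in E5 endpoint used in the earlier examples.

```console
PUT search-my-connector/_mapping
{
  "properties": {
    "content": {
      "type": "semantic_text",
      "inference_id": ".multilingual-e5-small-elasticsearch"
    }
  }
}
```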
> Reviewer comment: I'm a little cautious of adding `.gitignore` entries in a docs update.