-
Notifications
You must be signed in to change notification settings - Fork 25.6k
POC-2 - Auto prefiltering for semantic text queries #137739
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
POC-2 - Auto prefiltering for semantic text queries #137739
Conversation
This POC implements automatic prefiltering for `semantic_text` queries. We achieve this by adding an `AutoPrefilteringScope` object to the `SearchExecutionContext`. When we convert a query to a lucene query, queries may push prefilters to the `AutoPrefilteringScope`. At that stage, queries have already been rewritten. Semantic queries using `text_embedding` inference endpoints are rewritten to knn vector queries that are auto-prefiltering enabled. Then, when an auto-prefiltering enabled knn vector query is converted to its lucene equivalent, we fetch the prefilters from the `SearchExecutionContext` and we apply them to the knn vector query - which supports prefiltering already.
a338f0f to
6d68f0b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice approach, simple and effective 🙌
I'm not entirely sure how this will integrate with ES|QL, but I see it playing out in one of two ways:
- If ES|QL builds a boolean query out of all its clauses, this would integrate automatically
- If ES|QL builds a query with a different structure, we may need to add a
filters()method toSemanticQueryBuilderandMatchQueryBuilder. This would be a method for internal use only to programmatically set pre-filters when necessary. We could then pass the pre-filters directly to the generated kNN query inSemanticTextFieldMapper#semanticQuery.
|
Interesting approach - instead of updating the ES|QL will be able to use this directly I think - bool queries are created for combining together ANDs. I think I'll be able to remove the prefiltering rules that are applied to KNN function there, as I believe this approach should take care of them. That's pretty neat and less complexity on ES|QL 💯 On the nit side - I wonder if we need a queue for storing the information - IIUC, we don't Very excited about this - it's simple and powerful! |
We do pop. I made |
benwtrent
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the direction!
Some comments
| private List<QueryBuilder> getAllApplicableFilters(SearchExecutionContext context) { | ||
| List<QueryBuilder> applicableFilters = new ArrayList<>(filterQueries); | ||
| if (isAutoPrefiltering) { | ||
| applicableFilters.addAll(context.autoPrefilteringScope().getPrefilters()); | ||
| } | ||
| return applicableFilters; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should not mix and match user provided filters and auto-prefilters. It might lead to surprising behaviors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, as a user, I would expect the filters I provide particularly are the only ones applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which, I think we can safely assert that filters are empty here, right? Since the only thing that applies isAutoPrefiltering is semantic text :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. As things are today, it is impossible to have auto and explicit prefilters at the same time. I'll safe-guard against mixing them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, but unfortunately we are programmatically adding prefilters in
Line 161 in fdeeded
| copy.addFilterQuery(new TermsQueryBuilder(IndexFieldMapper.NAME, indices)); |
Not sure how to enforce this. But we would have to mess up to apply auto prefiltering to a user knn query, as there is no way to enable auto prefiltering for a user knn query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, but unfortunately we are programmatically adding prefilters in
🤔
This is messy, we need to ensure that users cannot do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear, users cannot end up getting auto-prefilters on their bespoke knn queries. They cannot enable auto prefiltering, it is only programmatically enabled by semantic text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They cannot enable auto prefiltering, it is only programmatically enabled by semantic text.
My worry is that somebody in the future will try to enable it.
Can we add a test or something to signal this shouldn't be allowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I can add a test where we have explicit knn query under a bool and check it doesn't get any extra prefilters. With some commentary too. If a PR popped that changes that test people would at least have to read the warning, etc. Sounds good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep! Thats great! Just making sure we have an adequate fence
| } | ||
|
|
||
| private List<QueryBuilder> collectPrefilters() { | ||
| return Stream.of(mustClauses, mustNotClauses.stream().map(c -> QueryBuilders.boolQuery().mustNot(c)).toList(), filterClauses) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mustNot thing is tricky, since we already need to iterate all of it again when pushing down stream, why not have this collectPrefilters have a QueryBuilder argument, then we bypass the self concern, and then this can simply return QueryBuilder which is a single BooleanQuery we pass down (as it will be a boolean query anyways ultimately once its finally used right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no self concern for must_not as we do not apply prefilters to those. I'll add assertions in addBooleanClauses to communicate that contract and keep things simple.
This POC implements automatic prefiltering for
semantic_textqueries.We achieve this by adding an
AutoPrefilteringScopeobject to theSearchExecutionContext. When we convert a query to a lucene query, queries may push prefilters to theAutoPrefilteringScope. At that stage, queries have already been rewritten. Semantic queries usingtext_embeddinginference endpoints are rewritten to knn vector queries that are auto-prefiltering enabled. Then, when an auto-prefiltering enabled knn vector query is converted to its lucene equivalent, we fetch the prefilters from theSearchExecutionContextand we apply them to the knn vector query - which supports prefiltering already.