-
Notifications
You must be signed in to change notification settings - Fork 30
INTPYTHON-736 Convert simple $expr queries to $match queries #373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The remaining test failure |
| @@ -0,0 +1,129 @@ | |||
| """Expression To Match Converters""" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow a file with no imports is a rare gem.
django_mongodb_backend/query_conversion/expression_converters.py
Outdated
Show resolved
Hide resolved
django_mongodb_backend/query_conversion/expression_converters.py
Outdated
Show resolved
Hide resolved
| return False | ||
| if isinstance(value, list | tuple | set): | ||
| return all(cls.is_simple_value(v) for v in value) | ||
| # TODO: Expand functionality to support `$getField` conversion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO for this PR or do we need a new tracking ticket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be a new tracking ticket, but we can table that until we determine if our new $expr generation refactor will handle this case.
|
|
||
|
|
||
| class _EqExpressionConverter(_BaseExpressionConverter): | ||
| """Convert $eq operation to a $match compatible query.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this classes it would be helpful to include small examples of what the transformation before+after looks like. Is that possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can add those.
| def test_non_dict_expression(self): | ||
| expr = ["$status", "active"] | ||
| self.assertNotOptimizable(expr) | ||
|
|
||
| def test_empty_dict_expression(self): | ||
| expr = {} | ||
| self.assertNotOptimizable(expr) | ||
|
|
||
| def test_non_convertible(self): | ||
| expr = {"$gt": ["$price", 100]} | ||
| self.assertNotOptimizable(expr) | ||
|
|
||
| def _test_conversion_various_types(self, conversion_test): | ||
| for _type, val in self.CONVERTIBLE_TYPES.items(): | ||
| with self.subTest(_type=_type, val=val): | ||
| conversion_test(val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests shouldn't appear in a test case since they'll be run redundantly on each subclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean tests shouldn't appear in a test case parent class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, at least for Django tests "TestCase" means a class that's inherited from (SimpleTestCase, TestCase, LiveServerTestCase, etc.).
| from django_mongodb_backend.query_conversion.expression_converters import convert_expression | ||
|
|
||
|
|
||
| class TestBaseExpressionConversionCase(SimpleTestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| class TestBaseExpressionConversionCase(SimpleTestCase): | |
| class ExpressionConversionTestCase(SimpleTestCase): |
|
I'd feel slightly better about this PR if the code was in the existing |
Is there a strong reason to move it into |
| "$expr": { | ||
| "$and": [ | ||
| {"$eq": ["$status", "active"]}, | ||
| {"$gt": ["$price", 100]}, # Not optimizable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? "$gt" is valid in a find() query so it should be valid in a $match as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually a good point of discussion: we can add additional operators ($gt, $gte, $lt, $lte are all simple) but each is additional code + tests we need to add. If we expect this to be a short-term solution we replace, is it worth expanding the scope to include those operators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added support for those four operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment answers that question, at least in part. Leaving aside the code layout, I'd be inclined to treat this PR like an improvement I have no specific expectations about and would try to fix as much as I could in it. This is the "framework approach" we talked about in office hours when folks said that a framework for these optimizations was not the goal, but I think you could treat it as such if the trade-off between disposable features and your time and energy seems reasonable at the start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The work to add them was minimal, but I'd push back against adding any others that will require more complex code changes. Covering the basic operators we do here should provide a significant speedup for most users until we can rewrite our query generation entirely.
Until we hear some feedback from @WaVEV or others about the proposed refactor, I would take @Jibola 's PoC at face value. We know it implements significant performance improvements and we know he stated's that the As such, I would take the functions in that class and move them to Also even if the refactor timeline is short, the need to resolve the performance issue is shorter and I think we should proceed as best we can with what we know now and exclude things we don't know yet from that decision making process. To me, that means implementing the PoC the same way we'd implement any other feature, without the expectation that we're going to be removing it in the next release. That said, I don't feel that strongly about it. I'm probably going to remain skeptical about the refactor's ability to improve performance in a similar way, but I look forward to seeing it happen! |
|
I'm with Noah for now. It's better to keep this code separate than to mix it up with other things in query_utils. I might lightly restructure it when I take a deeper look later. |
| return None | ||
|
|
||
|
|
||
| class _GtExpressionConverter(_BaseExpressionConverter): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These classes are all nearly identical. Could we refactor them with a common base class like this:
class _GtExpressionConverter(BinaryExpression):
operator = "$gt"
class _GtExpressionConverter(BinaryExpression):
operator = "$gte"
class _LtExpressionConverter(BinaryExpression):
operator = "$lt"
...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Added $eq as well with a tiny branch.
tests/queries_/test_mql.py
Outdated
|
|
||
| class FKLookupConditionPushdownTests(TestCase): | ||
| def test_filter_on_local_and_related_fields(self): | ||
| self.maxDiff = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maxDiff is a class attribute and I believe mutating it in one test modifiest for all subsequent tests in the class.
Either we could set it on each class level, add it to MongoTestCaseMixin, or omit it and only add it when debugging tests locally.
| """Convert $eq operation to a $match compatible query. | ||
| For example:: | ||
| "$expr": { | ||
| {"$eq": ["$status", "active"]} | ||
| } | ||
| is converted to:: | ||
| {"status": "active"} | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the example could be on the base class so we don't need to repeat it on every class with only the operator replaced.
| Returns: | ||
| List of optimized match conditions | ||
| """ | ||
| expr_query = deepcopy(expr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was part of Jib's initial commit, but I'm not sure why copying would be needed. I spot checked some tests and didn't see any failures. Could try a CI run without it?
| if isinstance(value, (list, tuple, set)): # noqa: UP038 | ||
| return all(cls.is_simple_value(v) for v in value) | ||
| # TODO: Expand functionality to support `$getField` conversion | ||
| return not isinstance(value, dict) or value is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I think this could be rewritten as:
| return not isinstance(value, dict) or value is None | |
| return not isinstance(value, dict) |
not is instance(value, dict) implies value isn't None. and if value is a dict isn't None 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check for None at the start of the method since isinstance() is comparatively expensive.
| if isinstance(combined_conditions, list): | ||
| optimized_conditions = [] | ||
| for condition in combined_conditions: | ||
| if isinstance(condition, dict) and len(condition) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if some conditions pass the if and others don't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think isn't possible 🤔 . The way that the queries are made any literal like True or False are handled by a dict. Is there any condition/operator that needs more than one key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emanuel, I'm not sure exactly what you meant here. Feel free to open a follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
| "$gt": _GtExpressionConverter, | ||
| "$gte": _GteExpressionConverter, | ||
| "$lt": _LtExpressionConverter, | ||
| "$lte": _LteExpressionConverter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could covers not with $nor.
0c27833 to
b99aba3
Compare
Co-authored-by: Noah Stapp <noah.stapp@mongodb.com>
b99aba3 to
94eefad
Compare
Summary
This PR introduces an abstraction to convert some of our simple $expr queries to $match queries without $expr in them. If the $expr query cannot be converted to a $match query, it will leave it as a $expr. This behavior is nested. Currently this is limited to
$eqand$inoperations for now, but in the future can be extended to other$exprconversions.This only affects the querying done in
self.match_mqlas the querying for lookups, groupings, or aggregation pipelines are out of scope for this pr.Changes in this PR
QueryOptimizerabstraction (which can be changed; it doesn't store state so it does not need to remain an object)ExpressionConverterswhich house the logic for converting queries from$exprusage to their$matchequivalents.$eq,$in,$and,$orin drilling down of expression conversion.Test Plan
python manage.py shellSee an example query below.Screenshots (grabbed text)
Callouts
QueryOptimizershould be removed and left as its functions if it is not going to retain state. Initialization is unnecessary.$getFieldconverter is made.self.match_mqlresolves to a$expr(which is true) future work should include keeping any query that doesn't useself.match_mql.