Skip to content

Conversation

@Jibola
Copy link
Contributor

@Jibola Jibola commented Aug 21, 2025

Summary

This PR introduces an abstraction to convert some of our simple $expr queries to $match queries without $expr in them. If the $expr query cannot be converted to a $match query, it will leave it as a $expr. This behavior is nested. Currently this is limited to $eq and $in operations for now, but in the future can be extended to other $expr conversions.

This only affects the querying done in self.match_mql as the querying for lookups, groupings, or aggregation pipelines are out of scope for this pr.

Changes in this PR

  • Introduce the QueryOptimizer abstraction (which can be changed; it doesn't store state so it does not need to remain an object)
  • Introduced ExpressionConverters which house the logic for converting queries from $expr usage to their $match equivalents.
  • Added several tests to confirm the behavior of the $eq, $in, $and, $or in drilling down of expression conversion.

Test Plan

  • Manual testing through python manage.py shell See an example query below.
  • Added test cases. Some old tests will fail and need to be fixed

Screenshots (grabbed text)

# Before
>>> Author.objects.filter(author_city__in=["London"])
{
    "$match": {
        "$expr": {
            "$and": [
                "$in": ["$author_city", "London"]
            ]
        }
    }
}

# After 
>>> Author.objects.filter(author_city__in=["London"])
{
    "$match": {
        "$and": {
            "$in": ["$author_city", "London"]
        }
    }
}

Callouts

  • This is still a proof of concept. Like said above, QueryOptimizer should be removed and left as its functions if it is not going to retain state. Initialization is unnecessary.
  • This will not work on Embedded Models until a$getField converter is made.
  • This PR only assumes self.match_mql resolves to a $expr (which is true) future work should include keeping any query that doesn't use self.match_mql.

@Jibola Jibola changed the title PoC Expression Conversion Abstraction [INTPYTHON-TODO]: PoC Expression Conversion Abstraction Aug 25, 2025
@NoahStapp NoahStapp changed the title [INTPYTHON-TODO]: PoC Expression Conversion Abstraction INTPYTHON-736 - Convert simple $expr queries to $match queries Aug 27, 2025
@NoahStapp
Copy link
Contributor

NoahStapp commented Aug 29, 2025

The remaining test failure test_changelist_view_list_editable_changed_objects_uses_filter is fixed by mongodb-forks/django#17.

@NoahStapp NoahStapp marked this pull request as ready for review September 4, 2025 15:47
@@ -0,0 +1,129 @@
"""Expression To Match Converters"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow a file with no imports is a rare gem.

return False
if isinstance(value, list | tuple | set):
return all(cls.is_simple_value(v) for v in value)
# TODO: Expand functionality to support `$getField` conversion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this TODO for this PR or do we need a new tracking ticket?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a new tracking ticket, but we can table that until we determine if our new $expr generation refactor will handle this case.



class _EqExpressionConverter(_BaseExpressionConverter):
"""Convert $eq operation to a $match compatible query."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this classes it would be helpful to include small examples of what the transformation before+after looks like. Is that possible?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can add those.

Comment on lines 31 to 34
def test_non_dict_expression(self):
expr = ["$status", "active"]
self.assertNotOptimizable(expr)

def test_empty_dict_expression(self):
expr = {}
self.assertNotOptimizable(expr)

def test_non_convertible(self):
expr = {"$gt": ["$price", 100]}
self.assertNotOptimizable(expr)

def _test_conversion_various_types(self, conversion_test):
for _type, val in self.CONVERTIBLE_TYPES.items():
with self.subTest(_type=_type, val=val):
conversion_test(val)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests shouldn't appear in a test case since they'll be run redundantly on each subclass.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean tests shouldn't appear in a test case parent class?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, at least for Django tests "TestCase" means a class that's inherited from (SimpleTestCase, TestCase, LiveServerTestCase, etc.).

from django_mongodb_backend.query_conversion.expression_converters import convert_expression


class TestBaseExpressionConversionCase(SimpleTestCase):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class TestBaseExpressionConversionCase(SimpleTestCase):
class ExpressionConversionTestCase(SimpleTestCase):

@aclark4life
Copy link
Collaborator

I'd feel slightly better about this PR if the code was in the existing query_utils or if we anticipate future growth, make the query module a package.

@NoahStapp
Copy link
Contributor

I'd feel slightly better about this PR if the code was in the existing query_utils or if we anticipate future growth, make the query module a package.

Is there a strong reason to move it into query_utils? Adding a new folder we can cleanly remove once the generation refactor is in looks cleaner to me.

"$expr": {
"$and": [
{"$eq": ["$status", "active"]},
{"$gt": ["$price", 100]}, # Not optimizable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? "$gt" is valid in a find() query so it should be valid in a $match as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a good point of discussion: we can add additional operators ($gt, $gte, $lt, $lte are all simple) but each is additional code + tests we need to add. If we expect this to be a short-term solution we replace, is it worth expanding the scope to include those operators?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added support for those four operators.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment answers that question, at least in part. Leaving aside the code layout, I'd be inclined to treat this PR like an improvement I have no specific expectations about and would try to fix as much as I could in it. This is the "framework approach" we talked about in office hours when folks said that a framework for these optimizations was not the goal, but I think you could treat it as such if the trade-off between disposable features and your time and energy seems reasonable at the start.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The work to add them was minimal, but I'd push back against adding any others that will require more complex code changes. Covering the basic operators we do here should provide a significant speedup for most users until we can rewrite our query generation entirely.

@aclark4life
Copy link
Collaborator

I'd feel slightly better about this PR if the code was in the existing query_utils or if we anticipate future growth, make the query module a package.

Is there a strong reason to move it into query_utils? Adding a new folder we can cleanly remove once the generation refactor is in looks cleaner to me.

Until we hear some feedback from @WaVEV or others about the proposed refactor, I would take @Jibola 's PoC at face value. We know it implements significant performance improvements and we know he stated's that the QueryOptimizer class was an abstraction for convenience in developing the PoC.

As such, I would take the functions in that class and move them to query_utils (assuming that's not a heavy lift) and proceed as if we don't know when these converters will be removed, if ever.

Also even if the refactor timeline is short, the need to resolve the performance issue is shorter and I think we should proceed as best we can with what we know now and exclude things we don't know yet from that decision making process. To me, that means implementing the PoC the same way we'd implement any other feature, without the expectation that we're going to be removing it in the next release.

That said, I don't feel that strongly about it. I'm probably going to remain skeptical about the refactor's ability to improve performance in a similar way, but I look forward to seeing it happen!

@timgraham
Copy link
Collaborator

I'm with Noah for now. It's better to keep this code separate than to mix it up with other things in query_utils. I might lightly restructure it when I take a deeper look later.

return None


class _GtExpressionConverter(_BaseExpressionConverter):
Copy link
Member

@ShaneHarvey ShaneHarvey Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These classes are all nearly identical. Could we refactor them with a common base class like this:

class _GtExpressionConverter(BinaryExpression):
    operator = "$gt"
class _GtExpressionConverter(BinaryExpression):
    operator = "$gte"
class _LtExpressionConverter(BinaryExpression):
    operator = "$lt"
...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added $eq as well with a tiny branch.


class FKLookupConditionPushdownTests(TestCase):
def test_filter_on_local_and_related_fields(self):
self.maxDiff = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maxDiff is a class attribute and I believe mutating it in one test modifiest for all subsequent tests in the class.

Either we could set it on each class level, add it to MongoTestCaseMixin, or omit it and only add it when debugging tests locally.

Comment on lines 54 to 62
"""Convert $eq operation to a $match compatible query.
For example::
"$expr": {
{"$eq": ["$status", "active"]}
}
is converted to::
{"status": "active"}
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the example could be on the base class so we don't need to repeat it on every class with only the operator replaced.

Returns:
List of optimized match conditions
"""
expr_query = deepcopy(expr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was part of Jib's initial commit, but I'm not sure why copying would be needed. I spot checked some tests and didn't see any failures. Could try a CI run without it?

if isinstance(value, (list, tuple, set)): # noqa: UP038
return all(cls.is_simple_value(v) for v in value)
# TODO: Expand functionality to support `$getField` conversion
return not isinstance(value, dict) or value is None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I think this could be rewritten as:

Suggested change
return not isinstance(value, dict) or value is None
return not isinstance(value, dict)

not is instance(value, dict) implies value isn't None. and if value is a dict isn't None 😅

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check for None at the start of the method since isinstance() is comparatively expensive.

@timgraham timgraham changed the title INTPYTHON-736 - Convert simple $expr queries to $match queries INTPYTHON-736 Convert simple $expr queries to $match queries Sep 7, 2025
if isinstance(combined_conditions, list):
optimized_conditions = []
for condition in combined_conditions:
if isinstance(condition, dict) and len(condition) == 1:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if some conditions pass the if and others don't

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think isn't possible 🤔 . The way that the queries are made any literal like True or False are handled by a dict. Is there any condition/operator that needs more than one key?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emanuel, I'm not sure exactly what you meant here. Feel free to open a follow up PR.

Copy link
Collaborator

@WaVEV WaVEV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

"$gt": _GtExpressionConverter,
"$gte": _GteExpressionConverter,
"$lt": _LtExpressionConverter,
"$lte": _LteExpressionConverter,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could covers not with $nor.

@timgraham timgraham force-pushed the expr-conversion-poc branch 2 times, most recently from 0c27833 to b99aba3 Compare September 8, 2025 11:41
Co-authored-by: Noah Stapp <noah.stapp@mongodb.com>
@timgraham timgraham merged commit 94eefad into main Sep 8, 2025
19 checks passed
@timgraham timgraham deleted the expr-conversion-poc branch September 8, 2025 12:21
timgraham added a commit to mongodb-forks/django that referenced this pull request Oct 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants