INTPYTHON-751 Make query generation omit $expr unless required #396
Left my first round of comments. I'm going to review the tests as a second phase of the PR.
```python
def as_mql_expr(self, compiler, connection):
    lhs_mql = process_lhs(self, compiler, connection, as_path=False)
    value = process_rhs(self, compiler, connection, as_path=False)
    return {"$gte": [lhs_mql, value]}

def as_mql_path(self, compiler, connection):
    lhs_mql = process_lhs(self, compiler, connection, as_path=True)
    value = process_rhs(self, compiler, connection, as_path=True)
    return {lhs_mql: {"$gte": value}}
```
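The two methods above produce the two MQL shapes this PR chooses between. A minimal stand-alone sketch, using hypothetical helpers `gte_expr` and `gte_path` (not part of the backend), shows the difference for a `>=` comparison against a constant:

```python
def gte_expr(field, value):
    # Expression form: an aggregation operator with an argument list.
    # Inside $match it is only valid when wrapped in $expr.
    return {"$gte": [f"${field}", value]}


def gte_path(field, value):
    # Path form: the direct {field: {operator: value}} match that the PR
    # treats as the index-friendly translation.
    return {field: {"$gte": value}}


expr_stage = {"$match": {"$expr": gte_expr("score", 0)}}
path_stage = {"$match": gte_path("score", 0)}
print(expr_stage)
print(path_stage)
```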
Why is there a $gte query in a search.text lookup?
To convert a score function into a filter I decided to express the following proposition: score_func(...) > 0.
```python
import re


def valid_path_key_name(key_name):
    return bool(re.fullmatch(r"[A-Za-z0-9_]+", key_name))
```
https://www.mongodb.com/docs/manual/core/dot-dollar-considerations/
Values like hashtags (#) are also valid in path names and don't require $expr. To my knowledge, as long as the key contains no dot (.) or dollar sign ($), it's fine.
Thanks, will adjust the expression.
What if the key contains an emoji or some other non-ASCII character? 🤔
Then how about changing the check to something like this?
return not bool(re.search(r"[\$\.]", key_name))
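A quick side-by-side of the original allow-list check and the proposed deny-list check, as a self-contained sketch. `valid_path_key_name_strict` and `valid_path_key_name_relaxed` are hypothetical names used only here. The relaxed version accepts hashtags and non-ASCII keys such as emoji and rejects only keys containing `.` or `$`, following the reviewer's reading of the dot/dollar page linked above:

```python
import re


def valid_path_key_name_strict(key_name):
    # Original check: only ASCII letters, digits, and underscores.
    return bool(re.fullmatch(r"[A-Za-z0-9_]+", key_name))


def valid_path_key_name_relaxed(key_name):
    # Proposed check: reject only keys containing "." or "$".
    # Unlike the strict version, this also accepts the empty string.
    return not re.search(r"[$.]", key_name)


for key in ["integer_", "#tag", "café", "🚀", "a.b", "$price"]:
    print(repr(key), valid_path_key_name_strict(key), valid_path_key_name_relaxed(key))
```

Note the behavioral difference on empty keys: the strict regex requires at least one character, while the relaxed check does not, so an explicit emptiness check may be worth adding if empty keys can reach this code.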
```python
def as_mql_expr(self, compiler, connection):
    columns, parent_field = self._get_target_path()
    mql = parent_field.as_mql(compiler, connection)
    for key in columns:
        mql = {"$getField": {"input": mql, "field": key}}
```
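The loop above wraps the parent field in one `$getField` per key. A minimal sketch of that nesting next to the dotted-path alternative raised later in the review (`$data.integer_`). `nested_get_field` and `dotted_path` are hypothetical helpers, and using "no `.` or `$` in any key" as the safety condition for the dotted form is an assumption borrowed from the key-name discussion above:

```python
import re


def nested_get_field(parent, keys):
    # Mirrors the loop above: each key adds one $getField level.
    mql = parent
    for key in keys:
        mql = {"$getField": {"input": mql, "field": key}}
    return mql


def dotted_path(parent, keys):
    # Hypothetical shortcut: a plain "$parent.key1.key2" path.
    # Assumed safe only when no key contains "." or "$"; otherwise
    # return None so the caller can fall back to nested_get_field.
    if any(re.search(r"[$.]", key) for key in keys):
        return None
    return ".".join([parent, *keys])


print(nested_get_field("$data", ["integer_"]))
print(dotted_path("$data", ["integer_"]))
print(dotted_path("$data", ["a.b"]))
```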
potentially out of scope:
https://github.com/mongodb/django-mongodb-backend/pull/392/files#diff-0a6ce30a131a00fa88086c4c4d0d6e6232845fd11ef2bc67891fdf92e10c3743R18-R45
Is it still possible to remove $getField in as_mql_expr, or is routing embedded model queries to as_mql_expr expected because they need a $getField call?
django_mongodb_backend/lookups.py
```python
@property
def can_use_path(self):
    simple_column = getattr(self.lhs, "is_simple_column", False)
    constant_value = is_constant_value(self.rhs)
    return simple_column and constant_value
```
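The property encodes when the path form is safe: the left-hand side must be a plain column and the right-hand side a constant. A stand-alone sketch of that decision follows. `Column`, `Constant`, and `GteLookup` are hypothetical stand-ins rather than Django's classes, and `is_constant_value` is simplified here to "is a `Constant` instance":

```python
from dataclasses import dataclass


@dataclass
class Column:
    # Hypothetical stand-in for a simple column reference.
    name: str
    is_simple_column: bool = True


@dataclass
class Constant:
    # Hypothetical stand-in for a literal value.
    value: object


def is_constant_value(rhs):
    # Simplified assumption: only Constant instances count as constants.
    return isinstance(rhs, Constant)


def to_expr(node):
    # Expression-form operand: "$name" for columns, $literal for constants.
    if isinstance(node, Column):
        return f"${node.name}"
    return {"$literal": node.value}


@dataclass
class GteLookup:
    lhs: object
    rhs: object

    @property
    def can_use_path(self):
        simple_column = getattr(self.lhs, "is_simple_column", False)
        return simple_column and is_constant_value(self.rhs)

    def as_mql(self):
        if self.can_use_path:
            # Direct match: {"field": {"$gte": value}}
            return {self.lhs.name: {"$gte": self.rhs.value}}
        # Anything else (e.g. column-to-column) falls back to $expr.
        return {"$expr": {"$gte": [to_expr(self.lhs), to_expr(self.rhs)]}}


print(GteLookup(Column("price"), Constant(10)).as_mql())
print(GteLookup(Column("price"), Column("cost")).as_mql())
```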
💯
```python
self.assertAggregateQuery(
    query,
    "model_fields__nullableintegerarraymodel",
    [{"$match": {"field": {"$in": ([1], [2])}}}],
```
Does $in now expect a tuple?
Yes, but it isn't a new convention or expectation. An existing test already checks the $in right-hand side as a tuple, so I followed that convention. It doesn't affect query behavior.
```python
"$match": {
    "$expr": {
        "$eq": [
            {"$getField": {"input": "$data", "field": "integer_"}},
```
This is an example of a case where we could get rid of the $getField.
Yes, we could get rid of those $getField calls by using $data.integer_, but I considered that out of scope for this refactor. This is the existing behavior.
Yes, it's out of scope!
```python
[
    {
        "$match": {
            "$expr": {
```
Curious about the call chain on this one, since the null check could actually be converted. That may be best left as a later improvement.
Design Doc
In this PR, a unified approach for generating MQL from Django expressions was implemented. The core idea is to centralize the control flow in a `base_expression` method, which decides whether the expression can be translated into a direct `field: value` match (index-friendly) or must fall back to `$expr`. This keeps the wrapping and dispatching logic in one place, while each lookup/function only defines its own expression-building logic.

This approach also allows mixing direct `field: value` matches with `$expr` clauses within the same `$match`. As a result, multiple `$expr` entries may coexist alongside index-optimized conditions, depending on the shape of the query.

Most lookups now follow this pattern by simply implementing `as_mql_expr` (and optionally `as_mql_path` when a match-based translation is possible). Only a few special cases override the base behavior directly, for example `Col` and the `Func` operators (other than `KeyTransform`). This structure also leaves room for future optimizations (e.g., constant folding) without changing the overall flow.

Additionally, since MongoDB 6 does not allow nesting `$expr` inside another `$expr`, the flow in `base_expression` ensures that such cases are flattened. In practice, expressions are generated without redundant wrapping, so the final MQL never contains `$expr` within `$expr`.
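The control flow described above can be sketched roughly as follows. This is not the backend's code: `Cmp`, `And`, and `build_match` (standing in for `base_expression`) are hypothetical simplifications that only illustrate choosing the path form when possible, mixing path conditions and `$expr` clauses in one `$match`, and wrapping in `$expr` exactly once so it is never nested:

```python
from dataclasses import dataclass


@dataclass
class Cmp:
    # Hypothetical comparison node: field <op> value, where value is either
    # a constant or another field name (is_field=True).
    field: str
    op: str
    value: object
    is_field: bool = False

    @property
    def can_use_path(self):
        # Simplified rule: constants allow a direct match, fields do not.
        return not self.is_field

    def as_mql_path(self):
        return {self.field: {self.op: self.value}}

    def as_mql_expr(self):
        # Never emits $expr itself; the caller decides where to wrap.
        rhs = f"${self.value}" if self.is_field else {"$literal": self.value}
        return {self.op: [f"${self.field}", rhs]}


@dataclass
class And:
    children: list


def build_match(node):
    """Return a $match filter, wrapping in $expr only where required."""
    if isinstance(node, And):
        # Each child decides independently, so path matches and $expr
        # clauses can sit side by side under $and.
        return {"$and": [build_match(child) for child in node.children]}
    if node.can_use_path:
        return node.as_mql_path()
    # Wrapping happens only here, so $expr never appears inside $expr.
    return {"$expr": node.as_mql_expr()}


query = And([Cmp("age", "$gte", 18), Cmp("price", "$gt", "cost", is_field=True)])
print({"$match": build_match(query)})
```

Keeping the `$expr` wrapping in a single dispatcher, rather than in each node, is what makes the "no nested `$expr`" guarantee easy to check in a design like this.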