- Notifications
You must be signed in to change notification settings - Fork 25.5k
ESQL: LOOKUP JOIN with expressions #134098
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: LOOKUP JOIN with expressions #134098
Conversation
d311b5e
to d253a55
Compare 00fc5a4
to 0aa5e85
Compare Hi @julian-elastic, I've created a changelog YAML for you. |
81d1b9a
to 77e8426
Compare Hi @julian-elastic, I've created a changelog YAML for you. |
495490a
to 81dac7f
Compare Pinging @elastic/es-analytical-engine (Team:Analytics) |
Hi @julian-elastic, I've created a changelog YAML for you. |
843e1f6
to b8f9ba6
Compare There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done a first round, will do another one on Monday.
I think this is quite close already.
I'm a little paranoid of our behavior in case that the left and right hand side of the join are not the same type. But we can add tests for that in a follow-up. We have LookupJoinTypesIT
which is quite exhaustive but only covers 1 field matches without expressions.
.../main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownJoinPastProject.java Outdated Show resolved Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/MatchConfig.java Outdated Show resolved Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/JoinConfig.java Outdated Show resolved Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/Join.java Show resolved Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/LogicalPlanBuilder.java Outdated Show resolved Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/LookupJoin.java Outdated Show resolved Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java Outdated Show resolved Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java Outdated Show resolved Hide resolved
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
a1fb7cc
to 64d105c
Compare bce6400
to daa69e7
Compare There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we're at the point where we should make some kind of abstraction inside the Expression
tree of things that can be in a QueryList
. I think it's fine to do it as a follow up to this, but I think we've arrived there. Or we'll arrive there when we do the next thing.
One thing that'd be nice if this abstraction could do is run the equivalent of this code:
if (TranslationAware.Translatable.YES.equals(comparison.translatable(lucenePushdownPredicates))) { return comparison.asQuery(lucenePushdownPredicates, TranslatorHandler.TRANSLATOR_HANDLER)
one time and return a closure that accepts Page, position
or something and returns a Lucene Query
. That's pretty close to what ExpressionEvaluator
does and it'd be nice to mimic it. So we can do "up front" stuff and "per row" stuff. But that can wait until after this PR.
Looks good to me. I left a few questions and statements about not knowing what's up in places, but that's the beauty of having two reviewers.
...heap-attack/src/javaRestTest/java/org/elasticsearch/xpack/esql/heap_attack/HeapAttackIT.java Outdated Show resolved Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/LookupJoinExec.java Outdated Show resolved Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java Show resolved Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/HashJoinExec.java Show resolved Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join-expression.csv-spec Show resolved Hide resolved
I completely agree we are doing too many things per row now. I will work on this as a separate performance optimization after this is merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Only minor comments and test suggestions - please proceed at your own discretion and But before merging, please slap
test-release
on this and check that we don't introduce any release test failures :) Also, let's add a tiny test that confirms this that expression joins are not yet enabled on release builds.
I like the use of parameterization in the test changes!
I think there is only a tiny bit of follow-up work:
- Cleaning up the UsingJoinType
- Ensuring we don't rely on
FieldAttribute#name
on accident (see below) - Make sure the validation for expression joins is done in the usual place (see below)
import org.elasticsearch.xpack.esql.expression.function.UnresolvedFunction; | ||
import org.elasticsearch.xpack.esql.expression.function.UnsupportedAttribute; | ||
import org.elasticsearch.xpack.esql.expression.function.aggregate.Absent; | ||
import org.elasticsearch.xpack.esql.expression.function.aggregate.AbsentOverTime; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion for follow-up:
We normally don't throw in the analyzer. We either leave things unresolved so the verifier picks it up and constructs an error message, or we add a failure in the verifier - with the advantage that we correctly start the error message with the location of the problem in the query.
In case of joins, we got away with leaving the join field unresolved - or even stuffing a custom error message in the join config (Only LEFT join is supported with USING
), which is a hack; sorry about that.
I think the most consistent way to verify that all join expressions are supported is having Join
implement PostAnalysisPlanVerificationAware
(watching out not to spill the verification to InlineJoin
as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will consider this for a future PR
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.FromAggregateMetricDouble; | ||
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToAggregateMetricDouble; | ||
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToDateNanos; | ||
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ToDenseVector; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't reach here because this is caught during parsing, right?
In any case, this is fine, but may also be a verification that we perform in a PostAnalysisPlanVerificationAware
implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will consider moving it there in the next PR
...plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/BinaryComparisonQueryList.java Outdated Show resolved Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/ExpressionQueryList.java Show resolved Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/ExpressionQueryList.java Outdated Show resolved Hide resolved
...ava/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownAndCombineFiltersTests.java Show resolved Hide resolved
...ava/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownAndCombineFiltersTests.java Show resolved Hide resolved
.../java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownJoinPastProjectTests.java Show resolved Hide resolved
.../java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownJoinPastProjectTests.java Show resolved Hide resolved
public Query doGetQuery(int position, int firstValueIndex, int valueCount) { | ||
Object value = blockValueReader.apply(firstValueIndex); | ||
// create a new comparison with the value from the block as a literal | ||
EsqlBinaryComparison comparison = binaryComparison.getFunctionType() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the join on expression passed in for lookup join contains the correct attributes, corresponding with what comes from the right branch.
Agree!
That means right now they should be exactly the same as what is in the lookup index.
Yes! But what I'd like to be resilient to: What if we update the field attributes in the EsRelation
that represents the lookup index, and change the attribute names but leave the contained EsField
instance alone - so that a call to #fieldName
will return the same as before? Of course, propagating this change into the joinOnCondition
as well.
IMO the join should still execute just fine. If it doesn't, this would mean that we probably rely on the #name
output of the field attributes to build our Lucene queries. But we shouldn't do that, see the javadoc here. There is currently no invariant that guarantees that FieldAttribute#name
returns the same as FieldAttribute#fieldName
, but it's unfortunately easy to think that.
How about we add a small follow-up PR that messes with the lookup field names and shows that the resulting query lists are still valid?
6bb394d
to adb9290
Compare Hi @julian-elastic, I've created a changelog YAML for you. |
I reproduced the failing release tests locally on latest from main without any of my changes, so those failures are unrelated. |
Add support for LOOKUP JOIN with expressions. FROM index1 | LOOKUP JOIN lookup_index on left_field1 > right_field1 AND left_field2 <= right_field2 AND left_field3 == right_field1 --handle name conflict with rename FROM index1 | RENAME id as id_left | LOOKUP JOIN lookup_index on id_left >=id AND left_field2 <= right_field2 Add support for 1 or more join predicate expressions in the ON clause of the LOOKUP JOIN. If there is more than one predicate expression, the predicate expression must be connected with the AND operator. Each predicate is in the form field1 BINARY_OPERATOR field2. One of them should be from the left and the other from the right. BINARY_OPERATOR is one of ==, >,<, >=, <=, !=. The same field can be used multiple times in the join on condition. If the fields in the join on conditions have the same name, they should be renamed so that the fields in the join on condition have unique names. This commit may include content that was generated or assisted by Cursor, Gemini CLI and/or Github Copilot.
Add support for LOOKUP JOIN with expressions.
Add support for 1 or more join predicate expressions in the ON clause of the LOOKUP JOIN. If there is more than one predicate expression, the predicate expression must be connected with the AND operator. Each predicate is in the form field1 BINARY_OPERATOR field2. One of them should be from the left and the other from the right. BINARY_OPERATOR is one of ==, >,<, >=, <=, !=. The same field can be used multiple times in the join on condition. If the fields in the join on conditions have the same name, they should be renamed so that the fields in the join on condition have unique names.