- Notifications
You must be signed in to change notification settings - Fork 25.7k
Esql/lookup join grammar #116515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Esql/lookup join grammar #116515
Conversation
rename existing lookup command to look_up (to avoid clashes)
| Pinging @elastic/es-analytical-engine (Team:Analytics) |
bpintea left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor questions.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/JoinTypes.java Show resolved Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/JoinTypes.java Show resolved Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/MapperUtils.java Show resolved Hide resolved
Remove USING syntax - use ON <field> for now Fix incorrect serialization
fae8354 to 7f14d61 Compare
alex-spies left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a first pass with some questions and thoughts.
x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup.csv-spec-ignored Outdated Show resolved Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec Show resolved Hide resolved
...c/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/UnaryScalarFunction.java Outdated Show resolved Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/Join.java Show resolved Hide resolved
| } | ||
| for (Attribute a : right) { | ||
| // override the existing entry in place | ||
| nameToAttribute.compute(a.name(), (name, existing) -> a); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess we should make a nullable here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a TODO plus entry in the meta ticket - making the attributes nullable changes their equality which has side effects in the plan.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/Join.java Outdated Show resolved Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/JoinTypes.java Outdated Show resolved Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/CsvTests.java Outdated Show resolved Hide resolved
...gin/esql/src/test/java/org/elasticsearch/xpack/esql/plan/logical/JoinSerializationTests.java Outdated Show resolved Hide resolved
c66ccce to 9b51cef Compare Move writable declarations outside the core classes to avoid errors (such as subClass.getNamedWritable()) and centralize them in a top package class for better management.
alex-spies left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, all done now. I think we're not too far away, but I think there's a couple things that needs to be done before merging - and a bunch of things that can be done after.
Here's what I think we should do:
- I believe the refactoring of the named writable entries should become its own PR. Currently, our mixed cluster test suite is muted, which means that making such a large scale refactor to our serialization is quite risky.
- It looks to me like our current planning for lookup joins expects the match fields from the left and right hand side's outputs to have the same name ids. I believe this is wrong and will lead to inconsistent planning; see my comment on the changes to
Analyzer.java. The fields in the lookup index are different from fields that we join on. Actually, even if we join the lookup index with itself, we should probably still assign different name ids to the respective field attrs. to avoid bugs. There's a comment inLookupJoinExecthat claims we need qualifiers to make this work, but I don't believe so - name ids should disambiguate the attributes just fine. LookupJoinandLookupJoinExechave some inconsistencies and hacks which I think we best address to avoid them from lingering and producing secondary issues. E.g. the fact thatLookupJoinExecdoesn't have any right hand side attributes in itsreferences(), the fact thatLookupJoin.info()is not quite right etc. See the individual comments.- Test coverage (can be addressed in a follow up, but then we need to put it in the meta issue as well):
- The test for lookup joins on the data nodes doesn't look right to me. But we can consider lookup joins on data nodes a follow-up.
- In general, the test coverage in terms of csv tests for lookup joins even on the coordinator is too small. We can address this in a follow-up, though, but at least should add it to the meta issue.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/LookupJoin.java Show resolved Hide resolved
| // Do not just add the JoinConfig as a whole - this would prevent correctly registering the | ||
| // expressions and references. | ||
| return NodeInfo.create(this, LookupJoin::new, left(), right(), config(), output); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is missing a TODO? The JoinConfig is currently still added as a whole, which means that methods like QueryPlan.forEachExpression will not work as expected.
Maybe this can just be fixed by copying the approach from Join? There, I solved this by having a constructor that takes the components of the JoinConfig instead of the whole thing, which is also the longest constructor to satisfy the info() contract.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment was no longer relevant and was removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still have to pass the individual components of the join config to the node info instead of the join config object as a whole.
If we want to, say, rewrite the plan and need to replace an attribute in the join config's left hand side, just calling forEachExpression on the join will be wrong because it ignores the join config.
We had issues because of this with Join before; let's fix it so we don't run into trouble later.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/LookupJoin.java Outdated Show resolved Hide resolved
| if (right instanceof FragmentExec fragment | ||
| && fragment.fragment() instanceof EsRelation relation | ||
| && relation.indexMode() == IndexMode.LOOKUP) { | ||
| return new LookupJoinExec( | ||
| join.source(), | ||
| left, | ||
| right, | ||
| config.matchFields(), | ||
| config.leftFields(), | ||
| config.rightFields(), | ||
| join.output() | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I understand that our plan for the future is to have more general joins where we might push e.g. a filter into the right hand side and then won't perform a LookupJoinExec but a more general join here. But as that's currently not in place, I believe this approach is complicating the planning for no gain, because we could also map LookupJoin directly to LookupJoinExec without first transforming it into a general Join. I believe this increases the risk of mistakes due to premature abstraction and loses information during logical optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree right now the abstraction is simplistic but the point is to decouple the command from the execution.
LookupJoin is syntactic sugar for special left join with special output - it gets folded into a regular join since I'd like to avoid serializing it (and dealing with versioning) and to have the compute engine pick up the pattern not the class.
That means any syntax that gets converted into a left join could be executed against a lookup join. Enrich for example could be attached onto this one but the dependency between the logical and physical happens through the pattern NOT the Lookup class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair - but we aren't at the point yet where an Enrich, or anything else for that matter, is rewritten into a pattern that is translated into a LookupJoinExec. Currently, we only have the Lookup logical plan node, which doesn't need the additional round trip via surrogate + pattern matching.
I really prefer deferring such generalizations until they have actual use cases - because getting the generalization wrong early on leads to pain :/ and it makes the code more complex than it currently needs.
But that's supposed to be a nit anyway :)
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/LookupJoinExec.java Outdated Show resolved Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/LookupJoinExec.java Show resolved Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec Show resolved Hide resolved
| Hi @costin, I've created a changelog YAML for you. |
| Thanks @alex-spies for all the comments, see my nested replies. This PR doesn't aim to fully address the lookup join, rather add the basic layers that can later be distributed and worked on as a follow-up from #116208 in which you already commented (thanks for that!). |
| Pinging @elastic/kibana-esql (ES|QL-ui) |
alex-spies left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heya - approving this to unblock the PR. I think it's fine to address most things in follow ups, as you suggested @costin.
Before merging, I think it'd be great to
- merge #117029 first so that the serialization refactoring can be easily backported,
- carefully consider if this should be backported as well (not doing so can seriously hamper our ability to auto-backport as this PR touches many places, even when factoring out the serialization changes), and
- fix
LookupJoin.info()as commented before.
0c30cb9 to ac9427e Compare First PR for adding LOOKUP JOIN in ESQL. Introduces grammar and wires the main building blocks to execute a query ; follow-ups are required (see elastic#116208 for more details). Co-authored-by: Nik Everett <nik9000@users.noreply.github.com> (cherry picked from commit bc785f5)
First PR for adding LOOKUP JOIN in ESQL. Introduces grammar and wires the main building blocks to execute a query; follow-ups are required (see elastic#116208 for more details). Co-authored-by: Nik Everett <nik9000@users.noreply.github.com>
First PR for adding LOOKUP JOIN in ESQL. Introduces grammar and wires the main building blocks to execute a query; follow-ups are required (see elastic#116208 for more details). Co-authored-by: Nik Everett <nik9000@users.noreply.github.com>
First PR for adding LOOKUP JOIN in ESQL.
Introduces grammar and wires the main building blocks to execute a query; follow-ups are required (see #116208 for more details).