- Notifications
You must be signed in to change notification settings - Fork 25.5k
ESQL: Avoid unintended attribute removal #127563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Avoid unintended attribute removal #127563
Conversation
Pinging @elastic/es-analytical-engine (Team:Analytics) |
buildkite test this |
I'm really confused. I can get a result of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kanoshiou I really appreciate you provided this PR and you tried to understand where the fix needs to go.
Unfortunately, this is a trivial fix that doesn't quite catch the true reason for the initial failure.
The essence of the failure is the fact that lookup join
can override the type
user-provided field with its own type
field. The type
from eval
is of type integer
while the type
that comes from the lookup join
is of type keyword
. Because the list of field names didn't contain type
the Analyzer couldn't learn about that type
that exists in message_types_lookup
and the only thing it knew about its type is that it comes from eval
.
lookup join
(like enrich
and maybe inlinestats
) are special in the sense that they add some columns to the result, so some of the rules we have in EsqlSession.fieldNames
need to be adjusted to account for these special characteristics. Because lookup join
can add a field of the same name as type
coming from an eval
before the said lookup join command, we have a special check in fieldNames
that forbids the removal of those Alias
es.
Your proposed solution does this:
- places the input references of grok in a new variable, instead of the
referenceBuilder
that was used before - the special check above for lookup join is not applied to
referenceBuilder
forgrok
because the input reference is not there anymore - you add to
referenceBuilder
the content of the grok-special variable
This is a bypass of the lookup join
special removal check.
Instead, let the grok
references in referenceBuilder
where they are and don't let the special lookup join
check remove it. The issue is with the lookup join
check, not with grok
, so adjust that one. I have the feeling that check is applicable in other cases, only that we didn't catch those yet.
@astefan Thank you for your review and for taking the time to provide such a detailed explanation of the issue's root cause. Your insights have been incredibly helpful, and I truly appreciate your guidance on my code, it has given me a much deeper understanding. I’ve learned a lot from your feedback! I have reverted the bypass and implemented a modification that I believe addresses the issue. If there are any areas that need improvement or further adjustments, please let me know, I’d be happy to refine it further. |
@kanoshiou I have a simpler suggestion for the change in EsqlSession. Instead of the original code in
we can do:
It will slightly change some of the existent tests in Please, try it out and let me know your thoughts. |
...in/esql/src/test/java/org/elasticsearch/xpack/esql/session/IndexResolverFieldNamesTests.java Show resolved Hide resolved
...in/esql/src/test/java/org/elasticsearch/xpack/esql/session/IndexResolverFieldNamesTests.java Show resolved Hide resolved
buildkite test this |
Thank you for your review and suggestions, @astefan! I believe the statement |
Indeed, I forgot about this part. UnresolvedRelation needs to also be accounted for in
Yes
Yes, it's a fine balance. Being too aggressive in minimizing the list of fields could lead to some of the issues we've had recently. Having more fields in that list, but still avoiding to ask for all the fields ( |
…moval # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java Outdated Show resolved Hide resolved
buildkite test this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…moval # Conflicts: # x-pack/plugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/generative/GenerativeRestTest.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/session/IndexResolverFieldNamesTests.java
@kanoshiou please, look into the conflicts. Most likely, they are coming from the freshly merged PR. |
@astefan conflicts resolved. |
buildkite test this |
Thank you @astefan! |
--------- Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
* ESQL: Avoid unintended attribute removal (#127563) --------- Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com> * Checkstyle * Checkstyle again * Slightly change the test because 8.19 has fewer indices in the index pattern used (9.x also has host_inventory index). --------- Co-authored-by: kanoshiou <73424326+kanoshiou@users.noreply.github.com>
Because we aim to minimize the number of attributes required from
field_caps
at pre-analysis time, a removal check helps eliminate fields defined by users (such aseval
). However, commands likelookup join
andenrich
can introduce fields with the same names as user-defined fields. To accommodate this, we ensure that aliases are not removed from the attributes list when a subsequentJOIN
occurs.For the query:
The plan
grok type "%{WORD:b}
passes the removal check and initiates alias removal down the tree. However, the removal process should halt atlookup join
due to the reason mentioned above. The issue arises because the original code allows progression toeval type = 1
, which inadvertently removes thetype
field. As a result, the fieldtype
ends up being of typeinteger
defined byeval type = 1
, rather than preserving thekeyword
type fieldtype
obtained frommessage_types_lookup
.Closes #127468