Skip to content

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Apr 10, 2025

If you do:

| WHERE text_field == "cat" 

we can't push to the text field because it's search index is for individual words. But most text fields have a .keyword sub field and we can query it's index. EXCEPT! It's normal for these fields to have ignore_above in their mapping. In that case we don't push to the field. Very sad.

With this change we can push down ==, but only when the right hand side is shorter than the ignore_above.

This has pretty much infinite speed gain. An example using a million documents:

Before: "took" : 391, After: "took" : 4, 

But this is going from totally un-indexed linear scans to totally indexed. You can make the "Before" number as high as you want by loading more data.

If you do: ``` | WHERE text_field == "cat" ``` we can't push to the text field because it's search index is for individual words. But most text fields have a `.keyword` sub field and we *can* query it's index. EXCEPT! It's normal for these fields to have `ignore_above` in their mapping. In that case we don't push to the field. Very sad. With this change we can push down `==`, but only when the right hand side is shorter than the `ignore_above`. This has pretty much infinite speed gain. An example using a million documents: ``` Before: "took" : 391, After: "took" : 4, ``` But this is going from totally un-indexed linear scans to totally indexed. You can make the "Before" number as high as you want by loading more data.
@elasticsearchmachine
Copy link
Collaborator

Hi @nik9000, I've created a changelog YAML for you.

@nik9000
Copy link
Member Author

nik9000 commented Apr 14, 2025

I don't believe this works for != but I'll open a followup that should handle that.

@nik9000 nik9000 marked this pull request as ready for review April 14, 2025 17:41
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 14, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

if (query instanceof SingleValueQuery) {
// Already wrapped
return query;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super proud of this. I kind of thing we should remove this and have folks wrap the query they build themselves. But not in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exists so Equals can have a different behavior - it checks the value count of the synthetic source delegate.....

Wait. What if we remove one? Oh no.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I've added a fix for this. I'll push a javadoc explaining it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed 600257a.

@nik9000
Copy link
Member Author

nik9000 commented Apr 14, 2025

I think this is ready for a real review!

@nik9000
Copy link
Member Author

nik9000 commented Apr 18, 2025

@luigidellaquila could you have a look at this one too?

@nik9000 nik9000 requested a review from luigidellaquila April 18, 2025 18:03
@nik9000
Copy link
Member Author

nik9000 commented Apr 21, 2025

I owe @luigidellaquila a test for pushing to lowercase. It's almost done. Just running a local test run one last time.

I've added this test.

Copy link
Contributor

@ivancea ivancea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

* </p>
* <p>
* You may be asking "how would the first {@code text_field.raw:foo} query work if the
* value we're searching for is very long? In that case we never use this query at all.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case we never use this query at all

Nit: I wonder if this should be a bigger warning at the top of the javdoc. I could imagine somebody (of us) trying to use this for something, and adding a bug because of it 👀 (Not perfect anyway)

// end::rlikeEscapingTripleQuotes-result[]
;

mvStringEquals
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test for an literal over ignored_above chars + MV?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Literals don't have ignore_above.

@timestamp:date ,message:text
2023-10-23T13:55:01.543Z,[Connected to 10.1.0.1, Banana]
2023-10-23T13:55:01.544Z,Connected to 10.1.0.1
2023-10-23T13:55:01.545Z,[Connected to 10.1.0.1, More than one hundred characters long so it isn't indexed by the sub keyword field with ignore_above:100]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about adding also a single-value over ignore_above? So we have all the cases here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I should do that.


@Override
public TransportVersion getMinimalSupportedVersion() {
throw new UnsupportedOperationException();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not serialized because it's always translated in the local node?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! I'll leave a comment.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It looks to me that LucenePushdownPredicates.DEFAULT is used always - how about using that instance directly in code instead of passing it around through TranslatorAware interface?

@nik9000
Copy link
Member Author

nik9000 commented Apr 21, 2025

The serverless failure looks real. Digging into that.

@nik9000
Copy link
Member Author

nik9000 commented Apr 21, 2025 via email

@nik9000
Copy link
Member Author

nik9000 commented Apr 21, 2025 via email

Copy link
Member Author

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the fix for the serverless issue: #127146

@nik9000 nik9000 enabled auto-merge (squash) April 22, 2025 18:24
@nik9000 nik9000 merged commit b527e4b into elastic:main Apr 22, 2025
16 of 17 checks passed
@nik9000
Copy link
Member Author

nik9000 commented Apr 22, 2025

While looking to extend this to != I've discovered a bug where this PR as it stands changes the behavior of != so I'll back it out. I'll re-add this when I have a solution to both. Back out PR incoming.

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Apr 22, 2025
The PR elastic#126641 has a bug with `!=`.
nik9000 added a commit that referenced this pull request Apr 23, 2025
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 19, 2025
If you do: ``` | WHERE text_field == "cat" ``` we can't push to the text field because it's search index is for individual words. But most text fields have a `.keyword` sub field and we *can* query it's index. EXCEPT! It's normal for these fields to have `ignore_above` in their mapping. In that case we don't push to the field. Very sad. With this change we can push down `==`, but only when the right hand side is shorter than the `ignore_above`. This has pretty much infinite speed gain. An example using a million documents: ``` Before: "took" : 391, After: "took" : 4, ``` But this is going from totally un-indexed linear scans to totally indexed. You can make the "Before" number as high as you want by loading more data.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 19, 2025
@nik9000
Copy link
Member Author

nik9000 commented May 19, 2025

Backporting with #128156

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0

6 participants