- Notifications
You must be signed in to change notification settings - Fork 25.5k
ESQL - KNN function uses prefilters when pushed down to Lucene #131004
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… to be generated with prefilters
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/querydsl/query/KnnQuery.java
🔍 Preview links for changed docs |
# TODO We need kNN prefiltering here so we get more candidates that pass the filter | ||
from colors metadata _score | ||
| where (knn(rgb_vector, [0,255,255], 140) or knn(rgb_vector, [128, 0, 255], 140)) and primary == true | ||
| where (knn(rgb_vector, [0,255,255], 140) or knn(rgb_vector, [128, 0, 255], 10)) and primary == true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using prefilter means that we no longer need to retrieve all results, but can use k as inteneded as conjunction is pushed down as a prefilter now
@Override | ||
public Query asQuery(LucenePushdownPredicates pushdownPredicates, TranslatorHandler handler) { | ||
return queryBuilder != null ? new TranslationAwareExpressionQuery(source(), queryBuilder) : translate(handler); | ||
return queryBuilder != null ? new TranslationAwareExpressionQuery(source(), queryBuilder) : translate(pushdownPredicates, handler); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added pushdownPredicates to translate method, as we need to evaluate the prefilters with them in order to decide
...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/vector/Knn.java Outdated Show resolved Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java Show resolved Hide resolved
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
...gin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/plugin/KnnFunctionIT.java Show resolved Hide resolved
this.field = field; | ||
this.k = k; | ||
this.options = options; | ||
this.filterExpressions = filterExpressions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to add both filterExpressions
and queryBuilder
to the nodeInfo
method:
my understanding is that if we don't add them, we risk losing both filterExpressions
and queryBuilder
during transforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the catch - added in b347bba
...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/vector/Knn.java Outdated Show resolved Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/querydsl/query/KnnQuery.java Outdated Show resolved Hide resolved
…sql-knn-prefilter
...rg/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownConjunctionsToKnnPrefilters.java Outdated Show resolved Hide resolved
...rg/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownConjunctionsToKnnPrefilters.java Show resolved Hide resolved
| ||
String query = """ | ||
from test | ||
| where knn(dense_vector, [0, 1, 2], 10) and integer > 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we modify one of these tests such that we have 2 WHERE conditions, e.g. here:
| where knn(dense_vector, [0, 1, 2], 10) | integer > 10
it would help validate that combining the filters is done before we push down the prefilters to the KNN functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done in a99e75f
*/ | ||
var query = """ | ||
from test | ||
| where ((knn(dense_vector, [0, 1, 2], 10) or integer > 10) and ((keyword == "test") or knn(dense_vector, [4, 5, 6], 10))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we get some tests where we use NOT.
for example:
((knn(dense_vector, [0, 1, 2], 10) or integer > 10) and NOT ((keyword == "test") or knn(dense_vector, [4, 5, 6], 10)))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, added some in 4c6bb4d
…sql-knn-prefilter # Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java
# Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/knn-function.csv-spec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @carlosdelest !
} | ||
| ||
protected abstract Query translate(TranslatorHandler handler); | ||
protected abstract Query translate(LucenePushdownPredicates pushdownPredicates, TranslatorHandler handler); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be easier to have two signatures for this, for a smaller file change count?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's more confusing. LucenePushdownPredicates
are part of the translation process even if they were not used until now.
* Support knn function | ||
*/ | ||
KNN_FUNCTION_V2(Build.current().isSnapshot()), | ||
KNN_FUNCTION_V3(Build.current().isSnapshot()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super nitpick - I wonder if we could create an alias that this could reference (so when we go to V4 we only need to change one variable when testing for test compatibility)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide an example of what it would look like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like KNN_FUNCTION_CURRENT = EsqlCapabilities.KNN_FUNCTION_V3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sue if that works with CSV tests 🤔 . I'd say let's keep this as it's an established pattern in ES|QL.
Maybe I'll give it a try for the next one and see what it looks like from a changes perspective 👍
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/ExpressionWritables.java
…sql-knn-prefilter
…king * upstream/main: (91 commits) Mute org.elasticsearch.packaging.test.DockerTests test130JavaHasCorrectOwnership elastic#131369 Add exception logging when interrupted (elastic#131153) Mute org.elasticsearch.packaging.test.DockerTests test140CgroupOsStatsAreAvailable elastic#131372 Mute org.elasticsearch.packaging.test.DockerTests test070BindMountCustomPathConfAndJvmOptions elastic#131366 Mute org.elasticsearch.xpack.test.rest.XPackRestIT test {p0=ml/delete_expired_data/Test delete expired data with body parameters} elastic#131364 Mute org.elasticsearch.xpack.esql.vector.VectorSimilarityFunctionsIT testSimilarityBetweenConstantVectorAndField {functionName=v_cosine similarityFunction=COSINE} elastic#131363 Mute org.elasticsearch.xpack.esql.vector.VectorSimilarityFunctionsIT testDifferentDimensions {functionName=v_cosine similarityFunction=COSINE} elastic#131362 Mute org.elasticsearch.xpack.esql.vector.VectorSimilarityFunctionsIT testSimilarityBetweenConstantVectors {functionName=v_cosine similarityFunction=COSINE} elastic#131361 Check SCORE_FUNCTION capability in VerifierTests (elastic#131352) Replace deprecated routingTable table call in tests (elastic#131005) [DOCS] Remove misused applies_to tag (elastic#131349) Adj ivf postings list building (elastic#130843) [Transform] Read metadata from Project State (elastic#131205) Add note on o11y to architecture guide (elastic#131291) Upgrade AWS Java SDK to 2.31.78 (elastic#131050) Support Fields API in conditional ingest processors (elastic#121914) ESQL - KNN function uses prefilters when pushed down to Lucene (elastic#131004) Add docs for ES|QL query logs (elastic#131287) Simplify `expectedFinalRegisterValue` computation (elastic#131274) Mute org.elasticsearch.test.rest.yaml.RcsCcsCommonYamlTestSuiteIT test {p0=search/110_field_collapsing/field collapsing, inner_hits and maxConcurrentGroupRequests} elastic#131348 ...
KNN function returned less results than expected when using conjunctions:
Before this change, knn retrieved the top k results, but then those would be post-filtered by the conjunction. That potentially retrieves in less than k results retrieved (see filters in knn).
This PR pushes down conjunctions as prefilters to KNN queries:
PushDownConjunctionsToKnnPrefilters
logical optimizer rule examines filters, and pushes conjunctions to knn functions as prefilters. It works by identifying knn functions on one side of a conjunction, and pushing the other side of the conjunction(s) that are parent to the knn function to it as prefilters.QueryBuilder
.This PR still doesn't handle the case when KNN cannot be pushed down and has prefilters - that will come as a follow up PR.
knn functions won't have knn queries as prefilters to avoid circular dependencies.
Whit this change, a query with:
WHERE ((knn(A) AND B) OR C) AND (knn(D) OR (E AND F))
will result in: