Skip to content

Conversation

carlosdelest
Copy link
Member

@carlosdelest carlosdelest commented Jul 10, 2025

KNN function returned less results than expected when using conjunctions:

FROM test METADATA _score WHERE knn(vector, [0, 1, 2], 10) AND primary == true SORT _score DESC

Before this change, knn retrieved the top k results, but then those would be post-filtered by the conjunction. That potentially retrieves in less than k results retrieved (see filters in knn).

This PR pushes down conjunctions as prefilters to KNN queries:

  • A new PushDownConjunctionsToKnnPrefilters logical optimizer rule examines filters, and pushes conjunctions to knn functions as prefilters. It works by identifying knn functions on one side of a conjunction, and pushing the other side of the conjunction(s) that are parent to the knn function to it as prefilters.
  • KNN function uses the prefilters when generating the QueryBuilder.
  • KNN checks that prefilters can be pushed down in order to decide whether it can be pushed down to Lucene.

This PR still doesn't handle the case when KNN cannot be pushed down and has prefilters - that will come as a follow up PR.

knn functions won't have knn queries as prefilters to avoid circular dependencies.

Whit this change, a query with:

WHERE ((knn(A) AND B) OR C) AND (knn(D) OR (E AND F))

will result in:

  • knn(A) will have B, E AND F as prefilters (other side of the conjunctions it's part of, minus knn(D))
  • knn(D) will have B OR C as prefilters (other side of the conjunctions it's part of, minus knn(A))
@carlosdelest carlosdelest added :Search Relevance/Vectors Vector search :Analytics/ES|QL AKA ESQL Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.2.0 labels Jul 10, 2025
Copy link
Contributor

github-actions bot commented Jul 10, 2025

# TODO We need kNN prefiltering here so we get more candidates that pass the filter
from colors metadata _score
| where (knn(rgb_vector, [0,255,255], 140) or knn(rgb_vector, [128, 0, 255], 140)) and primary == true
| where (knn(rgb_vector, [0,255,255], 140) or knn(rgb_vector, [128, 0, 255], 10)) and primary == true
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using prefilter means that we no longer need to retrieve all results, but can use k as inteneded as conjunction is pushed down as a prefilter now

@Override
public Query asQuery(LucenePushdownPredicates pushdownPredicates, TranslatorHandler handler) {
return queryBuilder != null ? new TranslationAwareExpressionQuery(source(), queryBuilder) : translate(handler);
return queryBuilder != null ? new TranslationAwareExpressionQuery(source(), queryBuilder) : translate(pushdownPredicates, handler);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added pushdownPredicates to translate method, as we need to evaluate the prefilters with them in order to decide

@carlosdelest carlosdelest changed the title ESQL - KNN function uses prefilters ESQL - KNN function uses prefilters when pushed down to Lucene Jul 10, 2025
@carlosdelest carlosdelest marked this pull request as ready for review July 10, 2025 14:38
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 10, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

this.field = field;
this.k = k;
this.options = options;
this.filterExpressions = filterExpressions;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to add both filterExpressions and queryBuilder to the nodeInfo method:

https://github.com/elastic/elasticsearch/blob/0b06d9f24dbac94f89bf696daad98041b93e7b1b/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/vector/Knn.java#L292-294

my understanding is that if we don't add them, we risk losing both filterExpressions and queryBuilder during transforms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch - added in b347bba


String query = """
from test
| where knn(dense_vector, [0, 1, 2], 10) and integer > 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we modify one of these tests such that we have 2 WHERE conditions, e.g. here:

| where knn(dense_vector, [0, 1, 2], 10) | integer > 10 

it would help validate that combining the filters is done before we push down the prefilters to the KNN functions

Copy link
Member Author

@carlosdelest carlosdelest Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done in a99e75f

*/
var query = """
from test
| where ((knn(dense_vector, [0, 1, 2], 10) or integer > 10) and ((keyword == "test") or knn(dense_vector, [4, 5, 6], 10)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we get some tests where we use NOT.
for example:

((knn(dense_vector, [0, 1, 2], 10) or integer > 10) and NOT ((keyword == "test") or knn(dense_vector, [4, 5, 6], 10))) 
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, added some in 4c6bb4d

carlosdelest and others added 8 commits July 11, 2025 16:20
…sql-knn-prefilter # Conflicts: #	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java
# Conflicts: #	x-pack/plugin/esql/qa/testFixtures/src/main/resources/knn-function.csv-spec
@carlosdelest
Copy link
Member Author

After discussing with @jimczi , I removed knn functions as filters to other knn functions to avoid circular dependencies in ac7c3bc.

@carlosdelest carlosdelest requested a review from ioanatia July 14, 2025 11:45
Copy link
Member

@kderusso kderusso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @carlosdelest !

}

protected abstract Query translate(TranslatorHandler handler);
protected abstract Query translate(LucenePushdownPredicates pushdownPredicates, TranslatorHandler handler);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be easier to have two signatures for this, for a smaller file change count?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more confusing. LucenePushdownPredicates are part of the translation process even if they were not used until now.

* Support knn function
*/
KNN_FUNCTION_V2(Build.current().isSnapshot()),
KNN_FUNCTION_V3(Build.current().isSnapshot()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nitpick - I wonder if we could create an alias that this could reference (so when we go to V4 we only need to change one variable when testing for test compatibility)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide an example of what it would look like?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like KNN_FUNCTION_CURRENT = EsqlCapabilities.KNN_FUNCTION_V3?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sue if that works with CSV tests 🤔 . I'd say let's keep this as it's an established pattern in ES|QL.

Maybe I'll give it a try for the next one and see what it looks like from a changes perspective 👍

# Conflicts: #	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/ExpressionWritables.java
@carlosdelest carlosdelest merged commit 6ffe27d into elastic:main Jul 16, 2025
33 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Jul 16, 2025
…king * upstream/main: (91 commits) Mute org.elasticsearch.packaging.test.DockerTests test130JavaHasCorrectOwnership elastic#131369 Add exception logging when interrupted (elastic#131153) Mute org.elasticsearch.packaging.test.DockerTests test140CgroupOsStatsAreAvailable elastic#131372 Mute org.elasticsearch.packaging.test.DockerTests test070BindMountCustomPathConfAndJvmOptions elastic#131366 Mute org.elasticsearch.xpack.test.rest.XPackRestIT test {p0=ml/delete_expired_data/Test delete expired data with body parameters} elastic#131364 Mute org.elasticsearch.xpack.esql.vector.VectorSimilarityFunctionsIT testSimilarityBetweenConstantVectorAndField {functionName=v_cosine similarityFunction=COSINE} elastic#131363 Mute org.elasticsearch.xpack.esql.vector.VectorSimilarityFunctionsIT testDifferentDimensions {functionName=v_cosine similarityFunction=COSINE} elastic#131362 Mute org.elasticsearch.xpack.esql.vector.VectorSimilarityFunctionsIT testSimilarityBetweenConstantVectors {functionName=v_cosine similarityFunction=COSINE} elastic#131361 Check SCORE_FUNCTION capability in VerifierTests (elastic#131352) Replace deprecated routingTable table call in tests (elastic#131005) [DOCS] Remove misused applies_to tag (elastic#131349) Adj ivf postings list building (elastic#130843) [Transform] Read metadata from Project State (elastic#131205) Add note on o11y to architecture guide (elastic#131291) Upgrade AWS Java SDK to 2.31.78 (elastic#131050) Support Fields API in conditional ingest processors (elastic#121914) ESQL - KNN function uses prefilters when pushed down to Lucene (elastic#131004) Add docs for ES|QL query logs (elastic#131287) Simplify `expectedFinalRegisterValue` computation (elastic#131274) Mute org.elasticsearch.test.rest.yaml.RcsCcsCommonYamlTestSuiteIT test {p0=search/110_field_collapsing/field collapsing, inner_hits and maxConcurrentGroupRequests} elastic#131348 ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >non-issue :Search Relevance/Vectors Vector search Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.2.0

5 participants