Contributor

@pmpailis pmpailis commented Jan 15, 2025

This PR adds a new linear retriever to facilitate hybrid search. It linearly combines the results of its sub-retrievers and computes the final score of each document as the weighted sum of the scores from each sub-component.

Each sub-component can specify the following elements:

  • retriever -> specifies how we will compute the top documents
  • normalizer -> specifies how we want to normalize the scores of the top documents for this retriever (so that all scores fall within the same range)
  • weight -> the weight applied to the normalized score in the final weighted sum computation
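
As a rough illustration of what a normalizer does, here is a minimal Python sketch of min-max scaling over one sub-retriever's scores. The helper name `minmax_normalize` and the handling of the all-equal case are assumptions made for this example, not the code in this PR.

```python
def minmax_normalize(scores):
    """Rescale a {doc_id: score} mapping into [0, 1] (hypothetical helper)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption for this sketch: with no spread, map every score to 1.0.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


# Made-up raw scores from a standard (BM25-style) retriever.
raw = {"doc1": 12.0, "doc2": 7.0, "doc3": 2.0}
normalized = minmax_normalize(raw)
print(normalized)  # {'doc1': 1.0, 'doc2': 0.5, 'doc3': 0.0}
```

After this step the scores of the sub-retriever are bounded, so its weight has a comparable effect regardless of the raw score scale.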

Pagination is similar to that of the rrf retriever, i.e. we compute the global top rank_window_size docs, and pagination is only available within these bounds.
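
To make the pagination bounds concrete, the following Python sketch treats the combined ranking as a list and applies `from`/`size` only inside the first rank_window_size entries. The function `paginate` and the sample doc IDs are hypothetical; this is a simplified model of the behaviour described above, not the actual implementation.

```python
def paginate(ranked_docs, rank_window_size, from_, size):
    """Return one page of results, restricted to the global rank window."""
    window = ranked_docs[:rank_window_size]  # only these docs can be paged
    return window[from_:from_ + size]


ranked = [f"doc{i}" for i in range(1, 21)]  # 20 ranked docs, best first

first_page = paginate(ranked, rank_window_size=10, from_=0, size=4)
last_page = paginate(ranked, rank_window_size=10, from_=8, size=4)
beyond = paginate(ranked, rank_window_size=10, from_=12, size=4)

print(first_page)  # ['doc1', 'doc2', 'doc3', 'doc4']
print(last_page)   # ['doc9', 'doc10']
print(beyond)      # []
```

A request that pages past the window simply gets no hits in this model, even though more matching documents exist.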

So, working through an example, let's say that we perform a hybrid search query where:

  • we want to run a simple string query through a standard retriever, and normalize the scores to a [0, 1] range
  • we want to run a knn search through the knn retriever, without normalizing its scores
  • we want to compute the final score as score = 1.5 * standard + 2.5 * knn
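
The arithmetic of this example can be sketched in Python as follows. The scores are made up, `linear_combine` is a hypothetical helper, and the sketch assumes that a document missing from one sub-retriever contributes 0 for that component, which also covers the case where the sub-retrievers return non-overlapping documents.

```python
def linear_combine(components, rank_window_size):
    """components: list of (scores_by_doc, weight); returns top (doc, score) pairs."""
    totals = {}
    for scores, weight in components:
        for doc, score in scores.items():
            # Assumption: docs absent from a sub-retriever add nothing for it.
            totals[doc] = totals.get(doc, 0.0) + weight * score
    # Sort by descending score; break ties by doc id for a stable order.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:rank_window_size]


standard = {"doc1": 1.0, "doc2": 0.5, "doc3": 0.0}  # already minmax-normalized
knn = {"doc2": 0.75, "doc4": 0.5}                   # raw, not normalized

results = linear_combine([(standard, 1.5), (knn, 2.5)], rank_window_size=10)
for doc, score in results:
    print(doc, score)
# doc2 2.625   (1.5 * 0.5 + 2.5 * 0.75)
# doc1 1.5     (standard only)
# doc4 1.25    (knn only)
# doc3 0.0
```

Here doc2 wins because both sub-retrievers return it, while doc1 and doc4 are each scored by a single component.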

Sample syntax:

GET /retrievers_example/_search
{
  "retriever": {
    "linear": {
      "retrievers": [
        {
          "retriever": {
            "standard": {
              "query": {
                "simple_query_string": {
                  "query": "artificial intelligence in medicine",
                  "fields": [ "text" ]
                }
              }
            }
          },
          "weight": 1.5,
          "normalizer": "minmax"
        },
        {
          "retriever": {
            "knn": {
              "field": "vector",
              "query_vector": [ 0.23, 0.67, 0.89 ],
              "k": 3,
              "num_candidates": 5
            }
          },
          "weight": 2.5
        }
      ],
      "rank_window_size": 10
    }
  }
}
@pmpailis added the >enhancement, :Search Relevance/Ranking (Scoring, rescoring, rank evaluation.), :Search Relevance/Search (Catch all for Search Relevance), and v8.18.0 labels Jan 16, 2025
@elasticsearchmachine
Collaborator

Hi @pmpailis, I've created a changelog YAML for you.

@pmpailis added the auto-backport (Automatically create backport pull requests when merged) label Jan 16, 2025
Member

@benwtrent benwtrent left a comment

Looking much better. I have a concern around testing:

Do we have a test that specifically exercises the path when the different retrievers return different doc IDs? (e.g. they match non-overlapping doc sets).

@pmpailis
Contributor Author

Do we have a test that specifically exercises the path when the different retrievers return different doc IDs? (e.g. they match non-overlapping doc sets).

Added a test to account for this in ea1787f

Member

@benwtrent benwtrent left a comment

:shipit: :godmode:

@pmpailis pmpailis merged commit 375814d into elastic:main Jan 28, 2025
16 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Branch 8.x: Commit could not be cherry-picked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 120222

Labels

  • auto-backport: Automatically create backport pull requests when merged
  • backport pending
  • >enhancement
  • :Search Relevance/Ranking: Scoring, rescoring, rank evaluation.
  • Team:Search Relevance: Meta label for the Search Relevance team in Elasticsearch
  • v8.18.0
  • v9.0.0
