Rank feature query
Boosts the relevance score of documents based on the numeric value of a rank_feature or rank_features field.
The rank_feature query is typically used in the should clause of a bool query so its relevance scores are added to other scores from the bool query.
With positive_score_impact set to false for a rank_feature or rank_features field, we recommend that every document that participates in a query have a value for this field. Otherwise, if a rank_feature query is used in the should clause, it adds nothing to the score of a document with a missing value, but adds some boost to a document that contains the feature. This is the opposite of what we want: because we consider these features negative, we want to rank documents containing them lower than documents missing them.
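To make this pitfall concrete, the following Python sketch models only the should-clause contribution of a rank_feature query on a negative feature such as url_length, using the saturation formula pivot / (S + pivot) that this page gives for negative score impact. The pivot value of 40 and the helper names are hypothetical, and the sketch ignores the score of other clauses and the precision loss of stored values.

```python
from typing import Optional


def negative_saturation(s: float, pivot: float) -> float:
    """Saturation score for a feature with a negative score impact: pivot / (S + pivot)."""
    return pivot / (s + pivot)


def should_contribution(url_length: Optional[float], pivot: float) -> float:
    """A document without the field does not match the rank_feature clause, so it gains nothing."""
    if url_length is None:
        return 0.0
    return negative_saturation(url_length, pivot)


PIVOT = 40.0  # hypothetical pivot value, chosen only for illustration

long_url = should_contribution(500.0, PIVOT)  # very long URL, which we consider bad
missing = should_contribution(None, PIVOT)    # document with no url_length value

# The long-URL document still receives a positive boost, so it outranks the
# document with no value, even though a long URL is considered negative.
print(f"long URL: {long_url:.4f}, missing: {missing:.4f}")
```

However small the boost becomes for very large values, it never drops to zero, which is why giving every document a value avoids the inversion.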
Unlike the function_score query or other ways to change relevance scores, the rank_feature query efficiently skips non-competitive hits when the track_total_hits parameter is not true. This can dramatically improve query speed.
To calculate relevance scores based on rank feature fields, the rank_feature query supports the following mathematical functions:
- saturation
- log
- sigmoid
- linear
If you don’t know where to start, we recommend using the saturation function. If no function is provided, the rank_feature query uses the saturation function by default.
To use the rank_feature query, your index must include a rank_feature or rank_features field mapping. To see how you can set up an index for the rank_feature query, try the following example.
Create a test index with the following field mappings:
- pagerank, a rank_feature field which measures the importance of a website
- url_length, a rank_feature field which contains the length of the website’s URL. For this example, a long URL correlates negatively with relevance, indicated by a positive_score_impact value of false.
- topics, a rank_features field which contains a list of topics and a measure of how well each document is connected to this topic
PUT /test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false
      },
      "topics": {
        "type": "rank_features"
      }
    }
  }
}
Index several documents into the test index.
PUT /test/_doc/1?refresh
{
  "url": "https://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}

PUT /test/_doc/2?refresh
{
  "url": "https://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}

PUT /test/_doc/3?refresh
{
  "url": "https://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}
The following query searches for 2016 and boosts relevance scores based on pagerank, url_length, and the sports topic.
GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "2016" } }
      ],
      "should": [
        { "rank_feature": { "field": "pagerank" } },
        { "rank_feature": { "field": "url_length", "boost": 0.1 } },
        { "rank_feature": { "field": "topics.sports", "boost": 0.4 } }
      ]
    }
  }
}
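The should clauses in this request add their scores on top of the score of the must clause. The following Python sketch approximates only that added part for the three example documents: each clause uses the default saturation function, and its score is multiplied by its boost. The pivot values are hypothetical (Elasticsearch derives its own default pivot from the index), the helper names are illustrative, and the precision loss of stored feature values is ignored.

```python
from typing import Optional


def saturation(s: float, pivot: float, positive_impact: bool = True) -> float:
    # S / (S + pivot) for a positive impact, pivot / (S + pivot) for a negative impact
    return s / (s + pivot) if positive_impact else pivot / (s + pivot)


# Hypothetical pivots; Elasticsearch computes its own defaults from the index.
PIVOTS = {"pagerank": 50.0, "url_length": 40.0, "topics.sports": 40.0}

# Feature values of the three example documents (document 3 has no sports topic).
DOCS = {
    1: {"pagerank": 50.3, "url_length": 42, "topics.sports": 50},
    2: {"pagerank": 50.3, "url_length": 47, "topics.sports": 35},
    3: {"pagerank": 50.3, "url_length": 37, "topics.sports": None},
}

# (feature, boost, positive_score_impact) for each should clause in the request
CLAUSES = [
    ("pagerank", 1.0, True),
    ("url_length", 0.1, False),
    ("topics.sports", 0.4, True),
]


def should_score(doc: dict) -> float:
    """Sum of boost * saturation over the should clauses the document matches."""
    total = 0.0
    for feature, boost, positive in CLAUSES:
        value: Optional[float] = doc.get(feature)
        if value is not None:  # a missing feature contributes nothing
            total += boost * saturation(value, PIVOTS[feature], positive)
    return total


for doc_id, doc in DOCS.items():
    print(doc_id, round(should_score(doc), 4))
```

Under these assumptions, document 1 gains the most from the should clauses, mainly because of its higher sports weight, while document 3 gains the least because it has no sports topic at all.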
field
- (Required, string) rank_feature or rank_features field used to boost relevance scores.
boost
- (Optional, float) Floating point number used to decrease or increase relevance scores. Defaults to 1.0.
Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
saturation
- (Optional, function object) Saturation function used to boost relevance scores based on the value of the rank feature field. If no function is provided, the rank_feature query defaults to the saturation function. See Saturation for more information.
log
- (Optional, function object) Logarithmic function used to boost relevance scores based on the value of the rank feature field. See Logarithm for more information.
sigmoid
- (Optional, function object) Sigmoid function used to boost relevance scores based on the value of the rank feature field. See Sigmoid for more information.
linear
- (Optional, function object) Linear function used to boost relevance scores based on the value of the rank feature field. See Linear for more information.
Only one function (saturation, log, sigmoid, or linear) can be provided.
Saturation
The saturation function gives a score equal to S / (S + pivot), where S is the value of the rank feature field and pivot is a configurable pivot value. The result is less than 0.5 if S is less than pivot, exactly 0.5 if S equals pivot, and greater than 0.5 otherwise. Scores always fall in the open interval (0, 1).
If the rank feature has a negative score impact, the function is computed as pivot / (S + pivot), which decreases as S increases.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
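The following Python sketch is a plain reimplementation of the saturation formula, including the negative-impact variant, to show how scores move around the pivot. It is not Elasticsearch code; the function name is illustrative and the precision loss of stored values is ignored.

```python
def saturation(s: float, pivot: float, positive_score_impact: bool = True) -> float:
    """Saturation score: S / (S + pivot), or pivot / (S + pivot) for a negative impact."""
    # Preconditions of this sketch: a non-negative feature value and a positive pivot.
    if s < 0 or pivot <= 0:
        raise ValueError("expects s >= 0 and pivot > 0")
    if positive_score_impact:
        return s / (s + pivot)  # rises toward 1 as S grows
    return pivot / (s + pivot)  # falls toward 0 as S grows


# Same pivot as the request above.
for s in (2.0, 8.0, 32.0):
    print(s, saturation(s, pivot=8.0), saturation(s, pivot=8.0, positive_score_impact=False))
```

With pivot set to 8, a value of 8 scores exactly 0.5, smaller values score less, and larger values score more; the negative-impact variant mirrors this, so the two scores for the same S always sum to 1.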
If a pivot value is not provided, Elasticsearch computes a default value equal to the approximate geometric mean of all rank feature values in the index. We recommend using this default value if you haven’t had the opportunity to train a good pivot value.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}
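To illustrate what the default pivot approximates, this Python sketch computes the exact geometric mean of a list of feature values as exp(mean(log(v))). Elasticsearch only approximates this figure from index statistics, so the pivot it actually uses may differ; the helper name is hypothetical.

```python
import math
from typing import Iterable


def geometric_mean(values: Iterable[float]) -> float:
    """exp(mean(log(v))) for strictly positive values (math.log rejects 0 and negatives)."""
    logs = [math.log(v) for v in values]
    if not logs:
        raise ValueError("need at least one value")
    return math.exp(sum(logs) / len(logs))


# pagerank values of the three example documents
print(geometric_mean([50.3, 50.3, 50.3]))
# a simple case: the geometric mean of 2 and 8 is 4
print(geometric_mean([2.0, 8.0]))
```

Because the geometric mean works on logarithms, a few very large feature values pull it up far less than they would pull up an arithmetic mean.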
Logarithm
The log function gives a score equal to log(scaling_factor + S), where S is the value of the rank feature field and scaling_factor is a configurable scaling factor. Scores are unbounded.
This function only supports rank features that have a positive score impact.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}
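The following Python sketch evaluates log(scaling_factor + S) for a few values to show that scores keep growing, but slowly. It assumes the natural logarithm (the base is not stated above), and the function name is illustrative.

```python
import math


def log_score(s: float, scaling_factor: float) -> float:
    """log(scaling_factor + S); the natural logarithm is an assumption of this sketch."""
    return math.log(scaling_factor + s)


# Same scaling_factor as the request above.
for s in (1.0, 10.0, 100.0, 1000.0):
    print(s, round(log_score(s, scaling_factor=4.0), 3))
```

Once S is much larger than scaling_factor, each tenfold increase in S adds roughly the same amount to the score, so no single document can dominate the ranking through one huge feature value, yet the score never saturates.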
Sigmoid
The sigmoid function is an extension of saturation which adds a configurable exponent. Scores are computed as S^exp / (S^exp + pivot^exp), where exp is the value of the exponent parameter. As with the saturation function, pivot is the value of S that gives a score of 0.5, and scores fall in (0, 1).
The exponent must be positive and is typically in [0.5, 1]. A good value should be computed via training. If you don’t have the opportunity to do so, we recommend you use the saturation function instead.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}
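The following Python sketch implements the sigmoid formula and compares it with saturation, which is the special case exponent = 1. The function names are illustrative, and the value 50.3 is the pagerank from the example documents.

```python
def sigmoid(s: float, pivot: float, exponent: float) -> float:
    """S^exp / (S^exp + pivot^exp) for S >= 0 and a positive pivot and exponent."""
    if pivot <= 0 or exponent <= 0:
        raise ValueError("pivot and exponent must be positive")
    se = s ** exponent
    return se / (se + pivot ** exponent)


def saturation(s: float, pivot: float) -> float:
    """S / (S + pivot), the positive-impact saturation function."""
    return s / (s + pivot)


print(sigmoid(7.0, pivot=7.0, exponent=0.6))   # S equal to pivot scores 0.5
print(sigmoid(50.3, pivot=7.0, exponent=0.6))  # same parameters as the request above
print(sigmoid(50.3, pivot=7.0, exponent=1.0), saturation(50.3, pivot=7.0))
```

An exponent below 1 flattens the curve: values far above the pivot score lower than under saturation, and values far below it score higher.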
Linear
The linear function is the simplest function and gives a score equal to the indexed value of S, where S is the value of the rank feature field. If a rank feature field is indexed with "positive_score_impact": true, its indexed value is equal to S, rounded to preserve only 9 significant bits of precision. If a rank feature field is indexed with "positive_score_impact": false, its indexed value is equal to 1/S, rounded to preserve only 9 significant bits of precision.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "linear": {}
    }
  }
}
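To show what keeping 9 significant bits means in practice, the following Python sketch converts a value to a 32-bit float and clears the low 15 of its 23 mantissa bits, leaving the implicit leading bit plus 8 stored bits. It truncates rather than rounds; the exact rounding mode Elasticsearch applies is an assumption here, although for the two example values below truncation and round-to-nearest give the same result. The function names are hypothetical.

```python
import struct


def keep_9_significant_bits(value: float) -> float:
    """Keep the implicit leading bit plus the top 8 mantissa bits of a float32 (truncation)."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    bits &= 0xFFFF8000  # keep sign, exponent, and top 8 mantissa bits; clear the low 15
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def linear_score(s: float, positive_score_impact: bool = True) -> float:
    """Score equals the indexed value: S, or 1/S for a negative score impact."""
    indexed = s if positive_score_impact else 1.0 / s
    return keep_9_significant_bits(indexed)


print(linear_score(50.3))                               # pagerank from the example documents
print(linear_score(42.0, positive_score_impact=False))  # url_length from the example documents
```

Under this model, a pagerank of 50.3 is stored as 50.25, and a url_length of 42 yields an indexed value slightly below 1/42. Small integers such as 42 fit in 9 bits and survive unchanged.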